Tidygeocoder v1.0.4 is released! This release adds support for the Geoapify geocoding service (thanks Daniel Possenriede!), a progress bar, more helpful console output, and new functions for combining the results of multiple geocoding queries. A more detailed overview of the changes in this release is available in the changelog.
Progress Bars and Console Output
Progress bars are now displayed for single-input geocoding queries (i.e., not batch queries). Additionally, console messages now show by default which geocoding service was used, how many addresses or coordinates were passed to it, and how long the query took to execute.
The progress_bar parameter can be used to toggle the use of the progress bar while the quiet parameter can be used to silence console messages that are displayed by default. See the documentation for geo() or reverse_geo() for details.
Additionally, the quiet, progress_bar, and verbose parameters can now be set permanently via options(). For example, options(tidygeocoder.progress_bar = FALSE) will disable progress bars for all queries.
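As a minimal sketch of setting these defaults for a session: only tidygeocoder.progress_bar is taken from the example above, while the tidygeocoder.quiet and tidygeocoder.verbose option names are assumed to follow the same naming pattern, so check the geo() documentation before relying on them.

```r
# Set session-wide defaults for tidygeocoder's console behaviour.
# options() returns the previous values, so they can be restored later.
old_opts <- options(
  tidygeocoder.progress_bar = FALSE,  # option name taken from the text above
  tidygeocoder.quiet = TRUE,          # assumed name, same pattern as above
  tidygeocoder.verbose = FALSE        # assumed name, same pattern as above
)

# Inspect the values that later queries in this session would pick up
getOption("tidygeocoder.progress_bar")
getOption("tidygeocoder.quiet")

# Restore whatever was set before this snippet ran
options(old_opts)
```

Placing the options() call in your .Rprofile would apply it to every new R session rather than only the current one.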
Combining Multiple Queries
In past releases of the package, method = "cascade" could be used in the geo() and geocode() functions to combine the results of geocoding queries from two different services. The "cascade" method is now deprecated in favor of two new and more flexible functions: geocode_combine() and geo_combine(). These functions can execute and combine the results of more than two queries, and they allow each query to be fully customized.
To demonstrate the utility of these new functions, below I’ve assembled a dataset of addresses to be geocoded. The first five are street-level addresses in the United States that can be geocoded with the US Census geocoding service. However, three of these addresses will not return results with the US Census batch service (see issue #87 for more information) and must instead be geocoded with the US Census single address geocoder. The last three addresses are cities outside the United States and require a different geocoding service entirely (the US Census service is limited to the United States).
    library(tidyverse)
    library(tidygeocoder)

    mixed_addresses <- tribble(
      ~street_address,        ~city,        ~state_cd, ~zip_cd,
      "624 W DAVIS ST #1D",   "BURLINGTON", "NC",      27215,
      "201 E CENTER ST #268", "MEBANE",     "NC",      27302,
      "7833 WOLFE LN",        "SNOW CAMP",  "NC",      27349,
      "202 C St",             "San Diego",  "CA",      92101,
      "121 N Rouse Ave",      "Bozeman",    "MT",      59715
    ) %>%
      bind_rows(
        tibble(city = c('Taipei', 'Moscow', 'Buenos Aires'))
      )
If we wanted to geocode a large dataset with addresses such as these, we might first try to geocode as many as possible via the US Census batch service, then attempt the remaining addresses with the US Census single address geocoder, and then finally send any remaining unfound addresses to another service. We’ll accomplish this workflow in the code below.
The geocode_combine() function accepts a dataframe input and a list of queries provided as lists (i.e., a list of lists). Each list in the queries argument contains parameters that are passed to the geocode() function. Optionally, the query_names argument can be used to specify a label for each query’s results.
Below, the street, city, state, and postalcode arguments are specified for the first two queries, while the address argument (i.e., a single-line address) is pointed at the city column for the third query (the ArcGIS service only accepts a single-line address argument and doesn’t use address component arguments like city and state).
    results <- mixed_addresses %>%
      geocode_combine(
        queries = list(
          list(method = 'census', mode = 'batch', street = 'street_address',
               city = 'city', state = 'state_cd', postalcode = 'zip_cd'),
          list(method = 'census', mode = 'single', street = 'street_address',
               city = 'city', state = 'state_cd', postalcode = 'zip_cd'),
          list(method = 'arcgis', address = 'city')
        ),
        query_names = c('census - batch', 'census - single', 'arcgis')
      )

    ##
    ## Passing 8 addresses to the US Census batch geocoder
    ## Query completed in: 1.9 seconds
    ## Passing 6 addresses to the US Census single address geocoder
    ## Query completed in: 3.6 seconds
    ## Passing 3 addresses to the ArcGIS single address geocoder
    ## Query completed in: 1.4 seconds
| street_address | city | state_cd | zip_cd | lat | long | query |
|---|---|---|---|---|---|---|
| 624 W DAVIS ST #1D | BURLINGTON | NC | 27215 | 36.09598 | -79.44453 | census - single |
| 201 E CENTER ST #268 | MEBANE | NC | 27302 | 36.09683 | -79.26977 | census - single |
| 7833 WOLFE LN | SNOW CAMP | NC | 27349 | 35.89866 | -79.43713 | census - single |
| 202 C St | San Diego | CA | 92101 | 32.71676 | -117.16283 | census - batch |
| 121 N Rouse Ave | Bozeman | MT | 59715 | 45.68066 | -111.03203 | census - batch |
By default the results of the queries are combined into a single dataframe as shown above and the query column shows which query produced each result. Alternatively, the results of each query can be returned as separate dataframes in a list by using return_list = TRUE.
By default, only addresses that are not found in a query are passed to the subsequent query. However, setting cascade = FALSE will pass all addresses to all queries. See the documentation for the geocode_combine() function for more usage details.
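To make the difference between the two modes concrete, here is a small base R illustration of the cascade idea. It is not the package's implementation: make_mock_service() and combine_mock() are hypothetical helpers, the "services" are lookup tables rather than real geocoders, the lat values are arbitrary placeholders, and unlike geocode_combine() the sketch simply drops addresses that no query finds.

```r
# Hypothetical stand-in for a geocoding service: returns a value for the
# addresses it knows and NA for everything else.
make_mock_service <- function(known) {
  function(addresses) unname(known[addresses])
}

# Arbitrary placeholder values, not real coordinates
services <- list(
  batch  = make_mock_service(c("202 C St" = 1)),
  single = make_mock_service(c("624 W DAVIS ST #1D" = 2)),
  other  = make_mock_service(c("Taipei" = 3, "202 C St" = 4))
)

# Run each mock query in turn. With cascade = TRUE only the addresses that
# were not found are passed on; with cascade = FALSE every query sees all
# addresses.
combine_mock <- function(addresses, services, cascade = TRUE) {
  results <- list()
  remaining <- addresses
  for (name in names(services)) {
    input <- if (cascade) remaining else addresses
    lat <- services[[name]](input)
    found <- !is.na(lat)
    results[[name]] <- data.frame(
      address = input[found],
      lat = lat[found],
      query = rep(name, sum(found)),
      stringsAsFactors = FALSE
    )
    remaining <- input[!found]
  }
  do.call(rbind, unname(results))
}

addresses <- c("202 C St", "624 W DAVIS ST #1D", "Taipei")
cascaded   <- combine_mock(addresses, services, cascade = TRUE)
all_to_all <- combine_mock(addresses, services, cascade = FALSE)
```

In this sketch the cascaded run returns one row per address, while the cascade = FALSE run returns "202 C St" twice because both the first and the last mock service recognize it.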
The return_type, geocodio_v, mapbox_permanent, mapquest_open, iq_region, and here_request_id parameters are now deprecated in favor of the new api_options parameter. For instance, instead of return_type = "geographies" you should now use api_options = list(census_return_type = "geographies").
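A hedged sketch of what the migrated call might look like is below. The address is assembled from one of the rows in the dataset above. Two things here are assumptions on my part rather than something stated in this post: that full_results = TRUE is needed to see the extra geography columns, and that the no_query argument is available to build the query without sending it. The requireNamespace() guard only keeps the snippet from failing when tidygeocoder isn't installed.

```r
# New style: service-specific settings are grouped in api_options
census_options <- list(census_return_type = "geographies")

# Guarded so the snippet is a no-op when tidygeocoder is not installed
if (requireNamespace("tidygeocoder", quietly = TRUE)) {
  result <- tidygeocoder::geo(
    address = "202 C St, San Diego, CA 92101",
    method = "census",
    full_results = TRUE,           # assumed to be needed for geography columns
    api_options = census_options,  # replaces return_type = "geographies"
    no_query = TRUE                # assumed argument: no request is sent
  )
}
```

Remove no_query = TRUE to actually send the request to the US Census service.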
Also, the cascade_order, param_error, and batch_limit_error parameters in geo() are now deprecated as they were only required because of the deprecated “cascade” method. Refer to the documentation for geo() or reverse_geo() for details.