censusdis.maps
Utilities for loading and rendering maps.
This module relies on shapefiles from the US Census, which it downloads as needed and caches locally.
- exception censusdis.maps.MapException[source]
Bases: CensusApiException
An exception generated from censusdis.maps code.
- class censusdis.maps.ShapeReader(shapefile_root: str | Path | None = None, year: int = 2020, auto_fetch: bool = True)[source]
Bases: object
A class for reading shapefiles into GeoPandas GeoDataFrames.
See the demo notebooks for more details. The shapefiles need to already have been downloaded to the local machine. We may add a lazy option in the future that will fetch them if they don’t exist.
- Parameters:
shapefile_root – The location in the filesystem where shapefiles are stored.
year – The year we want shapefiles for.
auto_fetch – If True, fetch remote shapefiles as needed.
- read_cb_shapefile(shapefile_scope: str, geography: str, resolution: str = '500k', crs=None, *, timeout: int = 30) → GeoDataFrame[source]
Read the cartographic boundaries of a given geography.
These are smaller files suited for plotting maps, as compared to those returned by read_shapefile(), which returns higher resolution geometries.
The files are read from the US Census servers and cached locally. They are in most cases the same files you can download manually from https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2020.html or similar URLs for other years.
Individual files the API may download follow a naming convention that has evolved a bit over time. So for example a 2010 census tract cartographic bounds file for New Jersey at 500,000:1 resolution would be found at https://www2.census.gov/geo/tiger/GENZ2010/gz_2010_34_140_00_500k.zip whereas a similar file for 2020 would be at https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_34_tract_500k.zip
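The two example URLs above can be reproduced with a small sketch. The helper names cb_url_2010 and cb_url_2020 are hypothetical and are not part of censusdis. They encode only the two patterns quoted above, and reading 140 as the Census summary level code for tracts is an assumption made for illustration.

```python
def cb_url_2010(state_fips: str, summary_level: str, resolution: str) -> str:
    """Build a 2010-style cartographic boundary URL (hypothetical helper)."""
    base = "https://www2.census.gov/geo/tiger/GENZ2010"
    return f"{base}/gz_2010_{state_fips}_{summary_level}_00_{resolution}.zip"


def cb_url_2020(state_fips: str, geography: str, resolution: str) -> str:
    """Build a 2020-style cartographic boundary URL (hypothetical helper)."""
    base = "https://www2.census.gov/geo/tiger/GENZ2020/shp"
    return f"{base}/cb_2020_{state_fips}_{geography}_{resolution}.zip"


# New Jersey has state FIPS code 34.
print(cb_url_2010("34", "140", "500k"))
print(cb_url_2020("34", "tract", "500k"))
```

Other years and geographies may follow different patterns, which is why the method handles the naming internally.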
This method knows many of the subtle changes that have occurred over the years, so you should mostly not have to worry about them. It is unlikely it knows them all, so please submit an issue at https://github.com/vengroff/censusdis/issues if you find otherwise.
Once read, the files are cached locally so that when we reuse the same files we do not have to go back to the server.
- Parameters:
shapefile_scope – The geography that is covered by the entire shapefile. In some cases, this is a state, e.g. NJ. For cases where files are available for the entire country, the string “us” is typically used. In some rare cases, like for the Alaska Native Regional Corporations ("anrc") geography, other strings like "02" are used. See the download links at https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2020.html if you need to debug issues with a given geography.
geography – The geography we want to download bounds for. Supported geographies are “state”, “county”, “cousub” (county subdivision), “tract”, and “bg” (block group).
resolution – The resolution of the shapes to use. Permitted options are “500k”, “5m”, and “20m” for 1:500,000, 1:5,000,000, and 1:20,000,000 resolution respectively. Availability varies, but for most geographies “500k” is available even if others are not.
crs – The CRS to project the result to. If None, use the default CRS of the shapefile. Setting this is useful if we plan to merge the resulting GeoDataFrame with another, so we can make sure they use the same CRS.
timeout – Timeout limit (in seconds) for the remote call.
- Returns:
A gpd.GeoDataFrame containing the boundaries of the requested geometries.
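A minimal usage sketch, following the signatures documented above. The wrapper name load_state_boundaries is hypothetical. It assumes censusdis is installed, the Census servers are reachable, and a national state file exists for the requested year and resolution.

```python
def load_state_boundaries(year: int = 2020, resolution: str = "500k"):
    """Return cartographic state boundaries for the whole country."""
    # Imported lazily so that defining this helper does not require
    # censusdis to be installed.
    from censusdis.maps import ShapeReader

    reader = ShapeReader(year=year)
    # "us" is the shapefile scope; "state" is the geography within it.
    return reader.read_cb_shapefile("us", "state", resolution=resolution)
```

The first call downloads the file. Later calls with the same arguments should be served from the local cache, as described above.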
- read_shapefile(shapefile_scope: str, geography: str, crs=None, *, timeout: int = 30)[source]
Read the geometries of geographies.
This method reads maps suitable for use with geometric joins and queries of various types. If you are only interested in plotting maps, the read_cb_shapefile() method may be more suitable.
The files are read from the US Census servers and cached locally. They are in most cases the same files you can download manually from https://www.census.gov/cgi-bin/geo/shapefiles/index.php.
Individual files the API may download follow a naming convention that has evolved a bit over time. So for example a 2010 block group file for New Jersey would be found at https://www2.census.gov/geo/tiger/TIGER2010/BG/2010/tl_2010_34_bg10.zip whereas a similar file for 2020 would be at https://www2.census.gov/geo/tiger/TIGER2020/BG/tl_2020_34_bg.zip.
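As a sketch of how the two block group URLs above differ, the hypothetical helper below (not part of censusdis) covers only the two years quoted in the text and deliberately refuses to guess at any other year.

```python
def tiger_bg_url(year: int, state_fips: str) -> str:
    """Build a TIGER block group URL for the two documented years only."""
    base = f"https://www2.census.gov/geo/tiger/TIGER{year}/BG"
    if year == 2010:
        # 2010 files sit in an extra year directory and carry a "10" suffix.
        return f"{base}/2010/tl_2010_{state_fips}_bg10.zip"
    if year == 2020:
        return f"{base}/tl_2020_{state_fips}_bg.zip"
    raise ValueError(f"No documented URL pattern for {year}")


# New Jersey has state FIPS code 34.
print(tiger_bg_url(2010, "34"))
print(tiger_bg_url(2020, "34"))
```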
This method knows many of the subtle changes that have occurred over the years, so you should mostly not have to worry about them. It is unlikely it knows them all, so please submit an issue at https://github.com/vengroff/censusdis/issues if you find otherwise.
Once read, the files are cached locally so that when we reuse the same files we do not have to go back to the server.
- Parameters:
shapefile_scope – The geography that is covered by the entire shapefile. In some cases, this is a state, e.g. NJ. For cases where files are available for the entire country, the string “us” is typically used. In some rare cases, like for the Alaska Native Regional Corporations ("anrc") geography, other strings like "02" are used. See the download links at https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2020.html if you need to debug issues with a given geography.
geography – The geography we want to download bounds for. Supported geographies are “state”, “county”, “cousub” (county subdivision), “tract”, and “bg” (block group). Other geographies as defined by the US Census may work, but have not been thoroughly tested.
crs – The CRS to project the result to. If None, use the default CRS of the shapefile. Setting this is useful if we plan to merge the resulting GeoDataFrame with another, so we can make sure they use the same CRS.
timeout – Timeout limit (in seconds) for the remote call.
- Returns:
A gpd.GeoDataFrame containing the requested geometries.
- property shapefile_root: Path
The path at which shapefiles are cached locally.
- censusdis.maps.clip_to_states(gdf, gdf_bounds)[source]
Clip every geometry in a gdf to the state it belongs to, from the states in the state bounds.
We clip to state bounds so that we don’t plot areas outside the state. Typically, this clips areas that extend out into the water in coastal areas so we don’t get strange artifacts in the water in plots.
The way we tell what state an input geometry belongs to is by looking at the STATEFP column for that geometry’s row in the input.
- Parameters:
gdf – The input geometries.
gdf_bounds – The state bounds.
- Returns:
The input geometries where each is clipped to the bounds of the state to which it belongs.
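One plausible way to combine this with ShapeReader is sketched below. The wrapper name clipped_counties is hypothetical. The sketch assumes censusdis is installed, that national county and state files exist for the requested year, and that the county file carries the STATEFP column described above.

```python
def clipped_counties(year: int = 2020):
    """Clip full-resolution county geometries to cartographic state bounds."""
    # Imported lazily so that defining this helper does not require
    # censusdis to be installed.
    import censusdis.maps as cmap

    reader = cmap.ShapeReader(year=year)
    # Higher resolution geometries, which may extend into the water.
    gdf_counties = reader.read_shapefile("us", "county")
    # Cartographic state boundaries to clip against.
    gdf_state_bounds = reader.read_cb_shapefile("us", "state")
    return cmap.clip_to_states(gdf_counties, gdf_state_bounds)
```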
- censusdis.maps.geographic_centroids(gdf: GeoDataFrame) → GeoSeries[source]
Compute the centroid of a geography.
We do this by projecting to epsg 3857 (https://epsg.io/3857), computing the centroid, and then projecting back. This gives a reasonable answer for most geometries and avoids warnings from GeoPandas.
- Parameters:
gdf – A geo data frame in any crs.
- Returns:
A geo data series of the centroids of all the geometries in gdf.
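The project, compute, project back idea can be illustrated without GeoPandas. The sketch below is not the library's implementation: it uses the standard spherical Web Mercator formulas for EPSG:3857 and the shoelace formula for the centroid of a single simple polygon ring. All function names are hypothetical.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857, in meters


def to_web_mercator(lon: float, lat: float) -> tuple[float, float]:
    """Project degrees of longitude and latitude to EPSG:3857 meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_web_mercator(x: float, y: float) -> tuple[float, float]:
    """Invert to_web_mercator, returning degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def polygon_centroid(ring: list) -> tuple[float, float]:
    """Area-weighted centroid of a simple polygon ring (shoelace formula)."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross  # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3 * area2), cy / (3 * area2)


def geographic_centroid(ring_lon_lat: list) -> tuple[float, float]:
    """Project to EPSG:3857, take the centroid, and project back."""
    projected = [to_web_mercator(lon, lat) for lon, lat in ring_lon_lat]
    return from_web_mercator(*polygon_centroid(projected))


# A box that is symmetric about the equator.
box = [(0.0, -10.0), (10.0, -10.0), (10.0, 10.0), (0.0, 10.0)]
lon, lat = geographic_centroid(box)
print(lon, lat)
```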
- censusdis.maps.plot_map(gdf: GeoDataFrame, *args, with_background: bool = False, epsg: int | None = None, **kwargs)[source]
Plot a map, optionally with a background.
- Parameters:
gdf – The geo data frame to plot
args – Optional args to matplotlib
with_background – Should we put in a background map from OpenStreetMap?
epsg – The EPSG to project to. Otherwise a suitable one for the geometry will be inferred.
kwargs – keyword args to pass on to matplotlib
- Return type:
The ax of the resulting plot.
- censusdis.maps.plot_us(gdf: GeoDataFrame, *args, do_relocate_ak_hi_pr: bool = True, with_background: bool = False, epsg: int = 9311, **kwargs)[source]
Plot a map of the US with AK and HI relocated.
This function will move and scale AK and HI so that they are plotted at the lower left of the other 48 states, just below CA, AZ, and NM.
It also moves the Aleutian islands that are west of -180° longitude so that they are plotted next to the rest of AK. Otherwise, they tend to be plotted at longitudes just less than +180°, which creates visual discontinuities.
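The wrapping described above amounts to a shift of 360° for the affected points. The function below is an illustrative sketch with a hypothetical name, not the library's code, and it assumes it is applied only to coordinates already known to belong to Alaska.

```python
def wrap_aleutian_longitude(lon: float) -> float:
    """Shift a positive longitude to its equivalent west of -180 degrees."""
    # Islands just past the antimeridian are stored near +180 degrees.
    # Subtracting 360 places them next to the rest of Alaska.
    return lon - 360.0 if lon > 0 else lon


print(wrap_aleutian_longitude(179.5))   # just past the antimeridian
print(wrap_aleutian_longitude(-150.0))  # mainland Alaska, left unchanged
```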
Note: the expectation is that the CRS of the incoming geo data frame is EPSG:4269 or something that closely approximates it, in units of degrees of latitude and longitude. If this is not the case, results are unpredictable.
- Parameters:
gdf – The geometries to be plotted.
do_relocate_ak_hi_pr – If True try to relocate AK, HI, and PR. Otherwise, still wrap the Aleutian islands west of -180° longitude if present.
args – Args to pass to the plot.
with_background – Should we put in a background map from OpenStreetMap?
epsg – The EPSG CRS to project to before plotting. Default is 9311, which is equal area. See https://epsg.io/9311.
kwargs – Keyword args to pass to the plot.
- Return type:
ax of the plot.
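A minimal usage sketch, with a hypothetical wrapper name. It assumes censusdis and its plotting dependencies are installed, that the state boundary file is delivered in a latitude and longitude CRS as the note above expects, and that the color and edgecolor keyword arguments are accepted by the underlying plot call.

```python
def plot_state_map(year: int = 2020):
    """Plot all states, with AK, HI, and PR relocated by default."""
    # Imported lazily so that defining this helper does not require
    # censusdis to be installed.
    import censusdis.maps as cmap

    reader = cmap.ShapeReader(year=year)
    gdf_states = reader.read_cb_shapefile("us", "state")
    # Extra keyword arguments are passed on to the plot.
    return cmap.plot_us(gdf_states, color="lightgray", edgecolor="white")
```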
- censusdis.maps.plot_us_boundary(gdf: GeoDataFrame, *args, do_relocate_ak_hi_pr: bool = True, with_background: bool = False, epsg: int = 9311, **kwargs)[source]
Plot a map of the boundaries of the US with AK and HI relocated.
This function is very much like plot_us() except that it plots only the boundaries of geometries.
Note: the expectation is that the CRS of the incoming geo data frame is EPSG:4269 or something that closely approximates it, in units of degrees of latitude and longitude. If this is not the case, results are unpredictable.
- Parameters:
gdf – The geometries to be plotted.
args – Args to pass to the plot.
do_relocate_ak_hi_pr – If True try to relocate AK, HI, and PR. Otherwise, still wrap the Aleutian islands west of -180° longitude if present.
with_background – Should we put in a background map from OpenStreetMap?
epsg – The EPSG CRS to project to before plotting. Default is 9311, which is equal area. See https://epsg.io/9311.
kwargs – Keyword args to pass to the plot.
- Return type:
ax of the plot.
- censusdis.maps.relocate_ak_hi_pr(gdf: GeoDataFrame) → GeoDataFrame[source]
Relocate any geometry that is in Alaska or Hawaii for plotting purposes.
We first try an optimization. If there is a STATEFP column then we relocate rows where that column has a value of AK, HI or PR. If there is not a STATEFP column we check for a STATE column and do the same. If neither column exists then we dig down into the geometries themselves and relocate those that intersect bounding rectangles of the two states.
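The order of checks described above can be summarized in a short sketch. The function name and the returned labels are hypothetical. The sketch shows only which source of state information would be consulted, not how geometries are moved.

```python
def relocation_strategy(columns) -> str:
    """Pick the source of state information, in the documented order."""
    if "STATEFP" in columns:
        return "STATEFP"
    if "STATE" in columns:
        return "STATE"
    # Neither column exists, so fall back to testing the geometries
    # against bounding rectangles.
    return "geometry"


print(relocation_strategy(["STATEFP", "COUNTYFP", "geometry"]))
print(relocation_strategy(["STATE", "geometry"]))
print(relocation_strategy(["geometry"]))
```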
Note: the expectation is that the CRS of the incoming geo data frame is EPSG:4269 or something that closely approximates it, in units of degrees of latitude and longitude. If this is not the case, results are unpredictable.
- Parameters:
gdf – the geo data frame to relocate.
- Return type:
a geo data frame with any geometry in AK or HI moved for plotting.
- censusdis.maps.sjoin_mostly_contains(gdf_large_geos: GeoDataFrame, gdf_small_geos: GeoDataFrame, large_suffix: str = 'large', small_suffix: str = 'small', area_threshold: float = 0.8, area_epsg: int = 3857)[source]
Spatial join based on fraction of contained area.
This function is designed to implement the common case where we have a number of small geo areas like census tracts or block groups in a large area like a CBSA. The reason to use this instead of gpd.GeoDataFrame.sjoin directly is that the smaller geos may not all be strictly contained in the bounds of the larger geos, and small geos outside the bounds of the larger one may intersect along the boundary. So instead, this method looks for small geos whose area is at least 80% (or another chosen fraction) within the larger area.
- Parameters:
gdf_large_geos – A geo data frame of one or more large geo areas like CBSAs.
gdf_small_geos – A geo data frame of smaller areas like census tracts.
large_suffix – Suffix to add to column names from the large side when the same name appears in both.
small_suffix – Suffix to add to column names from the small side when the same name appears in both.
area_threshold – The fraction of each smaller area that must be covered by one of the large areas to be joined with it.
area_epsg – The CRS to project to before doing area calculations. Defaults to 3857 (https://epsg.io/3857).
- Return type:
Geo data frame of the spatially joined results.
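The area threshold test at the heart of this join can be illustrated with axis-aligned rectangles in place of real geometries. This is a simplified sketch of the idea using only the standard library, not the library's implementation. Rect and the helper functions are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    """Area of a rectangle, or 0 if it is empty or inverted."""
    return max(0.0, r.xmax - r.xmin) * max(0.0, r.ymax - r.ymin)


def intersection(a: Rect, b: Rect) -> Rect:
    """Overlap of two rectangles. It has zero area if they are disjoint."""
    return Rect(
        max(a.xmin, b.xmin),
        max(a.ymin, b.ymin),
        min(a.xmax, b.xmax),
        min(a.ymax, b.ymax),
    )


def mostly_contains(large: Rect, small: Rect, area_threshold: float = 0.8) -> bool:
    """True if at least area_threshold of small's area lies inside large."""
    return area(intersection(large, small)) / area(small) >= area_threshold


large = Rect(0.0, 0.0, 10.0, 10.0)
mostly_inside = Rect(8.0, 0.0, 10.25, 1.0)  # about 89% inside
straddling = Rect(9.0, 0.0, 11.0, 1.0)      # 50% inside
touching = Rect(10.0, 0.0, 11.0, 1.0)       # shares only an edge

print(mostly_contains(large, mostly_inside))
print(mostly_contains(large, straddling))
print(mostly_contains(large, touching))
```

A plain intersection test would match all three small rectangles, while a strict containment test would match none of them. The threshold keeps only the first.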