censusdis.cli.yamlspec

Classes that are loaded from YAML config files for the CLI.

class censusdis.cli.yamlspec.CensusGroup(group: str | Iterable[str], *, leaves_only: bool = False, denominator: str | None = None, frac_prefix: str | None = None, frac_not: bool = False)[source]

Bases: VariableSpec

Specification of a group of variables to download from the U.S. Census API.

Parameters:

group – The name of a census group, such as B03002, or a list of several such groups.
leaves_only – If True, then only download the variables that are at the leaves of the group, not the internal variables.
denominator – The denominator to divide by when constructing fractional variables. If False then no fractional variables are added. If the name of a variable, that variable will be downloaded and used as a denominator to compute fractional versions of all of the other variables. If True then the denominator will be computed as the sum of all the other variables.
frac_prefix – The prefix to prepend to fractional variables. If None a default prefix of ‘frac_’ is used.

groups_to_download() → List[Tuple[str, bool]][source]

Return the names of groups of variables that need to be downloaded from the U.S. Census API.

The returned values are simply the groups specified at construction time.

Return type:: The names of groups to download.
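To illustrate the shape of the return value, the sketch below turns one group name or several into a list of (group, flag) pairs. It assumes the boolean in each tuple is the leaves_only setting, which the signatures suggest but this page does not state. groups_as_pairs is a hypothetical helper, not part of censusdis.

```python
from typing import Iterable, List, Tuple, Union


def groups_as_pairs(
    group: Union[str, Iterable[str]], leaves_only: bool = False
) -> List[Tuple[str, bool]]:
    """Hypothetical helper: pair each group name with the leaves_only flag."""
    # A bare string is a single group; anything else is treated as several.
    names = [group] if isinstance(group, str) else list(group)
    # Assumption: the bool in each tuple carries the leaves_only setting.
    return [(name, leaves_only) for name in names]


pairs = groups_as_pairs(["B03002", "B19013"], leaves_only=True)
```

Handling the single-string case first matters because a string is itself iterable and would otherwise be split into characters.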

synthesize(df_downloaded: pd.DataFrame | gpd.GeoDataFrame)[source]

Post-process after downloading to compute synthesized variables, such as fractional variables.

This is where fractional variables are generated.

Parameters:: df_downloaded – A data frame of variables that were downloaded. Any synthesized variables are added as new columns.
Return type:: None. Any additions are made in-place in df_downloaded.
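The three denominator modes can be sketched on plain Python rows standing in for a data frame. This is an illustration only, not the censusdis implementation. The helper name add_fractions, the naming of new columns as frac_prefix plus the variable name, the reading of frac_not as 1 - fraction, and NaN for a zero denominator are all assumptions.

```python
import math
from typing import Dict, List, Union

Row = Dict[str, float]


def add_fractions(
    rows: List[Row],
    variables: List[str],
    denominator: Union[str, bool] = False,
    frac_prefix: str = "frac_",
    frac_not: bool = False,
) -> None:
    """Hypothetical stand-in for synthesize(): adds columns to rows in place."""
    if denominator is False:
        return  # No fractional variables requested.
    for row in rows:
        if denominator is True:
            # Denominator is the sum of all the listed variables.
            total = sum(row[v] for v in variables)
        else:
            # Denominator is a named variable that was downloaded too.
            total = row[denominator]
        for v in variables:
            if v == denominator:
                continue  # Do not compute a fraction of the denominator itself.
            frac = row[v] / total if total else math.nan
            row[f"{frac_prefix}{v}"] = 1.0 - frac if frac_not else frac


rows = [{"B25003_002E": 60.0, "B25003_003E": 40.0}]
add_fractions(rows, ["B25003_002E", "B25003_003E"], denominator=True)
```

As with synthesize, nothing is returned; the new columns appear on the rows that were passed in.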

class censusdis.cli.yamlspec.DataSpec(dataset: str, vintage: int | Literal['timeseries'], specs: VariableSpec | Iterable[VariableSpec], geography: Dict[str, str | List[str]], *, contained_within: Dict[str, str | List[str]] | None = None, area_threshold: float = 0.8, with_geometry: bool = False, remove_water: bool = False)[source]

Bases: object

A specification for what data we want from the U.S. Census API.

In order to download data we must know the data set and vintage and have one or more VariableSpec objects that tell us what variables we need and what synthetic variables to create, for example fractional variables.

Parameters:

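dataset – The dataset to download from. For example “acs/acs5”, “dec/pl”, or “timeseries/poverty/saipe/schdist”. There are symbolic names for datasets, like ACS5 for “acs/acs5”, in censusdis.datasets.
vintage – The vintage to download data for. For most data sets this is an integer year, for example, 2020. But for a timeseries data set, pass the string ‘timeseries’.
specs – One or more VariableSpec objects that specify the variables to download and any synthetic variables to create.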
geography – A specification of the geography, for example {‘state’: ‘*’} for all states or {‘state’: censusdis.states.NJ, ‘county’: ‘*’} for all counties in New Jersey.
contained_within – An optional specification for the geometry the results should be contained within. For example, we could select a CBSA here and put wildcards for state and county in geography to get all counties contained within the CBSA. We need this in cases like this because CBSAs are off-spine while states and counties are on-spine.
area_threshold – How much of the area of a geometry must be contained in an outer geometry for it to be included.
with_geometry – If True a gpd.GeoDataFrame will be returned and each row will have a geometry that is a cartographic boundary suitable for plotting a map. See https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2020.html for details of the shapefiles that will be downloaded on your behalf to generate these boundaries.
remove_water – If True and if with_geometry=True, will query TIGER for AREAWATER shapefiles and remove water areas from returned geometry.
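The idea behind area_threshold can be shown with axis-aligned rectangles standing in for real geometries. This is a minimal sketch, not how censusdis evaluates containment. Rect, overlap and is_contained are hypothetical names, and treating the threshold as inclusive (>=) is an assumption.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for a real geometry."""

    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        # Clamp at zero so an empty intersection has no area.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def overlap(a: Rect, b: Rect) -> Rect:
    """Intersection of two rectangles (may have zero area)."""
    return Rect(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                min(a.xmax, b.xmax), min(a.ymax, b.ymax))


def is_contained(inner: Rect, outer: Rect, area_threshold: float = 0.8) -> bool:
    """True if at least area_threshold of inner's area lies inside outer."""
    inner_area = inner.area()
    if inner_area == 0.0:
        return False
    return overlap(inner, outer).area() / inner_area >= area_threshold


outer = Rect(0, 0, 10, 10)
mostly_inside = is_contained(Rect(5, 0, 11, 4), outer)   # 20 of 24 units inside
half_inside = is_contained(Rect(8, 0, 12, 4), outer)     # 8 of 16 units inside
```

A threshold below 1.0 tolerates boundaries that do not line up exactly, so a geography that slightly overhangs the outer one can still be included.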

property contained_within: None | ContainedWithin: What geometry are we contained within.

property dataset: str: What data set to query.

download(api_key: str | None = None) → pd.DataFrame | gpd.GeoDataFrame[source]

Download the data we want from the U.S. Census API.

Parameters:: api_key – An optional API key. If you don’t have or don’t use a key, the number of calls you can make will be limited to 500 per day.
Return type:: A DataFrame or gpd.GeoDataFrame containing the requested U.S. Census data.

property geography: Dict[str, str | List[str]]: What geography to download data for.

classmethod load_yaml(path: str | Path)[source]: Load a YAML file containing a DataSpec.

classmethod map_state_and_county_names(geography: Dict[str, str | List[str]]) → Dict[str, str | List[str]][source]: If the geography contains a state and optionally counties, try to map their names.

property remove_water: bool: Should we improve the geometry by masking off water.

property variable_spec: VariableSpec: The specification of variables to download.

property vintage: int | Literal['timeseries']: What vintage.

property with_geometry: bool: Do we want to download geometry as well as data so we can plot maps.

class censusdis.cli.yamlspec.PlotSpec(*, variable: str | None = None, boundary: bool = False, title: str | None = None, with_background: bool = False, plot_kwargs: Dict[str, Any] | None = None, projection: str | None = None, legend: bool = True, legend_format: str | None = None)[source]

Bases: object

A specification for how to plot data we downloaded.

Parameters:

variable – What variable to plot. Specify this to shade geographies based on the value of the variable. Leave out and set boundary=True to plot boundaries instead.
boundary – Should we plot boundaries instead of filled geographies? If True, variable should not be specified.
title – A title for the plot.
with_background – If True, plot over a background map.
legend – If True and plotting a variable (not a boundary) then add a legend.
legend_format – How to format the numbers on the legend. The options are “float”, “int”, “dollar”, “percent”, or a format string like “${x:.2f}” to choose any Python string format you want.
projection – What projection to use. “US” means move AK, HI, and PR. None means use the projection the map is already in. Anything else is interpreted as an EPSG code.
plot_kwargs – Additional keyword args for matplotlib to use in plotting.
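The legend_format options can be pictured as a lookup from a named style to a Python format string, with any other value used as a format string directly. The sketch below is not the censusdis implementation. format_legend_value is a hypothetical helper, and the exact output of each named style (thousands separators, decimal places, whether percentages are scaled) is an assumption.

```python
def format_legend_value(x: float, legend_format: str = "float") -> str:
    """Hypothetical formatter; the named styles' exact output is assumed."""
    named = {
        "float": "{x:,.2f}",
        "int": "{x:,.0f}",
        "dollar": "${x:,.0f}",
        "percent": "{x:.0f}%",
    }
    # Anything that is not a known name is used as a format string directly.
    template = named.get(legend_format, legend_format)
    return template.format(x=x)


income_label = format_legend_value(52000, "dollar")
custom_label = format_legend_value(3.14159, "${x:.2f}")
```

In a custom string the value is referred to as x, as in the “${x:.2f}” example from the parameter description.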

property boundary: bool: Should we plot boundaries instead of a variable.

property legend: Is there a legend.

property legend_format: Format for the legend numbers.

classmethod load_yaml(path: str | Path) → PlotSpec[source]: Load a YAML file containing a PlotSpec.

plot(gdf: gpd.GeoDataFrame, ax=None)[source]

Plot data on a map according to the specification.

Parameters:

gdf – The data to plot.
ax – Optional existing ax to plot on top of.

Return type:

ax of the plot.

property plot_kwargs: Dict[str, Any]

Additional keyword args to control the plot.

e.g. {‘figsize’: [12, 8]} to change the default size of the plot.

property projection: What projection to use when plotting.

property title: The plot title.

property variable: str | None: What variable will we plot.

property with_background: bool: Should we plot a background map from OpenStreetMap.

class censusdis.cli.yamlspec.VariableList(variables: str | Iterable[str], *, denominator: str | bool = False, frac_prefix: str | None = None, frac_not: bool | None = False)[source]

Bases: VariableSpec

Specification of a list of variables to download from the U.S. Census API.

Parameters:

variables – The variables to download.
denominator – The denominator to divide by when constructing fractional variables. If False then no fractional variables are added. If the name of a variable, that variable will be downloaded and used as a denominator to compute fractional versions of all of the other variables. If True then the denominator will be computed as the sum of all the other variables.
frac_prefix – The prefix to prepend to fractional variables. If None a default prefix of ‘frac_’ is used.

synthesize(df_downloaded: pd.DataFrame | gpd.GeoDataFrame)[source]

Post-process after downloading to compute synthesized variables, such as fractional variables.

This is where fractional variables are generated.

Parameters:: df_downloaded – A data frame of variables that were downloaded. Any synthesized variables are added as new columns.
Return type:: None. Any additions are made in-place in df_downloaded.

variables_to_download() → List[str][source]

Return a list of the variables that need to be downloaded from the U.S. Census API.

This consists of the variables passed at construction time, and a denominator variable if one was specified.
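A minimal sketch of that rule follows: only a denominator given by name adds a variable to the download list. variables_with_denominator is a hypothetical helper, not the censusdis method. Appending the denominator at the end and skipping it when it is already listed are assumptions.

```python
from typing import Iterable, List, Union


def variables_with_denominator(
    variables: Union[str, Iterable[str]], denominator: Union[str, bool] = False
) -> List[str]:
    """Hypothetical helper: the variables plus a named denominator, if any."""
    names = [variables] if isinstance(variables, str) else list(variables)
    # Only a named denominator must be fetched; True and False add nothing.
    if isinstance(denominator, str) and denominator not in names:
        names.append(denominator)
    return names


to_fetch = variables_with_denominator(
    ["B25003_002E", "B25003_003E"], denominator="B25003_001E"
)
```

With denominator=True the denominator is computed from the listed variables after download, so nothing extra needs to be requested.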

class censusdis.cli.yamlspec.VariableSpec(*, denominator: str | bool = False, frac_prefix: str | None = None, frac_not: bool = False)[source]

Bases: ABC

Abstract base class for specification of variables to download from the U.S. Census API.

Parameters:

denominator – The denominator to divide by when constructing fractional variables. If False then no fractional variables are added. If the name of a variable, that variable will be downloaded and used as a denominator to compute fractional versions of all of the other variables. If True then the denominator will be computed as the sum of all the other variables.
frac_prefix – The prefix to prepend to fractional variables. If None a default prefix of ‘frac_’ is used.

property denominator: str | bool: The denominator to divide by when constructing fractional variables.

download(dataset: str, vintage: int | Literal['timeseries'], *, set_to_nan: bool | Iterable[int] = True, skip_annotations: bool = True, with_geometry: bool = False, contained_within: ContainedWithin | None = None, remove_water: bool = False, api_key: str | None = None, row_keys: str | Iterable[str] | None = None, **kwargs: str | Iterable[str]) → pd.DataFrame | gpd.GeoDataFrame[source]

Download the variables we need from the U.S. Census API.

Most of the optional parameters here mirror those in download().

Parameters:

dataset – The dataset to download from. For example “acs/acs5”, “dec/pl”, or “timeseries/poverty/saipe/schdist”. There are symbolic names for datasets, like ACS5 for “acs/acs5”, in censusdis.datasets.
vintage – The vintage to download data for. For most data sets this is an integer year, for example, 2020. But for a timeseries data set, pass the string ‘timeseries’.
set_to_nan – A list of values that should be set to NaN. Normally these are special values that the U.S. Census API sometimes returns. If True, then all values in censusdis.values.ALL_SPECIAL_VALUES will be replaced. If False, no replacements will be made.
skip_annotations – If True try to filter out group or leaves_of_group variables that are annotations rather than actual values. See VariableCache.group_variables() for more details. Variable names passed in download_variables are not affected by this flag.
with_geometry – If True a gpd.GeoDataFrame will be returned and each row will have a geometry that is a cartographic boundary suitable for plotting a map. See https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.2020.html for details of the shapefiles that will be downloaded on your behalf to generate these boundaries.
contained_within – An optional ContainedWithin if we want to download geometries contained within others.
remove_water – If True and if with_geometry=True, will query TIGER for AREAWATER shapefiles and remove water areas from returned geometry.
api_key – An optional API key. If you don’t have or don’t use a key, the number of calls you can make will be limited to 500 per day.
row_keys – An optional set of identifier keys to help merge together requests for more than the census API limit of 50 variables per query. These keys are useful for census datasets such as the Current Population Survey where the geographic identifiers do not uniquely identify each row.
kwargs – A specification of the geometry that we want data for. For example, state = “*”, county = “*” will download county-level data for the entire US.

Return type:

A DataFrame or gpd.GeoDataFrame containing the requested U.S. Census data.
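The role of row_keys can be pictured as follows: a long variable list is split into batches that fit the 50-variable limit, and the per-batch results are joined on columns that identify each row. The sketch uses plain Python rows and a fake fetch function, and it is not the censusdis implementation. chunked, fetch_and_merge and fake_fetch are hypothetical names, and the sketch ignores whether key columns count toward the limit.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
MAX_VARIABLES_PER_QUERY = 50  # Limit stated in the row_keys description.


def chunked(
    variables: Sequence[str], size: int = MAX_VARIABLES_PER_QUERY
) -> List[List[str]]:
    """Split variables into consecutive batches of at most `size`."""
    return [list(variables[i:i + size]) for i in range(0, len(variables), size)]


def fetch_and_merge(
    variables: Sequence[str],
    keys: Sequence[str],
    fetch: Callable[[List[str]], List[Row]],
) -> List[Row]:
    """Run one query per batch and join the results on the key columns."""
    merged: Dict[Tuple[object, ...], Row] = {}
    for batch in chunked(variables):
        for row in fetch(batch):
            key = tuple(row[k] for k in keys)
            merged.setdefault(key, {}).update(row)
    return list(merged.values())


def fake_fetch(batch: List[str]) -> List[Row]:
    """Stand-in for an API call: two rows, each identified by STATE."""
    return [{"STATE": s, **{v: 0 for v in batch}} for s in ("34", "36")]


variables = [f"VAR_{i:03d}" for i in range(120)]
rows = fetch_and_merge(variables, keys=["STATE"], fetch=fake_fetch)
```

If the key columns did not uniquely identify a row, rows from different batches would be merged incorrectly. That is why datasets whose geographic identifiers repeat need explicit row_keys.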

property frac_not: str: Should we return 1 - fraction instead of fraction.

property frac_prefix: str: The prefix to prepend to fractional variables.

groups_to_download() → List[Tuple[str, bool]][source]

Return the names of groups of variables that need to be downloaded from the U.S. Census API.

Return type:: The names of groups to download.

classmethod load_yaml(path: str | Path)[source]: Load a YAML file containing a VariableSpec.

synthesize(df_downloaded: pd.DataFrame | gpd.GeoDataFrame) → None[source]

Post-process after downloading to compute synthesized variables, such as fractional variables.

Parameters:: df_downloaded – A data frame of variables that were downloaded. Any synthesized variables are added as new columns.
Return type:: None. Any additions are made in-place in df_downloaded.

variables_to_download() → List[str][source]: Return a list of the variables that need to be downloaded from the U.S. Census API.

class censusdis.cli.yamlspec.VariableSpecCollection(variable_specs: Iterable[VariableSpec])[source]

Bases: VariableSpec

Specification built on top of a collection of other VariableSpec objects.

When downloading, all the groups and all the variables specified in any of the constituent specs will be downloaded.

Parameters:: variable_specs – A collection of other VariableSpec objects.

groups_to_download() → List[Tuple[str, bool]][source]

Return the names of groups of variables that need to be downloaded from the U.S. Census API.

The result is a list of the unique groups returned by all the VariableSpec objects given at construction time.

Return type:: The names of groups to download.
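Combining the groups of several specs into a unique list might look like the sketch below. It is an illustration under assumptions, not the censusdis code: unique_groups is a hypothetical helper, uniqueness is judged on the whole (group, flag) pair, and first-seen order is kept.

```python
from typing import Iterable, List, Tuple

GroupPair = Tuple[str, bool]


def unique_groups(per_spec: Iterable[List[GroupPair]]) -> List[GroupPair]:
    """Merge each spec's groups, dropping repeats but keeping first-seen order."""
    # dict keys are unique and preserve insertion order.
    seen = dict.fromkeys(pair for pairs in per_spec for pair in pairs)
    return list(seen)


combined = unique_groups(
    [
        [("B03002", False)],
        [("B19013", True), ("B03002", False)],
    ]
)
```

Removing repeats means a group named by more than one constituent spec is requested only once.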

synthesize(df_downloaded: pd.DataFrame | gpd.GeoDataFrame)[source]

Post-process after downloading to compute synthesized variables, such as fractional variables.

We do this by calling synthesize on each of our constituent variable specifications.

Parameters:: df_downloaded – A data frame of variables that were downloaded. Any synthesized variables are added as new columns.
Return type:: None. Any additions are made in-place in df_downloaded.
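The delegation described here is a composite pattern: the collection exposes the same synthesize interface as its members and forwards the call to each of them. The sketch below shows the pattern with toy classes and a dict of columns standing in for a data frame. Spec, ShareOf and SpecCollection are hypothetical names, not the censusdis classes.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Table = Dict[str, List[float]]  # Column name -> values; stands in for a data frame.


class Spec(ABC):
    @abstractmethod
    def synthesize(self, table: Table) -> None:
        """Add synthesized columns to table in place."""


class ShareOf(Spec):
    """Toy spec: adds frac_<numerator> = numerator / denominator."""

    def __init__(self, numerator: str, denominator: str) -> None:
        self.numerator, self.denominator = numerator, denominator

    def synthesize(self, table: Table) -> None:
        table[f"frac_{self.numerator}"] = [
            n / d for n, d in zip(table[self.numerator], table[self.denominator])
        ]


class SpecCollection(Spec):
    def __init__(self, specs: Iterable[Spec]) -> None:
        self._specs = list(specs)

    def synthesize(self, table: Table) -> None:
        # Delegate to each constituent; every one mutates the same table.
        for spec in self._specs:
            spec.synthesize(table)


table: Table = {"A": [1.0, 2.0], "B": [3.0, 2.0], "T": [4.0, 4.0]}
SpecCollection([ShareOf("A", "T"), ShareOf("B", "T")]).synthesize(table)
```

Because every constituent works on the same object in place, the caller gets all synthesized columns from a single call.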

variables_to_download() → List[str][source]

Return a list of the variables that need to be downloaded from the U.S. Census API.

Returns all the variables to be downloaded by the VariableSpec objects in the collection.