Seeing White* on a Map

This notebook demonstrates some of the data wrangling and mapping capabilities of the censusdis package. It does the following:

Load metadata on the US Census redistricting data set from 2020
Load data on total population and the population of various ractial groups for every county in the United States
Determine the fraction of the population that is white
Plot the results on a map

*With apologies to the Seeing White podcast for borrowing the name.

Basic imports

[1]:

import censusdis.data as ced
import censusdis.maps as cdm
from censusdis.states import ALL_STATES_AND_DC

Setup

For more details on what we are setting up here, see the comments in this introductory notebook.

[2]:

CENSUS_API_KEY = None

[3]:

YEAR = 2020
DATASET = "dec/pl"
GROUP = "P2"

Fetch metadata

We will fetch the metadata on what fields are available and then select the ones that represent the population count of people who identify as white, possibly mixed with one or more other races and those who identify as white alone.

[4]:

leaves = ced.variables.group_leaves(DATASET, YEAR, GROUP)

df_all_variables = ced.variables.all_variables(DATASET, YEAR, GROUP)

df_all_variables

[4]:

	YEAR	DATASET	GROUP	VARIABLE	LABEL	SUGGESTED_WEIGHT	VALUES
0	2020	dec/pl	P2	P2_001N	!!Total:	NaN	None
1	2020	dec/pl	P2	P2_002N	!!Total:!!Hispanic or Latino	NaN	None
2	2020	dec/pl	P2	P2_003N	!!Total:!!Not Hispanic or Latino:	NaN	None
3	2020	dec/pl	P2	P2_004N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
4	2020	dec/pl	P2	P2_005N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
...	...	...	...	...	...	...	...
68	2020	dec/pl	P2	P2_069N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
69	2020	dec/pl	P2	P2_070N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
70	2020	dec/pl	P2	P2_071N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
71	2020	dec/pl	P2	P2_072N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None
72	2020	dec/pl	P2	P2_073N	!!Total:!!Not Hispanic or Latino:!!Population...	NaN	None

73 rows × 7 columns

[5]:

df_total_variables = df_all_variables[
    df_all_variables["LABEL"].str.endswith("!!Total:")
]
df_white_alone_variables = df_all_variables[
    df_all_variables["LABEL"].str.endswith("White alone")
]

total_variables = list(df_total_variables["VARIABLE"])
white_alone_variables = list(df_white_alone_variables["VARIABLE"])

Load data

Now that we know what fields we are interested in we can load data for those fields at the county level for all 50 states and DC. Since we are going to want to plot it, we add with_geometry=True so that we get back a gpd.GeoDataFrame with the geometry of every county instead of a plain pd.DataFrame.

[6]:

gdf_counties = ced.download(
    DATASET,
    YEAR,
    total_variables + white_alone_variables,
    state=ALL_STATES_AND_DC,
    county="*",
    api_key=CENSUS_API_KEY,
    with_geometry=True,
)

[7]:

gdf_counties.head()

[7]:

	STATE	COUNTY	P2_001N	P2_005N	geometry
0	01	001	58805	41582	POLYGON ((-86.92120 32.65754, -86.92035 32.658...
1	01	003	231767	186495	POLYGON ((-88.02858 30.22676, -88.02399 30.230...
2	01	005	25223	11086	POLYGON ((-85.74803 31.61918, -85.74544 31.618...
3	01	007	22293	16442	POLYGON ((-87.42194 33.00338, -87.33177 33.005...
4	01	009	59134	49764	POLYGON ((-86.96336 33.85822, -86.95967 33.857...

Summarize the white population

The next step is to total up the people who identify as white, possibly with other races, and those who identify as white alone.

[8]:

gdf_counties["white_alone"] = gdf_counties[white_alone_variables].sum(axis=1)

[9]:

gdf_counties["pct_white_alone"] = 100.0 * (
    gdf_counties["white_alone"] / gdf_counties[total_variables[0]]
)

[10]:

gdf_counties[total_variables + ["white_alone", "pct_white_alone"]].head()

[10]:

	P2_001N	white_alone	pct_white_alone
0	58805	41582	70.711674
1	231767	186495	80.466589
2	25223	11086	43.951949
3	22293	16442	73.754093
4	59134	49764	84.154632

[11]:

gdf_counties[["pct_white_alone"]].describe()

[11]:

	pct_white_alone
count	3143.000000
mean	74.150915
std	19.804085
min	1.776396
25%	62.575904
50%	80.725998
75%	90.337848
max	97.404829

Load state shapefile

When we plot the data on a map, we also want to plot the boundaries of the states. The API we will use will download the shapefiles from the US Census and cache them in a local filesystem.

If you want to browse available files you can look at the US Census cartographic boundary page.

The specific zip file the code will download for us is the 2020 500,000:1 scale state boundary file `cb_2020_us_state_500k.zip <https://www2.census.gov/geo/tiger/GENZ2020/shp/cb_2020_us_state_500k.zip>`__.

[12]:

reader = cdm.ShapeReader(year=YEAR)

[13]:

gdf_state_bounds = reader.read_cb_shapefile("us", "state")
gdf_state_bounds = gdf_state_bounds[gdf_state_bounds["STATEFP"].isin(ALL_STATES_AND_DC)]

Plot on a map

This is a basic plot, that you can style as you wish. Note that we use cdm.plot_us and cdm.plot_us_boundary because they take care of moving Alaska and Hawaii to a location where they are easy to visualize. If we did not do this, they would appear in their actual geographic locations relatively far from the continental US and the Aleutian islands would be split at ±180° longitude.

[14]:

import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (14, 8)

col, name = "pct_white_alone", "White Alone"

ax = cdm.plot_us(
    gdf_counties,
    col,
    cmap="gray",
    legend=True,
    vmin=0.0,
    vmax=100.0,
)

ax = cdm.plot_us_boundary(gdf_state_bounds, edgecolor="black", linewidth=0.5, ax=ax)

ax.set_title(f"{name} Population as a Percent of County Population")

ax.tick_params(
    left=False,
    right=False,
    bottom=False,
    labelleft=False,
    labelbottom=False,
)

[ ]: