Population Change 2020-2021

ACS Data

Part 1 of this notebook demonstrates how to query and merge the same data set from different years. It also demonstrates how, behind the scenes, censusdis loads metadata and works across multiple years. Nothing is hard-coded in the library about what variables or data sets are available in what years. So when the 2021 data was published, this notebook just worked.

However, it also demonstrates some issues one must be keenly aware of when looking at ACS data. ACS5 data covers a wide range of variables, but it also computes representative numbers over a five year time period, not numbers as of a specific data as some other data sets do. So the values presented in the 2020 data set are based on survey results from 2016, 2017, 2018, 2019, and 2020. If we compare 2021 ACS5 to 2020 ACS5, 4/5 of the sample comes from 2017, 2018, 2019 and 2020 in both. In the 2020 data set, there is also data from 2016, whereas in the 2021 data set there is also data from 2021.

PEP Data

Part 2 looks at the Population Estimates (PEP) data set, which seeks to estimate population at a specific point in time, specifically June 1 of each year. It also tracks changes from the previous year in both numeric and percentage terms.

As we can see by comparing the data and maps in parts 1 and 2, it is important to understand not just what was measured in each data set, but how. Look at the difference in New York State for a particularly extreme difference based on which data set we look at.

[1]:
# So we can run from within the censusdis project and find the packages we need.
import os
import sys

sys.path.append(
    os.path.join(os.path.abspath(os.path.join(os.path.curdir, os.path.pardir)))
)

Imports and configuration

[2]:
import censusdis.data as ced
import censusdis.maps as cem

from censusdis.states import STATE_NY

Part 1: Using ACS5 Data

[3]:
DATASET_ACS5 = "acs/acs5"
[4]:
TOTAL_POPULATION_VARIABLE = "B01003_001E"
[5]:
gdf_states_2020 = ced.download(
    DATASET_ACS5,
    2020,
    ["NAME", TOTAL_POPULATION_VARIABLE],
    state="*",
    with_geometry=True,
)

# Note that we don't load the geometry here.
# We already have it above, and it will
# still be available after the merge.
df_states_2021 = ced.download(
    DATASET_ACS5, 2021, ["NAME", TOTAL_POPULATION_VARIABLE], state="*"
)
[6]:
gdf_both_years = gdf_states_2020.merge(
    df_states_2021, on=["STATE", "NAME"], suffixes=("_2020", "_2021")
)
[7]:
TOTAL_POPULATION_2020_VARIABLE = f"{TOTAL_POPULATION_VARIABLE}_2020"
TOTAL_POPULATION_2021_VARIABLE = f"{TOTAL_POPULATION_VARIABLE}_2021"

TOTAL_POPULATION_CHANGE = f"{TOTAL_POPULATION_VARIABLE}_DELTA"
TOTAL_POPULATION_CHANGE_PERCENT = f"{TOTAL_POPULATION_CHANGE}_PERCENT"
[8]:
gdf_both_years[TOTAL_POPULATION_CHANGE] = (
    gdf_both_years[TOTAL_POPULATION_2021_VARIABLE]
    - gdf_both_years[TOTAL_POPULATION_2020_VARIABLE]
)

gdf_both_years[TOTAL_POPULATION_CHANGE_PERCENT] = (
    100
    * gdf_both_years[TOTAL_POPULATION_CHANGE]
    / gdf_both_years[TOTAL_POPULATION_2020_VARIABLE]
)

Absolute

[9]:
gdf_both_years.sort_values(TOTAL_POPULATION_CHANGE, ascending=False)[
    ["NAME", TOTAL_POPULATION_CHANGE]
].head()
[9]:
NAME B01003_001E_DELTA
4 New York 599896
29 New Jersey 348606
24 Texas 227139
0 Pennsylvania 175765
7 Florida 122838

Percentagewise

[10]:
gdf_both_years.sort_values(TOTAL_POPULATION_CHANGE_PERCENT, ascending=False)[
    ["NAME", TOTAL_POPULATION_CHANGE_PERCENT]
].head()
[10]:
NAME B01003_001E_DELTA_PERCENT
29 New Jersey 3.923349
19 Idaho 3.263285
42 Rhode Island 3.228499
4 New York 3.074049
32 Vermont 2.770446

Fastest shrinking states

Absolute

[11]:
gdf_both_years.sort_values(TOTAL_POPULATION_CHANGE, ascending=True)[
    ["NAME", TOTAL_POPULATION_CHANGE]
].head()
[11]:
NAME B01003_001E_DELTA
16 Arizona -94861
33 North Carolina -19205
5 District of Columbia -18820
39 Mississippi -14812
8 South Carolina -12614

Percentagewise

[12]:
gdf_both_years.sort_values(TOTAL_POPULATION_CHANGE_PERCENT, ascending=True)[
    ["NAME", TOTAL_POPULATION_CHANGE_PERCENT]
].head()
[12]:
NAME B01003_001E_DELTA_PERCENT
5 District of Columbia -2.681011
16 Arizona -1.322277
15 Wyoming -0.809670
39 Mississippi -0.496741
2 West Virginia -0.352822

Plot on maps

[13]:
from matplotlib.ticker import FuncFormatter

ax = cem.plot_us(
    gdf_both_years,
    TOTAL_POPULATION_CHANGE,
    legend=True,
    figsize=(12, 6),
    cmap="PRGn",
    edgecolor="black",
    linewidth=0.25,
    vmin=-600_000,
    vmax=600_000,
    legend_kwds={"format": FuncFormatter(lambda x, pos: f"{x:+,.0f}")},
)

ax.axis("off")
ax.set_title("Total Population Change by State, 2020-2021")
[13]:
Text(0.5, 1.0, 'Total Population Change by State, 2020-2021')
../_images/nb_Population_Change_2020-2021_25_1.png
[14]:
ax = cem.plot_us(
    gdf_both_years,
    TOTAL_POPULATION_CHANGE_PERCENT,
    legend=True,
    figsize=(12, 6),
    cmap="PRGn",
    edgecolor="black",
    linewidth=0.25,
    vmin=-4,
    vmax=4,
    legend_kwds={"format": FuncFormatter(lambda x, pos: f"{x:+.1f}%")},
)

ax.axis("off")
ax.set_title("Total Population Percentage Change by State, 2020-2021")
[14]:
Text(0.5, 1.0, 'Total Population Percentage Change by State, 2020-2021')
../_images/nb_Population_Change_2020-2021_26_1.png

Part 2: Using the Population Estimates Dataset

Instead of using two different years of the ACS5 data set, we could also use a specialized U.S. Census Data Set that estimates population and population change. It is called pep/population and is described at https://api.census.gov/data/2021/pep/population.html.

[15]:
DATASET_PEP = "pep/population"
[16]:
# See https://api.census.gov/data/2021/pep/population/variables.html

POP_2020_VARIABLE = "POP_2020"
POP_2021_VARIABLE = "POP_2021"
NUM_CHANGE_VARIABLE = "NPOPCHG_2021"
PCT_CHANGE_VARIABLE = "PPOPCHG_2021"

PEP_VARIABLES = [
    "NAME",
    POP_2020_VARIABLE,
    POP_2021_VARIABLE,
    NUM_CHANGE_VARIABLE,
    PCT_CHANGE_VARIABLE,
]
[17]:
gdf_pep = ced.download(DATASET_PEP, 2021, PEP_VARIABLES, state="*", with_geometry=True)

Absolute

[18]:
gdf_pep[["NAME", NUM_CHANGE_VARIABLE]].sort_values(
    NUM_CHANGE_VARIABLE, ascending=False
).head()
[18]:
NAME NPOPCHG_2021
10 Texas 310288
40 Florida 211196
45 Arizona 98330
15 North Carolina 93985
25 Georgia 73766

Percentagewise

[19]:
gdf_pep[["NAME", PCT_CHANGE_VARIABLE]].sort_values(
    PCT_CHANGE_VARIABLE, ascending=False
).head()
[19]:
NAME PPOPCHG_2021
34 Idaho 2.876491
46 Utah 1.715308
36 Montana 1.664345
45 Arizona 1.369883
17 South Carolina 1.168957

Fastest shrinking states

Absolute

[20]:
gdf_pep[["NAME", NUM_CHANGE_VARIABLE]].sort_values(
    NUM_CHANGE_VARIABLE, ascending=True
).head()
[20]:
NAME NPOPCHG_2021
37 New York -319020
19 California -261902
43 Illinois -113776
42 Massachusetts -37497
21 Louisiana -27156

Percentagewise

[21]:
gdf_pep[["NAME", PCT_CHANGE_VARIABLE]].sort_values(
    PCT_CHANGE_VARIABLE, ascending=True
).head()
[21]:
NAME PPOPCHG_2021
9 District of Columbia -2.904391
37 New York -1.582838
43 Illinois -0.889901
2 Hawaii -0.713405
19 California -0.663047

Plot on maps

[22]:
from matplotlib.ticker import FuncFormatter

ax = cem.plot_us(
    gdf_pep,
    NUM_CHANGE_VARIABLE,
    legend=True,
    figsize=(12, 6),
    cmap="PRGn",
    edgecolor="black",
    linewidth=0.25,
    vmin=-600_000,
    vmax=600_000,
    legend_kwds={"format": FuncFormatter(lambda x, pos: f"{x:+,.0f}")},
)

ax.axis("off")
ax.set_title("PEP Data - Total Population Change by State, 2020-2021")
[22]:
Text(0.5, 1.0, 'PEP Data - Total Population Change by State, 2020-2021')
../_images/nb_Population_Change_2020-2021_43_1.png
[23]:
ax = cem.plot_us(
    gdf_pep,
    PCT_CHANGE_VARIABLE,
    legend=True,
    figsize=(12, 6),
    cmap="PRGn",
    edgecolor="black",
    linewidth=0.25,
    vmin=-4,
    vmax=4,
    legend_kwds={"format": FuncFormatter(lambda x, pos: f"{x:+.1f}%")},
)

ax.axis("off")
ax.set_title("PEP Data - Total Population Percentage Change by State, 2020-2021")
[23]:
Text(0.5, 1.0, 'PEP Data - Total Population Percentage Change by State, 2020-2021')
../_images/nb_Population_Change_2020-2021_44_1.png

Part 3. Compare NY Results

[24]:
gdf_both_years[gdf_both_years.STATE == STATE_NY][
    ["NAME", TOTAL_POPULATION_CHANGE, TOTAL_POPULATION_CHANGE_PERCENT]
]
[24]:
NAME B01003_001E_DELTA B01003_001E_DELTA_PERCENT
4 New York 599896 3.074049
[25]:
gdf_pep[gdf_pep.STATE == STATE_NY][["NAME", NUM_CHANGE_VARIABLE, PCT_CHANGE_VARIABLE]]
[25]:
NAME NPOPCHG_2021 PPOPCHG_2021
37 New York -319020 -1.582838
[ ]: