Census Bureau API

Functions for working with the U.S. Census Bureau API.

API Documentation

oi_tools.census_api.get_acs(variables: Sequence[str] | str, *, years: Sequence[int] | int | None = None, acs_version: Literal['acs1', 'acs3', 'acs5'] = 'acs5', geography: Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group'] = 'us', state: Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None = None, include_metadata: bool = True, include_ses: bool = True, handle_exception_values: bool = True, api_key: str | None = None) → DataFrame

Retrieve American Community Survey data.

This function is a thin wrapper around get_variables() that provides a convenient way to retrieve ACS data.

Parameters:

variables (Sequence[str] | str) – ACS variable codes. Trailing E or M suffixes are stripped; both estimates and margins of error are fetched automatically when include_ses=True.
years (Sequence[int] | int | None) – Survey years to retrieve. Defaults to all available years for the given acs_version.
acs_version (Literal['acs1', 'acs3', 'acs5']) – ACS product: "acs1", "acs3", or "acs5".
geography (Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group']) – Geographic level of aggregation.
state (Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None) – Two-letter state abbreviation. Required for "tract" and "block group" geographies.
include_metadata (bool) – If True, join variable metadata (concept, label) onto the result.
include_ses (bool) – If True, include an se column with standard errors derived from the margin of error.
handle_exception_values (bool) – If True, replace Census exception sentinel values with null.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.
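
The variables parameter accepts either a single code or a sequence, with or without a trailing E or M suffix. The following is a minimal sketch of that normalization, not the library's actual implementation. The helper names normalize_acs_variable and as_variable_list are hypothetical, and the de-duplication step is an assumption.

```python
def normalize_acs_variable(code: str) -> str:
    """Strip one trailing 'E' (estimate) or 'M' (margin of error) suffix.

    The suffix is only removed when it directly follows a digit, so that
    non-numeric names such as "NAME" are left alone.
    """
    code = code.strip()
    if code and code[-1] in ("E", "M") and code[:-1][-1:].isdigit():
        return code[:-1]
    return code


def as_variable_list(variables) -> list:
    """Accept one code or a sequence of codes; return unique base codes in order."""
    if isinstance(variables, str):
        variables = [variables]
    seen = {}  # dicts preserve insertion order, which gives an ordered de-dup
    for v in variables:
        seen.setdefault(normalize_acs_variable(v), None)
    return list(seen)


print(as_variable_list(["B19013_001E", "B19013_001M", "B01003_001"]))
# ['B19013_001', 'B01003_001']
```

Under this sketch, "B19013_001", "B19013_001E", and "B19013_001M" all refer to the same base variable.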

Returns:

DataFrame with one row per (year, geography, variable).

Return type:

pl.DataFrame

Examples

Fetch median household income at the county level, without standard errors or metadata:

>>> import oi_tools.census_api as census
>>> df = census.get_acs(
...     "B19013_001",
...     years=2021,
...     geography="county",
...     include_ses=False,
...     include_metadata=False,
... )
>>> df.columns
['year', 'county', 'variable', 'value']
>>> print(df)
shape: (3_221, 4)
┌──────┬────────┬────────────┬─────────┐
│ year ┆ county ┆ variable   ┆ value   │
│ ---  ┆ ---    ┆ ---        ┆ ---     │
│ i32  ┆ cat    ┆ str        ┆ f32     │
╞══════╪════════╪════════════╪═════════╡
│ 2021 ┆ 01001  ┆ B19013_001 ┆ 62660.0 │
│ 2021 ┆ 01003  ┆ B19013_001 ┆ 64346.0 │
│ 2021 ┆ 01005  ┆ B19013_001 ┆ 36422.0 │
│ 2021 ┆ 01007  ┆ B19013_001 ┆ 54277.0 │
│ 2021 ┆ 01009  ┆ B19013_001 ┆ 52830.0 │
│ …    ┆ …      ┆ …          ┆ …       │
│ 2021 ┆ 72145  ┆ B19013_001 ┆ 21507.0 │
│ 2021 ┆ 72147  ┆ B19013_001 ┆ 14942.0 │
│ 2021 ┆ 72149  ┆ B19013_001 ┆ 20722.0 │
│ 2021 ┆ 72151  ┆ B19013_001 ┆ 17267.0 │
│ 2021 ┆ 72153  ┆ B19013_001 ┆ 16444.0 │
└──────┴────────┴────────────┴─────────┘

Include standard errors and metadata (default):

>>> df = census.get_acs(
...     "B19013_001",
...     years=2021,
...     geography="county",
... )
>>> df.columns
['year', 'county', 'concept', 'label', 'variable', 'value', 'se']
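
The se column is described as derived from the margin of error. The Census Bureau publishes ACS margins of error at the 90 percent confidence level, so the conventional conversion divides the margin of error by 1.645. The sketch below shows that arithmetic under the assumption that get_acs() follows the same convention. The function name moe_to_se is hypothetical.

```python
ACS_Z_90 = 1.645  # z-score for the 90 percent confidence level used for ACS margins of error


def moe_to_se(moe, z=ACS_Z_90):
    """Convert a margin of error to a standard error; missing values stay missing."""
    if moe is None:
        return None
    return moe / z


# Hypothetical margins of error, chosen only to illustrate the arithmetic.
for moe in (1645.0, 3290.0, None):
    se = moe_to_se(moe)
    print(moe, None if se is None else round(se, 2))
```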

Multiple years and variables are supported:

>>> df = census.get_acs(
...     ["B19013_001", "B01003_001"],
...     years=[2019, 2021],
...     geography="county",
...     include_ses=False,
...     include_metadata=True,
... )
>>> print(df)
shape: (12_882, 6)
┌──────┬────────┬──────────────────┬──────────────────┬────────────┬──────────┐
│ year ┆ county ┆ concept          ┆ label            ┆ variable   ┆ value    │
│ ---  ┆ ---    ┆ ---              ┆ ---              ┆ ---        ┆ ---      │
│ i32  ┆ cat    ┆ str              ┆ list[str]        ┆ str        ┆ f32      │
╞══════╪════════╪══════════════════╪══════════════════╪════════════╪══════════╡
│ 2019 ┆ 01001  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 55380.0  │
│ 2019 ┆ 01001  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 58731.0  │
│ 2019 ┆ 01003  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 212830.0 │
│ 2019 ┆ 01003  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 58320.0  │
│ 2019 ┆ 01005  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 25361.0  │
│ …    ┆ …      ┆ …                ┆ …                ┆ …          ┆ …        │
│ 2021 ┆ 72149  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 20722.0  │
│ 2021 ┆ 72151  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 31047.0  │
│ 2021 ┆ 72151  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 17267.0  │
│ 2021 ┆ 72153  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 34704.0  │
│ 2021 ┆ 72153  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 16444.0  │
└──────┴────────┴──────────────────┴──────────────────┴────────────┴──────────┘

oi_tools.census_api.get_metadata(dataset: Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str, years: int | Sequence[int], *, api_key: str | None = None) → DataFrame

Retrieve variable metadata for a Census dataset.

Parameters:

dataset (Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str) – Dataset identifier, e.g. "acs/acs5", "dec/sf3", or "geoinfo". See Dataset for more info.
years (int | Sequence[int]) – One or more survey years.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.
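
The api_key fallback described above can be sketched as follows. This is an illustration of the documented behavior, not the library's code. The helper name resolve_api_key is hypothetical, and returning None when no key is found is an assumption, since the documentation does not say what happens in that case.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> Optional[str]:
    """Prefer an explicitly passed key, else fall back to CENSUS_API_KEY."""
    if api_key:
        return api_key
    # Assumption: an unset variable yields None rather than raising.
    return os.environ.get("CENSUS_API_KEY")


os.environ["CENSUS_API_KEY"] = "key-from-environment"  # placeholder value
print(resolve_api_key("key-from-argument"))  # the explicit argument wins
print(resolve_api_key())                     # falls back to the environment
```

Setting CENSUS_API_KEY once in the shell or in a local environment file keeps the key out of notebooks and scripts.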

Returns:

DataFrame with columns year, variable, group, row, concept, and label.

Return type:

pl.DataFrame

Examples

Retrieve variable metadata for the ACS 5-year survey:

>>> import oi_tools.census_api as census
>>> df = census.get_metadata("acs/acs5", 2021)
>>> df.columns
['year', 'variable', 'group', 'row', 'concept', 'label']
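
In the examples on this page the label column has type list[str]. The raw Census API returns each label as a single string whose hierarchy levels are separated by "!!". The sketch below shows one way such a string could be split into a list. It assumes the library performs a plain split, and the helper name split_label is hypothetical.

```python
def split_label(raw_label: str) -> list:
    """Split a raw Census label such as 'Total!!Male!!Associate degree' into its levels."""
    return [part.strip() for part in raw_label.split("!!") if part.strip()]


print(split_label("Total!!Male!!Associate degree"))
# ['Total', 'Male', 'Associate degree']
```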

oi_tools.census_api.get_variables(dataset: Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str, *, years: int | Sequence[int], variables: str | Sequence[str], geography: Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group'] = 'us', state: Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None = None, include_metadata: bool = True, handle_exception_values: bool = True, api_key: str | None = None) → DataFrame

Fetch variables from the Census API.

Parameters:

dataset (Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str) – Dataset identifier, e.g. "acs/acs5" or "dec/sf3". See Dataset for more info.
years (int | Sequence[int]) – One or more survey years.
variables (str | Sequence[str]) – Variable codes to retrieve. Accepts "group(CODE)" syntax to fetch entire variable groups.
geography (Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group']) – Geographic level of aggregation.
state (Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None) – Two-letter state abbreviation. Required for "tract" and "block group" geographies.
include_metadata (bool) – If True, join variable metadata (concept, label) onto the result.
handle_exception_values (bool) – If True, replace Census exception sentinel values with null.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.
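
Census tables use large negative "jam" values to flag estimates that could not be computed or published. The sketch below shows the kind of replacement handle_exception_values=True describes, operating on plain Python numbers. The sentinel set lists the values documented for ACS estimates. Whether the library uses exactly this set is an assumption, and the names CENSUS_SENTINELS and null_exception_values are hypothetical.

```python
# Sentinel values documented for ACS estimates (assumed, not confirmed, to match the library's set).
CENSUS_SENTINELS = frozenset({
    -999999999, -888888888, -666666666,
    -555555555, -333333333, -222222222,
})


def null_exception_values(values):
    """Return a new list in which sentinel values are replaced with None."""
    return [None if v is not None and v in CENSUS_SENTINELS else v for v in values]


raw = [62660.0, -666666666.0, 36422.0, None, -222222222]
print(null_exception_values(raw))
# [62660.0, None, 36422.0, None, None]
```

Leaving sentinels in place would distort any sum or mean computed over the value column, which is presumably why the replacement is enabled by default.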

Returns:

DataFrame with one row per (year, geography, variable).

Return type:

pl.DataFrame

Examples

Fetch county-level total population from the 2010 Decennial Census:

>>> import oi_tools.census_api as census
>>> df = census.get_variables(
...     "dec/sf1",
...     years=2010,
...     variables="P001001",
...     geography="county",
...     include_metadata=True,
... )
>>> df.columns
['year', 'county', 'concept', 'label', 'variable', 'value']
>>> print(df)
shape: (3_221, 6)
┌──────┬────────┬──────────────────┬───────────┬──────────┬──────────┐
│ year ┆ county ┆ concept          ┆ label     ┆ variable ┆ value    │
│ ---  ┆ ---    ┆ ---              ┆ ---       ┆ ---      ┆ ---      │
│ i32  ┆ cat    ┆ str              ┆ list[str] ┆ str      ┆ f32      │
╞══════╪════════╪══════════════════╪═══════════╪══════════╪══════════╡
│ 2010 ┆ 01001  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 54571.0  │
│ 2010 ┆ 01003  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 182265.0 │
│ 2010 ┆ 01005  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 27457.0  │
│ 2010 ┆ 01007  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 22915.0  │
│ 2010 ┆ 01009  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 57322.0  │
│ …    ┆ …      ┆ …                ┆ …         ┆ …        ┆ …        │
│ 2010 ┆ 72145  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 59662.0  │
│ 2010 ┆ 72147  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 9301.0   │
│ 2010 ┆ 72149  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 26073.0  │
│ 2010 ┆ 72151  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 37941.0  │
│ 2010 ┆ 72153  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 42043.0  │
└──────┴────────┴──────────────────┴───────────┴──────────┴──────────┘

Multiple years and variables are supported:

>>> df = census.get_variables(
...     "dec/sf1",
...     years=[2000, 2010],
...     variables=["P001001", "P002001"],
...     geography="county",
...     include_metadata=False,
... )
>>> df.get_column("year").unique().sort().to_list()
[2000, 2010]
>>> df.get_column("variable").unique().sort().to_list()
['P001001', 'P002001']

Use group(CODE) syntax to fetch all variables in a group at once:

>>> df = census.get_variables(
...     "dec/sf3",
...     years=2000,
...     variables="group(P148A)",
...     geography="county",
... )
>>> print(df)
shape: (54_723, 6)
┌──────┬────────┬──────────────────┬──────────────────┬──────────┬─────────┐
│ year ┆ county ┆ concept          ┆ label            ┆ variable ┆ value   │
│ ---  ┆ ---    ┆ ---              ┆ ---              ┆ ---      ┆ ---     │
│ i32  ┆ cat    ┆ str              ┆ list[str]        ┆ str      ┆ f32     │
╞══════╪════════╪══════════════════╪══════════════════╪══════════╪═════════╡
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total"]        ┆ P148A001 ┆ 22790.0 │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A002 ┆ 10954.0 │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A003 ┆ 424.0   │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A004 ┆ 1503.0  │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A005 ┆ 3461.0  │
│ …    ┆ …      ┆ …                ┆ …                ┆ …        ┆ …       │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A013 ┆ 2747.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A014 ┆ 1255.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A015 ┆ 1107.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A016 ┆ 2104.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A017 ┆ 368.0   │
└──────┴────────┴──────────────────┴──────────────────┴──────────┴─────────┘
>>> for label in df.get_column("label").unique().sort().to_list():
...     print(label)
['Total']
['Total', 'Female']
['Total', 'Female', '9th to 12th grade, no diploma']
['Total', 'Female', 'Associate degree']
['Total', 'Female', "Bachelor's degree"]
['Total', 'Female', 'Graduate or professional degree']
['Total', 'Female', 'High school graduate (includes equivalency)']
['Total', 'Female', 'Less than 9th grade']
['Total', 'Female', 'Some college, no degree']
['Total', 'Male']
['Total', 'Male', '9th to 12th grade, no diploma']
['Total', 'Male', 'Associate degree']
['Total', 'Male', "Bachelor's degree"]
['Total', 'Male', 'Graduate or professional degree']
['Total', 'Male', 'High school graduate (includes equivalency)']
['Total', 'Male', 'Less than 9th grade']
['Total', 'Male', 'Some college, no degree']
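
For orientation, the public Census Data API takes requests of the form https://api.census.gov/data/{year}/{dataset} with get, for, and optional in and key query parameters, and accepts group(CODE) in the get list. The sketch below builds such a URL for geographies that accept a wildcard, such as state or county. It illustrates the underlying API only and is not how get_variables() is implemented. The function name build_query_url and its parameters are hypothetical, and translating a state abbreviation into a FIPS code is left out.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.census.gov/data"


def build_query_url(dataset, year, variables, geography="county",
                    state_fips=None, api_key=None):
    """Build a Census Data API URL that requests every unit of one geography."""
    params = {"get": ",".join(variables), "for": f"{geography}:*"}
    if state_fips is not None:
        params["in"] = f"state:{state_fips}"  # restrict to one state by FIPS code
    if api_key:
        params["key"] = api_key
    # Keep the punctuation used by Census query syntax readable in the URL.
    query = urlencode(params, safe="(),:*")
    return f"{BASE_URL}/{year}/{dataset}?{query}"


print(build_query_url("dec/sf3", 2000, ["group(P148A)"]))
# https://api.census.gov/data/2000/dec/sf3?get=group(P148A)&for=county:*
```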