Census Bureau API
Functions for working with the U.S. Census Bureau API.
API Documentation
- oi_tools.census_api.get_acs(
    variables: Sequence[str] | str,
    *,
    years: Sequence[int] | int | None = None,
    acs_version: Literal['acs1', 'acs3', 'acs5'] = 'acs5',
    geography: Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group'] = 'us',
    state: Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None = None,
    include_metadata: bool = True,
    include_ses: bool = True,
    handle_exception_values: bool = True,
    api_key: str | None = None,
)
Retrieve American Community Survey data.
This function is a thin wrapper around get_variables() that offers a convenient way to get ACS survey data.
- Parameters:
variables (Sequence[str] | str) – ACS variable codes. Trailing E or M suffixes are stripped; both estimates and margins of error are fetched automatically when include_ses=True.
years (Sequence[int] | int | None) – Survey years to retrieve. Defaults to all available years for the given acs_version.
acs_version (Literal['acs1', 'acs3', 'acs5']) – ACS product: "acs1", "acs3", or "acs5".
geography (Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group']) – Geographic level of aggregation.
state (Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None) – Two-letter state abbreviation. Required for "tract" and "block group" geographies.
include_metadata (bool) – If True, join variable metadata (concept, label) onto the result.
include_ses (bool) – If True, include a se column with standard errors derived from the margin of error.
handle_exception_values (bool) – If True, replace Census exception sentinel values with null.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.
- Returns:
DataFrame with one row per (year, geography, variable).
- Return type:
pl.DataFrame
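The suffix handling described for the variables parameter can be sketched as follows. This is only an illustration: normalize_acs_code and estimate_and_moe are hypothetical helper names, and the actual normalization inside get_acs() may differ.

```python
import re

# Hypothetical helpers; not part of oi_tools.


def normalize_acs_code(code: str) -> str:
    """Strip a trailing E (estimate) or M (margin of error) suffix, if present."""
    # Only strip the letter when it directly follows a digit, e.g. "..._001E".
    return re.sub(r"(?<=\d)[EM]$", "", code.strip())


def estimate_and_moe(code: str) -> tuple:
    """Return the estimate and margin-of-error codes for one base variable."""
    base = normalize_acs_code(code)
    return f"{base}E", f"{base}M"


# All three spellings refer to the same base variable.
codes = ["B19013_001", "B19013_001E", "B19013_001M"]
bases = [normalize_acs_code(c) for c in codes]
print(bases)
print(estimate_and_moe("B19013_001"))
```

Under this reading, a caller can pass either spelling of a code and still receive both the estimate and its margin of error when include_ses=True.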
Examples
Fetch median household income at the county level, without standard errors or metadata:
>>> import oi_tools.census_api as census
>>> df = census.get_acs(
...     "B19013_001",
...     years=2021,
...     geography="county",
...     include_ses=False,
...     include_metadata=False,
... )
>>> df.columns
['year', 'county', 'variable', 'value']
>>> print(df)
shape: (3_221, 4)
┌──────┬────────┬────────────┬─────────┐
│ year ┆ county ┆ variable   ┆ value   │
│ ---  ┆ ---    ┆ ---        ┆ ---     │
│ i32  ┆ cat    ┆ str        ┆ f32     │
╞══════╪════════╪════════════╪═════════╡
│ 2021 ┆ 01001  ┆ B19013_001 ┆ 62660.0 │
│ 2021 ┆ 01003  ┆ B19013_001 ┆ 64346.0 │
│ 2021 ┆ 01005  ┆ B19013_001 ┆ 36422.0 │
│ 2021 ┆ 01007  ┆ B19013_001 ┆ 54277.0 │
│ 2021 ┆ 01009  ┆ B19013_001 ┆ 52830.0 │
│ …    ┆ …      ┆ …          ┆ …       │
│ 2021 ┆ 72145  ┆ B19013_001 ┆ 21507.0 │
│ 2021 ┆ 72147  ┆ B19013_001 ┆ 14942.0 │
│ 2021 ┆ 72149  ┆ B19013_001 ┆ 20722.0 │
│ 2021 ┆ 72151  ┆ B19013_001 ┆ 17267.0 │
│ 2021 ┆ 72153  ┆ B19013_001 ┆ 16444.0 │
└──────┴────────┴────────────┴─────────┘
Include standard errors and metadata (default):
>>> df = census.get_acs(
...     "B19013_001",
...     years=2021,
...     geography="county",
... )
>>> df.columns
['year', 'county', 'concept', 'label', 'variable', 'value', 'se']
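The se column is described as derived from the margin of error. The Census Bureau publishes ACS margins of error at the 90% confidence level, so the usual conversion divides by 1.645. The sketch below shows that conversion; moe_to_se is a hypothetical name, and whether get_acs() uses exactly this constant is an assumption.

```python
from __future__ import annotations

# z-score for a 90% confidence interval, the level at which
# ACS margins of error are published.
Z_90 = 1.645


def moe_to_se(moe: float | None) -> float | None:
    """Convert a 90% margin of error to a standard error (hypothetical helper)."""
    if moe is None:
        # A missing margin of error yields a missing standard error.
        return None
    return moe / Z_90


print(moe_to_se(1645.0))
print(moe_to_se(None))
```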
Multiple years and variables are supported:
>>> df = census.get_acs(
...     ["B19013_001", "B01003_001"],
...     years=[2019, 2021],
...     geography="county",
...     include_ses=False,
...     include_metadata=True,
... )
>>> print(df)
shape: (12_882, 6)
┌──────┬────────┬──────────────────┬──────────────────┬────────────┬──────────┐
│ year ┆ county ┆ concept          ┆ label            ┆ variable   ┆ value    │
│ ---  ┆ ---    ┆ ---              ┆ ---              ┆ ---        ┆ ---      │
│ i32  ┆ cat    ┆ str              ┆ list[str]        ┆ str        ┆ f32      │
╞══════╪════════╪══════════════════╪══════════════════╪════════════╪══════════╡
│ 2019 ┆ 01001  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 55380.0  │
│ 2019 ┆ 01001  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 58731.0  │
│ 2019 ┆ 01003  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 212830.0 │
│ 2019 ┆ 01003  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 58320.0  │
│ 2019 ┆ 01005  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 25361.0  │
│ …    ┆ …      ┆ …                ┆ …                ┆ …          ┆ …        │
│ 2021 ┆ 72149  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 20722.0  │
│ 2021 ┆ 72151  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 31047.0  │
│ 2021 ┆ 72151  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 17267.0  │
│ 2021 ┆ 72153  ┆ TOTAL POPULATIO… ┆ ["Estimate", "T… ┆ B01003_001 ┆ 34704.0  │
│ 2021 ┆ 72153  ┆ MEDIAN HOUSEHOL… ┆ ["Estimate", "M… ┆ B19013_001 ┆ 16444.0  │
└──────┴────────┴──────────────────┴──────────────────┴────────────┴──────────┘
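Because the result is in long form (one row per year, geography, and variable), comparing variables side by side usually means reshaping to one row per year and county. The sketch below does this with plain tuples so it runs without any dependencies; to_wide is a hypothetical helper, and the two rows are copied from the first county in the table above. With a polars DataFrame, the same reshaping would normally be done with its pivot method.

```python
from collections import defaultdict

# (year, county, variable, value) rows copied from the output above.
rows = [
    (2019, "01001", "B01003_001", 55380.0),
    (2019, "01001", "B19013_001", 58731.0),
]


def to_wide(long_rows):
    """Group long-form rows into one dict of variables per (year, county)."""
    wide = defaultdict(dict)
    for year, county, variable, value in long_rows:
        wide[(year, county)][variable] = value
    return dict(wide)


wide = to_wide(rows)
print(wide[(2019, "01001")])
```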
- oi_tools.census_api.get_metadata(
    dataset: Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str,
    years: int | Sequence[int],
    *,
    api_key: str | None = None,
)
Retrieve variable metadata for a Census dataset.
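- Parameters:
dataset (Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str) – Dataset identifier, e.g. "acs/acs5" or "dec/sf3".
years (int | Sequence[int]) – Year or years for which to retrieve metadata.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.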
- Returns:
DataFrame with columns year, variable, group, row, concept, and label.
- Return type:
pl.DataFrame
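Each function on this page accepts an optional api_key, and the other two functions document a fallback to the CENSUS_API_KEY environment variable. A minimal sketch of that lookup order is shown below; resolve_api_key is a hypothetical name, and the env argument exists only so the behavior can be demonstrated without touching the real environment.

```python
from __future__ import annotations

import os
from typing import Mapping


def resolve_api_key(
    api_key: str | None = None,
    env: Mapping[str, str] | None = None,
) -> str | None:
    """Prefer an explicit key, then fall back to CENSUS_API_KEY (hypothetical helper)."""
    if api_key is not None:
        return api_key
    # Default to the process environment when no mapping is supplied.
    env = os.environ if env is None else env
    return env.get("CENSUS_API_KEY")


print(resolve_api_key("explicit-key", env={}))
print(resolve_api_key(None, env={"CENSUS_API_KEY": "from-env"}))
```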
Examples
Retrieve variable metadata for the ACS 5-year survey:
>>> import oi_tools.census_api as census
>>> df = census.get_metadata("acs/acs5", 2021)
>>> df.columns
['year', 'variable', 'group', 'row', 'concept', 'label']
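For orientation, the Census API publishes variable metadata for each dataset as a variables.json document keyed by variable name. The sketch below flattens a payload of that shape into rows resembling the columns listed above. It is an assumption that get_metadata() works this way: metadata_rows is a hypothetical helper, the sample payload is hand-written rather than real API output, and the row column is omitted because its derivation is not documented here.

```python
import json

# Hand-written sample shaped like a variables.json response; not real API output.
SAMPLE = json.loads(
    """
    {"variables": {
      "B01003_001E": {"label": "Estimate!!Total",
                      "concept": "TOTAL POPULATION",
                      "group": "B01003"},
      "for": {"label": "Census API FIPS 'for' clause",
              "concept": "Census API Geography Specification",
              "group": "N/A"}
    }}
    """
)


def metadata_rows(payload, year):
    """Flatten a variables.json-style payload into one dict per variable."""
    rows = []
    for name, info in sorted(payload["variables"].items()):
        if info.get("group", "N/A") == "N/A":
            continue  # skip entries such as "for" that are not data variables
        rows.append(
            {
                "year": year,
                "variable": name,
                "group": info["group"],
                "concept": info.get("concept"),
                "label": info.get("label"),
            }
        )
    return rows


rows = metadata_rows(SAMPLE, 2021)
print(rows)
```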
- oi_tools.census_api.get_variables(
    dataset: Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str,
    *,
    years: int | Sequence[int],
    variables: str | Sequence[str],
    geography: Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group'] = 'us',
    state: Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None = None,
    include_metadata: bool = True,
    handle_exception_values: bool = True,
    api_key: str | None = None,
)
Fetch variables from the Census API.
- Parameters:
dataset (Literal['acs/acs5', 'dec/sf3', 'geoinfo'] | str) – Dataset identifier, e.g. "acs/acs5" or "dec/sf3". See Dataset for more info.
variables (str | Sequence[str]) – Variable codes to retrieve. Accepts "group(CODE)" syntax to fetch entire variable groups.
geography (Literal['us', 'region', 'division', 'state', 'county', 'tract', 'block group']) – Geographic level of aggregation.
state (Literal['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'FL', 'GA', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY', 'DC'] | None) – Two-letter state abbreviation. Required for "tract" and "block group" geographies.
include_metadata (bool) – If True, join variable metadata (concept, label) onto the result.
handle_exception_values (bool) – If True, replace Census exception sentinel values with null.
api_key (str | None) – Census API key. Falls back to the CENSUS_API_KEY environment variable if not provided.
- Returns:
DataFrame with one row per (year, geography, variable).
- Return type:
pl.DataFrame
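The handle_exception_values option refers to the large negative placeholder numbers the Census Bureau substitutes when an estimate cannot be published. The sketch below replaces a commonly documented set of these placeholders with None. The exact set that oi_tools handles is an assumption, and null_exception_values is a hypothetical helper.

```python
# Assumed sentinel set, based on the placeholder ("jam") values the
# Census Bureau documents for ACS estimates and margins of error.
EXCEPTION_VALUES = frozenset(
    {
        -999999999,
        -888888888,
        -666666666,
        -555555555,
        -333333333,
        -222222222,
    }
)


def null_exception_values(values):
    """Replace sentinel placeholders with None, leaving real values untouched."""
    return [None if v in EXCEPTION_VALUES else v for v in values]


cleaned = null_exception_values([62660.0, -666666666.0, 36422.0])
print(cleaned)
```

Leaving the sentinels in place would silently distort any mean or sum computed over the value column, which is presumably why the option defaults to True.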
Examples
Fetch county-level total population from the 2010 Decennial Census:
>>> import oi_tools.census_api as census
>>> df = census.get_variables(
...     "dec/sf1",
...     years=2010,
...     variables="P001001",
...     geography="county",
...     include_metadata=True,
... )
>>> df.columns
['year', 'county', 'concept', 'label', 'variable', 'value']
>>> print(df)
shape: (3_221, 6)
┌──────┬────────┬──────────────────┬───────────┬──────────┬──────────┐
│ year ┆ county ┆ concept          ┆ label     ┆ variable ┆ value    │
│ ---  ┆ ---    ┆ ---              ┆ ---       ┆ ---      ┆ ---      │
│ i32  ┆ cat    ┆ str              ┆ list[str] ┆ str      ┆ f32      │
╞══════╪════════╪══════════════════╪═══════════╪══════════╪══════════╡
│ 2010 ┆ 01001  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 54571.0  │
│ 2010 ┆ 01003  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 182265.0 │
│ 2010 ┆ 01005  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 27457.0  │
│ 2010 ┆ 01007  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 22915.0  │
│ 2010 ┆ 01009  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 57322.0  │
│ …    ┆ …      ┆ …                ┆ …         ┆ …        ┆ …        │
│ 2010 ┆ 72145  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 59662.0  │
│ 2010 ┆ 72147  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 9301.0   │
│ 2010 ┆ 72149  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 26073.0  │
│ 2010 ┆ 72151  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 37941.0  │
│ 2010 ┆ 72153  ┆ TOTAL POPULATIO… ┆ ["Total"] ┆ P001001  ┆ 42043.0  │
└──────┴────────┴──────────────────┴───────────┴──────────┴──────────┘
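A call like the one above presumably maps onto a single Census Data API request per year. The sketch below builds such a request URL using the public API's get and for query parameters. build_query_url is a hypothetical helper, not the library's implementation, and it only covers the "us", "state", and "county" geographies, because finer levels need additional in clauses that are not shown here.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

BASE_URL = "https://api.census.gov/data"
# Geographies this sketch knows how to express; finer levels are left out.
SUPPORTED = ("us", "state", "county")


def build_query_url(dataset, year, variables, geography="us", api_key=None):
    """Build a Census Data API URL for one dataset and year (hypothetical helper)."""
    if geography not in SUPPORTED:
        raise ValueError(f"this sketch does not cover geography {geography!r}")
    if isinstance(variables, str):
        variables = [variables]
    # "get" lists the variables; "for" selects every unit at the chosen level.
    params = {"get": ",".join(variables), "for": f"{geography}:*"}
    if api_key is not None:
        params["key"] = api_key
    # Keep ":", "*", "," and parentheses readable, as the API expects them.
    query = urlencode(params, safe=":*,()", quote_via=quote)
    return f"{BASE_URL}/{year}/{dataset}?{query}"


url = build_query_url("dec/sf1", 2010, "P001001", geography="county")
print(url)
```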
Multiple years and variables are supported:
>>> df = census.get_variables(
...     "dec/sf1",
...     years=[2000, 2010],
...     variables=["P001001", "P002001"],
...     geography="county",
...     include_metadata=False,
... )
>>> df.get_column("year").unique().sort().to_list()
[2000, 2010]
>>> df.get_column("variable").unique().sort().to_list()
['P001001', 'P002001']
Use group(CODE) syntax to fetch all variables in a group at once:
>>> df = census.get_variables(
...     "dec/sf3",
...     years=2000,
...     variables="group(P148A)",
...     geography="county",
... )
>>> print(df)
shape: (54_723, 6)
┌──────┬────────┬──────────────────┬──────────────────┬──────────┬─────────┐
│ year ┆ county ┆ concept          ┆ label            ┆ variable ┆ value   │
│ ---  ┆ ---    ┆ ---              ┆ ---              ┆ ---      ┆ ---     │
│ i32  ┆ cat    ┆ str              ┆ list[str]        ┆ str      ┆ f32     │
╞══════╪════════╪══════════════════╪══════════════════╪══════════╪═════════╡
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total"]        ┆ P148A001 ┆ 22790.0 │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A002 ┆ 10954.0 │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A003 ┆ 424.0   │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A004 ┆ 1503.0  │
│ 2000 ┆ 01001  ┆ SEX BY EDUCATIO… ┆ ["Total", "Male… ┆ P148A005 ┆ 3461.0  │
│ …    ┆ …      ┆ …                ┆ …                ┆ …        ┆ …       │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A013 ┆ 2747.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A014 ┆ 1255.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A015 ┆ 1107.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A016 ┆ 2104.0  │
│ 2000 ┆ 72153  ┆ SEX BY EDUCATIO… ┆ ["Total", "Fema… ┆ P148A017 ┆ 368.0   │
└──────┴────────┴──────────────────┴──────────────────┴──────────┴─────────┘
>>> for label in df.get_column("label").unique().sort().to_list():
...     print(label)
['Total']
['Total', 'Female']
['Total', 'Female', '9th to 12th grade, no diploma']
['Total', 'Female', 'Associate degree']
['Total', 'Female', "Bachelor's degree"]
['Total', 'Female', 'Graduate or professional degree']
['Total', 'Female', 'High school graduate (includes equivalency)']
['Total', 'Female', 'Less than 9th grade']
['Total', 'Female', 'Some college, no degree']
['Total', 'Male']
['Total', 'Male', '9th to 12th grade, no diploma']
['Total', 'Male', 'Associate degree']
['Total', 'Male', "Bachelor's degree"]
['Total', 'Male', 'Graduate or professional degree']
['Total', 'Male', 'High school graduate (includes equivalency)']
['Total', 'Male', 'Less than 9th grade']
['Total', 'Male', 'Some college, no degree']
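The label column holds a list of hierarchy levels rather than a single string. The Census API itself returns labels as one string with levels separated by "!!", so a plausible way to arrive at the list form is a simple split. This is a sketch under that assumption: split_label is a hypothetical helper, and stripping a trailing colon is an extra guess about how newer label spellings might be cleaned up.

```python
def split_label(raw: str) -> list:
    """Split a "!!"-delimited Census label into its hierarchy levels."""
    parts = []
    for part in raw.split("!!"):
        # Drop surrounding whitespace and any trailing colon on a level.
        cleaned = part.strip().rstrip(":")
        if cleaned:
            parts.append(cleaned)
    return parts


print(split_label("Total!!Male!!Associate degree"))
print(split_label("Estimate!!Total:"))
```

Keeping the levels as a list makes it straightforward to filter on a prefix, for example every label whose first two levels are "Total" and "Female".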