Weighted Statistics
Statistics-related helper functions including weighted means, variances, and correlations.
API Documentation
- oi_tools.stats.center( ) → Expr
Subtract the weighted mean from an expression.
- Parameters:
- Returns:
An expression with weighted mean zero.
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame({"x": [1.0, 2.0, 3.0], "w": [1.0, 1.0, 2.0]})
>>> df.select(center("x", "w")).to_series().to_list()
[-1.25, -0.25, 0.75]
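The arithmetic behind the example above can be checked by hand. The following is a minimal plain-Python sketch, not the library's implementation; `weighted_mean_sketch` and `center_sketch` are hypothetical names used only for illustration.

```python
def weighted_mean_sketch(x, w):
    """Weighted mean of x with weights w (illustrative stand-in, not oi_tools code)."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def center_sketch(x, w):
    """Subtract the weighted mean from every value."""
    m = weighted_mean_sketch(x, w)
    return [xi - m for xi in x]


x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]

centered = center_sketch(x, w)
print(centered)                           # [-1.25, -0.25, 0.75]
print(weighted_mean_sketch(centered, w))  # 0.0
```

Note that it is the weighted mean of the result that is zero; the unweighted mean of `[-1.25, -0.25, 0.75]` is -0.25.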
- oi_tools.stats.scale(
- x: Expr | str | int | float,
- w: Expr | str | int | float,
- *,
- weight_type: Literal['frequency', 'precision'] = 'precision',
- ddof: int = 1,
Divide an expression by its weighted standard deviation.
- Parameters:
- Returns:
An expression with unit weighted variance.
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame({"x": [0.0, 1.0, 2.0], "w": [1.0, 3.0, 1.0]})
>>> df.select(scale("x", "w", weight_type="frequency")).to_series().round(
...     4
... ).to_list()
[0.0, 1.4142, 2.8284]
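To see where the numbers in the example come from, here is a hedged plain-Python sketch. It assumes that scale uses the frequency normalization 1 / (sum(w) - ddof) when weight_type="frequency"; that assumption reproduces the documented output, but the helper `weighted_sd_frequency` is hypothetical and not part of oi_tools.

```python
import math


def weighted_sd_frequency(x, w, ddof=1):
    """Weighted standard deviation, assuming normalization 1 / (sum(w) - ddof)."""
    total = sum(w)
    mean = sum(xi * wi for xi, wi in zip(x, w)) / total
    # Weighted sum of squared deviations from the weighted mean.
    ss = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w))
    return math.sqrt(ss / (total - ddof))


x = [0.0, 1.0, 2.0]
w = [1.0, 3.0, 1.0]

sd = weighted_sd_frequency(x, w)
scaled = [xi / sd for xi in x]

print([round(v, 4) for v in scaled])               # [0.0, 1.4142, 2.8284]
print(round(weighted_sd_frequency(scaled, w), 4))  # 1.0
```

The last line confirms that the scaled values have unit weighted standard deviation under the same weights and ddof.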
- oi_tools.stats.weighted_covariance(
- x: Expr | str | int | float,
- y: Expr | str | int | float,
- w: Expr | str | int | float | None = None,
- *,
- weight_type: Literal['frequency', 'precision'] = 'precision',
- ddof: int = 1,
Compute the weighted covariance between two expressions.
Rows where any of x, y, or w is null are omitted. See the documentation for np.cov for more.
- Parameters:
weight_type (Literal['frequency', 'precision']) –
The type of weight.
- "frequency" weights treat each weight as a repeat count, giving normalization 1 / (sum(w) - ddof). Like fweights in Stata.
- "precision" (analytic/reliability) weights treat each weight as an inverse variance, giving normalization sum(w) / (sum(w)**2 - ddof * sum(w**2)). Like aweights in Stata.
ddof (int) – Delta degrees of freedom.
- Returns:
The weighted covariance.
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame(
...     {"x": [0.0, 1.0, 2.0], "y": [2.0, 1.0, 0.0], "w": [1.0, 3.0, 1.0]}
... )
>>> df.select(weighted_covariance("x", "y", "w", weight_type="frequency")).item()
-0.5
>>> df = pl.DataFrame(
...     {"x": [0.0, 1.0, 2.0], "y": [2.0, 1.0, 0.0], "w": [1.0, 2.0, 1.0]}
... )
>>> df.select(weighted_covariance("x", "y", "w", weight_type="precision")).item()
-0.8
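The two normalizations given for weight_type can be written out directly. The sketch below is plain Python under stated assumptions: `weighted_cov_sketch` is a hypothetical helper, it does not handle nulls, and it is meant only to show that the documented formulas reproduce the example values.

```python
def weighted_cov_sketch(x, y, w, weight_type="precision", ddof=1):
    """Weighted covariance using the documented normalizations (illustration only)."""
    sw = sum(w)
    mx = sum(wi * xi for xi, wi in zip(x, w)) / sw  # weighted mean of x
    my = sum(wi * yi for yi, wi in zip(y, w)) / sw  # weighted mean of y
    cross = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    if weight_type == "frequency":
        # Weights act as repeat counts: 1 / (sum(w) - ddof)
        norm = 1.0 / (sw - ddof)
    elif weight_type == "precision":
        # Weights act as inverse variances: sum(w) / (sum(w)**2 - ddof * sum(w**2))
        sw2 = sum(wi ** 2 for wi in w)
        norm = sw / (sw ** 2 - ddof * sw2)
    else:
        raise ValueError(f"unknown weight_type: {weight_type!r}")
    return norm * cross


x = [0.0, 1.0, 2.0]
y = [2.0, 1.0, 0.0]

print(weighted_cov_sketch(x, y, [1.0, 3.0, 1.0], weight_type="frequency"))  # -0.5
print(weighted_cov_sketch(x, y, [1.0, 2.0, 1.0], weight_type="precision"))  # -0.8
```

In both cases the weighted cross-product sum is -2; only the normalization differs (1/4 for the frequency example, 4/10 for the precision example).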
- oi_tools.stats.weighted_mean( ) → Expr
Compute the weighted mean of an expression.
Rows where either x or w is null are omitted.
- Parameters:
- Returns:
The weighted mean.
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame({"x": [0.0, 1.0], "w": [1.0, 3.0]})
>>> df.select(weighted_mean("x", "w")).item()
0.75
Null values in either x or w are omitted:
>>> df = pl.DataFrame({"x": [0.0, None, 1.0], "w": [1.0, 1.0, 3.0]})
>>> df.select(weighted_mean("x", "w")).item()
0.75
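The null handling matters because a row is dropped as a whole: the weight paired with a null x leaves the denominator as well. A minimal plain-Python sketch of that rule follows, with `None` standing in for null; `weighted_mean_sketch` is a hypothetical name, not the library function.

```python
def weighted_mean_sketch(x, w):
    """Weighted mean that skips any row where x or w is None (illustration only)."""
    pairs = [(xi, wi) for xi, wi in zip(x, w) if xi is not None and wi is not None]
    total = sum(wi for _, wi in pairs)
    return sum(xi * wi for xi, wi in pairs) / total


x = [0.0, None, 1.0]
w = [1.0, 1.0, 3.0]

print(weighted_mean_sketch(x, w))  # 0.75
```

Had the weight of the null row stayed in the denominator, the result would be 3 / 5 = 0.6 instead of 3 / 4 = 0.75.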
- oi_tools.stats.weighted_rank(
- x: str | Collection[str] | Selector | Expr | int | float,
- w: str | Collection[str] | Selector | Expr | int | float,
- *,
- ties: Literal['arbitrary', 'average'] = 'average',
Compute the weighted quantile rank of an expression.
- Parameters:
x (str | Collection[str] | Selector | Expr | int | float) – The values to rank.
w (str | Collection[str] | Selector | Expr | int | float) – The weights associated with each value.
ties (Literal['arbitrary', 'average']) –
How to handle assigning quantiles in the case of ties:
- "arbitrary": break ties arbitrarily,
- "average": assign each unit the average rank of all units with the same x value.
- Returns:
A Polars expression producing ranks in (0, 1).
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame({"x": [1.0, 2.0], "w": [1.0, 3.0]})
>>> df.select(weighted_rank("x", "w")).to_series().to_list()
[0.125, 0.625]
Notes
Behavior is undefined if w contains null values.
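The example output is consistent with a midpoint convention: each unit's rank is the weight strictly below it plus half its own weight, divided by the total weight. The sketch below implements that convention in plain Python as an assumption inferred from the example, not as a description of the library's internals. `weighted_rank_sketch` is a hypothetical name, it assumes positive non-null weights, and it breaks ties by input order (comparable to "arbitrary"); it does not attempt the "average" option.

```python
def weighted_rank_sketch(x, w):
    """Midpoint weighted quantile ranks in (0, 1); ties broken by input order."""
    total = sum(w)
    # sorted() is stable, so equal x values keep their original relative order.
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    below = 0.0  # cumulative weight of units already ranked
    for i in order:
        ranks[i] = (below + w[i] / 2) / total
        below += w[i]
    return ranks


print(weighted_rank_sketch([1.0, 2.0], [1.0, 3.0]))  # [0.125, 0.625]
```

For the two rows above this gives (0 + 0.5) / 4 = 0.125 and (1 + 1.5) / 4 = 0.625.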
- oi_tools.stats.weighted_variance(
- x: Expr | str | int | float,
- w: Expr | str | int | float | None = None,
- *,
- weight_type: Literal['frequency', 'precision'] = 'precision',
- ddof: int = 1,
Compute the weighted variance of an expression.
Rows where either x or w is null are omitted. See the documentation for np.cov for more.
- Parameters:
- Returns:
The weighted variance.
- Return type:
pl.Expr
Examples
>>> df = pl.DataFrame({"x": [0.0, 1.0, 2.0], "w": [1.0, 3.0, 1.0]})
>>> df.select(weighted_variance("x", "w", weight_type="frequency")).item()
0.5
>>> df = pl.DataFrame({"x": [0.0, 1.0, 2.0], "w": [1.0, 2.0, 1.0]})
>>> df.select(weighted_variance("x", "w", weight_type="precision")).item()
0.8
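One way to build intuition for frequency weights is to expand each row into as many copies as its weight and take the ordinary sample variance. The following plain-Python sketch does that with the standard library, assuming integer weights and ddof=1. It is a cross-check of the first example, not the library's implementation.

```python
import statistics

x = [0.0, 1.0, 2.0]
w = [1, 3, 1]  # integer repeat counts

# Repeat each value w[i] times, then use the ordinary sample variance (ddof = 1).
expanded = [xi for xi, wi in zip(x, w) for _ in range(wi)]
expanded_var = statistics.variance(expanded)

# The same quantity from the weighted formula with normalization 1 / (sum(w) - 1).
total = sum(w)
mean = sum(xi * wi for xi, wi in zip(x, w)) / total
weighted_var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / (total - 1)

print(expanded)      # [0.0, 1.0, 1.0, 1.0, 2.0]
print(expanded_var)  # 0.5
print(weighted_var)  # 0.5
```

Precision weights have no such expansion, which is why the second example uses a different normalization and yields 0.8 rather than a repeat-count variance.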