Debiased / desparsified lasso¶

Confidence intervals and Wald p-values for high-dimensional penalized fits. Van de Geer–Bühlmann–Ritov (VBR 2014) for least squares; the GLM extension uses the canonical-link score plus the Fisher information as the surrogate Gram.

The penalized lasso estimator β̂ is biased toward zero — the penalty trades variance for bias. The debiased estimator corrects the bias by walking back along an approximate score direction:

\[ \hat\beta_d \;=\; \hat\beta \;+\; \frac{1}{n}\,\hat\Theta\,X^\top\!\bigl(y - \hat\mu(\hat\beta)\bigr) \]

with Θ̂ ≈ Σ⁻¹ (LS) or Θ̂ ≈ J⁻¹ (GLM, with J = (1/n) Xᵀ W X the empirical Fisher), built nodewise — one lasso per column. Under sparsity + restricted-eigenvalue regularity:

\[ \sqrt{n}\,(\hat\beta_d - \beta) \;\rightsquigarrow\; \mathcal N\!\bigl(0,\;\sigma^2 \hat\Theta\hat\Sigma\hat\Theta^\top\bigr) \]

for LS (no σ² for GLMs — the noise is in W). Diagonal of that matrix gives the per-coordinate standard error, and the rest is a two-sided Wald test.

Why¶

glmnet / ncvreg / grpreg ship the penalized fit but no inference. The canonical R interface for this is hdi::lasso.proj; skein matches that semantics — free function on (X, y) returning a result dataclass, plus a thin sklearn-style estimator wrapper for pipeline use.

LS¶

skein_glm.debiased.debiased_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

Van de Geer–Bühlmann–Ritov debiased lasso for least squares.

Parameters:

X (array-like (n, p))
y (array-like (n,))
lambda (float or None, default None) – Regularisation for the main lasso fit. If None, the theoretical scale √(2 log p / n) is used on standardized features. For best inference quality, pass a CV-tuned λ from a prior ElasticNetPathCV fit.
lambda_nodewise (float, array-like (p,), or None, default None) – Per-column λ for the nodewise lassos that build Θ̂. Scalar broadcasts to every column; an array selects per column. None uses √(2 log p / n) uniformly (the VBR theoretical choice on standardized columns).
alpha (float, default 0.05) – Two-sided CI / p-value level.
fit_intercept (bool, default True) – Center y and X; intercept is recovered post-fit. The intercept itself is not an inferential target — VBR theory applies to the slopes.
standardize (bool, default True) – Scale columns of X to unit variance before fitting. The dimensionless lambda_nodewise default is calibrated to standardized columns; turning this off without supplying column-specific lambda_nodewise typically gives wrong scale on the CIs.
max_iter (int, float) – Forwarded to every underlying lasso fit.
tol (int, float) – Forwarded to every underlying lasso fit.
n_jobs (int or None) – joblib parallelism over the p nodewise lassos. -1 uses all cores. The main lasso is single-threaded; the nodewise loop dominates for p ≳ 50.
lambda_ (float | None)

Return type:

DebiasedLassoResult

Notes

Memory is O(p²) for Θ̂ plus O(n p) for the standardized design — practical up to p ~ 5000 on a modern laptop. For larger p, the nodewise loop runtime (p lasso fits) is the binding constraint rather than memory.

The asymptotic distribution requires (a) lasso consistency at rate ‖β̂ − β‖₁ = o(1/√(log p)), (b) row-wise sparsity of the true Σ⁻¹, (c) sub-Gaussian design and noise. Failures show up as poor coverage in the empirical CI simulation tests.

class skein_glm.debiased.DebiasedLassoResult(coef_debiased, coef_lasso, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, sigma_hat, Theta, lambda_main, lambda_nodewise, alpha)[source]¶

Bases: object

Output of debiased_lasso().

All coefficient arrays are returned on the original (un- standardized) feature scale. Theta is on the standardized scale used internally (where the nodewise lasso defaults are dimensionless); it is exposed for inspection and reuse.

Parameters:

coef_debiased (ndarray[tuple[Any, ...], dtype[float64]])
coef_lasso (ndarray[tuple[Any, ...], dtype[float64]])
intercept_ (float)
se (ndarray[tuple[Any, ...], dtype[float64]])
ci_lower (ndarray[tuple[Any, ...], dtype[float64]])
ci_upper (ndarray[tuple[Any, ...], dtype[float64]])
pvalues (ndarray[tuple[Any, ...], dtype[float64]])
z_scores (ndarray[tuple[Any, ...], dtype[float64]])
sigma_hat (float)
Theta (ndarray[tuple[Any, ...], dtype[float64]])
lambda_main (float)
lambda_nodewise (ndarray[tuple[Any, ...], dtype[float64]])
alpha (float)

coef_debiased¶

Desparsified estimator β̂_d.

Type:: ndarray (p,)

coef_lasso¶

The underlying penalized lasso fit β̂ (also original-scale).

Type:: ndarray (p,)

intercept_¶

Type:: float

se¶

Asymptotic standard error from the diagonal of σ̂² · Θ̂ Σ̂ Θ̂ᵀ / n.

Type:: ndarray (p,)

ci_lower, ci_upper

Two-sided (1 − alpha) CIs.

Type:: ndarray (p,)

pvalues¶

Two-sided Wald p-values for H_0: β_j = 0.

Type:: ndarray (p,)

z_scores¶

β̂_d / se.

Type:: ndarray (p,)

sigma_hat¶

Residual scale ‖y − ŷ‖ / sqrt(n − ‖β̂‖₀).

Type:: float

Theta¶

Approximate inverse Gram on the standardized scale.

Type:: ndarray (p, p)

lambda_main¶

Type:: float

lambda_nodewise¶

Type:: ndarray (p,)

alpha¶

CI level used.

Type:: float

class skein_glm.debiased.DebiasedLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

Bases: BaseEstimator, RegressorMixin

Sklearn-style facade over debiased_lasso().

Exposes the debiased estimator as coef_ / intercept_ so the result composes with sklearn Pipeline and metric helpers. The VBR-specific attributes (se_, ci_lower_, ci_upper_, pvalues_, z_scores_, Theta_, sigma_hat_) live on the fitted estimator.

Parameters mirror debiased_lasso().

Examples

>>> from skein_glm import DebiasedLassoRegressor
>>> est = DebiasedLassoRegressor(random_state=0).fit(X, y)
>>> est.coef_, est.ci_lower_, est.ci_upper_, est.pvalues_

Parameters:

lambda_ (float | None)
lambda_nodewise (float | NDArray[np.float64] | None)
alpha (float)
fit_intercept (bool)
standardize (bool)
max_iter (int)
tol (float)
n_jobs (int | None)

set_score_request(*, sample_weight='$UNCHANGED$')¶

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
self (DebiasedLassoRegressor)

Returns:

self – The updated object.

Return type:

object

Logistic + Poisson¶

For canonical-link GLMs (binomial logit, Poisson log), the score is Xᵀ(y − μ̂) and the Fisher information uses working weights W = diag(μ̂(1−μ̂)) (binomial) or diag(μ̂) (Poisson). Θ̂ is built nodewise on the weighted design X̃ = W^{1/2} X — the same lasso primitive as the LS case, applied to a re-weighted matrix.

skein_glm.debiased.debiased_logistic_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

VBR debiased lasso for binomial logistic regression.

Construction (Van de Geer 2014 §3): fit penalized logistic lasso to obtain β̂, compute μ̂_i = σ(η̂_i) and working weights W_ii = μ̂_i (1 − μ̂_i), then build the approximate inverse Fisher Θ̂ by nodewise lassos on X̃ = W^{1/2} X. The debiased estimator is β̂_d = β̂ + (1/n) · Θ̂ · Xᵀ (y − μ̂) with asymptotic variance diag(Θ̂ X̃ᵀX̃ Θ̂ᵀ) / n (no σ² — the noise is encoded in W).

Parameters mirror debiased_lasso(). The penalized fit is obtained from LogisticLassoRegressor, the new convex-L1 primitive — debiasing on top of the prior MCP(γ=1e9) approximation would have inherited its bias.

Notes

Working weights are floored at 1e-8 to avoid degenerate columns in X̃ when fitted probabilities saturate near 0 or 1. If you see suspiciously narrow CIs, inspect mu_fitted for boundary cases.

Parameters:

X (Any)
y (Any)
lambda_ (float | None)
lambda_nodewise (float | ndarray[tuple[Any, ...], dtype[float64]] | None)
alpha (float)
fit_intercept (bool)
standardize (bool)
max_iter (int)
tol (float)
n_jobs (int | None)

Return type:

DebiasedGLMResult

skein_glm.debiased.debiased_poisson_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

VBR debiased lasso for Poisson log-link regression.

Construction: fit penalized Poisson lasso to obtain β̂, compute μ̂_i = exp(η̂_i + offset_i) and working weights W_ii = μ̂_i, build Θ̂ nodewise on X̃ = W^{1/2} X, debias as in the logistic case.

Parameters mirror debiased_logistic_lasso() plus a Poisson offset (log-exposure) that participates in the linear predictor but is not an inferential target.

Notes

Poisson working weights can have very large dynamic range (μ̂ = exp(η̂)). If some μ̂_i are tiny, the corresponding rows of X̃ are heavily down-weighted and the asymptotic theory’s regularity conditions may be strained. The same 1e-8 floor as the logistic case applies — well-conditioned problems are unaffected.

Parameters:

X (Any)
y (Any)
lambda_ (float | None)
lambda_nodewise (float | ndarray[tuple[Any, ...], dtype[float64]] | None)
alpha (float)
fit_intercept (bool)
standardize (bool)
offset (ndarray[tuple[Any, ...], dtype[float64]] | None)
max_iter (int)
tol (float)
n_jobs (int | None)

Return type:

DebiasedGLMResult

class skein_glm.debiased.DebiasedGLMResult(coef_debiased, coef_glm, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, mu_fitted, Theta, lambda_main, lambda_nodewise, alpha, family)[source]¶

Bases: object

Output of debiased_logistic_lasso() / debiased_poisson_lasso().

Same fields as DebiasedLassoResult minus sigma_hat (GLMs have no residual scale — the noise is encoded in the working weights W) and plus mu_fitted and family.

Parameters:

coef_debiased (ndarray[tuple[Any, ...], dtype[float64]])
coef_glm (ndarray[tuple[Any, ...], dtype[float64]])
intercept_ (float)
se (ndarray[tuple[Any, ...], dtype[float64]])
ci_lower (ndarray[tuple[Any, ...], dtype[float64]])
ci_upper (ndarray[tuple[Any, ...], dtype[float64]])
pvalues (ndarray[tuple[Any, ...], dtype[float64]])
z_scores (ndarray[tuple[Any, ...], dtype[float64]])
mu_fitted (ndarray[tuple[Any, ...], dtype[float64]])
Theta (ndarray[tuple[Any, ...], dtype[float64]])
lambda_main (float)
lambda_nodewise (ndarray[tuple[Any, ...], dtype[float64]])
alpha (float)
family (str)

coef_debiased¶

Type:: ndarray (p,)

coef_glm¶

Underlying penalized GLM fit on the original feature scale.

Type:: ndarray (p,)

intercept_¶

Intercept of the underlying fit; not an inferential target (VBR theory applies to slopes only).

Type:: float

se¶

sqrt(diag(Θ̂ X̃ᵀX̃ Θ̂ᵀ)) / n.

Type:: ndarray (p,)

ci_lower, ci_upper, pvalues, z_scores

Type:: ndarray (p,)

mu_fitted¶

μ̂ at the penalized fit. Useful for diagnostics.

Type:: ndarray (n,)

Theta¶

Approximate inverse Fisher on the standardized scale.

Type:: ndarray (p, p)

lambda_main¶

Type:: float

lambda_nodewise¶

Type:: ndarray (p,)

alpha¶

Type:: float

family¶

"binomial" or "poisson".

Type:: str

class skein_glm.debiased.DebiasedLogisticLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

Bases: _DebiasedGLMRegressorBase

Sklearn-style facade over debiased_logistic_lasso().

Parameters:

lambda_ (float | None)
lambda_nodewise (float | NDArray[np.float64] | None)
alpha (float)
fit_intercept (bool)
standardize (bool)
max_iter (int)
tol (float)
n_jobs (int | None)

class skein_glm.debiased.DebiasedPoissonLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶

Bases: _DebiasedGLMRegressorBase

Sklearn-style facade over debiased_poisson_lasso().

Parameters:

lambda_ (float | None)
lambda_nodewise (float | NDArray[np.float64] | None)
alpha (float)
fit_intercept (bool)
standardize (bool)
offset (NDArray[np.float64] | None)
max_iter (int)
tol (float)
n_jobs (int | None)

predict(X)[source]¶

μ̂ = exp(η̂) — matches PoissonLassoRegressor’s convention.

Parameters:: X (Any)
Return type:: ndarray[tuple[Any, …], dtype[float64]]

Tuning¶

lambda_ (main fit): default theoretical scale √(2 log p / n) on standardized features. For best inference quality, pass a CV-tuned λ from a prior ElasticNetPathCV (LS) or LogisticLassoPathCV / PoissonLassoPathCV (GLM) fit.
lambda_nodewise: per-column λ for the nodewise lassos that build Θ̂. Scalar or array of length p. Default also √(2 log p / n) uniformly.
alpha (CI level): default 0.05 (95% intervals).
n_jobs: joblib parallelism over the p nodewise lassos. Materially faster for p ≳ 50 since each nodewise lasso releases the GIL during compute.
standardize (default True): the dimensionless lambda_nodewise default is calibrated to unit-variance columns. Turning standardize off without supplying a column-specific lambda_nodewise usually gives the wrong scale on the CIs.

Diagnostics¶

DebiasedLassoResult / DebiasedGLMResult expose the underlying penalized fit (coef_lasso / coef_glm), Theta for inspection or reuse, lambda_main and lambda_nodewise, and (for GLMs) mu_fitted. The GLM working weights are floored at 1e-8 to prevent degenerate columns in X̃ near boundary fitted probabilities — inspect mu_fitted if CIs look implausibly narrow.

Example¶

import numpy as np
from skein_glm import DebiasedLogisticLassoRegressor

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [1.5, -1.0, 0.8]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

est = DebiasedLogisticLassoRegressor(n_jobs=-1, random_state=0).fit(X, y)
significant = np.where(est.pvalues_ < 0.05)[0]
print("significant features:", significant)
print("CI for β_0:", (est.ci_lower_[0], est.ci_upper_[0]))

References¶

Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42(3): 1166–1202.
Zhang, C.-H., Zhang, S. (2014). Confidence intervals for low- dimensional parameters in high-dimensional linear models. JRSS B 76(1): 217–242.
Reid, S., Tibshirani, R., Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica 26: 35–67.