Debiased / desparsified lasso

Confidence intervals and Wald p-values for high-dimensional penalized fits. Van de Geer–Bühlmann–Ritov (VBR 2014) for least squares; the GLM extension uses the canonical-link score plus the Fisher information as the surrogate Gram.

The penalized lasso estimator β̂ is biased toward zero — the penalty trades variance for bias. The debiased estimator corrects the bias by walking back along an approximate score direction:

\[ \hat\beta_d \;=\; \hat\beta \;+\; \frac{1}{n}\,\hat\Theta\,X^\top\!\bigl(y - \hat\mu(\hat\beta)\bigr) \]

with Θ̂ Σ⁻¹ (LS) or Θ̂ J⁻¹ (GLM, with J = (1/n) Xᵀ W X the empirical Fisher), built nodewise — one lasso per column. Under sparsity + restricted-eigenvalue regularity:

\[ \sqrt{n}\,(\hat\beta_d - \beta) \;\rightsquigarrow\; \mathcal N\!\bigl(0,\;\sigma^2 \hat\Theta\hat\Sigma\hat\Theta^\top\bigr) \]

for LS (no σ² for GLMs — the noise is in W). Diagonal of that matrix gives the per-coordinate standard error, and the rest is a two-sided Wald test.

Why

glmnet / ncvreg / grpreg ship the penalized fit but no inference. The canonical R interface for this is hdi::lasso.proj; skein matches that semantics — free function on (X, y) returning a result dataclass, plus a thin sklearn-style estimator wrapper for pipeline use.

LS

skein_glm.debiased.debiased_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]

Van de Geer–Bühlmann–Ritov debiased lasso for least squares.

Parameters:
  • X (array-like (n, p))

  • y (array-like (n,))

  • lambda (float or None, default None) – Regularisation for the main lasso fit. If None, the theoretical scale √(2 log p / n) is used on standardized features. For best inference quality, pass a CV-tuned λ from a prior ElasticNetPathCV fit.

  • lambda_nodewise (float, array-like (p,), or None, default None) – Per-column λ for the nodewise lassos that build Θ̂. Scalar broadcasts to every column; an array selects per column. None uses √(2 log p / n) uniformly (the VBR theoretical choice on standardized columns).

  • alpha (float, default 0.05) – Two-sided CI / p-value level.

  • fit_intercept (bool, default True) – Center y and X; intercept is recovered post-fit. The intercept itself is not an inferential target — VBR theory applies to the slopes.

  • standardize (bool, default True) – Scale columns of X to unit variance before fitting. The dimensionless lambda_nodewise default is calibrated to standardized columns; turning this off without supplying column-specific lambda_nodewise typically gives wrong scale on the CIs.

  • max_iter (int, float) – Forwarded to every underlying lasso fit.

  • tol (int, float) – Forwarded to every underlying lasso fit.

  • n_jobs (int or None) – joblib parallelism over the p nodewise lassos. -1 uses all cores. The main lasso is single-threaded; the nodewise loop dominates for p ≳ 50.

  • lambda_ (float | None)

Return type:

DebiasedLassoResult

Notes

Memory is O(p²) for Θ̂ plus O(n p) for the standardized design — practical up to p ~ 5000 on a modern laptop. For larger p, the nodewise loop runtime (p lasso fits) is the binding constraint rather than memory.

The asymptotic distribution requires (a) lasso consistency at rate ‖β̂ − β‖₁ = o(1/√(log p)), (b) row-wise sparsity of the true Σ⁻¹, (c) sub-Gaussian design and noise. Failures show up as poor coverage in the empirical CI simulation tests.

class skein_glm.debiased.DebiasedLassoResult(coef_debiased, coef_lasso, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, sigma_hat, Theta, lambda_main, lambda_nodewise, alpha)[source]

Bases: object

Output of debiased_lasso().

All coefficient arrays are returned on the original (un- standardized) feature scale. Theta is on the standardized scale used internally (where the nodewise lasso defaults are dimensionless); it is exposed for inspection and reuse.

Parameters:
coef_debiased

Desparsified estimator β̂_d.

Type:

ndarray (p,)

coef_lasso

The underlying penalized lasso fit β̂ (also original-scale).

Type:

ndarray (p,)

intercept_
Type:

float

se

Asymptotic standard error from the diagonal of σ̂² · Θ̂ Σ̂ Θ̂ᵀ / n.

Type:

ndarray (p,)

ci_lower, ci_upper

Two-sided (1 − alpha) CIs.

Type:

ndarray (p,)

pvalues

Two-sided Wald p-values for H_0: β_j = 0.

Type:

ndarray (p,)

z_scores

β̂_d / se.

Type:

ndarray (p,)

sigma_hat

Residual scale ‖y − ŷ‖ / sqrt(n − ‖β̂‖₀).

Type:

float

Theta

Approximate inverse Gram on the standardized scale.

Type:

ndarray (p, p)

lambda_main
Type:

float

lambda_nodewise
Type:

ndarray (p,)

alpha

CI level used.

Type:

float

class skein_glm.debiased.DebiasedLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]

Bases: BaseEstimator, RegressorMixin

Sklearn-style facade over debiased_lasso().

Exposes the debiased estimator as coef_ / intercept_ so the result composes with sklearn Pipeline and metric helpers. The VBR-specific attributes (se_, ci_lower_, ci_upper_, pvalues_, z_scores_, Theta_, sigma_hat_) live on the fitted estimator.

Parameters mirror debiased_lasso().

Examples

>>> from skein_glm import DebiasedLassoRegressor
>>> est = DebiasedLassoRegressor(random_state=0).fit(X, y)
>>> est.coef_, est.ci_lower_, est.ci_upper_, est.pvalues_
Parameters:
  • lambda_ (float | None)

  • lambda_nodewise (float | NDArray[np.float64] | None)

  • alpha (float)

  • fit_intercept (bool)

  • standardize (bool)

  • max_iter (int)

  • tol (float)

  • n_jobs (int | None)

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
Returns:

self – The updated object.

Return type:

object

Logistic + Poisson

For canonical-link GLMs (binomial logit, Poisson log), the score is Xᵀ(y μ̂) and the Fisher information uses working weights W = diag(μ̂(1−μ̂)) (binomial) or diag(μ̂) (Poisson). Θ̂ is built nodewise on the weighted design = W^{1/2} X — the same lasso primitive as the LS case, applied to a re-weighted matrix.

skein_glm.debiased.debiased_logistic_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]

VBR debiased lasso for binomial logistic regression.

Construction (Van de Geer 2014 §3): fit penalized logistic lasso to obtain β̂, compute μ̂_i = σ(η̂_i) and working weights W_ii = μ̂_i (1 μ̂_i), then build the approximate inverse Fisher Θ̂ by nodewise lassos on = W^{1/2} X. The debiased estimator is β̂_d = β̂ + (1/n) · Θ̂ · Xᵀ (y μ̂) with asymptotic variance diag(Θ̂ X̃ᵀX̃ Θ̂ᵀ) / n (no σ² — the noise is encoded in W).

Parameters mirror debiased_lasso(). The penalized fit is obtained from LogisticLassoRegressor, the new convex-L1 primitive — debiasing on top of the prior MCP(γ=1e9) approximation would have inherited its bias.

Notes

Working weights are floored at 1e-8 to avoid degenerate columns in when fitted probabilities saturate near 0 or 1. If you see suspiciously narrow CIs, inspect mu_fitted for boundary cases.

Parameters:
Return type:

DebiasedGLMResult

skein_glm.debiased.debiased_poisson_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]

VBR debiased lasso for Poisson log-link regression.

Construction: fit penalized Poisson lasso to obtain β̂, compute μ̂_i = exp(η̂_i + offset_i) and working weights W_ii = μ̂_i, build Θ̂ nodewise on = W^{1/2} X, debias as in the logistic case.

Parameters mirror debiased_logistic_lasso() plus a Poisson offset (log-exposure) that participates in the linear predictor but is not an inferential target.

Notes

Poisson working weights can have very large dynamic range (μ̂ = exp(η̂)). If some μ̂_i are tiny, the corresponding rows of are heavily down-weighted and the asymptotic theory’s regularity conditions may be strained. The same 1e-8 floor as the logistic case applies — well-conditioned problems are unaffected.

Parameters:
Return type:

DebiasedGLMResult

class skein_glm.debiased.DebiasedGLMResult(coef_debiased, coef_glm, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, mu_fitted, Theta, lambda_main, lambda_nodewise, alpha, family)[source]

Bases: object

Output of debiased_logistic_lasso() / debiased_poisson_lasso().

Same fields as DebiasedLassoResult minus sigma_hat (GLMs have no residual scale — the noise is encoded in the working weights W) and plus mu_fitted and family.

Parameters:
coef_debiased
Type:

ndarray (p,)

coef_glm

Underlying penalized GLM fit on the original feature scale.

Type:

ndarray (p,)

intercept_

Intercept of the underlying fit; not an inferential target (VBR theory applies to slopes only).

Type:

float

se

sqrt(diag(Θ̂ X̃ᵀX̃ Θ̂ᵀ)) / n.

Type:

ndarray (p,)

ci_lower, ci_upper, pvalues, z_scores
Type:

ndarray (p,)

mu_fitted

μ̂ at the penalized fit. Useful for diagnostics.

Type:

ndarray (n,)

Theta

Approximate inverse Fisher on the standardized scale.

Type:

ndarray (p, p)

lambda_main
Type:

float

lambda_nodewise
Type:

ndarray (p,)

alpha
Type:

float

family

"binomial" or "poisson".

Type:

str

class skein_glm.debiased.DebiasedLogisticLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]

Bases: _DebiasedGLMRegressorBase

Sklearn-style facade over debiased_logistic_lasso().

Parameters:
  • lambda_ (float | None)

  • lambda_nodewise (float | NDArray[np.float64] | None)

  • alpha (float)

  • fit_intercept (bool)

  • standardize (bool)

  • max_iter (int)

  • tol (float)

  • n_jobs (int | None)

class skein_glm.debiased.DebiasedPoissonLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]

Bases: _DebiasedGLMRegressorBase

Sklearn-style facade over debiased_poisson_lasso().

Parameters:
  • lambda_ (float | None)

  • lambda_nodewise (float | NDArray[np.float64] | None)

  • alpha (float)

  • fit_intercept (bool)

  • standardize (bool)

  • offset (NDArray[np.float64] | None)

  • max_iter (int)

  • tol (float)

  • n_jobs (int | None)

predict(X)[source]

μ̂ = exp(η̂) — matches PoissonLassoRegressor’s convention.

Parameters:

X (Any)

Return type:

ndarray[tuple[Any, …], dtype[float64]]

Tuning

  • lambda_ (main fit): default theoretical scale √(2 log p / n) on standardized features. For best inference quality, pass a CV-tuned λ from a prior ElasticNetPathCV (LS) or LogisticLassoPathCV / PoissonLassoPathCV (GLM) fit.

  • lambda_nodewise: per-column λ for the nodewise lassos that build Θ̂. Scalar or array of length p. Default also √(2 log p / n) uniformly.

  • alpha (CI level): default 0.05 (95% intervals).

  • n_jobs: joblib parallelism over the p nodewise lassos. Materially faster for p 50 since each nodewise lasso releases the GIL during compute.

  • standardize (default True): the dimensionless lambda_nodewise default is calibrated to unit-variance columns. Turning standardize off without supplying a column-specific lambda_nodewise usually gives the wrong scale on the CIs.

Diagnostics

DebiasedLassoResult / DebiasedGLMResult expose the underlying penalized fit (coef_lasso / coef_glm), Theta for inspection or reuse, lambda_main and lambda_nodewise, and (for GLMs) mu_fitted. The GLM working weights are floored at 1e-8 to prevent degenerate columns in near boundary fitted probabilities — inspect mu_fitted if CIs look implausibly narrow.

Example

import numpy as np
from skein_glm import DebiasedLogisticLassoRegressor

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [1.5, -1.0, 0.8]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)

est = DebiasedLogisticLassoRegressor(n_jobs=-1, random_state=0).fit(X, y)
significant = np.where(est.pvalues_ < 0.05)[0]
print("significant features:", significant)
print("CI for β_0:", (est.ci_lower_[0], est.ci_upper_[0]))

References

  • Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42(3): 1166–1202.

  • Zhang, C.-H., Zhang, S. (2014). Confidence intervals for low- dimensional parameters in high-dimensional linear models. JRSS B 76(1): 217–242.

  • Reid, S., Tibshirani, R., Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica 26: 35–67.