Debiased / desparsified lasso¶
Confidence intervals and Wald p-values for high-dimensional penalized fits. Van de Geer–Bühlmann–Ritov (VBR 2014) for least squares; the GLM extension uses the canonical-link score plus the Fisher information as the surrogate Gram.
The penalized lasso estimator β̂ is biased toward zero — the
penalty trades variance for bias. The debiased estimator corrects
the bias by walking back along an approximate score direction:
with Θ̂ ≈ Σ⁻¹ (LS) or Θ̂ ≈ J⁻¹ (GLM, with J = (1/n) Xᵀ W X the
empirical Fisher), built nodewise — one lasso per column. Under
sparsity + restricted-eigenvalue regularity:
for LS (no σ² for GLMs — the noise is in W). Diagonal of that
matrix gives the per-coordinate standard error, and the rest is a
two-sided Wald test.
Why¶
glmnet / ncvreg / grpreg ship the penalized fit but no
inference. The canonical R interface for this is
hdi::lasso.proj; skein matches that semantics — free function on
(X, y) returning a result dataclass, plus a thin sklearn-style
estimator wrapper for pipeline use.
LS¶
- skein_glm.debiased.debiased_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
Van de Geer–Bühlmann–Ritov debiased lasso for least squares.
- Parameters:
X (
array-like (n,p))y (
array-like (n,))lambda (
floatorNone, defaultNone) – Regularisation for the main lasso fit. IfNone, the theoretical scale √(2 log p / n) is used on standardized features. For best inference quality, pass a CV-tuned λ from a priorElasticNetPathCVfit.lambda_nodewise (
float,array-like (p,), orNone, defaultNone) – Per-column λ for the nodewise lassos that build Θ̂. Scalar broadcasts to every column; an array selects per column.Noneuses √(2 log p / n) uniformly (the VBR theoretical choice on standardized columns).alpha (
float, default0.05) – Two-sided CI / p-value level.fit_intercept (
bool, defaultTrue) – Center y and X; intercept is recovered post-fit. The intercept itself is not an inferential target — VBR theory applies to the slopes.standardize (
bool, defaultTrue) – Scale columns of X to unit variance before fitting. The dimensionless lambda_nodewise default is calibrated to standardized columns; turning this off without supplying column-specific lambda_nodewise typically gives wrong scale on the CIs.max_iter (
int,float) – Forwarded to every underlying lasso fit.n_jobs (
intorNone) – joblib parallelism over the p nodewise lassos.-1uses all cores. The main lasso is single-threaded; the nodewise loop dominates for p ≳ 50.lambda_ (float | None)
- Return type:
Notes
Memory is O(p²) for Θ̂ plus O(n p) for the standardized design — practical up to p ~ 5000 on a modern laptop. For larger p, the nodewise loop runtime (p lasso fits) is the binding constraint rather than memory.
The asymptotic distribution requires (a) lasso consistency at rate ‖β̂ − β‖₁ = o(1/√(log p)), (b) row-wise sparsity of the true Σ⁻¹, (c) sub-Gaussian design and noise. Failures show up as poor coverage in the empirical CI simulation tests.
- class skein_glm.debiased.DebiasedLassoResult(coef_debiased, coef_lasso, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, sigma_hat, Theta, lambda_main, lambda_nodewise, alpha)[source]¶
Bases:
objectOutput of
debiased_lasso().All coefficient arrays are returned on the original (un- standardized) feature scale.
Thetais on the standardized scale used internally (where the nodewise lasso defaults are dimensionless); it is exposed for inspection and reuse.- coef_debiased¶
Desparsified estimator β̂_d.
- Type:
ndarray (p,)
- coef_lasso¶
The underlying penalized lasso fit β̂ (also original-scale).
- Type:
ndarray (p,)
- se¶
Asymptotic standard error from the diagonal of σ̂² · Θ̂ Σ̂ Θ̂ᵀ / n.
- Type:
ndarray (p,)
- ci_lower, ci_upper
Two-sided (1 − alpha) CIs.
- Type:
ndarray (p,)
- pvalues¶
Two-sided Wald p-values for H_0: β_j = 0.
- Type:
ndarray (p,)
- z_scores¶
β̂_d / se.
- Type:
ndarray (p,)
- Theta¶
Approximate inverse Gram on the standardized scale.
- Type:
ndarray (p,p)
- lambda_nodewise¶
- Type:
ndarray (p,)
- class skein_glm.debiased.DebiasedLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
Bases:
BaseEstimator,RegressorMixinSklearn-style facade over
debiased_lasso().Exposes the debiased estimator as
coef_/intercept_so the result composes with sklearn Pipeline and metric helpers. The VBR-specific attributes (se_,ci_lower_,ci_upper_,pvalues_,z_scores_,Theta_,sigma_hat_) live on the fitted estimator.Parameters mirror
debiased_lasso().Examples
>>> from skein_glm import DebiasedLassoRegressor >>> est = DebiasedLassoRegressor(random_state=0).fit(X, y) >>> est.coef_, est.ci_lower_, est.ci_upper_, est.pvalues_
- Parameters:
- set_score_request(*, sample_weight='$UNCHANGED$')¶
Configure whether metadata should be requested to be passed to the
scoremethod.Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with
enable_metadata_routing=True(seesklearn.set_config()). Please check the User Guide on how the routing mechanism works.The options for each parameter are:
True: metadata is requested, and passed toscoreif provided. The request is ignored if metadata is not provided.False: metadata is not requested and the meta-estimator will not pass it toscore.None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (
sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.Added in version 1.3.
Logistic + Poisson¶
For canonical-link GLMs (binomial logit, Poisson log), the score is
Xᵀ(y − μ̂) and the Fisher information uses working weights
W = diag(μ̂(1−μ̂)) (binomial) or diag(μ̂) (Poisson). Θ̂ is
built nodewise on the weighted design X̃ = W^{1/2} X — the
same lasso primitive as the LS case, applied to a re-weighted matrix.
- skein_glm.debiased.debiased_logistic_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
VBR debiased lasso for binomial logistic regression.
Construction (Van de Geer 2014 §3): fit penalized logistic lasso to obtain
β̂, computeμ̂_i = σ(η̂_i)and working weightsW_ii = μ̂_i (1 − μ̂_i), then build the approximate inverse FisherΘ̂by nodewise lassos onX̃ = W^{1/2} X. The debiased estimator isβ̂_d = β̂ + (1/n) · Θ̂ · Xᵀ (y − μ̂)with asymptotic variancediag(Θ̂ X̃ᵀX̃ Θ̂ᵀ) / n(no σ² — the noise is encoded in W).Parameters mirror
debiased_lasso(). The penalized fit is obtained fromLogisticLassoRegressor, the new convex-L1 primitive — debiasing on top of the priorMCP(γ=1e9)approximation would have inherited its bias.Notes
Working weights are floored at
1e-8to avoid degenerate columns in X̃ when fitted probabilities saturate near 0 or 1. If you see suspiciously narrow CIs, inspectmu_fittedfor boundary cases.
- skein_glm.debiased.debiased_poisson_lasso(X, y, *, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
VBR debiased lasso for Poisson log-link regression.
Construction: fit penalized Poisson lasso to obtain
β̂, computeμ̂_i = exp(η̂_i + offset_i)and working weightsW_ii = μ̂_i, buildΘ̂nodewise onX̃ = W^{1/2} X, debias as in the logistic case.Parameters mirror
debiased_logistic_lasso()plus a Poissonoffset(log-exposure) that participates in the linear predictor but is not an inferential target.Notes
Poisson working weights can have very large dynamic range (μ̂ = exp(η̂)). If some μ̂_i are tiny, the corresponding rows of X̃ are heavily down-weighted and the asymptotic theory’s regularity conditions may be strained. The same 1e-8 floor as the logistic case applies — well-conditioned problems are unaffected.
- Parameters:
- Return type:
- class skein_glm.debiased.DebiasedGLMResult(coef_debiased, coef_glm, intercept_, se, ci_lower, ci_upper, pvalues, z_scores, mu_fitted, Theta, lambda_main, lambda_nodewise, alpha, family)[source]¶
Bases:
objectOutput of
debiased_logistic_lasso()/debiased_poisson_lasso().Same fields as
DebiasedLassoResultminussigma_hat(GLMs have no residual scale — the noise is encoded in the working weights W) and plusmu_fittedandfamily.- coef_debiased¶
- Type:
ndarray (p,)
- coef_glm¶
Underlying penalized GLM fit on the original feature scale.
- Type:
ndarray (p,)
- intercept_¶
Intercept of the underlying fit; not an inferential target (VBR theory applies to slopes only).
- Type:
- se¶
sqrt(diag(Θ̂ X̃ᵀX̃ Θ̂ᵀ)) / n.
- Type:
ndarray (p,)
- ci_lower, ci_upper, pvalues, z_scores
- Type:
ndarray (p,)
- mu_fitted¶
μ̂ at the penalized fit. Useful for diagnostics.
- Type:
ndarray (n,)
- Theta¶
Approximate inverse Fisher on the standardized scale.
- Type:
ndarray (p,p)
- lambda_nodewise¶
- Type:
ndarray (p,)
- class skein_glm.debiased.DebiasedLogisticLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
Bases:
_DebiasedGLMRegressorBaseSklearn-style facade over
debiased_logistic_lasso().
- class skein_glm.debiased.DebiasedPoissonLassoRegressor(*, lambda_=None, lambda_nodewise=None, alpha=0.05, fit_intercept=True, standardize=True, offset=None, max_iter=1000, tol=1e-07, n_jobs=None)[source]¶
Bases:
_DebiasedGLMRegressorBaseSklearn-style facade over
debiased_poisson_lasso().- Parameters:
Tuning¶
lambda_(main fit): default theoretical scale√(2 log p / n)on standardized features. For best inference quality, pass a CV-tuned λ from a priorElasticNetPathCV(LS) orLogisticLassoPathCV/PoissonLassoPathCV(GLM) fit.lambda_nodewise: per-column λ for the nodewise lassos that buildΘ̂. Scalar or array of lengthp. Default also√(2 log p / n)uniformly.alpha(CI level): default 0.05 (95% intervals).n_jobs: joblib parallelism over thepnodewise lassos. Materially faster forp ≳ 50since each nodewise lasso releases the GIL during compute.standardize(default True): the dimensionlesslambda_nodewisedefault is calibrated to unit-variance columns. Turning standardize off without supplying a column-specificlambda_nodewiseusually gives the wrong scale on the CIs.
Diagnostics¶
DebiasedLassoResult / DebiasedGLMResult expose the underlying
penalized fit (coef_lasso / coef_glm), Theta for inspection or
reuse, lambda_main and lambda_nodewise, and (for GLMs)
mu_fitted. The GLM working weights are floored at 1e-8 to
prevent degenerate columns in X̃ near boundary fitted
probabilities — inspect mu_fitted if CIs look implausibly narrow.
Example¶
import numpy as np
from skein_glm import DebiasedLogisticLassoRegressor
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [1.5, -1.0, 0.8]
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ beta))).astype(float)
est = DebiasedLogisticLassoRegressor(n_jobs=-1, random_state=0).fit(X, y)
significant = np.where(est.pvalues_ < 0.05)[0]
print("significant features:", significant)
print("CI for β_0:", (est.ci_lower_[0], est.ci_upper_[0]))
References¶
Van de Geer, S., Bühlmann, P., Ritov, Y., Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Annals of Statistics 42(3): 1166–1202.
Zhang, C.-H., Zhang, S. (2014). Confidence intervals for low- dimensional parameters in high-dimensional linear models. JRSS B 76(1): 217–242.
Reid, S., Tibshirani, R., Friedman, J. (2016). A study of error variance estimation in lasso regression. Statistica Sinica 26: 35–67.