Porting from glmnet
glmnet (Friedman, Hastie & Tibshirani) is the dominant R package
for L1 / elastic-net regularized GLMs. If you’re moving from R to
Python and want to keep your cv.glmnet-based workflow, this page
maps glmnet’s API onto skein.
skein ships native elastic net (ElasticNet*Regressor, matching
glmnet’s alpha ∈ [0, 1] exactly) and a nonconvex MCP/SCAD path. We
generally recommend MCP at γ=3 over lasso for less-biased estimates
of truly active features; if you specifically want lasso, either
ElasticNetRegressor(alpha=1.0) or MCPRegressor(gamma=1e6) works
(the former is exact; the latter is numerically indistinguishable).
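To see why a very large γ reproduces the lasso, compare the univariate thresholding rules. The sketch below is plain NumPy, not skein's solver: soft_threshold and mcp_threshold are illustrative names, and the formulas assume a single standardized feature with γ > 1.

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso solution for one standardized feature: shrink toward zero by lam."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma):
    """MCP ("firm") thresholding for one standardized feature, gamma > 1."""
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a standardized feature")
    z = np.asarray(z, dtype=float)
    # Inside |z| <= gamma * lam the soft-thresholded value is rescaled;
    # beyond that point the input passes through unshrunk.
    shrunk = soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) <= gamma * lam, shrunk, z)

z = np.linspace(-5.0, 5.0, 201)
lam = 1.0
lasso = soft_threshold(z, lam)
mcp_3 = mcp_threshold(z, lam, gamma=3.0)
mcp_big = mcp_threshold(z, lam, gamma=1e6)

# Largest disagreement between MCP at gamma=1e6 and the lasso rule on this grid.
max_gap = np.max(np.abs(mcp_big - lasso))
```

At γ=3, inputs beyond γλ pass through unshrunk, which is where the reduced bias on large coefficients comes from. At γ=1e6 the rule differs from soft thresholding only by a relative factor of about 1e-6.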
The three top-line translations
|
|
|---|---|
|
|
|
|
|
|
In R you write:
library(glmnet)
fit <- cv.glmnet(x, y, family = "gaussian", nfolds = 10)
beta_hat <- as.numeric(coef(fit, s = "lambda.min"))[-1] # drop intercept
alpha_hat <- as.numeric(coef(fit, s = "lambda.min"))[1]
In Python:
import skein_glm
cv_fit = skein_glm.MCPPathCV(gamma=1e6, cv=10).fit(X, y)
beta_hat = cv_fit.coef_
alpha_hat = cv_fit.intercept_
Family map
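| glmnet | skein | Notes |
|---|---|---|
| family = "gaussian" | | Default. LS datafit. |
| family = "binomial" | | Two-class only. v0.1 doesn’t support multinomial. |
| family = "poisson" | | Log link. y ≥ 0 required. |
| family = "cox" | | Right-censored survival, fit signature is fit(X, time, event). |
| family = "multinomial" | (not in v0.1) | M3.6 roadmap. Use one-vs-rest manually for now. |
| family = "mgaussian" | (not in v0.1) | M7 multi-task roadmap. |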
Per-argument translation
Most-used arguments
|
|
Notes |
|---|---|---|
|
|
numpy array, scipy.sparse, MmapDesignF64/32, ChunkedDesignF64/32. |
|
|
For Cox: |
|
(choose estimator class) |
See family map above. |
|
|
|
|
|
numpy array. Pass |
|
|
Default 100 (matches glmnet). |
|
|
Default 1e-3 if |
|
|
Per-sample weights. Identical semantics. |
|
|
Per-feature penalty weights. |
|
|
Default |
|
|
Default |
|
|
Default |
|
|
Default |
|
|
Pass an int or any sklearn CV splitter. |
|
(auto-selected by family) |
Family-appropriate metric. See “type.measure” below. |
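The path-length and minimum-ratio defaults in the table describe a grid of λ values. glmnet builds that grid log-spaced, from the smallest λ that zeroes every coefficient down to a fixed fraction of it. The sketch below reproduces that convention for the Gaussian lasso in NumPy. It assumes skein follows the same convention, which this page does not confirm; lambda_grid is a hypothetical helper, not a skein function.

```python
import numpy as np

def lambda_grid(X, y, n_lambdas=100, min_ratio=1e-3):
    """Log-spaced lambda grid for (1/2n)||y - Xb||^2 + lam * ||b||_1 with an intercept."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)   # centering accounts for the unpenalized intercept
    yc = y - y.mean()
    # Smallest lambda at which every lasso coefficient is exactly zero.
    lam_max = np.max(np.abs(Xc.T @ yc)) / n
    return np.geomspace(lam_max, lam_max * min_ratio, num=n_lambdas)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 2.0 * X[:, 0] + rng.standard_normal(50)
grid = lambda_grid(X, y)
```

The grid is decreasing, so a path solver can warm-start each fit from the previous, sparser solution.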
Defaults that differ
Two glmnet defaults that bite people moving to skein:
- standardize: glmnet defaults to TRUE, skein defaults to False. If your features have heterogeneous scales, pass standardize=True explicitly.
- thresh: glmnet defaults to 1e-7, skein defaults to 1e-6. Tighten with tol=1e-8 or smaller for numerically delicate problems (e.g. matching reference fits exactly).
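Why the standardize default matters can be shown with NumPy alone. In the sketch below, three features contribute equally to y but live on very different scales. The quantity |x_j' y| / n, which decides when a feature enters a lasso path, then differs by orders of magnitude unless the columns are standardized. The 1/n divisor for the standard deviation (ddof=0) is an assumption here, not something this page states about skein.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
scales = np.array([1.0, 50.0, 0.01])
X = rng.standard_normal((n, 3)) * scales
# Each feature contributes a unit-variance signal to y despite its scale.
y = 3.0 + X @ (1.0 / scales) + 0.1 * rng.standard_normal(n)

mu = X.mean(axis=0)
sd = X.std(axis=0)            # ddof=0, i.e. 1/n divisor (assumed convention)
Z = (X - mu) / sd
yc = y - y.mean()

# Entry scores at beta = 0: wildly different on raw columns, comparable after scaling.
score_raw = np.abs((X - mu).T @ yc) / n
score_std = np.abs(Z.T @ yc) / n

# Coefficients fitted on Z map back to the original scale by dividing by sd.
coef_z = np.linalg.lstsq(np.column_stack([np.ones(n), Z]), y, rcond=None)[0]
beta = coef_z[1:] / sd
intercept = coef_z[0] - mu @ beta

# Reference: unpenalized least squares on the raw columns.
coef_raw = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
```

Without a penalty the back-transformed coefficients match the raw fit, so scaling only changes results once a penalty is involved. That is why a silent difference in this default can shift which features are selected.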
type.measure map
glmnet’s type.measure selects the CV scoring metric. In skein, the
metric is auto-selected by the GLM family, but you can override via
the *PathCV mixin’s scorer attribute (advanced; see API ref).
|
|
|---|---|
|
mean squared error (lower-better) |
|
binomial deviance (lower-better) |
|
(not default; can override) |
|
(not default; can override) |
|
Poisson deviance (lower-better) |
|
Harrell’s c-index (higher-better). Unlike glmnet, whose Cox default is partial-likelihood deviance, skein scores Cox fits by concordance by default. |
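Because the Cox metric is higher-is-better while the others are lower-is-better, it helps to see what it measures. The following is a minimal NumPy sketch of Harrell's c-index. It is O(n²) and ignores pairs with tied times; harrell_c_index is an illustrative name, not skein's scorer.

```python
import numpy as np

def harrell_c_index(time, event, risk):
    """Fraction of comparable pairs in which the earlier failure has the higher risk."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        later = time > time[i]  # subjects known to outlive subject i
        comparable += int(later.sum())
        concordant += np.sum(risk[i] > risk[later])
        concordant += 0.5 * np.sum(risk[i] == risk[later])  # tied scores count half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

time = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([1, 1, 0, 1])
perfect = harrell_c_index(time, event, risk=[3.0, 2.0, 0.5, 1.0])
partial = harrell_c_index(time, event, risk=[3.0, 0.2, 0.5, 1.0])
constant = harrell_c_index(time, event, risk=[1.0, 1.0, 1.0, 1.0])
```

A value of 0.5 corresponds to uninformative risk scores and 1.0 to a perfect ordering, so model selection maximizes this metric rather than minimizing it.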
Workflow translations
Basic CV fit and predict
# R
library(glmnet)
fit <- cv.glmnet(x, y, family = "gaussian")
y_hat <- predict(fit, newx = x_new, s = "lambda.min")
beta <- coef(fit, s = "lambda.min")
# Python
import skein_glm
fit = skein_glm.MCPPathCV(gamma=1e6, cv=10).fit(X, y)
y_hat = fit.predict(X_new)
beta = fit.coef_
intercept = fit.intercept_
Logistic with class weights
# R
fit <- cv.glmnet(x, y, family = "binomial", weights = w)
prob <- predict(fit, newx = x_new, type = "response", s = "lambda.min")
# Python
fit = skein_glm.LogisticMCPPathCV(gamma=1e6, cv=10).fit(X, y, sample_weight=w)
# v0.1: LogisticMCPPathCV picks lambda_best_ at fit time and refits;
# `fit.predict_proba(X_new)` returns a 1D probability vector.
prob = fit.predict_proba(X_new)
labels = fit.predict(X_new)
For path inspection (every λ at once), use the *PathRegressor
instead of *PathCV:
path = skein_glm.LogisticMCPPathRegressor(gamma=1e6, n_lambdas=50).fit(X, y)
prob_path = path.predict_proba(X_new) # shape (n_new, n_lambdas)
Cox PH
# R
fit <- cv.glmnet(x, Surv(time, event), family = "cox")
risk <- predict(fit, newx = x_new, s = "lambda.min")
# Python
fit = skein_glm.CoxMCPPathCV(gamma=1e6, cv=10).fit(X, time, event)
risk = fit.predict(X_new) # the prognostic index η = Xβ
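skein’s Cox uses the Breslow approximation for tied event times, which is also what glmnet uses, so tie handling should not be a source of discrepancies between the two. R’s survival::coxph defaults to Efron, so expect small differences against coxph fits when ties are frequent; for moderate tie rates the difference is negligible. Efron is on the M3.7 roadmap.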
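To make the two tie conventions concrete, here is a NumPy sketch of the negative log partial likelihood under Breslow and Efron for a given prognostic index η. It is a reference computation for small examples, not skein's datafit; cox_neg_log_partial_likelihood is a hypothetical name.

```python
import numpy as np

def cox_neg_log_partial_likelihood(eta, time, event, ties="breslow"):
    """Negative log partial likelihood for linear predictor eta = X @ beta."""
    eta = np.asarray(eta, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    if ties not in ("breslow", "efron"):
        raise ValueError("ties must be 'breslow' or 'efron'")
    nll = 0.0
    for t in np.unique(time[event]):
        dead = event & (time == t)                # events tied at time t
        d = int(dead.sum())
        risk_sum = np.exp(eta[time >= t]).sum()   # everyone still at risk at t
        dead_sum = np.exp(eta[dead]).sum()
        if ties == "breslow":
            # Every tied event sees the full risk set.
            log_denom = d * np.log(risk_sum)
        else:
            # Efron: the l-th tied event sees the risk set minus l/d of the tied group.
            fractions = np.arange(d) / d
            log_denom = np.log(risk_sum - fractions * dead_sum).sum()
        nll -= eta[dead].sum() - log_denom
    return nll

eta = np.zeros(3)
time = np.array([1.0, 1.0, 2.0])
event = np.array([1, 1, 1])
breslow = cox_neg_log_partial_likelihood(eta, time, event, ties="breslow")
efron = cox_neg_log_partial_likelihood(eta, time, event, ties="efron")
```

With two of three subjects tied, the two values differ (2·log 3 versus log 6). With no tied event times the two formulas coincide.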
Adaptive lasso (two-stage)
glmnet doesn’t have a built-in adaptive lasso, but the canonical
recipe is straightforward:
# R
fit_init <- cv.glmnet(x, y, family = "gaussian")
beta_init <- as.numeric(coef(fit_init, s = "lambda.min"))[-1]
penalty <- 1 / (abs(beta_init) + 1e-3)
fit_adaptive <- cv.glmnet(x, y, family = "gaussian", penalty.factor = penalty)
# Python — same two-stage recipe.
import numpy as np
# Stage 1: coarse fit.
init = skein_glm.MCPPathCV(gamma=1e6).fit(X, y)
beta_init = init.coef_
weights = 1.0 / (np.abs(beta_init) + 1e-3)
# Stage 2: refit with adaptive weights.
adaptive = skein_glm.MCPPathCV(gamma=1e6, weights=weights).fit(X, y)
The M5.x roadmap promotes this two-stage idiom to a one-shot
AdaptiveLasso / AdaptiveMCP estimator.
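Per-feature penalty weights amount to a weighted L1 penalty, which is equivalent to a plain lasso on rescaled columns. The sketch below checks that equivalence with a bare-bones coordinate descent in NumPy. It is not skein's solver: weighted_lasso_cd is a hypothetical helper, there is no intercept, and the data are centered by hand.

```python
import numpy as np

def weighted_lasso_cd(X, y, lam, weights, n_sweeps=500):
    """Minimize (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j| by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # residual y - X @ beta, kept up to date
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of feature j with the partial residual (feature j added back).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam * weights[j], 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(n)
y -= y.mean()
lam = 0.1

# Stage 1: plain lasso. Stage 2: the same penalty with adaptive per-feature weights.
init = weighted_lasso_cd(X, y, lam, np.ones(p))
w = 1.0 / (np.abs(init) + 1e-3)
adaptive = weighted_lasso_cd(X, y, lam, w)

# Equivalent formulation: plain lasso on X[:, j] / w[j], coefficients divided by w.
rescaled = weighted_lasso_cd(X / w, y, lam, np.ones(p)) / w
```

When comparing numbers against R, note that glmnet rescales penalty.factor internally so the factors sum to the number of variables, which changes the effective λ. Whether skein applies a similar normalization is not stated on this page.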
Sparse design matrices
glmnet accepts Matrix::sparseMatrix (CSC) seamlessly; skein does
the same with scipy.sparse.csc_matrix:
# R
library(Matrix)
x_sp <- as(x, "CsparseMatrix")
fit <- cv.glmnet(x_sp, y, family = "binomial")
# Python
import scipy.sparse as sp
X_sp = sp.csc_matrix(X)
fit = skein_glm.LogisticMCPPathCV(gamma=1e6).fit(X_sp, y)
CSR inputs are converted to CSC at the boundary in skein. Group and sparse-group penalties also accept sparse inputs.
Things glmnet does that skein doesn’t yet
- Multinomial (family = "multinomial"): M3.6 roadmap. Use one-vs-rest manually for now.
- Multi-response Gaussian (family = "mgaussian"): M7 multi-task roadmap.
- relax = TRUE (relaxed lasso): not in v0.1.
- Offset terms: not in v0.1 (M3.7 roadmap).
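The manual one-vs-rest workaround for multinomial problems can be sketched with a small generic wrapper. Here make_binary is any zero-argument factory returning a two-class estimator with fit(X, y) and a 1D predict_proba(X). TinyLogistic is a stand-in so the example runs without skein, and the skein call shown in the comment is an untested assumption based on the logistic example earlier on this page.

```python
import numpy as np

class TinyLogistic:
    """Stand-in two-class model (unpenalized gradient descent), used only for the demo."""
    def __init__(self, n_iter=2000, lr=0.1):
        self.n_iter = n_iter
        self.lr = lr

    def fit(self, X, y):
        n, p = X.shape
        self.coef_ = np.zeros(p)
        self.intercept_ = 0.0
        for _ in range(self.n_iter):
            grad = self.predict_proba(X) - y      # gradient of the mean log-loss in eta
            self.coef_ -= self.lr * (X.T @ grad) / n
            self.intercept_ -= self.lr * grad.mean()
        return self

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.coef_ + self.intercept_)))

def fit_one_vs_rest(make_binary, X, y):
    """Fit one binary model per class: that class (1.0) against all others (0.0)."""
    classes = np.unique(y)
    models = [make_binary().fit(X, (y == c).astype(float)) for c in classes]
    return classes, models

def predict_one_vs_rest(classes, models, X):
    """Pick the class whose binary model assigns the highest probability."""
    scores = np.column_stack([m.predict_proba(X) for m in models])
    return classes[np.argmax(scores, axis=1)]

rng = np.random.default_rng(0)
centers = np.array([[0.0, 5.0], [-5.0, -3.0], [5.0, -3.0]])
X = np.vstack([c + rng.standard_normal((40, 2)) for c in centers])
y = np.repeat(np.array(["a", "b", "c"]), 40)

# With skein (assumption, untested):
# make_binary = lambda: skein_glm.LogisticMCPPathCV(gamma=3.0, cv=10)
classes, models = fit_one_vs_rest(TinyLogistic, X, y)
pred = predict_one_vs_rest(classes, models, X)
accuracy = np.mean(pred == y)
```

The per-class probabilities from one-vs-rest do not sum to one, so treat them as scores for ranking classes rather than as calibrated multinomial probabilities.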
Things skein does that glmnet doesn’t
- Nonconvex penalties (MCP / SCAD with γ < ∞): nearly unbiased estimates of truly active features. glmnet is L1 / L2 only.
- Group MCP via native closed-form prox (Breheny & Huang 2015 §3) on LS and all GLM families (M13.4b + M13.4c), plus group SCAD via LLA, both with Rayon-parallel block CD. grpreg has the penalties but is single-threaded R.
- Three weight axes (per-sample, per-feature, per-group) composable on every estimator. glmnet does per-sample + per-feature; per-group is awkward.
- Memory-mapped and chunked design matrices out of the box, for problems too big to load into RAM.
- Sparse-group penalties, convex and nonconvex.
- The dual extension surface (Python ABCs mirroring Rust traits) for prototyping custom penalties / datafits.
See also
Porting from ncvreg — for users coming from the MCP / SCAD-focused R package.
Porting from grpreg — for users coming from the group-penalty R package.
Concepts: Penalties — when to use MCP vs SCAD vs group vs sparse-group.