Porting from glmnet

glmnet (Friedman, Hastie & Tibshirani) is the dominant R package for L1 / elastic-net regularized GLMs. If you’re moving from R to Python and want to keep your cv.glmnet-based workflow, this page maps glmnet’s API onto skein.

skein ships native elastic net (ElasticNet*Regressor, matching glmnet’s alpha ∈ [0, 1] exactly) and a nonconvex MCP/SCAD path. We generally recommend MCP at γ=3 over lasso for less-biased estimates of truly active features; if you specifically want lasso, either ElasticNetRegressor(alpha=1.0) or MCPRegressor(gamma=1e6) works (the former is exact; the latter is numerically indistinguishable).
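To see why a very large γ reproduces lasso, it can help to compare the single-feature thresholding rules. The sketch below assumes the textbook MCP definition for a feature with unit scale; `soft_threshold` and `firm_threshold` are illustrative names, not part of skein's API.

```python
import math

def soft_threshold(z: float, lam: float) -> float:
    """Lasso solution for one unit-scale feature: shrink towards zero by lam."""
    return math.copysign(max(abs(z) - lam, 0.0), z)

def firm_threshold(z: float, lam: float, gamma: float) -> float:
    """MCP solution for one unit-scale feature (assumes gamma > 1)."""
    if abs(z) <= gamma * lam:
        # Inside the concave region: soft-threshold, then inflate.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large signals are left unshrunk

z, lam = 2.5, 1.0
lasso = soft_threshold(z, lam)            # shrunk by the full lam
mcp_3 = firm_threshold(z, lam, 3.0)       # noticeably less shrinkage
mcp_big = firm_threshold(z, lam, 1e6)     # practically the lasso value
print(lasso, mcp_3, mcp_big)
```

At γ=3 the estimate sits much closer to the unpenalized value, while at γ=1e6 the inflation factor `1 / (1 - 1/γ)` is so close to 1 that the result matches lasso to about six digits.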

The three top-line translations

`glmnet` (R)	`skein` (Python)
`glmnet(x, y, family = "gaussian")`	`MCPPathRegressor(gamma=1e6).fit(X, y)`
`cv.glmnet(x, y, family = "gaussian")`	`MCPPathCV(gamma=1e6, cv=10).fit(X, y)`
`coef(cv_fit, s = "lambda.min")`	`cv_fit.coef_`, `cv_fit.intercept_`

In R you write:

library(glmnet)
fit <- cv.glmnet(x, y, family = "gaussian", nfolds = 10)
beta_hat <- as.numeric(coef(fit, s = "lambda.min"))[-1]   # drop intercept
alpha_hat <- as.numeric(coef(fit, s = "lambda.min"))[1]

In Python:

import skein_glm
cv_fit = skein_glm.MCPPathCV(gamma=1e6, cv=10).fit(X, y)
beta_hat = cv_fit.coef_
alpha_hat = cv_fit.intercept_

Family map

`glmnet`	`skein` estimator base	Notes
`"gaussian"`	`MCP` / `SCAD`	Default. LS datafit.
`"binomial"`	`LogisticMCP` / `LogisticSCAD`	Two-class only. v0.1 doesn’t support multinomial.
`"poisson"`	`PoissonMCP` / `PoissonSCAD`	Log link. y ≥ 0 required.
`"cox"`	`CoxMCP` / `CoxSCAD`	Right-censored survival, fit signature is `fit(X, time, event)`.
`"multinomial"`	(not in v0.1)	M3.6 roadmap. Use one-vs-rest manually for now.
`"mgaussian"`	(not in v0.1)	M7 multi-task roadmap.

Per-argument translation

Most-used arguments

`glmnet` arg	`skein` arg	Notes
`x`	`X` (positional)	numpy array, scipy.sparse, MmapDesignF64/32, ChunkedDesignF64/32.
`y`	`y` (positional)	For Cox: `fit(X, time, event)`.
`family`	(choose estimator class)	See family map above.
`alpha`	`alpha` on `ElasticNet*Regressor`	`skein_glm.ElasticNet*Regressor(alpha=...)` matches `glmnet`’s `alpha ∈ [0, 1]` exactly. `α=1` is lasso, `α=0` is ridge.
`lambda`	`lambdas`	numpy array. Pass `None` to auto-compute.
`nlambda`	`n_lambdas`	Default 100 (matches glmnet).
`lambda.min.ratio`	`lambda_min_ratio`	glmnet defaults to 1e-4 if `n > p` and 1e-2 if `n < p`; skein defaults to 1e-3 always.
`weights`	`sample_weight` (in `fit()`)	Per-sample weights. Identical semantics.
`penalty.factor`	`weights` (in constructor)	Per-feature penalty weights. `penalty.factor[j]=0` → unpenalized; same in skein.
`intercept`	`fit_intercept`	Default `True`.
`standardize`	`standardize`	Default `False` (glmnet defaults to `True`!).
`thresh`	`tol`	Default `1e-6` (glmnet `1e-7`).
`maxit`	`max_iter`	Default `100` (glmnet `1e5`).
`nfolds` (cv.glmnet)	`cv`	Pass an int or any sklearn CV splitter.
`type.measure`	(auto-selected by family)	Family-appropriate metric. See “type.measure” below.
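When `lambdas` is left as `None`, the path needs a grid of its own. The sketch below shows the usual glmnet-style construction for the Gaussian lasso case: start at the smallest λ that zeroes every coefficient, then descend on a log scale to `lambda_min_ratio` times that value. Whether skein builds its grid exactly this way is an assumption, and `lambda_grid` is a hypothetical helper rather than a skein function.

```python
import numpy as np

def lambda_grid(X, y, n_lambdas=100, lambda_min_ratio=1e-3):
    """Log-spaced lambda grid for a Gaussian lasso path (alpha = 1)."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)   # centering mimics fitting an intercept
    yc = y - y.mean()
    # Smallest lambda at which every coefficient is exactly zero.
    lam_max = np.max(np.abs(Xc.T @ yc)) / n
    return np.geomspace(lam_max, lam_max * lambda_min_ratio, n_lambdas)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
y = 2.0 * X[:, 0] + rng.standard_normal(50)
lambdas = lambda_grid(X, y)
```

Passing an explicit grid like this to `lambdas` is one way to make the two packages traverse the same path when their `lambda_min_ratio` defaults differ.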

Defaults that differ

Two glmnet defaults that bite people moving to skein:

standardize: glmnet defaults to TRUE, skein defaults to False. If your features have heterogeneous scales, pass standardize=True explicitly.
thresh: glmnet defaults to 1e-7, skein defaults to 1e-6. Tighten with tol=1e-8 or smaller for numerically delicate problems (e.g. matching reference fits exactly).
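If you would rather standardize by hand (for example to check what `standardize=True` should give), the recipe is to scale the columns, fit, and map the coefficients back to the original scale. The sketch below assumes the 1/n variance convention and uses ordinary least squares as a stand-in solver so that it stays self-contained; `standardize_columns` and `unscale` are hypothetical helpers.

```python
import numpy as np

def standardize_columns(X):
    """Center and scale columns using the population (ddof=0) standard deviation."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / sd, mu, sd

def unscale(beta_std, intercept_std, mu, sd):
    """Map coefficients fitted on standardized columns back to the raw scale."""
    beta = beta_std / sd
    intercept = intercept_std - mu @ beta
    return beta, intercept

rng = np.random.default_rng(1)
X = rng.standard_normal((40, 3)) * np.array([1.0, 10.0, 100.0])
y = X @ np.array([1.0, 0.1, 0.01]) + 2.0 + 0.1 * rng.standard_normal(40)

Z, mu, sd = standardize_columns(X)
# Stand-in solver: OLS with an explicit intercept column on the standardized design.
design = np.column_stack([np.ones(len(y)), Z])
coef_std = np.linalg.lstsq(design, y, rcond=None)[0]
beta, intercept = unscale(coef_std[1:], coef_std[0], mu, sd)
```

For an unpenalized fit the back-transformed coefficients equal the raw-scale fit. With a penalty they generally do not, which is why the `standardize` default matters.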

`type.measure` map

glmnet’s type.measure selects the CV scoring metric. In skein, the metric is auto-selected by the GLM family, but you can override via the *PathCV mixin’s scorer attribute (advanced; see API ref).

`glmnet` `type.measure`	`skein` family default scorer
`"mse"` (gaussian)	mean squared error (lower-better)
`"deviance"` (binomial)	binomial deviance (lower-better)
`"class"` (binomial)	(not default; can override)
`"auc"` (binomial)	(not default; can override)
`"deviance"` (poisson)	Poisson deviance (lower-better)
`"deviance"` (cox)	Harrell’s C-index (higher-better). This is a real difference: skein scores Cox fits by concordance by default, whereas `glmnet`’s default for Cox is partial-likelihood deviance.
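Because the Cox default differs, CV curves from the two packages are not on the same scale. For reference, Harrell's C-index can be sketched directly from its definition. This is a plain O(n²) illustration, not skein's implementation, and `harrell_c` is a hypothetical name.

```python
def harrell_c(time, event, risk):
    """Harrell's concordance: a higher risk score should mean shorter survival."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable

time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
risk = [3.0, 2.0, 1.5, 0.5]
c_perfect = harrell_c(time, event, risk)
c_reversed = harrell_c(time, event, [-r for r in risk])
```

A perfectly ordered risk score gives 1.0, a perfectly reversed one gives 0.0, and 0.5 corresponds to random ordering.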

Workflow translations

Basic CV fit and predict

# R
library(glmnet)
fit <- cv.glmnet(x, y, family = "gaussian")
y_hat <- predict(fit, newx = x_new, s = "lambda.min")
beta <- coef(fit, s = "lambda.min")

# Python
import skein_glm
fit = skein_glm.MCPPathCV(gamma=1e6, cv=10).fit(X, y)
y_hat = fit.predict(X_new)
beta = fit.coef_
intercept = fit.intercept_

Logistic with class weights

# R
fit <- cv.glmnet(x, y, family = "binomial", weights = w)
prob <- predict(fit, newx = x_new, type = "response", s = "lambda.min")

# Python
fit = skein_glm.LogisticMCPPathCV(gamma=1e6, cv=10).fit(X, y, sample_weight=w)
# v0.1: LogisticMCPPathCV picks lambda_best_ at fit time and refits;
# `fit.predict_proba(X_new)` returns a 1D probability vector.
prob = fit.predict_proba(X_new)
labels = fit.predict(X_new)

For path inspection (every λ at once), use the *PathRegressor instead of *PathCV:

path = skein_glm.LogisticMCPPathRegressor(gamma=1e6, n_lambdas=50).fit(X, y)
prob_path = path.predict_proba(X_new)   # shape (n_new, n_lambdas)

Cox PH

# R
fit <- cv.glmnet(x, Surv(time, event), family = "cox")
risk <- predict(fit, newx = x_new, s = "lambda.min")

# Python
fit = skein_glm.CoxMCPPathCV(gamma=1e6, cv=10).fit(X, time, event)
risk = fit.predict(X_new)   # the prognostic index η = Xβ

skein’s Cox uses the Breslow approximation for ties, which is also what glmnet uses; it is survival::coxph that defaults to Efron. For moderate tie rates the Breslow/Efron difference is negligible. Efron is on the M3.7 roadmap.
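The two tie-handling schemes differ only in the denominator used at event times shared by several subjects. The sketch below evaluates the Cox partial log-likelihood both ways for a fixed linear predictor; `cox_loglik` is a hypothetical helper written for illustration, not a skein function.

```python
import numpy as np

def cox_loglik(eta, time, event, ties="breslow"):
    """Cox partial log-likelihood with Breslow or Efron tie handling."""
    eta, time, event = map(np.asarray, (eta, time, event))
    w = np.exp(eta)
    total = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t                  # risk set at time t
        dead = (time == t) & (event == 1)    # events tied at time t
        d = int(dead.sum())
        risk_sum, dead_sum = w[at_risk].sum(), w[dead].sum()
        total += eta[dead].sum()
        if ties == "breslow":
            total -= d * np.log(risk_sum)
        else:
            # Efron: progressively remove a fraction of the tied events.
            total -= sum(np.log(risk_sum - (k / d) * dead_sum) for k in range(d))
    return total

eta = np.array([0.5, -0.2, 0.1, 0.0, -0.4])
event = np.array([1, 1, 1, 0, 1])
no_ties = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_ties = np.array([1.0, 1.0, 3.0, 4.0, 5.0])

gap_no_ties = cox_loglik(eta, no_ties, event, "efron") - cox_loglik(eta, no_ties, event, "breslow")
gap_ties = cox_loglik(eta, with_ties, event, "efron") - cox_loglik(eta, with_ties, event, "breslow")
```

Without ties the two values coincide. With ties the Efron value is larger, and the gap grows with the number of tied events per time point.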

Adaptive lasso (two-stage)

glmnet doesn’t have a built-in adaptive lasso, but the canonical recipe is straightforward:

# R
fit_init <- cv.glmnet(x, y, family = "gaussian")
beta_init <- as.numeric(coef(fit_init, s = "lambda.min"))[-1]
penalty <- 1 / (abs(beta_init) + 1e-3)
fit_adaptive <- cv.glmnet(x, y, family = "gaussian", penalty.factor = penalty)

# Python — same two-stage recipe.
import numpy as np
import skein_glm

# Stage 1: coarse fit.
init = skein_glm.MCPPathCV(gamma=1e6).fit(X, y)
beta_init = init.coef_
weights = 1.0 / (np.abs(beta_init) + 1e-3)

# Stage 2: refit with adaptive weights.
adaptive = skein_glm.MCPPathCV(gamma=1e6, weights=weights).fit(X, y)

The M5.x roadmap promotes this two-stage idiom to a one-shot AdaptiveLasso / AdaptiveMCP estimator.
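The per-feature weights in this recipe are equivalent to rescaling columns: penalizing feature j with weight w_j gives the same fit as dividing column j by w_j, solving a plain lasso, and dividing the resulting coefficient by w_j. The sketch below checks this with a small NumPy coordinate-descent solver standing in for the real estimator; `lasso_cd` is hypothetical and omits the intercept for brevity.

```python
import numpy as np

def lasso_cd(X, y, lam, weights=None, n_iter=500):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j|."""
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial-residual correlation for feature j.
            z = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(z) * max(abs(z) - lam * w[j], 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 6))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0]) + 0.5 * rng.standard_normal(100)

# Stage 1: plain lasso; Stage 2: adaptive weights from the stage-1 fit.
beta_init = lasso_cd(X, y, lam=0.1)
weights = 1.0 / (np.abs(beta_init) + 1e-3)

beta_weighted = lasso_cd(X, y, lam=0.1, weights=weights)
beta_rescaled = lasso_cd(X / weights, y, lam=0.1) / weights
```

Both routes give the same coefficients, so the column-rescaling trick is a fallback for any solver that lacks a per-feature weight argument.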

Sparse design matrices

glmnet accepts Matrix::sparseMatrix (CSC) seamlessly; skein does the same with scipy.sparse.csc_matrix:

# R
library(Matrix)
x_sp <- as(x, "CsparseMatrix")
fit <- cv.glmnet(x_sp, y, family = "binomial")

# Python
import scipy.sparse as sp
import skein_glm
X_sp = sp.csc_matrix(X)
fit = skein_glm.LogisticMCPPathCV(gamma=1e6).fit(X_sp, y)

CSR inputs are converted to CSC at the boundary in skein. Group and sparse-group penalties also accept sparse inputs.
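If your data arrives as CSR and you plan to fit several models on it, converting once up front avoids paying for the conversion on every call. This is a minimal sketch using only scipy; that the repeated conversion is a noticeable cost in skein is an assumption.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(3)
dense = rng.standard_normal((200, 50))
dense[np.abs(dense) < 1.5] = 0.0   # keep only the larger entries, so most cells are zero

X_csr = sp.csr_matrix(dense)       # row-oriented, as many loaders produce
X_csc = X_csr.tocsc()              # column-oriented; convert once and reuse

print(X_csr.format, X_csc.format, X_csc.nnz)
```

The converted matrix holds the same values and the same number of stored entries; only the storage layout changes.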

Things glmnet does that skein doesn’t yet

Multinomial (family = "multinomial"): M3.6 roadmap. Use one-vs-rest manually for now.
Multi-response Gaussian (family = "mgaussian"): M7 multi-task roadmap.
relax = TRUE (relaxed lasso): not in v0.1.
Offset terms: not in v0.1 (M3.7 roadmap).
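For the multinomial gap, a manual one-vs-rest wrapper can be sketched as follows. It assumes each binary model exposes `fit(X, y)` and a `predict_proba(X)` that returns a 1D score per row. `fit_one_vs_rest` and `predict_one_vs_rest` are hypothetical helpers, and `CentroidScorer` is a toy stand-in used only so the example runs without skein.

```python
import numpy as np

def fit_one_vs_rest(make_estimator, X, y):
    """Fit one binary model per class, each trained on 'this class vs. the rest'."""
    classes = np.unique(y)
    models = [make_estimator().fit(X, (y == c).astype(int)) for c in classes]
    return classes, models

def predict_one_vs_rest(classes, models, X):
    """Pick the class whose binary model gives the highest score."""
    scores = np.column_stack([m.predict_proba(X) for m in models])
    return classes[np.argmax(scores, axis=1)]

class CentroidScorer:
    """Toy binary model standing in for a real logistic estimator."""
    def fit(self, X, y):
        self.pos_ = X[y == 1].mean(axis=0)
        self.neg_ = X[y == 0].mean(axis=0)
        return self

    def predict_proba(self, X):
        d_pos = ((X - self.pos_) ** 2).sum(axis=1)
        d_neg = ((X - self.neg_) ** 2).sum(axis=1)
        return 1.0 / (1.0 + np.exp(d_pos - d_neg))  # 1D score in (0, 1)

rng = np.random.default_rng(4)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
y = np.repeat([0, 1, 2], 30)
X = centers[y] + 0.3 * rng.standard_normal((90, 2))

classes, models = fit_one_vs_rest(CentroidScorer, X, y)
y_pred = predict_one_vs_rest(classes, models, X)
```

In practice you would pass a factory for a binary logistic estimator in place of `CentroidScorer`. Scores from separately tuned binary models are not calibrated against each other, so treat the argmax as a heuristic rather than a multinomial probability.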

Things skein does that glmnet doesn’t

Nonconvex penalties (MCP / SCAD with γ < ∞): nearly unbiased estimates of truly active features. glmnet is L1 / L2 only.
Group MCP via native closed-form prox (Breheny & Huang 2015 §3) on LS and all GLM families (M13.4b + M13.4c), plus group SCAD via LLA, both with Rayon-parallel block CD. grpreg has the penalties but is single-threaded R.
Three weight axes (per-sample, per-feature, per-group) composable on every estimator. glmnet does per-sample + per-feature; per-group is awkward.
Memory-mapped and chunked design matrices out of the box — fits problems too big to load into RAM.
Sparse-group penalties, both convex and nonconvex.
The dual extension surface (Python ABCs mirroring Rust traits) for prototyping custom penalties / datafits.