Porting from ncvreg

ncvreg (Patrick Breheny) is the canonical R package for nonconvex penalized regression — MCP and SCAD with rigorous statistical treatment. If you came to skein because you wanted MCP/SCAD support in Python, this is your guide.

skein’s scalar MCP/SCAD path is a direct port of ncvreg’s algorithmic stack: coordinate descent with strong-rule screening, gap-safe screening (skein’s addition), warm-started λ-paths, and the same glmnet-style standardization conventions. Coefficients should match ncvreg to ~1e-6 on canonical problems (the M8 numerical regression suite, pending, will pin this).

The three top-line translations

ncvreg ®

skein (Python)

ncvreg(X, y, penalty = "MCP")

MCPPathRegressor(gamma=3.0).fit(X, y)

cv.ncvreg(X, y, penalty = "MCP")

MCPPathCV(gamma=3.0, cv=10).fit(X, y)

coef(cvfit) (at lambda.min)

cv.coef_, cv.intercept_

R:

library(ncvreg)
fit <- cv.ncvreg(X, y, penalty = "MCP", gamma = 3, nfolds = 10)
beta <- coef(fit)[-1]   # drop intercept
alpha <- coef(fit)[1]

Python:

import skein_glm
fit = skein_glm.MCPPathCV(gamma=3.0, cv=10).fit(X, y)
beta = fit.coef_
alpha = fit.intercept_

Family map

ncvreg family

skein estimator base

family = "gaussian" (default)

MCP* / SCAD* (LS)

family = "binomial"

LogisticMCP* / LogisticSCAD*

family = "poisson"

PoissonMCP* / PoissonSCAD*

ncvsurv (separate function)

CoxMCP* / CoxSCAD*

For the binomial family, ncvreg uses an iteratively reweighted least-squares (IRLS) outer loop; skein uses prox-Newton (which is mathematically the same scheme, with the inner LS solved by CD instead of QR). Convergence behavior should be similar.

Penalty + γ map

ncvreg

skein

penalty = "MCP", gamma = 3

MCPPathRegressor(gamma=3.0)

penalty = "SCAD", gamma = 3.7

SCADPathRegressor(a=3.7)note a not gamma

penalty = "lasso"

MCPPathRegressor(gamma=1e6) (≈ lasso)

ncvreg uses gamma for the SCAD shape parameter (which is the a in the SCAD literature — Fan & Li 2001). skein keeps the literature naming: gamma for MCP, a for SCAD. Default a=3.7 matches ncvreg.

Per-argument map

ncvreg arg

skein arg

Notes

X

X (positional)

Same conventions.

y

y (positional)

For Cox (ncvsurv): fit(X, time, event).

penalty

(choose MCP* vs SCAD* class)

See above.

gamma

gamma (MCP) or a (SCAD)

Same numerical meaning.

family

(choose family-prefixed class)

See family map.

lambda

lambdas

numpy array.

nlambda

n_lambdas

Default 100 (matches ncvreg).

lambda.min

lambda_min_ratio

Same semantic.

eps

tol

max.iter

max_iter

convex

(handled internally)

ncvreg’s convex = TRUE mode is equivalent to skein’s LLA outer loop, which is always on for nonconvex penalties.

dfmax

(not directly; use working set + KKT)

skein’s default working-set strategy already screens; no explicit df cap.

penalty.factor

weights (constructor)

Per-feature penalty weights. Identical semantics.

nfolds (cv.ncvreg)

cv (PathCV)

int or sklearn splitter.

seed (cv.ncvreg)

random_state

ncvreg-specific recipes in skein

Standard MCP path

# R
fit <- ncvreg(X, y, penalty = "MCP", gamma = 3)
plot(fit)
coef(fit, lambda = 0.1)    # coefficients at a specific λ
# Python
fit = skein_glm.MCPPathRegressor(gamma=3.0, n_lambdas=100).fit(X, y)
# fit.coefs_ is (100, p), fit.lambdas_ is (100,)
# Pick a specific λ:
import numpy as np
idx = np.argmin(np.abs(fit.lambdas_ - 0.1))
beta_at_0p1 = fit.coefs_[idx]
intercept_at_0p1 = fit.intercepts_[idx]

CV with custom λ grid

# R
my_lambda <- 10^seq(0, -3, length.out = 50)
fit <- cv.ncvreg(X, y, penalty = "MCP", lambda = my_lambda, nfolds = 10)
# Python
import numpy as np
my_lambda = 10 ** np.linspace(0, -3, 50)
fit = skein_glm.MCPPathCV(gamma=3.0, lambdas=my_lambda, cv=10).fit(X, y)

BIC selection

ncvreg ships BIC.ncvreg for information-criterion selection. skein has select_by_ic:

# R
fit <- ncvreg(X, y, penalty = "MCP")
sel <- summary(fit, lambda = "BIC")
# Python
path = skein_glm.MCPPathRegressor(gamma=3.0, n_lambdas=100).fit(X, y)
best_idx, scores = skein_glm.select_by_ic(path, X, y, criterion="bic")
beta_best = path.coefs_[best_idx]

select_by_ic supports "aic", "bic", and "ebic" (with ebic_gamma=0.5 matching ncvreg::BIC’s high-dim recommendation).

Survival analysis

# R
library(ncvreg)
fit <- ncvsurv(X, y = Surv(time, event), penalty = "MCP", gamma = 3)
risk <- predict(fit, X_new, lambda = lambda_min, type = "link")
# Python
fit = skein_glm.CoxMCPPathRegressor(gamma=3.0, n_lambdas=100).fit(X, time, event)
risk = fit.predict(X_new)   # shape (n_new, n_lambdas), η = Xβ

# Or with CV-selected λ:
cv = skein_glm.CoxMCPPathCV(gamma=3.0, cv=10).fit(X, time, event)
risk_at_lambda_min = cv.predict(X_new)   # 1D, at the best λ

skein’s Cox uses Breslow ties (matches ncvsurv’s default).

Differences worth knowing

Standardization

ncvreg defaults to standardize = FALSE for the X matrix; skein also defaults to standardize=False. So the “compatibility” mode is the default. If you want centering+scaling, pass standardize=True on both sides.

lambda.min behavior under CV

ncvreg::cv.ncvreg returns both lambda.min and lambda.1se (one-standard-error rule). skein_glm.MCPPathCV only returns lambda_best_ (= the equivalent of lambda.min). The 1se rule is on the M5.x roadmap; for now you can compute it manually:

cv = skein_glm.MCPPathCV(gamma=3.0, cv=10).fit(X, y)
mean_scores = cv.cv_mean_scores_   # shape (n_lambdas,)
std_scores = cv.cv_std_scores_     # shape (n_lambdas,)

best_score = mean_scores.min()
best_idx = mean_scores.argmin()
# lambda.1se: largest λ with mean_score within 1 SE of best.
threshold = best_score + std_scores[best_idx]
candidates = np.where(mean_scores <= threshold)[0]
# largest λ → smallest index (lambdas are decreasing).
lambda_1se_idx = candidates.min()
lambda_1se = cv.lambdas_[lambda_1se_idx]

Convergence threshold

ncvreg’s eps = 1e-4 is its default (relative change in β); skein’s tol = 1e-6 is its default (max-coord-update L1, KKT-based). Both are reasonable; if you want to match ncvreg numerics exactly, pass tol=1e-4 and bump max_iter.

Sparse design matrices

ncvreg doesn’t natively support sparse X (you’d convert to dense first). skein does:

import scipy.sparse as sp
X_sp = sp.csc_matrix(X)
fit = skein_glm.MCPPathCV(gamma=3.0).fit(X_sp, y)

This alone can be a major speedup if your X is mostly zero.

What skein adds vs ncvreg

  • Group penalties with Rayon-parallel block CD: group MCP via native closed-form prox (Breheny & Huang 2015 §3) on LS and all GLM families (M13.4b + M13.4c); group SCAD via LLA. ncvreg is scalar-only; for groups you’d switch to grpreg (single-threaded R).

  • Sparse-group penalties (convex and nonconvex).

  • Three weight axes including per-group, all composable.

  • Memory-mapped / chunked design matrices for n in the hundreds of millions.

  • scipy.sparse.csc_matrix support transparently.

  • The dual extension surface for prototyping new penalties.

See also