Information-criterion selection
Pick the best λ from a fitted *PathRegressor by AIC, BIC, or EBIC.
select_by_ic is a single free function, so no per-estimator wrapper classes are needed.
The criteria use the negative log-likelihood at each λ on the training data plus a complexity penalty:
AIC = 2k + 2·NLL
BIC = log(n)·k + 2·NLL
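EBIC = BIC + 2γ·log C(p, k), with γ = ebic_gamma ∈ [0, 1] (default 0.5; matches ncvreg::BIC’s high-dim recommendation).
Here n is the number of observations, p the number of features, k the effective df at that λ, and C(p, k) the binomial coefficient.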
Effective df is the number of nonzero coefficients per λ — the
Zou-Hastie-Tibshirani unbiased estimator and the standard
ncvreg/glmnet convention.
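As a rough illustration of the formulas above, the following sketch scores a path from precomputed per-λ NLL values and df counts and returns the minimizing index. It is not the library's implementation: `log_binom`, `ic_score`, and `pick_best` are hypothetical helper names, and log C(p, k) is taken to be the log binomial coefficient, computed here with `math.lgamma`.

```python
import math


def log_binom(p, k):
    """log C(p, k) via log-gamma, which stays finite for large p."""
    return math.lgamma(p + 1) - math.lgamma(k + 1) - math.lgamma(p - k + 1)


def ic_score(nll, k, n, p, criterion="bic", ebic_gamma=0.5):
    """Score one λ (lower is better). nll is the training NLL, k the df."""
    if criterion == "aic":
        return 2.0 * k + 2.0 * nll
    bic = math.log(n) * k + 2.0 * nll
    if criterion == "bic":
        return bic
    if criterion == "ebic":
        return bic + 2.0 * ebic_gamma * log_binom(p, k)
    raise ValueError(f"unknown criterion: {criterion!r}")


def pick_best(nlls, dfs, n, p, criterion="bic", ebic_gamma=0.5):
    """Return (best_idx, scores) for per-λ lists of NLLs and df counts."""
    scores = [ic_score(nll, k, n, p, criterion, ebic_gamma)
              for nll, k in zip(nlls, dfs)]
    best_idx = min(range(len(scores)), key=scores.__getitem__)
    return best_idx, scores
```

With `ebic_gamma=0` the EBIC score reduces to BIC, and the extra term grows with p, which is why EBIC tends to pick sparser models when p is large relative to n.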
select_by_ic dispatches the per-family NLL by sniffing the path
estimator’s class name. The five families currently supported:
- LS (MCPPathRegressor, SCADPathRegressor, *Group*PathRegressor): NLL = (n/2) · log(RSS/n).
- Logistic (Logistic*PathRegressor): NLL = Σ (softplus(η) − y·η).
- Poisson (Poisson*PathRegressor): NLL = Σ (exp(η) − y·η).
- Cox PH (Cox*PathRegressor): Breslow per-sample partial NLL × n, read from the path’s info_["final_losses"].
- Multinomial (Multinomial*PathClassifier): per-λ Σ_i (logsumexp(η_i) − η_{i, y_i}). Effective df is the row-grouped active-feature count (a feature is “active” if any of its K class coefficients is nonzero), the analog of the Zou-Hastie-Tibshirani df for row-grouped lasso.
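To make the per-family formulas concrete, here is a dependency-free sketch of the LS, logistic, Poisson, and multinomial NLLs, plus the two df counts. It assumes the linear predictor η (or the fitted values, for LS) has already been computed, and that multinomial coefficients are laid out as one row of K class coefficients per feature. That layout, the strict `> active_eps` comparison, and all function names are assumptions made for illustration, not the library's internals. Cox is left out because its NLL is read from the fitted path rather than recomputed.

```python
import math


def softplus(x):
    """log(1 + exp(x)), computed without overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def logsumexp(values):
    """log(sum(exp(v))) with the maximum subtracted for stability."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nll_ls(y, y_hat):
    """Least squares: (n/2) * log(RSS/n)."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 0.5 * n * math.log(rss / n)


def nll_logistic(eta, y):
    """Sum of softplus(eta_i) - y_i * eta_i, with y_i in {0, 1}."""
    return sum(softplus(e) - yi * e for e, yi in zip(eta, y))


def nll_poisson(eta, y):
    """Sum of exp(eta_i) - y_i * eta_i (the log(y!) constant is dropped)."""
    return sum(math.exp(e) - yi * e for e, yi in zip(eta, y))


def nll_multinomial(eta, y):
    """eta[i] holds K class scores; y[i] is the integer class label."""
    return sum(logsumexp(row) - row[yi] for row, yi in zip(eta, y))


def count_active(coefs, active_eps=1e-12):
    """df for a flat coefficient vector: entries above the threshold."""
    return sum(1 for b in coefs if abs(b) > active_eps)


def count_active_rows(coef_rows, active_eps=1e-12):
    """Row-grouped df: features with any class coefficient above threshold."""
    return sum(1 for row in coef_rows
               if any(abs(b) > active_eps for b in row))
```

For example, `nll_logistic([0.0], [1])` evaluates to log 2, the loss of an uninformative prediction on a single positive sample.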
- skein_glm.ic.select_by_ic(path_model, x, *outcomes, criterion='bic', ebic_gamma=0.5, active_eps=1e-12)
Pick the best λ from a fitted path estimator by AIC, BIC, or EBIC.
A single free function, with no per-estimator wrapper. It dispatches on the path estimator’s class name to compute the right negative log-likelihood (LS, logistic, Poisson, Cox, or multinomial), then adds a complexity penalty:
AIC = 2k + 2·NLL
BIC = log(n)·k + 2·NLL
EBIC = BIC + 2γ·log C(p, k), with γ ∈ [0, 1] (default 0.5; matches
ncvreg::BIC’s high-dim recommendation)
where k is the number of nonzero coefficients per λ (the Zou-Hastie-Tibshirani unbiased df estimator, the standard ncvreg/glmnet convention).
- Parameters:
  - path_model (*PathRegressor) – Any fitted path estimator (LS / logistic / Poisson / Cox / multinomial).
  - x (array-like) – The design matrix used in the fit. Used to recompute the per-λ negative log-likelihood.
  - *outcomes – For non-Cox estimators: a single y array. For Cox: time, event. Mirrors each estimator’s fit signature.
  - criterion ({"aic", "bic", "ebic"}, default "bic") – Which information criterion to use.
  - ebic_gamma (float, default 0.5) – EBIC penalty parameter γ ∈ [0, 1]. Ignored for AIC/BIC.
  - active_eps (float, default 1e-12) – Threshold for counting a coefficient as “active” (nonzero).
- Returns:
  - best_idx (int) – Index into path_model.lambdas_ of the IC-minimizing λ.
  - scores (ndarray of shape (n_lambdas,)) – Per-λ score vector (lower-is-better). The fitted β is path_model.coefs_[best_idx].
- Return type:
Examples
>>> import skein_glm
>>> path = skein_glm.MCPPathRegressor(gamma=3.0, n_lambdas=50).fit(X, y)
>>> best_idx, scores = skein_glm.select_by_ic(path, X, y, criterion="bic")
>>> beta_best = path.coefs_[best_idx]
>>> intercept_best = path.intercepts_[best_idx]
For Cox PH:
>>> cox_path = skein_glm.CoxMCPPathRegressor(gamma=3.0, n_lambdas=50).fit(
...     X, time, event)
>>> best_idx, _ = skein_glm.select_by_ic(cox_path, X, time, event, criterion="ebic")