Worked example: survival — Cox PH with sparse-group MCP¶
Survival analysis with high-dimensional covariates: predict time-to-event from a feature matrix where features cluster into biologically meaningful groups (gene pathways, treatment regimens, etc.) and we want both pathway-level and feature-level sparsity.
This is the same shape as the genomics example, but
with a Cox PH datafit instead of LS. Cox has no intercept (the
baseline hazard absorbs it), and the fit signature takes
(X, time, event) instead of (X, y).
Setup¶
import numpy as np
import skein_glm
rng = np.random.default_rng(7)
# 800 patients, 100 features grouped into 10 pathways of 10 features.
n = 800
n_pathways = 10
features_per_pathway = 10
p = n_pathways * features_per_pathway
groups = np.repeat(np.arange(n_pathways), features_per_pathway)
# Standard-normal features (simulating standardized expression data,
# clinical covariates, etc.).
X = rng.standard_normal((n, p))
# Truth: 2 of the 10 pathways have any active features. Within an
# active pathway, 3 features are active.
true_beta = np.zeros(p)
active_pathways = [2, 6]
for path in active_pathways:
feature_idx = np.where(groups == path)[0]
active = rng.choice(feature_idx, size=3, replace=False)
true_beta[active] = rng.normal(0.5, 0.2, size=3)
# Simulate survival times under exponential baseline hazard:
# T_i ~ Exp(λ_0 · exp(η_i)), independent right-censoring at C_i ~ Exp(0.3).
eta = X @ true_beta
hazard_rate = np.exp(eta)
T = rng.exponential(scale=1.0 / hazard_rate)
C = rng.exponential(scale=1.0 / 0.3, size=n)
time = np.minimum(T, C)
event = (T <= C).astype(float)
print(f"n = {n}, p = {p}, n_pathways = {n_pathways}")
print(f"event rate (uncensored): {event.mean():.2%}")
print(f"# truly active features: {int((true_beta != 0).sum())}")
print(f"truly active pathways: {active_pathways}")
The event rate (~70-80% with the hazards above) is on the higher end of typical clinical data; if you’re modeling something with heavy censoring, expect the per-fit information to drop and λ_min to need tightening.
The fit¶
group_sizes = np.bincount(groups)
group_weights = np.sqrt(group_sizes.astype(float))
# Cox + sparse-group MCP, CV-selected λ, standardize for safety.
fit = skein_glm.CoxSparseGroupMCPPathCV(
groups=groups,
weights=group_weights,
gamma=3.0,
alpha=0.5,
n_lambdas=40,
lambda_min_ratio=1e-3,
cv=10,
standardize=True,
random_state=0,
).fit(X, time, event)
print(f"Best λ: {fit.lambda_best_:.4f}")
print(f"# selected features: {int((fit.coef_ != 0).sum())}")
print(f"selected pathways: "
f"{sorted(set(int(g) for g in groups[fit.coef_ != 0]))}")
print(f"truly active pathways: {active_pathways}")
print(f"CV concordance at best λ: {fit.cv_mean_scores_[fit.cv_mean_scores_.argmax()]:.3f}")
Note: Cox CV uses Harrell’s c-index (concordance) by default,
which is higher-better. fit.cv_mean_scores_.argmax() picks the
best-by-c-index λ; for lower-is-better family the choice would
be argmin.
Predicting risk¶
# Build a small held-out set the same way.
X_new = rng.standard_normal((100, p))
risk_score = fit.predict(X_new) # η = Xβ, the prognostic index
# Higher η → shorter survival. Rank-order patients:
order = np.argsort(risk_score)[::-1] # descending risk
top10 = order[:10]
print(f"10 highest-risk patients (by η): {top10}")
print(f"η values: {risk_score[top10]}")
For survival predictions you typically want either:
The prognostic index \(\eta = X\beta\) — what
predict()returns for Cox. Higher is shorter survival.Survival probabilities \(S(t \mid x)\) — needs the baseline hazard estimate. v0.1 doesn’t ship this; M3.7 has Breslow’s cumulative-baseline-hazard estimator on the roadmap.
For now, the prognostic index is enough for ranking patients (the “who’s at highest risk” question), which is what most clinical models use anyway.
Path inspection¶
If you want to trace which features come on / drop off as λ decreases (a common diagnostic in genomics):
path = skein_glm.CoxSparseGroupMCPPathRegressor(
groups=groups,
weights=group_weights,
gamma=3.0,
alpha=0.5,
n_lambdas=40,
lambda_min_ratio=1e-3,
standardize=True,
).fit(X, time, event)
# path.coefs_ has shape (n_lambdas, p). For each λ, count active
# features per pathway.
n_active_by_lambda = (path.coefs_ != 0).sum(axis=1)
n_active_pathways_by_lambda = np.array([
len(set(groups[path.coefs_[k] != 0]))
for k in range(len(path.lambdas_))
])
import matplotlib.pyplot as plt # if you have it
plt.plot(path.lambdas_, n_active_by_lambda, label="features")
plt.plot(path.lambdas_, n_active_pathways_by_lambda, label="pathways")
plt.xscale("log")
plt.xlabel("λ")
plt.ylabel("# selected")
plt.legend()
A typical pattern: the “pathways” curve grows in steps (one pathway turns on at a time), then plateaus; the “features” curve grows more smoothly within each pathway as within-group sparsity relaxes.
Comparing to a non-grouped fit¶
What happens if you drop the group structure and use scalar Cox MCP?
fit_scalar = skein_glm.CoxMCPPathCV(
gamma=3.0,
n_lambdas=40,
lambda_min_ratio=1e-3,
cv=10,
standardize=True,
random_state=0,
).fit(X, time, event)
selected_groups_scalar = sorted(set(
int(g) for g in groups[fit_scalar.coef_ != 0]
))
print(f"Scalar Cox MCP selected pathways: {selected_groups_scalar}")
print(f"Sparse-group MCP selected: "
f"{sorted(set(int(g) for g in groups[fit.coef_ != 0]))}")
print(f"Truth: {active_pathways}")
Without the group structure, the scalar MCP often picks individual features scattered across many pathways — even when, biologically, features in the same pathway are correlated and either should all be selected or none. The sparse-group MCP enforces the right structure: every selected feature lives in an actively-selected pathway.
Confidence intervals on prognostic features¶
Once we’ve selected an active set, the standard next question is:
which of these features have a statistically significant effect on
survival, and how large is the effect? Penalized Cox coefficients
are biased toward zero by construction, so the raw coef_ from a
penalized fit doesn’t admit a Wald interpretation.
DebiasedCoxLassoRegressor adds a single-step
debiasing correction and produces a Wald CI + p-value per feature.
The implementation extends the Van de Geer et al. (2014) debiased
construction to Cox by running the nodewise lassos on the
weighted design X̃ = W^{1/2} X, where W is the per-sample
diagonal of the Cox partial-likelihood Fisher information (extracted
from the prox-Newton surrogate). See the
inference concept page for the full
derivation.
from skein_glm import DebiasedCoxLassoRegressor
deb = DebiasedCoxLassoRegressor(alpha=0.05, ties="breslow").fit(X, time, event)
# Per-feature inferential output, all on the original feature scale.
for j in np.flatnonzero(np.abs(deb.coef_) > 0.05):
print(
f"feature {j:3d}: β̂_d = {deb.coef_[j]:+.3f} "
f"95% CI = [{deb.ci_lower_[j]:+.3f}, {deb.ci_upper_[j]:+.3f}] "
f"p = {deb.pvalues_[j]:.3g}"
)
# Prognostic index on training data.
print(f"η̂ range: [{deb.risk_score_.min():.2f}, {deb.risk_score_.max():.2f}]")
The Cox debiased estimator is per-coordinate; for group-level tests
(does this pathway matter?) use stability selection on the
sparse-group fit above, or combine the debiased coefficients via a
Wald test for the group sum-of-squares (not in skein today).
See also¶
Concepts: Inference on penalized fits — Wald CIs and stability selection across LS / GLM / Cox.
Concepts: Datafits — Cox PH partial likelihood and Breslow ties.
Concepts: Penalties — sparse-group MCP shape.
Worked example: genomics — the same group structure with a continuous outcome instead.
Porting from glmnet for
cv.glmnet(family = "cox")users; Porting from grpreg forgrpsurvusers.