4. Group penalties¶
When features come in natural groups — dummy-encoded categoricals, gene pathways, polynomial expansions, time-frequency bands — selecting features one at a time can produce nonsensical results: keep three of five dummies for the same factor and you’ve effectively kept all five. Group penalties fix this by treating an entire group as the unit of selection.
This tutorial walks through the three flavors of group penalty in skein and when each is the right choice.
Building a groups vector¶
groups is a length-n_features integer vector: feature j belongs
to group groups[j]. Group labels must form a contiguous range
{0, 1, …, n_groups - 1}.
import numpy as np
# 30 features in 6 groups of 5.
groups = np.repeat(np.arange(6), 5).astype(np.int64)
# array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, ...])
For a one-hot encoding pattern with mixed group sizes, build the
labels explicitly — np.repeat([0, 1, 2], [3, 4, 2]) gives
[0, 0, 0, 1, 1, 1, 1, 2, 2] for groups of size 3, 4, 2.
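Building and checking the labels can be wrapped in two small helpers. This is a minimal sketch in plain NumPy; `groups_from_sizes` and `check_contiguous` are illustrative names, not part of skein, and the check only mirrors the contiguity rule stated above.

```python
import numpy as np

def groups_from_sizes(sizes):
    """Return a label vector holding sizes[g] copies of label g."""
    sizes = np.asarray(sizes, dtype=np.int64)
    return np.repeat(np.arange(len(sizes)), sizes).astype(np.int64)

def check_contiguous(groups):
    """Return n_groups, or raise if labels are not exactly {0, ..., n_groups - 1}."""
    labels = np.unique(groups)
    if not np.array_equal(labels, np.arange(len(labels))):
        raise ValueError(f"group labels are not contiguous: {labels.tolist()}")
    return len(labels)

groups = groups_from_sizes([3, 4, 2])
n_groups = check_contiguous(groups)
print(groups.tolist(), n_groups)  # [0, 0, 0, 1, 1, 1, 1, 2, 2] 3

# Arbitrary labels (e.g. factor names) can be mapped onto 0..n_groups-1.
# np.unique assigns the integer codes in sorted label order.
raw = np.array(["sex", "sex", "region", "region", "region"])
_, relabeled = np.unique(raw, return_inverse=True)
relabeled = relabeled.astype(np.int64)
print(relabeled.tolist())  # [1, 1, 0, 0, 0]
```

The `return_inverse` route is convenient when group membership comes from column metadata rather than from known block sizes.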
Three penalties, three behaviors¶
GroupLasso — convex baseline¶
The L1-of-L2 penalty: λ Σ_g w_g ‖β_g‖_2. Convex, easy to fit, but
biased on truly active groups (the L2-norm shrinkage applies even
when the group is unambiguously active).
import skein_glm
rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[0:5] = [1.5, -1.0, 0.8, -0.6, 1.2] # group 0 fully active
true_beta[10:15] = [0.7, -0.5, 0.4, 0.3, -0.6] # group 2 fully active
y = X @ true_beta + 0.3 * rng.standard_normal(n)
groups = np.repeat(np.arange(6), 5).astype(np.int64)
m = skein_glm.GroupLassoPathRegressor(
    groups=groups, n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)
# Active groups: features 0-4 and 10-14 should have nonzero β.
last = m.coefs_[-1]
group_norms = np.array([
    np.linalg.norm(last[groups == g]) for g in range(6)
])
print(group_norms.round(3)) # large at g=0 and g=2, small elsewhere
Use this when you want a fast, convex baseline and don’t mind the attenuation bias.
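The attenuation bias is easy to see on the group soft-thresholding operator, the building block that proximal solvers for this penalty typically use. The sketch below is plain NumPy and assumes unit group weights unless others are passed; the helper names are illustrative and it is not a description of skein's internal solver.

```python
import numpy as np

def group_norms_of(v, groups):
    """Per-group L2 norms of the vector v."""
    n_groups = int(groups.max()) + 1
    return np.sqrt(np.bincount(groups, weights=v**2, minlength=n_groups))

def group_lasso_penalty(beta, groups, lam, weights=None):
    """lam * sum_g w_g * ||beta_g||_2, with w_g = 1 unless weights is given."""
    norms = group_norms_of(beta, groups)
    w = np.ones_like(norms) if weights is None else np.asarray(weights, dtype=float)
    return lam * float(np.sum(w * norms))

def group_soft_threshold(z, groups, t):
    """Shrink each group's L2 norm by t; groups with norm <= t become exactly zero."""
    norms = group_norms_of(z, groups)
    # Guard against division by zero for all-zero groups.
    scale = np.maximum(0.0, 1.0 - t / np.maximum(norms, 1e-300))
    return z * scale[groups]

groups = np.array([0, 0, 1, 1])
z = np.array([3.0, 4.0, 0.3, 0.4])        # group norms: 5.0 and 0.5
out = group_soft_threshold(z, groups, t=1.0)
print(out.round(3))                        # group 0 scaled by 0.8, group 1 zeroed
print(group_lasso_penalty(z, groups, lam=2.0))  # 2 * (5.0 + 0.5)
```

Group 1 is removed as a unit, which is the selection behavior. Group 0 is clearly active yet its norm still drops from 5.0 to 4.0, which is the bias that the nonconvex penalties are meant to reduce.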
GroupMCP — unbiased nonconvex¶
Same L2-of-coefficients aggregation but with MCP shrinkage at the group level: shrinks small group norms toward zero, leaves large group norms alone. Solved via Local Linear Approximation (LLA) — a sequence of weighted group-lasso problems.
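m = skein_glm.GroupMCPPathRegressor(
    groups=groups, gamma=3.0,
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)
# `gamma` sets where the standard MCP penalty flattens out (at a group norm
# of gamma * lambda): smaller gamma leaves large groups less shrunk but makes
# the problem more strongly nonconvex; large gamma approaches group lasso.
# Default gamma=3.0 is the ncvreg recommendation.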
Use this when you want unbiased estimates on the active groups — which is usually what you want for downstream interpretation.
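To make the "weighted group-lasso" reading of LLA concrete: each LLA step replaces the concave penalty by its tangent, so the weight on a group is the MCP derivative evaluated at that group's current norm. The sketch below uses the standard textbook MCP with unit group weights; the function names are illustrative, and skein's exact scaling may differ.

```python
import numpy as np

def mcp(t, lam, gamma):
    """Standard MCP: lam*t - t^2/(2*gamma) up to gamma*lam, then constant."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def mcp_derivative(t, lam, gamma):
    """Derivative of MCP for t >= 0: max(0, lam - t/gamma)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.maximum(0.0, lam - t / gamma)

lam, gamma = 1.0, 3.0
group_norms = np.array([0.0, 1.5, 3.0, 6.0])

# LLA weight for each group in the next weighted group-lasso subproblem.
lla_weights = mcp_derivative(group_norms, lam, gamma)
print(lla_weights)                   # values 1.0, 0.5, 0.0, 0.0
print(mcp(group_norms, lam, gamma))  # values 0.0, 1.125, 1.5, 1.5
```

A group whose norm has reached `gamma * lam` gets weight zero, so the next subproblem does not shrink it at all. A group at zero gets the full lasso weight `lam`.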
SparseGroupMCP — hybrid¶
Combines group-level and within-group sparsity: a feature can be zero even when its group is active. Useful when you expect that only some features within each “active” group actually contribute.
# Sparse truth: one active feature in group 0 (feature 1) and one in group 2 (feature 12).
true_beta = np.zeros(p)
true_beta[1] = 1.5 # group 0, feature 1 only
true_beta[12] = 0.8 # group 2, feature 12 only
y = X @ true_beta + 0.3 * rng.standard_normal(n)
m = skein_glm.SparseGroupMCPPathRegressor(
    groups=groups, gamma=3.0, alpha=0.5,  # α: within-group L1 vs L2 mix
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)
active = np.flatnonzero(np.abs(m.coefs_[-1]) > 1e-6)
print(active) # [1, 12] — exact within-group sparsity
alpha controls the mix: alpha=1 is pure within-group L1,
alpha=0 is pure group L2. The default 0.5 is a balanced
compromise.
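The effect of `alpha` can be illustrated on the convex sparse-group-lasso form of the penalty, λ[α‖β‖_1 + (1 − α) Σ_g ‖β_g‖_2]. This is a stand-in chosen for simplicity: the MCP variant applies additional nonconvex shrinkage whose exact form is not given here, and the function name is illustrative.

```python
import numpy as np

def sparse_group_penalty(beta, groups, lam, alpha):
    """Convex stand-in: lam * (alpha * L1 + (1 - alpha) * sum of group L2 norms)."""
    n_groups = int(groups.max()) + 1
    l1 = np.sum(np.abs(beta))
    group_l2 = np.sum(np.sqrt(np.bincount(groups, weights=beta**2, minlength=n_groups)))
    return lam * (alpha * l1 + (1.0 - alpha) * group_l2)

groups = np.repeat(np.arange(2), 3)
# Two coefficient vectors with the same group-0 norm, sqrt(3):
dense = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])            # spread over the group
sparse = np.array([np.sqrt(3.0), 0.0, 0.0, 0.0, 0.0, 0.0])  # one feature only

for alpha in (0.0, 0.5, 1.0):
    pd = sparse_group_penalty(dense, groups, 1.0, alpha)
    ps = sparse_group_penalty(sparse, groups, 1.0, alpha)
    print(alpha, round(pd, 3), round(ps, 3))
```

At `alpha=0` both vectors cost the same, because only the group norm matters. As `alpha` grows, the dense vector becomes more expensive than the sparse one, which is what pushes individual features inside an active group to zero.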
CV picks λ the same way¶
cv = skein_glm.GroupMCPPathCV(
    groups=groups, gamma=3.0, cv=5, random_state=0,
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)
cv.lambda_best_ # selected λ
cv.coef_ # final-refit β
cv.predict(X[:5]) # standard predict surface
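The `n_lambdas` and `lambda_min_ratio` arguments used throughout suggest a geometric grid that starts at the smallest λ for which every group is zero. The sketch below shows one common way to build such a grid for a squared-error loss scaled by 1/(2n), with unit group weights and a centered response. All three are assumptions, and skein's own grid may be scaled differently.

```python
import numpy as np

def group_lambda_grid(X, y, groups, n_lambdas=30, lambda_min_ratio=1e-2):
    """Geometric grid from lambda_max down to lambda_max * lambda_min_ratio."""
    n = X.shape[0]
    n_groups = int(groups.max()) + 1
    # Gradient of the loss at beta = 0 (response centered, i.e. intercept fitted).
    grad = X.T @ (y - y.mean()) / n
    grad_norms = np.sqrt(np.bincount(groups, weights=grad**2, minlength=n_groups))
    # Above this value the all-zero solution satisfies the group-lasso optimality conditions.
    lam_max = grad_norms.max()
    return np.geomspace(lam_max, lam_max * lambda_min_ratio, n_lambdas)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
y = X[:, :5] @ np.array([1.5, -1.0, 0.8, -0.6, 1.2]) + 0.3 * rng.standard_normal(200)
groups = np.repeat(np.arange(6), 5).astype(np.int64)

lambdas = group_lambda_grid(X, y, groups)
print(len(lambdas), lambdas[-1] / lambdas[0])  # 30 values spanning a ratio of about 1e-2
```

Cross-validation then evaluates each fold along this shared grid, so the fold errors are comparable λ by λ.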
Group penalties for GLMs¶
Every group-penalty regressor has logistic, Poisson, and Cox counterparts. Same workflow, different datafit:
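# Assumes y_binary (binary labels) and time / event (survival outcome)
# are already defined alongside X.
clf = skein_glm.LogisticGroupMCPPathRegressor(
    groups=groups, gamma=3.0, n_lambdas=30,
).fit(X, y_binary)
cox = skein_glm.CoxGroupLassoPathRegressor(
    groups=groups, n_lambdas=30,
).fit(X, time, event)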
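The snippet above uses `y_binary`, `time`, and `event` without defining them. One way to simulate compatible outcomes from the same kind of grouped design is sketched below. The 0/1 encoding of labels and event indicators is an assumption about what the estimators accept, and the censoring scheme is just one simple choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0:5] = [1.5, -1.0, 0.8, -0.6, 1.2]   # group 0 active
eta = X @ beta                            # linear predictor

# Binary outcome: Bernoulli draws through a logistic link.
prob = 1.0 / (1.0 + np.exp(-eta))
y_binary = (rng.random(n) < prob).astype(np.int64)

# Survival outcome: exponential event times with hazard exp(eta),
# censored by an independent exponential clock.
event_time = rng.exponential(scale=np.exp(-eta))
censor_time = rng.exponential(scale=2.0, size=n)
time = np.minimum(event_time, censor_time)
event = (event_time <= censor_time).astype(np.int64)  # 1 = event observed

print(y_binary.mean().round(2), event.mean().round(2))
```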
Per-group weights¶
If you know a priori that some groups are more important, pass per-group
weights (length n_groups). A smaller weight means a weaker penalty on that group:
n_groups = 6
group_w = np.ones(n_groups)
group_w[0] = 0.0 # don't penalize group 0
group_w[1] = 0.5 # half-penalize group 1
m = skein_glm.GroupMCPPathRegressor(
    groups=groups, gamma=3.0, weights=group_w,
).fit(X, y)
Per-group weights are one of skein’s three weight axes; the other two are per-sample and per-feature (see concepts/weights).
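To see what the weights do, the sketch below plugs them into the multiplicative form λ Σ_g w_g ‖β_g‖_2 given in the GroupLasso section. How the weights enter the MCP variants is not specified here, so treat this as an illustration of the convex case only.

```python
import numpy as np

groups = np.repeat(np.arange(6), 5).astype(np.int64)
n_groups = 6
group_w = np.ones(n_groups)
group_w[0] = 0.0   # don't penalize group 0
group_w[1] = 0.5   # half-penalize group 1

beta = np.ones(30)  # every group has norm sqrt(5)
norms = np.sqrt(np.bincount(groups, weights=beta**2, minlength=n_groups))

# Per-group contribution to the penalty at lambda = 1.
contrib = group_w * norms
print(contrib.round(3))

# Indexing by the label vector broadcasts a per-group quantity to features.
per_feature_w = group_w[groups]
print(per_feature_w[:12])
```

Group 0 contributes nothing to the penalty, so it is never pushed toward zero, while group 1 pays half the price of the remaining groups for the same norm.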
Recap¶
| Penalty | Convex? | Use for |
|---|---|---|
| GroupLasso | yes | fast baseline; entire groups in/out |
| GroupMCP | no (LLA) | unbiased estimates on active groups |
| | no (LLA) | alternative to MCP, less aggressive |
| | yes | correlated groups, want stability |
| | yes | groups + within-group sparsity, convex |
| SparseGroupMCP | no (LLA) | groups + within-group, unbiased |
| | no (LLA) | as above, SCAD-flavored |
Next¶
→ 5. Sparse and standardize — scipy.sparse input and the standardization story.