4. Group penalties

When features come in natural groups (dummy-encoded categoricals, gene pathways, polynomial expansions, time-frequency bands), selecting features one at a time can produce nonsensical results. Keep three of five dummies for the same factor and the factor is still in the model; its coding now just depends on which dummies happened to survive. Group penalties fix this by treating an entire group as the unit of selection.

This tutorial walks through the three flavors of group penalty in skein and when each is the right choice.

Building a groups vector

groups is a length-n_features integer vector: feature j belongs to group groups[j]. Group labels must form a contiguous range {0, 1, …, n_groups - 1}.

import numpy as np

# 30 features in 6 groups of 5.
groups = np.repeat(np.arange(6), 5).astype(np.int64)
# array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, ...])

For a one-hot encoding pattern with mixed group sizes, build the labels explicitly — np.repeat([0, 1, 2], [3, 4, 2]) gives [0, 0, 0, 1, 1, 1, 1, 2, 2] for groups of size 3, 4, 2.
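The contract above (one integer label per feature, labels exactly 0..n_groups-1) is easy to check before fitting. Below is a minimal sketch in plain NumPy. The helper names `groups_from_sizes` and `check_groups` are hypothetical, not part of skein, and the check only mirrors the rule stated here, not skein's own validation. Features of a group do not need to be adjacent; only the label set must be gap-free.

```python
import numpy as np

def groups_from_sizes(sizes):
    """Hypothetical helper: contiguous labels 0..len(sizes)-1, repeated by group size."""
    sizes = np.asarray(sizes, dtype=np.int64)
    if np.any(sizes <= 0):
        raise ValueError("every group needs at least one feature")
    return np.repeat(np.arange(len(sizes), dtype=np.int64), sizes)

def check_groups(groups, n_features):
    """Hypothetical helper: enforce length n_features and labels {0, ..., n_groups - 1}."""
    groups = np.asarray(groups)
    if groups.shape != (n_features,):
        raise ValueError(f"groups must have length {n_features}")
    if not np.issubdtype(groups.dtype, np.integer):
        raise TypeError("groups must be an integer array")
    labels = np.unique(groups)  # sorted unique labels
    if not np.array_equal(labels, np.arange(labels.size)):
        raise ValueError(f"labels must be 0..{labels.size - 1} with no gaps, got {labels}")
    return int(labels.size)  # n_groups

g = groups_from_sizes([3, 4, 2])
n_groups = check_groups(g, n_features=9)
```

Here `g` holds the same labels as the `np.repeat([0, 1, 2], [3, 4, 2])` example, already as int64, and `n_groups` is 3.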

Three penalties, three behaviors

GroupLasso — convex baseline

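The L1-of-L2 penalty: λ Σ_g w_g ‖β_g‖_2, where w_g is the weight on group g. It is convex and easy to fit, but biased on truly active groups. The penalty's pull toward zero has the same strength λ w_g for every nonzero group, however large ‖β_g‖_2 is, so even unambiguously active groups are shrunk.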
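The bias is easiest to see under an orthonormal design (X_g^T X_g / n = I). That assumption is made here only to get a closed form. Each group-lasso update is then block soft-thresholding of the group's least-squares estimate z_g. The sketch below is plain NumPy, not skein code. It shows that every group that survives loses the same λ w_g of Euclidean norm, whether its signal is weak or huge.

```python
import numpy as np

def group_soft_threshold(z, t):
    """Prox of t * ||b||_2: shrink the vector z toward 0 by t in Euclidean norm."""
    norm = np.linalg.norm(z)
    if norm <= t:
        return np.zeros_like(z)      # the whole group is set to zero
    return (1.0 - t / norm) * z      # direction kept, norm reduced by exactly t

lam_w = 0.5                          # stands in for λ * w_g
weak = np.array([0.2, -0.1, 0.1])    # norm ≈ 0.245 <= 0.5: dropped as a unit
strong = np.array([3.0, -4.0])       # norm 5.0
huge = np.array([30.0, -40.0])       # norm 50.0

for z in (weak, strong, huge):
    b = group_soft_threshold(z, lam_w)
    print(np.linalg.norm(z).round(3), "->", np.linalg.norm(b).round(3))
```

The strong and huge groups both lose 0.5 of norm, so the absolute shrinkage does not fade as the signal grows. GroupMCP is designed to remove exactly this effect.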

import skein_glm

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[0:5] = [1.5, -1.0, 0.8, -0.6, 1.2]   # group 0 fully active
true_beta[10:15] = [0.7, -0.5, 0.4, 0.3, -0.6] # group 2 fully active
y = X @ true_beta + 0.3 * rng.standard_normal(n)

groups = np.repeat(np.arange(6), 5).astype(np.int64)

m = skein_glm.GroupLassoPathRegressor(
    groups=groups, n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)

# Active groups: features 0-4 and 10-14 should have nonzero β.
last = m.coefs_[-1]
group_norms = np.array([
    np.linalg.norm(last[groups == g]) for g in range(6)
])
print(group_norms.round(3))  # large at g=0 and g=2, small elsewhere

Use this when you want a fast, convex baseline and don’t mind the attenuation bias.
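`n_lambdas` and `lambda_min_ratio` conventionally define a log-spaced grid running from λ_max, the smallest λ at which every group is zero, down to λ_max · lambda_min_ratio. The sketch below assumes that glmnet/grpreg-style convention, the objective (1/(2n))‖y − Xβ‖² + λ Σ_g w_g ‖β_g‖_2 with an intercept handled by centering, and w_g > 0 for every group. skein's exact loss scaling is not stated here, so treat its λ values as illustrative. The function name is hypothetical.

```python
import numpy as np

def group_lasso_lambda_grid(X, y, groups, weights=None,
                            n_lambdas=30, lambda_min_ratio=1e-2):
    """Hypothetical helper: log-spaced grid from lambda_max to lambda_max * ratio."""
    n = X.shape[0]
    n_groups = int(groups.max()) + 1
    w = np.ones(n_groups) if weights is None else np.asarray(weights, dtype=float)
    if np.any(w <= 0):
        raise ValueError("this sketch assumes every group is penalized (w_g > 0)")
    Xc = X - X.mean(axis=0)            # centering stands in for an unpenalized intercept
    yc = y - y.mean()
    score = Xc.T @ yc / n              # negative gradient of the loss at beta = 0
    norms = np.array([np.linalg.norm(score[groups == g]) for g in range(n_groups)])
    lam_max = np.max(norms / w)        # beta = 0 is optimal iff lambda >= lam_max
    return np.geomspace(lam_max, lam_max * lambda_min_ratio, n_lambdas)

rng_demo = np.random.default_rng(0)
X_demo = rng_demo.standard_normal((200, 30))
y_demo = X_demo[:, :5] @ np.array([1.5, -1.0, 0.8, -0.6, 1.2]) \
    + 0.3 * rng_demo.standard_normal(200)
groups_demo = np.repeat(np.arange(6), 5).astype(np.int64)

lambdas = group_lasso_lambda_grid(X_demo, y_demo, groups_demo)
```

`lambdas[0]` is λ_max and `lambdas[-1]` is 1% of it. The first point on the path therefore has every group at zero, and groups enter as λ decreases.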

GroupMCP — unbiased nonconvex

It uses the same per-group L2 norms, but the penalty on each group norm ‖β_g‖_2 is MCP rather than linear. Small group norms are shrunk toward zero, while norms above γλ are left unpenalized. The nonconvex problem is solved via Local Linear Approximation (LLA), a sequence of weighted group-lasso problems.

m = skein_glm.GroupMCPPathRegressor(
    groups=groups, gamma=3.0,
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)

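# `gamma` sets where the penalty flattens out: group norms above gamma * λ are not shrunk.
# Smaller gamma: closer to hard thresholding (less bias, more nonconvex).
# Larger gamma: closer to the group lasso. The default gamma=3.0 follows the ncvreg recommendation.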

Use this when you want (nearly) unbiased estimates on strongly active groups, which is usually what you want for downstream interpretation.
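For a group norm t ≥ 0, the MCP penalty is P(t) = λt − t²/(2γ) for t ≤ γλ and the constant γλ²/2 beyond that. Its slope is P′(t) = max(0, λ − t/γ). LLA linearizes P around the current group norms. The next subproblem is therefore a weighted group lasso with relative weights P′(‖β_g‖_2)/λ = max(0, 1 − ‖β_g‖_2/(γλ)). The sketch below follows the standard LLA formulation. It is not skein's implementation, and details such as the number of LLA steps or warm starts are not shown. Starting from β = 0, every weight is 1, so the first LLA step is exactly a group lasso.

```python
import numpy as np

def mcp(t, lam, gamma):
    """MCP penalty P(t; lam, gamma) for t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam, lam * t - t**2 / (2 * gamma), 0.5 * gamma * lam**2)

def mcp_derivative(t, lam, gamma):
    """P'(t) = max(0, lam - t / gamma): the slope LLA linearizes around."""
    return np.maximum(0.0, lam - np.asarray(t, dtype=float) / gamma)

def lla_group_weights(beta, groups, lam, gamma):
    """Relative weights (multiples of lam) for the next weighted group-lasso subproblem."""
    n_groups = int(groups.max()) + 1
    norms = np.array([np.linalg.norm(beta[groups == g]) for g in range(n_groups)])
    return mcp_derivative(norms, lam, gamma) / lam

beta = np.array([0.0, 0.0, 0.3, 0.0, 1.2, 1.6])   # group norms 0.0, 0.3, 2.0
groups_demo = np.array([0, 0, 1, 1, 2, 2])
w = lla_group_weights(beta, groups_demo, lam=0.5, gamma=3.0)
# w is approximately [1.0, 0.8, 0.0]: group 2's norm 2.0 exceeds gamma * lam = 1.5
```

A weight of 0 means the strongly active group is fit without any penalty in the next subproblem. This is where the reduced bias, relative to the group lasso, comes from.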

SparseGroupMCP — hybrid

Combines group-level and within-group sparsity: a feature can be zero even when its group is active. Useful when you expect that only some features within each “active” group actually contribute.

# Sparse truth: a single active feature in group 0 (index 1) and in group 2 (index 12).
true_beta = np.zeros(p)
true_beta[1] = 1.5    # group 0 (features 0-4): only feature 1
true_beta[12] = 0.8   # group 2 (features 10-14): only feature 12
y = X @ true_beta + 0.3 * rng.standard_normal(n)

m = skein_glm.SparseGroupMCPPathRegressor(
    groups=groups, gamma=3.0, alpha=0.5,  # α: within-group L1 vs L2 mix
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)

active = np.flatnonzero(np.abs(m.coefs_[-1]) > 1e-6)
print(active)  # expected: [ 1 12], exact within-group sparsity

alpha controls the mix: alpha=1 is pure within-group L1, alpha=0 is pure group L2. The default 0.5 is a balanced compromise.
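How does a feature end up exactly zero inside an active group? This is easiest to see in the convex analog, the sparse group lasso penalty λ[α Σ_j |β_j| + (1 − α) Σ_g w_g ‖β_g‖_2], using the same alpha convention as above. Its proximal operator is an elementwise soft-threshold followed by a group soft-threshold (Simon et al., 2013). The sketch below shows that convex operator. It is not SparseGroupMCP's internals, which the source does not describe, and the function names are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise lasso prox: shrink each entry toward 0 by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_group_prox(z, groups, lam, alpha, group_weights=None):
    """Prox of lam * (alpha * ||b||_1 + (1 - alpha) * sum_g w_g ||b_g||_2)."""
    n_groups = int(groups.max()) + 1
    w = np.ones(n_groups) if group_weights is None else np.asarray(group_weights, float)
    u = soft_threshold(z, lam * alpha)        # L1 step: can zero single features
    out = np.zeros_like(u)
    for g in range(n_groups):
        idx = groups == g
        norm = np.linalg.norm(u[idx])
        t = lam * (1 - alpha) * w[g]
        if norm > t:                          # group L2 step: can zero the whole group
            out[idx] = (1 - t / norm) * u[idx]
    return out

groups_demo = np.array([0, 0, 0, 1, 1, 1])
z = np.array([2.0, 0.1, -0.2, 0.2, -0.1, 0.15])

print(np.flatnonzero(sparse_group_prox(z, groups_demo, lam=0.5, alpha=0.5)))  # [0]
print(np.flatnonzero(sparse_group_prox(z, groups_demo, lam=0.5, alpha=0.0)))  # [0 1 2]
```

With alpha=0.5, group 0 stays active but only its first feature survives. With alpha=0 (pure group L2), all three features of group 0 are kept or dropped together.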

CV picks λ the same way

cv = skein_glm.GroupMCPPathCV(
    groups=groups, gamma=3.0, cv=5, random_state=0,
    n_lambdas=30, lambda_min_ratio=1e-2,
).fit(X, y)

cv.lambda_best_       # selected λ
cv.coef_              # final-refit β
cv.predict(X[:5])     # standard predict surface
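Given held-out losses for every fold and every λ on the path, selecting λ is a small reduction over folds. The source does not say which rule `cv.lambda_best_` uses. The sketch below shows the two common choices: the minimum mean CV error, and the one-standard-error rule that picks the largest (sparsest) λ within one SE of that minimum, as with glmnet's lambda.min and lambda.1se. The function and the error values are illustrative only.

```python
import numpy as np

def select_lambda(lambdas, fold_errors, rule="min"):
    """fold_errors has shape (n_folds, n_lambdas): held-out loss per fold and lambda."""
    lambdas = np.asarray(lambdas, dtype=float)
    errs = np.asarray(fold_errors, dtype=float)
    mean = errs.mean(axis=0)                 # mean CV error per lambda
    best = int(np.argmin(mean))
    if rule == "min":
        return lambdas[best]
    if rule == "1se":
        se = errs.std(axis=0, ddof=1) / np.sqrt(errs.shape[0])
        within = mean <= mean[best] + se[best]
        return lambdas[within].max()         # largest lambda = sparsest acceptable model
    raise ValueError(f"unknown rule {rule!r}")

lambdas_demo = [1.0, 0.5, 0.25, 0.125]
fold_errors = [[3.0, 1.05, 1.0, 1.1],        # made-up numbers, 3 folds x 4 lambdas
               [2.8, 1.00, 0.8, 1.0],
               [3.2, 1.15, 1.2, 1.2]]
lam_min = select_lambda(lambdas_demo, fold_errors)            # 0.25
lam_1se = select_lambda(lambdas_demo, fold_errors, "1se")     # 0.5
```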

Group penalties for GLMs

Every group-penalty regressor has logistic, Poisson, and Cox counterparts. Same workflow, different datafit:

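# Synthetic targets for illustration (reuses X, y, true_beta, rng, n from above).
y_binary = (y > 0).astype(np.int64)                    # binary labels
time = rng.exponential(scale=np.exp(-X @ true_beta))   # survival times, hazard exp(Xβ)
event = (rng.random(n) < 0.8).astype(np.int64)         # 1 = event observed, 0 = censored

clf = skein_glm.LogisticGroupMCPPathRegressor(
    groups=groups, gamma=3.0, n_lambdas=30,
).fit(X, y_binary)

cox = skein_glm.CoxGroupLassoPathRegressor(
    groups=groups, n_lambdas=30,
).fit(X, time, event)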
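"Different datafit" means the group penalty stays the same and only the loss on the linear predictor η = Xβ changes. Below are the textbook negative log-likelihoods in plain NumPy. The 1/n scaling is an assumption, since the source does not state skein's normalization. The Cox partial likelihood assumes no tied event times. These are not skein's internal functions.

```python
import numpy as np

def gaussian_loss(eta, y):
    """Squared error, 1/(2n) scaling."""
    return 0.5 * np.mean((y - eta) ** 2)

def logistic_loss(eta, y):
    """Binary y in {0, 1}; log(1 + exp(eta)) computed stably via logaddexp."""
    return np.mean(np.logaddexp(0.0, eta) - y * eta)

def poisson_loss(eta, y):
    """Log link; the log(y!) constant is dropped."""
    return np.mean(np.exp(eta) - y * eta)

def cox_loss(eta, time, event):
    """Negative log partial likelihood / n, assuming no tied event times."""
    order = np.argsort(-time)                   # subjects sorted by decreasing time
    eta_s, event_s = eta[order], event[order]
    log_risk = np.logaddexp.accumulate(eta_s)   # log sum of exp(eta_j) over t_j >= t_i
    return -np.sum(event_s * (eta_s - log_risk)) / len(eta)

eta0 = np.zeros(3)
print(cox_loss(eta0, np.array([3.0, 2.0, 1.0]), np.array([1, 1, 1])))
```

With η = 0 the three risk sets have sizes 1, 2 and 3, so the printed value is (log 2 + log 3) / 3.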

Per-group weights

If you know some groups are a priori more important, pass per-group weights (length n_groups):

n_groups = 6
group_w = np.ones(n_groups)
group_w[0] = 0.0       # don't penalize group 0
group_w[1] = 0.5       # half-penalize group 1

m = skein_glm.GroupMCPPathRegressor(
    groups=groups, gamma=3.0, weights=group_w,
).fit(X, y)

Per-group weights are one of skein's three weight axes; the other two are per-sample and per-feature (see concepts/weights).
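A common weighting, from the original group lasso paper (Yuan and Lin, 2006), is w_g = √p_g, where p_g is the number of features in group g. It keeps larger groups from being favored just because they contain more coefficients. The source does not say whether skein applies this by default. The hypothetical helper below only builds a vector you could pass as `weights=`, with optional groups left unpenalized (weight 0), as in the example above.

```python
import numpy as np

def sqrt_size_weights(groups, unpenalized=()):
    """Hypothetical helper: w_g = sqrt(p_g), with selected groups set to 0 (unpenalized)."""
    sizes = np.bincount(groups)                          # p_g for g = 0..n_groups-1
    w = np.sqrt(sizes.astype(float))
    w[np.asarray(unpenalized, dtype=np.int64)] = 0.0     # weight 0: group never penalized
    return w

groups_demo = np.repeat(np.arange(3), [3, 4, 2]).astype(np.int64)
group_w = sqrt_size_weights(groups_demo, unpenalized=[0])
# group_w is now [0.0, 2.0, sqrt(2)]
```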

Recap

| Penalty | Convex? | Use for |
|---|---|---|
| GroupLasso | yes | fast baseline; entire groups in/out |
| GroupMCP | no (LLA) | unbiased estimates on active groups |
| GroupSCAD | no (LLA) | alternative to MCP, less aggressive |
| GroupElasticNet | yes | correlated groups, want stability |
| SparseGroupLasso | yes | groups + within-group sparsity, convex |
| SparseGroupMCP | no (LLA) | groups + within-group sparsity, unbiased |
| SparseGroupSCAD | no (LLA) | as above, SCAD-flavored |

Next

5. Sparse and standardize — scipy.sparse input and the standardization story.