Concepts
skein is built around a small set of orthogonal abstractions. Once
you understand the four pieces below, you can predict the behavior
of any combination — including ones we haven’t documented as
worked examples — because the solver code never sees the difference.
The four axes
| Axis | Trait surface | Concrete types in v0.2 | Page |
|---|---|---|---|
| Penalty | | MCP, SCAD, elastic net, group lasso, group MCP, group elastic net, sparse-group lasso/MCP | |
| Datafit | | Least squares, binomial logistic, Poisson, Cox PH (Breslow), multinomial / softmax | |
| Weights | (per-axis on each trait) | per-sample, per-feature, per-group | |
| Backend | | dense, sparse CSC, mmap (f64 + f32), chunked, augmented, standardized, multi-task | |

A fifth axis, response shape, sits orthogonal to these four:
single-output y ∈ ℝ^n (the default everywhere on this site) vs.
multi-response Y ∈ ℝ^(n×K) (multi-task LS) vs. multinomial /
softmax classification (K class labels). All three reduce
algebraically to a group-lasso problem on a virtual block-replicated
design, so they reuse the rest of the stack unchanged. See
Multi-task and Multinomial.
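
To make the reduction concrete for the multi-task case, here is a small NumPy sketch. It is not skein code, and the helper names are hypothetical. It builds the block-replicated design explicitly and checks two identities: the stacked fit equals the multi-response fit, and a group penalty over the replicated coefficients equals a sum of row norms of the coefficient matrix. Since the text calls the design virtual, a real backend presumably never materializes the dense Kronecker product used here.

```python
import numpy as np

def block_replicated_design(X, K):
    """Explicit kron(I_K, X): one copy of X per response column."""
    return np.kron(np.eye(K), X)

def coefficient_groups(p, K):
    """Group j collects feature j's coefficient in each of the K tasks."""
    return [[j + k * p for k in range(K)] for j in range(p)]

rng = np.random.default_rng(0)
n, p, K = 6, 3, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, K))            # one coefficient column per task

X_big = block_replicated_design(X, K)  # shape (n*K, p*K)
b_vec = B.flatten(order="F")           # stack the columns of B
fit_vec = X_big @ b_vec                # stacked fitted values

# Identity: vec(X B) = kron(I_K, X) vec(B)
stacked_fit_matches = np.allclose(fit_vec, (X @ B).flatten(order="F"))

# A group-lasso penalty over these groups is the sum of row norms of B.
groups = coefficient_groups(p, K)
group_norms = [float(np.linalg.norm(b_vec[g])) for g in groups]
row_norms_match = np.allclose(group_norms, np.linalg.norm(B, axis=1))
```

Each group ties together one feature's coefficients across all tasks, so a group penalty keeps or drops a feature for every response at once.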
Every estimator class in skein_glm.* is a packaging of one
(datafit, penalty) pair with optional weights, behind a single
sklearn-compatible fit / predict interface. The path variants
add warm-starting across a λ-grid; the CV variants wrap a path
in K-fold cross-validation. You don’t need to learn 60+ separate
classes — they’re all the same machinery with different
(datafit, penalty) instantiations.
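
The packaging idea can be illustrated with a short NumPy sketch. Nothing here is skein's API: the names `LeastSquares`, `Lasso`, `solve`, and `path` are hypothetical, and plain proximal gradient stands in for whatever solver skein actually uses. The point is only that the solver touches the datafit and the penalty through their methods, and that a path is a loop of warm-started fits.

```python
import numpy as np

class LeastSquares:
    """Datafit: f(b) = ||y - X b||^2 / (2 n)."""
    def gradient(self, X, y, b):
        return X.T @ (X @ b - y) / len(y)
    def lipschitz(self, X, y):
        # Largest eigenvalue of X^T X / n (squared spectral norm over n).
        return np.linalg.norm(X, 2) ** 2 / len(y)

class Lasso:
    """Penalty: lam * ||b||_1, whose prox is soft-thresholding."""
    def __init__(self, lam):
        self.lam = lam
    def prox(self, b, step):
        return np.sign(b) * np.maximum(np.abs(b) - step * self.lam, 0.0)

def solve(X, y, datafit, penalty, b0, n_iter=500):
    """Proximal gradient; sees datafit and penalty only via their methods."""
    b = b0.copy()
    step = 1.0 / datafit.lipschitz(X, y)
    for _ in range(n_iter):
        b = penalty.prox(b - step * datafit.gradient(X, y, b), step)
    return b

def path(X, y, datafit, make_penalty, lambdas):
    """Fit a decreasing lambda grid, warm-starting each fit from the last."""
    b = np.zeros(X.shape[1])
    out = []
    for lam in lambdas:
        b = solve(X, y, datafit, make_penalty(lam), b)
        out.append(b)
    return np.array(out)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.normal(size=50)
lam_max = np.max(np.abs(X.T @ y)) / len(y)  # smallest lambda giving b = 0
coefs = path(X, y, LeastSquares(), Lasso,
             [lam_max, 0.5 * lam_max, 0.1 * lam_max])
```

Swapping in a different datafit or penalty class with the same methods would leave `solve` and `path` untouched, which is the property the paragraph above describes.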
Why this matters
Two things follow from the abstraction:
- New backends are O(1) effort. The DesignMatrix trait has five methods (matvec, rmatvec, col_dot, col_sq_norm, columns). Implementing them for a new backend (an HDF5 reader, a Parquet column store, a GPU buffer) takes a few hundred lines and works with every estimator immediately. The Augmented and Standardized wrappers compose with anything that implements the trait.
- New penalties are also O(1) effort. The Penalty trait exposes prox, value, and weights; the GroupPenalty trait adds block-prox primitives. Once you implement these, every datafit and every backend already supports your penalty, so you don’t write a separate MyPenaltyOnSparse or MyPenaltyForCox.
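
As a rough illustration of why wrappers compose, the sketch below is a Python analogue of the backend idea. It is not the Rust trait: the signatures are guesses, and the fifth method, columns, is left out because its semantics are not described on this page. The hypothetical `Standardized` class presents a centered and scaled view of any inner object that offers `matvec`, `rmatvec`, `col_dot`, and `col_sq_norm`, without ever forming the standardized matrix.

```python
import numpy as np

class Dense:
    """Hypothetical dense backend over a NumPy array."""
    def __init__(self, X):
        self.X = np.asarray(X, dtype=float)
    def matvec(self, v):        # X @ v
        return self.X @ v
    def rmatvec(self, r):       # X.T @ r
        return self.X.T @ r
    def col_dot(self, j, r):    # <X[:, j], r>
        return float(self.X[:, j] @ r)
    def col_sq_norm(self, j):   # ||X[:, j]||^2
        return float(self.X[:, j] @ self.X[:, j])

class Standardized:
    """View of (X - 1 mu^T) / sigma, computed lazily from the inner backend."""
    def __init__(self, inner, n, p):
        self.inner, self.n = inner, n
        self.mu = inner.rmatvec(np.ones(n)) / n                 # column means
        sq = np.array([inner.col_sq_norm(j) for j in range(p)])
        self.sigma = np.sqrt(sq / n - self.mu ** 2)             # population sd
    def matvec(self, v):
        s = v / self.sigma
        return self.inner.matvec(s) - self.mu @ s
    def rmatvec(self, r):
        return (self.inner.rmatvec(r) - self.mu * r.sum()) / self.sigma
    def col_dot(self, j, r):
        return (self.inner.col_dot(j, r) - self.mu[j] * r.sum()) / self.sigma[j]
    def col_sq_norm(self, j):
        return float(self.n)    # each standardized column has squared norm n

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(8, 3))
S = Standardized(Dense(X), *X.shape)
X_ref = (X - X.mean(axis=0)) / X.std(axis=0)   # explicit reference matrix
v, r = rng.normal(size=3), rng.normal(size=8)
matvec_ok = np.allclose(S.matvec(v), X_ref @ v)
rmatvec_ok = np.allclose(S.rmatvec(r), X_ref.T @ r)
```

Because `Standardized` only calls the inner object's methods, the same wrapper would work over a sparse or memory-mapped backend that offers them.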
The downside of this orthogonality: when something goes wrong, the bug is in exactly one of the four traits, and the type system won’t help you locate it. The test suite is structured the same way (199 cargo tests + 138 pytests) so each axis is covered independently.
Reading order
If you’re new to the conceptual model:
1. Datafits first: what “the loss function” means in skein and how prox-Newton turns non-Gaussian losses into a sequence of weighted least-squares problems.
2. Penalties second: convex (lasso, group lasso) vs. nonconvex (MCP, SCAD), and how Local Linear Approximation reduces nonconvex to weighted convex.
3. Weights third: the three independent axes (per-sample, per-feature, per-group), what each one accomplishes statistically, and how to combine them.
4. Backends last: the storage + wrapper hierarchy (DenseMatrix, SparseCSC, MmapMatrix, Chunked<C>, plus the Augmented<D> and Standardized<D> wrappers).
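
The Local Linear Approximation mentioned under penalties can be previewed on a toy problem. The sketch below is plain NumPy with hypothetical function names, not skein's implementation. It assumes an orthonormal design, so each weighted-lasso step reduces to soft-thresholding, and it uses the standard MCP derivative max(λ − t/γ, 0) as the per-feature weight.

```python
import numpy as np

def mcp_derivative(t, lam, gamma):
    """Derivative of the MCP penalty at t >= 0: max(lam - t / gamma, 0)."""
    return np.maximum(lam - t / gamma, 0.0)

def soft_threshold(z, w):
    """Prox of a weighted L1 penalty with elementwise thresholds w."""
    return np.sign(z) * np.maximum(np.abs(z) - w, 0.0)

def lla_mcp_orthonormal(z, lam, gamma, n_iter=100):
    """LLA for min_b 0.5 * ||z - b||^2 + sum_j MCP(|b_j|)."""
    b = np.zeros_like(z)
    for _ in range(n_iter):
        w = mcp_derivative(np.abs(b), lam, gamma)  # linearize penalty at b
        b = soft_threshold(z, w)                   # solve the weighted lasso
    return b

def firm_threshold(z, lam, gamma):
    """Closed-form MCP solution of the same problem (requires gamma > 1)."""
    inner = soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) <= gamma * lam, inner, z)

z = np.array([4.0, 2.0, 0.5, -0.2])
b_lla = lla_mcp_orthonormal(z, lam=1.0, gamma=3.0)
b_exact = firm_threshold(z, lam=1.0, gamma=3.0)
```

On this input the iteration settles on the closed-form MCP solution: large coefficients are left unshrunk, mid-sized ones are shrunk less than the lasso would shrink them, and small ones are set to zero. Each iteration is a convex weighted-L1 problem, which is how the same per-feature weight machinery can serve nonconvex penalties.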
If you’re porting from R, skip ahead to the Porting section (coming in commit 2) — it’s organized by R package and points back to the relevant concept pages.
Beyond regression: graphical models
A separate use of the same machinery: instead of regressing
y on X, estimate a sparse precision matrix that encodes
conditional independence between variables. This is the workhorse
of network psychometrics and Gaussian graphical models. The same
penalties (L1, MCP, SCAD) and weight infrastructure carry over
edge-wise, and joint estimation across populations reuses the
group-penalty primitives.
Graphical models — sparse inverse covariance, glasso, joint estimation across populations.
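
To see what "encodes conditional independence" means, the NumPy sketch below (an illustration with a hypothetical helper name, not skein code) takes a known precision matrix for a three-variable chain and compares it with the covariance it implies. No estimation happens here; it only shows the object a graphical-model estimator targets.

```python
import numpy as np

def partial_correlations(theta):
    """rho_ij = -theta_ij / sqrt(theta_ii * theta_jj) for i != j."""
    d = np.sqrt(np.diag(theta))
    rho = -theta / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

# Chain graph 0 - 1 - 2: variables 0 and 2 share no direct edge.
theta = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.0, -1.0],
                  [ 0.0, -1.0,  2.0]])

sigma = np.linalg.inv(theta)       # covariance implied by this precision
rho = partial_correlations(theta)  # edge weights of the graph
```

Variables 0 and 2 are marginally correlated (`sigma[0, 2]` is nonzero) but conditionally independent given variable 1 (`theta[0, 2]` and `rho[0, 2]` are zero). A sparsity penalty on the entries of the precision matrix therefore selects edges of the graph.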