Graphical edge stability¶
Bootstrap-based confidence in edges for graphical models. Complements the EBIC tuner (graph_selection) by quantifying how often an edge would be selected under resampling, not just whether it is selected at one λ.
Two complementary wrappers around the M11 graphical estimators:
GraphicalStabilitySelection— Meinshausen– Bühlmann (2010) stability selection lifted to edges. Sweep a user-supplied λ-grid; for each (bootstrap, λ) refit, record the off-diagonal nonzero pattern ofΘ̂; aggregate to per-(λ, i, j) selection probability. The stable edge set is the edges whose max-over-λ probability crosses a threshold (default 0.6; the MB error-control bound requires > 0.5).GraphicalBootstrap— classic non-parametric (resample-with-replacement) bootstrap at a single λ. Returns the per-edge sampling distribution: mean, SD,[α/2, 1−α/2]quantile CIs, edge selection probability. This is the headlinebootnet::bootnet(type="nonparametric")output used for edge error bars in network psychometrics.
Both work with single-population (GraphicalLasso / GraphicalMCP /
GraphicalSCAD) and joint (JointGraphicalLasso /
JointGraphicalMCP) base estimators. Family is auto-detected via the
estimator’s lambda_2 / alpha init param; no manual switching.
Why¶
Field practice in network psychometrics is to fit a graphical lasso
at one λ chosen by EBIC, then bootstrap the edges to plot error
bars. R’s bootnet is the canonical package for this; its output
(edge selection probability + CI on edge weight) is what
psychometric papers reproduce. Until now, the Python toolchain
(sklearn.covariance.GraphicalLasso) did not offer a clean
bootstrap utility — users would either drop to R or hand-roll a
resampling loop on top of sklearn fits.
GraphicalBootstrap matches bootnet’s output shape; users porting
from bootnet can plug skein in as a drop-in replacement that also
supports nonconvex edge penalties (MCP/SCAD) and joint estimation.
Compatibility¶
Both classes require raw observation data, not a precomputed
covariance — bootstrap means we need to resample rows. Passing a
square symmetric (p, p) matrix is rejected with a clear error
message. For joint estimation, pass a list/tuple of per-population
X^(k) arrays.
The bootstrap loop is embarrassingly parallel via joblib; pass
n_jobs=-1. For fixed random_state the bootstrap indices are
pre-generated, so n_jobs > 1 produces identical results to the
serial fit.
API¶
- class skein_glm.GraphicalStabilitySelection(base_estimator, lambdas, *, n_bootstraps=100, sample_fraction=0.5, threshold=0.6, random_state=None, n_jobs=None, active_eps=1e-06)[source]¶
Bases:
BaseEstimatorMeinshausen–Bühlmann stability selection for graphical models.
Wraps a graphical estimator (single or joint) and runs subsample-bootstrap over a user-supplied λ-grid. Each bootstrap iteration draws a fraction of rows without replacement, refits at every λ in the grid, and records which off-diagonal entries of Θ̂ are nonzero. The fraction of bootstraps in which edge
(i, j)was selected at λ_k is its selection probability; an edge is stable if its max-over-λ probability crossesthreshold.The MB (2010) error-control bound on expected false positives applies to
threshold > 0.5.- Parameters:
base_estimator (Any) – Any single-population (
GraphicalLasso/MCP/SCAD) or joint (JointGraphicalLasso/MCP) graphical estimator. The init params (gamma/a/edge_weights/…) are passed through to every refit; only the regularisation parameter (alphaorlambda_2) is overridden per-λ.lambdas (
array-like (n_lambdas,)) – Positive λ values to sweep. The order is preserved in the output but does not affect the result (each (bootstrap, λ) fit is independent).n_bootstraps (
int, default100)sample_fraction (
float, default0.5) – Subsample fraction (without replacement). MB recommend 0.5.threshold (
float, default0.6) – Stable-edge probability cutoff; must be > 0.5 for the MB error-control bound.n_jobs (
intorNone) – Forwarded tojoblib.Parallel.-1uses all cores.active_eps (
float, default1e-6) – Threshold for declaring |Θ_ij| nonzero. Looser than the sklearn default to absorb mild solver drift across subsample fits — graphical block-CD/ADMM converge to lower tolerance than coefficient solvers in practice.
- selection_probabilities_¶
Single-pop: shape (n_lambdas, p, p). Joint: shape (n_lambdas, K, p, p). Symmetric in the last two axes, zero on the diagonal.
- Type:
ndarray
- max_probabilities_¶
Single-pop: shape (p, p). Joint: shape (K, p, p). Max over λ of the per-edge selection probability.
- Type:
ndarray
- stable_edges_¶
Single-pop: shape (n_stable, 2) of (i, j) pairs (upper triangular). Joint: list of K such arrays, one per population.
- Type:
ndarrayofint64
- lambdas_¶
- Type:
ndarray
- n_populations_¶
- Type:
int (joint only)
Notes
Bootstrap requires raw observation data. Passing a precomputed covariance matrix is rejected with a clear error message.
Examples
>>> from skein_glm import ( ... GraphicalMCP, GraphicalStabilitySelection, ... ) >>> import numpy as np >>> ss = GraphicalStabilitySelection( ... base_estimator=GraphicalMCP(gamma=3.0), ... lambdas=np.geomspace(0.5, 0.05, 8), ... n_bootstraps=100, threshold=0.6, n_jobs=-1, random_state=0, ... ) >>> ss.fit(X) >>> ss.stable_edges_ # shape (n_stable, 2)
- mb_threshold(expected_false_positives=1.0)[source]¶
Required
π_thrto controlE[V] ≤ expected_false_positives.Applies the Meinshausen–Bühlmann (2010) closed-form bound to this fitted stability-selection result. Uses
p · (p − 1)/2as the family size (number of unique edges) and the average number of selected edges per (λ, bootstrap) cell asq_Λ.Returns a stability-probability threshold in
(0.5, 1]. If the requested error bound is infeasible at the observedq_Λ, raisesValueErrorwith a diagnostic message.See
skein_glm.mb_stability_threshold()for the formula and reference.
- class skein_glm.GraphicalBootstrap(base_estimator, *, n_bootstraps=200, alpha=0.05, random_state=None, n_jobs=None, active_eps=1e-06)[source]¶
Bases:
BaseEstimatorNon-parametric bootstrap for graphical edge weights.
Resamples observations with replacement (same n), refits the graphical estimator at the given regularisation parameter, and records the bootstrap sampling distribution of every edge in Θ̂. Output statistics:
`mean_`, `std_` — per-edge bootstrap mean and standard deviation. The SD doubles as a Wald-style standard error.
`ci_lower_`, `ci_upper_` — per-edge quantile confidence bands at alpha/2 and 1 − alpha/2. The default
alpha=0.05gives the 2.5% / 97.5% percentile envelope typically plotted on bootnet stability diagrams.`edge_selection_probabilities_` — fraction of bootstrap replicates in which the edge was active. This is what
bootnetcalls the “edge stability” output (distinct from the MB-styleGraphicalStabilitySelection, which sweeps λ instead of resampling at a fixed λ).
- Parameters:
base_estimator (Any) – A fully-configured graphical estimator. Its regularisation parameter (
alphafor single,lambda_2for joint) is used as-is at every bootstrap replicate; this class does not sweep λ.n_bootstraps (
int, default200) – Number of bootstrap replicates.alpha (
float, default0.05) – CI level — bands are [alpha/2, 1 - alpha/2] quantiles.active_eps (
float, default1e-6)
- precisions_¶
Single-pop: shape (B, p, p). Joint: shape (B, K, p, p). Full bootstrap stack of fitted precisions.
- Type:
ndarray
- mean_, std_, ci_lower_, ci_upper_
Single-pop: shape (p, p). Joint: shape (K, p, p).
- Type:
ndarray
- edge_selection_probabilities_¶
Same shape as
mean_. Off-diagonal selection probability; diagonal is zero.- Type:
ndarray
- n_populations_¶
- Type:
int (joint only)
- fdr_threshold(fdr=0.1)[source]¶
Benjamini–Hochberg FDR on edges from this bootstrap fit.
Convenience wrapper around
skein_glm.edge_fdr_threshold(). See that function for the returned dict shape.
Example¶
import numpy as np
from skein_glm import GraphicalMCP, GraphicalStabilitySelection
rng = np.random.default_rng(0)
n, p = 200, 15
# Sparse precision: a chain Θ_{i, i+1} = -0.5, Θ_ii = 1.
Theta = np.eye(p)
for i in range(p - 1):
Theta[i, i + 1] = Theta[i + 1, i] = -0.4
Sigma = np.linalg.inv(Theta)
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T
ss = GraphicalStabilitySelection(
base_estimator=GraphicalMCP(gamma=3.0),
lambdas=np.geomspace(0.5, 0.05, 8),
n_bootstraps=100, threshold=0.6, n_jobs=-1, random_state=0,
).fit(X)
print(ss.stable_edges_) # (n_stable, 2) — upper-triangular (i, j) pairs
References¶
Meinshausen, N. and Bühlmann, P. (2010). Stability selection. JRSS B 72(4): 417–473.
van Borkulo, C. D. et al. (2017). Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers 2:60. (The methodology behind R’s
bootnet.)Friedman, J., Hastie, T., Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3): 432–441.