Graphical edge stability

Bootstrap-based confidence in edges for graphical models. Complements the EBIC tuner (graph_selection) by quantifying how often an edge would be selected under resampling, not just whether it is selected at one λ.

Two complementary wrappers around the M11 graphical estimators:

  • GraphicalStabilitySelection — Meinshausen– Bühlmann (2010) stability selection lifted to edges. Sweep a user-supplied λ-grid; for each (bootstrap, λ) refit, record the off-diagonal nonzero pattern of Θ̂; aggregate to per-(λ, i, j) selection probability. The stable edge set is the edges whose max-over-λ probability crosses a threshold (default 0.6; the MB error-control bound requires > 0.5).

  • GraphicalBootstrap — classic non-parametric (resample-with-replacement) bootstrap at a single λ. Returns the per-edge sampling distribution: mean, SD, [α/2, 1−α/2] quantile CIs, edge selection probability. This is the headline bootnet::bootnet(type="nonparametric") output used for edge error bars in network psychometrics.

Both work with single-population (GraphicalLasso / GraphicalMCP / GraphicalSCAD) and joint (JointGraphicalLasso / JointGraphicalMCP) base estimators. Family is auto-detected via the estimator’s lambda_2 / alpha init param; no manual switching.

Why

Field practice in network psychometrics is to fit a graphical lasso at one λ chosen by EBIC, then bootstrap the edges to plot error bars. R’s bootnet is the canonical package for this; its output (edge selection probability + CI on edge weight) is what psychometric papers reproduce. Until now, the Python toolchain (sklearn.covariance.GraphicalLasso) did not offer a clean bootstrap utility — users would either drop to R or hand-roll a resampling loop on top of sklearn fits.

GraphicalBootstrap matches bootnet’s output shape; users porting from bootnet can plug skein in as a drop-in replacement that also supports nonconvex edge penalties (MCP/SCAD) and joint estimation.

Compatibility

Both classes require raw observation data, not a precomputed covariance — bootstrap means we need to resample rows. Passing a square symmetric (p, p) matrix is rejected with a clear error message. For joint estimation, pass a list/tuple of per-population X^(k) arrays.

The bootstrap loop is embarrassingly parallel via joblib; pass n_jobs=-1. For fixed random_state the bootstrap indices are pre-generated, so n_jobs > 1 produces identical results to the serial fit.

API

class skein_glm.GraphicalStabilitySelection(base_estimator, lambdas, *, n_bootstraps=100, sample_fraction=0.5, threshold=0.6, random_state=None, n_jobs=None, active_eps=1e-06)[source]

Bases: BaseEstimator

Meinshausen–Bühlmann stability selection for graphical models.

Wraps a graphical estimator (single or joint) and runs subsample-bootstrap over a user-supplied λ-grid. Each bootstrap iteration draws a fraction of rows without replacement, refits at every λ in the grid, and records which off-diagonal entries of Θ̂ are nonzero. The fraction of bootstraps in which edge (i, j) was selected at λ_k is its selection probability; an edge is stable if its max-over-λ probability crosses threshold.

The MB (2010) error-control bound on expected false positives applies to threshold > 0.5.

Parameters:
  • base_estimator (Any) – Any single-population (GraphicalLasso/MCP/SCAD) or joint (JointGraphicalLasso/MCP) graphical estimator. The init params (gamma/a/edge_weights/…) are passed through to every refit; only the regularisation parameter (alpha or lambda_2) is overridden per-λ.

  • lambdas (array-like (n_lambdas,)) – Positive λ values to sweep. The order is preserved in the output but does not affect the result (each (bootstrap, λ) fit is independent).

  • n_bootstraps (int, default 100)

  • sample_fraction (float, default 0.5) – Subsample fraction (without replacement). MB recommend 0.5.

  • threshold (float, default 0.6) – Stable-edge probability cutoff; must be > 0.5 for the MB error-control bound.

  • random_state (int or None)

  • n_jobs (int or None) – Forwarded to joblib.Parallel. -1 uses all cores.

  • active_eps (float, default 1e-6) – Threshold for declaring |Θ_ij| nonzero. Looser than the sklearn default to absorb mild solver drift across subsample fits — graphical block-CD/ADMM converge to lower tolerance than coefficient solvers in practice.

selection_probabilities_

Single-pop: shape (n_lambdas, p, p). Joint: shape (n_lambdas, K, p, p). Symmetric in the last two axes, zero on the diagonal.

Type:

ndarray

max_probabilities_

Single-pop: shape (p, p). Joint: shape (K, p, p). Max over λ of the per-edge selection probability.

Type:

ndarray

stable_edges_

Single-pop: shape (n_stable, 2) of (i, j) pairs (upper triangular). Joint: list of K such arrays, one per population.

Type:

ndarray of int64

lambdas_
Type:

ndarray

n_features_in_
Type:

int

n_populations_
Type:

int (joint only)

Notes

Bootstrap requires raw observation data. Passing a precomputed covariance matrix is rejected with a clear error message.

Examples

>>> from skein_glm import (
...     GraphicalMCP, GraphicalStabilitySelection,
... )
>>> import numpy as np
>>> ss = GraphicalStabilitySelection(
...     base_estimator=GraphicalMCP(gamma=3.0),
...     lambdas=np.geomspace(0.5, 0.05, 8),
...     n_bootstraps=100, threshold=0.6, n_jobs=-1, random_state=0,
... )
>>> ss.fit(X)
>>> ss.stable_edges_      # shape (n_stable, 2)
mb_threshold(expected_false_positives=1.0)[source]

Required π_thr to control E[V] expected_false_positives.

Applies the Meinshausen–Bühlmann (2010) closed-form bound to this fitted stability-selection result. Uses p · (p 1)/2 as the family size (number of unique edges) and the average number of selected edges per (λ, bootstrap) cell as q_Λ.

Returns a stability-probability threshold in (0.5, 1]. If the requested error bound is infeasible at the observed q_Λ, raises ValueError with a diagnostic message.

See skein_glm.mb_stability_threshold() for the formula and reference.

Parameters:

expected_false_positives (float)

Return type:

float

class skein_glm.GraphicalBootstrap(base_estimator, *, n_bootstraps=200, alpha=0.05, random_state=None, n_jobs=None, active_eps=1e-06)[source]

Bases: BaseEstimator

Non-parametric bootstrap for graphical edge weights.

Resamples observations with replacement (same n), refits the graphical estimator at the given regularisation parameter, and records the bootstrap sampling distribution of every edge in Θ̂. Output statistics:

  • `mean_`, `std_` — per-edge bootstrap mean and standard deviation. The SD doubles as a Wald-style standard error.

  • `ci_lower_`, `ci_upper_` — per-edge quantile confidence bands at alpha/2 and 1 − alpha/2. The default alpha=0.05 gives the 2.5% / 97.5% percentile envelope typically plotted on bootnet stability diagrams.

  • `edge_selection_probabilities_` — fraction of bootstrap replicates in which the edge was active. This is what bootnet calls the “edge stability” output (distinct from the MB-style GraphicalStabilitySelection, which sweeps λ instead of resampling at a fixed λ).

Parameters:
  • base_estimator (Any) – A fully-configured graphical estimator. Its regularisation parameter (alpha for single, lambda_2 for joint) is used as-is at every bootstrap replicate; this class does not sweep λ.

  • n_bootstraps (int, default 200) – Number of bootstrap replicates.

  • alpha (float, default 0.05) – CI level — bands are [alpha/2, 1 - alpha/2] quantiles.

  • random_state (int or None)

  • n_jobs (int or None)

  • active_eps (float, default 1e-6)

precisions_

Single-pop: shape (B, p, p). Joint: shape (B, K, p, p). Full bootstrap stack of fitted precisions.

Type:

ndarray

mean_, std_, ci_lower_, ci_upper_

Single-pop: shape (p, p). Joint: shape (K, p, p).

Type:

ndarray

edge_selection_probabilities_

Same shape as mean_. Off-diagonal selection probability; diagonal is zero.

Type:

ndarray

n_features_in_
Type:

int

n_populations_
Type:

int (joint only)

fdr_threshold(fdr=0.1)[source]

Benjamini–Hochberg FDR on edges from this bootstrap fit.

Convenience wrapper around skein_glm.edge_fdr_threshold(). See that function for the returned dict shape.

Parameters:

fdr (float, default 0.1) – Target false discovery rate in (0, 1).

Return type:

dict

fwer_threshold(fwer=0.05, method='holm')[source]

Family-wise error control on edges from this bootstrap fit.

Convenience wrapper around skein_glm.edge_fwer_threshold().

Parameters:
  • fwer (float, default 0.05) – Target family-wise error rate in (0, 1).

  • method ({"bonferroni", "holm"}, default "holm") – Multiple-testing correction method.

Return type:

dict

Example

import numpy as np
from skein_glm import GraphicalMCP, GraphicalStabilitySelection

rng = np.random.default_rng(0)
n, p = 200, 15
# Sparse precision: a chain Θ_{i, i+1} = -0.5, Θ_ii = 1.
Theta = np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = -0.4
Sigma = np.linalg.inv(Theta)
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T

ss = GraphicalStabilitySelection(
    base_estimator=GraphicalMCP(gamma=3.0),
    lambdas=np.geomspace(0.5, 0.05, 8),
    n_bootstraps=100, threshold=0.6, n_jobs=-1, random_state=0,
).fit(X)
print(ss.stable_edges_)  # (n_stable, 2) — upper-triangular (i, j) pairs

References

  • Meinshausen, N. and Bühlmann, P. (2010). Stability selection. JRSS B 72(4): 417–473.

  • van Borkulo, C. D. et al. (2017). Network analysis of multivariate data in psychological science. Nature Reviews Methods Primers 2:60. (The methodology behind R’s bootnet.)

  • Friedman, J., Hastie, T., Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3): 432–441.