Graph model selection¶
Tuning rules for graphical models. The headline export is ebic_path, which
selects a model by the field-standard Extended Bayesian Information Criterion
(Foygel & Drton 2010) for single-population graphical lasso. The joint
analogue, joint_ebic_path, sums per-population log-likelihoods and counts the
union of active edges across populations.
EBIC¶
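The EBIC formula for a single-population precision estimate Θ̂(α) is, in the form given by Foygel & Drton (2010),
\[ \mathrm{EBIC}_\gamma(\alpha) = -2\,\ell\big(\hat\Theta(\alpha)\big) + |\hat E(\alpha)| \log n + 4\gamma\,|\hat E(\alpha)| \log p \]
where ℓ is the Gaussian log-likelihood, n is the sample size, p is the number of variables,
|Ê(α)| is the number of nonzero off-diagonal entries and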
γ ∈ [0, 1] controls the strength of the EBIC correction. γ = 0
gives plain BIC. γ = 0.5 is the field default for graphical models
and the value bootnet / qgraph use out of the box.
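As a rough illustration of how γ changes the score, the sketch below evaluates the criterion from an already computed log-likelihood and edge count. It assumes the Foygel & Drton (2010) form, −2ℓ + |E| log n + 4γ|E| log p; the function name `ebic_score` and the numbers are hypothetical and are not part of skein_glm.

```python
import math


def ebic_score(loglik, n_edges, n, p, gamma=0.5):
    """EBIC in the Foygel & Drton (2010) form; lower is better.

    Hypothetical helper for illustration, not a skein_glm function.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    bic_penalty = n_edges * math.log(n)                   # ordinary BIC term
    extra_penalty = 4.0 * gamma * n_edges * math.log(p)   # EBIC correction
    return -2.0 * loglik + bic_penalty + extra_penalty


# Score the same (made-up) fit with gamma = 0 (plain BIC) and gamma = 0.5.
bic = ebic_score(loglik=-250.0, n_edges=12, n=200, p=20, gamma=0.0)
ebic = ebic_score(loglik=-250.0, n_edges=12, n=200, p=20, gamma=0.5)
```

With γ = 0 the correction term vanishes and the score is plain BIC; a larger γ charges each edge more, so sparser graphs are favoured as p grows.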
- skein_glm.graph_selection.ebic_path(X, estimator_cls, lambdas, *, gamma=0.5, n=None, assume_centered=False, **estimator_kwargs)[source]¶
Sweep a λ-grid for a graphical estimator and pick the model minimising EBIC.
- Parameters:
  - X (ndarray) – Either raw (n, p) data or a precomputed (p, p) symmetric covariance. If precomputed, pass n explicitly.
  - estimator_cls (a class with an alpha parameter) – GraphicalLasso, GraphicalMCP, GraphicalSCAD.
  - lambdas (array-like of floats) – Positive, ideally in descending order.
  - gamma (float, default 0.5) – EBIC strength. 0 → BIC.
  - n (int, optional) – Effective sample size. Required if X is precomputed covariance.
  - **estimator_kwargs – Passed through to the estimator constructor.
- Return type: EBICPathResult
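The sweep described above can be pictured as a loop over the λ-grid that scores each fit and keeps the minimiser. The following is a minimal sketch of that selection logic only, not the library's implementation: `select_by_ebic` and the `fit` callable (which returns a log-likelihood and an edge count for a given λ) are hypothetical stand-ins, and the toy path values are invented.

```python
import math


def select_by_ebic(lambdas, fit, n, p, gamma=0.5):
    """Return (best_lambda, scores), where scores[i] is the EBIC at lambdas[i].

    `fit` is a hypothetical callable: fit(lam) -> (loglik, n_edges).
    """
    scores = []
    for lam in lambdas:
        loglik, n_edges = fit(lam)
        # Per-edge cost: BIC term plus the gamma-weighted EBIC correction.
        penalty = n_edges * (math.log(n) + 4.0 * gamma * math.log(p))
        scores.append(-2.0 * loglik + penalty)
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return lambdas[best_index], scores


# Toy stand-in for a fitted path: lambda -> (loglik, n_edges), values invented.
toy_path = {0.5: (-320.0, 2), 0.2: (-290.0, 6), 0.05: (-285.0, 30)}
best_lambda, scores = select_by_ebic(
    [0.5, 0.2, 0.05], toy_path.__getitem__, n=100, p=10
)
```

In this toy path the densest model has the highest likelihood, but its 30 edges cost more in penalty than they gain in fit, so the middle λ wins.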
- class skein_glm.graph_selection.EBICPathResult(best_estimator, best_lambda, best_ebic, lambdas, ebic, n_edges)[source]¶
Bases: object
Outcome of a single-population EBIC search.
- Parameters:
- best_estimator¶
- Type: fitted estimator at the EBIC-selected λ.
- lambdas¶
- Type: ndarray (n_lambdas,)
- ebic¶
- Type: ndarray (n_lambdas,)
- n_edges¶
- Type: ndarray (n_lambdas,) int
- skein_glm.graph_selection.joint_ebic_path(Xs, estimator_cls, lambdas, *, gamma=0.5, ns=None, assume_centered=False, **estimator_kwargs)[source]¶
EBIC tuner for joint glasso. Walks a λ_2 grid; the active edge count is the union across populations (an edge counts once if any population has it nonzero).
The log-likelihood term is the sum of the per-population log-likelihoods. The per-population sample size n_k is read from the row count of the raw X^(k), or from the ns argument when populations are passed as precomputed covariances.
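The union rule for the edge count can be sketched in a few lines. This is an illustration under assumptions rather than the library's code: `union_edge_count` is a hypothetical helper, precision matrices are given as nested lists, each unordered pair (i, j) with i < j is counted once, and the per-population log-likelihood values are invented.

```python
def union_edge_count(precisions, tol=0.0):
    """Count edges (i < j) that are nonzero in at least one population.

    Hypothetical helper; `precisions` is a list of p x p nested lists.
    """
    p = len(precisions[0])
    count = 0
    for i in range(p):
        for j in range(i + 1, p):
            if any(abs(theta[i][j]) > tol for theta in precisions):
                count += 1  # counted once, however many populations share it
    return count


theta_a = [[1.0, 0.3, 0.0],
           [0.3, 1.0, 0.0],
           [0.0, 0.0, 1.0]]
theta_b = [[1.0, 0.2, 0.0],
           [0.2, 1.0, 0.4],
           [0.0, 0.4, 1.0]]

n_union = union_edge_count([theta_a, theta_b])

# The joint log-likelihood is the plain sum over populations (values invented).
joint_loglik = sum([-140.0, -165.5])
```

Edge (0, 1) appears in both populations and edge (1, 2) only in the second, so the union holds two edges rather than three.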