At-scale benchmarks (H1)

Headline cells in benches/v2/ historically topped out at medium (n=10k, p=1k), with large (n=50k, p=5k) populated only for the Lasso/LS family and xlarge (n=100k, p=10k) marked “opt-in only.” H1 (post-v1.0 roadmap) closes that gap by promoting xlarge into the headline matrix and adding a per-PR canary so regressions at the size that actually matters to users — n on the order of 10⁵ — surface before they ship.

What runs where

| Tier | n × p | Headline scenarios | Comparators | Cadence |
|---|---|---|---|---|
| small | 1 000 × 200 | every scenario | full set (Python + R) | per release |
| medium | 10 000 × 1 000 | every scenario | full set (Python + R) | per release |
| large | 50 000 × 5 000 | ls_lasso, ls_mcp, ls_scad, ls_elasticnet | full set (Python + R) | per release |
| xlarge (H1) | 100 000 × 10 000 | ls_lasso, ls_mcp, logistic_lasso, ls_group_lasso | reduced (see below) | maintainer overnight |
| large (per-PR canary) | 50 000 × 5 000 | ls_lasso / sparse / seed 0 / skein | skein only, --trials 1 | every PR (bench-smoke) |

The per-PR canary uses the existing large tier (not xlarge) because a single xlarge cell at n=100 000 × p=10 000 with one warmup + one timed trial still exceeds the 15-minute CI budget once the release maturin build is included. The large canary catches the same class of regression (memory-bandwidth wall, strong-rule misfire at scale) at a budget that fits on the free ubuntu-latest runner.
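The budget argument can be checked with rough arithmetic. The sketch below assumes the ~500 s per-fit figure quoted later on this page for skein at xlarge; that number was measured on an Apple M1, so it is only a proxy for the CI runner, and the function name is illustrative.

```python
# Rough per-PR budget check for a single benchmark cell.
# PER_FIT_S is the M1 + Accelerate figure quoted on this page for xlarge;
# it is a proxy, not a measurement taken on ubuntu-latest.
CI_BUDGET_S = 15 * 60
PER_FIT_S = 500


def cell_seconds(per_fit_s: float, warmups: int = 1, trials: int = 1) -> float:
    """Wall-clock for one cell: every warmup and timed trial is a full fit."""
    return per_fit_s * (warmups + trials)


xlarge_cell = cell_seconds(PER_FIT_S)
print(f"xlarge cell: {xlarge_cell / 60:.1f} min "
      f"(budget {CI_BUDGET_S / 60:.0f} min, build time not yet included)")
print("fits in budget:", xlarge_cell <= CI_BUDGET_S)
```

If the M1 figure carried over unchanged, one warmup plus one timed trial would already take about 16.7 minutes before any build time is added, which is why the canary stays on the large tier.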

Comparator asymmetry at xlarge

R comparators and sklearn.linear_model.coordinate_descent are dropped at xlarge because they hit memory or wall-clock ceilings before n=100 000 × p=10 000 dense. The exclusions are mechanical, not philosophical, and are captured in paper/manifest.json under at_scale_comparator_gap so downstream paper figures can flag the gap rather than silently dropping the comparator. Summary:

| Scenario | Included at xlarge | Excluded (and why) |
|---|---|---|
| ls_lasso | skein, celer, skglm | sklearn (CD OOM on dense 8 GB X); glmnet (32-bit nlam × nvar index space) |
| ls_mcp | skein, skglm | ncvreg (p × p intermediate ≈ 800 MB at p=10k) |
| logistic_lasso | skein | glmnet (binomial path exceeds per-cell wall ≈ 1 h) |
| ls_group_lasso | skein | grpreg (Fortran core copies X, peak RSS ≈ 3× X size) |

When a future paper figure consumes the xlarge aggregates, flag the missing comparators rather than drop them silently — a “no R bar shown at xlarge” is information about the comparator, not about skein.
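The memory figures in the exclusion table follow from simple size arithmetic. The sketch below reproduces them, assuming float64 storage (8 bytes per value) and decimal units (1 GB = 10⁹ bytes); the helper name is illustrative.

```python
# Back-of-the-envelope memory footprints behind the exclusion table.
# Assumes float64 (8 bytes per value) and decimal units (1 GB = 1e9 bytes).
BYTES_PER_FLOAT64 = 8


def dense_bytes(rows: int, cols: int) -> int:
    """Size in bytes of a dense rows x cols float64 array."""
    return rows * cols * BYTES_PER_FLOAT64


n, p = 100_000, 10_000
x_bytes = dense_bytes(n, p)    # design matrix X
pp_bytes = dense_bytes(p, p)   # a p x p intermediate (ncvreg row)
copy_peak = 3 * x_bytes        # "peak RSS ~ 3x X size" (grpreg row)

print(f"X:     {x_bytes / 1e9:.0f} GB")
print(f"p x p: {pp_bytes / 1e6:.0f} MB")
print(f"3x X:  {copy_peak / 1e9:.0f} GB")
```

This prints 8 GB, 800 MB and 24 GB respectively, matching the table entries.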

Reproducing the xlarge matrix

Maintainer-overnight, on a machine with ≥32 GB RAM:

# Release build with BLAS. Running `xlarge` under the dev profile or a
# BLAS-less release build falls back to ndarray's pure-Rust matvec and
# costs ~3× wall-clock.
pip install -e '.[bench]'
# Run exactly one of the following, matching the host OS:
maturin develop --release --features=blas-accelerate    # macOS
maturin develop --release --features=blas-openblas      # Linux

# Drive only the `xlarge` cells:
cd benches/v2
snakemake --profile profiles/m1-headline \
  $(python -c "
import yaml
cfg = yaml.safe_load(open('config.yaml'))
ids = sorted({s['id'] for s in cfg['headline']
              if 'xlarge' in s['sizes']})
for sid in ids:
    for regime in ('deep', 'sparse'):
        for seed in range(5):
            print(f'results/cells/{sid}__xlarge__{regime}__seed{seed}__skein.jsonl')
")
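To inspect the target list before handing it to Snakemake, the inline generator can be pulled out into a small function. This is a sketch under the same assumption the inline snippet makes about config.yaml (a `headline` list whose entries carry `id` and `sizes`); the `sample_cfg` contents and the function name are illustrative, not the real config.

```python
# Stand-alone version of the inline target generator, for inspection.
# The config shape mirrors what the inline snippet reads from config.yaml;
# sample_cfg below is a made-up stand-in for the real file.
def xlarge_targets(cfg, regimes=("deep", "sparse"), n_seeds=5, package="skein"):
    ids = sorted({s["id"] for s in cfg["headline"] if "xlarge" in s["sizes"]})
    return [
        f"results/cells/{sid}__xlarge__{regime}__seed{seed}__{package}.jsonl"
        for sid in ids
        for regime in regimes
        for seed in range(n_seeds)
    ]


sample_cfg = {
    "headline": [
        {"id": "ls_lasso", "sizes": ["small", "medium", "large", "xlarge"]},
        {"id": "ls_mcp", "sizes": ["small", "medium", "large", "xlarge"]},
        {"id": "logistic_lasso", "sizes": ["small", "medium", "xlarge"]},
        {"id": "ls_group_lasso", "sizes": ["small", "medium", "xlarge"]},
        {"id": "ls_scad", "sizes": ["small", "medium", "large"]},
    ]
}
targets = xlarge_targets(sample_cfg)
print(len(targets))  # 4 scenarios x 2 regimes x 5 seeds
print(targets[0])
```

With four xlarge scenarios this yields 40 skein cells. Note that the command above only requests the `__skein` targets; comparator cells would need their own target names.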

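Wall-clock budget: the M13.6 lasso_ls_scaling characterization put the per-fit time at ~500 s for skein on an Apple M1 + Accelerate at n=100k × p=10k with dense X. At that rate the 40 skein cells the command above drives (4 scenarios × 2 regimes × 5 seeds) take 40 × 2 × 500 s ≈ 11 hours with one warmup + one timed trial each, which matches the roughly 10–12 hours of wall-clock budgeted for a single laptop. The full protocol of 1 warmup + 5 timed trials triples that to about 33 hours for skein alone, and the comparator cells (1–3 packages per scenario at xlarge, 7 scenario-package pairs in total) add to it, so plan more than one night for the complete matrix. Linux + OpenBLAS should land in the same order of magnitude.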

R-anchor fixtures at scale

tests/fixtures/generate.R ships *_large problems for LS + logistic Lasso/MCP. The default size is n=5 000, p=500, a 10× increase in n (5× in p) over the M14c.3 mid-tier (n=500, p=100). That is large enough to catch scale-dependent regressions (Phase 2.3 strong-rule misfire, LLA local-min drift at high p) while keeping each JSON at ~25 MB raw.

The roadmap's aspirational size is n=50 000, p=2 000, which produces ~800 MB of JSON per fixture. Override the defaults via environment variables when regenerating on a machine with adequate RAM:

SKEIN_FIXTURE_LARGE_N=50000 SKEIN_FIXTURE_LARGE_P=2000 \
  Rscript tests/fixtures/generate.R
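Before overriding the size, it can help to estimate how large the resulting JSON will be. The sketch below assumes the n × p design matrix dominates the file and that each serialized value costs roughly 8 to 10 characters; both are assumptions chosen to bracket the ~25 MB and ~800 MB figures quoted here, not properties read from generate.R.

```python
# Rough JSON size estimate for an R-anchor fixture.
# Assumes the n x p matrix dominates and each value serializes to about
# 8-10 characters (digits plus separator); both are illustrative guesses.
def fixture_mb(n: int, p: int, chars_per_value: float) -> float:
    """Approximate fixture size in MB (1 MB = 1e6 bytes)."""
    return n * p * chars_per_value / 1e6


for label, n, p in [("default", 5_000, 500), ("roadmap", 50_000, 2_000)]:
    low, high = fixture_mb(n, p, 8), fixture_mb(n, p, 10)
    print(f"{label}: n={n}, p={p} -> ~{low:.0f}-{high:.0f} MB")
```

Under these assumptions the default size lands at about 20 to 25 MB and the roadmap size at about 800 to 1000 MB.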

Like the mid-tier, *_large.json files are never committed. The Python tests use _skipped_if_missing_optional so CI silently skips them; the maintainer regenerates locally when gating a scale-dependent regression. See tests/test_r_regression.py::test_*_large_* for the four *_large test functions.
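The skip-when-absent behaviour can be pictured with a small loader. This is a hypothetical sketch, not the project's `_skipped_if_missing_optional` helper: the function name, the default directory, and the use of `unittest.SkipTest` are all assumptions made for illustration.

```python
import json
import unittest
from pathlib import Path

# Assumed fixture location, taken from the paths mentioned on this page.
FIXTURE_DIR = Path("tests/fixtures")


def load_optional_fixture(name: str, fixture_dir: Path = FIXTURE_DIR) -> dict:
    """Return the parsed fixture, or skip the calling test if it is absent."""
    path = fixture_dir / f"{name}.json"
    if not path.exists():
        # Both unittest and pytest report SkipTest as a skip, not a failure.
        raise unittest.SkipTest(f"optional fixture {path} not generated")
    with path.open() as fh:
        return json.load(fh)
```

A test that calls such a loader first runs normally on a maintainer machine where the file has been regenerated and is reported as skipped everywhere else.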

Per-PR canary

.github/workflows/bench-smoke.yml has two parallel jobs:

  • smoke — two small/sparse cells under dev maturin profile, ~1 minute total. Catches pipeline breakage (renamed runner, broken Snakemake rule, missing module export).

  • smoke-at-scale — one large/sparse cell under release maturin + OpenBLAS, --trials 1. Target ≤15 minutes; emits a workflow warning if the cell itself exceeds 10 minutes wall-clock.
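The 10-minute warning can be emitted with a GitHub Actions workflow command, which turns a specially formatted stdout line into an annotation. The sketch below shows one way to do that; the function name, the annotation title, and where the timing is taken are assumptions, not the contents of bench-smoke.yml.

```python
import time

WARN_AFTER_S = 10 * 60  # warning threshold stated for the at-scale cell


def timing_annotation(elapsed_s: float, warn_after_s: float = WARN_AFTER_S):
    """Return a GitHub Actions warning command if the cell ran long, else None."""
    if elapsed_s <= warn_after_s:
        return None
    return (f"::warning title=bench-smoke::at-scale cell took "
            f"{elapsed_s / 60:.1f} min (threshold {warn_after_s / 60:.0f} min)")


start = time.monotonic()
# ... run the benchmark cell here ...
message = timing_annotation(time.monotonic() - start)
if message:
    print(message)  # Actions renders lines of this form as warnings
```

A warning keeps the job green, so a slow cell is visible on the PR without blocking it; only the 15-minute job limit fails the run.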