At-scale benchmarks (H1)
Headline cells in benches/v2/ historically topped out at
medium (n=10k, p=1k), with large (n=50k, p=5k) populated only
for the Lasso/LS family and xlarge (n=100k, p=10k) marked
“opt-in only.” H1 (post-v1.0 roadmap) closes that gap by promoting
xlarge into the headline matrix and adding a per-PR canary so
regressions at the size that actually matters to users — n on the
order of 10⁵ — surface before they ship.
What runs where
| Tier | n × p | Headline scenarios | Comparators | Cadence |
|---|---|---|---|---|
| small | 1 000 × 200 | every scenario | full set (Python + R) | per release |
| medium | 10 000 × 1 000 | every scenario | full set (Python + R) | per release |
| large | 50 000 × 5 000 | | full set (Python + R) | per release |
| xlarge | 100 000 × 10 000 | | reduced (see below) | maintainer overnight |
| large (per-PR canary) | 50 000 × 5 000 | | skein only, | every PR (bench-smoke) |
The per-PR canary uses the existing large tier (not xlarge)
because a single xlarge cell at n=100 000 × p=10 000 with one
warmup + one timed trial still exceeds the 15-minute CI budget
once the release maturin build is included. The large canary
catches the same class of regression (memory-bandwidth wall,
strong-rule misfire at scale) at a budget that fits on the free
ubuntu-latest runner.
Comparator asymmetry at xlarge
R comparators and sklearn.linear_model.coordinate_descent are
dropped at xlarge because they hit memory or wall-clock ceilings
before n=100 000 × p=10 000 dense. The exclusions are mechanical,
not philosophical, and are captured in paper/manifest.json under
at_scale_comparator_gap so downstream paper figures can flag the
gap rather than silently dropping the comparator. Summary:
| Scenario | Included at xlarge | Excluded, and why |
|---|---|---|
| | skein, celer, skglm | |
| | skein, skglm | |
| | skein | |
| | skein | |
When a future paper figure consumes the xlarge aggregates,
flag the missing comparators rather than drop them silently —
a “no R bar shown at xlarge” is information about the comparator,
not about skein.
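One way to make that flag mechanical is to derive the missing comparators per scenario and attach them to the figure caption. The sketch below is illustrative only: the scenario ids, the contents of FULL_SET, and the shape of the inclusion map are placeholders, not the actual schema of at_scale_comparator_gap in paper/manifest.json.

```python
# Hypothetical full comparator set; "sklearn" and "r" stand in for the
# Python and R comparators that are dropped at xlarge.
FULL_SET = ["skein", "celer", "skglm", "sklearn", "r"]


def missing_comparators(included, full_set):
    """Return the comparators in full_set that are absent from included."""
    present = set(included)
    return [name for name in full_set if name not in present]


def gap_annotations(inclusion_map, full_set):
    """Map each scenario to a caption fragment naming comparators not run."""
    notes = {}
    for scenario, included in inclusion_map.items():
        missing = missing_comparators(included, full_set)
        if missing:
            notes[scenario] = "not run at xlarge: " + ", ".join(missing)
    return notes


# Placeholder scenario ids; the inclusion lists mirror the table above.
example_map = {
    "scenario_a": ["skein", "celer", "skglm"],
    "scenario_b": ["skein"],
}
for scenario, note in gap_annotations(example_map, FULL_SET).items():
    print(f"{scenario}: {note}")
```

Emitting the note as text, rather than dropping the bar, keeps the gap visible to whoever reads the figure.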
Reproducing the xlarge matrix
Maintainer-overnight, on a machine with ≥32 GB RAM:
```bash
# Release build with BLAS: under the dev profile or a BLAS-less release,
# `xlarge` falls back to ndarray's pure-Rust matvec and costs ~3× wall-clock.
pip install -e '.[bench]'

# Run the one line that matches your platform.
maturin develop --release --features=blas-accelerate   # macOS
maturin develop --release --features=blas-openblas     # Linux

# Drive only the `xlarge` cells: the inline Python prints one target path
# per scenario, regime, and seed.
cd benches/v2
snakemake --profile profiles/m1-headline \
  $(python -c "
import yaml
cfg = yaml.safe_load(open('config.yaml'))
ids = sorted({s['id'] for s in cfg['headline']
              if 'xlarge' in s['sizes']})
for sid in ids:
    for regime in ('dense', 'sparse'):
        for seed in range(5):
            print(f'results/cells/{sid}__xlarge__{regime}__seed{seed}__skein.jsonl')
")
```
Wall-clock budget: the M13.6 lasso_ls_scaling characterization put
the per-fit time at ~500 s for skein on an Apple M1 + Accelerate at
n=100k × p=10k dense regime. With 1 warmup + 5 timed trials × 2
regimes × 5 seeds × 4 scenarios × ~3 packages on average, the
xlarge matrix is roughly 10–12 hours of wall-clock on a
single laptop. Linux + OpenBLAS should land in the same order of
magnitude.
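The target list that the snakemake command above builds inline can also be produced by a small helper, which is easier to inspect before committing to an overnight run. This is a minimal sketch: the function name, its parameters, and the example config are assumptions made for illustration; only the path pattern and the headline / id / sizes keys come from the command itself.

```python
def xlarge_targets(cfg, regimes, n_seeds=5, package="skein", size="xlarge"):
    """List Snakemake target paths for every headline scenario that declares `size`."""
    ids = sorted({s["id"] for s in cfg["headline"] if size in s["sizes"]})
    return [
        f"results/cells/{sid}__{size}__{regime}__seed{seed}__{package}.jsonl"
        for sid in ids
        for regime in regimes
        for seed in range(n_seeds)
    ]


# Placeholder config: the scenario ids are made up for illustration.
example_cfg = {
    "headline": [
        {"id": "scenario_a", "sizes": ["small", "medium", "xlarge"]},
        {"id": "scenario_b", "sizes": ["small", "medium"]},
    ]
}

targets = xlarge_targets(example_cfg, regimes=("sparse",), n_seeds=2)
print("\n".join(targets))
```

Counting the returned list before launching gives the number of cells the run will schedule for one package.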
R-anchor fixtures at scale
tests/fixtures/generate.R ships *_large problems for
LS + logistic Lasso/MCP. The default size is n=5 000, p=500:
10× the rows and 5× the columns of the M14c.3 mid-tier (n=500, p=100),
which is large enough to catch scale-dependent regressions (Phase 2.3
strong-rule misfire, LLA local-min drift at high p) while keeping
each raw JSON at ~25 MB.
The roadmap's aspirational size is n=50 000, p=2 000, which produces ~800 MB of JSON per fixture. Override the defaults via environment variables when regenerating on a machine with adequate RAM:

```bash
SKEIN_FIXTURE_LARGE_N=50000 SKEIN_FIXTURE_LARGE_P=2000 \
  Rscript tests/fixtures/generate.R
```
Like the mid-tier, *_large.json files are never committed.
The Python tests use _skipped_if_missing_optional so CI silently
skips them; the maintainer regenerates locally when gating a
scale-dependent regression. See
tests/test_r_regression.py::test_*_large_* for the four
*_large test functions.
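The skip-when-absent behaviour can be sketched as a decorator that checks for the fixture file at call time. This is not the project's _skipped_if_missing_optional helper, whose implementation is not shown here; the decorator name and the fixture path below are hypothetical, and the sketch uses the standard library's unittest.SkipTest so that it runs without extra dependencies.

```python
import functools
import unittest
from pathlib import Path


def skip_if_fixture_missing(path):
    """Decorator: skip the wrapped test when the fixture file does not exist."""
    fixture = Path(path)

    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            # Check at call time so a fixture generated after import is still seen.
            if not fixture.is_file():
                raise unittest.SkipTest(f"fixture not generated: {fixture}")
            return test_fn(*args, **kwargs)

        return wrapper

    return decorator


# Hypothetical fixture name, for illustration only.
@skip_if_fixture_missing("tests/fixtures/example_large.json")
def test_example_large():
    # A real test would load the JSON and compare against the R reference.
    return "ran"
```

The trade-off is the one the text describes: CI stays green without the large files, so the check only gates a change when a maintainer has regenerated them locally.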
Per-PR canary
.github/workflows/bench-smoke.yml has two parallel jobs:
- smoke: two small/sparse cells under the dev maturin profile, ~1 minute total. Catches pipeline breakage (renamed runner, broken Snakemake rule, missing module export).
- smoke-at-scale: one large/sparse cell under release maturin + OpenBLAS, --trials 1. Target ≤15 minutes; emits a workflow warning if the cell itself exceeds 10 minutes wall-clock.
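The 10-minute warning could be produced by timing the cell and printing a GitHub Actions warning command to stdout, which the runner turns into an annotation without failing the job. The sketch below assumes that approach; the function names and the message wording are illustrative, and the actual workflow may implement the check differently.

```python
import time

WARN_AFTER_S = 10 * 60  # 10-minute threshold stated for smoke-at-scale


def warning_line(elapsed_s, threshold_s=WARN_AFTER_S):
    """Return a GitHub Actions warning command if the cell ran too long, else None."""
    if elapsed_s <= threshold_s:
        return None
    return (
        f"::warning title=smoke-at-scale slow::cell took {elapsed_s:.0f} s "
        f"(threshold {threshold_s:.0f} s)"
    )


def timed(run_cell):
    """Run the callable and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    run_cell()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in for the real benchmark cell.
    elapsed = timed(lambda: time.sleep(0.01))
    line = warning_line(elapsed)
    if line:
        print(line)
```

A warning rather than a hard failure keeps a slow shared runner from blocking a PR while still leaving a visible signal on the run.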