Design-matrix helpers
Python helper classes for the out-of-RAM backends. Pass one in place of a NumPy array when fitting:
design = skein_glm.MmapDesignF64("X.bin", n_rows=n, n_cols=p)
model = skein_glm.MCPPathRegressor(...).fit(design, y)
Estimators that support the mmap and chunked backends check
isinstance(x, ...) and route to the corresponding _mmap / _chunked
PyO3 entry points. In v1, only MCPPathRegressor and
LogisticMCPPathRegressor have this support. Other estimators raise a
clear error if handed a Mmap* or Chunked* design; expanding coverage
is mechanical and tracked on the M4.x roadmap.
See Concepts: Backends for the storage model and when to use each helper.
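The isinstance-based routing described above can be pictured with a small sketch. This is not skein_glm's actual code: the stand-in classes `MmapDesign` and `ChunkedDesign`, the function `pick_backend`, and the returned labels are all hypothetical names chosen for illustration.

```python
# Hypothetical stand-ins for the real helper classes; only the
# constructor arguments mirror the documented signatures.
class MmapDesign:
    def __init__(self, path, n_rows, n_cols):
        self.path, self.n_rows, self.n_cols = path, n_rows, n_cols


class ChunkedDesign:
    def __init__(self, chunks, n_cols):
        self.chunks, self.n_cols = chunks, n_cols


def pick_backend(x):
    """Return a label for the code path a design object would take."""
    if isinstance(x, MmapDesign):
        return "mmap"
    if isinstance(x, ChunkedDesign):
        return "chunked"
    # Anything else is treated as an ordinary in-memory array here.
    return "in_memory"


mmap_route = pick_backend(MmapDesign("X.bin", n_rows=10, n_cols=2))
chunked_route = pick_backend(ChunkedDesign([("chunk_0.bin", 10)], n_cols=2))
array_route = pick_backend([[1.0, 2.0], [3.0, 4.0]])
```

An estimator without out-of-RAM support would, in the same spirit, raise an error in the first two branches instead of returning a label.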
Memory-mapped (single file)
- class skein_glm.mmap.MmapDesignF64(path, n_rows, n_cols)
Bases: object
Reference to an on-disk column-major f64 matrix.
The constructor validates dimensions against file size; it does not open the mapping (the Rust side mmaps lazily inside each _mmap solve).
- Parameters:
path (str | os.PathLike)
n_rows (int)
n_cols (int)
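A minimal sketch of producing such a file with NumPy and running the same kind of size check up front. It assumes the file is headerless, column-major, little-endian f64, which this page does not spell out for the single-file case; the temporary path and the tiny matrix are illustrative only.

```python
import os
import tempfile

import numpy as np

# Small in-RAM matrix standing in for a real design (illustrative data).
n, p = 4, 3
X = np.arange(n * p, dtype=np.float64).reshape(n, p)

path = os.path.join(tempfile.mkdtemp(), "X.bin")

# ndarray.tofile always writes in C order, so writing the transpose
# lays the columns of X out one after another (column-major).
# "<f8" pins little-endian f64 (an assumption about the expected format).
X.T.astype("<f8").tofile(path)

# Dimension check against file size: 8 bytes per f64 element.
expected_bytes = n * p * 8
actual_bytes = os.path.getsize(path)

# Read back through a Fortran-order memmap to confirm the layout.
X_back = np.memmap(path, dtype="<f8", mode="r", shape=(n, p), order="F")
layout_ok = bool(np.array_equal(X_back, X))

# The file could then be handed to the helper as documented above:
# design = skein_glm.MmapDesignF64(path, n_rows=n, n_cols=p)
```

Writing `X.T` rather than `X` matters: `X.tofile(path)` would produce a row-major file of the same size, so the size check alone cannot catch a layout mistake.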
- class skein_glm.mmap.MmapDesignF32(path, n_rows, n_cols)
Bases: object
Reference to an on-disk column-major f32 matrix.
Stores half the bytes per element of MmapDesignF64; the solver converts f32→f64 on each column read. Results match the f64 path up to f32 rounding error (~1e-7 relative).
- Parameters:
path (str | os.PathLike)
n_rows (int)
n_cols (int)
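The size of the f32 storage error can be checked directly in NumPy. The sketch below round-trips an illustrative column through f32, independent of skein_glm, and compares the worst relative error with f32 machine epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative column; values kept away from zero so that the
# relative error is well defined.
col_f64 = rng.uniform(0.5, 2.0, size=100_000)

# Narrow to f32 (what an f32 file would hold), then widen back to f64
# (what the solver would see after conversion).
col_f32 = col_f64.astype(np.float32)
col_roundtrip = col_f32.astype(np.float64)

max_rel_err = float(np.max(np.abs(col_roundtrip - col_f64) / np.abs(col_f64)))
f32_eps = float(np.finfo(np.float32).eps)  # about 1.19e-7

# Storage cost: f32 uses half the bytes of f64 for the same column.
bytes_ratio = col_f64.nbytes / col_f32.nbytes
```

The per-element error stays below `f32_eps`; how it propagates into fitted coefficients depends on the conditioning of the problem, which this sketch does not measure.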
Row-block-chunked (multiple files)
- class skein_glm.mmap.ChunkedDesignF64(chunks, n_cols)
Bases: object
Reference to a row-block-chunked on-disk f64 matrix.
Each chunk is a separate column-major raw f64 file with the same n_cols. The solver streams chunk-by-chunk, accumulating Σ_chunks X_chunk[:, j]ᵀ v_chunk for each col_dot call.
Use this when n_rows is too large for a single mmap (or your data pipeline already produces shards). Construct it from a list of (path, n_rows_per_chunk) tuples; the total n_rows is the sum.
>>> chunks = [("chunk_0.bin", 10_000_000),
...           ("chunk_1.bin", 10_000_000),
...           ("chunk_2.bin", 7_345_678)]
>>> design = skein_glm.ChunkedDesignF64(chunks, n_cols=50_000)
>>> model = skein_glm.MCPPathRegressor(...).fit(design, y)
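The chunk-by-chunk accumulation Σ_chunks X_chunk[:, j]ᵀ v_chunk can be illustrated in NumPy. This is a sketch of the arithmetic only, not of the Rust solver: in-memory row slices stand in for chunk files, and `col_dot_chunked` and the shard sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_cols = 5
chunk_rows = [4, 4, 3]  # hypothetical shard sizes; total n_rows = 11
n_rows = sum(chunk_rows)

X = rng.standard_normal((n_rows, n_cols))
v = rng.standard_normal(n_rows)

# Row offsets of each chunk: [0, 4, 8, 11].
bounds = np.cumsum([0] + chunk_rows)


def col_dot_chunked(j):
    """Accumulate X[:, j] . v one row block at a time."""
    total = 0.0
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Only this block's rows of column j and of v are touched.
        total += float(X[start:stop, j] @ v[start:stop])
    return total


j = 2
chunked = col_dot_chunked(j)
full = float(X[:, j] @ v)
```

The two results agree up to floating-point summation order, which is why row-block sharding does not change what the solver computes.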
- class skein_glm.mmap.ChunkedDesignF32(chunks, n_cols)
Bases: object
Reference to a row-block-chunked on-disk f32 matrix.
Same as ChunkedDesignF64, but each chunk holds 4-byte values. This halves the disk footprint and page-cache pressure, at the cost of the same f32→f64 cast on each column read as MmapDesignF32.