Design-matrix helpers

Python helper classes for the out-of-RAM backends. Pass one in place of a NumPy array when fitting:

design = skein_glm.MmapDesignF64("X.bin", n_rows=n, n_cols=p)
model = skein_glm.MCPPathRegressor(...).fit(design, y)

Estimators with mmap / chunked support check isinstance(x, ...) on the design and route the fit to the corresponding _mmap / _chunked PyO3 entry point. v1 estimator coverage: MCPPathRegressor and LogisticMCPPathRegressor. Other estimators raise a clear error if handed a Mmap* or Chunked* design; expanding coverage is mechanical and tracked on the M4.x roadmap.
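
The routing described above can be pictured with a small stand-alone sketch. Everything in it is hypothetical: MmapDesign, ChunkedDesign and pick_backend are stand-ins written for illustration, not skein_glm classes or functions, and the TypeError is an assumption about what a "clear error" might look like.

```python
class MmapDesign:
    """Stand-in for MmapDesignF64 / MmapDesignF32 (hypothetical)."""


class ChunkedDesign:
    """Stand-in for ChunkedDesignF64 / ChunkedDesignF32 (hypothetical)."""


def pick_backend(x, supports_out_of_ram):
    """Return the name of the backend a fit call would route to."""
    if isinstance(x, (MmapDesign, ChunkedDesign)):
        if not supports_out_of_ram:
            # Assumption: the real error type and message may differ.
            raise TypeError(
                f"{type(x).__name__} is not supported by this estimator"
            )
        return "mmap" if isinstance(x, MmapDesign) else "chunked"
    # Anything else is treated as an ordinary in-memory array.
    return "dense"
```

An estimator outside the v1 coverage list corresponds to supports_out_of_ram=False here: it fails fast instead of silently loading the whole matrix into memory.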

See Concepts: Backends for the storage model and when to use each helper.

Memory-mapped (single file)

class skein_glm.mmap.MmapDesignF64(path, n_rows, n_cols)

Bases: object

Reference to an on-disk column-major f64 matrix.

The constructor validates dimensions against file size; it does not open the mapping (the Rust side mmaps lazily inside each _mmap solve).
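
A minimal sketch of producing a file in this layout with NumPy and running the same kind of size check by hand. It assumes the file is raw, headerless, native-endian f64 with no padding, which the text above suggests but does not spell out; the temporary path is illustrative only.

```python
import os
import tempfile

import numpy as np

n_rows, n_cols = 4, 3
X = np.arange(n_rows * n_cols, dtype=np.float64).reshape(n_rows, n_cols)

path = os.path.join(tempfile.mkdtemp(), "X.bin")

# ndarray.tofile always writes in C (row-major) order, so writing the
# transpose lays the columns of X out contiguously on disk (column-major X).
X.T.tofile(path)

# The kind of consistency check the constructor is described as doing:
# file size must equal n_rows * n_cols * bytes-per-element.
expected_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize
assert os.path.getsize(path) == expected_bytes

# Read back as a column-major memory map to confirm the layout round-trips.
X_back = np.memmap(path, dtype=np.float64, mode="r",
                   shape=(n_rows, n_cols), order="F")
assert np.array_equal(X_back, X)
```

Calling X.tofile(path) directly on a C-ordered array would produce a row-major file of the same size, so a size check alone cannot catch a layout mix-up; the read-back comparison can.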

Parameters:

path (str | os.PathLike)
n_rows (int)
n_cols (int)

class skein_glm.mmap.MmapDesignF32(path, n_rows, n_cols)

Bases: object

Reference to an on-disk column-major f32 matrix.

Half the bytes per element vs. MmapDesignF64, with f32→f64 conversion done on each column read inside the solver. Equivalent to the f64 path up to the f32 rounding of the stored values (~1e-7 relative).
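
The ~1e-7 figure can be checked with a short NumPy experiment. This is an illustration of f32 storage error in general, not a test of the solver: it rounds f64 values to f32, widens them back, and compares against f32 machine epsilon (2**-23, about 1.19e-7).

```python
import numpy as np

rng = np.random.default_rng(0)
x64 = rng.standard_normal(10_000)

# Store as f32, then widen back to f64 as the solver is described as doing.
x_roundtrip = x64.astype(np.float32).astype(np.float64)

rel_err = np.abs(x_roundtrip - x64) / np.abs(x64)
max_rel_err = rel_err.max()

# Round-to-nearest keeps each normal-range value within half an f32 epsilon.
f32_eps = np.finfo(np.float32).eps  # 2**-23
assert max_rel_err <= f32_eps / 2
```

The bound is per stored element; how it propagates into fitted coefficients depends on the conditioning of the problem.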

Parameters:

path (str | os.PathLike)
n_rows (int)
n_cols (int)

Row-block-chunked (multiple files)

class skein_glm.mmap.ChunkedDesignF64(chunks, n_cols)

Bases: object

Reference to a row-block-chunked on-disk f64 matrix.

Each chunk is a separate column-major raw f64 file with the same n_cols. The solver streams chunk-by-chunk, accumulating Σ_chunks X_chunk[:, j]ᵀ v_chunk for each col_dot call.

Use this when the row count n is too large for a single mmap, or when your data pipeline already produces shards. Construct it from a list of (path, n_rows_per_chunk) tuples; the total n_rows is the sum of the per-chunk counts.

>>> chunks = [("chunk_0.bin", 10_000_000),
...           ("chunk_1.bin", 10_000_000),
...           ("chunk_2.bin",  7_345_678)]
>>> design = skein_glm.ChunkedDesignF64(chunks, n_cols=50_000)
>>> model = skein_glm.MCPPathRegressor(...).fit(design, y)
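
The chunk-by-chunk accumulation described above can be sketched in NumPy. This is an illustration of the arithmetic only, not the Rust solver: the col_dot function below is a hypothetical stand-in, the shards are tiny temporary files, and each is assumed to be raw, headerless, native-endian f64.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_cols = 3
rows_per_chunk = [4, 4, 2]          # the last shard may be shorter
X = rng.standard_normal((sum(rows_per_chunk), n_cols))
v = rng.standard_normal(X.shape[0])

# Write each row block as its own column-major raw f64 file.
tmpdir = tempfile.mkdtemp()
chunks, start = [], 0
for i, n_rows in enumerate(rows_per_chunk):
    path = os.path.join(tmpdir, f"chunk_{i}.bin")
    X[start:start + n_rows].T.tofile(path)   # tofile writes C order
    chunks.append((path, n_rows))
    start += n_rows


def col_dot(chunks, n_cols, j, v):
    """Accumulate the sum over chunks of X_chunk[:, j] . v_chunk."""
    total, start = 0.0, 0
    for path, n_rows in chunks:
        block = np.memmap(path, dtype=np.float64, mode="r",
                          shape=(n_rows, n_cols), order="F")
        total += float(block[:, j] @ v[start:start + n_rows])
        start += n_rows
    return total


# The streamed result matches the dot product on the full in-memory matrix.
assert np.isclose(col_dot(chunks, n_cols, 1, v), X[:, 1] @ v)
```

Only one shard is mapped at a time, and v is sliced with the same running row offset, which is why the per-chunk row counts must be given in file order.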

Parameters:

chunks (list[tuple[str, int]])
n_cols (int)

class skein_glm.mmap.ChunkedDesignF32(chunks, n_cols)

Bases: object

Reference to a row-block-chunked on-disk f32 matrix.

Same as ChunkedDesignF64 but each chunk holds 4-byte values. Halves the disk footprint and page-cache pressure; pays the same f32→f64 cast on each column read as MmapDesignF32.
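
To put the halving in concrete terms, the arithmetic below reuses the shard sizes and column count from the ChunkedDesignF64 example. It counts raw element bytes only and assumes no header or padding in the files.

```python
# Shard sizes and column count from the ChunkedDesignF64 example.
rows_per_chunk = [10_000_000, 10_000_000, 7_345_678]
n_cols = 50_000

n_rows = sum(rows_per_chunk)
n_elements = n_rows * n_cols

bytes_f64 = n_elements * 8   # 8 bytes per f64 value
bytes_f32 = n_elements * 4   # 4 bytes per f32 value

print(f"rows:        {n_rows:,}")
print(f"f64 on disk: {bytes_f64 / 1e12:.2f} TB")
print(f"f32 on disk: {bytes_f32 / 1e12:.2f} TB")
```

At this scale the f32 layout saves several terabytes of disk and of data that must pass through the page cache on each sweep over the columns.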

Parameters:

chunks (list[tuple[str, int]])
n_cols (int)