- Preconditioner bug in attributor.py (6232d1e)
- Release (dec3df9)
- Add flag to enable TF32 (35ab164)
- Release bergson without pinned transformers (ef9dc9a)
- Set default precision to fp32 in IndexConfig and ScoreConfig (92d4807)
Co-authored-by: Lucia Quirke luciaquirke@users.noreply.github.com
- Always compute mixing coefficient in Trackstar pipeline (c990375)
Remove the conditional guard: lambda is now always computed automatically from the preconditioner eigenvalues, since the cost is negligible.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Standardize trace collector preconditioning (6a14e53)
- Enable trackstar (2dd26d3)
- Convert PyArrow Column to list in allocate_batches (7fe4dd3)
HuggingFace Dataset column access (ds["length"]) returns a PyArrow Column, not a Python list. Iterating over it element-by-element (via sorted(), random indexing) is ~1000x slower than on a native list. For 10M items this caused allocate_batches to hang for 13+ hours instead of completing in ~17 seconds.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Convert PyArrow columns to list at callsites of allocate_batches (5d734dc)
Move the list conversion out of allocate_batches (which types doc_lengths as list[int]) to the callsites that pass HF Dataset columns. Use ds["length"][:] which returns a plain list[int].
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
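As a rough illustration of the callsite conversion described in this entry, the sketch below materializes a column into a plain list before handing it to a batching helper. `FakeColumn` and `allocate_batches_sketch` are hypothetical stand-ins written for this example, not the project's or HuggingFace's actual code; the only behaviour taken from the commit message is that slicing with `[:]` yields a plain `list[int]`.

```python
from typing import Iterator


class FakeColumn:
    """Hypothetical stand-in for a lazily evaluated dataset column."""

    def __init__(self, values: list[int]) -> None:
        self._values = values

    def __len__(self) -> int:
        return len(self._values)

    def __iter__(self) -> Iterator[int]:
        return iter(self._values)

    def __getitem__(self, key):
        # A full slice materializes the column as a plain Python list,
        # mirroring the ds["length"][:] idiom from the commit message.
        if isinstance(key, slice):
            return list(self._values[key])
        return self._values[key]


def allocate_batches_sketch(doc_lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Simplified, hypothetical batcher: greedily pack longest documents first."""
    if not isinstance(doc_lengths, list):
        raise TypeError("doc_lengths must be a plain list[int]")
    # Sorting and indexing here are fast because doc_lengths is a native list.
    order = sorted(range(len(doc_lengths)), key=doc_lengths.__getitem__, reverse=True)
    batches: list[list[int]] = []
    current: list[int] = []
    used = 0
    for idx in order:
        length = doc_lengths[idx]
        if current and used + length > max_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(idx)
        used += length
    if current:
        batches.append(current)
    return batches


column = FakeColumn([5, 3, 8, 2])
# Convert once at the callsite; the helper then only ever sees a native list.
batches = allocate_batches_sketch(column[:], max_tokens=10)
```

Keeping the helper typed as `list[int]` and converting at the boundary means the expensive column object never reaches the sorting and indexing code.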
- Remove redundant zero-fill loop in MemmapSequenceScoreWriter (558829f)
np.memmap w+ mode already creates a zero-filled file, making the per-field written flag initialization loop unnecessary. For large datasets (10M+ items) with many query scores, the strided writes through the structured dtype caused multi-hour hangs.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
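The zero-fill behaviour this entry relies on can be checked with a small sketch. The field names `score_0` and `written_0` are made up for the example and are not necessarily the writer's real layout; the check assumes an ordinary local filesystem where a newly extended file reads back as zeros.

```python
import os
import tempfile

import numpy as np

# Hypothetical structured dtype: one score field plus a "written" flag.
record_dtype = np.dtype([("score_0", np.float32), ("written_0", np.bool_)])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scores.bin")
    # mode="w+" creates (or overwrites) the file at the requested size.
    scores = np.memmap(path, dtype=record_dtype, mode="w+", shape=(1000,))

    # Every field already reads back as zero / False, so no init loop is needed.
    all_scores_zero = bool((scores["score_0"] == 0).all())
    none_written = not bool(scores["written_0"].any())

    # Mark a single record as written to show normal use.
    scores["score_0"][3] = 1.5
    scores["written_0"][3] = True
    scores.flush()
    written_count = int(scores["written_0"].sum())

    # Release the mapping before the temporary directory is removed.
    del scores
```

Skipping the explicit initialization avoids touching every record through a strided view, which is the access pattern the commit message identifies as the cause of the hangs.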
- Use [:] instead of list() for consistency (c76d131)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Unpin transformers by explicitly setting float32 dtype in tests (0b6c226)
Transformers 4.56+ changed from_config() to honor the config's torch_dtype field, causing test models (tiny-GPTNeoX, tiny-Phi3) to be created in float16 instead of float32. This caused gradient comparison tests to fail from reduced precision, not from any actual change in gradient collection logic.
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Use _csv._writer type for csv_recorder annotation (6e6289c)
csv.writer is a function, not a class, so it cannot be used as a type annotation. Import the private _writer type from _csv and use it for the Generator yield type. Also fix the None check to use if not path since QueryConfig.record uses empty string as the sentinel value.
Co-authored-by: Lucia Quirke luciaquirke@users.noreply.github.com
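A minimal sketch of what such an annotation could look like follows. The body of `csv_recorder` here is an assumption written for illustration, not the project's implementation; only the `_csv._writer` annotation and the `if not path` check come from the commit message. The import is guarded by `TYPE_CHECKING` because `_writer` is a name from the type stubs and may not be importable at runtime.

```python
import csv
import os
import tempfile
from contextlib import contextmanager
from typing import TYPE_CHECKING, Generator, Optional

if TYPE_CHECKING:
    # Defined in the type stubs; guarded because it may not exist at runtime.
    from _csv import _writer


@contextmanager
def csv_recorder(path: str) -> "Generator[Optional[_writer], None, None]":
    """Yield a CSV writer for `path`, or None when recording is disabled."""
    # An empty string is the "disabled" sentinel, so test truthiness rather
    # than comparing against None.
    if not path:
        yield None
        return
    with open(path, "w", newline="") as f:
        yield csv.writer(f)


with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "results.csv")
    with csv_recorder(out_path) as writer:
        if writer is not None:
            writer.writerow(["query", "score"])
            writer.writerow(["hello", 0.5])
    with open(out_path, newline="") as f:
        rows = list(csv.reader(f))

# With the empty-string sentinel, no file is opened and None is yielded.
with csv_recorder("") as disabled:
    disabled_writer = disabled
```

Writing the return annotation as a string keeps the stub-only name out of runtime evaluation while still letting a type checker see it.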
- Pin pyright version and fix faiss type error (b9f54cf)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Use Python 3.11 for typechecking (9ef4122)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Use Python 3.11 for typechecking (ea50dd8)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Add --record flag to query CLI for saving results to CSV (59770ff)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Replace try/finally CSV block with context manager (6431320)
Co-authored-by: Lucia Quirke luciaquirke@users.noreply.github.com
- Pass batches to CollectorComputer in fit_normalizers (c95d5d4)
Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com
- Improve Claude workflows (fetch-depth, timeout, max-turns, pip install) (7a315e5)
- Run tests and typechecking in parallel (e690fc0)
- Release (f0ad2be)
- Add optimizer-aware gradients (497edab)
- Update build.yml (ba4cd5a)
- Always use unstructured gradients in score (595ed92)
- Release bergson (c9040a6)
- Release bergson (350dafe)
- Unit normalize in float32 (cae8352)
- Pin transformers to avoid fp error bug (9feac20)
- Enable specifying a custom tokenizer (9781a55)
- Release bergson (64b5baf)
- Add on-the-fly queries (0ce0ee2)
- Simplify query (fd37173)
- Add on-the-fly queries (294661e)