SY-3272: Add Boolean Data Type Channel Support #2229

Open
emilbon99 wants to merge 51 commits into rc from sy-3272-add-boolean-data-type-channel

Conversation

@emilbon99 (Contributor) commented Apr 16, 2026

Issue Pull Request

Linear Issue

SY-3272

Description

Adds a first-class BoolT data type across the full stack per RFC 0036.

Three distinct representations per RFC 0036 §3.0:

  • In memory: byte-packed, canonical {0x00, 0x01}. The telem.Series density invariant is preserved; iterators, writers, Cesium readers, and client TypedArray views treat a bool sample identically to a uint8 sample.
  • On the wire: bit-packed, LSB-first. The Freighter frame codec packs 8 samples per byte on send and unpacks them back into a byte-packed Series on receive, cutting digital traffic 8x before any further compression.
  • On disk: byte-packed today, identical to the in-memory form. A future Cesium storage codec can compress independently.

Write paths normalize any nonzero source byte to 0x01 at the client boundary. Cesium storage required zero changes: BoolT falls through the fixed-density path.
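
The normalization rule above can be illustrated with a minimal Python sketch. It is not the PR's implementation; `normalize_bool_bytes` is a hypothetical name chosen for this example.

```python
def normalize_bool_bytes(raw: bytes) -> bytes:
    """Map every nonzero source byte to 0x01 and leave 0x00 untouched."""
    return bytes(1 if b else 0 for b in raw)


# Any nonzero value (0x02, 0xFF, ...) collapses to the canonical 0x01.
print(normalize_bool_bytes(bytes([0x00, 0x01, 0x02, 0xFF])).hex())  # 00010101
```

After this step a bool sample occupies one byte holding only 0x00 or 0x01, which is what lets the fixed-density paths treat it like a uint8 sample.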

What landed per language

| Language | Scope |
|---|---|
| Go | BoolT in x/go/telem/data_type.go, extended types.Sized + FixedSample, NewSeries[bool]/UnmarshalSeries[bool]/NewSeriesFromAny with bool normalization, bit-packed frame codec in core/pkg/distribution/framer/codec/codec.go |
| TypeScript | DataType.BOOLEAN with Uint8Array backing, atBoolean accessor, convertDataType handles BOOLEAN, bit-packed codec in client/ts/src/framer/codec.ts |
| Python | DataType.BOOL mapped to np.bool_, list[bool] and np.bool_ array inference, _FROM_NUMPY[np.bool_] flipped from UINT8, bit-packed codec in client/py/synnax/framer/codec.py |
| C++ | BOOL_T in details + public namespace, bool → BOOL_T via TYPE_INDEXES, cast normalization via std::visit, bit-packed codec in client/cpp/framer/codec.cpp |

Wire format

Each bool series is encoded as ceil(N / 8) bytes, LSB-first within each byte; the existing per-series sample_count header tells the decoder how many bits to recover. The reference vector [1,0,1,1,0,0,0,1,1] encodes to [0x8D, 0x01] and is pinned in Go (codec_internal_test.go) and Python (test_frame_codec.py).
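
A minimal Python sketch of the packing scheme described above, checked against the reference vector. The function names `pack_bool_bits` and `unpack_bool_bits` are illustrative and are not taken from the PR's Python codec; the sketch assumes the input is already byte-packed, one sample per byte.

```python
def pack_bool_bits(samples: bytes) -> bytes:
    """Pack byte-per-sample booleans into ceil(N / 8) bytes, LSB-first."""
    packed = bytearray((len(samples) + 7) // 8)
    for i, sample in enumerate(samples):
        if sample:  # any nonzero byte counts as true
            packed[i // 8] |= 1 << (i % 8)
    return bytes(packed)


def unpack_bool_bits(packed: bytes, sample_count: int) -> bytes:
    """Recover sample_count byte-per-sample booleans from LSB-first bits."""
    if len(packed) < (sample_count + 7) // 8:
        raise ValueError("packed buffer too short for sample_count")
    return bytes((packed[i // 8] >> (i % 8)) & 1 for i in range(sample_count))


reference = bytes([1, 0, 1, 1, 0, 0, 0, 1, 1])
wire = pack_bool_bits(reference)
assert wire == bytes([0x8D, 0x01])
assert unpack_bool_bits(wire, len(reference)) == reference
```

The sample count has to travel separately because the final byte may carry padding bits: without it, the decoder could not tell 9 samples from 16.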

Tests

  • Unit: per-language data type tests, series construction tests, cast tests, codec round-trip tests with sample counts on partial-byte boundaries (1, 7, 8, 9, 17, ...).
  • Cross-language: reference vector test pinned in Go and Python.
  • End-to-end: Python (test_channel.py, test_frame_writer.py) and TypeScript (channel.spec.ts, writer.spec.ts, streamer.spec.ts) verify create + write + read + stream through a live server.

Out of scope (per RFC §5)

  • Driver migration (Modbus coils, LabJack DIO, NI digital lines currently Uint8T)
  • Arc type system integration
  • Console/Schematic rendering
  • Generalized per-type wire codecs (gorilla, delta, RLE)

Each warrants its own RFC.

Basic Readiness

  • I have performed a self-review of my code.
  • I have added relevant, automated tests to cover the changes.
  • I have updated documentation to reflect the changes.

Greptile Summary

This PR adds first-class BoolT/DataType.BOOLEAN support across the full stack (Go, TypeScript, Python, C++), with byte-packed in-memory representation and LSB-first bit-packed wire encoding. The implementation is thorough — codec round-trip tests, a cross-language reference vector, and end-to-end tests are all included.

  • P1 (TypeScript): BOOLEAN.canSafelyCastTo(any numeric type) returns true, so the TS codec's type-validation silently permits writing a BOOLEAN series to a UINT8 (or FLOAT64, etc.) channel. The TS encoder emits bit-packed data (1 bit/sample), but the Go server decoder reads the channel's registered type (e.g. UINT8 → 1 byte/sample), consuming far more bytes than were written and causing frame desync / silent data corruption. This is particularly risky for backward-compat with existing UINT8 "boolean" channels.
  • C++ (pre-existing, from prior review): telem.h is still missing BOOL_T branches in at(), write_casted(), operator<<, and avg(), which will throw at runtime for those paths.

Confidence Score: 4/5

Safe to merge for new BOOLEAN channels; risk of silent data corruption when writing BOOLEAN series to legacy UINT8 channels via the TS client.

One confirmed P1 in the TS canSafelyCastTo logic; the core codec implementations across all languages are correct and well-tested. Pre-existing C++ gaps from prior review remain unresolved.

x/ts/src/telem/telem.ts (canSafelyCastTo logic), x/cpp/telem/telem.h (missing BOOL_T dispatch branches)

Important Files Changed

| Filename | Overview |
|---|---|
| core/pkg/distribution/framer/codec/codec.go | Adds bit-packing/unpacking helpers and integrates them into the encode/decode paths for BoolT channels; logic is correct and well-tested. |
| x/ts/src/telem/telem.ts | Adds BOOLEAN DataType constant; canSafelyCastTo allows BOOLEAN → any numeric type, which can cause silent wire-format corruption when writing to non-BOOLEAN channels. |
| client/ts/src/framer/codec.ts | Adds bit-pack/unpack helpers and integrates them into TS encode/decode paths; correctness depends on writers always using BOOLEAN channels for BOOLEAN series. |
| client/py/synnax/framer/codec.py | Adds Python-side bit-packing helpers and integrates correctly; reference vector test confirms cross-language compatibility. |
| x/go/telem/series_factory.go | Adds bool support to NewSeries, UnmarshalSeries, NewSeriesFromAny, and castToBool; normalization logic is correct. |
| x/cpp/telem/telem.h | Adds BOOL_T constant, density entry, TYPE_INDEXES entry, and cast dispatch; previous review comments flagged missing branches in at(), write_casted(), operator<<, and avg(). |

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant Enc as Client Encoder
    participant Wire as Wire (TCP)
    participant Dec as Server Decoder
    participant Store as Cesium Storage

    Note over App,Store: Happy path — BOOLEAN channel
    App->>Enc: Series(DataType=BOOLEAN, data=[0x01,0x00,0x01,...])
    Enc->>Enc: packBoolBits() → ceil(N/8) bytes
    Enc->>Wire: header(key, sample_count=N) + bit-packed bytes
    Wire->>Dec: header + bit-packed bytes
    Dec->>Dec: channel type==BoolT → unpackBoolBits(ceil(N/8) bytes, N)
    Dec->>Store: Series(DataType=BoolT, data=[0x01,0x00,0x01,...])

    Note over App,Store: Bug path — BOOLEAN series to UINT8 channel (TS only)
    App->>Enc: Series(DataType=BOOLEAN) for UINT8 channel
    Enc->>Enc: canSafelyCastTo(UINT8)=true → no error raised
    Enc->>Enc: packBoolBits() → ceil(N/8) bytes
    Enc->>Wire: header(key, sample_count=N) + ceil(N/8) bytes
    Wire->>Dec: header + ceil(N/8) bytes
    Dec->>Dec: channel type==UINT8 → reads N bytes (frame desync!)

Comments Outside Diff (5)

  1. x/cpp/telem/series.h, lines 1040-1055

    P1 BOOL_T missing from polymorphic at() dispatch

    SampleValue at(const int&) has no branch for BOOL_T, so any code that calls the type-erased accessor on a boolean Series — e.g., the Arc evaluator, control-flow tasks, or anything that iterates samples as SampleValue — will throw "unsupported data type for at: bool" at runtime.

    The fix is to add a uint8_t branch immediately before the throw:

  2. x/cpp/telem/series.h, lines 1068-1096

    P1 operator<< silently emits "unknown data type" for BOOL_T

    BOOL_T is not covered in the chain of else if (dt == ...) branches inside operator<<. Printing any boolean Series — common in logging and debugging — produces Series(type: bool, size: N, cap: N, data: [unknown data type]) instead of the actual values. Add a branch before the final else:

  3. x/cpp/telem/series.h, lines 1154-1179

    P1 write_casted throws on BOOL_T source type

    write_casted(const void*, size_t, const DataType&) has no branch for BOOL_T, so casting from a boolean source array throws "Unsupported data type for casting: bool". This is reachable whenever the Arc evaluator or any driver pipeline casts heterogeneous series containing boolean channels. Add a branch before the throw:

  4. x/cpp/telem/series.h, lines 1637-1646

    P2 avg<T>() throws on BOOL_T

    avg() has no branch for BOOL_T and falls through to throw std::runtime_error("Unsupported data type for average: bool"). Computing the mean of a boolean array (fraction of true values) is a valid and common operation. Consider adding it alongside the UINT8_T branch:

  5. x/cpp/telem/series.h, lines 928-939

    P2 write(const NumericType*, size_t) writes to buffer start instead of appending

    memcpy(this->data_.get(), d, …) always copies to offset 0 rather than data_.get() + size_ * density. This is a pre-existing bug in the non-bool path, but it becomes observable in this PR's codec decode path because s.write(unpacked.data(), local_data_len_or_byte_cap) happens to be called when size_=0, masking the bug. Any subsequent write call on the same freshly-decoded series would silently overwrite the first batch. This isn't triggered by current codec code, but it's worth noting as it could bite future callers.
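
The offset issue in comment 5 can be modeled with a toy buffer in Python. This is only a sketch of the behaviour described in the comment: `FixedBuffer` and its methods are hypothetical, and how the real C++ class updates its size is an assumption made here for illustration.

```python
class FixedBuffer:
    """Toy fixed-capacity sample buffer (hypothetical, not telem::Series)."""

    def __init__(self, cap: int, density: int) -> None:
        self.cap = cap
        self.density = density
        self.size = 0
        self.data = bytearray(cap * density)

    def _fit(self, samples: bytes) -> int:
        # Number of whole samples that still fit in the remaining capacity.
        return min(len(samples) // self.density, self.cap - self.size)

    def write_at_start(self, samples: bytes) -> int:
        """Models the reported behaviour: always copy to offset 0."""
        n = self._fit(samples)
        self.data[0 : n * self.density] = samples[: n * self.density]
        self.size += n
        return n

    def write_append(self, samples: bytes) -> int:
        """Copy to offset size * density so earlier samples survive."""
        n = self._fit(samples)
        start = self.size * self.density
        self.data[start : start + n * self.density] = samples[: n * self.density]
        self.size += n
        return n


buggy, fixed = FixedBuffer(4, 1), FixedBuffer(4, 1)
for batch in (b"\x01\x01", b"\x00\x00"):
    buggy.write_at_start(batch)
    fixed.write_append(batch)
print(buggy.data.hex())  # 00000000 (first batch overwritten)
print(fixed.data.hex())  # 01010000
```

With a single write on an empty buffer both variants produce the same bytes, which is why the problem stays hidden while the decode path only writes once.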

Reviews (3): Last reviewed commit: "tuning to codec implementations" | Re-trigger Greptile

Greptile also left 1 inline comment on this PR.

…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
emilbon99 and others added 16 commits April 13, 2026 15:59
Widens CrudeSeries type alias to accept list[int] and list[TimeStamp]
(runtime already handles these) and adjusts tests to satisfy strict
mypy without type: ignores or Any annotations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eries' of https://github.com/synnaxlabs/synnax into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
variable.Writer.Write returned a post-write alignment while
fixed.Writer.Write returned pre-write, so writer_stream.go's
sampleCount = SampleIndex + series.Len() produced new+delta instead of
new for the variable branch. This corrupted resolveCommitEnd's Stamp
call whenever a writer committed a variable channel without the index
in its frame.

Align variable.Writer.Write with fixed.Writer.Write by deferring
scanOffsets until after the pre-write alignment is captured, and add a
regression test covering commits on an index-less variable writer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…0-support-for-persisted-variable-length-data-types-in-cesium

# Conflicts:
#	cesium/writer_stream.go
#	x/py/tests/test_series.py

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 82.53968% with 33 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.14%. Comparing base (650e43f) to head (b961df2).
⚠️ Report is 1 commits behind head on rc.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| x/go/telem/series_factory.go | 42.85% | 24 Missing ⚠️ |
| core/pkg/distribution/framer/codec/codec.go | 87.87% | 2 Missing and 2 partials ⚠️ |
| x/go/telem/series.go | 0.00% | 2 Missing ⚠️ |
| x/ts/src/telem/series.ts | 88.88% | 1 Missing and 1 partial ⚠️ |
| x/py/x/telem/telem.py | 83.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##               rc    #2229      +/-   ##
==========================================
+ Coverage   63.95%   64.14%   +0.18%     
==========================================
  Files        2153     2149       -4     
  Lines      109337   109203     -134     
  Branches     8304     8382      +78     
==========================================
+ Hits        69924    70043     +119     
+ Misses      33398    33140     -258     
- Partials     6015     6020       +5     
| Flag | Coverage Δ |
|---|---|
| alamos-go | 55.25% <ø> (ø) |
| alamos-ts | 48.87% <ø> (ø) |
| arc-go | 76.80% <ø> (ø) |
| aspen | 67.90% <ø> (-0.36%) ⬇️ |
| cesium | 82.48% <ø> (-0.13%) ⬇️ |
| client-py | 85.97% <100.00%> (+0.05%) ⬆️ |
| client-ts | 90.13% <100.00%> (+0.12%) ⬆️ |
| console | 20.36% <ø> (ø) |
| core | 68.48% <87.87%> (+1.48%) ⬆️ |
| drift | 39.05% <ø> (ø) |
| freighter-go | 62.91% <ø> (-0.09%) ⬇️ |
| freighter-integration | 1.51% <ø> (ø) |
| freighter-py | 79.96% <ø> (ø) |
| freighter-ts | 73.87% <ø> (ø) |
| oracle | 62.37% <ø> (-0.03%) ⬇️ |
| pluto | 55.13% <ø> (-0.16%) ⬇️ |
| x-go | 81.40% <44.68%> (-0.24%) ⬇️ |
| x-py | 84.37% <83.33%> (+0.06%) ⬆️ |
| x-ts | 88.89% <90.90%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

Base automatically changed from sy-4060-support-for-persisted-variable-length-data-types-in-cesium to rc April 17, 2026 15:32
Comment thread x/ts/src/telem/telem.ts
Comment on lines 1958 to 1962
if (!this.isNumeric || !other.isNumeric) return false;
if (this.isVariable || other.isVariable) return false;
if (this.equals(DataType.BOOLEAN)) return true;
if (other.equals(DataType.BOOLEAN)) return false;
if (this.isUnsignedInteger && other.isSignedInteger) return false;

P1 BOOLEAN.canSafelyCastTo(non-BOOLEAN numerics) causes silent wire-format corruption

BOOLEAN.canSafelyCastTo(UINT8) (and any other numeric type) returns true, so the codec's data-type validation passes when a BOOLEAN series is written to a UINT8 (or FLOAT64, INT32, …) channel. However, the encoding path always uses bit-packed BOOLEAN encoding for BOOLEAN series, while the server decoder reads based on the channel's registered type:

  • 8 BOOLEAN samples → 1 wire byte (bit-packed)
  • 8 UINT8 samples → 8 wire bytes (byte-packed)

When the Go server decodes a UINT8 channel's frame section, it reads N bytes for N samples. But only ceil(N/8) bytes were written. The decoder over-reads into the next series' header/data bytes, causing complete frame desync and silent data corruption — no error is raised on either side.

The semantic argument ("0 and 1 fit in any numeric type") is conceptually valid but practically wrong because BOOLEAN has a fundamentally different wire encoding than all other numeric types. The canSafelyCastTo rule should only permit BOOLEAN → BOOLEAN, or the codec should separately validate that BOOLEAN series are only written to BOOLEAN channels.
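
The desync described above can be reproduced with a toy frame in Python. The layout used here (a 4-byte little-endian sample count followed by the payload) is an assumption made for illustration and is not the actual Freighter frame layout; all function names are hypothetical.

```python
import struct


def pack_bits(samples: bytes) -> bytes:
    """Bit-pack byte-per-sample booleans, LSB-first."""
    out = bytearray((len(samples) + 7) // 8)
    for i, s in enumerate(samples):
        if s:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def encode_bool_series(samples: bytes) -> bytes:
    """Toy section: 4-byte little-endian sample_count, then packed bits."""
    return struct.pack("<I", len(samples)) + pack_bits(samples)


def payload_len(sample_count: int, channel_type: str) -> int:
    """Bytes the decoder expects, based on the channel's registered type."""
    if channel_type == "bool":
        return (sample_count + 7) // 8
    if channel_type == "uint8":
        return sample_count
    raise ValueError(f"unsupported channel type: {channel_type}")


def read_sample_counts(frame: bytes, channel_types: list[str]) -> list[int]:
    """Walk the frame and return the sample_count read for each series."""
    offset, counts = 0, []
    for channel_type in channel_types:
        (count,) = struct.unpack_from("<I", frame, offset)
        counts.append(count)
        offset += 4 + payload_len(count, channel_type)
    return counts


# Two bit-packed bool series: 8 samples (1 byte) and 64 samples (8 bytes).
frame = encode_bool_series(bytes([1] * 8)) + encode_bool_series(bytes([1] * 64))
print(read_sample_counts(frame, ["bool", "bool"]))   # [8, 64]
print(read_sample_counts(frame, ["uint8", "bool"]))  # [8, 4294967295]
```

In the mismatched case the decoder skips 8 payload bytes where only 1 was written, so the second header is read from the middle of the second payload and yields a garbage sample count without raising an error.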
