SY-3272: Add Boolean Data Type Channel Support#2229
Conversation
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
…eries' into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
Widens CrudeSeries type alias to accept list[int] and list[TimeStamp] (runtime already handles these) and adjusts tests to satisfy strict mypy without type: ignores or Any annotations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…eries' of https://github.com/synnaxlabs/synnax into sy-4060-support-for-persisted-variable-length-data-types-in-cesium
variable.Writer.Write returned a post-write alignment while fixed.Writer.Write returned pre-write, so writer_stream.go's sampleCount = SampleIndex + series.Len() produced new+delta instead of new for the variable branch. This corrupted resolveCommitEnd's Stamp call whenever a writer committed a variable channel without the index in its frame. Align variable.Writer.Write with fixed.Writer.Write by deferring scanOffsets until after the pre-write alignment is captured, and add a regression test covering commits on an index-less variable writer. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…0-support-for-persisted-variable-length-data-types-in-cesium # Conflicts: # cesium/writer_stream.go # x/py/tests/test_series.py
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## rc #2229 +/- ##
==========================================
+ Coverage 63.95% 64.14% +0.18%
==========================================
Files 2153 2149 -4
Lines 109337 109203 -134
Branches 8304 8382 +78
==========================================
+ Hits 69924 70043 +119
+ Misses 33398 33140 -258
- Partials 6015 6020 +5
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…2-add-boolean-data-type-channel
| if (!this.isNumeric || !other.isNumeric) return false; | ||
| if (this.isVariable || other.isVariable) return false; | ||
| if (this.equals(DataType.BOOLEAN)) return true; | ||
| if (other.equals(DataType.BOOLEAN)) return false; | ||
| if (this.isUnsignedInteger && other.isSignedInteger) return false; |
There was a problem hiding this comment.
BOOLEAN.canSafelyCastTo(non-BOOLEAN numerics) causes silent wire-format corruption
BOOLEAN.canSafelyCastTo(UINT8) (and any other numeric type) returns true, so the codec's data-type validation passes when a BOOLEAN series is written to a UINT8 (or FLOAT64, INT32, …) channel. However, the encoding path always uses bit-packed BOOLEAN encoding for BOOLEAN series, while the server decoder reads based on the channel's registered type:
- 8 BOOLEAN samples → 1 wire byte (bit-packed)
- 8 UINT8 samples → 8 wire bytes (byte-packed)
When the Go server decodes a UINT8 channel's frame section, it reads N bytes for N samples. But only ceil(N/8) bytes were written. The decoder over-reads into the next series' header/data bytes, causing complete frame desync and silent data corruption — no error is raised on either side.
The semantic argument ("0 and 1 fit in any numeric type") is conceptually valid but practically wrong because BOOLEAN has a fundamentally different wire encoding than all other numeric types. The canSafelyCastTo rule should only permit BOOLEAN → BOOLEAN, or the codec should separately validate that BOOLEAN series are only written to BOOLEAN channels.
Issue Pull Request
Linear Issue
SY-3272
Description
Adds a first-class
BoolTdata type across the full stack per RFC 0036.Three distinct representations per RFC 0036 §3.0:
{0x00, 0x01}. Thetelem.Seriesdensity invariant is preserved; iterators, writers, Cesium readers, and clientTypedArrayviews treat a bool sample identically to a uint8 sample.Serieson receive, dropping digital traffic 8x before any further compression.Write paths normalize any nonzero source byte to
0x01at the client boundary. Cesium storage required zero changes:BoolTfalls through the fixed-density path.What landed per language
BoolTinx/go/telem/data_type.go, extendedtypes.Sized+FixedSample,NewSeries[bool]/UnmarshalSeries[bool]/NewSeriesFromAnywith bool normalization, bit-packed frame codec incore/pkg/distribution/framer/codec/codec.goDataType.BOOLEANwithUint8Arraybacking,atBooleanaccessor,convertDataTypehandles BOOLEAN, bit-packed codec inclient/ts/src/framer/codec.tsDataType.BOOLmapped tonp.bool_,list[bool]andnp.bool_array inference,_FROM_NUMPY[np.bool_]flipped fromUINT8, bit-packed codec inclient/py/synnax/framer/codec.pyBOOL_Tindetails+ public namespace,bool → BOOL_TviaTYPE_INDEXES, cast normalization viastd::visit, bit-packed codec inclient/cpp/framer/codec.cppWire format
ceil(N / 8)bytes LSB-first within each byte, with the existing per-seriessample_countheader telling the decoder how many bits to recover. Reference vector[1,0,1,1,0,0,0,1,1]encodes to[0x8D, 0x01], pinned in Go (codec_internal_test.go) and Python (test_frame_codec.py).Tests
test_channel.py,test_frame_writer.py) and TypeScript (channel.spec.ts,writer.spec.ts,streamer.spec.ts) verify create + write + read + stream through a live server.Out of scope (per RFC §5)
Uint8T)Each warrants its own RFC.
Basic Readiness
Greptile Summary
This PR adds first-class
BoolT/DataType.BOOLEANsupport across the full stack (Go, TypeScript, Python, C++), with byte-packed in-memory representation and LSB-first bit-packed wire encoding. The implementation is thorough — codec round-trip tests, a cross-language reference vector, and end-to-end tests are all included.BOOLEAN.canSafelyCastTo(any numeric type)returnstrue, so the TS codec's type-validation silently permits writing aBOOLEANseries to aUINT8(orFLOAT64, etc.) channel. The TS encoder emits bit-packed data (1 bit/sample), but the Go server decoder reads the channel's registered type (e.g. UINT8 → 1 byte/sample), consuming far more bytes than were written and causing frame desync / silent data corruption. This is particularly risky for backward-compat with existing UINT8 "boolean" channels.telem.his still missingBOOL_Tbranches inat(),write_casted(),operator<<, andavg(), which will throw at runtime for those paths.Confidence Score: 4/5
Safe to merge for new BOOLEAN channels; risk of silent data corruption when writing BOOLEAN series to legacy UINT8 channels via the TS client.
One confirmed P1 in the TS canSafelyCastTo logic; the core codec implementations across all languages are correct and well-tested. Pre-existing C++ gaps from prior review remain unresolved.
x/ts/src/telem/telem.ts (canSafelyCastTo logic), x/cpp/telem/telem.h (missing BOOL_T dispatch branches)
Important Files Changed
Sequence Diagram
sequenceDiagram participant App as Application participant Enc as Client Encoder participant Wire as Wire (TCP) participant Dec as Server Decoder participant Store as Cesium Storage Note over App,Store: Happy path — BOOLEAN channel App->>Enc: Series(DataType=BOOLEAN, data=[0x01,0x00,0x01,...]) Enc->>Enc: packBoolBits() → ceil(N/8) bytes Enc->>Wire: header(key, sample_count=N) + bit-packed bytes Wire->>Dec: header + bit-packed bytes Dec->>Dec: channel type==BoolT → unpackBoolBits(ceil(N/8) bytes, N) Dec->>Store: Series(DataType=BoolT, data=[0x01,0x00,0x01,...]) Note over App,Store: Bug path — BOOLEAN series to UINT8 channel (TS only) App->>Enc: Series(DataType=BOOLEAN) for UINT8 channel Enc->>Enc: canSafelyCastTo(UINT8)=true → no error raised Enc->>Enc: packBoolBits() → ceil(N/8) bytes Enc->>Wire: header(key, sample_count=N) + ceil(N/8) bytes Wire->>Dec: header + ceil(N/8) bytes Dec->>Dec: channel type==UINT8 → reads N bytes (frame desync!)Comments Outside Diff (5)
x/cpp/telem/series.h, line 1040-1055 (link)BOOL_Tmissing from polymorphicat()dispatchSampleValue at(const int&)has no branch forBOOL_T, so any code that calls the type-erased accessor on a booleanSeries— e.g., the Arc evaluator, control-flow tasks, or anything that iterates samples asSampleValue— will throw"unsupported data type for at: bool"at runtime.The fix is to add a
uint8_tbranch immediately before the throw:x/cpp/telem/series.h, line 1068-1096 (link)operator<<silently emits "unknown data type" forBOOL_TBOOL_Tis not covered in the chain ofelse if (dt == ...)branches insideoperator<<. Printing any booleanSeries— common in logging and debugging — producesSeries(type: bool, size: N, cap: N, data: [unknown data type])instead of the actual values. Add a branch before the finalelse:x/cpp/telem/series.h, line 1154-1179 (link)write_castedthrows onBOOL_Tsource typewrite_casted(const void*, size_t, const DataType&)has no branch forBOOL_T, so casting from a boolean source array throws"Unsupported data type for casting: bool". This is reachable whenever the Arc evaluator or any driver pipeline casts heterogeneous series containing boolean channels. Add a branch before the throw:x/cpp/telem/series.h, line 1637-1646 (link)avg<T>()throws onBOOL_Tavg()has no branch forBOOL_Tand falls through tothrow std::runtime_error("Unsupported data type for average: bool"). Computing the mean of a boolean array (fraction oftruevalues) is a valid and common operation. Consider adding it alongside theUINT8_Tbranch:x/cpp/telem/series.h, line 928-939 (link)write(const NumericType*, size_t)writes to buffer start instead of appendingmemcpy(this->data_.get(), d, …)always copies to offset 0 rather thandata_.get() + size_ * density. This is a pre-existing bug in the non-bool path, but it becomes observable in this PR's codec decode path becauses.write(unpacked.data(), local_data_len_or_byte_cap)happens to be called whensize_=0, masking the bug. Any subsequentwritecall on the same freshly-decoded series would silently overwrite the first batch. This isn't triggered by current codec code, but it's worth noting as it could bite future callers.Reviews (3): Last reviewed commit: "tuning to codec implementations" | Re-trigger Greptile