Data product support #519

knoepfel · 2025-05-27T15:59:21Z

knoepfel
May 27, 2025
Maintainer

Among the developers, there are various ideas for how data products should be supported. These ideas are motivated by multiple and, perhaps at times, competing desires: ease of use, retaining framework independence of algorithms (and possibly their data products), and exploiting the maximum efficiency of the computing hardware. We will need to decide how to balance these desires, along with others not yet enumerated. This discussion is to catalog our thoughts on data-product support.

knoepfel · 2025-05-27T16:01:27Z

knoepfel
May 27, 2025
Maintainer Author

See Marc's slides presented on April 23.

0 replies

knoepfel · 2025-05-28T15:06:44Z

knoepfel
May 28, 2025
Maintainer Author

See Philippe's response (presented on May 14) to Marc's slides.

0 replies

knoepfel · 2025-08-21T17:17:33Z

knoepfel
Aug 21, 2025
Maintainer Author

Related issues:

Explore an interface description language (IDL) #71

0 replies

brettviren · 2026-05-07T16:01:11Z

brettviren
May 7, 2026

I read the linked slides. I want to come at this from a different angle.

First, Phlex sets a low bar for users to make new data product types. Once DUNE people start to write Phlex nodes, we can expect a proliferation of data types if we do not get out in front of that movement.

To get ahead of this, I want to put forward an idea for a generic data model and implementation that may not even need to depend on Phlex. It is based on patterns WCT uses in its "tensor data model", which in turn is based on HDF5's model. The basic idea is to separate type and "format" from schema. The model would describe the transient representation for:

- structured metadata following the JSON data model
- rectangular arrays following the NumPy and Boost.MultiArray data models
- an awkward / sparse array model

These would be represented by concrete types. General-purpose file I/O code can be written against these types. If we are smart, we can implement these types in ways that give us a lot of support code for free, e.g., by actually using JSON and actually using Boost.MultiArray.
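
As a rough illustration of the three basic types listed above, here is a minimal Python sketch using only the standard library. The class names `Metadata`, `DenseArray`, and `RaggedArray` are hypothetical and are not part of Phlex or WCT. `array.array` stands in for NumPy or Boost.MultiArray storage, and the offsets-plus-content layout is only one possible way to represent an awkward array.

```python
import json
from array import array
from dataclasses import dataclass, field
from math import prod


@dataclass
class Metadata:
    """Structured metadata restricted to what JSON can express (hypothetical type)."""
    value: dict = field(default_factory=dict)

    def dumps(self) -> str:
        # json.dumps raises TypeError for values such as sets or arbitrary objects.
        return json.dumps(self.value, sort_keys=True)


@dataclass
class DenseArray:
    """Rectangular array: a shape plus a flat, row-major buffer (hypothetical type)."""
    shape: tuple
    data: array  # the array.array typecode plays the role of the element type

    def __post_init__(self):
        if prod(self.shape) != len(self.data):
            raise ValueError("shape does not match number of elements")

    def at(self, *index):
        if len(index) != len(self.shape):
            raise IndexError("wrong number of indices")
        flat = 0
        for i, n in zip(index, self.shape):
            if not 0 <= i < n:
                raise IndexError("index out of range")
            flat = flat * n + i  # row-major (C order) flattening
        return self.data[flat]


@dataclass
class RaggedArray:
    """Variable-length rows stored as offsets into one flat buffer (hypothetical type)."""
    offsets: array  # length is number of rows + 1
    content: array

    def row(self, i):
        return self.content[self.offsets[i]:self.offsets[i + 1]]
```

Because none of these types knows what its contents mean, generic I/O code would only need to handle these three cases, whatever the data product.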

What these basic types do not cover is the schema used to interpret their instances.

To join the ideas in this thread so far, we may think about these basic types as the substrate and then express "conceptual data types" and IDL layers which overlay structure on these basic types.
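
One way to picture a schema layer overlaid on generic types is the following Python sketch. Everything here is an assumption made for illustration: the `ArraySchema` class, the `waveforms` example, and its axis and metadata names are hypothetical and do not come from Phlex, WCT, or any IDL proposal in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArraySchema:
    """Hypothetical schema that gives meaning to an otherwise generic array."""
    name: str
    axes: tuple               # axis labels, one per array dimension
    required_metadata: tuple  # keys the accompanying metadata must carry

    def check(self, shape, metadata):
        """Return a list of problems; an empty list means the instance conforms."""
        problems = []
        if len(shape) != len(self.axes):
            problems.append(f"expected rank {len(self.axes)}, got {len(shape)}")
        for key in self.required_metadata:
            if key not in metadata:
                problems.append(f"missing metadata key '{key}'")
        return problems

    def describe(self, shape):
        """Label each dimension of a conforming shape with its axis name."""
        return dict(zip(self.axes, shape))


# Illustrative "conceptual data type": a 2D array of samples per channel.
waveforms = ArraySchema(
    name="waveforms",
    axes=("channel", "tick"),
    required_metadata=("units", "sample_period_ns"),
)
```

In this picture the array and metadata stay generic, and the schema is a separate object that can be checked, versioned, or generated from an IDL without touching the storage types.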

That is, I am not trying to dodge the bigger problem that this thread opens with, but rather to make a decision that narrows the scope of the problem and gives us a more concrete way to reason about it.

1 reply

brettviren May 27, 2026

For reference, slides expanding on this idea given at the collab meeting: https://indico.fnal.gov/event/71671/contributions/341334/attachments/198068/275613/datamodel.pdf

brettviren · 2026-05-27T20:08:24Z

brettviren
May 27, 2026

Has Apache Arrow been considered as a way to flesh out the C++/Python boundary? The work Phlex does to find type_id maps well to Arrow format strings. I think boost::pfr plus a data type registration on the C++ side could allow translation between a C++ struct and a Python class. I am sure I have only a surface understanding, but this looks potentially very useful, with some good knock-on benefits.
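
To make the idea of mapping struct fields to Arrow format strings concrete, here is a small Python sketch. It is only an analogy: `ctypes.Structure` field introspection stands in for what boost::pfr reflection might provide on the C++ side, the `Hit` struct and the `arrow_schema` helper are hypothetical, and no Phlex or pyarrow API is used. The single-character codes are the primitive format strings defined by the Arrow C Data Interface, with `+s` denoting a struct.

```python
import ctypes

# Subset of Arrow C Data Interface format strings for fixed-width primitives.
ARROW_FORMAT = {
    ctypes.c_int8: "c", ctypes.c_uint8: "C",
    ctypes.c_int16: "s", ctypes.c_uint16: "S",
    ctypes.c_int32: "i", ctypes.c_uint32: "I",
    ctypes.c_int64: "l", ctypes.c_uint64: "L",
    ctypes.c_float: "f", ctypes.c_double: "g",
}


def arrow_schema(struct_type):
    """Describe a flat ctypes.Structure as an Arrow struct with one child per field."""
    children = []
    for name, ctype, *_ in struct_type._fields_:
        if ctype not in ARROW_FORMAT:
            raise TypeError(f"no Arrow format registered for field '{name}'")
        children.append({"name": name, "format": ARROW_FORMAT[ctype]})
    return {"format": "+s", "children": children}


class Hit(ctypes.Structure):
    """Hypothetical data product, standing in for a C++ aggregate."""
    _fields_ = [
        ("channel", ctypes.c_int32),
        ("time", ctypes.c_double),
        ("charge", ctypes.c_float),
    ]
```

Calling `arrow_schema(Hit)` yields a nested description with format `+s` and children `i`, `g`, and `f`. A real implementation would also have to handle nested structs, strings, and variable-length members, which this sketch deliberately rejects.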

0 replies
