Add Apache Gluten/Velox support to DataFlint UI #59
- Add Gluten/Velox node type classification, display names, and accelerator badges (Velox, Photon, RAPIDS, DataFusion) in the SQL plan flow view
- Fix stage identification for Gluten's WholeStageCodegenTransformer nodes by inferring the codegen-to-node mapping and handling AQE codegen renumbering
- Split ColumnarExchange into write/read visual nodes across stage boundaries
- Propagate stages through Gluten-specific boundary nodes (VeloxResizeBatches, RowToVeloxColumnar, TakeOrderedAndProjectExecTransformer, etc.)
- Show Velox native timing metrics (aggregation/filter/sort/window time, peak memory, spill) on plan nodes
- Strip Gluten class name prefixes from plan descriptions in parsers
- Add Docker environment and example app for running Gluten/Velox on Spark 3.5
- Add unit test for Gluten stage assignment with real fixture data
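As an illustration of the accelerator badge classification in the list above, here is a minimal sketch that maps a plan node name to a badge. It is not the code from this PR: `Accelerator`, `ACCELERATOR_RULES`, and `classifyAccelerator` are hypothetical names, and the prefix/suffix rules are assumptions based on the node names mentioned in this conversation (for example `VeloxResizeBatches`, `TakeOrderedAndProjectExecTransformer`, `GpuColumnarExchange`, `CometHashAggregate`). In particular, treating every `...Transformer` node as Velox assumes the Velox backend is the one in use.

```typescript
// Hypothetical sketch; these names do not come from the DataFlint codebase.
type Accelerator = "Velox" | "Photon" | "RAPIDS" | "DataFusion";

interface AcceleratorRule {
  pattern: RegExp;
  accelerator: Accelerator;
}

// Assumed naming conventions, checked in order. The engine prefixes come first
// so that GpuColumnarExchange or CometColumnarExchange is never taken for
// Gluten's plain ColumnarExchange.
const ACCELERATOR_RULES: AcceleratorRule[] = [
  { pattern: /^Photon/, accelerator: "Photon" },
  { pattern: /^Gpu/, accelerator: "RAPIDS" },
  { pattern: /^Comet/, accelerator: "DataFusion" },
  { pattern: /Velox|Transformer\b|^ColumnarExchange\b/, accelerator: "Velox" },
];

// Returns the badge for a node name, or undefined for a plain Spark operator.
function classifyAccelerator(nodeName: string): Accelerator | undefined {
  const name = nodeName.trim();
  for (let i = 0; i < ACCELERATOR_RULES.length; i++) {
    if (ACCELERATOR_RULES[i].pattern.test(name)) {
      return ACCELERATOR_RULES[i].accelerator;
    }
  }
  return undefined;
}
```

Keeping the rules in an ordered table rather than a chain of `if` statements makes it straightforward to add another engine later without touching the lookup logic.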
Screenshots

Status Page: GlutenVeloxExample with 16 SQL queries
The DataFlint status page correctly identifies the Gluten/Velox application.

GroupBy Aggregation (SQL 3): two stages with split exchange
Shows Stage 5 (write side) and Stage 7 (read side) with the ColumnarExchange visually split. All Velox-accelerated nodes have orange Velox badges. Native metrics like Aggregation Time and Peak Memory are displayed.

Window Functions (SQL 10): three stages with window partition fields
Shows Stage 47, Stage 49, and Stage 52 with two split ColumnarExchange nodes. Window nodes display Partition Fields (play_name), Sort Fields, and Select Fields parsed from the physical plan.
- Add CometExchange, CometColumnarExchange, and GpuColumnarExchange to the exchange visual split, stage assignment, and shuffle metrics calculation
- Add CometHashAggregate to aggregate node parsing and naming
- Support the Comet plan description format (Keys:/Functions:) in the parser
- Re-add fallback plan description parsing from the SQL-level planDescription for native engines where the DataFlint custom endpoint returns empty
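To make the `Keys:`/`Functions:` parsing mentioned in the list above concrete, here is a rough sketch of how such lines could be pulled out of a plan description. It is an assumption-laden illustration, not the parser from this PR: `AggregateDetails`, `splitTopLevel`, `extractList`, and `parseAggregateDescription` are hypothetical names, and the exact description layout is assumed. The sketch accepts both `Keys: [a, b]` and the count-annotated `Keys [2]: [a, b]` form.

```typescript
// Hypothetical sketch; the description layout is assumed, not taken from Comet.
interface AggregateDetails {
  keys: string[];
  functions: string[];
}

// Split on commas that are not nested inside parentheses or brackets,
// so "sum(cast(x as bigint)), count(1)" yields two items.
function splitTopLevel(list: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < list.length; i++) {
    const ch = list.charAt(i);
    if (ch === "(" || ch === "[") depth++;
    else if (ch === ")" || ch === "]") depth--;
    if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// Find a line such as "Keys: [a, b]" or "Keys [2]: [a, b]" and return its items.
function extractList(description: string, label: string): string[] {
  const pattern = new RegExp(
    "^\\s*" + label + "(?: \\[\\d+\\])?:\\s*\\[(.*)\\]\\s*$",
    "m"
  );
  const match = pattern.exec(description);
  return match ? splitTopLevel(match[1]) : [];
}

function parseAggregateDescription(description: string): AggregateDetails {
  return {
    keys: extractList(description, "Keys"),
    functions: extractList(description, "Functions"),
  };
}
```

A missing `Keys` or `Functions` line yields an empty array rather than an error, which fits a UI that should still render the node when a description is only partially available.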
Summary
Test plan
- GlutenStageAssignment.spec.ts with real fixture data
- localhost:4000 with the Gluten example app running in Docker