
Add Apache Gluten/Velox support to DataFlint UI#59

Draft
menishmueli wants to merge 2 commits into main from feature/gluten-velox-support


@menishmueli
Contributor

Summary

  • Add full Gluten/Velox node type support to the DataFlint SQL plan flow view: node classification, friendly display names, Velox-native metrics (aggregation/filter/sort/window time, peak memory, spill), and accelerator badges (Velox, Photon, RAPIDS, DataFusion)
  • Fix stage identification for Gluten's WholeStageCodegenTransformer by inferring codegen-to-node mapping from node ID ordering, handling AQE codegen renumbering, and propagating stages through Gluten-specific boundary nodes
  • Add Docker environment and Scala example app for running Gluten/Velox on Spark 3.5 with DataFlint, including automation script
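
As a rough illustration of the accelerator-badge idea in the first bullet, the sketch below maps a plan node name to a badge. It is not taken from this PR: the function name `acceleratorFor`, the pattern table, and the name-based heuristics (a `Transformer` suffix or `Velox` substring for Gluten/Velox, a `Photon` prefix, a `Gpu` prefix for RAPIDS, a `Comet` prefix for DataFusion) are assumptions made for the example.

```typescript
// Hypothetical sketch, not DataFlint's actual implementation.
type Accelerator = "Velox" | "Photon" | "RAPIDS" | "DataFusion";

// Assumed naming heuristics, checked in order. A real classifier would
// likely need explicit entries for names these patterns miss.
const ACCELERATOR_PATTERNS: ReadonlyArray<[RegExp, Accelerator]> = [
  [/Velox|Transformer\b/, "Velox"], // e.g. FilterExecTransformer, RowToVeloxColumnar
  [/^Photon/, "Photon"],
  [/^Gpu/, "RAPIDS"], // e.g. GpuColumnarExchange
  [/^Comet/, "DataFusion"], // e.g. CometHashAggregate
];

// Returns the badge for a node name, or undefined for a standard Spark node.
function acceleratorFor(nodeName: string): Accelerator | undefined {
  for (const [pattern, accelerator] of ACCELERATOR_PATTERNS) {
    if (pattern.test(nodeName)) return accelerator;
  }
  return undefined;
}
```

Keeping the heuristics in one ordered table means a new engine can be added as a single row, and standard Spark nodes fall through to `undefined`, so they render without a badge.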

Test plan

  • 93 UI unit tests pass (17 suites), including new GlutenStageAssignment.spec.ts with real fixture data
  • Visually verify all 16 SQL queries at localhost:4000 with the Gluten example app running in Docker
  • Verify exchange nodes split correctly between stages (write/read)
  • Verify Window nodes show partition-by and sort-by columns
  • Verify accelerator badges appear on Velox nodes and not on standard Spark nodes
  • Verify no regressions on standard (non-Gluten) Spark plans

- Add Gluten/Velox node type classification, display names, and accelerator
  badges (Velox, Photon, RAPIDS, DataFusion) in the SQL plan flow view
- Fix stage identification for Gluten's WholeStageCodegenTransformer nodes
  by inferring codegen-to-node mapping and handling AQE codegen renumbering
- Split ColumnarExchange into write/read visual nodes across stage boundaries
- Propagate stages through Gluten-specific boundary nodes (VeloxResizeBatches,
  RowToVeloxColumnar, TakeOrderedAndProjectExecTransformer, etc.)
- Show Velox native timing metrics (aggregation/filter/sort/window time,
  peak memory, spill) on plan nodes
- Strip Gluten class name prefixes from plan descriptions in parsers
- Add Docker environment and example app for running Gluten/Velox on Spark 3.5
- Add unit test for Gluten stage assignment with real fixture data
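
The write/read split of an exchange node can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the `Plan`, `PlanNode`, and `PlanEdge` types, the `splitExchanges` function, and the `-write`/`-read` id suffixes are all hypothetical. The idea is that the write half takes the stage of the node feeding the exchange and the read half takes the stage of the node consuming it.

```typescript
// Hypothetical types for a visual plan graph; not DataFlint's real model.
interface PlanNode { id: string; name: string; stageId?: number | undefined }
interface PlanEdge { from: string; to: string } // data flows from -> to
interface Plan { nodes: PlanNode[]; edges: PlanEdge[] }

// Replace each exchange node with a write node (producer's stage) and a
// read node (consumer's stage), connected by a new edge.
function splitExchanges(plan: Plan, exchangeNames: ReadonlySet<string>): Plan {
  const byId = new Map(plan.nodes.map((n) => [n.id, n] as const));
  const isExchange = (id: string): boolean =>
    exchangeNames.has(byId.get(id)?.name ?? "");

  const nodes: PlanNode[] = [];
  const edges: PlanEdge[] = [];
  for (const node of plan.nodes) {
    if (!exchangeNames.has(node.name)) {
      nodes.push(node);
      continue;
    }
    // Assumes a single input and a single output per exchange.
    const producer = plan.edges.find((e) => e.to === node.id);
    const consumer = plan.edges.find((e) => e.from === node.id);
    nodes.push(
      { id: `${node.id}-write`, name: `${node.name} (write)`,
        stageId: producer ? byId.get(producer.from)?.stageId : undefined },
      { id: `${node.id}-read`, name: `${node.name} (read)`,
        stageId: consumer ? byId.get(consumer.to)?.stageId : undefined },
    );
    edges.push({ from: `${node.id}-write`, to: `${node.id}-read` });
  }
  // Re-point original edges at the matching half of each split exchange.
  for (const e of plan.edges) {
    edges.push({
      from: isExchange(e.from) ? `${e.from}-read` : e.from,
      to: isExchange(e.to) ? `${e.to}-write` : e.to,
    });
  }
  return { nodes, edges };
}

// Example: a scan in stage 5 feeding an aggregate in stage 7.
const examplePlan: Plan = {
  nodes: [
    { id: "1", name: "FileSourceScanExecTransformer", stageId: 5 },
    { id: "2", name: "ColumnarExchange" },
    { id: "3", name: "HashAggregateExecTransformer", stageId: 7 },
  ],
  edges: [{ from: "1", to: "2" }, { from: "2", to: "3" }],
};
const splitPlan = splitExchanges(examplePlan, new Set(["ColumnarExchange"]));
```

In the example, `splitPlan` contains four nodes: the exchange becomes `2-write` in stage 5 and `2-read` in stage 7, so each half can be drawn inside its own stage box.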
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@menishmueli
Contributor Author

Screenshots

Status Page — GlutenVeloxExample with 16 SQL Queries

The DataFlint status page correctly identifies the Gluten/Velox application.

GroupBy Aggregation (SQL 3) — Two stages with split exchange

Shows Stage 5 (write side) and Stage 7 (read side) with the ColumnarExchange visually split. All Velox-accelerated nodes have orange Velox badges. Native metrics like Aggregation Time and Peak Memory are displayed.

Window Functions (SQL 10) — Three stages with window partition fields

Shows Stage 47, Stage 49, and Stage 52 with two split ColumnarExchange nodes. Window nodes display Partition Fields (play_name), Sort Fields, and Select Fields parsed from the physical plan.


Note: To see these views live, run ./docker/gluten/run-gluten-example.sh and visit http://localhost:4000

- Add CometExchange, CometColumnarExchange, GpuColumnarExchange to
  exchange visual split, stage assignment, and shuffle metrics calculation
- Add CometHashAggregate to aggregate node parsing and naming
- Support Comet plan description format (Keys:/Functions:) in parser
- Re-add fallback plan description parsing from SQL-level planDescription
  for native engines where DataFlint custom endpoint returns empty
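
A parser for the `Keys:`/`Functions:` style of description might look roughly like the sketch below. The exact Comet description layout is not shown in this PR, so the input shape (`Keys: [...]` and `Functions: [...]` with comma-separated entries), the helper names, and the `word_count` column are assumptions for illustration only.

```typescript
// Hypothetical sketch; the real Comet description format may differ.
interface AggregateDescription { keys: string[]; functions: string[] }

// Split on commas that are not nested inside parentheses, so that
// "sum(coalesce(a, 0))" stays one entry.
function splitTopLevel(list: string): string[] {
  const parts: string[] = [];
  let depth = 0;
  let current = "";
  for (let i = 0; i < list.length; i++) {
    const ch = list.charAt(i);
    if (ch === "(") depth++;
    else if (ch === ")") depth = Math.max(0, depth - 1);
    if (ch === "," && depth === 0) {
      if (current.trim()) parts.push(current.trim());
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim()) parts.push(current.trim());
  return parts;
}

// Extract the bracketed list that follows "<label>:". Assumes the list
// itself contains no "]" characters.
function parseBracketField(description: string, label: string): string[] {
  const match = new RegExp(`${label}:\\s*\\[([^\\]]*)\\]`).exec(description);
  return match ? splitTopLevel(match[1] ?? "") : [];
}

function parseAggregateDescription(description: string): AggregateDescription {
  return {
    keys: parseBracketField(description, "Keys"),
    functions: parseBracketField(description, "Functions"),
  };
}

// Assumed example input.
const parsedAggregate = parseAggregateDescription(
  "CometHashAggregate Keys: [play_name], Functions: [count(1), sum(coalesce(word_count, 0))]",
);
```

For the assumed input, `parsedAggregate.keys` is `["play_name"]` and `parsedAggregate.functions` has two entries, with the nested `coalesce` call kept intact. A description lacking either label yields an empty list rather than an error, which fits the fallback-parsing case described above.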