
DATAFLINT-5041: dataflint-spark4-databricks shaded artifact for DBR 17.3+#73

Merged
minskya merged 4 commits into main from DATAFLINT-5041
May 7, 2026
Conversation

@minskya (Contributor) commented May 7, 2026

Fixes #47

Two distinct issues that surface together when running DataFlint OSS on Databricks Runtime 17.3+:

  1. Cluster crash on startup: NoClassDefFoundError: jakarta/servlet/Servlet. DBR 17.3 is Spark 4–based but ships javax.servlet, not jakarta.servlet. Fixed by publishing a separate shaded artifact.
  2. Spark UI shows duration as a bare number (no 5s (1s, 2s, 3s) formatting), and the DataFlint React UI crashes with Unsupported time unit: 58. Fixed by passing the metric initValue explicitly so the bytecode skips a Scala-generated default-arg helper that DBR doesn't have.

1. Separate shaded artifact for Databricks

```mermaid
flowchart LR
    Stock["dataflint-spark4_2.13<br/>(jakarta.servlet)"]:::stock
    Spark4["Stock Spark 4.x<br/>(jakarta.servlet)"]:::ok
    DBR["Databricks Runtime 17.3<br/>(javax.servlet)"]:::bad

    Stock -->|"works ✓"| Spark4
    Stock -->|"NoClassDefFoundError ✗"| DBR

    classDef stock fill:#e8eef5,stroke:#36c
    classDef ok fill:#dff7e0,stroke:#2a7
    classDef bad fill:#fde2e2,stroke:#c33
```

Build a second artifact io.dataflint:dataflint-spark4-databricks_2.13 from the same sources as pluginspark4 with sbt-assembly's ShadeRule.rename("jakarta.servlet.**" -> "javax.servlet.@1").inAll. Same plugin class (io.dataflint.spark.SparkDataflintPlugin); only the jar coordinate differs.
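A minimal sketch of what the sbt wiring might look like. Only the shade rule text is quoted from this PR; the module name, source-sharing mechanism, and setting keys below are assumptions, and the real `build.sbt` may differ.

```scala
// build.sbt (sketch) — module name follows the PR description; settings are assumed.
lazy val pluginspark4databricks = (project in file("pluginspark4databricks"))
  .settings(
    name := "dataflint-spark4-databricks",
    // Reuse the same sources as the stock Spark 4 module (one assumed approach).
    Compile / unmanagedSourceDirectories ++=
      (pluginspark4 / Compile / unmanagedSourceDirectories).value,
    // Rewrite jakarta.servlet.* references to javax.servlet.* at assembly time.
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("jakarta.servlet.**" -> "javax.servlet.@1").inAll
    )
  )
```

With this shape, `sbt pluginspark4databricks/assembly` produces a jar whose bytecode references `javax.servlet` while the stock module's jar is untouched.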

```mermaid
flowchart LR
    subgraph Sources [shared sources]
        S1[plugin/]
        S2[pluginspark4/]
    end

    subgraph Modules [sbt modules]
        M1[pluginspark4]
        M2["pluginspark4databricks<br/>(shade rule)"]
    end

    subgraph Artifacts [published jars]
        A1["dataflint-spark4_2.13<br/>(jakarta.servlet)"]:::stock
        A2["dataflint-spark4-databricks_2.13<br/>(javax.servlet — shaded)"]:::dbr
    end

    subgraph Runtimes [target runtimes]
        R1[Stock Spark 4.x]:::ok
        R2[Databricks Runtime 17.3+]:::ok
    end

    Sources --> M1
    Sources --> M2
    M1 -->|sbt assembly| A1
    M2 -->|"sbt assembly + ShadeRule"| A2
    A1 --> R1
    A2 --> R2

    classDef stock fill:#e8eef5,stroke:#36c
    classDef dbr fill:#fff3cd,stroke:#a80
    classDef ok fill:#dff7e0,stroke:#2a7
```

Class hierarchy + UI gate

```mermaid
classDiagram
    class DataflintPageFactory {
        <<abstract>>
        +isUISupported(ui) Boolean = true
    }
    class Spark4PageFactory {
        +isUISupported(ui) = !isDatabricks
    }
    class Spark4DatabricksPageFactory {
        +isUISupported(ui) = isDatabricks
    }
    DataflintPageFactory <|-- Spark4PageFactory
    Spark4PageFactory <|-- Spark4DatabricksPageFactory

    note for Spark4PageFactory "in pluginspark4/<br/>UI on stock Spark 4 only"
    note for Spark4DatabricksPageFactory "in pluginspark4databricks/<br/>UI on Databricks only"
```

The checks are symmetric: each jar serves the UI only on its intended runtime, and a misinstall silently degrades to "listeners run, no UI" instead of crashing.

| Jar installed | Stock Spark 4 | Databricks 17.3+ |
| --- | --- | --- |
| `dataflint-spark4_2.13` | ✅ UI works | ⚠️ UI skipped |
| `dataflint-spark4-databricks_2.13` | ⚠️ UI skipped | ✅ UI works |
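In code, the inverted gate can be sketched as follows. This is an illustration based on the class diagram above, not the actual source: the method signatures and especially the `isDatabricks` detection (an environment-variable check) are assumptions.

```scala
import org.apache.spark.ui.SparkUI

// Sketch of the UI gate; the real detection logic in the plugin may differ.
abstract class DataflintPageFactory {
  // Assumed detection: DBR sets DATABRICKS_RUNTIME_VERSION in the environment.
  protected def isDatabricks: Boolean =
    sys.env.contains("DATABRICKS_RUNTIME_VERSION")

  def isUISupported(ui: SparkUI): Boolean = true
}

class Spark4PageFactory extends DataflintPageFactory {
  // Stock jar: serve the UI only when NOT on Databricks.
  override def isUISupported(ui: SparkUI): Boolean = !isDatabricks
}

class Spark4DatabricksPageFactory extends Spark4PageFactory {
  // Shaded jar: serve the UI only when ON Databricks.
  override def isUISupported(ui: SparkUI): Boolean = isDatabricks
}
```

Because both jars register the same listener machinery, a wrong install loses only the UI page, never the data collection.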

2. SQLMetrics default-arg bytecode bug on DBR

The TimedExec "duration" metric was rendering as a bare number on DBR's Spark UI (no total (min, med, max) format). The DataFlint React UI then sliced the last 2 chars as a unit and threw Unsupported time unit: 58.

Root cause

Stock Spark 4 source:

```scala
def createTimingMetric(sc: SparkContext, name: String, initValue: Long = -1): SQLMetric
```

The Scala compiler emits createTimingMetric$default$3() as a separate static helper for the default value -1L. Calling SQLMetrics.createTimingMetric(sc, name) compiles to:

```
invokevirtual createTimingMetric$default$3():J     ← fetches the default
invokevirtual createTimingMetric(SparkContext, String, J)
```
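As a self-contained illustration of the mechanism (the names here are hypothetical, not Spark's): any Scala default argument compiles to a synthetic `$default$N` helper on the callee, and call sites that omit the argument fetch it at runtime.

```scala
object Metrics {
  // `initValue: Long = -1L` makes scalac emit a synthetic helper
  // `createTimingMetric$default$2()` returning -1L on this object.
  def createTimingMetric(name: String, initValue: Long = -1L): String =
    s"$name=$initValue"
}

// Omitting the argument compiles to roughly:
//   invokevirtual Metrics$.createTimingMetric$default$2()J
//   invokevirtual Metrics$.createTimingMetric(String, J)
// so if the callee is later recompiled WITHOUT the helper (e.g. replaced by
// explicit overloads), old call sites break at runtime, not at compile time.
val implicitCall = Metrics.createTimingMetric("duration")        // fetches helper
val explicitCall = Metrics.createTimingMetric("duration", -1L)   // no helper fetch
```

This is why passing the argument explicitly is a binary-compatibility fix: it removes the helper fetch from the caller's bytecode entirely.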

Databricks rewrote SQLMetrics with explicit overloads instead of default args, so on DBR there's no $default$3 helper. The invokevirtual on it throws NoSuchMethodError, and our catch falls through:

```mermaid
flowchart TD
    Call["SQLMetrics.createTimingMetric(sc, name)"]
    D3["createTimingMetric$default$3()"]
    Created["TIMING SQLMetric<br/>'5s (1s, 2s, 3s)'"]:::ok
    Step2["new SQLMetric('timing', 0L)"]
    SQL2arg["2-arg SQLMetric ctor"]
    Step3["SQLMetrics.createMetric(sc, name)<br/>SUM metric '1058' ❌"]:::bad
    UI["React: timeStringToMilliseconds<br/>throws 'Unsupported time unit: 58'"]:::bad

    Call --> D3
    D3 -- exists, stock Spark 4 --> Created
    D3 -- "NoSuchMethodError on DBR" --> Step2
    Step2 --> SQL2arg
    SQL2arg -- exists, stock Spark --> Created
    SQL2arg -- "NoSuchMethodError on DBR<br/>(only 3-arg ctor)" --> Step3
    Step3 --> UI
```
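The fallback chain in the diagram corresponds to something like the sketch below. This is an assumed reconstruction of DataFlint's defensive helper, not a quote of the actual source; only the Spark API calls themselves are real.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}

// Sketch of a defensive metric-creation chain matching the diagram above.
// On DBR 17.x the first call dies fetching $default$3, the 2-arg SQLMetric
// constructor is also absent, so control lands on createMetric — a SUM metric
// that renders as a bare number instead of a timing string.
def createDurationMetric(sc: SparkContext, name: String): SQLMetric =
  try {
    SQLMetrics.createTimingMetric(sc, name)   // compiles with a $default$3 fetch
  } catch {
    case _: NoSuchMethodError =>
      try new SQLMetric("timing", 0L)         // 2-arg ctor: gone on DBR 17.x
      catch {
        case _: NoSuchMethodError =>
          SQLMetrics.createMetric(sc, name)   // SUM metric → '1058' in the UI
      }
  }
```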

Verified the assumed NoSuchMethodError failures on a real DBR 17.3 cluster via the REST API:

```
=== SQLMetric constructors ===
  public SQLMetric(java.lang.String, long, boolean)        ← only 3-arg ctor

=== SQLMetrics.createTimingMetric overloads ===
  createTimingMetric(SparkContext, String)
  createTimingMetric(SparkContext, String, long)
  createTimingMetric(SparkContext, String, boolean)
  createTimingMetric(SparkContext, String, long, boolean)

  (no $default$3 helper)
```

Fix

Pass initValue explicitly so the bytecode emits a direct 3-arg invokevirtual with no $default$3 fetch:

```diff
- SQLMetrics.createTimingMetric(sparkContext, name)
+ SQLMetrics.createTimingMetric(sparkContext, name, -1L)
```

(Same change for createSizeMetric.) -1L matches stock Spark's default — runtime semantics are unchanged on stock + EMR + Spark 3.5; the only effect is the bytecode now skips the $default$3 helper and lands on a 3-arg overload that exists on every target runtime. Verified by disassembling the rebuilt jar — $default$3 is gone.


Files changed

| File | Purpose |
| --- | --- |
| `spark-plugin/plugin/.../MetricsUtils.scala` | Pass explicit `-1L` to `createTimingMetric` / `createSizeMetric` (fix for issue #2) |
| `spark-plugin/build.sbt` | New `pluginspark4databricks` SBT module: shade rule + source-share + loader-exclude filter (fix for issue #1) |
| `spark-plugin/pluginspark4databricks/.../api/Spark4DatabricksPageFactory.scala` | Subclass of `Spark4PageFactory`; inverts `isUISupported` |
| `spark-plugin/pluginspark4databricks/.../DataflintSparkUILoader.scala` | Same FQN as upstream loader; instantiates the new factory |
| `spark-plugin/clean-and-setup.sh` | Local-dev build hint |
| `README.md` | Databricks install note pointing to the new artifact |
| `.github/workflows/cd.yml` | Drop the unreliable Maven-Central verify step |

pluginspark4 source files: untouched (other than the shared MetricsUtils fix in plugin/).

Test plan

Done locally

  • sbt "plugin/compile; pluginspark3/compile; pluginspark4/compile; pluginspark4databricks/compile" — all [success]
  • sbt pluginspark4databricks/assembly — jar built
  • Bytecode verification: renderJson in the new jar references javax.servlet.http.HttpServletRequest (shade applied)
  • Subclass wired — loader's pageFactory field is typed Spark4DatabricksPageFactory
  • Stock Spark 4.0.2 smoke test — /dataflint/applicationinfo/json/ returns HTTP 200
  • MetricsUtils bytecode no longer references createTimingMetric$default$3 / createSizeMetric$default$3
  • On a real DBR 17.3 cluster (via REST API): confirmed createTimingMetric$default$3 does NOT exist, but the 3-arg overload does — explicit -1L will route correctly

Pending (your side)

  • Real DBR 17.3 LTS cluster smoke test with both fixes — confirm cluster boots, DataFlint tab renders, duration shows 5s (1s, 2s, 3s) formatting, no React Unsupported time unit error
  • CI snapshot run from this branch — confirm spark_2.12, dataflint-spark4_2.13, dataflint-spark4-databricks_2.13 all publish

Out of scope

  • Updating gitbook install docs (separate, external repo).
  • Spark 3 path (untouched, but inherits the MetricsUtils improvement).

🤖 Generated with Claude Code

Databricks Runtime 17.3 ships javax.servlet instead of jakarta.servlet,
crashing the standard Spark 4 plugin at startup with NoClassDefFoundError
on jakarta/servlet/Servlet (issue #47). Add a parallel SBT module
pluginspark4databricks that source-shares with pluginspark4 but applies
ShadeRule.rename("jakarta.servlet.**" -> "javax.servlet.@1") at assembly
time, producing io.dataflint:dataflint-spark4-databricks_2.13. A
Spark4DatabricksPageFactory subclass inverts the Databricks UI gate so
the new jar enables the UI only on DBR (and silently degrades to
listeners-only if accidentally installed on stock Spark 4); the original
Spark4PageFactory is unchanged. Drop the Maven-Central verify step in
cd.yml — it only checked spark_2.12 and didn't work for snapshots.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@notion-workspace

spark4 on databricks

minskya and others added 3 commits May 7, 2026 19:03
…ize metrics

Databricks Runtime 17.x rewrites SQLMetrics with explicit overloads instead
of Scala default args, so the bytecode-level helpers createTimingMetric$default$3
and createSizeMetric$default$3 don't exist on DBR. Calling
SQLMetrics.createTimingMetric(sc, name) compiles to a $default$3 fetch +
3-arg call — the helper fetch throws NoSuchMethodError before the metric is
ever created, the catch falls through to step 3 (SQLMetrics.createMetric, a SUM
metric), and the TimedExec "duration" surfaces in the Spark UI as a bare
number with no unit (e.g. "1058") instead of "5s (1s, 2s, 3s)". The DataFlint
React UI then crashes parsing the bare number with "Unsupported time unit: 58".

Pass -1L explicitly so the bytecode emits a direct 3-arg invokevirtual,
matching the runtime overload that exists on both stock Spark 4 and DBR 17.x.
-1L is the same value stock Spark uses as the default — semantics unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timeStringToMilliseconds slices the last 2 chars of a metric value as the
unit and throws "Unsupported time unit: ${unit}" if it isn't ms/s/m/h.
Some Spark forks (Databricks) return duration-named metrics as bare numbers
with no unit suffix — the slice then picks up digit pairs (e.g. "58") and
the throw bubbles up through the React render, blanking the SQL plan page.

Return undefined instead. Every caller (SqlReducer, GraphDurationAttribution)
already handles undefined with `?? 0`, so missing duration data degrades
gracefully and the rest of the page keeps rendering. Logs a console warning
so the malformed value is still discoverable in DevTools.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scaladoc couldn't resolve [[Spark4DatabricksPageFactory]] when generating
docs in CD because of how the new module compiles via source-share. Use
backticks (markdown code) instead of doc links — same readability, no
linker pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@minskya minskya requested a review from menishmueli May 7, 2026 16:14
@minskya minskya merged commit dcd00ba into main May 7, 2026
6 checks passed


Development

Successfully merging this pull request may close these issues.

Cluster crash when using dataflint on Databricks runtime 17.3 LTS
