Is there an existing issue for this?
Category of Bug / Issue
Reconcile bug
Current Behavior
When running a reconciliation using an Oracle source with jdbc_options configured (specifically partition_column, lower_bound, upper_bound, and num_partitions), the process fails with an org.apache.spark.sql.AnalysisException.
Even though the column exists in the source table, Spark reports that the user-defined partition column is not found in the JDBC relation. This appears to be caused by how the partition_column string is handled or quoted before being passed to the Spark JDBC reader.
Error Snippet:
AnalysisException: User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), ..., ID_COL_NAME: decimal(10,0), ...>
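The failure mode can be modeled with a simplified sketch of the case-insensitive lookup Spark performs when verifying the partition column against the JDBC schema (this is an illustrative model, not Spark's actual code): once the column name arrives still wrapped in backticks, no schema field can match it.

```python
def resolve_partition_column(user_col: str, schema_fields: list[str]) -> str:
    """Simplified model of Spark's case-insensitive partition-column lookup."""
    matches = [f for f in schema_fields if f.lower() == user_col.lower()]
    if not matches:
        raise ValueError(
            f"User-defined partition column `{user_col}` not found in the JDBC relation"
        )
    return matches[0]


schema = ["HASH_VALUE_RECON", "AMOUNT", "ID_COL_NAME"]
print(resolve_partition_column("id_col_name", schema))  # ID_COL_NAME (case-insensitive match)
# A name still wrapped in backticks never matches any schema field:
# resolve_partition_column("`id_col_name`", schema)  raises ValueError
```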
I found that the issue is related to backticks/quoting. I applied a local workaround in lakebridge/reconcile/connectors/jdbc_reader.py at line 18 by stripping the wrapping characters, which resolved the issue:
# Temporary fix/workaround: strip the wrapping backticks/quotes
option_dict["partitionColumn"] = options.partition_column[1:-1]
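A slightly more defensive variant of the same workaround (a hypothetical helper, not part of lakebridge) would strip the wrapping characters only when they are actually present, rather than unconditionally dropping the first and last character:

```python
def unquote_identifier(name: str) -> str:
    """Strip a single pair of wrapping backticks or quotes, if present."""
    if len(name) >= 2 and name[0] == name[-1] and name[0] in ("`", '"', "'"):
        return name[1:-1]
    return name


print(unquote_identifier("`ID_COL_NAME`"))  # ID_COL_NAME
print(unquote_identifier("ID_COL_NAME"))    # ID_COL_NAME (unchanged)
```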
Expected Behavior
The JDBC reader should correctly accept the partition_column defined in the YAML configuration, match it against the JDBC schema, and successfully split the data into the requested number of partitions without throwing a partition column not found error.
Steps To Reproduce
- Install databricks-labs-lakebridge==0.11.3.
- Configure a reconciliation with an Oracle source.
- Add jdbc_options to the source configuration:
jdbc_options:
  number_partitions: "8"
  partition_column: "ID_COL_NAME"
  lower_bound: "1"
  upper_bound: "10000000"
- Run the reconciliation.
- The process fails during the data fetching phase with a ReconciliationException wrapping a Spark AnalysisException.
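For reference, the four YAML keys above correspond to the camelCase options the Spark JDBC reader expects; the renaming below is illustrative of the translation the connector must produce (exact connector internals may differ):

```python
# Illustrative mapping from the YAML jdbc_options keys to the option
# names understood by the Spark JDBC reader.
yaml_jdbc_options = {
    "number_partitions": "8",
    "partition_column": "ID_COL_NAME",
    "lower_bound": "1",
    "upper_bound": "10000000",
}

key_map = {
    "number_partitions": "numPartitions",
    "partition_column": "partitionColumn",
    "lower_bound": "lowerBound",
    "upper_bound": "upperBound",
}

spark_options = {key_map[k]: v for k, v in yaml_jdbc_options.items()}
print(spark_options["partitionColumn"])  # ID_COL_NAME
```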
Relevant log output or Exception details
14:57:29 WARNING [d.l.l.r.connectors.data_source] Runtime exception occurred while fetching data using SELECT ... FROM DP_EXCHANGE.dataset_010 : User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), BIRTHDAY: timestamp, CREATED_AT: timestamp, ID_COL_NAME: decimal(10,0), IS_ACTIVE: decimal(1,0), PRICE: decimal(19,4), PRODUCT_ID: string, TRANSACTION_ID: string, TRANSACTION_TIME: timestamp, USER_CATEGORY: string ... 9 more fields>.
JVM stacktrace:
org.apache.spark.sql.AnalysisException
at org.apache.spark.sql.errors.QueryCompilationErrors$.userDefinedPartitionNotFoundInJDBCRelationError(QueryCompilationErrors.scala:4420)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$verifyAndGetNormalizedPartitionColumn$2(JDBCRelation.scala:198)
...
at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:170)
Logs Confirmation
Debug logs were captured via --debug; lsp-server.log is located under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
SELECT LOWER(DBMS_CRYPTO.HASH(RAWTOHEX(COALESCE(TRIM(amount), '_null_recon_') || ... )), 2)) AS hash_value_recon, amount, birthday, created_at, id_col_name, is_active, price, product_id, transaction_id, transaction_time, user_category FROM ORCL_SCHEMA.TBL
Operating System
macOS
Version
latest via Databricks CLI