Is your feature request related to a problem? Please describe.
Currently, spark-expectations does not explicitly handle schema evolution (e.g., changes in data types of columns) for error tables when data quality rules fail. If a source data column's type evolves, subsequent error table writes may fail (for example, writing to Delta Lake error tables without mergeSchema=true). This can hinder long-running jobs or incremental loads involving evolving data sources.
Describe the solution you'd like
Add explicit schema evolution support for error tables in the writing logic. This could include:
- Automatically enabling relevant write options (e.g., mergeSchema=true for Delta) when writing error tables.
- Validating and, if possible, updating existing error table schemas before writing failed records.
- Documenting best practices for handling schema evolution in production environments.
Describe alternatives you've considered
- Manually recreating error tables when encountering schema mismatch errors.
- Relying on underlying table formats to handle schema evolution implicitly (may not be reliable for all use cases).
Additional context
- This affects users ingesting evolving data (especially with streaming or batch jobs that accumulate errors over time).