
Explicit Schema Evolution Support Needed for Error Tables in Data Quality Rule Failures #254

@menathan

Description

Is your feature request related to a problem? Please describe.

Currently, spark-expectations does not explicitly handle schema evolution (e.g., changes to column data types) for error tables when data quality rules fail. If a source column's type evolves, subsequent error-table writes may fail (for example, appends to Delta Lake error tables made without mergeSchema=true). This can break long-running jobs or incremental loads that ingest evolving data sources.
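To make the failure mode concrete, here is a minimal plain-Python sketch (no Spark required) of what happens when an error table's column type diverges from an incoming batch: the append is rejected unless schema merging is enabled. The function, column, and type names are illustrative, not from the spark-expectations codebase.

```python
def append_to_error_table(table_schema, batch_schema, merge_schema=False):
    """Simulate appending a batch to an error table.

    Without merge_schema, any column whose type differs from the existing
    table schema causes the write to fail (mimicking a Delta append made
    without mergeSchema=true). With merge_schema, the table schema simply
    absorbs the changed/new columns.
    """
    if not merge_schema:
        for col, dtype in batch_schema.items():
            if col in table_schema and table_schema[col] != dtype:
                raise ValueError(
                    f"schema mismatch on '{col}': "
                    f"table has {table_schema[col]}, batch has {dtype}"
                )
    merged = dict(table_schema)
    merged.update(batch_schema)
    return merged

# The error table was created when `amount` was an int; the source later
# evolves it to double. The same batch fails or succeeds depending on merging.
error_table = {"id": "int", "amount": "int", "dq_rule": "string"}
new_batch = {"id": "int", "amount": "double", "dq_rule": "string"}

try:
    append_to_error_table(error_table, new_batch)  # mergeSchema off: fails
except ValueError as exc:
    print(exc)

print(append_to_error_table(error_table, new_batch, merge_schema=True))
```

This mirrors the Delta behavior where appends with mismatched types raise a schema-merge error unless mergeSchema=true (or autoMerge) is set on the write.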

Describe the solution you'd like

Add explicit schema evolution support for error tables in the writing logic. This could include:

  • Automatically enabling relevant write options (e.g., mergeSchema=true for Delta) when writing error tables.
  • Validating and, if possible, updating existing error table schemas before writing failed records.
  • Documenting best practices for handling schema evolution in production environments.
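A minimal sketch of the first bullet, kept as plain Python so it runs standalone: the writer could decide its options per table format, defaulting mergeSchema=true for Delta while still respecting an explicit user override. The function name and option-handling are hypothetical; spark-expectations' real writer API may differ.

```python
def build_error_table_write_options(table_format, user_options=None):
    """Return the write options an error-table append should use.

    For Delta, default mergeSchema=true so the error table schema can
    widen/extend instead of the append failing; a user-supplied value
    for mergeSchema is respected rather than overridden.
    """
    options = dict(user_options or {})
    if table_format == "delta":
        options.setdefault("mergeSchema", "true")
    return options

# In the actual writer these options would feed something like:
#   df.write.format("delta").options(**opts).mode("append").saveAsTable(...)
print(build_error_table_write_options("delta"))
print(build_error_table_write_options("delta", {"mergeSchema": "false"}))
print(build_error_table_write_options("parquet"))
```

Using setdefault keeps the behavior opt-out: users who deliberately want strict schemas can still pass mergeSchema=false.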

Describe alternatives you've considered

  • Manually recreating error tables when encountering schema mismatch errors.
  • Relying on the underlying table format to handle schema evolution implicitly (which may not be reliable across all formats and configurations).

Additional context

  • This affects users ingesting evolving data (especially with streaming or batch jobs that accumulate errors over time).

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests