Skip to content

Data: Add TCK coverage for reader default values#16638

Draft
joyhaldar wants to merge 1 commit into
apache:mainfrom
joyhaldar:tck-default-values
Draft

Data: Add TCK coverage for reader default values#16638
joyhaldar wants to merge 1 commit into
apache:mainfrom
joyhaldar:tck-default-values

Conversation

@joyhaldar
Copy link
Copy Markdown
Contributor

@joyhaldar joyhaldar commented May 31, 2026

Adds the reader default-value tests from DataTestBase into the Base Format model TCK:

  • testDefaultValues
  • testNullDefaultValue
  • testNestedDefaultValue
  • testMapNestedDefaultValue
  • testListNestedDefaultValue
  • testMissingRequiredWithoutDefault.

Co-authored-by: Joy Haldar <joy.haldar@target.com>
@github-actions github-actions Bot added the data label May 31, 2026
Arguments.of(Types.DoubleType.get(), Literal.of(-0.0D)),
Arguments.of(Types.DateType.get(), Literal.of(DateTimeUtil.isoDateToDays("2024-12-17"))),
Arguments.of(
Types.TimeType.get(), Literal.of(DateTimeUtil.isoTimeToMicros("23:59:59.999999"))),
Copy link
Copy Markdown
Contributor Author

@joyhaldar joyhaldar May 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-enabled the TIME case here, it passes for AVRO and PARQUET, ORC skips. It was added commented out in DataTestBase, I haven't traced why, so flagging in case it was intentional.

.map(generator -> Arguments.of(format, generator)))
.toList();

private static final List<Arguments> PRIMITIVE_TYPES_AND_DEFAULTS =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use a DataGenerator for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matrix is (type, default-literal) pairs and the test asserts the reader injects exactly that literal, so the type has to stay paired with its expected value. DataGenerator is one schema() + random rows, so it can't carry that pairing, at least as I understand the current interface.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a generator with a schema with default values, and create 2 tests, one with nulls where we check for the default values, and one with random data (not nulls), and check for the generated data?


@ParameterizedTest
@FieldSource("FILE_FORMATS")
void testDefaultValues(FileFormat fileFormat) throws IOException {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this test ? Seem the same like testSchemaEvolutionAddColumn?

expectedNested.setField("missing_inner_float", -0.0F);
expected.setField("nested", expectedNested);
}
return expected;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line

.copy("value_str", val.getField("value_str"), "value_int", 34)));
expected.setField("nested_map", rebuilt);
}
return expected;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line. Please check all the code .Thanks

.collect(Collectors.toList());
expected.setField("nested_list", rebuilt);
}
return expected;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: new line .

List<Record> genericRecords = RandomGenericData.generate(writeSchema, 10, 1L);
writeGenericRecords(fileFormat, writeSchema, genericRecords);

Schema expectedSchema =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reuse writeSchema to create expectedSchema?

List<Record> genericRecords = RandomGenericData.generate(writeSchema, 10, 1L);
writeGenericRecords(fileFormat, writeSchema, genericRecords);

Schema expectedSchema =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants