Add ryaml as optional YAML backend for faster parsing #3055

Merged: koxudaxi merged 3 commits into main from feature/ryaml-optional-dependency on Mar 16, 2026

@koxudaxi (Owner) commented Mar 16, 2026

Fixes: #2785

Summary by CodeRabbit

  • New Features

    • Added optional ryaml dependency and dynamic selection of YAML backend
    • Improved YAML parsing/error handling and backend detection utilities
  • Documentation

    • Updated CLI reference examples to use simpler, validated default values for fields
  • Tests

    • Added comprehensive tests for YAML backend detection, loading, and error scenarios

@coderabbitai (Bot) commented Mar 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cc8d7e93-4fe6-43ac-af4a-97c7d231c2d5

📥 Commits

Reviewing files that changed from the base of the PR and between 13f6f69 and 9803a31.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock, !**/*.lock and included by none
📒 Files selected for processing (1)
  • pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml

📝 Walkthrough

Adds ryaml as an optional YAML parser, plus cached runtime detection utilities that prefer ryaml when it is installed; load_yaml and infer_input_type now use the detected backend and its parse-error types. Also updates several documentation examples to replace lambda-based Field default_factory values with direct defaults and validate_default=True, and adds tests for YAML backend behavior.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| YAML Backend Utilities | src/datamodel_code_generator/util.py | Add YamlBackend type alias and cached helpers get_yaml_backend() and get_yaml_parse_errors() to detect available YAML backend (prefer ryaml) and expose parse error types. |
| YAML Loading / Input Detection | src/datamodel_code_generator/__init__.py | Update load_yaml to call ryaml.loads when backend is ryaml, otherwise use PyYAML yaml.load(..., SafeLoader); change infer_input_type error handling to use get_yaml_parse_errors() instead of catching yaml.parser.ParserError. |
| Optional Dependency | pyproject.toml | Add ryaml >= 0.5.1 as an optional dependency under optional-dependencies.ryaml. |
| Tests for Backends | tests/test_yaml_backend.py | Add comprehensive tests covering backend detection, parse-error reporting, load_yaml behavior for string/TextIO inputs across backends, and infer_input_type behavior when parsers raise errors. |
| Documentation Examples | docs/cli-reference/field-customization.md, docs/cli-reference/model-customization.md, docs/cli-reference/template-customization.md, docs/cli-reference/typing-customization.md | Replace several Field(default_factory=...) usages with direct literal defaults and validate_default=True; add/adjust field metadata (e.g., description, examples) in examples. |
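
For readers who want a concrete picture of the detection helper summarized above, here is a minimal sketch. The names YamlBackend and get_yaml_backend come from the walkthrough; the body is an assumption (the merged code in util.py may detect the backend differently), and it uses only the standard library so it runs whether or not ryaml is installed.

```python
from __future__ import annotations

import importlib.util
from functools import lru_cache
from typing import Literal

# Type alias named in the walkthrough; the two literal values are taken
# from the sequence diagram ("ryaml" or "pyyaml").
YamlBackend = Literal["ryaml", "pyyaml"]


@lru_cache(maxsize=1)
def get_yaml_backend() -> YamlBackend:
    """Return "ryaml" when it is importable, otherwise "pyyaml".

    Assumed implementation: find_spec only locates the module without
    importing it, and lru_cache makes the lookup happen once per process.
    """
    if importlib.util.find_spec("ryaml") is not None:
        return "ryaml"
    return "pyyaml"
```

Because the result is cached, tests that swap backends would likely need to call get_yaml_backend.cache_clear() between cases.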

Sequence Diagram

sequenceDiagram
    participant Caller as Caller
    participant Loader as load_yaml
    participant Detector as get_yaml_backend
    participant Ryaml as ryaml
    participant PyYAML as PyYAML

    Caller->>Loader: load_yaml(input)
    Loader->>Detector: get_yaml_backend()
    Detector-->>Loader: "ryaml" or "pyyaml"
    alt backend == "ryaml"
        Loader->>Ryaml: ryaml.loads(input)
        Ryaml-->>Loader: parsed_data
    else backend == "pyyaml"
        Loader->>PyYAML: yaml.load(input, SafeLoader)
        PyYAML-->>Loader: parsed_data
    end
    Loader-->>Caller: parsed_data
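
The dispatch in the diagram above could look roughly like the following sketch. It is not the merged implementation: the explicit backend parameter is hypothetical (the PR's load_yaml reportedly calls get_yaml_backend() itself), and reading a TextIO input into a string before calling ryaml.loads is an assumption based on the review notes about string and TextIO inputs.

```python
from __future__ import annotations

from typing import Any, TextIO


def load_yaml(stream: str | TextIO, backend: str = "pyyaml") -> Any:
    """Parse YAML with the selected backend (illustrative sketch only)."""
    if backend == "ryaml":
        import ryaml  # optional dependency, imported lazily

        # ryaml.loads takes a string, so file-like inputs are read first.
        text = stream if isinstance(stream, str) else stream.read()
        return ryaml.loads(text)

    import yaml  # PyYAML, the fallback backend

    # SafeLoader avoids constructing arbitrary Python objects.
    return yaml.load(stream, Loader=yaml.SafeLoader)
```

Importing each backend inside its branch keeps ryaml truly optional: the module is only touched when that backend was selected.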

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

breaking-change-analyzed

Poem

🐰 I sniffed the parsers, quick and spry,
Ryaml leapt while PyYAML stayed nigh,
Defaults simplified, factories shed,
Validation checks now forge ahead,
Hooray—two backends, one happy sky!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Out of Scope Changes check | ❓ Inconclusive | Documentation changes to field customization examples appear to demonstrate the new validate_default parameter behavior, which is unrelated to ryaml backend support. These changes should be validated as in-scope with the PR objectives. | Clarify whether the documentation updates (field-customization.md, model-customization.md, template-customization.md, typing-customization.md) are intentionally demonstrating validate_default behavior or are out-of-scope refactoring. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of the PR: adding ryaml as an optional YAML backend for performance improvements. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from issue #2785: adds ryaml to optional dependencies in pyproject.toml with version >= 0.5.1, implements runtime detection logic to use ryaml when available with PyYAML fallback, and adds comprehensive tests for the new functionality. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


@github-actions (Bot, Contributor) commented Mar 16, 2026

📚 Docs Preview: https://pr-3055.datamodel-code-generator.pages.dev

Comment thread src/datamodel_code_generator/util.py Dismissed
@codspeed-hq (Bot) commented Mar 16, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 11 untouched benchmarks
⏩ 98 skipped benchmarks [1]


Comparing feature/ryaml-optional-dependency (9803a31) with main (af51cd7)

Footnotes

  1. 98 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@codecov (Bot) commented Mar 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (af51cd7) to head (9803a31).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main     #3055    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           85        86     +1     
  Lines        17911     18011   +100     
  Branches      2074      2075     +1     
==========================================
+ Hits         17911     18011   +100     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)


@coderabbitai (Bot) left a comment

🧹 Nitpick comments (2)
pyproject.toml (1)

63-65: Include ryaml in the all extra to keep extras behavior consistent.

Line 63 adds a new optional feature, but datamodel-code-generator[all] won’t include it unless optional-dependencies.all is also updated.

Suggested patch
 optional-dependencies.all = [
   "datamodel-code-generator[debug]",
   "datamodel-code-generator[graphql]",
   "datamodel-code-generator[http]",
+  "datamodel-code-generator[ryaml]",
   "datamodel-code-generator[ruff]",
   "datamodel-code-generator[validation]",
   "datamodel-code-generator[watch]",
 ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 63 - 65, The new optional extra
optional-dependencies.ryaml (ryaml>=0.4) was added but not included in the
aggregate optional-dependencies.all extra; update the optional-dependencies.all
list in pyproject.toml to include "ryaml" so that installing
datamodel-code-generator[all] pulls this new optional dependency; locate the
optional-dependencies.all definition and append "ryaml" (matching the extra
name) to that list.
tests/test_yaml_backend.py (1)

64-97: Optional cleanup: parameterize the duplicated backend/input tests.

The four load_yaml tests share the same structure; pytest.mark.parametrize would reduce duplication and future maintenance drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_yaml_backend.py` around lines 64 - 97, Refactor the four tests in
TestLoadYaml to a single parameterized test using pytest.mark.parametrize over
two axes: backend availability (e.g., a tuple with ("ryaml_missing", None,
expected_calls) and ("ryaml_present", mock_ryaml, expected_calls)) and input
type (string vs io.StringIO); inside the test, inject sys.modules via
patch.dict("sys.modules", {"ryaml": backend_mock}), prepare the input value (raw
string or stream), call load_yaml(input), and assert both the returned dict and
whether mock_ryaml.loads was called with the expected string when a mock is
provided; reference TestLoadYaml and load_yaml to locate the code to replace.
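
To make the parametrization suggestion above concrete, here is a hedged sketch. The load_yaml below is a simplified stand-in written for this example, not the project's function, and mocking PyYAML through sys.modules is an assumption chosen so the example runs without either YAML library installed; the real tests in tests/test_yaml_backend.py may be structured differently.

```python
import io
from unittest.mock import MagicMock, patch

import pytest


def load_yaml(stream):
    """Simplified stand-in: prefer ryaml, fall back to PyYAML."""
    text = stream if isinstance(stream, str) else stream.read()
    try:
        import ryaml
    except ImportError:
        import yaml

        return yaml.load(text, Loader=yaml.SafeLoader)
    return ryaml.loads(text)


# Two stacked decorators give the full 2 x 2 matrix of backend and input type.
@pytest.mark.parametrize("make_input", [str, io.StringIO], ids=["str", "textio"])
@pytest.mark.parametrize("ryaml_present", [True, False], ids=["ryaml", "pyyaml"])
def test_load_yaml(ryaml_present, make_input):
    mock_ryaml = MagicMock()
    mock_ryaml.loads.return_value = {"key": "value"}
    mock_yaml = MagicMock()
    mock_yaml.load.return_value = {"key": "value"}
    # A None entry in sys.modules makes "import ryaml" raise ImportError.
    modules = {"ryaml": mock_ryaml if ryaml_present else None, "yaml": mock_yaml}

    with patch.dict("sys.modules", modules):
        result = load_yaml(make_input("key: value"))

    assert result == {"key": "value"}
    if ryaml_present:
        mock_ryaml.loads.assert_called_once_with("key: value")
        mock_yaml.load.assert_not_called()
    else:
        mock_yaml.load.assert_called_once()
```

Each combination gets its own test ID (for example ryaml-textio), so a failure points directly at the backend and input type involved.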

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c1d53ea8-d94d-4d8a-851f-e7555a4307fa

📥 Commits

Reviewing files that changed from the base of the PR and between af51cd7 and 13f6f69.

⛔ Files ignored due to path filters (2)
  • docs/llms-full.txt is excluded by none and included by none
  • uv.lock is excluded by !**/*.lock, !**/*.lock and included by none
📒 Files selected for processing (8)
  • docs/cli-reference/field-customization.md
  • docs/cli-reference/model-customization.md
  • docs/cli-reference/template-customization.md
  • docs/cli-reference/typing-customization.md
  • pyproject.toml
  • src/datamodel_code_generator/__init__.py
  • src/datamodel_code_generator/util.py
  • tests/test_yaml_backend.py

@koxudaxi koxudaxi enabled auto-merge (squash) March 16, 2026 05:19
@koxudaxi koxudaxi merged commit 5bd8eab into main Mar 16, 2026
38 checks passed
@koxudaxi koxudaxi deleted the feature/ryaml-optional-dependency branch March 16, 2026 05:23
@github-actions (Contributor) commented
Breaking Change Analysis

Result: No breaking changes detected

Reasoning: PR #3055 adds ryaml as an optional YAML backend for faster parsing. This is purely additive - users must explicitly install the optional dependency with pip install datamodel-code-generator[ryaml]. The existing PyYAML behavior is preserved as the fallback when ryaml is not installed. The function signatures (load_yaml, infer_input_type) remain unchanged. The documentation updates showing validate_default=True syntax reflect changes from PR #3050, not this PR. No CLI options, code generation output, templates, or Python version requirements were changed.


This analysis was performed by Claude Code Action

@github-actions (Bot, Contributor) commented Apr 4, 2026

🎉 Released in 0.56.0

This PR is now available in the latest release. See the release notes for details.

Linked issue: Add ryaml as optional YAML parser dependency