Skip to content

Add COCO annotation import/export support#679

Open
deanp70 wants to merge 6 commits intomainfrom
coco_converter
Open

Add COCO annotation import/export support#679
deanp70 wants to merge 6 commits intomainfrom
coco_converter

Conversation

@deanp70
Copy link
Copy Markdown
Member

@deanp70 deanp70 commented Mar 25, 2026

Based on changes to dagshub-annotation-converter to the same effect: DagsHub/dagshub-annotation-converter#26

Implementation Details

The PR extends annotation support through a multi-layered approach:

Importer Module Enhancements (importer.py):

  • Extended AnnotationType literal to include "coco",

Metadata Annotations Support (metadata.py):

  • Introduced add_coco_annotation method to load COCO-format JSON and integrate annotations into metadata

Query Result Export API (query_result.py):

  • One new export methods:
    • export_as_coco() - generates COCO JSON with optional class mapping and custom output filename
  • Introduced _resolve_annotation_field to auto-select available annotation field with alphabetical fallback
  • Refactored YOLO export to use the new annotation field resolution

Test Coverage:

  • test_coco.py (218 lines): Import/export flows, filename rewriting, field resolution, class mapping, output customization

Code Quality and Testing Infrastructure:

  • Enhanced ruff.toml linting configuration with comprehensive rule set including:
    • Security checks (flake8-bandit)
    • Bug detection (flake8-bugbear)
    • Try/except pattern validation (tryceratops)
    • Unused argument detection and other design pattern checks
    • Per-file exceptions for __init__.py and conftest.py
  • Added DatasourceType.REPOSITORY assignment in test fixtures (tests/data_engine/conftest.py)
  • Added id property to MockRepoAPI test utility
  • Minor refactor in token_auth.py (no behavioral change - logical equivalence in boolean expression)

Documentation Note: Per PR comments, the supported formats list at dagshub.com/docs/use_cases/annotation/#supported-formats-for-annotation_imports requires manual update to reflect COCO BBOX, and COCO Segmentation support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@deanp70 deanp70 requested a review from kbolashev March 25, 2026 11:41
@deanp70 deanp70 self-assigned this Mar 25, 2026
@deanp70 deanp70 added the enhancement New feature or request label Mar 25, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1bf821c6-59fd-47ee-8942-9c38e934bed8

📥 Commits

Reviewing files that changed from the base of the PR and between eb2c643 and 15289ed.

📒 Files selected for processing (1)
  • setup.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • setup.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: check-licenses
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: Build documentation
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.13)

📝 Walkthrough

Walkthrough

Adds COCO annotation import and export: importer accepts "coco" and uses load_coco_from_file; QueryResult gains _resolve_annotation_field and export_as_coco; filename-remapping now handles missing filename; tests and mocks updated; dagshub-annotation-converter requirement bumped to >=0.2.0.

Changes

Cohort / File(s) Summary
COCO Import Support
dagshub/data_engine/annotation/importer.py
Added "coco" to AnnotationType, added COCO import branch using load_coco_from_file(...), added repo download branch that downloads the single annotation file, and made filename remapping conditional when ann.filename may be None.
COCO Export Support
dagshub/data_engine/model/query_result.py
Added export_as_coco(...) to assemble and write COCO JSON (resolve annotation field, collect annotations, optional classes → categories mapping, prepend source_prefix to filenames, download images, write JSON). Added _resolve_annotation_field(...) and updated export_as_yolo(...) to use it.
COCO Feature Tests
tests/data_engine/annotation_import/test_coco.py
New end-to-end tests for COCO import/export, _resolve_annotation_field() behaviors, bbox conversion to COCO format, class mapping, custom output filename handling, error cases, and aggregation across datapoints.
Test Infrastructure Updates
tests/data_engine/conftest.py, tests/mocks/repo_api.py
Datasource mock sets ds_state.source_type = DatasourceType.REPOSITORY and stubs source_prefix via PropertyMock. MockRepoAPI gains an id property returning 1.
Dependency Management
setup.py
Bumped dagshub-annotation-converter requirement from >=0.1.12 to >=0.2.0; reordered setuptools import line.

Sequence Diagram

sequenceDiagram
    participant User
    participant QueryResult
    participant FieldResolver as _resolve_annotation_field
    participant AnnotationStore as AnnotationsCollector
    participant RepoAPI
    participant ImageDownloader
    participant COCOWriter as export_to_coco_file
    participant FS as FileSystem

    User->>QueryResult: export_as_coco(download_dir, annotation_field, ...)
    QueryResult->>FieldResolver: resolve annotation_field
    FieldResolver-->>QueryResult: resolved_field

    QueryResult->>AnnotationStore: collect annotations from resolved_field
    AnnotationStore-->>QueryResult: annotations list

    alt annotations exist
        QueryResult->>RepoAPI: request image files (prepend source_prefix)
        RepoAPI-->>ImageDownloader: stream/download images
        ImageDownloader-->>QueryResult: local image paths

        QueryResult->>COCOWriter: build COCO JSON (images, annotations, categories)
        COCOWriter->>FS: write output JSON (output_filename)
        FS-->>COCOWriter: write confirmation
        COCOWriter-->>QueryResult: output Path
        QueryResult-->>User: return Path
    else no annotations
        QueryResult-->>User: raise RuntimeError
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I hopped through JSON fields today,
Bboxes lined up, filenames in play.
Resolver sniffed the right-tagged clue,
Images fetched and COCO grew—
Carrots for tests, a tidy bouquet! 🥕

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add COCO annotation import/export support' directly and clearly summarizes the main objective of the changeset, which adds COCO format support for both importing and exporting annotations.
Description check ✅ Passed The description is directly related to the changeset, providing detailed implementation details across multiple modules (importer.py, metadata.py, query_result.py), test coverage, and test infrastructure updates that align with the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch coco_converter

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
dagshub/data_engine/annotation/importer.py (1)

83-92: ⚠️ Potential issue | 🟡 Minor

Add else clause to handle unsupported annotation types.

If annotations_type is not one of the supported values, annotation_dict remains unbound, causing an UnboundLocalError on line 92.

Proposed fix
             elif self.annotations_type == "coco":
                 annotation_dict, _ = load_coco_from_file(annotations_file)
+            else:
+                raise ValueError(f"Unsupported annotation type: {self.annotations_type}")

             return annotation_dict
♻️ Duplicate comments (1)
dagshub/__init__.py (1)

1-1: ⚠️ Potential issue | 🔴 Critical

Version changes should not be included in PRs.

The version bump has been flagged previously as a blocker issue. Version changes should be handled separately from feature PRs, typically during the release process.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13ca3252-8076-4a28-a78a-8200f0235dc6

📥 Commits

Reviewing files that changed from the base of the PR and between af5fe5d and e68048a.

📒 Files selected for processing (9)
  • dagshub/__init__.py
  • dagshub/auth/token_auth.py
  • dagshub/data_engine/annotation/importer.py
  • dagshub/data_engine/annotation/metadata.py
  • dagshub/data_engine/model/query_result.py
  • setup.py
  • tests/data_engine/annotation_import/test_coco.py
  • tests/data_engine/conftest.py
  • tests/mocks/repo_api.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
  • GitHub Check: build (3.9)
  • GitHub Check: build (3.13)
  • GitHub Check: build (3.12)
🔇 Additional comments (11)
dagshub/auth/token_auth.py (1)

40-40: Readability improvement with no behavior change.

Line 40 is an equivalent condition and reads more idiomatically in Python. No concerns here.

dagshub/data_engine/annotation/metadata.py (2)

284-284: Move import to module level.

The import should be at the top of the file with other imports, not inside the function.

Proposed fix

Add to imports at top of file:

from dagshub_annotation_converter.converters.coco import load_coco_from_json_string

Then remove line 284.


274-294: LGTM on method implementation.

The method correctly:

  1. Parses COCO JSON via the converter
  2. Rewrites all annotation filenames to the datapoint's path (expected behavior when adding annotations to a specific datapoint)
  3. Extends the existing annotations list
  4. Persists via _update_datapoint()

The filename rewriting is intentional here since you're associating external annotations with a specific datapoint.

tests/data_engine/annotation_import/test_coco.py (2)

20-23: Consider moving mock to shared fixture.

This mock_source_prefix fixture should be integrated into _create_mock_datasource() in conftest.py so it works for all datasource fixture users, not just this test module.


1-218: Good test coverage for COCO functionality.

The tests comprehensively cover:

  • Import from disk and error handling for missing files
  • Label Studio task conversion
  • add_coco_annotation filename rewriting
  • _resolve_annotation_field edge cases (explicit, auto, no fields, alphabetical ordering)
  • Export with various configurations (bbox coordinates, empty annotations error, custom classes, custom filename, multiple datapoints)

The helper functions (_make_coco_json, _write_coco, _make_image_bbox, _make_qr) are well-structured and reusable.

tests/mocks/repo_api.py (1)

116-118: LGTM!

The mock id property correctly mirrors the real RepoAPI.id signature with a constant return value appropriate for test scenarios.

tests/data_engine/conftest.py (1)

8-8: LGTM!

Explicitly setting source_type = DatasourceType.REPOSITORY ensures the mocked datasource state correctly exercises REPOSITORY-specific code paths (e.g., path_parts() revision extraction), which is necessary for the new COCO annotation tests.

Also applies to: 29-29

dagshub/data_engine/model/query_result.py (3)

783-791: LGTM!

Good refactoring to centralize annotation field resolution. The alphabetical sorting provides deterministic selection when multiple annotation fields exist, and the error message is clear when no fields are found.


871-917: LGTM on overall export_as_coco structure.

The method follows the established pattern from export_as_yolo: resolves annotation field, collects annotations, prefixes filenames with source prefix, downloads images, and exports to the target format. Error handling for empty annotations is appropriate.


901-903: No changes required. The code correctly assigns dict(classes) to context.categories, which matches the documented COCO format specification of {id: name}. The imported Categories class is from the YOLO module and should not be used with CocoContext, as it is format-specific to YOLO, not COCO.

			> Likely an incorrect or invalid review comment.
dagshub/data_engine/annotation/importer.py (1)

160-163: Defensive handling for None filenames.

This change prevents TypeError when remap_func is called with None. The fallback to new_filename is reasonable since the annotation belongs to the image keyed by that filename in the dictionary.

However, a None filename from the converter could indicate a bug upstream. Consider logging when this fallback is triggered to aid debugging.

@deanp70 deanp70 requested a review from kbolashev March 29, 2026 12:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants