Description
Create a lightweight but consistent checklist to evaluate the quality of parsed output on representative course documents.
The goal is to make parser validation repeatable across samples and to quickly identify the most common failure modes before they affect chunking, retrieval, and source-grounded generation.
Scope
For each sample document, evaluate the parsed output against the following dimensions:
- text extraction quality
- slide/page segmentation
- heading/section preservation
- tables
- formulas
- image-heavy slides
- broken reading order
- missing content
Evaluation dimensions
Each document review should include brief notes and a simple rating for each of the following:
- Text extraction quality: Is the extracted text complete, readable, and reasonably clean?
- Slide/page segmentation: Are page and slide boundaries preserved correctly?
- Heading/section preservation: Are titles, section headers, and hierarchical structure retained?
- Tables: Are tables captured in a usable form, or are they broken or lost?
- Formulas: Are formulas preserved, partially degraded, or missing?
- Image-heavy slides: Does the parser still produce useful output when slides contain little text and mostly visuals?
- Broken reading order: Does the extracted content follow the correct logical reading sequence?
- Missing content: Is any obvious content missing from the parsed output?
Suggested output format
For each sample document, produce:
- document name
- file type
- parser used
- short overall quality summary
- checklist evaluation by category
- examples of major issues
- recommendation, one of:
  - acceptable for retrieval
  - acceptable with cleanup
  - not acceptable yet
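The per-document record above could be captured with a small helper so that evaluations stay consistent across reviewers. The sketch below is one possible shape, not a required implementation: the dimension keys, the good/partial/poor rating scale, and the `ParserEvaluation` name are all assumptions, since this task only calls for "a simple rating" without fixing a scale.

```python
from dataclasses import dataclass, field

# Checklist dimensions from this task; key names are illustrative.
DIMENSIONS = [
    "text_extraction_quality",
    "slide_page_segmentation",
    "heading_section_preservation",
    "tables",
    "formulas",
    "image_heavy_slides",
    "reading_order",
    "missing_content",
]

RATINGS = {"good", "partial", "poor"}  # assumed 3-point scale
RECOMMENDATIONS = {
    "acceptable for retrieval",
    "acceptable with cleanup",
    "not acceptable yet",
}


@dataclass
class ParserEvaluation:
    """One completed checklist for one sample document."""
    document_name: str
    file_type: str
    parser_used: str
    summary: str
    recommendation: str
    ratings: dict = field(default_factory=dict)      # dimension -> rating
    notes: dict = field(default_factory=dict)        # dimension -> free-text note
    major_issues: list = field(default_factory=list)  # example issues, verbatim

    def validate(self) -> list:
        """Return a list of problems; an empty list means the record is complete."""
        problems = []
        for dim in DIMENSIONS:
            rating = self.ratings.get(dim)
            if rating not in RATINGS:
                problems.append(f"{dim}: missing or invalid rating {rating!r}")
        if self.recommendation not in RECOMMENDATIONS:
            problems.append(f"invalid recommendation {self.recommendation!r}")
        return problems
```

A reviewer would fill in one `ParserEvaluation` per sample document and run `validate()` before filing it, which makes the "clear and reusable" acceptance criterion checkable rather than a judgment call.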
Deliverables
- parser quality checklist template
- completed evaluations for a small set of representative sample documents
- short summary of recurring parser weaknesses
Acceptance criteria
- the checklist is clear and reusable
- at least 3 (ideally 5) representative documents are evaluated
- major parser failure modes are explicitly documented
- the output is useful for both parsing and retrieval work