
small up #718

Open
holden wants to merge 1 commit into main from 03-29-small_up

Conversation

@holden
Member

@holden holden commented Mar 29, 2026

Summary by CodeRabbit

  • Documentation

    • Added Prediction Integrity guidelines covering hypothesis pre-registration, holdout testing protocols, baseline comparisons, and required reporting standards.
  • New Features

    • Added command-line interface for executing weight sweeps with configurable sample count, top-results filtering, optional persistence, and JSON/table output formatting.

@coderabbitai

coderabbitai bot commented Mar 29, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a new weight optimization sweep feature for the prediction system. It adds documentation guidelines on prediction integrity, a new public API WeightOptimizer.sweep/3 for evaluating weight vectors in-memory, and a corresponding Mix task CLI that exposes this functionality with configurable sampling, optional persistence, and JSON/table output formats.

Changes

  • Documentation (CLAUDE.md, README.md): Added development guidelines requiring authors to review the Prediction Integrity section before modifying prediction/scoring logic, and introduced an anti-self-deception protocol documenting pre-registration, holdout testing, and baseline comparison standards for model evaluation.
  • Weight Optimizer API (lib/cinegraph/predictions/weight_optimizer.ex): Implemented a new public sweep/3 function that loads decade feature data once, evaluates randomly sampled weight vectors and named profiles in-memory, ranks results by accuracy, and optionally persists the best weights via MovieLists.save_trained_weights/2.
  • Mix Task CLI (lib/mix/tasks/predictions.sweep.ex): Added a new Mix task, predictions.sweep, that accepts CLI options (--list-key, --n-samples, --top, --save, --json), delegates to WeightOptimizer.sweep/3, and outputs results as formatted console tables or JSON with error handling and startup banners.
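As a rough sketch of how the flag parsing for such a task could look with Elixir's standard OptionParser (the module name and default values here are illustrative assumptions, not the actual implementation):

```elixir
defmodule SweepCLISketch do
  # Hypothetical parser for the flags listed above (--list-key, --n-samples,
  # --top, --save, --json). Defaults are illustrative assumptions; the real
  # predictions.sweep task may structure this differently.
  def parse_args(argv) do
    {opts, _args, _invalid} =
      OptionParser.parse(argv,
        strict: [
          list_key: :string,
          n_samples: :integer,
          top: :integer,
          save: :boolean,
          json: :boolean
        ]
      )

    # Fall back to defaults when a flag is omitted.
    %{
      list_key: Keyword.get(opts, :list_key, "default"),
      n_samples: Keyword.get(opts, :n_samples, 1000),
      top: Keyword.get(opts, :top, 10),
      save: Keyword.get(opts, :save, false),
      json: Keyword.get(opts, :json, false)
    }
  end
end
```

Note that OptionParser maps dashed flags like `--n-samples` onto underscored atoms like `:n_samples` automatically when `strict:` switches are declared.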

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant MixTask as Mix Task<br/>(predictions.sweep)
    participant WeightOptimizer
    participant MovieLists
    participant Output

    User->>MixTask: Run with options
    MixTask->>MixTask: Parse CLI args
    MixTask->>WeightOptimizer: sweep(source_key, n_samples, opts)
    WeightOptimizer->>WeightOptimizer: Load cached decade features
    WeightOptimizer->>WeightOptimizer: Generate n_samples random weights
    WeightOptimizer->>WeightOptimizer: Evaluate all weights in-memory
    WeightOptimizer->>WeightOptimizer: Rank by accuracy (descending)
    
    alt --save option provided
        WeightOptimizer->>MovieLists: save_trained_weights(best)
        MovieLists-->>WeightOptimizer: OK
    end
    
    WeightOptimizer-->>MixTask: Top results with ranks
    MixTask->>Output: Format as JSON or table
    Output-->>User: Display results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Weights sweep through the simplex dance,
Each vector gets its fair chance,
Integrity guards our models true,
No self-deception in our view,
Baselines beaten, holdouts blessed! 🎬✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 66.67%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Title check: ❓ Inconclusive. The title 'small up' is vague and generic, providing no meaningful information about the substantial changes in this PR (new prediction integrity documentation, weight optimizer API, and sweep task). Resolution: revise the title to describe the main change, such as 'Add weight optimizer sweep feature with prediction integrity guidelines' or 'Implement random weight sweep CLI task and integrity protocol'.
✅ Passed checks (1 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Member Author

holden commented Mar 29, 2026

This stack of pull requests is managed by Graphite. Learn more about stacking.

@holden holden marked this pull request as ready for review March 29, 2026 15:52

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
lib/mix/tasks/predictions.sweep.ex (1)

123-129: Consider guarding against empty results.

hd(results) will crash if results is empty. While named profiles should always be present, a defensive check would improve robustness.

♻️ Proposed defensive check
+   if results == [] do
+     Mix.shell().error("No results returned from sweep.")
+     exit({:shutdown, 1})
+   end
+
    best = hd(results)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/mix/tasks/predictions.sweep.ex` around lines 123 - 129, Guard against an
empty results list before calling hd(results): check results (e.g., with a case
or if) and handle the empty branch (log a message via Mix.shell().info/error and
return or skip further processing) so hd(results) is only called when results is
non-empty; update the block that assigns best and calls Mix.shell().info (the
hd(results) usage and subsequent Mix.shell().info with
format_weights(best.weights)) to first verify results is not [].

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ac76c43b-d6ce-42b8-8210-ee19604fe353

📥 Commits

Reviewing files that changed from the base of the PR and between e376c46 and 94a8a3f.

📒 Files selected for processing (4)
  • CLAUDE.md
  • README.md
  • lib/cinegraph/predictions/weight_optimizer.ex
  • lib/mix/tasks/predictions.sweep.ex

Comment on lines +322 to +327
# Sample a uniform random point on the (n-1)-simplex via the exponential method.
defp random_simplex_point(n) do
  raw = for _ <- 1..n, do: :math.log(1.0 / :rand.uniform())
  total = Enum.sum(raw)
  Enum.map(raw, &(&1 / total))
end

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Erlang :rand.uniform/0 return exactly 0.0 as a possible value, and does :rand.uniform_real/0 exclude 0.0?

💡 Result:

Yes, Erlang's :rand.uniform/0 can return exactly 0.0 as a possible value in the range 0.0 =< X < 1.0. :rand.uniform_real/0 excludes 0.0, returning values in the range 0.0 < X < 1.0.


Use :rand.uniform_real/0 to avoid log(infinity) edge case in simplex sampling.

Erlang's :rand.uniform/0 can return exactly 0.0 (range [0.0, 1.0)), which causes 1.0 / 0.0 to produce :infinity and corrupts the sample vector. Use :rand.uniform_real/0 instead, which excludes 0.0 and returns the mathematically correct negative log directly:

Proposed fix
  defp random_simplex_point(n) do
-   raw = for _ <- 1..n, do: :math.log(1.0 / :rand.uniform())
+   raw = for _ <- 1..n, do: -:math.log(:rand.uniform_real())
    total = Enum.sum(raw)
    Enum.map(raw, &(&1 / total))
  end
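As a standalone sanity check of the fixed sampler (a sketch outside the project; the module name is illustrative), every sampled point is non-negative and sums to 1:

```elixir
defmodule SimplexSketch do
  # Exponential method: normalized i.i.d. Exp(1) variates are uniformly
  # distributed on the (n-1)-simplex. :rand.uniform_real/0 never returns
  # 0.0, so -:math.log(...) is always finite.
  def random_simplex_point(n) do
    raw = for _ <- 1..n, do: -:math.log(:rand.uniform_real())
    total = Enum.sum(raw)
    Enum.map(raw, &(&1 / total))
  end
end
```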
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/cinegraph/predictions/weight_optimizer.ex` around lines 322 - 327, The
random_simplex_point/1 function can produce :infinity when using :rand.uniform()
and taking 1.0 / 0.0; update the sampling to use :rand.uniform_real/0 (which
never returns 0.0) and compute the exponential variates via the negative log
directly (i.e., generate each raw element with -:math.log(:rand.uniform_real())
inside the for comprehension) before normalizing by Enum.sum to produce the
simplex point.

@greptile-apps

greptile-apps bot commented Mar 29, 2026

Greptile Summary

This PR adds a random weight sweep capability to the predictions system — a fast in-memory alternative to the logistic regression trainer (mix predictions.train) that evaluates thousands of simplex-sampled weight vectors without any per-vector DB round-trips. It also adds process discipline documentation (README "Prediction Integrity" section + CLAUDE.md cross-reference) to guard against overfitting and self-deception during model development.

Key changes:

  • WeightOptimizer.sweep/3 — loads decade feature data once via the existing load_decade_data_parallel path, then evaluates n_samples random simplex points plus all named profiles entirely in memory using a new fast_evaluate/4 helper
  • fast_evaluate/4 — computes precision@K (recall) per decade; metric is consistent with HistoricalValidator.validate_decade (both take top-K, count hits)
  • random_simplex_point/1 — uniform sampling on the (n-1)-simplex via the exponential method; mathematically correct
  • mix predictions.sweep — CLI wrapper with --n-samples, --top, --save, and --json flags
  • One crash path: print_results/3 calls hd(results) without guarding against an empty list; passing --top 0 will raise an ArgumentError (hd/1 rejects an empty list)
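A minimal sketch of the precision@K convention described above (the function name, row shape, and weight map are assumptions about fast_evaluate/4, not the project's actual code):

```elixir
defmodule PrecisionAtK do
  # Score each row with the candidate weight vector, take the top K by
  # score, and count how many of those are actual list members (labels).
  # `rows` is assumed to be a list of {features_map, label_boolean} tuples.
  def precision_at_k(rows, weights, k) do
    hits =
      rows
      |> Enum.map(fn {features, label} ->
        score =
          Enum.reduce(weights, 0.0, fn {criterion, w}, acc ->
            acc + w * Map.get(features, criterion, 0.0)
          end)

        {score, label}
      end)
      |> Enum.sort_by(fn {score, _label} -> score end, :desc)
      |> Enum.take(k)
      |> Enum.count(fn {_score, label} -> label end)

    hits / k
  end
end
```

Because the criterion scores are weight-independent, only this cheap dot-product-and-sort step has to rerun per candidate vector, which is what makes the in-memory sweep fast.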

Confidence Score: 4/5

Safe to merge after fixing the hd/1 crash in print_results/3.

One P1 crash path exists: passing --top 0 produces an empty result list, causing print_results/3 to call hd([]) and raise an unhandled ArgumentError. The core logic — simplex sampling, fast_evaluate metric, caching strategy — is sound and consistent with the existing HistoricalValidator accuracy convention. All other findings are P2 or lower.

lib/mix/tasks/predictions.sweep.ex — specifically the hd(results) call in print_results/3

Important Files Changed

Filename Overview
lib/mix/tasks/predictions.sweep.ex New Mix task wrapping WeightOptimizer.sweep/3. Well-structured with JSON output mode, good logging, and rescue around the sweep call. Has a crash path in print_results/3 when --top 0 is passed (hd on empty list).
lib/cinegraph/predictions/weight_optimizer.ex Adds sweep/3, fast_evaluate/4, and random_simplex_point/1. The in-memory evaluation correctly reuses cached criterion scores (weight-independent), and the simplex sampling via the exponential method is mathematically sound. The accuracy metric (precision@K) is consistent with HistoricalValidator.validate_decade.
README.md Adds a Prediction Integrity: Anti-Self-Deception Protocol section documenting pre-registration, sacred holdout, baseline requirements, and reporting standards. Documentation-only change.
CLAUDE.md Single-line addition cross-referencing the new README Prediction Integrity section from the dev conventions guide.

Sequence Diagram

sequenceDiagram
    participant CLI as mix predictions.sweep
    participant WO as WeightOptimizer.sweep/3
    participant HV as HistoricalValidator
    participant DB as Database
    participant CS as CriteriaScoring

    CLI->>WO: sweep(list_key, n_samples, opts)
    WO->>HV: get_all_decades(source_key)
    HV-->>WO: [1930, 1940, ..., 2010]
    WO->>WO: load_decade_data_parallel (Task.async_stream)
    loop per decade (parallel)
        WO->>DB: get_decade_movies_query(decade)
        DB-->>WO: [movies...]
        WO->>CS: batch_score_movies(movies, @default_weights)
        CS-->>WO: [{criteria_scores: %{mob, critics, ...}}]
        WO->>WO: build feature rows (scores / 100), labels
    end
    Note over WO: decade_data cached in memory
    WO->>CS: get_named_profiles()
    CS-->>WO: [default, festival-heavy, audience-first, ...]
    WO->>WO: build named_vectors from profiles
    loop n_samples random vectors
        WO->>WO: random_simplex_point(n_criteria)
    end
    loop (named + random) vectors — in memory only
        WO->>WO: fast_evaluate(vec, decade_data) → precision@K
    end
    WO->>WO: sort by accuracy desc, take top N
    opt --save
        WO->>DB: MovieLists.save_trained_weights(source_key, weights)
    end
    WO-->>CLI: [%{rank, accuracy, label, weights}, ...]
    CLI->>CLI: print_results / JSON output
Loading

Reviews (1): Last reviewed commit: "small up"


Mix.shell().info(" " <> String.duplicate("-", 80))

Enum.each(results, fn r ->

P1 hd/1 on potentially empty list crashes the task

print_results/3 calls hd(results) without guarding against an empty list. Passing --top 0 causes WeightOptimizer.sweep to return [] (via Enum.take(results, 0)), which makes hd([]) raise an ArgumentError and kills the Mix task with an unformatted stack trace.

The JSON output branch handles this correctly (it just maps over an empty list), so the inconsistency is only in the text-rendering path. The guard needs to wrap the entire best-dependent block, not just the hd call.
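One way to wrap the whole best-dependent block is to pattern-match the empty case in a separate clause, so hd/1 is never reached on []. This is a self-contained sketch; the result-map keys and output shape are assumptions based on the review text, and the real function also takes options and uses Mix.shell():

```elixir
defmodule SweepOutputSketch do
  # Empty-list clause first: nothing best-dependent ever runs on [].
  def print_results([], _top), do: IO.puts("No results returned from sweep.")

  def print_results(results, top) do
    best = hd(results)
    IO.puts("Best: #{inspect(best.weights)} (accuracy #{best.accuracy})")

    results
    |> Enum.take(top)
    |> Enum.each(fn r -> IO.puts("  ##{r.rank} accuracy=#{r.accuracy}") end)
  end
end
```

Multi-clause dispatch keeps the guard at the function head, which matches how the JSON branch already degrades gracefully on an empty list.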
