# llm-as-judge

Here are 81 public repositories matching this topic...

dojo.md

University for AI agents. 92 courses, 4400+ scenarios, any model via OpenRouter. Auto-training loops generate per-model SKILL.md documents. Works with Claude Code, OpenClaw, Cursor, Windsurf. No fine-tuning required.

  • Updated May 2, 2026
  • TypeScript

The course teaches how to fine-tune LLMs using Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves model reasoning with minimal data. Learn RFT concepts, reward design, and LLM-as-a-judge evaluation, and deploy training jobs on the Predibase platform.

  • Updated Jun 13, 2025
  • Jupyter Notebook
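The LLM-as-a-judge evaluation mentioned in the course description boils down to prompting a second model with a rubric and parsing its verdict. A minimal sketch of that pattern follows; the rubric, function names, and stubbed model call are illustrative assumptions, not the course's actual API.

```python
# Minimal sketch of the LLM-as-a-judge pattern: build a rubric prompt,
# send it to a judge model, and parse a numeric score from the reply.
# The model call is stubbed so the example runs offline (assumption).

JUDGE_RUBRIC = """You are an impartial judge. Score the ANSWER to the
QUESTION on a 1-5 scale for correctness and helpfulness.
Reply with exactly one line: SCORE: <n>"""

def build_judge_prompt(question: str, answer: str) -> str:
    """Assemble the evaluation prompt sent to the judge model."""
    return f"{JUDGE_RUBRIC}\n\nQUESTION: {question}\nANSWER: {answer}"

def parse_verdict(judge_reply: str) -> int:
    """Extract the numeric score from the judge model's reply."""
    for line in judge_reply.splitlines():
        if line.strip().upper().startswith("SCORE:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no SCORE line in judge reply")

def judge(question: str, answer: str, call_model) -> int:
    """Run one evaluation; call_model is any text-in, text-out LLM function."""
    return parse_verdict(call_model(build_judge_prompt(question, answer)))

# Stand-in for a real API client, so the sketch is self-contained.
fake_model = lambda prompt: "SCORE: 4"
print(judge("What is 2+2?", "4", fake_model))  # prints 4
```

In a real pipeline the `call_model` argument would wrap an API client, and the parsed scores would feed a reward signal (as in GRPO-style training) or an evaluation report.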
llm-fullstack-ai-agentic-system

🚀 Production-grade, full-stack agentic ecosystem engineered DB-first with Turso & Drizzle, featuring Generative UI built with Next.js & Server-Sent Events (SSE) and a robust LLM-as-Judge evaluation suite. Fully containerized and orchestrated via Kubernetes & Helm, with modern DevOps through GitHub Actions & GHCR.io for scalable cloud-native deployment.

  • Updated Mar 21, 2026
  • TypeScript

Production-grade Playwright + TypeScript QA framework with AI-powered testing, LLM-as-Judge evaluation, MCP server, 7 CLI agents, security fuzzing, CI/CD pipelines, Jira sync, and Slack reporting — zero-config, plug-and-play.

  • Updated Apr 21, 2026
  • TypeScript

Autonomous agent-to-agent marketplace with live Karpathy loop self-improvement. Agents discover, hire, benchmark, and evolve programmatically. MPP/x402/MCP. No humans in the loop.

  • Updated Apr 28, 2026
  • TypeScript
