Research Paper Judge
Multi-agent AI system that evaluates arXiv research papers and generates peer-review-style reports with PASS/FAIL verdicts and scored dimensions.
Problem
Manual peer review is slow, inconsistent, and doesn't scale — reviewers apply different rubrics, miss key dimensions, and create bottlenecks.
Approach
Built a multi-agent pipeline with two sequential waves: Wave 1 runs Grammar, Novelty (with live Google Search), and Fact-Check agents concurrently; Wave 2 runs Consistency and Authenticity agents in parallel; a final Evaluator agent applies weighted scoring to produce a PASS/FAIL verdict.
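The two-wave orchestration can be sketched with `asyncio` as below. The agent functions are hypothetical stubs standing in for the real LLM calls (OpenRouter / Gemini); only the wave structure mirrors the pipeline described above.

```python
import asyncio

# Hypothetical agent stubs -- in the real system each wraps an LLM call;
# here they return fixed scores purely to illustrate the control flow.
async def grammar_agent(paper): return {"dimension": "grammar", "score": 8}
async def novelty_agent(paper): return {"dimension": "novelty", "score": 7}
async def fact_check_agent(paper): return {"dimension": "fact_check", "score": 9}
async def consistency_agent(paper): return {"dimension": "consistency", "score": 8}
async def authenticity_agent(paper): return {"dimension": "authenticity", "score": 9}

async def run_pipeline(paper):
    # Wave 1: grammar, novelty, and fact-check run concurrently.
    wave1 = await asyncio.gather(
        grammar_agent(paper), novelty_agent(paper), fact_check_agent(paper)
    )
    # Wave 2 starts only after every Wave 1 agent has finished.
    wave2 = await asyncio.gather(
        consistency_agent(paper), authenticity_agent(paper)
    )
    # The final Evaluator would consume wave1 + wave2 for weighted scoring.
    return wave1 + wave2

results = asyncio.run(run_pipeline({"title": "example"}))
```

`asyncio.gather` gives the fan-out within a wave while the sequential `await`s enforce the wave boundary, which is the whole trick here.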
Value
Structured, explainable peer-review reports in seconds — traceable per-dimension scores with rationale, not just a verdict.
Snapshot
Accepts an arXiv URL, retrieves paper metadata, extracts PDF content page by page, runs five specialized reviewer agents across two concurrent waves plus a final Evaluator, and outputs a scored review report with a PASS/FAIL verdict.
Stack
- Python
- FastAPI
- PostgreSQL (NeonDB)
- Streamlit
- OpenRouter
- Gemini
- pymupdf4llm
Role
- Multi-agent system design
- LLM orchestration
- Evaluation pipeline
- Full-stack build
Outcomes
- PASS/FAIL verdicts with per-dimension scores
- Six specialized agents: five reviewers (grammar, novelty, fact-checking, consistency, authenticity) plus a final evaluator
- Wave-based concurrency for faster evaluation
- Traceable scoring rationale per dimension
Build notes
- Wave 1 agents run concurrently — Grammar, Novelty (Gemini + Google Search), Fact-Check.
- Wave 2 agents run after Wave 1 completes — Consistency and Authenticity in parallel.
- Final Evaluator agent applies weighted scoring: Consistency 30%, Authenticity 25%, Novelty 20%, Fact-Check 15%, Grammar 10%.
- pymupdf4llm for page-level PDF extraction with structured metadata.
- NeonDB (PostgreSQL) for storing paper metadata and evaluation results.
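The Evaluator's weighted verdict can be sketched as below, using the weights listed in the build notes. The 0–10 score scale and the 7.0 pass threshold are illustrative assumptions, not the system's actual cutoff.

```python
# Dimension weights from the build notes (sum to 1.0).
WEIGHTS = {
    "consistency": 0.30,
    "authenticity": 0.25,
    "novelty": 0.20,
    "fact_check": 0.15,
    "grammar": 0.10,
}

def verdict(scores: dict[str, float], threshold: float = 7.0) -> tuple[float, str]:
    """Combine per-dimension scores into a weighted total and a PASS/FAIL label."""
    total = sum(scores[dim] * w for dim, w in WEIGHTS.items())
    return round(total, 2), "PASS" if total >= threshold else "FAIL"

score, label = verdict(
    {"consistency": 8, "authenticity": 9, "novelty": 7, "fact_check": 9, "grammar": 8}
)
```

Keeping the weights in one dict makes the rubric explicit and easy to tune, and returning the numeric total alongside the label is what makes the verdict traceable rather than a bare PASS/FAIL.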
Roadmap
- Multi-paper batch processing and comparison.
- Fine-tuned judge model on domain-specific data.
- Export to structured review formats.
Want something similar built for your product?
I'll scope the approach, then ship with reliability in mind.