llmsresearch / paperbanana

Open source implementation and extension of Google Research’s PaperBanana for automated academic figures, diagrams, and research visuals, expanded to new domains like slide generation.

View on GitHub
1,232 stars
178 forks
23 issues

AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing llmsresearch/paperbanana in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

Source files are only loaded when you start an analysis to optimize performance.
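The contrast with chunk-based RAG can be sketched in a few lines: instead of retrieving pre-split fragments, whole files enter the context only when an analysis asks for them. The function below is purely illustrative — its name and shape are assumptions, not RepoMind's actual code:

```python
from pathlib import Path

def build_context(repo_root: str, files_needed: list[str]) -> str:
    """Load the requested source files whole, on demand, rather than
    retrieving pre-chunked fragments as a traditional RAG system would."""
    parts = []
    for rel in files_needed:
        path = Path(repo_root) / rel
        if path.is_file():
            # The entire file goes into the prompt context, so the model
            # sees complete functions and classes, not split-up chunks.
            parts.append(f"# file: {rel}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```

Because nothing is loaded until `build_context` runs, an idle page costs nothing; the trade-off is a larger prompt when analysis starts.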

Embed this Badge

Showcase RepoMind's analysis directly in your repository's README.

[![Analyzed by RepoMind](https://img.shields.io/badge/Analyzed%20by-RepoMind-4F46E5?style=for-the-badge)](https://repomind.in/repo/llmsresearch/paperbanana)

Repository Overview (README excerpt)

PaperBanana: Automated Academic Illustration for AI Scientists

> **Disclaimer**: This is an **unofficial, community-driven open-source implementation** of the paper *"PaperBanana: Automating Academic Illustration for AI Scientists"* by Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, and Jinsung Yoon (arXiv:2601.23265). This project is **not affiliated with or endorsed by** the original authors or Google Research. The implementation is based on the publicly available paper and may differ from the original system.

An agentic framework for generating publication-quality academic diagrams and statistical plots from text descriptions. Supports OpenAI (GPT-5.2 + GPT-Image-1.5), Azure OpenAI / Foundry, and Google Gemini providers.

• Two-phase multi-agent pipeline with iterative refinement
• Multiple VLM and image generation providers (OpenAI, Azure, Gemini)
• Input optimization layer for better generation quality
• Auto-refine mode and run continuation with user feedback
• CLI, Python API, and MCP server for IDE integration
• **Batch generation** from a manifest file (YAML/JSON) for multiple diagrams in one run
• Claude Code skills

---

Quick Start

Prerequisites:
• Python 3.10+
• An OpenAI API key (platform.openai.com) or an Azure OpenAI / Foundry endpoint
• Or a Google Gemini API key (free, Google AI Studio)

Step 1: Install (or install from source for development).
Step 2: Get your API key, or use the setup wizard for Gemini.
Step 3: Generate a diagram, optionally with input optimization and auto-refine.

Output is saved along with all intermediate iterations and metadata.
---

How It Works

PaperBanana implements a multi-agent pipeline with up to 7 specialized agents:

**Phase 0 -- Input Optimization (optional):**
• **Input Optimizer** runs two parallel VLM calls:
• **Context Enricher** structures raw methodology text into diagram-ready form (components, flows, groupings, I/O)
• **Caption Sharpener** transforms vague captions into precise visual specifications

**Phase 1 -- Linear Planning:**
• **Retriever** selects the most relevant reference examples from a curated set of 13 methodology diagrams spanning the agent/reasoning, vision/perception, generative/learning, and science/applications domains
• **Planner** generates a detailed textual description of the target diagram via in-context learning from the retrieved examples
• **Stylist** refines the description for visual aesthetics using NeurIPS-style guidelines (color palette, layout, typography)

**Phase 2 -- Iterative Refinement:**
• **Visualizer** renders the description into an image
• **Critic** evaluates the generated image against the source context and provides a revised description addressing any issues
• The Visualizer and Critic steps repeat for a fixed number of iterations (default 3), or, in auto-refine mode, until the critic is satisfied

Providers

PaperBanana supports multiple VLM and image generation providers:

| Component | Provider | Model | Notes |
|-----------|----------|-------|-------|
| VLM (planning, critique) | OpenAI | | Default |
| Image Generation | OpenAI | | Default |
| VLM | Google Gemini | | Free tier |
| Image Generation | Google Gemini | | Free tier |
| VLM / Image | OpenRouter | Any supported model | Flexible routing |

Azure OpenAI / Foundry endpoints are auto-detected from the configured endpoint. Gemini-compatible gateways are also supported.
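The Phase 2 control flow can be sketched as below. `render` and `critique` stand in for the Visualizer and Critic agents, and the whole function is an assumed illustration of the loop described above, not the project's actual API:

```python
def refine(description, max_iters=3, auto_refine=False, render=None, critique=None):
    """Visualizer-Critic refinement loop (sketch).

    render(description) returns an image; critique(image) returns
    (satisfied, revised_description). Both are hypothetical stand-ins.
    """
    image = None
    # Fixed iteration count by default; auto-refine runs until the critic
    # is satisfied, bounded by a safety cap (the docs mention a cap of 30).
    iters = 30 if auto_refine else max_iters
    for _ in range(iters):
        image = render(description)               # Visualizer: description -> image
        satisfied, description = critique(image)  # Critic: evaluate and revise
        if auto_refine and satisfied:
            break
    return image, description
```

Note that in the fixed-iteration mode the critic's revised description still feeds the next render; its "satisfied" verdict only terminates the loop when auto-refine is on.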
---

CLI Reference

Methodology Diagrams

| Flag | Short | Description |
|------|-------|-------------|
| | | Path to methodology text file (required for new runs) |
| | | Figure caption / communicative intent (required for new runs) |
| | | Output image path (default: auto-generated) |
| | | Number of Visualizer-Critic refinement rounds (default: 3) |
| | | Loop until the critic is satisfied (with safety cap) |
| | | Safety cap for auto-refine mode (default: 30) |
| | | Preprocess inputs with parallel context enrichment and caption sharpening |
| | | Continue from the latest run |
| | | Continue from a specific run ID |
| | | User feedback for the critic when continuing a run |
| | | VLM provider name (default: OpenAI) |
| | | VLM model name |
| | | Image generation provider (default: OpenAI) |
| | | Image generation model |
| | | Output format: png, jpeg, or webp |
| | | Path to YAML config file |
| | | Show detailed agent progress and timing |
| | | Emit JSON progress events to stdout during generation |

Statistical Plots

| Flag | Short | Description |
|------|-------|-------------|
| | | Path to data file, CSV or JSON (required) |
| | | Communicative intent for the plot (required) |
| | | Output image path |
| | | Refinement iterations (default: 3) |

Batch Generation

Generate multiple methodology diagrams from a single manifest file (YAML or JSON). Each item runs the full pipeline; outputs are written under the batch run directory, and a summary records all runs.

Manifest format: YAML or JSON with a list of items. Paths in the manifest are resolved relative to the manifest file's directory.
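The excerpt does not preserve the manifest's key names, so the sketch below is a plausible shape only — every identifier in it is an assumption, not the project's documented schema:

```yaml
# Hypothetical batch manifest; all key names are assumed.
items:
  - context: methods/section3.txt   # methodology text, relative to this file
    caption: "Overview of the two-phase generation pipeline"
  - context: methods/ablation.txt
    caption: "Ablation study setup"
```

Each entry would then run the full planning-and-refinement pipeline described above, with outputs collected under the batch run directory.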
A **human-readable report** (Markdown or HTML) can be generated from an existing batch run.

Batch flags:

| Flag | Short | Description |
|------|-------|-------------|
| | | Path to manifest file (required) |
| | | Parent directory for the batch run (default: outputs) |
| | | Path to config YAML |
| | | Refinement iterations per item |
| | | Preprocess inputs for each item |
| | | Loop until critic satisfied, per item |
| | | Output image format (png, jpeg, webp) |
| | | Download expanded reference set if needed |

Quality Assessment

Comparative evaluation of a generated diagram against a human reference using VLM-as-a-Judge:

| Flag | Short | Description |
|------|-------|-------------|
| | | Path to generated image (required) |
| | | Path to human reference image (required) |
| | | Path to source context text file (required) |
| | | Figure caption (required) |

Scoring covers 4 dimensions, aggregated hierarchically as in the paper:
• **Primary**: Faithfulness, Readability
• **Secondary**: Conciseness, Aesthetics