wanshuiyin / Auto-claude-code-research-in-sleep
ARIS ⚔️ (Auto-Research-In-Sleep) — Claude Code skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation via Codex MCP
AI Architecture Analysis
This repository is indexed by RepoMind. By analyzing wanshuiyin/Auto-claude-code-research-in-sleep in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.
Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.
Repository Overview (README excerpt)
# Auto-claude-code-research-in-sleep (ARIS ⚔️)

Chinese README | English

> 🌙 **Let Claude Code do research while you sleep.** Wake up to find your paper scored, weaknesses identified, experiments run, and narrative rewritten — autonomously.

💬 Join Community

Custom Claude Code skills for autonomous ML research workflows. These skills orchestrate **cross-model collaboration** — Claude Code drives the research while an external LLM (via Codex MCP) acts as a critical reviewer.

🔀 **Also supports alternative model combinations (GLM, MiniMax, Kimi, LongCat, DeepSeek, etc.) — no Claude or OpenAI API required.**

> 💭 **Why not self-play with a single model?** Using Claude Code subagents or agent teams for both execution and review is technically possible, but it tends to fall into **local minima** — the same model reviewing its own patterns creates blind spots.
>
> *Think of it like adversarial vs. stochastic bandits: a single model self-reviewing is the stochastic case (predictable reward noise), while cross-model review is adversarial (the reviewer actively probes weaknesses the executor didn't anticipate) — and adversarial bandits are fundamentally harder to game.*
>
> 💭 **Why two models, not more?** Two is the minimum needed to break self-play blind spots, and 2-player games converge to Nash equilibrium far more efficiently than n-player ones. Adding more reviewers increases API cost and coordination overhead with diminishing returns — the biggest gain is going from 1→2, not 2→4.
>
> Claude Code's strength is fast, fluid execution; Codex (GPT-5.4 xhigh) is slower but more deliberate and rigorous in critique. These complementary styles — **speed × rigor** — produce better outcomes than either model talking to itself.

## 📢 What's New

- **2026-03-15** — 🔀 **Bring your own model!** Any OpenAI-compatible API now works as reviewer via MCP server.
GLM, MiniMax, Kimi, LongCat, and DeepSeek all tested — **zero Claude or OpenAI API needed**
- **2026-03-15** — 🐾 **OpenClaw adaptation guide** — use ARIS research workflows in OpenClaw without Claude Code slash skills
- **2026-03-15** — 📐 ** ** — community skill for rigorous theorem proof drafting
- 📚 **Anti-hallucination citations** — now fetches real BibTeX from DBLP/CrossRef instead of LLM-generated entries — on by default, zero install
- **2026-03-14** — 📱 **Feishu/Lark integration** — three modes (off/push/interactive), mobile notifications for experiments, reviews, and checkpoints
- **2026-03-13** — 🛑 **Human-in-the-loop** — configurable checkpoints across all workflows. Full autopilot or step-by-step approval
- **2026-03-12** — 🔗 **Zotero + Obsidian + local PDFs + arXiv/Scholar** — multi-source literature search with cross-model novelty verification
- **2026-03-11** — 🚀 Three end-to-end workflows complete: one prompt → top-venue-style paper; chains idea discovery → auto review → paper writing autonomously
- **2026-03-09** — 📝 workflow: narrative report → structured outline → figures → LaTeX → compiled PDF → 2-round auto-improvement (4/10 → 8.5/10)

## 🚀 Quick Start

> **Tip:** All pipeline behaviors are configurable via inline overrides — append to any command:
>
> | Parameter | Default | What it does |
> |-----------|---------|--------------|
> |  |  | Auto-continue at the idea selection gate. Set to manually pick which idea to pursue before committing GPU time |
> |  |  | Pause after each review round so you can read the score, give custom modification instructions, skip specific fixes, or stop early |
> |  |  | Download the top relevant arXiv PDFs during the literature survey. When , only fetches metadata (title, abstract, authors) |
> |  |  | Fetch real BibTeX from DBLP/CrossRef instead of LLM-generated entries. Eliminates hallucinated citations. Zero install |

> **Important:** Codex MCP uses the model from , not from skill files. Make sure it says (recommended). Other options: , , .
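The "bring your own model" option works because the reviewer side only needs the standard OpenAI chat-completions wire format, which GLM, MiniMax, Kimi, LongCat, DeepSeek, and others all expose. A minimal stdlib-only sketch of such a reviewer call — the base URL, model name, and prompt below are hypothetical placeholders, not ARIS's actual configuration:

```python
"""Sketch: any OpenAI-compatible endpoint as the critical reviewer.
BASE_URL and MODEL are placeholders, not ARIS's real config."""
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical OpenAI-compatible endpoint
MODEL = "reviewer-model"                 # hypothetical model name

def build_review_request(paper_summary: str) -> tuple[str, dict]:
    """Build a chat-completions request asking the model to act as reviewer."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system",
             "content": "You are a critical ML paper reviewer. Probe for weaknesses."},
            {"role": "user", "content": paper_summary},
        ],
    }
    return f"{BASE_URL}/chat/completions", payload

def review(paper_summary: str, api_key: str) -> str:
    """POST the request and return the reviewer's critique text."""
    url, payload = build_review_request(paper_summary)
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Swapping reviewers is then just a matter of pointing `BASE_URL`/`MODEL` (and the API key) at a different provider; the executor side never changes.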
Run or edit the file directly. See the full setup guide for details, and for alternative model combinations if you don't have a Claude/OpenAI API.

## ✨ Features

- 📊 **20 composable skills** — mix and match, or chain into full pipelines ( , , , )
- 🔍 **Literature & novelty** — multi-source paper search (**Zotero** + **Obsidian** + **local PDFs** + arXiv/Scholar) + cross-model novelty verification
- 💡 **Idea discovery** — literature survey → brainstorm 8-12 ideas → novelty check → GPU pilot experiments → ranked report
- 🔄 **Auto review loop** — 4-round autonomous review, 5/10 → 7.5/10 overnight with 20+ GPU experiments
- 📝 **Paper writing** — narrative → outline → figures → LaTeX → PDF → auto-review (4/10 → 8.5/10), one command
- 🤖 **Cross-model collaboration** — Claude Code executes, GPT-5.4 xhigh reviews. Adversarial, not self-play
- 📝 **Peer review** — review others' papers as a conference reviewer, with structured scoring and meta-review
- 🖥️ **GPU deployment** — auto rsync, screen sessions, multi-GPU parallel experiments, live monitoring
- 🔀 **Flexible models** — default Claude × GPT-5.4; also supports GLM, MiniMax, Kimi, LongCat, DeepSeek, etc. — no Claude or OpenAI API required
- 🛑 **Human-in-the-loop** — configurable checkpoints at key decisions. for full autopilot, to approve each step
- 📱 **Feishu/Lark notifications** — three modes: **off (default, strongly recommended for most users)**, push-only (webhook, mobile alerts), interactive (approve/reject from Feishu). Zero impact when unconfigured.
  Preview: Push cards (group) & Interactive chat (private). **Push Only** — group-chat cards (experiment done, checkpoint, error, pipeline complete). **Interactive** — private chat with Claude Code (approve/reject, custom instructions)
- 🧩 **Extensible** — domain-specific skills welcome! Add a and open a PR.
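The GPU deployment feature above (auto rsync + detached screen sessions) can be sketched roughly as follows. The host, paths, and session name are hypothetical placeholders, not the commands ARIS actually issues:

```python
"""Sketch: push code to a GPU box and launch an experiment in a detached
screen session. Host, paths, and session name are placeholders."""
import shlex
import subprocess

def deploy_cmd(src: str, host: str, dst: str) -> list[str]:
    """rsync the project to the remote machine (mirror deletions, skip .git)."""
    return ["rsync", "-az", "--delete", "--exclude", ".git", src, f"{host}:{dst}"]

def launch_cmd(host: str, session: str, workdir: str, train: str) -> list[str]:
    """Run the experiment inside a detached screen session so it survives logout."""
    remote = f"cd {shlex.quote(workdir)} && screen -dmS {shlex.quote(session)} {train}"
    return ["ssh", host, remote]

if __name__ == "__main__":
    # Hypothetical host/paths for illustration only.
    subprocess.run(deploy_cmd("./project/", "gpu-box", "~/runs/project/"), check=True)
    subprocess.run(launch_cmd("gpu-box", "exp1", "~/runs/project",
                              "python train.py --gpu 0"), check=True)
```

Detached `screen -dmS` sessions are what let the overnight runs keep going after the SSH connection drops; live monitoring would then reattach with `screen -r`.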
  See community skills like (architecture/EDA)

---

## 📈 Score Progression (Real Run)

A real overnight 4-round run on an ML research project, from borderline reject to submission-ready:

| Round | Score | What Happened |
|-------|-------|---------------|
| Init…