777genius / os-ai-computer-use

AI controls your OS. OS AI Computer Use is OS- and API-agnostic and currently works with the OpenAI and Anthropic APIs. Desktop app ready.

139 stars

8 forks

0 issues

Python, Dart, C++


AI Architecture Analysis

This repository is indexed by RepoMind. By analyzing 777genius/os-ai-computer-use in our AI interface, you can instantly generate complete architecture diagrams, visualize control flows, and perform automated security audits across the entire codebase.

Our Agentic Context Augmented Generation (Agentic CAG) engine loads full source files into context on-demand, avoiding the fragmentation of traditional RAG systems. Ask questions about the architecture, dependencies, or specific features to see it in action.

To optimize performance, source files are loaded only when you start an analysis.



Repository Overview (README excerpt)


OS AI Computer Use

**The most capable open-source desktop automation agent: 75.0% on OSWorld, surpassing human performance.** Supports OpenAI GPT-5.4 and Anthropic Claude. Cross-platform. Production-ready.

> **Coming soon:** MCP-first architecture with sandboxed code execution, plug & play model backends, and isolated environments, all in active development.

https://github.com/user-attachments/assets/7fb80b7f-6cef-45e0-adba-7b616e939a60

For End Users

**Want to use OS AI without coding?** Download the latest release for your platform:

> **Download Latest Release**

Available for:
• macOS (Intel + Apple Silicon)
• Windows (x64)
• Linux (x64)
• Web

**New to OS AI?** Read the **User Guide** for installation and setup instructions.

**Key Features:**
• 🧠 **Multi-provider AI**: OpenAI GPT-5.4 and Anthropic Claude, switchable in Settings
• 🖥️ AI controls your desktop: clicks, types, scrolls, drags, takes screenshots
• 🔒 Secure API key storage in system keychain
• 💬 Chat-based interface with visual feedback
• 📊 Real-time cost tracking for both providers
• 🎨 Cross-platform Flutter UI (macOS, Windows, Linux, Web)
• 🖼️ Image upload and clipboard paste
• 💬 Multiple chat sessions with persistent history
• 🔄 Conversation context resume after app restart

Supported AI Providers

| Provider | Model | Computer Use | Status |
|----------|-------|-------------|--------|
| **OpenAI** | GPT-5.4 | Batched actions, continuity | **Fully supported** |
| **Anthropic** | Claude Sonnet 4.6 / Opus 4.6 | Single actions, zoom, full message history | **Fully supported** |

Switch providers in **Settings**: enter your API key and select the active provider from the dropdown.

---

For Developers

Table of Contents

• OS AI Computer Use
• Table of Contents
• Installation & Setup
• Quick start
• CLI Examples
• Development Mode
  • 1. Install dependencies
  • 2. Start the backend
  • 3. Start the frontend (in a new terminal)
• Architecture
• Provider Comparison (March 2026)
• Features
• Supported Platforms
• Configuration (config/settings.py)
• Tool input (API)
• Tests
• Flutter integration
• Contributing
• License
• Troubleshooting
• Contact

Local agent for desktop automation with **multi-provider AI support**. Currently supports **OpenAI GPT-5.4 Computer Use** and **Anthropic Claude Computer Use**. The LLM layer is abstracted behind , making it easy to add new providers.

What this project is:
• A **multi-provider** Computer Use agent (OpenAI + Anthropic) with a stable tool interface
• An OS-agnostic execution layer using ports/drivers (macOS, Windows, and Linux)
• A CLI you can bundle into a single executable for local use

What it is not (yet):
• A remote SaaS; this is a local agent

Highlights:
• **OpenAI GPT-5.4** with batched actions and for efficient multi-step workflows
• **Anthropic Claude Sonnet 4.6 / Opus 4.6** with single-action precision, zoom support, and full message history
• Provider selection in UI Settings with per-provider API key management
• Smooth mouse movement, clicks, drag-and-drop with easing and timing controls
• Reliable keyboard input, hotkeys and hold sequences
• Screenshots (Quartz on macOS or PyAutoGUI fallback), on-disk saving and base64 tool_result
• Detailed logs and running cost estimation per iteration and total
• Multiple chats, image upload, persistent chat history with context resume

See provider architecture in , OS ports/drivers in , and packaging notes in .

Installation & Setup

Requirements:
• macOS 13+ or Windows 10/11 or Linux (X11/XWayland)
• Python 3.12+
• API key for at least one provider:
  • **OpenAI**: (for GPT-5.4 Computer Use)
  • **Anthropic**: (for Claude Computer Use)

Linux system dependencies (if applicable):

Install:

macOS permissions (for GUI automation): Grant permissions to Terminal/iTerm and your venv Python under: Accessibility, Input Monitoring, Screen Recording.
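As an aside on the "easing and timing controls" mentioned in the Highlights list: the excerpt does not show how the project implements them, so the following is only a minimal sketch of the general idea. The function names `ease_in_out_cubic` and `eased_path` are hypothetical and not taken from the repository. It interpolates between two screen points with a cubic ease-in-out curve, so the pointer starts and ends slowly.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def ease_in_out_cubic(t: float) -> float:
    """Map linear progress t in [0, 1] to eased progress (slow start, slow end)."""
    if t < 0.5:
        return 4 * t ** 3
    return 1 - ((-2 * t + 2) ** 3) / 2


def eased_path(start: Point, end: Point, steps: int) -> List[Point]:
    """Return `steps` intermediate pointer positions from start to end.

    Hypothetical helper: a real driver would move the pointer to each
    position in turn, pausing briefly between moves.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    points: List[Point] = []
    for i in range(1, steps + 1):
        p = ease_in_out_cubic(i / steps)
        points.append((round(x0 + (x1 - x0) * p), round(y0 + (y1 - y0) * p)))
    return points


if __name__ == "__main__":
    for point in eased_path((0, 0), (100, 50), 4):
        print(point)
```

The points are clustered near the start and end of the path and spread out in the middle, which is what makes the motion look smooth when each point is visited at a fixed interval.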
---

Quick start

Requirements:
• macOS 13+ or Windows 10/11 or Linux (X11/XWayland; unit tests on any OS; GUI tests macOS/self-hosted Windows/Linux)
• Python 3.12+
• API key: or

Install:

macOS permissions (required for GUI automation): Grant permissions to Terminal/iTerm and your venv Python under: Accessibility, Input Monitoring, Screen Recording.

Run the agent (CLI):

CLI Examples

Useful make targets:

---

Development Mode

For development with backend + frontend (Flutter UI):

1. Install dependencies

2. Start the backend

Backend environment variables (optional):
• - default AI provider: or (default: )
• - host address (default: )
• - port number (default: )
• - enable debug logging (default: )
• - authentication token (optional)
• - allowed CORS origins (default: )

Backend endpoints:
• - health check
• - WebSocket for JSON-RPC commands
• - file upload
• - file download
• - metrics snapshot

3. Start the frontend (in a new terminal)

Frontend config (in code):
• Default backend WebSocket:
• Default REST base:

See for more details on the Flutter app architecture and features.

---

Architecture

The project uses a **provider-agnostic** architecture:

**Adding a new provider** requires only creating a new package that implements the interface; no changes to core, backend, or frontend are needed.

Key design decisions:
• **ProviderPart**: typed content blocks for provider-specific data (replaces text-based markers)
• **provider_context**: opaque state passed between iterations (e.g., OpenAI's )
• **ToolCall.metadata**: internal routing separated from clean action data
• **Batch handler**: unified entry point for single (Anthropic) and batched (OpenAI) actions

See for details.
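To make the last two design decisions concrete, here is a minimal sketch of a unified batch handler, assuming a `ToolCall` that keeps routing metadata apart from the action data. Only the names `ToolCall` and `metadata` come from the text above; the `action` field, the `call_id` key, the `handle_batch` function, and the result shape are hypothetical and may differ from the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    # Clean action data, e.g. {"type": "click", "x": 10, "y": 20} (assumed shape).
    action: Dict[str, Any]
    # Internal routing details kept out of the action itself (assumed keys).
    metadata: Dict[str, Any] = field(default_factory=dict)


def handle_batch(
    calls: List[ToolCall],
    execute: Callable[[Dict[str, Any]], Any],
) -> List[Dict[str, Any]]:
    """Run every call in order and pair each output with its routing id.

    A provider that emits one action per turn passes a one-element list;
    a provider that emits batches passes the whole batch. Both go through
    the same entry point.
    """
    results: List[Dict[str, Any]] = []
    for call in calls:
        output = execute(call.action)
        results.append({"call_id": call.metadata.get("call_id"), "output": output})
    return results


if __name__ == "__main__":
    # Stand-in executor that only echoes the action type.
    def fake_execute(action: Dict[str, Any]) -> str:
        return "did " + action["type"]

    single = [ToolCall({"type": "click", "x": 1, "y": 2}, {"call_id": "a"})]
    batch = single + [ToolCall({"type": "type", "text": "hi"}, {"call_id": "b"})]
    print(handle_batch(single, fake_execute))
    print(handle_batch(batch, fake_execute))
```

Treating a single action as a batch of one keeps the executor unaware of which provider produced the calls, which is the point of a shared entry point.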
---

Provider Comparison (March 2026)

| | OpenAI GPT-5.4 | Anthropic Claude Sonnet 4.6 | Anthropic Claude Opus 4.6 |
|---|---|---|---|
| **OSWorld** (desktop tasks) | **75.0%** | 72.5% | 72.7% |
| **SWE-Bench Verified** (coding) | ~80% | - | **80.8%** |
| **Input price** (per 1M tokens) | $2.50 | $3.00 | $5.00 |
| **Output price** (per 1M tokens) | $15.00 | $15.00 | $25.00 |
| **Context window** | 1.05M |…