Repository Overview (README excerpt)
Midscene.js
Official Website: https://midscenejs.com/
AI-powered, vision-driven UI automation for every platform.
📣 Midscene Skills is here! Use Midscene Skills to control any platform with OpenClaw.

Showcases
• Web Automation - Automatically fill in the GitHub registration form in a web browser and pass all field validations
• iOS Automation - Meituan coffee order
• iOS Automation - Auto-like the first @midscene_ai tweet
• Android Automation - DCar: Xiaomi SU7 specs
• Android Automation - Booking a hotel for Christmas
• MCP Integration - Midscene MCP UI prepatch release
• Robotic arm + vision + voice for in-vehicle testing

💡 Features
Write Automation with Natural Language
• Describe your goals and steps, and Midscene will plan and operate the user interface for you.
• Use the JavaScript SDK or YAML to write your automation script.

Web & Mobile App & Any Interface
• **Web Automation**: Either integrate with Puppeteer or Playwright, or use Bridge Mode to control your desktop browser.
• **Android Automation**: Use the JavaScript SDK with adb to control your local Android device.
• **iOS Automation**: Use the JavaScript SDK with WebDriverAgent to control your local iOS devices and simulators.
• **Any Interface Automation**: Use the JavaScript SDK to control your own interface.

For Developers
• **Three kinds of APIs**:
  • Interaction API: interact with the user interface.
  • Data Extraction API: extract data from the user interface and DOM.
  • Utility API: utility functions like , , .
• **MCP**: Midscene provides MCP services that expose atomic Midscene Agent actions as MCP tools so upper-layer agents can inspect and operate UIs with natural language.
• **Caching for Efficiency**: Replay your script with cache and get the result faster.
• **Debugging Experience**: Midscene.js offers a visual replay report file, a built-in playground, and a Chrome Extension to simplify the debugging process. These are the tools most developers truly need.
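The caching idea above can be illustrated with a small sketch. This excerpt does not describe how Midscene.js keys or stores its cache, so everything here (`PlanCache`, `PlannedStep`, `Planner`, `fakePlanner`) is a hypothetical stand-in and not part of the Midscene.js SDK. It only shows the general pattern: plan a natural-language step once, then reuse the stored plan on replay instead of calling the model again.

```typescript
// Hypothetical sketch: none of these names come from the Midscene.js SDK.

// A planned UI step, e.g. the result of asking a model what to do next.
type PlannedStep = { action: "tap" | "input"; target: string; value?: string };

// Stand-in for an expensive model call that turns a prompt into steps.
type Planner = (prompt: string) => PlannedStep[];

class PlanCache {
  private readonly entries = new Map<string, PlannedStep[]>();

  // Return the cached plan for a prompt, or plan once and remember the result.
  getOrPlan(prompt: string, planner: Planner): PlannedStep[] {
    const cached = this.entries.get(prompt);
    if (cached !== undefined) {
      return cached; // replay: no model call needed
    }
    const plan = planner(prompt);
    this.entries.set(prompt, plan);
    return plan;
  }
}

// Count planner invocations to make the saving on replay visible.
let plannerCalls = 0;
const fakePlanner: Planner = (prompt) => {
  plannerCalls += 1;
  return [{ action: "tap", target: prompt }];
};

const cache = new PlanCache();
const firstRun = cache.getOrPlan("click the login button", fakePlanner);
const replay = cache.getOrPlan("click the login button", fakePlanner);

console.log(plannerCalls); // 1: the second call was served from the cache
console.log(firstRun === replay); // true
```

Keying the cache on the prompt text alone is a simplification. A real implementation would likely also need to account for changes in the page or screen state before trusting a cached plan.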

👉 Zero-code Quick Experience
• **Chrome Extension**: Start the in-browser experience immediately through the Chrome Extension, without writing any code.
• **Android Playground**: A built-in Android playground lets you control your local Android device.
• **iOS Playground**: A built-in iOS playground lets you control your local iOS device.

✨ Driven by Visual Language Model
Midscene.js is all-in on the pure-vision route for UI actions: element localization and interactions are based on screenshots only. It supports visual-language models like , , , and . For data extraction and page understanding, you can still opt in to include DOM when needed.
• Pure-vision localization for UI actions; the DOM extraction mode is removed.
• Works across web, mobile, desktop, and even surfaces.
• Far fewer tokens by skipping DOM for actions, which cuts cost and speeds up runs.
• DOM can still be included for data extraction and page understanding when needed.
• Strong open-source options for self-hosting.
Read more about Model Strategy

📄 Resources
• Official Website: https://midscenejs.com
• Documentation: https://midscenejs.com
• Sample Projects: https://github.com/web-infra-dev/midscene-example
• API Reference: https://midscenejs.com/api
• GitHub: https://github.com/web-infra-dev/midscene

🤝 Community
• Discord
• Follow us on X
• Lark Group (Feishu discussion group)

🌟 Awesome Midscene
Community projects that extend Midscene.js capabilities:
• midscene-ios - iOS Mirror automation support for Midscene
• midscene-pc - PC operation device for Windows, macOS, and Linux
• midscene-pc-docker - Docker image with Midscene-PC server pre-installed
• Midscene-Python - Python SDK for Midscene automation
• midscene-java by @Master-Frank - Java SDK for Midscene automation
• midscene-java by @alstafeev - Java SDK for Midscene automation

📝 Credits
We would like to thank the following projects:
• Rsbuild and Rslib for the build tool.
• UI-TARS for the open-source agent model UI-TARS.
• Qwen-VL for the open-source VL model Qwen-VL.
• scrcpy and yume-chan for allowing us to control Android devices from the browser.
• appium-adb for the JavaScript bridge to adb.
• appium-webdriveragent for operating XCTest from JavaScript.
• YADB for the yadb tool, which improves the performance of text input.
• libnut-core for cross-platform native keyboard and mouse control.
• Puppeteer for browser automation and control.
• Playwright for browser automation, control, and testing.

📖 Citation
If you use Midscene.js in your research or project, please cite:

📝 License
Midscene.js is MIT licensed.
---
If this project helps you or inspires you, please give us a star