AIR-DISCOVER / FreeAskWorld

[AAAI 2026 Oral] FreeAskWorld is an interactive simulation framework that integrates large language models (LLMs) for high-level planning and socially grounded interaction in embodied AI.

217 stars · 1 fork · 1 issue · Languages: Python, Shell



Repository Overview (README excerpt)


# FreeAskWorld Simulator (AAAI26 Oral)

An Interactive and Closed-Loop Simulator for Human-Centric Embodied AI

FreeAskWorld is an interactive simulation framework that integrates large language models (LLMs) for high-level planning and socially grounded interaction in embodied AI.

---

## Project Milestones

- [x] 📝 **Paper Publication**: Published the main research paper describing FreeAskWorld.
- [x] 📊 **Data Processing Code Release**: Released code for preprocessing, data cleaning, and annotation pipelines.
- [x] 🎥 **Presentation Video**: Released project presentation video.
- [ ] 🛠️ **Simulator Code Release**: Publish the core simulation code for developers and external collaborators.
- [ ] 🤖 **OpenClaw Robot Integration**: Integrate OpenClaw to access and interact with robots inside the FreeAskWorld simulation environment.
- [ ] 📚 **Usage Tutorial**: Create a comprehensive tutorial for using the FreeAskWorld simulator, including setup, configuration, and example workflows.
- [ ] 🧑‍💻 **API Documentation**: Provide thorough documentation of the simulator’s API for seamless integration and extension.
- [ ] 🎮 **Steam Release**: Prepare and publish the FreeAskWorld simulator on Steam for broader accessibility.

---

## 🎥 Demos

If your browser does not support HTML5 video playback, use the download links below.

**Simulator Presentation**

Demonstrates the main functions of this simulator.

📥 Download Simulator Presentation Video

**Simulator APP Presentation**

Demonstrates the main functions of this simulator.

📥 Download APP Presentation Video

**ROS2 Example**

Demonstrates ROS2 RGB-D SLAM running in our simulator.

📥 Download ROS2 Example Video

## 📌 Introduction

As embodied intelligence progresses, simulation platforms must evolve beyond low-level physics toward **human-centric, socially interactive environments**.
**FreeAskWorld** introduces:

- A **closed-loop interactive simulator**
- A **scalable human-agent world modeling framework**
- A **modular data generation pipeline**
- A new benchmark: **Direction Inquiry Task**, extending VLN to **active question-asking & guidance following**

This repo contains **simulator code** and **baseline models** from our AAAI 2026 paper.

---

## ✨ Key Features

| Feature | Description |
|---|---|
| 🤖 **LLM-Powered Agents** | Intention modeling, reasoning, natural dialog, instruction generation |
| 🚶 **Realistic Humans** | Personalized profiles, schedules, motion & navigation styles |
| 🌦️ **Dynamic World** | Weather, lighting, traffic, and scene randomization |
| 🔁 **Closed-Loop Sync** | WebSocket-based state exchange for real-time model interaction |
| 🧩 **Direction Inquiry Task** | Agents ask for help, interpret human guidance, adapt plans |
| 📦 **Large-Scale Data** | 6 tasks · 16 object categories · 63,429 frames · 17+ hours |
| 🔄 **Data Generation Pipeline** | Modular pipeline for generating embodied AI data |

---

## Synthetic Data Generation

Figures: docs/OccupancyMapGenerationContrast.jpg, docs/SyhteticDataPic.jpg

We used Unity Perception (Borkman et al. 2021) to build a rich and diverse synthetic dataset that includes multiple annotation types and data modalities. The dataset is designed to support a wide range of vision, navigation, and human–computer interaction tasks, and contains both dense per-frame annotations and global scene-level metadata. The main components are:

- **Visual annotations:** 2D/3D bounding boxes, instance segmentation, and semantic segmentation.
- **Geometric annotations:** depth maps and surface normal maps for scene geometry.
- **Visual observations:** panoramic RGB images and six 90° perspective views.
- **Interaction data:** natural language instructions, dialog histories, and agent trajectories.
- **Spatial representations:** 2D occupancy heatmaps for mapping and localization.
- **Environment metadata:** map boundaries, semantic regions, and other contextual information.

The dataset covers 16 common object categories (e.g., vehicles, pedestrians, street furniture). By combining 2D occupancy heatmaps (encoding static layout) with 3D bounding boxes (capturing dynamic entity positions) and the provided world coordinates, we can accurately reconstruct simulated scenes to create a comprehensive digital twin. This reconstructed environment supports open-loop evaluations similar to nuScenes (Caesar et al. 2020), and is particularly suited for unstructured environments as in FreeAD (Peng et al. 2025). The dataset enables a broad spectrum of downstream tasks including navigation planning, behavior prediction, and human–computer interaction studies.

The figures docs/OccupancyMapGenerationContrast.jpg and docs/SyhteticDataPic.jpg illustrate occupancy map generation and sample synthetic data.

## 🚀 Getting Started

### How to Run

## 📊 Proactive VLN Results

Models fine-tuned on FreeAskWorld demonstrate enhanced semantic understanding and interaction competency. However, a significant gap to human performance remains, especially in high-level reasoning and social navigation.

### Closed-Loop Navigation Performance (Table 4 from Paper)

| Method | TL (m) | SR (%) | SPL | NE (m) | OSR (%) | ONE (m) | NDI |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Human (no asking) | 47.5 | 40.2 | 38.2 | 18.3 | 41.3 | 11.3 | 0.0 |
| Human (asking) | 59.9 | 82.6 | 71.2 | 3.49 | 82.6 | 1.63 | 0.78 |
| ETPNav | 31.2 | 0.0 | 0.0 | 32.9 | 0.0 | 28.7 | 0.0 |
| BEVBert | 14.6 | 0.0 | 0.0 | 31.0 | 0.0 | 29.0 | 0.0 |
| ETPNav-FT | 33.6 | 0.0 | 0.0 | 31.6 | 1.1 | 27.1 | 0.0 |
| BEVBert-FT | 18.7 | 0.0 | 0.0 | 30.0 | 0.0 | 28.5 | 0.0 |

## License

FreeAskWorld is licensed under the Apache 2.0 License.
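The columns of the closed-loop navigation table can be made concrete with a short sketch. It assumes the abbreviations follow common VLN usage (TL: trajectory length, SR: success rate, SPL: success weighted by path length, NE: navigation error, OSR: oracle success rate, ONE: oracle navigation error); NDI is left out because its definition is not given here. The `Episode` type, the `summarize` function, the 3 m success radius, and the use of straight-line distance in place of geodesic distance are all assumptions made for illustration, not the paper's evaluation code. The sketch returns SPL as a fraction, which may differ from the scale used in the table.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Episode:
    """One navigation episode (hypothetical record layout, for illustration)."""
    path: list        # agent (x, y) positions in metres, in visit order
    goal: tuple       # goal (x, y) position in metres
    shortest: float   # shortest start-to-goal path length in metres


def trajectory_length(path):
    """Sum of straight-line segment lengths along the path."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def summarize(episodes, success_radius=3.0):
    """Average per-episode metrics; SR and OSR are returned in percent.

    success_radius is an assumed threshold, not a value taken from the paper.
    """
    totals = {"TL": 0.0, "SR": 0.0, "SPL": 0.0, "NE": 0.0, "OSR": 0.0, "ONE": 0.0}
    for ep in episodes:
        length = trajectory_length(ep.path)
        final_error = dist(ep.path[-1], ep.goal)           # NE: stop point to goal
        closest = min(dist(p, ep.goal) for p in ep.path)   # ONE: closest visited waypoint
        success = final_error <= success_radius
        totals["TL"] += length
        totals["NE"] += final_error
        totals["ONE"] += closest
        totals["SR"] += 100.0 * success
        totals["OSR"] += 100.0 * (closest <= success_radius)
        if success:
            # SPL term: shortest / max(actual, shortest); guard the 0/0 case.
            denom = max(length, ep.shortest)
            totals["SPL"] += ep.shortest / denom if denom > 0 else 1.0
    n = len(episodes)
    return {name: value / n for name, value in totals.items()}


episodes = [
    # Reaches the goal after a detour: success, SPL term 5/7.
    Episode(path=[(0, 0), (3, 0), (3, 4)], goal=(3, 4), shortest=5.0),
    # Passes within 1 m of the goal but stops far away: oracle success only.
    Episode(path=[(0, 0), (10, 0), (20, 0)], goal=(10, 1), shortest=101 ** 0.5),
]
print(summarize(episodes))
```

The second episode shows why SR and OSR can differ: the agent comes within the radius of the goal partway through but does not stop there, so only the oracle metrics count it as a success.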