openzim / mwoffliner

MediaWiki scraper: all your wiki articles in one highly compressed ZIM file

438 stars
110 forks
120 issues
TypeScript, JavaScript, CSS

Repository Overview (README excerpt)

MWoffliner

MWoffliner is a tool for creating a local offline HTML snapshot of any online MediaWiki instance. It scrapes all articles (or a selection if specified) and creates the corresponding ZIM file. While primarily targeted at Wikimedia projects like Wikipedia and Wiktionary, MWoffliner also supports any recent MediaWiki instance (version 1.27+), though instances with custom skins or highly unusual configurations may have limitations.

Read CONTRIBUTING.md to learn more about MWoffliner development. User help is available in the FAQ.

Features

• Scrape with or without image thumbnails
• Scrape with or without audio/video multimedia content
• S3 cache (optional)
• Image size optimization and WebP conversion
• Scrape all articles in the selected namespaces, or only those in a title list
• Specify additional/non-main namespaces to scrape

Run to see all available options.

Prerequisites

• Docker (or Docker-based engine)
• amd64 architecture

Installation

The recommended way to install and run is using the pre-built Docker container:

Run software locally / Build from source

Prerequisites for local execution

• *NIX Operating System (GNU/Linux, macOS, etc.)
• Redis: in-memory data store
• Node.js version 24 (only a single Node.js version is supported; other versions may or may not work)
• Libzim: C++ library for creating ZIM files (automatically downloaded on GNU/Linux & macOS)
• Various build tools which are probably already installed on your machine:
  • JPEG image processing
  • OpenGL utility library
  • automatic configuration system
  • Makefile generator
  • C compiler
  (These packages are for Debian/Ubuntu systems)

You also need an online MediaWiki instance with its API available.

Installation methods

Build your own container

• Clone the repository locally:
• Build the image:

Run the software locally using NPM

> [!WARNING]
> Local installation requires several system dependencies (see above). Using the Docker image is strongly recommended to avoid setup issues.

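The Node.js requirement listed under the local prerequisites can be checked before attempting an install. The following is a minimal sketch, not part of MWoffliner: the helper names `nodeMajor` and `isSupportedNode` are hypothetical, and it assumes that "Node.js version 24" means any 24.x release.

```typescript
// Hypothetical helpers, not taken from MWoffliner's source.

// Assumption: "Node.js version 24" means any 24.x release.
const REQUIRED_MAJOR = 24;

// Extract the major version from a string such as "v24.1.0",
// which is the format Node.js exposes as process.version.
function nodeMajor(version: string): number {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) {
    throw new Error(`Unrecognised Node.js version string: ${version}`);
  }
  return Number(match[1]);
}

// True when the given version string belongs to the required major release.
function isSupportedNode(version: string): boolean {
  return nodeMajor(version) === REQUIRED_MAJOR;
}
```

In a Node.js script this could be called as `isSupportedNode(process.version)`, exiting early with a message that points to nvm when it returns `false`.
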
Setting up MWoffliner locally for development can be tricky due to several dependencies and version requirements. Follow these steps carefully to avoid common errors.

• Node.js Version
  MWoffliner requires Node.js 24 (other versions may fail). Compatible Node 24 ranges: or . Check your version: If your version does not match, use nvm to install the correct Node.js version.

• libzim Dependency
  MWoffliner depends on , which requires the C++ libzim library.
  • On Linux/macOS, MWoffliner can download libzim automatically.
  • On Windows, you must install libzim manually because there are no prebuilt binaries. See the libzim installation guide for details.

• Compiler Requirements (Windows)
  Node 24 on Windows officially supports Visual Studio 2019 (v16) or Visual Studio 2022 (v17). Ensure C++ build tools are installed and environment variables are set correctly. See Windows Setup for node-gyp for detailed instructions.

• Node-gyp
  MWoffliner uses node-gyp, which enforces strict checks for Node and compiler versions. Make sure you have:
  • Proper Visual Studio version (Windows); see Visual Studio versions
  • Required C++ headers, e.g., (see libzim documentation)
  • Python 3.10+ (required by node-gyp; a recent version is preferred for compatibility)

Additional troubleshooting steps if errors persist:

• **Clear npm cache**: a corrupted cache can cause cryptic install failures:
• **Delete node_modules and reinstall**: stale or partially installed dependencies are a common source of errors:
• **Check that all environment variables are set**: especially on Windows, , , and must point to the correct Visual Studio and libzim directories. Reopen your terminal after installing new tools.
• **Verify Redis is running before starting MWoffliner**: MWoffliner will fail immediately if it cannot connect to Redis:
• **Run npm install with verbose logging** to see exactly where it fails:

• Common Errors & Troubleshooting

| Error | Cause | Solution |
|-------|-------|----------|
| Node.js version error | Node.js version incompatible | Install Node 24 with nvm |
| Cannot find module @openzim/libzim | libzim not installed | Follow libzim installation guide; Windows users must install manually |
| node-gyp rebuild failed | Wrong Node or compiler version | Check Node.js version, Visual Studio version, Python 3.x |
| zim/archive.h not found | C++ headers missing | Install libzim system-wide, verify include paths |

> [!NOTE]
> Even with these steps, other setup errors may occur. Using Docker is strongly recommended for a smoother experience.

Installation via NPM

> [!WARNING]
> You might need to run this command with the command, depending on how your / OS is configured. permission checking can be a bit annoying for newcomers. Please read the npm script documentation if you encounter issues.

Usage

Using Docker (Recommended)

Using NPM / Local Install

To use MWoffliner with an S3 cache, provide an S3 URL:

Contribute

If you've retrieved the MWoffliner source code (e.g., via a git clone), you can install and run it locally with your modifications:

Detailed contribution documentation and guidelines are available.

API

MWoffliner provides an API and can be used as a Node.js library. Here's a stub example for your file:

Background

Complementary information about MWoffliner:

• **MediaWiki software** is used by thousands of wikis, the most famous ones being the Wikimedia ones, including Wikipedia.
• **MediaWiki** is a PHP wiki runtime engine.
• **Wikitext** is the markup language that MediaWiki uses.
• **MediaWiki parser** converts Wikitext to HTML, which displays in your browser.
• Read the scraper functional architecture for more details.

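To make the Wikitext-to-HTML step in the background notes concrete, here is a small sketch that builds a MediaWiki Action API request asking the parser for a page's rendered HTML (`action=parse`). It is an illustration only: the helper name `buildParseUrl` is hypothetical, the endpoint passed in is an example, and nothing here is claimed to be how MWoffliner itself retrieves articles.

```typescript
// Hypothetical helper, not taken from MWoffliner's source.
// Builds an Action API URL that asks MediaWiki to parse a page
// and return its rendered HTML as JSON.
function buildParseUrl(apiEndpoint: string, title: string): string {
  const params: Array<[string, string]> = [
    ["action", "parse"], // run the MediaWiki parser
    ["page", title],     // page title to render
    ["prop", "text"],    // return only the rendered HTML
    ["format", "json"],  // machine-readable response
  ];
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${apiEndpoint}?${query}`;
}
```

For a wiki whose API lives at `https://en.wikipedia.org/w/api.php`, calling `buildParseUrl` with that endpoint and a page title yields a URL that can be opened in a browser to confirm the API is reachable.
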
License

GPLv3 or later, see LICENSE for more details.

Acknowledgements

This project received funding through NGI Zero Core, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.