Best Open Source Web Scraping Libraries
A curated list of the most popular GitHub repositories tagged with web scraping. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 firecrawl/firecrawl
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
#2 scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
#3 Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more.
#4 dgtlmoon/changedetection.io
Best and simplest tool for website change detection, web page monitoring, and website change alerts. Perfect for tracking content changes, price drops, restock alerts, and website defacement monitoring—all for free or enjoy our SaaS plan!
#5 ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
#6 apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
#7 seleniumbase/SeleniumBase
Python APIs for web automation, testing, and bypassing bot-detection with ease.
#8 yusufkaraaslan/Skill_Seekers
Convert documentation websites, GitHub repositories, and PDFs into Claude AI skills with automatic conflict detection
#9 D4Vinci/Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
#10 go-rod/rod
A Chrome DevTools Protocol driver for web automation and scraping.
#11 autoscrape-labs/pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.