Best Open Source Crawling Libraries
A curated list of the most popular GitHub repositories tagged with crawling. Select any project to visualize its architecture and dive into the codebase using RepoMind's AI engine.
#1 scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
#2 gocolly/colly
Elegant Scraper and Crawler Framework for Golang
#3 apify/crawlee
Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
#4 codelucas/newspaper
newspaper3k is a library for news, full-text, and article metadata extraction in Python 3.
#5 D4Vinci/Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
#6 go-rod/rod
A Chrome DevTools Protocol driver for web automation and scraping.