Best Open Source Scraping Libraries
A curated list of the most popular GitHub repositories tagged with scraping.
#1 firecrawl/firecrawl
🔥 The Web Data API for AI - Turn entire websites into LLM-ready markdown or structured data
#2 scrapy/scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
#3 feder-cr/Jobs_Applier_AI_Agent_AIHawk
AIHawk aims to ease the job hunt by automating the job application process. Using artificial intelligence, it enables users to apply for multiple jobs in a tailored way.
#4 gocolly/colly
Elegant Scraper and Crawler Framework for Golang
#5 ScrapeGraphAI/Scrapegraph-ai
Python scraper based on AI
#6 apify/crawlee
Crawlee: a web scraping and browser automation library for Node.js for building reliable crawlers, in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.
#7 soxoj/maigret
🕵️‍♂️ Collect a dossier on a person by username from thousands of sites
#8 psf/requests-html
Pythonic HTML Parsing for Humans™
#9 ultrafunkamsterdam/undetected-chromedriver
Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / Cloudflare IUAM)
#10 D4Vinci/Scrapling
🕷️ An adaptive Web Scraping framework that handles everything from a single request to a full-scale crawl!
#11 autoscrape-labs/pydoll
Pydoll is a library for automating Chromium-based browsers without a WebDriver, offering realistic interactions.