# Skillter/ProxyGather

Sophisticated proxy scraper and checker
## Repository Overview (README excerpt)
### The Ultimate Proxy Scraper & Checker

This project is a sophisticated tool designed to scrape proxies from a wide variety of sources and check them for validity and performance. The scraper also runs on its own every 30 minutes via GitHub Actions, so the proxy lists are always fresh.

If you find this project useful, **please consider giving it a star ⭐** or sharing it by word of mouth. Both help a lot. Thanks!

**Index:** Live Proxy Lists • Notice • Installation • Advanced Usage • Adding Your Own Sites • What Makes This Project Different? • Contributions

### Live Proxy Lists

These URLs link directly to the raw, automatically updated proxy lists. You can integrate them right into your projects.

- **Working Proxies (Checked and Recommended):**
  - All Protocols:
  - HTTP:
  - SOCKS4:
  - SOCKS5:
- **All Scraped Unchecked Proxies (Most are dead):**

### Notice

I do not host the provided proxies. The code is designed only to **collect publicly listed proxies** from websites and check whether they are working. Remember that **some public proxies are intentionally malicious**, so to stay safe, **never** send your passwords or any other sensitive data while connected to a public proxy. I built this tool to make it easier for developers and power users to access resources for building things, because I believe skill and talent shouldn't be held back by a lack of budget. **I condemn malicious use**; please use proxies responsibly.

### Installation

Getting up and running is fast and simple. *Tested on Python 3.12.9*

- **Clone the repository and install packages:**
- **Run it:** Execute the script. The default settings work out of the box, and the results are written to the same folder.

### Advanced Usage

For more control, you can use these command-line arguments.

#### Scraping Proxies

Arguments:

- : Specify the output file for the scraped proxies. (Default: )
- : Number of concurrent threads to use for the general scrapers.
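The "general scrapers" above fan out over many source pages at once on a thread pool. A minimal sketch of that pattern, assuming sources simply embed `ip:port` pairs in their page text — the fetcher is injected as a plain callable so the sketch stays offline; this is an illustration, not the project's actual scraper code:

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Matches bare ip:port pairs embedded anywhere in page text.
PROXY_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{2,5}\b")

def extract_proxies(page_text: str) -> set[str]:
    """Pull every ip:port pair out of a page's raw text."""
    return set(PROXY_RE.findall(page_text))

def scrape_all(fetch, urls, threads: int = 50) -> set[str]:
    """Fetch every source URL on a thread pool and merge the results.

    `fetch` is any callable mapping a URL to page text (e.g. a
    requests-based downloader); injecting it keeps the sketch testable
    without network access.
    """
    proxies: set[str] = set()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for page in pool.map(fetch, urls):
            proxies |= extract_proxies(page)
    return proxies

# Offline demo with canned pages standing in for real HTTP responses.
PAGES = {
    "https://example.com/a": "<td>1.2.3.4:8080</td> <td>5.6.7.8:3128</td>",
    "https://example.com/b": "socks list: 1.2.3.4:8080 and 9.9.9.9:1080",
}

if __name__ == "__main__":
    found = scrape_all(PAGES.get, PAGES, threads=2)
    print(sorted(found))  # duplicates across sources collapse into one set
```

Merging into a set is what makes raising the thread count safe here: each worker only reads its own page, and deduplication happens in one place.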
(Default: 50)
- : Number of concurrent threads for browser automation scrapers. (Default: 3)
- : Run only specific scrapers. For example:
- : Run all scrapers except for specific ones. For example:
- , : Enable detailed logging for what's being scraped.
- : Automatically remove URLs from that yield no proxies.
- : Run in compliant mode (respects robots.txt, no anti-bot bypass).
- : Enable browser automation scrapers (Hide.mn, OpenProxyList, Spys.one).
- , : Auto-accept the legal disclaimer.

To see a list of all available scrapers, run:

Available Sources:

- **Websites** - URLs from
- **Discover** - URLs discovered from website lists in
- **Advanced.name**
- **CheckerProxy**
- **Geonode**
- **GoLogin**
- **Hide.mn** (Pass to enable)
- **OpenProxyList** (Pass to enable)
- **PremProxy**
- **Proxy-Daily**
- **ProxyDB**
- **ProxyDocker**
- **ProxyHttp**
- **ProxyList.org**
- **ProxyNova**
- **ProxyScrape**
- **ProxyServers.pro**
- **Spys.one** (Pass to enable)

#### Checking Proxies

Arguments:

- : The input file(s) containing the proxies to check. You can use wildcards. (Default: )
- : The base name for the output files. The script will create separate files for each protocol (e.g. , ).
- : The number of concurrent threads to use for checking. (Default: 500)
- : The timeout for each proxy check (e.g. , ). (Default: )
- , : Enable detailed logging.
- : Add the protocol prefix (e.g. "http://", "socks5://") to the start of each line.

#### Unified Mode (Scrape + Check)

This runs both scraping and checking in one command with a streaming pipeline: scraped proxies are immediately fed to the checker for validation.

Arguments:

- : Output file for scraped proxies. (Default: )
- : Additional proxy file(s) to check alongside scraped proxies. Useful for re-checking existing lists.
- : Base name for working proxy output files. (Default: 50)
- : Number of concurrent threads for scraping. (Default: 50)
- : Number of concurrent threads for checking.
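The streaming pipeline mentioned above is essentially a producer/consumer pair: the scraper pushes each proxy onto a queue the moment it is found, and checker threads pull from the queue without waiting for scraping to finish. A minimal sketch under those assumptions — `check` is a placeholder for the real protocol probe, not the project's implementation:

```python
import queue
import threading

DONE = object()  # sentinel meaning "scraping has finished"

def run_pipeline(scraped, check, checker_threads: int = 4) -> list[str]:
    """Stream proxies from a scraper iterable straight into checker threads."""
    q: "queue.Queue[object]" = queue.Queue()
    working: list[str] = []
    lock = threading.Lock()

    def producer() -> None:
        for proxy in scraped:          # proxies flow in as they are found
            q.put(proxy)
        for _ in range(checker_threads):
            q.put(DONE)                # wake every checker so it can exit

    def consumer() -> None:
        while True:
            item = q.get()
            if item is DONE:
                return
            if check(item):            # stand-in for the real proxy probe
                with lock:
                    working.append(item)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(checker_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return working

if __name__ == "__main__":
    alive = {"1.2.3.4:8080", "9.9.9.9:1080"}
    found = run_pipeline(["1.2.3.4:8080", "8.8.8.8:80", "9.9.9.9:1080"],
                         check=alive.__contains__)
    print(sorted(found))
```

The point of the queue is latency: with hundreds of checker threads draining it, slow sources never block validation of proxies that have already arrived.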
(Default: 500)
- : Timeout for each proxy check (e.g. , ). (Default: )
- : Concurrent threads for browser automation scrapers. (Default: 3)
- : Run only specific scrapers. (See for list)
- : Exclude specific scrapers.
- : Run in compliant mode (respects robots.txt, no anti-bot bypass).
- : Enable browser automation scrapers.
- , : Auto-accept the legal disclaimer.
- , : Enable detailed logging.

**Example: Check Existing Proxies + Scrape New Ones**

This will:

- Load proxies from into the checker
- Scrape new proxies from all sources
- Check all proxies (existing + scraped)
- Save working proxies to (and , , )

### Adding Your Own Sites

You can easily add an unlimited number of your own targets by editing the file. It uses a simple format:

- **URL**: The only required part.
- **JSON_PAYLOAD**: (Optional) A JSON object for POST requests. Use as a placeholder for page numbers on paginated sites.
- **JSON_HEADERS**: (Optional) A JSON object for custom request headers.

Examples:

### What Makes This Project Different?

So what makes this project different from other proxy scrapers?

- **Advanced Anti-Bot Evasion**: This isn't just a simple script. It includes dedicated logic for websites that use advanced anti-bot measures such as session validation, reCAPTCHA fingerprinting, or even required account registration. It can parse JavaScript-obfuscated IPs, decode Base64-encoded proxies, handle paginated API calls, and, where required, drive an automated browser (SeleniumBase) to bypass detection and unlock exclusive proxies that other tools can't reach.
- **A Checker That's Actually Smart**: Most proxy checkers just see if a port is open. That's not good enough: a proxy can be "alive" but useless or even malicious. This engine's validator is more sophisticated.
- **Detects Hijacking**: It sends a request to a trusted third-party "judge". If a proxy returns some weird ad page or incorrect content instead of the real response, it's immediately flagged as a potential **hijack** and discarded.
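A judge-based hijack check boils down to comparing what comes back through the proxy against content the judge is known to serve. A minimal sketch of the idea — the judge URL and marker below are made up for illustration, and the HTTP fetch is abstracted into a callable rather than using the project's real checker:

```python
# Hypothetical judge: a page whose body always contains a known marker.
JUDGE_URL = "https://judge.example.com/azenv"   # illustrative, not the real judge
JUDGE_MARKER = "REQUEST_METHOD = GET"           # text the judge always returns

def classify(fetch_via_proxy, proxy: str) -> str:
    """Classify a proxy as 'working', 'hijacked', or 'dead'.

    `fetch_via_proxy(url, proxy)` returns the response body, or None on
    timeout / connection failure; injecting it keeps the sketch offline.
    """
    body = fetch_via_proxy(JUDGE_URL, proxy)
    if body is None:
        return "dead"
    if JUDGE_MARKER not in body:
        # The proxy answered, but with the wrong content: an ad page,
        # captive portal, or injected payload -> treat as a hijack.
        return "hijacked"
    return "working"

if __name__ == "__main__":
    # Canned responses standing in for real proxied HTTP traffic.
    responses = {
        "1.2.3.4:8080": "HTTP judge\nREQUEST_METHOD = GET\n",
        "5.6.7.8:3128": "<html>Buy cheap ads now!</html>",
        "9.9.9.9:1080": None,
    }
    fake_fetch = lambda url, proxy: responses[proxy]
    for p in responses:
        print(p, classify(fake_fetch, p))
```

The key design point is that "port open" and "judge content intact" are separate questions; only a proxy that passes both is worth keeping.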
This is a common issue with free proxies that this checker actively prevents.
- **Identifies Password Walls**: If a proxy requires a usernam…