niespodd / browser-fingerprinting

Analysis of Bot Protection systems with available countermeasures 🚿. How to defeat anti-bot system 👻 and get around browser fingerprinting scripts 🕵️‍♂️ when scraping the web?

Repository Overview (README excerpt)

*This repository's development would not have been possible without the support of many partners and sponsors. One of these partners is **ScrapingBee**, a cloud web scraping service with some neat built-in anti-bot detection features.*

ScrapingBee - Sign up for a free trial and get 10% off the first invoice with code "NIESPODD"

Avoiding bot detection: How to scrape the web without getting blocked? 👨‍🔧

Whether you are just starting to build a web scraper from scratch and wondering why your solution isn't working, or you have been working with crawlers for a while and are stuck on a page that reports you as a bot and won't let you go any further, keep reading.

Anti-bot solutions have evolved in recent years. More and more websites are introducing security measures: from simple ones, such as filtering IP addresses by geolocation, to advanced ones based on in-depth analysis of browser parameters and user behavior. All this makes web scraping more difficult and costly than it was a few years ago. Nevertheless, it is still possible. Here I highlight a few tips that you may find helpful.

Where to begin building an undetectable bot?

Below is a curated list of services that I have used to get around different anti-bot protections. Depending on your use case, you may need one of the following:

| Scenario/use-case | Solution | Example |
| - | - | - |
| **Short-lived sessions without auth** | Pool of rotating IP addresses | This comes in handy when you scrape websites like Amazon, Walmart or public LinkedIn pages, that is, any website where no sign-in is required. You plan to make a high number of short-lived sessions and can afford being blocked every now and then. |
| **Geographically restricted websites** | Region-specific pool of IP addresses | This is useful when the website uses a firewall, such as the one from Cloudflare, to block an entire region from accessing it. |
| **Long-lived sessions after sign-in** | Repeatable pool of IP addresses and a stable set of browser fingerprints | The most common scenario here is social media automation, e.g. you build a tool that automates social media accounts to manage ads more efficiently. |
| **Javascript-based detection** | Popular evasion libraries, such as puppeteer-extra-plugin-stealth | A number of websites use FingerprintJS, which can be easily bypassed by adding open-source plugins such as the aforementioned puppeteer stealth plugin to your existing software. |
| **Detection with browser fingerprinting techniques** | Natural-looking browser fingerprints, that is, covering the whole surface validated by the Javascript solution installed on the target website. | These are among the most advanced cases. Mainstream examples are credit card processors such as Adyen or Stripe. A very sophisticated browser fingerprint is created to detect credit fraud or to prompt additional authorization from the user. |
| **Unique set of detection techniques** | Specialized bot software that targets the unique detection surface of the target website. | Good examples are sneaker marketplaces and e-commerce shops, which are reportedly under heavy attack from custom-made bot software. |
| **Simple custom-made detection techniques** | Before diving into any of the above: if you are targeting a smaller website, it is very likely that all you need is a Scrapy script with tweaks and a cheap data-center proxy, and you are good to go. | - |

Once you have decided what type of evasion your project needs, you can use the list below to pick the best provider:

Helpful services

| Type | Service | Note |
| - | - | - |
| Proxy | The Social Proxy | Highly recommended 👍 ✔️ Pros: The IP pool is consistently good. Contrary to the existing "big sharks" of the proxy industry that charge per GB, here you get unlimited traffic within a rotating endpoint. Transparent business model. ❌ Cons: The geo coverage is limited to the countries listed on the website. The IP isn't rotated instantly; you have to wait 10-15 seconds. |
| Proxy | BrightData (formerly Luminati Networks) | One of the most popular, but probably also the most expensive, proxy providers. The IP pool is mainly sourced from users of HolaVPN and an app monetization SDK. |
| Proxy | Oxylabs | Competitor to BrightData with more no-code/low-code scraping products. |
| Scraping as a service | ScrapingBee | Highly recommended 👍 One of the most advanced stealthy scraping-as-a-service providers. At times it may be cheaper than building a dedicated scraping solution, since they do not charge for the amount of traffic used. |
| Scraping as a service | Apify.com | Apify has evolved into a complete scraping and automation SaaS platform, with ready-made tools, an integrated proxy, and custom solutions for scraping at any scale. Developers can also create scrapers on the platform and rent them to other users. |
| De-captcha as a service | Anti Captcha: Captcha Solving Service. Bypass reCAPTCHA, FunCaptcha (...) | Self-explanatory. Bitcoin accepted ❤️. |

List of anti-bot software providers

This is a non-exhaustive list of companies that provide the most advanced anti-bot solutions for businesses ranging from smaller e-commerce sites to Fortune 500 companies:

• Akamai Bot Manager by Akamai
• Advanced Bot Protection by Imperva (formerly Distil Networks)
• DataDome Bot Protection
• PerimeterX
• Shape Security
• Cloudflare Bot Management
• Barracuda Advanced Bot Protection
• HUMAN
• Kaskada
• Alibaba Cloud Anti-Bot Service
• Travatar
• Ocule
• Sift
• Forter
• Reblaze
• Arkose Labs
• LexisNexis® ThreatMetrix®

How do you know who is getting you blocked?

Join extra.community. An automated tester, **Botty McBotface**, runs there and uses several complicated techniques to determine exactly which protection a tested website uses (credits to berstend and others from #insiders).

Available stealth browsers with automation features

**Important**…