Web Crawling & Web Scraping: The Mystery Behind Legal Lines

This post continues the series “Web Crawling and Web Scraping: Mystery Behind Legal Lines”. In case you missed the two earlier installments, you can click here for Part 1 and click here for Part 2.


Let’s put things into perspective: imagine someone using your bandwidth to conveniently retrieve data from your website for their own use. Are you going to like it? Probably not very much. Whether you take action, and to what degree, is a different story; but you are obviously not going to like it, and at the very least you will wish that the scraper would stop.

That is exactly what goes through a website administrator’s mind when you rip data off their website and consume their bandwidth with no apparent benefit to the website owner. What happens next depends a lot on the administrator’s mood, but rest assured, he or she has every right to file a suit.


You may think this is a clear exaggeration, and that the website administrator will most likely ignore you or use a technical measure such as blocking your IP address, with the worst case being a ‘cease & desist’ email. However, life is never so simple, especially when you infringe or trespass on someone else’s property.

Here is a classic case: LinkedIn is suing around 100 anonymous parties for web crawling and web scraping on its website. While the verdict is not yet out, let’s look at the claims under which the lawsuit was filed.

  1. Violation of the Computer Fraud and Abuse Act (CFAA).
  2. Violation of California Penal Code.
  3. Violation of the Digital Millennium Copyright Act (DMCA).
  4. Breach of contract.
  5. Trespass.
  6. Misappropriation.


Even without a verdict, one can easily imagine the damage that facing even one of these claims could do to the “anonymous” web crawler.

Imagine being sued for something as simple as trespassing; the thought itself is not very pleasant. Even if we set aside the financial shock that a verdict could deliver (let’s not forget that the cost includes not only the damages you may have to pay but also the exorbitant lawyer fees you will have to cough up), the sheer pain of handling a lawsuit could be quite devastating to your peace of mind. Just a heads up: in such cases it typically does not matter whether you scraped or crawled a website for a noble cause or with malicious intent. What matters is that you infringed upon certain laws and are liable to make good the damage that the filer claims.

The biggest caveat here is that dealing with such a case is not like any other organizational challenge you may have experience with. A lawsuit does not run on common sense or everyday logic; it runs on the letter of the law. This is clearly not a level playing field: the men in black have an edge over you through legal jargon and their reading of the evidence, unless you have really deep pockets and are willing to splurge on the best lawyer on the street (and even that is no guarantee that you will win). In the end, you would wonder whether the entire drama was even worth it.

Another problem is that the law isn’t like anything you’re probably used to. Where you rely on logic, common sense and your technical expertise, the other side will rely on legal jargon and the grey areas of the law to argue that you did something wrong. This isn’t a level playing field, and it certainly isn’t a good situation to be in. You’ll need to get a lawyer, and that might cost you a lot of money.

As the LinkedIn case shows, such a lawsuit is unlikely to remain a simple “web scraping” case; its scope can quickly expand to cover a whole list of other claims.
