Copyright Infringement Statistics and Internet Scraper Sites
I'm attending the Webmaster World conference this week, so it occurred to me to look for some statistics about copyright infringement on the Internet, just in case the subject comes up :-) As I mentioned in a previous post, my lawyer filed an infringement case in Federal court for me last week, and I'm looking forward to finding out whether the process proves to be a practical enforcement mechanism and deterrent. The only useful statistics about copyright infringement that I turned up came from the Federal government itself. Here is my summary of its figures for all copyright cases in the 12 months ending September 2005:
Cases Filed: 4,494
No Action Yet: 1,767
Settled Before Pretrial: 2,397
During/After Pretrial: 289
Jury Trial: 27
Nonjury Trial: 14
Percentage gone to trial: 0.9%
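As a quick sanity check on these figures, assuming the five outcome categories are mutually exclusive (which is how they read, though the summary above doesn't say so explicitly), they add up exactly to the 4,494 total, and the two trial categories together reproduce the 0.9% figure:

```latex
\begin{aligned}
1{,}767 + 2{,}397 + 289 + 27 + 14 &= 4{,}494 \\
\frac{27 + 14}{4{,}494} = \frac{41}{4{,}494} &\approx 0.0091 \approx 0.9\%
\end{aligned}
```

In other words, only 41 of the 4,494 cases ever reached a judge or jury.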
I didn't find any statistics about how those cases were decided, or how the infringements broke down in terms of books, music, print media, broadcast media or the Internet, and I'm sure I left plenty of categories out as well. One thing that is certain is that Internet infringement runs rampant, and if you include scraper sites in the infringement camp, it must run into the billions of pages. One amusing way to get a feel for the problem, rather than searching on random strings of text from a website, is to search on the copyright notice itself. Since I'm too lazy to remember the ALT key combination for the copyright symbol, my web pages all spell the word out. Not every search result that includes my copyright notice is an infringement; for example, many online retail sites selling my books are likely to turn up. It doesn't take long, though, to turn up the real infringers who were too lazy to remove the copyright notice (or perhaps considered removing it immoral), along with the scraper sites, which automate the borrowing process and don't really know what ends up on their pages.
I searched for the string "Copyright 2004 by Morris Rosenthal" on Google, Yahoo and MSN, and turned up the following results:
Yahoo - 176 results, 21 of which were mine; all of the rest appeared to be scrapers
MSN - 202 results, 2 of which were mine (the number would have gone up if I had expanded the results)
Google - 4,320 results, 46 of which were mine. In addition, Google found several "whole hog" infringers, along with the scraper sites.
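Put as proportions, the share of results for that exact string that were not on my own pages works out roughly as follows. Keep in mind these are hits on other sites, not confirmed infringements, since retailers selling my books are counted too, and the MSN figure only covers the results shown without expanding them:

```latex
\begin{aligned}
\text{Yahoo: } & \frac{176 - 21}{176} = \frac{155}{176} \approx 88.1\% \\
\text{MSN: } & \frac{202 - 2}{202} = \frac{200}{202} \approx 99.0\% \\
\text{Google: } & \frac{4{,}320 - 46}{4{,}320} = \frac{4{,}274}{4{,}320} \approx 98.9\%
\end{aligned}
```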
Most scraper sites try to legitimize their automated borrowing of a few sentences around a key phrase by providing a link back to the source. Their defense is that they are making fair use of the copyrighted material. A quick Google search on fonerbooks (this domain name) turns up over 11,000 results, fewer than 1,000 of which are legitimate links. The 10,000-plus remaining links (and Google excludes many known spammers from these results) give an indication of the size of the scraper world. The real question I'd like to get answered one day is "Does the intent of the 'fair user' have legal bearing?"
Some automated fair use is a requirement for a functioning Internet. When Google, Yahoo or MSN present search results from my site, that's a form of automated (if temporary) fair use, and I'm very glad that they do it. Google, Yahoo and MSN all show advertising alongside these fair use excerpts, without which they'd have little motivation to provide the search service. Scraper sites, on the other hand, create quasi-permanent web pages, intending for the search engines to index them and bring the scraper site traffic and ad revenue. It's a very different business model, since they depend on the text scrapings to bring them traffic, as opposed to presenting results to a visitor who has come to do a search.
