This will be a long post covering both Panda destruction and recovery, so I thought I’d better start off with a graph and a little backstory for context.
Once upon a time, a little girl with golden hair got a job with a big corporation judging the quality of websites. They dressed her up in a funny bear suit and sent her out to pass black and white judgment on three websites. First, she tried the poppa website, online ever since 1996 and drawing around 2,500 visitors a day from Google. “This website is too old,” she said, and Google took away 80% of its visitors. Then she tried the momma website, online since 2000 and drawing over 6,000 visitors a day from Google. “This website is too ugly,” she said, and Google took away 75% of its traffic. Finally she tried the baby website, online since 2008, which was drawing around 500 visitors a day from Google. At first she wasn’t sure, and then she thought she didn’t like it, but then she kind of liked it and finally she decided “This website is just right,” and Google tripled its traffic.
The poppa website in this story is DAILEYINT.COM. It was started as an essay style technology news site with Franklyn Dailey Jr., a fellow author with whom I’d co-authored a book length technology report for a tech publisher. We soon realized we couldn’t compete with 24×7 technology news sites powered by PR releases, so we repurposed the site for our book writing projects. DAILEYINT consists of about 160 web pages, the majority of which were published as book chapters over the last fifteen years. Franklyn, who retired from the service as a Captain (USNR), served as the gunnery officer on a destroyer in World War Two and later as a naval aviator. His pages are historical nonfiction, reinforced by research, eyewitness accounts and photographs. My part of the website started with the drafts of The Hand-Me-Down PC and Build Your Own PC, books I later sold to McGraw-Hill, and a newsworthy (Dateline MSNBC, front page of The Investors Business Daily) PC question and answer page with daily updates.
The momma website is FONERBOOKS.COM, which I started with daily progress reports about translating my great-grandmother’s groundbreaking fiction works from Hebrew. When that project was finished, I moved my prior Israel writing and the often cited analysis of Amazon ranks over from the DAILEYINT site, and focused all of my writing efforts on FONERBOOKS. The most popular pages were full chapters from my PC troubleshooting book with interactive flowcharts, but the publishing journalism also became popular and my Self Publishing blog (started in 2005) was a top 10 lock in Google search for years. I also kept up the Israel guide, added an illustrated section about building a timber frame, and wrote about my misadventures in business and investing.
The baby website, IFITJAMS.COM, was started for the purpose of proving to my publishing blog readers that it wasn’t already too late to build a content based website in 2008. I built it around some hack work I was doing to keep my 1986 Dodge Omni on the road, added some troubleshooting flowcharts for basic car problems, and even a short-lived blog for fixing anything. I’m strictly a shade-tree mechanic, my experience is limited to fixing my own cars and occasionally helping friends, and most of my work would cause a professional mechanic to laugh himself to death. The site also includes a little of my “grin and bear it” medical writing that would get a physician sued for malpractice.
Prior to the Panda update, Google’s main indicators of site quality (what they now dismiss as mere “relevancy”) were the quantity, quality and context of incoming links. The table below shows the links to these three websites as reported by Google Webmaster Tools and the Yahoo! Site Explorer Tool. The totals are different because they count links in different ways, neither of which is particularly reliable. Webmaster Tools, amusingly, counts links that are NOFOLLOWED.
| |DAILEYINT|FONERBOOKS|IFITJAMS|
|---|---|---|---|
|Links (Webmaster Tools)|12,250|70,134|1,908|
|Links (Yahoo Explorer)|2,452|26,492|911|
|Pages Infringed/Scraped|Medium to High|High|Low|
|Links from Spammers|Medium|High|Low|
|Pages|160|1000 (70% blogs)|30|
|Link Quality|High|High to Medium|Medium to Low|
|HTML Uniformity|Medium|Low (blogs)|High|
|Google Analytics|100% (now)|50% (blogs)|100%|
|Changes since Panda|Many|Many|Few|
The pages infringed/scraped count is based on Google searches using exact quotes from pages on the websites. Not surprisingly, the most popular pages of past years tend to be those that got ripped off the most. For both the FONERBOOKS and DAILEYINT sites, the most infringed upon pages are my troubleshooting flowcharts, though the automatic scrapers often take the text and leave the flowcharts behind! It’s not uncommon to find thousands of syndicated infringements of a single popular web page. There were very few infringements on the IFITJAMS website, mainly snippets or graphics pasted into automotive forums.
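For anyone who wants to run the same kind of exact-quote check on their own pages, here is a minimal sketch of how the search strings could be prepared. It is not the procedure used for the table above, which was done by hand. The function names `pick_probe_sentences` and `exact_quote_query` and the word-count thresholds are assumptions made up for illustration, and the sentence splitting is deliberately naive.

```python
import re
from urllib.parse import quote_plus


def pick_probe_sentences(text, count=3, min_words=8, max_words=20):
    """Pick a few mid-length sentences from page text to use as exact-match probes.

    The thresholds are arbitrary illustrative defaults, not tuned values.
    """
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    candidates = []
    for sentence in sentences:
        words = sentence.split()
        if min_words <= len(words) <= max_words:
            # Re-join to normalize any internal runs of whitespace.
            candidates.append(" ".join(words))
    # Longer sentences are less likely to match unrelated pages by chance.
    candidates.sort(key=len, reverse=True)
    return candidates[:count]


def exact_quote_query(sentence):
    """Wrap a sentence in double quotes and URL-encode it as a query string."""
    stripped = sentence.strip().strip('"')
    return quote_plus(f'"{stripped}"')
```

The encoded string can be pasted after the `q=` parameter of whatever search engine is being checked. Counting how many results belong to someone else's domain is still a manual job.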
I’ve spent entire weeks this year filing DMCA complaints for infringements on DAILEYINT and FONERBOOKS, sending hundreds to Google Blogger alone. But it’s a disgusting experience, or as old New Englanders would put it, you can’t touch pitch and not become defiled. Prior to Panda, Google did an excellent job making sure that the original web pages appeared in search before the infringements. Once Panda penalties were applied, that whole system fell apart.
Links from spammers is another category that isn’t supposed to make a difference, but webmasters are beginning to assume that Google has taken the approach that if there are too many links from bad neighborhoods in proportion to your total, something must be wrong. Thanks to autoblogging, scrapers, and other forms of spam generation software, FONERBOOKS has over thirty thousand unwanted links from lousy websites. A single offshore blogging site links to a minor page on the FONERBOOKS website over 13,000 times! Spammers have always included links to high quality websites in the belief that this will lead Google to include their spammy websites in a good neighborhood. It may be working in the opposite direction. The table below shows links from single spammy domains on the left, and the pages they link to on the right.
The page count for IFITJAMS is currently 30 pages, of which a single page is new this year.
The DAILEYINT count is approximate; there used to be over 250 pages, but in an effort to please Goldilocks, I removed three sections of older pages this year. First I got rid of an experimental Blogger blog from 2005 in which I would report a leading news story of the day from 100, 200 or 300 years ago. I just checked Webmaster Tools, and it turns out that one of these blog posts had attracted 71 links from 44 different domains, primarily .EDUs and .ORGs. Now it’s a soft 404. Then I got rid of the novel I posted online in 1996, which was one of the first to be included in Yahoo’s web published fiction directory and was referenced as a citation example by the Chicago Manual of Style. I think it was 26 chapters, call it 30 web pages.
Next, I got rid of The Midnight Question and Archives, nine LONG web pages of computer questions sent from all over the world which I had answered in the 1997/1998 time frame. My apologies to people who like using old computers. Finally, I dumped all of the original Hand-Me-Down PC chapters and a number of related pages that just didn’t draw many visitors. I’m a bit ashamed of all this since there was nothing wrong with the quality of any of these pages, but I’m addicted to troubleshooting so I couldn’t resist trying something.
The FONERBOOKS page count has been bouncing around this year as I deleted over five hundred old blog posts to get the old Blogger code off my site, and then felt so bad about all the incoming links getting intercepted by my 404 page that I put them all back last week. I’m still in the Webmaster Tools process of manually reincluding them in the index. At this point, I’m thinking of restoring the entire site from a 2010 backup, but it seems a shame considering how I invested a couple weeks redesigning the navigation and eliminating pages that only drew a few visitors a day. I think the only pages on FONERBOOKS that I was really happy to get rid of were a series of financial articles and blog posts I wrote about day trading some years ago, when I forced myself to make a stock trade every day for a month. I think it was a down month and I lost around $2,000 trading.
The link quality in the table is just my assessment of the average quality of incoming links. All three sites have some super high quality links from top websites and media outlets (New York Times, The Wall Street Journal, etc), and all sites have some super low quality links from spammers who were trying to move into a better neighborhood. On the whole, I think DAILEYINT has the highest quality links on average because the bulk of the pages, Franklyn’s historical work, just don’t attract links from spammers. There are also a couple PageRank=6 pages left on the DAILEYINT site. While FONERBOOKS has far and away the most high quality links of the three sites, the average is dragged down by all the spammers and scrapers. IFITJAMS has a single high quality link (Make Magazine) and a couple links from schools, but primarily it gets mentions in forums. IFITJAMS also has almost eight times as many NOFOLLOWED links from eHow (230) as it does pages to accept links (30).
I touched on author authority earlier, by which I mean the professional qualifications and experience of the authors in the particular subjects. On DAILEYINT it’s tops for all the pages. On FONERBOOKS, the unending pressure to produce blog posts over the course of six years no doubt led me to rush a post here and there, though I rarely write from off the top of my head. Author experience is the main knock on IFITJAMS, for which I’d call my author authority low. While it’s based on personal experience, much of that personal experience has resulted in needing to fix it again. Of course, the site’s motto is, “If it jams, fix it, if it breaks, fix it again.”
Google loves YouTube so it wouldn’t shock me if having a popular YouTube channel associated with a website is seen as a sign of quality. It also fits in with Google’s broader theme of trying to compete with Facebook by playing up social networking. The IFITJAMS YouTube channel gets a couple thousand views a day for the collection of unrehearsed two minute videos I made while working on the car. The FONERBOOKS YouTube channel gets fewer than 100 views a day, most of them driven by my website. DAILEYINT has no YouTube presence.
Google Search loves Google products so much now that I even see Google Answers pages from 2002 popping up ahead of my regularly updated publishing statistics pages, even where the Google Answers page refers to mine as a source. And Google Books outranks FONERBOOKS for a search on The Laptop Repair Workbook, though all Google Books has to offer is a computer generated bibliographical record that I doubt has ever been linked.
HTML uniformity is a measure of how similar all of the pages are to each other, ignoring the content. The pages on IFITJAMS are very uniform and all live in the root directory. They were created in the same old Windows 3.1 HTML editor from 1995 (GNNPress) and there are only two basic page designs on the site.
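One way to put a rough number on HTML uniformity is to throw away the text, keep only the order of the tags, and compare pages pairwise. The sketch below does that with Python's standard library. It is my own illustration, not anything Google has described, and the names `TagCollector`, `structural_similarity` and `site_uniformity` are made up for the example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from itertools import combinations


class TagCollector(HTMLParser):
    """Records opening tag names in document order, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html):
    """Return the list of opening tag names found in an HTML string."""
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def structural_similarity(html_a, html_b):
    """Ratio between 0 and 1 of how closely two pages' tag sequences match."""
    matcher = SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b))
    return matcher.ratio()


def site_uniformity(pages):
    """Average pairwise structural similarity across a list of HTML strings."""
    pairs = list(combinations(pages, 2))
    if not pairs:
        # Zero or one page: nothing to compare, so treat as fully uniform.
        return 1.0
    return sum(structural_similarity(a, b) for a, b in pairs) / len(pairs)
```

Two pages built from the same template with different wording score 1.0, and the score drops as page designs diverge. A site with a thousand pages means about half a million pairs, so sampling a few dozen pages is the practical way to run it.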
The DAILEYINT pages are getting more uniform every time I change anything on the site, though there are still a dozen pages with long URLs from an old Blogger blog I started writing about electric car technology. I converted them into plain HTML this year and I could easily delete them, but they’ve drawn some incoming links that I hate to orphan. The DAILEYINT pages are all in subdirectories, one for each topic. The biggest knock on the HTML would be the lack of any META tags on Franklyn’s pages, and the fact he occasionally uses empty header tags to create vertical spacing. Maybe that makes him an evil person.
From an HTML uniformity standpoint, FONERBOOKS is a mess. There are over five hundred old Blogger posts I just put back on the site to get rid of the soft 404 errors, and there are all the WordPress Self Publishing blog posts I’ve written since Blogger (Google) shut down their FTP service in early 2010. According to Webmaster Tools, there are some title tag repeats in the WordPress pages that I don’t know how to avoid since they aren’t really different pages, just different paths. And maybe Panda thinks it’s a sign of low quality to have page names ending with both “.htm” and “.html” (thanks to Blogger). Maybe that makes me an evil person too.
The FONERBOOKS directory structure is a little haphazard, but there’s nothing I can do about it since I’m hosted on a Windows IIS machine without the ability to implement 301 redirects to move things around. This is particularly painful for my old publishing blog main page, which has links from over 500 different domains (that’s more different domains than have linked the entire IFITJAMS site) and which I can’t redirect to the new Self Publishing blog. I’m open to suggestions;-) ***Update: I figured out the redirect using the web.config file this morning. I have no idea why my extensive searching last year didn’t turn up a solution, I was still using Google then. ***
Other than the uniformity of the HTML, I wanted to say something about the other non-text elements of the sites. IFITJAMS is the most uniform, with Google Analytics code on every page. DAILEYINT used to be 50/50 because I only had Analytics code on my pages while Franklyn uses the server stats. Sunday morning I stuck my Analytics code onto all of his pages and made a few little HTML tweaks as I went along. Franklyn, at 90 years old with macular degeneration, is still adding pages to the DAILEYINT website because history is important to him. I’m not going to ask Franklyn to learn to use a different WYSIWYG editor with the productive time he has left.
I’ve put the biggest post-Panda effort into DAILEYINT because I know how much Franklyn enjoys hearing from old veterans and their surviving family members, and he’s been able to solve a few historical mysteries based on people coming forward with oral histories 65 years or so after the fact. That effort mainly amounted to removing older pages of mine that didn’t draw much traffic and trying to make my own pages more uniform in HTML construction and navigation. The site is so straightforward that other than removing pages that drew fewer than ten visitors a day and filing DMCA complaints, I couldn’t think of anything else to try.
I’ve made a good number of changes to FONERBOOKS since Panda hit, but it’s been more a question of making changes I’d been afraid to try in previous years for fear of losing search engine presence. Amongst other things, I deleted a couple blogs I had tried and abandoned, I deleted most of my pages about website design for authors, and I cut way back on internal navigation links between my most popular pages. Traffic has done nothing but droop, so I may restore from a 2010 backup and forget about it.
I made the fewest changes to IFITJAMS, in part because the site isn’t associated with any published books so it doesn’t have any impact on my publishing business unless I sell the site outright. The only thing I can remember doing was deleting the FixIt blog in April, which totaled ten posts made over the course of three months in 2008. I might have neatened up the HTML, things like table sizes or META tags. I don’t remember, though anybody can check by comparing the current pages against the Internet Archive records.
So I’m left with two questions. Why did the Panda eat DAILEYINT and FONERBOOKS and why did it regurgitate IFITJAMS and then gift it multiples of visitors?
My leading theory has always been Google’s misidentification of duplicate content, and this theory is shared by many other webmasters with high quality content. FONERBOOKS and DAILEYINT are not only heavily infringed upon and scraped, but much of the book content was also voluntarily included in Google Books (we requested it be removed this year). My computer books have been heavily pirated, so that PDF copies are all over the web, and my more popular publishing articles have been stolen and, in many cases, thinly rewritten.
I made a point of including links back to my website in the PDF eBooks I began publishing without DRM in 2008, so that Google would know where the content came from, but that didn’t stop them from ranking piracy sites above my own site on some book title searches post-Panda. And I even tried fighting fire with fire by releasing a free PDF sampler with the same title as a book for propagation on piracy networks. Google’s company line about duplicate content has always been that it’s not a problem because you can always file a DMCA complaint. If even a single Google employee has spent as little as a hundred hours of his lifetime filing DMCA complaints, I’d love to hear from that employee so I can say, “A hundred hours? That’s nothing.”
Another possibility is that Google has targeted certain subject areas with a high level of skepticism. Since the Panda penalty is applied to the whole website, that means that if computer hardware is categorized as a high crime area, both FONERBOOKS and DAILEYINT would be subject to that filter. If the next filter is to check if the content is unique, given the fact that there are tens of thousands of web pages infringing on my computer writings, they could easily be saying, “A pox on all their houses.” There’s nothing I can do to prove to a computer algorithm that I’m the author of my work. Fifteen years of accumulated organic links from high quality websites used to prove that point for me, but they’ve been overruled by Panda.
If it wasn’t for all of the links to my computer pages on DAILEYINT (not to mention around 500 NOFOLLOWED links from eHow to a mere handful of pages), I might delete them all and see if Franklyn’s history pages recover. The problem with that scenario is it would likely make Google believe that the leading copyright infringer of the moment is the original source, and I’d have pushed the few hundred remaining visitors a day to a crook. Or I suppose I could leave my host of the last decade for a host with Unix servers that would allow me to 301 redirect the two dozen computer pages to another site and see if Franklyn’s pages recover, but it’s a lot of hassle for a highly unlikely outcome.
Another possibility is automated scoring of website quality based on the underlying HTML aesthetics of our pages. I don’t really believe this since I’ve seen too many high quality websites decimated by Panda which used solid content management systems that provided entirely uniform HTML. But perhaps Panda is doing something insane like looking at our simple book order pages and declaring, “These evil people link this one page from lots of other pages and there’s not much here other than a link to Amazon.” Well, Panda-Pooh, that’s where we send people who want to buy a book.
It’s also possible that Panda is just dying to let FONERBOOKS and DAILEYINT recover but that another Google filter monitoring suspicious SEO activity (deleting and replacing pages, changing internal navigation links) is getting in the way. If the only Panda recovery I was aware of was IFITJAMS, I’d guess that an SEO penalty was likely, but since I’ve read about forum sites recovering after making massive and ongoing changes, I’m not going to worry about it. In any case, I’ll hold off on restoring any pages to DAILEYINT, even though that means hundreds of good, organic links pointing to missing pages, and if it should recover while FONERBOOKS languishes, it could mean that a busy beaver penalty is in force.
The funny thing is that I find myself in the same position as a few other author/engineers I know who are too disgusted with what Panda actually means to care whether or not our sites recover. Yet we are too obsessed with troubleshooting to just let it go. The very fact that a Panda can come out of nowhere to eat all of your web traffic while the PhDs at Google talk about quality and suggest we evil authors examine our souls means that using a website as the primary publication medium is no longer a business model. That doesn’t change even if FONERBOOKS and DAILEYINT follow IFITJAMS into Panda heaven next month and instead of the old 10,000 visitors a day the sites start to draw 25,000. If I want to take up gambling for a living, I can go back to day trading and skip writing about it.
So it’s back to watching the old clock and waiting for the hammer to fall.