Ghost Spam: a Horror Story, with Graphs

The web developer paused at the threshold of the WordPress admin pages. The dashboard was clean and well-organized; there were no ominous sounds, no foreshadowing of what lurked in the Google Analytics summary until she saw, out of the corner of her eye, the stat for the site’s top page:

The Top Referrers suddenly flew out at her: free-share-buttons, Get-Free-Traffic-Now, free-social-buttons, buy-cheap-online and guardlink. She tried to scream, but something cold and slimy wrapped around her throat as vitaly rules google taunted her from Top Searches.

OK, it didn’t go exactly like that. I was doing routine maintenance on a multisite several weeks ago when I realized the top pages for the site were all sexyteens addresses. My first thought was that some kid in the student lab with a penchant for porn had compromised the system. It wasn’t until I started tracing the stats back that I found out these, and their companions, were ghost spam.

What Are These Ghosts?

Ghost spammers are bots that never actually visit your web site. They’re exploiting the Google Analytics API and injecting spam directly into the stats. How-to articles usually describe these as ghost referrals, but that’s misleading because referrals are only one facet of the stats being attacked. Bots are also injecting bogus data for direct hits, search keywords and ad campaigns.

Why do spammers do it? Some articles speculate that spammers are injecting URLs into your stats in the hopes that you’ll go there out of curiosity, either increasing traffic to their e-commerce site or enabling them to infect your computer with malware. But many of the search keywords aren’t URLs, while the direct hits don’t lead anywhere. It seems more likely to me that the ghost spam exists primarily to harm Google, and the honest users of Google Analytics are collateral damage.

Measuring the Evil that Ghosts Do

If your livelihood depends on the stats for an online operation, ghost spam is extremely harmful. If you manage stats for multiple sites, it’ll be one of the most vile things you’ll ever encounter. Here’s a screen shot of what it did to one of the 70 sites whose Google Analytics I manage:

graph for European Studies analytics

Stats for European Studies site from November 27, 2014, through May 31, 2015

You can see where the stats changed, jumping from maybe three sessions a day — this being a tiny site for a minor program — to spikes that are 20 times that number. The attacks started on this particular site March 17. Ghost spammers found maybe half our sites that week and all of them before the beginning of May.
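
A rough way to find where a jump like this begins, without eyeballing the graph, is sketched below in Python. The daily counts are invented for illustration and are not the real numbers behind the graph. The function name `flag_spikes` and the 10x threshold are assumptions chosen to sit between a baseline of about three sessions and spikes of about 20 times that.

```python
from statistics import median

def flag_spikes(daily_sessions, baseline_days, factor=10):
    """Return the indexes of days whose session count exceeds
    `factor` times the median of the first `baseline_days` days."""
    baseline = median(daily_sessions[:baseline_days])
    threshold = baseline * factor
    return [day for day, count in enumerate(daily_sessions) if count > threshold]

# Hypothetical data: a quiet site at about three sessions a day,
# followed by two ghost-sized spikes.
daily = [3, 2, 4, 3, 3, 58, 4, 61, 3]

# Use the first five days as the "before the attacks" baseline.
print(flag_spikes(daily, baseline_days=5))
```

The median is used for the baseline so that one unusually busy day in the quiet period does not raise the threshold much. On a site with real traffic swings, a fixed multiple like this would need tuning.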

Random Observations of a Ghost Hunter

I wish I had time to analyze the data over an extended period because interesting patterns are emerging. Here are three observations that deserve more scientific follow-up:

  • Our less-trafficked sites drew a disproportionate amount of ghost spam, not just as a percentage of their traffic but in absolute numbers. For June 2014 through May 2015, one of our smallest sites recorded 1,617 sessions, of which 1,463 were ghosts. During the same period the most-visited site recorded 630,693 sessions, of which only 398 were ghosts. I saw similar stats across a sampling of about a dozen sites.
  • The departmental web sites we host on one WordPress multisite and the personal web sites on a different multisite had overlapping but slightly different flavors of spam. The worst spammers hit stats for both, but the less aggressive ones seemed to have focused on one multisite or the other. It’s possible they simply haven’t caught up yet.
  • Ghost spammers are finding new sites on Google Analytics immediately. I had one new setup on April 19 and another on June 3, and in both cases less than 48 hours had passed before ghosts were injected into the stats.
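
To put the first observation in proportion, here is a small Python sketch that turns the session counts quoted above into ghost percentages. The figures are the ones in the list. The site labels and the function name `ghost_share` are only for illustration.

```python
# Session counts quoted in the first observation (June 2014 through May 2015).
sites = {
    "smallest site": {"sessions": 1_617, "ghosts": 1_463},
    "most-visited site": {"sessions": 630_693, "ghosts": 398},
}

def ghost_share(sessions, ghosts):
    """Return ghost sessions as a percentage of all recorded sessions."""
    return 100 * ghosts / sessions

for name, counts in sites.items():
    share = ghost_share(counts["sessions"], counts["ghosts"])
    real = counts["sessions"] - counts["ghosts"]
    print(f"{name}: {share:.2f}% ghosts, {real:,} real sessions")
```

On these numbers, roughly nine out of ten sessions on the small site were ghosts, while on the busiest site the ghosts amounted to well under a tenth of one percent.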

Trying (and Failing) to Cleanse the Haunted House

Google, which is fully aware of how much trouble spammers of all types are to its clients, announced a new bot and spider filter in July 2014. In the admin area under View Settings you’ll find a checkbox labeled “Exclude traffic from known bots and spiders.” Just check it and save the updated settings. It’s easy.

But so far it’s useless. I activated bot filtering on this site at the end of April. The screen shot illustrates how well that succeeded. (If you can’t see the graph, the answer is: not at all. On May 19 we had 59 sessions, one of the largest spikes recorded.)

marked graph for European Studies analytics

Stats for European Studies site indicating ghost spam after filter was activated.

Google uses the IAB/ABC International Spiders & Bots List as the basis for filtering. Assuming Google’s method is 100-percent effective (and I am assuming that), the list is simply not keeping up with all the new ghost spammers. It takes less time and effort to create and launch a new spambot than it does to discover, analyze and block it.

Laying Ghosts to Rest

After weeks of monitoring, reading, tinkering and testing, I’ve found what doesn’t work and what seems to do the job. In another post I’ll cover what I’ve found out so far.