24 Jul 2014

Dr. Crawlit - A Bot That Cares About the ‘Little Guy’

In the first post of this two-part series, we shared our insights into Googlebot’s activity and behavior patterns.

However, no overview of Googlebot activity would be complete without a mention of Googlebot impostors, who assume Googlebot’s identity to gain privileged access to websites and online information.

Every day, millions of these “evil twins” are used for DDoS attacks, hacking, spam, content theft and many other shady activities. The details of these malicious escapades, which fill the event logs of Incapsula’s security services, are what we share with you here today.


Methodology

For the purposes of this study, we observed over 400 million search engine visits to 10,000 sites, resulting in over 2.19 billion page crawls over a 30-day period.

Information about Googlebot impostors (a.k.a., Fake Googlebots) comes from inspection of more than 50 million Googlebot impostor visits, as well as findings from our ‘DDoS Threat Landscape’ report, published earlier this year.

One in Twenty Five Is up to No Good

One of the most interesting facts about Fake Googlebots is just how common they really are. Based on our data, over 4% of bots operating with Googlebot’s HTTP(S) user-agent are not who they claim to be. For those unfamiliar with the term, a “user-agent” is the online equivalent of an ID card, used to identify website visitors, whether browsers or bots.

As demonstrated below, the actual “type” of these impostors may vary, but all of them should be deemed suspicious by default, due to their attempt to assume a false identity.
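As a rough illustration of why the user-agent alone proves nothing, the Python sketch below checks only whether a visitor claims to be Googlebot. The function name is hypothetical, and the sample string is the user-agent Googlebot is widely documented as sending. Any client can put the same text in its request header, so this check says who a visitor claims to be, not who it is.

```python
# The user-agent string commonly documented for Googlebot.
GOOGLEBOT_UA = (
    "Mozilla/5.0 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)


def claims_to_be_googlebot(user_agent: str) -> bool:
    """Return True if the header *claims* Googlebot's identity.

    Hypothetical helper: it inspects only the self-reported header,
    which an impostor can copy verbatim.
    """
    return "googlebot" in user_agent.lower()


# A genuine crawler and an impostor are indistinguishable at this level:
real_visit = {"ip": "66.249.66.1", "user_agent": GOOGLEBOT_UA}
fake_visit = {"ip": "203.0.113.7", "user_agent": GOOGLEBOT_UA}  # example IP

for visit in (real_visit, fake_visit):
    print(visit["ip"], claims_to_be_googlebot(visit["user_agent"]))
```

Both visits pass the check, which is why the claim has to be backed by something the visitor cannot forge, such as its point of origin.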

Why Use Spoofed Googlebots?

To answer that question, just consider the benefits that come with fake Google credentials. For one, “Google ID” is as close as a bot can get to having a VIP backstage pass for every show in town.

After all, most website operators know that to block Googlebot is to disappear from Google. Consequently, to preserve their SEO rankings, these website owners will go out of their way to ensure unhindered Googlebot access to their site, at all times.

In practical terms, this may translate into exceptions to security rules and lenient rate limiting practices. At Incapsula, a month does not go by without our coming across hackers hoping to exploit these loopholes to improve their chances of success.

Extremely Popular DDoS Tools

Drilling down into our security logs, we were able to see just how Fake Googlebots are employed.

By observing recent data, collected from over 50 million Fake Googlebot sessions, we saw that 34.3% of all identified impostors were explicitly malicious, with 23.5% of these bots used for Layer 7 DDoS attacks.

Fake Googlebots are an extremely popular DDoS tool.

'Other purposes' are suspicious, yet not downright malicious, activities.

These numbers make perfect sense, because DDoS is exactly the situation where Googlebot’s ID comes in handy, particularly against security solutions that still rely on rate limiting instead of case-by-case traffic inspection. Website operators who use such solutions are unable to tell real Googlebots from fakes. As a result, when the attack bells go off, they face a harsh “all or nothing” dilemma: block all Googlebot agents and risk loss of traffic, or allow all Googlebots in and suffer downtime.

With DDoS events that can last for days, weeks or even months, each of these alternatives is extremely damaging for the target, making the attack a success – either way.
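The dilemma can be made concrete with a small Python sketch. Everything here is hypothetical (the `Action` enum, the function names and their parameters), and it models no particular product. It simply contrasts a policy that sees only the user-agent claim with one that also knows whether the visitor's origin was verified.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def user_agent_only_policy(claims_googlebot: bool,
                           over_rate_limit: bool,
                           trust_googlebot_claims: bool) -> Action:
    """Policy that cannot tell real Googlebots from fakes.

    The operator has a single switch for *every* claimed Googlebot:
    trust them all (fakes get in) or block them all (real ones are lost).
    """
    if claims_googlebot:
        return Action.ALLOW if trust_googlebot_claims else Action.BLOCK
    return Action.BLOCK if over_rate_limit else Action.ALLOW


def verified_policy(claims_googlebot: bool,
                    origin_verified: bool,
                    over_rate_limit: bool) -> Action:
    """Policy that inspects each claimed Googlebot case by case."""
    if claims_googlebot:
        # Verified crawlers pass; impostors are blocked regardless of rate.
        return Action.ALLOW if origin_verified else Action.BLOCK
    return Action.BLOCK if over_rate_limit else Action.ALLOW
```

With the first policy, a real and a fake Googlebot always receive the same answer. With the second, the answer depends on the visitor rather than on the label it presents.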

The good news is that Fake Googlebots can be accurately identified using a combination of security heuristics, including IP and ASN verification – a process that allows you to identify bots based on their point of origin.

Still, even these practices rely on considerable processing power and software capabilities not typically available to the regular website owner.
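One widely used way to check a visitor's point of origin is a forward-confirmed reverse DNS lookup. This is a related technique named here for illustration, not necessarily the IP and ASN verification described above. The sketch below is a minimal Python version under stated assumptions: the function names are hypothetical, the domain suffixes are assumed to be those used by Google's crawler hosts, the forward lookup covers IPv4 only, and the lookups are injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List

# Assumption: genuine Googlebot hosts resolve to names under these domains.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")


def _reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(hostname: str) -> List[str]:
    """A-record lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], List[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        hostname = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # Step 1: the PTR name must belong to a Google crawler domain.
    if not hostname.endswith(GOOGLE_CRAWLER_DOMAINS):
        return False
    # Step 2: that name must resolve back to the same IP; otherwise the
    # PTR record could have been set by whoever controls the address.
    try:
        return ip in forward_lookup(hostname)
    except OSError:
        return False
```

Each check costs two DNS queries, so in practice the result would normally be cached per IP. At attack volumes this overhead is part of what makes origin verification demanding for a small site.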

A few months ago, in our ‘DDoS Threat Landscape’ report, we showed that Fake Googlebots are the 3rd most commonly used bot in DDoS attacks. The reason for this popularity is the very dilemma described above.

Fake Googlebot DDoS attack: 1,446 Requests per Second (RPS) originating from 262 attacking IPs.


Spread Around the Globe

Fake Googlebot visits originate from botnets - clusters of compromised connected devices (e.g., Trojan-infected personal computers) exploited for malicious purposes.

Looking at the sources of Fake Googlebot visits, we can identify the locations of some of these botnets. The list consists of many of the usual suspects, with the US, China, Turkey and India still claiming four of the top five spots, just as they did several months ago when we were compiling data for our ‘DDoS Threat Landscape’ report.

Googlebots' Country of Origin    Share of Visits
US                               25.16%
China                            15.61%
Turkey                           14.7%
Brazil                           13.49%
India                            8.4%
Thailand                         4.07%
Other                            4.07%

However, we were a bit surprised to find Brazil squeezing into 4th place, as the origin of 13.49% of all recent Fake Googlebot visits.

Does this have something to do with the World Cup? We can’t honestly say. Still, these numbers may have something to do with the myriad Internet devices brought into the country by 1 million tourists, some of whom probably should pay more attention to what they download.


We will continue to keep an eye on Fake Googlebot escapades. In the meantime, you can click here to learn more about Dr. Crawlit – the original Googlebot.