Last Tuesday, December 9, we released our 2013 Bot Traffic Report, showing that bots account for 61% of all website traffic. Following this report, and the resulting media coverage, we were hit with a lot of questions. Mainly, we were asked about the nature of “Other Impersonators” - which represent 20% of all bots - and about our bot identification techniques.
We believe that the following case study describing a DDoS attack that occurred on December 10th, one day after our report was published, provides a great answer to both of these questions.
Attack Description: DDoSed by Googlebots?
The target of the attack was a relatively small commercial website, which onboarded Incapsula’s service earlier this year. On the night of December 10, the site became a target for a Layer 7 DDoS attack, which peaked at around 1,446 Requests per Second (RPS) and amounted to a total of 973K daily hits.
The malicious traffic originated from China, as well as North and South America. Over the course of the attack we were able to record 262 attacking IPs, including addresses belonging to a local education institute.
In terms of its relative scale and duration, the attack was mid-sized at best, especially when compared to other DDoS events we’ve handled earlier this year. Still, if left unattended, it would have been enough to bring down most commercial websites.
What interested us in this case was the attacker’s assumed identity.
Looking at our logs we saw that the DDoS bots were using Googlebot’s user-agent, possibly in an attempt to bypass various “home-brewed” methods of DDoS protection (e.g., redirects via .htaccess file).
Obviously, these fake Googlebots were blocked before they could reach their target. This blocked attack provides the timely example we need to demonstrate Incapsula’s advanced methods of Client Classification.
Cross-Verifying the Identity of Fake Googlebots
General distrust and ongoing vigilance are the cornerstones of traffic profiling. When dealing with bots, you always have to play the “devil’s advocate” and assume that everything you see is false, until reliably proven otherwise.
Simply put, you must be aware that user-agents can be fake, IPs can be spoofed, headers can be re-modeled and so on. And so, to provide reliable identification, you need to cross-verify various tell-tale signs to uncover the visitors’ true identity and intentions.
Step 1: Looking at Header Data
In this case, even though the bots were using Google’s own user-agent, the rest of the header data was very “un-Google like”. This was enough to raise the “red flag”, but not to issue a blocking decision because our algorithms account for scenarios where Google deviates from the usual header structure.
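As a rough illustration of this first step, the sketch below flags a request that claims to be Googlebot but whose remaining headers do not match an expected profile. It is a minimal sketch, not Incapsula's actual logic: the set EXPECTED_HEADER_NAMES and the function names are hypothetical, and the article does not say which headers a genuine Googlebot request is expected to carry. Note that a mismatch only raises a flag and never blocks on its own, mirroring the reasoning above.

```python
# Illustrative profile only: the header names below are assumptions,
# not a documented list of what a genuine Googlebot sends.
EXPECTED_HEADER_NAMES = {"host", "user-agent", "accept", "accept-encoding"}


def claims_googlebot(headers):
    """Return True if the user-agent string mentions Googlebot."""
    return "googlebot" in headers.get("user-agent", "").lower()


def header_verdict(raw_headers):
    """Return 'not-applicable', 'consistent' or 'flag' for one request.

    'flag' means "look closer", not "block": the decision is deferred
    to later checks.
    """
    # Header names are case-insensitive, so normalize them first.
    headers = {name.lower(): value for name, value in raw_headers.items()}
    if not claims_googlebot(headers):
        return "not-applicable"
    missing = EXPECTED_HEADER_NAMES - set(headers)
    return "flag" if missing else "consistent"
```

A request carrying only Host and a Googlebot user-agent would come back as "flag", while the same request with every expected header present would come back as "consistent".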
Step 2: IP and ASN verification
Next on our checklist was the IP and ASN verification process. Here we looked at several things, including who owns the IPs and the ASNs producing the now-suspicious traffic.
In this case, neither the IPs nor the ASN belonged to Google. So, by cross-verifying this information with the already dubious headers, the system was able to determine – with a high degree of certainty – that it was dealing with potentially dangerous impostors.
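The cross-verification described here can be sketched as a lookup of the client IP against a list of networks known to belong to the claimed owner, combined with the earlier header result. This is a simplified sketch under stated assumptions: TRUSTED_NETWORKS holds a placeholder range from the reserved documentation address space (not a real Google range), the function names are hypothetical, and a real system would also resolve the ASN, which is omitted here.

```python
import ipaddress

# Placeholder range taken from the reserved documentation address space.
# It stands in for "networks owned by the claimed crawler" and is NOT a
# real Google range.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def ip_is_trusted(ip, networks=TRUSTED_NETWORKS):
    """Return True if the address falls inside any trusted network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def classify_claimed_googlebot(ip, headers_suspicious):
    """Combine IP ownership with the header check for a self-declared Googlebot."""
    if ip_is_trusted(ip):
        return "verified"
    # Wrong owner plus odd headers: high-confidence impostor.
    # Wrong owner alone: still suspect, but with less supporting evidence.
    return "impostor" if headers_suspicious else "suspect"
```

With these placeholder values, classify_claimed_googlebot("203.0.113.7", True) yields "impostor", because the address lies outside the trusted range and the headers were already flagged.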
Step 3: Behavior Monitoring
However, “potentially dangerous” doesn’t always equal “malicious”. For example, we know that some SEO tools will try to pass themselves off as Googlebot to gain a “Google-like” perspective of a site’s content and link profile.
This is why the next thing we look at is the visitors’ behavior, in an attempt to uncover their intent. Clues to such intent often come from the request itself, as it gets profiled by our WAF. In this case, the sheer rate of visits was enough to complete the picture, immediately pointing to a DDoS attack and triggering the automated anti-DDoS defenses.
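Rate-based detection of this kind is often implemented as a sliding-window counter per client. The sketch below shows one way to do that; the class name, the window length and the request threshold are all hypothetical, since the article does not disclose the thresholds actually used.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Sliding-window request counter, one window per client."""

    def __init__(self, window_seconds=1.0, max_requests=20):
        # Both defaults are illustrative assumptions, not published values.
        self.window = window_seconds
        self.max_requests = max_requests
        self._hits = defaultdict(deque)

    def record(self, client, timestamp):
        """Record one request; return True if the client exceeds the limit."""
        hits = self._hits[client]
        hits.append(timestamp)
        # Discard requests that have aged out of the window.
        while hits and timestamp - hits[0] >= self.window:
            hits.popleft()
        return len(hits) > self.max_requests
```

Timestamps are passed in explicitly, so the same class can replay recorded logs as well as monitor live traffic.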
Step 4: IP Reputation and New Low-Level Signature
Although this wasn’t our first encounter with fake Googlebots, this specific signature variant wasn’t part of our existing database. Even as the attack was being mitigated, our system used the collected data to create a new Low-Level Signature which was then added to our 10M+ pool of signatures and propagated across our network to protect all Incapsula clients.
As a result, the next time these bots visit, they will be blocked immediately. Moreover, the reputation of the attacking IPs was also recorded and added to a separate database, where we keep data about all potentially dangerous IPs.
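To make the idea concrete, here is a minimal sketch of how a signature and an IP reputation record could be stored once an attack has been identified. The article does not describe what a Low-Level Signature actually contains, so the fingerprint used here (a hash of the user-agent and the ordered header names) is purely an assumption, as are the class and method names.

```python
import hashlib


def make_signature(user_agent, header_names):
    """Hash the user-agent and ordered header names into one fingerprint.

    The choice of inputs is an assumption made for illustration.
    """
    material = user_agent.strip().lower() + "|" + ",".join(
        name.strip().lower() for name in header_names
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class ThreatStore:
    """Keeps learned signatures and per-IP offense counts separately."""

    def __init__(self):
        self.signatures = set()
        self.ip_reputation = {}  # ip -> number of recorded offenses

    def learn(self, user_agent, header_names, ip):
        """Record a new signature and mark the source IP as an offender."""
        self.signatures.add(make_signature(user_agent, header_names))
        self.ip_reputation[ip] = self.ip_reputation.get(ip, 0) + 1

    def is_known_signature(self, user_agent, header_names):
        return make_signature(user_agent, header_names) in self.signatures

    def is_flagged_ip(self, ip):
        return ip in self.ip_reputation
```

Keeping the two stores separate reflects the distinction drawn above: a signature matches the same bot arriving from a new address, while the reputation record matches a known address arriving with a new disguise.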
More Tricks Up Our Sleeve
As far as Layer 7 DDoS goes, this Googlebot DDoS was relatively simplistic in its nature. The attackers were only using a modified user-agent, leaving us plenty of obvious clues to the bot’s true nature and malicious intent.
However, many malicious bots are not that naïve. For example, recently we reported a DDoS attack by browser-based bots which were using legitimate user-agents, correct header data and even going as far as to mimic human-like behavior.
For such attacks, which are becoming increasingly common, our algorithms are augmented with additional security features that allow them to dig deeper, looking at attributes like JS footprint and cookie/protocol support.
Still, for our purposes, the details of this DDoS event provide a good example of the technology behind our latest report. They also help concretize the concept of “Malicious Impersonators,” which can take many forms, one of which is the above-mentioned DDoSing Googlebot.
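One common way to test cookie support is to hand the client a signed token and check whether it comes back on the next request. The sketch below illustrates that general technique only; it is not a description of Incapsula's implementation, and SECRET_KEY, the client identifier and the function names are all hypothetical.

```python
import hashlib
import hmac

# Placeholder key for the example; a real deployment would load a secret value.
SECRET_KEY = b"demo-secret"


def issue_cookie(client_id):
    """Create a token bound to the client, to be set as a cookie."""
    return hmac.new(SECRET_KEY, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


def supports_cookies(client_id, returned_cookie):
    """Return True only if the client sent back the exact token it was given."""
    if returned_cookie is None:
        return False
    expected = issue_cookie(client_id).encode("utf-8")
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected, returned_cookie.encode("utf-8"))
```

A simple bot that ignores Set-Cookie never returns the token and fails the check, whereas a browser returns it automatically.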