Today Incapsula released its annual bot traffic report for 2014. In it, we point out a number of trends about bots—the Internet’s automated inhabitants—including:
- Bot traffic directed to smaller websites has surpassed 80%
- The slow demise of RSS is driving down bot traffic
- Impersonator bots are on the rise
- Bad bots threaten small and big websites alike
Based on this research, it appears that bot miscreants pose a ubiquitous threat. No matter what size or type of web presence you oversee, it’s going to be visited by bots—frequently. In this post we’ll show you how the Incapsula solution (including our free plan) blocks bad bots.
Hands-off Client Classification
Dealing with bad bots using Incapsula couldn’t be simpler. We’ve taken great pains to develop a completely automated system capable of identifying, classifying, and blocking malicious bots with no manual intervention. That is not to say we’ve implemented an iron-fisted, one-size-fits-all approach to dealing with bots. Quite the contrary: Incapsula is designed to be a no-touch, low-false-positive solution, and the key is our client classification engine.
Conceptually, the Incapsula client classification system may be thought of as concentric rings, or sequential layers of analysis. It determines whether a website visitor is human or not, and what its intention is.
Here’s a more detailed look at the process Incapsula uses to identify and classify bots for you:
Step 1: Looking at Header Data
By inspecting HTTP headers, Incapsula gains valuable insight into visitors, including various clues as to whether each is human or automated, and whether or not it is malicious. It’s important to note that headers can be faked, so they should never be the sole criterion for making a blocking decision. Instead, this information should be combined with additional criteria to make a more informed determination.
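To make the idea concrete, here is a minimal Python sketch of header inspection. It is not Incapsula's actual logic: the function name `header_suspicion_score`, the specific clues, and the weights are all assumptions chosen for illustration. It returns a score rather than a verdict, since headers alone should not decide whether to block.

```python
def header_suspicion_score(headers: dict[str, str]) -> int:
    """Return a rough suspicion score; higher means more bot-like.

    The clues and weights below are illustrative assumptions only.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    score = 0

    user_agent = h.get("user-agent", "").lower()
    if not user_agent:
        score += 2  # a missing User-Agent is unusual for a browser
    elif any(tool in user_agent for tool in ("curl", "wget", "python-requests")):
        score += 2  # self-declared scripting tools

    # Browsers normally send these headers; many simple scripts do not.
    if "accept" not in h:
        score += 1
    if "accept-language" not in h:
        score += 1
    return score


browser = {"User-Agent": "Mozilla/5.0", "Accept": "text/html", "Accept-Language": "en-US"}
script = {"User-Agent": "python-requests/2.31"}
print(header_suspicion_score(browser), header_suspicion_score(script))
```

A score like this is only one input. A careful bot can copy a browser's headers exactly, which is why the later steps matter.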
Step 2: IP and ASN Verification
The IP and ASN verification process is next on our checklist. Here we look at the identity of the IP and ASN owners and whether they match the identity the visitor claims. This can be used to identify malicious bots posing as legitimate ones.
For example, if a bot claims to be from a search engine like Google, but neither the IPs nor the ASN used match with that company, it’s a telltale sign that it’s likely a dangerous impostor.
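The check described above can be sketched in a few lines of Python using the standard `ipaddress` module. This is a simplified illustration, not our production code: the `PUBLISHED_RANGES` table, the bot name `examplebot`, and the function `is_impostor` are hypothetical, and the address ranges are reserved documentation networks rather than any real crawler's addresses. An ASN comparison would follow the same pattern with an ASN lookup in place of the range table.

```python
import ipaddress

# Hypothetical table of address ranges a crawler's operator is known to use.
# These are reserved documentation networks, not real search engine ranges.
PUBLISHED_RANGES = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}


def is_impostor(claimed_bot: str, client_ip: str) -> bool:
    """True if the client claims a known bot identity from an IP outside its ranges."""
    ranges = PUBLISHED_RANGES.get(claimed_bot.lower())
    if ranges is None:
        return False  # unknown identity: nothing to verify against
    address = ipaddress.ip_address(client_ip)
    return not any(address in network for network in ranges)


print(is_impostor("ExampleBot", "192.0.2.10"))    # inside the known range
print(is_impostor("ExampleBot", "198.51.100.7"))  # outside the known range
```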
Step 3: Behavior Monitoring
Additional useful information can be garnered from visitors’ behavior and their requests. During analysis by our web application firewall, suspicious and malicious requests are flagged or blocked, and this information is fed into our classification engine. Incapsula also tracks signals that indicate automation, such as the order or rate of requests, irregular browsing patterns, and abnormal interaction between clients and servers.
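Request rate is the easiest of these signals to illustrate. The sketch below keeps a sliding window of request timestamps per client and flags any client that exceeds a limit. The class name `RateMonitor` and the thresholds are assumptions for the example; a real system would weigh rate alongside the other behavioral signals.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag clients that send more than max_requests within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent request timestamps

    def record(self, client_id: str, timestamp: float) -> bool:
        """Record one request; return True if the client is now over the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests


monitor = RateMonitor(max_requests=3, window_seconds=1.0)
flags = [monitor.record("client-a", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)
```

Timestamps are passed in explicitly rather than read from the clock, which keeps the example deterministic and easy to test.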
Step 4: IP Reputation
IP reputation is another powerful tool that Incapsula uses to quickly filter out bad bots. Due to our global network footprint and the number of customers we protect, Incapsula is uniquely positioned to perform large-scale analysis of automated clients. Once we’ve identified a bad bot, a signature is created for it, and all traffic across our network is then screened using that signature. This type of crowdsourcing enables disparate websites across the entire Incapsula community to actively participate in their own security, thereby benefiting the whole.
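The network effect can be pictured with a toy shared signature store. What actually goes into a signature is not described here, so this sketch makes a loud assumption: it hashes the user agent together with the order of header names. The names `SignatureStore`, `report_bad`, and `is_known_bad` are hypothetical.

```python
import hashlib


class SignatureStore:
    """Toy shared store: a bot reported by one site is recognized on every site."""

    def __init__(self):
        self._bad_signatures = set()

    @staticmethod
    def signature(user_agent: str, header_names: list[str]) -> str:
        # Assumption: a signature is a hash of the user agent plus header order.
        material = user_agent + "\n" + ",".join(name.lower() for name in header_names)
        return hashlib.sha256(material.encode("utf-8")).hexdigest()

    def report_bad(self, user_agent: str, header_names: list[str]) -> None:
        self._bad_signatures.add(self.signature(user_agent, header_names))

    def is_known_bad(self, user_agent: str, header_names: list[str]) -> bool:
        return self.signature(user_agent, header_names) in self._bad_signatures


store = SignatureStore()
# Site A identifies a bad bot and reports it.
store.report_bad("EvilScraper/1.0", ["Host", "User-Agent"])
# Site B later sees the same client and can screen it immediately.
print(store.is_known_bad("EvilScraper/1.0", ["Host", "User-Agent"]))
```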
Step 5: Client Technology Fingerprinting
Not all malicious bots are naïve. For example, last year Incapsula reported a DDoS attack initiated by browser-based bots using legitimate user-agents and correct header data. They even went so far as to mimic human-like behavior.
Assessing the Automated Threat You Face
As stated earlier, by default Incapsula protects users from bad bots. However, if you want to “pop the hood” on our client classification engine and get über familiar with bot traffic on your website, here’s how to become a bot killing pro.
The first step to using Incapsula to mitigate bad bots visiting your site is to understand your traffic. How much of it is automated? How much is malicious? Where is it coming from?
To answer these questions, we suggest starting at the traffic dashboard in Incapsula’s user interface (Figure 1). Just below the fold you’ll find aggregated visitor statistics comparing humans with automated clients (bots) that have accessed your website. Not all reported bots are malicious; good ones include Google, Pingdom, Bing and others.
According to our newly-released 2014 bot report, the average ratio of humans to bots is 44% versus 56%, respectively. You’ll likely see similar numbers reflected in your own analytics.
Taking a Deep Dive Into Your Bot Traffic
To learn which of the reported bots are suspicious or malicious, click the Security tab (Figure 2). Here you’ll be able to see how many bad/suspicious bots Incapsula has blocked.
Scrolling down further, you’ll see the top bot agents that visit your website.
Now that you have an overview of your bot visitors, let’s examine the details and then take action to protect your website. Within the Threats breakdown table, click View Incidents on the far right of the Bot Access Control row. Here you’re able to access per-incident event logs, and then single out specific actions taken by the bad bots interacting with your site.
This next screen (Figure 5) displays your website security logs. It already has an active filter that shows only bot-related events.
To focus on a specific bot type, select it from the Visitor Type choices (Figure 6 and Figure 5, highlighted). For example, selecting the Comment Spam Bot filter displays only incident results for this bot type.
Once you identify a bot you want to eradicate, options within the Actions dropdown menu (Figure 7, highlighted) let you Blacklist (its) IP, or block it by selecting Add to Bad Bots.
Dealing With Unwanted Visitors
To deal with bots more generally, using site-wide rules, you’ll want to use Incapsula’s bot access control feature. This is located within the Settings menu (Figure 1). Navigate to Settings > Security > Bot Access Control.
Your account should reflect the default settings shown in Figure 9. This configuration automatically blocks bad bots, while maintaining access for good ones. You have the option to adjust either setting, of course. To restrict the list of good bots which can access your site, click the Good Bots link (upper-right, Figure 8) and deselect them (Figure 9).
You can also identify specific bots as being bad, thereby disallowing access to your website. To do this, click the Also block link (upper-right, Figure 8) and type in a bot you would like Incapsula to blacklist (Figure 10). If it’s in our bot database, then you can simply select it and we’ll make sure it never bothers you again.
If you want to tighten the screws on potentially-automated patrons, you can engage CAPTCHAs for suspected bots. Using this mode, if a visitor appears to be a bot but cannot be positively identified as such using Incapsula’s client classification engine, we challenge it with a human capacity test like the one in Figure 11.
Check the Require all other Suspected Bots to pass a CAPTCHA test box (Figure 12) to enable this feature.
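Conceptually, the settings described above map a classification verdict to an action. The following sketch shows that mapping under stated assumptions: the `Verdict` values, the function `choose_action`, and its parameter names are illustrative and do not correspond to an Incapsula API. The defaults mirror the behavior described in this post: bad bots are blocked, and the CAPTCHA challenge is off until you enable it.

```python
from enum import Enum


class Verdict(Enum):
    HUMAN = "human"
    GOOD_BOT = "good_bot"
    BAD_BOT = "bad_bot"
    SUSPECTED_BOT = "suspected_bot"  # looks automated, but not positively identified


def choose_action(verdict: Verdict,
                  block_bad_bots: bool = True,
                  captcha_suspected: bool = False) -> str:
    """Return 'block', 'captcha', or 'allow' for a classified visitor."""
    if verdict is Verdict.BAD_BOT and block_bad_bots:
        return "block"
    if verdict is Verdict.SUSPECTED_BOT and captcha_suspected:
        return "captcha"
    return "allow"


print(choose_action(Verdict.SUSPECTED_BOT))
print(choose_action(Verdict.SUSPECTED_BOT, captcha_suspected=True))
```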
If you want to ensure a bot does have access to your website, you can create a whitelist rule for it by clicking the Add exception link (lower-right, Figure 12). Here you’re able to create a custom rule based on a part of your website (URL), Client app ID, IP address, Country, or User agent (Figure 13).
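An exception rule of this kind can be modeled as a set of optional criteria. The sketch below is an illustration only: the `Request` and `ExceptionRule` types and their field names are hypothetical, and it assumes that when several criteria are set, all of them must match. A rule with no criteria matches nothing, so an empty rule cannot whitelist all traffic by accident.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Request:
    url_path: str
    client_app_id: str
    ip: str
    country: str
    user_agent: str


@dataclass(frozen=True)
class ExceptionRule:
    # Each criterion is optional; None means "do not check this field".
    url_prefix: Optional[str] = None
    client_app_id: Optional[str] = None
    ip: Optional[str] = None
    country: Optional[str] = None
    user_agent_contains: Optional[str] = None

    def matches(self, request: Request) -> bool:
        criteria = [
            (self.url_prefix, lambda v: request.url_path.startswith(v)),
            (self.client_app_id, lambda v: request.client_app_id == v),
            (self.ip, lambda v: request.ip == v),
            (self.country, lambda v: request.country == v),
            (self.user_agent_contains, lambda v: v.lower() in request.user_agent.lower()),
        ]
        active = [(value, test) for value, test in criteria if value is not None]
        # Assumption: every configured criterion must match (logical AND).
        return bool(active) and all(test(value) for value, test in active)


rule = ExceptionRule(url_prefix="/api/", user_agent_contains="MonitorBot")
request = Request("/api/status", "app-1", "192.0.2.10", "US", "MonitorBot/2.0")
print(rule.matches(request))
```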
Between Incapsula’s out-of-the-box bot mitigation and these additional tools, you’re fully empowered to weed out pesky robotic troublemakers from your web traffic. All of these features are included in every Incapsula plan, including the free one, to help you better protect your website(s).
To learn more about this year’s trends in Internet bot activity, check out the Incapsula 2014 Bot Traffic Report.