A bot, also known as an Internet bot, is a program that runs automated tasks over the Internet. Internet bots are scripts and programs typically intended to perform simple, repetitive tasks, enabling their users to do things quickly and at scale. For example, search engines like Google, Bing, Yandex or Baidu use crawler bots to periodically collect information from hundreds of millions of domains and index it into their result pages.

Recently I hosted a webinar to explain what bots are, how they are used and what they mean to your online business. Those are the topics I’m about to discuss in this blog post.

You can also find the recording of my webinar in the video below.

Good bot vs. bad bot: what’s the difference?

These days, Internet bots comprise nearly half of the activity on the Internet. But despite their neutral origins, 66 percent of all current bot traffic is actually used for malicious purposes. That means trouble for any company with a significant online presence.

Bots power the Internet and can perform very valuable functions like operating search engines, powering APIs, vulnerability scanning and monitoring websites, but they can also be used to cause serious problems. Bad bots are responsible for most of the malicious activity that we encounter on websites. The best-known example is the distributed denial-of-service (DDoS) attack.

How DDoS compromises your business

A DDoS attack is a malicious attempt to make a server or a network resource unavailable to users. It is achieved by saturating the target service, which results in its temporary suspension or interruption. DDoS attacks are often performed by a botnet: a group of hijacked Internet-connected devices, each infected with malware that is used to control it remotely without the knowledge of the device’s rightful owner. The botnet generates large volumes of traffic against the target.

A successful DDoS attack not only causes a short-term loss of business but can also have severe long-term effects: it can damage your online brand reputation, generate significant overage charges from hosting providers, and in some cases compromise your data and your entire business.

From 2014 to 2015, we saw a 200 percent increase in the number of DDoS attacks. According to Imperva Incapsula’s latest DDoS landscape report, attacks of more than 100 Gbps are now a daily routine, with multiple attacks often surpassing 300 Gbps.

While DDoS attacks are a major concern for online businesses, they’re not the only concern. Bad bots are responsible for a range of activities like referrer spam, comment spam, site scraping, fraud, and vulnerability scanning. These activities can damage your online business in many ways, including site takeover, loss or leak of data, degraded user experience, poor SEO, and damaged analytics.

How bad bots work

Anyone who runs a content or e-commerce site will likely be familiar with site scraping. Scraping bots come to your site and harvest pertinent information (like prices and original content). They can harvest nearly your entire website database and its information, and in doing so erode your competitive advantage.

Comment spam bots can also run rampant on blogs and content websites. These bots are controlled by very large operations that crawl the web looking for sites built on well-known content management systems. They post to the site’s comment section, hoping to bypass simple anti-spam mechanisms. If visitors are lured away from your site via comment spam links, your conversion rates will suffer. And if your system has a vulnerability, these bots will try to exploit it to infect your visitors.

SEO referral spam is an excellent example of bad bots and spam bots at work. Here’s how it works: a company offers individuals free apps to download in exchange for participating in a marketing campaign. In reality, the company runs its bots from each person’s computer. Using this vast army of innocent computers, it then artificially generates visits to various websites, with the offender’s site address placed in the referrer header. Because the visits come from so many legitimate computers, the bots slip undetected past IP blacklisting mechanisms, helping the perpetrator artificially boost SEO ratings and stats for its clients and lure potential customers away.

For site owners, the result can be long-term SEO damage to their websites and sometimes blacklisting and removal from search results. This type of spam also scrambles analytics data.
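On the analytics side, one common mitigation is to filter out hits whose Referer header points at a known spam domain. The sketch below assumes you already have hits parsed into dictionaries and maintain your own blocklist; the domain names and the helper names (`is_spam_referrer`, `clean_hits`) are hypothetical. It only cleans your reports and does nothing to stop the bots themselves.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; in practice you would maintain your own list of
# spam referrer domains.
SPAM_REFERRER_DOMAINS = {"spam-seo.example", "free-traffic.example"}


def referrer_host(referrer):
    """Return the lowercased host part of a Referer value ('' if absent or unparsable)."""
    return urlsplit(referrer).hostname or ""


def is_spam_referrer(referrer, blocklist=SPAM_REFERRER_DOMAINS):
    """True if the referrer host is a blocklisted domain or one of its subdomains."""
    host = referrer_host(referrer)
    return any(host == domain or host.endswith("." + domain) for domain in blocklist)


def clean_hits(hits):
    """Drop analytics hits whose Referer points at a blocklisted domain."""
    return [hit for hit in hits if not is_spam_referrer(hit.get("referrer", ""))]


hits = [
    {"path": "/", "referrer": "https://www.google.com/"},
    {"path": "/", "referrer": "http://www.spam-seo.example/offer"},
    {"path": "/about"},  # direct visit, no referrer
]
cleaned = clean_hits(hits)
```

Matching on the exact domain or a dot-prefixed suffix means `www.spam-seo.example` is caught while an unrelated domain that merely ends in the same letters (such as `notspam-seo.example`) is not.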

Understanding Internet bot behavior

The past few years have seen a steep increase in the sophistication of the developers who write the code behind bots. Once fueled by simple scripts (written in languages like Python or Perl), bot programs these days are able to mimic human behavior to pass themselves off as legitimate visitors.

Over 50 percent of all DDoS bots we’ve seen recently support JavaScript or cookies, characteristics we didn’t see previously. Lately there’s also been a large increase in browser-based bots: malware that spawns instances of legitimate browsers on the victim’s computer and behaves as if the user were visiting the website.

We’ve also seen very large application layer attacks in the range of hundreds of thousands of requests per second, sometimes supporting cookies or JavaScript, and originating from anywhere between tens and hundreds of thousands of different IP addresses. This marks a large step up in bot activity, and we’ll definitely see larger attacks in 2016.

Another tactic used by some bot operators is impersonating other bots.

Because blocking Googlebot would of course damage SEO, many site owners naively whitelist certain user agents entirely, so that anything claiming to be “Googlebot” is never blocked. This is not a safe long-term solution. According to our research, Googlebot visits a website nearly 187 times per day on average, yet on average at least 4 percent of the visits that declare themselves to be Googlebot are in fact fake impersonators.
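Rather than trusting the User-Agent string, you can use the two-step DNS check that Google documents for verifying its crawler: reverse-resolve the client IP, confirm the resulting name ends in googlebot.com or google.com, then forward-resolve that name and confirm it maps back to the same IP. The sketch below is a minimal Python version of that check; the function name is hypothetical, and the lookups are passed in as parameters so they can be faked in tests. `socket.gethostbyname_ex` returns only IPv4 addresses, so IPv6 clients would need `socket.getaddrinfo` instead.

```python
import socket

# Reverse-DNS suffixes Google documents for its Googlebot crawler.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=socket.gethostbyaddr,
                          forward_lookup=socket.gethostbyname_ex):
    """Return True only if `ip` passes a reverse-then-forward DNS check."""
    # Step 1: reverse DNS (PTR) lookup of the client IP.
    try:
        hostname, _aliases, _addresses = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.lower().rstrip(".")
    # Step 2: the PTR name must fall inside Google's crawler domains.
    if not hostname.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    # Step 3: forward-resolve that name; it must map back to the same IP.
    # Without this step, anyone who controls their own PTR record could
    # simply point it at a googlebot.com name.
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    return ip in addresses
```

Since DNS lookups are slow, a real deployment would run this once per IP and cache the verdict rather than checking on every request.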

Tackling bots

At a very high level, there are three primary ways to mitigate bots:

The static approach: the fastest way to identify and mitigate bots is to use a static analysis tool. By looking at the structure of web requests and their header information, and correlating that with what a bot claims to be, we can passively determine its true identity.
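As a rough illustration of the static approach, the sketch below checks whether a request that claims to be a mainstream browser in its User-Agent also carries headers such browsers normally send when navigating to a page. The expected header set, the browser tokens and the function names are illustrative assumptions, not any product’s actual fingerprint rules; a real static analyzer weighs many more signals, such as header order.

```python
# Illustrative assumption: headers that mainstream desktop browsers normally
# send when navigating to a page. Not an exhaustive or authoritative list.
EXPECTED_BROWSER_HEADERS = {"accept", "accept-encoding", "accept-language"}

# Substrings suggesting that a User-Agent claims to be a mainstream browser.
BROWSER_UA_TOKENS = ("chrome/", "firefox/", "safari/", "msie ", "trident/")


def claims_browser(user_agent):
    """True if the User-Agent string claims to be a mainstream browser."""
    ua = user_agent.lower()
    return any(token in ua for token in BROWSER_UA_TOKENS)


def static_mismatches(headers):
    """List expected browser headers missing from a request claiming to be a browser.

    An empty list means either no inconsistency was found or the client never
    claimed to be a browser in the first place (e.g. an honest command-line tool).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    if not claims_browser(lowered.get("user-agent", "")):
        return []
    return sorted(EXPECTED_BROWSER_HEADERS - set(lowered))


CHROME_UA = ("Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36")
```

A script that copies a Chrome User-Agent but sends only `Accept: */*` is flagged, while a real browser request and an honest `curl` request both pass; the check is passive because it only inspects what the client already sent.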

The challenge-based approach: a more progressive way of addressing bad bots is a challenge-based (or support-based) approach. Websites are equipped with proactive components that measure how a visitor behaves with different technologies, for instance whether it supports cookies and which cookies it supports. The approach also takes a close look at JavaScript: what kind of JavaScript the visitor can run, and which objects are accessible in that JavaScript. We can also use scrambled imagery like CAPTCHA, which usually requires a more dedicated attack to bypass.
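The cookie part of a challenge can be sketched as follows: on the first response the server sets a signed token, and a client counts as passing only if it sends that token back unmodified. This is a minimal sketch under stated assumptions; the secret, the one-hour lifetime, binding the token to the client IP, and all function names are hypothetical choices. On its own it is a weak test, since, as noted above, many bots now support cookies, which is why it is paired with JavaScript checks and CAPTCHA.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; use a long random value and keep it private.
SECRET_KEY = b"replace-with-a-long-random-secret"
MAX_AGE_SECONDS = 3600  # hypothetical token lifetime


def _sign(client_ip, issued):
    """HMAC-SHA256 over the client IP and issue time."""
    message = f"{client_ip}|{issued}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def issue_challenge_cookie(client_ip, now=None):
    """Create the token to send in a Set-Cookie header on the first response."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(client_ip, issued)}"


def passes_cookie_challenge(client_ip, cookie_value, now=None):
    """True if the client sent back a valid, unexpired token bound to its IP."""
    if not cookie_value:
        return False  # the client never returned the cookie
    try:
        issued_text, signature = cookie_value.split(".", 1)
        issued = int(issued_text)
    except ValueError:
        return False  # malformed token
    current = int(time.time() if now is None else now)
    if not 0 <= current - issued <= MAX_AGE_SECONDS:
        return False  # expired, or issued "in the future"
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(_sign(client_ip, issued), signature)
```

Binding the token to the IP stops one solved challenge from being shared across a botnet, at the cost of re-challenging legitimate users whose IP changes mid-session.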

The behavioral approach: beyond the static and challenge-based approaches is a behavioral approach to bot mitigation. Here we look at the activity associated with a particular bot and compare it with what the bot claims to be. Most bots will link themselves to a parent program like JavaScript, MSIE, or Chrome. If the bot’s characteristics vary in any way from those of the parent program, that anomaly exposes it, which helps mitigate both the current problem and future ones.
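One simple behavioral signal is request rate: a client claiming to be Chrome that fetches dozens of pages per second is not behaving like a person using Chrome. The sketch below keeps a sliding window of request timestamps per client and flags clients that exceed a threshold. The class name and the default thresholds are hypothetical and would need tuning against real traffic; timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class RateProfiler:
    """Flags clients whose request rate exceeds what a human plausibly produces."""

    def __init__(self, max_requests=30, window_seconds=10.0):
        # Hypothetical thresholds; tune them against your own traffic.
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)  # client id -> recent request timestamps

    def record(self, client_id, timestamp):
        """Record one request; return True if the client now looks anomalous."""
        times = self.history[client_id]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window:
            times.popleft()
        return len(times) > self.max_requests
```

For example, with `RateProfiler(max_requests=5, window_seconds=1.0)`, a client sending a request every 10 ms is flagged from its sixth request onward, while a client sending one request every two seconds is never flagged.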

The most effective way to identify and mitigate bots is to use a specialized tool that combines all three approaches. At Imperva Incapsula we analyze each visitor (every bot or human visit to your website) and match it to a “client application ID” based on the combined characteristics gathered by the static, challenge-based and behavioral checks.
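To show why combining approaches helps, here is a deliberately simple sketch that turns three independent signals into a verdict. It is not how Incapsula derives its client application ID; the `Signals` fields and the policy (block on two or more signals, escalate to a challenge such as CAPTCHA on one) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    static_mismatch: bool     # header/claim inconsistency (static approach)
    failed_challenge: bool    # e.g. did not return the challenge cookie
    behavioral_anomaly: bool  # e.g. request rate beyond a human profile


def classify(signals):
    """Combine the three signals into a coarse verdict (illustrative policy)."""
    hits = sum([signals.static_mismatch,
                signals.failed_challenge,
                signals.behavioral_anomaly])
    if hits >= 2:
        return "block"
    if hits == 1:
        return "challenge"  # escalate, e.g. to a CAPTCHA
    return "allow"
```

Requiring agreement between independent checks before blocking reduces false positives: a legitimate visitor who trips one heuristic gets a challenge instead of a hard block.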

What about robots.txt?

You can try using robots.txt to shield your site from bad bots, but it won’t really be effective. robots.txt works by telling bots which parts of a site they are not welcome to crawl. Bad bots don’t adhere to such rules, however, and will simply ignore them. In addition, some bad bots read robots.txt looking for hidden gems (e.g., private folders, admin pages) that the site owner is trying to keep out of Google’s index.
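Python’s standard `urllib.robotparser` module illustrates both points: a well-behaved crawler has to ask the parser whether it may fetch a URL, nothing forces a bad bot to ask, and the Disallow lines themselves advertise the paths the owner wanted hidden. The robots.txt content, the `GoodBot` user agent and the example.com URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /private-reports/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler checks before fetching, and honors the answer voluntarily.
polite_may_fetch = parser.can_fetch("GoodBot", "https://example.com/admin/login")

# A bad bot can simply skip that check, and the file itself lists the paths
# the site owner wanted to keep out of search indexes.
advertised_paths = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]
```

Because compliance is voluntary, sensitive paths need real access control (authentication, IP restrictions) rather than a Disallow entry.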

In conclusion

Bad bots are responsible for a very large number of serious security threats to your website. You can harden your site’s security by analyzing traffic for bots, identifying malicious clients, and blocking them, preferably in a transparent manner that doesn’t affect your legitimate visitors.

One way to do this is with web application firewalls (WAFs) or application delivery controllers (ADCs).

