The robots on the Web are not walking, talking metal boxes like the futuristic housekeeper from The Jetsons. Rather, they are programs written to roam the web automatically. They are also referred to as spiders, crawlers, or simply bots, and without them, the Web would die!
Okay, that may be a tad melodramatic. The truth is, robots enhance our web-browsing experience tremendously; but, like all powerful forces, they can also be used for evil. Today we will look at the important role robots play on the Internet, how they can boost your site's search rankings, the ways they can be used against you, and what you can do about it.
What is a robot?
For our purposes, a robot is an Internet program that automatically retrieves information from a website. The process works a little something like this: A robot finds a web page and scans it. If that page contains a link to another page, the robot will follow the link to that page and scan everything on it, as well. Through this process, a robot will gather information about an entire website and deposit it in a database.
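That scan-follow-repeat loop can be sketched in a few lines of Python. This is a toy crawler, not production code; the `fetch` callable and the `max_pages` cap are assumptions added to keep the sketch self-contained and testable without real network access:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect href values from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl: scan a page, queue its links, repeat.

    `fetch` is any callable that returns the HTML for a URL, so the
    sketch works with canned pages as easily as a real HTTP client.
    """
    seen, queue, index = set(), deque([start_url]), {}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        index[url] = html  # "deposit it in a database"
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            queue.append(urljoin(url, link))
    return index
```

Feed it one starting page and it will discover every page reachable by links, which is exactly how a real spider maps out an entire website.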
Essentially, bots are the magic behind how search engines like Google or Bing can spit out 50 million results in the blink of an eye. When you type keywords into a search engine, it searches through the index the robots have built and serves up what it determines to be the most relevant matches.
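That keyword lookup is fast because the engine pre-builds an inverted index, a map from each word to the pages that contain it. Here is a minimal sketch (the whitespace tokenizer is a deliberate simplification; real engines do far more):

```python
def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = {}
    for url, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(url)
    return index

def search(index, keyword):
    """Return every page that mentions the keyword."""
    return index.get(keyword.lower(), set())
```

Because the index is built ahead of time by the robots, answering a query is just a dictionary lookup rather than a scan of the whole Web.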
This function, called indexing, is only one example of what robots can do. They can also be used for other purposes, such as link validation or looking for updated blog content via RSS feeds.
So Robots are Good, Right?
Sure. Think about trying to effectively use the Internet without them. It would be nearly impossible to find what you were looking for without a direct link. Robots are the backbone of search engines and without them we would not have the world at our fingertips quite so easily.
Robots help get your website out there so other people (read: potential customers) can find it. If you want a robot to scan and index your website, you can submit your URL to Google, Bing, or Yahoo to let the robots know it should be scanned and dropped into the database for your search engine of choice.
Can We Control These Robots?
The good news is that you can “tell” a robot what should or shouldn’t be scanned and indexed. For instance, if you have a private link on your website that you don’t want published, you can tell the robots you don’t want them to index it.
Every website can have a file called robots.txt. This is a simple text document that gives instructions to robots, and most robots will abide by its rules. If you don't want any robots indexing your site, you would use these two lines in your robots.txt file:

User-agent: *
Disallow: /
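You can also block just one section of your site rather than the whole thing. For example, to keep well-behaved robots out of a single directory while leaving the rest crawlable (the /private/ path here is a made-up example):

```
User-agent: *
Disallow: /private/
```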
You can also use a robots meta tag in a particular HTML document if you don't (or can't) use a robots.txt file:
<meta name="robots" content="noindex,nofollow"/>
(To brush up on more robot-ese, see the robots exclusion standard, which documents the full set of directives.)
What’s the Downside?
Robots themselves aren’t inherently bad, but robots are programs written by people and we all know there are bad people in the world. A person with ill intent can write a malicious little robot to do very bad things.
Bad robots may ignore the rules in your robots.txt file. The robots.txt file is really only a suggestion, not a hard barrier, so if a robot wants to circumvent it, it can.
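In other words, obeying robots.txt is strictly opt-in. Python even ships a standard-library module, urllib.robotparser, that polite crawlers use to check the rules before fetching a page; a bad bot simply never makes this check. A quick sketch:

```python
from urllib.robotparser import RobotFileParser

# Parse a rules file that blocks all robots from /private/.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])

# A polite bot asks permission before fetching each URL.
print(rp.can_fetch("MyBot", "http://example.com/private/page"))
print(rp.can_fetch("MyBot", "http://example.com/index.html"))
```

The first check comes back False and the second True, but nothing on the server actually stops a crawler that ignores the answer.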
Bad robots can also be programmed to scan your site looking for ways to exploit it. We can thank robots for two problematic issues that have made many website owners pull their hair and gnash their teeth:
E-mail spam: When your website contains e-mail addresses, robots crawl your site specifically to harvest those addresses so spammers can flood them with junk mail. This is why I don't recommend putting your main e-mail address in plain text anywhere on your site, even in the HTML code.
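One common workaround, though not a foolproof one since smarter harvesters can decode it, is to publish the address as HTML character entities so that naive text-matching bots never see a plain "@" to grab:

```python
def obfuscate_email(address):
    # Encode every character as a numeric HTML entity. Browsers
    # render the result as a normal address, but a bot grepping
    # the raw HTML for "@" finds nothing to harvest.
    return "".join(f"&#{ord(ch)};" for ch in address)

print(obfuscate_email("me@example.com"))
```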
Comment spam: If your site has comment forms, robots will find them and post spammy links back to disreputable sites. Any blogger with a comment form has to fight this endless war, and it can be a tremendous source of frustration. Even if you take the extra step of approving all comments and posting only the legitimate ones, you still have to deal with the barrage of e-mail notifications alerting you to yet another spammy comment.
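One lightweight defense is a honeypot field (my own example here, not something built into comment forms): an extra input hidden from humans with CSS. Real visitors leave it blank, while form-stuffing bots tend to fill in every field they find. The server-side check is a one-liner; the "website_hp" field name is hypothetical:

```python
def looks_like_spambot(form_data):
    # "website_hp" is a hypothetical honeypot field, hidden from
    # humans via CSS. Any non-empty value flags a likely bot.
    return bool(form_data.get("website_hp", "").strip())

print(looks_like_spambot({"comment": "Nice post!"}))
print(looks_like_spambot({"comment": "buy now", "website_hp": "http://spam"}))
```

It won't stop a targeted attacker, but it quietly filters out a surprising amount of automated junk.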
These bad seeds are commonly called spambots (a mashup of the words "spam" and "robots") and represent the dark side of robots that unfortunately comes with the territory. Like anything else in life, you can't have the good without the bad.
Robots can be even more malicious if they take the form of malware, and are used to capture keystrokes, gather passwords, and launch attacks whenever they see a vulnerability. It is important to have security measures in place to prevent these little demons as much as possible.
What Can I Do about Robots? Anything?
There are a few things you can do to ensure robots are working for you, not against you. One is to make sure your robots.txt file or meta tags are not allowing or blocking anything unintentionally. Your web developer or website host can review this for you.
It’s a pretty safe bet your web host will have security measures in place at the server level to protect against malicious bots, but it’s also a good idea to always safeguard at the individual level by using security software on your computer, practicing smart browsing habits, and installing security updates for your applications when released.
Another good strategy for dealing with robots on the Internet is simply not to worry about them! Once you've taken advantage of the good ones and safeguarded yourself against the bad, sit back and relax. Bots are a natural part of the information-superhighway scenery; they are here to stay and are generally quite beneficial.
Whether or not you believe robots will eventually rule the world, they are already part of our lives. We need look no further than our open browser window to see the proof.