blog
HOME · CREATIVE · WEB · TECH · BLOG

Monday, May 12th, 2008

Good spiders that execute Javascript

The general wisdom is that spiders don't execute Javascript, yet a few that do are popping up here and there. A YOUmoz article demonstrated that SearchMe's spider was executing Javascript-based Google Analytics tags.

Another spider that executes Javascript is the one Alexa uses to create site thumbnails. I believe this is the ia_archiver bot, whose crawls also feed archive.org (Alexa is owned by Amazon; the Internet Archive is a separate nonprofit). If the spider encounters Javascript on the page that contains something like

window.location = "http://www.google.com/";

it will take the thumbnail of the page that window.location points to, not the page it was initially sent to crawl. This is true even if the window.location assignment is inside an if statement and the default ("OK") action is to not go to the other URL.
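To make the case concrete, here is a minimal sketch of the kind of page script described above. The confirm() prompt, its wording, and the function name maybeRedirect are assumptions for illustration, not the actual script from the page in question. The window object is passed in as a parameter so the snippet can also be exercised outside a browser.

```javascript
// Hypothetical page script: the redirect sits inside an if statement,
// and choosing "OK" in the dialog keeps the visitor on the current page.
function maybeRedirect(win) {
  // confirm() returns true for "OK" and false for "Cancel".
  var stay = win.confirm("Stay on this page?");
  if (!stay) {
    // Only reached when the visitor does not choose "OK".
    win.location = "http://www.google.com/";
  }
  return stay;
}

// In a browser this would be called as: maybeRedirect(window);
```

A human visitor who clicks "OK" never leaves the page, yet the thumbnail would apparently still show the redirect target.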

It is a little unclear whether Alexa's spider is actually executing Javascript or merely looking for the presence of window.location, though I'd guess it executes the Javascript because otherwise it couldn't faithfully render certain pages. In other words, Alexa's thumbnail spider needs to execute Javascript to do its job properly.
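One way to tell the two behaviours apart would be a test page whose script never contains the literal text window.location. A spider that only scans for that string would miss the redirect, while one that really executes the script would still follow it. What follows is an untested sketch of such an experiment; buildRedirect is a hypothetical name, and the window object is passed in so the function can be tried outside a browser.

```javascript
// Hypothetical test script: performs the redirect without spelling out
// the usual property access anywhere in the source.
function buildRedirect(win, url) {
  var prop = "loc" + "ation"; // property name assembled at run time
  win[prop] = url;            // same effect as a direct assignment
  return prop;
}

// In a browser this would be called as:
// buildRedirect(window, "http://www.google.com/");
```

If the thumbnail for such a page still showed the redirect target, that would point to real script execution rather than string matching.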

Just because SearchMe's and Alexa's spiders execute Javascript doesn't make them "bad bots". We know SearchMe's bot declares itself via its user agent string, and while I haven't dug through my log files to confirm it, I would be shocked if Alexa's bot didn't do the same. I'd also bet that both follow robots.txt.
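Checking the logs for a declared user agent could look something like the sketch below. It assumes a combined-style access log where the user agent is the last quoted field on each line, and it assumes the token ia_archiver appears in the Alexa bot's user agent string; neither has been verified here against real log files. The function names and the sample lines are made up for illustration.

```javascript
// Pull the user agent out of one line of a combined-format access log.
// Assumes the user agent is the final double-quoted field on the line.
function extractUserAgent(logLine) {
  var match = /"([^"]*)"\s*$/.exec(logLine);
  return match ? match[1] : null;
}

// Count how many log lines come from a user agent containing the token.
function countHitsByToken(logLines, token) {
  var needle = token.toLowerCase();
  return logLines.filter(function (line) {
    var ua = extractUserAgent(line);
    return ua !== null && ua.toLowerCase().indexOf(needle) !== -1;
  }).length;
}

// Made-up sample lines, for illustration only.
var sampleLines = [
  '192.0.2.1 - - [12/May/2008:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "ia_archiver"',
  '192.0.2.2 - - [12/May/2008:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"'
];
console.log(countHitsByToken(sampleLines, "ia_archiver"));
```

A non-zero count would suggest the bot is identifying itself; a count of zero would only mean this particular token never showed up in the lines scanned.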

That means these are essentially "good spiders", not "bad bots", since they're not trying to cause problems. However, as the YOUmoz article points out, SearchMe's spider can inadvertently cause problems by executing the Google Analytics tracking code.

Categories: Server Admin
