Friday, June 6th, 2008

Google Images, Google Analytics & Frames

Over the past week or so I've had a big change in my thinking regarding Google Analytics. I still think it's a wonderful service (you can't beat the price - for now...), but I'm starting to see some incredibly big flaws...

It all started with discrepancies between Google Analytics and Unica's NetInsight - big discrepancies for one particular site where NetInsight was showing about 60% more visits and visitors. You always expect some variation between analytics packages - but not 60%.

We first noticed that the pages with the biggest discrepancies were those that were most likely to get Google Images traffic. Then I noticed that many of the referral sources were roughly similar, but Google Images was way out of whack.

Google Analytics Doesn't Report Pages Framed By Another Site

I had a theory - that Google Analytics wasn't counting pageviews if the page was framed by another site. So I did a test. I set up a Google Analytics profile and made the tags only fire if they were in a frameset. The code looked something like this:

// Fire this tracker only when the page is displayed inside another frame.
if (window.top !== window.self) {
    var framesTracker = _gat._getTracker("UA-287XXX-4");
    framesTracker._initData();
    framesTracker._trackPageview();
}

The first day I thought my theory had been proven wrong - traffic was showing up in the frames profile, but then it quickly dropped off to zero.

Google Analytics graphic showing framed content showing up the first day a profile is active, then nothing

In other words, the first day the profile existed framed pages were counted as page hits. The next day a filter kicked in and, with a few rare exceptions, framed pages never showed up again.

It's not just Google Images traffic that's not being reported, it's any site that frames pages on your site - so things like Yahoo! Image Search and MSN/Live Image Search are affected as well.

Why they would do this, I just don't understand. It's truly baffling. If your web server serves the page, then it should appear in the analytics for your site. Period. This means traffic from things like image and video search is grossly underreported in Google Analytics.

But it gets worse...

Google Reports Some Traffic From Google Images As 'Direct' Traffic

Let's take the example of Google Images (though it happens in other cases). You do a query and get a search results page with a bunch of thumbnails on it (example).

Google Images Search Results

You then click on an image and get a page that's displayed using a frameset (example).

Google Images Image page

In the top frame is Google's information on the image. In the bottom frame is the page from the site with the image.

Now, if you click on any link in the bottom frame (that's being served by the site with the image), it does not remove the frameset. The frameset stays in place. Technically, I understand where this is coming from, but it's horrible from a user interface perspective. But remember what we established above - no pages that are framed by another site get reported in Google Analytics. So the person could spend a half hour looking at your site this way and you would never know it. They could purchase something from you and it wouldn't register as an e-commerce transaction in Google Analytics. This is bad - very bad...

Realizing this, I decided to add target="_top" to all of the links on the site to ensure that if the person clicked on a link on our page the frameset would be removed.
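If editing every link by hand isn't practical, the same thing can be done with a few lines of script. This is only a minimal sketch: retargetLinksToTop is a made-up helper name, and it assumes you really do want every link on the page (including ones that currently open a new window) to load in the top-level window.

```javascript
// Hypothetical helper: set target="_top" on every link so that following it
// replaces any outer frameset. `links` is any array-like collection of
// objects with a writable `target` property (in a browser: document.links).
// Returns the number of links that were changed.
function retargetLinksToTop(links) {
  let changed = 0;
  for (let i = 0; i < links.length; i++) {
    if (links[i].target !== "_top") {
      links[i].target = "_top";
      changed++;
    }
  }
  return changed;
}

// Browser usage; skipped when no DOM is available.
if (typeof document !== "undefined") {
  retargetLinksToTop(document.links);
}
```

For this to catch every link, the script has to run after the links exist, for example just before the closing body tag.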

That led to some confusion over the following days when we saw a spike in direct traffic and didn't understand where it was coming from...

Spike in direct traffic in Google Analytics

After a while we realized it was coming from Google Images (or more precisely, from framed pages). Let me explain...

Because Google Analytics doesn't count the first framed page as traffic on your site (even though it's served by your site), that page just doesn't exist in Google Analytics. But it's that page that has Google Images as a referrer. So, by losing that page you lose the fact that Google Images sent you the traffic, because the first page that Google Analytics counts is the one the person reached by clicking a link on your framed page. Hence, the initial referrer according to Google Analytics was the framed page on your own site. That appears (in Google Analytics) as direct traffic (not a referral or organic traffic) since your own site can't be a referrer for your site.
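The chain of reasoning is easier to follow as a toy model. The sketch below is not Google Analytics' actual algorithm; it only encodes the two behaviours described in this post (framed pageviews are dropped, and a referrer on your own host counts as direct). The function names, the hit objects and the example.com URLs are all made up for illustration.

```javascript
// Label a single counted pageview by its referrer (simplified assumption).
function classifySource(referrer, siteHost) {
  if (!referrer) return "direct";
  const refHost = new URL(referrer).hostname;
  // Your own site can't be a referrer for your site.
  if (refHost === siteHost) return "direct";
  return "referral: " + refHost;
}

// Walk a visit's pageviews in order, skip the ones framed by another site,
// and report the source of the first pageview that is actually counted.
function firstCountedSource(hits, siteHost) {
  for (const hit of hits) {
    if (hit.framed) continue; // dropped, as observed in the test profile
    return classifySource(hit.referrer, siteHost);
  }
  return null; // the whole visit is invisible
}

// Hypothetical visit: framed landing page, then a target="_top" click.
const visit = [
  { url: "http://www.example.com/dog.html",
    referrer: "http://images.google.com/imgres?imgurl=http://www.example.com/dog.jpg",
    framed: true },
  { url: "http://www.example.com/shop.html",
    referrer: "http://www.example.com/dog.html",
    framed: false },
];
console.log(firstCountedSource(visit, "www.example.com")); // "direct"
```

If the first pageview had not been framed, the same visit would have come out as "referral: images.google.com", which is the credit that gets lost.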

Google's help pages are inaccurate when they say:

If frames on the site reside on different domains, the referral information is likely to be inaccurate, since one frame may be recorded as the referring source of another, instead of a previous site being recorded as the referring source.

It's more than the referral information being inaccurate - they're removing much of the traffic completely.

This does not mean you lose a record of all the traffic from Google Images. If the user clicks on a link on a page controlled by Google Images, for example the "remove frame" or "original context" links in the top frame, then since that action took place on a Google Images page, it will come through as a Google Images referral. But that's just a fraction of your actual Google Images traffic. If someone looks at the one framed page and leaves - they'll never show up in Google Analytics at all.

Almost Impossible To Get Search Keywords For Image Search

To make matters worse, even though Google Images is a search engine, it's pretty much impossible to treat it as an organic traffic source. Beyond the fact that you don't see all the traffic, the problem is one of URLs. Let me explain...

For starters it would be nice if we could just add something like this to the tracking code:

pageTracker._addOrganic("images.google.com","q");

In theory that's supposed to take any referral from images.google.com, find the 'q' query parameter and record the contents of that as the search keyword.
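To make the intended behaviour concrete, here is a rough sketch of that lookup in plain JavaScript. It is not the ga.js implementation: the function name organicKeyword is made up, the substring match on the host name is an assumption, and the example URL is hypothetical.

```javascript
// Hypothetical stand-in for what an _addOrganic(engine, param) rule is meant
// to do: if the referrer's host name contains `engine`, return the value of
// the top-level query parameter `param`; otherwise return null.
function organicKeyword(referrer, engine, param) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer
  }
  if (!url.hostname.includes(engine)) return null;
  return url.searchParams.get(param); // null when the parameter is absent
}

// Made-up search results URL: the keyword is a top-level parameter.
console.log(
  organicKeyword("http://images.google.com/images?q=harrier+hound",
                 "images.google.com", "q")
); // "harrier hound"
```

Note that the lookup only sees top-level parameters of whatever referrer it is handed.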

The catch is that the 'q' parameter is only found in the URL of the search results page, and the search results page is the referrer for the frameset, not for your page. The frameset shows the page from your site at the bottom, so the referrer for your page is the frameset. The frameset does, sorta, have the q parameter in it, but it's hidden inside the 'prev' parameter and looks something like this...

&prev=/images%3Fq%3Dharrier%2Bhound%26gbv%3D...

Notice the q%3Dharrier%2Bhound... That's not nearly as neat as it was on the search results page where it was &q=harrier+hound.
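Unpacking that takes two rounds of parsing: first decode 'prev', then read 'q' from the query string inside it. The sketch below shows one way to do it, assuming the referrer has the shape described above. The function name is made up, and the example referrer is a hypothetical URL built around the fragment shown, not a real captured one.

```javascript
// Hypothetical helper: pull the search keyword out of a frameset referrer
// whose original query string is URL-encoded inside the 'prev' parameter.
// Returns null if the referrer, 'prev', or the inner 'q' is missing.
function keywordFromFramesetReferrer(referrer) {
  let url;
  try {
    url = new URL(referrer);
  } catch (e) {
    return null; // empty or malformed referrer
  }
  // searchParams.get() decodes once, giving something like
  // "/images?q=harrier+hound&hl=en".
  const prev = url.searchParams.get("prev");
  if (!prev) return null;
  const queryStart = prev.indexOf("?");
  if (queryStart === -1) return null;
  // Parse the inner query string; "+" is turned into a space here.
  const inner = new URLSearchParams(prev.slice(queryStart + 1));
  return inner.get("q");
}

// Made-up referrer for illustration only.
const exampleReferrer =
  "http://images.google.com/imgres?imgurl=http://www.example.com/dog.jpg" +
  "&prev=/images%3Fq%3Dharrier%2Bhound%26hl%3Den";
console.log(keywordFromFramesetReferrer(exampleReferrer)); // "harrier hound"
```

This only handles the one URL shape it was written for, and it does nothing about the framed pageviews that never get reported in the first place.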

Now, some people have written some fairly complex routines to parse the q parameter out of that string, but it only works some of the time. And since the Google Images traffic you see is only a fraction of the real traffic, it's questionable whether it's worth bothering: it's bad data, and you'll be mixing it with your good organic data from regular web search. Mixing good with bad makes everything bad.

It Doesn't Have To Be This Way

Google Analytics has painted itself into a corner to an extent by not properly handling framed documents. If they decided to change how they handled framed documents it would significantly impact the analytics of many sites. But I'm not sure that's a good excuse for continuing the way things are...

What should happen is that if a page's referrer is a frameset of a known search engine, then the query parameter should be pulled out of the referrer for the frameset (should be possible with Javascript). And it goes without saying that things like Google Images and Google Video should be known to be search engines (they are not at the moment).

You would think Google would at least want to properly reflect the traffic they drive to sites. Right now they're severely underreporting traffic from Google Images and Google Video.

Hopefully this will get better in the future. I should do a test to see how IndexTools handles the situation since, in time, it will be Yahoo!'s free competitor to Google Analytics.

Categories: Google, Google Analytics, Web Analytics

6 Comments

  1. Dave Says:

    Great article. I was like you, suspecting that analytics doesn’t count framed Google Image sessions. You have solved my doubts with this article.

    My fear is that Google might penalize sites that break out of the Google Images frame. Have you noticed a drop in traffic in Analytics since you made these changes?

  2. Jay Harper Says:

    @Dave – I believe the only penalties come if you use Javascript-based frame breaking code. That creates a bad user experience and should be avoided (I’d penalize it if I were Google). Using target="_top" doesn’t create a bad user experience and wouldn’t/shouldn’t incur a penalty.

  3. Dave Says:

    Thanks for the answer. I have followed your tips on my site and they work like a charm.

    Rewriting all the links with target="_top" was very easy with the jQuery JavaScript library. This is the one-line solution:

    $("a").attr("target", "_top");

  4. Darin Says:

    I am getting the following error on my webpages: ‘firstTracker’ is undefined. I think it is because of the following script: within my Google Analytics tracker script. The script refers to: firstTracker._addOrganic("images.google", "prev");
    secondTracker._addOrganic("images.google", "prev");

    I’m assuming this script is in there to track SE traffic from Google Images. Any suggestions on how I might remedy this?

    Thanks for any help!

    Darin

  5. Jay Harper Says:

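    @Darin – If you have multiple tracking codes on your page, you need to create a tracker for each of them (one _gat._getTracker call per account) before referencing them with _addOrganic. The error means firstTracker hasn't been defined at the point where that line runs.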

  6. Bugsy Says:

    Great blog post. Really puts into light a lot of the issues I have with Google Images in Analytics, so thanks for helping me understand what’s going on with it, because one particular site of mine is very heavy on traffic from Google Images.

    On the other hand, I’m not terribly concerned with getting the exact number of pageviews/visits as long as it’s all consistent and not changing. When it comes to analytics I’m more focused on the trending.
