Weird SERPs…
I've been watching how Google treats the 166,242-page medical thesaurus (MeSH database) that I put up a few weeks ago, and I've seen some really weird results in the last few days...
When I search for files in the medical thesaurus directory, Google tells me it has 161,000 pages indexed. That number has been climbing over time, and it now appears that all (or nearly all) of the pages are indexed, so all looks good...
But then I jump ahead to page 10 of the results and it says there are 82,000 pages indexed. That's weird... Then I jump ahead to page 14 (the last page you can jump to) and it says there are only 140 pages indexed.
I'm not quite sure which number to believe. I can think of two possibilities:
- 140 is the correct number, and Google is slowly weeding out pages it thinks are duplicate content using 'shingle'-based analysis (comparing small snippets of text). The process takes a while, and so far only 140 pages have survived.
- 161,000 is the correct number, and there's some sort of limit on the number of supplemental results Google will return in a 'site:' search.
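For anyone unfamiliar with the 'shingle' idea in the first possibility, here is a minimal sketch of generic w-shingling. Google has never published its duplicate-detection code, so this is only the textbook technique (split each page into overlapping runs of w words, then compare the two sets by Jaccard similarity), not what Google actually runs. The 4-word shingle size, the 0.9 threshold, and the function names `shingles`, `jaccard` and `near_duplicate` are hypothetical choices for illustration.

```python
# Generic w-shingling + Jaccard similarity sketch.
# NOT Google's implementation; shingle size and threshold are hypothetical.

def shingles(text: str, w: int = 4) -> set[tuple[str, ...]]:
    """Return the set of contiguous w-word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < w:
        # Too short to shingle: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 disjoint, 1.0 identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate(doc1: str, doc2: str, w: int = 4, threshold: float = 0.9) -> bool:
    """Flag two pages as near-duplicates if their shingle overlap meets the threshold."""
    return jaccard(shingles(doc1, w), shingles(doc2, w)) >= threshold


if __name__ == "__main__":
    # Two hypothetical thesaurus pages built from the same template.
    template = "Medical Subject Headings entry for {} see also related terms"
    a = template.format("Aspirin")
    b = template.format("Ibuprofen")
    print(jaccard(shingles(a), shingles(b)))
    print(near_duplicate(a, b))
```

In this toy example the two pages differ by one word, so only the 3 shingles that don't touch that word are shared, out of 11 distinct shingles in total (a similarity of about 0.27), and they are not flagged. How real pages score depends heavily on how much of each page is shared boilerplate versus unique text.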
Neither of those accounts for the results on page 10. At the top of the page it says there are 82,000 pages indexed, yet the links at the bottom of the page don't go beyond page 14 (or 140 search results).
It's all extremely weird, and I'm not sure what to make of it. If Google really were removing pages because it thought they were duplicate content, I'd expect traffic to start taking a dive, but that's not the case, as you can see in this graph of organic search visits...

There was a dip, but traffic has gone back up: so far today there have been 83 visits (not shown in the graph), already more than all of yesterday's traffic (73 visits). So the dip was nothing more than people not working over the weekend...
The good news is that we're getting a decent amount of long-tail traffic from a variety of keywords (over 400 different keywords in the past week). All the pages seem to be supplemental (for now), but we're often the first supplemental result, right after the pages that aren't supplemental.
What really matters is that the traffic seems pretty consistent. Once I know things have stabilized, I'll try to make something of the traffic (AdSense, targeted pitches for business, etc.)...