blog
HOME · CREATIVE · WEB · TECH · BLOG

Wednesday, April 11th, 2007

Duplicate Content and Multiple Site Issues

Conference notes from the "Duplicate Content & Multiple Site Issues" session at Search Engine Strategies New York '07

Speakers:
Mikkel deMib Svendsen, Creative Director, deMib.dk
Shari Thurow, Webmaster/Marketing Director, GrantasticDesigns.com

Panelists:
Sean Suchter, Director of Yahoo! Search Technology, Yahoo! Search
Vivek Pathak, Ask.com

[L-R] Sean Suchter (Yahoo!), Shari Thurow (GrantasticDesign), Mikkel deMib Svendsen, Vivek Pathak (Ask)

Shari Thurow, Webmaster/Marketing Director, GrantasticDesigns.com

What is duplicate content? Different engines have different definitions. It's not an exact match, it's not a fingerprint, but rather similarities...

Duplicate content is a waste of resources and results in a poor search experience.

Clustering (two listings per site) is one way search engines control duplicate content, as is "repeat search with similar content".

They'll detect the elements of the "template" and strip that out before figuring out what's duplicate content - they're looking for what's unique on each page. Low HTML density tends to be real content. It's the unique content that's put into the index...
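
A rough way to picture the template stripping described above, as a minimal sketch only: treat each page as a list of text blocks and drop any block that shows up on most pages. The `strip_template` function, the `min_share` cutoff and the sample pages are all made up for illustration; nothing here is a description of how any engine actually does it.

```python
from collections import Counter

def strip_template(pages, min_share=0.8):
    """Drop text blocks that appear on at least `min_share` of the pages.

    `pages` maps a URL to the list of text blocks found on that page.
    The 0.8 cutoff is an arbitrary assumption for this sketch.
    """
    # Count on how many pages each block appears (at most once per page).
    page_counts = Counter()
    for blocks in pages.values():
        page_counts.update(set(blocks))

    threshold = min_share * len(pages)
    return {
        url: [block for block in blocks if page_counts[block] < threshold]
        for url, blocks in pages.items()
    }

# Hypothetical site: shared navigation and footer, one unique block per page.
pages = {
    "/a": ["Home | Products | Contact", "Blue widget, 3 inch", "Copyright Example Co."],
    "/b": ["Home | Products | Contact", "Red widget, 5 inch", "Copyright Example Co."],
    "/c": ["Home | Products | Contact", "Green widget, 2 inch", "Copyright Example Co."],
}
unique = strip_template(pages)
```

What is left in `unique` is the per-page content that a duplicate check would then compare.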

They also look for linkage properties (inbound and outbound) to determine duplicate content. Compare page by page...

They also look at "page mutation" (how quickly the pages change). The higher the page mutation, the lower the perceived quality and the more likely it is that the site is trying to game the search engines.

They also look at where it's hosted (down to the IP address). They are more suspicious of sites on the same IP (in terms of duplicate content), and of sites that move around a lot.

Shingles - "word sets" are at the core of their algorithms for determining duplicate content. Think of chips that are reordered - they're the same thing in a different order, and so still duplicate content. This is what happens when the items on a page are sorted. You'll only want to present one order of items to the search engines (the one that converts the best).
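
To make the shingle idea concrete, here is a minimal sketch using overlapping word triples and Jaccard similarity (shared shingles divided by all distinct shingles). The shingle size of 3 and the sample texts are assumptions for illustration; the speakers did not say which sizes or similarity measures the engines use.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences ("shingles") in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Share of elements two sets have in common (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Two sentences that differ by a single word.
text_a = "the quick brown fox jumps over the lazy dog"
text_b = "the quick brown fox leaps over the lazy dog"
text_similarity = jaccard(shingles(text_a), shingles(text_b))

# The same product list in two sort orders: as sets of items they are identical.
listing_by_price = ["red widget", "blue widget", "green widget"]
listing_by_name = ["blue widget", "green widget", "red widget"]
listing_similarity = jaccard(set(listing_by_price), set(listing_by_name))
```

The two sentences share 4 of 10 distinct shingles, so `text_similarity` is 0.4, while the re-sorted listing scores 1.0 because its item set is unchanged. That is the "reordered chips" case: sorting changes the order but not the content.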

Pages with lots of parameters are more likely to be redundant. Use robots.txt and robots meta tag to control the problem.
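
As an illustration of the robots.txt side, the sketch below checks a few URLs against a small rule set using Python's standard `urllib.robotparser`. The rules, paths and host are hypothetical; they simply assume the parameter-heavy pages live under `/search` and the re-sorted listings under `/products/sorted/`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps spiders out of parameter-heavy pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /products/sorted/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url):
    """True if a generic spider may fetch the URL under the rules above."""
    return parser.can_fetch("*", url)

search_ok = allowed("http://www.example.com/search?q=widgets&sort=price")
sorted_ok = allowed("http://www.example.com/products/sorted/price")
product_ok = allowed("http://www.example.com/products/widget-1")
```

Here only the plain product page remains fetchable. For pages that should still be crawled for their links but kept out of the index, the page-level counterpart is the robots meta tag, e.g. `<meta name="robots" content="noindex, follow">`.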

Example: "Norwich University" - their site turned out to contain search engine spam in the form of a list of terms leading to doorway pages that were incredibly similar.

Register your copyright at copyright.gov.

Use robots exclusion for lower converting pages.

Mikkel deMib Svendsen, Creative Director, deMib.dk

Some problems are linking issues, not indexing issues. Be careful with links (always be consistent with www or no-www).
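
One way to keep internal links consistent is to run every URL through a single helper before it is written into a page. A minimal sketch, assuming the www form is the one you picked; `example.com` and the helper name are placeholders:

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.example.com"             # hypothetical preferred host
HOST_ALIASES = {"example.com", "www.example.com"}

def canonicalize_host(url):
    """Rewrite any alias of the site's host to the one canonical form.

    Note: this sketch replaces the whole netloc, so a port or user info
    in the original URL would be dropped.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        parts = parts._replace(netloc=CANONICAL_HOST)
    return urlunsplit(parts)

link = canonicalize_host("http://example.com/about?x=1")
```

The same mapping would normally also be enforced with a 301 redirect at the server, so that external links to the other form end up on one URL.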

Use cookies for session information; don't add it to the URL. If you must use it in the URL, don't display it to spiders.
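
If session IDs have already leaked into URLs, a helper along these lines can produce the clean form to use in links and sitemaps. It is a sketch only; the parameter names in `SESSION_PARAMS` are common examples and an assumption about what your platform uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session parameters; adjust to whatever the site really uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_params(url):
    """Return the URL with any session-tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

clean = strip_session_params("http://www.example.com/item?id=7&PHPSESSID=abc123")
```

Every visitor then sees the same URL for the same page, which removes one source of accidental duplicates.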

Don't assume the spiders won't find non-official URLs.

"Bread crumb navigation" also leads to the same problem.

Never leave decisions of how to understand the web site to the engines - take control of it.

Anne Kennedy, Beyond Ink (moderator)

International localization isn't a problem, but be careful with regional localization within the same country (e.g. different cities).

Syndication -> link back to authoritative page.

Q&A

Amit - always attribute any significant blocks of text, so they don't think you've scraped and spammed.

Think about putting other content on the page if you're facing a duplicate content problem - user comments, product testimonials, etc.

Link development is important if you want to differentiate between different locales (the Starbucks in Seattle should have inbound and outbound links to other businesses in Seattle, the Starbucks in New York should have New York links - that will be enough for the search engines to figure out what's going on).

Also think about how you can communicate differently to different communities.

If you're a content provider, get the other sites to add unique content, unique links, and/or link back to your page.

Make sure your license with resellers says that they're not allowed to spam search engines. Also, don't give unique content to your affiliates or give them a separate set of content that they all share.

Don't just 301 to your home page - do a 301 to a relevant page (1-to-1, if possible).
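
A 1-to-1 redirect plan can be as simple as a lookup table from each retired URL to its closest replacement. A minimal sketch with made-up paths; how the result is turned into an actual HTTP response depends on your server or framework:

```python
# Hypothetical map from retired paths to their most relevant new pages.
REDIRECTS = {
    "/old/widgets.html": "/products/widgets",
    "/old/about-us.html": "/about",
}

def redirect_for(path):
    """Return (301, target) for a retired path, or None if there is no match.

    Unmapped paths deliberately get no blanket redirect to the home page.
    """
    target = REDIRECTS.get(path)
    if target is None:
        return None
    return 301, target

result = redirect_for("/old/widgets.html")
```

Returning `None` for unknown paths leaves the caller free to serve a normal 404 instead of sending everything to the home page.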

A single shingle that's similar doesn't make it duplicate content.

There's a big difference between a penalty and a filter. A filter simply means the page isn't considered all that relevant; a penalty is more severe.

Y! - each site is allocated a certain number of URLs in the index - if you fill them up with duplicate content, then you've shot yourself in the foot.

Categories: Duplicate Content, Spiders/Bots

