
Monday, March 31st, 2008

How To Migrate From Blogger to WordPress

The other day I covered the difficulties of migrating off blogspot.com, and the sum total was that it's now a pretty horrid process. The Blogger team has tried to put measures in place to combat spammers who use their software, but all they've accomplished is making life difficult for ordinary people; spammers can still pretty easily use Blogger and blogspot.com to spam.

Today I want to cover how to get off Blogger completely and move to WordPress. This is a pretty technical discussion, and it assumes that you know the basics of Blogger and WordPress. It also assumes that you've already migrated your blog from blogspot.com to your own domain. If you haven't done that, start with the post from the other day; it will step you through what you can do to get off blogspot.com. The final assumption is that you're using an Apache server to host your blog (over half of all web sites use Apache).

The first issue you need to deal with is that your URLs are going to change. They have to, and this is generally a good thing. If you're using Blogger with FTP publishing, your URLs look something like this:

http://www.yourdomain.com/2008/03/post-title.html

http://www.yourdomain.com/labels/label-title.html

http://www.yourdomain.com/2008_03_01_archive.html

There will be many URLs, but those are the basic types. If you're using the new Blogger templates (not available if you're using FTP), you'll have more URL options, but those three are the primary ones you need to concern yourself with. Also note that WordPress isn't typically configured with file extensions on URLs, so you'll lose the .html portion of each URL, but we'll handle that below...

You can configure WordPress to act like Blogger with respect to "labels" (you can even keep the directory named "labels"), but WordPress actually has "categories" and "tags", and the two are far superior to Blogger's labels. A Blogger label is a one-size-fits-all solution, whereas you should think of WordPress categories as the entries in a book's table of contents and tags as the entries in its index. We'll come back to labels in a moment, but just realize that these URLs will change.

Another issue is that Blogger shortens long post titles when creating URLs, but WordPress doesn't. Say one of your posts was titled "Tech stocks up today after long decline over many months". Your Blogger URL might be /2008/03/tech-stocks-up-today-after.html, whereas your WordPress URL would be /2008/03/tech-stocks-up-today-after-long-decline-over-many-months. That will create the biggest headache in a proper transition from Blogger to WordPress...
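To see why a blanket "strip the .html" redirect breaks here, the Python sketch below builds a WordPress-style slug from a title and compares it with the slug Blogger used. `wordpress_style_slug` is a rough approximation (lowercase, drop punctuation, hyphenate spaces), not WordPress's actual `sanitize_title()` logic, and the Blogger slug is taken as given because we don't try to reproduce Blogger's shortening rules.

```python
import re

def wordpress_style_slug(title: str) -> str:
    """Rough stand-in for a WordPress post slug; NOT WordPress's real sanitize_title()."""
    slug = title.lower()
    slug = re.sub(r"[^a-z0-9\s-]", "", slug)  # drop punctuation such as ' / :
    slug = re.sub(r"[\s-]+", "-", slug)       # runs of spaces/hyphens become one hyphen
    return slug.strip("-")

def needs_exception(blogger_slug: str, title: str) -> bool:
    """True when simply dropping ".html" from the Blogger URL won't reach the WordPress post."""
    return blogger_slug != wordpress_style_slug(title)

title = "Tech stocks up today after long decline over many months"
print(wordpress_style_slug(title))  # tech-stocks-up-today-after-long-decline-over-many-months
print(needs_exception("tech-stocks-up-today-after", title))  # True
```

Any post where `needs_exception` is true is a candidate for a hand-written rewrite rule.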

Now for the nitty-gritty of the transition. First, don't delete your blog on Blogger. In fact, you want to leave it exactly as it is now. If you're hosted on blogspot, follow the guidelines in the post from the other day, which boil down to putting some JavaScript in the <head> tag of your pages. That will redirect people who come to your blogspot blog to the blog on your new domain. If you're already using Blogger with FTP, then you're a big step ahead. Just leave things where they are...

Next, install WordPress in the same directory as your Blogger blog on your domain if you're using Blogger with FTP; otherwise, install it in your target directory.

Now we need to tell Apache the general rules for how the URLs have changed. There are two ways to do this. Most people use an .htaccess file, which is a hidden file in the directory that tells Apache what to do. We use what are called virtual host files, which are better to use if you have access to them. If you use .htaccess files, you'll need to change some of what we say here, since things work slightly differently in .htaccess files than in virtual host files. You also need to make sure that mod_rewrite is enabled on your server.
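The most common of those differences is the leading slash. In a virtual host, a RewriteRule pattern is tested against the full URL path (starting with /), while in an .htaccess file Apache first strips the directory prefix; RewriteCond tests on %{REQUEST_URI} still see the full path in both. The Python sketch below is a simplified model of that behavior for an .htaccess file in the document root, not Apache's code, and `HTACCESS_PATTERN` is a hypothetical translation of the index.html rule used later in this post.

```python
import re

# Rule pattern as written in a virtual host file (as in this post).
VHOST_PATTERN = r"^/jay-harper/index\.html$"
# Hypothetical .htaccess equivalent, for a file sitting in the document root.
HTACCESS_PATTERN = r"^jay-harper/index\.html$"

def path_seen_by_rule(context: str, url_path: str) -> str:
    """Simplified model of the string a RewriteRule pattern is tested against."""
    if context == "vhost":
        return url_path              # full URL path, leading slash included
    if context == "htaccess-root":
        return url_path.lstrip("/")  # directory prefix (here just "/") removed
    raise ValueError(f"unknown context: {context}")

url = "/jay-harper/index.html"
print(bool(re.match(VHOST_PATTERN, path_seen_by_rule("vhost", url))))             # True
print(bool(re.match(VHOST_PATTERN, path_seen_by_rule("htaccess-root", url))))     # False
print(bool(re.match(HTACCESS_PATTERN, path_seen_by_rule("htaccess-root", url))))  # True
```

In practice, that means dropping the leading slash from each RewriteRule pattern when you move these rules into an .htaccess file at the top of your site.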

I'm going to go over the Apache instructions for my personal blog and explain what's going on...

RewriteEngine On

This turns on mod_rewrite for your site. All of the items that have RewriteRule below depend on mod_rewrite being on.

RewriteCond %{HTTP_HOST} !^www\.slicksurface\.com$
RewriteRule ^(.*)$ http://www.slicksurface.com$1 [R=301,L]

This isn't specific to the blog, but I highly recommend it for any site. It enforces the canonical domain, making sure you don't have some links to www.yoursite.com and others to yoursite.com (the search engines treat them as two sites, which will hurt you in organic search).
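Here is a small Python model of those two lines, handy for checking the logic before touching a live server. It's a sketch under stated assumptions, not Apache itself: it ignores query strings (which Apache passes through to the redirect on its own), and `canonical_redirect` is a hypothetical helper. It also shows why there's no slash between .com and $1: in a virtual host the captured path already starts with one.

```python
import re
from typing import Optional

CANONICAL = re.compile(r"^www\.slicksurface\.com$")

def canonical_redirect(host: str, url_path: str) -> Optional[str]:
    """Model of the canonical-domain rule in a virtual host (query strings ignored)."""
    # RewriteCond %{HTTP_HOST} !^www\.slicksurface\.com$
    if CANONICAL.match(host):
        return None  # already on the canonical host; rule doesn't fire
    # RewriteRule ^(.*)$ http://www.slicksurface.com$1 [R=301,L]
    # In a virtual host the URL path keeps its leading slash, so $1 starts with "/".
    captured = re.match(r"^(.*)$", url_path).group(1)
    return "http://www.slicksurface.com" + captured

print(canonical_redirect("slicksurface.com", "/jay-harper/"))      # http://www.slicksurface.com/jay-harper/
print(canonical_redirect("www.slicksurface.com", "/jay-harper/"))  # None
```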

RewriteRule ^/jay-harper/blog/2007/10/netflix-home-of-damaged-dvds.html$ /jay-harper/2007-10/netflix-the-home-of-damaged-dvds [NC,R=301,L]
RewriteRule ^/jay-harper/blog/2007/09/manhattan-wildlife-encounter-of-smelly.html$ /jay-harper/2007-09/manhattan-wildlife-encounter-of-the-smelly-sort [NC,R=301,L]
RewriteRule ^/jay-harper/blog/2007/06/ny-commercial-real-estate-rentable.html$ /jay-harper/2007-06/ny-commercial-real-estate-rentable-square-feet-vs-usable-square-feet [NC,R=301,L]
RewriteRule ^/jay-harper/blog/2007/05/florence-knoll-tables-harry-bertoia.html$ /jay-harper/2007-05/florence-knoll-tables-harry-bertoia-chairs-for-sale [NC,R=301,L]
RewriteRule ^/jay-harper/blog/2007/05/likesdislikes-2003-mini-coooper-s.html$ /jay-harper/2007-05/likesdislikes-2003-mini-cooper-s [NC,R=301,L]

The rules above can be the worst part of the project. After you've done everything else, you need to find the posts where Blogger shortened the title so that the Blogger URL and the WordPress URL no longer match. This requires going through every URL, figuring out which ones are broken, and writing exceptions to fix them. It's not quite as difficult as you might think. First do everything else, then go to Blogger and click on all the "view" links. When you find one that doesn't work, note what the original URL was and what the new one should be, and write a rule like the ones you see above.

Please note that each rule above is a single line starting with RewriteRule. If the lines wrap in your browser, paste them into a text editor to see how they should look.
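Once you've collected the broken URLs, the exception lines themselves can be generated rather than typed. This Python sketch takes a hand-collected list of (old Blogger path, actual WordPress path) pairs, runs each old path through the general post rule used later in this post, and prints a RewriteRule only where that rule would land on the wrong URL. `exception_rules` and the sample pairs are hypothetical; unlike the hand-written rules above, it escapes the dots in the old path. Keep the exceptions above the general rule in your config, since with [L] the first matching rule wins.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# The general post rule used later in this post (virtual host syntax, [NC]).
GENERAL_POST_RULE = re.compile(r"^/jay-harper/blog/(\d{4})/(\d{2})/(.+)\.html$", re.IGNORECASE)

def general_target(old_path: str) -> Optional[str]:
    """Where the general rule would send an old Blogger post URL."""
    m = GENERAL_POST_RULE.match(old_path)
    if m is None:
        return None
    year, month, slug = m.groups()
    return f"/jay-harper/{year}-{month}/{slug}"

def exception_rules(pairs: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield a RewriteRule line for each post the general rule would get wrong."""
    for old_path, new_path in pairs:
        if general_target(old_path) != new_path:
            pattern = "^" + old_path.replace(".", "\\.") + "$"
            yield f"RewriteRule {pattern} {new_path} [NC,R=301,L]"

# Hand-collected (old, new) pairs; the second one needs no exception.
pairs = [
    ("/jay-harper/blog/2007/10/netflix-home-of-damaged-dvds.html",
     "/jay-harper/2007-10/netflix-the-home-of-damaged-dvds"),
    ("/jay-harper/blog/2008/03/post-title.html",
     "/jay-harper/2008-03/post-title"),
]
for rule in exception_rules(pairs):
    print(rule)
```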

RewriteRule ^/jay-harper/blog/atom\.xml$ /jay-harper/feed [R=301,L]

This redirects the feed from the old location to the new one.

RewriteRule ^/jay-harper/index.html$ /jay-harper/ [NC,R=301,L]

This redirects the old index file for the site to the new location since WordPress won't handle index.html correctly by default.

RewriteRule ^/jay-harper/blog/(\d{4})/(\d{2})/(.+)\.html$ /jay-harper/$1-$2/$3 [NC,R=301,L]

This redirects the old blog post URLs to the new ones. We use dated directories in our URLs that look like /YYYY-MM/. We prefer these over ones that look like /YYYY/MM/ since they reduce the number of directory levels, which is good for SEO. If you want a similar URL structure, go to Options -> Permalinks (Settings -> Permalinks in later versions of WordPress), select the "Custom, specify below" radio button and enter /%year%-%monthnum%/%postname%. If you use a different structure, you'll need a different rule that's appropriate for how you've set up WordPress.

One note: the root Blogger URL was http://www.slicksurface.com/jay-harper/blog/, whereas the root WordPress URL is http://www.slicksurface.com/jay-harper/, and this change is reflected in the rules. You probably won't have a similar change, so adjust the rules accordingly.
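A quick way to confirm the rule and the permalink setting agree is to compute both sides. This Python sketch applies the post rule to an old URL and separately fills in /%year%-%monthnum%/%postname% under the /jay-harper/ root; `redirect_post` and `wordpress_permalink` are hypothetical helpers, and the model relies on %monthnum% being the two-digit month, which matches Blogger's /03/ style directories.

```python
import re
from typing import Optional

POST_RULE = re.compile(r"^/jay-harper/blog/(\d{4})/(\d{2})/(.+)\.html$", re.IGNORECASE)

def redirect_post(old_path: str) -> Optional[str]:
    """Apply the post rule: /blog/YYYY/MM/slug.html -> /YYYY-MM/slug."""
    m = POST_RULE.match(old_path)
    if m is None:
        return None
    year, month, slug = m.groups()
    return f"/jay-harper/{year}-{month}/{slug}"

def wordpress_permalink(year: int, month: int, postname: str) -> str:
    """Fill in /%year%-%monthnum%/%postname% under the /jay-harper/ root (zero-padded month)."""
    return f"/jay-harper/{year:04d}-{month:02d}/{postname}"

old = "/jay-harper/blog/2008/03/post-title.html"
print(redirect_post(old))                                                 # /jay-harper/2008-03/post-title
print(redirect_post(old) == wordpress_permalink(2008, 3, "post-title"))  # True
```

If you pick a different permalink structure, changing `wordpress_permalink` first makes it obvious how the RewriteRule substitution has to change.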

RewriteRule ^/jay-harper/blog/labels/(.+)\.html$ /jay-harper/tag/$1 [NC,R=301,L]

This redirects all the labels pages to tag pages. If you want you can redirect them to category pages, or write custom rules to send some to category pages and others to tag pages.

RewriteRule ^/jay-harper/blog/(\d{4})_(\d{2})_01_archive\.html$ /jay-harper/$1/$2 [NC,R=301,L]

That rule redirects the archive URLs to the WordPress structure.
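The feed, index, label and archive redirects are all simple pattern-to-target rules, so they can be checked together as an ordered table. This Python sketch copies the four patterns from this post and tries them in order, first match wins, as with [R=301,L]; it's a model, not Apache. Note that the archive pattern only catches the monthly YYYY_MM_01_archive.html filenames, and the label rule assumes each label's file name is the same as the WordPress tag slug you want.

```python
import re
from typing import Optional

# (pattern, substitution, case-insensitive?) copied from the rules in this post,
# tried in order; with [R=301,L] the first match wins.
RULES = [
    (r"^/jay-harper/blog/atom\.xml$", "/jay-harper/feed", False),
    (r"^/jay-harper/index.html$", "/jay-harper/", True),
    (r"^/jay-harper/blog/labels/(.+)\.html$", r"/jay-harper/tag/\1", True),
    (r"^/jay-harper/blog/(\d{4})_(\d{2})_01_archive\.html$", r"/jay-harper/\1/\2", True),
]

def redirect_target(path: str) -> Optional[str]:
    """Return the redirect target for the first rule that matches, else None."""
    for pattern, substitution, nocase in RULES:
        m = re.match(pattern, path, re.IGNORECASE if nocase else 0)
        if m:
            return m.expand(substitution)  # Apache's $1, $2 are written \1, \2 here
    return None

print(redirect_target("/jay-harper/blog/labels/wordpress.html"))    # /jay-harper/tag/wordpress
print(redirect_target("/jay-harper/blog/2008_03_01_archive.html"))  # /jay-harper/2008/03
```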

RewriteCond %{HTTP_USER_AGENT} (googlebot|slurp|msnbot|teoma) [NC]
RewriteRule ^/jay-harper/page/ /jay-harper/ [NC,R=301,L]

This rule is for search engine optimization. Essentially we want to eliminate duplicate content problems. I publish full posts on the blog's home page, and also on the pages reached through the "older posts" and "newer posts" links at the bottom of it, but I don't want search engines to see full posts anywhere except the individual post pages and the home page (which posts eventually expire off of). So when one of the four major search engines requests an older/newer posts page, the rule redirects it to the home page.
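The Python sketch below models that condition and rule. Apache regular expressions are unanchored unless they use ^ or $, so the spider names can appear anywhere in the User-Agent string, which is why the model uses `search`. `paged_redirect` and the sample User-Agent strings are illustrative assumptions, not anything taken from Apache.

```python
import re
from typing import Optional

# Unanchored, case-insensitive ([NC]): a match anywhere in the User-Agent counts.
SPIDERS = re.compile(r"(googlebot|slurp|msnbot|teoma)", re.IGNORECASE)
PAGED = re.compile(r"^/jay-harper/page/", re.IGNORECASE)

def paged_redirect(user_agent: str, path: str) -> Optional[str]:
    """Return the 301 target for a spider requesting an older/newer posts page."""
    if SPIDERS.search(user_agent) and PAGED.match(path):
        return "/jay-harper/"
    return None

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
print(paged_redirect(googlebot, "/jay-harper/page/2/"))                   # /jay-harper/
print(paged_redirect("Mozilla/5.0 (Windows; U)", "/jay-harper/page/2/"))  # None
```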

RewriteCond %{HTTP_USER_AGENT} !(feedburner|googlebot|slurp|msnbot|teoma) [NC]
RewriteRule ^/jay-harper/blog/feed$ http://feeds.feedburner.com/jay-harper [R=302,L]

We track our feeds with FeedBurner, so we want ordinary visitors to be redirected to FeedBurner when they request the official feed URL, but we don't want the search engine spiders (or the FeedBurner spider) to be redirected, because they need the feed from that specific location. FeedBurner obviously needs the original file, and the search engines need to get it from our site because we're using the feeds as sitemaps, which have to be located in the same directory as (or a higher directory than) the URLs they reference.

We could also link to the FeedBurner URL directly for users, but that would give FeedBurner too much control over our subscriber list. The way around that is to list the official URL and do a temporary redirect to FeedBurner. It's not perfect, but it's better than putting the FeedBurner URL in our link tags.
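Here is a Python model of that pair of lines. The "!" in the RewriteCond negates the match, so the redirect happens only when none of the listed names appear in the User-Agent; the rule itself has no [NC], so the path comparison is case-sensitive. `feed_redirect` is a hypothetical helper for illustration.

```python
import re
from typing import Optional, Tuple

# Spiders that should get the real feed (note the "!" negation in the RewriteCond).
EXEMPT = re.compile(r"(feedburner|googlebot|slurp|msnbot|teoma)", re.IGNORECASE)
FEED = re.compile(r"^/jay-harper/blog/feed$")  # no [NC] on this rule

def feed_redirect(user_agent: str, path: str) -> Optional[Tuple[int, str]]:
    """Return (status, location) when a visitor should be sent to FeedBurner."""
    if FEED.match(path) and not EXEMPT.search(user_agent):
        return 302, "http://feeds.feedburner.com/jay-harper"
    return None
```

The 302 status is the design choice described above: because the redirect is temporary, feed readers keep the official URL, so you can stop redirecting to FeedBurner later without losing subscribers.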

RewriteCond %{REQUEST_URI} ^/jay-harper/
RewriteCond %{REQUEST_URI} !^/jay-harper/blog/wp-
RewriteCond %{REQUEST_URI} !^/jay-harper/resources/
RewriteCond %{REQUEST_URI} !^/jay-harper/blog/(\d{4})/(\d{2})/(.+)\.(jpg|gif|png)$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /jay-harper/index.php [L]

WordPress actually serves most of its pages from a single script, index.php. If you've seen the old-style WordPress URLs (with query strings at the end), you're already familiar with this. In other words, all of the "pretty" WordPress URLs are somewhat fake, and this block of rules is what makes them work. But certain directories need to be exempted from the rule. We like our images in a /resources/ directory, so that's exempted. We've also exempted the WordPress files themselves (paths starting with /jay-harper/blog/wp-), any image files in the old Blogger directories, and so on...
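The Python sketch below models which requests that block hands to index.php. The `files` and `dirs` sets are assumptions standing in for Apache's -f and -d tests. One caveat worth checking against the mod_rewrite documentation: in virtual host context, %{REQUEST_FILENAME} may still hold the URL path rather than a disk path, in which case -f and -d checks are commonly written against %{DOCUMENT_ROOT}%{REQUEST_URI} instead.

```python
import re
from typing import Set

def goes_to_index_php(path: str, files: Set[str], dirs: Set[str]) -> bool:
    """Simplified model of the catch-all block; files/dirs stand in for -f/-d checks."""
    if not path.startswith("/jay-harper/"):                      # ^/jay-harper/
        return False
    if path.startswith("/jay-harper/blog/wp-"):                  # WordPress's own files
        return False
    if path.startswith("/jay-harper/resources/"):                # our image directory
        return False
    if re.match(r"^/jay-harper/blog/(\d{4})/(\d{2})/(.+)\.(jpg|gif|png)$",
                path, re.IGNORECASE):                            # old Blogger images
        return False
    if path in files or path in dirs:                            # !-f and !-d
        return False
    return True  # "RewriteRule ." matches any non-empty path

print(goes_to_index_php("/jay-harper/2008-03/post-title", set(), set()))  # True
print(goes_to_index_php("/jay-harper/resources/logo.png", set(), set()))  # False
```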

There's so much more that could be discussed when it comes to WordPress and Blogger, but those are some of the basics when you're moving to WordPress. And believe me, you'll be glad you did... It's always a good thing to be in control of your online assets. And there's really no better way than to use something which is completely under your control, like WordPress.

Good luck!

Categories: Blogging, Server Admin, Web Site Configuration

5 Comments

  1. wraclerap Says:

    Nice template. Where can i download it?

  2. Jay Harper Says:

    @wraclerap – thanks, but this theme is proprietary.

  3. Fernando Says:

    There is another problem with the importing of images from posts in blogger to wordpress. There is no simple way to do this at all… :(

  4. Jay Harper Says:

    @Fernando – True, but last I checked (it’s been a while), Blogger/Blogspot will still host the images even if they’re embedded in another page.

  5. Jack Yan Says:

    I wish I read your page first, Jay. I’ve just done my migration so I know what was involved. Of the pages I have come across, yours is the most accurate and best suited to the situation I had.

