blog
HOME · CREATIVE · WEB · TECH · BLOG
June 6th, 2008

SSH Attack and Password Problems on OS X

A few weeks ago, our server, running OpenSSH 4.7 on OS X 10.4.11, looked like it was slowing down or even crashing. While the web server was working perfectly, nothing that required a username and password, like ssh or ftp, would let anyone log in consistently. The logs showed that the server was being bombarded with ssh requests, probably from hackers looking to exploit the Debian SSH bug. The only thing to go on was that each bad ssh login left this message in the file /var/log/asl.log: [Sender com.apple.SecurityServer] [PID -1] [Message Failed to authorize right system.login.tty by process /usr/sbin/sshd for authorization created by /usr/sbin/sshd.]
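To get a feel for how heavy an attack like this is, a rough JavaScript sketch (runnable with something like Node.js) can count the log lines that contain the failure message quoted above. The function name countFailedSshLogins and the sample lines are made up for illustration; only the message text comes from the log.

```javascript
// Text taken from the message quoted above; every failed ssh login logged it.
var FAILURE_MARKER =
  "Failed to authorize right system.login.tty by process /usr/sbin/sshd";

// Hypothetical helper: count the lines of a log that contain the marker.
function countFailedSshLogins(logText) {
  var lines = logText.split("\n");
  var count = 0;
  for (var i = 0; i < lines.length; i++) {
    if (lines[i].indexOf(FAILURE_MARKER) !== -1) {
      count++;
    }
  }
  return count;
}

// Made-up sample: two failures and one unrelated entry.
var sample = [
  "[Sender com.apple.SecurityServer] [PID -1] [Message Failed to authorize right system.login.tty by process /usr/sbin/sshd for authorization created by /usr/sbin/sshd.]",
  "[Sender example] [Message some unrelated entry]",
  "[Sender com.apple.SecurityServer] [PID -1] [Message Failed to authorize right system.login.tty by process /usr/sbin/sshd for authorization created by /usr/sbin/sshd.]"
].join("\n");

console.log(countFailedSshLogins(sample)); // prints 2
```

On the server itself, the same function could be fed the contents of /var/log/asl.log instead of the sample text.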

The problem was that every time the attacking machine tried another key/password, it would spawn a new sshd process, which had to communicate with the password services (com.apple.SecurityServer) in order to validate the password. Eventually there were so many requests to the password services that they simply hung, and anything that required a password (ssh, ftp, etc.) stopped working.

I tried a number of things to get this to work. First, I tried shutting the ssh server off for a short time to see if it was just a single bot and it would move on, but it looks like once the bot knows that port 22 is open, it keeps that IP address in its memory. I upgraded OpenSSH to 5.0, which is the latest version. We tried filtering out the IP addresses, but the attacks were coming from all over the place.

Eventually, it turned out that there was a very simple solution in the OpenSSH config file. The problem was that when the ssh daemon was confronted with an incorrect password, it was checking against the password database multiple times, which was overwhelming the password services. In order to stop this behavior, I ended up changing the line which looks like
#ChallengeResponseAuthentication yes
in the default /etc/sshd_config file to
ChallengeResponseAuthentication no

While this change does not stop the bots from trying bad keys/passwords, it does protect the password services from the attack. We have had no problem logging in to the server since I made that small change in the config file.


June 6th, 2008

Google Images, Google Analytics & Frames

Over the past week or so I’ve had a big change in my thinking regarding Google Analytics. I still think it’s a wonderful service (you can’t beat the price - for now…), but I’m starting to see some incredibly big flaws…

It all started with discrepancies between Google Analytics and Unica’s NetInsight - big discrepancies for one particular site where NetInsight was showing about 60% more visits and visitors. You always expect some variation between analytics packages - but not 60%.

We first noticed that the pages with the biggest discrepancies were those that were most likely to get Google Images traffic. Then I noticed that many of the referral sources were roughly similar, but Google Images was way out of whack.

Google Analytics Doesn’t Report Pages Framed By Another Site

I had a theory - that Google Analytics wasn’t counting pageviews if the page was framed by another site. So I did a test. I set up a Google Analytics profile and made the tags only fire if they were in a frameset. The code looked something like this:

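// Only fire the tracker when this page is displayed inside a frameset
if (top.location != location) {
  // Separate tracker object that reports to the frames-only test profile
  var framesTracker = _gat._getTracker("UA-287XXX-4");
  framesTracker._initData();
  framesTracker._trackPageview();
}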

The first day I thought my theory had been proven wrong - traffic was showing up in the frames profile, but then it quickly dropped off to zero.

Google Analytics graph: framed content shows up the first day the profile is active, then nothing

In other words, the first day the profile existed framed pages were counted as page hits. The next day a filter kicked in and, with a few rare exceptions, framed pages never showed up again.

It’s not just Google Images traffic that’s not being reported, it’s any site that frames pages on your site - so things like Yahoo! Image Search and MSN/Live Image Search are affected as well.

Why they would do this, I just don’t understand. It’s truly baffling. If your web server serves the page, then it should appear in the analytics for your site. Period. This means traffic from things like image and video search is grossly underreported in Google Analytics.

But it gets worse…

Google Reports Some Traffic From Google Images As ‘Direct’ Traffic

Let’s take the example of Google Images (though it happens in other cases). You do a query and get a search results page with a bunch of thumbnails on it.

Google Images Search Results

You then click on an image and get a page that’s displayed using a frameset.

Google Images Image page

In the top frame is Google’s information on the image. In the bottom frame is the page from the site with the image.

Now, if you click on any link in the bottom frame (that’s being served by the site with the image), it does not remove the frameset. The frameset stays in place. Technically, I understand where this is coming from, but it’s horrible from a user interface perspective. But remember what we established above - no pages that are framed by another site get reported in Google Analytics. So the person could spend a half hour looking at your site this way and you would never know it. They could purchase something from you and it wouldn’t register as an e-commerce transaction in Google Analytics. This is bad - very bad…

Realizing this, I decided to add target="_top" to all of the links on the site to ensure that if the person clicked on a link on our page the frameset would be removed.
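If editing every link by hand isn’t practical, the same effect can probably be had with a few lines of script. This is only a sketch: the function name forceTopTarget is made up, and it assumes you’re happy to overwrite any target a link already has.

```javascript
// Sketch: set target="_top" on every link so a click replaces the whole
// window, including any frameset another site has wrapped around the page.
// Note that this overwrites any target the link already had.
function forceTopTarget(links) {
  for (var i = 0; i < links.length; i++) {
    links[i].target = "_top";
  }
  return links.length; // number of links updated
}

// In a browser this might be called once the page has loaded:
// forceTopTarget(document.getElementsByTagName("a"));
```

Another option is a single <base target="_top"> tag in the page’s head, which sets the default target for links that don’t specify one of their own.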

That led to some confusion over the following days when we saw a spike in direct traffic and didn’t understand where it was coming from…

Spike in direct traffic in Google Analytics

After a while we realized it was coming from Google Images (or more precisely, from framed pages). Let me explain…

Because Google Analytics doesn’t count the first framed page as traffic on your site (even though it’s served by your site), that page just doesn’t exist in Google Analytics. But it’s that page that has Google Images as a referrer. So, by losing that page you lose the fact that Google Images sent you the traffic, because the first page that Google Analytics counts is the one the person reached by clicking a link on your framed page. Hence, the initial referrer according to Google Analytics was the framed page on your own site. That appears (in Google Analytics) as direct traffic (not a referral or organic traffic) since your own site can’t be a referrer for your site.

Google’s help pages are inaccurate when they say:

If frames on the site reside on different domains, the referral information is likely to be inaccurate, since one frame may be recorded as the referring source of another, instead of a previous site being recorded as the referring source.

It’s more than the referral information being inaccurate - they’re removing much of the traffic completely.

This does not mean you lose a record of all the traffic from Google Images. If the user clicks on a link on a page controlled by Google Images, for example the “remove frame” or “original context” links in the top frame, then since that action took place on a Google Images page, it will come through as a Google Images referral. But that’s just a fraction of your actual Google Images traffic. If someone looks at the one framed page and leaves - they’ll never show up in Google Analytics at all.

Almost Impossible To Get Search Keywords For Image Search

To make matters worse, even though Google Images is a search engine, it’s pretty much impossible to treat it as an organic traffic source. Beyond the fact that you don’t see all the traffic, the problem is one of URLs. Let me explain…

For starters it would be nice if we could just add something like this to the tracking code:

pageTracker._addOrganic("images.google.com","q");

In theory that’s supposed to take any referral from images.google.com, find the ‘q’ query parameter and record the contents of that as the search keyword.

But the ‘q’ parameter is only found in the URL of the search results page, and the search results page is the referrer for the frameset that shows the page from your site at the bottom. The referrer for your page is the frameset, not the search results page. The frameset does, sorta, have the q parameter in it, but it’s hidden inside the ‘prev’ parameter and looks something like this…

&prev=/images%3Fq%3Dharrier%2Bhound%26gbv%3D...

Notice the q%3Dharrier%2Bhound… That’s not nearly as neat as it was on the search results page where it was &q=harrier+hound.
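To make the nesting concrete, here’s a rough sketch of what digging the keyword out could look like: decode the ‘prev’ parameter first, then read ‘q’ from the URL that was hidden inside it. The helper names (getQueryParam, imageSearchKeyword) and the sample referrer URL are made up for illustration; only the ‘prev’ and ‘q’ parameters come from the real thing, and actual referrers may well not fit this pattern.

```javascript
// Hypothetical helper: return the decoded value of one parameter from a
// query string such as "a=1&b=2", or null if it is missing or malformed.
function getQueryParam(query, name) {
  var pairs = query.split("&");
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf("=");
    if (eq !== -1 && pairs[i].slice(0, eq) === name) {
      try {
        // "+" stands for a space in query strings; swap it before decoding %XX
        return decodeURIComponent(pairs[i].slice(eq + 1).replace(/\+/g, " "));
      } catch (e) {
        return null; // malformed escape sequence
      }
    }
  }
  return null;
}

// Hypothetical helper: read "prev" from the frameset referrer, then read
// "q" from the search results URL nested inside it.
function imageSearchKeyword(referrer) {
  var outerStart = referrer.indexOf("?");
  if (outerStart === -1) return null;
  var prev = getQueryParam(referrer.slice(outerStart + 1), "prev");
  if (prev === null) return null;
  var innerStart = prev.indexOf("?");
  if (innerStart === -1) return null;
  return getQueryParam(prev.slice(innerStart + 1), "q");
}

// Made-up referrer built around the fragment shown above.
var sampleReferrer =
  "http://images.google.com/imgres?imgurl=x&prev=/images%3Fq%3Dharrier%2Bhound%26gbv%3D2";
console.log(imageSearchKeyword(sampleReferrer)); // prints "harrier hound"
```

In a page, the function would be given document.referrer instead of the sample string.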

Now, some people have written some fairly complex routines to parse the q parameter out of that string, but it only works some of the time. When you consider that the Google Images traffic you see is only a fraction of the real traffic, it’s questionable whether it’s even worth the bother: it’s just bad data, and you’ll be mixing it with your good organic data from regular web search. Mixing good with bad makes everything bad.

It Doesn’t Have To Be This Way

Google Analytics has painted itself into a corner to an extent by not properly handling framed documents. If they decided to change how they handled framed documents it would significantly impact the analytics of many sites. But I’m not sure that’s a good excuse for continuing the way things are…

What should happen is that if a page’s referrer is a frameset of a known search engine, then the query parameter should be pulled out of the referrer for the frameset (should be possible with JavaScript). And it goes without saying that things like Google Images and Google Video should be known to be search engines (they are not at the moment).

You would think Google would at least want to properly reflect the traffic they drive to sites. Right now they’re severely underreporting traffic from Google Images and Google Video.

Hopefully this will get better in the future. I should do a test to see how IndexTools handles the situation since, in time, it will be Yahoo!’s free competitor to Google Analytics.


May 28th, 2008

A successful semester in 3D Design, Advertising Design and Graphic Arts at the New York City College of Technology, CUNY

Trophy Design, 3D Design, City Tech, CUNY

The semester finished last week with great accomplishments by the students in the three-dimensional (3D) design course in the department of Advertising Design and Graphic Arts, City Tech, CUNY. All the students made creative efforts in the final project of the course: designing an award or trophy for a competition of their choice.

Some of the more memorable designs included the Best Dancer award, the Recycling award, the Most Inspired Architecture award, the Hottest New Video Game award, a Music Writers award and the MVP Basketball award. The top two awards, chosen unanimously by the students, were the Best Cake award by Michele C. and the Science award by Andre G.

Thanks to John Serdula, formerly the director of the Holly Solomon Gallery and the Annina Nosei Gallery, for coming in to give personal critiques to the talented students.


May 20th, 2008

Zero Visit Keywords In Google Analytics?

This one had me baffled… We launched a new site for an acne doctor about a week ago and when I was looking at the stats for the site I noticed some keywords had zero visits…

Zero visit keywords in Google Analytics

I didn’t understand how that was even possible. Keywords come from organic traffic, traffic = visits… So how could keywords have zero visits?

But like most other things, the answer was out there… Turns out that’s the result of the same person coming in from different keywords via organic search multiple times in one 30-minute session.
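Here’s a toy model of that explanation, just to show how the arithmetic can work out to zero. It is not how Google Analytics is actually implemented: it simply assumes a visit is credited to whichever keyword opens a session, and that a later keyword arriving within 30 minutes is listed without getting a visit of its own. The function name, the visitor IDs and the keywords are all made up.

```javascript
// Toy model only, not Google Analytics' real logic.
var SESSION_TIMEOUT_MINUTES = 30; // the session length mentioned above

// arrivals: organic search arrivals sorted by time, e.g.
//   { visitor: "v1", keyword: "acne doctor", minute: 0 }
function tallyKeywordVisits(arrivals) {
  var lastSeen = Object.create(null); // visitor -> minute of latest arrival
  var visits = Object.create(null);   // keyword -> visits credited to it
  for (var i = 0; i < arrivals.length; i++) {
    var a = arrivals[i];
    if (!(a.keyword in visits)) {
      visits[a.keyword] = 0; // keyword is listed even if it earns no visit
    }
    var previous = lastSeen[a.visitor];
    var startsNewSession =
      previous === undefined ||
      a.minute - previous >= SESSION_TIMEOUT_MINUTES;
    if (startsNewSession) {
      visits[a.keyword] += 1;
    }
    lastSeen[a.visitor] = a.minute;
  }
  return visits;
}

// Made-up data: v1 searches twice within ten minutes, v2 arrives later.
var report = tallyKeywordVisits([
  { visitor: "v1", keyword: "acne doctor", minute: 0 },
  { visitor: "v1", keyword: "acne treatment", minute: 10 },
  { visitor: "v2", keyword: "acne doctor", minute: 50 }
]);
console.log(report["acne doctor"]);    // prints 2
console.log(report["acne treatment"]); // prints 0
```

The second keyword still shows up in the report because someone did arrive through it, but the visit itself was already counted against the first one.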

Learn something new every day…

