Hit Me, Baby, One More Time
"You may fool all the people some
of the time, you can even fool some of the people all of the time, but
you cannot fool all of the people all the time. P.S. Or webmasters
about inflated site stats." - Abraham Lincoln
Ol' Abe. What foresight!
First off, let's understand that
in most cases of inflated stats we're talking about sheer, raw, unadulterated
lust for money. Like the Nielsen Ratings affect how much a TV station
can charge its advertisers, and the Arbitron Ratings affect radio, so it
is with Web "hits" and the sites that measure them. The more hits,
the more the higher-paying advertisers will flock to you. So this
is serious business.
There are a number of ways to inflate
one's hits. Some are semi-legitimate (the result of ignorance or bad programming),
but most are a case of outright fraud. Although I'll use specific
sites as examples here, I don't mean to single them out as this is a net-wide
problem.
Note 1: There are actually
three ways to determine site traffic: "hits", "visits" and "unique visitors".
An expanded definition is here. For
the sake of readability, I'll just call one visit to a page by anyone a
"hit".
Note 2: While you may
know that you can jump over pages when backing up in the browser (by using
the History box), most people don't. Normal people, for the most
part, really don't understand what's taking place when they're on a Web
page or how to use a browser properly, and will just sit there hitting
the 'Back' button until their eyes glaze over — with each click acting
as another page view or 'hit'. As such, I'm taking the whole "Back
button" thing seriously.
Off-Linking
You'll notice that most sites open
a new browser window or tab when you click on a link that goes to another
site. This is called "off-linking", as opposed to 'normal' linking
to another page on the same site. You'll notice that on The Drudge
Report and Instapundit, two sites with extremely high hit rates, the linked
post stays in the same window. This way, when you're finished with
the article, you have to hit the 'Back' button on the browser to bring
you back to the original site. That counts as a second hit.
Read three or four articles, hit the 'Back' button every time, and now
the original site has you down for four or five hits, whereas you really
only arrived at the site once.
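The 'Back' button arithmetic above can be sketched in a few lines. This is only an illustration under the assumption stated in the text: the first arrival counts as one hit and every return via 'Back' counts as another. The name `hitsForVisit` is made up for this example.

```javascript
// Hypothetical helper: hits logged for one visit to a site that opens
// outside links in the same window. Assumes, as the text does, that the
// initial arrival is one hit and each 'Back' to the site is another.
function hitsForVisit(articlesRead) {
  if (!Number.isInteger(articlesRead) || articlesRead < 0) {
    throw new RangeError("articlesRead must be a non-negative integer");
  }
  return 1 + articlesRead;
}

console.log(hitsForVisit(3)); // 4
console.log(hitsForVisit(4)); // 5
```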
Auto-Refresh
This method is far more insidious
and can really jack up the stats. If you look at the source code
of Drudge's home page, you'll see this:
var timer = setInterval("autoRefresh()", 1000 * 60 * 3)
That automatically refreshes the
page every 3 minutes (1000 x 60 x 3 = 180,000 milliseconds; divided
by 1,000, that's 180 seconds, or 3 minutes). And it's easy to prove.
Just go to his site and start a timer. Somewhere around the 3-minute
mark the screen will flash briefly as it refreshes.
So, as long as you've got Drudge
open, no matter what you're doing on the computer (or if you're even at
the computer), you're acting as a fresh hit every 3 minutes. If a
mere 1,000 people do this for an hour, that's a whopping 20,000 hits without
anyone even clicking a link, so the numbers can really add up.
And this is from Instapundit's home
page:
<META HTTP-EQUIV="refresh" content="1800">
That refreshes the page every 1,800
seconds, or 30 minutes. No big deal, you say?
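For anyone who wants to check a page's refresh delay themselves, here is a rough sketch of reading the number out of the tag's `content` value. The helper name `refreshDelaySeconds` is invented for illustration, and it only handles the common case of a leading whole number of seconds.

```javascript
// Hypothetical helper: read the delay out of a meta-refresh "content" value.
// The value is either a bare number of seconds ("1800") or a number
// followed by a target address after a semicolon.
function refreshDelaySeconds(content) {
  const match = /^\s*(\d+)/.exec(content);
  if (match === null) {
    throw new Error("no leading delay in: " + content);
  }
  return Number(match[1]);
}

const delay = refreshDelaySeconds("1800");
console.log(delay / 60);   // 30 (minutes between reloads)
console.log(3600 / delay); // 2 (reloads per hour)
```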
What makes this particularly insidious
is the advent of tabbed browsers, like Firefox and IE 7. This way,
people have link sites like Drudge and Instapundit constantly open in one
tab while they're off reading things in other tabs. And what's going
on the whole time that link site is open in the first tab? Refresh,
is what's going on, and your helpful — if unknowing and inadvertent — contribution
to the site's ad revenues.
So, if 10,000 news junkies show up
at work in the morning, fire up Drudge on a tab and leave it up all day
long (some of which time they may actually spend working), that's:
10,000 visitors x 20 refreshes/hr x 8
hours = 1,600,000 hits/day
Now add to that all of the people
who don't know the page is automatically being refreshed and are hitting
the browser's 'Refresh' button to see if anything new has been added every
time they return to it. Scary, huh? A mere 10,000 visitors
could turn into two million by the end of the day, just from the
page sitting there in a tab on somebody's work computer.
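The auto-refresh arithmetic in this section can be sketched as a small function. It is a back-of-the-envelope model only, assuming every visitor leaves the page open for the whole period and that each refresh is logged as one hit. The name `autoRefreshHits` and its parameters are invented for this example.

```javascript
// Hypothetical helper: page loads generated purely by auto-refresh.
// Assumes every visitor keeps the page open the entire time and that
// each refresh is counted as one hit, as the text does.
function autoRefreshHits(visitors, refreshMinutes, hoursOpen) {
  const refreshesPerHour = 60 / refreshMinutes;
  return visitors * refreshesPerHour * hoursOpen;
}

console.log(autoRefreshHits(1000, 3, 1));   // 20000
console.log(autoRefreshHits(10000, 3, 8));  // 1600000
console.log(autoRefreshHits(10000, 30, 8)); // 160000 (with a 30-minute refresh)
```

The last line shows why the refresh interval matters: the same crowd on a 30-minute refresh generates a tenth of the hits.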
The site owners would argue that they're merely
refreshing the page for the users' benefit, so that they won't miss the
latest hot post, but that argument falls apart two ways:
- Remember, when you click on a Drudge
or Instapundit link, you're staying in the same browser window, so their
pages automatically refresh when you go back to their site.
- Browsers in their current form have
been around for almost two decades. Everyone knows where the
'Refresh' button is if they want to refresh the page to catch the latest,
hot, can't-miss links — and that's especially true with the type
of people who are into the latest, hot, can't-miss links.
And, right on cue as I'm putting
this article together, there's ol' Matt with the staggering February numbers
of 568 million hits. Remember, because he doesn't off-link correctly,
if you assume every person who went to his site clicked on just one
link, then you immediately have to cut the total number in half, due to
having to 'Back' your way back to the site, thereby creating another hit.
If the average person clicks on two links, then you have to cut it by two-thirds.
Then you add tabbed browsers and
the whole "Auto-Refresh every 3 minutes" thing into the equation and you
have to wonder what fraction of that 568 million the number really
is.
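To put rough numbers on the 'Back' button discount described above, here is a sketch that divides the reported total by the hits each visit generates. It assumes one hit on arrival plus one per link clicked and ignores auto-refresh entirely, so the real figure would be lower still. The name `visitsFromHits` is made up for illustration.

```javascript
// Hypothetical helper: strip out the extra hits caused by 'Back' clicks.
// Assumes every visit produces 1 hit on arrival plus 1 per link clicked,
// and does not account for auto-refresh at all.
function visitsFromHits(reportedHits, linksPerVisit) {
  return reportedHits / (1 + linksPerVisit);
}

const reported = 568000000;
console.log(visitsFromHits(reported, 1));             // 284000000
console.log(Math.round(visitsFromHits(reported, 2))); // 189333333
```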
The Redirect Code
I'm sure you've noticed that occasionally
you'll go to a site and the 'Back' button doesn't work. This is because
the clever webmaster has put a 'redirect' code in the source file, so every
time you hit 'Back', it redirects you back to the current page. The
primary purpose of this — I have to presume — is that you'll eventually
give up and take a look around, possibly buying something or at least bookmarking
the site — so that later you'll buy something. But the secondary
benefit is certainly that those ol' hits are mounting up as people click
in vain to escape the site's clutches.
As I said earlier, most people really
don't understand what's going on, computerwise, and that's especially true
with older people. They'll either think they're doing something wrong
as they continue to click on the 'Back' button, or they'll think the Internet
is suddenly down and if they click on it long enough, it'll eventually
work. I taught at a computer college for two years and did field
work and gave seminars and tutored people for a number of years and saw
this kind of stuff all the time.
Coding Chicanery
Here's the 'History' box in Explorer
6 after going to Fox News, reading a number of articles, including articles
linked from articles, then backing up to the home page:

That's exactly as it should look
(on my system). "Start Page" is the links page on my computer that
the browser opens up with. Fumble around all day long inside the
average site, back your way up to the home page and you should see one
entry, like the above, in the History window.
Unless, of course, you're MSNBC:

So, after doing the same routine
I did with Fox, the average slob ends up hitting the 'Back' button five
times just to get back to where he started. And each one of those
is another hit. "My, that MSNBC site sure is popular! Just
look at their stats!"
What's puzzling is that it doesn't
act this way all the time. A week ago, when I was writing the first
draft of this article, it suddenly was operating correctly, whereas usually
it acts like the above. A cynic would say that on some particular
days of the month some governing body checks out the integrity of certain
professional sites and that's when the webmaster cranks it back to normal.
But I would never suggest such a
thing, of course. I'm sure it's just a case of...
Coding Errors
I'm going to presume this
is just bad programming, rather than something a little more nefarious.
There's some blogging software around
with the somewhat oxymoronic name of "Serendipity" (whereas blogsites are
about the least serendipitous spots on the web) that has a small
bug that, over time, could inflate the hits a tad.
Let's take a stroll over to a site
that uses the software. Hey, there's an article on Hillary that I
want to respond to! I furiously scribble down my little missive and
hit the 'Preview' button to see how it'll look. Realizing that my
comment might perhaps be read by millions and change the very course of
civilization, I carefully comb through it, adding this, amending that,
all the while hitting the 'Preview' button to see how it'll look.
Finally, satisfied that my important comment is worded correctly and is
ready to alter the tide of history, I hit the 'Submit' button and off it
goes.
Okay! Well, let's hit the ol'
'Back' button and see what else is on the home page...

I mean, let's eventually see
what's on the home page. Due to a bit of poor programming
on Serendipity's part, the browser is actually logging each 'preview' as
a separate page in the history and it'll take me seven clicks on the ol'
'Back' button to get back to the home page. So, although I really
only went to one page on the site, it'll end up logging about fourteen
hits total, one for each of the six 'previews', the six to get back to
the home page, and two for the home page, itself.
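My tally can be written out as a quick sketch. It simply follows the count given above (one hit per preview going forward, one more as 'Back' passes over it, and two for the home page) and is not a model of how Serendipity or any stats package actually logs requests. The name `hitsAfterPreviews` is invented for this example.

```javascript
// Hypothetical tally following the count in the text: each preview is one
// hit going forward and one more when 'Back' passes over it, plus two hits
// for the home page (the first arrival and the eventual return).
function hitsAfterPreviews(previews) {
  const forward = previews;
  const backward = previews;
  const homePage = 2;
  return forward + backward + homePage;
}

console.log(hitsAfterPreviews(6)); // 14
```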
The Ad Angle
Normally, on a site that doesn't
carry advertising, whether the stats are inflated or not really doesn't
matter. The problem arises when the stats get to the point where
they raise the question, "Should we start advertising?" That's a pretty
big jump, in the sense of setup time & cost to get some advertisers
online, not to mention getting a business license, dealing with the IRS,
etc.
But, as it's always been on the Web,
it's all about the numbers:
If you get 1 million monthly visits
to your site and just 1% of them click on a banner that makes you one thin,
crummy little dime, that's a thousand bucks a month of pure gravy.
So you can see why it's tempting.
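The banner math above, as a sketch you can plug your own numbers into. The function name `monthlyAdRevenue` and its parameters are invented here. It works in whole cents to sidestep floating-point rounding and assumes a flat payout per click.

```javascript
// Hypothetical helper: monthly banner income in dollars.
// clickRatePercent is the share of visits that click (1 means 1%);
// centsPerClick is the flat payout per click, in cents.
function monthlyAdRevenue(visits, clickRatePercent, centsPerClick) {
  const clicks = Math.round((visits * clickRatePercent) / 100);
  return (clicks * centsPerClick) / 100;
}

console.log(monthlyAdRevenue(1000000, 1, 10)); // 1000
```

Keep in mind that if the visit count is inflated, the revenue estimate is inflated right along with it.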
One problem is that stats can be
grossly overinflated and the owner might not even know, like if the original
webmaster slipped an AutoRefresh into the source code while constructing
the site, thus giving the owner the illusion that advertising is a viable
option. Another problem is that you're bound to lose a certain core
readership if suddenly the site 'goes commercial'.
And there are further problems, such
as the 'moral dilemma' of what ads to run. Do you run the big "IMPEACH
BUSH!" banner (5,000 hits/day) or the one for this really excellent book
you just finished (3 hits/day)? Do you run the crass "SHOOT THE DANCING
MONKEY!" banner (1,500 hits/day) and lower the overall IQ of your site
by 50 points, or do you run the respectable and stately Verizon banner
(7 hits/day)?
And then there's the biggest problem
of all. Once "numbers" are the game, you start thinking about altering
your format or 'site theme' to draw more people. You start writing
articles for their linkability and popularity, rather than from the heart.
You bring in co-bloggers to sparkle the place up, thereby driving up the
numbers and, in the process, destroying that undefinable element that made
the blog special in the first place.
Hit Counters
About the last thing you should
believe on a web site is a hit counter. I did an AutoRefresh experiment
years ago using a counter and the numbers were spinning around like a slot
machine. And you can start the numbers anywhere, just to note.
I've, uh, heard that some unethical webmasters start their clients' counters
at "10,000", just to give the new business an established look. Hey,
if a bank can start out a checking account at check #1000, then surely
I... er, those unethical webmasters can do the same thing!
'Raw' Hits
When it comes to 'raw' hits (that
is, hits with no further information, such as which pages were accessed),
there are a bunch of ways to generate them. You could have a secret
page that has an AutoRefresh set to 1 second, so basically it reloads itself
the second it's loaded, then open a hundred of them on your own computer
and just let those ol' hits mount up. Ditto running a DOS batch file
and ftp'ing the site every 2 seconds (times 100 batch files) 24 hours a
day. It wouldn't count as far as advertisers and those who know how
to dig up the stats, but it would sure impress the newcomers.
This site has received
130,563,642 HITS
in the past day!
Summation
Unfortunately, it doesn't appear
that the fraudulence out there is going to be cleaned up anytime soon.
There are groups pushing for this or that measuring standard, but there
will always be ways to pad the stats. If not internally with tricks
like AutoRefresh, then externally with actual computers faking their way
in using randomly-generated IP addresses and the like.
Bottom line? Don't believe
a single stat you see.