November 10, 2004

Search Optimization Reflexions (Part III)

In our last two installments on search engine optimization, we introduced the mighty search engine and briefly discussed its glamorous evolutionary history. We have touched briefly on the differences between ethical and unethical search engine positioning, and we suggested that our next installment would discuss the nasty things that you can do to exploit weaknesses in search engines and get yourself banned for life.

Well, then. What we promise, we deliver! Sit back and relax as we hand you the most abysmal search engine practices you could possibly employ; practices whose very utterance will curl the toes of any legitimate web designer!

Pruning the Search Ecosystem

Google knows that its strength is in its relevant search results, so the search engine is constantly on the lookout for code that attempts to exploit it. Some of these exploits are simple and innocent, some are just plain ignored, while others are so outright filthy that they will get your site banned from Google. I know of people who have gotten themselves banned from Google because they didn’t listen to sound advice, and went with a search engine optimization company that used these unethical practices. Once you get blacklisted from Google, it is a long hard climb to get yourself back on.

It is important to understand that Google is always adapting to the changing ecosystem of the internet. An optimization exploit that works today may not work tomorrow. If you are using an uncommon (or recently discovered) exploit, you may be able to fly under the radar for months before getting smacked. The smack that comes (and it will come, if you’re not playing by the rules) may take you down a few notches in relevancy, or it may get you blocked entirely.

Exploiting search engine vulnerabilities is extremely dangerous, akin to playing with fire. The more you play and the further you go, the more likely it is that you will get burned. It is not a matter of if, but when.

But what of these exploits? What are we looking for? More importantly, what sort of things do we never want to appear on our websites? Let’s get down to the bare nitty-gritty.

Meta Tag Spam

The act of spamming meta tags was popularized in the mid-90s by pr0n0graphy sites the world over. Meta tags are intended to help describe to users (and search engines) what sort of content appears on a page. The most popular meta tags are “keywords” and “description”, and a common exploit is to put hundreds of words in these fields, repeating the most coveted search words many times. An exploited meta tag would look like so:

<META NAME="keywords" CONTENT="transformers autobots decepticons
robots transformers autobots decepticons optimus prime megatron
rodimus prime wage battle wars cartoon show television animation
transformers autobots decepticons robots transformers autobots
decepticons optimus prime megatron rodimus prime wage battle wars
cartoon robots transformers autobots decepticons optimus prime
megatron rodimus prime wage battle wars cartoon show television
animation transformers autobots decepticons show television
animation transformers autobots decepticons transformers autobots
decepticons robots transformers autobots decepticons optimus
prime megatron rodimus prime wage battle wars cartoon show
television animation transformers autobots decepticons robots
transformers autobots decepticons optimus prime megatron rodimus
prime wage battle wars cartoon robots transformers autobots
decepticons optimus prime megatron rodimus prime wage battle
wars cartoon show television animation transformers autobots
decepticons show television animation transformers autobots">

Note that the tag tries to call on every word that one might utter in reference to Transformers, regardless of whether or not that particular reference actually appears on the page. Also, the tag repeats these words multiple times, in an attempt to saturate the page with keywords and increase relevancy with search engines. If anything, this technique will make search engines more hostile towards your website, and will frustrate your users as they are forced to download the unnecessary content when they visit your page.
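
To make the repetition concrete, here is a minimal Python sketch that counts how often each word shows up in a keywords string. The function name `keyword_repetition` and the shortened sample string are made up for illustration, and this is only a toy tally, not a description of how any search engine actually scores meta tags.

```python
from collections import Counter


def keyword_repetition(content: str) -> Counter:
    """Count how many times each word appears in a meta keywords string."""
    # Lowercase the text and split on whitespace and commas, since
    # keyword lists may be separated either way.
    words = content.lower().replace(",", " ").split()
    return Counter(words)


# Hypothetical, shortened sample in the spirit of the stuffed tag above.
sample = (
    "transformers autobots decepticons robots "
    "transformers autobots decepticons optimus prime megatron"
)
counts = keyword_repetition(sample)

# Words that occur more than once are the ones being stuffed.
repeated = sorted(word for word, n in counts.items() if n > 1)
print(repeated)
```

Running the same tally over the full tag would show a handful of coveted words repeated over and over, which is exactly the pattern that makes the tag look like spam.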

Content Spam

Content spamming comes in many different forms. Many times, unethical optimizers will take the same content they used for spamming the meta tags, and repeat it as body content within the website. The hope here is that even if the search engine ignores our spammed keywords, and even if it ignores our spammed description, it will still pick up on our spammed content and boost our relevancy appropriately.

The main problem with content spam (beyond its uselessness to internet users) is that it makes a real mess of the page. Many web pages are too cluttered as it is, and including content spam on a page will ultimately make a website impossible for actual humans to use.

Page content needs to be efficient and terse for people to digest it effectively. If they are forced to wade through incoherent lists of words, they will quickly get frustrated and go elsewhere. Since search engines don't yet have credit cards, we must make sure that the content on our website is clean and useful for our visitors. One would expect that this requirement prevents us from spamming content directly onto our website, but awful solutions abound for this equally awful problem.

Hidden Content Spam

Hidden content spam attempts to feed invisible content to search engines, keeping a page legible for humans while still trying to boost its relevancy for search engines. There are many ugly exploits, all of which skew the true relevancy of a page while requiring that people download content they can't see:

  • Use really small text that is all but invisible to users
  • Use text that is the same color as the background
  • Comment out text with HTML <!-- --> comment tags
  • Use various CSS techniques like negative margins, display: none, visibility: hidden, etc.
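
As a rough illustration of how easy the CSS variety is to spot, here is a minimal Python sketch that flags tags hidden by an inline style. The class name `HiddenStyleFinder` and the sample page are hypothetical, and the check is deliberately naive: it only looks at inline `style` attributes for `display: none` and `visibility: hidden`, and it knows nothing about external stylesheets, tiny text, or matching colors.

```python
from html.parser import HTMLParser


class HiddenStyleFinder(HTMLParser):
    """Collect tags whose inline style hides them from human readers."""

    # Two declarations from the list above, written without spaces.
    HIDING = ("display:none", "visibility:hidden")

    def __init__(self):
        super().__init__()
        self.hidden_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        style = dict(attrs).get("style") or ""
        normalized = style.replace(" ", "").lower()
        if any(rule in normalized for rule in self.HIDING):
            self.hidden_tags.append(tag)


# Hypothetical page fragment for illustration.
page = (
    '<p>Welcome to my robot fan page.</p>'
    '<div style="display: none">robots robots robots</div>'
    '<span style="visibility:hidden">cartoon cartoon</span>'
)
finder = HiddenStyleFinder()
finder.feed(page)
print(finder.hidden_tags)
```

If a few lines of script can find this much, it seems safe to assume that a search engine with far more resources can do at least as well.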

Improper Content Weighting

Search engines assign a lot of value to the actual content on a page. Content appearing higher on the page (or higher in the code) will be valued more than the content that appears below it. Additionally, content wrapped in header tags will be valued more strongly than content in paragraph tags.

However, just as with meta tags and keywords, there is a limit to how many headlines you can have before they start to dilute one another. Many unethical search optimizers will wrap large swaths of page content in <h1> tags styled to look like <p> tags, in an attempt to skew the weight and relevancy for search engines.

This doesn't work as intended. If everything on the page is a headline, who is to say that everything is equally relevant rather than equally irrelevant? The only effect that this heavy-handed exploit can hope to achieve is the destruction of relevant weighting on a website.
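
One way to picture the dilution is to measure what fraction of a page's text sits inside <h1> tags. The Python sketch below does that for two made-up pages; the names `HeadingShare` and `h1_share` are hypothetical, and the measurement is only an illustration of the idea, not a claim about how search engines weigh headlines.

```python
from html.parser import HTMLParser


class HeadingShare(HTMLParser):
    """Measure how much of a page's text sits inside <h1> tags."""

    def __init__(self):
        super().__init__()
        self.h1_depth = 0      # greater than 0 while inside an <h1>
        self.h1_chars = 0
        self.total_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.h1_depth += 1

    def handle_endtag(self, tag):
        if tag == "h1" and self.h1_depth > 0:
            self.h1_depth -= 1

    def handle_data(self, data):
        length = len(data.strip())
        self.total_chars += length
        if self.h1_depth > 0:
            self.h1_chars += length

    def share(self) -> float:
        return self.h1_chars / self.total_chars if self.total_chars else 0.0


def h1_share(html: str) -> float:
    parser = HeadingShare()
    parser.feed(html)
    parser.close()
    return parser.share()


# Hypothetical pages: one with a single headline, one with everything in <h1>.
honest = "<h1>Robots</h1><p>A long paragraph about robots.</p>"
abused = "<h1>Robots</h1><h1>A long paragraph about robots.</h1>"
honest_share = h1_share(honest)
abused_share = h1_share(abused)
print(round(honest_share, 2), abused_share)
```

On the abused page the share is the whole page, so the headline markup no longer distinguishes anything from anything else.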

Multiple Domains

Some search optimizers claim that pointing multiple domains to the same website, or hosting the same website on multiple IP addresses, will help boost results. This isn't really the case, as most search engines will go to great lengths to avoid listing identical or redundant search results.

Link Farming

Link farming is where you create a site whose only purpose is to link to other sites. These sites typically have huge lists of poorly indexed hyperlinks that all link to websites that are "related" to particular subjects and terms. The linked websites are typically other clients of the search engine optimizer, and will often be competitors in your industry.

Since every link in a link farm is somewhat relevant, and the hyperlink text relates to the desired search word, the expectation is that the sheer volume of linked content will be considered highly relevant and will drive your site to the top.

However, because of their tortured structure and bloated code, link farms are not useful for actual internet users, and are maintained purely for search engines. Contrary to popular belief, search engines are not enthused by link farms and other sites that attempt to cater directly to them. When a link farm is detected, both it and the sites it actively links to will actually be rated lower in relevant search results.
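
A crude way to see the difference between a link farm and a page written for people is to compare the number of hyperlinks with the number of words that appear outside of them. The Python sketch below is only an illustration under that assumption; the class name `LinkCounter`, the helper `count_links`, and both sample pages are made up, and real detection is surely more involved.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Count hyperlinks and the words that appear outside of them."""

    def __init__(self):
        super().__init__()
        self.in_link = 0               # greater than 0 while inside an <a>
        self.links = 0
        self.words_outside_links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link += 1
            # Only anchors with an href count as hyperlinks.
            if dict(attrs).get("href"):
                self.links += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link > 0:
            self.in_link -= 1

    def handle_data(self, data):
        if self.in_link == 0:
            self.words_outside_links += len(data.split())


def count_links(html: str) -> LinkCounter:
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter


# Hypothetical pages: a bare list of links, and a sentence with one link.
farm = count_links(
    '<a href="http://example.com/a">robot toys</a> '
    '<a href="http://example.com/b">robot cartoons</a> '
    '<a href="http://example.com/c">robot wars</a>'
)
article = count_links(
    '<p>Here is a long review of one '
    '<a href="http://example.com/a">robot toy</a> '
    'that I bought last week.</p>'
)
print(farm.links, farm.words_outside_links)
print(article.links, article.words_outside_links)
```

The farm page is nothing but links, while the article wraps its single link in actual writing, which is the distinction the paragraph above is getting at.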

Portal Sites

Many unethical search optimizers will register domain names and create portal sites that use all of the above exploits, then hyperlink them to the actual website that they want to appear relevant. Some of the most inexcusable examples that I have seen take a throw-away domain, place a hyperlinked screenshot of the site on it, and toss in hundreds of kilobytes of hidden content.

Thus Concludes Thine Beating

And so wraps up our survey of unethical search engine exploitations. Some of them are nastier than others, some are simply clumsy, and others are outright disgusting. If you ever feel uncomfortable with the course of action you may be pursuing with the guts of your website, just keep this one question in mind: Would I feel comfortable telling my mother what I'm doing?

In our next (and last!) installment on search engine optimization, we will finally discuss ethical techniques for increasing your search engine relevancy, and avoiding that greasy feeling you get from all the techniques we have already discussed.

Ground breaking work, Ethics on the Internet. You could use a good keyword slogan like, “It’s not just for p0rn anymore!” or “Penguins for Programming Preservation.” That way, when the rogue pervert with naturalist views and a love of Linux starts his evening search on just about anything he is interested in, BrainSideOut’s essay on Internet Ethics has a chance to be the first thing he runs into.
Now I’m rambling again.

Dude. Excellent idea. If nothing else, the internet has proven that there is a market for free-association such as this… though that market is rather strange and mysterious and isn’t much of a market at all, more like a collection of weirdos with bizarre browsing tastes.
Pr0n, Penguins and Linux… those take me back to pages of scat porn ripped out of a magazine and taped to our door, with “Doug is the Devil!” scrawled on them in lipstick.
Pretending to be a penguin and smashing one’s head through the wall, only to blame it on kung-fu practice when it came time to check out of the apartment.
Laughing our guts out at Matrix Reloaded during the powerplant scene, when we all suddenly realized that the Matrix was running on Linux servers. Strangely enough, the saboteur was supposed to be hacking into the powerplant control system, when all she actually did was SSH into localhost.
Ahh, the memories of E101 Stadium. “Go home, Matt Grimm!”

Thanks for mentioning the pictures of scat p0rn, I had effectively blocked that out of my memory. Next you’ll bring up the gay version of Men’s Health. D’oh.
“Autoformat this mother &^$%er!”

in hidden content spam, you forgot to mention a couple of grotesqueries I know about 🙂
a> not only CSS, but JavaScript is popular for hiding content spam: document.getElementById('whoopie').style.display='none';
b> search engines have begun to use pattern matching to determine if the word order reeks of spam (I mean, humans can tell that ‘chicken soup chicken soup chicken soup’ is a bit OCD), so SEOs have begun to make more natural-language-like content spam.

Cool. Thanks for the additional info, Jesse. Bringing the work of these nasty critters into the light is the best way to confront them.
I also forgot to mention blog spam, where bots will crawl a weblog and place comments that may appear to have innocuous text to get past spam filters, but actually link to a site that is to be optimized.
In the blog spam technique, too, you often see a lot of improper content weighting, with <h1> tags styled to look like <p> tags.
There is also a technique that tries to work like a reverse link farm. A link farm will be set up, but instead of linking to sites to be optimized, it will link to sites that already have high (and ethical) search visibility, in an attempt to give the link farm more legitimacy in the eyes of the search engine.
I really enjoy the poetic nature of the natural language content spam. It’s free association at its finest!