Thursday, May 25, 2006

How Search Engines Work???

You know how important it is to score high in the SERPs. But your site isn't reaching the first three pages, and you don't understand why. It could be that you're confusing the web crawlers that are trying to index it. How can you find out? Keep reading.

You have a masterful website, with lots of relevant content, but it isn’t coming up high in the search engine results pages (SERPs). You know that if your site isn’t on those early pages, searchers probably won’t find you. You can’t understand why you’re apparently invisible to Google and the other major search engines. Your rivals hold higher spots in the SERPs, and their sites aren’t nearly as nice as yours.

Search engines aren’t people. In order to handle the tens of billions of web pages that comprise the World Wide Web, search engine companies have almost completely automated their processes. A software program isn’t going to look at your site with the same “eyes” as a human being. This doesn’t mean that you can’t have a website that is a joy to behold for your visitors. But it does mean that you need to be aware of the ways in which search engines “see” your site differently, and plan around them.

Despite the complexity of the web and the need to handle all that data at speed, search engines actually perform only a short list of four operations in order to return relevant results to their users. Each of these operations can go awry in certain ways. It isn’t so much that the search engine itself has gone awry; it may have simply encountered something that it was not programmed to deal with. Or the way it was programmed to deal with whatever it encountered led to less than desirable results.

Understanding how search engines operate will help you understand what can go wrong. All search engines perform the following four tasks:

* Web crawling. Search engines send out automated programs, sometimes called “bots” or “spiders,” which use the web’s hyperlink structure to “crawl” its pages. According to some of our best estimates, search engine spiders have crawled maybe half of the pages that exist on the Internet.


* Document indexing. After spiders crawl a page, its content needs to be put into a format that makes it easy to retrieve when a user queries the search engine. Thus, pages are stored in a giant, tightly managed database that makes up the search engine’s index. These indexes contain billions of documents, which are delivered to users in mere fractions of a second.


* Query processing. When a user queries a search engine, which happens hundreds of millions of times each day, the engine examines its index to find documents that match. Queries that look superficially the same can yield very different results. For example, searching for the phrase “field and stream magazine,” without quotes around it, yields more than four million results in Google. Do the same search with the quote marks, and Google returns only 19,600 results. This is just one of many modifiers a searcher can use to give the database a better idea of what should count as a relevant result.


* Ranking results. Google isn’t going to show you all 19,600 results on the same page – and even if it did, it needs some way to decide which ones should show up first. Thus, the search engine runs an algorithm on the results to calculate which ones are most relevant to the query. These are shown first, with all the others in descending order of relevance.
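To make the four tasks a little more concrete, here is a minimal sketch in Python. It is a toy, not how any real search engine is built: the three "crawled" pages in `PAGES` are invented, the index is a simple word-to-positions map, and the ranking is a crude word count standing in for a real relevance algorithm. It does show why the same words with and without quote marks can return different results.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text (invented data).
PAGES = {
    "a.html": "Field and Stream magazine covers hunting and fishing",
    "b.html": "The magazine stand by the stream and the field",
    "c.html": "Fishing tips for every stream and more fishing",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Document indexing: map each word to {url: [positions in that page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(position)
    return index


def search_all_words(index, query):
    """Unquoted query: pages that contain every word, in any order."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    return results


def search_phrase(index, phrase):
    """Quoted query: pages where the words appear next to each other, in order."""
    words = tokenize(phrase)
    hits = set()
    for url in search_all_words(index, phrase):
        for start in index[words[0]][url]:
            if all(start + i in index[w][url] for i, w in enumerate(words)):
                hits.add(url)
                break
    return hits


def rank(index, query, urls):
    """Ranking: order pages by how often the query words occur (a crude score)."""
    words = tokenize(query)

    def score(url):
        return sum(len(index.get(w, {}).get(url, [])) for w in words)

    return sorted(urls, key=lambda u: (-score(u), u))


INDEX = build_index(PAGES)
```

With this toy data, `search_all_words(INDEX, "field and stream magazine")` matches both `a.html` and `b.html`, while `search_phrase` with the same words matches only `a.html`, because only that page has the words side by side in that order.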

Now that you have some idea of the processes involved, it’s time to take a closer look at each one. This should help you understand how things go right, and how and why these tasks can go “wrong.” This article will focus on web crawling, while a later article will cover the remaining processes.

You’re probably thinking chiefly of your human visitors when you set up your website’s navigation, as well you should. But certain kinds of navigation structures will trip up spiders, making it less likely for those visitors to find your site in the first place. As an added bonus, many of the things you do to make it easier for a spider to find content will also make it easier for visitors to navigate your site.

It’s worth keeping in mind, by the way, that you might not want spiders to be able to index everything on your site. If you own a site with content that users pay a fee to access, you probably don’t want a Google bot to grab that content and show it to anyone who enters the right keywords. There are ways to deliberately block spiders from such content. In keeping with the rest of this article, which is intended mainly as an introduction, they will only be mentioned briefly here.

Dynamic URLs are one of the biggest stumbling blocks for search engine spiders. In particular, pages with two or more dynamic parameters will give a spider fits. You know a dynamic URL when you see it; it usually has a lot of “garbage” in it such as question marks, equal signs, ampersands (&) and percent signs. These pages are great for human users, who usually get to them by setting certain parameters on a page. For example, typing a zip code into a box at weather.com will return a page that describes the weather for a particular area of the US – and a dynamic URL as the page location.
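If you want to check your own URLs for this, a rough sketch like the one below can count the dynamic parameters in each address. It uses Python's standard `urllib.parse` module; the function names, the example URLs, and the "more than one parameter" threshold are assumptions made for illustration, based only on the rule of thumb above.

```python
from urllib.parse import urlsplit, parse_qsl


def count_dynamic_parameters(url):
    """Return how many name=value pairs appear after the '?' in a URL."""
    query = urlsplit(url).query
    return len(parse_qsl(query, keep_blank_values=True))


def looks_risky_for_spiders(url, max_params=1):
    """Flag URLs with two or more dynamic parameters (an assumed threshold)."""
    return count_dynamic_parameters(url) > max_params


# Hypothetical URLs, for illustration only.
print(looks_risky_for_spiders("http://www.example.com/weather?zip=17601&units=f"))
print(looks_risky_for_spiders("http://www.example.com/weather/17601"))
```

The first URL carries two parameters and is flagged; the second has none and is not.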

There are other ways in which spiders don’t like complexity. For example, a page with more than 100 unique links to other pages on the same site can overwhelm a spider, and it may not follow every link. If you are trying to build a site map, there are better ways to organize it.

Pages that are buried more than three clicks from your website’s home page also might not be crawled. Spiders don’t like to go that deep. For that matter, many humans can get “lost” on a website with that many levels of links if there isn’t some kind of navigational guidance.
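One way to spot buried pages is to measure how many clicks each page is from the home page. The sketch below does this with a breadth-first search over a map of internal links. The `SITE` structure and its paths are invented for the example, and the three-click limit is simply the rule of thumb mentioned above, not a documented crawler setting.

```python
from collections import deque


def click_depths(links, home):
    """Return {page: fewest clicks needed to reach it from the home page}."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_clicks=3):
    """List pages that sit more than max_clicks away from the home page."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/blue"],
    "/products/widgets/blue": ["/products/widgets/blue/specs"],
}

print(too_deep(SITE, "/"))
```

Here only the specs page is reported, since it takes four clicks to reach. Linking to it from a shallower page, or from a site map, would bring it back within range.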

Pages that require a “Session ID” or cookie to enable navigation also might not be spidered. Spiders aren’t browsers, and don’t have the same capabilities. They may not be able to retain these forms of identification.

Another stumbling block for spiders is pages that are split into “frames.” Many web designers like frames because they keep page navigation in one place even when a user scrolls through content. But spiders find pages with frames confusing. To them, content is content, and they have no way of knowing which pages should go in the search results. Frankly, many users don’t like pages with frames either; rather than providing a cleaner interface, such pages often look cluttered.

Most of the stumbling blocks above are ones you may have accidentally put in the way of spiders. This next set of stumbling blocks includes some that website owners might use on purpose to block a search engine spider. While I mentioned one of the most obvious reasons for blocking a spider above (content that users must pay to see), there are certainly others: the content itself might be free, but should not be easily available to everyone, for example.

Pages that can be accessed only after filling out a form and hitting “Submit” might as well be closed doors to spiders. Think of them as not being able to push buttons or type. Likewise, pages that require use of a drop down menu to access might not be spidered, and the same holds true for documents that can only be accessed via a search box.

Documents that are purposefully blocked will usually not be spidered. This can be handled with a robots meta tag or robots.txt file. You can find other articles that discuss the robots.txt file on SEO Chat.
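As a quick illustration of the robots.txt approach, the sketch below uses Python's standard `urllib.robotparser` module to test whether a well-behaved spider would be allowed to fetch a given page. The rules in `ROBOTS_TXT`, the `/members/` path, the `ExampleBot` name, and the example.com URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all spiders out of the paid members area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def spider_may_fetch(url, user_agent="ExampleBot"):
    """Return True if the rules above allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


print(spider_may_fetch("http://www.example.com/index.html"))
print(spider_may_fetch("http://www.example.com/members/report.html"))
```

Keep in mind that robots.txt is a request, not a lock: it only affects spiders that choose to honor it, so it should not be the only thing protecting paid content.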

Pages that require a login block search engine spiders. Remember the “spiders can’t type” observation above. Just how are they going to log in to get to the page?

Finally, I’d like to make a special note of pages that redirect before showing content. Not only will that not get your page indexed, it could get your site banned. Search engines refer to this tactic as “cloaking” or “bait-and-switch.” You can check Google’s guidelines for webmasters (http://www.google.com/intl/en/webmasters/guidelines.html) if you have any questions about what is considered legitimate and what isn’t.

Now that you know what will make spiders choke, how do you encourage them to go where you want them to? The key is to provide direct HTML links to each page you want the spiders to visit. Also, give them a shallow pool to play in. Spiders usually start on your home page; if any part of your site cannot be accessed from there, chances are the spider won’t see it. This is where use of a site map can be invaluable.

Now what? The anchor text hasn’t changed, so the link will still look the same when the web browser displays it. But a spider will think, “Okay, not only is this page relevant to the term ‘SEO Chat,’ it is also relevant to the phrase ‘Great Site for SEO Info.’ And hey, there’s a relationship between the page I’m crawling now and this hyperlink! It says that this link doesn’t count as a ‘vote’ for the page being linked to. Okay, so it won’t add to the page rank.”

That last point, about the link not counting as a vote for the page being linked to, is what rel="nofollow" does. This attribute value evolved to address the problem of people submitting linked comments to blogs that said things like "Visit my pharmaceuticals site!" That kind of comment is an attempt by the commenter to raise his own website's position in the search engine rankings. It's called comment spam, by the way; most major search engines don't like comment spam because it skews their results, making them less relevant. As you may have guessed, then, the “nofollow” value in the “rel” attribute is specifically for search engines; it really isn't there to be noticed by anyone else. Yahoo!, MSN, and Google recognize it, but AskJeeves does not support nofollow; its crawler simply ignores it.

In some cases, a link may be assigned to an image. The hyperlink would then include the name of the image, and might include some alternate text in an “alt” attribute, which can be helpful for voice-based browsers used by the blind. It also helps spiders, because it gives them another clue for what the page is about.

Hyperlinks may take other forms on the web, but by and large those forms do not pass ranking or spidering value. In general, the closer a link is to the classic href="URL", the easier it is for a spider to follow a link, and vice versa.
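To see roughly what a spider can pull out of a hyperlink, here is a small sketch built on Python's standard `html.parser` module. It collects each link's URL, anchor text, title, image alt text, and whether rel="nofollow" is present. The `LinkInspector` class, the sample HTML, and the example.com addresses are invented for illustration; real crawlers are far more elaborate.

```python
from html.parser import HTMLParser


class LinkInspector(HTMLParser):
    """Collect the clues a spider might read from each <a href="..."> link."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            rel_values = (attrs.get("rel") or "").lower().split()
            self._current = {
                "href": attrs["href"],
                "title": attrs.get("title"),
                "nofollow": "nofollow" in rel_values,
                "text": "",
                "alt": None,
            }
        elif tag == "img" and self._current is not None:
            # An image used as a link: its alt text is another clue.
            self._current["alt"] = attrs.get("alt")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = """
<a href="http://www.example.com/" title="Great Site for SEO Info" rel="nofollow">SEO Chat</a>
<a href="/logo-page.html"><img src="logo.png" alt="Company logo"></a>
"""

inspector = LinkInspector()
inspector.feed(SAMPLE_HTML)
for link in inspector.links:
    print(link)
```

The first link is reported with its anchor text, its title, and the nofollow flag set. The second has no anchor text at all, so the alt text on the image is the only description a spider gets.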
