Google search engine question

Penelope · Jul 9, 2009

Why does a .pdf for a website come up before any of the other pages of the same site? Any way to fix that?

swirt · Jul 9, 2009

There are a lot of possible reasons. (I'd need to know what the search is to be more specific.) Here are a few:

1. The site has little, if any, actual text (an all-image site), so the PDF ends up having more searchable content than the actual pages of the site.
2. The site's navigation may be built in a way the spider can't follow (JavaScript or Flash navigation), so most of the site's pages have no "visible" links, while the links to the PDF are straight HTML links, so they are "visible" and end up giving more "link love" to the PDF.
3. Content, content, content. The PDF may simply be a better source of content for that particular search than any other page. Google's intent is to return the best source of content to match the search.
4. Somebody screwed up when they set up a robots.txt file and blocked the actual pages of the site to robots but left the PDF unblocked. (Usually you do it the other way around, but I have seen many errors of this kind.)
5. Errors in the pages make them impossible to crawl correctly, so they have little visible content.
6. Incoming links. If for some reason a lot of people are linking to the PDF, it could give it more weight than other pages on the site.

Fixing it really depends on which of the above was the cause.
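
If you want a rough way to check the first two possibilities above yourself, here is a minimal Python sketch. It is not how Google's spider actually works, only an approximation of what a simple crawler can see: the plain text on a page and the straight HTML links. The sample page, the file names in it, and the `summarize` helper are all made up for illustration.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Collects what a simple crawler could see: plain text and <a href> links."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            # javascript: pseudo-links are not followable as straight HTML links
            if href and not href.lower().startswith("javascript:"):
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script/style bodies and whitespace-only runs
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def summarize(html):
    """Return the word count of visible text and the list of plain links."""
    parser = PageSummary()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts)
    return {"words": len(text.split()), "links": parser.links}


# Hypothetical image-heavy page with JavaScript navigation and one plain PDF link
sample = """
<html><body>
<img src="inn.jpg" alt="Our inn">
<a href="/brochure.pdf">Brochure</a>
<a href="javascript:openRooms()">Rooms</a>
<script>function openRooms() { window.location = "/rooms.html"; }</script>
</body></html>
"""
result = summarize(sample)
print(result)
```

On a page like this sample, almost no text is visible and the only link a spider can follow is the one to the PDF, which is the situation described in the first two items.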

Penelope · Jul 9, 2009

Okay, #4 has me worried. I really hope that it isn't my fault that the .pdf comes up first because I don't know how to fix it.
Can you expand on the robots.txt file explanation please?

swirt · Jul 9, 2009

penelope said:
Okay, #4 has me worried. I really hope that it isn't my fault that the .pdf comes up first because I don't know how to fix it.
Can you expand on the robots.txt file explanation please?

robots.txt is a file that can be placed in the root of your website. Search engine spiders check it to see whether they have permission to crawl everything on the site. If you go to www.yoursite.com/robots.txt you can see your robots.txt file. I would say 90% of websites don't have one, which is fine. It is only used to exclude pages, so if there is nothing to exclude, you don't need one.
Over the years I've been contacted by a few people who were sure they had been banned, or thought the search engines were plotting against them, only to discover that their robots.txt file was blocking all pages. They were either victims of an accident or they pissed off their webmaster, who was less than scrupulous (does that make them scrupuless?) and left a little going-away present.
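
To make the kind of mistake described above concrete, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the file names, and the `allowed` helper are hypothetical; www.yoursite.com is a placeholder for your own domain. One caveat: Python's parser applies the first rule that matches, while Google uses the most specific matching rule. This example is written so that both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks every page but leaves one PDF open
BROKEN_ROBOTS = """\
User-agent: *
Allow: /brochure.pdf
Disallow: /
"""


def allowed(robots_text, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/", "/rooms.html", "/brochure.pdf"):
    url = "http://www.yoursite.com" + path
    print(path, "allowed" if allowed(BROKEN_ROBOTS, url) else "blocked")
```

With a file like this, only the PDF is crawlable. An empty or missing robots.txt blocks nothing, so deleting the bad rules (or the whole file) is usually enough to undo the damage.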
