
Google search engine question

INNspiring.com | Innkeeper Forum & Innkeeping Resources


Penelope

Well-known member
Joined
Aug 4, 2008
Messages
1,716
Reaction score
0
Why does a .pdf for a website come up before any of the other pages of the same site? Any way to fix that?
 

swirt

Forum founder. Former Owner.
Joined
May 17, 2008
Messages
3,210
Reaction score
0
There are a lot of possible reasons. (I'd need to know the exact search to be more specific.) Here are a few:
  1. The site has little, if any, actual text (an all-image site), so the PDF ends up with more searchable content than the actual pages of the site.
  2. The site's navigation may be built in a way the spider can't follow (JavaScript or Flash navigation), so most of the pages have no "visible" links pointing to them. If the links to the PDF are plain HTML links, they are "visible" and end up giving more "link love" to the PDF.
  3. Content, content, content. The PDF may simply be a better source of content for that particular search than any other page. Google's intent is to return the best source of content to match the search.
  4. Somebody screwed up when setting up a robots.txt file and blocked the actual pages of the site from robots while leaving the PDF unblocked. (Usually you do it the other way around, but I have seen many errors of this kind.)
  5. Errors in the pages make them impossible to crawl correctly, so they have little visible content.
  6. Incoming links. If for some reason a lot of people are linking to the PDF, it could carry more weight than other pages on the site.
The fix really depends on which of the above is the cause.
 

Penelope

Well-known member
Joined
Aug 4, 2008
Messages
1,716
Reaction score
0
Okay, #4 has me worried. I really hope it isn't my fault that the .pdf comes up first, because I don't know how to fix it.
Can you expand on the robots.txt file explanation, please?
 

swirt

Forum founder. Former Owner.
Joined
May 17, 2008
Messages
3,210
Reaction score
0
robots.txt is a file that can be placed in the root of your website. Search engine spiders check it to see which parts of the site they have permission to crawl. If you go to www.yoursite.com/robots.txt you can see your robots.txt file, if you have one. I would say 90% of websites don't have one, which is fine. It is only used to exclude pages, so if there is nothing to exclude, you don't need one.
Over the years I've been contacted by a few people who were sure they had been banned, or thought the search engines were plotting against them, only to discover that their robots.txt file was blocking all pages. They were either victims of an accident, or they pissed off a webmaster who was less than scrupulous (does that make them scrupuless?) and left a little going-away present.
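If you want to test a robots.txt file rather than read it by eye, here is a rough sketch using Python's standard urllib.robotparser module. The two sample files and the page names (rooms.html, brochure.pdf) are made up for illustration: the first shows the kind of mistake described in #4, the second shows the "block everything" case. Python's parser applies the first rule that matches, which may not be exactly how every search engine resolves conflicts, so treat the result as a quick sanity check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths this robots.txt forbids the given crawler to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Hypothetical file with the mistake from #4: pages blocked, PDF allowed.
PDF_ONLY = """\
User-agent: *
Allow: /brochure.pdf
Disallow: /
"""

# Hypothetical "going-away present": every page blocked.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Made-up paths standing in for a site's home page, a normal page, and the PDF.
PATHS = ["/", "/rooms.html", "/brochure.pdf"]

print("PDF_ONLY blocks: ", blocked_paths(PDF_ONLY, PATHS))
print("BLOCK_ALL blocks:", blocked_paths(BLOCK_ALL, PATHS))
```

To check your own site, paste the text you see at www.yoursite.com/robots.txt in place of one of the samples and list a few of your real page paths. An empty list means nothing in the file is blocking those pages.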
 