Google ignoring robots.txt files

For a while now, on one of my sites, it looked as though Google was ignoring my robots.txt file.

I say this because a directory I had specifically listed in my “Disallow” lines had been indexed by Google. I didn’t really think anything of it beyond “stupid Google”, because it wasn’t affecting anything.

Now, one of the pages I had excluded was a secure (https) page, and it contained a link to another page. Google, of course, followed that link, so I now have pages indexed under both http and https, and I have been searching the internet for a way to get rid of the https versions.

Backtracking a bit, my robots.txt file looked like this:

User-agent: *
Disallow: /downloads/
Disallow: /bplanx/
Disallow: /cgi-bin/
Disallow: /uk/
Disallow: /uk-reviewed/
Disallow: /small-business/
Disallow: /business-plan-books/
Disallow: /orders/
Disallow: /products/

User-agent: googlebot
Disallow: /business-directory/shopping/
Disallow: /business-directory/lifestyle/
Disallow: /business-directory/motoring/

So the file had a separate entry for googlebot. Reading up on this, it seems that the googlebot entry overrides the general User-agent: * entry, so googlebot was not reading any of the disallows I had meant for all bots. To keep them, I would have needed to repeat those lines under my specific googlebot entry.

AHHHHHHHHHHHHHHH

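So it isn’t really a bug in Google: a crawler obeys only the most specific User-agent entry that matches it, and the User-agent: * entry applies only to crawlers that have no entry of their own. That rule goes back years. I have now deleted the googlebot entry, so all the disallows ARE for all bots.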
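You can check the override for yourself with a few lines of Python. This is a rough sketch that uses Python’s standard-library urllib.robotparser rather than Google’s own parser; it differs from Googlebot in some details (for example, how overlapping Allow and Disallow rules are resolved), but it applies the same rule that a named User-agent entry takes the place of the * entry. The domain www.example.com, the file name and the use of Bingbot as a stand-in for “any other crawler” are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from this post, before the fix.
ORIGINAL = """\
User-agent: *
Disallow: /downloads/
Disallow: /bplanx/
Disallow: /cgi-bin/
Disallow: /uk/
Disallow: /uk-reviewed/
Disallow: /small-business/
Disallow: /business-plan-books/
Disallow: /orders/
Disallow: /products/

User-agent: googlebot
Disallow: /business-directory/shopping/
Disallow: /business-directory/lifestyle/
Disallow: /business-directory/motoring/
"""

# The same file with the googlebot entry deleted (the fix described above).
FIXED = ORIGINAL.split("\n\nUser-agent: googlebot")[0] + "\n"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URL on a placeholder domain, inside the /downloads/ directory.
URL = "https://www.example.com/downloads/some-file.pdf"

for name, txt in (("original", ORIGINAL), ("fixed", FIXED)):
    for agent in ("Googlebot", "Bingbot"):
        print(f"{name:8} {agent:9} {allowed(txt, agent, URL)}")
```

With the original file, Googlebot is allowed into /downloads/ because only its own entry (the business-directory lines) applies to it, while Bingbot falls back to the * entry and is blocked. With the googlebot entry deleted, both crawlers are blocked.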

Now I just need to find out how to get all my https:// pages out of Google’s index.
