I'm talking about URLs, not files. Google indexes URLs. I'm not very familiar with vB, so I don't know how it handles deleted threads.
In doing some research, I discovered the status code should be 410 Gone, NOT 404 Not Found as I previously mentioned. The code would then look like this:
[Code not shown: visible to verified customers only]
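The original snippet isn't visible here, and the real script is PHP. As a rough stand-in for the same idea, here's a minimal Python WSGI sketch (not PhotoPost's code). `ACTIVE_ADS`, `DELETED_IDS`, `respond` and `app` are hypothetical names, and the in-memory sets stand in for the classifieds database. A deleted ad keeps its friendly message but answers 410 Gone instead of 200 OK; an ID the script has no record of falls back to 404, since there's no evidence it ever existed.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs

# Hypothetical stand-ins for the classifieds database.
ACTIVE_ADS = {999: "Blue widgets, with photo and details"}
DELETED_IDS = {1}  # red widgets: sold and deleted; product IDs are never reused

GONE_TEXT = ("Product ID is no longer in our database. "
             "If you were trying to renew an ad, you will need to resubmit.")


def respond(product_id: Optional[int]) -> Tuple[str, str]:
    """Pick the status line and body for showproduct.php?product=<id>."""
    if product_id in ACTIVE_ADS:
        return "200 OK", ACTIVE_ADS[product_id]
    if product_id in DELETED_IDS:
        # Same message visitors saw before, but the status now says "gone for good".
        return "410 Gone", GONE_TEXT
    # No record this ID ever existed, so we can't claim it is permanently gone.
    return "404 Not Found", "No such product."


def app(environ, start_response):
    """Minimal WSGI app standing in for showproduct.php."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        product_id: Optional[int] = int(query.get("product", [""])[0])
    except ValueError:
        product_id = None  # missing or non-numeric ?product= value
    status, body = respond(product_id)
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and request `/showproduct.php?product=1`. The point of the design is that the page body doesn't have to change at all; only the status line does.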
So as an example, let's say we have two PP Classifieds URLs. One, showproduct.php?product=1, is for a product (red widgets) that just sold today. The other, showproduct.php?product=999, is an active ad with detailed information and a photo (blue widgets). Google has indexed both pages and is sending traffic to both the red and blue widget pages.
OK, so the red widget sold today. Great: the seller updates/deletes the ad, and a few days later, when you go to that URL, you're presented with this message: "Product ID is no longer in our database. If you were trying to renew an ad, you will need to resubmit." Sounds good, right? Well, it is and it isn't.
Now Google will visit that page again (because red widgets are still in its index). When it crawls the page, it gets a 200 OK HTTP status code, which means everything is A-OK. It will notice there is new text there now, so it knows the URL still works, but it has no reliable signal that the product is actually gone. Remember, once a product is gone, that URL (showproduct.php?product=1) will never be used again; it's going to produce the "Product ID is no longer..." message forever ;-) As far as Google is concerned, though, that URL still exists and just has new text on it. So it re-indexes the page that used to be about red widgets, and now it has to "think" about what the page is about (apparently product IDs and databases) and figure out how and where to rank it...
Then think... take that "Product ID is no longer..." page and multiply it by a few dozen (say 25, maybe more) for each PhotoPost Classifieds installation. Then multiply those 25 by the hundreds of sites running PP Classifieds, and you can see how many red widget pages will be indexed by Google that shouldn't be. Sure, Google can write algorithms to "deal" with this, but IMHO the correct way to handle it is to serve a 410 Gone status code.
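To make the problem concrete: a deleted ad that shows the "no longer in our database" text but still answers 200 OK is what search engines commonly call a "soft 404". Here's a small, hypothetical check a site owner could run over pages they've already fetched; the function name and the exact match string are assumptions for illustration.

```python
# Text PhotoPost Classifieds shows for a deleted ad (taken from the message quoted above).
GONE_MESSAGE = "Product ID is no longer in our database"


def looks_like_soft_404(status: int, body: str) -> bool:
    """True when a page says the product is gone but still answers 200 OK.

    A crawler acts on the status code first, so a 200 here tells it the page
    is live content, whatever the body text says.
    """
    return status == 200 and GONE_MESSAGE.lower() in body.lower()
```

With the 410 change in place, the same deleted-ad page stops tripping this check, because the status no longer claims the page is OK.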
It makes sense to serve a 410 HTTP status code because as soon as Google comes by the red widget page, it knows that the URL is permanently gone. It will stop crawling the page (or check it only rarely) and won't send visitors to it anymore. And there's no need to: the product is sold/deleted/gone, and even if someone uploads a red widget again, it won't be at the URL showproduct.php?product=1 ever again. So think of it this way: it's not sending an "error" for the script showproduct.php, it's sending a 410 Gone "error" for the product URL (showproduct.php?product=1).
As per Google's "HTTP status codes" page (Webmasters/Site owners Help), a 404 code means "Not Found". Well, that's not quite true in this scenario... it did find a page (one with a message saying the product was not found), which is a valid web page, and the product behind that URL used to exist but has been permanently removed. So it should return a 410 "Gone" response. This is a quote from Google on what a 410 code is:
The server returns this response when the requested resource has been permanently removed. It is similar to a 404 (Not found) code, but is sometimes used in the place of a 404 for resources that used to exist but no longer do.
The w3.org site also talks about 410 codes: HTTP/1.1: Status Code Definitions
The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
Meanwhile, the active blue widget page will be indexed and ranked appropriately by Google. The red widget page will be handled a bit differently, depending on whether or not you serve the proper HTTP status code.
So to summarize, here's what adding the 410 HTTP status code to expired/deleted product pages gets you.
Pros:
- Search engines know that the page is gone and will therefore stop indexing that specific product page right away (less bandwidth used for the site owner, more accurate crawling for Googlebot)
- Search engines will stop sending traffic to a product page that's gone (good for visitors coming to the site: a greater chance of finding what they want in the search results)
- Easy to implement: I've already shown it works with just one extra line of code (OK, change my earlier line of code from 404 to 410 if you want to be technically correct).
- The Google search results for red widgets will be more accurate because the product page that's gone is removed from the index quickly (not misleading people who are searching)
Cons:
- None ;-)
I've had that 404 code up on my site since last October. I did a search for "Product ID is no longer in our database..." and Google has correctly de-indexed all the expired ads on my site. I wish I had recorded the "before" numbers; I can't remember how many results the same query returned for my site before.
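Besides searching Google for the message text, a site owner could spot-check the expired ad URLs directly. Here's a minimal sketch using only Python's standard library, assuming you already have a list of expired ad URLs (how you collect them is up to you; the helper names are hypothetical). `fetch_status` sends a HEAD request and returns the status code, and `still_served_as_ok` lists any expired URLs that are still answering 200.

```python
import urllib.error
import urllib.request
from typing import Dict, List


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url using a HEAD request."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the error object still carries the status code.
        return err.code


def still_served_as_ok(statuses: Dict[str, int]) -> List[str]:
    """Given {url: status} for ads known to be expired, list those still returning 200."""
    return sorted(url for url, code in statuses.items() if code == 200)
```

You'd build the dictionary with something like `{u: fetch_status(u) for u in expired_urls}` and pass it to `still_served_as_ok`; an empty list means every expired ad is answering with an error code (410 or 404). HEAD keeps the check light, but if a server handles HEAD oddly, switching the request to GET gives the same status code at the cost of downloading the page.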
I hope that helps explain my position. Now, let me just add that I don't think this is a critical missing feature; I consider it a "nice-to-have" that is "the best way to do it" in *my* opinion. Others may not agree, I realize that. Let me know if you have any more questions or comments. Whew, this became a long-winded response. Hopefully I've helped a little bit ;-)