PhotoPost Community (http://www.photopost.com/forum/)
-   Classifieds Suggestions (http://www.photopost.com/forum/classifieds-suggestions/)
-   -   Expired items should return 404 HTTP error code (http://www.photopost.com/forum/classifieds-suggestions/138209-expired-items-should-return-404-http-error-code.html)

Swanny October 19th, 2008 01:05 PM

Expired items should return 404 HTTP error code
 
Hi,

I have a suggestion for PP Classifieds. Currently, when an item has expired and been removed from the db, the page shows this message: "Product ID is no longer in our database. If you were trying to renew an ad, you will need to resubmit." It only makes sense that an expired item produces that message.

The only downside is that Google indexes that expired page and may or may not eventually remove it from its index. The reason is that Google previously indexed the page when it had a product on it, and now the content has changed (expired).

Specifically, the issue is that search engines still index the page because it returns a 200 OK HTTP status code. Since the product is no longer in the db, every expired URL yields the same page (i.e. duplicate content), when it really should send a 404 Not Found HTTP status code, because technically that page (product) no longer exists and will never exist again.

I don't know exactly how to implement this; I may look into it myself if you guys can't implement it soon.

I know you can send 404 HTTP headers in php using code such as this [source link]:
Code:

Content visible to verified customers only.
Is it feasible to send that header code if the product is not found in the database? It would help search engines know "hey, this file doesn't exist, so let's take it out of our index".
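
As a rough illustration of the idea (not the hidden snippet above and not PhotoPost's actual code), a minimal sketch might look like the following. The `statusLineFor()` helper and the `$products` array are hypothetical stand-ins for the real database lookup; only `header()` and `headers_sent()` are real PHP functions.

```php
<?php
// Hypothetical helper: picks the status line for a product lookup result.
// $row is the product record, or null when the ID is not in the database.
function statusLineFor(?array $row): string
{
    return $row === null ? 'HTTP/1.1 404 Not Found' : 'HTTP/1.1 200 OK';
}

// Stand-in for the real database lookup (an assumption for this sketch).
$products = [999 => ['title' => 'Blue widgets']];
$id = isset($_GET['product']) ? (int) $_GET['product'] : 0;
$row = $products[$id] ?? null;

// header() only has an effect before any page output has been sent.
if (!headers_sent()) {
    header(statusLineFor($row));
}

// The visitor still sees the usual message; only the status code differs.
echo $row === null
    ? 'Product ID is no longer in our database.'
    : $row['title'];
```

The point of the sketch is that the visible page can stay exactly the same while the status line tells crawlers the product is not there.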

The result is better (cleaner, more relevant) search indexes by the search engines and better search results for end users (because they won't see those pages listed in search engine results anymore). A win-win situation.

Please consider this for a future release. Like I said, when I get time I may poke around and see if it's possible, but you guys probably have a better idea of which templates, php files, etc. to edit to make this magic happen. Heck, even if you just tell me which files I should edit, I can look around myself. Just trying to help!

Swanny October 30th, 2008 09:59 AM

OK, in case anyone else here cares about this topic, here's the fix. Open showproduct.php; from line 135 on you'll see this:
Code:

Content visible to verified customers only.
Edit it to this (note the addition of one line):
Code:

Content visible to verified customers only.
I have tested this and it works awesome. You're welcome ;-)

Swanny January 26th, 2009 06:32 PM

Hi Chuck, I just wanted to bump this. Can you please discuss this with Michael next time Classifieds is up for discussion? I have been running this code change with *great* results. It would be nice to see this in version 3.05. Thanks!

Chuck S January 26th, 2009 06:44 PM

Michael does not code Classifieds; I do.

I am not sure why you would want to send an HTTP 404 status. I know what you're saying about some Google thing, but let's break this down. Take vBulletin, which is probably the biggest product out there. I do not believe they issue specific HTTP 404 pages when a thread is deleted. They do the same thing we do: they show a message in their program, much the way we do, that says the thread/product was not found.

Chuck S January 26th, 2009 06:49 PM

More on this: 404s are used for when a file is not found. Our file is indeed found, as it's showproduct.php.

I would have to investigate the pros, cons, etc. of making such a move, since all the discussion of 404s concerns the actual filename, not variations in the PHP variables used by the script.

Swanny January 26th, 2009 08:40 PM

Hi Chuck,

I'm talking about URLs, not files. Google indexes URLs. I'm not very familiar with vB, so I don't know how they handle deleted threads.

In doing some research I discovered the status code should be 410 Gone, NOT 404 Not Found as I previously mentioned. The code would then look like this:
Code:

Content visible to verified customers only.
So as an example, let's say we have two PP classifieds URLs. One URL is showproduct.php?product=1 and that product (red widgets) just sold today. Then we have another URL showproduct.php?product=999 which is an active ad with detailed information and a photo (blue widgets). Google has indexed both pages and is sending traffic to both the red & blue widget pages.

OK, so the red widget sold today. Great, so the seller updates/deletes the ad, and in a few days when you go to that URL, you're presented with this message: "Product ID is no longer in our database. If you were trying to renew an ad, you will need to resubmit.". Sounds good, right? Well it is and it isn't.

Now Google will visit that page again (because red widgets are still in their index). When they crawl that page, they get a 200 OK HTTP status code, which means everything is A-OK. They'll notice there is new text there now, so they know the URL is good, but they don't really know the product is actually gone. Now remember, when a product is gone, that URL (showproduct.php?product=1) will never be used again; it's going to produce the "Product ID is no longer..." message forever ;-) However, as far as Google is concerned, that URL still exists, it just has new text on it. So Google re-indexes the page that used to be about red widgets and has to figure out what the page is about now. It appears to be about product IDs and databases, so Google has to "think" about the page and figure out how/where to rank it...

Then think.... take that "Product ID is no longer..." page, multiply it by a few dozen (maybe more) for each PhotoPost Classifieds installation. Then multiply that by the hundreds of sites running PP Classifieds and you can see how there will be a lot of red widget pages being indexed by Google that shouldn't be. Sure, Google can write algorithms to "deal" with this, but the correct way to handle it IMHO is to serve a 410 Gone header code.

It makes sense to serve a 410 http status code because as soon as Google comes by the red widget page, it knows that that URL is permanently gone. They will stop crawling the page and won't send visitors to that page anymore. And there's no need to, the product is sold/deleted/gone and even if someone uploads a red widget again, it won't be at the URL showproduct.php?product=1 ever again.

So think of it this way. It's not sending an "error" for the script showproduct.php, it's sending a 410 Gone "error" for the product URL (showproduct.php?product=1).

As per Google - HTTP status codes - Webmasters/Site owners Help - a 404 code means "Not Found". Well, that's not quite true in this scenario: the server did find a page (one with a message saying the product was not found), which is a valid web page, so it should return a 410 "Gone" response. This is a quote from Google on what a 410 code is:
Quote:

The server returns this response when the requested resource has been permanently removed. It is similar to a 404 (Not found) code, but is sometimes used in the place of a 404 for resources that used to exist but no longer do.
The w3.org site also talks about 410 codes: HTTP/1.1: Status Code Definitions
Quote:

The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead. This response is cacheable unless indicated otherwise.

The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server's site. It is not necessary to mark all permanently unavailable resources as "gone" or to keep the mark for any length of time -- that is left to the discretion of the server owner.
So all this while the blue widget page which is active will be indexed and ranked appropriately by Google. The red widget page will be handled a bit differently, depending on whether you serve the proper http status code or not.
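
To make the red/blue widget example concrete, here is a small sketch of how the status code could be chosen per product URL. Everything in it is hypothetical (the `$activeProducts` array, the `productStatusCode()` function, and the `$removalIsPermanent` flag); it only mirrors the rule quoted above: 200 for an active ad, 410 when the ad is known to be gone for good, and 404 when the server cannot tell whether the removal is permanent.

```php
<?php
// Hypothetical data: product 999 (blue widgets) is an active ad; product 1
// (red widgets) was sold and removed, so it no longer has an entry.
$activeProducts = [999 => 'Blue widgets'];

// Returns the HTTP status code to send for a product URL.
// $removalIsPermanent follows the quoted spec text: send 410 only when the
// server knows the resource is gone for good, otherwise fall back to 404.
function productStatusCode(array $active, int $id, bool $removalIsPermanent = true): int
{
    if (isset($active[$id])) {
        return 200;
    }
    return $removalIsPermanent ? 410 : 404;
}

foreach ([1, 999] as $id) {
    echo 'showproduct.php?product=' . $id . ' -> '
        . productStatusCode($activeProducts, $id) . "\n";
}
```

With this sample data the loop reports 410 for product 1 and 200 for product 999. The script file itself is found in both cases; the status code describes the product URL.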

So to summarize, by adding the 410 http status code to expired/deleted product pages:
PROS:
- Search engines know that page is gone and therefore will stop indexing that specific product page right away (less bandwidth for the site owner, more accurate crawling for Googlebot)
- Search engines will stop sending traffic to a product page that's gone (good for visitors coming to the site, greater chance of finding what they want in the search results)
- Easy to implement, I've already proven it works quite simply with just one extra line of code (OK, change my earlier line of code from 404 to 410 if you technically want to be correct).
- The Google search results for red widgets will be more accurate because the product page that is gone is removed from the index quickly (not misleading people who are searching)

CONS:
- None ;-)

I have had that 404 code up on my site now since last October. I did a search for "Product ID is no longer in our database..." and Google has correctly de-indexed all the expired ads on my site. I wish I had calculated the "before" part of this. I can't remember how many results it found before for the same query on my site.

I hope that helps explain my position. Now, let me just add that I don't think this is some critical thing missing, I consider this a "nice-to-have" type thing that is "the best way to do it" in *my* opinion. Others may not agree, I realize that. Let me know if you have any more questions or comments. Whew, this became a long-winded response. Hopefully I've helped a little bit ;-)

Swanny January 26th, 2009 08:49 PM

Oh, and here's an example of a 410 Gone page from my classifieds:
http://www.fordf150.net/classifieds/...php/product/26

and a good 200 OK page from my classifieds:
2003 F-150 Blue Supercab FX4 - Ford F150 Classifieds

Notice the user doesn't even know what a 410 is (there's no indication to them, which is good) but check the status codes on the two pages here:
Check Server Headers Tool - HTTP Status Codes Checker
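
If you would rather check the status codes yourself than use an online tool, a short script along these lines should do it. This is only a sketch: `parseStatusCode()` and `fetchStatusCode()` are made-up helper names, and the URL in the usage comment is a placeholder. `get_headers()` is a real PHP function that returns the response headers as an array with the status line in element 0 (it follows redirects by default, so element 0 is the first response).

```php
<?php
// Extracts the numeric code from a status line such as "HTTP/1.1 410 Gone".
// Returns null when the string does not look like an HTTP status line.
function parseStatusCode(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Requests a URL and returns the status code of the first response,
// or null if the request failed.
function fetchStatusCode(string $url): ?int
{
    $headers = @get_headers($url);
    return $headers === false ? null : parseStatusCode($headers[0]);
}

// Example usage (placeholder URL, needs network access):
// var_dump(fetchStatusCode('http://example.com/classifieds/showproduct.php?product=1'));
```

An expired ad that is set up as described in this thread should come back as 410, and an active ad as 200.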

As I said, I'm not considering this a "bug", more of a "suggestion", that's why it's in the suggestion forum.

Chuck S January 26th, 2009 09:17 PM

I will have to do a lot of reading before I change anything.


All times are GMT -5.
