Data Directory structure (opinions?)


Michael P
September 14th, 2004, 10:35 PM
I've been thinking about the current data directory structure:

data/cat#/files

I'm also thinking it might make sense to split the files into:

data/cat#/thumbs/files
data/cat#/medium/files
data/cat#/large/files

There could be some real benefits here: large images could live in protected directories, and large categories would have smaller directories (their files would be spread over three directories instead of one).
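A minimal Python sketch of the proposed layout, assuming the three size names are exactly `thumbs`, `medium` and `large` as listed above. `photo_path` and `all_paths` are hypothetical helper names, not PhotoPost functions, and the code only builds paths; it does not touch the disk.

```python
from pathlib import PurePosixPath

# Size directories taken from the proposal; the helper names are hypothetical.
SIZES = ("thumbs", "medium", "large")


def photo_path(data_root: str, cat_id: int, size: str, filename: str) -> PurePosixPath:
    """Build data/<cat#>/<size>/<filename> for one image size."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return PurePosixPath(data_root) / str(cat_id) / size / filename


def all_paths(data_root: str, cat_id: int, filename: str) -> dict:
    """Return the path of every size variant of one uploaded photo."""
    return {size: photo_path(data_root, cat_id, size, filename) for size in SIZES}


if __name__ == "__main__":
    for size, path in all_paths("data", 501, "somepic.jpg").items():
        print(size, path)
```

With large images isolated under `data/<cat#>/large/`, a single web-server access rule could, in principle, protect them while thumbnails stay public.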

We could also possibly remove the user ID from the filename so that files get uploaded without having their names modified. The problem is files with duplicate names: renaming them could defeat one of the benefits of not renaming files (uploading new copies, for example). It would also mean that two users couldn't upload the same filename without one of them being renamed, another issue that defeats the purpose of not renaming files.
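One common way to keep original filenames while avoiding clashes is to append a counter only when the name is already taken. This is a sketch, not PhotoPost behaviour: `unique_name` is a hypothetical helper, and it checks a set of existing names rather than the real directory so the logic is easy to follow.

```python
import os


def unique_name(existing: set, filename: str) -> str:
    """Return filename unchanged if free, else insert _1, _2, ... before the extension.

    Hypothetical helper: `existing` stands in for a listing of the target directory.
    """
    if filename not in existing:
        return filename
    stem, ext = os.path.splitext(filename)  # "photo.jpg" -> ("photo", ".jpg")
    n = 1
    while f"{stem}_{n}{ext}" in existing:
        n += 1
    return f"{stem}_{n}{ext}"


if __name__ == "__main__":
    print(unique_name({"photo.jpg", "photo_1.jpg"}, "photo.jpg"))
```

It also shows the catch described above: a second `photo.jpg` becomes `photo_1.jpg`, so re-uploading a file no longer replaces the old copy.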

Any ideas/suggestions?

HobbyTalk
September 14th, 2004, 11:21 PM
There is practically no limit on the number of files in a directory; whatever limit exists is set by the filesystem.

What you will run into, though, is the performance cost of searching the directory linearly once a single directory holds several hundred or several thousand files. This can quickly saturate filesystem performance.

I'd recommend some sort of name-hashing scheme: put the files into subdirectories based on, say, the first character or two of their names, which gives you fewer than 700 or so subdirectories at the top level. (You want the directory itself to stay small, say under 64 KB.) If each subdirectory will still hold very many files, add another level of hashing.

For example, a file named "abcdefg" would be stored as "a/b/abcdefg". The performance gains can be astonishing.
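The "a/b/abcdefg" example can be sketched as follows. `hashed_path` and its `levels` parameter are hypothetical names; one character per level is taken from the example, and a real implementation would also need to filter characters that are unsafe as directory names (such as a leading `.`). For scale, two lowercase letters give 26 * 26 = 676 combinations, presumably the source of the "fewer than 700 or so" figure.

```python
from pathlib import PurePosixPath


def hashed_path(root: str, filename: str, levels: int = 2) -> PurePosixPath:
    """Return root/<c1>/<c2>/.../filename, one directory per leading character.

    Hypothetical helper: with levels=2, "abcdefg" maps to root/a/b/abcdefg.
    """
    if len(filename) < levels:
        raise ValueError("filename is shorter than the number of hash levels")
    prefix_dirs = list(filename[:levels])  # e.g. ["a", "b"]
    return PurePosixPath(root, *prefix_dirs, filename)


if __name__ == "__main__":
    print(hashed_path("data", "abcdefg"))
```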

HobbyTalk
September 19th, 2004, 11:25 AM
As a second suggestion, once a file is uploaded, the URL to it should never change. I sometimes use my PP as a place to upload pictures and then use those pictures in articles I write. Later, if I want to move the pictures in PP to an "archive" area, the URL to the picture changes and breaks the links in the articles.

This is basic web stuff.... once you have something on the Internet, the URL should never change. Not only does it affect the above situation, it can also break links in search engines (such as Google's picture search) and links from those we allow to upload and use the pictures elsewhere.
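One way to get the permanent URLs described here is a level of indirection: the public URL carries only a permanent photo ID, and a lookup maps that ID to wherever the file currently lives. The sketch assumes a hypothetical image-serving script named `image.php` and uses an in-memory dict in place of the database; neither is part of PhotoPost.

```python
class PhotoStore:
    """Map a permanent photo ID to its current on-disk path (hypothetical sketch)."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.paths = {}  # photo ID -> current file path

    def add(self, photo_id: int, path: str) -> None:
        self.paths[photo_id] = path

    def move(self, photo_id: int, new_path: str) -> None:
        # Only the internal mapping changes; the public URL stays the same.
        self.paths[photo_id] = new_path

    def url(self, photo_id: int) -> str:
        # "image.php" is an assumed script name that would look up and serve the file.
        return f"{self.base_url}/image.php?photo={photo_id}"

    def resolve(self, photo_id: int) -> str:
        return self.paths[photo_id]


if __name__ == "__main__":
    store = PhotoStore("http://www.domain.com")
    store.add(789, "data/501/somepic.jpg")
    print(store.url(789))
    store.move(789, "data/503/somepic.jpg")
    print(store.url(789), "->", store.resolve(789))
```

Moving a photo updates only the mapping, so links already published elsewhere keep working. Deleting a photo would still break them; indirection cannot avoid that.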

Chuck S
September 19th, 2004, 11:56 AM
I will chime in on that one, HobbyTalk ;)

The URL should never change? You are forgetting quite a few things here. A lot, in fact.

If you as admin elect to delete that pic, it will be deleted. If you as admin elect to move that picture, it will be moved. If you allow users to move or delete photos, URLs do change, and I suspect they always will. I don't see PhotoPost leaving photos on the server, clogging up space after an admin deletes them, just because someone might have linked to that photo in a post, published an article somewhere, or a search engine has a link to it. Do you see the huge point here?

You can't say URLs never change. They do and will. If someone makes a post in vBulletin, Google links to it, and an admin then archives it, the URL changes. It happens all the time, and it will continue to happen.

HobbyTalk
September 19th, 2004, 09:11 PM
If a picture is deleted, that is different (did I say anything about deleting pictures?). If you move a picture, the URL to that picture shouldn't change. As an example, attachment and discussion thread URLs do not change in vB if you move them to a different topic. Try it; the URL never changes.

I guess you do not see the huge point here... time for you to read up on search engine placement. Or it could be that PP staff don't care about customers' search engine placement: "make Google change the way they do things, we are right." ;)

One of the big selling points of PP is that it integrates with vB. I would guess a good number of PP users post pictures in PP and then include the picture in the forum. Heck, PP even gives the BB code under the picture.

Besides that, it's not that hard to do it right: if the storage system is being changed, change it so it doesn't break other things.

HobbyTalk
September 19th, 2004, 09:29 PM
I guess in the end, the questions are:

Can you give one good reason why URLs should change, other than that it's easier to do it that way?

If a redesign of the directory structure is coming, why not redesign it to minimize problems?

Chuck S
September 20th, 2004, 06:27 AM
You did not read my post, then, if you only got half of it ;)

Yes, deleting images is the first case, and moving the picture is the second. The script URL does not change, but the direct URL would, and the comparison I made was to the vB archive system. Go take a post and archive it. Does the URL change? Sure it does.

The two examples I gave are valid and will never change in any software I know of, whether it's ours or someone else's. You made a broad statement that URLs never change; it was far too broad, and I clarified the point a lot. No, I do not need to read about search engines.

Michael P
September 20th, 2004, 08:15 AM
The current directory structure allows photos to be grouped into directories matching the category they are posted in. I, for one, like having all the photos grouped together, but that's just me. To let photos live anywhere, I would need some kind of scheme that groups images so they can still be found. And doing so without altering filenames would mean handling potential filename conflicts, which would be more frequent in some cases.

Grouping them by user doesn't always work because some systems have thousands of users.

Chuck S
September 20th, 2004, 08:34 AM
Exactly.

One thing to consider is randomizing the filename on upload, so the file isn't tied to any particular user and there is no need to rename it or reassign user IDs when it is moved.
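Randomizing the filename on upload could look like this. `random_filename` is a hypothetical helper; it assumes the original extension should be kept (lowercased) and that the original name, if wanted for display, is stored separately, for example in the database.

```python
import os
import uuid


def random_filename(original: str) -> str:
    """Replace an uploaded name with a random one, keeping the lowercased extension.

    Hypothetical helper: "Some Pic.JPG" -> "<32 hex digits>.jpg".
    """
    ext = os.path.splitext(original)[1].lower()
    return uuid.uuid4().hex + ext


if __name__ == "__main__":
    print(random_filename("Some Pic.JPG"))
```

A version 4 UUID carries 122 random bits, so two uploads colliding is vanishingly unlikely, and nothing in the name ties the file to a user or category.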

mjm
September 20th, 2004, 06:10 PM
Geez, I had always presumed that each uploaded pic had a unique ID, like each post in vB, and that if I moved the pic to another cat in the gallery it would keep that unique fingerprint.

I link a lot to pics in posts...

So now you're saying that the URL, e.g. mysite.com/showphoto.php?photo=653, will change if the pic is moved to another cat?

If so, I have a monster of a task at hand checking all links, and I will need to turn off the ability to move pics after upload.

Mark

Chuck S
September 20th, 2004, 07:20 PM
You cannot link an image using the script URL:

http://www.domain.com/showphoto.php?photo=789

It has to be done with the image tag and the true image path, like this:

http://www.domain.com/data/501/somepic.jpg

Now if you move that picture to, say, cat 503, that URL would change, which is what I posted.

Michael P
September 20th, 2004, 07:54 PM
Yes, you can LINK to the image in that manner always, but if you DISPLAY the image using an IMG tag pointing to the actual file on the disk, then that URL may change.

HobbyTalk
September 20th, 2004, 11:49 PM
Maybe I'm missing something, but I don't see an "archive" option in the latest version of vB that moves messages to an archive. vB has a built-in archive to which all posts are added, but that is mainly for spiders. If you move a thread from one topic to another in vB, the link to that thread stays the same, and so do any attachments.

I also agree that using usernames for the directory structure is not a good idea, as one person could have thousands of pictures. That is why hashing works well: it doesn't matter who uploads a file or to what topic; it gets put into a directory based on its name.

vB uses a system that allows you to upload different files with the same name. It also won't let you upload the same file twice, even under a different name. Their system also lets you put the directory that files are transferred to above the web root for added security. While it isn't ideal (it puts attachments into directories by user ID), it eliminates a lot of problems and solves many other issues. Put a hashing routine in there for directory placement and it would be close to ideal, IMHO.
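The properties listed above can be combined in a short sketch: name each stored file after a hash of its contents, so different files with the same original name never collide and identical bytes are caught even under a new name, then spread the files over hashed subdirectories. This is not vB's code; SHA-1, the two-level `ab/cd/` layout and the `store_upload` name are assumptions, and `storage_root` is meant to point somewhere above the web root.

```python
import hashlib
import tempfile
from pathlib import Path


def store_upload(storage_root: Path, data: bytes, index: dict) -> Path:
    """Store bytes under a name derived from their SHA-1 digest (hypothetical sketch).

    - Two different files that share an original name cannot collide.
    - Uploading identical bytes again, under any name, is rejected.
    `index` stands in for a database table of digest -> stored path.
    """
    digest = hashlib.sha1(data).hexdigest()
    if digest in index:
        raise FileExistsError(f"duplicate of {index[digest]}")
    # Two levels of two-hex-digit directories: at most 256 entries per level.
    target = storage_root / digest[:2] / digest[2:4] / digest
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    index[digest] = target
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seen = {}
        stored = store_upload(Path(tmp), b"photo bytes", seen)
        print(stored.relative_to(tmp))
        try:
            store_upload(Path(tmp), b"photo bytes", seen)
        except FileExistsError as err:
            print("rejected:", err)
```

Because the on-disk name no longer carries the user ID or category, the original filename and owner would have to be kept in the database.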

mjm
September 21st, 2004, 12:43 AM
Michael P wrote: "Yes, you can LINK to the image in that manner always, but if you DISPLAY the image using an IMG tag pointing to the actual file on the disk, then that URL may change."

Whew, that's a relief!
I don't use IMG tags to show pics from the gallery on our site (forums, etc.) because that doesn't link to the gallery, i.e. it doesn't count as a view, show the description, offer the option to view the next pic, etc.
Yeah, I only get a link, no pic.
One of these days I hope to find an easier way to provide a thumbnail as a link (in my forums) which, when clicked, links to the medium pic in the gallery....
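In forums that accept BB code, a clickable thumbnail is usually an `[img]` tag nested inside a `[url]` tag. The sketch below builds that string; `thumb_link_bbcode` and the thumbnail URL in the example are hypothetical, and only the `showphoto.php?photo=` page URL comes from this thread.

```python
def thumb_link_bbcode(gallery_url: str, photo_id: int, thumb_url: str) -> str:
    """Clickable thumbnail: [img] shows the thumb, [url] opens the gallery page.

    Hypothetical helper; thumb_url is whatever direct path the thumbnail has.
    """
    page = f"{gallery_url}/showphoto.php?photo={photo_id}"
    return f"[url={page}][img]{thumb_url}[/img][/url]"


if __name__ == "__main__":
    print(thumb_link_bbcode(
        "http://www.domain.com", 789,
        "http://www.domain.com/data/501/thumbs/somepic.jpg",  # hypothetical thumb path
    ))
```

The link target uses the script URL, which survives moves; the thumbnail itself is still a direct file path, so it can break if the photo is moved, as noted earlier in the thread.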

Re: Directory stuff...
I'm also interested in this topic as it relates to running multiple PPPro applications on one database that also houses vB.
I am on the verge of getting an additional gallery up and running (integrated with vB), so I would prefer that any changes come sooner rather than later.

Regards,
Mark