Forum OpenACS Improvement Proposals (TIPs): Add new "file" content type to acs-content-repository

This TIP is to add a new "file" content type to the content repository similar to the existing "image" content type.

This would allow storing arbitrary binary objects and make it easier to handle them in a consistent way. For example, file-storage could be fixed.

We already have acs-subsite/www/file.vuh, which allows a file stored in the content repository to be downloaded consistently.

This will allow files uploaded with the oacs-attach widget for Xinha or TinyMCE to be correctly handled and indexed for search, etc.

The problem is that if you enable search of generic content_revisions, you could get any object, which may or may not be interesting or searchable.

From years of experience, it has occurred to me that only subtypes of content_revision should actually implement the search callbacks (or service contracts in legacy installs). There should be a way to create a non-searchable content_revision.

This seems quite straightforward and needed, so if you have the patch/upgrade, approved.
Posted by Dave Bauer on
I am adding a corollary to this TIP: that the "image" content type becomes a subtype of "file".

Does anyone care to discuss this?

Posted by Dave Bauer on
"File" is a confusing name since it seems to imply the storage method, not the type of content.

I'll think of a better name.

After OCT discussion: since changing the supertype might have unforeseen consequences, there is not enough benefit to changing it.

Hi Dave,

I wasn't sure why you need a new type but I understand it's basically to enable search only on "searchable" content.

Thinking aloud:

So the difference between generic content and the one you need is its visibility. But really you're talking about "generic content".

I wonder if creating a new type is the right thing to do to solve the "searchable" thing. A type should remain a type and not a way to store a property of the object (searchable).

If the main concern is searchability, a different acs-type does not seem to be a good solution to me. If one would like to change an item from searchable to unsearchable (or vice versa) one has to change its type (or maybe some supertype in its type hierarchy). This does not seem right.

Within xowiki, we use the publish_status for changing visibility (publish_status 'production' means "not yet ready", invisible to search; status 'ready' means visible). This works quite well; it also changes the visibility in some includelets, as well as the external visibility (RSS, Google site-map index, etc.).

Another option would certainly be to add to cr_items/cr_revisions (not sure which I am opting for) an attribute "searchable".
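
The publish_status idea above could be sketched roughly as follows. This is only a toy model in Python, not xowiki or OpenACS code: the item dictionaries and the function names are made up for illustration, and only the two status values mentioned above ('production' and 'ready') are modelled.

```python
# Toy model only: hypothetical data and helper names, not the xowiki API.
ITEMS = [
    {"name": "draft-page", "content_type": "page", "publish_status": "production"},
    {"name": "home", "content_type": "page", "publish_status": "ready"},
]

def search_visible(items):
    """Return the names of items that search is allowed to see."""
    return [i["name"] for i in items if i["publish_status"] == "ready"]

def set_status(items, name, status):
    """Change visibility by changing the status; the type is left untouched."""
    for item in items:
        if item["name"] == name:
            item["publish_status"] = status

print(search_visible(ITEMS))
set_status(ITEMS, "draft-page", "ready")
print(search_visible(ITEMS))
```

The point of the sketch is that an item moves between searchable and unsearchable without its type ever changing.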

Posted by Dave Bauer on
This still does not address the fact that a generic content revision doesn't describe what the object is.

The idea here is to handle uploaded files consistently throughout the site.

This is the most important piece of information:

"This would allow storing arbitrary binary objects and make it easier to handle them in a consistent way. For example, file-storage could be fixed. We already have acs-subsite/www/file.vuh which will allow downloading a file stored in the content repository consistently.

This will allow us to deal with files uploaded with the oacs-attach widget for xinha or tinymce to be correctly handled and indexed for search etc."

So in this case the search isn't that I want to exclude generic content revisions so much, but that I want to be able to treat all uploaded files similarly. I want to be able to search ONLY uploaded files. For example, in an improved file attachment feature, we could search all the files a user has uploaded in addition to allowing them to browse folders. To do this we need to restrict the search by content_type. Having a searchability flag does not help here.
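
To make the difference concrete, here is a rough sketch in Python. It is a toy model, not the content repository API: the type table, the item list, the "saved_list_filter" type name and both search functions are hypothetical, and "file" stands for the type proposed in this TIP.

```python
# Toy model only: hypothetical names, not the OpenACS content repository API.
SUPERTYPE = {
    "content_revision": None,
    "file": "content_revision",              # the type proposed in this TIP
    "saved_list_filter": "content_revision",  # made-up name for a non-file type
}

ITEMS = [
    {"name": "report.pdf", "content_type": "file", "searchable": True},
    {"name": "my-filters", "content_type": "saved_list_filter", "searchable": True},
    {"name": "misc", "content_type": "content_revision", "searchable": True},
]

def is_a(content_type, ancestor):
    """True if content_type is ancestor or one of its subtypes."""
    while content_type is not None:
        if content_type == ancestor:
            return True
        content_type = SUPERTYPE[content_type]
    return False

def search_by_type(items, content_type):
    """Restrict results to one content type and its subtypes."""
    return [i["name"] for i in items if is_a(i["content_type"], content_type)]

def search_by_flag(items):
    """Restrict results using only a searchable flag."""
    return [i["name"] for i in items if i["searchable"]]

print(search_by_type(ITEMS, "file"))
print(search_by_flag(ITEMS))
```

Searching on "file" returns only the uploaded file, searching on "content_revision" returns everything, and the flag alone cannot tell the uploaded file apart from the other searchable items.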

Posted by Don Baccus on
"This still does not address the fact that a generic content revision doesn't describe what the object is."

But you're not addressing "what an object is".

You're using the type system to state "this is how the object came into existence".

This just doesn't feel right to me, but I haven't thought a lot about it.

Think about "image" then ... it can't be a subtype of "uploaded content" because I can scribble the bits myself in a graphics generator and stuff those bits into the CR - as an image. That's not uploaded.

So using the type system in this manner means you'd need two image types to be consistent ...

Maybe an attribute describing where the content bits came from would make more sense - uploaded ...

Not sure. Other than the "type" approach seems wrong.

Posted by Dave Bauer on
I am going to take the Google doc I shared with OCT and post it on the wiki so we can address this more openly, since it seems we don't have an ideal solution here. Hopefully, by looking at the requirements of what we need to do, we can figure out a solution.
Posted by Dave Bauer on
Here is the wiki page https://openacs.org/xowiki/site-wide-file-upload that explains some of the ideas we are working on.
Posted by Dave Bauer on
OK, I went back to try to remember WHY I thought this was a good idea in the first place :)

On HUB at MGH we are using the saved list template feature quite a bit. This feature uses generic content_revision to save the state of the filters.

This appears to be the ONLY place that uses generic content revisions that I can find.

So, if I fix this limited case to use a specific content type, that solves the problem of presenting a list of generic revisions. Presumably the remaining generic revisions will be uploaded files.

Otherwise I don't have a better idea. It seems like this proposal is not approved of, but I don't see a clear way forward to deal with files.

Note, of course, that this would ideally replace file storage, in which case we would replace the file_storage_object content type. That type has no specific attributes, except that its objects _are_ files created by file storage (except images, which get the image content_type), so the only real way to distinguish them is by which folder they are in.

Maybe looking at folders is the best way? Limiting results to specific folders? See the wiki proposal, which does use folders to store uploaded files. Of course, this leaves the question of identifying the correct folder, which seems, at least, a smaller problem, since the number of root folders will be less than the number of files in the folders.
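
A rough sketch of the folder idea, in Python rather than anything OpenACS-specific. Everything here is hypothetical: the ids, the parent table and the set of upload root folders are invented for illustration, and a real version would have to get them from the content repository.

```python
# Toy model only: hypothetical ids and helpers, not the content repository API.
PARENT = {
    100: None,  # root folder that holds uploaded files
    200: None,  # some other root folder
    110: 100,   # subfolder under the upload root
    111: 110,   # file inside that subfolder
    201: 200,   # item stored elsewhere
}
UPLOAD_ROOTS = {100}

def root_folder(item_id):
    """Walk up the parent chain until a root folder is reached."""
    while PARENT[item_id] is not None:
        item_id = PARENT[item_id]
    return item_id

def limit_to_upload_folders(item_ids):
    """Keep only results that live under one of the upload root folders."""
    return [i for i in item_ids if root_folder(i) in UPLOAD_ROOTS]

print(limit_to_upload_folders([111, 201, 110]))
```

Only the small set of root folders has to be identified; everything below them is picked up by walking the parent chain.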

Posted by Dave Bauer on
The APM also stores tarballs you generate for release as generic content_revisions.
A different idea: seems like we could view this as a classification problem, no? How about shipping a category tree that the system could utilize to classify content as it is added to the content repository?

Content -> Text -> Uploaded|Generated
Content -> Binary -> ...

or something along those lines. Haven't given the tree any thought but clearly you could imply quite a lot by selecting a category from a well-organized tree. The subsite could use this information to act appropriately on new content.
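
As a very rough sketch of how such a tree might be used (Python, toy model only): the classify and under helpers, the MIME-type rule and the origin values are all assumptions made for illustration, not the categories package API. The Binary branch is left without children because the tree above does not spell them out.

```python
# Toy model only: hypothetical helpers, not the OpenACS categories API.
ORIGINS = {"uploaded": "Uploaded", "generated": "Generated"}

def classify(mime_type, origin):
    """Pick a category path for new content from its MIME type and origin."""
    if origin not in ORIGINS:
        raise ValueError("unknown origin: " + origin)
    if mime_type.startswith("text/"):
        return ("Content", "Text", ORIGINS[origin])
    # The children of Binary are not defined in the tree above, so stop here.
    return ("Content", "Binary")

def under(path, prefix):
    """True if a category path lies at or below the given prefix."""
    return path[:len(prefix)] == prefix

path = classify("text/plain", "uploaded")
print(path)
print(under(path, ("Content", "Text")))
```

A subsite could then act on new content by testing its path against a prefix such as Content -> Text, instead of relying on the content type alone.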