Forum OpenACS Q&A: Request for Proposals on improving OpenACS with advanced document management facilities

Hi everybody,

I want to start a discussion on what has to be done, and how, to make OpenACS an even more powerful document management environment than it already is.

Status Quo

OpenACS has the following great features:

  • Filestorage
    • Directories
    • Description
    • Versioning
    • Access Control
    • Moving of files
  • Communities

What is missing?

Now that is a good question. There are many great tools available on the market. The more features they have, the less web-based they tend to be and the more confusing they are for the user. So what do we really need to make OpenACS ready? That I cannot tell yet. I think (and that is the reason for this post) it is important to look at the leading providers of document management systems, but also at the research in that field.

Here is what I have found so far:

  • Studies
    • DFKI Evaluation Center For Language Technology Results
    • People use spatial organization to express relationships between (physical) documents, so the document manager should provide means to spatially organize document representations.
    • When collecting documents, users often cannot immediately decide where to file or how to classify these documents. The tool should facilitate easy creation of informal structures out of newly discovered documents.
    • People tend to organize their documents into hierarchical structures. The tool should support this activity.
    • People annotate documents so that it is easier to remember later what was important about the information. Hence, the tool should allow users to associate an annotation with each document. Annotations are also used to express more specific relationships between documents (e.g., "Document A describes a user evaluation of the system presented in document B"). The document manager should explicitly support the definition and browsing of these kinds of relationships.
    • The tool should support the user in finding documents in the personal collection.
  • Vendors/Providers - a far too short list
    • Omnistar Document Manager - web based, category,description,1GB total, 16 MB max, keyword/category search
    • Lotus Domino.Doc - application, Office and Outlook integration
    • Dobrica Pavlinusic's Document Manager - based on PhPAnyPortal, undelete, changelog, notes, directories, multiuser with lock
    • CambridgeSoft Document Manager - application, for chemistry, auto scanning, full text search, indexing
    • DSFInternet.com Document Manager - web based, private, group, public folders, forwarding, permissions, group membership management, keyword search
    • intranets.com Document Manager - .NET, folders, categories, organisation, rights, full text search, drag and drop, virus protection, automated backup
    • Adept Scientific Document Manager - equivalent to Cambridge
    • sapphire Technologies Document Manager - web based, categories, organisations, file type recognition, version control, search, only office files and graphics, coldfusion integration
    • qoobmedia Document Manager - standard
    • DrugPak Suite: DP Document Manager - application, TWAIN driver, fax, email integration
    • asp.be - Document Manager - web based, type recognition, organization, version control, access log
    • nescforge - GForge Document Manager - web based, cvs support, file publication, workflow, title, description, language, groups, part of groupware
    • SemIPort's Document Manager - application, two-dimensional, zoomable plane as the central interface component, based on DFKI research, annotations, hierarchies
    • Amicus Document Manager - users contribute files in native formats, automatic conversion to web formats (HTML, XML, WML, cHTML and PDF), J2EE, web and java application
    • northwrite Worksite - web based groupware incl. document manager (simple)
    • Electronischer Leitz Ordner - application, TWAIN, categories, keywords, search, office integration, email

First result

After looking at these tools I found out that

  1. OpenACS is already very powerful, but some features would be nice:
    • file type recognition and automated indexing for full text search
    • a flexible way of defining meta attributes for files and folders, with inheritance to subfolders
    • adding annotations or comments to files and folders
    • search (full text, keywords, folder/subfolder)
  2. many tools are too simple
  3. pro tools are too overwhelming: they offer so many features that one loses orientation or becomes a librarian doing nothing else but creating meta-data for files
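To make the "meta attributes with inheritance to subfolders" wish a bit more concrete, here is a minimal sketch in Python. It is not OpenACS code: the `Folder` class, its fields and the attribute names are made up for illustration. It only shows the rule that a subfolder sees its ancestors' attributes unless it overrides them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    """Hypothetical folder with free-form meta attributes."""
    name: str
    parent: Optional["Folder"] = None
    attributes: dict[str, str] = field(default_factory=dict)

    def effective_attributes(self) -> dict[str, str]:
        # Start with everything inherited from the ancestors,
        # then let this folder's own attributes override them.
        inherited = self.parent.effective_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


# Usage: "status" is overridden in the subfolder, "department" is inherited.
projects = Folder("projects", attributes={"department": "research", "status": "draft"})
reports = Folder("reports", parent=projects, attributes={"status": "final"})
print(reports.effective_attributes())
```

Resolving the attributes on read keeps a later change in a parent folder visible in all subfolders without copying anything.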

An interesting thought

Still, I felt something was missing. Looking at other platforms like ILIAS, an idea came to my mind which I find very appealing. In ILIAS a distribution list exists for every class. So if you are a member of that group, you can send an email to the other members.

Following that idea, all we need to support this feature is to assign a virtual email address to each class that is created. But by combining users or communities with file storage, we could even use email as the key file management tool for OpenACS.

So if a user wants to store a file in OpenACS, he can send it either to himself or to one of his groups, as long as he is a valid member of OpenACS and of a community therein. When OpenACS is used in an intranet, one could solve the authentication problem by only accepting mail from a single mail server that uses the same user accounts OpenACS uses via external authentication.

To keep it simple a user can simply send an email. The subject becomes the name of the file and the body the file description. What is needed is to indicate the folder where the file has to be stored. Also archiving of emails should be supported.
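A rough sketch of the "subject becomes the name, body becomes the description" mapping, using only Python's standard `email` package. Nothing here is an existing OpenACS API: `StoredFile`, `files_from_mail` and the example addresses are invented for illustration, and the question of how the target folder is indicated is left open, as in the text above.

```python
from dataclasses import dataclass
from email import message_from_bytes, policy
from email.message import EmailMessage


@dataclass
class StoredFile:
    name: str         # taken from the Subject header
    description: str  # taken from the plain-text body
    filename: str     # original attachment file name
    content: bytes    # decoded attachment payload


def files_from_mail(raw: bytes) -> list[StoredFile]:
    """Turn one incoming mail into one StoredFile per attachment."""
    msg = message_from_bytes(raw, policy=policy.default)
    subject = str(msg["Subject"] or "(untitled)")
    body = msg.get_body(preferencelist=("plain",))
    description = body.get_content().strip() if body is not None else ""
    return [
        StoredFile(
            name=subject,
            description=description,
            filename=part.get_filename() or subject,
            content=part.get_payload(decode=True),
        )
        for part in msg.iter_attachments()
    ]


def build_sample_mail() -> bytes:
    """Build a small test mail with one attachment (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = "user@example.org"
    msg["To"] = "my-community@openacs.example.org"
    msg["Subject"] = "Project plan"
    msg.set_content("First draft of the plan.")
    msg.add_attachment(b"%PDF-1.4 dummy", maintype="application",
                       subtype="pdf", filename="plan.pdf")
    return msg.as_bytes()


if __name__ == "__main__":
    for stored in files_from_mail(build_sample_mail()):
        print(stored.name, "|", stored.description, "|", stored.filename)
```

A mail without attachments yields an empty list here; whether such a mail should be archived instead is a separate decision.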

OpenACS answers every post with an acknowledgment, either as an HTML form or as a text file. It contains the unique id of the file, a link to the file, the file name and its description, as well as the possibility to create an annotation or further meta attributes for the file. If the file is stored in a folder that has attributes assigned, those are included in the acknowledgment in case the user wants to fill them out. If the user doesn't reply to the acknowledgment mail, nothing happens. In case of a reply, OpenACS adds the entries to the object. The user can also keep the acknowledgment mail locally to quickly search for files stored in OpenACS.

On the OpenACS side, the file is indexed for full text search after its file type has been recognized. A conversion to web-based formats is nice to have but not necessary. What is important is that, after indexing the file, the acknowledgment email is sent to the user including a list of keywords generated automatically during indexing, which the user can use as a starting point for entering meta information.

A standard search email with an HTML or text-based form can be used to let OpenACS generate an email including a list of matching documents, their ids and the links to them. The user can use the id to ask OpenACS to forward the file to another email address, and use the link to access the file through OpenACS after logging in.

I hope this post made some people start thinking. Whether or not you find the last chapter interesting, I think it is worth discussing whether there is a need to improve OpenACS with advanced document management facilities while keeping things simple.

Also, I know of quite a few universities that are desperately searching for a tool with such a good but simple concept.

Why should we let them buy the IBM Document Manager for 40,000 EUR each if they could get OpenACS for a lot less by investing in improving it, thus also helping the OpenACS community?

Thanks to those who didn't give up reading this far too long post until the end :)

Greetings,

Nima

Nima, good questions. I'll respond soon enough.

Btw, swing by the IRC chatroom. This stuff (in the context of DAV) has been discussed quite a bit there...

talli

My most urgent wish for document management is much less laborious: A simple "check out" / "check in" button to enable users to do document collaboration with the normal revision features of their word processors.
Things I'd add:
  • Possibility to link files together, which would allow you to see related files. It is a different approach to finding your files.
  • Hashing of files, so you could create a web showing where the same file has been used and, using some engine behind the scenes, detect relationships between people and communities. Last but not least, you could save disk space.
  • Obviously, all of the things mentioned above should be available using WebDAV (especially checkin/checkout)
  • Not sure if this is already possible, but office files might already contain some description. Is it possible to extract this and automatically insert it into our database?
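The hashing idea from the list above could be sketched like this in Python. It is only an illustration: the choice of SHA-256, the function names and the example paths are my assumptions, not something file-storage does today.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Hash the file content only, so name and location do not matter."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each content hash to the paths sharing it (duplicates only)."""
    by_hash: dict[str, list[str]] = defaultdict(list)
    for path, data in files.items():
        by_hash[content_hash(data)].append(path)
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}


# Usage with made-up community folders: the same file uploaded twice.
files = {
    "community-a/minutes.doc": b"same content",
    "community-b/meeting.doc": b"same content",
    "community-b/budget.xls": b"different content",
}
for digest, paths in group_duplicates(files).items():
    print(digest[:12], paths)
```

Each group of paths is one edge in the "web" of shared files, and the content would only need to be stored once per hash.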
This thread has already brought together some very nice ideas on how to improve the file storage system. Maybe we could write up a nice document describing the current features (along the lines of the other document management systems) and the features that need funding. Put this document on OpenACS.org and make sure it ranks high in Google.

I just looked for our assessment page. Brr... Even the Ilias BBoard has a higher ranking than our specs for assessment.

Malte,

I definitely want to get file locking into the web interface for file-storage and other CR apps for integration with WebDAV support.

Just one thought on the File Storage Package:

An interesting extension would be the use of the Andrew File System (AFS), a distributed file system. It was originally developed at Carnegie Mellon University and at IBM Pittsburgh Labs. It offers a client-server architecture for file sharing, providing location independence, scalability and transparent migration capabilities for data.

There is also an open source implementation, OpenAFS, which can be integrated transparently with Unix/Linux. The benefits of AFS are data replication, fine-grained access control to user-owned directories through access lists, caching, sharing of files, callbacks and many more.

A quick and dirty integration with OpenACS would be to switch to the file system mode of the file storage package and to replace the numeric folder numbers in the content-repository-content-files of the OpenACS installation with the unique user IDs used in the external authentication, which are linked to the corresponding AFS folders.

But a policy is needed for how to keep the files synchronized with OpenACS if users remove files directly through AFS access such as telnet/SSH/FTP (for instance: files are read-only if they were created by OpenACS). It also has to be decided what happens to community files: either OpenACS manages them as it does now without integrating them with AFS, or the files are owned by the user who provided them, or a virtual user is needed in AFS for each community.

At present each file stored in the file system gets a unique number without an extension. If AFS access is limited to OpenACS only, this causes no problems at all. But I think the files should at least keep their extension, otherwise it is impossible for a normal user to figure out the file format.
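Keeping the extension could be as simple as the following Python sketch. The function name and the rule for what counts as an acceptable extension are my assumptions; the content repository does not work this way at present.

```python
from pathlib import PurePosixPath


def stored_name(file_id: int, original_name: str) -> str:
    """Build the on-disk name from the unique number plus the original extension."""
    suffix = PurePosixPath(original_name).suffix.lower()
    # Only keep short, purely alphanumeric extensions; drop anything odd.
    if not suffix[1:].isalnum() or len(suffix) > 10:
        suffix = ""
    return f"{file_id}{suffix}"


print(stored_name(4711, "Report.PDF"))
print(stored_name(4712, "README"))
```

The unique number still identifies the file, so nothing changes for OpenACS itself; the extension is only there for people browsing the directory.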

Greetings,
Nima

Well, maybe this is too far off, but would it be possible to write a generic interface to file storage that would allow us to plug in various other file systems?

I'm thinking about rudimentary support for SMB and various file-sharing (P2P) networks.
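Just to illustrate what I mean by a generic interface, here is a small Python sketch. All names (`StorageBackend`, `InMemoryBackend`, `LocalDirBackend`) are hypothetical and the three operations are the bare minimum; an SMB, AFS or P2P backend would be another subclass.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Hypothetical plug-in interface that file storage would talk to."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Keeps everything in a dict; handy for tests."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def get(self, key: str) -> bytes:
        return self._files[key]

    def delete(self, key: str) -> None:
        del self._files[key]


class LocalDirBackend(StorageBackend):
    """Stores files below one root directory on the local file system."""

    def __init__(self, root: Path) -> None:
        self.root = root.resolve()

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys that would escape the root directory.
        if self.root not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        self._path(key).unlink()


def roundtrip(backend: StorageBackend) -> bytes:
    """The calling code is identical for every backend."""
    backend.put("docs/plan.txt", b"hello")
    return backend.get("docs/plan.txt")


if __name__ == "__main__":
    print(roundtrip(InMemoryBackend()))
    with tempfile.TemporaryDirectory() as tmp:
        print(roundtrip(LocalDirBackend(Path(tmp))))
```

The point of the design is that the calling code only ever sees `put`, `get` and `delete`, so swapping the backend would not touch the rest of the package.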

Especially in the P2P area you could issue a search for a file, extend it to foreign sites and get a list of files you would like to download. Once the download is completed you get a notification that the file has arrived in the folder of your choice.

Talking about notifications. It seems that we do not have notifications on file-storage enabled. I've seen it working with AIESEC. If this is a wanted feature, Azri might squeeze this in before the feature freeze.