We currently have no way to detect whether a file has already been uploaded to the system.
To achieve this, I would like to store the SHA-1 hash value, as computed by ns_sha1, in cr_revisions. That way we do not have to recalculate it for all existing content every time a new file is added.
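As a rough illustration of the duplicate check, here is a minimal sketch that uses an in-memory SQLite table as a stand-in for cr_revisions (the real table lives in PostgreSQL or Oracle and has many more columns). The helpers store_revision and find_duplicate are hypothetical names for this sketch, and Python's hashlib stands in for ns_sha1, assuming both produce the same hex digest for the same bytes.

```python
import hashlib
import sqlite3

# Simplified stand-in for cr_revisions: only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cr_revisions (revision_id INTEGER PRIMARY KEY, sha1_value TEXT)"
)
# An index turns the duplicate check into a lookup instead of a full table scan.
conn.execute("CREATE INDEX cr_revisions_sha1_idx ON cr_revisions (sha1_value)")


def store_revision(conn, revision_id, content):
    """Hypothetical helper: store a revision with the SHA-1 hex digest of its bytes."""
    digest = hashlib.sha1(content).hexdigest()
    conn.execute(
        "INSERT INTO cr_revisions (revision_id, sha1_value) VALUES (?, ?)",
        (revision_id, digest),
    )
    return digest


def find_duplicate(conn, content):
    """Hypothetical helper: return the revision_id of identical content, or None."""
    digest = hashlib.sha1(content).hexdigest()
    row = conn.execute(
        "SELECT revision_id FROM cr_revisions WHERE sha1_value = ? LIMIT 1",
        (digest,),
    ).fetchone()
    return row[0] if row else None


store_revision(conn, 1, b"quarterly report")
print(find_duplicate(conn, b"quarterly report"))  # 1
print(find_duplicate(conn, b"something else"))    # None
```

Because the hash is stored once at upload time, checking a new file costs one hash computation plus one indexed query, rather than rehashing every stored file.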
I chose SHA-1 because BitTorrent uses it, which gives us the option of using this value to create a "download with BitTorrent" link in file storage in the future.
Once this TIP is approved, I would add the column "sha1_value" to the cr_revisions table and write an upgrade script. This upgrade script would *not* scan existing files, as this could prove disastrous (it could take a very long time).
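The upgrade step can be sketched as follows, again using SQLite as a stand-in for the real database; the column type TEXT and the index name are assumptions for illustration. The point is that adding a nullable column with no default leaves existing rows untouched, so no stored file has to be read or hashed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pre-upgrade schema: a simplified stand-in for cr_revisions without sha1_value.
conn.execute("CREATE TABLE cr_revisions (revision_id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO cr_revisions (revision_id, title) VALUES (?, ?)",
    [(1, "old-a.pdf"), (2, "old-b.pdf")],
)


def upgrade(conn):
    """Add a nullable sha1_value column without scanning existing files."""
    # Existing revisions simply get NULL, meaning "hash not computed yet".
    conn.execute("ALTER TABLE cr_revisions ADD COLUMN sha1_value TEXT")
    conn.execute("CREATE INDEX cr_revisions_sha1_idx ON cr_revisions (sha1_value)")
    conn.commit()


upgrade(conn)
rows = conn.execute(
    "SELECT revision_id, sha1_value FROM cr_revisions ORDER BY revision_id"
).fetchall()
print(rows)  # [(1, None), (2, None)]
```

One consequence of this design: duplicate detection only matches against revisions uploaded after the upgrade, since older rows keep a NULL hash until something fills them in.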
Furthermore, I would implement functionality in file-storage to calculate the ns_sha1 hash when a file is uploaded.
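The hashing itself can be done incrementally so that a large upload never has to sit in memory as one string. The sketch below shows the general technique with Python's hashlib; whether ns_sha1 itself can hash a file incrementally is not covered here, and the function name and 64 KiB chunk size are arbitrary choices for illustration.

```python
import hashlib
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB; an arbitrary choice for this sketch


def sha1_of_stream(stream, chunk_size=CHUNK_SIZE):
    """Compute the SHA-1 hex digest of a binary stream, reading it chunk by chunk."""
    h = hashlib.sha1()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


upload = io.BytesIO(b"abc")  # stands in for the uploaded file's contents
print(sha1_of_stream(upload))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

The digest is identical to hashing the whole content at once, so the value stored in sha1_value does not depend on how the file was read.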
Further changes could include:
- Change the upload workflow for files. If a file already exists in the system *and* the user has read permission on it, offer to create a symlink instead of uploading the file again.
- Write a BitTorrent (.torrent) file generator for files, with links to the result displayed in folder-chunk.tcl.
- Write a closeness calculator for communities (the more identical files two communities share, the more closely related they appear to be). Obviously this involves considerably more than just files, but we need to cover the basics first.
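One simple way to turn "the more identical files, the closer" into a number is Jaccard similarity over each community's set of stored SHA-1 values: shared files divided by all distinct files. This is one possible metric chosen for illustration, not a settled design; the function names and the short "h1"-style strings (placeholders for real 40-character digests) are hypothetical.

```python
def closeness(files_a, files_b):
    """Jaccard similarity of two communities' file sets, keyed by SHA-1 digest.

    Returns a value in [0, 1]: 0 means no shared files, 1 means identical sets.
    """
    a, b = set(files_a), set(files_b)
    if not a and not b:
        return 0.0  # two empty communities: treated as unrelated (a choice, not a rule)
    return len(a & b) / len(a | b)


def most_related(target, communities):
    """Rank the other communities by closeness to `target`, highest first."""
    scores = [
        (name, closeness(communities[target], files))
        for name, files in communities.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


communities = {
    "x": {"h1", "h2", "h3"},
    "y": {"h2", "h3", "h4"},
    "z": {"h9"},
}
print(closeness(communities["x"], communities["y"]))  # 0.5
print(most_related("x", communities))  # [('y', 0.5), ('z', 0.0)]
```

Normalising by the union keeps a very large community from appearing close to everything simply because it holds many files.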
For an earlier discussion that was broader in scope than this TIP, see https://openacs.org/forums/message-view?message_id=179437