Dear Joey,
During DistriBits 2024, we discussed a concept that you seemed to like: emulating versioned tree export on a special remote with a non-versioned filesystem. This could be a generic mechanism of git-annex. Maybe a new option for the special remote (say: 'versioning = yes / no / emulated' or 'exporttree = yes / no / emulated')?
The idea is to save target files in the remote at paths reflecting the ones in the repo, but:
- create an extra directory at the end of the path, named identically to the file,
- the directory name includes the original extension of the file, which may seem a bit odd, but avoids ambiguity,
- inside the directory, save the file with the key as its filename (preferably with the original extension appended).
Example: the content of the git-annex repo and remote filesystem after a few tree exports:
[git-annex]
repo-name
|
|-- phd
|   |
|   |-- thesis.pdf
|   |-- images
|       |
|       |-- figure.png
|       |-- diagram.png
|
|-- guidelines.pdf
[remote filesystem]
repo-name
|
|-- phd
|   |
|   |-- thesis.pdf
|   |   |
|   |   |-- SHA256E-...-key1.pdf
|   |   |-- SHA256E-...-key2.pdf
|   |   |-- SHA256E-...-key3.pdf
|   |
|   |-- images
|       |
|       |-- figure.png
|       |   |
|       |   |-- SHA256E-...-key1.png
|       |
|       |-- diagram.png
|           |
|           |-- SHA256E-...-key1.png
|           |-- SHA256E-...-key2.png
|
|-- guidelines.pdf
|   |
|   |-- SHA256E-...-key1.pdf
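The path mapping illustrated by this example could be sketched roughly as follows. This is only an illustration of the proposal, not existing git-annex behavior: the function name emulated_export_path is hypothetical, and appending the extension to keys that lack it is an assumption based on the "preferably add the original extension" point.

```python
from pathlib import PurePosixPath


def emulated_export_path(repo_path: str, key: str) -> str:
    """Return the remote location for one version of an exported file.

    The file's own name (extension included) becomes a directory, and the
    content is stored inside it under its key. Hypothetical helper.
    """
    suffix = PurePosixPath(repo_path).suffix
    # Assumption: E-type keys (e.g. SHA256E) already end in the extension;
    # for other keys the original extension is appended.
    name = key if (not suffix or key.endswith(suffix)) else key + suffix
    return str(PurePosixPath(repo_path) / name)


print(emulated_export_path("phd/thesis.pdf", "SHA256E-...-key1.pdf"))
print(emulated_export_path("phd/images/figure.png", "SHA256E-...-key1.png"))
```

Because every version of a file lands under a different key name, exporting a changed file only ever adds a new entry next to the older ones.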
Advantages:
- easy to implement,
- you get (kind of) versioning on any POSIX-like filesystem,
- older versions of files are never overwritten (history tracking),
- it's sufficient to push only the changed files,
- users can use the remote filesystem directly, as it represents something meaningful.
Disadvantages:
- not perfect,
- users need to accept the inconvenience caused by file naming on the bottom level,
- it may be hard to find the right file version in the remote, especially if there are lots of them:
  - modification times will certainly help here,
  - can we concatenate some extra information to the file names that could help in identification?
Feel free to contact me, I'd be happy to discuss and help make this happen.
When we were talking about this idea, I thought there was a problem, but didn't quite manage to find it then.
I see it now: If foo is an annexed file that gets exported this way to foo/SHA--x, and then that annexed file is deleted and a new annexed file foo/SHA--x is added, it will want to export it to foo/SHA--x/SHA--y.
It would either fail because the file exists, or delete it and replace it with the directory. The former would cause the export to fail, the latter could cause data loss. It's not defined what a special remote will do in this situation.
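One way to make this case visible before anything is written would be a check like the following minimal sketch. It assumes the set of remote paths already written by earlier exports is known; find_conflicts and its arguments are hypothetical names, not part of git-annex.

```python
def find_conflicts(exported_files, new_path):
    """Return already-exported remote files that new_path needs as a directory."""
    parts = new_path.split("/")
    # Every proper prefix of new_path would have to be a directory.
    ancestors = {"/".join(parts[:i]) for i in range(1, len(parts))}
    return sorted(ancestors & set(exported_files))


# foo was exported earlier as foo/SHA--x; now a file named foo/SHA--x
# in the repo wants to be exported to foo/SHA--x/SHA--y.
exported = {"foo/SHA--x"}
print(find_conflicts(exported, "foo/SHA--x/SHA--y"))
print(find_conflicts(exported, "foo/SHA--z"))
```

A non-empty result means the export would have to either fail or replace an existing file with a directory, which is exactly the undefined situation described above.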
It seems that this case would never occur accidentally, but it's still worth considering.
Perhaps it should simply skip exporting any files that have names that look like annex keys.
A general regex for a key looks like ^[A-Z0-9]+(?:-[a-z]\d+)*--.+$, right? With that, there would be many possible false positives that would not be exported, like GIT--The Book.pdf.
I can't see a situation where git annex would produce or rename files named like their (or another) key. So if someone deliberately names a file like SHA256E-s31390--f50d7ac4c6b9031379986bc362fcefb65f1e52621ce1708d537e740fefc59cc0.mp3 and then exports it to such a versioned tree, I'd be fine with overwriting. After all, export remotes aren't primarily for full backups, rather for access convenience and filesystem limitations, right? I don't know if it's a problem internally for git-annex though when an export suddenly causes keys to vanish from the remote, but I guess your git-tree-diffing automatically takes care of that?
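The false-positive concern with the regex quoted above can be checked quickly. This sketch uses the pattern exactly as written in this comment, which is a guess at the key format rather than git-annex's own parser; looks_like_key is a hypothetical name.

```python
import re

# Pattern as proposed in the comment; not taken from git-annex itself.
KEY_LIKE = re.compile(r"^[A-Z0-9]+(?:-[a-z]\d+)*--.+$")


def looks_like_key(filename: str) -> bool:
    """True if the filename would be skipped by a 'looks like a key' rule."""
    return KEY_LIKE.match(filename) is not None


for name in [
    "SHA256E-s31390--f50d7ac4c6b9031379986bc362fcefb65f1e52621ce1708d537e740fefc59cc0.mp3",
    "GIT--The Book.pdf",
    "thesis.pdf",
]:
    print(name, looks_like_key(name))
```

The deliberately key-named file and GIT--The Book.pdf both match, while thesis.pdf does not, so a skip rule based on this pattern would indeed drop some ordinary files.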
$numcopies
' and error out with an explanation. That feels right and consistent with git annex' typical behavior.