A git-annex key has this format:
BACKEND[-sNNNN][-mNNNN][-SNNNN-CNNNN]--NAME
For example:
SHA256E-s31390--f50d7ac4c6b9031379986bc362fcefb65f1e52621ce1708d537e740fefc59cc0.mp3
- The backend is one of the key-value backends, which are always upper-cased.
- The name field at the end has a format dependent on the backend. It is always the last field, and is prefixed with "--". Unlike other fields, it may contain "-" in its content. It should not contain newline characters or "/"; otherwise nearly anything goes. The "E" variants of hash keys include a filename extension after the hash.
- The "-s" field is optional, and is the size of the content in bytes.
- The "-m" field is optional, and is the mtime of the file when it was added to git-annex, expressed as seconds from the epoch. This is currently only used by the WORM backend.
- The "-S" and "-C" fields are only used for keys that are chunks of some other key. "-S" is the size of the chunk, and "-C" is the chunk number (starting at 1).
- Other fields could be added in the future, if needed.
git-annex always puts the fields in the order shown above when serializing a key. Older versions of git-annex would parse keys with the fields in other orders (although the name field must always come last), but the current version requires the fields come in the order shown above.
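The field layout and ordering described above can be sketched as a small parser. This is only an illustration, not git-annex's own implementation (which is written in Haskell): the `Key` tuple, the `parse_key` function, and the assumption that backend names consist of upper-case letters, digits and underscores are all hypothetical choices made for the example.

```python
import re
from typing import NamedTuple, Optional


class Key(NamedTuple):
    backend: str
    size: Optional[int]        # "-s" field, bytes
    mtime: Optional[int]       # "-m" field, seconds from the epoch
    chunk_size: Optional[int]  # "-S" field
    chunk_num: Optional[int]   # "-C" field, starting at 1
    name: str


# Fields must appear in exactly this order; the name comes last after "--".
# Assumption: backend names use only upper-case letters, digits and "_".
KEY_PATTERN = re.compile(
    r"(?P<backend>[A-Z0-9_]+)"
    r"(?:-s(?P<size>[0-9]+))?"
    r"(?:-m(?P<mtime>[0-9]+))?"
    r"(?:-S(?P<chunk_size>[0-9]+)-C(?P<chunk_num>[0-9]+))?"
    r"--(?P<name>[^/\n]+)"
)


def _opt_int(text: Optional[str]) -> Optional[int]:
    """Convert an optional numeric field to int, keeping None as None."""
    return int(text) if text is not None else None


def parse_key(serialized: str) -> Key:
    """Parse a serialized key, raising ValueError if it does not fit the format."""
    m = KEY_PATTERN.fullmatch(serialized)
    if m is None:
        raise ValueError(f"not a well-formed key: {serialized!r}")
    return Key(
        backend=m.group("backend"),
        size=_opt_int(m.group("size")),
        mtime=_opt_int(m.group("mtime")),
        chunk_size=_opt_int(m.group("chunk_size")),
        chunk_num=_opt_int(m.group("chunk_num")),
        name=m.group("name"),
    )
```

Because the backend and the numeric fields cannot contain "-", the first "--" in the string always marks the start of the name, so a name such as "foo--bar" still parses. A key with its fields out of order, for instance "-m" before "-s", is rejected by this sketch.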
The git annex examinekey command can be used to extract information from a key.
Are there limitations on the character set git-annex guarantees?
It appears from experiments that git-annex only uses ASCII characters in there, given that both a file 'test.extü' (in UTF-8 encoding) and a file 'test.ext\xff' produced extension-free key names with the SHA256E backend, but it'd be good to have that confirmed.
Like git, git-annex tries to be character set agnostic when it comes to filenames, including key filenames.
It's certainly possible for a key to contain non-ASCII bytes in its name, and an extension containing unicode or some other non-ASCII value is one way that can in fact happen.
(Looks like, in a C locale, ".extü" is 5 chars long, so is considered too long to be an extension, while in a unicode locale, it's 4 chars long, so is treated as an extension.)
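The locale effect described above comes down to counting bytes versus counting characters. The sketch below is not git-annex's code; `looks_like_extension` and `MAX_EXT_LEN` are hypothetical names, the limit of 4 is an assumption taken from the lengths mentioned in the comment, and the length is measured without the leading dot.

```python
# Assumed limit on extension length; chosen to match the comment above.
MAX_EXT_LEN = 4


def looks_like_extension(ext: str, count_bytes: bool) -> bool:
    """Check an extension (without the leading dot) against the length limit.

    count_bytes=True mimics a C locale, where each UTF-8 byte counts as one
    character; count_bytes=False mimics a unicode locale.
    """
    length = len(ext.encode("utf-8")) if count_bytes else len(ext)
    return 0 < length <= MAX_EXT_LEN


ext = "ext\u00fc"  # "extü": 4 characters, 5 bytes in UTF-8

c_locale = looks_like_extension(ext, count_bytes=True)         # too long
unicode_locale = looks_like_extension(ext, count_bytes=False)  # fits
```

Under these assumptions the same filename yields a key without an extension in a C locale and a key with a non-ASCII extension in a unicode locale.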