This special remote type stores file contents in a directory.
One use case for this would be if you have a removable drive that you want to use to sneakernet files between systems (possibly with encrypted contents). Just set up both systems to use the drive's mountpoint as a directory remote.
With the exporttree=yes parameter, the directory contains a tree of files with the same filenames used in a branch of your repository. Without that parameter, the directory contains a directory structure similar to .git/annex/objects or other special remotes like rsync.
Bear in mind that you can also use a regular git clone of your git-annex repository, rather than a directory remote.
configuration
These parameters can be passed to git annex initremote to configure the remote:
directory
- The path to the directory where the files should be stored for the remote. The directory must already exist. Typically this will be an empty directory, or a directory already used as a directory remote.
encryption
- One of "none", "hybrid", "shared", or "pubkey". See encryption.
keyid
- Specifies the gpg key to use for encryption.
chunk
- Enables chunking when storing large files.
chunksize
- Deprecated version of chunk parameter above. Do not use for new remotes. It is not safe to change the chunksize setting of an existing remote.
exporttree
- Set to "yes" to make this special remote usable by git-annex-export. It will not be usable as a general-purpose special remote.
importtree
- Set to "yes" to make this special remote usable by git-annex-import. It will not be usable as a general-purpose special remote.
annexobjects
- When set to "yes" along with "exporttree=yes", this allows storing other objects in the remote along with the exported tree. They will be stored under .git/annex/objects/ in the directory.
ignoreinodes
- Usually when importing, the inode numbers of files are used to detect when files have changed. Since some filesystems generate new inode numbers each time they are mounted, that can lead to extra work being done. Setting this to "yes" will ignore the inode numbers and so avoid that extra work. This should not be used when the filesystem has stable inode numbers, as it does risk confusing two files that have the same size and mtime.
Setup example:
# git annex initremote usbdrive type=directory directory=/media/usbdrive/ encryption=none
# git annex describe usbdrive "usb drive on /media/usbdrive/"
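For the sneakernet case with encrypted contents mentioned at the top of this page, a minimal sketch might look like the following. The remote name usbdrive comes from the example above; the two function names are made up for illustration, and encryption=shared is only one of the listed choices (with it, the encryption key is stored in the git repository). The second helper assumes the other system is a clone that has already received the git-annex branch, so the remote is known there and only needs enabling.

```shell
# Hypothetical helpers; only the git annex invocations are real commands.

# On the first system: create the remote with shared encryption.
init_usb_remote() {
    mountpoint=$1
    git annex initremote usbdrive type=directory \
        directory="$mountpoint" encryption=shared
}

# On the second system: enable the already-known remote at the
# place where the drive is mounted there.
enable_usb_remote() {
    mountpoint=$1
    git annex enableremote usbdrive directory="$mountpoint"
}
```

Usage would be something like `init_usb_remote /media/usbdrive/` on one machine and `enable_usb_remote /media/usbdrive/` on the other.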
Thanks for this great tool! I was wondering what the differences are between using type=directory, type=rsync, or a bare git repo for directories?
I guess I can't use just a regular repo because my USB drive is formatted as vfat -- which threw me for a loop the first time I heard about git-annex about a year ago, because I followed the walkthrough, and it didn't work as expected and I gave up (now I know it was just a case of PEBKAC). It might be worth adding a note about vfat to the "Adding a remote" section of the walkthrough, since the unstated assumption there is that the USB drive is formatted as a filesystem that supports symlinks.
Thanks again, my scientific data management just got a lot more sane!
The directory and rsync special remotes intentionally use the same layout. So the same directory could be set up as both types of special remotes.
The main reason to use this rather than a bare git repo is that it supports encryption.
Use git annex move --from $remote to get all the content out of it, then git annex dead $remote, and finally you can git remote rm $remote
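The three steps in this comment could be chained as in the sketch below. The function name is hypothetical; the commands are the ones named in the comment. Chaining with && is a choice made here so that the remote is not marked dead if moving the content out fails.

```shell
# Hypothetical helper wrapping the three steps from the comment.
retire_remote() {
    remote=$1
    # Stop at the first failure, so the remote is never marked dead
    # or removed while it may still hold content.
    git annex move --from "$remote" &&
        git annex dead "$remote" &&
        git remote rm "$remote"
}
```

For example, `retire_remote usbdrive`.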
I tried the suggestion on comment 4, but when I add again a remote with the same path, it gets the same repository identifier and is considered dead. Is that expected?
My use case: I use a usb drive to transfer some large files from one git annex to another, then I use the usb drive for something else and the special remote is removed. Later I want to use the same usb drive again, but when I create the repository, it starts in the dead state.
@nicolas, I suspect you are using git annex initremote with the same name that you used for the now dead-and-buried remote. That causes it to be reanimated, which is not what you want.
Since git-annex version 4.20130501, git annex initremote is reserved for creating new remotes, not enabling old ones, so it will refuse to do this. That's to avoid exactly this confusion.
Using git annex initremote with a different remote name, and the same directory should work just fine.
This is great work. I've developed a serious annex-addiction and now I want to use it everywhere! In particular I was hoping to apply it to this use case:
I have large files/directories (approx 5 TB) on an NFS mount which is a) write-protected (think "read-only medium") and b) used by non-git users. Both reasons prevent me from setting up a git-annex repo there. However, I would like to use git-annex to keep track of the paths and get/drop files from my different computers.
On one of my servers, I set up a git-annex repo, hoping to only manage the structure, the locations, and the number of copies. I don't want to have copies of the 5 TB of files in that repository, as disk space is not unlimited (just for the sake of making them available to my laptop).
I was banking on using a special remote (either directory or rsync) to tell the git-annex repo where the actual data is.
I am not concerned with data loss, as it is backed up in regular time intervals by our sysadmin.
I tried both a directory remote and an rsync remote, but there seems to be a missing piece (I suppose it's add). Any ideas?
This is what I did:
I added the directory remote and an rsync remote
the copy command fails without complaints $ git annex copy --from collections
I tried adding virtual files to git annex
but still any kind of get/copy command does not get any new files
It would be awesome if I could use git-annex for this, to keep track of my copies and copies of copies. And then I could also keep track of data on my write-protected DVDs.
Is there any chance?
Thanks a lot!
-- Laura
@Laura the directory special remote requires files to be in a particular directory structure with special names git-annex comes up with. So you can't use it on an existing tree of files like that.
What you can do is use the web special remote, with a file:// url to point to the files wherever they are stored. So for example, git annex addurl file:///media/dvd/file
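A small sketch of that suggestion, with the target filename given explicitly. The function name and its arguments are hypothetical; addurl and its --file option are real. Paths containing spaces or other special characters would need URL-encoding, which this sketch does not attempt. Depending on the git-annex version, the file scheme may also have to be allowed through the annex.security.allowed-url-schemes setting before this works.

```shell
# Hypothetical helper; only the git annex invocation is a real command.
add_local_file_url() {
    src=$1    # absolute path of the existing file, e.g. /media/dvd/file
    dest=$2   # name the file should get inside the repository
    git annex addurl --file="$dest" "file://$src"
}
```

For example, `add_local_file_url /media/dvd/file dvd/file`.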
Using the web remote is a pretty nice trick!
Thanks, Joey - I would not have guessed that.
-- Laura
@dmitry, it just so happens that directory special remotes honor the annex.diskreserve configuration setting. This normally only applies to the local git repository, and not to remotes, but since directory special remotes are local, annex.diskreserve is also checked for them.
There's currently no way to have a different diskreserve setting for a directory special remote than is used for the local git repository.
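For reference, the setting is an ordinary git config value. The wrapper function below is hypothetical, and the amount shown in the usage line is only an illustration; as the comment says, the same value then applies to both the local repository and any directory special remotes.

```shell
# Hypothetical helper; annex.diskreserve is the real setting name.
set_disk_reserve() {
    amount=$1   # a size with a unit, e.g. "1 gigabyte"
    git config annex.diskreserve "$amount"
}
```

For example, `set_disk_reserve "1 gigabyte"`.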
@joeyh - trying to understand the purpose behind the layout for the directory and rsync remotes
Format appears to be /BASE_DIR/xxx/yyy/FILEKEY/FILEKEY
What's the purpose of the filekey directory? I.e. why have FILEKEY as both a directory and then a file of the same name inside that directory?
@aslkdjasd this layout is the same as used for .git/annex/objects in the local repository.
This allows the FILEKEY directory to have its write bit removed. This prevents an accidental rm -rf from losing the contents of annexed files. Since that might be the only copy of a file, git-annex adds this extra protection.
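The effect of the write bit can be seen without git-annex at all. The sketch below builds the same shape of tree in a scratch directory and removes the write bit from the inner directory; xxx, yyy and FILEKEY are the placeholders from the question, not real hash directories or keys, and the function name is made up. While the bit is removed, an unprivileged rm -rf cannot unlink the file inside (root is not stopped by this).

```shell
# Stand-alone demonstration; does not touch any real repository.
demo_write_protect() {
    scratch=$(mktemp -d) || return 1
    mkdir -p "$scratch/xxx/yyy/FILEKEY"
    echo "content" > "$scratch/xxx/yyy/FILEKEY/FILEKEY"
    # Remove the write bit from the directory that holds the file.
    chmod a-w "$scratch/xxx/yyy/FILEKEY"
    # Print the directory's mode string (no "w" in it now).
    ls -ld "$scratch/xxx/yyy/FILEKEY" | cut -c1-10
    # Restore the write bit so the scratch directory can be cleaned up.
    chmod u+w "$scratch/xxx/yyy/FILEKEY"
    rm -rf "$scratch"
}
```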
Format appears to be /BASE_DIR/xxx/yyy/FILEKEY/FILEKEY
How can I get the xxx/yyy or the complete storage path of an annexed file in a fast way?
Background:
I have to "chmod" the xxx/yyy within a '.git/hooks/pre-commit' action (shared NFS location).
The following approach using 'find' takes too much time with a big remote:
The hash directories for a key can be looked up by passing the key to git annex examinekey --format '${hashdirlower}\n'
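Building on that, a sketch for going from a file in the working tree to its expected location in the remote, without running find. The function name and its arguments are hypothetical; lookupkey and examinekey are real commands. It assumes an unencrypted, unchunked remote without exporttree, since otherwise the names stored in the directory differ, and it strips a trailing slash from the hash directory in case the format variable includes one.

```shell
# Hypothetical helper; prints BASE_DIR/xxx/yyy/KEY/KEY for a file.
annex_remote_path() {
    base_dir=$1   # top of the directory remote
    file=$2       # annexed file in the working tree
    key=$(git annex lookupkey "$file") || return 1
    hashdir=$(git annex examinekey --format='${hashdirlower}' "$key") || return 1
    # Drop a trailing slash, if any, before joining the pieces.
    hashdir=${hashdir%/}
    printf '%s/%s/%s/%s\n' "$base_dir" "$hashdir" "$key" "$key"
}
```

For example, `annex_remote_path /media/usbdrive somefile`.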
I'm using git-annex for backing up a variety of data, with several different remotes (including a USB drive for backup, rsync for encrypted cloud backup, etc). I have one particular use case that I am trying to figure out how to implement:
Pictures, Music, and Video are stored on a WD MyCloud under three corresponding folders. They are accessible via NFS, as well as via AFP; I specifically mount them on my laptop via NFS. I want to be able to access the files in the following ways:
So what I decided I needed was a directory special remote, with both the exporttree and importtree options. So I created the special remote using something like:
On my laptop's repository, I executed the following:
Problem: - This imports the content from the special remote, into the laptop's repository. I don't want that.
I do realize that my main git-annex repository on the laptop probably has to actually download all of the files in order to compute their hashes. However, I do not want them to stay on the laptop, as there is nowhere near enough space for all of them. I did try:
However, this results in none of the files on the WD MyCloud actually getting processed.
What am I missing?
Thanks! -Rob
git-annex-import has a --no-content switch.
This page warns against the use of the directory special remote if the user expects to be able to access his files:
By setting the options importtree and exporttree it seems that the user can have the aforementioned behaviour, right?
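As a rough sketch of the setup discussed in the last few comments: a directory remote with both options set, an import that uses the --no-content switch mentioned above, and an export. The function names, remote name, path and branch are all placeholders, and encryption=none is an assumption made here so the directory holds ordinary readable files.

```shell
# Hypothetical helpers; only the git and git annex invocations are real.

# Create a directory remote usable for both import and export.
init_tree_remote() {
    name=$1; dir=$2
    git annex initremote "$name" type=directory directory="$dir" \
        encryption=none exporttree=yes importtree=yes
}

# Record what is in the directory without keeping the contents locally,
# then merge the resulting remote tracking branch.
import_tree_without_content() {
    name=$1; branch=$2
    git annex import "$branch" --from "$name" --no-content &&
        git merge "$name/$branch"
}

# Write the branch's tree out to the directory as ordinary files.
export_tree() {
    name=$1; branch=$2
    git annex export "$branch" --to "$name"
}
```

For example, `init_tree_remote mycloud /mnt/mycloud/Pictures`, then `import_tree_without_content mycloud master`. The first merge of an imported branch may need --allow-unrelated-histories passed to git merge.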