rclone is a command line program to sync files and directories to and from a wide variety of cloud providers and protocols. At the time of writing, this includes the following services:
- Amazon S3 / Dreamhost / Ceph / Minio / Wasabi
- Backblaze B2
- Box
- Dropbox
- FTP
- Google Cloud Storage
- Google Drive
- HTTP
- Hubic
- Jottacloud
- Mega
- Microsoft Azure Blob Storage
- Microsoft OneDrive
- OpenDrive
- Openstack Swift / Rackspace cloud files / Memset Memstore / OVH / Oracle Cloud Storage
- pCloud
- QingStor
- SFTP
- Webdav / Owncloud / Nextcloud
- Yandex Disk
- The local filesystem
That list is regularly expanding.
There are two ways to use rclone as a git-annex special remote.
- Install git-annex-remote-rclone. This will work with any version of rclone and git-annex.
- With a recent version of rclone and git-annex, it is not necessary to install anything else; just use `git-annex initremote type=rclone ...`. For documentation on using rclone that way, see the output of `rclone gitannex -h` or here.
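As a rough sketch of what such an `initremote` invocation might look like, the helper below only prints a command line for review and does not run it. The parameter names `rcloneremotename`, `rcloneprefix` and `rclonelayout` are the ones used further down this page. The function name `print_initremote`, the example values, and the `encryption=none` setting are assumptions made for illustration, so check `rclone gitannex -h` before relying on any of them.

```shell
# Hypothetical helper (not part of git-annex or rclone): prints, but does not
# run, an initremote command for the built-in rclone special remote.
#   $1 = name of the git-annex remote
#   $2 = name of the remote as configured in rclone
#   $3 = rcloneprefix (directory relative to the rclone remote)
#   $4 = rclonelayout (layout of the git-annex content in that directory)
print_initremote() {
    # encryption=none is an assumption; choose the setting you actually want.
    printf 'git annex initremote %s type=rclone rcloneremotename=%s rcloneprefix=%s rclonelayout=%s encryption=none\n' \
        "$1" "$2" "$3" "$4"
}

# Example values only; they mirror the names used elsewhere on this page.
print_initremote my_rclone_remote my_rclone_remote git-annex lower
```

Printing the command first makes it easy to compare the values against an existing setup before anything is written to the repository.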
In order to use rclone as a special remote, the user needs to download a separate Bash script from https://github.com/DanielDent/git-annex-remote-rclone and put it in their PATH. Since that extra dependency is only a few hundred lines of Bash, I would be interested in attempting to implement `Remote/Rclone.hs` so that the rclone special remote is entirely built into git-annex. However, I wanted to run it by you before more seriously considering investing time in doing that. What are your thoughts on this? I'm assuming the only reason rclone support isn't built into git-annex is just a lack of time and incentive, rather than a more fundamental technical reason. Is that right?
Thanks for all your work on this tool.
That's right, I have actually thought before that enough people use it that it would make sense to either build it in as Haskell or ship the program with git-annex in a kind of contrib.
With that said, there is also something to be said for distributing maintenance, and I think I'd at least want a commitment to maintain it if it were added to git-annex, since git-annex-remote-rclone already has ongoing maintenance.
Another angle is "New external special remote for rclone", which might see the special remote built into rclone itself, and so able to take advantage of rclone's internal API. That might supplant the shell script if it turns out to be better.
Here are a few pointers for switching from `git-annex-remote-rclone` (old helper program) to `rclone gitannex` (rclone's builtin support):
- The relevant settings are `rcloneprefix` (directory relative to the rclone remote (rclone term here)) and `rclonelayout` (layout of the git-annex content therein). If you set it up just like in `git-annex-remote-rclone`'s README, those are `git-annex` and `lower`.
- `git remote rename my_rclone_remote my_rclone_remote.old; git annex renameremote my_rclone_remote my_rclone_remote.old`
- `git annex initremote my_rclone_remote --sameas=my_rclone_remote.old type=rclone rcloneremotename=my_rclone_remote rcloneprefix=git-annex rclonelayout=lower`
It might be possible to just change the type of the remote, but at the time I'm writing this, that didn't work, so I renamed the old remote and created a new one with `--sameas` to not lose any encryption settings.
Perhaps Joey can help me out here a bit with some background knowledge:
I've been seeing sporadic corruption with this setup:
It seems that rclone keeps partial files under the name of the full file when a transfer is interrupted, for the pcloud backend. (This is for rclone <= 1.67.0; 1.68.0 has changes for pcloud, which may fix this.) My theory of how the corruption might have happened:
Joey: Is this a possible error scenario?
This is plausible. git-annex requires that special remotes only show a file as present after a successful upload. If the data store doesn't work that way, the file needs to be uploaded to a temporary name and renamed atomically instead. If that's not possible, the data store is not safe for use by git-annex.
Given all the different types of data stores supported by rclone, this may be difficult, but it's the right thing for the external special remote to do. I think you should file a bug.
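The upload-to-a-temporary-name-then-rename pattern described above can be sketched on a local filesystem as follows. This is only an illustration: `store_atomically` is a hypothetical helper, not what git-annex-remote-rclone or `rclone gitannex` actually does. On an rclone remote the analogous steps would be `rclone copyto` followed by `rclone moveto`, and whether that final move is atomic depends on the backend.

```shell
# Hypothetical helper: store SRC at DEST so that DEST only ever appears
# once the whole copy has succeeded.
store_atomically() {
    src=$1
    dest=$2
    tmp="$dest.tmp.$$"          # temporary name next to the final name

    # An interrupted or failed copy leaves at most the temporary file
    # behind, never a partial file under the final name.
    if cp "$src" "$tmp"; then
        # A rename within one filesystem replaces DEST in a single step.
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Rough rclone analogue (assumes a configured remote called "remote";
# atomicity of the move is backend-dependent):
#   rclone copyto "$src" "remote:$dest.tmp" && \
#       rclone moveto "remote:$dest.tmp" "remote:$dest"
```

With this shape, a presence check on the final name only succeeds after a complete upload, which is the property git-annex relies on.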
(Does `rclone gitannex` also have this problem?)
I think this only happens with some rclone remote backends (like pcloud). The pcloud backend definitely keeps partially uploaded files, under the name of the full file. The backend attempts to do the right thing and uses the `nopartial` option of the pcloud API, but this does not work as it should [1].
I believe the latest rclone updates in 1.68.x should fix this issue, because they handle partial uploads in rclone itself [2].
Re: `rclone gitannex`: I only updated one client to use this, but I've also been careful to never interrupt uploads, so I can't tell. But I don't see how it behaves differently in this regard.
[1] https://forum.rclone.org/t/pcloud-keeps-partial-uploads/46026
[2] See changelog, the OpenWriterAt feature implies PartialUploads: https://rclone.org/changelog/#v1-68-0-2024-09-08