Hi,
I'm having an issue and I'm not sure whether it's caused by git-annex or something external. I'm on Ubuntu 14.04 and have been using git-annex for several months without this issue, but it started suddenly, possibly after several packages were updated via apt-get upgrade.
We have two remotes configured, one is a local (LAN) smb share, the other is on Amazon S3. We're using shared encryption on the S3 remote, and no encryption on the smb remote.
The problem that started recently happens when copying to the smb remote. There is no problem copying from the remote, and no problem reading from or writing to the drive outside of git-annex. However, copying to the remote fails: it seems to copy most or all of the file and then hangs on a gpg step. Again, there is no encryption on this remote. The S3 remote with shared encryption has no issues, and other devices on the LAN, all OS X, have no issues writing to the remote.
I don't have enough info to necessarily claim this is a bug in git-annex, but I'm not sure what to poke at next to try to figure it out. Any help or advice would be greatly appreciated.
Below is the debug output from a failed git annex copy command.
cw@ubuntu$ git annex copy annexedfile --to smbremote --debug
[2014-11-07 15:35:13 PST] read: git ["--git-dir=/repobase/.git","--work-tree=/repobase","show-ref","git-annex"]
[2014-11-07 15:35:13 PST] read: git ["--git-dir=/repobase/.git","--work-tree=/repobase","show-ref","--hash","refs/heads/git-annex"]
[2014-11-07 15:35:13 PST] read: git ["--git-dir=/repobase/.git","--work-tree=/repobase","log","refs/heads/git-annex..aa8813d486939544701359dc28fa7b0916917961","--oneline","-n1"]
[2014-11-07 15:35:13 PST] read: git ["--git-dir=/repobase/.git","--work-tree=/repobase","log","refs/heads/git-annex..097d5b482d6856ce22814a0c2c5eee43e3e030e4","--oneline","-n1"]
[2014-11-07 15:35:13 PST] chat: git ["--git-dir=/repobase/.git","--work-tree=/repobase","cat-file","--batch"]
[2014-11-07 15:35:13 PST] read: git ["--git-dir=/repobase/.git","--work-tree=/repobase","ls-files","--cached","-z","--","annexedfile"]
copy annexedfile (gpg) (to smbremote...)
[2014-11-07 15:35:13 PST] chat: gpg ["--quiet","--trust-model","always","--batch","--passphrase-fd","11","--symmetric","--force-mdc","--no-textmode"]
95% 0.0 B/s 0s/mnt/annex/tmp/GPGHMACSHA1--a097a9b653d1facbe7d37d0e8f9f580261d9adef/GPGHMACSHA1--a097a9b653d1facbe7d37d0e8f9f580261d9adef: hClose: does not exist (Host is down)
failed
git-annex: copy: 1 failed
cw@ubuntu$
Thanks, cw
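The relevant error message seems to be this:

/mnt/annex/tmp/GPGHMACSHA1--a097a9b653d1facbe7d37d0e8f9f580261d9adef/GPGHMACSHA1--a097a9b653d1facbe7d37d0e8f9f580261d9adef: hClose: does not exist (Host is down)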
That doesn't suggest to me that gpg is the problem. git-annex is trying to close a file after writing it, and that's failing with EHOSTDOWN. That's not a usual error code, but then you're using a network filesystem, which has many unusual failure modes.
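One quick way to confirm that "(Host is down)" is the operating system's own error string, and not something gpg printed, is to look the text up in the errno table. A minimal Python sketch; errno_names_for is a hypothetical helper, not part of git-annex, and on Linux it should report EHOSTDOWN:

```python
import errno
import os


def errno_names_for(message):
    """Hypothetical helper: symbolic errno names whose strerror() text equals `message`."""
    return sorted(
        name
        for code, name in errno.errorcode.items()
        if os.strerror(code) == message
    )


# The parenthesised part of the git-annex error is assumed to be the OS error string.
print(errno_names_for("Host is down"))
```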
The next step would probably be to strace git-annex to see which syscall is failing. It looks like it might be close(2), but then again hClose might be doing something else first.
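For the strace step, something like `strace -f -o annex.strace git annex copy annexedfile --to smbremote` records every syscall (-f follows child processes such as gpg, -o writes the trace to a file). The log gets large, so here is a rough Python sketch that pulls out only the calls that returned -1 with an errno. It assumes strace's default line format, does not handle lines split into "<unfinished ...>"/"resumed" pairs or prefixed with -t timestamps, and failed_calls is a hypothetical helper name:

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed strace line shape: optional "PID " prefix (added by -f), then
#   name(args) = -1 ERRNO (message)
# Split "<unfinished ...>"/"resumed" lines and -t timestamps are not handled.
FAILED_CALL = re.compile(
    r"^(?:(?P<pid>\d+)\s+)?"
    r"(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+"
    r"(?P<errno>E[A-Z0-9]+)\s+\((?P<message>.*)\)\s*$"
)


class FailedCall(NamedTuple):
    pid: Optional[int]
    call: str
    args: str
    errno: str
    message: str


def failed_calls(lines: Iterable[str]) -> List[FailedCall]:
    """Hypothetical helper: collect every syscall in an strace log that returned -1."""
    found = []
    for line in lines:
        m = FAILED_CALL.match(line.strip())
        if m:
            pid = int(m["pid"]) if m["pid"] else None
            found.append(FailedCall(pid, m["call"], m["args"], m["errno"], m["message"]))
    return found
```

Passing the open log, e.g. `failed_calls(open("annex.strace"))`, should list the failing close(2) if that is indeed the culprit, along with any other failures nearby.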
It's possible that this has something to do with the SMB share not supporting some POSIX filesystem feature that git-annex uses. Lack of support for fcntl locking is a problem with NFS; I don't know about SMB.
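To check whether the share honours fcntl locking at all, a quick probe can try to take a non-blocking POSIX lock on a scratch file in the mounted directory. This is a Python sketch, not something git-annex itself runs; fcntl_locking_works is a hypothetical name, and a lock being granted on a CIFS mount doesn't prove the server enforces it across clients:

```python
import errno
import fcntl
import os
import tempfile


def fcntl_locking_works(directory):
    """Hypothetical probe: try a non-blocking fcntl write lock on a scratch file.

    Returns True if the lock was granted, False if the filesystem refused it
    with an error that usually means locking is unsupported there.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix=".locktest-")
    try:
        try:
            # fcntl.lockf() is Python's wrapper around fcntl() record locks.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.ENOLCK, errno.EOPNOTSUPP, errno.EINVAL):
                return False
            raise  # e.g. EAGAIN/EACCES: lock held elsewhere, not "unsupported"
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

It would be run as e.g. `fcntl_locking_works("/mnt/annex")`, assuming /mnt/annex (the path in the transcript) is the mount point.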
A directory special remote on the SMB share would probably work better. That requires a lot less from the filesystem than a full-fledged git remote does.
(Two parts of what you said don't make sense to me, BTW. I see no evidence of it hanging in the transcript, so I'm unclear why you said it was hanging, as opposed to failing gracefully. And you said the SMB remote was not configured to use encryption, but it's clearly encrypted in the transcript.)
Thanks so much. I'll try digging some more into the syscalls and see what I can figure out. We do have it set up as a directory remote.
To clarify, "hanging" maybe wasn't a good word choice. What I meant was that it very quickly gets to 95% or 100% progress, then freezes and does nothing for a while before finally reporting the error and exiting.
Also, I take back what I said about it not using encryption; it does. I was confused by the fact that there are unencrypted files on the disk, but it turns out they all date from a long time ago, perhaps before we started using encryption. Everything more recent is encrypted.
Thanks! -c
Yeah, it's close returning the error. As far as I can tell the calls involving that file are:
All the calls to write appear to succeed, followed by the call to close, which fails. Maybe the ioctl call that failed earlier has something to do with it? As was mentioned previously, there may be an operation the device doesn't support, but the weird thing is that this used to work fine.
Sorry about the delay getting back to this. It's great that you were able to provide the strace.
I don't think that the ioctl is at fault. That seems to be something that's done by the IO layer when opening a handle. It does not involve locking.
Looks to me like the writes get buffered and it fails to flush to the SMB server on close.
Is the "SMB share" running something special, like a virus scanner, quotas, or a backup in progress?
And is the SMB server Samba or Windows?
In theory you can do lots of funny stuff to get an SMB share: for example, Samba sharing a WebDAV mount that is mounted via NFS on top of ClamFS. (Scary.)