Recent comments posted to this site:

Here it is again:

# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log

(master_env_v152_py36) 13:01  [viral-ngs-benchmarks] $ git annex sync
On branch is-import-rabies200
Your branch is up to date with 'origin/is-import-rabies200'.


It took 6.38 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean
commit ok
remote: Enumerating objects: 2, done.
remote: Counting objects: 100% (2/2), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (2/2), done.
From github.com:broadinstitute/viral-ngs-benchmarks
   456ce673f18..c365f519845  is-import-rabies200        -> origin/is-import-rabies200
   456ce673f18..c365f519845  synced/is-import-rabies200 -> origin/synced/is-import-rabies200
Updating 456ce673f18..c365f519845
Fast-forward
 viral-ngs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Already up to date.
pull origin ok
(master_env_v152_py36) 13:01  [viral-ngs-benchmarks] $ git annex sync
[is-import-rabies200 fe351a22089] git-annex in viral-ngs-benchmarks copy at /data/ilya-work/benchmarks/viral-ngs-benchmarks
 1 file changed, 1 insertion(+), 1 deletion(-)
commit ok
pull origin ok
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 959 bytes | 959.00 KiB/s, done.
Total 2 (delta 1), reused 0 (delta 0)
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
To github.com:broadinstitute/viral-ngs-benchmarks.git
   c365f519845..fe351a22089  is-import-rabies200 -> synced/is-import-rabies200
push origin ok
(master_env_v152_py36) 13:02  [viral-ngs-benchmarks] $ git annex sync
On branch is-import-rabies200
Your branch is up to date with 'origin/is-import-rabies200'.


It took 6.40 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean
commit ok
pull origin ok
(master_env_v152_py36) 13:04  [viral-ngs-benchmarks] $ uname -a
Linux ip-172-31-93-72.ec2.internal 4.14.133-113.112.amzn2.x86_64 #1 SMP Tue Jul 30 18:29:50 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
(master_env_v152_py36) 13:05  [viral-ngs-benchmarks] $ git annex version
git-annex version: 7.20190819-ge4cecf8
build flags: Assistant Webapp Pairing S3 WebDAV Inotify DBus DesktopNotify TorrentParser MagicMime Feeds Testsuite
dependency versions: aws-0.21.1 bloomfilter-2.0.1.0 cryptonite-0.25 DAV-1.3.3 feed-1.0.1.0 ghc-8.6.5 http-client-0.5.14 persistent-sqlite-2.9.3 torrent-10000.1.1 uuid-1.3.13 yesod-1.6.0
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs hook external
operating system: linux x86_64
supported repository versions: 5 7
upgrade supported from repository versions: 0 1 2 3 4 5 6
local repository version: 5

(master_env_v152_py36) 13:05  [viral-ngs-benchmarks] $ git remote -vv
dnanexus
from-ilya-work
from-ilya-work-01
from-ilya-work-02
from-ilya-work-03
ilya-work       /data/ilya-work (fetch)
ilya-work       /data/ilya-work (push)
my-ldir
mygs
origin  git@github.com:broadinstitute/viral-ngs-benchmarks.git (fetch)
origin  git@github.com:broadinstitute/viral-ngs-benchmarks.git (push)
s3-viral-ngs-benchmarks-web
viral-ngs-benchmarks-s3

(master_env_v152_py36) 13:06  [viral-ngs-benchmarks] $ git annex info --fast
repository mode: indirect
trusted repositories: 0
semitrusted repositories: 11
        00000000-0000-0000-0000-000000000001 -- web
        00000000-0000-0000-0000-000000000002 -- bittorrent
        0928dfcc-4dbe-4c24-a5a0-ac05c4a2c829 -- ilya-work
        0f104e0a-7126-4a4a-a342-e1bb049643f0 -- [from-ilya-work-03]
        1a276c39-67e3-4098-b469-747ab29fb97e -- viral-ngs-benchmarks copy on /ilya
        380286ac-2e8f-4285-94da-406eca323411 -- [dnanexus]
        40d047cf-3548-42d4-bdb6-e1e89a83b6df -- viral-ngs-benchmarks in broad laptop vm clone
        449efa47-a0e1-4376-a17f-42c7a1f509d1 -- Benchmarks for viral-ngs, stored on Amazon S3 [viral-ngs-benchmarks-s3]
        6b9993bb-df07-4409-be5e-05c1cd528165 -- [mygs]
        7205b5c0-4475-47c1-bfc1-39f57a487ac4 -- Website displaying benchmark summaries, stored on AWS S3 [s3-viral-ngs-benchmarks-web]
        b11d667d-2f3a-432a-a167-d90a1b0559cf -- viral-ngs-benchmarks copy at /data/ilya-work/benchmarks/viral-ngs-benchmarks [here]
untrusted repositories: 1
        b9f389c1-2a37-4200-96d9-0f07d97b5417 -- [my-ldir]
transfers in progress: none
available local disk space: 427.31 gigabytes (+10 gigabytes reserved)


# End of transcript or log.
Comment by Ilya_Shlyakhter Tue Aug 20 17:08:45 2019
The hourly Windows autobuild seems to be several days old?
Comment by Ilya_Shlyakhter Mon Aug 19 19:47:06 2019
One related improvement would be for git-annex to compute and store, as metadata, checksum-based keys corresponding to non-checksum-based keys whenever it sees the contents of a non-checksum-based key (i.e. alternate keys for the same content). There is of course git-annex-migrate, but it requires manual invocation, clouds the commit history of the main git branch with commits that don't really change the content, and leads to either duplicate content in remotes or (if duplicates are dropped) inability to git-annex-get the contents of some past commits.
Comment by Ilya_Shlyakhter Fri Aug 16 16:06:29 2019

Confirmed; avoiding calling main has gotten the test suite to be able to run at least 500 iterations over 12 hours w/o failing. It had been failing in under 1 hour.

Comment by joey Fri Aug 16 15:03:11 2019

It is indeed detecting that the file it was sending appears to have been modified after the download started.

Probably the file has not really been modified. Instead the inode cache for it is somehow wrong.

An instrumented run had cached:

InodeCache (InodeCachePrim 21765306 20 (MTimeHighRes 1565905801.196804558s))
InodeCache (InodeCachePrim 22158907 20 (MTimeHighRes 1565905801.196804558s))

And after the download the file it was sending had:

InodeCache (InodeCachePrim 21765305 20 (MTimeHighRes 1565905802.380791792s))

Note that the test suite moves the same file from origin, and then moves it back, and a while after that get fails. So at least one of the cached inodes is for the old copy of the file. The other one is probably the work tree copy.
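To make the comparison above concrete, here is a minimal Python sketch of an inode-cache style check. It is not git-annex's code (which is Haskell); it only assumes, from the logged values, that each cache entry holds an inode number, a size, and a high-resolution mtime. The names inode_cache and appears_modified are made up for illustration.

```python
import os
from typing import NamedTuple, Set


class InodeCache(NamedTuple):
    """Illustrative stand-in for a cached (inode, size, mtime) triple."""
    inode: int
    size: int
    mtime_ns: int


def inode_cache(path: str) -> InodeCache:
    """Build a cache entry from the file's current stat information."""
    st = os.stat(path)
    return InodeCache(st.st_ino, st.st_size, st.st_mtime_ns)


def appears_modified(cached: Set[InodeCache], path: str) -> bool:
    """True when the file's current triple matches none of the cached ones.

    A file that was moved away and copied back keeps its content but gets
    a new inode, so it looks modified if only stale entries are cached.
    """
    return inode_cache(path) not in cached
```

With a check of this shape, a stale cache is enough to report a modification even when the content is byte-for-byte identical, which fits the mismatch between the cached inodes and the inode seen after the download.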

Still have not had luck reproducing outside the test suite with a tight loop of move --from and --to and get.

Hypothesis: The test suite uses Annex.main for many of its runs of git-annex, and so sqlite does not do whatever it normally does atexit. If flushing a change to the db file gets deferred for whatever reason, a later call to Annex.main might see old information.

If so, converting the test suite to run git-annex instead of Annex.main would avoid the problem.

Comment by joey Thu Aug 15 21:55:43 2019

I tested a git-annex made to say when the rsync process succeeded, and rsync is actually not failing.

So something in git-annex is silently making the transfer fail. Could be, for example, that it thinks the file was modified while it was being transferred.

Comment by joey Thu Aug 15 21:18:20 2019

The greatness of os-release (dash, not underscore) is that you can use the ID_LIKE field. From the os-release(5) man page:

"A space-separated list of operating system identifiers in the same syntax as the ID= setting. It should list identifiers of operating systems that are closely related to the local operating system in regards to packaging and programming interfaces [...]". If "debian" is the value of ID, or is contained in the space-separated value of ID_LIKE, then you don't need to know what the specific OS is.

Before os-release, it was common to check if the file /etc/debian_version existed, and if not, check for other distros using /etc/arch-release, /etc/fedora-release, and so on. Ubuntu historically included /etc/debian_version just so tools could identify the distro as debian-compatible, while identifying as Ubuntu only if you used lsb-release...
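As a rough sketch of the ID / ID_LIKE check described above, something like the following Python would do. The function names are hypothetical, the quoting rules of os-release are handled only minimally, and the mapping to a completions directory simply restates the two paths discussed in this thread (vendor-completions for Debian-like systems, site-functions otherwise).

```python
def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines of an os-release file into a dict.

    Only a simplification: surrounding quotes are stripped, but shell
    escapes inside values are not interpreted.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip("\"'")
    return fields


def is_debian_like(text: str) -> bool:
    """True if ID is debian, or debian appears in the ID_LIKE list."""
    fields = parse_os_release(text)
    ids = [fields.get("ID", "")] + fields.get("ID_LIKE", "").split()
    return "debian" in ids


def zsh_completion_dir(text: str) -> str:
    """Pick a packaging directory for zsh completions (assumed mapping)."""
    if is_debian_like(text):
        return "/usr/share/zsh/vendor-completions"
    return "/usr/share/zsh/site-functions"
```

A caller would pass in the contents of /etc/os-release, for example zsh_completion_dir(open("/etc/os-release").read()). No list of derivative names is needed, since a derivative is expected to name its parent in ID_LIKE.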

...

The site-functions directory should work everywhere, I think, since both the /usr/share and /usr/local/share equivalents are defined by the zsh build system (and I'm not aware of any distributors other than Debian which override the former). On my Arch system:

echo $fpath
/usr/local/share/zsh/site-functions /usr/share/zsh/site-functions /usr/share/zsh/functions/Calendar /usr/share/zsh/functions/Chpwd /usr/share/zsh/functions/Completion /usr/share/zsh/functions/Completion/Base /usr/share/zsh/functions/Completion/Linux /usr/share/zsh/functions/Completion/Unix /usr/share/zsh/functions/Completion/X /usr/share/zsh/functions/Completion/Zsh /usr/share/zsh/functions/Exceptions /usr/share/zsh/functions/Math /usr/share/zsh/functions/MIME /usr/share/zsh/functions/Misc /usr/share/zsh/functions/Newuser /usr/share/zsh/functions/Prompts /usr/share/zsh/functions/TCP /usr/share/zsh/functions/VCS_Info /usr/share/zsh/functions/VCS_Info/Backends /usr/share/zsh/functions/Zftp /usr/share/zsh/functions/Zle

Specifically, if you look at the configure.ac, it will:

  • take a configurable --enable-site-fndir (the packaging location), which defaults to /usr/share/zsh/site-functions but which Debian overrides (this is SITEFPATH_DIR in Src/init.c)
  • if enable-site-fndir did not already get defined to /usr/local/share/zsh/site-functions (the local sysadmin location), then guarantee it is included by defining it as $fixed_sitefndir and embedding it in the binary as FIXED_FPATH_DIR (Src/init.c)
  • additionally include anything configured via --enable-additional-fpath, because enable-site-fndir is a string rather than an array, and the additional-fpath can be a comma-separated array (ADDITIONAL_FPATH in Src/init.c)
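The three steps above can be modelled roughly as follows. This is a Python paraphrase of the configure logic as described here, not a translation of configure.ac or Src/init.c. The function name is made up, and the ordering is an assumption: fixed directory first and site directory second matches the Arch $fpath shown above, while putting the additional entries last is a guess.

```python
from typing import List

# The local sysadmin location that is always guaranteed to be present.
FIXED_SITEFNDIR = "/usr/local/share/zsh/site-functions"


def site_fpath_entries(
    site_fndir: str = "/usr/share/zsh/site-functions",
    additional_fpath: str = "",
) -> List[str]:
    """Return the site-related fpath entries for a given configuration.

    site_fndir stands in for --enable-site-fndir (a single string) and
    additional_fpath for --enable-additional-fpath (comma-separated).
    """
    entries = []
    # Add the fixed directory only if the site dir is not already it.
    if site_fndir != FIXED_SITEFNDIR:
        entries.append(FIXED_SITEFNDIR)
    entries.append(site_fndir)
    # Position of the additional entries is an assumption of this sketch.
    entries.extend(p for p in additional_fpath.split(",") if p)
    return entries
```

With the defaults this yields /usr/local/share/zsh/site-functions followed by /usr/share/zsh/site-functions, the same two leading entries as in the $fpath output above.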

It's regrettable that unlike bash-completion, there is no pkg-config file for this. :( That would at least allow the build system to build-depend on zsh in order to automatically grab this information. But ultimately, the only thing you need to worry about is which distribution-modified value to use for/instead of /usr/share/zsh/site-functions.

In the case of curl, the make install only installs to site-functions, but provides an option --with-zsh-functions-dir=/usr/share/zsh/vendor-completions used at https://salsa.debian.org/debian/curl/blob/27e07a35cb9c727d6005c0afa291e2a3dc3bc5af/debian/rules#L20

Various other programs I have seen will often install to site-functions and let debian's packaging move it if needed, or try to check whether any of several known directories exist. Here is an elaborate detection mechanism which works for a proliferation of possible locations, and can be successfully packaged in a distro build recipe if you first mkdir -p "$DESTDIR/usr/share/zsh/site-functions" (or vendor-completions, depending): https://github.com/kovidgoyal/calibre/blob/7460a12a4bcd05efc822f7fe421626772e2f6575/src/calibre/linux.py#L192-L210

The downside of that is that the packager needs to know they should do this first. :(

Comment by eschwartz Thu Aug 15 21:15:09 2019

Debian also supports /usr/local/share/zsh/site-functions/, it's just that it's for local sysadmin use and not a place that packages should install files, so they added /usr/share/zsh/vendor-completions/ for that.

However, checking /etc/os_release probably entails keeping track of the name of every debian derivative, so I'd really rather not do that.

Also, even if the Makefile were changed to use /usr/local/share/zsh/site-functions/, it sounds as if that might not be a path that works universally either.

Surely there must be a way to ask zsh where to install completions to? But short of taking the first item from $fpath, which doesn't seem robust, I don't know how to. And zsh may not even be installed when building a package that should later integrate with zsh.

FWIW, I checked half a dozen packages like curl that include zsh completions, and none of them installed them with make install; they were just there to be installed manually. It seems zsh is making this too hard for software to bother integrating with it.

Comment by joey Thu Aug 15 18:52:51 2019

And even if we assume rsync never pre-allocates a file before receiving it, it probably does do some things at the end, like setting the final permissions and timestamp.

The permissions error looked like this:

get foo (from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
             20 100%    0.00kB/s    0:00:00  ^M             20 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)
(from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
             20 100%    0.00kB/s    0:00:00  ^M             20 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: open "/home/joey/src/git-annex/.t/tmprepo1103/.git/annex/tmp/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77" failed: Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]

That looked as if the first rsync had gotten as far as removing the write bit from the file. Interestingly, the second rsync seems to have received the whole (small) file content before ever trying to open the file for write.

The only recent change I can think of involving rsync was the CoW probing change, but I don't see how that could possibly lead to this behavior.

And I'm not sure this is a new problem. The test suite has been intermittently failing for several months, around this same point. The failure did not include any useful error message, so I could not debug it, and I have IIRC done a few things to try to get the test suite to display an error message. Perhaps I succeeded.

The intermittent test suite failure looks like this:

copy: [adjusted/master(unlocked) 05b89a6] empty
adjust ok
copy foo (from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
20 100% 0.00kB/s 0:00:00 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)

SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
20 100% 0.00kB/s 0:00:00 20 100% 0.00kB/s 0:00:00 (xfr#1, to-chk=0/1)
failed
copy: 1 failed
FAIL (1.12s)

I am not sure if it only happens when testing adjusted unlocked branches / v7 unlocked files.

I've run git-annex get; git-annex drop in a tight loop for thousands of iterations on an adjusted unlocked branch, and can't reproduce the failure that way.

I've made git-annex display rsync's exit status when it's not 0 or 1 (it has a lot of other exit statuses), so perhaps that will help track down how it's failing.
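For illustration only, the kind of reporting described above could look like this in Python. This is not the change made to git-annex. The helper names are invented, and the table covers just a handful of the exit values listed in the rsync man page, including the code 23 seen in the permission error earlier.

```python
import subprocess
from typing import List, Optional

# A few exit values from the rsync man page; the full list is longer.
RSYNC_EXIT_MEANINGS = {
    11: "error in file I/O",
    12: "error in rsync protocol data stream",
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
    30: "timeout in data send/receive",
}


def describe_rsync_exit(code: int) -> Optional[str]:
    """Return a message for exit statuses other than 0 or 1, else None."""
    if code in (0, 1):
        return None
    meaning = RSYNC_EXIT_MEANINGS.get(code, "no description in this table")
    return f"rsync exited with status {code}: {meaning}"


def run_and_report(argv: List[str]) -> Optional[str]:
    """Run a command and describe its exit status in the same way."""
    return describe_rsync_exit(subprocess.run(argv).returncode)
```

Calling run_and_report with an rsync command line would then give a message only for the unusual statuses, so ordinary runs stay quiet.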

Comment by joey Thu Aug 15 18:00:42 2019
It's possible that I didn't paste the output correctly (it was more than a screenful, so was pasted in two parts). I'll post again if I see this happen again.
Comment by Ilya_Shlyakhter Wed Aug 14 18:40:49 2019