Recent changes to this wiki:
rename forum/Can__39__t_access_file_from_secondary_client.mdwn to forum/client_repositories_setup_problem.mdwn
diff --git a/doc/forum/Can__39__t_access_file_from_secondary_client.mdwn b/doc/forum/client_repositories_setup_problem.mdwn similarity index 100% rename from doc/forum/Can__39__t_access_file_from_secondary_client.mdwn rename to doc/forum/client_repositories_setup_problem.mdwn
diff --git a/doc/forum/Can__39__t_access_file_from_secondary_client.mdwn b/doc/forum/Can__39__t_access_file_from_secondary_client.mdwn new file mode 100644 index 0000000000..a7092c2b90 --- /dev/null +++ b/doc/forum/Can__39__t_access_file_from_secondary_client.mdwn @@ -0,0 +1,156 @@ +I'm trying to setup git-annex for syncing two clients using a transfer repository. All of that without the webapp UI. + +Here's the reproducible scenario with a bash script: + +```bash +#/usr/bin/env bash + +# Just a way to access the script's directory +cd "$(dirname "$0")" +DIR="$(pwd)" + +# Create the 1st client repository +mkdir $DIR/client1 +cd $DIR/client1 +git init && git annex init + +# Create the 2nd client repository +mkdir $DIR/client2 +cd $DIR/client2 +git init && git annex init + +# Create the transfer repository +mkdir $DIR/share +cd $DIR/share +git init && git annex init + +# Setup the remotes and groups for the transfer repository +cd $DIR/share +git remote add client1 $DIR/client1 +git remote add client2 $DIR/client1 +git annex group . transfer +git annex group client1 client +git annex group client2 client +git co -b main + +# Setup the remotes and groups for the 1st client repository. +cd $DIR/client1 +git remote add share $DIR/share +git annex group . client +git annex group share transfer +git co -b main + +# Setup the remotes and groups for the 2nd client repository. +cd $DIR/client2 +git remote add share $DIR/share +git annex group . client +git annex group share transfer +git co -b main + +# Run git-annex assistant for each repository +cd $DIR/client1 && git annex assistant +cd $DIR/client2 && git annex assistant +cd $DIR/share && git annex assistant + +# Add a single file to the 1st client. +cd $DIR/client1 +echo "My first file" >> file.txt +``` + +Result: + +client1: I see the auto-commit has been added for file.txt + +share: I get the following daemon logs: + +``` +(scanning...) (started...) 
+From /home/xxx/git-annex-scenarios/share-between-clients/client1 + * [new branch] git-annex -> client2/git-annex +(merging client2/git-annex into git-annex...) +From /home/xxx/git-annex-scenarios/share-between-clients/client1 + * [new branch] git-annex -> client1/git-annex + +merge: refs/remotes/client2/main - not something we can merge + +merge: refs/remotes/client2/synced/main - not something we can merge + +merge: refs/remotes/client1/main - not something we can merge + +merge: refs/remotes/client1/synced/main - not something we can merge +(merging synced/git-annex into git-annex...) +(recording state in git...) + +``` + +client2: I get the following daemon logs: + +``` +From /home/xxx/git-annex-scenarios/share-between-clients/share + * [new branch] git-annex -> share/git-annex +(merging share/git-annex into git-annex...) +(recording state in git...) + +merge: refs/remotes/share/main - not something we can merge + +merge: refs/remotes/share/synced/main - not something we can merge + +``` + +Then, I thought that maybe I needed to do an initial `git pull` for each repository. So I tried adding to the bash script the following lines: + +```bash +# Need to do this if there are no commits in the 'client2' and 'share' repositories. +# Or else, I'll get the following logs: +# +# merge: refs/remotes/share/main - not something we can merge +# merge: refs/remotes/share/synced/main - not something we can merge +sleep 3; +cd $DIR/share +git pull client1 main +sleep 3; +cd $DIR/client2 +git pull share main +``` + +But I'm still getting the same error: + +``` +(scanning...) (started...) +From /home/xxx/git-annex-scenarios/share-between-clients/share + * [new branch] git-annex -> share/git-annex +(merging share/git-annex into git-annex...) +(recording state in git...) + +merge: refs/remotes/share/main - not something we can merge + +merge: refs/remotes/share/synced/main - not something we can merge +(recording state in git...) 
+To /home/kolam/git-annex-scenarios/share-between-clients/share + + 28079ec...ca3c481 git-annex -> synced/git-annex (forced update) +Everything up-to-date +To /home/kolam/git-annex-scenarios/share-between-clients/share + + 28079ec...ca3c481 git-annex -> synced/git-annex (forced update) +``` + +However, even though I have that error, `file.txt` now appears in `client2`. +But, the content of `file.txt` is: + +``` +/annex/objects/SHA256E-s14--14b99b7ab1e9777f7e1c2b482fe2cd95653c7cf35f +459ef0b15bd0d75b2245c9.txt +``` + +and that link doesn't exist in my filesystem. +Running `git annex whereis file.txt` in `client2` gives me: + +``` +whereis file.txt (0 copies) failed +whereis: 1 failed +``` + +So my questions are: + +* did I miss something in the steps required to setup the repositories? +* is there some documentation outlining the steps to do so without the webapp? +* how can we enhance the UX for that scenario with better messages?
update
diff --git a/doc/todo/distributed_migration.mdwn b/doc/todo/distributed_migration.mdwn index ad11ada8f9..c605b474a1 100644 --- a/doc/todo/distributed_migration.mdwn +++ b/doc/todo/distributed_migration.mdwn @@ -29,9 +29,23 @@ and use a lot of bandwidth. Probably not a good idea. Alternatively, the old key could be left on a special remote, but update the location log for the special remote to say it has the new key, and have git-annex request the old key when it wants to get (or checkpresent) -the content from the special remote. This would need the mapping to be -cheap enough to query that it won't signficantly slow down accessing a -special remote. +the new key from the special remote. (Being careful to verify the content +using the new key when downloading from the old key on the special remote.) +This would need the mapping to be cheap enough to query that it won't +signficantly slow down accessing a special remote. + +> A complication is that the special remote could end up containing both +> old and new key. So it would need to fall back from one to the other for +> get and checkpresent. Which will double the number of round trips to the +> special remote if it tries the wrong one first. +> +> And how to handle dropping from a special remote then? It would need to +> update the location log for both old key and new key when dropping the +> old key or the new key. But when the special remote stores both the old +> and new key on it separately, dropping one should not change the location +> log for the other. So it seems it would need to drop the key, then check +> if the other key is stored there and if not, update the location log to +> indicate it's not present. 
Rather than a dedicated command that users need to remember to run, distributed migration could be done automatically when merging a git-annex diff --git a/doc/todo/distributed_migration/comment_1_8734d30aa0c1cb27dce81a0277d24948._comment b/doc/todo/distributed_migration/comment_1_8734d30aa0c1cb27dce81a0277d24948._comment new file mode 100644 index 0000000000..180e937c2d --- /dev/null +++ b/doc/todo/distributed_migration/comment_1_8734d30aa0c1cb27dce81a0277d24948._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-12-01T19:00:41Z" + content=""" +Even if the stuff with special remotes turned out to be too complicated to +implement, `git-annex migrate --update` would be useful for some users. +So it's worth implementing the mapping and then incrementally implementing +these ideas. +"""]]
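The fallback described in the updated todo (try one key on the special remote, then the other) can be sketched as a small pure model. This is only an illustration under stated assumptions, not git-annex code: `Migrations`, `candidates`, and `checkPresent` are hypothetical names, keys are plain strings, the remote is a list of stored keys, and trying the new key first is an arbitrary choice.

```haskell
module Main where

-- Keys are plain strings in this toy model (assumption).
type Key = String

-- Hypothetical migration mapping: new key -> old key it was migrated from.
type Migrations = [(Key, Key)]

-- Keys worth requesting from a special remote when the new key is wanted:
-- the new key itself, then the old key it was migrated from, if any.
candidates :: Migrations -> Key -> [Key]
candidates migrations new = new : maybe [] (: []) (lookup new migrations)

-- Model of checkpresent with fallback. Returns the key that was actually
-- found on the remote (if any) and the number of round trips it took.
checkPresent :: Migrations -> [Key] -> Key -> (Maybe Key, Int)
checkPresent migrations stored wanted = go 0 (candidates migrations wanted)
  where
    go trips [] = (Nothing, trips)
    go trips (k : ks)
      | k `elem` stored = (Just k, trips + 1)
      | otherwise = go (trips + 1) ks

main :: IO ()
main = do
  let migrations = [("SHA512--new", "SHA256--old")]
  -- Remote only holds the old key: the first guess misses, costing a second trip.
  print (checkPresent migrations ["SHA256--old"] "SHA512--new")
  -- Remote holds both keys: the first guess succeeds.
  print (checkPresent migrations ["SHA512--new", "SHA256--old"] "SHA512--new")
```

In this model, a remote that holds only the old key costs two round trips, which is the doubling the todo warns about when the wrong key is tried first.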
comment
diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment new file mode 100644 index 0000000000..b42947b7b8 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_5_93b85fbe5c36e986cf7c1fc87070c04c._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2023-12-01T18:42:07Z" + content=""" +I've spent a while thinking about this and came up with the ideas at +[[todo/distributed_migration]]. + +I think that probably would handle your use case. +"""]] diff --git a/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment b/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment new file mode 100644 index 0000000000..47ce3cdfbb --- /dev/null +++ b/doc/todo/alternate_keys_for_same_content/comment_10_22ff867952875856b20339a8829c5944._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: simpler proposal""" + date="2023-12-01T18:00:20Z" + content=""" +About the idea of recording a checksum of the content of a URL or WORM key, +without migrating to a SHA key, that does seem worth considering. (And +maybe was the original idea of this todo really..) + +If that were implemented, it would be necessary for more than one checksum +to be able to be recorded for a given URL key. Because different +clones might get different content from the URL and each add its checksum. + +So, this would not be as strong an assurance as using a SHA key that you're +referring to a specific peice of data. It would be useful to protect +against bit rot, but not as a way to pin a file to a particular version. +Which is often something one does want to do in a git repository! + +I do think that implementing that would be a lot simpler. 
And it would +only affect performance when verifying the content of URL or WORM keys, +when it would need to look up the checksum in the git-annex branch. +"""]] diff --git a/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment b/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment new file mode 100644 index 0000000000..154fa5a8b5 --- /dev/null +++ b/doc/todo/alternate_keys_for_same_content/comment_11_3323eff3d94d366595bf2b7e78c01dce._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2023-12-01T18:41:30Z" + content=""" +See [[distributed_migration]]... +"""]] diff --git a/doc/todo/distributed_migration.mdwn b/doc/todo/distributed_migration.mdwn new file mode 100644 index 0000000000..ad11ada8f9 --- /dev/null +++ b/doc/todo/distributed_migration.mdwn @@ -0,0 +1,47 @@ +Currently `git-annex migrate` only hard links the objects in the local +repo. This leaves other clones without the new keys' objects unless +they re-download them, or unless the same migrate command is +re-run, in the same tree, on each clone. + +It would be good to support distributed migration, so that whatever +migration is done in one repo is reflected in the other repos. + +This needs some way to store, in the git repo, a mapping between the old +key and the new key it has been migrated to. (I investigated +how much space that would need in the git repo, in +[this comment](https://git-annex.branchable.com/todo/alternate_keys_for_same_content/#comment-917eba0b2d1637236c5d900ecb5d8da0).) +The mapping might be communicated via the git branch but be locally stored +in a sqlite database to make querying it fast. + +Once that mapping is available, one simple way to use it would be a +git-annex command that updates the local repo to reflect migrations that +have happened elsewhere. 
It would not touch the HEAD branch, but would +just hardlink object files from the old to new key, and update the location +log for the new key to indicate the content is present in the repo. +This command could be something like `git-annex migrate --update`. + +That wouldn't be entirely sufficient though, because special remotes from +pre-migration will be populated with the old keys. A similar command could +upload the new content to special remotes, but that would double the data +stored in a special remote (or drop the old keys from them), +and use a lot of bandwidth. Probably not a good idea. + +Alternatively, the old key could be left on a special remote, but update +the location log for the special remote to say it has the new key, +and have git-annex request the old key when it wants to get (or checkpresent) +the content from the special remote. This would need the mapping to be +cheap enough to query that it won't signficantly slow down accessing a +special remote. + +Rather than a dedicated command that users need to remember to run, +distributed migration could be done automatically when merging a git-annex +branch that adds migration information. Just hardlink object files and +update the location log for the local repo and for available special +remotes. + +It would be possible to avoid updating the location log, but then all +location log queries would have to check the migration mapping. It would be +hard to make that fast enough. Consider `git-annex find --in foo`, which +queries the location log for each file. + +--[[Joey]]
git-annex branch size when storing migration information
Sponsored-by: Jack Hill on Patreon
diff --git a/doc/todo/alternate_keys_for_same_content/comment_9_42d240bbfc6ab858219ffa0f873c3eb4._comment b/doc/todo/alternate_keys_for_same_content/comment_9_42d240bbfc6ab858219ffa0f873c3eb4._comment new file mode 100644 index 0000000000..5b185f2ef6 --- /dev/null +++ b/doc/todo/alternate_keys_for_same_content/comment_9_42d240bbfc6ab858219ffa0f873c3eb4._comment @@ -0,0 +1,50 @@ +[[!comment format=mdwn + username="joey" + subject="""git-annex branch size when storing migration information""" + date="2023-12-01T16:10:11Z" + content=""" +I did a small experiment to gauge how much the git repo size would grow if +migration were recorded in log files in the git-annex branch. + +In my experiment, I started with 1000 files using sha256. The size of the +git objects (after repack by git gc --aggressive) was 0.5 mb. I then +migrated them to sha512, which increased the size of git objects to 1.1 mb +(after repacking). + +Then I recorded in the git-annex branch additional log files for each of +the sha512 keys that contained the corresponding sha256 key. That grew the +git objects to 1.4 mb after repacking. + +This was a little disappointing. I'd hoped that repacking would avoid +duplication of the sha256 keys, which are both in the log files I wrote +and are used as filenames. But the data I wrote to the logs is only 75 kb +total, and git grew 4x that. + +I tried the same thing except instead of separate log files I added to git +one log file that contained pairs of sha256 and sha512 keys. That log file +was 213 kb and adding it to the git repo grew it by 102 kb. So there was +some compression there, but less than I would have hoped, and not much +better than just gzip -9 of the log file (113 kb). Of course putting all +the migration information in a single file like this would add a lot of +complexity to accessing it. + +So adding this information to the git-annex branch would involve at best +around a 16% overhead, which is a surprising amount. 
+ +(It would be possible to make `git-annex forget --drop-dead` remove the +information about old migrated keys if they later get marked as dead, and +so regain the space.) + +This is also rather redundant information to store in git, since most +of the time when file foo has been migrated, the old key can be determined +by looking at `git log foo`. Not always of course because foo might have +been renamed after migration, for example. + +Another way to store migration information in the git-annex branch would to +be graft in the pre-migration tree and the post-migration tree. Diffing +those two trees would show what migrated, and most of the time this would +use almost no additional space in git, because the user will have committed +both those trees anyway, or something very close to them. But it would be +more expensive to extract the migration information then, and this would +need a local cache of migrations to be built up from examining those diffs.. +"""]]
Added a comment: Another possibility to make --fast faster?
diff --git a/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_3_8f46a9d4a7ceae80e378149d88dd1f19._comment b/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_3_8f46a9d4a7ceae80e378149d88dd1f19._comment new file mode 100644 index 0000000000..79e4308e73 --- /dev/null +++ b/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_3_8f46a9d4a7ceae80e378149d88dd1f19._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="Another possibility to make --fast faster?" + date="2023-12-01T11:50:25Z" + content=""" +How about having `git annex info --fast` skip this lookup step for remotes where it doesn't know the UUID of yet? + +`git annex info` can already be quite slow in the other steps it takes (counting files, disk space, etc.) in large repos, so it is not so much of a surprise that it hangs a while by default. But if `--fast` would make it actually fast by staying completely offline (right?) and skipping the slow local counting steps, this would be logical. + +"""]]
Added a comment
diff --git a/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_2_9274223b32601ead9a508aa9852e4933._comment b/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_2_9274223b32601ead9a508aa9852e4933._comment new file mode 100644 index 0000000000..2724e16158 --- /dev/null +++ b/doc/bugs/__96__git_annex_info__96___hangs_with_git_special_remote/comment_2_9274223b32601ead9a508aa9852e4933._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="Atemu" + avatar="http://cdn.libravatar.org/avatar/86b8c2d893dfdf2146e1bbb8ac4165fb" + subject="comment 2" + date="2023-12-01T10:21:09Z" + content=""" +I've had an idea on this: Why not only update UUIDs on (manual) sync/fetch? + +This would be in line with how git interacts with regular remotes otherwise too; always requiring an explicit fetch to update its info. + +To me it just violates the principle of least surprise to have git-annex try and reach remotes when running something as simple as `info`. +"""]]
Added a comment
diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_4_7d367f38250a4a3454299170700d5c6c._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_4_7d367f38250a4a3454299170700d5c6c._comment new file mode 100644 index 0000000000..dc7ac4b638 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_4_7d367f38250a4a3454299170700d5c6c._comment @@ -0,0 +1,58 @@ +[[!comment format=mdwn + username="unqueued" + avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d" + subject="comment 4" + date="2023-12-01T02:09:07Z" + content=""" +@joey + +It isn't a huge problem, but I keep coming back to it. The only workflow I still use where this comes up is for my filesharing assets repo. I just ended up leaving it as MD5E, because much of it is downstream from gdrive shares, and I almost never have all of the content in one place at a time. + + +This is one of the scripts I sometimes use, although I wrote it awhile ago before I found out about git-annex-filter-branch +<https://gist.github.com/unqueued/06b5a5c14daa8224a659c5610dce3132> + +But I mostly rely on splitting off subset repos with no history, processing them in some way, and then re-absorbing them back into a larger repo. + +I actually started a repo that would track new builds for Microsoft Dev VMs: <https://github.com/unqueued/official-microsoft-vms-annex> + +But for my bigger repos, I almost never have all of the data in the same place at the same time. + + +@nobodyinperson + +> Hi! If I understand you correctly, your problem is that you often migrate keys to another backend, and there are situations involving merges of repos far away from each other in history that cause merge conflicts, which results in the dead old pre-migration key being reintroduced? + +Well, there aren't any conflicts, they just get silently reintroduced, which isn't the end of the world, especially if they get marked as dead. 
But they clutter the git-annex branch, and over time, with large repos, it may become a problem. There isn't any direct relationship between the previous key and the migrated key. + +So, if I have my `linux_isos` repo, and I do git-annex-migrate on it, but say only isos for the year 2021 are in my specific repo at that moment, then the symlinks will be updated and the new sha256 log log files will be added to my git-annex branch. + +And if you sync with another repo that also has the same files in the backend, they will still be in the repo, but just inaccessible. + +And I feel like there's enough information to efficiently track the lifecycle of a key. + + +I'm exhuming my old scripts and cleaning them up, but essentially, you can get everything you need to assemble an MD5E annex from a Google Drive share by doing `rclone lsjson -R --hash rclone-drive-remote:` + +And to get the keys, you could pipe it into something like this: +`perl -MJSON::PP -ne 'BEGIN { $/ = undef; $j = decode_json(<>); } foreach $e (@{$j}) { next if $e->{\"IsDir\"} || !exists $e->{\"Hashes\"}; print \"MD5-s\" . $e->{\"Size\"} . \"--\" . $e->{\"Hashes\"}->{\"MD5\"} . \"\t\" . $e->{\"Path\"} . \"\n\"; }' ` + +That's just part of a project I have with a Makefile that indexes, assembles and then optionally re-serves an rclone gdrive remote. I will try to post it later tonight. It was just a project I made for fun. + +And there are plenty of other places where you can get enough info to assemble a repo ahead of time, and essentially turn it into a big queue. + + +You can find all sorts of interesting things to annex. + +https://old.reddit.com/r/opendirectories sometimes has interesting stuff. 
+ +Here are some public Google Drive shares: + +* [Bibliotheca Anonoma](https://drive.google.com/drive/folders/0B7WYx7u6HJh_Z3FjU2F0NFNyQWs) +* [Esoteric Library](https://drive.google.com/drive/folders/0B0UEkmH7vYJZRWxfSmdRbFJGNWc) +* [EBookDroid - Google Drive](https://drive.google.com/drive/folders/0B6y-A-HTzyBiYnpIRHMzR1pueFU) +* [The 510 Archives - Google Drive](https://drive.google.com/drive/folders/0ByCvxnHNk90SMzIxZWIwYWYtYzljNy00ZGU2LWI3ODctYzRjMmE0MGY3NTA1) +* [Some ebooks](https://drive.google.com/drive/folders/1SReXFt16DYpTdFsSsT5Nzkj33VAYOQLa) + + +"""]]
comment
diff --git a/doc/forum/Revisiting_migration_and_multiple_keys.mdwn b/doc/forum/Revisiting_migration_and_multiple_keys.mdwn index 74f99d97b5..13e009d2fe 100644 --- a/doc/forum/Revisiting_migration_and_multiple_keys.mdwn +++ b/doc/forum/Revisiting_migration_and_multiple_keys.mdwn @@ -1,7 +1,7 @@ I have several workflows that rely on regular key migrations, and I would love to explore some ways that migrating keys could be improved. I see there has already been discussion about this: -https://git-annex.branchable.com/todo/alternate_keys_for_same_content/ +[[todo/alternate_keys_for_same_content]] I don't know how often this comes up, but it comes up a lot for me. I have several data sources that I regularly index and mirror by constructing keys based on md5 and size, and assemble a repo with the known filename. (gdrive, many software distribution sites, and others). diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_2_b5545aba08c7af2f8f56caba66232c41._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_2_b5545aba08c7af2f8f56caba66232c41._comment new file mode 100644 index 0000000000..5c15c5c516 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_2_b5545aba08c7af2f8f56caba66232c41._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-11-30T20:49:53Z" + content=""" +There seem to be difficulties with both performance and with security in +storing information in the git-annex branch to declare that one key +is replaced by another one. + +I wonder if there are any pain points that could be handled better without +recording such information in the git-annex branch. What do your helper +scripts do? 
+"""]] diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_3_a712ec9b616ca45976154fd0c98ae1c4._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_3_a712ec9b616ca45976154fd0c98ae1c4._comment new file mode 100644 index 0000000000..8fa84204a5 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_3_a712ec9b616ca45976154fd0c98ae1c4._comment @@ -0,0 +1,17 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-11-30T20:55:02Z" + content=""" +I wonder if it would suffice to have a way for git-annex to record that key +A migrated to B, but not treat that as meaning that it should get B's +content when it wants A, or vice-versa. + +Instead, when a repository learns that A was elsewhere migrated to B, it +could hardlink its content for A to B and update the location log for +B to say is has a copy. The same as if `git-annex migrate` were run locally. +(It could even hash the content and verify it got B.) + +That wouldn't help if a special remote has the content of A, and +git-annex wants to get the content of B. +"""]]
comment
diff --git a/doc/todo/alternate_keys_for_same_content/comment_8_4b16c48a2d9f4926d63f6ab54fe801d3._comment b/doc/todo/alternate_keys_for_same_content/comment_8_4b16c48a2d9f4926d63f6ab54fe801d3._comment new file mode 100644 index 0000000000..cea7fe2855 --- /dev/null +++ b/doc/todo/alternate_keys_for_same_content/comment_8_4b16c48a2d9f4926d63f6ab54fe801d3._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2023-11-30T20:43:53Z" + content=""" +I think Ilya Shlyakhter gets to a fundamental problem in his comment above. +Any way that git-annex stores data about an alternate key that is recorded +in git, allows anyone to spoof bad data. + +For example, if I have a SHA256 key stored in git-annex, it would be a bad +security hole if I fetched from Ilya's repository and suddenly git-annex +was willing to accept some MD5 key as being the same content as my SHA256 +key. Even if the two keys had the same content currently, that MD5 key can +be collision attacked later. + +So there would need to be a direction in which key upgrades were allowed. +Which is fine for `WORM -> SHA256`, but less clear for `SHA1 -> SHA256` +and much less clear for other pairs of modern hashes. +"""]]
copy/move --from-anywhere --to remote
Implementation was simple because it's equivalent to
--from=foo --to remote for each other remote, followed by
--to remote when there's a local copy.
(Or, in the edge case of --from-anywhere --to=here,
it's the same as --to=here.)
Note that, when the local repo does not have a copy,
fromToPerform gets it from a remote, sends it to the destination,
and drops the local copy. Another call to that for a second remote
will notice that the dest now has a copy, and simply drop from the
second remote, avoiding a second transfer.
Also note that, when numcopies doesn't allow dropping it from
everywhere, it will drop it from the cheapest remotes first
(maybe not ideal) up to more expensive remotes, and finally from the local
repo. So the local repo will generally end up holding a copy. Maybe not
ideal in all cases either, but it seems no worse to do that than to end up
with a copy undropped from a remote.
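
The sequencing described above can be illustrated with a small pure model. It is a sketch under stated assumptions, not the real `fromToPerform`: the `Repo`, `Mode`, and `Event` types and the `fromTo` and `fromAnywhereTo` functions are hypothetical, numcopies checks are ignored, the transit through the local repo is collapsed into a single transfer, and the --to=here edge case is not modelled.

```haskell
module Main where

-- Repositories in this toy model; remote names are hypothetical.
data Repo = Here | Remote String
  deriving (Eq, Show)

-- Copy leaves sources alone; Move drops from each source afterwards.
data Mode = Copy | Move
  deriving (Eq, Show)

-- What the model records: a content transfer or a drop.
data Event = Transfer Repo Repo | Drop Repo
  deriving (Eq, Show)

-- Repositories currently holding the key, plus the events so far.
type State = ([Repo], [Event])

-- Model of one "--from src --to dest" step. Numcopies is ignored (assumption),
-- so a Move always drops from the source.
fromTo :: Mode -> Repo -> Repo -> State -> State
fromTo mode src dest (holders, events)
  | src `notElem` holders = (holders, events)
  | otherwise = (afterDrop, events ++ sent ++ dropped)
  where
    -- Only transfer when the destination does not already have a copy.
    needsTransfer = dest `notElem` holders
    afterSend = if needsTransfer then dest : holders else holders
    sent = [Transfer src dest | needsTransfer]
    afterDrop = if mode == Move then filter (/= src) afterSend else afterSend
    dropped = [Drop src | mode == Move]

-- Model of "--from-anywhere --to dest", where dest is assumed to be a remote:
-- one fromTo step per other remote holding the key, then a step from Here.
fromAnywhereTo :: Mode -> Repo -> [Repo] -> State
fromAnywhereTo mode dest holders = fromTo mode Here dest afterRemotes
  where
    sources = [r | r <- holders, r /= dest, r /= Here]
    afterRemotes = foldl (\st r -> fromTo mode r dest st) (holders, []) sources

main :: IO ()
main = do
  let (holders, events) =
        fromAnywhereTo Move (Remote "bar") [Remote "r1", Remote "r2"]
  mapM_ print events
  print holders
```

With the content on two remotes, the model records one transfer followed by two drops, matching the note that the second call only drops and avoids a second transfer.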
And I'm not entirely happy with the output, eg:
copy bigfile (from r3...) ok
copy bigfile ok
That makes sense if you think of the second line as being
the same as what is output by `git-annex copy bigfile --to bar`,
but it's less clear in this context. Maybe add "(from here...)"?
Also the --json output doesn't have a machine-readable field for
the "from" uuid, and maybe it should?
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index 3fdedb9023..de7bca3074 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -2,6 +2,7 @@ git-annex (10.20231130) UNRELEASED; urgency=medium * Make git-annex get/copy/move --from foo override configuration of remote.foo.annex-ignore, as documented. + * Support git-annex copy/move --from-anywhere --to remote. -- Joey Hess <id@joeyh.name> Thu, 30 Nov 2023 14:48:12 -0400 diff --git a/CmdLine/GitAnnex/Options.hs b/CmdLine/GitAnnex/Options.hs index b5170f3131..b3c21aeece 100644 --- a/CmdLine/GitAnnex/Options.hs +++ b/CmdLine/GitAnnex/Options.hs @@ -162,40 +162,55 @@ parseToOption = strOption <> completeRemotes ) +parseFromAnywhereOption :: Parser Bool +parseFromAnywhereOption = switch + ( long "from-anywhere" + <> help "from any remote" + ) + parseRemoteOption :: Parser RemoteName parseRemoteOption = strOption ( long "remote" <> metavar paramRemote <> completeRemotes ) --- | From or to a remote, or both, or a special --to=here +-- | --from or --to a remote, or both, or a special --to=here, +-- or --from-anywhere --to remote. data FromToHereOptions = FromOrToRemote FromToOptions | ToHere | FromRemoteToRemote (DeferredParse Remote) (DeferredParse Remote) + | FromAnywhereToRemote (DeferredParse Remote) parseFromToHereOptions :: Parser (Maybe FromToHereOptions) parseFromToHereOptions = go <$> optional parseFromOption <*> optional parseToOption + <*> parseFromAnywhereOption where - go (Just from) (Just to) = Just $ FromRemoteToRemote + go _ (Just to) True = Just $ FromAnywhereToRemote + (mkParseRemoteOption to) + go (Just from) (Just to) _ = Just $ FromRemoteToRemote (mkParseRemoteOption from) (mkParseRemoteOption to) - go (Just from) Nothing = Just $ FromOrToRemote + go (Just from) Nothing _ = Just $ FromOrToRemote (FromRemote $ mkParseRemoteOption from) - go Nothing (Just to) = Just $ case to of + go Nothing (Just to) _ = Just $ case to of "here" -> ToHere "." 
-> ToHere _ -> FromOrToRemote $ ToRemote $ mkParseRemoteOption to - go Nothing Nothing = Nothing + go Nothing Nothing _ = Nothing instance DeferredParseClass FromToHereOptions where - finishParse (FromOrToRemote v) = FromOrToRemote <$> finishParse v + finishParse (FromOrToRemote v) = + FromOrToRemote <$> finishParse v finishParse ToHere = pure ToHere - finishParse (FromRemoteToRemote v1 v2) = FromRemoteToRemote - <$> finishParse v1 - <*> finishParse v2 + finishParse (FromRemoteToRemote v1 v2) = + FromRemoteToRemote + <$> finishParse v1 + <*> finishParse v2 + finishParse (FromAnywhereToRemote v) = + FromAnywhereToRemote <$> finishParse v -- Options for acting on keys, rather than work tree files. data KeyOptions diff --git a/Command/Copy.hs b/Command/Copy.hs index 88d645a693..67971af2f4 100644 --- a/Command/Copy.hs +++ b/Command/Copy.hs @@ -69,6 +69,7 @@ seek' o fto = startConcurrency (Command.Move.stages fto) $ do FromOrToRemote (ToRemote _) -> Just True ToHere -> Just False FromRemoteToRemote _ _ -> Nothing + FromAnywhereToRemote _ -> Nothing , usesLocationLog = True } keyaction = Command.Move.startKey fto Command.Move.RemoveNever @@ -84,12 +85,13 @@ start o fto si file key = stopUnless shouldCopy $ | autoMode o = want <||> numCopiesCheck file key (<) | otherwise = return True want = case fto of - FromOrToRemote (ToRemote dest) -> - (Remote.uuid <$> getParsed dest) >>= checkwantsend + FromOrToRemote (ToRemote dest) -> checkwantsend dest FromOrToRemote (FromRemote _) -> checkwantget ToHere -> checkwantget - FromRemoteToRemote _ dest -> - (Remote.uuid <$> getParsed dest) >>= checkwantsend - - checkwantsend = wantGetBy False (Just key) (AssociatedFile (Just file)) + FromRemoteToRemote _ dest -> checkwantsend dest + FromAnywhereToRemote dest -> checkwantsend dest + + checkwantsend dest = + (Remote.uuid <$> getParsed dest) >>= + wantGetBy False (Just key) (AssociatedFile (Just file)) checkwantget = wantGet False (Just key) (AssociatedFile (Just file)) diff --git 
a/Command/Move.hs b/Command/Move.hs index 59bdf9ecad..0bc707df85 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -81,6 +81,7 @@ seek' o fto = startConcurrency (stages fto) $ do FromOrToRemote (ToRemote _) -> Just True ToHere -> Nothing FromRemoteToRemote _ _ -> Nothing + FromAnywhereToRemote _ -> Nothing , usesLocationLog = True } keyaction = startKey fto (removeWhen o) @@ -91,6 +92,7 @@ stages (FromOrToRemote (FromRemote _)) = transferStages stages (FromOrToRemote (ToRemote _)) = commandStages stages ToHere = transferStages stages (FromRemoteToRemote _ _) = transferStages +stages (FromAnywhereToRemote _) = transferStages start :: FromToHereOptions -> RemoveWhen -> SeekInput -> RawFilePath -> Key -> CommandStart start fromto removewhen si f k = start' fromto removewhen afile si k ai @@ -118,6 +120,9 @@ start' fromto removewhen afile si key ai = src' <- getParsed src dest' <- getParsed dest fromToStart removewhen afile key ai si src' dest' + FromAnywhereToRemote dest -> do + dest' <- getParsed dest + fromAnywhereToStart removewhen afile key ai si dest' describeMoveAction :: RemoveWhen -> String describeMoveAction RemoveNever = "copy" @@ -353,6 +358,30 @@ fromToStart removewhen afile key ai si src dest = then not <$> expectedPresent dest key else return True +fromAnywhereToStart :: RemoveWhen -> AssociatedFile -> Key -> ActionItem -> SeekInput -> Remote -> CommandStart +fromAnywhereToStart removewhen afile key ai si dest = + stopUnless somethingtodo $ do + u <- getUUID + if u == Remote.uuid dest + then toHereStart removewhen afile key ai si + else startingNoMessage (OnlyActionOn key ai) $ do + rs <- filter (/= dest) + <$> Remote.keyPossibilities (Remote.IncludeIgnored False) key + forM_ rs $ \r -> + includeCommandAction $ + starting (describeMoveAction removewhen) ai si $ + fromToPerform r dest removewhen key afile + whenM (inAnnex key) $ + void $ includeCommandAction $ + toStart removewhen afile key ai si dest + next $ return True + where + somethingtodo = do 
+ fast <- Annex.getRead Annex.fast + if fast && removewhen == RemoveNever + then not <$> expectedPresent dest key + else return True + {- When there is a local copy, transfer it to the dest, and drop from the src. - - When the dest has a copy, drop it from the src. diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_6_c374eb44ea08f220dbcce5ecb88403fb._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_6_c374eb44ea08f220dbcce5ecb88403fb._comment new file mode 100644 index 0000000000..38e8ea92d2 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_6_c374eb44ea08f220dbcce5ecb88403fb._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2023-11-30T18:26:30Z" + content=""" +I like the idea of `copy --from-anywhere --to=remote` and just +use the lowest cost remote (when not in local repo). Like `git-annex get` +and `git-annex copy --to=here`. + +Hmm, if there's a remote that is too expensive to want to use in such a +copy, it would be possible to use `-c remote.foo.annex-ignore=true` +to make it avoid using that remote. As can also be done in the case of +`git-annex get`, although that was not documented well. + +I've implemented --from-anywhere.. +"""]] diff --git a/doc/git-annex-copy.mdwn b/doc/git-annex-copy.mdwn index 57905b672c..add7a96f06 100644 --- a/doc/git-annex-copy.mdwn +++ b/doc/git-annex-copy.mdwn (Diff truncated)
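The clause order of `go` in `parseFromToHereOptions` above decides which mode a given combination of `--from`, `--to` and `--from-anywhere` selects. The following is a minimal sketch of that dispatch, assuming plain remote names in place of git-annex's `DeferredParse Remote`; the `FromTo` type and the `dispatch` function are illustrative names that flatten `FromToHereOptions`, not part of git-annex.

```haskell
module Main where

-- Assumption: a plain String stands in for git-annex's DeferredParse Remote.
type RemoteName = String

-- Hypothetical, flattened version of FromToHereOptions.
data FromTo
    = FromRemote RemoteName
    | ToRemote RemoteName
    | ToHere
    | FromRemoteToRemote RemoteName RemoteName
    | FromAnywhereToRemote RemoteName
    deriving (Eq, Show)

-- Mirrors the clause order of `go` in the diff: when --from-anywhere is
-- given together with --to, that clause matches first, even if --from
-- was also passed.
dispatch :: Maybe RemoteName -> Maybe RemoteName -> Bool -> Maybe FromTo
dispatch _ (Just to) True = Just (FromAnywhereToRemote to)
dispatch (Just from) (Just to) _ = Just (FromRemoteToRemote from to)
dispatch (Just from) Nothing _ = Just (FromRemote from)
dispatch Nothing (Just to) _
    | to `elem` ["here", "."] = Just ToHere
    | otherwise = Just (ToRemote to)
dispatch Nothing Nothing _ = Nothing

main :: IO ()
main = mapM_ print
    [ dispatch Nothing (Just "backup") True          -- --from-anywhere --to backup
    , dispatch (Just "origin") (Just "backup") False -- --from origin --to backup
    , dispatch Nothing (Just "here") False           -- --to here
    , dispatch Nothing Nothing True                  -- --from-anywhere alone
    ]
```

In this model, as in the diff, `--from-anywhere` without `--to` selects nothing on its own, and `--from-anywhere --to here` is still parsed as `FromAnywhereToRemote`; the real code resolves that case later in `fromAnywhereToStart` by comparing UUIDs.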
fix --from overriding annex-ignore
Make git-annex get/copy/move --from foo override configuration of
remote.foo.annex-ignore, as documented.
This already worked for remotes supporting hasKeyCheap. For others though,
git-annex copy --from foo would silently not do anything, while
git-annex copy --to foo would use the annex-ignored remote.
Also improved the annex-ignore docs, to reflect that `git-annex get`
without --from will skip using annex-ignored remotes, for example.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/Annex/NumCopies.hs b/Annex/NumCopies.hs index a3c6f92dcd..1d8722a3d4 100644 --- a/Annex/NumCopies.hs +++ b/Annex/NumCopies.hs @@ -317,7 +317,7 @@ pluralCopies _ = "copies" verifiableCopies :: Key -> [UUID] -> Annex ([UnVerifiedCopy], [VerifiedCopy]) verifiableCopies key exclude = do locs <- Remote.keyLocations key - (remotes, trusteduuids) <- Remote.remoteLocations locs + (remotes, trusteduuids) <- Remote.remoteLocations (Remote.IncludeIgnored False) locs =<< trustGet Trusted untrusteduuids <- trustGet UnTrusted let exclude' = exclude ++ untrusteduuids diff --git a/Assistant/TransferQueue.hs b/Assistant/TransferQueue.hs index d2d245b7b1..571899bb6d 100644 --- a/Assistant/TransferQueue.hs +++ b/Assistant/TransferQueue.hs @@ -93,7 +93,7 @@ queueTransfersMatching matching reason schedule k f direction filter (\r -> not (inset s r || Remote.readonly r)) (syncDataRemotes st) where - locs = S.fromList . map Remote.uuid <$> Remote.keyPossibilities k + locs = S.fromList . map Remote.uuid <$> Remote.keyPossibilities (Remote.IncludeIgnored False) k inset s r = S.member (Remote.uuid r) s gentransfer r = Transfer { transferDirection = direction diff --git a/CHANGELOG b/CHANGELOG index e848250109..3fdedb9023 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,3 +1,10 @@ +git-annex (10.20231130) UNRELEASED; urgency=medium + + * Make git-annex get/copy/move --from foo override configuration of + remote.foo.annex-ignore, as documented. 
+ + -- Joey Hess <id@joeyh.name> Thu, 30 Nov 2023 14:48:12 -0400 + git-annex (10.20231129) upstream; urgency=medium * Fix bug in git-annex copy --from --to that skipped files that were diff --git a/Command/CheckPresentKey.hs b/Command/CheckPresentKey.hs index f3b4ef921c..efefc71aeb 100644 --- a/Command/CheckPresentKey.hs +++ b/Command/CheckPresentKey.hs @@ -51,7 +51,7 @@ check :: String -> Maybe Remote -> Annex Result check ks mr = case mr of Just r -> go Nothing [r] Nothing -> do - mostlikely <- Remote.keyPossibilities k + mostlikely <- Remote.keyPossibilities (Remote.IncludeIgnored False) k otherremotes <- flip Remote.remotesWithoutUUID (map Remote.uuid mostlikely) <$> remoteList diff --git a/Command/Get.hs b/Command/Get.hs index 7d3d4a2ef1..2dd48456f4 100644 --- a/Command/Get.hs +++ b/Command/Get.hs @@ -87,7 +87,8 @@ perform key afile = stopUnless (getKey key afile) $ {- Try to find a copy of the file in one of the remotes, - and copy it to here. -} getKey :: Key -> AssociatedFile -> Annex Bool -getKey key afile = getKey' key afile =<< Remote.keyPossibilities key +getKey key afile = getKey' key afile + =<< Remote.keyPossibilities (Remote.IncludeIgnored False) key getKey' :: Key -> AssociatedFile -> [Remote] -> Annex Bool getKey' key afile = dispatch diff --git a/Command/Move.hs b/Command/Move.hs index 2e450b3e4a..59bdf9ecad 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -146,7 +146,7 @@ toStart' dest removewhen afile key ai si = do expectedPresent :: Remote -> Key -> Annex Bool expectedPresent dest key = do - remotes <- Remote.keyPossibilities key + remotes <- Remote.keyPossibilities (Remote.IncludeIgnored True) key return $ dest `elem` remotes toPerform :: Remote -> RemoveWhen -> Key -> AssociatedFile -> Bool -> Either String Bool -> CommandPerform @@ -249,7 +249,7 @@ fromOk src key where checklog = do u <- getUUID - remotes <- Remote.keyPossibilities key + remotes <- Remote.keyPossibilities (Remote.IncludeIgnored True) key return $ u /= Remote.uuid src 
&& elem src remotes fromPerform :: Remote -> RemoveWhen -> Key -> AssociatedFile -> CommandPerform @@ -326,7 +326,7 @@ fromDrop src destuuid deststartedwithcopy key afile adjusttocheck = toHereStart :: RemoveWhen -> AssociatedFile -> Key -> ActionItem -> SeekInput -> CommandStart toHereStart removewhen afile key ai si = startingNoMessage (OnlyActionOn key ai) $ do - rs <- Remote.keyPossibilities key + rs <- Remote.keyPossibilities (Remote.IncludeIgnored False) key forM_ rs $ \r -> includeCommandAction $ starting (describeMoveAction removewhen) ai si $ diff --git a/Command/Sync.hs b/Command/Sync.hs index fcdc807f1f..851776f95f 100644 --- a/Command/Sync.hs +++ b/Command/Sync.hs @@ -897,7 +897,7 @@ seekSyncContent o rs currbranch = do syncFile :: SyncOptions -> Either (Maybe (Bloom Key)) (Key -> Annex ()) -> [Remote] -> AssociatedFile -> Key -> Annex Bool syncFile o ebloom rs af k = do inhere <- inAnnex k - locs <- map Remote.uuid <$> Remote.keyPossibilities k + locs <- map Remote.uuid <$> Remote.keyPossibilities (Remote.IncludeIgnored False) k let (have, lack) = partition (\r -> Remote.uuid r `elem` locs) rs got <- anyM id =<< handleget have inhere diff --git a/Remote.hs b/Remote.hs index 93b2e30f87..1638777277 100644 --- a/Remote.hs +++ b/Remote.hs @@ -47,6 +47,7 @@ module Remote ( remotesWithUUID, remotesWithoutUUID, keyLocations, + IncludeIgnored(..), keyPossibilities, remoteLocations, nameToUUID, @@ -299,13 +300,16 @@ remotesWithoutUUID rs us = filter (\r -> uuid r `notElem` us) rs keyLocations :: Key -> Annex [UUID] keyLocations key = trustExclude DeadTrusted =<< loggedLocations key +{- Whether to include remotes that have annex-ignore set. -} +newtype IncludeIgnored = IncludeIgnored Bool + {- Cost ordered lists of remotes that the location log indicates - may have a key. - - Also includes remotes with remoteAnnexSpeculatePresent set. 
-} -keyPossibilities :: Key -> Annex [Remote] -keyPossibilities key = do +keyPossibilities :: IncludeIgnored -> Key -> Annex [Remote] +keyPossibilities ii key = do u <- getUUID -- uuids of all remotes that are recorded to have the key locations <- filter (/= u) <$> keyLocations key @@ -315,19 +319,21 @@ keyPossibilities key = do -- there are unlikely to be many speclocations, so building a Set -- is not worth the expense let locations' = speclocations ++ filter (`notElem` speclocations) locations - fst <$> remoteLocations locations' [] + fst <$> remoteLocations ii locations' [] {- Given a list of locations of a key, and a list of all - trusted repositories, generates a cost-ordered list of - remotes that contain the key, and a list of trusted locations of the key. -} -remoteLocations :: [UUID] -> [UUID] -> Annex ([Remote], [UUID]) -remoteLocations locations trusted = do +remoteLocations :: IncludeIgnored -> [UUID] -> [UUID] -> Annex ([Remote], [UUID]) +remoteLocations (IncludeIgnored ii) locations trusted = do let validtrustedlocations = nub locations `intersect` trusted -- remotes that match uuids that have the key allremotes <- remoteList - >>= filterM (not <$$> liftIO . getDynamicConfig . remoteAnnexIgnore . gitconfig) + >>= if not ii + then filterM (not <$$> liftIO . getDynamicConfig . remoteAnnexIgnore . gitconfig) + else return let validremotes = remotesWithUUID allremotes locations return (sortBy (comparing cost) validremotes, validtrustedlocations) diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index dfcd600c49..750ad923f0 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1388,9 +1388,9 @@ Remotes are configured using these settings in `.git/config`. * `remote.<name>.annex-ignore` - If set to `true`, prevents git-annex - from storing annexed file contents on this remote by default. - (You can still request it be used by the `--from` and `--to` options.) 
+ If set to `true`, prevents git-annex from storing or retrieving annexed + file contents on this remote by default. + (You can still request it be used with the `--from` and `--to` options.) This is, for example, useful if the remote is located somewhere without git-annex-shell. (For example, if it's on GitHub). @@ -1399,7 +1399,7 @@ Remotes are configured using these settings in `.git/config`. This does not prevent `git-annex sync`, `git-annex pull`, `git-annex push`, `git-annex assist` or the `git-annex assistant` from operating on the - git repository. + git repository. It only affects annexed content. * `remote.<name>.annex-ignore-command`
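The effect of the new `IncludeIgnored` parameter on `remoteLocations` can be illustrated with a small pure model. This is only a sketch under simplifying assumptions: the `Remote` record, its fields and the `candidates` function are hypothetical stand-ins, and the real code reads `annex-ignore` through `getDynamicConfig` inside the `Annex` monad rather than from a plain field.

```haskell
module Main where

import Data.List (sortBy)
import Data.Ord (comparing)

-- Same shape as the newtype added in Remote.hs.
newtype IncludeIgnored = IncludeIgnored Bool

-- Hypothetical stand-in for git-annex's Remote type.
data Remote = Remote
    { remoteName :: String
    , remoteCost :: Int
    , annexIgnore :: Bool -- assumed to hold remote.<name>.annex-ignore
    } deriving (Eq, Show)

-- Drop annex-ignored remotes unless the caller asked to include them,
-- then order the survivors by cost, cheapest first.
candidates :: IncludeIgnored -> [Remote] -> [Remote]
candidates (IncludeIgnored ii) = sortBy (comparing remoteCost) . keep
  where
    keep
        | ii = id
        | otherwise = filter (not . annexIgnore)

main :: IO ()
main = do
    let rs = [Remote "cloud" 200 True, Remote "usb" 100 False]
    -- Roughly what a plain `git-annex get` would consider in this model.
    print (map remoteName (candidates (IncludeIgnored False) rs))
    -- Roughly what an explicit `--from` check would consider.
    print (map remoteName (candidates (IncludeIgnored True) rs))
```

This matches how the diff uses the flag: callers that pick remotes automatically (`get`, `sync`, the assistant's transfer queue) pass `IncludeIgnored False`, while `expectedPresent` and `fromOk` in `Command/Move.hs`, which check a remote the user named explicitly, pass `IncludeIgnored True`.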
Added a comment
diff --git a/doc/forum/Revisiting_migration_and_multiple_keys/comment_1_a85c3c7af6b1e01f887e0e1ffe2cde6f._comment b/doc/forum/Revisiting_migration_and_multiple_keys/comment_1_a85c3c7af6b1e01f887e0e1ffe2cde6f._comment new file mode 100644 index 0000000000..efeeece885 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys/comment_1_a85c3c7af6b1e01f887e0e1ffe2cde6f._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 1" + date="2023-11-30T06:51:53Z" + content=""" +Hi! If I understand you correctly, your problem is that you often migrate keys to another backend, and there are situations involving merges of repos far away from each other in history that cause merge conflicts, which results in the dead old pre-migration key being reintroduced? + +I never use key backend migration and I don't fully understand your workflow. Could you provide a reproducible example of your problem (incl all commands)? This would help a lot. +"""]]
diff --git a/doc/users/unqueued.mdwn b/doc/users/unqueued.mdwn new file mode 100644 index 0000000000..7ae93d8608 --- /dev/null +++ b/doc/users/unqueued.mdwn @@ -0,0 +1,7 @@ +Hello. + +I have been using git-annex since 2017, and it has been a huge boon. + +I mostly consider myself a hobbyist, but I have used git-annex professionally. Along the way, I've learned a lot about unix system programming, git, and automation. + +I personally find managing large sets of files to be very annoying. So being able to wrangle giant sets of files with the precision of git is awesome. I used to spend a lot of time checksumming files, making download queues, and trying to catalog where stuff was.
diff --git a/doc/forum/Revisiting_migration_and_multiple_keys.mdwn b/doc/forum/Revisiting_migration_and_multiple_keys.mdwn new file mode 100644 index 0000000000..74f99d97b5 --- /dev/null +++ b/doc/forum/Revisiting_migration_and_multiple_keys.mdwn @@ -0,0 +1,22 @@ +I have several workflows that rely on regular key migrations, and I would love to explore some ways that migrating keys could be improved. + +I see there has already been discussion about this: +https://git-annex.branchable.com/todo/alternate_keys_for_same_content/ + +I don't know how often this comes up, but it comes up a lot for me. I have several data sources that I regularly index and mirror by constructing keys based on md5 and size, and assemble a repo with the known filename. (gdrive, many software distribution sites, and others). + +So, I have a queue-repo, and I have the flexibilty of populating it later. I could even have a queue repo with just URL keys. Then I can handle ingestion and migration later. + + +I would love to have a simple programatic way of recording that one key is the authoritative key for another key, like for MD5 -> SHA256 migrations. + +There don't seem to be any really great solutions to the prolem of obsolete keys. Merging will often re-introduce them, even if they have been excised. Marking them as dead stil keeps them around, and doesn't preserve information about what key now represents the same object. + +I have written helper scripts, and tools like git-annex-filter-branch are also very helpful. But I like having the flexibility of many repos that may not regularly be in sync with each other, and a consistent history. + + +This would break things for sure, but what if during a migration, a symlink was made in the git-annex branch from the prev key to the migrated key. The union merge driver could defer to the upgraded or prefered backend. 
If an out of date repo tries syncing with an already upgraded key, the merge driver can see that the migration for that key has already happened, merge the obsolete key entries, and overwrite it back to a symlink during merge. + +A less drastic approach might be to expand the location log format to indiciate a canonical "successor" key, instead of just being dead. + +It might seem like a lot of complexity, but it would also in my opinion make a more consistent and flexible data model.
add news item for git-annex 10.20231129
diff --git a/doc/news/version_10.20230407.mdwn b/doc/news/version_10.20230407.mdwn deleted file mode 100644 index 6f3883cb46..0000000000 --- a/doc/news/version_10.20230407.mdwn +++ /dev/null @@ -1,11 +0,0 @@ -git-annex 10.20230407 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Fix laziness bug introduced in last release that breaks use - of --unlock-present and --hide-missing adjusted branches. - * Support user.useConfigOnly git config. - * registerurl, unregisterurl: Added --remote option. - * registerurl: When an url is claimed by a special remote other than the - web, update location tracking for that special remote. - (This was the behavior before version 6.20181011) - * Sped up sqlite inserts 2x when built with persistent 2.14.5.0 - * git-annex.cabal: Prevent building with unix-compat 0.7 which - removed System.PosixCompat.User."""]] \ No newline at end of file diff --git a/doc/news/version_10.20230626.mdwn b/doc/news/version_10.20230626.mdwn deleted file mode 100644 index 33e2594c3a..0000000000 --- a/doc/news/version_10.20230626.mdwn +++ /dev/null @@ -1,106 +0,0 @@ -News for git-annex 10.20230626: - -git-annex (10.20230626) upstream; urgency=medium -. - Many commands now quote filenames that contain unusual characters the - same way that git does, to avoid exposing control characters to the - terminal. The core.quotePath config can be set to false to disable this - quoting. - - -git-annex 10.20230626 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Split out two new commands, git-annex pull and git-annex push. - Those plus a git commit are equivalent to git-annex sync. - (Note that the new commands default to syncing content, unless - annex.synccontent is explicitly set to false.) - * assist: New command, which is the same as git-annex sync but with - new files added and content transferred by default. - * sync: Started a transition to --content being enabled by default. 
- When used without --content or --no-content, warn about the upcoming - transition, and suggest using one of the options, or setting - annex.synccontent. - * sync: Added -g as a short option for --no-content. - * Many commands now quote filenames that contain unusual characters the - same way that git does, to avoid exposing control characters to the - terminal. - * Support core.quotePath, which can be set to false to display utf8 - characters as-is in filenames. - * Control characters in non-filename data coming from the repository or - other possible untrusted sources are filtered out of the display of many - commands. When the command output is intended for use in scripting, - control characters are only filtered out when displaying to the - terminal. - * find, findkeys, examinekey: When outputting to a terminal and --format - is not used, quote control characters. Output to a pipe is unchanged. - (Similar to the behavior of GNU find.) - * addurl --preserve-filename now rejects filenames that contain other - control characters, besides the escape sequences it already rejected. - * init: Avoid autoenabling special remotes that have control characters - in their names. - * Support core.sharedRepository=0xxx at long last. - * Support --json and --json-error-messages in many more commands - (addunused, configremote, dead, describe, dropunused, enableremote, - expire, fix, importfeed, init, initremote, log, merge, migrate, reinit, - reinject, rekey, renameremote, rmurl, semitrust, setpresentkey, trust, - unannex, undo, uninit, untrust, unused, upgrade) - * importfeed: Support -J - * importfeed: Support --json-progress - * httpalso: Support being used with special remotes that use chunking. - * Several significant speedups to importing large trees from special - remotes. Imports that took over an hour now take only a few minutes. - * Cache negative lookups of global numcopies and mincopies. - Speeds up eg git-annex sync --content by up to 50%. 
- * Speed up sync in an adjusted branch by avoiding re-adjusting the branch - unnecessarily, particularly when it is adjusted with --hide-missing - or --unlock-present. - * config: Added the --show-origin and --for-file options. - * config: Support annex.numcopies and annex.mincopies. - * whereused: Fix display of branch:file when run in a subdirectory. - * enableremote: Support enableremote of a git remote (that was previously - set up with initremote) when additional parameters such as autoenable= - are passed. - * configremote: New command, currently limited to changing autoenable= - setting of a special remote. - * Honor --force option when operating on a local git remote. - * When a nonexistant file is passed to a command and - --json-error-messages is enabled, output a JSON object indicating the - problem. (But git ls-files --error-unmatch still displays errors about - such files in some situations.) - * Bug fix: Create .git/annex/, .git/annex/fsckdb, - .git/annex/sentinal, .git/annex/sentinal.cache, and - .git/annex/journal/* with permissions configured by core.sharedRepository. - * Bug fix: Lock files were created with wrong modes for some combinations - of core.sharedRepository and umask. - * initremote: Avoid creating a remote that is not encrypted when gpg is - broken. - * log: When --raw-date is used, display only seconds from the epoch, as - documented, omitting a trailing "s" that was included in the output - before. - * addunused: Displays the names of the files that it adds. - * reinject: Fix support for operating on multiple pairs of files and keys. - * sync: Fix buggy handling of --no-pull and --no-push when syncing - --content. With --no-pull, avoid downloading content, and with - --no-push avoid uploading content. This was done before, but - inconsistently. - * uninit: Avoid buffering the names of all annexed files in memory. - * Fix bug in -z handling of trailing NUL in input. - * version: Avoid error message when entire output is not read. 
- * Fix excessive CPU usage when parsing yt-dlp (or youtube-dl) progress - output fails. - * Use --progress-template with yt-dlp to fix a failure to parse - progress output when only an estimated total size is known. - * When yt-dlp is available, default to using it in preference to - youtube-dl. Using youtube-dl is now deprecated, and git-annex no longer - tries to parse its output to display download progress - * Improve resuming interrupted download when using yt-dlp or youtube-dl. - * assistant: Add dotfiles to git by default, unless annex.dotfiles - is configured, the same as git-annex add does. - * assistant --autostop: Avoid crashing when ~/.config/git-annex/autostart - lists a directory that it cannot chdir to. - * Fix display when run with -J1. - * assistant: Fix a crash when a small file is deleted immediately after - being created. - * repair: Fix handling of git ref names on Windows. - * repair: Fix a crash when .git/annex/journal/ does not exist. - * Support building with optparse-applicative 0.18.1 - (Thanks, Peter Simons)"""]] diff --git a/doc/news/version_10.20231129.mdwn b/doc/news/version_10.20231129.mdwn new file mode 100644 index 0000000000..103697888d --- /dev/null +++ b/doc/news/version_10.20231129.mdwn @@ -0,0 +1,22 @@ +git-annex 10.20231129 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Fix bug in git-annex copy --from --to that skipped files that were + locally present. + * Make git-annex copy --from --to --fast actually fast. + * Fix crash of enableremote when the special remote has embedcreds=yes. + * Ignore directories and other unusual files in .git/annex/journal/ + * info: Added calculation of combined annex size of all repositories. + * log: Added options --sizesof, --sizes and --totalsizes that + display how the size of repositories changed over time. + * log: Added options --interval, --bytes, --received, and --gnuplot + to tune the output of the above added options. 
+ * findkeys: Support --largerthan and --smallerthan. + * importfeed: Use caching database to avoid needing to list urls + on every run, and avoid using too much memory. + * Improve memory use of --all when using annex.private. + * lookupkey: Sped up --batch. + * Windows: Consistently avoid ending standard output lines with CR. + This matches the behavior of git on Windows. + * Windows: Fix CRLF handling in some log files. + * Windows: When git-annex init is installing hook scripts, it will + avoid ending lines with CR for portability. Existing hook scripts + that do have CR line endings will not be changed."""]] \ No newline at end of file
comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_11_65460ed123404478ae59fed1b5cff627._comment b/doc/forum/very_slow_on_exfat_drives/comment_11_65460ed123404478ae59fed1b5cff627._comment new file mode 100644 index 0000000000..5b4cac8ac2 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_11_65460ed123404478ae59fed1b5cff627._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2023-11-29T17:36:28Z" + content=""" +What might be happening on the exfat drive is, every time that filesystem +is mounted, it generates new inode numbers for all the files. So when you +run `git status`, git sees the new inode and needs to do work to determine +if it's changed. When the file is an annexed file that is unlocked (which +all annexed files necessarily are on this filesystem since it doesn't +support symlinks), git status needs to ask git-annex about it. +And git-annex has to either re-hash the file (for SHA) or do a +smaller amount of work (for WORM). + +A bare repository does get around that. But what I tend to use in these +situations is a [[/special_remotes/directory]] special remote configured +with `ignoreinodes=yes`. +"""]]
close as fixed
diff --git a/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive.mdwn b/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive.mdwn index b3e0f01375..06d01733c7 100644 --- a/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive.mdwn +++ b/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive.mdwn @@ -60,3 +60,5 @@ To /var/tmp/mnt/winhost-w10-5920/cygdrive/e/my.gitannex/ Yes. I think git-annex is a hidden gem of the open source community. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive/comment_5_c22f9e3c5e438b2cc05008bc3f91b7fc._comment b/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive/comment_5_c22f9e3c5e438b2cc05008bc3f91b7fc._comment new file mode 100644 index 0000000000..de5861f990 --- /dev/null +++ b/doc/bugs/CRLF_breaks_interoperability_Win-Ux__58__..post-receive/comment_5_c22f9e3c5e438b2cc05008bc3f91b7fc._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2023-11-29T17:29:08Z" + content=""" +[[!commit 4e35067325022e4d8790d45557156fd1166484df]] fixed this. +Previously, git-annex did write hook scripts with CRLF line endings. + +Existing hook scripts won't be changed by git-annex, but if you delete +them, and re-run `git-annex init` it will reinstall it without the CRLF. +"""]]
response
diff --git a/doc/tips/cloning_a_repository_privately/comment_2_0fb78b2183932da08809d60dfc5a7374._comment b/doc/tips/cloning_a_repository_privately/comment_2_0fb78b2183932da08809d60dfc5a7374._comment new file mode 100644 index 0000000000..7ba12def9b --- /dev/null +++ b/doc/tips/cloning_a_repository_privately/comment_2_0fb78b2183932da08809d60dfc5a7374._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: What about temporary annex.private declaration?""" + date="2023-11-29T16:50:05Z" + content=""" +I'm sure that the private information will not leak out from +`.git/annex/journal-private/` into the git-annex branch +after annex.private is unset. The design ensures this because, when +making a change to the branch, it only reads the private journal file +when the repository whose information is being changed is private. + +However, when git-annex does not have any private repositories configured, +an optimisation makes it skip trying to read from the private journal. So +information about those repositories, that were private, will no longer be +read. + +This effect is easy to see, for example: + + joey@darkstar:~/tmp/xxx>git-annex whereis + whereis foo (1 copy) + ff1f0bbd-7be6-45ff-8c90-fd322820b717 -- joey@darkstar:~/tmp/xxx [here] + ok + joey@darkstar:~/tmp/xxx>git config annex.private false + joey@darkstar:~/tmp/xxx>git-annex whereis + whereis foo (0 copies) failed + whereis: 1 failed + +I think this could be improved, eg it could check once if the private +journal exists and if so read from it even when no private uuids are +currently configured. A single stat to support this would be ok; the goal +was to avoid checking nonexistany files on every branch read when private +repositories are not used. + +Configuring any remote with annex-private can be used to work around that +problem, that lets it read information about all previously-private repositories +as well. +"""]]
comment
diff --git a/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_6_0e3224af10362a10aa1c8786423960a9._comment b/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_6_0e3224af10362a10aa1c8786423960a9._comment new file mode 100644 index 0000000000..d31da3d014 --- /dev/null +++ b/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_6_0e3224af10362a10aa1c8786423960a9._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2023-11-29T16:44:35Z" + content=""" +@sng you can clone any git repository, it does not need to be a bare one. +"""]]
Added a comment: corrupted bare repo
diff --git a/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_5_9b3738aa678015a58f30a22baa2012df._comment b/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_5_9b3738aa678015a58f30a22baa2012df._comment new file mode 100644 index 0000000000..b951df26c3 --- /dev/null +++ b/doc/tips/what_to_do_when_a_repository_is_corrupted/comment_5_9b3738aa678015a58f30a22baa2012df._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="sng@353ca358075d9aa328f60a5439a3cee10f8301fe" + nickname="sng" + avatar="http://cdn.libravatar.org/avatar/d64f4854965b2b1c3ecafee4b2a66fac" + subject="corrupted bare repo" + date="2023-11-29T15:34:30Z" + content=""" +I've tried following these steps with a bare repo that became corrupted somehow, but at the cloning another repo step I'm stuck... how do I clone a repo when the bare repo is the one corrupted? (if it matters, this is a very large repo of photo files) +"""]]
comment
diff --git a/doc/forum/git-remote-gcrypt_and_rsyncd/comment_1_34dd343ed75918f5969f6a23dfae3317._comment b/doc/forum/git-remote-gcrypt_and_rsyncd/comment_1_34dd343ed75918f5969f6a23dfae3317._comment new file mode 100644 index 0000000000..02246cf558 --- /dev/null +++ b/doc/forum/git-remote-gcrypt_and_rsyncd/comment_1_34dd343ed75918f5969f6a23dfae3317._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-28T16:06:22Z" + content=""" +That is a pretty weird error message! It looks like git-annex may have run +git but tried to pass it a working directory that does not exist. It would +be interesting to know what git command, passing --debug would tell you. + +But: The gcrypt special remote is documented as needing gitrepo=rsync:// to +operate over ssh. And git-remote-gcrypt interprets a rsync:// url as rsync +over ssh (see its man page). Yes, "host::" in rsync indicates direct contact +to a rsync daemon, not using ssh, but that will not work with git-remote-gcrypt +to the best of my knowledge. +"""]]
dial back addition, but keep it
It's a semi-common point of confusion that numcopies is not something
these commands go out and copy files around specifically to satisfy,
without further configuration in preferred content. So this is a good
addition, but it also seemed too long and too specific to the user's
particular situation.
diff --git a/doc/git-annex-satisfy.mdwn b/doc/git-annex-satisfy.mdwn index e71d6dc674..1ed4ec543d 100644 --- a/doc/git-annex-satisfy.mdwn +++ b/doc/git-annex-satisfy.mdwn @@ -16,14 +16,9 @@ and pushing of git repositories, and without changing the trees that are imported to or exported from special remotes. Note that it (like [[git-annex-sync]] or [[git-annex-assist]]) does not work -specifically towards satisfying the [[git-annex-numcopies]] setting and it will -not violate the local preferred content expression in order to move files -between remotes that are not present locally. To allow for files to be present -locally for such a movement between remotes, consider adding `or -approxlackingcopies=1` to your local [[preferred_content]] expression (and -maybe increasing [[git-annex-numcopies]] accordingly) so that files may pass -through your local repo temporarily. Otherwise, `git annex satisfy` does not -see a pathway for files to pass between other remotes. +specifically towards satisfying the [[git-annex-numcopies]] setting, +unless the preferred content setting of the local repository is written to +do so by using eg `approxlackingcopies=1`. # OPTIONS
fix some language
diff --git a/doc/git-annex-matching-options.mdwn b/doc/git-annex-matching-options.mdwn index 83e07105e7..93eb7bf6c7 100644 --- a/doc/git-annex-matching-options.mdwn +++ b/doc/git-annex-matching-options.mdwn @@ -103,8 +103,8 @@ in either of two repositories. * `--lackingcopies=number` - Matches only when git-annex beleives that the specified number or - more additional copies to be made in order to satisfy numcopies + Matches only when git-annex believes that the specified number or + more additional copies need to be made in order to satisfy numcopies settings. * `--approxlackingcopies=number`
findkeys: Support --largerthan and --smallerthan
Sponsored-by: Brett Eisenberg on Patreon
diff --git a/CHANGELOG b/CHANGELOG index 4ca82dcea0..1bb37b85f3 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -20,6 +20,7 @@ git-annex (10.20230927) UNRELEASED; urgency=medium * Fix bug in git-annex copy --from --to that skipped files that were locally present. * Make git-annex copy --from --to --fast actually fast. + * findkeys: Support --largerthan and --smallerthan. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/CmdLine/GitAnnex/Options.hs b/CmdLine/GitAnnex/Options.hs index 92649fae34..b5170f3131 100644 --- a/CmdLine/GitAnnex/Options.hs +++ b/CmdLine/GitAnnex/Options.hs @@ -266,6 +266,7 @@ annexedMatchingOptions = concat keyMatchingOptions :: [AnnexOption] keyMatchingOptions = concat [ keyMatchingOptions' + , sizeMatchingOptions Limit.LimitAnnexFiles , anythingNothingOptions , combiningOptions , timeLimitOption @@ -398,7 +399,11 @@ fileMatchingOptions' lb = <> help "limit to files whose content is the same as another file matching the glob pattern" <> hidden ) - , annexOption (setAnnexState . Limit.addLargerThan lb) $ strOption + ] ++ sizeMatchingOptions lb + +sizeMatchingOptions :: Limit.LimitBy -> [AnnexOption] +sizeMatchingOptions lb = + [ annexOption (setAnnexState . Limit.addLargerThan lb) $ strOption ( long "largerthan" <> metavar paramSize <> help "match files larger than a size" <> hidden diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index 9894921581..591564f055 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -46,3 +46,5 @@ as if I have misspelled the option: I am using it for my research project (data science/predictions in plant breeding) and it allows me to keep track of the current model iteration and associated results. Thank you for this! 
+ +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan/comment_1_f6484f69669439ef49513b84d3bd91cf._comment b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan/comment_1_f6484f69669439ef49513b84d3bd91cf._comment new file mode 100644 index 0000000000..05ea2c90a0 --- /dev/null +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan/comment_1_f6484f69669439ef49513b84d3bd91cf._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-28T15:48:09Z" + content=""" +Not all options in git-annex-matching-options can be used by findkeys. It +mentions this when it says "Some of these options can also be used by +commands to specify which keys they act on." + +However in this case, --largerthan and --smallerthan could in fact be made +to operate on keys, and I've done so. +"""]]
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index 247378f219..9894921581 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -21,7 +21,7 @@ as if I have misspelled the option: > Commonly used commands: > > add add files to annex -> ...... +> [...] ### What steps will reproduce the problem?
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index 17a26da754..247378f219 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -20,7 +20,7 @@ as if I have misspelled the option: > > Commonly used commands: > -> add add files to annex +> add add files to annex > ...... ### What steps will reproduce the problem?
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index afe4db9a2c..17a26da754 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -12,13 +12,13 @@ is important for me as it allows me to track down the files that occupy most of archive. However, if I try to call the above command, it does not show me a list of keys matching the criterion, but a help page as if I have misspelled the option: -> $ git annex findkeys --largerthan 1 -> Invalid option `--largerthan' -> -> Usage: git-annex COMMAND -> git-annex - manage files with git, without checking their contents in -> -> Commonly used commands: +> $ git annex findkeys --largerthan 1 +> Invalid option `--largerthan' +> +> Usage: git-annex COMMAND +> git-annex - manage files with git, without checking their contents in +> +> Commonly used commands: > > add add files to annex > ......
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index 97419f2c27..afe4db9a2c 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -26,13 +26,13 @@ as if I have misspelled the option: ### What steps will reproduce the problem? [[!format sh """ -echo hi > file -git annex init -git annex add file -git commit -m "Test commit" -git annex find --largerthan 1 # << this lists "file" -git annex findkeys --largerthan 1 # << this fails -]] + echo hi > file + git annex init + git annex add file + git commit -m "Test commit" + git annex find --largerthan 1 # << this lists "file" + git annex findkeys --largerthan 1 # << this fails +"""]] ### What version of git-annex are you using? On what operating system?
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn index e95221cc48..97419f2c27 100644 --- a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -25,12 +25,14 @@ as if I have misspelled the option: ### What steps will reproduce the problem? +[[!format sh """ echo hi > file git annex init git annex add file git commit -m "Test commit" git annex find --largerthan 1 # << this lists "file" git annex findkeys --largerthan 1 # << this fails +]] ### What version of git-annex are you using? On what operating system? @@ -40,16 +42,7 @@ git annex findkeys --largerthan 1 # << this fails -### Please provide any additional information below. - -[[!format sh """ -# If you can, paste a complete transcript of the problem occurring here. -# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log - - -# End of transcript or log. -"""]] - ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) - +I am using it for my research project (data science/predictions in plant breeding) and it allows me to keep track of the +current model iteration and associated results. Thank you for this!
diff --git a/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn new file mode 100644 index 0000000000..e95221cc48 --- /dev/null +++ b/doc/bugs/git-annex_findkeys_does_not_know_--largerthan.mdwn @@ -0,0 +1,55 @@ +### Please describe the problem. + +The man page of [git-annex findkeys](https://git-annex.branchable.com/git-annex-findkeys/) says: + +> OPTIONS +> +> * matching options +> The git-annex-matching-options(1) can be used to specify which keys to list. + +However, this is not true for the options that match file size. Being able to do for example `git-annex findkeys --largerthan 100M` +is important for me as it allows me to track down the files that occupy most of my storage, allowing me to move them to some +archive. However, if I try to call the above command, it does not show me a list of keys matching the criterion, but a help page +as if I have misspelled the option: + +> $ git annex findkeys --largerthan 1 +> Invalid option `--largerthan' +> +> Usage: git-annex COMMAND +> git-annex - manage files with git, without checking their contents in +> +> Commonly used commands: +> +> add add files to annex +> ...... + +### What steps will reproduce the problem? + +echo hi > file +git annex init +git annex add file +git commit -m "Test commit" +git annex find --largerthan 1 # << this lists "file" +git annex findkeys --largerthan 1 # << this fails + +### What version of git-annex are you using? On what operating system? + +* git-annex version: 10.20230926-g44a7b4c9734adfda5912dd82c1aa97c615689f57 +* Rocky Linux 9.2 +* git 2.40.1 + + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +
Add note to 'satisfy' manpage about it not satisfying numcopies and how to set it up to pass files between remotes that are not present locally.
I got bitten several times in the past by the fact that local preferred
content expressions are not violated (even temporarily) in order to
satisfy numcopies or other remotes' preferred content expressions.
Mostly in the form of the local repo not allowing arbitrary files in
(e.g. because it's set to only want `present` files). This note I add
here explains how to get out of this situation with
`approxlackingcopies=1`.
It might be too specific for this manpage, but I didn't find a better
place to put it.
diff --git a/doc/git-annex-satisfy.mdwn b/doc/git-annex-satisfy.mdwn index ddbec766ae..e71d6dc674 100644 --- a/doc/git-annex-satisfy.mdwn +++ b/doc/git-annex-satisfy.mdwn @@ -15,6 +15,16 @@ It does the same thing as `git-annex sync --content` without the pulling and pushing of git repositories, and without changing the trees that are imported to or exported from special remotes. +Note that it (like [[git-annex-sync]] or [[git-annex-assist]]) does not work +specifically towards satisfying the [[git-annex-numcopies]] setting and it will +not violate the local preferred content expression in order to move files +between remotes that are not present locally. To allow for files to be present +locally for such a movement between remotes, consider adding `or +approxlackingcopies=1` to your local [[preferred_content]] expression (and +maybe increasing [[git-annex-numcopies]] accordingly) so that files may pass +through your local repo temporarily. Otherwise, `git annex satisfy` does not +see a pathway for files to pass between other remotes. + # OPTIONS * `[remote]`
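The setup this note describes might look roughly like the following. This is only a sketch: it assumes the local repository's preferred content expression is currently just `present` (the case mentioned in the commit message), and it has to be run inside an existing git-annex repository.

```shell
# Run inside the local git-annex repository.
# Assumption: the local preferred content expression is currently just `present`.
git annex wanted . "present or approxlackingcopies=1"

# Show the expression that is now in effect.
git annex wanted .

# Then let git-annex move content toward the configured preferences.
git annex satisfy
```

Whether numcopies also needs raising depends on the repository, as the note itself says, so that step is left out here.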
diff --git a/doc/forum/git-remote-gcrypt_and_rsyncd.mdwn b/doc/forum/git-remote-gcrypt_and_rsyncd.mdwn new file mode 100644 index 0000000000..bba80cd66c --- /dev/null +++ b/doc/forum/git-remote-gcrypt_and_rsyncd.mdwn @@ -0,0 +1,32 @@ +In an attempt to simplify my setup, I have been trying to setup an encrypted repository on a `rsyncd`-based server via [`git-remote-gcrypt`](https://git-annex.branchable.com/tips/fully_encrypted_git_repositories_with_gcrypt/), which would house the file history and the annexed files themselves. I cannot provide an SSH connection to the server, so the `rsyncd` method seemed appealing. + +Using the rsync format url with "::" to signal the rsyncd method, the connection seems successful, but the initialization does not complete. + +``` +git annex initremote gcrypt-rsyncd type=gcrypt gitrepo=rsync://***::a/test keyid=*** encryption=hybrid + +initremote gcrypt-rsyncd (encryption setup) (to gpg keys: ***) gcrypt + Decrypting manifest +gpg: Signature made Wed Nov 22 22:23:16 2023 CET +gpg: using EDDSA key *** +gpg: Good signature from "archive-990" [ultimate] +gcrypt: Remote ID is :id:ya5ZivzWNEOUtVg2R0L9 +From gcrypt::rsync://***::a/test + * [new branch] git-annex -> gcrypt-rsyncd/git-annex +gcrypt: Decrypting manifest +gpg: Signature made Wed Nov 22 22:23:16 2023 CET +gpg: using EDDSA key *** +gpg: Good signature from "archive-990" [ultimate] +Everything up-to-date + +git-annex: git: createProcess: chdir: invalid argument (Bad file descriptor) +failed +initremote: 1 failed +``` + +Logs from the daemon show the following error: +``` +rsync to a/test/annex/objects from *** +``` + +I don't know whether this error is imputable to `git-annex`, or `git-remote-gcrypt`, or my settings.
comment
diff --git a/doc/forum/Using_git-annex_as_a_library/comment_7_909628f1edd0d3448498fc434c61a3a4._comment b/doc/forum/Using_git-annex_as_a_library/comment_7_909628f1edd0d3448498fc434c61a3a4._comment new file mode 100644 index 0000000000..63d06ba09f --- /dev/null +++ b/doc/forum/Using_git-annex_as_a_library/comment_7_909628f1edd0d3448498fc434c61a3a4._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2023-11-21T20:23:19Z" + content=""" +Hmm, `cabal build` does not build the library 2x in the case of propellor, +which has a similar split between the propellor library and the executables +that depend on it. Perhaps cabal has improved that since I posted my +comment. +"""]] diff --git a/doc/forum/Using_git-annex_as_a_library/comment_8_cef28b8639b6c6e84804b485fb5037f1._comment b/doc/forum/Using_git-annex_as_a_library/comment_8_cef28b8639b6c6e84804b485fb5037f1._comment new file mode 100644 index 0000000000..400e1913af --- /dev/null +++ b/doc/forum/Using_git-annex_as_a_library/comment_8_cef28b8639b6c6e84804b485fb5037f1._comment @@ -0,0 +1,37 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2023-11-21T20:25:55Z" + content=""" +A lot of git-annex's end user functionality is in the `Command/*` modules. +Much of the code in those is not in a form that would be useful in a +library, unless the library was structured to run a `Command`. + +If the goal is say, to add a file to git-annex, and then to be able to +get and drop the file, and `Command.Add`, `Command.Get`, and `Command.Drop` +were not in the library, you'd have to put together the eqivilant of what +those commands do out of the internal library code: + +* `Command.Add` essentially uses `Annex.Ingest` followed by calling into + `Logs.Location` to update the location log. +* `Command.Drop` uses `Annex.NumCopies` to construct a drop proof and passes + it off to `Annex.Drop`. 
+* `Command.Get` handles trying different remotes itself, calling into + `Annex.Transfer`. + +So the library structure works for git-annex as a thing for Command modules +to use, but not great for other things. The assistant actually imports some +Command modules, eg it uses Command.Add.addFile. Similarly it would be +possible to call Command.Get.getKey. + +Maybe it would be better to have a library interface that does mirror the +git-annex command line, so you could run eg: + + runCommand (Command.Add.cmd) (AddOptions {...}) "somefile" + runCommand (Command.Drop.cmd) (DropOptions {...}) "somefile" + +That would necessarily have output like those commands too (unless quiet +mode were enabled). + +It would take some work to get there from the current state. +"""]] diff --git a/doc/forum/Using_git_annex_as_a_library/comment_2_1323238fc63e121fbc0f408a23d1ada5._comment b/doc/forum/Using_git_annex_as_a_library/comment_2_1323238fc63e121fbc0f408a23d1ada5._comment new file mode 100644 index 0000000000..882691e075 --- /dev/null +++ b/doc/forum/Using_git_annex_as_a_library/comment_2_1323238fc63e121fbc0f408a23d1ada5._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-11-21T20:17:10Z" + content=""" +Hmm, I followed up on that other page to avoid fragmenting the discussion. +"""]]
Added a comment
diff --git a/doc/bugs/How_to_git_union-merge__63__/comment_2_22701b82e8d53acff66ddea5bf9448bf._comment b/doc/bugs/How_to_git_union-merge__63__/comment_2_22701b82e8d53acff66ddea5bf9448bf._comment new file mode 100644 index 0000000000..63edd6f659 --- /dev/null +++ b/doc/bugs/How_to_git_union-merge__63__/comment_2_22701b82e8d53acff66ddea5bf9448bf._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 2" + date="2023-11-21T20:20:13Z" + content=""" +It sounds promising as an automatic merge conflict resolver that I'd like to fall back to for non-annexed file conflicts. I don't really know how to achieve that manually. If it's easily possible in another way, I'll try that. +"""]]
comment
diff --git a/doc/bugs/How_to_git_union-merge__63__/comment_1_58b6c9712d7c209248d9ef87bcc0e110._comment b/doc/bugs/How_to_git_union-merge__63__/comment_1_58b6c9712d7c209248d9ef87bcc0e110._comment new file mode 100644 index 0000000000..00f291361a --- /dev/null +++ b/doc/bugs/How_to_git_union-merge__63__/comment_1_58b6c9712d7c209248d9ef87bcc0e110._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-21T20:12:11Z" + content=""" +I think I should probably just remove that. It's not being maintained. + +Adding it to the cabal file so it gets built would slow down builds +producing this extra binary. It would need to be handled as a multicall +program in git-annex the way git-annex-shell and git-remote-tor-annex are. + +Do you have a reason to want to use it? +"""]]
followup and wontfix this
diff --git a/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn index 3081d7f834..83c4d44db3 100644 --- a/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn +++ b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn @@ -73,4 +73,4 @@ fsck: 1 failed Yes, great tool ! Thanks ! - +> [[wontfix|done]] per my comment --[[Joey]] diff --git a/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet/comment_1_a23d96af8c0e418350a73cbce5bc24be._comment b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet/comment_1_a23d96af8c0e418350a73cbce5bc24be._comment new file mode 100644 index 0000000000..f0f8660751 --- /dev/null +++ b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet/comment_1_a23d96af8c0e418350a73cbce5bc24be._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-21T20:03:45Z" + content=""" +Hmm, I've never considered combining --quiet with --json. It's kind of +undefined and really not clear to me what it should do. + +But, --json-error-messages makes the json contain an error-messages field, +and the error message is in there. So you can just extract that and ignore +the other messages in the json output. No need to use --quiet then. + +I suppose there may be someone who uses --json as a matter of course, but +adds --quiet to that when they want to disable the json output. So +changing the current behavior, ill-defined as it is, would be asking for +trouble. + +What actually happens currently is which ever output option comes last +overrides earlier options. So `--json --quiet` is quiet, and `--quiet +--json` outputs json. `--json-error-messages` is like `--json` in this +regard to. Which is just behavior that fell out of the option parser +implementation. +"""]]
comment
diff --git a/doc/bugs/git_init_fails_on_a_worktree_branch/comment_1_6614fc4b8f191702c1b78c5a3ce5de50._comment b/doc/bugs/git_init_fails_on_a_worktree_branch/comment_1_6614fc4b8f191702c1b78c5a3ce5de50._comment new file mode 100644 index 0000000000..37dfc93118 --- /dev/null +++ b/doc/bugs/git_init_fails_on_a_worktree_branch/comment_1_6614fc4b8f191702c1b78c5a3ce5de50._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-21T20:01:25Z" + content=""" +I don't use worktrees or submodules much so it's not entirely apparent to +me how to reproduce this. To avoid flailing at it, a recipe would be great. +"""]]
close
diff --git a/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn b/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn index dc32996e72..1e3506cb40 100644 --- a/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn +++ b/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn @@ -35,3 +35,5 @@ What should I do? Honestly, I'm happy with git-annex so far, I'm just thinking that I need to re-init with `annex.tune.objecthashlower=true` because my other computer is windows. Thanks! + +> I see this was resolved as ok behavior, so [[done]] --[[Joey]]
move a comment that is a bug report
diff --git a/doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment b/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn similarity index 80% rename from doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment rename to doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn index b41dd08272..dc32996e72 100644 --- a/doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment +++ b/doc/bugs/uninit_fails_when_I_symlink_your_symlink.mdwn @@ -1,10 +1,3 @@ -[[!comment format=mdwn - username="NewUser" - nickname="dont.post.me" - avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" - subject="`git annex uninit` fails when I symlink your symlink" - date="2023-11-20T04:22:02Z" - content=""" ``` $ mkdir annex-test $ cd annex-test/ @@ -42,4 +35,3 @@ What should I do? Honestly, I'm happy with git-annex so far, I'm just thinking that I need to re-init with `annex.tune.objecthashlower=true` because my other computer is windows. Thanks! 
-"""]] diff --git a/doc/git-annex-uninit/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment b/doc/bugs/uninit_fails_when_I_symlink_your_symlink/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment similarity index 100% rename from doc/git-annex-uninit/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment rename to doc/bugs/uninit_fails_when_I_symlink_your_symlink/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment diff --git a/doc/git-annex-uninit/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment b/doc/bugs/uninit_fails_when_I_symlink_your_symlink/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment similarity index 100% rename from doc/git-annex-uninit/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment rename to doc/bugs/uninit_fails_when_I_symlink_your_symlink/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment diff --git a/doc/git-annex-uninit/comment_4_3386d8419830354b4422d38448467e95._comment b/doc/git-annex-uninit/comment_4_3386d8419830354b4422d38448467e95._comment new file mode 100644 index 0000000000..a99f5f4701 --- /dev/null +++ b/doc/git-annex-uninit/comment_4_3386d8419830354b4422d38448467e95._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2023-11-21T19:56:32Z" + content=""" +Moved some comments about a bug to +[[bugs/uninit_fails_when_I_symlink_your_symlink]]. +Please do not use this man page's comment section to file bug reports. +"""]]
Added a comment
diff --git a/doc/forum/Windows_eol_issues/comment_7_f9b71ea2158c02dfa8a7c59891aea679._comment b/doc/forum/Windows_eol_issues/comment_7_f9b71ea2158c02dfa8a7c59891aea679._comment new file mode 100644 index 0000000000..952592b2db --- /dev/null +++ b/doc/forum/Windows_eol_issues/comment_7_f9b71ea2158c02dfa8a7c59891aea679._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 7" + date="2023-11-21T09:25:33Z" + content=""" +Thanks joey for your response. And thank you for jkniiv for providing the output to git show-head, it matches what I see. + +With regards to the pointer file... I think the problem I have with it is.. well, first the avoidable warning: + +[[!format sh \"\"\" +warning: in the working copy of 'ntdll.dll', LF will be replaced by CRLF the next time Git touches it +\"\"\"]] + +I guess also, because git-annex doesn't honour (??) git's actual expectations on line mode, you end up with git thinking a file is modified. Because git knows it as text, and in the default mode under Windows, it expects the text file to be CRLF. + +Also... I guess in a way, once Windows has acted on a file that came from unix, it now becomes a pointer file on disk. You can't get the benefit of the symlink to view the contents. + +Also also... I don't think I demonstrated it here, but I found that eventually, some merge would cause the line-ending to flip, and then there would be another unnecessary checkin of the pointer file. + +Sorry if this is all a bit abstract. I was working on large repos, oblivious to some of what I found and have pasted above, and was filling in the \"bitmap\" of my knowledge piecemeal. + +With the CRLF hooks, yes, I originally reported that. So in the above, not how I do a push msw to wsl only, and no pull from msw at wsl. 
That's because that type of pull would break, with the wsl/linux system() call interpreting the CR as part of the shebang, and being unable to execute the shell itself. + +"""]]
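The shebang failure described at the end of that comment can be reproduced without git-annex or git hooks. A minimal sketch, assuming a Linux system and a temporary directory that permits executing files; the file names `hook` and `hook.fixed` are hypothetical:

```shell
#!/bin/sh
# Sketch: a script whose shebang line ends in CRLF cannot be executed on Linux,
# because the kernel looks for an interpreter literally named "/bin/sh\r".
# The file names (hook, hook.fixed) are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Write a two-line script with Windows (CRLF) line endings.
printf '#!/bin/sh\r\necho hi\r\n' > hook
chmod +x hook

# Executing it directly is expected to fail.
if ./hook 2>/dev/null; then
    crlf_status=ran
else
    crlf_status=failed
fi

# Strip the carriage returns and the same script runs.
tr -d '\r' < hook > hook.fixed
chmod +x hook.fixed
fixed_output=$(./hook.fixed)

echo "CRLF hook: $crlf_status"
echo "LF hook output: $fixed_output"
```

This only illustrates the line-ending problem; it does not show how the hooks ended up with CRLF endings in the first place.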
Added a comment
diff --git a/doc/git-annex-uninit/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment b/doc/git-annex-uninit/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment new file mode 100644 index 0000000000..d1109eb926 --- /dev/null +++ b/doc/git-annex-uninit/comment_6_3afdea4f3705a9cfaa971ed3aa9f114c._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="comment 6" + date="2023-11-20T13:17:03Z" + content=""" +Thank you for the illuminating comment! + +> This is also what the error message says, btw. + + Not continuing with uninit; either delete or git annex add the file and retry. + +You’re right! I thought I only had a relative symlink pointing to the other symlink and not pointing into .git/annex/objects. It turns out that this is true for most of my symlinks but, just like you pointed out, some of them were pointing into .git/annex and the right thing to do there is to `add` them. + +Also, thank you for pointing me towards todo/bugs/forum! + +"""]]
Added a comment: forgot to add the new ignored link + shortcoming of 'ln' command
diff --git a/doc/git-annex-uninit/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment b/doc/git-annex-uninit/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment new file mode 100644 index 0000000000..7514aa5e68 --- /dev/null +++ b/doc/git-annex-uninit/comment_5_3fe7520813dee36ffb5419ed70ea43b5._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="forgot to add the new ignored link + shortcoming of 'ln' command" + date="2023-11-20T10:25:52Z" + content=""" +You forgot to add your (gitignore'd) `what.foo` to git. Another `git annex add --no-check-gitignore;git commit -m \"add link\"` will make the subsequent `git annex uninit` work properly. This is also what the error message says, btw. + +But there's another problem underneath: Why is `what.foo` an annex-style symlink anyway? It should just point to `example.txt`, right? + +``` +🐟 ❯ ln -nrs example.txt what.foo +yann in yann-desktop-nixos in …/uninit-test on main +🐟 ❯ ls -l +lrwxrwxrwx 186 yann users 20 Nov 11:08 example.txt -> .git/annex/objects/mK/4w/SHA256E-s6--5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03.txt/SHA256E-s6--5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03.txt +lrwxrwxrwx 186 yann users 20 Nov 11:08 what.foo -> .git/annex/objects/mK/4w/SHA256E-s6--5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03.txt/SHA256E-s6--5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03.txt +``` + +For some reason `ln` dereferences `example.txt` before linking, resulting in `what.foo` pointing to the same as `example.txt` -- not just to `example.txt` -- causing `what.foo` to look exactly like an annex link. I thought `-n` could fix this, but it would operate on the target `what.foo`, not the source `example.txt` 🤦. 
I couldn't make `ln` do this properly, except for renaming/removing `example.txt`, then `ln -rsf example.txt what.foo`, then restoring `example.txt`. Not viable. A better solution is `cp -s example.txt what.foo`. That will make `what.foo` point properly to just `example.txt` and not *its* target. + +BTW you can still just `git add` things like normal non-annex symlinks like `what.foo`, no need for `git annex add` here. + +Also, joey prefers to have bug reports/questions and the like in either [[todo]], [[bugs]] or [[forum]], because it's clear those are the places to look through for issues. Comments below random manpages are quick to be forgotten about. 🙂 +"""]]
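The difference between `ln -rs` and `cp -s` described in that comment can be checked in a scratch directory without git-annex. A minimal sketch, assuming GNU coreutils (`ln -r` and `cp -s` are GNU options); the `objects/` directory and the `via_*.foo` names are hypothetical stand-ins for `.git/annex/objects` and `what.foo`:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Stand-in for annexed content and the annex-style symlink pointing at it.
mkdir objects
echo hello > objects/key
ln -s objects/key example.txt

# ln -r resolves example.txt before computing the relative path, so the new
# link points at the content itself and looks like an annex link.
ln -rs example.txt via_ln_r.foo

# cp -s links to the name example.txt, not to its target.
cp -s example.txt via_cp_s.foo

# Print where each new link points.
readlink via_ln_r.foo
readlink via_cp_s.foo
```

With GNU coreutils the first link should point into `objects/` and the second at `example.txt`, matching what the comment reports.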
Added a comment: Is `annex.tune.objecthashlower=true` recommended for interop with windows?
diff --git a/doc/todo/windows_support/comment_26_99d0541d1c7f7c82a67d481c61209670._comment b/doc/todo/windows_support/comment_26_99d0541d1c7f7c82a67d481c61209670._comment new file mode 100644 index 0000000000..2a586da213 --- /dev/null +++ b/doc/todo/windows_support/comment_26_99d0541d1c7f7c82a67d481c61209670._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="Is `annex.tune.objecthashlower=true` recommended for interop with windows?" + date="2023-11-20T04:24:35Z" + content=""" +I've been adding stuff to git annex on my linux server and now I'm thinking that I need to uninit and re-init with `annex.tune.objecthashlower=true`. I'll want to use git-annex from windows as well. + +Should I use `annex.tune.objecthashlower=true`? Also, what's the advice around the other tuning options `objecthash1` and `branchhash1`? + +Thanks! +"""]]
Added a comment: `git annex uninit` fails when I symlink your symlink
diff --git a/doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment b/doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment new file mode 100644 index 0000000000..b41dd08272 --- /dev/null +++ b/doc/git-annex-uninit/comment_4_7d18fc21b563fb2776562809465015c7._comment @@ -0,0 +1,45 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="`git annex uninit` fails when I symlink your symlink" + date="2023-11-20T04:22:02Z" + content=""" +``` +$ mkdir annex-test +$ cd annex-test/ +$ git init +Initialized empty Git repository in /home/me/annex-test/.git/ +$ git annex init +init (scanning for unlocked files...) +ok +(recording state in git...) +$ cat > .gitignore +* +$ cat > exampyle.txt +hello +$ git annex add --no-check-gitignore +add .gitignore (non-large file; adding content to git repository) ok +add exampyle.txt +ok +(recording state in git...) +$ git commit -m \"added\" +[master (root-commit) 2734615] added + 2 files changed, 2 insertions(+) + create mode 100644 .gitignore + create mode 120000 exampyle.txt +$ ln -rs exampyle.txt what.foo +$ git status +On branch master +nothing to commit, working tree clean +$ git annex uninit +git-annex: what.foo points to annexed content, but is not checked into git. +Perhaps this was left behind by an interrupted git annex add? +Not continuing with uninit; either delete or git annex add the file and retry. +``` +What should I do? + +Honestly, I'm happy with git-annex so far, I'm just thinking that I need to re-init with `annex.tune.objecthashlower=true` because my other computer is windows. + +Thanks! +"""]]
diff --git a/doc/design/new_repo_versions.mdwn b/doc/design/new_repo_versions.mdwn index df5004e70a..73210cac78 100644 --- a/doc/design/new_repo_versions.mdwn +++ b/doc/design/new_repo_versions.mdwn @@ -43,7 +43,7 @@ Possible reasons to make changes: The mixed case hash directories have caused trouble on case-insensitive filesystems, although that has mostly been papered over to avoid - problems. One remaining problem users can stuble on occurs + problems. One remaining problem users can stumble on occurs when [[moving a repository from OSX to Linux|bugs/OSX_case_insensitive_filesystem]]. * The hash directories, and also the per-key directories
Added a comment: Thanks Yann!
diff --git a/doc/git-annex-webapp/comment_5_31301895752f0dd81db72c463f0dc732._comment b/doc/git-annex-webapp/comment_5_31301895752f0dd81db72c463f0dc732._comment new file mode 100644 index 0000000000..0041f9127e --- /dev/null +++ b/doc/git-annex-webapp/comment_5_31301895752f0dd81db72c463f0dc732._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="Thanks Yann!" + date="2023-11-19T18:20:03Z" + content=""" +Thank you [nobodyinperson](https://git-annex.branchable.com/users/nobodyinperson/)! Your note looks perfect. Oh, wow; I didn’t know that I could edit this page. Thanks again! +"""]]
Added a comment
diff --git a/doc/git-annex-webapp/comment_4_c1754cdb4087ad278867ed8fddd99409._comment b/doc/git-annex-webapp/comment_4_c1754cdb4087ad278867ed8fddd99409._comment new file mode 100644 index 0000000000..71371439bb --- /dev/null +++ b/doc/git-annex-webapp/comment_4_c1754cdb4087ad278867ed8fddd99409._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 4" + date="2023-11-19T16:27:34Z" + content=""" +I added a note to make it more clear that the webapp will sync stuff via the assistant. +"""]]
Make it more clear that `git annex webapp` will add files and sync via the assistant.
diff --git a/doc/git-annex-webapp.mdwn b/doc/git-annex-webapp.mdwn index 47dfae504a..064996ea16 100644 --- a/doc/git-annex-webapp.mdwn +++ b/doc/git-annex-webapp.mdwn @@ -10,7 +10,8 @@ git annex webapp Opens a web app, that allows easy setup of a git-annex repository, and control of the git-annex assistant. If the assistant is not -already running, it will be started. +already running, it will be started. This will cause new files to +be added and syncing operations to be performed. By default, the webapp can only be accessed from localhost, and running it opens a browser window.
Added a comment
diff --git a/doc/git-annex-webapp/comment_3_aee70625f7cff6e7312f9bc2cbbb02d0._comment b/doc/git-annex-webapp/comment_3_aee70625f7cff6e7312f9bc2cbbb02d0._comment new file mode 100644 index 0000000000..c96c374c1a --- /dev/null +++ b/doc/git-annex-webapp/comment_3_aee70625f7cff6e7312f9bc2cbbb02d0._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 3" + date="2023-11-19T16:16:15Z" + content=""" +Welcome on board. `git annex webapp` launches the assistant. It might even do `git annex assistant --autostart` and launch the assistant in all configured repos. That is indeed an important info to know. Everyone can edit these pages (the Edit button above) btw. +"""]]
Added a comment: launching `git annex webapp` starts adding to the annex which is surprising
diff --git a/doc/git-annex-webapp/comment_2_7b1c4c4356e801006081588b32075fb4._comment b/doc/git-annex-webapp/comment_2_7b1c4c4356e801006081588b32075fb4._comment new file mode 100644 index 0000000000..ad79367d0f --- /dev/null +++ b/doc/git-annex-webapp/comment_2_7b1c4c4356e801006081588b32075fb4._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="launching `git annex webapp` starts adding to the annex which is surprising" + date="2023-11-18T20:08:17Z" + content=""" +Since I panicked upon seeing a message about adding files to the annex, I promptly killed the webapp. I might just be misunderstanding what's going on. Thanks! +"""]]
Added a comment: launching `git annex webapp` starts adding to the annex which is surprising
diff --git a/doc/git-annex-webapp/comment_1_443c5595412a19ef9c6948c4224297a3._comment b/doc/git-annex-webapp/comment_1_443c5595412a19ef9c6948c4224297a3._comment new file mode 100644 index 0000000000..aeba3edaad --- /dev/null +++ b/doc/git-annex-webapp/comment_1_443c5595412a19ef9c6948c4224297a3._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="NewUser" + nickname="dont.post.me" + avatar="http://cdn.libravatar.org/avatar/90f59ddc341eaf9b2657422206c06901" + subject="launching `git annex webapp` starts adding to the annex which is surprising" + date="2023-11-18T20:03:06Z" + content=""" +I have gotten as far as running `git init` and `git annex init` on my server. There’s a bunch of stuff there and I figure I’ll just add the things that I care about as I go. + +I read the page on the [webapp](https://git-annex.branchable.com/git-annex-webapp/) and it just says that it “allows easy setup of a git-annex repository, and control of the git-annex assistant”. I started the webapp on my server and I was alarmed to see a message in the webapp that it was adding things to git annex. Over on the page for the [git annex assistant](https://git-annex.branchable.com/git-annex-assistant/) it says “By default, all new files in the directory will be added to the repository” which is definitely not what I expected. I’m a new user trying to learn how to use git annex and this was an unpleasant surprise. Can we amend the docs for the webapp to warn about this behavior? + +"""]]
Added a comment
diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_5_fd23d0559018c531595cd06f81290258._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_5_fd23d0559018c531595cd06f81290258._comment new file mode 100644 index 0000000000..0354a9a3a8 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_5_fd23d0559018c531595cd06f81290258._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 5" + date="2023-11-18T01:35:35Z" + content=""" +> (Or there could be a new option like git-annex copy --to bar --from foo --or-from-here) + +or may be + +`git-annex copy --to bar --from remote1 --or-from remote2 ...` or alike so there could be a sequence (in order of preference) of remotes? or better a general `git-annex copy --to bar --from-anywhere` so that `annex` first `get`'s it following current set costs etc if not present here, and then copies over. +"""]]
Make git-annex copy --from --to --fast actually fast
Eg when the destination is logged as containing a file, skip
actively checking that it does contain it.
Note that --fast does not prevent other verifications of content
location that are done in a copy --from --to. Perhaps it could, but this
change will already avoid the real unnecessary work of operating on
files that are already in the remote.
And avoiding other verifications
might cause it to fail if the location log thinks that --to does not
contain the content but does. Such complications with `git-annex copy
--to remote --fast` led to commit d006586cd0b706c9cc92b2747b2ba3487f52c04a
which added a note that gets displayed when that fails, mentioning it
might be due to --fast being enabled.
copy --from --to is already complicated enough without needing to worry
about such edge cases, so continuing to do some verification of
content location after the initial --fast filtering seems ok.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index f5f13c3f88..4ca82dcea0 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -19,6 +19,7 @@ git-annex (10.20230927) UNRELEASED; urgency=medium to tune the output of the above added options. * Fix bug in git-annex copy --from --to that skipped files that were locally present. + * Make git-annex copy --from --to --fast actually fast. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/Command/Move.hs b/Command/Move.hs index 3a7dc4859f..2e450b3e4a 100644 --- a/Command/Move.hs +++ b/Command/Move.hs @@ -345,7 +345,13 @@ fromToStart removewhen afile key ai si src dest = starting (describeMoveAction removewhen) (OnlyActionOn key ai) si $ fromToPerform src dest removewhen key afile where - somethingtodo = pure (Remote.uuid src /= Remote.uuid dest) + somethingtodo + | Remote.uuid src == Remote.uuid dest = return False + | otherwise = do + fast <- Annex.getRead Annex.fast + if fast && removewhen == RemoveNever + then not <$> expectedPresent dest key + else return True {- When there is a local copy, transfer it to the dest, and drop from the src. 
- diff --git a/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn index 3c140823c6..64cf7c1025 100644 --- a/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn +++ b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn @@ -84,3 +84,5 @@ and then in conda with `10.20230626-g801c4b7` [[!meta author=yoh]] [[!tag projects/dandi]] + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_1_2a2997e6f914afb0477f2baa69b174fc._comment b/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_1_2a2997e6f914afb0477f2baa69b174fc._comment new file mode 100644 index 0000000000..5682b0cc8a --- /dev/null +++ b/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_1_2a2997e6f914afb0477f2baa69b174fc._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-17T20:57:19Z" + content=""" +> but it doesn't works out correctly whenever there are some files to actually copy + +I think that was due to the bug you linked, which is now fixed. + +I've confirmed that `--fast` is not actually implemented for `git-annex +copy --from --to`. Explicitly specifying `--not --in destremote` is a +fine workaround. But I've gone ahead and implemented `--fast` for it too. +"""]] diff --git a/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_2_f7bf30e8cc2d1995976bde723dfbfe01._comment b/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_2_f7bf30e8cc2d1995976bde723dfbfe01._comment new file mode 100644 index 0000000000..a12cc7c372 --- /dev/null +++ b/doc/bugs/copy_--fast_--from_--to_checks_destination_files/comment_2_f7bf30e8cc2d1995976bde723dfbfe01._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-11-17T21:16:35Z" + content=""" +BTW `git-annex find --print0` is the output eqivilant of -z. +"""]]
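The decision made by the new `somethingtodo` guard in `Command/Move.hs` can be modelled as a small truth table. This is only an illustrative sketch in shell, not git-annex code: the function name `something_to_do`, its argument order, and the `true`/`false` strings are assumptions made for the example, and `dest_expected` stands in for what the location log says about the destination.

```shell
#!/bin/sh
# Hypothetical model of the skip decision for `copy --from --to [--fast]`.
# Usage: something_to_do SRC_UUID DEST_UUID FAST IS_COPY DEST_EXPECTED
# Prints "yes" when the file should be acted on, "no" when it is skipped.
something_to_do() {
    src=$1
    dest=$2
    fast=$3            # true when --fast was given
    is_copy=$4         # true for copy (RemoveNever), false for move
    dest_expected=$5   # true when the location log lists the key on dest

    # Same remote on both sides: nothing to transfer.
    if [ "$src" = "$dest" ]; then
        echo no
        return
    fi
    # --fast on a copy trusts the location log and skips logged files.
    if [ "$fast" = true ] && [ "$is_copy" = true ] && [ "$dest_expected" = true ]; then
        echo no
        return
    fi
    echo yes
}

something_to_do uuid-a uuid-b true true true
```

On versions without this change, Joey's comment above notes that explicitly specifying `--not --in destremote` is a fine workaround.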
fixed
diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn index 8c0eda44c9..af7a4e0da8 100644 --- a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn @@ -90,3 +90,4 @@ I didn't check `move` command but if it does support similar `--from --to` and h [[!tag projects/dandi]] +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_3_492d7c932fe5663aab916aacc829fb5d._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_3_492d7c932fe5663aab916aacc829fb5d._comment new file mode 100644 index 0000000000..a3d85237bf --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_3_492d7c932fe5663aab916aacc829fb5d._comment @@ -0,0 +1,27 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2023-11-17T20:33:01Z" + content=""" +That bug I fixed would also explain the behavior that you saw if the +content *was* present locally, and the location log *was* out of date about +that. + +In that situation, git-annex sees that the object file is present, and so +treats the content as present, despite the location log not knowing it's +present. Which triggers the situation of the bug I fixed, causing it to +skip copying the file. + +Also, there's a pretty easy way to get into this situation. When the file +is not present, run `git-annex --from --to`. Then interrupt it after it's +downloaded the file --from but before it's finished sending it --to. +This results in the file being present locally, but only transiently so it +didn't update the location log. + +So my guess is you interrupted a copy like that (or it failed incomplete +for whatever reason). + +Now that I've fixed that bug, the behavior in that situation is that it +does copy the file to the remote. 
And then it drops the local copy since +the location log doesn't contain it. So it resumes correctly now. +"""]] diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_4_8cfb9f2c14559a7574edd29b161cf7c7._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_4_8cfb9f2c14559a7574edd29b161cf7c7._comment new file mode 100644 index 0000000000..8afe914bd7 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_4_8cfb9f2c14559a7574edd29b161cf7c7._comment @@ -0,0 +1,19 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2023-11-17T20:42:24Z" + content=""" +So that leaves only the question of what it should do when +content is present locally but not on the --from remote. + +Another reason for the current behavior is to be symmetric with `git-annex +move --from foo --to bar`. It would be surprising, I think, if that +populated bar with files that are not present in foo, but are in the local +repository! + +So I'm inclined to not change the documented behavior. If you want to +populate a remote with files that are either in the local repo or in a +--from remote, you can just run `git-annex copy` twice after all. + +(Or there could be a new option like `git-annex copy --to bar --from foo --or-from-here`) +"""]]
Fix bug in git-annex copy --from --to
Caused it to skip files that were locally present.
Sponsored-by: Dartmouth College's DANDI project
diff --git a/CHANGELOG b/CHANGELOG index 9761f776a1..f5f13c3f88 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -17,6 +17,8 @@ git-annex (10.20230927) UNRELEASED; urgency=medium display how the size of repositories changed over time. * log: Added options --interval, --bytes, --received, and --gnuplot to tune the output of the above added options. + * Fix bug in git-annex copy --from --to that skipped files that were + locally present. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/Command/Copy.hs b/Command/Copy.hs index 0034bb26cb..88d645a693 100644 --- a/Command/Copy.hs +++ b/Command/Copy.hs @@ -68,7 +68,7 @@ seek' o fto = startConcurrency (Command.Move.stages fto) $ do FromOrToRemote (FromRemote _) -> Just False FromOrToRemote (ToRemote _) -> Just True ToHere -> Just False - FromRemoteToRemote _ _ -> Just False + FromRemoteToRemote _ _ -> Nothing , usesLocationLog = True } keyaction = Command.Move.startKey fto Command.Move.RemoveNever diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_2_e857c1d89b350517fcb9829e52d6c6db._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_2_e857c1d89b350517fcb9829e52d6c6db._comment new file mode 100644 index 0000000000..d9d1bc1602 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_2_e857c1d89b350517fcb9829e52d6c6db._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2023-11-17T20:27:37Z" + content=""" +> So the file content being present locally prevents it sending it to the remote! + +Fixed that. +"""]]
comment
diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_1_ae76f27d9ef4ebac7ad57e6aef9a7586._comment b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_1_ae76f27d9ef4ebac7ad57e6aef9a7586._comment new file mode 100644 index 0000000000..75b6338818 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally/comment_1_ae76f27d9ef4ebac7ad57e6aef9a7586._comment @@ -0,0 +1,57 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-11-17T19:58:39Z" + content=""" +> -r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb + +This could be an unlocked file that has gotten modified but the staged +version is not actually present locally. Or if `git-annex fsck` on it says +its fixing the location logs, that would tell us something happened that +got the location tracking out of sync with reality. + +So possibly there's an issue that could be tracked down regarding the state +of that file. But in either case, git-annex doesn't know it has a local +copy of the file, so `copy --from --to` could not use it. + +---- + +But: `copy --from --to` does in fact have an interesting bug: + + joey@darkstar:~/tmp/bench/r2>git-annex whereis foo + whereis foo (2 copies) + 22dfa446-7482-4c0a-92c9-70db793859fb -- joey@darkstar:~/tmp/bench/r [origin] + 8a504049-2c22-4baa-9a16-218e9561608b -- joey@darkstar:~/tmp/bench/r2 [here] + ok + joey@darkstar:~/tmp/bench/r2>git-annex copy foo --from origin --to r3 + joey@darkstar:~/tmp/bench/r2> + +So the file content being present locally prevents it sending it to the remote! This needs to get fixed. + +Hmm: In the corresponding case of `git-annex move --from --to`, it does not +behave that way. 
+ +---- + +As far as what the behavior ought to be when a file is present locally but not on the --from remote, +the documentation does say: + + --from=remote + + Copy the content of files from the specified remote to the local repository. + + Any files that are not available on the remote will be silently skipped. + +So it is behaving as documented. I can think of two reasons why that +documented behavior makes some sense: + +* The user may be intending to only copy files --to that are present in --from. + The local repo may have a lot of files they do not want to populate --to. + (For example, perhaps the goal is to make a replica of the --from + repository.) + With that said, the user could do `git-annex copy --from foo --to bar --in foo` + to explicitly only act on files that are present in it. +* Performance. Needing to check if there is a local copy when there is no + remote copy would be a little extra work. Likely not enough to be + significant though. +"""]]
diff --git a/doc/users/nobodyinperson.mdwn b/doc/users/nobodyinperson.mdwn index cf3f4a9d9d..aa5908dc80 100644 --- a/doc/users/nobodyinperson.mdwn +++ b/doc/users/nobodyinperson.mdwn @@ -1,9 +1,11 @@ I use git-annex to: -- manage my research data +- manage my research data, partly also with [DataLad](https://datalad.org) - sync and backup personal documents - sync media files to and from my SailfishOS phone - sync, organise and backup a huge media collection - experiment doing inventory management I made a [Thunar plugin](https://gitlab.com/nobodyinperson/thunar-plugins) for git-annex, here's a [📹 screencast](https://fosstodon.org/@nobodyinperson/109836827575976439). + +At the [Tübix 2023](https://www.tuebix.org/) I gave a (German) git annex workshop, of which you can find a recording of the initial talk [📹 here in the fediverse](https://tube.tchncs.de/w/db1ec5ca-94ad-4f49-a507-2124fd699ff1) and [📹 here on Odysee](https://odysee.com/@nobodyinperson:6/T%C3%BCbix2023-Yann-B%C3%BCchau-git-annex:6).
Added a comment: thank you
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_10_5f67c870fc7e43ae75d8e1f8ced975c2._comment b/doc/forum/very_slow_on_exfat_drives/comment_10_5f67c870fc7e43ae75d8e1f8ced975c2._comment new file mode 100644 index 0000000000..df1ebb3e7c --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_10_5f67c870fc7e43ae75d8e1f8ced975c2._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="imlew" + avatar="http://cdn.libravatar.org/avatar/23858c3eed3c3ea9e21522f4c999f1ed" + subject="thank you" + date="2023-11-17T11:43:39Z" + content=""" +Makes sense. +Thanks again, you've helped me a lot. +"""]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_9_5b69d3cb7d3a51ce8a9cbdb608be676c._comment b/doc/forum/very_slow_on_exfat_drives/comment_9_5b69d3cb7d3a51ce8a9cbdb608be676c._comment new file mode 100644 index 0000000000..f8a4c883ae --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_9_5b69d3cb7d3a51ce8a9cbdb608be676c._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 9" + date="2023-11-17T09:49:02Z" + content=""" +Alright, that's how it's supposed to be. Copying files around and keeping track of the locations are two separate things. The location tracking, metadata, etc. are stored in the `git-annex`-branch, which is only synced with `git annex sync` or `git annex assist`. If you're on the manual route (i.e. no preferred content, no `git annex sync --content`, no `git annex assist`), then you are supposed to sync the git repos yourself, e.g. with `git annex sync`. It also makes sense from a performance standpoint. git syncing can be slow, especially on slow hardware. Maybe you don't want to sync the metadata after every copy/move/drop/etc., but you batch it up. And as long as the info where the files are is *somewhere* in a `git-annex`-branch (e.g. your local repo `L`), it's fine as it will eventually be synced around. +"""]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_8_ae7380ff84e4200985c22deb647853e3._comment b/doc/forum/very_slow_on_exfat_drives/comment_8_ae7380ff84e4200985c22deb647853e3._comment new file mode 100644 index 0000000000..5cccc65493 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_8_ae7380ff84e4200985c22deb647853e3._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="imlew" + avatar="http://cdn.libravatar.org/avatar/23858c3eed3c3ea9e21522f4c999f1ed" + subject="comment 8" + date="2023-11-17T09:41:19Z" + content=""" +To try out accessing the file from another location I created a second repo on my laptop. +So I have the \"real\" local repo (`L`), the repo on the disk (`D`) and another local repo for testing (`T`). +In `L` I added and committed `$FILE` and then moved it to `D`. +If I now run `git annex whereis $FILE` in all the repos +`L` tells me it's in `D`, while both `D` and `T` tell me the file isn't known to git. +Only when I run `git annex sync` in `L` does `T` know and `D` still doesn't. + +Not a big issue, just a little surprising, but it's fine to have to remember to run `sync` in the local repo before disconnecting the disk. + +I have started looking into preferred content and groups and I will most likely use them. At least to begin with I want to try doing things manually and then later on move on to the more sophisticated tools. + +"""]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_7_3e66ef8493d2839cb26418788c510463._comment b/doc/forum/very_slow_on_exfat_drives/comment_7_3e66ef8493d2839cb26418788c510463._comment new file mode 100644 index 0000000000..d8f8111ed4 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_7_3e66ef8493d2839cb26418788c510463._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 7" + date="2023-11-17T08:45:56Z" + content=""" +I'm not sure exactly what you mean. If you `git annex move`d a file from a local repo to the bare repo on the HDD, your local repo should know about this immediately. I'm not sure if an immediate `git annex move` *on the HDD* afterwards knows about this. Sounds like it should but maybe it doesn't for performance reasons. That would explain the need for a subsequent `sync`. In general, I would recommend setting up preferred content expressions for each repo, and then always just run `git annex assist` to have it sync everything. Slower than manual moving and copying though, but less worrying. +"""]]
Added a comment: one more question
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_6_a6c5e8dc86ed6b4b5ff45f66f5bad50a._comment b/doc/forum/very_slow_on_exfat_drives/comment_6_a6c5e8dc86ed6b4b5ff45f66f5bad50a._comment new file mode 100644 index 0000000000..4bb4cb5291 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_6_a6c5e8dc86ed6b4b5ff45f66f5bad50a._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="imlew" + avatar="http://cdn.libravatar.org/avatar/23858c3eed3c3ea9e21522f4c999f1ed" + subject="one more question" + date="2023-11-17T08:23:32Z" + content=""" +I'm not sure if this is caused by the disk's repo being bare, but after I have `git annex move`d a file there I still need to run `git annex sync` in the local source repo before I can find or copy the file to a third repo (in this case a second repo on my laptop). + +This is a little confusing because it seems the bare repo on the disk doesn't know it has the files that have been moved to it. + +Is this because it is bare, am I doing something wrong or why doesn't `git annex move` result in the target knowing that it has a given file? +"""]]
Added a comment: not about backends after all
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_5_ec5d339021f2a26942248962ea818a80._comment b/doc/forum/very_slow_on_exfat_drives/comment_5_ec5d339021f2a26942248962ea818a80._comment new file mode 100644 index 0000000000..6c36556513 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_5_ec5d339021f2a26942248962ea818a80._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="imlew" + avatar="http://cdn.libravatar.org/avatar/23858c3eed3c3ea9e21522f4c999f1ed" + subject="not about backends after all" + date="2023-11-17T08:09:22Z" + content=""" +Thanks for your help. +I've created a bare repo on one of the drives that didn't have a repo yet and have been moving the files to my laptop's internal drive to add and commit and then back with `git annex move`, this seems to be working much better. (I already have more than half the files in the repo after one night, previously I had left it running for days and couldn't get that far. (And yes, the drives are very slow.)) + +re the filesystems, I thought so, but unfortunately these have to be compatible with both mac and windoze... +"""]]
remove content of now independent issue https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/
diff --git a/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn index 62e6a87caf..3c140823c6 100644 --- a/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn +++ b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn @@ -63,45 +63,7 @@ dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to so the only way now would be to pipe `find` output into `copy`? -But then trying on a sample file, it also doesn't work - -``` -(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to dandi-dandisets-dropbox --debug sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb [2023-11-16 12:52:04.81241] (Utility.Process) process [2316547] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] -[2023-11-16 12:52:04.813751] (Utility.Process) process [2316547] done ExitSuccess -[2023-11-16 12:52:04.814117] (Utility.Process) process [2316548] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] -[2023-11-16 12:52:04.816003] (Utility.Process) process [2316548] done ExitSuccess -[2023-11-16 12:52:04.818154] (Utility.Process) process [2316549] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..d7eb789ba745f56dc9ee590196c5b392458010fa","--pretty=%H","-n1"] -[2023-11-16 12:52:04.821013] (Utility.Process) process [2316549] done ExitSuccess -[2023-11-16 12:52:04.8243] (Utility.Process) process [2316550] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] -[2023-11-16 12:52:04.834761] (Utility.Process) process [2316551] read: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb"] -[2023-11-16 12:52:04.835779] (Utility.Process) process [2316552] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] -[2023-11-16 12:52:04.836863] (Utility.Process) process [2316553] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] -[2023-11-16 12:52:04.837628] (Utility.Process) process [2316550] done ExitSuccess -[2023-11-16 12:52:04.837998] (Utility.Process) process [2316554] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] -[2023-11-16 12:52:04.839285] (Utility.Process) process [2316554] done ExitSuccess -[2023-11-16 12:52:04.839402] (Utility.Process) process [2316553] done ExitSuccess -[2023-11-16 12:52:04.839465] (Utility.Process) process [2316552] done ExitSuccess -[2023-11-16 12:52:04.839518] (Utility.Process) process [2316551] done ExitSuccess - -(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ ls -ld sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb -lrwxrwxrwx 1 dandi dandi 209 Apr 18 2023 sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb -> ../.git/annex/objects/6V/Xx/SHA256E-s47571970892--25b98e8c5a497600cd516164ac121d906cb3cf10e0332ff871edcf0e587c5da3.nwb/SHA256E-s47571970892--25b98e8c5a497600cd516164ac121d906cb3cf10e0332ff871edcf0e587c5da3.nwb - -(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex whereis sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb -whereis 
sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb (1 copy) - 00000000-0000-0000-0000-000000000001 -- web - - web: https://api.dandiarchive.org/api/assets/37ae9a5f-d6ce-4c18-a752-2d67d5b27845/download/ - web: https://dandiarchive.s3.amazonaws.com/blobs/761/a81/761a81c4-d5d4-47ad-bc15-e609a0a9fb5a?versionId=hQQHvGqBX_kBgPYwhedAG.5Cghw9yvde -ok - -(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git remote -dandi-dandisets-dropbox -dandiapi -github - -``` - -so now I am just confused... +note on edit: filed a dedicated [https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/](https://git-annex.branchable.com/bugs/copy_--from_--to_does_not_copy_if_present_locally/) NB `git annex find` has `-z` for input but not for output...
initial dedicated issue on copy --from --to not copying if present locally
diff --git a/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn new file mode 100644 index 0000000000..8c0eda44c9 --- /dev/null +++ b/doc/bugs/copy_--from_--to_does_not_copy_if_present_locally.mdwn @@ -0,0 +1,92 @@ +### Please describe the problem. + +originally reported while composing [https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/](https://git-annex.branchable.com/bugs/copy_--fast_--from_--to_checks_destination_files/) but it is a separate issue: some files are simply not `annex copy`'ed at all: here it tries 6 out of 8 files and still reports that 2 are not on the target remote: + +``` +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex copy --from web --to dandi-dandisets-dropbox --fast +copy sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 696.194 MBytes (730012683 Bytes) +(from web...) (to dandi-dandisets-dropbox...) ok +copy sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 224.618 MBytes (235528804 Bytes) +(from web...) (to dandi-dandisets-dropbox...) 
ok +copy sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 295.387 MBytes (309735634 Bytes) +(from web...) (to dandi-dandisets-dropbox...) ok +copy sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 860.168 MBytes (901951882 Bytes) +(from web...) (to dandi-dandisets-dropbox...) ok +copy sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 856.342 MBytes (897939760 Bytes) +(from web...) (to dandi-dandisets-dropbox...) ok +copy sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 948.656 MBytes (994737479 Bytes) +(from web...) (to dandi-dandisets-dropbox...) 
ok + + +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | nl + 1 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb + 2 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb +``` + +and it seems to boil down (at least in one case, don't know yet if generalizes to other cases I have) to having those keys present locally: + + +``` +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex find --in web --not --in dandi-dandisets-dropbox | xargs ls -lL +-r--r--r-- 1 dandi dandi 3878847966 Mar 16 2023 sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb +-r--r--r-- 1 dandi dandi 3665589468 Mar 16 2023 sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb +``` + +but somehow it doesn't know that it has them according to `list`: + +``` +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000235$ git annex list +here +|github +||dandiapi +|||web +||||bittorrent +|||||dandi-dandisets-dropbox (untrusted) +|||||| +__XX_x sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish01-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210818T112556_behavior+ophys.nwb +__XX__ sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish02-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210818T173531_behavior+ophys.nwb +__XX_x sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish13-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210830T100716_behavior+ophys.nwb +__XX_x sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish31-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20210920T120959_behavior+ophys.nwb +__XX_x sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave/sub-Fish32-GCaMP-vlgut-FBv-5dpf-RandomWave_ses-20210920T181347_behavior+ophys.nwb +__XX__ 
sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave/sub-Fish41-GCaMP-vlgut-FBv-7dpf-RandomWave_ses-20210929T173736_behavior+ophys.nwb +__XX_x sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave/sub-Fish47-GCaMP-vlgut-FBd-7dpf-RandomWave_ses-20211124T174401_behavior+ophys.nwb +__XX_x sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave/sub-Fish58-GCaMP-vlgut-FBd-5dpf-RandomWave_ses-20220525T092829_behavior+ophys.nwb + +``` + +running without `--from web` starts the transfer: + +``` +git annex copy --fast --to dandi-dandisets-dropbox +``` + +IMHO it should perform copy from the local store into the remote since in effect it would be fulfilling the goal - adding a copy to the destination. +I didn't check `move` command but if it does support similar `--from --to` and has similar defect -- should just compliment with dropping after from the original remote. + +### What version of git-annex are you using? On what operating system? + +10.20230626-g801c4b7 from conda-forge . + +[[!meta author=yoh]] +[[!tag projects/dandi]] + +
initial report on --fast having no effect for copy --from --to
diff --git a/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn new file mode 100644 index 0000000000..62e6a87caf --- /dev/null +++ b/doc/bugs/copy_--fast_--from_--to_checks_destination_files.mdwn @@ -0,0 +1,124 @@ +### Please describe the problem. + +I need to "quickly" ensure that remote has all the files it should have gotten. For that I use invocation like + +``` +time git annex copy --fast --from web --to dandi-dandisets-dropbox +``` + +or + +``` +time git annex copy --auto --from web --to dandi-dandisets-dropbox +``` + +but then in the cases where all files are already there according to + +``` +dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex find --not --in dandi-dandisets-dropbox + +real 0m0.562s +user 0m0.051s +sys 0m0.019s +``` + +the `copy` still goes and checks every chunk of every file + +``` +dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --from web --to dandi-dandisets-dropbox +copy sub-YutaMouse20/sub-YutaMouse20_ses-YutaMouse20-140321_behavior+ecephys.nwb Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +Total objects: 1 Total size: 953.674 MBytes (1000000000 Bytes) +^C + +real 0m3.886s +user 0m0.037s +sys 0m0.032s + +``` + +so to achieve what I need, I thought to explicitly specify the query: + +``` +dandi@drogon:/mnt/backup/dandi/dandisets/000003$ time git annex copy --fast --not --in dandi-dandisets-dropbox --from web --to dandi-dandisets-dropbox + +real 0m0.221s +user 0m0.056s +sys 0m0.018s +``` + +but it doesn't works out correctly whenever there are some files to actually copy: + +``` +dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex find --in web 
--not --in dandi-dandisets-dropbox | nl | tail -n 2 + 40 sub-440889/sub-440889_ses-837360280_obj-raw_behavior+image+ophys.nwb + 41 sub-440889/sub-440889_ses-838633305_obj-raw_behavior+image+ophys.nwb +dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --not --in dandi-dandisets-dropbox +dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --fast --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox +dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to dandi-dandisets-dropbox --in web --not --in dandi-dandisets-dropbox +``` + +so the only way now would be to pipe `find` output into `copy`? + +But then trying on a sample file, it also doesn't work + +``` +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex copy --from web --to dandi-dandisets-dropbox --debug sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb [2023-11-16 12:52:04.81241] (Utility.Process) process [2316547] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"] +[2023-11-16 12:52:04.813751] (Utility.Process) process [2316547] done ExitSuccess +[2023-11-16 12:52:04.814117] (Utility.Process) process [2316548] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"] +[2023-11-16 12:52:04.816003] (Utility.Process) process [2316548] done ExitSuccess +[2023-11-16 12:52:04.818154] (Utility.Process) process [2316549] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..d7eb789ba745f56dc9ee590196c5b392458010fa","--pretty=%H","-n1"] +[2023-11-16 12:52:04.821013] (Utility.Process) process [2316549] done ExitSuccess +[2023-11-16 12:52:04.8243] (Utility.Process) process [2316550] chat: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"] +[2023-11-16 12:52:04.834761] (Utility.Process) process [2316551] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb"] +[2023-11-16 12:52:04.835779] (Utility.Process) process [2316552] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2023-11-16 12:52:04.836863] (Utility.Process) process [2316553] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2023-11-16 12:52:04.837628] (Utility.Process) process [2316550] done ExitSuccess +[2023-11-16 12:52:04.837998] (Utility.Process) process [2316554] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"] +[2023-11-16 12:52:04.839285] (Utility.Process) process [2316554] done ExitSuccess +[2023-11-16 12:52:04.839402] (Utility.Process) process [2316553] done ExitSuccess +[2023-11-16 12:52:04.839465] (Utility.Process) process [2316552] done ExitSuccess +[2023-11-16 12:52:04.839518] (Utility.Process) process [2316551] done ExitSuccess + +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ ls -ld sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb +lrwxrwxrwx 1 dandi dandi 209 Apr 18 2023 sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb -> ../.git/annex/objects/6V/Xx/SHA256E-s47571970892--25b98e8c5a497600cd516164ac121d906cb3cf10e0332ff871edcf0e587c5da3.nwb/SHA256E-s47571970892--25b98e8c5a497600cd516164ac121d906cb3cf10e0332ff871edcf0e587c5da3.nwb + +(git-annex) 
dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git annex whereis sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb +whereis sub-440889/sub-440889_ses-832883243_obj-raw_behavior+image+ophys.nwb (1 copy) + 00000000-0000-0000-0000-000000000001 -- web + + web: https://api.dandiarchive.org/api/assets/37ae9a5f-d6ce-4c18-a752-2d67d5b27845/download/ + web: https://dandiarchive.s3.amazonaws.com/blobs/761/a81/761a81c4-d5d4-47ad-bc15-e609a0a9fb5a?versionId=hQQHvGqBX_kBgPYwhedAG.5Cghw9yvde +ok + +(git-annex) dandi@drogon:/mnt/backup/dandi/dandisets/000037$ git remote +dandi-dandisets-dropbox +dandiapi +github + +``` + +so now I am just confused... + +NB `git annex find` has `-z` for input but not for output... + + +refs to related reports/issues which were said to be addressed for `--fast` mode: + +- [https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/](https://git-annex.branchable.com/forum/copy_--auto_copies_already_synced_files/) +- [https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/](https://git-annex.branchable.com/forum/batch_check_on_remote_when_using_copy/) + +### What version of git-annex are you using? On what operating system? + + +``` +10.20230321-1~ndall+1 +``` + +and then in conda with `10.20230626-g801c4b7` + +[[!meta author=yoh]] +[[!tag projects/dandi]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_4_b1dbbf57a46c79e8e32885c2ee8f45d2._comment b/doc/forum/very_slow_on_exfat_drives/comment_4_b1dbbf57a46c79e8e32885c2ee8f45d2._comment new file mode 100644 index 0000000000..27486cb3de --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_4_b1dbbf57a46c79e8e32885c2ee8f45d2._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 4" + date="2023-11-16T16:34:12Z" + content=""" +Yep, apparently, [exFAT can't do symlinks](https://superuser.com/questions/232257/does-or-will-exfat-support-symlinks). + +In general, it is best to use git-annex on non-shitty filesystems or one will run into problems with their limitations -- something git-annex can't really do much about. + +But you can work around it by using a bare git repository on the HDD as I mentioned above (see [here](https://stackoverflow.com/a/2200662) for how to do that) or to use the HDD just as a directory/rsync special remote from other git-annex repos. In both cases, symlinks are not needed and no expensive slow piping through smudge/clean filters is done. The downside is that you can't work (i.e. add, read, modify files) directly on the HDD, but you only use it as a storage drive. +"""]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_3_b6ee8c6384c73b44ae9ccc0ba32c3135._comment b/doc/forum/very_slow_on_exfat_drives/comment_3_b6ee8c6384c73b44ae9ccc0ba32c3135._comment new file mode 100644 index 0000000000..5ecf5e9309 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_3_b6ee8c6384c73b44ae9ccc0ba32c3135._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 3" + date="2023-11-16T15:55:49Z" + content=""" +Ah, and make sure you're not on an adjusted-unlocked branch. And only use locked files. If exFAT can't do symlinks properly, that might be the problem. Unlocked gigantic files are also a bottleneck. +"""]]
Added a comment
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_2_769b971263b9124ec08079336d03869d._comment b/doc/forum/very_slow_on_exfat_drives/comment_2_769b971263b9124ec08079336d03869d._comment new file mode 100644 index 0000000000..b2be94d837 --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_2_769b971263b9124ec08079336d03869d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nobodyinperson" + avatar="http://cdn.libravatar.org/avatar/736a41cd4988ede057bae805d000f4f5" + subject="comment 2" + date="2023-11-16T15:54:08Z" + content=""" +Normally, using a bare repo helps on slow hardware/filesystems. This means making your repo on the HDD a bare repo, then adding your files somewhere else, where it's fast, and having git annex sync it over to the HDD. Uncool, but it seems like exFAT or your HDD is rather on the shitty side. File size shouldn't matter much, the amount of files is often a problem. +"""]]
Added a comment: I guess this question is about backends
diff --git a/doc/forum/very_slow_on_exfat_drives/comment_1_0f16c72ed7bcba01398b36cb8b3cee08._comment b/doc/forum/very_slow_on_exfat_drives/comment_1_0f16c72ed7bcba01398b36cb8b3cee08._comment new file mode 100644 index 0000000000..8d94f7094a --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives/comment_1_0f16c72ed7bcba01398b36cb8b3cee08._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="imlew" + avatar="http://cdn.libravatar.org/avatar/23858c3eed3c3ea9e21522f4c999f1ed" + subject="I guess this question is about backends" + date="2023-11-16T12:51:59Z" + content=""" +As so often happens, search the docs for hours and don't find anything and right after I asking the question I found a (seemingly) relevant page I had somehow missed before. + +In this case [backends](https://git-annex.branchable.com/backends/). + +Creating a new annex repo on a drive that already had the data but no repo and changing the backend to `WORM` (with `git config --local --add annex.backend WORM`) seems to have made things a little bit faster. Adding the first 3.5GB file in just under 2 minutes and `git status` returning after 44 seconds when first run and thereafter returning instantaneously. However adding all ~3TB in the repo is shaping up to take multiple days anyway. + +Is there anything else I've missed? I don't see what could be taking so long if all that is being checked is mtime, name and size. +"""]]
added bug report for git init on a worktree checked out for a submodule.
diff --git a/doc/bugs/git_init_fails_on_a_worktree_branch.mdwn b/doc/bugs/git_init_fails_on_a_worktree_branch.mdwn new file mode 100644 index 0000000000..511d405c4d --- /dev/null +++ b/doc/bugs/git_init_fails_on_a_worktree_branch.mdwn @@ -0,0 +1,45 @@ +### Please describe the problem. +I tried git annex init on a worktree checkout of a branch, but got an error (see below). +The worktree is for a repo that is itself a submodule of another repo. + +### What steps will reproduce the problem? +I can try later to make an isolated reproducible example, but I think the above scenario describes it. + +### What version of git-annex are you using? On what operating system? + +[[!format sh """ +[bpb23-acc /data/ilya/iwork/marti/tmp/wtree/is-231002-1155-marti-cpp]$ git annex version +git-annex version: 10.20230626-g801c4b7 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV +dependency versions: aws-0.22 bloomfilter-2.0.1.0 cryptonite-0.29 DAV-1.3.4 feed-1.3.2.0 ghc-8.10.7 http-client-0.7.9 persistent-sqlite-2.13.0.3 torrent-10000.1.1 uuid-1.3.15 yesod-1.6.1.2 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +(pbase-025-env) Wed 15 Nov 2023 01:37:24 PM EST +[bpb23-acc 
/data/ilya/iwork/marti/tmp/wtree/is-231002-1155-marti-cpp]$ uname -a +Linux bpb23-acc 5.4.0-136-generic #153-Ubuntu SMP Thu Nov 24 15:56:58 UTC 2022 x86_64 GNU/Linux +(pbase-025-env) Wed 15 Nov 2023 01:37:25 PM EST +[bpb23-acc /data/ilya/iwork/marti/tmp/wtree/is-231002-1155-marti-cpp]$ +"""]] + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log +[bpb23-acc /data/ilya/iwork/marti/tmp/wtree/is-231002-1155-marti-cpp]$ git annex init +init +git-annex: worktrees/is-231002-1155-marti-cpp/info/attributes: openFile: does not exist (No such file or directory) +failed +init: 1 failed + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +Well I keep coming back to it as a solution :) Right now I'm trying to set up proper test cases for a tool that processes sequencing data, and even moderate-size test files break github's size limits. I'm also planning to demo git-annex to others in my group so they can start using it. +
diff --git a/doc/forum/very_slow_on_exfat_drives.mdwn b/doc/forum/very_slow_on_exfat_drives.mdwn new file mode 100644 index 0000000000..718b008a5b --- /dev/null +++ b/doc/forum/very_slow_on_exfat_drives.mdwn @@ -0,0 +1,7 @@ +I want to use git-annex to keep track of and archive of large tarballs (on the order of 10 to 100GB each). +One of the locations are a set of external HDDs that are formatted to exFAT. + +Unfortunately every git command takes hours to execute. +e.g. every time I use `git status` the index is refreshed which takes about 3 hours, committing a single takes similarly long. + +Is there anything I can do to speed things up?
Added a comment
diff --git a/doc/forum/Windows_eol_issues/comment_6_62f748e40324bc988366e19088b33295._comment b/doc/forum/Windows_eol_issues/comment_6_62f748e40324bc988366e19088b33295._comment new file mode 100644 index 0000000000..594da81fa5 --- /dev/null +++ b/doc/forum/Windows_eol_issues/comment_6_62f748e40324bc988366e19088b33295._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="beryllium@5bc3c32eb8156390f96e363e4ba38976567425ec" + nickname="beryllium" + avatar="http://cdn.libravatar.org/avatar/62b67d68e918b381e7e9dd6a96c16137" + subject="comment 6" + date="2023-11-14T22:43:22Z" + content=""" +I'm so sorry for the delay in this response, and the fact that it is not a response. I thought I had turned on email notifications, and missed the responses. I've skimmed those at this point, but will read properly and respond to clarifications. + + +"""]]
Added a comment
diff --git a/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_2_ea362bb99294571f0b0e808e87b7d422._comment b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_2_ea362bb99294571f0b0e808e87b7d422._comment new file mode 100644 index 0000000000..f9e24a613a --- /dev/null +++ b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_2_ea362bb99294571f0b0e808e87b7d422._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="jcjgraf" + avatar="http://cdn.libravatar.org/avatar/9dda752f83ac44906fefbadb35e8a6ac" + subject="comment 2" + date="2023-11-14T21:25:24Z" + content=""" +Oh well, that I could have come up with myself. Thanks for your hint @Lukey! +"""]]
git-annex log --gnuplot
The gnuplot output is pretty good, but could still be improved with:
* more colors (repeating colors is confusing with a lot of repos)
* better positioning of the legend, by making the plot wider and moving
the legend so it no longer covers the graph
Sponsored-by: Kevin Mueller on Patreon
diff --git a/CHANGELOG b/CHANGELOG index b322f8f6be..9761f776a1 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -15,7 +15,7 @@ git-annex (10.20230927) UNRELEASED; urgency=medium * info: Added calculation of combined annex size of all repositories. * log: Added options --sizesof, --sizes and --totalsizes that display how the size of repositories changed over time. - * log: Added options --interval, --bytes, --received + * log: Added options --interval, --bytes, --received, and --gnuplot to tune the output of the above added options. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/Command/Log.hs b/Command/Log.hs index f02d213644..8605d5c3e7 100644 --- a/Command/Log.hs +++ b/Command/Log.hs @@ -52,6 +52,7 @@ data LogOptions = LogOptions , totalSizesOption :: Bool , intervalOption :: Maybe Duration , receivedOption :: Bool + , gnuplotOption :: Bool , rawDateOption :: Bool , bytesOption :: Bool , gourceOption :: Bool @@ -88,6 +89,10 @@ optParser desc = LogOptions ( long "received" <> help "display received data per interval rather than repository sizes" ) + <*> switch + ( long "gnuplot" + <> help "graph the history" + ) <*> switch ( long "raw-date" <> help "display seconds from unix epoch" @@ -299,8 +304,7 @@ sizeHistoryInfo :: (Maybe UUID) -> LogOptions -> Annex () sizeHistoryInfo mu o = do uuidmap <- getuuidmap zone <- liftIO getCurrentTimeZone - liftIO $ displayheader uuidmap - let dispst = (zone, False, epoch, Nothing, mempty) + dispst <- displaystart uuidmap zone (l, cleanup) <- getlog g <- Annex.gitRepo liftIO $ catObjectStream g $ \feeder closer reader -> do @@ -344,7 +348,7 @@ sizeHistoryInfo mu o = do Just (_, Nothing) -> go reader sizemap locmap trustlog uuidmap dispst Nothing -> - displayendsizes dispst + displayend dispst -- Known uuids are stored in this map, and when uuids are stored in the -- state, it's a value from this map. 
This avoids storing multiple @@ -403,18 +407,63 @@ sizeHistoryInfo mu o = do epoch = toEnum 0 - displayheader uuidmap - | sizesOption o = putStrLn $ intercalate "," $ - "date" : map (csvquote . fromUUIDDesc . snd) - (M.elems uuidmap) - | otherwise = return () - - displaysizes (zone, displayedyet, prevt, prevoutput, prevsizemap) trustlog uuidmap sizemap t + displaystart uuidmap zone + | gnuplotOption o = do + file <- (</>) + <$> fromRepo (fromRawFilePath . gitAnnexDir) + <*> pure "gnuplot" + liftIO $ putStrLn $ "Generating gnuplot script in " ++ file + h <- liftIO $ openFile file WriteMode + liftIO $ mapM_ (hPutStrLn h) + [ "set datafile separator ','" + , "set timefmt \"%Y-%m-%dT%H:%M:%S\"" + , "set xdata time" + , "set xtics out" + , "set ytics format '%s%c'" + , "set tics front" + , "set key spacing 1 font \",8\"" + ] + unless (sizesOption o) $ + liftIO $ hPutStrLn h "set key off" + liftIO $ hPutStrLn h "$data << EOD" + liftIO $ hPutStrLn h $ if sizesOption o + then uuidmapheader + else csvheader ["value"] + let endaction = do + mapM_ (hPutStrLn h) + [ "EOD" + , "" + , "plot for [i=2:" ++ show ncols ++ ":1] \\" + , " \"$data\" using 1:(sum [col=i:" ++ show ncols ++ "] column(col)) \\" + , " title columnheader(i) \\" + , if receivedOption o + then " with boxes" + else " with filledcurves x1" + ] + hFlush h + putStrLn $ "Running gnuplot..." + void $ liftIO $ boolSystem "gnuplot" + [Param "-p", File file] + return (dispst h endaction) + | sizesOption o = do + liftIO $ putStrLn uuidmapheader + return (dispst stdout noop) + | otherwise = return (dispst stdout noop) + where + dispst fileh endaction = + (zone, False, epoch, Nothing, mempty, fileh, endaction) + ncols + | sizesOption o = 1 + length (M.elems uuidmap) + | otherwise = 2 + uuidmapheader = csvheader $ + map (fromUUIDDesc . 
snd) (M.elems uuidmap) + + displaysizes (zone, displayedyet, prevt, prevoutput, prevsizemap, h, endaction) trustlog uuidmap sizemap t | t - prevt >= dt && changedoutput = do - displayts zone t output - return (zone, True, t, Just output, sizemap') - | t < prevt = return (zone, displayedyet, t, Just output, prevsizemap) - | otherwise = return (zone, displayedyet, prevt, prevoutput, prevsizemap) + displayts zone t output h + return (zone, True, t, Just output, sizemap', h, endaction) + | t < prevt = return (zone, displayedyet, t, Just output, prevsizemap, h, endaction) + | otherwise = return (zone, displayedyet, prevt, prevoutput, prevsizemap, h, endaction) where output = intercalate "," (map showsize sizes) us = case mu of @@ -447,19 +496,25 @@ sizeHistoryInfo mu o = do Just DeadTrusted -> 0 _ -> v - displayts zone t output = putStrLn $ ts ++ "," ++ output + displayts zone t output h = do + hPutStrLn h (ts ++ "," ++ output) + hFlush h where - ts = if rawDateOption o + ts = if rawDateOption o && not (gnuplotOption o) then rawTimeStamp t else showTimeStamp zone "%Y-%m-%dT%H:%M:%S" t - displayendsizes (zone , _, _, Just output, _) = do + displayend dispst@(_, _, _, _, _, _, endaction) = do + displayendsizes dispst + endaction + + displayendsizes (zone, _, _, Just output, _, h, _) = do now <- getPOSIXTime - displayts zone now output + displayts zone now output h displayendsizes _ = return () showsize n - | bytesOption o = show n + | bytesOption o || gnuplotOption o = show n | otherwise = roughSize storageUnits True n csvquote s @@ -469,3 +524,5 @@ sizeHistoryInfo mu o = do where escquote '"' = "\"\"" escquote c = [c] + + csvheader l = intercalate "," ("date" : map csvquote l) diff --git a/doc/git-annex-log.mdwn b/doc/git-annex-log.mdwn index 9e24fac86f..68f749b241 100644 --- a/doc/git-annex-log.mdwn +++ b/doc/git-annex-log.mdwn @@ -75,6 +75,21 @@ false, information may not have been committed to the branch yet. 
the amount of data received into repositories since the last line was output. +* `--gnuplot` + + Combine this option with `--sizesof` or `--sizes` or `--totalsizes` + to use gnuplot(1) to graph the data. The gnuplot file will be left on + disk for you to reuse. + + For example, to graph the sizes of all repositories: + + git-annex log --sizes --interval=1d --gnuplot + + To graph the amount of new data received into each repository every 30 + days: + + git-annex log --sizes --interval=30d --gnuplot --recieved + * `--bytes` Show sizes in bytes, disabling the default nicer units.
git-annex log --received modifier option
Only counting received and not dropped makes this show the bandwidth of
data coming into the repository, although only in a sense. Since
git-annex branch updates only happen at the end of a command, and we
don't know when a command started, it's only an approximation of the
actual bandwidth. (A previous git-annex branch update may have
happened in a different repository.)
It would be possible to also add a --dropped option, but I don't know
how useful that would be?
Sponsored-by: Nicholas Golder-Manning on Patreon
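The per-interval "received" figure described above can be pictured as a difference between two size snapshots. The following is only a rough sketch under stated assumptions, not the code from this commit: `receivedSince` is a hypothetical helper, and repository names stand in for UUIDs.

```haskell
import qualified Data.Map.Strict as M

-- | Bytes received by each repository since the previously displayed line.
-- Hypothetical helper: repository names are used instead of UUIDs, and a
-- repository missing from the previous snapshot is treated as size 0.
receivedSince :: M.Map String Integer -> M.Map String Integer -> M.Map String Integer
receivedSince prev cur = M.mapWithKey delta cur
  where
    -- Clamp at 0 so a shrinking size never shows up as negative "received" data.
    delta u v = max 0 (v - M.findWithDefault 0 u prev)

main :: IO ()
main = do
  let prev = M.fromList [("laptop", 10), ("server", 40)]
      cur  = M.fromList [("laptop", 25), ("server", 40), ("usbdrive", 5)]
  -- prints [("laptop",15),("server",0),("usbdrive",5)]
  print (M.toList (receivedSince prev cur))
```

Because the snapshots are only taken when the git-annex branch is updated, the numbers this produces are an approximation of bandwidth, as the commit message notes.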
diff --git a/CHANGELOG b/CHANGELOG index df257afc45..b322f8f6be 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -15,6 +15,8 @@ git-annex (10.20230927) UNRELEASED; urgency=medium * info: Added calculation of combined annex size of all repositories. * log: Added options --sizesof, --sizes and --totalsizes that display how the size of repositories changed over time. + * log: Added options --interval, --bytes, --received + to tune the output of the above added options. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/Command/Log.hs b/Command/Log.hs index 3eba85040d..f02d213644 100644 --- a/Command/Log.hs +++ b/Command/Log.hs @@ -50,7 +50,8 @@ data LogOptions = LogOptions , sizesOfOption :: Maybe (DeferredParse UUID) , sizesOption :: Bool , totalSizesOption :: Bool - , whenOption :: Maybe Duration + , intervalOption :: Maybe Duration + , receivedOption :: Bool , rawDateOption :: Bool , bytesOption :: Bool , gourceOption :: Bool @@ -83,6 +84,10 @@ optParser desc = LogOptions ( long "interval" <> metavar paramTime <> help "minimum time between displays of changed size" )) + <*> switch + ( long "received" + <> help "display received data per interval rather than repository sizes" + ) <*> switch ( long "raw-date" <> help "display seconds from unix epoch" @@ -295,7 +300,7 @@ sizeHistoryInfo mu o = do uuidmap <- getuuidmap zone <- liftIO getCurrentTimeZone liftIO $ displayheader uuidmap - let dispst = (zone, False, epoch, Nothing) + let dispst = (zone, False, epoch, Nothing, mempty) (l, cleanup) <- getlog g <- Annex.gitRepo liftIO $ catObjectStream g $ \feeder closer reader -> do @@ -379,7 +384,9 @@ sizeHistoryInfo mu o = do combinedlog = PLog.compactLog (oldlog ++ newlog) combinedlocs = S.fromList (presentlocs combinedlog) addedlocs = S.difference combinedlocs oldlocs - removedlocs = S.difference oldlocs combinedlocs + removedlocs + | receivedOption o = S.empty + | otherwise = S.difference oldlocs combinedlocs addnew k sizemap locmap newlog = ( updatesize 
sizemap (ksz k) locs @@ -402,23 +409,36 @@ sizeHistoryInfo mu o = do (M.elems uuidmap) | otherwise = return () - displaysizes (zone, displayedyet, prevt, prevoutput) trustlog uuidmap sizemap t - | t - prevt >= dt - && (displayedyet || any (/= 0) sizes) - && (prevoutput /= Just output) = do + displaysizes (zone, displayedyet, prevt, prevoutput, prevsizemap) trustlog uuidmap sizemap t + | t - prevt >= dt && changedoutput = do displayts zone t output - return (zone, True, t, Just output) - | t < prevt = return (zone, displayedyet, t, Just output) - | otherwise = return (zone, displayedyet, prevt, Just output) + return (zone, True, t, Just output, sizemap') + | t < prevt = return (zone, displayedyet, t, Just output, prevsizemap) + | otherwise = return (zone, displayedyet, prevt, prevoutput, prevsizemap) where output = intercalate "," (map showsize sizes) us = case mu of Just u -> [u] Nothing -> M.keys uuidmap sizes - | totalSizesOption o = [sum (M.elems sizemap')] - | otherwise = map (\u -> fromMaybe 0 (M.lookup u sizemap')) us - dt = maybe 1 durationToPOSIXTime (whenOption o) + | totalSizesOption o = [sum (M.elems sizedisplaymap)] + | otherwise = map (\u -> fromMaybe 0 (M.lookup u sizedisplaymap)) us + dt = maybe 1 durationToPOSIXTime (intervalOption o) + + changedoutput + | receivedOption o = + any (/= 0) sizes + || prevoutput /= Just output + | otherwise = + (displayedyet || any (/= 0) sizes) + && (prevoutput /= Just output) + + sizedisplaymap + | receivedOption o = + M.unionWith posminus sizemap' prevsizemap + | otherwise = sizemap' + + posminus a b = max 0 (a - b) -- A verison of sizemap where uuids that are currently dead -- have 0 size. 
@@ -433,7 +453,7 @@ sizeHistoryInfo mu o = do then rawTimeStamp t else showTimeStamp zone "%Y-%m-%dT%H:%M:%S" t - displayendsizes (zone , _, _, Just output) = do + displayendsizes (zone , _, _, Just output, _) = do now <- getPOSIXTime displayts zone now output displayendsizes _ = return () diff --git a/doc/git-annex-log.mdwn b/doc/git-annex-log.mdwn index 65636e5c9c..9e24fac86f 100644 --- a/doc/git-annex-log.mdwn +++ b/doc/git-annex-log.mdwn @@ -41,7 +41,7 @@ false, information may not have been committed to the branch yet. * `--sizesof=repository` Displays a history of the total size of the annexed files in a repository - as it changed over time from the creation of the repository to the present. + over time from the creation of the repository to the present. The repository can be "here" for the current repository, or the name of a remote, or a repository description or uuid. @@ -65,19 +65,15 @@ false, information may not have been committed to the branch yet. When using `--sizesof`, `--sizes`, and `--totalsizes`, this controls the minimum interval between displays of the size. - The default is to display each recorded change to the size. + The default is to display each new recorded size. The time is of the form "30d" or "1y". -* `--since=date`, `--after=date`, `--until=date`, `--before=date`, `--max-count=N` +* `--received` - These options are passed through to `git log`, and can be used to limit - how far back to search for location log changes. - - For example: `--since "1 month ago"` - - These options do not have an affect when using `--sizesof`, `--sizes`, - and `--totalsizes`. + Combine this option with `--sizesof` or `--sizes` to display + the amount of data received into repositories since the last + line was output. * `--bytes` @@ -88,6 +84,16 @@ false, information may not have been committed to the branch yet. Rather than the normal display of a date in the local time zone, displays seconds since the unix epoch. 
+* `--since=date`, `--after=date`, `--until=date`, `--before=date`, `--max-count=N` + + These options are passed through to `git log`, and can be used to limit + how far back to search for location log changes. + + For example: `--since "1 month ago"` + + These options do not have an affect when using `--sizesof`, `--sizes`, + and `--totalsizes`. + * `--gource` Generates output suitable for the `gource` visualization program.
rename --when to --interval
More accurately describes its behavior.
diff --git a/Command/Log.hs b/Command/Log.hs index 78190f28fb..3eba85040d 100644 --- a/Command/Log.hs +++ b/Command/Log.hs @@ -80,8 +80,8 @@ optParser desc = LogOptions <> help "display history of total sizes of all repositories" ) <*> optional (option (eitherReader parseDuration) - ( long "when" <> metavar paramTime - <> help "when to display changed size" + ( long "interval" <> metavar paramTime + <> help "minimum time between displays of changed size" )) <*> switch ( long "raw-date" diff --git a/doc/git-annex-log.mdwn b/doc/git-annex-log.mdwn index 5db94468fb..65636e5c9c 100644 --- a/doc/git-annex-log.mdwn +++ b/doc/git-annex-log.mdwn @@ -61,11 +61,11 @@ false, information may not have been committed to the branch yet. This is like `--sizesof`, but it displays the total size of all known repositories. -* `--when=time` +* `--interval=time` When using `--sizesof`, `--sizes`, and `--totalsizes`, this - controls how often to display the size. The default is to - display each change to the size. + controls the minimum interval between displays of the size. + The default is to display each recorded change to the size. The time is of the form "30d" or "1y".
diff --git a/doc/bugs/__96__git_annex_sync___60__REMOTE__62____96___swallows_network_failure.mdwn b/doc/bugs/__96__git_annex_sync___60__REMOTE__62____96___swallows_network_failure.mdwn new file mode 100644 index 0000000000..171b4cffb8 --- /dev/null +++ b/doc/bugs/__96__git_annex_sync___60__REMOTE__62____96___swallows_network_failure.mdwn @@ -0,0 +1,68 @@ +### Please describe the problem. + +`git annex sync` does not report when `git fetch` and `git push` fail due to +network issues. While `git fetch`'s error messages are printed, the exit +status of `git annex sync` will still be `0`. + +Looking at the source code, this seems to be a deliberate design decision. The +synchronization operation are coded in such a way that failing to send/receive +commits to/from a remote is not reported as an error. (See `pullRemote` and +`pushRemote` in `Command/Sync.hs`.) This has the advantage of allowing the user +to simply say `git annex sync` without worrying too much about whether all of +their configured remotes are reachable. + +However, this poses problems when trying to use `git annex sync` in an +automated way. If networking issues (including authentication failures) are +ignored, this can easily convince a script using `git annex sync` that the +operation has succeeded, when in fact it has failed. + +I can think of a few ways of addressing this issue: + +1. Keep going of any of the `pullRemote`/`pushRemote` invocations fail, but + keep track of the fact that something has failed, and exit with status 1 if + this happens. This has the advantage that scripts will be properly alerted + when things go wrong, but isn't strictly backwards compatible. +2. Add an option to `git annex sync` which causes any failures within + `pullRemote`/`pushRemote` to be considered fatal errors. Perhaps the option + could be called `--batch` or `--report-errors`. This would allow for strict + backwards-compatibility. +3. 
Make failures of `pullRemote`/`pushRemote` fatal errors if exactly one + remote is given on the command line. This isn't backwards compatible, and + also has issues because the semantics are not necessarily obvious to those + using `git annex sync`. + +I'm sure there are other solutions, but I think users of `git annex sync` need +a way of detecting network errors and responding to them appropriately. +Especially when you're only trying to synchronize with a single remote, and +failing to reach that remote is by definition a failure of the entire process. + +### What steps will reproduce the problem? + +Attempt to pull from any standard properly-configured `git-annex` remote +`<REMOTE>` with +[[!format sh """ + $ git annex sync <REMOTE> +"""]] +when `<REMOTE>` is not reachable on the network. +Then check the exit status with +[[!format sh """ + $ echo $? +"""]] +You should get `0` as the result. + +### What version of git-annex are you using? On what operating system? + +I'm using the latest version in Debian stable, 8.20210223-2. + +### Please provide any additional information below. + +Not really sure what to put here. Except perhaps to apologize for the overly +design-oriented bug report. Also, be aware that I'm more than happy to put in +the legwork of fixing the issue I've described above. But I'd like to make sure +we agree on a solution before I spend a lot of effort assembling a patch +series. + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Oh yes, I've used `git annex` for many years now to handle synchronizing files +between various machines. Many thanks for your work on this project.
close
diff --git a/doc/todo/info_--size-history.mdwn b/doc/todo/info_--size-history.mdwn
index 024f4c5c9c..15c464ace8 100644
--- a/doc/todo/info_--size-history.mdwn
+++ b/doc/todo/info_--size-history.mdwn
@@ -58,3 +58,5 @@ not worth the bother.)
 [[!tag confirmed]]
 
 --[[Joey]]
+
+> [[done]]
update
diff --git a/doc/thanks/list b/doc/thanks/list
index 9d177683b9..a549f5fc64 100644
--- a/doc/thanks/list
+++ b/doc/thanks/list
@@ -142,3 +142,4 @@ Luke T. Shumaker,
 Nathaniel B,
 Jaime Marquínez Ferrándiz,
 Lukas Waymann,
+kk,
git-annex log --sizes
CSV format so it can be fed into a program to graph it.
Note that dead repositories are not yet handled so their sizes show as
nonzero after they are marked dead.
Sponsored-By: k0ld on Patreon
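As a small illustration of the CSV output, here is a standalone sketch of quoting a header row. The quoting rule follows the `csvquote` helper added in this commit; `csvline` and the sample repository descriptions are made up for the example.

```haskell
import Data.List (intercalate)

-- | Quote a field only when it contains a comma or a double quote,
-- doubling any embedded double quotes.
csvquote :: String -> String
csvquote s
  | ',' `elem` s || '"' `elem` s = '"' : concatMap escquote s ++ "\""
  | otherwise = s
  where
    escquote '"' = "\"\""
    escquote c = [c]

-- | Hypothetical helper: join already-known fields into one CSV line.
csvline :: [String] -> String
csvline = intercalate "," . map csvquote

main :: IO ()
main =
  -- prints: date,my laptop,"backup, offsite","the ""big"" disk"
  putStrLn (csvline ["date", "my laptop", "backup, offsite", "the \"big\" disk"])
```

Quoting only when needed keeps the common case readable while still letting a graphing program parse repository descriptions that contain commas.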
diff --git a/Command/Log.hs b/Command/Log.hs index 21c2b4cdce..0a86c8ad5e 100644 --- a/Command/Log.hs +++ b/Command/Log.hs @@ -45,6 +45,7 @@ data LogOptions = LogOptions { logFiles :: CmdParams , allOption :: Bool , sizesOfOption :: Maybe (DeferredParse UUID) + , sizesOption :: Bool , whenOption :: Maybe Duration , rawDateOption :: Bool , bytesOption :: Bool @@ -66,6 +67,10 @@ optParser desc = LogOptions <> help "display history of sizes of this repository" <> completeRemotes ))) + <*> switch + ( long "sizes" + <> help "display history of sizes of all repositories" + ) <*> optional (option (eitherReader parseDuration) ( long "when" <> metavar paramTime <> help "when to display changed size" @@ -103,7 +108,9 @@ seek :: LogOptions -> CommandSeek seek o = ifM (null <$> Annex.Branch.getUnmergedRefs) ( maybe (pure Nothing) (Just <$$> getParsed) (sizesOfOption o) >>= \case Just u -> sizeHistoryInfo (Just u) o - Nothing -> go + Nothing -> if sizesOption o + then sizeHistoryInfo Nothing o + else go , giveup "This repository is read-only, and there are unmerged git-annex branches, which prevents displaying location log changes. (Set annex.merge-annex-branches to false to ignore the unmerged git-annex branches.)" ) where @@ -277,9 +284,10 @@ rawTimeStamp t = filter (/= 's') (show t) sizeHistoryInfo :: (Maybe UUID) -> LogOptions -> Annex () sizeHistoryInfo mu o = do + uuidmap <- getuuidmap zone <- liftIO getCurrentTimeZone + liftIO $ displayheader uuidmap let dispst = (zone, False, epoch, Nothing) - uuidmap <- getuuidmap (l, cleanup) <- getlog g <- Annex.gitRepo liftIO $ catObjectStream g $ \feeder closer reader -> do @@ -328,8 +336,8 @@ sizeHistoryInfo mu o = do -- state, it's a value from this map. This avoids storing multiple -- copies of the same uuid in memory. getuuidmap = do - us <- M.keys <$> uuidDescMap - return $ M.fromList (zip us us) + (us, ds) <- unzip . 
M.toList <$> uuidDescMap + return $ M.fromList (zip us (zip us ds)) -- Parses a location log file, and replaces the logged uuid -- with one from the uuidmap. @@ -338,7 +346,7 @@ sizeHistoryInfo mu o = do where replaceuuid ll = let !u = toUUID $ PLog.fromLogInfo $ PLog.info ll - !ushared = fromMaybe u $ M.lookup u uuidmap + !ushared = maybe u fst $ M.lookup u uuidmap in ll { PLog.info = PLog.LogInfo (fromUUID ushared) } presentlocs = map (toUUID . PLog.fromLogInfo . PLog.info) @@ -379,6 +387,12 @@ sizeHistoryInfo mu o = do epoch = toEnum 0 + displayheader uuidmap + | sizesOption o = putStrLn $ intercalate "," $ + "date" : map (csvquote . fromUUIDDesc . snd) + (M.elems uuidmap) + | otherwise = return () + displaysizes (zone, displayedyet, prevt, prevoutput) uuidmap sizemap t | t - prevt >= dt && (displayedyet || any (/= 0) sizes) @@ -387,14 +401,14 @@ sizeHistoryInfo mu o = do return (zone, True, t, Just output) | otherwise = return (zone, displayedyet, prevt, Just output) where - output = intercalate ", " (map showsize sizes) + output = intercalate "," (map showsize sizes) us = case mu of Just u -> [u] Nothing -> M.keys uuidmap sizes = map (\u -> fromMaybe 0 (M.lookup u sizemap)) us dt = maybe 1 durationToPOSIXTime (whenOption o) - displayts zone t output = putStrLn $ ts ++ ", " ++ output + displayts zone t output = putStrLn $ ts ++ "," ++ output where ts = if rawDateOption o then rawTimeStamp t @@ -408,3 +422,11 @@ sizeHistoryInfo mu o = do showsize n | bytesOption o = show n | otherwise = roughSize storageUnits True n + + csvquote s + | ',' `elem` s || '"' `elem` s = + '"' : concatMap escquote s ++ ['"'] + | otherwise = s + where + escquote '"' = "\"\"" + escquote c = [c] diff --git a/doc/git-annex-log.mdwn b/doc/git-annex-log.mdwn index 966eb6aa8e..5db94468fb 100644 --- a/doc/git-annex-log.mdwn +++ b/doc/git-annex-log.mdwn @@ -11,7 +11,7 @@ git annex log `[path ...]` This command displays information from the history of the git-annex branch. 
Several things can prevent that information being available to display. -When [[git-annex-dead]] and [[git-annex-forget]] are used, old historical +When [[git-annex-forget]] is used, old historical data gets cleared from the branch. When annex.private or remote.name.annex-private is configured, git-annex does not write information to the branch at all. And when annex.alwayscommit is set to @@ -40,28 +40,27 @@ false, information may not have been committed to the branch yet. * `--sizesof=repository` - Displays a history of the size of the annexed files in a repository as it - changed over time from the creation of the repository to the present. + Displays a history of the total size of the annexed files in a repository + as it changed over time from the creation of the repository to the present. The repository can be "here" for the current repository, or the name of a remote, or a repository description or uuid. - Note that keys that do not have a known size are skipped. + Note that keys that do not have a known size are not included in the + total. * `--sizes` This is like --sizesof, but rather than display the size of a single - repository, it displays the sizes of all known repositories in a table. + repository, it displays the sizes of all known repositories. + + The output is a CSV formatted table. * `--totalsizes` This is like `--sizesof`, but it displays the total size of all known repositories. - Note that dead repositories have their size included in the total - for times before the point they were marked dead. Once marked dead, - their size will no longer be included in the total. - * `--when=time` When using `--sizesof`, `--sizes`, and `--totalsizes`, this
Added a comment
diff --git a/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_1_1db8c68e8d82ed169b4687dfb5da1ba6._comment b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_1_1db8c68e8d82ed169b4687dfb5da1ba6._comment new file mode 100644 index 0000000000..9b1b94397a --- /dev/null +++ b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__/comment_1_1db8c68e8d82ed169b4687dfb5da1ba6._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="Lukey" + avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b" + subject="comment 1" + date="2023-11-11T16:13:28Z" + content=""" +AFAIK, copies=<group>:<x> matches if the group *contains at least* x copies. +"""]]
git-annex log --sizesof
This can take a lot of memory. I decided to violate the usual rule in
git-annex that it operate in constant memory no matter how many annexed
objects. In this case, it would be hard to be fast without using a big
map of the location logs. The main difficulty here is that there can be
many git-annex branches and it needs to display a consistent view at a
point in time, which means merging information from multiple git-annex
branches.
I have not checked if there are any laziness leaks in this code. It
takes 1 gb to run in my big repo, which is around what I estimated
before writing it.
2 options that are documented are not yet implemented.
Small bug: With eg --when=1h, it will display at 12:00 then 1:10 if the
next change after 12:59 is then. Then it waits until after 2:10 to
display the next change. It ought to wait until after 2:00.
Sponsored-by: Brock Spratlen on Patreon
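The small bug described above can be reproduced with a toy model. This is a sketch under assumptions, not the code from the commit: times are minutes after 12:00, `displayDrifting` models the behaviour the message describes, and `displayAligned` is one hypothetical way to get the behaviour it says would be preferable.

```haskell
import Data.List (foldl')

-- | Show a change once at least dt has passed since the last *displayed*
-- change. A late display pushes every later display back as well.
displayDrifting :: Int -> [Int] -> [Int]
displayDrifting dt = reverse . snd . foldl' step (Nothing, [])
  where
    step (Nothing, shown) t = (Just t, t : shown)
    step (Just prev, shown) t
      | t - prev >= dt = (Just t, t : shown)
      | otherwise = (Just prev, shown)

-- | Hypothetical alternative: keep a due time on a fixed grid, so a late
-- display does not delay the following one.
displayAligned :: Int -> [Int] -> [Int]
displayAligned dt = reverse . snd . foldl' step (Nothing, [])
  where
    step (Nothing, shown) t = (Just (t + dt), t : shown)
    step (Just due, shown) t
      | t >= due = (Just (due + dt * ((t - due) `div` dt + 1)), t : shown)
      | otherwise = (Just due, shown)

main :: IO ()
main = do
  -- Changes at 12:00, 1:10, 2:05 and 2:15, with a one hour interval.
  let changes = [0, 70, 125, 135]
  print (displayDrifting 60 changes) -- prints [0,70,135]
  print (displayAligned 60 changes)  -- prints [0,70,125]
```

In the drifting version the 2:05 change is skipped because only 55 minutes have passed since 1:10, which matches the "waits until after 2:10" behaviour in the message.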
diff --git a/CHANGELOG b/CHANGELOG index 852f18688d..df257afc45 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20230927) UNRELEASED; urgency=medium avoid ending lines with CR for portability. Existing hook scripts that do have CR line endings will not be changed. * info: Added calculation of combined annex size of all repositories. + * log: Added options --sizesof, --sizes and --totalsizes that + display how the size of repositories changed over time. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/Command/Info.hs b/Command/Info.hs index d2dc50c776..3027572945 100644 --- a/Command/Info.hs +++ b/Command/Info.hs @@ -809,5 +809,3 @@ matchOnKey matcher k = matcher $ MatchingInfo $ ProvidedInfo , providedMimeEncoding = Nothing , providedLinkType = Nothing } - - diff --git a/Command/Log.hs b/Command/Log.hs index c737f1066c..7124019e31 100644 --- a/Command/Log.hs +++ b/Command/Log.hs @@ -5,7 +5,7 @@ - Licensed under the GNU AGPL version 3 or higher. 
-} -{-# LANGUAGE OverloadedStrings #-} +{-# LANGUAGE OverloadedStrings, BangPatterns #-} module Command.Log where @@ -16,15 +16,21 @@ import Data.Time.Clock.POSIX import Data.Time import qualified Data.ByteString.Char8 as B8 import qualified System.FilePath.ByteString as P +import Control.Concurrent.Async import Command import Logs import Logs.Location +import Logs.UUID +import qualified Logs.Presence.Pure as PLog import qualified Annex import qualified Annex.Branch import qualified Remote import qualified Git import Git.Log +import Git.CatFile +import Utility.DataUnits +import Utility.HumanTime data LogChange = Added | Removed @@ -38,7 +44,10 @@ cmd = withAnnexOptions [jsonOptions, annexedMatchingOptions] $ data LogOptions = LogOptions { logFiles :: CmdParams , allOption :: Bool + , sizesOfOption :: Maybe (DeferredParse UUID) + , whenOption :: Maybe Duration , rawDateOption :: Bool + , bytesOption :: Bool , gourceOption :: Bool , passthruOptions :: [CommandParam] } @@ -51,10 +60,24 @@ optParser desc = LogOptions <> short 'A' <> help "display location log changes to all files" ) + <*> optional ((parseUUIDOption <$> strOption + ( long "sizesof" + <> metavar (paramRemote `paramOr` paramDesc `paramOr` paramUUID) + <> help "display history of sizes of this repository" + <> completeRemotes + ))) + <*> optional (option (eitherReader parseDuration) + ( long "when" <> metavar paramTime + <> help "when to display changed size" + )) <*> switch ( long "raw-date" <> help "display seconds from unix epoch" ) + <*> switch + ( long "bytes" + <> help "display sizes in bytes" + ) <*> switch ( long "gource" <> help "format output for gource" @@ -78,7 +101,14 @@ optParser desc = LogOptions seek :: LogOptions -> CommandSeek seek o = ifM (null <$> Annex.Branch.getUnmergedRefs) - ( do + ( maybe (pure Nothing) (Just <$$> getParsed) (sizesOfOption o) >>= \case + Just u -> sizeHistoryInfo (Just u) o + Nothing -> go + , giveup "This repository is read-only, and there are unmerged git-annex 
branches, which prevents displaying location log changes. (Set annex.merge-annex-branches to false to ignore the unmerged git-annex branches.)" + ) + where + ww = WarnUnmatchLsFiles "log" + go = do m <- Remote.uuidDescriptions zone <- liftIO getCurrentTimeZone outputter <- mkOutputter m zone o <$> jsonOutputEnabled @@ -94,10 +124,6 @@ seek o = ifM (null <$> Annex.Branch.getUnmergedRefs) =<< workTreeItems ww fs ([], True) -> commandAction (startAll o outputter) (_, True) -> giveup "Cannot specify both files and --all" - , giveup "This repository is read-only, and there are unmerged git-annex branches, which prevents displaying location log changes. (Set annex.merge-annex-branches to false to ignore the unmerged git-annex branches.)" - ) - where - ww = WarnUnmatchLsFiles "log" start :: LogOptions -> (ActionItem -> SeekInput -> Outputter) -> SeekInput -> RawFilePath -> Key -> CommandStart start o outputter si file key = do @@ -158,7 +184,7 @@ mkOutputter m zone o jsonenabled ai si | jsonenabled = jsonOutput m ai si | rawDateOption o = normalOutput lookupdescription ai rawTimeStamp | gourceOption o = gourceOutput lookupdescription ai - | otherwise = normalOutput lookupdescription ai (showTimeStamp zone) + | otherwise = normalOutput lookupdescription ai (showTimeStamp zone rfc822DateFormat) where lookupdescription u = maybe (fromUUID u) (fromUUIDDesc) (M.lookup u m) @@ -242,9 +268,142 @@ getGitLogAnnex fs os = do let fileselector = locationLogFileKey config . toRawFilePath inRepo $ getGitLog Annex.Branch.fullname fs os fileselector -showTimeStamp :: TimeZone -> POSIXTime -> String -showTimeStamp zone = formatTime defaultTimeLocale rfc822DateFormat +showTimeStamp :: TimeZone -> String -> POSIXTime -> String +showTimeStamp zone format = formatTime defaultTimeLocale format . utcToZonedTime zone . 
posixSecondsToUTCTime rawTimeStamp :: POSIXTime -> String rawTimeStamp t = filter (/= 's') (show t) + +sizeHistoryInfo :: (Maybe UUID) -> LogOptions -> Annex () +sizeHistoryInfo mu o = do + zone <- liftIO getCurrentTimeZone + let dispst = (zone, False, epoch, Nothing) + uuidmap <- getuuidmap + (l, cleanup) <- getlog + g <- Annex.gitRepo + liftIO $ catObjectStream g $ \feeder closer reader -> do + tid <- async $ do + forM_ l $ \c -> + feeder ((changed c, changetime c), newref c) + closer + go reader M.empty M.empty M.empty uuidmap dispst + wait tid + void $ liftIO cleanup + where + -- Go through the log of the git-annex branch in reverse, + -- and in date order, and pick out changes to location log files + -- and to the trust log. + getlog = do + config <- Annex.getGitConfig + let fileselector = \f -> let f' = toRawFilePath f in + case locationLogFileKey config f' of + Just k -> Just (Right k) + Nothing + | f' == trustLog -> Just (Left ()) + | otherwise -> Nothing + inRepo $ getGitLog Annex.Branch.fullname [] + [ Param "--date-order" + , Param "--reverse" + ] + fileselector + + go reader sizemap locmap deadmap uuidmap dispst = reader >>= \case + Just ((Right k, t), Just logcontent) -> do + let !newlog = parselocationlog logcontent uuidmap + let !(sizemap', locmap') = case M.lookup k locmap of + Nothing -> addnew k sizemap locmap newlog + Just v -> update k sizemap locmap v newlog + dispst' <- displaysizes dispst uuidmap sizemap' t + go reader sizemap' locmap' deadmap uuidmap dispst' + Just ((Left (), t), Just logcontent) -> do + -- XXX todo update deadmap + go reader sizemap locmap deadmap uuidmap dispst + Just (_, Nothing) -> + go reader sizemap locmap deadmap uuidmap dispst + Nothing -> + displayendsizes dispst + + -- Known uuids are stored in this map, and when uuids are stored in the + -- state, it's a value from this map. This avoids storing multiple + -- copies of the same uuid in memory. 
+ getuuidmap = do + us <- M.keys <$> uuidDescMap + return $ M.fromList (zip us us) + + -- Parses a location log file, and replaces the logged uuid (Diff truncated)
todo
diff --git a/doc/todo/info_--size-history.mdwn b/doc/todo/info_--size-history.mdwn new file mode 100644 index 0000000000..024f4c5c9c --- /dev/null +++ b/doc/todo/info_--size-history.mdwn @@ -0,0 +1,60 @@ +Support eg `git-annex info --size-history=30d` which would display +the combined size of all repositories every 30 days throughout the history +of the git-annex branch. This would allow graphing, analysis, etc of repo +growth patterns. + +Also, `git-annex info somerepo --size-history=30d` would display the size +of only the selected repository. + +Maybe also a way to get the size of each repository plus total size in a +single line of output? + +---- + +Implementation of this is fairly subtle. My abandoned first try just went +through `git log` and updated counters as the location logs were updated. +That resulted in bad numbers. (The size went negative eventually in fact!) +The problem is that the git-annex branch is often updated both locally and +on a remote, eg when copying a file to a remote. And that results in 2 +changes to the git-annex branch that both record the same data. So it gets +counted twice by my naive implementation. + +I think it is not possible for an accumulation based approach to work in +constant memory and fast. In the worst case, there is a fork of the branch +that diverges hugely over a long period of time. So that divergence either +needs to be buffered in memory, or recalculated repeatedly. + +What I think needs to be done is use `git log --reverse --date-order git-annex`. +Feed the changed annex log file refs into catObjectStream to get the log +files. (Or use --patch and parse the diff to extract log file lines, +might be faster?) Parse the log files, and update a simple data structure: + + Map Key [UUIDPointer] + +Where UUIDPointer is a number that points to the UUID in a Map. This +avoids storing copies of the uuids in the map. 
+ +This is essentially union merging all forks of the git-annex branch at +each commit, but far faster and in memory. Since union merging a git-annex +branch can be done at any point and always results in a consistent view of +the data, this will be consistent as well. + +And when updating the data structure, then it can update a counter when +something changed, and avoid updating it when a redundant log was logged. + +This approach will use an amount of memory that scales with +the number of keys and numbers of copies. I mocked it up using my big +repository. Storing every key in it in such a map, with 64 UUIDPointers +in the list (many more than the usual number of copies) took 2 gb of +memory. Which is a lot but also most users have that much if necessary. +With a more usual 5 copies, memory use was only 0.5 gb. So I think this is +an acceptable exception to git-annex's desire to use a constant amount of +memory. + +(I considered a bloom filter, but a false positive would wreck the +statistics. An in-memory sqlite db might be more efficient, but probably +not worth the bother.) + +[[!tag confirmed]] + +--[[Joey]]
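The counting scheme in this todo item can be sketched as follows. Everything here is an assumption made for illustration: `Tracker` and `applyLog` are hypothetical names, a `Set` replaces the list in `Map Key [UUIDPointer]`, and each log is treated as a full snapshot of where a key is present rather than being merged with the previous log.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

type Key = String
type UUIDPointer = Int -- small number standing in for a full UUID

data Tracker = Tracker
  { keyLocs :: M.Map Key (S.Set UUIDPointer)  -- where each key is present
  , repoSizes :: M.Map UUIDPointer Integer    -- running size per repository
  } deriving (Eq, Show)

emptyTracker :: Tracker
emptyTracker = Tracker M.empty M.empty

-- | Apply one location log for a key of the given size. Only locations that
-- actually changed touch the counters, so a redundant log is a no-op.
applyLog :: Integer -> Key -> S.Set UUIDPointer -> Tracker -> Tracker
applyLog ksize k new tr = Tracker
  { keyLocs = M.insert k new (keyLocs tr)
  , repoSizes = S.foldr shrink (S.foldr grow (repoSizes tr) added) removed
  }
  where
    old = M.findWithDefault S.empty k (keyLocs tr)
    added = S.difference new old
    removed = S.difference old new
    grow u = M.insertWith (+) u ksize
    shrink u = M.adjust (subtract ksize) u

main :: IO ()
main = do
  let once = applyLog 100 "key1" (S.fromList [0, 1]) emptyTracker
      twice = applyLog 100 "key1" (S.fromList [0, 1]) once
      dropped = applyLog 100 "key1" (S.fromList [1]) twice
  print (M.toList (repoSizes once))    -- prints [(0,100),(1,100)]
  print (once == twice)                -- prints True
  print (M.toList (repoSizes dropped)) -- prints [(0,0),(1,100)]
```

The second `applyLog` call models the same change being recorded both locally and on a remote. Since nothing is added or removed, the sizes stay put, which is the double counting the naive approach ran into.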
info: Added calculation of combined annex size of all repositories
Factored out overLocationLogs from CmdLine.Seek, which can calculate this
pretty fast even in a large repo. In my big repo, the time to run git-annex
info went up from 1.33s to 8.5s.
Note that the "backend usage" stats are for annexed files in the working
tree only, not all annexed files. This new data source would let that be
changed, but that would be a confusing behavior change. And I cannot
retitle it either, out of fear something uses the current title (eg parsing
the json).
Also note that, while time says "402108maxresident" in my big repo now,
up from "54092maxresident", top shows the RES constant at 64mb, and it
was 48mb before. So I don't think there is a memory leak. I tried using
deepseq to force full evaluation of addKeyCopies and memory use didn't
change, which also says no memory leak. And indeed, not even calling
addKeyCopies resulted in the same memory use. Probably the increased memory
usage is buffering the stream of data from git in overLocationLogs.
Sponsored-by: Brett Eisenberg on Patreon
diff --git a/CHANGELOG b/CHANGELOG index 4dec40cb78..852f18688d 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -12,6 +12,7 @@ git-annex (10.20230927) UNRELEASED; urgency=medium * Windows: When git-annex init is installing hook scripts, it will avoid ending lines with CR for portability. Existing hook scripts that do have CR line endings will not be changed. + * info: Added calculation of combined annex size of all repositories. -- Joey Hess <id@joeyh.name> Tue, 10 Oct 2023 13:17:31 -0400 diff --git a/CmdLine/Seek.hs b/CmdLine/Seek.hs index 18be0a44f7..30aa5e4adb 100644 --- a/CmdLine/Seek.hs +++ b/CmdLine/Seek.hs @@ -276,24 +276,14 @@ withKeyOptions' ko auto mkkeyaction fallbackaction worktreeitems = do -- those. This significantly speeds up typical operations -- that need to look at the location log for each key. runallkeys = do - checktimelimit <- mkCheckTimeLimit keyaction <- mkkeyaction - config <- Annex.getGitConfig - - let getk = locationLogFileKey config + checktimelimit <- mkCheckTimeLimit let discard reader = reader >>= \case Nothing -> noop Just _ -> discard reader - let go reader = reader >>= \case - Just (k, f, content) -> checktimelimit (discard reader) $ do - maybe noop (Annex.Branch.precache f) content - unlessM (checkDead k) $ - keyaction Nothing (SeekInput [], k, mkActionItem k) - go reader - Nothing -> return () - Annex.Branch.overBranchFileContents getk go >>= \case - Just r -> return r - Nothing -> giveup "This repository is read-only, and there are unmerged git-annex branches, which prevents operating on all keys. 
(Set annex.merge-annex-branches to false to ignore the unmerged git-annex branches.)" + overLocationLogs' () + (\reader cont -> checktimelimit (discard reader) cont) + (\k _ () -> keyaction Nothing (SeekInput [], k, mkActionItem k)) runkeyaction getks = do keyaction <- mkkeyaction diff --git a/Command/Info.hs b/Command/Info.hs index f487f8db19..d2dc50c776 100644 --- a/Command/Info.hs +++ b/Command/Info.hs @@ -89,12 +89,13 @@ data StatInfo = StatInfo { presentData :: Maybe KeyInfo , referencedData :: Maybe KeyInfo , repoData :: M.Map UUID KeyInfo + , allRepoData :: Maybe KeyInfo , numCopiesStats :: Maybe NumCopiesStats , infoOptions :: InfoOptions } emptyStatInfo :: InfoOptions -> StatInfo -emptyStatInfo = StatInfo Nothing Nothing M.empty Nothing +emptyStatInfo = StatInfo Nothing Nothing M.empty Nothing Nothing -- a state monad for running Stats in type StatState = StateT StatInfo Annex @@ -281,8 +282,9 @@ global_slow_stats = , local_annex_size , known_annex_files True , known_annex_size True - , bloom_info + , total_annex_size , backend_usage + , bloom_info ] tree_fast_stats :: Bool -> [FilePath -> Stat] @@ -435,6 +437,11 @@ known_annex_size :: Bool -> Stat known_annex_size isworktree = simpleStat ("size of annexed files in " ++ treeDesc isworktree) $ showSizeKeys =<< cachedReferencedData + +total_annex_size :: Stat +total_annex_size = + simpleStat "combined annex size of all repositories" $ + showSizeKeys =<< cachedAllRepoData treeDesc :: Bool -> String treeDesc True = "working tree" @@ -612,6 +619,23 @@ cachedReferencedData = do put s { referencedData = Just v } return v +cachedAllRepoData :: StatState KeyInfo +cachedAllRepoData = do + s <- get + case allRepoData s of + Just v -> return v + Nothing -> do + matcher <- lift getKeyOnlyMatcher + !v <- lift $ overLocationLogs emptyKeyInfo $ \k locs d -> do + numcopies <- genericLength . 
snd + <$> trustPartition DeadTrusted locs + ifM (matchOnKey matcher k) + ( return (addKeyCopies numcopies k d) + , return d + ) + put s { allRepoData = Just v } + return v + -- currently only available for directory info cachedNumCopiesStats :: StatState (Maybe NumCopiesStats) cachedNumCopiesStats = numCopiesStats <$> get @@ -627,7 +651,13 @@ getDirStatInfo o dir = do (presentdata, referenceddata, numcopiesstats, repodata) <- Command.Unused.withKeysFilesReferencedIn dir initial (update matcher fast) - return $ StatInfo (Just presentdata) (Just referenceddata) repodata (Just numcopiesstats) o + return $ StatInfo + (Just presentdata) + (Just referenceddata) + repodata + Nothing + (Just numcopiesstats) + o where initial = (emptyKeyInfo, emptyKeyInfo, emptyNumCopiesStats, M.empty) update matcher fast key file vs@(presentdata, referenceddata, numcopiesstats, repodata) = @@ -663,7 +693,7 @@ getTreeStatInfo o r = do (presentdata, referenceddata, repodata) <- go fast matcher ls initial ifM (liftIO cleanup) ( return $ Just $ - StatInfo (Just presentdata) (Just referenceddata) repodata Nothing o + StatInfo (Just presentdata) (Just referenceddata) repodata Nothing Nothing o , return Nothing ) where @@ -695,16 +725,19 @@ emptyNumCopiesStats :: NumCopiesStats emptyNumCopiesStats = NumCopiesStats M.empty addKey :: Key -> KeyInfo -> KeyInfo -addKey key (KeyInfo count size unknownsize backends) = +addKey = addKeyCopies 1 + +addKeyCopies :: Integer -> Key -> KeyInfo -> KeyInfo +addKeyCopies numcopies key (KeyInfo count size unknownsize backends) = KeyInfo count' size' unknownsize' backends' where {- All calculations strict to avoid thunks when repeatedly - applied to many keys. 
-} !count' = count + 1 !backends' = M.insertWith (+) (fromKey keyVariety key) 1 backends - !size' = maybe size (+ size) ks + !size' = maybe size (\sz -> sz * numcopies + size) ks !unknownsize' = maybe (unknownsize + 1) (const unknownsize) ks - ks = fromKey keySize key + !ks = fromKey keySize key updateRepoData :: Key -> [UUID] -> M.Map UUID KeyInfo -> M.Map UUID KeyInfo updateRepoData key locs m = m' @@ -776,3 +809,5 @@ matchOnKey matcher k = matcher $ MatchingInfo $ ProvidedInfo , providedMimeEncoding = Nothing , providedLinkType = Nothing } + + diff --git a/Logs/Location.hs b/Logs/Location.hs index 860d0f456b..e9ae0213a3 100644 --- a/Logs/Location.hs +++ b/Logs/Location.hs @@ -8,11 +8,13 @@ - Repositories record their UUID and the date when they --get or --drop - a value. - - - Copyright 2010-2021 Joey Hess <id@joeyh.name> + - Copyright 2010-2023 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} +{-# LANGUAGE BangPatterns #-} + module Logs.Location ( LogStatus(..), logStatus, @@ -29,6 +31,8 @@ module Logs.Location ( loggedKeys, loggedKeysFor, loggedKeysFor', + overLocationLogs, + overLocationLogs', ) where import Annex.Common @@ -42,6 +46,7 @@ import Git.Types (RefDate, Ref) import qualified Annex import Data.Time.Clock +import qualified Data.ByteString.Lazy as L {- Log a change in the presence of a key's value in current repository. -} logStatus :: Key -> LogStatus -> Annex () (Diff truncated)
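The size arithmetic in the `addKeyCopies` hunk above can be tried in isolation with a cut-down sketch. The `KeyInfo` here is a simplified, hypothetical stand-in (no backend map), a key is reduced to its optional size, and `combinedInfo` is an invented helper; only the `count`, `size` and `unknownsize` updates mirror the diff.

```haskell
import Data.List (foldl')

-- Simplified, hypothetical stand-in for git-annex's KeyInfo: number of
-- keys, total known size, and number of keys whose size is unknown.
-- Strict fields avoid building thunks over many keys.
data KeyInfo = KeyInfo
  { keyCount :: !Integer
  , knownSize :: !Integer
  , unknownSizeKeys :: !Integer
  } deriving (Show, Eq)

emptyKeyInfo :: KeyInfo
emptyKeyInfo = KeyInfo 0 0 0

-- A key is represented only by its optional size here.
-- Each key is counted once, but its size is multiplied by the number
-- of copies, as in the diff above.
addKeyCopies :: Integer -> Maybe Integer -> KeyInfo -> KeyInfo
addKeyCopies numcopies ks (KeyInfo count size unknownsize) =
  KeyInfo count' size' unknownsize'
  where
    count' = count + 1
    size' = maybe size (\sz -> sz * numcopies + size) ks
    unknownsize' = maybe (unknownsize + 1) (const unknownsize) ks

-- Fold a list of (number of copies, optional key size) pairs.
combinedInfo :: [(Integer, Maybe Integer)] -> KeyInfo
combinedInfo = foldl' (\d (n, ks) -> addKeyCopies n ks d) emptyKeyInfo
```

With `numcopies` fixed at 1 this reduces to the old `addKey` behaviour, which is why the diff can define `addKey = addKeyCopies 1`.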
diff --git a/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__.mdwn b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__.mdwn new file mode 100644 index 0000000000..0e4332efc5 --- /dev/null +++ b/doc/forum/Required_Content_Always_True___40____61____62___cannot_drop__41__.mdwn @@ -0,0 +1,82 @@ +Hey there, + +I am having the issue that git-annex reports files to be required even though it should not be. + +The setup is as follows (The groups that the repo is part of are specified in brackets): + +- Main repo (`origin`, `server`) on a server + - Regularly syncs all its data to two backup repos (`server`, `backup`, `backup_server`) + - These two backup repos are trusted, as they are not reachable by clients +- Client repo (`client`) on PC + - Syncs with other clients via the server + - Manually syncs with a backup repo (`backup` `offline`, `backup_offline`) on a disk + - Each client has two remotes, the server (named `origin`) and the backup drive (named `usb`) + +I want that the required content is everything present, that is not yet (all of them): + +1. in the server repo +2. in both server backup repos +3. in a backup repo on an external drive + +I specify that using `git annex required . "present and (copies=origin:0 or (not copies=backup_server:2) or copies=backup_offline:0)"`. + +Now I want to drop file `A` + +File `A` is present on 5 repos. However, I cannot delete it, as it is "apparently" not on the server (i.e. `copies=origin:0` is true).
+ +``` +$ git annex whereis A +whereis A (5 copies) + 0cdca96a-e44d-4168-a3a0-8ab846451e74 -- Server_Backup2 + 44039708-f0d9-4ed0-833b-4d146d419b5d -- jeanclaude [here] + 4ac4c649-b37c-403e-94e0-9497a7bc2a91 -- Server_Backup1 + c0d1b661-1e19-4956-b290-ff62abc6d61a -- jc_backup_wd6tb [usb] + da3e14a5-188f-4a65-bf93-8fce9e409d09 -- [origin] +ok + +$ git annex drop A --explain +drop A [ A matches required content: present[TRUE] and ( copies=origin:0[TRUE] ) ] + + That file is required content. It cannot be dropped! + + (Use --force to override this check, or adjust required content configuration.) +failed +drop: 1 failed +``` + +What is it saying that it is not no `origin`? Is there something wrong with my setup? Does the drop command try to *lock* file `A` on all remotes (except on the server backups, as they are trusted)? + + +Me doing some "debugging": + +``` +$ git annex required . "present and (copies=origin:1 or (not copies=backup_server:2) or copies=backup_offline:0)" +required . ok +(recording state in git...) + +$ git annex drop A --explain +drop A [ A matches required content: present[TRUE] and ( copies=origin:1[TRUE] ) ] + + That file is required content. It cannot be dropped! + + (Use --force to override this check, or adjust required content configuration.) +failed +drop: 1 failed + +$ git annex required . "present and (copies=origin:10 or (not copies=backup_server:2) or copies=backup_offline:0)" +required . ok +(recording state in git...) + +$ git annex drop 2022/220513-PolybandInConcert/220513_220256.cr3 --explain +drop A [ A matches required content: present[TRUE] and ( copies=origin:10[FALSE] or not copies=backup_server:2[TRUE] or copies=backup_offline:0[TRUE] ) ] + + That file is required content. It cannot be dropped! + + (Use --force to override this check, or adjust required content configuration.) +failed +drop: 1 failed +``` + +How can the number of copies on `origin` be `0` and `1` at the same time? Or do I misunderstand something completely? 
It also reports that the file is in none of the backups (neither local nor remote). + +Thanks a lot for your time and effort!
Added a comment: What about temporary annex.private declaration?
diff --git a/doc/tips/cloning_a_repository_privately/comment_1_f5490d034074ca80a712bdd41c307139._comment b/doc/tips/cloning_a_repository_privately/comment_1_f5490d034074ca80a712bdd41c307139._comment new file mode 100644 index 0000000000..fb46c94c95 --- /dev/null +++ b/doc/tips/cloning_a_repository_privately/comment_1_f5490d034074ca80a712bdd41c307139._comment @@ -0,0 +1,39 @@ +[[!comment format=mdwn + username="mih" + avatar="http://cdn.libravatar.org/avatar/f881df265a423e4f24eff27c623148fd" + subject="What about temporary annex.private declaration?" + date="2023-11-07T15:49:47Z" + content=""" +The instructions indicate that `annex.private` should be set in the local repository configuration. + +However, the following approach is also a possibility: + +``` +❯ mkdir priv +❯ cd priv +❯ git init +Initialized empty Git repository in /tmp/priv/.git/ + +❯ git -c annex.private=1 annex init +init ok + +❯ ls .git/annex/journal-private +uuid.log + +❯ cat .git/config +[core] + repositoryformatversion = 0 + filemode = true + bare = false + logallrefupdates = true +[annex] + uuid = 955373ac-6044-493e-a696-1a706437b542 + version = 10 +[filter \"annex\"] + smudge = git-annex smudge -- %f + clean = git-annex smudge --clean -- %f + process = git-annex filter-process +``` + +It seems this repository was in private mode when it was initialized (expected). What is the implication of the switch not being permanent in the config? And by extension: what are the implications of removing the switch later in the lifetime of a repository clone? +"""]]
Added a comment
diff --git a/doc/forum/Using_git_annex_as_a_library/comment_1_42cd26878e4e5c2c238c1227e3c372d9._comment b/doc/forum/Using_git_annex_as_a_library/comment_1_42cd26878e4e5c2c238c1227e3c372d9._comment new file mode 100644 index 0000000000..d4ce3acd23 --- /dev/null +++ b/doc/forum/Using_git_annex_as_a_library/comment_1_42cd26878e4e5c2c238c1227e3c372d9._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="oadams" + avatar="http://cdn.libravatar.org/avatar/ac166a5f89f10c4108e5150015e6751b" + subject="comment 1" + date="2023-11-07T08:30:21Z" + content=""" +I should add that I have just seen this discussion https://git-annex.branchable.com/forum/Using_git-annex_as_a_library/. Perhaps I should have posted there, but either way I'm wondering if things are different these days? +"""]]
diff --git a/doc/forum/Using_git_annex_as_a_library.mdwn b/doc/forum/Using_git_annex_as_a_library.mdwn new file mode 100644 index 0000000000..c76f98e9a6 --- /dev/null +++ b/doc/forum/Using_git_annex_as_a_library.mdwn @@ -0,0 +1,6 @@ +I've started writing a small tool in Haskell to manage data processing pipelines. I'm interested in using git-annex as a library as part of this (since the idea is that the raw data is tracked in git-annex and git). I'm wondering what the best way to go about doing it is. + +If I set my stack.yaml `extra-deps` to point to the git repository and run `stack build` then stack visibly goes and downloads the package. But from there I can't actually import any modules in my own Haskell program. When I use `import Git` or `import Annex.HashObject` then I get a compile error along the lines of `Could not find module ‘Annex.HashObject’`. + +Do you have any pointers for how I might best use git-annex as a library? Thanks for your help, sorry if it's so basic as I'm new to the Haskell tooling ecosystem. +
expand description
diff --git a/doc/git-annex-info.mdwn b/doc/git-annex-info.mdwn index aaba7ef182..bdbcf1415e 100644 --- a/doc/git-annex-info.mdwn +++ b/doc/git-annex-info.mdwn @@ -8,18 +8,35 @@ git annex info `[directory|file|treeish|remote|description|uuid ...]` # DESCRIPTION -Displays statistics and other information for the specified item, -which can be a directory, or a file, or a treeish, or a remote, -or the description or uuid of a repository. - -When no item is specified, displays statistics and information -for the local repository and all annexed content. +Displays statistics and other information for the specified item. + +When no item is specified, displays overall information. This includes a +list of all known repositories, how much annexed data is present in the +local repository, and the total size of all annexed data in the working +tree. + +When a directory is specified, displays information +about the annexed files in that directory (and subdirectories). +This includes how much annexed data is present in the local repository, +the total size of all annexed data in the directory, how many files +have the specified numcopies or more (+1, +2 etc) or less (-1, -2 etc), +and information about how much of the annexed data is stored in known +repositories. + +When a treeish is specified, displays similar information +as when a directory is specified, but about the annexed files in that +treeish. + +When a remote, or description of a repository, or uuid is specified, +displays information about the specified repository, including the total +amount of annexed data stored in it, and a variety of configuration +information. # OPTIONS * `--fast` - Only show the data that can be gathered quickly. + Only show the information that can be gathered quickly. * `--json`
diff --git a/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn new file mode 100644 index 0000000000..3081d7f834 --- /dev/null +++ b/doc/bugs/fsck_ignores_--json-error-messages_when_--quiet.mdwn @@ -0,0 +1,76 @@ +### Please describe the problem. + +As I understand the manual: + +- Options `--json --json-error-messages` are provided so that another program can parse the `git annex fsck` results. + +- Option `--quiet` is provided to list only problems (not print anything for OK files). + +However, when options are combined, only plain text error messages are provided, no json output is provided. + +I understand this may be "as designed", quiet is quiet... But then how to log only errors in json? I have +300k files in the annex, and no need to log when everything is fine. + +### What steps will reproduce the problem? + +Create a repo with files b and c + +Corrupt file b + +`git annex fsck --json --json-error-messages --quiet` + +I expected to have a json output with only files that fail the fsck, instead I get only normal stderr, just like with +`git annex fsck --quiet` + +### What version of git-annex are you using? On what operating system? +10.20230926-12 on arch + +### Please provide any additional information below. + +[[!format sh """ + +# Expected plain result +> git annex fsck + +fsck b + ** No known copies exist of b +failed +fsck c (checksum...) ok +(recording state in git...) 
+fsck: 1 failed + +# Expected json result (error message to stderr, both logs) +> git annex fsck --json + + ** No known copies exist of b +{"command":"fsck","dead":[],"error-messages":[],"file":"b","input":["b"],"key":"SHA256E-s5--f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2","success":false,"untrusted":[]} +{"command":"fsck","error-messages":[],"file":"c","input":["c"],"key":"SHA256E-s4--530a0b93b8c1ea618546d3aaa6ec71f888d2a6095322bfdb1b04c9225e26481e","note":"checksum...","success":true} +fsck: 1 failed + +# Expected json output with error message embedded +> git annex fsck --json --json-error-messages + +{"command":"fsck","dead":[],"error-messages":["** No known copies exist of b"],"file":"b","input":["b"],"key":"SHA256E-s5--f2ca1bb6c7e907d06dafe4687e579fce76b37e4e93b7605022da52e6ccc26fd2","success":false,"untrusted":[]} +{"command":"fsck","error-messages":[],"file":"c","input":["c"],"key":"SHA256E-s4--530a0b93b8c1ea618546d3aaa6ec71f888d2a6095322bfdb1b04c9225e26481e","note":"checksum...","success":true} +fsck: 1 failed + +# Expected only error message +> git annex fsck --quiet + + ** No known copies exist of b +fsck: 1 failed + +# UnExpected result: I expected a json output with the error message embedded "--json --json-error-messages" seem ignored here +> git annex fsck --json --json-error-messages --quiet + + ** No known copies exist of b +fsck: 1 failed + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes, great tool ! Thanks ! + + +
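Until `--quiet` and `--json` can be combined, one possible workaround for logging only failures is to post-filter the `--json --json-error-messages` output. The sketch below is an assumption-laden illustration, not part of git-annex: `failedRecords` is a hypothetical helper, and it relies on the compact one-record-per-line format shown in the transcript above, matching the literal text `"success":false` instead of parsing JSON properly.

```haskell
import Data.List (isInfixOf)

-- Keep only the JSON records that report a failure.
-- Assumes one JSON object per line, serialized without spaces around
-- the colon, as in the transcript above. A real JSON parser would be
-- more robust than this substring match.
failedRecords :: String -> [String]
failedRecords = filter ("\"success\":false" `isInfixOf`) . lines
```

Wired up as `main = interact (unlines . failedRecords)`, this could sit at the end of a pipe from `git annex fsck --json --json-error-messages`. The summary line `fsck: 1 failed` is not a JSON record and is dropped by the filter if it arrives on stdin.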
Added a comment
diff --git a/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_4_4529364f2919bd05f53da94cf8ba4268._comment b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_4_4529364f2919bd05f53da94cf8ba4268._comment new file mode 100644 index 0000000000..ad42a4de31 --- /dev/null +++ b/doc/forum/Use_on_large_media_collection_without_modifying_it/comment_4_4529364f2919bd05f53da94cf8ba4268._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="unqueued" + avatar="http://cdn.libravatar.org/avatar/3bcbe0c9e9825637ad7efa70f458640d" + subject="comment 4" + date="2023-11-05T21:32:17Z" + content=""" +Just putting this out there, but if you are on ZFS or BTRFS, you can just duplicate the subvolume/dataset, remove what you want, and send it. It will by default verify your data integrity, and it is often faster. + +On BTRFS, it is easy to `btrfs sub create send.RW; cp --reflink=always .git/annex/objects send.RW; btrfs sub snap -r send.RW send.RO; btrfs sub del send.RW` + +Then, on the target, I can reflink copy into the target repo's .git/annex/objects, and the `git annex fsck --all --fast`, since the send operation verified the integrity. + + +Sometimes, if the target repo does not exist, I can take a snapshot of an entire repo, and then enter it, then re-init it with the target uuid, force drop what I don't want, and then send it. If you're dealing with hundreds of thousands of files, it can be more practical to do that. + +If you want to verify the integrity of an annexed file on ZFS or BTRFS, all you have to do is read it, and let the filesystem verify the checksums for you. + +If you want a nice progress display, you can just do `pv myfile > /dev/null` + +I considered making a git-annex-scrub script that would check if the underlying fs supports integrity verification, then just read the file and update the log. + +BTRFS uses hardware accelerated crc32, which is fine for bitrot, but it is not secure from intentional tampering. +"""]]
report
diff --git a/doc/bugs/How_to_git_union-merge__63__.mdwn b/doc/bugs/How_to_git_union-merge__63__.mdwn new file mode 100644 index 0000000000..3272cd2508 --- /dev/null +++ b/doc/bugs/How_to_git_union-merge__63__.mdwn @@ -0,0 +1,24 @@ +### Please describe the problem. + +It's unclear how to use `git union-merge` as described [here](https://git-annex.branchable.com/git-union-merge/). + +### What steps will reproduce the problem? + +Run `git union-merge`, yields `git: 'union-merge' is not a git command. See 'git --help'.` + +No binary `git-union-merge` is shipped in the standalone tarball and distros also don't seem to ship it. + +### What version of git-annex are you using? On what operating system? + +Tried with (on a Manjaro box): + +- git-annex-standalone-amd64.tar.gz 2023-10-09 14:21 51M from [here](https://downloads.kitenet.net/git-annex/linux/current/) (the build is a month old, is that right?) +- `git-annex` in Manjaro repos +- `git-annex-standalone-nightly-bin` from AUR +- `nix-shell -p git-annex` (10.20230926 in nixos-unstable) + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +git-annex rules and is a marvelous tool. + +I wanted to try the union merging to resolve merge conflicts on non-annexed files. It's not ideal, but might be better than repeated `git annex assist|sync` eventually adding the merge conflict markers `<<<<<<<<<` and the like to the files, breaking things like `.gitattributes` syntax which in turn has more devastating lockup consequences...