Please describe the problem.
The adb special remote creates empty commits in the "main" branch on both "import" and "sync" commands, even if files haven't changed in either the remote or the local repo.
This clutters history of "main" branch (which is mostly manually curated), and hides commits actually introducing added/deleted files amongst the clutter.
Moreover, this behavior differs from expectations: "sync" with proper remote never created empty commits in "main" branch.
What steps will reproduce the problem?
Setup adb import/export, like here: https://git-annex.branchable.com/forum/Mixed_content_repos_with_import_and_export/#comment-764ac971faf756140055333649ffb94c.
Repetition of "git annex sync --content" introduces new empty commits without proper reason.
What version of git-annex are you using? On what operating system?
git-annex version: 8.20211118-g23ee48898
Please provide any additional information below.
Example of "main" branch history cluttered by empty commits:
* 2022-01-01 bb23200 remote tracking branch (android/main) [annex]
|\
* | 2022-01-01 42b8111 remote tracking branch (HEAD -> main, synced/main) [annex]
|\|
| * 2022-01-01 223f5f7 import from android [annex]
* | 2022-01-01 f5b5f0a remote tracking branch [annex]
|\|
* | 2022-01-01 2a6177e remote tracking branch [annex]
|\|
| * 2022-01-01 7ef4649 import from android [annex]
I guess you mean commits like "import from foo", or perhaps "remote tracking branch". Not adb-specific at all.
The "remote tracking branch" commits are merge commits, so cannot be avoided, even though they don't make any apparent changes.
It mostly does manage to avoid making the "import from" commits when there is no difference from the previous commit. The only case I know of where it does not is in the initial import, if the branch was exported first, and then imported. I see an empty "import from" then. Subsequent imports, when there are no changes to the tree, do not make a new commit.
Of course, you can rebase out any of these commits if you want to.
Yes.
Can't we simply... not make them? Skip? Revert? Fast-forward?
I don't mind this history hell inside git-annex branch, but cluttering "main" branch... is something else.
After the first merge of unrelated histories -- we don't really need another merge commit unless new changes were introduced during import, do we?
Theoretically yes, practically no. If I run "git annex sync", this history will be propagated to all the devices connected at that moment -- PC, laptop, NAS, remote VPS, and maybe HDD. Rebasing and cleaning them all up after that is impractical.
It seems this does not work as described.
You depend on the CID (size, modification time, and inode) when doing comparisons.
1) Most likely this issue is related to https://git-annex.branchable.com/bugs/adb_pull_does_not_preserve_timestamp/ , that results in a different mtime on remote and on PC when you "import" new files.
Then each time you do "import" -- git-annex detects changes by mtime and creates a commit, but it's empty because the content hasn't actually changed.
2) Alternative heisenbug is the time skew between what the android "sdcard" virtual filesystem reports and what it actually transfers. The probability is too low, though.
P.S. I'm not sure if it's a separate bug or related to "mtime" too (scenario (1) or even (2)), but:
But I had no changes in either repo or in android folder. And it worked multiple times in a row the day before!
Please show me how to get more than one "import from foo" commit that is empty, starting from a new remote.
I have only managed to get one, maximum, as I said.
I do not think your CID analysis is correct. The CID is calculated on the android side and has nothing to do with the timestamps of anything in the git repo.
Now, if Android is varying the mtime it reports for files, so resulting in a changed CID, it may be that would cause an import of content it had already imported before, and maybe that leads to the unnecessary commit.
Yes, sorry, you are right. You only use "find -exec stat" to get remote CID.
Hm, it really looks like a heisenbug, as I can't reproduce it reliably and immediately. Usually the issue arises after some indeterminate time has passed. Dunno what action could trigger it, as all of them look unrelated and improbable: phone reboot, system reboot, cold file cache, NTP skew...
When I wrap everything into a script which completes in 10 sec -- there is no problem. But after several days -- import creates a new empty commit and export refuses to overwrite the existing file.
I read something kind of similar months ago: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1631854.html
Let's focus on empty "remote tracking branch" merge commits for now. At least until I find how to reproduce empty "import from foo" too.
Can we avoid new empty merge commits if "import" was done first?
I do not think this idea of mine can cause it. I tried, using a directory special remote, touching a file in the remote after having already imported it once. This resulted in git-annex sync importing the same file again, but since the content was the same it built the same tree it had before, and noticed this and avoided making an empty commit.
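To illustrate the check described above, here is a minimal sketch, not git-annex's actual code. The helper names `tree_id` and `maybe_commit` are hypothetical, and the hashing scheme is a simplified stand-in for a git tree object. It shows why re-importing a touched file with identical content yields the same tree, so no new commit is needed.

```python
import hashlib
from typing import Dict, Optional


def tree_id(files: Dict[str, bytes]) -> str:
    """Hypothetical stand-in for a git tree hash: depends only on paths and content."""
    h = hashlib.sha1()
    for path in sorted(files):
        h.update(path.encode())
        h.update(b"\0")
        h.update(hashlib.sha1(files[path]).digest())
    return h.hexdigest()


def maybe_commit(parent_tree: Optional[str], files: Dict[str, bytes]) -> Optional[str]:
    """Return the new tree id if a commit is needed, or None if the tree is unchanged."""
    new_tree = tree_id(files)
    if new_tree == parent_tree:
        # Same tree as the previous import: skip the empty commit.
        return None
    return new_tree


first = maybe_commit(None, {"a.txt": b"hello"})
# Re-importing the same content (e.g. after a touch on the remote) builds the same tree.
second = maybe_commit(first, {"a.txt": b"hello"})
```

Since mtime and inode are not part of the tree, a changed CID alone does not change the result here.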
On the merge commits, importing creates one, and exporting creates one. So sync creates two. Also, if you export and then merge the remote tracking branch (a fast-forward merge), and then export again, it makes another merge commit. So any number can be stacked up that way.
See 1503b86a14865ce300ebb9c4d96315eeb254d0b8 (and subsequent 2bd0e07ed83db39907f0c824854d68c1a8ba77ac and a32f31235a67d572d989ad9e344efe11d78774a5) where this was introduced. This stuff makes my head hurt, and getting it wrong leads to broken merges from the remote tracking branch...
Now, if Android is varying the mtime it reports for files [...]
Hm, I think I will enable debug logging for a while and try to catch more info on my heisenbug. It may take weeks though, so please note that no activity on this issue does not mean I have abandoned it. I will state so explicitly if that is ever the case.
Yes, and I hoped for a fast and dirty fix -- check the diff before the merge, and if it's empty, don't make that useless merge commit. It would unblock my primary workflow so I can start using ADB in full, as I would stop fearing trashing my history on all my remotes (as I mentioned, "rebase" won't help due to how "git annex sync" works). But maybe on empty commits it is still better to print something into the debug logs or as a warning, so the original bug can still be tracked and I can continue searching for the root cause.
I skimmed through those diffs, and I must say my head hurts too. I will need to look more into the surrounding code to understand them in full. Still, I will return to them after some debug logs have been collected.
Until then -- is it possible to do what I mentioned above -- "check diff before merge -- and don't merge if it's empty" ?
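The "check diff before merge" idea could be sketched roughly as follows. This is an assumption-laden illustration, not a patch for git-annex: `git_tree_of` and `merge_would_be_empty` are hypothetical helper names, and the only git call used is `git rev-parse --verify REV^{tree}`. It only detects the condition; whether skipping the merge commit is safe for the remote tracking branch is a separate question.

```python
import subprocess
from typing import Callable


def git_tree_of(rev: str, repo: str = ".") -> str:
    """Resolve a revision to its tree hash using `git rev-parse`."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", f"{rev}^{{tree}}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def merge_would_be_empty(ours: str, theirs: str,
                         tree_of: Callable[[str], str] = git_tree_of) -> bool:
    """True when both revisions already point at the same tree, i.e. the diff is empty."""
    return tree_of(ours) == tree_of(theirs)
```

A caller could log a debug message when `merge_would_be_empty("main", "android/main")` is true, so the underlying cause can still be tracked.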
I somewhat narrowed down the issue. The error "unsafe to overwrite file" only appears for plain text files (and symlinks) committed to .git itself. This error only appears after several exports and after some time has passed (I still can't pin down the conditions).
Current hypothesis is that empty commits in history are related to these plain text files. @joey, do you have any thoughts?
Well that message suggests that the CID of the file appears to have changed. But I investigated the changed CID theory back in comment #4 and could not get it to produce an empty commit.
The file being not annexed might be a clue.
If it happens with a symlink, that would certainly be a clue. But git-annex avoids exporting symlinks to special remotes now, so it would be worth upgrading to the most recent release to avoid the old behavior with them.
@joey I have found the root cause of the bug! The CID really changes between different days.
Basically, on android the /sdcard internal user memory (and the /storage/ mounted SDCard) are not exposed directly, but through a virtual flash driver. And this driver REGULARLY increments the inode for ALL files in the FS, around once per day on average (but I have also seen once per several days, and several times a day). For example, the same file had 3 different inodes over the week: 323733 -> 338757 -> 364584 -> ... -> 426292. I dunno whether it rotates regularly or after some events -- but the truth is that the inode is not reliable as part of the CID on android.
In that case -- can we change CID for android remote to (mtime, name) only?
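A small sketch of what this proposal would mean, assuming the CID is simply a string built from stat fields. The `StatInfo` type and both functions are hypothetical, as are the file name, size, and mtime; only the inode values come from the comment above. This is not how git-annex actually encodes content identifiers.

```python
from typing import NamedTuple


class StatInfo(NamedTuple):
    name: str
    size: int
    mtime: int
    inode: int


def cid_with_inode(st: StatInfo) -> str:
    """CID built from (size, mtime, inode), as described earlier in the thread."""
    return f"{st.inode} {st.size} {st.mtime}"


def cid_mtime_name(st: StatInfo) -> str:
    """Proposed android-only CID built from (mtime, name)."""
    return f"{st.mtime} {st.name}"


# Same file before and after the driver rotated its inode.
before = StatInfo(name="photo.jpg", size=1024, mtime=1640995200, inode=323733)
after = before._replace(inode=338757)

inode_cid_changed = cid_with_inode(before) != cid_with_inode(after)
proposed_cid_stable = cid_mtime_name(before) == cid_mtime_name(after)
```

The trade-off is that a CID without the inode or size is weaker at detecting a file that was replaced while keeping its name and mtime.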
Well, a change to the inode would change the CID, but what I saw back in comment #4 was that a change to the CID when the file content did not actually change, did not result in an empty commit. So I still don't understand how you are getting empty commits here.
I also don't think all android devices do that, because such a change of the inode will mean it needs to re-download the file, and I do not see periodic re-downloads of files when syncing with my own android phone.
I no longer see an empty commit in the scenario I described in comment #1. I'm not sure what fixed that.
So this bug is blocked on more information being available, since I still don't know how to reproduce the empty commit problem. Marking moreinfo.