Hello,
I am new to git-annex, and English is not my native language, so please bear with me. After evaluating many options, I have decided to keep track of my family photo collection with git-annex. The collection is currently around 300 GB, consists of about 120K items, and is growing fast.
Previously, I was using my home server with attached USB disks, plus cold backup disks. Keeping them in sync without making errors was troublesome.
For this task, I decided to use three external USB disks along with a directory on my home server, each holding a git-annex repository. I use my desktop as the source repository from which I upload photos to the others. I will keep the external disks off site and rotate them.
I have implemented this design but I have run into a problem.
Currently, there are five git-annex repositories, as described above: one on the home server's internal disk, three on the USB disks, and one on my desktop hard drive. Apart from my desktop repository, all repositories are configured as "standard/backup". My desktop repository is configured as standard/manual. Each repository has the other four as remotes. They can all sync with each other and get the contents. However, I am currently receiving the following git message from all my backup repositories:
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
warning: The last gc run reported the following. Please correct the root cause
and remove .git/gc.log
Automatic cleanup will not be performed until the file is removed.
warning: There are too many unreachable loose objects; run 'git prune' to remove them.
I have tried `git prune --dry-run`, as suggested on some internet sites, and I saw that git wants to remove almost all annex blob files.
The annex repositories are mostly on btrfs file systems; one is on ext4. I do not know if it matters. The git-annex version is 10.20230126-9 on every machine (Arch and Manjaro).
`git annex unused` reports nothing. I am not using encryption or any pluggable git system other than git-annex.
Could you provide some insight, please? Is there something wrong with my repositories? It took days to upload and sync all of them. Any help is much appreciated.
Did you have a look at https://git-annex.branchable.com/tips/Repositories_with_large_number_of_files/?
What is your current object count (aka non-packed files)?:
If you have many of them (multiple 10k) then it might be a good idea to repack them:
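The exact commands are not shown in this thread, so the following is only a minimal sketch of such a check and repack, assuming plain git tooling (`git count-objects`, `git repack`, `git gc`). It runs on a throwaway repository created with `mktemp`, which is a stand-in for illustration and not one of the annex repositories discussed here.

```shell
set -eu

# Throwaway repository (hypothetical, for illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

for i in 1 2 3 4 5; do echo "file $i" > "f$i.txt"; done
git add .
git commit -q -m "five files"

# "count:" in the verbose output is the number of loose (non-packed) objects.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

# Pack all reachable objects into one pack and delete the now-redundant
# loose copies, then run the usual housekeeping.
git repack -a -d -q
git gc --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
echo "loose objects: $loose_before -> $loose_after"
```

In a real repository only the `git count-objects -v`, `git repack -a -d` and `git gc` steps would apply, ideally after a backup. Note that `git gc` removes unreachable loose objects only once they are older than `gc.pruneExpire` (two weeks by default).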
Thank you for the response. Yes, repacking and running garbage collection solved the problem. I am not familiar with git plumbing commands; even though I have been using git for years, I never needed them and had to look up what they do. I was afraid of losing files, because running prune with the dry-run option listed about as many objects as I have files. After your response, I tried it on a backed-up repository, and since I did not seem to lose anything, I applied it to all the other annex repositories. The problem has been solved.
I read the post you referenced, and I would like to ask whether `git update-index --index-version 4` is still relevant today. The post is 7 years old, and git must have improved a lot since then. I am also not quite sure about losing file tracking information; I would like to keep it.
My annex repository, which is in the source standard group and does not hold any actual content files, is around 147 MB. My full backup repositories are around 320 GB, and `git annex find` reports around 37500 files, which is far fewer than I expected. `git status` takes about 5 or 6 seconds to complete.
Yesterday I added around 30 GB of new photos. Copying them from the source repository on my desktop to the four other repositories (the home server and the three attached USB disks) over ssh took about a day. Uploading over ssh is very slow: each file is copied to each remote in turn, rather than all files being sent to one remote over a single connection. How can I improve sync speed?
Thank you again for the directions...
By "all annex blob files", do you mean that all the files it wants to remove look like git-annex links?
The way this can generally happen is if you `git add` a file (`git-annex add` would be the same) and then, without committing that, you `git rm` it, or you change the file and run `git add` again on the new content. The original version of the file is not pointed to by any commit, so it's unreachable.
So having a lot of such git objects is not itself evidence of a problem, if you have done that a lot. Or something similar, such as deleting a branch.
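The scenario above can be reproduced in a small experiment. This is a sketch using plain git in a throwaway repository; the file name `photo.txt` and the `mktemp` directory are made up for illustration, and git-annex itself is not involved.

```shell
set -eu

# Throwaway repository (hypothetical, for illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Demo"

echo "first version" > photo.txt
git add photo.txt      # writes a blob for "first version"
echo "second version" > photo.txt
git add photo.txt      # writes a second blob; nothing points to the first any more
git commit -q -m "add photo"

# Count blobs that no commit, index entry or reflog refers to.
unreachable=$(git fsck --unreachable --no-reflogs 2>/dev/null \
  | grep -c '^unreachable blob' || true)
echo "unreachable blobs: $unreachable"

# Only reports what prune would delete; nothing is removed.
git prune --dry-run
```

The blob listed by `git prune --dry-run` here is the never-committed first version of the file, which is why such a listing on its own does not indicate lost data.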