Lots of 4k symlinks

Hi,

this is a minor issue and there is probably no better solution, but I would nevertheless like to point it out and maybe discuss it a little.

Given that the symlinks generated by annex are pretty large in size (they point to a file named by a large hash number), ext4 is using an entire block (4K) of storage instead of embedding the symlink into the inode itself. For the "archivist use case" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.

Here is a real world example:

(ins)carlos@carlos home$ du -hs music/
56M music/
(ins)carlos@carlos home$ du -bhs music/
3.3M    music/
(ins)carlos@carlos home$ ln -s /tmp/x x
(ins)carlos@carlos home$ du x
0   x
(ins)carlos@carlos home$ ln -s /tmp/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx
(ins)carlos@carlos home$ du xx
4   xx
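For anyone who wants to repeat the experiment above on their own filesystem, here is a rough Python sketch. The helper name symlink_usage and the chosen target lengths are made up for illustration; it only relies on os.symlink and os.lstat, and the numbers it prints depend entirely on the filesystem holding the directory.

```python
import os
import tempfile


def symlink_usage(target_len, directory):
    """Create a symlink whose target is target_len characters long and
    return (apparent size in bytes, allocated 512-byte blocks)."""
    link = os.path.join(directory, f"link{target_len}")
    # The target does not need to exist; the link is never followed.
    os.symlink("x" * target_len, link)
    st = os.lstat(link)  # lstat inspects the link itself, not its target
    return st.st_size, st.st_blocks


if __name__ == "__main__":
    # Uses the default temp location; pass dir="/some/path" to
    # TemporaryDirectory to test a specific filesystem instead.
    with tempfile.TemporaryDirectory() as d:
        for n in (10, 59, 60, 200):
            size, blocks = symlink_usage(n, d)
            print(f"target length {n:3d}: size={size} bytes, "
                  f"allocated={blocks * 512} bytes")
```

On ext4 one would expect the allocated figure to jump from 0 to a full block once the target no longer fits in the inode; on tmpfs or other filesystems the result may well differ.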

Cheers, Carlos

comment 1

> For the "archivist use case" of annex, this might lead to tens or hundreds of MBs of disk occupied by symlinks which actually don't add up to more than a few MBs.

$ pwd
/home/sorting_annex/mnt/keyfile
$ du -shc *-*
...
33M     fd0dc9d3-ad62-429e-ba1b-acc26a453ca4
33M     fd2fc989-bea7-4ffb-bbc8-2e34cd0e5be5
33M     fd79bbd4-d41e-4ea8-acc8-86437c5eed7c
33M     ffbd042e-f6d9-4450-9a57-8ed1086f587c
2.7G    total
$

Just a bit :P (yes, that is 2.7G of symlinks so far)

Comment by CandyAngel — Mon Apr 24 13:55:02 2017

comment 2

You also get better seek speed with packed inodes.

With default 256 byte inodes, there seem to be 59 bytes to play with. (Determined experimentally.)

Note that disks over 4 TB default to 32 kilobyte inodes, so probably most spinning hard disks these days do pack regular git-annex symlinks efficiently. (I don't have a 4 TB disk online to check this. And I doubt CandyAngel was counting only the sizes of symlinks and not git repos or at least directory inodes to hold all the symlinks.)

With a prefix like ".git/annex/objects/zX/Wx/S-s1000000000-" that leaves 20 bytes out of the 59 for the hash.

That's not enough data to be cryptographically secure, but if we use SHA1 or MD5 as the base hash, it wouldn't be anyway. 15 bytes of hash state will base64 encode to 20 bytes. SHA1 is a 20 byte hash; MD5 is a 16 byte hash. So even MD5 would need to be truncated a little bit. Chances of (non-malicious) collision would still be small, only 256 times as likely as a (non-malicious) MD5 collision. It could easily be made harder than MD5/SHA1 to maliciously collide by using truncated SHA2.
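The arithmetic above can be checked with a few lines of Python. This is only a sketch: the 59 byte limit and the example prefix are taken from this comment, while the helper name hash_bytes_that_fit and the assumption of unpadded base64 (6 bits per character) are mine.

```python
INLINE_LIMIT = 59  # bytes of symlink target that fit in the inode, per this comment
PREFIX = ".git/annex/objects/zX/Wx/S-s1000000000-"  # example prefix from this comment


def hash_bytes_that_fit(chars):
    """Whole bytes of hash state that fit in `chars` unpadded base64
    characters, at 6 bits per character (an assumption of this sketch)."""
    return chars * 6 // 8


chars_left = INLINE_LIMIT - len(PREFIX)
hash_bytes = hash_bytes_that_fit(chars_left)

MD5_BYTES = 16
# Every byte dropped from the hash makes an accidental collision between
# two given files 2**8 = 256 times as likely.
collision_ratio = 2 ** (8 * (MD5_BYTES - hash_bytes))

print(len(PREFIX), chars_left, hash_bytes, collision_ratio)  # 39 20 15 256
```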

(Files larger than 9.3 GB would still have symlinks that are too long, due to the size field. The size field could also be omitted or encoded more efficiently, but omitting it would reduce git-annex's ability to avoid overfilling the disk, and I don't think re-encoding buys enough to bother.)
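To put numbers on the size field: the example prefix leaves room for 10 decimal digits. The sketch below shows where that limit falls and how many characters a 6-bits-per-character encoding would need for the same range. That denser encoding is purely hypothetical, not something git-annex does, and both helper names are invented for the example.

```python
def max_size_bytes(decimal_digits):
    """Largest file size expressible in the given number of decimal digits."""
    return 10 ** decimal_digits - 1


def six_bit_chars_needed(value):
    """Characters needed to write `value` in a hypothetical alphabet
    carrying 6 bits per character (ceiling of bit length / 6)."""
    return max(1, -(-value.bit_length() // 6))


limit = max_size_bytes(10)
print(limit / 1024 ** 3)            # a little over 9.3 (GiB)
print(six_bit_chars_needed(limit))  # 6 characters instead of 10 digits
```

So a denser encoding would free up only about four characters, which fits the remark that re-encoding does not buy much.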

Comment by joey — Tue May 9 21:07:11 2017

comment 3

My analysis above assumes no subdirectories.

To leave space for even a single "../" would need to drop to 13 bytes of hash state. 1/79228162514264337593543950336 chance of 2 files colliding. Not comfortable with something so much worse than MD5, and that still doesn't help when files are 2 directories deep. Dropping to 11 bytes for that, a 1/1208925819614629174706176 chance is starting to get into could-really-happen territory.
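A small sketch of how the hash budget shrinks with directory depth. It assumes 20 characters are available at the top level (from the earlier comment), that each "../" costs 3 of them, and that the hash is written as unpadded base64 holding whole bytes only; the helper name is made up. Under those assumptions the two odds quoted above come out as 1/2^96 and 1/2^80, which this sketch reaches at 12 and 10 whole bytes.

```python
TOP_LEVEL_CHARS = 20  # characters left for the hash with no "../" prefix


def hash_bytes_that_fit(chars):
    """Whole bytes of hash state that fit in `chars` unpadded base64
    characters, at 6 bits per character (an assumption of this sketch)."""
    return chars * 6 // 8


for depth in range(3):
    chars = TOP_LEVEL_CHARS - len("../") * depth
    nbytes = hash_bytes_that_fit(chars)
    # 1 in 2**(8 * nbytes) is the chance that two given files collide
    print(f"depth {depth}: {chars} chars, {nbytes} bytes, 1/{2 ** (8 * nbytes)}")
```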

Comment by joey — Tue May 9 22:49:01 2017

comment 4

> And I doubt CandyAngel was counting only the sizes of symlinks and not git repos or at least directory inodes to hold all the symlinks.

In that repository, there are only top level directories (no subdirectories) and each directory only contains symlinks (up to 8000 of them). Directories are created with mkdir $(uuidgen -r), hence the wildcard for du.

It would be including the directory size to hold all the inodes, but it definitely isn't counting .git as this annex spans 3 drives with 6TB of content so far. Well, 6 drives because of "numcopies 2" :P

I will calculate this a different way and only count symlinks, when I have access to it again.

Comment by CandyAngel — Wed May 10 09:21:34 2017

comment 5

$ find -name .git -prune -o -type l | wc -l
1034886

Just over a million symlinks.. very convenient

$ find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**3}'
195.9 # 195MB actual size
$ find -name .git -prune -o -type l -print0 | du -ch --files0-from=- | tail -n1
4.0G    total # 4GB disk usage

And in comparison to my earlier comment 2 weeks ago:

$ du -shc *-* | tail -n3
33M     fd79bbd4-d41e-4ea8-acc8-86437c5eed7c
33M     ffbd042e-f6d9-4450-9a57-8ed1086f587c
4.1G    total

So directory inode sizes are dwarfed by the symlinks themselves, which each take 4K on disk for only ~198 bytes of actual content (~95% wasted space?).
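As a sanity check on those figures, here is the same calculation in Python, using only the numbers reported above (the symlink count and the total in MiB) and assuming every symlink occupies exactly one 4096 byte block:

```python
SYMLINK_COUNT = 1034886  # from the find | wc -l output above
APPARENT_MIB = 195.9     # summed symlink sizes, in MiB
BLOCK_SIZE = 4096        # assumed allocation per symlink, in bytes

avg_bytes = APPARENT_MIB * 1024 ** 2 / SYMLINK_COUNT
allocated_gib = SYMLINK_COUNT * BLOCK_SIZE / 1024 ** 3
wasted_fraction = 1 - avg_bytes / BLOCK_SIZE

print(f"average symlink size: {avg_bytes:.1f} bytes")
print(f"allocated if one block each: {allocated_gib:.2f} GiB")
print(f"wasted fraction: {wasted_fraction:.1%}")
```

The one-block-per-symlink estimate lands just under 4 GiB, in line with what du reports.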

Comment by CandyAngel — Wed May 10 12:44:08 2017

comment 6

Oops,

find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**3}'

should have been

find -name .git -prune -o -type l -printf '%s\n' | awk '{sum+=$1} END {print sum/1024**2}'

That'll teach me to prematurely copy it :P

Comment by CandyAngel — Wed May 10 12:45:59 2017

comment 7

Note that the analysis in my earlier comment assumes that the .git/annex/objects/xx/yy/key/ directory is removed. As long as those per-key directories are used, the symlinks cannot possibly be made short enough to pack.

There have been some other requests for that (datalad requested it because all those per-key directories use disk space, add to the size of the git repo, and slow down traversal). However, git-annex relies on those directories to prevent an accidental rm -rf from deleting the annexed objects, and to prevent some symlink-following programs from editing or corrupting them (the per-key directories are left mode 400 most of the time). So it would be fairly complicated to add a tuning that eliminated those directories while locking down the permissions some other way (e.g., making the yy directories mode 400 except when one or more threads or processes need to write to them). And since it would have to be a tuning, it would introduce a lot of conditional complexity into the code.
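To give a feel for the "lock down except while writing" idea, here is a deliberately naive Python sketch. It is not how git-annex works: the name temporarily_writable is invented, it only toggles the owner write bit rather than using a particular mode, and it does nothing about several threads or processes needing the directory at once, which is the hard part mentioned above.

```python
import os
import stat
import tempfile
from contextlib import contextmanager


@contextmanager
def temporarily_writable(directory):
    """Add the owner write bit to `directory` for the duration of the
    block, then restore the original mode. Not safe under concurrency."""
    original = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, original | stat.S_IWUSR)
    try:
        yield directory
    finally:
        os.chmod(directory, original)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.chmod(d, 0o500)  # locked down: owner may read and traverse only
        with temporarily_writable(d):
            with open(os.path.join(d, "object"), "w") as f:
                f.write("content")
        print(oct(stat.S_IMODE(os.stat(d).st_mode)))  # back to 0o500
        os.chmod(d, 0o700)  # make it writable again so cleanup can delete it
```

If two writers overlap, the first one to finish would lock the directory while the second is still writing, so a real implementation would need some form of reference counting or locking on top of this.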

Comment by joey — Thu May 11 15:57:10 2017
