macOS has CoW copies too, but unfortunately they're only available via the /bin/cp binary, as GNU coreutils doesn't support using macOS's calls for this.
One trouble here is that it doesn't automatically fall back to a non-reflink copy when a reflink copy is not possible, so that would have to be handled by us somehow.
The cp command not falling back is ok, as long as it exits nonzero in a situation where it can't make a CoW copy and doesn't misbehave badly. In Annex/CopyFile.hs, tryCopyCoW probes to see if cp can make a CoW copy, and if not it falls back to copying itself, without cp.
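The probe-then-fall-back shape can be sketched like this. copyPreferringCoW is a hypothetical helper, not git-annex's tryCopyCoW; it assumes the CoW attempt is passed in as an action that returns False when no CoW copy could be made, with System.Directory.copyFile standing in for the in-process fallback:

```haskell
import Control.Monad (unless)
import System.Directory (copyFile)

-- Hypothetical helper: try a CoW copy first and, if that reports failure,
-- fall back to an ordinary in-process copy (no cp involved).
-- The CoW attempt is injected so any implementation can be plugged in
-- (cp -c on macOS, cp --reflink=always on Linux, a syscall binding, ...).
-- Returns True if the CoW copy succeeded, False if the fallback was used.
copyPreferringCoW
  :: (FilePath -> FilePath -> IO Bool) -- CoW attempt; False means "not possible"
  -> FilePath                          -- source
  -> FilePath                          -- destination
  -> IO Bool
copyPreferringCoW cowCopy src dest = do
  ok <- cowCopy src dest
  unless ok $ copyFile src dest
  return ok
```

Because the caller only sees a Bool, it does not matter to it whether the CoW tool itself falls back or hard-fails.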
So what you can do is change copyCoW in Utility/CopyFile.hs to support the OSX cp command and parameters (inside an #ifdef darwin_HOST_OS). Notice that the probe discards error messages from the command, and also deals with Linux's cp failing after creating an empty file when it doesn't support CoW.
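A minimal sketch of that macOS variant, assuming GHC's predefined darwin_HOST_OS CPP macro. cowCommand, isDarwin and copyCoWSketch are hypothetical names and this is not git-annex's actual copyCoW. It picks /bin/cp -c on macOS and cp --reflink=always elsewhere, discards cp's output, and removes a leftover destination file when the copy fails:

```haskell
{-# LANGUAGE CPP #-}
import Control.Exception (IOException, try)
import Control.Monad (unless, void)
import System.Directory (removeFile)
import System.Exit (ExitCode (..))
import System.IO (IOMode (WriteMode), withFile)
import System.Process

-- True when compiled for macOS: GHC defines darwin_HOST_OS there.
isDarwin :: Bool
#ifdef darwin_HOST_OS
isDarwin = True
#else
isDarwin = False
#endif

-- Executable and CoW flag per platform. macOS /bin/cp clones with -c;
-- GNU cp needs --reflink=always so that it fails instead of silently
-- making a normal copy.
cowCommand :: Bool -> (FilePath, [String])
cowCommand darwin
  | darwin = ("/bin/cp", ["-c"])
  | otherwise = ("cp", ["--reflink=always"])

-- Try a CoW copy. True means a CoW copy was made; False means it was not
-- possible and the caller should fall back to some other copy method.
copyCoWSketch :: FilePath -> FilePath -> IO Bool
copyCoWSketch src dest = do
  ignoreIO (removeFile dest)
  let (exe, params) = cowCommand isDarwin
  -- cp complains on stderr when CoW is unsupported, so send its output
  -- to /dev/null (POSIX only).
  ok <- withFile "/dev/null" WriteMode $ \nullh -> do
    let cp = (proc exe (params ++ [src, dest]))
          { std_out = UseHandle nullh, std_err = UseHandle nullh }
    r <- try (withCreateProcess cp (\_ _ _ ph -> waitForProcess ph))
           :: IO (Either IOException ExitCode)
    return $ case r of
      Right ExitSuccess -> True
      _ -> False
  -- GNU cp can leave an empty destination file behind on failure.
  unless ok $ ignoreIO (removeFile dest)
  return ok
  where
    ignoreIO :: IO () -> IO ()
    ignoreIO a = void (try a :: IO (Either IOException ()))
```

A False result here means "no CoW copy was made", so the caller is expected to fall back, just as with the existing Linux probe.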
That would leave copyFileExternal not using CoW; it could be made to probe too on OSX, but that's not really used in many places and it would probably make more sense to convert any of those places that matter to use Annex/CopyFile.hs's tryCopyCoW. This could be deferred until later.
I don't think it needs to be a configure flag unless this feature is somehow flakey.
Oh, I've already got all of that implemented; it's just the flag for disabling that behaviour at build time that's missing.
What I did was to conditionally set the executable to /bin/cp and the reflink parameter to -c.
The problem with using it without a fallback is that, on a filesystem that doesn't support CoW, /bin/cp will hard-fail and make unlocking impossible. GNU coreutils actually fall back automatically by themselves; AFAICT, GA couldn't handle a reflink cp failing before. I refactored the copy functions a bit to make it fall back properly.
The reason I want it to be a configure flag is that some users might use GA exclusively on non-APFS filesystems (where trying a reflink copy would be a waste of time), and some might prefer the cp from uutils-coreutils in their $PATH, which can handle --reflink just like the GNU one.
I originally wanted to add it as a cabal configure flag, but apparently you can't reference those anywhere? I found this: https://stackoverflow.com/questions/48157516/conditional-compilation-in-haskell-submodule, which is probably what I'll end up doing. It will default to true on macOS.
I've also got some small fixes for things that came up during development:
https://github.com/Atemu/git-annex/tree/misc-fixes
The patch makes copyFileExternal slower on Linux when CoW is not supported, as it will try cp --reflink=always every time and then, when that fails, run cp a second time. This is why I suggested in my comment above that it would make sense to switch code using copyFileExternal to use tryCopyCoW instead.
Yes, it could; this is done in tryCopyCoW.
Current places that use tryCopyCoW maintain state, so it only pays the overhead of running cp one extra time to probe if reflinks work. That should also be possible on OSX.
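To make the overhead argument concrete: retrying cp after every failed --reflink=always costs two processes per file, while remembering the first failure costs at most one extra cp for the whole batch. A sketch of that remembered state using an IORef; CoWState, newCoWState and copyWithState are hypothetical names, and git-annex's real bookkeeping may look different:

```haskell
import Data.IORef

-- Hypothetical per-batch state: Nothing = not probed yet,
-- Just False = CoW known not to work, Just True = CoW has worked.
newtype CoWState = CoWState (IORef (Maybe Bool))

newCoWState :: IO CoWState
newCoWState = CoWState <$> newIORef Nothing

-- Try CoW unless it has already been shown to fail; after the first
-- failure, go straight to the fallback copy without spawning cp again.
copyWithState
  :: CoWState
  -> (FilePath -> FilePath -> IO Bool) -- CoW attempt (e.g. runs cp)
  -> (FilePath -> FilePath -> IO ())   -- fallback copy
  -> FilePath -> FilePath -> IO ()
copyWithState (CoWState ref) cowCopy fallback src dest = do
  known <- readIORef ref
  case known of
    Just False -> fallback src dest
    _ -> do
      ok <- cowCopy src dest
      writeIORef ref (Just ok)
      if ok then return () else fallback src dest
```

On a filesystem without CoW, copying five files this way runs the CoW attempt once and the fallback five times.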
I do not see a need for a build flag; I also doubt that many users in such a situation would rebuild with that flag.
The system /bin/cp will always be there, right? So I don't see a need to bother about other cp implementations.
I missed that copyFileExternal actually uses --reflink=auto instead of always, thus letting cp handle falling back. Then again, cp has to do the check and fallback internally on every call we make to it as well, so there might not be much of a performance impact from doing the same in GA.
Tracking reflinkability across a batch of cp invocations is a good idea though, and I like it as a generally cleaner solution, but I'm not so sure it brings any real performance benefit.
I tried using tryCopyCoW in copyFileExternal (it seemed like the most obvious thing to do), but it doesn't seem to support the CopyMetaData argument that copyFileExternal is supposed to be able to handle, right?
fork/exec can end up being fairly significant overhead when the files are small enough. And, at least on Linux systems, most people are probably not using a filesystem that supports CoW, so most people would pay that performance penalty. That's why the current code is optimised to avoid it, at least in places where that has seemed worthwhile.
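On the CopyMetaData question: one way a probing CoW copy could accept the same metadata options as copyFileExternal is to build both cp command lines from one helper. This sketch assumes a simplified, hypothetical mirror of CopyMetaData and GNU cp option spellings (--reflink=auto, --reflink=always, -p, --preserve=timestamps); git-annex's real parameter mapping may differ:

```haskell
-- Simplified, hypothetical mirror of git-annex's CopyMetaData.
data CopyMetaData = CopyTimeStamps | CopyAllMetaData
  deriving (Eq, Show)

-- Hypothetical mapping to GNU cp options.
metaDataParams :: CopyMetaData -> [String]
metaDataParams CopyTimeStamps = ["--preserve=timestamps"]
metaDataParams CopyAllMetaData = ["-p"]

data ReflinkMode
  = ReflinkAuto   -- cp falls back to a normal copy by itself
  | ReflinkAlways -- cp fails when CoW is impossible; caller must fall back
  deriving (Eq, Show)

-- Full GNU cp argument list, carrying the metadata options in both modes.
cpArgs :: ReflinkMode -> CopyMetaData -> FilePath -> FilePath -> [String]
cpArgs mode meta src dest = reflink : metaDataParams meta ++ [src, dest]
  where
    reflink = case mode of
      ReflinkAuto -> "--reflink=auto"
      ReflinkAlways -> "--reflink=always"
```

With ReflinkAuto a single cp process handles the fallback internally; with ReflinkAlways the caller must detect failure and copy some other way, which is what a probing approach does.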
I'll suggest again: just start with copyCoW and leave copyFileExternal not doing CoW copies; I'll accept that easy patch. Then, if any of the 6 or so call sites of copyFileExternal turn out to be ones you want to support CoW on OSX, convert them to use tryCopyCoW.
(Alternatively, if there were a haskell library that provided the syscall that does a CoW copy, it could just use it.)
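For the syscall route on macOS, clonefile(2) from <sys/clonefile.h> is the call behind cp -c. A minimal FFI sketch, assuming a direct foreign import rather than any existing Haskell library (cloneFile and c_clonefile are hypothetical names); on other platforms it simply reports failure:

```haskell
{-# LANGUAGE CPP #-}
#ifdef darwin_HOST_OS
import Data.Word (Word32)
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))

-- int clonefile(const char *src, const char *dst, uint32_t flags);
-- declared in <sys/clonefile.h> on macOS 10.12 and later.
foreign import ccall safe "clonefile"
  c_clonefile :: CString -> CString -> Word32 -> IO CInt
#endif

-- Attempt a CoW clone of src to dest without spawning cp.
-- False means cloning was not possible (non-APFS volume, dest exists,
-- missing source, ...), so the caller should fall back to a normal copy.
cloneFile :: FilePath -> FilePath -> IO Bool
#ifdef darwin_HOST_OS
cloneFile src dest =
  withCString src $ \csrc ->
    withCString dest $ \cdest -> do
      r <- c_clonefile csrc cdest 0
      return (r == 0)
#else
-- Not implemented here for other platforms (Linux would need the FICLONE ioctl).
cloneFile _ _ = return False
#endif
```

This avoids the fork/exec cost entirely, at the price of platform-specific FFI code.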
GNU coreutils will implement reflink copies on macOS in the next major version, so we will get this for free: https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=5e36c0ce078a65c7dac6ac5ebdfb0cf096856427
I use GNU coreutils system-wide because they're much better, but not all users do that. We might need extra logic to handle the odd names some distros give to GNU coreutils commands, like gcp. Is there a build-time way to specify which coreutils git-annex should use at runtime?
Build/Configure.hs probes for supported cp options, and could be extended to try gcp.
The problem is that in our Nix sandbox, cp is always coreutils', while on the actual system it might be macOS'.
It'd be nice to have a way to declare which binary to use (as an absolute path or just a binary name) for cp and the like; that way we could get rid of our runtime wrapper that puts coreutils in git-annex's PATH in anticipation of features that rely on GNU coreutils.
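A sketch of how a cp binary could be chosen from a configurable list, where each entry is either an absolute path or a bare name looked up on PATH, and the first one that passes a probe wins. resolveBinary, pickCp and the probe argument are hypothetical; this is not how Build/Configure.hs currently works:

```haskell
import System.Directory (doesFileExist, findExecutable)
import System.FilePath (isAbsolute)

-- Resolve one configured entry: an absolute path is used as-is if it
-- exists; a bare name such as "gcp" is looked up on PATH.
resolveBinary :: String -> IO (Maybe FilePath)
resolveBinary name
  | isAbsolute name = do
      exists <- doesFileExist name
      return (if exists then Just name else Nothing)
  | otherwise = findExecutable name

-- Pick the first candidate that resolves and passes the probe
-- (e.g. a test copy with --reflink=auto, as a configure step might run).
pickCp :: (FilePath -> IO Bool) -> [String] -> IO (Maybe FilePath)
pickCp _ [] = return Nothing
pickCp probe (c : cs) = do
  found <- resolveBinary c
  case found of
    Just path -> do
      ok <- probe path
      if ok then return (Just path) else pickCp probe cs
    Nothing -> pickCp probe cs
```

For example, pickCp probe ["gcp", "cp"] would prefer gcp when it is installed and passes the probe, while a single absolute path fixed at build time would bypass the PATH lookup entirely.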