In DataLad, there are some dotfiles that we want to annex (e.g. .datalad/metadata/objects/*), so we need to set annex.dotfiles=true. Ideally that would be in effect for annex commands that DataLad executes without setting annex.dotfiles=true in the repo. DataLad configuring annex.dotfiles in the repo seems problematic because the change in behavior would be surprising to users who call git annex add directly. And more generally, DataLad should be able to operate in existing git(-annex) repos without changing configuration values in the background.

That reasoning leads to using -c annex.dotfiles=true in our calls to git-annex. By doing that combined with --force-large, we can make git annex add send a dotfile to the annex. However, if the file is later unlocked, it switches to being stored in git, presumably when the clean filter runs. The script below provides a concrete example of this when running on an unlocked adjusted branch.

So, I think this is expected behavior. annex.dotfiles=true is no longer in effect when the clean filter runs, and the dotfile goes into git instead. The only way I can think of to work around this annex->git conversion is to set annex.dotfiles=true in the repo. But, as I mentioned in the first paragraph, I'm hoping to avoid that. Is there another solution that I'm overlooking?

#!/bin/sh
set -eux
cd "$(mktemp -d --tmpdir gx-XXXXXXX)"
echo $(git annex version --raw)
git init
git annex init
git commit --allow-empty -mc0
git annex adjust --unlock
echo one >.dot
git annex add -c annex.dotfiles=true --force-large .dot
git commit -mdot
git diff
+ mktemp -d --tmpdir gx-XXXXXXX
+ cd /tmp/gx-dVCRKik
+ git annex version --raw
+ echo 8.20200226
8.20200226
+ git init
Initialized empty Git repository in /tmp/gx-dVCRKik/.git/
+ git annex init
init (scanning for unlocked files...)
ok
(recording state in git...)
+ git commit --allow-empty -mc0
[master (root-commit) 594fa63] c0
+ git annex adjust --unlock
adjust
Switched to branch 'adjusted/master(unlocked)'
ok
+ echo one
+ git annex add -c annex.dotfiles=true --force-large .dot
add .dot
ok
(recording state in git...)
+ git commit -mdot
[adjusted/master(unlocked) 826ad5d] dot
1 file changed, 1 insertion(+)
create mode 100644 .dot
+ git diff
diff --git a/.dot b/.dot
index 9a70ce7..5626abf 100644
--- a/.dot
+++ b/.dot
@@ -1 +1 @@
-/annex/objects/SHA256E-s4--2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806.dot
+one
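
The diff above contrasts a pointer line with the plain file content. A minimal sketch of telling the two apart in a script follows; the helper names is_annex_pointer and describe are hypothetical, and the only assumption about the pointer format is the /annex/objects/ prefix visible in the diff.

```shell
#!/bin/sh
# Sketch: classify a blob's text as an annex pointer or regular content.
# Assumption: a pointer starts with "/annex/objects/", as in the diff above.
is_annex_pointer() {
    case "$1" in
        /annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper that prints a human-readable verdict.
describe() {
    if is_annex_pointer "$1"; then
        echo "annexed"
    else
        echo "in git"
    fi
}

# The two sides of the diff shown above.
describe "/annex/objects/SHA256E-s4--2c8b08da5ce60398e1f19af0e5dccc744df274b826abe585eaba68c525434806.dot"
describe "one"
```

Inside a repository, the same helper could be fed the committed blob, for example describe "$(git show HEAD:.dot)", to check what a given commit actually recorded.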
annex.dotfiles=true, then in .gitattributes at repo root put .* annex.largefiles=nothing, and in .gitattributes under .datalad put .* annex.largefiles=anything?

Thanks for the reply, Ilya. Sorry, I think my mention of .datalad/ along with an example that used a top-level dotfile was unnecessarily confusing. It's not an issue of sending some dotfiles to annex and some to git. It's just wanting to avoid setting annex.dotfiles in .git/config (or git-annex:config.log) of the repos that DataLad touches.

datalad save
on a preexisting datalad dataset and got a previously unseen error with "Invalid option `--include-dotfiles'". Is this related to ongoing development? Or is there some easy fix? Thanks! (apologies if this is a poor place to post)

DataLad has been updated for the removal of --include-dotfiles in the latest git-annex release (8.20200226), but there hasn't been a DataLad release yet that includes that fix. So I'd say the easiest fix for now would be installing a development version of DataLad (both master and maint have the fix). I think downgrading your git-annex version would be problematic because your repo has probably already been auto-upgraded to v8.

--no-patch flag for git show, but that's a separate issue.

conda list?

I conda uninstalled datalad in this conda env, then pip installed from master using
pip install git+https://github.com/datalad/datalad.git
Output of conda list:
I think the problem demonstrated by the script is the same problem as [...], which I've now fixed.
(git-annex does propagate -c settings on to all git commands it runs)
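
For readers less familiar with the scoping this thread relies on: a -c value exists only for the invocation that carries it and is never written to .git/config. A minimal sketch with plain git (no git-annex involved) follows. The temporary repo, the redirected HOME, and the variable names are illustrative assumptions, and the sketch says nothing about which child processes git-annex forwards the flag to.

```shell
#!/bin/sh
# Sketch: a -c setting applies to one git invocation and is not persisted.
set -eu

repo="$(mktemp -d)"
# Keep the demo hermetic: ignore any user-level or system-level git config.
HOME="$repo"
XDG_CONFIG_HOME="$repo/.config"
GIT_CONFIG_NOSYSTEM=1
export HOME XDG_CONFIG_HOME GIT_CONFIG_NOSYSTEM

git init -q "$repo"
cd "$repo"

# Visible inside the invocation that carries the -c flag.
with_flag="$(git -c annex.dotfiles=true config --get annex.dotfiles)"

# A later, separate invocation no longer sees it. `git config --get` exits
# non-zero for an unset key, so fall back to a marker string.
without_flag="$(git config --get annex.dotfiles || echo unset)"

echo "with -c:    $with_flag"
echo "without -c: $without_flag"
```

This is the property that lets a tool pass a setting for its own calls without altering the configuration of a repo it merely operates in.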
Hmm yeah I may have previously screwed something up with the conda env, because it didn't have git-annex either, which I just manually installed via conda and is now present. This also seems to have installed a new git in the conda env (see conda list below), but for whatever reason, invoking git still reaches outside the conda env (I got the server admin to upgrade to 2.9):
I'm still hitting a datalad error in trying to save the dataset.
and on and on listing many files, then:
I did a datalad remove on this derivatives/mriqc directory a few days ago, not sure why that's tripping up.

conda install -c conda-forge datalad. On a second try, can confirm that git is not listed among the conda NEW packages that will be INSTALLED, although it does get git-annex conda-forge/linux-64::git-annex-8.20200226-nodep_h1234567_1 (as well as gitdb and gitpython). Conda env still points to the external git installation. Running the conda version of datalad save gives me the Invalid option `--include-dotfiles' error, so I install from GitHub with pip install git+https://github.com/datalad/datalad.git. However, this doesn't seem to change anything (I think this is why I ended up conda uninstalling datalad previously). Is there a way to ensure the version installed from GitHub via pip takes precedence?

Did it work? I wonder why you got the standalone (nodep) git-annex build in your environment -- this should only happen when package conflicts prevent the normal package from getting installed. When you create a fresh env with git-annex, which package version do you get?
"don't have admin privileges, so I'm using the existing conda (v4.7.5) installation" -- with conda you don't need admin privileges, you can get the installer and install it in your own dir.
Oh right, good point about conda. Although I'm not sure why using a new installation of miniconda would make any difference relative to using the system's conda. When running conda create -c conda-forge -n datalad-env git-annex it wants to install:
git conda-forge/linux-64::git-2.25.0-pl526hce37bd2_0
git-annex conda-forge/linux-64::git-annex-8.20200226-hfc01302_101

I think I understand the git version issue now. I had requested admin update git (v1.8), so they had me sourcing some environment variables pointing to a newer version (v2.9). Apparently this was overriding the conda git installation; when I started from scratch on the server, it correctly uses git v2.25 rather than v1.8.
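
The shadowing described above can be reproduced in isolation. The sketch below assumes the override came through PATH ordering (the post only says "environment variables", so that is a guess) and uses two stand-in scripts named tool rather than real git builds; all directory and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch: the first matching directory on PATH decides which binary runs.
set -eu

demo="$(mktemp -d)"
mkdir "$demo/system" "$demo/env"

# Two stand-in executables (hypothetical, not real git builds).
printf '#!/bin/sh\necho system-tool\n' > "$demo/system/tool"
printf '#!/bin/sh\necho env-tool\n' > "$demo/env/tool"
chmod +x "$demo/system/tool" "$demo/env/tool"

# Environment directory listed first: its copy is the one that runs.
env_first="$(PATH="$demo/env:$demo/system:$PATH"; tool)"

# A later prepend, e.g. from a sourced setup script, shadows the env copy.
shadowed="$(PATH="$demo/system:$demo/env:$PATH"; tool)"

echo "env first: $env_first"
echo "shadowed:  $shadowed"
```

In a real environment, command -v git together with git --version shows which copy is being picked up.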
Following your recipe, I needed a conda install pip, then pip installed DataLad from GitHub. However, I'm still running into the previous error:
...listing many files, then:
Am I just missing some step in the workflow? If there's no obvious solution, I can probably just re-create the entire datalad dataset from scratch with the new software versions in this conda env.
It's not obvious to me why an uninstalled subdataset would be causing issues with the repo upgrade process. Quickly trying, I can't trigger an issue, but perhaps the submodule got into some weird, not-really-removed state. You can open an issue on DataLad's side and we can try to debug it, though doing so might be tricky without a recipe to trigger it.