git-annex tries to run in a constant amount of memory, however knownUrls loads all urls ever seen into a list, so the more urls there are, the more memory git annex importfeed will need.

This is probably not a big problem in practice, but seems worth doing something about if somehow possible.

Unfortunately, can't use a bloom filter, because a false positive would prevent importing an url that has not been imported before. A sqlite database would work, but would need to be updated whenever the git-annex branch is changed. --Joey

A sqlite database that is updated by diffing would probably also speed up the stage of importfeed where it gathers known urls. That can be significantly slow in large repos. So I think worth doing.

done --Joey