Recent comments posted to this site:


There's not currently a way to do that without some scripting to get the keys and then running git-annex whereis --key on each.

I think this idea is worth doing something about, so I made this todo: wherewas.
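Until something like that exists, the interim scripting could look roughly like this sketch. It assumes locked annexed files (stored as symlinks in git), so the blob for the path at each commit is the symlink target, whose basename is the annex key; unlocked pointer files would need different parsing.

```shell
#!/bin/sh
# Hedged sketch, not a supported git-annex command: walk a file's git
# history and ask whereis about each historical key. Assumes a locked
# annexed file, i.e. a symlink in git, so the blob at each commit is
# the symlink target and its basename is the key.
file="$1"
git log --format=%H -- "$file" | while read -r commit; do
    target=$(git cat-file -p "$commit:$file") || continue
    key=$(basename "$target")
    echo "=== $commit -> $key"
    git annex whereis --key="$key"
done
```

Since whereis --key works from location tracking alone, this reports the recorded repositories for each historical key even when no content is present locally.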

Comment by joey Tue Apr 30 18:52:12 2024

The repo that contains the latest/current version of a file is not accessible. Can git annex whereis find the last available version of a file in other repos (or in a specific repo)?

I could loop through the commit log, running whereis for each commit until an earlier version of the file is found, but perhaps there is a better way to do it with a single command?

Comment by rshalaev Mon Apr 29 14:07:40 2024

Hi,

I'm wondering whether there is any easy way to delay "progress reporting" (a.k.a. "report progress for ALL transfer_store operations ONCE", a.k.a. "bulk transfer") for a special remote?

What I'm trying to achieve: there is an archiver called dar, which I would like to implement a special remote for. It can write many files into a single archive and also supports incremental/differential backups. One can create an archive with this utility by providing a list of files or directories as parameters.

The problem with the current git annex special remote API is that it does not allow reporting transfer progress for ALL keys/files of a special remote (e.g. with transfer_store) and then checking the result ONCE for ALL files at the end of the process. Ideally, the protocol would have some kind of "write test" command to check the written archive for errors, and only then report the transfer as "successful".

What I was thinking of is to just write all files into a temporary list during transfer_store, and then externally archive that list of files after git annex copy --to dar-remote is done. But it seems like git annex would then think that the process of writing files to that remote was successful, while it may not have been (e.g. a file access error happened, or the archive was corrupted, etc.).
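For concreteness, that workaround could be sketched with the documented external special remote protocol exchange (TRANSFER STORE answered by TRANSFER-SUCCESS STORE). The pending-list file and the --serve guard below are illustrative, not part of any git-annex API, and the actual dar invocation is left out entirely.

```shell
#!/bin/sh
# Hedged sketch of a deferring git-annex-remote-dar. TRANSFER STORE only
# records the key and file for a later bulk dar run, then claims success
# immediately -- which is exactly the limitation described above:
# git-annex believes the content is stored before dar has verified anything.

handle() {
    # Answer one protocol line received from git-annex.
    set -- $1
    case "$1" in
        INITREMOTE)   echo INITREMOTE-SUCCESS ;;
        PREPARE)      echo PREPARE-SUCCESS ;;
        TRANSFER)
            if [ "$2" = STORE ]; then
                # Append "key file" to the pending list for later archiving.
                echo "$3 $4" >> "${PENDING:-/tmp/dar-pending.list}"
                echo "TRANSFER-SUCCESS STORE $3"
            fi ;;
        CHECKPRESENT) echo "CHECKPRESENT-UNKNOWN $2" ;;
        *)            echo UNSUPPORTED-REQUEST ;;
    esac
}

# A real remote would run this loop unconditionally; the flag only
# keeps the sketch usable standalone.
if [ "$1" = "--serve" ]; then
    echo "VERSION 1"
    while read -r line; do handle "$line"; done
fi
```

This makes the timing problem visible: TRANSFER-SUCCESS has to be emitted per key, before the archive is written, which is why a post-hoc "write test" step would need protocol support.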

How can this be achieved? Do we need to extend git annex with another protocol extension? How difficult might that be, and where should one start? I suppose there is no way Joey or anyone else will work on it any time soon if there is no workaround, so would I have to submit a patch?

P.S.: I've seen the async extension, but it seems to be tied to threads, which most likely won't allow achieving the goals described above.

Comment by psxvoid Tue Apr 2 06:41:25 2024

That makes a lot of sense. Yeah, I saw that post not long after posting my last comment. It looks promising; I'll have to try it out when I set up my next remote.

Comment by oadams Tue Apr 2 03:56:50 2024

That's right. I have actually thought before that enough people use it that it would make sense to either build it into git-annex in Haskell or ship the program with git-annex in a kind of contrib directory.

With that said, there is also something to be said for distributing maintenance, and I think I'd at least want a commitment to maintain it if it were added to git-annex, since git-annex-remote-rclone already has ongoing maintenance.

Another angle is New external special remote for rclone which might see the special remote built into rclone itself, and so able to take advantage of rclone's internal API. That might supplant the shell script if it turns out to be better.

Comment by joey Fri Mar 22 15:01:43 2024

In order to use rclone as a special remote, the user needs to download a separate Bash script from https://github.com/DanielDent/git-annex-remote-rclone and put it in their PATH. Since that extra dependency is only a few hundred lines of Bash, I would be interested in attempting to implement Remote/Rclone.hs so that the rclone special remote is entirely built into git-annex. However, I wanted to run it by you before more seriously considering investing time in doing that. What are your thoughts on this? I'm assuming the only reason rclone support isn't built into git-annex is just a lack of time and incentive, rather than a more fundamental technical reason. Is that right?

Thanks for all your work on this tool.

Comment by oadams Thu Mar 14 01:03:46 2024

annex.tune.objecthash1=true and annex.tune.branchhash1=true seem like they could be helpful in reducing git annex's inode usage, but the disclaimer about this feature being experimental is a little worrying.

Since it is over 10 years old though, is it still considered experimental, or has it graduated to being a stable feature? I.e., will using it meaningfully increase the chance of losing data?

Also, what is the (potential) benefit of using lowercase for the hashes?
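For readers landing here: tunings are a repository-initialization setting, not an ordinary config you can flip later. A typical invocation (the repository name and description are illustrative) would look like:

```shell
# Tunings can only be chosen when a repository is first initialized;
# passing them with -c lets git-annex record them for this new repo.
git init tuned-repo && cd tuned-repo
git -c annex.tune.objecthash1=true \
    -c annex.tune.branchhash1=true \
    annex init 'tuned repo'
```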

Comment by imlew Wed Mar 6 12:26:56 2024