GetComponents should not use shallow clones for Git checkouts

Issue #942 closed
Ian Hinder created an issue

GetComponents currently uses a "shallow clone" for Git repositories by default. This checks out only the repository information needed for the current version.

Problems with shallow clones:

  • It is impossible to switch branches to another version after the initial clone. Since GetComponents checks out release branches using "git clone ...; git checkout ...", this means that release branches cannot be checked out using the default GetComponents options (see #934). This has to be fixed.

  • You cannot push to or pull from a shallow clone. In fact, you can do very little with a shallow clone that you couldn't also do with source obtained from a tarball.
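
The branch problem above can be illustrated with the Git command lines involved. This is only a minimal Python sketch, not the GetComponents code itself; the function name, URL and branch name are hypothetical. It assumes a Git version in which `--depth` implies `--single-branch`, so a shallow clone fetches only the default branch unless a branch is named with `--branch` at clone time.

```python
from typing import List, Optional


def git_checkout_commands(
    url: str,
    target_dir: str,
    branch: Optional[str] = None,
    shallow: bool = False,
) -> List[List[str]]:
    """Return the git command lines needed to clone `url` and select `branch`.

    Hypothetical helper for illustration; it only builds the argument
    lists and does not run git.
    """
    clone = ["git", "clone"]
    if shallow:
        # --depth 1 truncates the history to a single commit.
        clone += ["--depth", "1"]
        if branch is not None:
            # Assumption: with --depth, other branches are not fetched,
            # so the branch must be requested at clone time.
            clone += ["--branch", branch]
    clone += [url, target_dir]

    commands = [clone]
    if branch is not None and not shallow:
        # Full clone: every remote branch is available, so the
        # "git clone ...; git checkout ..." sequence works.
        # (git -C <dir> requires Git 1.8.5 or later.)
        commands.append(["git", "-C", target_dir, "checkout", branch])
    return commands


if __name__ == "__main__":
    for shallow in (False, True):
        for cmd in git_checkout_commands(
            "https://example.org/repo.git", "repo",
            branch="release-branch", shallow=shallow,
        ):
            print(" ".join(cmd))
```

With `shallow=False` the branch is selected by a separate checkout after a full clone, which is the sequence GetComponents is described as using. With `shallow=True` the branch has to be known when the clone is made.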

Benefits of shallow clones:

  • You save a small amount of space and checkout time due to not including the (compressed) version history, which was the original rationale for using them (see #148). The checkout size of Carpet was measured to increase from 73 MB to 110 MB.

There was discussion in #148 amounting to the idea of providing a two-tier Einstein Toolkit. The "developer tier" would be interested in full clones and authenticated repositories, and the "user tier" would be interested in shallow clones and non-authenticated (incorrectly conflated with "public") repositories. I strongly dislike this idea, and agree with the comments in that ticket which said that nearly all users of Cactus are also developers, and should be treated in the same way. Let's keep things simple and egalitarian.

I do not consider the space saving to be significant, even if the Carpet figure were representative of the ET as a whole, which it is not.

Shallow clones are nonstandard and lead to problems and confusion. I propose removing support for shallow clones from GetComponents. If there are truly users of the ET who do not want to interact with version control systems at all, then we can accommodate these users by providing release tarballs, which will be much smaller, easier and faster to download.

Comments (9)

  1. Barry Wardell

    I think this is a good idea. I don't see any advantage to shallow clones which is significant enough to outweigh the many disadvantages.

    The only case where I can see shallow clones making a significant difference is in the ExternalLibraries, where a new tarball is committed each time the library is updated. I can imagine this leading to very large repositories in the future, but if that does happen then we can cross that bridge when we get to it (for example, by not including the actual tarballs in the repository and having some mechanism for downloading them). As an example, a shallow clone of OpenSSL is 12 MB while a full clone is 42 MB. I would consider this to still be acceptable.

  2. Eric Seidel

    I seem to remember some concern about space limitations in $HOME being raised in the original discussion of shallow clones. If that is not (no longer?) an issue, then I wholeheartedly support removing the shallow-clone default. A full clone of a git repo tends to be smaller than an svn checkout of the same repo anyway.

    Also, the mention of tarballs raises what seems to have been a silly oversight on my part when I first wrote GetComponents. We can download arbitrary files using wget/curl, but since we most often deal with code it would be natural to add support for auto-extracting tarballs into the destination folder.
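
The auto-extraction idea could look roughly like the following. This is a minimal Python sketch under stated assumptions, not part of GetComponents: the function name is hypothetical, and the archive is assumed to have been downloaded already (for example with wget or curl). It only covers unpacking into the destination folder and rejects entries whose path would fall outside that folder.

```python
import os
import tarfile
from typing import List


def extract_tarball(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a (possibly compressed) tar archive into dest_dir.

    Hypothetical helper for illustration. Returns the names of the
    archive members that were extracted.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_root = os.path.realpath(dest_dir)

    # "r:*" lets tarfile detect gzip, bzip2 or xz compression itself.
    with tarfile.open(archive_path, "r:*") as archive:
        members = archive.getmembers()
        for member in members:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            # Refuse entries whose path would land outside the destination.
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest_root, members=members)

    return [member.name for member in members]
```

A call such as `extract_tarball("component.tar.gz", "arrangements/Example")` (both paths hypothetical) would then unpack the downloaded file into the destination folder.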

  3. Frank Löffler

    I seem to remember that one of the arguments besides space was that the time to check out a git repository was reduced quite a bit for a shallow clone compared to a full clone, but I don't have hard numbers now. In addition, the old discussion was about Carpet, and Carpet doesn't use git anymore.

    ExternalLibraries are not really an issue, as 'only' two copies are checked out using svn, not the whole history.

    I agree that obtaining the release using newer git versions should work (older versions are ok, even with shallow clones: something must have changed in git). If 'deep' clones are the only clean way to do that with git we will have to go that way.

  4. Roland Haas

    Replying to [comment:2 eric9]:

    > I seem to remember some concern about space limitations in $HOME being raised in the original discussion of shallow clones. If that is not (no longer?) an issue, then I wholeheartedly support removing the shallow-clone default. A full clone of a git repo tends to be smaller than an svn checkout of the same repo anyway.
    >
    > Also, the mention of tarballs raises what seems to have been a silly oversight on my part when I first wrote GetComponents. We can download arbitrary files using wget/curl, but since we most often deal with code it would be natural to add support for auto-extracting tarballs into the destination folder.

    I seem to be usually the one arguing for small sizes due to HOME directory restrictions (usually on Kraken, since I have to keep my codes there because it is (or used to be) the only folder accessible to me that is not purged). I don't remember having said anything about shallow clones, since I would never actually check the code out on Kraken but would rather use sim sync to transfer it, which does not (or can be configured to not) include the repository metadata.

    I'd welcome making --noshallow the default (I don't mind whether or not --shallow stays as an option, though if removing it simplifies the code it should be removed). This would also solve occasional problems where I could not check out from bitbucket (but have not been able to really pinpoint what is causing them).

  5. Erik Schnetter

    The option --shallow does not work at the moment, period. It therefore should not be the default.

    Any discussion about possibly saving space in the home directory is moot since the option cannot be used anyway.

  6. Frank Löffler

    --shallow only seems to have problems with a really new version of git. It seems to work for most users (because they are not using such a new version yet). However, I agree that there is enough reason to make --noshallow the default and try to fix --shallow later (or remove it). Please go ahead.

  7. Roland Haas
    • changed status to open

    I attach a patch to make full clones the default (again). Note that this reverts the decision in #148. I will have to apply for write permission before committing it.
