Enhance GetComponents functionality when dealing with git branches

Issue #248 closed
Bruno Mundim created an issue

When using git it is very common to develop new code or functionality in separate branches. Sometimes these branches are also maintained in a central repository (in addition to the user's local repository). It would be nice if GetComponents could handle the checkout and update of all these branches in the same Cactus tree. Currently we have to either check out these branches manually (a bit inconvenient) or keep several Cactus trees, each containing the branch that we want to work with (quite impractical and expensive). I suggest that GetComponents understand the following lines written in the same thorn list file and check out all the appropriate branches:

    !CRL_VERSION = 1.0

    !DEFINE ROOT = Cactus
    !DEFINE ARR = $ROOT/arrangements

    # Checkout the master branch:
    !TARGET = $ARR
    !TYPE = git
    !URL = ssh://path/to/repo
    !CHECKOUT = arrangement1/thorn1 arrangement1/thorn2

    # Checkout the development branch:
    !TARGET = $ARR
    !TYPE = git
    !URL = ssh://path/to/repo
    !REPO_BRANCH = development
    !CHECKOUT = arrangement1/thorn1 arrangement1/thorn2

    # Checkout the experimental branch:
    !TARGET = $ARR
    !TYPE = git
    !URL = ssh://path/to/repo
    !REPO_BRANCH = experimental
    !CHECKOUT = arrangement1/thorn1 arrangement1/thorn2

    # Checkout the test branch:
    !TARGET = $ARR
    !TYPE = git
    !URL = ssh://path/to/repo
    !REPO_BRANCH = test
    !CHECKOUT = arrangement1/thorn1 arrangement1/thorn2

As far as I understand, this workflow is not common with either cvs or svn, so the "feature" proposed here would be limited to git repositories. (I know nothing about darcs and very little about hg, so someone else may want to comment on these repository types.)

Thanks, Bruno.

Keyword: git
Keyword: branches

Comments (17)

  1. Eric Seidel
    • changed status to open

    What would the resulting directory structure look like? Do you want separate directories for each branch, or rather just be able to cd to arrangement1/thorn2 and `git checkout experimental` without having to fetch the branch yourself? The latter would likely be easier to implement, as well as more in tune with standard git practices as I understand them.
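The second option needs no extra download step. An ordinary `git clone` already brings in every remote branch as a remote-tracking branch (`origin/<name>`). When exactly one remote has a branch of a given name, `git checkout <name>` creates a matching local branch. The following is a minimal shell sketch. It uses a throwaway local repository (`$work/central`) as a stand-in for `ssh://path/to/repo`, and the paths and the `experimental` branch are illustrative only:

```shell
# Throwaway identity so the demo commits work anywhere
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the central repository, with master and experimental branches
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" commit -q --allow-empty -m "initial commit"
git -C "$work/central" branch experimental

# An ordinary clone, with master checked out
git clone -q "$work/central" "$work/thorn2"

# Every remote branch is already present as origin/<name>
git -C "$work/thorn2" branch -r

# No separate fetch needed: this creates a local "experimental"
# branch that tracks origin/experimental, and switches to it
git -C "$work/thorn2" checkout -q experimental
git -C "$work/thorn2" rev-parse --abbrev-ref HEAD
```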

  2. Bruno Mundim reporter

    No separate directories for each branch. Following the usual git practices would be great. So your second option is the best one.

    Thanks, Bruno.

  3. Erik Schnetter

    You specify different branches for the same directory (arrangement/thorn). Which branch should be checked out?

    It would be easy to pull (download) all branches into the repository, but only one branch can be checked out.

    If all available branches are pulled, but only one branch is checked out, then you can still switch between the branches manually. When updating or examining the status, GetComponents could ignore the branch specified in the thorn list and instead look at the branch currently checked out. This is somewhat dangerous (since some people will forget that they switched to a branch or a tag), but a message may suffice for this.
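The warning Erik mentions could be as simple as comparing the branch named in the thorn list with the branch that is actually checked out. In this sketch, `warn_if_switched` is a hypothetical helper (not part of GetComponents), and the repository is a throwaway stand-in. Because `git rev-parse --abbrev-ref HEAD` prints `HEAD` when a tag is checked out, the same check also catches someone who switched to a tag:

```shell
# Hypothetical helper (not part of GetComponents): warn when the checked-out
# branch differs from the one in the thorn list; returns 1 in that case.
warn_if_switched() {
    wis_repo=$1
    wis_listed=$2
    # Prints the branch name, or "HEAD" when a tag or commit is checked out
    wis_current=$(git -C "$wis_repo" rev-parse --abbrev-ref HEAD)
    if [ "$wis_current" != "$wis_listed" ]; then
        echo "Note: $wis_repo has '$wis_current' checked out, the thorn list says '$wis_listed'" >&2
        return 1
    fi
    return 0
}

# Demo on a throwaway repository
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q "$work/thorn1"
git -C "$work/thorn1" symbolic-ref HEAD refs/heads/master
git -C "$work/thorn1" commit -q --allow-empty -m "initial commit"
git -C "$work/thorn1" checkout -q -b experimental   # the user switched by hand

warn_if_switched "$work/thorn1" master || echo "update would use the checked-out branch"
```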

    svn supports the same mechanism; we could treat it in the same way, i.e. check out all branches and then set symbolic links to the desired branch or tag. (The terminology differs here: "check out" means different things for git and svn.)

  4. Eric Seidel

    I am going to propose a slightly more compact syntax. We only need to specify the repo once; multiple branches can be separated by a comma.

        # Checkout a few branches:
        !TARGET = $ARR
        !TYPE = git
        !URL = ssh://path/to/repo
        !REPO_BRANCH = master, stable, experimental # etc.
        !CHECKOUT = arrangement1/thorn1 arrangement1/thorn2

    This would set up the local repo to track the master, stable, and experimental branches, with the master branch being the one initially checked out. To default to the stable branch, one would simply swap the order of master and stable. In addition, if you really only wanted a specific branch (e.g. stable), you could specify stable by itself; the repo would then be cloned with only the stable branch. I imagine this could speed up the process somewhat for large git repos.

  5. Bruno Mundim reporter

    > You specify different branches for the same directory (arrangement/thorn). Which branch should be checked out?

    The way I proposed earlier, the last branch in your thorn list would be the last branch to be checked out. However, I think Eric's suggestion is even better and quite economical. To make Eric's idea simpler to implement, I would keep the order of the list as the order of checkout. For example:

    !REPO_BRANCH = master, stable, experimental

    would default to experimental.
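The two proposals differ only in which entry of the comma-separated `!REPO_BRANCH` value ends up checked out. The following POSIX shell sketch shows the parsing. `split_branches` is a hypothetical helper, not GetComponents code:

```shell
# Hypothetical helper: split a !REPO_BRANCH value such as
# "master, stable, experimental" into one trimmed branch name per line
split_branches() {
    printf '%s\n' "$1" | tr ',' '\n' | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -v '^$'
}

value="master, stable, experimental"
split_branches "$value"               # all branches to set up
split_branches "$value" | head -n 1   # Eric's rule: first entry is checked out
split_branches "$value" | tail -n 1   # Bruno's rule: last entry is checked out
```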

    > It would be easy to pull (download) all branches into the repository, but only one branch can be checked out.

    That's true. It would be up to the user to decide which branch to use for her configuration build. Switching branches is no more inconvenient than changing directories. I don't think GetComponents should be of any help in this case.

    > When updating or examining the status, GetComponents could ignore the branch specified in the thorn list and instead look at the branch currently checked out.

    GetComponents should update all branches. Do you see any reason not to? When examining the status, I think it should list the status of all branches too, so that the user can make an informed decision whether to update or not.
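A status listing for all branches could be built on `git for-each-ref`. Its `%(upstream:track)` format atom reports how far each local branch is ahead of or behind its upstream, as of the last fetch. In this sketch, `branch_status` is a hypothetical helper and the repositories are throwaway stand-ins:

```shell
# Hypothetical helper: one line per local branch with its upstream and
# an "[ahead N, behind M]" note, based on the last fetch
branch_status() {
    git -C "$1" for-each-ref \
        --format='%(refname:short) %(upstream:short) %(upstream:track)' refs/heads/
}

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in central repository, and a clone tracking two branches
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" commit -q --allow-empty -m "initial commit"
git -C "$work/central" branch experimental
git clone -q "$work/central" "$work/thorn1"
git -C "$work/thorn1" branch -q --track experimental origin/experimental

# Someone commits to experimental in the central repository
git -C "$work/central" checkout -q experimental
git -C "$work/central" commit -q --allow-empty -m "new feature"
git -C "$work/central" checkout -q master

# fetch updates origin/* only; the local branches stay where they were
git -C "$work/thorn1" fetch -q origin
branch_status "$work/thorn1"
```

Here `experimental` is reported as `[behind 1]`, while `master` carries no note because it is in sync.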

    > svn supports the same mechanism...

    How do you envision this change for svn? The problem with svn is that the branches live in different directory paths, so whenever you check out a different branch you would have to redo all the symlinks, and that would have to be implemented in GetComponents. Other difficulties may arise in that case, since people have private branches that won't be present in the public version. For example, if you want to switch from development to experimental in your private thorn list, and from the release to trunk in the ET thorn list, how would you specify that using GetComponents? I think it complicates things, despite being a nice feature to have.

    Anyway, I like Eric's new syntax proposal and would vote for it. I would hold off on any changes on the svn side until we have a better picture of what they would imply.

  6. Eric Seidel

    Actually, I'm going to change my suggestion a bit. As far as I can tell, there's really no overhead to setting up all branches for remote tracking. It seems that all the necessary data is cloned even if a shallow clone is performed; at least that is the case with the CRL repo. In that case, why not just set up all branches? That would be significantly easier than my previous suggestion, since it wouldn't interfere with the logic I already have to determine whether !REPO_BRANCH is actually a tag.

  7. Bruno Mundim reporter

    What do you mean by "setup all branches"? Not specifying the branch names in !REPO_BRANCH? Please give us an example.

  8. Ian Hinder

    Do you mean that local branches should be created for all remote branches? There is a --mirror option to git clone which does this, but it only works for bare repositories. I seem to remember some advice which said that cloning all remote branches by default, while it sounds like a good idea, is in practice more trouble than it is worth, because you have to keep them all up-to-date. I'm not sure of the details, but maybe google a bit to read up on it.
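The `--mirror` limitation is easy to see in practice. `git clone --mirror` implies `--bare`, so every remote branch becomes a local branch, but there is no working tree in which thorns could be built. The following sketch runs on a throwaway repository (paths are illustrative):

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in remote with two branches
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" commit -q --allow-empty -m "initial commit"
git -C "$work/central" branch experimental

# Mirror clone: all branches become local branches, but the result is bare
git clone -q --mirror "$work/central" "$work/mirror.git"
git -C "$work/mirror.git" rev-parse --is-bare-repository   # prints "true"
git -C "$work/mirror.git" branch                           # experimental and master

# Ordinary clone: a working tree, one local branch, the rest as origin/*
git clone -q "$work/central" "$work/normal"
git -C "$work/normal" branch                               # only master
```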

    I think users of GetComponents would expect git repositories to behave as "standardly" as possible, so it might not be a good idea to do too much "extra" work. Cloning the repository pulls in all data, and we can select the default branch to check out in the CRL file. If a user wants to check out another branch, they can use the appropriate git command to do that.

    The way that I usually work with multiple branches is to have a different arrangement for each branch. I would have two separate repositories, each with one of the branches checked out, and the symbolic links in the arrangements directory would point to the appropriate thorns. So for example, for the Kranc Examples arrangement, I have

        KrancExamples -> ../repos/Kranc/Examples
        KrancExamplesVector -> ../repos/KrancVector/Examples

    I don't have this configured in my thornlist, I do it by hand. But I think it would be possible to have this work in a thornlist if you could specify the arrangement name and the branch to check out, and just have two entries for the arrangement.
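For comparison, the manual layout described above boils down to one clone per branch plus one symbolic link per arrangement. The following sketch runs on a throwaway repository. The `repos/` and `arrangements/` directories follow Ian's example, while the stand-in repository and the `vector` branch name are hypothetical:

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical stand-in for the Kranc repository: an Examples directory
# whose content differs between master and a "vector" branch
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
mkdir "$work/upstream/Examples"
echo "master" > "$work/upstream/Examples/README"
git -C "$work/upstream" add Examples
git -C "$work/upstream" commit -q -m "initial commit"
git -C "$work/upstream" checkout -q -b vector
echo "vector" > "$work/upstream/Examples/README"
git -C "$work/upstream" commit -q -am "vector version"
git -C "$work/upstream" checkout -q master

# One clone per branch
mkdir -p "$work/Cactus/repos" "$work/Cactus/arrangements"
git clone -q "$work/upstream" "$work/Cactus/repos/Kranc"
git clone -q --branch vector "$work/upstream" "$work/Cactus/repos/KrancVector"

# One arrangement symlink per clone, relative to the arrangements directory
ln -s ../repos/Kranc/Examples "$work/Cactus/arrangements/KrancExamples"
ln -s ../repos/KrancVector/Examples "$work/Cactus/arrangements/KrancExamplesVector"
```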

  9. Eric Seidel

    Replying to [comment:8 hinder]:

    > Do you mean that local branches should be created for all remote branches? There is a --mirror option to git clone which does this, but it only works for bare repositories. I seem to remember some advice which said that cloning all remote branches by default, while it sounds like a good idea, is in practice more trouble than it is worth, because you have to keep them all up-to-date. I'm not sure of the details, but maybe google a bit to read up on it.

    I see. I guess part of the reasoning is that git is designed for easy branching, so it's quite possible that a repository could have a very large number of branches. That could become a bit unwieldy, I suppose.

    > I think users of GetComponents would expect git repositories to behave as "standardly" as possible, so it might not be a good idea to do too much "extra" work. Cloning the repository pulls in all data, and we can select the default branch to check out in the CRL file. If a user wants to check out another branch, they can use the appropriate git command to do that.

    Right, so I'll maintain the current behavior of doing a regular clone with the master branch, and then allow users to specify additional branches to track. This will sidestep the issue I mentioned in my last post. !REPO_BRANCH is serving as both a branch and a tag indicator for git, since I didn't want to add yet another directive, but I can't determine whether I'm dealing with a branch or a tag until the repository has been cloned.
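Because `!REPO_BRANCH` doubles as a branch and tag selector, one way to classify its value once the clone exists is to ask git which kind of ref it names. The sketch below uses a hypothetical helper, `ref_kind`, and GetComponents' actual logic may differ:

```shell
# Hypothetical helper: classify a !REPO_BRANCH value once the clone exists.
# Checks remote branches first, then tags; prints "branch", "tag" or "unknown".
ref_kind() {
    if git -C "$1" show-ref --verify --quiet "refs/remotes/origin/$2"; then
        echo branch
    elif git -C "$1" show-ref --verify --quiet "refs/tags/$2"; then
        echo tag
    else
        echo unknown
    fi
}

export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the CRL repository, with a stable branch and a release tag
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" commit -q --allow-empty -m "initial commit"
git -C "$work/central" branch stable
git -C "$work/central" tag ET_2010_11
git clone -q "$work/central" "$work/CRL"

ref_kind "$work/CRL" stable       # prints: branch
ref_kind "$work/CRL" ET_2010_11   # prints: tag
```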

    > The way that I usually work with multiple branches is to have a different arrangement for each branch. I would have two separate repositories, each with one of the branches checked out, and the symbolic links in the arrangements directory would point to the appropriate thorns. So for example, for the Kranc Examples arrangement, I have
    >
    >     KrancExamples -> ../repos/Kranc/Examples
    >     KrancExamplesVector -> ../repos/KrancVector/Examples
    >
    > I don't have this configured in my thornlist, I do it by hand. But I think it would be possible to have this work in a thornlist if you could specify the arrangement name and the branch to check out, and just have two entries for the arrangement.

    Currently, this is not possible because GetComponents will automatically clone the repository using the same name; however, we could use the !NAME directive to specify an alternative name for the clone. This would leave us with the following syntax:

        # this will setup stable and experimental as remote-tracking branches,
        # and then checkout the ET_2010_11 tagged commit
        !TARGET = $ROOT/bin
        !TYPE = git
        !URL = git://github.com/gridaphobe/CRL.git
        !REPO_BRANCH = stable, experimental, ET_2010_11
        !CHECKOUT = GetComponents

        # this will clone into CRL-experimental, and checkout the experimental branch
        !TARGET = $ROOT/bin
        !TYPE = git
        !URL = git://github.com/gridaphobe/CRL.git
        !REPO_BRANCH = experimental
        !CHECKOUT = GetComponents
        !NAME = CRL-experimental

    How does that sound?

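For reference, here is a rough translation of the two proposed entries into plain git commands. This is a hand-written sketch against a local stand-in for `git://github.com/gridaphobe/CRL.git`, not the commands GetComponents actually runs. It reads "remote-tracking branches" as local branches that track `origin/<name>`:

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for git://github.com/gridaphobe/CRL.git
git init -q "$work/CRL-remote"
git -C "$work/CRL-remote" symbolic-ref HEAD refs/heads/master
git -C "$work/CRL-remote" commit -q --allow-empty -m "initial commit"
git -C "$work/CRL-remote" branch stable
git -C "$work/CRL-remote" branch experimental
git -C "$work/CRL-remote" tag ET_2010_11
mkdir "$work/bin"

# Entry 1: !REPO_BRANCH = stable, experimental, ET_2010_11
git clone -q "$work/CRL-remote" "$work/bin/CRL"
for b in stable experimental; do
    # local branch tracking origin/$b, without switching to it
    git -C "$work/bin/CRL" branch -q --track "$b" "origin/$b"
done
# The last entry is a tag, so it is checked out as a detached HEAD
git -C "$work/bin/CRL" checkout -q ET_2010_11

# Entry 2: !REPO_BRANCH = experimental with !NAME = CRL-experimental
git clone -q --branch experimental "$work/CRL-remote" "$work/bin/CRL-experimental"
```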
  10. Erik Schnetter

    This begins to become quite complex. First, there is the option to pull multiple branches into the same local repository, and then there is the option to pull the same branch into multiple local repositories. I don't think we need both.

    What about just pulling all remote branches into the local repository? This should work well for all the repositories we use, and if and when there is a problem with this, we can think about remedying it, but we should not begin to design something around a problem that doesn't exist for us. That means there is a single local repository, and people can quickly switch (using git checkout). We would not have to handle lists of branches.

  11. Bruno Mundim reporter

    Replying to [comment:8 hinder]:

    > Do you mean that local branches should be created for all remote branches? There is a --mirror option to git clone which does this, but it only works for bare repositories. I seem to remember some advice which said that cloning all remote branches by default, while it sounds like a good idea, is in practice more trouble than it is worth, because you have to keep them all up-to-date. I'm not sure of the details, but maybe google a bit to read up on it.

    All remote branches as specified in the !REPO_BRANCH directive of the CRL file. I already do that with a set of shell scripts for each of the private git repositories I work with. It is simply a loop over the list of branch names, running the following command:

        git checkout --track -b $branchname origin/$branchname

    I thought this information could be stored in the CRL file, and GetComponents could handle the checkout of these different branches. The order, or which branch should be the default, really doesn't matter that much to me; I would then check out the branch I want to use for building my new configuration, for example. Regarding the --mirror option, I know nothing about it. Unless someone really understands it, I would stick with the command I suggested above.

    Why do you think it is trouble to update them all? The list is already in the CRL file; all GetComponents has to do is loop over it with git pull --rebase or a similar command.
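Bruno's setup script and the update loop he describes might look roughly like the following. The branch list stands in for the `!REPO_BRANCH` entries, and the repositories are throwaway stand-ins for `ssh://path/to/repo`. Because `git pull --rebase` acts on the checked-out branch, the update loop switches to each listed branch in turn and returns to the original branch at the end:

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for ssh://path/to/repo, with the branches from the thorn list
git init -q "$work/central"
git -C "$work/central" symbolic-ref HEAD refs/heads/master
git -C "$work/central" commit -q --allow-empty -m "initial commit"
for b in development experimental test; do git -C "$work/central" branch "$b"; done
git clone -q "$work/central" "$work/repo"

branches="development experimental test"   # e.g. taken from !REPO_BRANCH

# Setup: the command from the comment, once per listed branch
for branchname in $branches; do
    git -C "$work/repo" checkout -q --track -b "$branchname" "origin/$branchname"
done
git -C "$work/repo" checkout -q master

# Meanwhile a new commit lands on the central development branch
git -C "$work/central" checkout -q development
git -C "$work/central" commit -q --allow-empty -m "new work"
git -C "$work/central" checkout -q master

# Update: rebase each listed branch onto its upstream, then switch back
start=$(git -C "$work/repo" rev-parse --abbrev-ref HEAD)
for branchname in $branches; do
    git -C "$work/repo" checkout -q "$branchname"
    git -C "$work/repo" pull -q --rebase
done
git -C "$work/repo" checkout -q "$start"
```

If a listed branch has local commits that conflict with its upstream, `git pull --rebase` stops and leaves the rebase for the user to resolve. A real implementation would need to check for that before moving on to the next branch.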

    > I think users of GetComponents would expect git repositories to behave as "standardly" as possible, so it might not be a good idea to do too much "extra" work. Cloning the repository pulls in all data, and we can select the default branch to check out in the CRL file. If a user wants to check out another branch, they can use the appropriate git command to do that.

    Git has powerful tools that allow easy creation, merging and switching of branches. Indeed, developing code in separate branches is good practice and even encouraged, so working with branches is the standard practice when using git as a revision control tool. From the point of view of a novice, my suggestion to enhance GetComponents would not change anything. Such a user would certainly start with the ET thorn lists, either the development or the release branch. The development version defaults to the master branch, and it is not even necessary to list it in the CRL file (as is already the case). The release branch would be referred to exactly as it is today: !REPO_BRANCH = ET_20XX_XX. So no confusion is added to what is already established. On the other hand, my suggestion helps users who end up working with several different git repositories, each with different branch names. Sure, this case is not part of ET, but it is a situation that users like me already face, and it has a very simple solution.

    > The way that I usually work with multiple branches is to have a different arrangement for each branch. I would have two separate repositories, each with one of the branches checked out, and the symbolic links in the arrangements directory would point to the appropriate thorns. So for example, for the Kranc Examples arrangement, I have
    >
    >     KrancExamples -> ../repos/Kranc/Examples
    >     KrancExamplesVector -> ../repos/KrancVector/Examples
    >
    > I don't have this configured in my thornlist, I do it by hand. But I think it would be possible to have this work in a thornlist if you could specify the arrangement name and the branch to check out, and just have two entries for the arrangement.

    Now add 5 different git repos with 5 different branches each, and your way of dealing with branches becomes impractical. Besides, you have unnecessary duplication of git repositories! If you are using svn, then yes, you have no choice but to do something like what you suggested, but git provides you with tools to make your life easier when dealing with branches. You don't need to make multiple repo copies, you don't need to change directories to inspect another branch, and in fact you don't need to type the different branch paths when diffing them. A simple git checkout branchname and that's it. So I would not recommend dealing with git branches the way you suggested. It would subvert git philosophy and best practices :)

  12. Bruno Mundim reporter

    Replying to [comment:9 eric9]:

    > Replying to [comment:8 hinder]:
    >
    > > Do you mean that local branches should be created for all remote branches? There is a --mirror option to git clone which does this, but it only works for bare repositories. I seem to remember some advice which said that cloning all remote branches by default, while it sounds like a good idea, is in practice more trouble than it is worth, because you have to keep them all up-to-date. I'm not sure of the details, but maybe google a bit to read up on it.
    >
    > I see. I guess part of the reasoning is that git is designed for easy branching, so it's quite possible that a repository could have a very large number of branches. That could become a bit unwieldy I suppose.
    >
    > > I think users of GetComponents would expect git repositories to behave as "standardly" as possible, so it might not be a good idea to do too much "extra" work. Cloning the repository pulls in all data, and we can select the default branch to check out in the CRL file. If a user wants to check out another branch, they can use the appropriate git command to do that.
    >
    > Right, so I'll maintain the current behavior of doing a regular clone with the master branch, and then allow users to specify additional branches to track. This will sidestep the issue I mentioned in my last post. !REPO_BRANCH is serving as a branch and tag indicator for git since I didn't want to add yet another directive, but I can't determine if I'm dealing with a branch or a tag until the repository has been cloned.

    That sounds good to me!

    > > The way that I usually work with multiple branches is to have a different arrangement for each branch. I would have two separate repositories, each with one of the branches checked out, and the symbolic links in the arrangements directory would point to the appropriate thorns. So for example, for the Kranc Examples arrangement, I have
    > >
    > >     KrancExamples -> ../repos/Kranc/Examples
    > >     KrancExamplesVector -> ../repos/KrancVector/Examples
    > >
    > > I don't have this configured in my thornlist, I do it by hand. But I think it would be possible to have this work in a thornlist if you could specify the arrangement name and the branch to check out, and just have two entries for the arrangement.
    >
    > Currently, this is not possible because GetComponents will automatically clone the repository using the same name; however, we could use the !NAME directive to specify an alternative name for the clone. This would leave us with the following syntax

    >     # this will setup stable and experimental as remote-tracking branches,
    >     # and then checkout the ET_2010_11 tagged commit
    >     !TARGET = $ROOT/bin
    >     !TYPE = git
    >     !URL = git://github.com/gridaphobe/CRL.git
    >     !REPO_BRANCH = stable, experimental, ET_2010_11
    >     !CHECKOUT = GetComponents
    >
    >     # this will clone into CRL-experimental, and checkout the experimental branch
    >     !TARGET = $ROOT/bin
    >     !TYPE = git
    >     !URL = git://github.com/gridaphobe/CRL.git
    >     !REPO_BRANCH = experimental
    >     !CHECKOUT = GetComponents
    >     !NAME = CRL-experimental
    >
    > How does that sound?

    The first set of directives is fine for me.

    Unless Ian has a strong opinion about it, I would suggest not implementing this last set of directives. I don't think it is a good idea to have multiple copies of git repos, each one corresponding to a different branch. I would stick with the standard way of using git: one repo with several different branches pulled from the remote repo.

  13. Bruno Mundim reporter

    Replying to [comment:10 eschnett]:

    > This begins to become quite complex. First, there is the option to pull multiple branches into the same local repository, and then there is the option to pull the same branch into multiple local repositories. I don't think we need both.

    Agreed.

    > What about just pulling all remote branches into the local repository? This should work well for all the repositories we use, and if and when there is a problem with this, we can think about remedying it, but we should not begin to design something around a problem that doesn't exist for us. That means there is a single local repository, and people can quickly switch (using git checkout).

    Agreed.

    > We would not have to handle lists of branches.

    What is the difficulty here? How would GetComponents check out the different branches without knowing their names in advance? Does the --mirror option do this job? Why not let the user specify this list in the CRL file? It seems much easier to me, and it is part of the repository information anyway.

  14. Eric Seidel

    It seems that the general wisdom is that one should not set up local branches unless one is actually planning to work on that branch. The problem is that the local branches will fall behind and become stale, which I believe is what Ian was saying. This means that if we created local branches to track each remote branch, when users run `GetComponents --update` I would essentially have to do the following:

        for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do git checkout "$branch" && git pull --rebase; done

    This is certainly a possibility, but it involves a lot of extra work and could possibly lead to conflicts. Therefore I would suggest simply setting up remote-tracking branches for the branches that the user requests, like I said in my last post.

  15. Bruno Mundim reporter

    Replying to [comment:14 eric9]:

    > It seems that the general wisdom is that one should not set up local branches unless you are actually planning to work on that branch.

    That seems good to me.

    > The problem is that the local branches will fall behind and become stale, which I believe is what Ian was saying.

    Isn't that up to the user? I think it is the user's responsibility either to merge their branches with the new features into master, or to keep the different branches up to date with the changes committed to the master branch. This can easily be done by cherry-picking the relevant commits.
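Keeping a feature branch current by cherry-picking takes one git command per commit. The following is a throwaway illustration in which the branch, file and commit message are hypothetical:

```shell
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Throwaway repository with master and an experimental branch
git init -q "$work/repo"
git -C "$work/repo" symbolic-ref HEAD refs/heads/master
git -C "$work/repo" commit -q --allow-empty -m "initial commit"
git -C "$work/repo" branch experimental

# A fix lands on master
echo "fix" > "$work/repo/fix.txt"
git -C "$work/repo" add fix.txt
git -C "$work/repo" commit -q -m "bug fix on master"
fix=$(git -C "$work/repo" rev-parse HEAD)

# Bring just that commit onto the experimental branch
git -C "$work/repo" checkout -q experimental
git -C "$work/repo" cherry-pick "$fix"
```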

    > This means that if we created local branches to track each remote branch, when users run `GetComponents --update` I would have to essentially do the following
    >
    >     for branch in $(git for-each-ref --format='%(refname:short)' refs/heads/); do git checkout "$branch" && git pull --rebase; done

    > This is certainly a possibility, but it involves a lot of extra work and could possibly lead to conflicts.

    Conflicts will be unavoidable even for the master branch if some sort of management or policy is not enforced, and that is true for any revision control tool. So I am not sure that what you suggest above would make things worse. However, I would prefer to have a $git_list instead of executing git branch, so that users have more control over the branches they want to track (and update).

    > Therefore I would suggest simply setting up remote-tracking branches for the branches that the user requests, like I said in my last post.

    Oh, OK. Disregard my comments above, then. We both agree on this issue.

  16. Eric Seidel
    • changed status to resolved

    >     # this will setup stable and experimental as remote-tracking branches,
    >     # and then checkout the ET_2010_11 tagged commit
    >     !TARGET = $ROOT/bin
    >     !TYPE = git
    >     !URL = git://github.com/gridaphobe/CRL.git
    >     !REPO_BRANCH = stable, experimental, ET_2010_11
    >     !CHECKOUT = GetComponents
    >
    >     # this will clone into CRL-experimental, and checkout the experimental branch
    >     !TARGET = $ROOT/bin
    >     !TYPE = git
    >     !URL = git://github.com/gridaphobe/CRL.git
    >     !REPO_BRANCH = experimental
    >     !CHECKOUT = GetComponents
    >     !NAME = CRL-experimental

    Last night I implemented this functionality. As Bruno said, the second form doesn't follow standard git practices, so I would discourage using it like that. However, there could be cases where renaming the repository would be useful, e.g. someone has their own git repo called CRL, which is unrelated to GetComponents.
