Improve determining make dependencies

Issue #1522 new
Erik Schnetter created an issue

Cactus automatically determines the dependencies between source files and their include files. This mechanism doesn't quite work in all cases; e.g. when a source file (or header file?) is removed, the build aborts, and the auto-generated dependencies have to be deleted manually.

I have recently come across a mechanism that works reliably. Here is the relevant Makefile fragment, applied to building C++ files:

# Taken from <http://mad-scientist.net/make/autodep.html> as written
# by Paul D. Smith <psmith@gnu.org>, originally developed by Tom
# Tromey <tromey@cygnus.com>
PROCESS_DEPENDENCIES =                  \
    sed -e 's/$@.tmp/$@/g' < $*.o.d > $*.d &&   \
    sed -e 's/\#.*//'               \
        -e 's/^[^:]*: *//'          \
        -e 's/ *\\$$//'             \
        -e '/^$$/ d'                \
        -e 's/$$/ :/' < $*.o.d >> $*.d &&   \
    rm -f $*.o.d

%.o: %.cc
    ${CXX} -MD ${CPPFLAGS} ${CXXFLAGS} -o $@.tmp -c $*.cc
    @${PROCESS_DEPENDENCIES}
    @mv $@.tmp $@

-include ${DEPS}

Comments (16)

  1. Steven R. Brandt

    This confuses me a little bit. Why do you replace all the line continuations with colons in the sed script? Shouldn't they remain line continuations?

  2. Roland Haas

    This seems to combine dependency generation and compilation of the source code. Is that right?

    While I am aware that this is considered the correct thing to do, with other codes that also do this I found that occasionally I would end up with corrupted dependency files (e.g. after a change in the Makefile or in the compile options) that would invalidate some of the files in the dependency list. On that project the only fix is a make clean followed by a complete recompile, which takes a very long time. It would be nice if the cleandeps target still existed and if dependencies could be quickly regenerated.

  3. Erik Schnetter reporter

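    sbrandt: The script doesn't replace the continuations with colons. The first sed command copies the dependencies, continuations included; the second appends (">>") extra rules that list each dependency as a target without prerequisites. This means that files that don't exist any more don't lead to make errors.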

    rhaas: Yes, dependencies are generated when a file changes. If the makefile or the compile options change, then all dependencies are invalid and need to be regenerated. If Cactus doesn't do that automatically, then a make *-cleandeps may be necessary. (I'm not proposing to remove this.)

    There is no need to generate dependencies "quickly". If they are missing, then the respective source file needs to be recompiled anyway. (Otherwise, how would you know whether an existing object file was created from the right dependencies and options?)

    The latter point confused me for a bit. Here is an example: File source.c:

    #if useA
    #  include "a.h"
    #else
    #  include "b.h"
    #endif
    ... code using stuff declared in a.h or b.h ...
    

    If there is a change that makes source.c switch from using a.h to using b.h, then it needs to be recompiled. However, both a.h and b.h may be older than the object file source.o -- so keeping source.o around is an error, as neither the old nor the new dependencies would indicate that source.o is out of date. Hence, re-generating dependencies without recompiling is an error.
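The timestamp argument above can be checked with a small shell sketch. The file names follow the example; the dates and the temporary directory are made up for illustration, and the switch of useA is assumed to come from a compile option, so that no file is touched.

```shell
# Sketch only: empty files with hand-picked timestamps stand in for real sources.
workdir=$(mktemp -d)

# Both headers and the source file are older than the object file.
touch -t 202001010000 "$workdir/a.h" "$workdir/b.h"
touch -t 202001020000 "$workdir/source.c"
touch -t 202001030000 "$workdir/source.o"

# Everything a timestamp comparison could flag as newer than source.o,
# whether the recorded dependency is a.h (old) or b.h (new).
newer=$(find "$workdir" -type f -newer "$workdir/source.o")

if [ -z "$newer" ]; then
    echo "nothing is newer than source.o: make would keep the stale object"
fi
rm -rf "$workdir"
```

Since no file is newer than source.o, regenerating the .d file alone would not cause a rebuild, whichever header it lists.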

  4. Steven R. Brandt

    I created a small copy of this code and tested it. Here's what I got in my tst.d. Note what happens to the appended piece.

     /usr/include/c++/4.8.2/bits/basic_ios.tcc \
     /usr/include/c++/4.8.2/bits/ostream.tcc /usr/include/c++/4.8.2/istream \
     /usr/include/c++/4.8.2/bits/istream.tcc
      ...
    tst.cc /usr/include/stdc-predef.h header.h :
     /usr/include/c++/4.8.2/iostream :
     /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/c++config.h :
     /usr/include/bits/wordsize.h :
    

    I think you want something more like this:

    PROCESS_DEPENDENCIES =                  \
        sed -e 's/$@.tmp/$@/g' < $*.o.d > $*.d &&   \
        sed -e 's/\#.*//'               \
            -e 's/^[^:]*: *//'          \
            -e '/^$$/ d'                \
            -e 's/$<[ ]/$< : /' < $*.o.d >> $*.d
    
    %.o: %.cc
        ${CXX} -MD ${CPPFLAGS} ${CXXFLAGS} -o $@.tmp -c $*.cc
        ${PROCESS_DEPENDENCIES}
        mv $@.tmp $@
    
    -include ${DEPS}
    
  5. Erik Schnetter reporter

    The bottom of the output you show looks correct. However, the top is missing the "tst.o: ..." part -- did you cut off some lines at the top?

    The lines I pasted are from a working Makefile of mine.

  6. Frank Löffler

    I am a bit confused by the whole discussion, but probably just don't know enough about Makefiles. One question: Why isn't a dependency on just the .o file enough? Why should there be a dependency line for the .cc file (assuming it is not generated, of course)? And another question: I agree with Steve in that I don't understand what lines like

    tst.cc /usr/include/stdc-predef.h header.h :
     /usr/include/c++/4.8.2/iostream :
     /usr/include/c++/4.8.2/x86_64-redhat-linux/bits/c++config.h :
     /usr/include/bits/wordsize.h :
    

    should do. They are all dependency lines, but the dependencies (right-hand sides of the ":"s) are empty. In addition, most lines start with a space - why?

  7. Steven R. Brandt

    Erik: Yes, I cut off the top. I just wanted to show the transition between the original and appended parts.

  8. Steven R. Brandt

    Frank has a good point though. Why isn't the code below sufficient?

    %.o: %.cc
        ${CXX} -MD ${CPPFLAGS} ${CXXFLAGS} -c $*.cc
    
    -include ${DEPS}
    
  9. Erik Schnetter reporter

    This simple line does not work if one of the dependencies vanishes. Currently, if you have a header file in a thorn, and then remove the header file because you don't need it any more, the respective .d file still records the dependency. Cactus will then abort with an error stating that it does not know how to generate this header file. A make -cleandeps is required. This is annoying.

    The magic above -- listing the header file without dependencies -- teaches make that this is fine, i.e. that the header file does not need to be generated. The respective source file will be recompiled (which needs to happen), and the correct dependencies will be generated.

    This is actually a long-standing and deep problem with Makefiles. I'm sure there is a discussion on the web regarding this, but I unfortunately don't have a pointer.

  10. Roland Haas

    Replying to comment:3. I do not know right now what the usual situation is that makes me recompile from scratch because of the dependency tracking, so I do not know whether it falls under this point. I'll wait until it happens again and try to verify.

    For the example: rather than recompiling when the dependency file is missing, I would say that this example circumvents make's dependency tracking, since there is a case where the file needs to be recompiled (when useA changes) that make cannot track. So one would instead have to change useA in a file and then add this file to the list of dependencies. This is pretty much the situation we currently have with all the external libraries and their FOO_DIR variables that are set in the Makefile: there is no way of tracking their change (unless we implement #1264 and #1017), so changing them requires either some (private) knowledge of the build system to touch just the right files to trigger recompilation or (shudder) a -realclean and full rebuild.

    We currently do not include the Makefile (or at least not the option list in either config-info or config-data/make.configuration.defn) in the list of dependencies. While including them might strictly be the correct thing, I would rather not do so since it would trigger a full recompile. Changing those files often happens when one wants to add an extra define, change an optimization setting, or change a library path when debugging. Having this trigger a full recompile would be quite disruptive to this process. I think that having to do a make sim-clean after such a change is acceptable in this case.

    Basically, I am saying that having a few (well documented) cases where it fails (but a system that is fast and useful in the more common cases) is preferable to having a system that always works but is always very slow.
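One way to put an option such as useA "in a file" so that make can track it is a stamp file that is rewritten only when the value changes. The following is a minimal sketch under that assumption; record_option and useA.stamp are hypothetical names, not something Cactus provides.

```shell
# Hypothetical helper, not part of Cactus: store an option value in a stamp
# file, rewriting the file only when the value differs from what is recorded.
record_option() {
    stamp=$1
    value=$2
    if [ -f "$stamp" ] && [ "$(cat "$stamp")" = "$value" ]; then
        return 0    # unchanged: the old timestamp is kept
    fi
    printf '%s\n' "$value" > "$stamp"
}

workdir=$(mktemp -d)

# Pretend the option was recorded long ago and the object was built afterwards.
record_option "$workdir/useA.stamp" "useA=1"
touch -t 202001010000 "$workdir/useA.stamp"
touch -t 202001020000 "$workdir/source.o"

# Same value again: the stamp stays older than source.o.
record_option "$workdir/useA.stamp" "useA=1"
after_same=$(find "$workdir" -name useA.stamp -newer "$workdir/source.o")

# New value: the stamp is rewritten and is now newer than source.o.
record_option "$workdir/useA.stamp" "useA=0"
after_change=$(find "$workdir" -name useA.stamp -newer "$workdir/source.o")

rm -rf "$workdir"
```

If source.o listed the stamp file as a prerequisite, only a real change of the option would trigger a recompile of that file, rather than every edit of the option list.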

    Given that sed syntax is occasionally a bit hard to read, and that the "$" duplication required in Makefiles does not make it better, a set of comments along the following lines might help:

    # postprocess dependency files into a format acceptable to make
    # 1. adjust final target name due to mv at end of recipe
    # 2. remove comments (everything after a '#')
    # 3. remove the target
    # 4. remove line continuation characters (\)
    # 5. delete empty lines
    # 6. add a ":" to the end of the line

    This is pretty much the set of comments that is in SpEC for the (almost) same set of sed commands.
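The steps listed above can be tried outside of make. The following is a minimal sketch that runs the two sed passes by hand, with the Makefile's "$$" and "\#" written as plain "$" and "#"; the file names and the content of tst.o.d are made up for illustration.

```shell
# Sketch only: tst.o.d, tst.d, header.h and util.h are hypothetical names.
workdir=$(mktemp -d)

# What "g++ -MD -o tst.o.tmp -c tst.cc" might write to tst.o.d.
printf '%s\n' 'tst.o.tmp: tst.cc header.h \' ' util.h' > "$workdir/tst.o.d"

# Step 1: rename the target from tst.o.tmp to tst.o (the recipe's final mv).
sed -e 's/tst\.o\.tmp/tst.o/g' < "$workdir/tst.o.d" > "$workdir/tst.d"

# Steps 2 to 6: strip comments, the target and the continuations, drop empty
# lines, and turn every remaining line into a rule without prerequisites.
sed -e 's/#.*//' \
    -e 's/^[^:]*: *//' \
    -e 's/ *\\$//' \
    -e '/^$/ d' \
    -e 's/$/ :/' < "$workdir/tst.o.d" >> "$workdir/tst.d"

deps=$(cat "$workdir/tst.d")
printf '%s\n' "$deps"
rm -rf "$workdir"
```

The resulting tst.d keeps the original rule, continuation included, and gains the two appended lines "tst.cc header.h :" and " util.h :".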

  11. anonymous

    Sorry, I'm having trouble following parts of this last comment. What is useA? What do changes in variables have to do with this issue? I'll grant that needing a recompile when FOO_DIR changes is a problem, but isn't that a different problem than the one being considered here?

  12. Erik Schnetter reporter

    Let's state the problem again. Assume we have a source tree consisting of two files, "header.h" and "source.c", where source.c includes header.h. We build, everything is working fine, and the automatic dependency tracking records that source.c depends on header.h.

    We then change the code, removing the file header.h, and changing source.c so that it doesn't include header.h any more. This is still self-consistent, and if we say "make cleandeps", it will compile fine.

    However, without the cleandeps, the Cactus make system remembers that source.c depends on header.h. Since header.h does not exist any more, and since there is no rule for generating it automatically, make will abort with an error message. Yes, it will detect that source.c has changed and needs to be rebuilt, but it currently doesn't know that this also means that the dependencies are outdated and should be ignored.

    The proposed change tells make in this case that it should not worry about the missing file header.h. Thus make continues, decides that source.c needs to be rebuilt since it changed, and will also generate new, correct dependencies. All is fine.

    I did not understand Roland's text about "useA" and "change in a file", or why this would "circumvent make".

    We can also examine how the proposed new system behaves in the presence of errors. Assume that header.h has been deleted, and source.c has not been modified. Make will then treat header.h as changed (since it is not present), and rebuild source.c, leading to a compiler error pointing to the problem. This is also good.
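The behaviour described above can be reproduced without Cactus. The following is a minimal sketch, assuming GNU make is available; cp stands in for the compiler, and out.txt, out.d, source.txt and header.txt are made-up names playing the roles of the object file, the dependency file, source.c and header.h.

```shell
# Sketch only: "cp" stands in for the compiler, and all file names are made up.
# Assumes GNU make is installed; without it the demonstration is skipped.
if command -v make >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    echo "source" > "$workdir/source.txt"
    echo "header" > "$workdir/header.txt"

    # One-line rule (recipe after ';') plus an included dependency file.
    printf '%s\n' 'out.txt: source.txt ; cp source.txt out.txt' \
                  '-include out.d' > "$workdir/Makefile"

    # Recorded dependency, as an earlier build would have left it.
    echo 'out.txt: header.txt' > "$workdir/out.d"
    make -C "$workdir" >/dev/null 2>&1

    # The header vanishes: make has no rule for it and stops.
    rm "$workdir/header.txt"
    status_plain=0
    make -C "$workdir" >/dev/null 2>&1 || status_plain=$?

    # Add the prerequisite-free rule: make now rebuilds instead of aborting.
    echo 'header.txt :' >> "$workdir/out.d"
    status_fixed=0
    make -C "$workdir" >/dev/null 2>&1 || status_fixed=$?

    echo "without extra rule: $status_plain, with extra rule: $status_fixed"
    rm -rf "$workdir"
else
    echo "make not found, skipping"
fi
```

With the extra rule, make treats the missing header as updated and reruns the recipe. A real compiler would then either succeed, if the source no longer includes the header, or report the missing include.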

  13. Roland Haas

    comment:3 contains a sample code snippet which is used to explain why a file needs to be recompiled when its dependency information (.d file) is missing. This seems to have been in reply to me voicing a desire to be able to quickly regenerate dependency files without having to fully compile (and link etc.) the code. The example given is:

    #if useA
    #  include "a.h"
    #else
    #  include "b.h"
    #endif
    ... code using stuff declared in a.h or b.h ...
    

    and the text in comment:3 continues "If there is a change that makes source.c switch from using a.h to using b.h, then it needs to be recompiled. However, both a.h and b.h may be older than the object file source.o -- so keeping source.o around is an error, as neither the old nor the new dependencies would indicate that source.o is out of date. Hence, re-generating dependencies without recompiling is an error."

    I wanted to point out that the reason the file needs recompilation is that "useA" was changed, which is what makes it switch from using a.h to b.h. If useA is changed in a file on which the example file depends, then make will pick up the change and will recompile the example file. If useA is changed e.g. via command line options, then make will not pick up the change, the example file will not be recompiled, and there is a problem. In other words, the example introduces a cause for the example file to require recompilation (switching from a.h to b.h due to a change in useA) that is not trackable by make.

    Making files recompile when the dependency information file (.d) is missing seems to be a workaround, since it forces all files to be recompiled when one performs a -cleandeps. So I am not convinced by the example that (fully) recompiling a source file when the dependency (.d) file is missing is required, rather than first constructing a fresh .d file and then using the information in it to decide whether the source file needs recompilation. That is, I would have expected the .d files to be just caches for deciding when to recompile, which can at all times be regenerated from the source files. In that I may well be mistaken.

  14. Roland Haas

    I just ran into a situation requiring -cleandeps which would require tricks if cleandeps is no longer available.

    For testing GRHydro before pushing commits I either change the thornlist to switch from EinsteinEvolve/GRHydro to Zelmani/GRHydro or change a symbolic link in EinsteinEvolve to point to Zelmani/GRHydro (what I really do is a little bit different, but it is basically the same operation). However, changing the ThornList messes up the dependency tracking, since the .d files in build still point to the old location. In this particular case the old location no longer existed (a different git branch contained the thorn, while the current one does not). Right now I do a make sim-cleandeps followed by a find EinsteinEvolve/GRHydro | xargs touch (which one can certainly argue is ugly). If missing .d files caused a full recompile, then this would recompile all of the ET, which takes a while. Right now this only recompiles the changed thorn.

    The issue could be solved if the build system kept track of the arrangement where a thorn was found and would mark all object files as invalid if a thorn moved.

    I think this would actually be helped by the proposed patch in that it would work more gracefully with the vanishing source files.

    This is admittedly a non-standard workflow, but it is something I end up doing each time I have a new batch of changes for GRHydro (so about once a week if I actually stay on top of it).

  15. Erik Schnetter reporter

    Moving between thorns in arrangements is probably best handled by using the arrangement name as part of the path to the thorn's build directory.
