Do not copy .svn directories for test suite

Issue #563 closed
Erik Schnetter created an issue

Simfactory copies the content of .svn directories when it copies the test case data into the simulation directory. The problem is line 509 in simrestart.py; this rsync command doesn't exclude .svn directories.

Comments (15)

  1. Roland Haas
    • changed status to open
    • removed comment

    I attach a proposed patch. It really only ignores .svn (not .git etc., since I don't expect to find a test-only git repository).

    It also only copies the current configuration out of configs instead of everything in configs.
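A minimal sketch of the kind of exclusion discussed here, not the attached patch itself. The function name and the example paths are hypothetical; `-a` and `--exclude=PATTERN` are standard rsync options.

```python
def build_rsync_command(source_dir, dest_dir, excludes=(".svn",)):
    """Return an rsync argument list that copies the contents of source_dir
    into dest_dir, skipping anything that matches the exclude patterns.

    Hypothetical helper for illustration; not Simfactory's actual code.
    """
    cmd = ["rsync", "-a"]
    for pattern in excludes:
        cmd.append("--exclude=" + pattern)
    # A trailing slash tells rsync to copy the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(dest_dir)
    return cmd


# Example with made-up paths: show the command that would be run.
print(" ".join(build_rsync_command("arrangements/MyArr/MyThorn/test", "sim/test")))
```

Keeping the command as a list (rather than one long string) also makes it easy to pass to `subprocess` without shell quoting.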

  2. Erik Schnetter reporter
    • removed comment

    I approve.

    Please add a TODO to the code, indicating that excluding .svn should be replaced by looking at the etc/filter.rules files instead, and/or leave this ticket open after applying the patch.

    Please also break the insanely long string over multiple lines. I don't know the "right" Python syntax for this, but string concatenation (+) may be one way.
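For reference, one common way to split a long string in Python is to put adjacent string literals inside parentheses; Python joins them into a single string, so no `+` is needed. The command text and the function name below are illustrative assumptions, not the string from simrestart.py.

```python
def long_command(source, dest):
    """Build a long command string from several short source lines.

    Hypothetical example; the real string in simrestart.py is not shown
    in this ticket.
    """
    # Adjacent literals inside the parentheses are concatenated by Python
    # into one string before .format() is applied.
    template = (
        "rsync -a "
        "--exclude=.svn "
        "{src}/ {dst}"
    )
    return template.format(src=source, dst=dest)


# Explicit concatenation with + gives the same result.
assert long_command("a", "b") == "rsync -a " + "--exclude=.svn " + "a/ b"
```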

  3. Roland Haas
    • removed comment

    Filter.rules does not work right now since it currently (for whatever reason) only excludes _darcs and .hg (i.e. not .git or .svn or CVS or even backup files [*~ or *#], it seems). I don't know the philosophy behind filter.rules and whether or not it would be a good source for the excluded pattern for the test suites. I would suspect not, since I, at least, tend to add the test suites themselves to the excluded files, since I don't want to copy them e.g. to Kraken all the time, where I constantly run into the file number quota.

    Is there an official policy on this?

    Sure I can do the breakup of lines. I just wanted a minimal patch.

  4. Erik Schnetter reporter
    • removed comment

    For the moment, just apply the patch. The code cleanup can come later.

    If Kraken doesn't provide sufficient disk space to hold the source tree, then we should maybe officially request more disk space (or more files) for Cactus users. I have done this for myself on a few occasions (mostly successfully). On other systems, I keep the source tree outside of my home, e.g. in a project directory.

    I think it is very important to have the test suites available on all systems. Otherwise, we cannot run the test suites there. I am worried if test suites are viewed as baggage. How many files are source files, and how many are test suite output?

  5. Roland Haas
    • removed comment

    I had no idea how many files were source and test so I did a

    find $CACTUS_ROOT | grep -vF '.git' | grep -vF '.svn' | grep -vF _darcs | grep -vF .hg | wc --lines

    in my ET Cactus tree and in a directory containing test suites (copied everything from arrangements/*/*/test), excluding .svn, and found (roughly at least):

    all of Cactus (ET_2011_10 checkout): 21856
    tests: 11022

    so about half of all files are tests. Compiling code balloons this number.
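A rough Python equivalent of the count above, as a sketch. It skips version-control directories by name, whereas the `grep -vF` pipeline drops every path containing those strings (for example `.gitignore`), and `find` also counts the root itself, so the totals may differ slightly. The function name is hypothetical.

```python
import os

# Directory names treated as version-control metadata (taken from the
# grep patterns in the comment above).
VCS_DIRS = {".git", ".svn", "_darcs", ".hg"}


def count_entries(root, skip=VCS_DIRS):
    """Count files and directories below root, ignoring VCS directories."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip]
        count += len(dirnames) + len(filenames)
    return count


# Usage (path is an assumption): count_entries(os.environ["CACTUS_ROOT"])
```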

  6. Frank Löffler
    • removed comment

    One way to avoid the "too many files" problem might be to try the new (and as yet undocumented) possibility of compressing entire test suites into a (compressed) tarball. Could you give this a test?

  7. Ian Hinder
    • removed comment

    Do we have a convenience script to do this compression yet? One problem would be that when updating, the test suite data will be pulled down again into separate files.

  8. anonymous
    • removed comment

    Replying to [comment:3 rhaas]:

    Filter.rules does not work right now since it currently (for whatever reason) only excludes _darcs and .hg (i.e. not .git or .svn or CVS or even backup files [*~ or *#], it seems). I don't know the philosophy behind filter.rules and whether or not it would be a good source for the excluded pattern for the test suites. I would suspect not, since I, at least, tend to add the test suites themselves to the excluded files, since I don't want to copy them e.g. to Kraken all the time, where I constantly run into the file number quota.

    Are you sure it doesn't work? The filter.rules file includes "-C" which is supposed to exclude most version control files (see the rsync man page for more details), including CVS, .svn and .git and also excludes common backup files. The _darcs and .hg excludes are there explicitly because they are typically not included in the "-C" rules. The "-C" filter rule is not supported by rsync versions before 3.0, which is why there is a filter.prersync3.rules and in that case, we need to manually specify what to exclude.
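The version dependence described here could be sketched as follows. How Simfactory actually chooses between the two files is not shown in this ticket; the function name and the parsing approach are assumptions, and only the file names come from the comment above.

```python
import re


def pick_filter_file(version_output):
    """Choose a filter rules file from the text printed by `rsync --version`.

    Hypothetical helper: returns "filter.rules" (which relies on the "-C"
    rule) for rsync 3.0 or later, and "filter.prersync3.rules" otherwise.
    """
    # The first line typically looks like
    # "rsync  version 3.0.9  protocol version 30"; some builds print "v3.x".
    match = re.search(r"version\s+v?(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not parse rsync version")
    major = int(match.group(1))
    return "filter.rules" if major >= 3 else "filter.prersync3.rules"


print(pick_filter_file("rsync  version 2.6.9  protocol version 29"))
```

The version text would normally come from running `rsync --version`, for example via `subprocess.check_output(["rsync", "--version"])`.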

  9. Barry Wardell
    • removed comment

    (That last comment was by me by the way.)

    The philosophy of the filter.rules is that we can have several different rules files for different scenarios. I agree that you do not necessarily want the same filter rules for the testsuites as you do for your source tree. Maybe it would be best to add a filter.test.rules file?

  10. Erik Schnetter reporter
    • removed comment

    I don't like the idea of making things too configurable. This complicates things, both development and explaining the subtle differences to users who don't want to care. Instead, Simfactory should "just work".

    We are facing one particular problem here, which is that test cases are viewed as less important than source code, and hence people don't like that there are so many of them. When people run into real problems (quota, svn checkout times, rsync times), they blame the test case results.

    In my opinion, as long as the test cases are not significantly larger than the source code, we are fine, and reducing the space/time taken up by test cases doesn't make a real difference. (Although it can make a real difference in corner cases.)

    Anyway: Instead of devising piecemeal solutions for small parts of the problem (patterns including and excluding test cases, tarring/zipping test case results on disk or for transport, etc.), we should work towards a once-and-for-all solution to this problem, and not spend much time on micro-optimisations that shift the problem 10% into the future.

    I have created https://docs.einsteintoolkit.org/et-docs/Test_suite_results_are_unwieldy for discussing this.

  11. Barry Wardell
    • removed comment

    I agree that it would be nice to find a solution to the unwieldy test suites problem, but that is a separate issue and should probably be discussed in a new ticket. My suggestion is for addressing the original issue (and reason this ticket was kept open) of VCS directories being copied with test-suite data.

  12. Erik Schnetter reporter
    • removed comment

    The connection I made was that using the existing filter rules would be fine, except if people modified them locally to exclude test cases.

  13. Erik Schnetter reporter
    • changed status to resolved
    • removed comment

    I have committed a code cleanup:

    - break overly long lines
    - use Python lists to represent lists
    - move helper routines to simlib.py
    - use MDB to find good rsync and its options
    - use standard filter rules instead of hard-coding excluding .svn
