
Opened 6 years ago

Last modified 4 years ago

#1075 reopened enhancement

Test cases with specific number of processes

Reported by: Erik Schnetter Owned by:
Priority: major Milestone:
Component: Cactus Version:
Keywords: Cc:

Description

Some test cases require a specific number of processes. Instead of not running at all, they should run on that number of processes. This would ensure that all tests execute all the time.

Attachments (3)

RunTestUtils.pl.diff (2.9 KB) - added by Erik Schnetter 6 years ago.
RunTestUtils.pl.diff2 (3.1 KB) - added by Roland Haas 6 years ago.
replacement patch
RunTestUtils.pl.diff3 (3.3 KB) - added by Roland Haas 6 years ago.


Change History (43)

comment:1 Changed 6 years ago by Frank Löffler

While it might be possible to do that with fewer processes than specified, doing so with more is usually not possible.

comment:2 Changed 6 years ago by Ian Hinder

I disagree. The test system chooses the number of processes to launch and calls the given mpirun command to do so. There is nothing that says we have to run the "right" number of processes for the number of cores and threads that we have. I agree that this would be a very nice thing to have. I think it is much more user-friendly than just not running some tests.

comment:3 Changed 6 years ago by Frank Löffler

When I specify that a testsuite should run on, let's say, 2 processes, I might only have two. That means MPI only knows about two and will choke on a request for, let's say, four or even more. Even if it did not, it might actually bring down the machine, or at least slow it down more than it should. After all, I requested only 2 processes. Thus, I think it can be problematic to try to run testsuites with more processes than have been specified.

comment:4 Changed 6 years ago by Ian Hinder

Will mpirun choke on a request for more? That must be system-dependent. If I run on my laptop, I can ask mpirun for as many processes as I like. I think Erik calls this "oversubscribing". I don't think that running a test case on 2 processes rather than 1 is going to "bring a machine down". The performance will of course be lower, but these are not benchmarks.

comment:5 Changed 6 years ago by Frank Löffler

Some MPI implementations will choke when asked for more processes than are known from the hostlist, which they get from the batch system. You are right about 2 processes on a 1-processor system, but it is easy to come up with more problematic scenarios.

What I think is more important is to fix why we need to have these specific processor numbers in the first place. Most of the time it is probably the ASCII output. Assuming we finally get someone working on HDF5 testsuites, this could be history.

comment:6 Changed 6 years ago by Roland Haas

HDF5 test suites will only help if we also either redesign the internal HDF5 format or write a clever comparison program. Right now, even 1D or 2D HDF5 files will internally show how many components (i.e., processes) existed. The layout is very similar to CarpetIOASCII's output, just with HDF5 datasets instead of blocks of numbers.

You can get almost process-number-independent output when using compact_format with no headers and no ghost or symmetry points. The only remaining difference is some extra blank lines, which currently are significant for the test system. One could very likely make the compact format do away with the extra empty line.

comment:7 Changed 6 years ago by Ian Hinder

I would stick with ASCII and do what Roland suggests. We could then modify the tests so that they can run on any number of processes.

comment:8 Changed 6 years ago by Roland Haas

It turns out that removing the "extra" blank line is harder than expected. It is the eol from line 1450 of ioascii.cc:

1445 for (int d=0; d<outdim; ++d) {
1446   if ((*it)[d]!=(*ext.end())[d]) break;
1447   if (not compact_format or ext.shape()[d] > ext.stride()[d]) {
1448     // In the compact format, don't separate outputs that
1449     // consist of a single lines only
1450     os << eol;
1451   }
1452 }

which is output with all points that touch the outer edge of the output region for any component (i.e., any one of x = xmax, y = ymax, z = zmax holds). Unfortunately this means that it will always trigger for the last point in the output region of a component, and since the components are already process-decomposed, there is one blank line per MPI process. A way around this would have been to only output a blank line once we are at the edge of the containing superregion, which unfortunately requires the code to compute whether this component will be (once we are all done with output) the last one output for this superregion. It is also hard to insert the separating blank line at the beginning of each chunk, since no blank line should be output if only a single line of output will be produced.

comment:9 Changed 6 years ago by Frank Löffler

Might not the order of the lines also be different, especially with a lot of processes? Wouldn't it be better to have the Perl comparison tool understand that these differences are not important? One way could be to pre-process the files before diffing: sort by iteration and coordinates (or indices), remove blank lines, and then compare.
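Frank's pre-processing idea could be sketched as follows (illustrative shell, not the actual Perl comparison tool; the mock file contents and sort columns are assumptions):

```shell
# Normalize process-dependent ASCII output before diffing: drop blank
# lines, then sort data rows numerically so that chunk order and
# blank-line separators no longer matter.
normalize() {
    grep -v '^[[:space:]]*$' "$1" | sort -n -k1,1 -k2,2 -k3,3
}

# Two mock outputs: same data, different chunk order and blank-line placement
printf '0 1 0.5\n0 2 0.7\n\n' > out_1proc.asc
printf '0 2 0.7\n\n0 1 0.5\n\n' > out_2proc.asc

normalize out_1proc.asc > n1.tmp
normalize out_2proc.asc > n2.tmp
diff n1.tmp n2.tmp && echo "MATCH"   # prints "MATCH": normalized files agree
```

The real tool would have to know which columns hold the iteration and coordinates for each CarpetIOASCII output variant, and within what tolerance floating-point columns should compare equal.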

comment:10 Changed 6 years ago by Erik Schnetter

Status: new → review

The attached patch changes the number of processes with which test cases run.

Changed 6 years ago by Erik Schnetter

Attachment: RunTestUtils.pl.diff added

comment:11 Changed 6 years ago by Roland Haas

Until we either change the test comparison algorithm to ignore whitespace or make sure that output from Carpet's IOASCII thorn is processor-independent, I believe the patch in comment:10 would cause test failures when a test is run on a different number of processes than it was designed for (at least for the test suites where I explicitly specified a number of processes, which was always due to the test otherwise failing [e.g., grid too small to be split over two processes] or producing different answers [e.g., CarpetIOASCII]).

comment:12 Changed 6 years ago by Erik Schnetter

The patch in comment:10 ensures that a test is always run on the number of processes it specified. Currently the test would be skipped; with this patch, it is run, using the correct number of processes.

comment:13 Changed 6 years ago by Frank Löffler

Does Cactus currently output how many MPI processes have been used for each test separately? We would need that in order not to get confused about how many processes have in fact been used, because with this patch it might differ from what has been specified.

Also, can we have this tested before a commit: are MPI implementations going to choke on specifying more processes in mpirun than have been requested through the queue?

comment:14 in reply to:  13 Changed 6 years ago by Roland Haas

Replying to knarf:

Does Cactus currently output how many MPI processes have been used for each test separately? We would need that in order not to get confused about how many processes have in fact been used, because with this patch it might differ from what has been specified.

Also, can we have this tested before a commit: are MPI implementations going to choke on specifying more processes in mpirun than have been requested through the queue?

The ones using InfiniBand usually do in my experience, at least if you were to actually ask for more processes than there are cores on the node. I tried this occasionally when trying to debug something that needed many cores but not much memory, on the clusters where you can get an interactive node (ranger, zwicky, kraken [I believe]).

comment:15 Changed 6 years ago by Ian Hinder

I wouldn't worry about this. Typical machines have at least 4 cores per node, and I don't think any of the tests ask for more processes than this. We can still ask for as many OpenMP threads as we like, in order to test OpenMP.

Cactus does not output the number of MPI processes used for the test, but Carpet does, to standard output ("There are XX processes in total"). It might be convenient to modify the test system to output the number of processes to its log file for each test as well. Of course, this is going to break all the ad-hoc parsers because we are not using an extensible output format.

comment:16 Changed 6 years ago by Frank Löffler

This is a problem regardless of the number of cores. Some MPI implementations look for a file containing a list of the nodes used, together with the number of processes on each, e.g.

 cnode01: 4
 cnode02: 4

Now consider such a file when requesting only one process:

 cnode01: 1

When you then try to run with two, the MPI implementation would complain that it doesn't know where to run two processes. After all, the list contains only one, and you are not supposed to run on more than one because you only requested one through the queuing system.

comment:17 Changed 6 years ago by Erik Schnetter

A queueing system does not provide MPI processes and OpenMP threads; it provides nodes and cores. One has to map these manually, in particular when OpenMP is used; the queueing systems' output is usually not good enough for this. The respective logic is present in most Simfactory run scripts. Simfactory supports over- and under-subscribing, and this works fine on most systems (probably on all except BlueGene/P).

In your case above, most MPI implementations will wrap around, and assign the second process to cnode01 as well.

comment:18 Changed 6 years ago by Frank Löffler

I agree that simfactory usually handles/parses/changes these lists. However, simfactory wouldn't know here that one particular testsuite should be run with a different number of processes than specified (even to simfactory). The problem isn't really made easier by simfactory.

If an MPI implementation wraps around, there should be no problem, yes. However, some don't, because especially when asking for 1 process (and, e.g., one OpenMP thread) you might actually get only part of a node, and running two processes there would oversubscribe the system and hurt other users (granted, using OpenMP would as well). In any case: what do you propose to do on systems that do not wrap around? Currently the test would be marked as a failure.

Could we not avoid this issue by only overriding the user's choice if the number of processes specified by the testsuite is smaller than what the user specified?

comment:19 Changed 6 years ago by Erik Schnetter

Couldn't we just test it and see what happens? I have not encountered problems.

comment:20 Changed 6 years ago by Ian Hinder

HPC machines must support the ability to run different numbers of MPI processes, since not all codes support OpenMP. At least up to the number of cores, there is always a way to run that many MPI processes. For a given machine, we will be able to come up with simfactory command-line arguments which give the correct number of MPI processes, and then we can choose the number of threads separately from this. Assuming that simfactory requests resources sufficient to run the maximum number of MPI processes required by any test, there is no limit imposed by the queuing system.

A separate issue (as I think you are referring to) is that if simfactory is configured to run a simulation on N processes, the run script will have this hard-coded after expansion by simfactory. i.e. the run script (which is used in the "mpirun" command by the simfactory testsuite mechanism, http://git.barrywardell.net/simfactory2.git/blob/HEAD:/bin/RunTestSuite) doesn't look at the -np option given by the test system. We should think about how we can modify simfactory to run the correct number of MPI processes in this case, unless Erik has done so already.

Erik: I believe the only potential problem is the one related to simfactory, and that is definitely a problem unless it has already been fixed.

comment:21 Changed 6 years ago by Frank Löffler

I agree that we are able to request more processes through the queuing system than needed, and then run most tests with fewer, and a few (that request more) with up to the requested amount. But that is not my point. Applying this patch as-is would break the testsuite mechanism for cases that currently work; it would be a regression. Simfactory might be used to circumvent this a bit, but it also has to work without it.

Let me suggest something else. What I believe is a problem is that with this patch Cactus will try "to be smart", override a user's decision, and might actually fail. What I believe is the intention of the patch is that Cactus should run as many testsuites as possible, even if that means using different numbers of processes for different tests. So, why don't we add an option to the test system that, instead of specifying a fixed number of processes, specifies 'as many processes as needed'; if nothing is specified, use the default of 1 (which we should be able to change, and simfactory might actually do that). This would introduce one more option for the testsuite mechanism. It wouldn't break anything if this is not the default in Cactus, and it could be the default within simfactory, because there we know whether a machine supports over-subscribing or not.

comment:22 Changed 6 years ago by Roland Haas

I just tried it using simfactory on lonestar. I used

sim create-submit bns_all-tests --procs 1 --testsuite --configuration bns_all --walltime 0:20:0

which unfortunately ran all tests on 1 process, even the one (ML_BSSN_Test/ML_BSSN_NewRad) that requested 2 processes. I suspect this is because simfactory does not go through the test-suite mechanism and instead uses its own RunScript to start the run.

The test still claimed everything worked, though close inspection showed that it says "Success: 0 files identical" and also that in fact the files are not identical (I ran a vimdiff). So Ian's comment points out a real issue.

I have also tried the reverse, and was offered to run a 1-process test with 2 processes on my workstation:

  Issuing mpirun -np 2 /mnt/data/rhaas/postdoc/gr/Zelmani/exe/cactus_bns_all /mnt/data/rhaas/postdoc/gr/Zelmani/arrangements/EinsteinAnalysis/AHFinderDirect/test/checkpointML.par

Something is still odd.

Changed 6 years ago by Roland Haas

Attachment: RunTestUtils.pl.diff2 added

replacement patch

comment:23 Changed 6 years ago by Roland Haas

Ignoring NPROCS for a test was due to the NPROCS information not being passed along in the correct data structure (testdata vs. rundata). I updated the patch and attached a replacement version. This does nothing for simfactory, of course.

Changed 6 years ago by Roland Haas

Attachment: RunTestUtils.pl.diff3 added

comment:24 Changed 6 years ago by Roland Haas

Cleaned up the patch and added code to output the number of MPI processes used for a test if it is not the user-specified default.

We should also rename the $nprocs_required and $nprocs_available variables to something more in line with what we (now) do. Maybe $nprocs_requested_by_test and $nprocs_requested_by_user.

comment:25 Changed 6 years ago by Roland Haas

Frank: is this ok to apply in the form of RunTestUtils.pl.diff3?

comment:26 Changed 6 years ago by Frank Löffler

Yes, please.

comment:27 Changed 6 years ago by Roland Haas

Resolution: fixed
Status: review → closed

Applied as rev 4901 of the flesh.

comment:28 Changed 6 years ago by Ian Hinder

Milestone: ET_2012_11
Priority: major → critical
Resolution: fixed
Status: closed → reopened

Did anyone test that this new version works with SimFactory before committing the change? I explained why it would not work (comment:20), and Roland tested that this was the case. As of now, I think the test system tries to run all the tests and assumes that it can set the number of processes in the mpirun command. This does not work: SimFactory assumes that @NUM_PROCS@ is correct, and this doesn't come from the test system. If you now try to run a set of tests with different process-number requirements, some of them are always going to fail, because they are all run on the same number of processes.

I think that such a drastic change to the test system, one that breaks the ability to use simfactory to run the tests, should not be applied just before a release. I propose reverting this commit, and revisiting this after the release when we can come up with a comprehensive solution that ensures that simfactory can still work correctly.

comment:29 Changed 6 years ago by Roland Haas

Well, I was also the one who committed the flesh change to the test system. So clearly I had forgotten my own comment. I did not test it with simfactory at that time.

Reverting it would be ok with me unless we can come up with a tested, simple solution.

I am not familiar with the way simfactory interfaces with the test system (but assume it to be horrible). Would a carefully crafted "mpirun" option to the test system be sufficient, i.e. one that does simfactory run --procs XXX --num-threads YYY?

Last edited 6 years ago by Roland Haas (previous) (diff)

comment:30 Changed 6 years ago by Erik Schnetter

Simfactory uses the "make sim-testsuite" command, setting several environment variables to pass options. This part of the interface is quite clean.

Let's revert the patch.

comment:31 Changed 6 years ago by Roland Haas

I reverted the commit.

comment:32 Changed 6 years ago by Frank Löffler

Milestone: ET_2012_11 → ET_2013_05
Priority: critical → major

comment:33 Changed 6 years ago by Frank Löffler

Milestone: ET_2013_05 → ET_2013_11

Unlikely to get this into this release.

comment:34 Changed 6 years ago by Erik Schnetter

I notice that you moved the release milestone, which also happened for the last release. Does this make sense? Either it is release critical (then it needs to be addressed), or the milestone can easily be moved (then it may as well be removed altogether). I don't see any particular reason why this feature is important for the 2013 November release.

comment:35 Changed 6 years ago by Frank Löffler

Moving the release target again is unfortunate, I agree. It is not critical, however. The release tag should only indicate that this "should be fixed by that release".

It would be nice if issues with the test system got more attention, though. I would think they should be critical. The past has shown that while most probably agree with that statement, reality tells a different story. As for the next release: I don't think whatever solution we might come up with would make it into the release; there isn't enough time for it. I still think this is important enough to keep reminding ourselves.

On this specific issue: the underlying root issue is that most testsuites that set a specific number of processes run just fine on other numbers, but their output currently cannot be successfully compared by Cactus. This could be fixed either by Cactus being able to compare the files, or by having processor-number-independent output files. The former was started by a student at LSU, but died after the student decided to stop working on it (undergraduate...). The latter would be nice, but hasn't happened so far, and especially the performance implications are unclear. So, in order to pick up one or the other again in the future, we should keep this ticket open; it is important enough to keep reminding us about this embarrassing issue.

comment:36 Changed 6 years ago by Ian Hinder

I see the milestone as an indication that this is a goal for the next release. This is something that we explicitly want to work on, rather than something which is just "major", with no indication of whether it will ever be addressed. Being a milestone does not mean that it is a release blocker, or is critical in any way. The fact that it keeps slipping means that there have been insufficient resources to implement it so far, but I think the milestones should be allowed to move like this, as they demonstrate intention to work on things.

comment:37 Changed 5 years ago by Ian Hinder

I believe that this can be made to work with SimFactory, but it will require modifications to SimFactory. When SimFactory needs to run the tests as part of a simulation, it does this by using the file simfactory/bin/RunTestSuite as its "run script", passing the real machine runscript as an argument to that script. RunTestSuite then sets CCTK_TESTSUITE_RUN_COMMAND to call the passed runscript on '$parfile', which is supplied by the test system. The test system will also supply '$nprocs', since this command is usually an mpirun command, but simfactory ignores this at present. The reason is because the runscript has already been expanded at this point, so that @NUM_PROCS@ has been replaced with the number of processes that the whole simulation is running on. One solution would be to modify simfactory so that, in the case of running a testsuite, it expands @NUM_PROCS@ to a reference to an environment variable, e.g. $NUM_PROCS. The CCTK_TESTSUITE_RUN_COMMAND set in RunTestSuite can then set this variable to '$nprocs' before launching the machine's run script. This is similar to what is done for the parameter file; there is special code in simfactory to look at a TESTSUITE_PARFILE environment variable which is passed by the CCTK_TESTSUITE_RUN_COMMAND.

This assumes that changing only NUM_PROCS in the machine run script is sufficient to do what we want; other variables might have been used, and some mpirun commands might take the number of cores or something similar instead. I expect this to break some systems.
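The environment-variable approach described above could be mocked up as follows (a minimal sketch; the script names and contents are assumptions, not actual SimFactory code):

```shell
# Mock "expanded" run script, in which @NUM_PROCS@ has been expanded to an
# environment-variable reference instead of a fixed number:
cat > runscript.sh <<'EOF'
#!/bin/sh
echo "would launch: mpirun -np ${NUM_PROCS:-1} cactus_exe $1"
EOF
chmod +x runscript.sh

# What RunTestSuite's CCTK_TESTSUITE_RUN_COMMAND would then do: export the
# per-test process count (supplied by the test system as $nprocs) before
# calling the machine's run script.
nprocs=2
NUM_PROCS="$nprocs" ./runscript.sh test.par
# prints: would launch: mpirun -np 2 cactus_exe test.par
```

This mirrors the existing TESTSUITE_PARFILE mechanism: the simulation-wide value remains the default, and each test overrides it through the environment.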

comment:38 Changed 5 years ago by Frank Löffler

Milestone: ET_2013_11 → ET_2014_05

What we should do (after the ET_2013_11 release) is:

  • When a testsuite doesn't specify a certain number of processes, run it with however many have been specified as the default (and if nothing was specified there, use some reasonable default, which at the moment is 2).
  • When a testsuite does specify a particular number of processes, use this number. In particular (at least for the moment), also use more than had been specified as the default. Since currently we don't have testsuites with a large number of processes, this is hopefully fine on most machines. We might have to change this, though, if we encounter too many problems or turn toward testsuites with so many processes that single-processor systems would be unreasonably overloaded. What we would likely do then is use the specified default as an upper bound; testsuites that request more resources would not be run in that case.
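The selection rule in these two bullets could be sketched like this (illustrative shell; the real logic would live in the Perl of RunTestUtils.pl, and the function and variable names are made up):

```shell
# Pick the process count for one test:
#   - the testsuite's own requirement wins if present,
#   - otherwise the user-specified default,
#   - otherwise a reasonable fallback of 2.
choose_nprocs() {
    test_nprocs="$1"     # number requested by the testsuite, may be empty
    user_default="$2"    # user-specified default, may be empty
    if [ -n "$test_nprocs" ]; then
        echo "$test_nprocs"
    elif [ -n "$user_default" ]; then
        echo "$user_default"
    else
        echo 2
    fi
}

choose_nprocs "" ""     # prints 2
choose_nprocs "" 4      # prints 4
choose_nprocs 3  1      # prints 3
```

A later refinement, per the bullets above, could cap the result at the user default instead of exceeding it, skipping tests that request more.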

comment:39 Changed 5 years ago by Frank Löffler

Milestone: ET_2014_05 → ET_2014_11

comment:40 Changed 4 years ago by Ian Hinder

Milestone: ET_2014_11
