Binary neutron star sample has poor OpenMP parallelization

Issue #1964 closed
anonymous created an issue

When running the sample from the ETK gallery page, the number of OpenMP threads seems to be limited to 2: top shows only 200% CPU usage, even though 16 OMP threads were requested.

Other parameter files from /par go well beyond 1000%, showing that more threads are being fully used there.

Could it be that there is an omp_set_num_threads(2) call somewhere in the hydro code?

http://einsteintoolkit.org/about/gallery/NsNsToHMNS/

Creating 1 MPI process per core fixes the issue, obviously, and the CPU is then fully utilized.
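
As a quick check of the omp_set_num_threads hypothesis, one could grep the hydro sources for a hard-coded thread count; the path below is an assumption, based on GRHydro living under arrangements/EinsteinEvolve/GRHydro in a standard Cactus checkout:

    # hypothetical check: look for a hard-coded OpenMP thread count in GRHydro
    grep -rn "omp_set_num_threads" arrangements/EinsteinEvolve/GRHydro/src/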

Keyword: NsNs

Comments (6)

  1. Roland Haas
    • removed comment

    Cactus does not control the number of threads through parameter files; they are controlled by options to simfactory. Unless something changed radically (and I am not aware of any such change), GRHydro will use more than 2 threads if they are provided. Can you let us know which options to simfactory you used and which cluster this was on? If this was a private cluster, or you did not use simfactory, please attach your submission script etc. to the ticket if possible.

    If you can log into the compute nodes while the job is running, I would suggest doing a

    top -H
    

    to list all threads and check, in particular, that not all threads are bound to the same 2 cores, which can happen if incorrect OpenMP/NUMA/mpirun options are used.
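
    A minimal sketch of such a check, assuming the executable name contains "cactus" and that pgrep and taskset are available on the compute node; it prints the allowed-CPU list of each matching process, so a rank confined to only two cores shows up immediately:

    # list the CPU affinity of every running Cactus process
    # ("cactus" in the match pattern is an assumption; adjust to your executable name)
    for pid in $(pgrep -f cactus); do
        taskset -cp "$pid"
    done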

  2. Frank Löffler
    • removed comment

    I agree with Roland: we need more information to see what is going on. In particular, we need to know exactly how you ran the parameter files (the exact command you used, and ideally also the content of the environment variable OMP_NUM_THREADS). It would also be good to know where in the simulation you see only 200%: is it during the evolution, or while still setting up the initial data?
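
    For completeness, the requested information can be captured on the node right before launching the run; OMP_NUM_THREADS is the standard OpenMP variable, nothing Cactus specific:

    # record the OpenMP-related environment the run will see
    echo "OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"
    env | grep -i '^OMP_'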

  3. anonymous reporter
    • removed comment

    This was run on a single node using

    OMP_NUM_THREADS=16 mpirun -np 2 cactus ns.par

    It seemed that most of the time it was using only 200%.

    Later, doing "the same" run again, it suddenly used the full 1600%, so there seems to be no issue after all.

    It was during the evolution, while the iterations were incrementing, which suggested it stayed that low for the whole run; that seemed weird.

    I tried it again and it was fine!
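
    One plausible explanation for the intermittent 200%, along the lines of Roland's binding remark above, is the MPI launcher confining each rank to a small CPU set. A hedged sketch, assuming Open MPI on a dual-socket node with 16 cores per socket (the --map-by/--bind-to flags are Open MPI specific; other MPI stacks use different options), of reserving 16 cores per rank explicitly:

    # one rank per socket, 16 processing elements each, bound to cores,
    # so the 16 OpenMP threads of each rank land on distinct cores
    OMP_NUM_THREADS=16 mpirun -np 2 --map-by socket:PE=16 --bind-to core cactus ns.par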

  4. Roland Haas
    • changed status to resolved
    • removed comment

    OK, I will close the ticket then. BTW, you ran this on a machine with 32 cores per node, yes (2 MPI ranks, 16 cores each)?
