Opened 3 years ago

Last modified 2 years ago

#1865 reopened enhancement

Automatically start SystemTopology

Reported by: dradice@…
Owned by: Erik Schnetter
Priority: major
Milestone:
Component: Carpet
Version: development version
Keywords:
Cc:


Carpet used to load hwloc automatically and that would set thread affinities. Now this functionality is in the SystemTopology thorn, which is not automatically activated. This change could result in a significant performance regression on some systems (see discussion in #1850).

Would it make sense to activate SystemTopology automatically?

Change History (14)

comment:1 Changed 3 years ago by Erik Schnetter

Yes it would -- this also used to be the default behaviour.

We should then also point people to SystemTopology::set_thread_bindings = "no".

comment:2 Changed 3 years ago by Frank Löffler

Please do.

comment:3 Changed 2 years ago by Ian Hinder

Currently, hwloc is OPTIONALly activated by LoopControl and MPI. I was going to say that these should be changed to require SystemTopology instead, so that they don't depend directly on the library used to provide this information, but on the interfaces provided by SystemTopology (this was the reason to split the thorn, right?). SystemTopology then requires hwloc. So both thorns would be activated if present in the thorn list and either LoopControl or MPI were activated.

However, now I notice that MPI itself requires hwloc; is this correct? So there is a circular dependency. That feels wrong. Erik, could you clarify what should be done here?

The result of all this is that when someone uses a parameter file which doesn't activate SystemTopology, they don't get their threads pinned.

comment:4 Changed 2 years ago by Frank Löffler

I only see MPI optionally depending on hwloc, in order to add it to 'configure' in case it builds MPI.

comment:5 Changed 2 years ago by Ian Hinder

Status: changed from new to review

Aside: I think we shouldn't use the word "depends" in this case; OPTIONAL means "use this capability if it is present", so maybe we should say "MPI optionally uses hwloc".

I don't understand what I wrote above. Let me try again. Looking at the configuration.ccl files mentioning hwloc, we have:

CactusUtils/SystemTopology/configuration.ccl:REQUIRES hwloc MPI

Carpet/LoopControl/configuration.ccl:OPTIONAL CycleClock hwloc Vectors

ExternalLibraries/MPI/configuration.ccl:OPTIONAL hwloc

SystemTopology requires both hwloc and MPI.

MPI optionally uses hwloc, because it might be building MPI, and the MPI library might make use of hwloc.

LoopControl optionally uses hwloc. There doesn't seem to be anything in LoopControl that directly references hwloc. Is this because LoopControl uses threads, and it is good to pin those threads?

Assuming all the above is right, I think the right thing is to change the LoopControl OPTIONAL from hwloc to SystemTopology. This would have the effect of automatically activating SystemTopology whenever anyone uses the ET thornlist, and allowing it to manage thread pinning.
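The effect of this proposal can be illustrated with a toy model. The sketch below assumes, as described in this ticket, that REQUIRES always pulls in the named thorn and that OPTIONAL pulls it in only if it is present in the thorn list. It treats thorns and capabilities as the same thing, which is a simplification. The function `activated_thorns` and its data layout are hypothetical and are not Cactus code.

```python
def activated_thorns(requested, present, requires, optional):
    """Toy model: return the set of thorns that end up active.

    requested: thorns activated explicitly in the parameter file
    present:   thorns compiled in (the thorn list)
    requires:  thorn -> thorns it REQUIRES
    optional:  thorn -> thorns it OPTIONALly uses
    """
    active = set()
    queue = list(requested)
    while queue:
        thorn = queue.pop()
        if thorn in active:
            continue
        if thorn not in present:
            raise ValueError(f"{thorn} is not in the thorn list")
        active.add(thorn)
        # REQUIRES: always pull the dependency in.
        queue.extend(requires.get(thorn, ()))
        # OPTIONAL: pull it in only if it is present.
        queue.extend(t for t in optional.get(thorn, ()) if t in present)
    return active


present = {"LoopControl", "CycleClock", "Vectors", "hwloc", "MPI", "SystemTopology"}
requires = {"SystemTopology": ["hwloc", "MPI"]}

# Relations as listed in the configuration.ccl files above.
optional_now = {"LoopControl": ["CycleClock", "hwloc", "Vectors"], "MPI": ["hwloc"]}
# Proposed: LoopControl optionally uses SystemTopology instead of hwloc.
optional_new = {"LoopControl": ["CycleClock", "SystemTopology", "Vectors"], "MPI": ["hwloc"]}

active_now = activated_thorns(["LoopControl"], present, requires, optional_now)
active_new = activated_thorns(["LoopControl"], present, requires, optional_new)
```

In this model, activating LoopControl under the current relations never reaches SystemTopology, while under the proposed relations it does, and hwloc and MPI follow through SystemTopology's REQUIRES.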


comment:6 Changed 2 years ago by Erik Schnetter

LoopControl optionally uses hwloc since this mechanism ensures that, if LoopControl is activated, then hwloc will also be auto-activated if it is present. hwloc then used to provide certain aliased functions that LoopControl uses to determine cache sizes and thread topology. This functionality moved to SystemTopology, so LoopControl needs to be changed to optionally use SystemTopology instead.

Requiring SystemTopology from some thorn (be it Carpet, or LoopControl, or the flesh) instead of just optionally using it is also possible. I think it is a good idea, but that's more of a policy change.

Using SystemTopology is not always a good idea. If you are running on a single workstation, and are maybe running multiple instances of Cactus, then these will get in each other's way. In this case you should disable SystemTopology. Last week, I implemented a corresponding mechanism: the parameter "set_thread_bindings" has a new value "env" that makes it look at the environment variable "CACTUS_SET_THREAD_BINDINGS". If this variable is unset (obviously the default), SystemTopology does nothing. On HPC systems, this variable is set in the respective run scripts, thus enabling SystemTopology. The advantage is that this automatically does the right thing for unsuspecting users running on a workstation. The disadvantage is that unsuspecting HPC users need to use Simfactory or update their submit scripts.
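A minimal sketch of the decision logic described in this comment, not the actual SystemTopology code. It assumes that the mere presence of CACTUS_SET_THREAD_BINDINGS in the environment enables binding (the real thorn may inspect the value), and that "yes" is an accepted parameter value alongside the "no" and "env" values mentioned in this ticket. The function name is hypothetical.

```python
import os

ENV_VAR = "CACTUS_SET_THREAD_BINDINGS"


def should_set_thread_bindings(param_value, environ=None):
    """Return True if thread bindings should be set.

    param_value: value of the set_thread_bindings parameter
    environ:     mapping to consult instead of os.environ (useful for testing)
    """
    env = os.environ if environ is None else environ
    if param_value == "no":
        return False
    if param_value == "yes":  # assumed value, not confirmed by the ticket
        return True
    if param_value == "env":
        # Assumption: the variable being set at all enables binding;
        # if it is unset, do nothing.
        return ENV_VAR in env
    raise ValueError(f"unknown set_thread_bindings value: {param_value!r}")
```

With "env" as the default, a workstation user who sets nothing gets no pinning, while an HPC run script that sets the variable before launching Cactus gets pinning.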

comment:7 Changed 2 years ago by Ian Hinder

Does set_thread_bindings default to "env"?

comment:8 Changed 2 years ago by Erik Schnetter

In the new version it does.

comment:9 Changed 2 years ago by dradice@…

I don't use Simfactory and I see myself easily forgetting to set that environment variable in my run scripts, so I would prefer a solution where binding is active by default. Either way, I think that this should be prominently displayed in the documentation and/or Cactus should warn about this on stderr or stdout.

comment:10 Changed 2 years ago by Frank Löffler

Is SystemTopology able to find out whether the current MPI job (not process) uses less than one whole node? If so, could it set thread bindings whenever the job occupies at least one 'full' node (thus achieving the best performance in that case), and not set them otherwise (e.g., when running more than one independent Cactus job)?

comment:11 Changed 2 years ago by Ian Hinder

Status: changed from review to reopened

We no longer have a clear proposal, so removing the "review" state.

comment:12 Changed 2 years ago by Erik Schnetter

SystemTopology can check, but that's dangerous. What if it decides that you're not running on a "full node" simply because it counts cores differently from what the user expects? If you read the documentation for Blue Waters, you'll see that not even NCSA could agree with itself on how many cores a node has. (Slurm, MPI, and the accounting system use different core counts.) What if you run in a virtual machine or a Docker environment? What if you run on a workstation, and by chance are using all cores, while there is also a second Cactus job running?

In my view, the default behaviour should do the right thing for the unsuspecting user. Experts can be expected to put in a bit more effort, e.g. changing a line in the scripts they use to submit jobs.
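The core-counting concern in this comment can be made concrete with a small sketch. The core counts below are hypothetical and only illustrate that different sources may report different numbers for the same node. `uses_full_node` is a made-up helper, not a SystemTopology function.

```python
def uses_full_node(cores_requested, cores_per_node):
    """Naive 'full node' test: does the job ask for at least one node's worth of cores?"""
    return cores_requested >= cores_per_node


# Hypothetical: three sources disagree about the core count of one and the same node.
core_counts = {"scheduler": 32, "mpi": 16, "accounting": 16}

job_cores = 16
verdicts = {source: uses_full_node(job_cores, n) for source, n in core_counts.items()}
```

Here the same 16-core job counts as a full node according to two of the sources and as half a node according to the third, so an automatic decision would depend on which count the thorn happens to trust.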

comment:13 Changed 2 years ago by Frank Löffler

I agree. Then the question becomes: what is an unsuspecting user most likely to do? I think it is running one simulation at a time on any given machine, which would include any typical simulation setup on clusters. Running multiple jobs at the same time on the same node is not very common, I would think. Or is it? I can only speak for myself.

comment:14 Changed 2 years ago by Erik Schnetter

I do this for debugging -- I start a small test on my workstation, and sometimes want to start several test runs simultaneously so that they can finish over lunch.
