Modify

Opened 5 years ago

Last modified 4 years ago

#1620 new defect

TimerReport doesn't scale very well.

Reported by: bmundim Owned by:
Priority: minor Milestone:
Component: Other Version: development version
Keywords: Cc:

Description

During some scaling tests of a unigrid domain using only MPI (no OpenMP)
on Supermuc with Noether release, I noticed that the routine doing the
TimerReport routine scales very poorly. I had set the following relevant parameters:

Cactus::cctk_itlast                        = 256
TimerReport::out_every                     = 128

TimerReport::output_all_timers_readable    = "yes"
TimerReport::output_schedule_timers        = "no"
TimerReport::n_top_timers                  = 200

Carpet::output_timer_tree_every            = 128
Carpet::output_initialise_timer_tree       = "yes"

Carpet::sync_barriers                      = "yes"

With 144 MPI processes the timer routine didn't take much time of the simulation:

INFO (TimerReport): Top timers at iteration 256 time 29.0133
======================================================================
    %    Time/s   Min/s   Max/s   Timer (gettimeofday)
======================================================================
100.0   6172.83 6172.82 6172.92   meta mode
...
  0.0      1.24    1.21    2.34   [0176] TimerReport: zzz_TimerReport_Output in CCTK_ANALYSIS
...
  0.0      0.96    0.93    1.81   main/Evolve/CallAnalysis/CCTK_ANALYSIS/CallFunction/thorns/zzz_TimerReport_Output
...

While with 2304 MPI jobs, it clearly shows how bad it scales:

INFO (TimerReport): Top timers at iteration 256 time 29.0133
======================================================================
    %    Time/s   Min/s   Max/s   Timer (gettimeofday)
======================================================================
100.0   618.300 618.288 618.316   meta mode
...
  8.5    52.462  52.372  79.462   [0176] TimerReport: zzz_TimerReport_Output in CCTK_ANALYSIS
...
  6.7    41.451  41.435  61.203   main/Evolve/CallAnalysis/CCTK_ANALYSIS/CallFunction/thorns/zzz_TimerReport_Output
...

Two things caught my attention here. One was obviously how bad zzz_TimerReport is scaling as the number of MPI processes increase. The second was the inconsistency between the timers "[0176] TimerReport:..." and "main/Evolve/...". Why do they show different times for the same function?

Anyways, it might not be overall important but I would like to leave it reported here about this issue.

Attachments (0)

Change History (2)

comment:1 Changed 5 years ago by Erik Schnetter

The overhead in TimerReport surprises me. TimerReport needs to collect information from all processes and sort it, which requires a reduction operation and string operations. Maybe one of the algorithms used isn't scalable -- we should investigate further.

The difference in timings comes likely from overhead between the respective measurement intervals. The timing prefixed "[0176]" comes from the Cactus scheduler, which calls into Carpet, which then has to perform certain operations before calling the actual function, which is then measured by the other timer (".../thorns/..."). One of these operations is synchronization, but there are also others.

comment:2 Changed 4 years ago by bmundim

The overhead in TimerReport surprises me. TimerReport needs to collect information from all processes and sort it, which requires a reduction operation and string operations. Maybe one of the algorithms used isn't scalable -- we should investigate further.

That surprised me too. As we start using a larger number of cores for our simulations, it might be good to take a look at this.

The difference in timings comes likely from overhead between the respective measurement intervals. The timing prefixed "[0176]" comes from the Cactus scheduler, which calls into Carpet, which then has to perform certain operations before calling the actual function, which is then measured by the other timer (".../thorns/..."). One of these operations is synchronization, but there are also others.

Thanks a lot for this clarification.

Cheers,
Bruno.

Modify Ticket

Change Properties
Set your email in Preferences
Action
as new The ticket will remain with no owner.
Next status will be 'review'.
as The resolution will be set.
to The owner will be changed from (none) to the specified user.
Next status will be 'confirmed'.
The owner will be changed from (none) to anonymous.

Add Comment


E-mail address and name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.