Reduce overhead of Formaline

Issue #1568 new

Ian Hinder created an issue 2014-03-26

The Formaline thorn, used for storing important information about a simulation in the simulation output, currently has some performance and size overheads which discourage people from using it. These should be reduced or mitigated to encourage people to use this important thorn.

This ticket is based on discussion in ~~#1565~~.

Problems with Formaline:

During compilation, Formaline stores the built source tree in a cactusjar.git repository. My impression is that this takes a lot of time on slow filesystems.
During compilation, Formaline stores the whole source tree as tarballs in the executable. My impression is that this is also slow.
The source tarballs written to the simulation output directory are often many times larger than the rest of the simulation, and cause a lot of overhead when transferring simulations for analysis. This might be improved by a better sync tool (simfactory 3?) which identifies tarballs that are identical to ones already transferred and skips them. I would like the source tarballs to be identical if the contents are identical, even if Cactus has been rebuilt. I think I observed that this was not the case (maybe a build ID is included?).
As an aside: there is a message at link time saying that Formaline has finished doing something, but this is misleading. Formaline has other tasks which it does after this message is displayed, so the wait for the final executable is due not solely to linking but also to some archiving task of Formaline.
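
On the wish for byte-identical tarballs: one way to get there is to normalize everything tar records apart from file names and contents. The following is a minimal sketch, assuming GNU tar 1.28 or later (for `--sort`) and gzip. The function name `make_stable_tarball` and its arguments are hypothetical, and this is not how Formaline currently builds its tarballs.

```shell
# Sketch: build a tarball whose bytes depend only on file names, modes and contents.
# Assumes GNU tar >= 1.28 and gzip; make_stable_tarball is a hypothetical helper.
make_stable_tarball() {
  srcdir=$1   # directory to archive
  out=$2      # path of the .tar.gz file to write
  # --sort=name      fixes the member order
  # --mtime=@0       replaces all modification times with the epoch
  # --owner/--group  hide who built the tree
  # gzip -n          omits the timestamp from the gzip header
  tar --sort=name --mtime=@0 \
      --owner=0 --group=0 --numeric-owner \
      -C "$srcdir" -cf - . | gzip -n > "$out"
}
```

File permissions are still recorded, so two trees with different modes would still give different archives. The sketch also would not help if a build ID is written into the archived tree itself.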

Keyword:

Comments (16)

Barry Wardell
- removed comment
What about providing the option to use a more lightweight representation of the source tree? For example, the relevant svn revision or git sha1, plus a diff containing any local changes. Such a representation would also make it convenient to share your exact source tree with others, as long as there is a straightforward way to reconstruct the source tree from the information.

This would rely on the git/svn repositories remaining available so it may not be desirable by everyone, but it is something I would likely use.
- 2014-03-26T10:40:11+00:00
Ian Hinder reporter
- removed comment
Yes, I have thought about this extensively. For production simulations that end up in papers, I think you probably want the tarballs as well, just for safety. But in general use, in the case that the repositories are still available and haven't been rewritten, having the version control information is much more useful, as you can easily see which changes are present in one simulation, and you can refer to them by commit message and author, rather than by source code diff. The main problem with this is that repository information is not usually synced to the machine where the tree is built. For my own workflow, where I never modify source on the remote machine, it would be sufficient to collect the "manifest" of the source tree (i.e. repository/commit/diff info) locally and sync it across when source tree changes are synced. It could then be included in the simulation executable and output by formaline.
- 2014-03-26T10:48:57+00:00
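
The "manifest" idea discussed above (repository, commit, and a diff of local changes) could be collected roughly as follows. This is a minimal sketch for a single git checkout only; the function name `write_manifest` and the output layout are hypothetical, and untracked files as well as commits that exist only locally are not handled.

```shell
# Sketch: print a lightweight manifest for one git checkout.
# write_manifest and the "origin:/commit:" layout are hypothetical.
write_manifest() {
  repo=$1   # path to the working tree
  # Remote URL, or "none" if the checkout has no remote called origin
  printf 'origin: %s\n' "$(git -C "$repo" config --get remote.origin.url || echo none)"
  # Exact commit the working tree is based on
  printf 'commit: %s\n' "$(git -C "$repo" rev-parse HEAD)"
  # Uncommitted changes to tracked files, relative to that commit
  printf '%s\n' '--- local changes ---'
  git -C "$repo" diff HEAD
}
```

Running `write_manifest path/to/thorn > manifest.txt` locally and syncing the result alongside the source tree would match the workflow described above, under the assumption that the source is never modified on the remote machine.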
Erik Schnetter
- removed comment
That may be a nice idea. However, I would be careful especially with git repositories where commits or branches may exist only locally.

I don't think that storing the source code should be as expensive as it is. If we exclude the ExternalLibraries tarballs, then each source file is already read by the compiler, and converted to an object file. The executable (without tarballs) is larger than the tarballs. I don't understand why generating them is so expensive. My current assumption is that this process is only insufficiently parallelized by make, and that e.g. creating the tarballs requires a lot of disk I/O with little CPU action and happens all at once. Spreading this out over a longer time, and having tar read the source files near the time when the compiler already reads them, should improve performance considerably.

We may also further improve performance by not storing the temporary tarballs on disk before using perl to convert them to a C source file; this could happen in one step, and the result could be fed directly to the compiler. This would require writing a small C program to do so, based on zlib for compression.

Regarding the git repositories: There is currently one git repo per executable, which is a serialization choking point. Using one git repo per thorn would be much faster (since parallel), and combining these (via branches? merging? simple sequential commits?) may be faster than a single, large commit.
- 2014-03-26T10:56:48+00:00
Roland Haas
- removed comment
You can actually generate the tarfiles etc. on the fly without having to write C code:
```
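# Stream a tarball through od and awk straight into gcc, without intermediate files.
# Inside a Makefile recipe, write $$i instead of $i.
tar c foo bar | od -to1 -v | awk '
  BEGIN {
    print "const unsigned char *SpEC_tarball_data = "
  }
  {
    # field 1 is the od offset; fields 2..NF are octal byte values
    printf "\"";
    for(i = 2 ; i <= NF ; i++) {
      printf "\\%s",$i; sz++
    }
    print "\""
  }
  END {
    print ";\nint SpEC_tarball_size = ",sz,";"
}' | gcc -O0 -o SpEC-tarball.o -xc -c -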
```
does the trick without any intermediate files. This was an attempt to include tarballs in SpEC, which so far is stalled because it is hard to decide which files to include. icc can also be made to use pipes, though I think we can safely rely on gcc being around. With this, the C code in Formaline sees a char* pointer SpEC_tarball_data pointing to the tarball data, and SpEC_tarball_size holds its size, so a simple fwrite(SpEC_tarball_data, SpEC_tarball_size, 1, stdout) can write the tarball. This only uses POSIX tools, I think, so it should work on all machines.
- 2014-03-26T11:09:00+00:00
Frank Löffler
- removed comment
For personal use, I would probably always use the "tarball" option. Local changes, local unpushed commits and whatnot would not be easy to include, and I don't usually see the tarball creation as too slow, in my personal experience. What I do see is that Formaline includes some build-id in the executable, which forces a new link every time I rebuild Cactus, even if nothing else changed. If nothing else changes, I don't see why Formaline should use a new build-id and force a new link stage and an updated executable.

Another issue I have (had) was that GNU tar complains if files changed while it was processing them. A change in atime also counts as a change, and this can happen when things are compiled in parallel. We don't care about the atime, so this isn't a problem, but these messages are annoying regardless. I am not aware of a tar option to prevent this. The only way I currently see to avoid this entirely would be to make a copy before using tar, but that would be unacceptable performance-wise. On the other hand, I currently don't see this myself anymore on my development machines because I mounted my local file systems with "relatime", so access time changes are "almost ignored" at the file system level.

Besides these two points (of which really only the first bothers me), I am currently quite happy with Formaline. It has "issues" from time to time (like every code), and usually they are fixed quite quickly.

Roland: your code would put all data into one source code to be compiled. I believe this fails on several machines because that gets too large, which is why we have to split up some of the larger thorns into several 'source C files' to be compiled and combined on a C-level.
- 2014-03-26T11:16:31+00:00
Erik Schnetter
- removed comment
If the tarball is too large, then one has to create multiple C files from it. Very unfortunately, most compilers are really bad at compiling large static arrays, in the sense that they require large amounts of memory and take a long time to compile such a file. Apart from that, Formaline uses Perl instead of awk.
- 2014-03-26T11:34:12+00:00
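
Splitting a large tarball into several C files, as mentioned above, can be sketched with the same od/awk approach applied per chunk. This is an illustration only: the function name `blob_to_c_chunks`, the `blob_chunk_N` symbol names and the file layout are hypothetical, and Formaline itself uses Perl for this step.

```shell
# Sketch: turn one blob into several small C files, one octal string per file.
# blob_to_c_chunks and all generated names are hypothetical.
blob_to_c_chunks() {
  blob=$1          # input file, e.g. a tarball
  outdir=$2        # directory for the generated C files
  chunk_bytes=$3   # maximum number of bytes per C file
  mkdir -p "$outdir"
  split -b "$chunk_bytes" "$blob" "$outdir/chunk_"
  n=0
  for piece in "$outdir"/chunk_*; do
    [ -e "$piece" ] || continue   # empty input: nothing to do
    # od -An drops the offset column, so every field is an octal byte value
    od -An -v -to1 "$piece" | awk -v n="$n" '
      BEGIN { printf "const char blob_chunk_%d[] =\n", n }
      {
        printf "\""
        for (i = 1; i <= NF; i++) { printf "\\%s", $i; sz++ }
        print "\""
      }
      END { printf ";\nconst unsigned long blob_chunk_%d_size = %d;\n", n, sz }
    ' > "$outdir/blob_chunk_$n.c"
    rm -f "$piece"
    n=$((n + 1))
  done
  echo "$n"   # number of C files written
}
```

Each generated file can then be compiled separately, which also lets make compile the chunks in parallel. The right chunk size is not known from this discussion and would have to be found by experiment on the affected machines.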
Roland Haas
- removed comment
Sigh. Another comment lost due to the captcha system badly interacting with the browser.

I wanted to ask if we know more details on what makes large files fail to compile. Is it the linker, the compiler, or the OS limiting memory consumption?

I played around with the best way of compiling and found that yes, arrays are indeed bad, and I therefore use a long string with all characters encoded in octal notation. I am also using only gcc so that I have a compiler that is the same everywhere. Perl is also fine with me; it is not in POSIX, though, so I did not want to use it in SpEC. Really, the intention is only to show that one can pipe the source into gcc.
- 2014-03-26T11:42:25+00:00
Erik Schnetter
- removed comment
The problem is purely the compiler. Yes, using gcc should work, except (of course) on Blue Gene or Intel MIC or other systems that essentially cross-compile.

I was thinking of using the Bash syntax "<(some perl command here)" to pipe the output into the compiler.
- 2014-03-26T12:02:02+00:00
Roland Haas
- removed comment
Ah, I had not thought of the MIC and Blue Genes. The trick of compiling stdin also works with Intel compilers (I tried this for different reasons), but I have no idea how fast they are or whether PGI has similar issues. This all only really matters if the thing that makes Formaline slow is the number of files it creates on disk.
- 2014-03-26T12:20:22+00:00
Barry Wardell
- removed comment
Replying to [comment:8 eschnett]:

> The problem is purely the compiler. Yes, using gcc should work, except (of course) on Blue Gene or Intel MIC or other systems that essentially cross-compile.

I have noticed that GCC is particularly bad with large files. I have had much better success (sometimes by orders of magnitude) with using LLVM-based compilers such as clang, but then they are probably not so widely available yet.
- 2014-03-26T14:12:01+00:00
Roland Haas
- edited description
- 2019-06-30T00:43:29+00:00
Roland Haas
Issue ~~#2218~~ was marked as a duplicate of this issue.
- 2022-02-14T19:28:07+00:00
Roland Haas
@Erik Schnetter says:

I found a way to put binary blobs directly into object files, without generating C code in between. This should speed up generating the Formaline objects for our source code quite a bit. This project https://git.astron.nl/RD/tensor-core-correlator uses it; look at how the `libtcc/TCcorrelator.cu` CUDA file is "compiled".

- 2022-02-14T19:29:13+00:00
Roland Haas
A somewhat similar proposal was made in #2218.

- 2022-02-14T19:32:59+00:00

Roland Haas

https://bitbucket.org/einsteintoolkit/tickets/issues/1568/reduce-overhead-of-formaline#comment-61834235 uses GNU ld to include files.

Using regular ld one gets something like:

rhaas@8992d193:~/tmp$ ld -r -b binary -o foo.o xcts.pdf
rhaas@8992d193:~/tmp$ file foo.o
foo.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
rhaas@8992d193:~/tmp$ nm foo.o
000000000002bebd D _binary_xcts_pdf_end
000000000002bebd A _binary_xcts_pdf_size
0000000000000000 D _binary_xcts_pdf_start

but e.g. the xalt wrapper linker gives:

rhaas@h2ologin4:~$ ld -r -b binary -o foo.o foo.tar.gz
rhaas@h2ologin4:~$ nm foo.o
00000000c0421df1 D _binary__tmp_rhaas_2022_02_14_13_22_52_a8ab3fb3_e7ae_4131_a8a4_7ae6143cb4c6_xalt_o_end
0000000000000481 A _binary__tmp_rhaas_2022_02_14_13_22_52_a8ab3fb3_e7ae_4131_a8a4_7ae6143cb4c6_xalt_o_size
00000000c0421970 D _binary__tmp_rhaas_2022_02_14_13_22_52_a8ab3fb3_e7ae_4131_a8a4_7ae6143cb4c6_xalt_o_start
00000000c0421970 D _binary_foo.tar.gz_end
00000000c0421970 A _binary_foo.tar.gz_size
0000000000000000 D _binary_foo.tar.gz_start

i.e. there is “extra” stuff being included. So we must ensure that we use the actual GNU linker. We need a special variable for this, since in Cactus $LD is usually $CXX. The fact that it relies on the GNU linker is not a big issue, I think, as we keep the current method as a fallback.

2022-02-14T19:38:25+00:00
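
The linker check and fallback described above could look roughly like this. It is a minimal sketch: `GNU_LD` is a hypothetical variable name for the real GNU linker (no such Cactus option is confirmed by this ticket), `embed_blob` is a hypothetical helper, and a wrapper that passes `--version` through unchanged would still slip past the check.

```shell
# Sketch: embed a file as an object with GNU ld, refusing other linkers.
# GNU_LD and embed_blob are hypothetical names.
embed_blob() {
  blob=$1   # absolute path of the file to embed
  obj=$2    # absolute path of the object file to create
  linker=${GNU_LD:-ld}
  if ! "$linker" --version 2>/dev/null | grep -q '^GNU ld'; then
    echo "embed_blob: $linker is not GNU ld, use the existing method" >&2
    return 1
  fi
  # ld derives the _binary_*_start/_end/_size symbol names from the path
  # given on the command line, so run it from the blob's directory to keep
  # the names short and independent of the build location.
  ( cd "$(dirname "$blob")" &&
    "$linker" -r -b binary -o "$obj" "$(basename "$blob")" )
}
```

For a file called `blob.bin` this should yield symbols such as `_binary_blob_bin_start`, which `nm` can confirm. A caller would fall back to the current C-source method whenever the function returns non-zero.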

Roland Haas
Hackathon (build system, make)
- 2022-06-15T13:58:36+00:00

Assignee: –

Type: enhancement

Priority: minor

Status: new

Component: Cactus

Milestone: –

Version: development version

Votes: 0

Watchers: 0