Opened 8 years ago

Last modified 7 years ago

#136 reopened enhancement

Don't rebuild external libraries so often

Reported by: Erik Schnetter
Owned by:
Priority: minor
Milestone:
Component: Cactus
Version:
Keywords:
Cc: bcmsma@…

Description

One way to keep the external libraries that have been built would be the following. Create a dummy configuration "ext" where all external libraries are built. When another configuration is built, it should be easy to tell it to look there for the external libraries, or maybe this should even be the default.

Attachments (1)

GSL.sh.diff (862 bytes) - added by bmundim 8 years ago.

Change History (28)

comment:1 Changed 8 years ago by Frank Löffler

That doesn't work in a general setting. Different configurations might have different configuration options. For example, I have two default configurations on the numrel machine: one with the Intel compiler, one with the GNU compiler. Using libraries from one within the other does not work.

comment:2 Changed 8 years ago by Erik Schnetter

If you have two incompatible configurations, then you can use two dummy configurations, one for each.

comment:3 Changed 8 years ago by Frank Löffler

Then I would have to specify the dummy configuration as well, and remember to update it if I update the configuration it belongs to. What exactly do you mean by 'too often'? I can see two possibilities:

a) You might have several configurations that differ only in their thornlist. In this case external libraries built for one could be used within the other, and rebuilding would probably not be necessary. It wastes compile time and space.
Of course, in this case you can always build the library yourself and let Cactus use this instead of the Cactus-built version.

b) You might have the problem that the libraries within one configuration are rebuilt too often. If that is the case, we should be able to fix this without creating a dummy configuration.

Should we take this discussion to the Cactus developers mailing list?

comment:4 Changed 8 years ago by Erik Schnetter

Yes, building the external libraries too often wastes time and space; this is exactly why I opened this issue. The only current remedy is to build the library yourself, outside of Cactus. But this is what I want to avoid: building an external library yourself is often complicated, and Cactus already knows how to do it, so Cactus should do this for you.

The Cactus-built external libraries are rebuilt after a "make clean" or "make realclean" (or when the library itself is updated, of course). Most people use "make clean" if something inexplicable goes wrong with their build, and they want a fresh start. In this case, rebuilding external libraries is probably a good idea, since it's not clear what actually is going wrong.

comment:5 Changed 8 years ago by Frank Löffler

We could let Cactus search other existing configurations for identical config-info files (apart from the timestamps...) and copy the relevant files (using hard links if possible), instead of rebuilding them. Of course that would mean that we would need a way to let Cactus know a) which thorns build a library and which do not, and b) which files belong to the result of building that library.
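
A rough shell sketch of this comparison, for illustration only. It assumes that timestamp lines in config-info can be recognised by the string `DATE`, which may not match the real file layout; the function names are hypothetical as well.

```shell
#!/bin/sh
# Sketch only: the config-info layout and the "DATE" marker are assumptions.

# Print a config-info file with the (assumed) timestamp lines removed.
strip_timestamps() {
    grep -v 'DATE' "$1"
}

# Succeed if two config-info files are identical apart from timestamps.
same_options() {
    a=$(strip_timestamps "$1")
    b=$(strip_timestamps "$2")
    [ "$a" = "$b" ]
}

# Reuse a built file via a hard link, falling back to a plain copy
# (hard links cannot cross file system boundaries).
reuse_file() {
    ln "$1" "$2" 2>/dev/null || cp "$1" "$2"
}
```

Knowing which files to pass to `reuse_file` is exactly the open point b) above; the sketch does not address it.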

comment:6 Changed 8 years ago by Ian Hinder

I think that is too complicated, and would get in the way of debugging. Is it true that using libraries from one compiler will not work with another? If I install HDF5 from source I'm pretty sure I have been able to use it with the Intel compiler in the past, even though it will have been built with GCC.

The problem which this ticket attempts to solve is that one does not want to build external libraries outside Cactus because it is sometimes complicated and Cactus already knows how to do it. Currently, the libraries are built within Cactus and stored in each configuration and are rebuilt whenever the configuration is rebuilt.

How about implementing a command which uses Cactus to build the libraries and install them outside the Cactus tree? For example:

make HDF5-buildlib prefix=$HOME/software

This would look for the thorn HDF5 and run its configuration script, telling it to build and install in a particular location. This is independent of any specific configuration.

If there are corner-cases where the build needs to be customised to a particular configuration, we would need there to be an association with a configuration for the build, and maybe the prefix would be modified to include the config name.

comment:7 in reply to:  6 Changed 8 years ago by Roland Haas

Replying to hinder:

> I think that is too complicated, and would get in the way of debugging.
> Is it true that using libraries from one compiler will not work with another?
> If I install HDF5 from source I'm pretty sure I have been able to use it with
> the Intel compiler in the past, even though it will have been built with GCC.

I found that Fortran modules (the .mod files in the include directories, to be specific) are apparently compiler (and version) specific. If I compile HDF5 (with the Fortran interface) using gcc 4.1.2, then I cannot use it with gcc 4.5 (the error message is something like "error parsing module description"). I assume that different compilers could also use different methods to mangle e.g. Fortran routine names. I never had problems with C/C++ routines.

comment:8 Changed 8 years ago by Erik Schnetter

When Cactus builds external libraries, it makes extensive use of the options that the user specified. In many cases, building is not possible without such options, e.g. when it comes to choosing a CPU architecture, switching between 32-bit and 64-bit mode, or finding good C and Fortran compilers. Another problem is finding other libraries on which a certain library depends (e.g. PETSc depends on LAPACK). On some systems, the standard make/ar/ranlib/tar tools do not quite work out of the box.

In general, when one uses only C, and when a library does not depend on other libraries, then it can be built without problems. Often, this is also the case for C++, but certainly not for Fortran.

Cactus "knows" how to build and install these libraries because it knows how to configure them.

What would be possible is to take a certain option list and to build a set of external libraries with these. These can then be installed somewhere, but it does not matter whether this is inside a Cactus source tree, or outside, or whether we call this a "mini-configuration", or whether we look for such mini-configurations in other Cactus source trees, or in the home directory of another user.

comment:9 Changed 8 years ago by bmundim

Hi,

the way I used to deal with this problem was simply to build the libraries and move them elsewhere,
outside the configuration directory or the Cactus tree. That had worked fine for me so far, and I think
it would work for most people as well, since they can always name the external directory to match
the configuration and keep the config-info file there as a reminder of the compiler options
used. However, recently, after one of those "sim-reconfig" runs that doesn't actually reconfigure anything,
I erased the configuration that I had originally used to build the libraries. I then started to have
problems with the GSL library, which had already been installed. Investigating this issue, I found out
that the options set in GSL.sh use ${GSL_DIR}/bin/gsl-config, which hard-codes the
original installation directory (by default config/scratch/GSL). So moving things around
ended up not being a good idea.

The best way, in my opinion, to deal with this issue at the moment would be to let the user define
the installation directory in their .cactus/config option file. Whenever we build the library,
the script also looks for a GSL_INSTALL_DIR variable; if it finds one, it installs the library in
that user-specified directory, otherwise it installs in the default location, config/scratch/GSL.
We would set something like this in the .cactus/config file:

GSL_DIR = BUILD
GSL_INSTALL_DIR = /home/me/local/gsl-1.14

and after building and installation we would change to only

GSL_DIR = /home/me/local/gsl-1.14

in order to use it for most other configurations.
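
The selection of the installation directory described above could be sketched in shell as follows. This is not the attached patch; the function name and its scratch-directory argument are hypothetical, and the default path simply follows the description in this comment.

```shell
#!/bin/sh
# Sketch only: not the attached GSL.sh.diff. Names other than
# GSL_INSTALL_DIR are hypothetical.

# Print the directory a Cactus-built GSL would be installed into.
# $1: scratch directory of the configuration (hypothetical argument)
gsl_install_dir() {
    if [ -n "${GSL_INSTALL_DIR:-}" ]; then
        echo "${GSL_INSTALL_DIR}"   # user-specified location
    else
        echo "$1/GSL"               # default inside the configuration
    fi
}
```

With GSL_INSTALL_DIR unset the behaviour is unchanged, which is the point of the proposal: the option is purely additive.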

I have attached a patch with a small change to the GSL.sh script. I encourage you to take a look and
give your opinion. My proposal doesn't change the default behaviour. It only adds an option
for the user to specify the installation directory. The only drawback I can think of at the moment
is that you may forget which configuration was used to build the libraries. However, you could
always name the directory accordingly, and we could improve the script to make it dump a similar
config-info file there as a reminder, if really necessary.

Anyway, I find it annoying to have to rebuild libraries frequently, and I hope my proposed solution for
GSL.sh can be ported to the other library scripts as well, and that it can really be a step forward
in avoiding all these unnecessary and time-consuming recompilations.

Changed 8 years ago by bmundim

Attachment: GSL.sh.diff added

comment:10 Changed 8 years ago by bmundim

Status: new → review

comment:11 Changed 8 years ago by Erik Schnetter

The patch is missing the declaration of GSL_INSTALL_DIR in the configuration.ccl file. Or do you envision this to be a global environment variable instead of a Cactus configuration variable?

It's a somewhat dangerous option; e.g. we couldn't use it in our standard simfactory option lists, because there is no mechanism to clean these external locations. It's also dangerous because it doesn't take compiler compatibility into account.

However, these are not arguments against such an option. What do others think?

comment:12 Changed 8 years ago by bmundim

Hi Erik:

> The patch is missing the declaration of GSL_INSTALL_DIR in the configuration.ccl file.
> Or do you envision this to be a global environment variable instead of a Cactus
> configuration variable?

I thought of it more as an environment variable, but I am fine with declaring it in configuration.ccl.
What is the advantage of doing so, though? It seems to me that it serves only to document
the environment or configuration variables used. Is there anything else? What is the idea
or road map behind these options in configuration.ccl?

> It's a somewhat dangerous option; e.g. we couldn't use it in our standard simfactory option lists,
> because there is no mechanism to clean these external locations.

I am not sure I understand why it is dangerous. Once you know the directory in which the external libraries
were installed, you know where to clean them. It is all in your GSL_INSTALL_DIR variables (or equivalent).

> It's also dangerous because it doesn't take compiler compatibility into account.

Yes, you are right about the compiler compatibility, and it is the only problem I see
at the moment. However, the user or simfactory option list can still use the default
installation directories under config/scratch. Those users who see the compatibility issue
as a big problem would still be configuring the same way as before. Those who don't
face this issue on a daily basis, or who are happy to work around it (for example by
having different external locations for different compilers), would now have at least
a better alternative (instead of simply moving the library away from its original
installation directory).

Again, I am not advocating changing the default; I am only proposing to add an option
so that we can install libraries in directories of our choosing.

comment:13 Changed 8 years ago by Erik Schnetter

The advantage of a configuration option over an environment variable is that one can have two different install dirs on the same machine, e.g. on Kraken if one wants to experiment with the Intel vs. the PGI compiler. The respective libraries would be incompatible.

The idea behind declaring these options in a .ccl file is that, at some point, we may/should/could clean up the configuration mechanism to ensure that only intended variables are passed to the configuration scripts. For example, I found a system that had the R package installed and had an environment variable RPATH set; RPATH is also a configuration option for (HDF?), which then breaks many things.
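
To illustrate what "only intended variables are passed" could mean, here is a minimal shell sketch. It is an assumption about one possible mechanism, not how Cactus currently works; the function name is hypothetical, and in practice the list of names would come from the .ccl declarations.

```shell
#!/bin/sh
# Sketch only: run a command in an environment reduced to an explicit
# list of variables, so that stray ones (such as RPATH) are not seen.

# $1: space-separated names of the declared variables
# remaining arguments: the command to run
run_with_declared_vars() {
    declared=$1
    shift
    for name in $declared; do
        # Forward only variables that are actually set.
        if eval "[ -n \"\${$name+set}\" ]"; then
            eval "value=\$$name"
            set -- "$name=$value" "$@"
        fi
    done
    # env -i starts from an empty environment; PATH is kept so that
    # the command itself can still be found.
    env -i PATH="$PATH" "$@"
}
```

For example, `run_with_declared_vars "GSL_DIR GSL_INSTALL_DIR" sh GSL.sh` would hide an unrelated RPATH from the script.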

Again: what do others think?

comment:14 Changed 8 years ago by Frank Löffler

The patch only handles the problem for one library. Most users would probably like to have this mechanism for all libraries at the same time, without having to specify a lot of installation directories.

What about the following:
within configs/ (or a new subdirectory of the main tree) we create (if requested) directories containing all built libraries of a given configuration. There we also save the configuration file (including options given on the command line) which was used for these. Comparing these saved configuration files to that of the configuration currently being built shouldn't be hard or take long. Thus, when building a new configuration, we could (if requested) look through these directories for a matching configuration file and, if one is found, tell the library scripts to use that directory as the installation directory; they can then decide either to rebuild into it or just to reuse it.
One issue with that is to decide what should happen on a make -clean. Right now this also rebuilds the libraries, which is fine, as they are local to one configuration and do not affect others. We should probably still clean the libraries, accepting that this will potentially affect other configurations as well. That shouldn't be too much of a problem, as the same configuration options /should/ result in the same library being built, but that might not always be the case, for example after an update of the library itself.
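
The lookup step could be sketched in shell like this. The cache layout (one subdirectory per saved configuration, each holding a file called `options`) and the function name are hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Sketch only: the cache layout <cache>/<name>/options is hypothetical.

# Print the first cache directory whose saved option file is identical
# to the given one; return non-zero if none matches.
# $1: cache root
# $2: option file of the configuration being built
find_matching_libs() {
    for dir in "$1"/*/; do
        if [ -f "${dir}options" ] && cmp -s "${dir}options" "$2"; then
            echo "${dir%/}"
            return 0
        fi
    done
    return 1
}
```

A byte-for-byte comparison is the simplest choice here; it treats any difference, even in whitespace, as a different configuration, which errs on the side of rebuilding.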

On a higher level this could even be done with thorns in theory. I am not suggesting this right now, just would like to note it. I say in theory, because in practice this will not work as there might be dependencies of thorns on each other, changing something if some other thorn is present in a configuration or not, e.g., through the include-file mechanism. We would have to look out for similar dependencies of libraries on each other though.

All a user would have to do is enable this mechanism with one directive. Once we have tested this for some time I could even think about making this the default, but of course we shouldn't do that too soon.

comment:15 Changed 8 years ago by bmundim

> The patch only handles the problem for one library. Most users would probably like to have this
> mechanism for all libraries at the same time, without having to specify a lot of installation
> directories.

The patch is easy and simple enough to be ported to all other libraries,
and I volunteer to do so if it is the case. I disagree it is a lot of
installation directories. Right now the default in a .cactus/config
file is to have this to force a library building:

GSL_DIR = BUILD

I propose to allow an extra option (that would add one *optional* line
to each library line in the configuration file):

GSL_DIR = BUILD
GSL_INSTALL_DIR = /home/me/local/intel11/gsl-1.14.etc

I mean, there is not even a need to have it there. If the library is not found
on the system, it is built and installed in the config/scratch directories.
My suggestion doesn't change this behaviour; it only adds an option that frees
the user to install the libraries wherever he or she wants. The external
libraries have nothing to do with Cactus, and it doesn't make sense to me
to bury them in the Cactus tree. The only advantage of using Cactus to build
the external libraries is that Cactus knows how to do so, as Erik once said.
The user may want to use these libraries with other pieces of software, or
he/she may just not be willing to build libraries whenever a new configuration
is created or an old one cleaned. There is no urgent need to do so,
and, besides, we may inadvertently erase configs or whole Cactus trees
without remembering that the libraries were buried there. So I think
we should at least provide a way out of this, and the simplest way
I have thought of so far is to let the user specify the library installation
directory.

> What about the following:
> within configs/ (or a new subdirectory of the main tree) we create (if requested) directories
> containing all built libraries of a given configuration. We also save the configuration file
> (also including options given on the command line) which was used for these there. Comparing
> these saved configuration files to the one of a currently being built configuration shouldn't be
> hard or take long. Thus, when building a new configuration, we could (if requested) look through
> these directories for a matching configuration file and if found, let the library scripts know
> about that directory as installation directory, and they can then decide to either rebuild into
> there or just reuse it.
> One issue with that is to decide what should happen on a make -clean. Right now this also
> rebuilds the libraries, which is fine, as they are local to one configuration and do not affect
> others. We should probably still clean the libraries, accepting that this will potentially affect
> other configurations as well. That shouldn't be too much of a problem, as the same configuration
> options /should/ result in the same library being built, but that might not always be the case -
> like after an update of the library itself.

Your suggestion may improve things a bit regarding how often the
libraries are built, but it still doesn't give the user the freedom to choose where
to install the external libraries; it only changes the current default
installation directories. These libraries would still be inside the
Cactus/configs tree.

Concerning make -clean we could add an option to make libraries-clean or
similar.

comment:16 Changed 8 years ago by anonymous

I think we might be converging on a solution, but we also have two problems at hand here. Let's try to disentangle them.

  1. We might want to use a library location outside of Cactus, but still we would like Cactus to build that library.

A user would have to specify that location. It might be useful to do that for each library separately, but it might also be of interest to have one switch to build all libraries in some specified place.

  2. Regardless of the location being inside or outside of Cactus it should be possible to avoid too many rebuilds (the original intent of the ticket) in an automatic fashion.

Here we could, as described above, let the user specify that location, or choose some default within
the Cactus tree for that, possibly containing the configuration name, but for sure containing the
configuration options in a file. Then Cactus can compare configuration data of different
configurations within subdirectories there, and figure out by itself if a library needs to be rebuilt,
or if an existing version can be used.

Solving problem 1 needs user interaction to avoid rebuilds, solving problem 2 would make it work automatically, but is also more work to implement.

I suggest we go ahead with the current patch, and keep this as an override even if we implement something for problem 2.

comment:17 Changed 8 years ago by bmundim

Ok, I will prepare similar patches for all the libraries in the einsteintoolkit.th
and apply them shortly after I test them all. I will follow Erik's suggestion to
declare the *_INSTALL_DIR as a configuration option in configuration.ccl.

We can continue the discussion on how to devise the mechanism suggested in 2.

comment:18 Changed 7 years ago by Erik Schnetter

Bruno, have you tried whether your patch actually reduces the number of times external libraries are built? The mechanism that detects whether external libraries need rebuilding is unchanged, and will still rebuild them for every new configuration, and after every make clean.

comment:19 Changed 7 years ago by bmundim

Yes, it does. I haven't built libraries anymore since then.
Note however that my patch doesn't change the default. Maybe
SimFactory could add an option for the path where the user
wants to install his/her external libraries. That would make
it easier to modify the option lists of the machines. Right
now I change the configuration file by hand, but I guess we
don't want that to happen for a beginner user, right?
What do you think about this new option for SimFactory or
the mdb entries?

comment:20 Changed 7 years ago by Erik Schnetter

Which thorn are you using to build e.g. GSL, where your patch helps? Is it in the ExternalLibraries arrangement, or is it in CactusExternal?

comment:21 Changed 7 years ago by bmundim

I applied my patch to thorns in the ExternalLibraries arrangement.

comment:22 Changed 7 years ago by Erik Schnetter

ExternalLibraries/GSL rebuilds a library whenever (a) the GSL thorn changes, or (b) the file configs/*/scratch/done/GSL is missing, e.g. after a make *-clean. The location of GSL_INSTALL_DIR does not factor into this decision at all! In particular:

  • If two Cactus configurations use the same install locations, then the library there will be built twice -- the first will silently be overwritten
  • If you clean a configuration or remove it, and then rebuild it, the GSL library will be built again, even if it already exists there
  • If you install GSL into the Cactus source tree, the same thing happens.

You can easily see this in the build script. The test whether to build is done before GSL_INSTALL_DIR is checked.
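
The two conditions listed above can be paraphrased in shell as follows. This is a sketch of the described logic under my own assumptions, not the actual build script; the function name and argument layout are hypothetical.

```shell
#!/bin/sh
# Sketch only: paraphrases conditions (a) and (b) above; this is not
# the actual ExternalLibraries/GSL build script.

# Succeed (exit status 0) if the library has to be rebuilt.
# $1: "done" marker file, e.g. configs/<name>/scratch/done/GSL
# $2: directory holding the thorn's files
needs_rebuild() {
    # (b) marker missing, e.g. after cleaning the configuration
    [ -e "$1" ] || return 0
    # (a) some file of the thorn is newer than the marker
    [ -n "$(find "$2" -type f -newer "$1" | head -n 1)" ] && return 0
    return 1
}
```

Note that GSL_INSTALL_DIR appears nowhere in this test, which is the point being made: the install location has no influence on whether a rebuild is triggered.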

comment:23 Changed 7 years ago by bmundim

What my patch does is provide a way out of the default behaviour,
but it does *not* change the default behaviour. What I usually do
when building the libraries is as follows:

1) Set in my .cactus/config the path where I want the library built,
plus the BUILD statement to force it to be built. For example, I used
the following configuration for GSL to be built on my laptop with gcc:

GSL_DIR = BUILD
GSL_INSTALL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14

2) Once I have built them, I change my .cactus/config file to reflect
their location and comment out the two lines above:

#GSL_DIR = BUILD
#GSL_INSTALL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14
GSL_DIR = /home/bruno/local/gcc4.4.1/gsl-1.14

3) For all other configurations that are built with gcc, I use the latest
version of my .cactus/config file.

This has worked well for me so far. I didn't think hard about corner
cases, and I didn't think of automating this process. I hope it is
now clearer what I did/do. Now to your points:

> If two Cactus configurations use the same install locations, then the library there will be built twice -- the first will silently be overwritten

I didn't run into this problem, since once I have built the libraries I set their location
in the .cactus/config file as I described above. However, we could work on making this
more robust.

> If you clean a configuration or remove it, and then rebuild it, the GSL library will be built again, even if it already exists there

It won't be built again, because by this time I have already set GSL_DIR to the library
location. Again, we may want to make it more robust, maybe by assigning
GSL_DIR = GSL_INSTALL_DIR *after* the library is installed.
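
One way such a more robust variant might look, as a shell sketch only. It is not part of the patch; the function name is hypothetical, and treating the presence of bin/gsl-config as "already installed" is an assumption.

```shell
#!/bin/sh
# Sketch only: not part of the patch. Reuses an existing installation
# instead of rebuilding when GSL_DIR is still set to BUILD.

# Print the value GSL_DIR should effectively take.
resolve_gsl_dir() {
    if [ "${GSL_DIR:-}" = "BUILD" ] && [ -n "${GSL_INSTALL_DIR:-}" ] &&
       [ -f "${GSL_INSTALL_DIR}/bin/gsl-config" ]; then
        echo "${GSL_INSTALL_DIR}"   # already installed there: reuse it
    else
        echo "${GSL_DIR:-BUILD}"    # otherwise keep the user's setting
    fi
}
```

This would remove the manual step of editing .cactus/config after the first build, at the cost of not noticing when the installed library was built with different options.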

> If you install GSL into the Cactus source tree, the same thing happens.

This is true. Remember that I didn't change the default. I just provided a way out of it.

comment:24 Changed 7 years ago by Erik Schnetter

I see. Your proposal is then to use Cactus to install (on every system you use) one version of all the libraries you need, instead of installing them yourself, or looking for pre-installed libraries. Later you just use these libraries.

Please apply this patch.

comment:25 Changed 7 years ago by Frank Löffler

Cc: bcmsma@… added

Bruno: Is this still uncommitted?

comment:26 Changed 7 years ago by bmundim

No, it isn't. I have committed it in r19. The same applies to the other ET external libraries.
I haven't done so for the external libraries exclusive to Cactus.

We left this ticket open because my solution is just a workaround. People want something more
sophisticated. Meanwhile, this patch allows you to just use Cactus to install libraries where
you want.

comment:27 Changed 7 years ago by Frank Löffler

Status: review → reopened

Ok, since this is committed, I am reopening the ticket as a reminder for the more general solution.
