formaline capture simfactory information

Issue #1890 new
Jonah Miller created an issue

It would be nice if Formaline captured the machine description used by simfactory (i.e., properties.ini, optionlist, submitscript) for a simulation. This information would be convenient for reproducing the exact configuration on a machine later.

Keyword: formaline

Comments (8)

  1. Erik Schnetter

    I don't think the MDB entry is stored.

    Also, it would be good to have these in the json files, not just in the directory.

  2. Ian Hinder

    Is this a property of the configuration (at the time of build), or the simulation (at the time of submission/run)? I would say the latter, since we want to know with 100% certainty what was used (simfactory could have been updated between build and submission, which would change the MDB file). Is there a mechanism that simfactory could use to "register" the files at submission time, so that Formaline would pick them up and store them in its records?

  3. Frank Löffler

    I also don't quite understand what Formaline should do here. Formaline deals with the source code, and it generally does its job. Simfactory deals with build options, submission scripts, etc., and it does so too.

    The option list can usually be found in name/SIMFACTORY/cfg/OPTIONLIST, and the other scripts are in run/, with expanded versions in the individual restarts. properties.ini is saved as well. As long as users don't delete these files, I don't really see a reason why we should duplicate them.

    In more general terms, I see Simfactory 'above' Formaline in terms of layers. If anything, Simfactory might know about Formaline's special output and post-process it if wanted, not the other way around.

  4. Jonah Miller reporter

    What I was thinking of when I submitted the ticket was a convenient way of packaging all of simfactory's information in, e.g., a json file or tarball so that the whole directory tree structure doesn't have to be carried around. Perhaps this would be a job for simfactory not formaline?

  5. Frank Löffler

    Replying to [comment:5 jonah.maxwell.miller@…]:

    > What I was thinking of when I submitted the ticket was a convenient way of packaging all of simfactory's information in, e.g., a json file or tarball so that the whole directory tree structure doesn't have to be carried around. Perhaps this would be a job for simfactory not formaline?

    I think it would. On the other hand, simfactory already makes a copy of the executable, which contains the formaline output. So all you would really need is the simfactory directory. If you dislike that this is a directory with a number of files in it, you could put it into a tarball, but what would be the point of that, other than possibly archiving?

    What do you plan to do? Maybe there are other ways to do that?

  6. Jonah Miller reporter

    For the time being, I plan to make a tarball of the simfactory parameters, optionlist, and submit script, the source tree generated by formaline, the parameter file, and the formaline json file. This way I have a single file containing all I need to reproduce a result that I can attach to, say, a plot or visualization. I don't actually want the executable, because it's large and it will probably become stale in a bit.

    There is probably a better way to do this.
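
    The tarball described above could be assembled with a few lines of Python. This is only a minimal sketch: the function name `bundle_simulation_record` is hypothetical, it is not part of simfactory or Formaline, and the caller has to supply the actual locations of the files for a given simulation.

```python
import tarfile
from pathlib import Path


def bundle_simulation_record(paths, archive_path):
    """Pack the given files and directories into one gzip-compressed tarball.

    Hypothetical helper, not part of simfactory or Formaline.
    Paths that do not exist are skipped and returned to the caller.
    """
    missing = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in map(Path, paths):
            if p.exists():
                # Store each entry under its base name; directories are
                # added recursively. Entries sharing a base name would
                # collide, so the inputs should have distinct names.
                tar.add(str(p), arcname=p.name)
            else:
                missing.append(str(p))
    return missing
```

    The inputs would be the optionlist, submit script, properties.ini, the parameter file, the Formaline json file and the Formaline source tree. The executable is simply left out of the list, which keeps the archive small.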

  7. Erik Schnetter

    Formaline is not just about storing the source code. Formaline is about making simulations reproducible, and that includes capturing all parameters (and parameter changes) at run time, and recording the simulation environment (machine, user, directory, time, various UUIDs, etc.), and also putting part of that information automatically into various places (ASCII and HDF5 output, unless disabled).

    I don't know whether Formaline records the number of MPI processes (probably yes) and the number of OpenMP threads (probably no), but it should record both, as well as the job ID from the queuing system. It would be trivial to record the MDB entry that Simfactory used to create a simulation, and thus it should be done. "Recording" doesn't mean that it should be in one of the directories -- "recording" means that Formaline puts this information into all the output channels where recorded information is available. These are currently (mostly) a human-readable text file and a json file, as well as a database server if one is configured (it typically isn't, unfortunately). (We should revisit this -- people find it quite easy to set up a private git repository, so we should go this route if it makes people use it.) This information is also available at run time, if there is e.g. a web server or a log server running as a thorn. It should probably also go, in its entirety, into large output files.
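
    To make the idea of "recording" concrete, here is a rough sketch of gathering the MDB entry, the OpenMP thread count and the job ID into one json-serialisable record. It rests on several assumptions: the function `collect_run_record` and the record keys are hypothetical, the MDB entry is assumed to be a plain ini file with one section per machine that Python's `configparser` can read, and the job ID is looked up in the `SLURM_JOB_ID` and `PBS_JOBID` environment variables only. The number of MPI processes is omitted because it would have to come from the MPI runtime. This is not how Formaline is implemented.

```python
import configparser
import json
import os


def collect_run_record(mdb_path, machine, env=None):
    """Gather run-time metadata into a json-serialisable dict.

    Hypothetical helper; assumes the MDB entry is a plain ini file
    with a section named after the machine.
    """
    env = os.environ if env is None else env

    # Read the machine description; an unknown machine yields an empty entry.
    mdb = configparser.ConfigParser(interpolation=None)
    mdb.read(mdb_path)
    entry = dict(mdb[machine]) if mdb.has_section(machine) else {}

    # Take the first scheduler variable that is set (assumed: Slurm or PBS).
    job_id = next(
        (env[k] for k in ("SLURM_JOB_ID", "PBS_JOBID") if k in env), None
    )

    return {
        "machine": machine,
        "mdb_entry": entry,
        "omp_num_threads": env.get("OMP_NUM_THREADS"),
        "job_id": job_id,
    }
```

    The result can be passed to `json.dumps` and then written to whichever output channels are enabled.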
