Opened 8 years ago

Last modified 6 years ago

#322 new enhancement

SimFactory metadata deleted by periodic filesystem purges

Reported by: Ian Hinder Owned by: mthomas
Priority: minor Milestone:
Component: SimFactory Version:
Keywords: Cc:


Production filesystems are subject to periodic purges (typically on the order of weeks or months), in which data that has not been accessed recently is deleted. This means that some restarts of very long-running simulations may be deleted by the system. An automated archiving system can address this, but it does not address the problem that the simulation metadata directory (currently called SIMFACTORY), and any restarts which have not yet been run, will also be purged. This would make it impossible to submit future restarts, and it limits the chain of restarts you can submit to what fits within the purge time of the system.

One possible solution would be to store a backup, or "shadow", copy of all the simulation metadata in a non-volatile location. This could be the user's home directory, or a "work" directory which is not purged. The details would need to be worked out.
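
As a rough illustration of the shadow-copy idea, the following Python sketch copies a simulation's metadata directory to a non-purged location and restores it if it goes missing. It is not SimFactory code: the function names, the `shadow_root` layout (one subdirectory per simulation name) and the default `metadata_name` are assumptions made for the example.

```python
import shutil
from pathlib import Path


def shadow_metadata(sim_dir, shadow_root, metadata_name="SIMFACTORY"):
    """Copy the metadata directory of one simulation to a non-purged location.

    Hypothetical helper: the layout shadow_root/<simulation name>/<metadata_name>
    is an assumption for this sketch.
    """
    sim_dir = Path(sim_dir)
    src = sim_dir / metadata_name
    if not src.is_dir():
        raise FileNotFoundError(f"no metadata directory at {src}")
    dest = Path(shadow_root) / sim_dir.name / metadata_name
    # dirs_exist_ok (Python 3.8+) lets repeated calls refresh an existing copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


def restore_metadata(sim_dir, shadow_root, metadata_name="SIMFACTORY"):
    """Copy the shadow metadata back into the simulation directory."""
    sim_dir = Path(sim_dir)
    src = Path(shadow_root) / sim_dir.name / metadata_name
    if not src.is_dir():
        raise FileNotFoundError(f"no shadow copy at {src}")
    dest = sim_dir / metadata_name
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```

The copy would have to be refreshed whenever the metadata changes (for example at submission and at the start of each restart), which is one of the details still to be worked out.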

This is not a serious issue yet.

Change History (4)

comment:1 Changed 7 years ago by Ian Hinder

This could also be addressed by "touching" each of the required metadata files when each restart begins. Since the metadata files are not large, this should not be seen as an abuse of the system.
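
A minimal sketch of what "touching" the metadata could look like, assuming the purge is driven by file timestamps. The function name `touch_tree` is hypothetical, and whether a given site permits this is a policy question that the code does not answer.

```python
import os
from pathlib import Path


def touch_tree(root):
    """Set atime and mtime to the current time for every regular file under root.

    Returns the number of files touched. Symbolic links are skipped.
    """
    count = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            os.utime(path, None)  # None means "use the current time"
            count += 1
    return count
```

Calling `touch_tree` on the metadata directory at the start of each restart would be the usage described here.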

comment:2 Changed 7 years ago by Erik Schnetter

I'm afraid that the instructions on these systems are quite clear -- touching files is considered abuse. I would not suggest that people do this without permission from the HPC centres.

However, touching files that will be needed for a currently running or submitted restart is a different issue. We'd need a mechanism to touch these files often enough if the job waits in the queue for a significant amount of time.

comment:3 Changed 7 years ago by Ian Hinder

Another (more complicated, possibly too complicated and confusing) option is to have the simulation metadata stored in a work filesystem and the actual data only in the scratch filesystem. They could be connected by symbolic links in the work filesystem so the user "sees" a unified simulation there. SimFactory itself would know how to handle these links when archiving, purging or getting the simulation.
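
To make the layout concrete, here is a small Python sketch under stated assumptions: the metadata directory lives in the work filesystem, every other top-level entry of the simulation lives in scratch, and the work copy holds one symbolic link per scratch entry. The function names and the one-link-per-entry layout are hypothetical, not an existing SimFactory feature.

```python
from pathlib import Path


def link_data_into_work(work_sim, scratch_sim, metadata_name="SIMFACTORY"):
    """Create symlinks in work_sim pointing at each data entry in scratch_sim.

    The metadata directory itself is created in work_sim and never linked.
    Returns the list of links created.
    """
    work_sim, scratch_sim = Path(work_sim), Path(scratch_sim)
    (work_sim / metadata_name).mkdir(parents=True, exist_ok=True)
    linked = []
    for entry in sorted(scratch_sim.iterdir()):
        link = work_sim / entry.name
        if entry.name == metadata_name or link.exists() or link.is_symlink():
            continue  # keep metadata in work; do not overwrite existing entries
        link.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())
        linked.append(link)
    return linked


def dangling_links(work_sim):
    """Return links whose scratch target no longer exists, e.g. after a purge."""
    return [p for p in Path(work_sim).iterdir()
            if p.is_symlink() and not p.exists()]
```

With this layout a purge of scratch leaves the metadata intact, and `dangling_links` reports which data directories have disappeared, so archiving or purging commands could treat them explicitly.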

comment:4 Changed 6 years ago by Ian Hinder

Re: comment:2, we would also need a mechanism to touch the checkpoint files, or arrange for these to be stored in a non-volatile location as well. They might be too big for that. Without the checkpoint files, the simulation can't be recovered anyway. We should ask the admins what to do about checkpoint files which are sufficiently old to be purged between jobs. I have heard of 2-week purge times, and it's not impossible that jobs could take longer than that to start.
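
As a starting point for that discussion, the following sketch lists files that are close to a purge threshold. It assumes the purge is based on last access time (`st_atime`); real purge policies may use a different timestamp, and the function name and parameters are hypothetical.

```python
import time
from pathlib import Path


def files_at_risk(root, purge_days, margin_days=0.0, now=None):
    """Return files under root last accessed more than (purge_days - margin_days) ago.

    Assumes an access-time-based purge; adjust if the site uses mtime or ctime.
    """
    now = time.time() if now is None else now
    cutoff = now - (purge_days - margin_days) * 86400  # seconds per day
    at_risk = []
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            if path.stat().st_atime < cutoff:
                at_risk.append(path)
    return sorted(at_risk)
```

For a 2-week purge time, `files_at_risk(checkpoint_dir, purge_days=14, margin_days=3)` would report checkpoint files that have gone unread for more than 11 days, which could be checked while a job waits in the queue.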
