BLCR stands for Berkeley Lab Checkpoint/Restart. Its features include:
Fully SMP safe
Rebuilds the virtual address space and restores registers
Supports both LinuxThreads and NPTL implementations of POSIX threads
Restores file descriptors and the state associated with open files
Restores signal handlers, signal mask, and pending signals
Restores the process ID (PID), thread group ID (TGID), parent process ID (PPID), and process tree to their original state
Tested with the GNU C library (glibc) versions 2.1 through 2.3
Install BLCR with urpmi, then inspect, load, and check the kernel module:
[root@iggi ~]# urpmi blcr
To satisfy dependencies, the following packages are going to be installed:
blcr-0.4.pre3_06_09_06-2mdviggi.i586
blcr-libs-0.4.pre3_06_09_06-2mdviggi.i586
blcr-modules_2.6.12_26mdksmp-0.4.pre3_06_09_06-2mdviggi.i586
Proceed with the installation of the 3 packages? (0 MB) (Y/n)
installing blcr-0.4.pre3_06_09_06-2mdviggi.i586.rpm blcr-libs-0.4.pre3_06_09_06-2mdviggi.i586.rpm blcr-modules_2.6.12_26mdksmp-0.4.pre3_06_09_06-2mdviggi.i586.rpm from //var/install/cluster/media/main
Preparing...                 #############################################
      1/3: blcr-libs         #############################################
      2/3: blcr-modules_2.6.12_26mdksmp#############################################
      3/3: blcr              #############################################
[root@iggi ~]# modinfo blcr
filename:       /lib/modules/2.6.12-26mdksmp/blcr/blcr.ko
author:         Lawrence Berkeley National Lab http://ftg.lbl.gov/checkpoint
description:    Berkeley Lab Checkpoint/Restart (BLCR) kernel module
license:        GPL
vermagic:       2.6.12-26mdksmp SMP 686 gcc-4.0
depends:        blcr_imports,blcr_vmadump
[root@iggi ~]# modprobe blcr
[root@iggi ~]# dmesg | grep blcr
vmadump: Modified for blcr 0.4.pre3_snapshot_2006_09_06 <http://ftg.lbl.gov/checkpoint>
blcr: Berkeley Lab Checkpoint/Restart (BLCR) module version 0.4.pre3_snapshot_2006_09_06.
blcr: Supports BLCR kernel interface version 0.2.0.
blcr: http://ftg.lbl.gov/checkpoint
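Before running the BLCR commands, a script can confirm that blcr and the two modules named on its "depends:" line are actually loaded. The following is a minimal sketch. It assumes the standard Linux /proc/modules format, where the module name is the first field of each line. The function name blcr_modules_loaded is hypothetical and is not part of BLCR.

```shell
#!/bin/sh
# blcr_modules_loaded [MODULES_FILE]
# Hypothetical helper: returns 0 if blcr and the modules it depends on
# (blcr_imports and blcr_vmadump, per the "depends:" line of modinfo)
# are all listed in MODULES_FILE, which defaults to /proc/modules.
blcr_modules_loaded() {
    modules_file=${1:-/proc/modules}
    for mod in blcr_imports blcr_vmadump blcr; do
        # Assumption: the first field of each line is the module name.
        # awk exits 0 only if a line with that exact name was seen.
        if ! awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            echo "module $mod is not loaded (try: modprobe blcr)" >&2
            return 1
        fi
    done
    return 0
}
```

For example, "blcr_modules_loaded || modprobe blcr" loads the module only when it is missing. modprobe also pulls in blcr_imports and blcr_vmadump because they are declared dependencies of blcr.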
The available commands are:
cr_run: runs a subprocess with checkpoint library loaded
cr_checkpoint: checkpoints a process, process group, or session
cr_restart: restarts a process, process group, or session from a checkpoint file
To check that BLCR works, install the inria_tools package and test BLCR with the tri_fusion program.
[guibo@guibpiv ~] 10:20:59 $ cr_run /usr/bin/tri_fusion &
init Sort number 0
[1] 16006
[guibo@guibpiv ~] 10:21:00 $ sorting data
checking sorted table
init Sort number 1
sorting data
checking sorted table
init Sort number 2
sorting data
checking sorted table
init Sort number 3
sorting data
checking sorted table
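You can script this step instead of reading the PID from the job-control message ([1] 16006). The shell variable $! holds the PID of the last background command. In the transcript, that job PID (16006) is the one later passed to cr_checkpoint. Here is a minimal sketch, assuming a POSIX shell. The names start_checkpointable and CR_PID are hypothetical.

```shell
#!/bin/sh
# start_checkpointable CMD [ARGS...]
# Hypothetical helper: runs CMD under cr_run in the background, like
# "cr_run /usr/bin/tri_fusion &", and stores the background PID in the
# global variable CR_PID so it can later be given to cr_checkpoint.
start_checkpointable() {
    cr_run "$@" &
    CR_PID=$!
    echo "started '$1' under cr_run, PID $CR_PID" >&2
}
```

For example, run "start_checkpointable /usr/bin/tri_fusion" followed by "echo $CR_PID". The helper sets a variable instead of printing the PID on stdout. This avoids command substitution, which would wait for the background program's output to end.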
The shell reports the process ID (PID) of the job: 16006. Now, from another terminal, take the checkpoint:
[guibo@guibpiv ~] 10:23:40 $ cr_checkpoint --stop 16006
This command checkpoints the process, stops it, and writes the checkpoint file context.16006. We can now kill the process:
[guibo@guibpiv ~] 10:24:41 $ kill -9 16006
[guibo@guibpiv ~] 10:26:12 $ du -h context.16006
25M     context.16006
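You can combine the checkpoint and kill steps so that the process is killed only after a checkpoint file has actually been written. This is a minimal sketch. It assumes the file is named context.<PID> and is written to the current directory, as in the example. The name checkpoint_then_kill is a hypothetical helper, not a BLCR command.

```shell
#!/bin/sh
# checkpoint_then_kill PID
# Hypothetical helper: checkpoints PID with "cr_checkpoint --stop" and only
# sends SIGKILL once a non-empty context.PID file exists, so a process whose
# checkpoint failed is not destroyed.
checkpoint_then_kill() {
    pid=$1
    if ! cr_checkpoint --stop "$pid"; then
        echo "cr_checkpoint failed for PID $pid; not killing it" >&2
        return 1
    fi
    # Assumption: the file is context.<PID> in the current directory,
    # matching context.16006 in the example.
    if [ ! -s "context.$pid" ]; then
        echo "context.$pid missing or empty; not killing PID $pid" >&2
        return 1
    fi
    kill -9 "$pid"
    # Report the size of the checkpoint file, like "du -h context.16006".
    du -h "context.$pid"
}
```

For example, "checkpoint_then_kill 16006" leaves context.16006 ready for cr_restart.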
To restart the process from the checkpoint file:
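[guibo@guibpiv ~] 10:26:49 $ cr_restart context.16006
checking sorted table
init Sort number 18
sorting data
checking sorted table
init Sort number 19
sorting data
The program resumes from the state saved in context.16006 instead of starting over at Sort number 0.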