Chapter 20. BLCR

20.1. Quick BLCR user guide

BLCR stands for Berkeley Lab Checkpoint/Restart. Its main features are:

  • Fully SMP safe

  • Rebuilds the virtual address space and restores registers

  • Supports both LinuxThreads and NPTL implementations of POSIX threads

  • Restores file descriptors and the state associated with open files

  • Restores signal handlers, the signal mask, and pending signals

  • Restores the process ID (PID), thread group ID (TGID), parent process ID (PPID), and the process tree to their original state

  • Tested with the GNU C library (glibc) versions 2.1 through 2.3

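Install BLCR with urpmi. Besides the user tools and libraries, this installs the kernel modules built for the 2.6.12-26mdksmp kernel:

[root@iggi ~]# urpmi blcr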
To satisfy dependencies, the following packages are going to be installed:
blcr-0.4.pre3_06_09_06-2mdviggi.i586
blcr-libs-0.4.pre3_06_09_06-2mdviggi.i586
blcr-modules_2.6.12_26mdksmp-0.4.pre3_06_09_06-2mdviggi.i586
Proceed with the installation of the 3 packages? (0 MB) (Y/n) 

installing blcr-0.4.pre3_06_09_06-2mdviggi.i586.rpm blcr-libs-0.4.pre3_06_09_06-2mdviggi.i586.rpm blcr-modules_2.6.12_26mdksmp-0.4.pre3_06_09_06-2mdviggi.i586.rpm from //var/install/cluster/media/main
Preparing...                     #############################################
      1/3: blcr-libs             #############################################
      2/3: blcr-modules_2.6.12_26mdksmp#############################################
      3/3: blcr                  #############################################

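The module information can be displayed with modinfo. Note that blcr depends on the blcr_imports and blcr_vmadump modules:

[root@iggi ~]# modinfo blcr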
filename:       /lib/modules/2.6.12-26mdksmp/blcr/blcr.ko
author:         Lawrence Berkeley National Lab http://ftg.lbl.gov/checkpoint
description:    Berkeley Lab Checkpoint/Restart (BLCR) kernel module
license:        GPL
vermagic:       2.6.12-26mdksmp SMP 686 gcc-4.0
depends:        blcr_imports,blcr_vmadump

Load the module (modprobe also loads the modules it depends on), then check the kernel log:

[root@iggi ~]# modprobe blcr

[root@iggi ~]# dmesg | grep blcr
vmadump: Modified for blcr 0.4.pre3_snapshot_2006_09_06 <http://ftg.lbl.gov/checkpoint>
blcr: Berkeley Lab Checkpoint/Restart (BLCR) module version 0.4.pre3_snapshot_2006_09_06.
blcr: Supports BLCR kernel interface version 0.2.0.
blcr: http://ftg.lbl.gov/checkpoint
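
In a script, the dmesg check can be replaced by a look at /proc/modules. The following bash function is a minimal sketch (its name, blcr_modules_loaded, is hypothetical): it succeeds only if blcr and the two modules it depends on according to modinfo, blcr_imports and blcr_vmadump, are all listed. The optional argument stands in for /proc/modules and exists only so the check can be tried on a sample file.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the BLCR kernel modules are loaded.
# blcr depends on blcr_imports and blcr_vmadump (see the modinfo output),
# so all three must appear in the kernel's module list.
# The function name is hypothetical; the optional argument replaces
# /proc/modules and exists only so the check can be tested on a sample file.
blcr_modules_loaded() {
  local modules_file=${1:-/proc/modules} mod
  for mod in blcr_imports blcr_vmadump blcr; do
    # Each line of /proc/modules starts with the module name and a space.
    if ! grep -q "^${mod} " "$modules_file"; then
      echo "BLCR module $mod is not loaded" >&2
      return 1
    fi
  done
  return 0
}
```

For example, as root:

blcr_modules_loaded || modprobe blcr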

The following commands are available:

  • cr_run: runs a program with the BLCR checkpoint library loaded, so that it can be checkpointed later

  • cr_checkpoint: checkpoints a process, process group, or session

  • cr_restart: restarts a process, process group, or session from a checkpoint file

To check that BLCR works, install the inria_tools package and run its tri_fusion program under cr_run:
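
If inria_tools is not available, a small single-process program can serve as the test workload instead. The bash sketch below is a hypothetical stand-in, not the source of tri_fusion: sort_loop repeatedly fills an array with random numbers, merge-sorts it, checks the result, and prints progress in a similar style. It uses only bash builtins, so the whole workload stays in one process, and it assumes that bash itself runs correctly under cr_run.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for tri_fusion (NOT its actual source): repeatedly
# merge-sorts an array of random numbers and checks the result.
# Only bash builtins are used, so the workload stays in one process.

# Bottom-up merge sort of the global array DATA (numeric, ascending).
merge_sort_data() {
  local n=${#DATA[@]} width lo mid hi l r k
  local -a tmp
  for (( width = 1; width < n; width *= 2 )); do
    for (( lo = 0; lo < n; lo += 2 * width )); do
      mid=$(( lo + width < n ? lo + width : n ))
      hi=$(( lo + 2 * width < n ? lo + 2 * width : n ))
      l=$lo r=$mid k=$lo
      # Merge the sorted runs DATA[lo..mid) and DATA[mid..hi) into tmp.
      while (( l < mid && r < hi )); do
        if (( DATA[l] <= DATA[r] )); then
          tmp[k]=${DATA[l]}; l=$(( l + 1 ))
        else
          tmp[k]=${DATA[r]}; r=$(( r + 1 ))
        fi
        k=$(( k + 1 ))
      done
      # Copy whatever is left of either run.
      while (( l < mid )); do tmp[k]=${DATA[l]}; l=$(( l + 1 )); k=$(( k + 1 )); done
      while (( r < hi )); do tmp[k]=${DATA[r]}; r=$(( r + 1 )); k=$(( k + 1 )); done
    done
    DATA=("${tmp[@]}")
  done
}

# Return success if DATA is in ascending order.
data_is_sorted() {
  local i
  for (( i = 1; i < ${#DATA[@]}; i++ )); do
    (( DATA[i - 1] <= DATA[i] )) || return 1
  done
  return 0
}

# Run ROUNDS rounds of SIZE random numbers each (ROUNDS=0: until killed).
sort_loop() {
  local rounds=${1:-0} size=${2:-20000} round=0 i
  while (( rounds == 0 || round < rounds )); do
    printf 'init Sort number %d ' "$round"
    DATA=()
    for (( i = 0; i < size; i++ )); do DATA[i]=$RANDOM; done
    printf ' sorting data '
    merge_sort_data
    printf ' checking '
    if ! data_is_sorted; then
      printf ' NOT sorted\n'
      return 1
    fi
    printf ' sorted table\n'
    round=$(( round + 1 ))
  done
}
```

Saved as sort_loop.sh (a hypothetical file name), it could be started under cr_run and then checkpointed in the same way as tri_fusion:

cr_run bash -c '. ./sort_loop.sh && sort_loop' &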

[guibo@guibpiv ~] 10:20:59 $
cr_run /usr/bin/tri_fusion &
[1] 16006
init Sort number 0  sorting data  checking  sorted table
init Sort number 1  sorting data  checking  sorted table
init Sort number 2  sorting data  checking  sorted table
init Sort number 3  sorting data  checking  sorted table

The shell reports the process ID, 16006. Now, from another terminal, we take the checkpoint:

[guibo@guibpiv ~] 10:23:40 $
cr_checkpoint --stop 16006

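This command writes the checkpoint to a file called context.16006 and then, because of the --stop option, stops the process. We can now kill the process. Since BLCR restores the original PID on restart, that PID must be free before cr_restart is run: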

[guibo@guibpiv ~] 10:24:41 $
kill -9 16006

The size of the checkpoint file can be checked with du:

[guibo@guibpiv ~] 10:26:12 $
du -h context.16006
25M     context.16006

To restart the process from the checkpoint file:

[guibo@guibpiv ~] 10:26:49 $
cr_restart context.16006
 checking  sorted table
init Sort number 18  sorting data  checking  sorted table
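init Sort number 19  sorting data

The restarted process resumes where it was checkpointed: it completes the sort that was in progress, then continues with the following ones.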
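
The checkpoint, kill, and restart steps can be collected in one bash function. This is a sketch rather than a tool shipped with BLCR: the name checkpoint_kill_restart is hypothetical, it uses only the commands and options shown in this chapter, and it assumes the checkpoint ends up as context.PID in the current directory, as in the transcript above.

```shell
#!/usr/bin/env bash
# Sketch of the checkpoint / kill / restart sequence from this chapter.
# The function name is hypothetical; the commands and options are the ones
# used in the transcript (cr_checkpoint --stop, kill -9, cr_restart).
checkpoint_kill_restart() {
  local pid=$1
  if [ -z "$pid" ]; then
    echo "usage: checkpoint_kill_restart PID" >&2
    return 2
  fi
  # Assumption: the context file is written as context.PID in the
  # current directory, as in the transcript.
  local context="context.$pid"

  # Write the checkpoint, then leave the process stopped.
  cr_checkpoint --stop "$pid" || return 1
  if [ ! -s "$context" ]; then
    echo "checkpoint file $context is missing or empty" >&2
    return 1
  fi

  # BLCR restores the original PID, so the old process must be gone.
  kill -9 "$pid" || return 1
  # If the process is a child of this shell, reap it so its PID is released;
  # otherwise wait fails, which is harmless here.
  wait "$pid" 2>/dev/null || true

  # Restart from the checkpoint file; this runs in the foreground.
  cr_restart "$context"
}
```

For the process in this example, it would be called as:

checkpoint_kill_restart 16006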