Chapter 13. Administration of IGGI Nodes

Table of Contents

13.1. setup_admin.pl script
13.2. Using Berkeley Tools
13.2.1. authd
13.2.2. gexec
13.3. ka Tools
13.3.1. ka-run
13.3.1.1. rshp
13.3.1.2. mput
13.3.2. Taktuk2
13.4. dssh
13.5. Tentakel
13.6. gsh
13.7. pssh
13.8. fanout
13.9. sauvegarde (Saving Data)
13.10. adduserNis.pl and deluserNis.pl

13.1. setup_admin.pl script

This script configures all the tools available in administration mode. If you launch it without a parameter, it displays a short help message.

[root@iggi ~]# setup_admin.pl 

 HELP:
 |---------------------------------------------------------|
 | clusterit       configre clusterit environement (dsh..) |
 | gexec           configure gexec environement            |
 | gsh             configure gsh remote command            |
 | dssh            configure dssh remote command           |
 | tentakel        configure tentakel remote command       |
 | rshp            set rshp environement (ka-tools)        |
 | pssh            pssh env and conf                       |
 | wulf            configure wulfstat                      |
 | urpmi           configure urpmi parallel (reset)        |
 | node_status     display node status in admin mode       |
 | info            display info of configuration           |
 | doall           do all above                            |
 |---------------------------------------------------------|

13.2. Using Berkeley Tools

The descriptions below are based on each tool's web site.

13.2.1. authd

Authd is a software package for obtaining and verifying user credentials containing cryptographic signatures based on RSA public key cryptography. It includes a server (authd) for authenticating local users through UNIX domain sockets and processing credentials, and a client library (libauth.a) for requesting new credentials and verifying credentials signed by the server.

Home page: http://www.cs.berkeley.edu/~bnc/authd/

Authd is used by gexec. The server key must be replicated on all client nodes. The authd package copies this key into the /var/lib/tftpboot/X86PC directory.

13.2.2. gexec

GEXEC is a scalable remote execution system which provides fast, RSA authenticated remote execution of parallel and distributed jobs for clusters. It provides transparent forwarding of stdin, stdout and stderr, signal passing to and from remote processes, and local environment propagation, and is designed to be robust and scalable on systems with over 1000 nodes.

Home page: http://www.cs.berkeley.edu/~bnc/gexec/

The GEXEC variable is set by default, and is reset when you use the setup_server_cluster.pl script with the gennodeone parameter.

There are two ways to configure gexec. Either list all the nodes you wish to administer with gexec in the .bashrc of your current user:

export GEXEC_SVRS="node1 node2 node3 node4"

or specify the address of the gmond server:

export GEXEC_GMOND_SVRS="12.12.12.253:8649"
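When the node list changes often, the GEXEC_SVRS value can be generated instead of typed by hand. The sketch below assumes a file with one hostname per line; the helper name build_gexec_svrs and the file path are illustrative and are not part of IGGI or gexec.

```shell
# build_gexec_svrs FILE
# Print the hostnames listed in FILE (one per line) as a single
# space-separated string. Blank lines and '#' comments are skipped.
# The helper and the file layout are assumptions for illustration.
build_gexec_svrs() {
    awk '{ sub(/#.*/, "") }                       # drop comments
         NF { printf "%s%s", sep, $1; sep = " " } # join first fields
         END { print "" }' "$1"
}

# Possible use in ~/.bashrc (the path is hypothetical):
#   export GEXEC_SVRS="$(build_gexec_svrs /root/nodes.list)"
```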

The following sample shows how to execute the "hostname" command on all configured nodes using the -n 0 parameter.

The number at the beginning of each line represents the number of the node that answered your request.

Example:

[root@iggi ~]# gexec -n 0 hostname
0 node1.guibland.com
1 node2.guibland.com
2 node3.guibland.com

[root@iggi ~]# gexec -n 0 uptime
1  13:47:16 up 18 min,  0 users,  load average: 0.04, 0.10, 0.03
0  13:47:16 up 26 min,  1 user,  load average: 0.04, 0.07, 0.02
2  13:47:16 up 18 min,  0 users,  load average: 0.04, 0.10, 0.05
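As the uptime run shows, replies are printed in the order in which the nodes answer. Because each line starts with the node number, the output can be put back in node order with a standard numeric sort. A minimal sketch follows; sort_by_node is a hypothetical name, not a gexec option.

```shell
# sort_by_node: reorder gexec output read on stdin by the leading
# node number. Hypothetical wrapper around the standard sort utility.
sort_by_node() {
    sort -n -k1,1
}

# Possible use on the cluster:
#   gexec -n 0 uptime | sort_by_node
```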

13.3. ka Tools

Ka is a set of open source tools designed to assist with the installation and use of a cluster of PCs running Linux or Windows. It includes a scalable solution for cloning nodes (ka-deploy) and a process management library (ka-run). Ka also provides a scalable NFS-compliant file system (ka-nfsp) and system monitoring tools (ka-admin). Home page: http://ka-tools.sourceforge.net/

13.3.1. ka-run

You can find more information at: http://www-id.imag.fr/Laboratoire/Membres/Martin_Cyrille/karun.html

13.3.1.1. rshp

Simply use the command: rshp $NKA -- command_you_want_to_perform

$NKA is set in /etc/profile.d/cluster.sh. By default it contains the remote command to use (here, ssh) followed by the list of nodes in rshp format. To refresh your environment after the node list has changed, source the file again:

[root@iggi ~]# echo $NKA
-c ssh -m node1.guibland.com
[root@iggi ~]# source /etc/profile.d/cluster.sh
[root@iggi ~]# echo $NKA
-c ssh -m node1.guibland.com -m node2.guibland.com -m node3.guibland.com
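The value printed above follows a simple pattern: -c and the remote command, then one -m per node. To target only a subset of nodes, a string of the same shape can be built with a small helper. build_nka below is a hypothetical function, not part of ka-tools; it only reproduces the format printed by echo $NKA.

```shell
# build_nka REMOTE_CMD NODE...
# Print an rshp-style argument string:
#   -c REMOTE_CMD -m NODE -m NODE ...
# Hypothetical helper; it mirrors the format of $NKA shown above.
build_nka() {
    remote_cmd=$1
    shift
    out="-c $remote_cmd"
    for node in "$@"; do
        out="$out -m $node"
    done
    printf '%s\n' "$out"
}

# Possible use, restricting a command to two nodes:
#   rshp $(build_nka ssh node1.guibland.com node2.guibland.com) -- uptime
```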

[root@iggi ~]# rshp -v $NKA -- "w"
<node1.guibland.com> :->: 13:54:36 up 34 min,  1 user,  load average: 0.02, 0.03, 0.00
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
<node2.guibland.com> :->: 13:54:36 up 25 min,  0 users,  load average: 0.04, 0.05, 0.01
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
<node3.guibland.com> :->: 13:54:36 up 25 min,  0 users,  load average: 0.00, 0.02, 0.01
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT

A more complex example: to check whether ypbind is running on the nodes, run "ps axf | grep ypbind" remotely, then filter the result locally with "| grep -v rshp_wrap" to remove the ka-tools wrapper (rshp_wrap) from the output.

[root@iggi ~]# rshp -v $NKA -- "ps axf | grep ypbind" | grep  -v rshp_wrap
<node1.guibland.com> :->: 2195 ?        Sl     0:00 ypbind
 4570 ?        R      0:00              \_ /bin/bash -c  ps axf | grep ypbind 
<node2.guibland.com> :->: 2200 ?        Sl     0:00 ypbind
 4007 ?        R      0:00              \_ /bin/bash -c  ps axf | grep ypbind 
<node3.guibland.com> :->: 2202 ?        Sl     0:00 ypbind
 3993 ?        R      0:00              \_ /bin/bash -c  ps axf | grep ypbind 
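In the output above, each node still reports the /bin/bash -c line that runs the pipeline itself, because that command line contains the word ypbind. A common shell idiom avoids this by writing the pattern as [y]pbind: the regular expression still matches ypbind, but the literal text [y]pbind on the command line does not. The sketch below demonstrates the idiom on sample lines only; whether rshp passes the quoted pattern to the remote shell unchanged is an assumption to verify on your cluster.

```shell
# Sample 'ps axf' lines, modelled on the output above but with the
# bracketed pattern on the command line (illustrative data).
sample_ps() {
    cat <<'EOF'
 2195 ?        Sl     0:00 ypbind
 4570 ?        R      0:00              \_ /bin/bash -c  ps axf | grep [y]pbind
 4571 ?        S      0:00                  \_ grep [y]pbind
EOF
}

# [y]pbind matches "ypbind" but not the literal string "[y]pbind",
# so only the line of the real daemon is printed.
sample_ps | grep '[y]pbind'

# Possible use on the cluster (quoting through rshp is an assumption):
#   rshp -v $NKA -- "ps axf | grep '[y]pbind'" | grep -v rshp_wrap
```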

13.3.1.2. mput

Mput copies a file to the cluster nodes in parallel. Syntax: mput $NKA -- /path_to_file/filename /path_dest/filename_dest. You must specify the destination filename.

[root@iggi ~]# mput $NKA -- /root/t1.xwd /tmp/t1.xwd
copy of file  /root/t1.xwd

[root@iggi ~]# rshp $NKA -- md5sum /tmp/t1.xwd
e6e027c493bdd98e488ba7762a21a822  /tmp/t1.xwd
e6e027c493bdd98e488ba7762a21a822  /tmp/t1.xwd
e6e027c493bdd98e488ba7762a21a822  /tmp/t1.xwd
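Comparing checksums by eye becomes tedious with more than a few nodes. The sketch below reads md5sum-style lines and succeeds only if every line carries the expected hash. checksums_match is a hypothetical helper, and it assumes rshp is run without -v so that each line starts with the hash, as in the output above.

```shell
# checksums_match EXPECTED
# Read md5sum-style lines ("<hash>  <file>") on stdin and succeed only
# if at least one line was read and every hash equals EXPECTED.
# Hypothetical helper, not part of ka-tools.
checksums_match() {
    awk -v want="$1" '
        $1 != want { bad++ }
        END { if (NR == 0 || bad > 0) exit 1 }'
}

# Possible use on the cluster:
#   want=$(md5sum < /root/t1.xwd | cut -d" " -f1)
#   rshp $NKA -- md5sum /tmp/t1.xwd | checksums_match "$want" && echo OK
```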

13.3.2. Taktuk2

From the Taktuk2 web site: Taktuk is a parallel and scalable remote execution tool for clusters. It works by propagating the execution of a parallel program to all target nodes, using standard remote execution protocols (rsh, ssh, etc.). Remote call scheduling automatically adapts its behavior to the remote execution protocol used, and to the load of the network and remote hosts. The tool is completely independent of the remote protocol used: any remote execution protocol that provides I/O redirection for the launched remote process may be used. A grammar may optionally be provided to describe the environment, which increases overall deployment speed in a complex topology (for instance, in a grid environment). For a more responsive deployment, the delay of a remote launch (rsh, ssh, or other) can be bounded, to bypass slow nodes and ignore the timeout imposed by the protocol used. Taktuk provides full I/O and signal redirection to the original console user.

Copying an RPM to 3 nodes using mput2:

[root@iggi tmp]# du kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
51M     kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
[root@iggi tmp]# mput2 -P $NKA -- kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm /tmp/
Taktuk Time: 5.05384

Md5sum of the RPM on all nodes using rshp2:

[root@iggi tmp]# rshp2 $NKA -- md5sum /tmp/kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
9561929bad85fd4d9930160c1799d110  /tmp/kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
9561929bad85fd4d9930160c1799d110  /tmp/kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
9561929bad85fd4d9930160c1799d110  /tmp/kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm

[root@iggi tmp]# md5sum kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm
9561929bad85fd4d9930160c1799d110  kernel-source-2.6.17.5mdv-1-1mdv.i586.rpm

mput2 can also copy an entire directory:

[root@iggi media]# du -sh main
562M    main
[root@iggi media]# mput2 -P $NKA -- main/ /tmp/main_mirror
Taktuk Time: 57.6031

[root@iggi media]# rshp2 $NKA -- du -sh /tmp/main_mirror
562M    /tmp/main_mirror
562M    /tmp/main_mirror
562M    /tmp/main_mirror

Wow! 57 seconds to copy 562 MB to 3 nodes: that's really fast!
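To put a number on it, the transfer rate can be computed from the size and the Taktuk time reported above. The sketch below treats the 562M reported by du as megabytes and assumes all three nodes received a full copy; throughput is a hypothetical helper name.

```shell
# throughput SIZE_MB SECONDS NODES
# Print the per-node and the aggregate transfer rates in MB/s.
throughput() {
    awk -v mb="$1" -v s="$2" -v n="$3" \
        'BEGIN { printf "%.1f %.1f\n", mb / s, mb * n / s }'
}

# Figures from the directory copy above: 562 MB, 57.6031 s, 3 nodes.
throughput 562 57.6031 3
```

With these figures, each node received about 9.8 MB/s, or roughly 29 MB/s in aggregate.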

13.4. dssh

dssh is another remote command tool. Use the DSSH variable to address all the nodes available in administration mode.

[root@iggi ~]# dssh $DSSH -e uptime
executing 'uptime'
[email protected]:22|      14:15:57 up 55 min,  1 user,  load average: 0.00, 0.06, 0.03
[email protected]:22|      14:15:57 up 47 min,  0 users,  load average: 0.00, 0.08, 0.06
[email protected]:22|      14:15:57 up 46 min,  0 users,  load average: 0.00, 0.04, 0.01

13.5. Tentakel

Tentakel is a program that executes the same command on many hosts in parallel using SSH. It is designed to be easily extendable. The output of the remote command can be controlled by means of format strings. A basic tentakel configuration file is created by the setup_admin.pl script; see /etc/tentakel.conf.

[root@iggi ~]# tentakel
interactive mode
tentakel(default)> help
commands (type help <topic>):
conf  exec  help  hosts  listgroups  quit  use

tentakel(default)> exec uptime
### node2.guibland.com(stat: 0, dur(s): 0.2):
 14:19:04 up 50 min,  0 users,  load average: 0.00, 0.04, 0.04
### node1.guibland.com(stat: 0, dur(s): 0.2):
 14:19:04 up 58 min,  1 user,  load average: 0.00, 0.03, 0.01
### node3.guibland.com(stat: 0, dur(s): 0.21):
 14:19:04 up 49 min,  0 users,  load average: 0.00, 0.01, 0.00
tentakel(default)>
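Each host header in the output above carries a stat value and a duration. Assuming stat is the exit status of the remote command, the hosts on which a command failed can be extracted from saved output. failed_hosts below is a hypothetical helper that relies only on the header format shown above.

```shell
# failed_hosts: read tentakel output on stdin and print the hosts whose
# header line ("### HOST(stat: N, dur(s): T):") has a non-zero N.
# Hypothetical helper; it assumes the header format shown above.
failed_hosts() {
    sed -n 's/^### \(.*\)(stat: \([0-9][0-9]*\), dur(s): .*$/\1 \2/p' |
        awk '$2 != 0 { print $1 }'
}

# Possible use on output saved to a file (the file name is hypothetical):
#   failed_hosts < /tmp/tentakel.out
```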

13.6. gsh

gsh runs commands on other hosts through ssh. The /etc/ghosts file is configured by default with the list of nodes available in administration mode.

[root@iggi ~]# gsh -r iggi "uptime"
node1.guibland.com:      14:26:02 up  5:51,  4 users,  load average: 0.01, 0.05, 0.07
node2.guibland.com:      14:26:02 up  5:51,  4 users,  load average: 0.01, 0.05, 0.07
node3.guibland.com:      14:26:02 up  5:51,  4 users,  load average: 0.01, 0.05, 0.07

13.7. pssh

dsh

13.8. fanout

Fanout allows you to run non-interactive commands on remote machines simultaneously, collecting the output in an organized fashion.

[root@iggi ~]# fanout "n1 n2 n3" "uptime"
Starting n1
Starting n2
Starting n3
Fanout executing "uptime"
Start time Tue Nov 7 14:28:01 CET 2006 , End time Tue Nov 7 14:28:07 CET 2006
==== On n1 ====
   14:28:03 up  1:07,  2 users,  load average: 0.00, 0.00, 0.00

==== On n2 ====
   14:28:05 up 59 min,  0 users,  load average: 0.16, 0.03, 0.01

==== On n3 ====
   14:28:07 up 59 min,  0 users,  load average: 0.00, 0.00, 0.00

Exiting fanout, cleaning up...done.

13.9. sauvegarde (Saving Data)

All users can save their own data with the /usr/bin/sauvegarde script. The backup is stored read-only in the /home/backup/USER_NAME directory.

Example:

[root@iggi ~]# sauvegarde 
|---------------------------------------------------------|
| usage: sauvegarde name_backup rep_to_backup             |
| Sauvegarde automatically add the Hostname and DATE.     |
|                                                         |
| example:                                                |
| sauvegarde conf /root/conf/                             |
| produce this output filename: conf-HOSTNAME-DATE.tar.gz |
|                                                         |
| File is store in /home/backup/
| log of Backup are store in /var/log/sauvegarde.log
|---------------------------------------------------------|


[root@iggi ~]# sauvegarde iggi_user /home/nis/iggi/
 - Saving /home/nis/iggi/

tar: Removing leading `/' from member names
/home/nis/iggi/
/home/nis/iggi/tmp/
/home/nis/iggi/.mozilla/
/home/nis/iggi/.mozilla/bookmarks.html
/home/nis/iggi/.bash_logout
...............

 - Backup SUCCESS in /home/backup/ directory
 - Setting read-only and mode undelete on file
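The naming scheme and the read-only step described in the help text can be illustrated with a short function. This is a sketch of the general idea and not the sauvegarde script itself: backup_dir is a hypothetical name, the date format is an assumption, and the "undelete" mode and logging steps are not reproduced.

```shell
# backup_dir NAME SRC_DIR DEST_DIR
# Create DEST_DIR/NAME-HOSTNAME-DATE.tar.gz from SRC_DIR, make the
# archive read-only, and print its path.
# Hypothetical sketch; the YYYYMMDD date format is an assumption.
backup_dir() {
    name=$1
    src=${2%/}            # drop a trailing slash, if any
    dest=$3
    out="$dest/$name-$(uname -n)-$(date +%Y%m%d).tar.gz"

    # Archive relative to the parent directory so that member names
    # carry no leading '/'.
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    chmod 444 "$out"
    printf '%s\n' "$out"
}

# Possible use (the destination path is taken from the text above):
#   backup_dir conf /root/conf/ /home/backup/"$USER"
```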

13.10. adduserNis.pl and deluserNis.pl

adduserNis and deluserNis are simple scripts to add or remove a user in a NIS domain. When a user is added, a complete environment is set up so that the user can launch jobs and work on all nodes. When a user is removed, the script first checks whether the user's home directory is mounted on a node. If it is, the deletion is canceled.

Example of adduserNis

[root@iggi ~]# adduserNis.pl 
-----------------------------------------------------------
Add New user in NIS environnement on iggi.guibland.com
user with an uid > 500 are NIS user
-----------------------------------------------------------
Login : 
iggi
Group(s) [users] (You are member of mpi, oar, pvm by default) : 

 - Backup of /etc/group configuration
Adding iggi in mpi group.
mpi group not found!
Exiting
Adding iggi in pvm group.
Adding iggi in oar group.
----------------------------------------------------------
Login: iggi
Group: users
Comment: 
passwd iggi:
Changing password for user iggi.
New UNIX password: 
BAD PASSWORD: it's WAY too short
Retype new UNIX password: 
passwd: all authentication tokens updated successfully.
gmake[1]: Entering directory `/var/yp/guibland.com'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating netid.byname...
# mail netgrp publickey networks ethers bootparams printcap \
# amd.home auto.master auto.home auto.local passwd.adjunct \
# timezone locale netmasks
gmake[1]: Leaving directory `/var/yp/guibland.com'
 - Creating ssh key for user iggi
Generating public/private dsa key pair.
ssh_askpass: exec(/usr/lib/ssh/ssh-askpass): No such file or directory
ssh_askpass: exec(/usr/lib/ssh/ssh-askpass): No such file or directory
Your identification has been saved in /home/nis/iggi/.ssh/id_dsa.
Your public key has been saved in /home/nis/iggi/.ssh/id_dsa.pub.
The key fingerprint is:
50:93:9a:44:1c:89:ea:45:fc:27:5f:ac:a1:2d:d4:22 [email protected]

 - Authorize user to ssh himself
 - Setting .rhosts file for iggi
 - Setting default .xinitrc for user
 - Create mutt config
 - Setting permission on file
 - Adjust chmod to 0644 on .rhost key

Example of deluserNis

[root@server tmp]# deluserNis
 - Remove user from NIS Map

 User: guibo
 - Deleting user
 - Updating NIS table
gmake[1]: Entering directory `/var/yp/iggi.com'
Updating passwd.byname...
Updating passwd.byuid...
Updating group.byname...
Updating group.bygid...
Updating netid.byname...
gmake[1]: Leaving directory `/var/yp/iggi.com'

Example of a deluserNis error

[root@server tmp]# deluserNis
 - Remove user from NIS Map
 User: guibo

 !!!! WARNING !!!!

 Can't del user guibo, before delete this user
 Umount /home/nis/guibo from:
node3.iggi.com
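The warning above comes from the mount check performed before deletion. How deluserNis implements the check is not shown here; the sketch below is one way such a test could be written on a Linux node, by looking for the directory in the second field of /proc/mounts. is_mounted is a hypothetical helper.

```shell
# is_mounted DIR [MOUNTS_FILE]
# Succeed if DIR appears as a mount point in MOUNTS_FILE
# (default: /proc/mounts, whose second field is the mount point).
# Hypothetical helper, not taken from deluserNis.
is_mounted() {
    awk -v dir="${1%/}" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Possible use on a node before deleting the user guibo:
#   is_mounted /home/nis/guibo && echo "still mounted, umount it first"
```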