Oct 30, 2008

Ubuntu 8.10 Released (Intrepid Ibex)

Ubuntu 8.10 is released. You might want to read the release notes and the press release.

I waited until today to try out the Ibex on my laptop, because my Intel wireless card had firmware issues with the new linux kernel (2.6.27) which weren't fixed in the 8.10 beta.

A lot of people are downloading from the servers. Before you start, make sure you have configured a repository close to you (go to system->administration->software sources->download from->other->select best server). You might also want to try out apt-p2p. Read a tutorial here.

If you are ready to upgrade to intrepid, press alt-f2 (or open the terminal) and type
update-manager -d

At first I had sound issues on my office computer, but killing and removing pulseaudio and switching to OSS worked (see forum post).

Oct 23, 2008

Access machine in private network with shared ip

Suppose your computer at work is in a private network that does IP masquerading. You want to access that machine from home or from another computer outside the network. Suppose you have sent your admins about 30 mails in about 14 weeks about wanting to access your machine from outside, and suppose they answered about a third of them, but you haven't made any significant progress towards accessing your machine. Unfortunately, I am not talking hypothetically here.

Actually it's quite simple if you use reverse ssh tunnels (thank you howtoforge). Say you want access from a machine outside the network, which we'll call second_machine. From your machine at work you type (as root):
# ssh -R 19999:localhost:22  user@second_machine 


From second_machine you can now access your machine at work:
ssh localhost -p 19999


By default this reverse tunnel stays alive as long as the ssh session from your machine at work to second_machine does. You can change how long the session stays alive on second_machine in /etc/ssh/sshd_config with ClientAliveInterval n; the default is 0, which means there is no automatic logout. You may also use autossh to automatically restart the ssh tunnel.
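A minimal sketch with autossh, assuming it is installed on the work machine (-M is autossh's monitoring port; -f and -N background the session without running a remote command):

# keep the reverse tunnel alive; autossh restarts ssh when the connection dies
autossh -M 20000 -f -N -R 19999:localhost:22 user@second_machine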

If you copy files by scp, use the -P switch to specify the port. With rsync, use rsync ... -e 'ssh -p 19999' ....
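For example, from second_machine (the file and directory names here are hypothetical):

scp -P 19999 user@localhost:/tmp/report.pdf .
rsync -av -e 'ssh -p 19999' user@localhost:work/ ./work/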

Oct 14, 2008

Cloning of Slaves

We are talking about software installation on the beowulf cluster. A beowulf is a dedicated compute cluster that runs scientific computations in parallel. Usually there is one head node (also called master) and several slaves, connected over the network. Processes are started from the master and distributed from there onto the slaves. Installing and configuring the OS on each machine manually is cumbersome and error-prone. However, the nodes are identical, so we installed just one slave and now simply copy everything to the other slaves. This process is called cloning.

In earlier posts I explained the hardware assembly of the beowulf cluster and installation of the master and one slave. In this post we want to scale up this configuration to more slaves. In a later post, I am going to explain how to run parallel processes in matlab and GNU R using MPI and PVM.


We first set up a so-called golden node or model node and then transfer the system to the other slave machines. Each new node then needs only one new entry in the head node's DHCP server file (/etc/dhcpd.conf) and in its /etc/hosts file.

Finding out how to clone took me about a week and many failed trials, but it is very important to know for clustering. I learned about rsync, tar + ssh, netcat, and dd.

For preparation, make sure that in /etc/fstab and in /boot/grub/menu.lst there are no physical addresses of hardware (e.g. of a hard disk), as they will differ among the nodes. All hardware should be addressed by its path under /dev, which you can see in the output when you type mount.
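For example, a cloning-safe /etc/fstab refers to devices like this (a sketch; your device names will differ), and the kernel line in /boot/grub/menu.lst should likewise use something like root=/dev/hda1:

/dev/hda1   /      ext3   defaults   1 1
/dev/hda2   swap   swap   defaults   0 0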

I read many articles on the internet on disk cloning and system cloning and tried a lot of things. I tried g4u, a tool for hard disk image cloning, but its boot disk didn't boot.

I tried a tar-based cloning approach inspired by the linux cluster howto; however, it didn't work out, I suspect for reasons having to do with file permissions, user ids, sockets, pipes, or similar.

You can send data from one computer to a second over ssh; however, tar by default maps the user ids to the system where you compress and extract, respectively. When cloning over the network you have to tell tar explicitly, with the --numeric-owner option, to preserve the original mapping. I made some mistake here, though I never figured out which.

I also tried a more elegant way to clone systems using rsync (importantly with the --numeric-ids option, the same issue as with tar):

node2# rsync -Saq --numeric-ids --exclude=/proc --exclude=/sys --exclude=/dev -e 'ssh -c blowfish' node1:/ /mnt/hd



where node2 is the new machine and node1 the machine we want to clone from.

However, this didn't work either. ;(

I did this after booting node2 from CD and mounting the file system under /mnt/hd, with subdirectories mirroring node1's filesystem. This implied that /boot on node2 and node1 had the same permissions... I still don't get it.

Comment: tar and rsync are very fast. You can copy and unpack in one step using ssh and tar together, for example:
ssh master "cat /usr/local/gnslash.tgz" | tar --numeric-owner --same-permissions --atime-preserve=replace -xzf -
I presuppose you have partitioned node2 (see for example the gparted liveCD) and mounted the disk. You should mount all partitions under /mnt/sysimage (or whatever mount point you prefer) with the same organization they will have later. For example, if you have these mount points:
/dev/hdb1: /
/dev/hdb2: /home
/dev/hdb3: /usr/local


Then mount the three partitions under /mnt/sysimage as follows:
/dev/hdb1: /mnt/sysimage
/dev/hdb2: /mnt/sysimage/home
/dev/hdb3: /mnt/sysimage/usr/local


If you have to mount your LVM devices, use lvm vgscan and lvm vgchange -a y to make them available first.
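From the rescue or live environment that might look like this (a sketch, assuming the default Fedora volume names mentioned below):

lvm vgscan         # find volume groups
lvm vgchange -a y  # activate them
mount /dev/mapper/VolGroup00-LogVol00 /mnt/sysimage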

An elegant solution would be boot disk creation or network booting with a clone script. livecd-tools allows you to create live images of your distribution which you can boot up and provide with scripts. See a fedora live cd howto and the red hat linux customization guide for details. Before compiling commands together in the install script, you can try them out interactively in a virtual machine on one of your computers (qemu -m 1024 -cdrom bootcd.iso, or on x86_64 machines qemu-system-x86_64 -m 1024 -cdrom bootcd.iso).

The basic steps for copying the archive to the other machines are pretty obvious; however, my problem at this point was that I couldn't find my hard disk in /dev and therefore couldn't mount it. I downloaded one distribution after another and couldn't find my hard disk, before I found out that /dev/mapper/VolGroup00-LogVol00 refers to the Logical Volume Manager (LVM), which is not available on some install CDs. I finally downloaded a Fedora install CD, unpacked the iso, put the .tgz archive in the directory alongside it, and created a bootable CD from this using mkisofs:
mkisofs -J -R -o installcd.iso -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table Fedora-9-x86_64-disc1


Long introduction, now I come to the solution. After all these trials, I finally used low-level reads/writes with dd, piping to and from netcat on the machine to clone from and the machine to clone to, respectively, as described in a second howto from the same site. It was the first time I heard of netcat (nc), which is a very cool program, a kind of pipe over the network.

On node1 you run:

node1# dd if=/dev/hda conv=sync,noerror bs=64k | nc -l 5000

On node2 you run:

node2# nc 192.168.1.1 5000 | dd of=/dev/hda bs=64k

where 192.168.1.1 is the ip of node1. This presupposes the disk of node2 is at least as big as node1's.

This took a lot of time (it said 158GB read/written), but it worked.

Later, you can always apply changes to nodes by the following command on the golden node:

node1# rsync -avHx --delete / root@node2:/mnt/sysimage/

Be careful how you use this command. You might screw up your freshly configured node.


Book recommendation:
High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI (Nutshell Handbooks) by Joseph Sloan

Oct 9, 2008

Cluster Installation and Administration

In this post I want to share some experiences and describe the software installation on our beowulf cluster. Short reminder: a beowulf cluster is a cluster of computers running linux, dedicated to scientific calculations, and designed to run computing jobs in parallel. In an earlier post I explained the hardware setup of the beowulf; now it's time to install the operating system and software. In later posts I am going to explain cloning of configurations on the nodes, and running parallel processes in matlab and GNU R using MPI and PVM.

This post is going to be about the following topics:
1. Basic setup choices
2. Network setup with DHCP
3. Sharing files over the network with the network file system (NFS)

The cluster has a master-slave structure. The master (or head node) has two network interfaces (eth0, eth1), one connected to the outside world and one connected to the cluster intranet over a network switch. All other computers (the slaves or nodes) are connected to the switch. In order to start processes on the cluster, a user logs in on the master and spawns the processes from there to the slaves. This means the master and the slaves need a slightly different setup.

Basic Setup Choices

We first tried installing Rocks Clusters (v. 5), a linux distribution specialized in facilitating the installation and administration of clusters; however, it didn't recognize the network cards of our computers. Then we tried the Ubuntu Server Edition (Hardy), which didn't recognize the CD-ROM. Fedora 9 recognized all hardware at once. As for the slaves, I started from vanilla installations; on the master, I installed everything that could be of any remote interest.

Selected list of installed software:
- (of course) gnu toolchain
- vim, emacs
- subversion
- openssh server+client
- X
- script languages: Perl, python, R, matlab
- compilers: c/c++, java, fortran, mpicc
- sun java
- mpich2, pvm

Network Setup

I set up the network IPs as 192.168.1.1 through 192.168.1.8, where .8 is my master. See the linux networking howto for help on this. I first worked with static addresses, but later, for practical purposes, I found it much easier to set up one slave first and scale up the cluster by cloning.

For this I changed to dynamic addresses (DHCP) handled by the master on the basis of the physical addresses of the slaves' network interfaces. DHCP simplifies the installation of new nodes, because the mac address is the only thing that differs among the nodes, and the DHCP server on the master can manage a new node with a single new entry in the configuration file. See the linux DHCP howto for help on configuration. Here is a short example:

slave(s):/etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp


master:/etc/dhcpd.conf

option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option routers 10.5.170.254;
# define individual nodes
subnet 192.168.1.0 netmask 255.255.255.0 {
    group {
        # define all slaves; the master (head node) has a static ip address
        host node0 {
            hardware ethernet 00:1E:8C:30:AC:2A;
            fixed-address 192.168.1.250;
        }
        host node1 {
            hardware ethernet 00:1E:8C:30:B0:A1;
            fixed-address 192.168.1.1;
        }
        # ... more nodes here. Make a list of the mac addresses of all your machines and enter them in the list.
    }
}
# ignore requests from the second network interface
subnet 10.5.170.0 netmask 255.255.255.0 { not authoritative; }


The idea is to give the slaves the names nodei corresponding to their ip addresses 192.168.1.i.

Note that the DHCP server provides IP addresses for the other machines, not for itself. The master gets a static ip address (192.168.1.250 here). In red hat based distributions this is configured in /etc/sysconfig/network-scripts/ifcfg-eth0 or /etc/sysconfig/network-scripts/ifcfg-eth1.

I configured eth0 for the organization network and eth1 for the cluster intranet. Example files:
/etc/sysconfig/network-scripts/ifcfg-eth0 corresponds to your organization network settings.

DEVICE=eth0
ONBOOT=yes
...


/etc/sysconfig/network-scripts/ifcfg-eth1

DEVICE=eth1
ONBOOT=yes
BOOTPROTO=static
NETWORK=192.168.1.0
IPADDR=192.168.1.250
TYPE=Ethernet


For the slaves, the interface to the cluster intranet is as follows:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
NETMASK=255.255.255.0
NETWORK=192.168.1.0
BROADCAST=192.168.1.255
DNS1=...
DNS2=...


If you don't run a DNS service on your head node, use the DNS service of your organization's network.

/etc/hosts

127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
192.168.1.250 node0
192.168.1.1 node1
# ... add more names of machines here


Note that in /etc/hosts the hostname is not given in the loopback line (first line), in order to avoid problems with message-passing protocols (PVM, MPI).

You need to activate ip forwarding on the head in order to have internet access on all machines. Enable the firewall and include masquerading on the network interface to your cluster. You do this by changing the /etc/sysconfig/iptables file or using some user interface, e.g. system-config-firewall on red hat based systems. See the linux firewall howto for details. Be careful not to make your firewall too restrictive, as this can cause problems.
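A minimal sketch of the kind of rules involved, assuming (as described below) eth0 faces the organization network and eth1 the cluster intranet:

# masquerade cluster traffic going out through eth0
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# let the slaves forward through the head node
iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
iptables -A FORWARD -i eth0 -o eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
# enable forwarding in the running kernel
echo 1 > /proc/sys/net/ipv4/ip_forward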

In the /etc/sysconfig/network you need to have:
NETWORKING=YES
IPFORWARD=YES


You have to restart the network services and start up the dhcp server daemon (dhcpd). To have dhcpd start at boot, on fedora the ntsysv program lets you search a list of services and mark the corresponding entry. If you haven't enabled nfs at boot, do it now.
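On fedora, for example, that amounts to something like:

service network restart   # reinitialize the network interfaces
service dhcpd start       # start the DHCP server now
chkconfig dhcpd on        # and at every boot
chkconfig nfs on          # same for NFS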

Ssh should be made password-less for root (see earlier post). Use the -v option with the ssh command to debug, and speed up ssh by changing parameters in /etc/ssh/ssh_config and /etc/ssh/sshd_config. See this faq on setup of the ssh server.

You might want to specify protocol 2, port 22, PreferredAuthentications PublicKey, IdentityFile ~/.ssh/id_rsa. You might want to turn off checking of .ssh/known_hosts, and on the slaves you might want to ignore the host ip (CheckHostIP no), and turn off any authentication method except for public key and password. According to this speed comparison, one of the fastest cipher methods is RC4 (arcfour in ssh config file).
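Pulling these settings together, the client-side /etc/ssh/ssh_config might contain something like the following (a sketch; adapt it to your setup):

Host *
    Protocol 2
    Port 22
    PreferredAuthentications publickey,password
    IdentityFile ~/.ssh/id_rsa
    CheckHostIP no
    StrictHostKeyChecking no
    Ciphers arcfour,blowfish-cbc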

I wrote two shell functions and put them into /etc/bashrc. They can help in setting up the cluster; a usage example follows below.


# execute a command on all slave nodes
function slave {
    for i in $(seq 1 7) ; do
        ssh 192.168.1.$i "$1" ;
    done ;
}

# copy a file to all slave nodes
function slavecopy {
    for i in $(seq 1 7) ; do
        scp "$1" 192.168.1.$i:"$2" ;
    done ;
}
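For example, to check the kernel version everywhere or push the hosts file to all slaves:

slave "uname -r"
slavecopy /etc/hosts /etc/hosts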


I used these scripts early on in the installation of software.

However I found that working only on one slave simplified matters significantly. GNU Screen can help you speed up the configuration.

Of course, on a side note, using screen the above two functions could already be used to parallelize computing tasks; however, there are dedicated protocols for more advanced usage, for example if you want resource balancing. I treated parallelization protocols in an earlier post.


Sharing files over the network with NFS

NFS configuration is remarkably simple (see the linux nfs faq for help). You basically have to install the package, start the nfs services, and change two files. Example files:

on the master:/etc/exports

/home/ 192.168.1.0/255.255.255.0(rw,sync,no_root_squash)
/root 192.168.1.0/255.255.255.0(rw,sync,no_root_squash)


on the slave(s):/etc/fstab

192.168.1.250:/home /home nfs rw,hard,intr 0 0
192.168.1.250:/root /root nfs rw,hard,intr 0 0
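After editing the two files, something like this applies them (Fedora-style service scripts; exportfs -ra re-reads /etc/exports):

master# exportfs -ra
master# service nfs restart
slave# mount -a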


You can share additional directories in the same way. If you share the /home and /root directories, password-less ssh login is simplified: users can just copy their .ssh/id_[rsa|dsa].pub to .ssh/authorized_keys.

You may want to setup your printers on master and slave (you can copy an existing printer configuration recursively from /etc/cups e.g. from your local office desktop computer).


Book recommendation:
High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI (Nutshell Handbooks) by Joseph Sloan

Oct 8, 2008

Running parallel processes in R and Matlab (using MPI and PVM)

I put myself to the task of setting up our beowulf computing cluster for parallel distributed processing. A reminder: a beowulf cluster is a compute cluster dedicated to scientific calculations. In earlier posts I explained the hardware part of how to build a beowulf, then software installation and administration, and cloning of setups, and now it's time to make the cluster do some work.

In order to run programs in parallel on different machines of the cluster, I needed to install message-passing protocols for distributed-memory applications. There are two common ones: the Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM). This post is about running parallel processes in GNU R and matlab using these interfaces.

Content:
  • Prerequisites
    • MPI
    • PVM
  • Running Parallel Processes
    • GNU R
    • Matlab
  • Profiling
You need MPI or PVM to distribute processes over different computers. Note that newer versions of matlab automatically distribute processes over different processors on a single computer.

Prerequisites

You need to enable password-less ssh access (see earlier post) from the server to all clients. Note that for the network configuration you have to remove the host names from the loopback line (where it says 127.0.0.1) of the /etc/hosts file; just put localhost instead. Then you need a text file with a list of all machines you wish to use for computing; call it pvmhosts and mpihosts, respectively.
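Such a hosts file just lists one machine name per line, e.g.:

node0
node1
# ... one line per machine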

PVM
I had installed PVM some time ago, which took a while because of my complete lack of experience with PVM. Installation was straightforward from packages; for configuration I needed to set some environment variables.

Add to /etc/profile:

export PVM_ROOT=/usr/share/pvm3
export PVM_TMP=/tmp
export PVM_RSH=/usr/bin/ssh
export PVM_ARCH=LINUXX86_64
export PATH=/usr/local/bin/:$PATH


In the pvm console (which you enter by typing pvm), after adding your machines you can try e.g. spawn -> -10 /bin/hostname and see if you get a list of your machines back.
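A short console session might look like this (add, conf, and halt are pvm console commands; halt shuts the virtual machine down again):

pvm> add node1
pvm> conf
pvm> spawn -> -10 /bin/hostname
pvm> halt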

MPI
As for MPI, I compiled mpich2, an implementation of the MPI standard, and was also able to spawn processes (/bin/hostname). The install guide was a great help in setting up MPICH. On the head node, from where you spawn the processes, you type:

mpd &
mpdtrace -l

And you should get back the node name (say node0) and port number (say 51227). Then you connect from each node by typing:
mpd -h node0 -p 51227

After this, you can spawn processes with mpiexec, e.g. mpiexec -n 10 /bin/hostname
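Instead of connecting each node by hand, mpich2's mpdboot can start the whole ring from the mpihosts file (a sketch; -n is the total number of mpds to start, including the local one):

mpdboot -n 8 -f mpihosts
mpdtrace   # should list all nodes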

I wanted to try some examples from within the R computing platform (statistical computing similar to Matlab) using the snow package.

Parallel Processes

GNU R
I am relatively new to R, and first I had to find out that I needed the development packages of R in order to install new packages from within R. I just installed all R packages that were available in the R repository (http://cran.es.r-project.org/bin/linux/redhat/fedora9/x86_64; needless to say this may differ for your distribution, machine architecture, and country code).

Make sure you have installed the R libraries snow and rpvm on all your machines! If you don't (like me), R lets you create the PVM cluster object but then freezes when you try to execute a job (thanks to Luke Tierney for his help in finding this out).

In R:

> library('snow')
> library('rpvm')
> cl <- makePVMcluster(count=2, names=c('node0','node1'))
> clusterCall(cl, function() Sys.info()[c("nodename","machine")])
[[1]]
nodename machine
"node1" "x86_64"

[[2]]
nodename machine
"node0" "x86_64"

As for MPI, you need the R library Rmpi for interfacing mpi and R. However I got an error installing Rmpi:


/usr/bin/ld: /usr/local//lib/libmpich.a(comm_get_attr.o): relocation R_X86_64_32 against `a local symbol' can not be used when making a shared object; recompile with -fPIC
/usr/local//lib/libmpich.a: could not read symbols: Bad value
collect2: ld returned 1 exit status
make: *** [Rmpi.so] Error 1

I contacted the author of Rmpi about this issue; however, I couldn't resolve it.

Parallel Processes in Matlab
As for matlab, note that in order to run it on a 64bit system you need shared libraries (on Fedora the package is called libXp, on ubuntu ia32) and some java packages.

Matlab comes with the parallel computing toolbox, the distributed computing server, and an implementation of mpich2. You can find manuals in pdf format on the corresponding mathworks site.

You start the mpich2 server as root:
$MATLAB/toolbox/distcomp/bin/mdce start

Note: the same holds here as for PVM. For mdce to work, you need to remove from /etc/hosts the loopback line with your host name in it, i.e. 127.0.1.1 node0 becomes your_network_ip node0.

Start the job manager:
$MATLAB/toolbox/distcomp/bin/startjobmanager -name MyJobManager

Connect one worker:
/opt/matlab/toolbox/distcomp/bin/startworker -jobmanager MyJobManager -jobmanagerhost node0

where node0 is the machine where your jobmanager is running (obviously).

Run a second worker on a different machine. From the head node node0:

$MATLAB/toolbox/distcomp/bin/startworker -jobmanager MyJobManager -jobmanagerhost node0 -name worker2 -remotehost node1

The option remotehost is to start a worker on a different machine.

Make sure that job manager and workers are running:
$MATLAB/toolbox/distcomp/bin/nodestatus

If you want to be mild on system resources you can launch matlab like this:
> nice /opt/matlab/bin/matlab -nojvm -nosplash -nodesktop -logfile ${HOME}/$(date +matlab_%F_-%H.log)

However, we need to start matlab in desktop mode (we need the jvm). In the menu we go to parallel->configure and parallel->administrate, choose new configuration, jobmanager, and put in MyJobManager, the hostname of our head node, the minimum and maximum number of workers we want to use, and the path where we stored the scripts. Then we choose the new configuration and start the "matlabpool":
>> matlabpool open
You should see the confirmation: "Connected to a matlabpool session with 2 labs."

A simple test again:
>> parfor i=1:5
unix('hostname');
end


Profiling

As a rule of thumb, for speed, avoid access to physical devices, i.e. files or screen. In matlab, don't use load/save, close, figure, etc.

Here are some basic commands and scripts to find out who is running which jobs, for how long, etc.

Simplest, just one command: w. How many users are logged in and what are they doing?
# w

Get all jobs run by a user $i:
# ps -eott,user,fname,tmout,f,wchan | grep $i

See all users that are logged in:
# (for i in $(users) ; do echo $i ; done) | uniq

More details: what are they running?
for j in $( (for i in $(users) ; do echo $i ; done) | uniq | grep -v root ) ; do ps -eott,user,fname,tmout,f,wchan | grep $j ; done

How many matlab processes is user benjamin running (using the slave function from the cluster installation post)?
(slave "top -b -n 1 -u benjamin | grep -i matlab | grep -v helper") | wc -l

System Temperature
Now that you are running all these processes, and you checked that they are running as you want them to run, you want to make sure you are not overheating the system. How to check the temperature of the CPUs?
> cat /proc/acpi/thermal_zone/THRM/temperature

(you might have different THRM directories for your cores)

If this doesn't work or you want more information, install sensors (in fedora: yum install sensors). You configure it with sensors-detect, and sensors then prints the information.
> sensors
...
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +67.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1: +66.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2: +61.0°C (high = +82.0°C, crit = +100.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3: +61.0°C (high = +82.0°C, crit = +100.0°C)



Book recommendation:
High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI (Nutshell Handbooks) by Joseph Sloan

Oct 5, 2008

Sanity

After three days resting in bed trying to recover my health, I thought it was time to get up and show some spirit. In bed I had read about the decline of civilizations in "The Clash of Civilizations." Decline of civilization is associated with decadence, which shows in (among other characteristics) loss of motivation and moral ethics, and refusal of intellectual effort. Still coughing and struggling with respiration, I bought an mp3 player and athletic shoes (in that order), with the idea of running outdoors. Mens sana in corpore sano. Of course, I would get bored without listening to something, so I needed an mp3 player.

I had a shock when first trying out the mp3 player, as it wasn't recognized in linux. It is a SanDisk Sansa with 2GB. Here I found how to do it. Actually it turned out quite simple and convenient; however, sometimes I have to rely on the lsusb utility, as recommended in this very comprehensive review of the player. Pushing some buttons on the mp3 player sometimes helps to get it recognized. Also try this command:

> sudo rmmod ehci-hcd; sleep 1s; sudo modprobe ehci-hcd


For now, I put on it some audiobooks from librivox and music from two groups, 1 Ohm facile and 0x17, downloaded from jamendo, which I know from the rhythmbox application.

Now I have to find some track where the air is less polluted by cars.


---
P.S.: As of late November, I still haven't found a good running track, but I've been to the gym several times, without audio book, though.