Job submission to the compute server via Sun Grid Engine (SGE)
Only MPICH will use the dedicated GBit fabric. Since LAM has problems with the mangled name resolution, it is bound to the switched Gigabit network.
Please note:
Do not write your own hostfiles as you will usually not know which nodes gridengine will choose and how many processors will be available. SGE will do it for you.
Do not use email notification when you have a restricted account. The emails will never arrive (until I figure out a way around it) and they will trigger a security alert on cairngorm linked to your userid.
For LAM/MPI you do not need to boot LAM yourself. SGE will boot it automatically on the assigned nodes.
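The email restriction above can be checked mechanically before a script is submitted. The following is a minimal sketch, not something provided by the cluster: the function name has_active_mail_directive is hypothetical, and it assumes that directives use the default `#$` prefix, so that a line starting with `##$` counts as commented out.

```shell
#!/bin/bash
# Hypothetical pre-submission check (not provided by the cluster):
# succeeds if the script contains an active "#$ -M" or "#$ -m" directive.
# Lines starting with "##$" are ordinary comments and are ignored.
has_active_mail_directive() {
    grep -Eq '^#\$[[:space:]]*-[Mm]([[:space:]]|$)' "$1"
}

# Usage: warn before submitting from a restricted account.
# if has_active_mail_directive mpi_job.sge; then
#     echo "mpi_job.sge requests email notification" >&2
# fi
```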
Jobs are submitted via submission scripts using the qsub command:
qsub mpi_job.sge
where mpi_job.sge is the name of the submission script.
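A submission can be wrapped in a small helper that fails early when the script is missing. This is only a sketch: the function name submit_job is made up for illustration, while qsub, qstat and qdel are the standard Grid Engine client commands.

```shell
#!/bin/bash
# Hypothetical wrapper around qsub; returns 1 if the submission script
# cannot be read and 2 if qsub is not available on this machine.
submit_job() {
    local script=$1
    if [ ! -r "$script" ]; then
        echo "cannot read submission script: $script" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "qsub not found in PATH" >&2
        return 2
    fi
    qsub "$script"
}

# Typical session (run by hand on the submit host):
#   submit_job mpi_job.sge    # qsub reports the job id assigned by SGE
#   qstat -u "$USER"          # list your pending and running jobs
#   qdel <job-id>             # remove a job from the queue
```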
Submission script for MPICH2 jobs:
#!/bin/bash
#
# UNIVERSITY OF CAMBRIDGE
# Institute for Manufacturing
# Computational Simulation for Manufacturing (CSfM) group
#
# September 2005
#
# Example startup script for an MPICH2 run
#
# Filename: mpich2_job.sge
# Created : 2005-09-06 19:34 by M Gross
#
# lines starting with a single hash symbol (#) are comment lines
# lines starting with a hash-dollar (#$) are grid-engine parameters
#
# setting the shell used for execution of this job
#$ -S /bin/bash
# switch on email notification
# REMEMBER: only use it if you have a full account i.e. full
# login access to cairngorm
#
# please fill in your camid below
##$ -M camid@cam.ac.uk
# email notification at (b)egin and (e)nd of the job
##$ -m be
# Work from submission directory
#$ -cwd
# parallel environment (pe) request
#$ -pe mpich 12
# enables $TMPDIR/rsh to catch rsh calls if available
export PATH=$TMPDIR:$PATH
. ~/.bashrc
# just copy hosts and machines file to stdout for the record
more $TMPDIR/hosts
more $TMPDIR/machines
# count the hosts ... we could beautify this by expanding startmpi a bit ... but it works for now
HOSTS=$(wc -l < "$TMPDIR/machines")
# copy mpdboot line to stdout for the record
echo mpdboot -n $HOSTS -f $TMPDIR/hosts -v --ifhn=mpi-`hostname`
echo '================================================'
mpdboot -n $HOSTS -f $TMPDIR/hosts -v --ifhn=mpi-`hostname`
echo '================================================'
# since using the local harddisk is faster, copy the necessary files
# hardcoded node list at the moment, would be nicer to parse host/machine file
for host in imrc-csfm-01 imrc-csfm-02 imrc-csfm-03
do
echo scp cairngorm:ftLMPs.par $host:/home_local/markus/
scp cairngorm:ftLMPs.par $host:/home_local/markus/
echo scp cairngorm:ftLMPs.O3.x $host:/home_local/markus/
scp cairngorm:ftLMPs.O3.x $host:/home_local/markus/
done
# change into the directory where we want to run
cd /home_local/markus/
# start the job
mpiexec -genv LD_LIBRARY_PATH "$LD_LIBRARY_PATH" -machinefile $TMPDIR/machines \
-n $NSLOTS /home_local/markus/ftLMPs.O3.x
# close mpd
mpdallexit
#copy data back to the home directory
mv ftLMPs.log ~
#done.
#
# EOF mpich2_job.sge
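The hardcoded node list in the copy loop could instead be derived from the machine file, as the script comment suggests. A minimal sketch, assuming that $TMPDIR/machines holds one host name per line with a host repeated once per assigned slot (this layout is an assumption and should be checked against the file the job prints); unique_hosts is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: print each host named in a machine file once.
# Assumes one host name per line (first field), possibly repeated.
unique_hosts() {
    awk 'NF { print $1 }' "$1" | sort -u
}

# Inside the job script the copy loop could then read:
# for host in $(unique_hosts "$TMPDIR/machines"); do
#     scp cairngorm:ftLMPs.par "$host":/home_local/markus/
# done
```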
Submission script for MPICH jobs:
#!/bin/bash
#
# UNIVERSITY OF CAMBRIDGE
# Institute for Manufacturing
# Computational Simulation for Manufacturing (CSfM) group
#
# November 2004
#
# Example startup script for an MPI run
#
# Filename: mpi_job.sge
# Created : 2004-12-04 19:34 by M Gross
#
# lines starting with a single hash symbol (#) are comment lines
# lines starting with a hash-dollar (#$) are grid-engine parameters
#
# setting the shell used for execution of this job
#$ -S /bin/bash
# switch on email notification
# REMEMBER: only use it if you have a full account i.e. full
# login access to cairngorm
#
# please fill in your camid below
##$ -M camid@cam.ac.uk
# email notification at (b)egin and (e)nd of the job
##$ -m be
# Work from submission directory
#$ -cwd
# parallel environment (pe) request
#$ -pe mpich 4
# change into the directory where the work should be done
cd /home/markus/mpi_example
# enables $TMPDIR/rsh to catch rsh calls if available
export PATH=$TMPDIR:$PATH
. ~/.bashrc
export P4_GLOBMEMSIZE=41943040
/opt/mpich-1.2.6/bin/mpirun -nolocal -np $NSLOTS \
-machinefile $TMPDIR/machines cpi.x
#
# EOF mpi_job.sge
where cpi.x is the application and the CPU count is 4. Note the -nolocal flag in the mpirun call: omitting it will almost always lead to a wrong process count!
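The mpirun call relies on $NSLOTS and $TMPDIR/machines agreeing with each other. A job could verify this before starting the application. This is a sketch under the assumption that the machine file contains one line per slot; slots_match is a hypothetical helper, and it checks only the machine file, not the processes that mpirun actually starts.

```shell
#!/bin/bash
# Hypothetical sanity check: succeed only if the machine file has
# exactly as many non-empty lines as slots were granted.
slots_match() {
    local nslots=$1 machinefile=$2 lines
    lines=$(awk 'NF { n++ } END { print n + 0 }' "$machinefile")
    [ "$lines" -eq "$nslots" ]
}

# In the job script, before mpirun:
# slots_match "$NSLOTS" "$TMPDIR/machines" || echo "slot count mismatch" >&2
```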
Submission script for LAM/MPI job:
#!/bin/bash
#
# UNIVERSITY OF CAMBRIDGE
# Institute for Manufacturing
# Computational Simulation for Manufacturing (CSfM) group
#
# November 2004
#
# Example startup script for a LAM/MPI run
#
# Filename: job1.sge
# Created : 2004-12-04 19:34 by M Gross
#
# lines starting with a single hash symbol (#) are comment lines
# lines starting with a hash-dollar (#$) are grid-engine parameters
#
# setting the shell used for execution of this job
#$ -S /bin/sh
# switch on email notification
# please fill in your camid below
##$ -M camid@cam.ac.uk
# email notification at (b)egin and (e)nd of the job
##$ -m be
# Work from submission directory
#$ -cwd
# change into the directory where the work should be done
cd /home/markus/mpi_example
# parallel environment (pe) request
#$ -pe lam 4
# enables $TMPDIR/rsh to catch rsh calls if available
export PATH=$TMPDIR:/opt/lam/bin:$PATH
/opt/lam/bin/mpirun /home/markus/lam_mandel/myapp
#
# EOF job1.sge
where myapp is the application schema, which reads:
# We specify the full pathname to the "master" executable so that it
# is sure to be found. It may not be necessary to specify the full
# pathname in all cases; see mpirun(1) for more details.
#
h /home/markus/lam_mandel/master
#
# Run any number of slaves, but one per CPU is the most sensible thing.
# Assuming the multicomputer is homogeneous, ship the executable
# to each node. This is slower but more convenient than placing it
# there yourself or relying upon NFS and your shell path to be right.
#
# We specify the full pathname to slave for the same reason as above.
#
C -s h /home/markus/lam_mandel/slave
Submission script for Fluent jobs:
#!/bin/bash
#
# UNIVERSITY OF CAMBRIDGE
# Institute for Manufacturing
# Computational Simulation for Manufacturing (CSfM) group
#
# November 2004
#
# Example startup script for a Fluent run
#
# Filename: fluent_job.sge
# Created : Fri Nov 26 16:19:54 GMT 2004 by M Gross
#
# lines starting with a single hash symbol (#) are comment lines
# lines starting with a hash-dollar (#$) are grid-engine parameters
#
# setting the shell used for execution of this job
#$ -S /bin/bash
# switch on email notification
# please fill in your camid below
#$ -M camid@cam.ac.uk
# email notification at (b)egin and (e)nd of the job
#$ -m be
# Work from submission directory
#$ -cwd
# specify the queue to use. options are: node1, node2 and csfm.all
# where csfm.all is a superset of node1 and node2
#$ -q csfm.all
# change into the directory where the work should be done
cd /home/markus/sge_example
# some variables needed by FLUENT
export FLUENT_ARCH=lnia64
export FLUENT_INC=/opt/Fluent.Inc
# the command to execute
/opt/Fluent.Inc/bin/fluent 2ddp -sge -g -i iterate.jou
#
# EOF fluent_job.sge
where iterate.jou is the journal file to be executed by Fluent. Note: when you save a case file on your workstation that will later be read in a batch job, make sure that all plotting (residuals etc.) is switched off, as this functionality is not available in batch mode.
Submission script for compile jobs (make):
#!/bin/bash
#
# UNIVERSITY OF CAMBRIDGE
# Institute for Manufacturing
# Computational Simulation for Manufacturing (CSfM) group
#
# November 2004
#
# Example startup script for compilation jobs
#
# Filename: make_job.sge
# Created : Fri Nov 26 16:19:54 GMT 2004 by M Gross
#
# lines starting with a single hash symbol (#) are comment lines
# lines starting with a hash-dollar (#$) are grid-engine parameters
#
# setting the shell used for execution of this job
#$ -S /bin/bash
# Work from submission directory
#$ -cwd
# specify the queue to use. options are: node1, node2 and csfm.all
# where csfm.all is a superset of node1 and node2
#$ -q csfm.all
# change into the directory where the make should be performed
cd /home/user/my_code/repository/appxyz
# the command to execute
make
#
# EOF make_job.sge