Tuesday, March 11, 2008

Running AMBER-pbsa on BigRed:[4] Parallel pmemd through CondorG

There were a few obstacles to submitting AMBER jobs through the CondorG system.

(1) LoadLeveler-specific commands
BigRed uses LoadLeveler as its batch system. Compared to PBS or LSF, LoadLeveler has a few distinctive keywords, such as "class" instead of "queue", and it uses a machine list file. But we didn't have any problems with those when submitting jobs through CondorG: the machine list file is generated and used automatically by the job script that the job manager creates.

(2) passing arguments
AMBER takes command-line arguments for its input, output, and reference files. You have to include those arguments in the globusrsl string. The Condor keyword "arguments" is tied to the arguments for mpirun, but it did not really work for passing the mpirun arguments either.

(3) specifying the number of processes and machines
LoadLeveler requires keywords to set the number of nodes and processes, such as "node" and "tasks_per_node". mpirun also provides the -np argument to specify the total number of processes used by the job. To pass the right values to LoadLeveler and mpirun, you have to specify them in the globusrsl string: "count" in the globusrsl string is mapped to the value of -np for mpirun, and "hostCount" is mapped to the value of "node" in the job script generated by the job manager. I could not find a way to specify tasks_per_node directly; however, the job manager derived tasks_per_node from the values of node and -np.
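
With the values used in the script in (4) below, the mapping works out roughly as follows. The generated LoadLeveler script is not reproduced in this post, so the layout here is my reconstruction rather than a capture:

========================================================================
In the CondorG submit file:  (count=16)(hostCount=4)(queue=DEBUG)

Roughly what the job manager writes into the LoadLeveler job script:
# @ class = DEBUG           <- from queue
# @ node = 4                <- from hostCount
# @ tasks_per_node = 4      <- derived by the job manager (count / hostCount)
# @ queue
mpirun -np 16 ... pmemd.MPI <arguments>   (machine list file supplied by the job manager)
========================================================================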

(4) script
This is the CondorG script for this job submission.
========================================================================
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster), ZINC04273785.crd
error = amber.err.$(Cluster)
log = amber.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)\
(count=16)\
(hostCount=4)\
(maxWallTime=00:15)\
(queue=DEBUG)\
(arguments= -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd )


queue

=========================================================================
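
To actually submit this, you need a valid grid proxy at the path given by x509userproxy. A typical sequence looks like the following (the submit file name amber_pmemd.sub is just a placeholder for whatever you saved the script above as):

grid-proxy-init
condor_submit amber_pmemd.sub
condor_q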

Friday, March 7, 2008

Running AMBER-pbsa on SDSC machines:[1] Serial Job interactively

Good news! There were AMBER installations on three of the SDSC machines: DataStar, BlueGene, and TeraGrid. I could not run the serial example on BlueGene, because there was no "sander" executable under the amber9 installation. However, here are some guidelines for running AMBER on the SDSC machines.
All of the machines keep the AMBER installation under a directory with the same path:
/usr/local/apps/amber9
Therefore, just set the environment and run the same command on each account.


leesangm/amber> setenv AMBERHOME /usr/local/apps/amber9
leesangm/amber> set path = ( $path $AMBERHOME/exe )
leesangm/amber> $AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Running AMBER-pbsa on NCSA machines:[1] Serial Job interactively

Running on tungsten and cobalt takes much longer than I expected (tungsten: 55 minutes, cobalt: 32 minutes). It also takes longer than on BigRed.

1. tungsten
Set the environment variables:
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/AMBER/Amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
Then get the same AMBER test package and run the command:
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

2. cobalt
Set the environment variables:
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/amber/amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
Then get the same AMBER test package and run the command:
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Running AMBER-pbsa on BigRed:[3] Serial Job submission through CondorG

[Step 1] First check whether you have the required environment set up in your .soft file. My .soft file looks like this:
#
# This is the .soft file.
# It is used to customize your environment by setting up environment
# variables such as PATH and MANPATH.
# To learn what can be in this file, use 'man softenv'.
#
#
@bigred
@amber9
@teragrid-basic
@globus-4.0
@teragrid-dev
+mpich-mx-ibm-64
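
After editing .soft, the new settings take effect at your next login; on softenv-managed systems you can usually also reload them in the current shell and check that the AMBER executables are on your path (whether BigRed provides the resoft command is an assumption on my part):

resoft
which sander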


[Step 2] Create a Condor script including the relevant arguments. I put all the required arguments in the "arguments" command of the script. I could get the result using both the batch system and the system fork (see the note after the script). Don't forget to transfer the output file back. My test script file is the following:
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/sander
arguments = -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
x509userproxy = /tmp/x509up_u500
queue
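
For the system-fork case mentioned in [Step 2], only the grid_resource line changes. jobmanager-fork is the standard GRAM name for the fork job manager; that BigRed's gatekeeper exposes it under exactly that name is my assumption:

grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-fork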

Thursday, March 6, 2008

Running AMBER-pbsa on BigRed:[2] Serial-LoadLeveler

The IU BigRed LoadLeveler provides several queues, such as DEBUG, LONG, and NORMAL. The DEBUG queue (class) has limits of 4 hours of maximum job CPU time and 15 minutes of maximum processor CPU time. I could run my serial job in the DEBUG queue, and it took 14 minutes. llclass shows the complete list of queues and their descriptions.
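
To check the limits yourself, llclass can list the classes and, with -l, their detailed limits (the -l option is standard LoadLeveler, but confirm it on BigRed):

llclass
llclass -l DEBUG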

Step 1. Set up the environment in the .soft file
@amber9
+mpich-mx-ibm-64

Step 2. Go to the work directory

Step 3. llsubmit serial.job
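
The serial.job file itself is not shown in this post; a minimal LoadLeveler script along the following lines should be close (the directives, the COPY_ALL environment setting, and the placeholder work directory are my assumptions; the sander command is the same one used in the interactive run below):

========================================================================
# serial.job -- minimal sketch of the serial LoadLeveler job
# @ job_type = serial
# @ class = DEBUG
# @ environment = COPY_ALL
# @ output = amber_serial.$(jobid).out
# @ error = amber_serial.$(jobid).err
# @ queue
cd <work directory containing amber_min.in and the ZINC04273785 files>
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
========================================================================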

Running AMBER-pbsa on BigRed:[1] Serial-Interactive

Step 1. Set up the environment in the .soft file
@amber9
+mpich-mx-ibm-64

Step 2. Go to the work directory and run the following command

$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Step 3. The output file is updated every 2 minutes, and the job took 20 minutes to finish completely.
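
Since the output file is refreshed while sander runs, you can watch the progress from a second shell:

tail -f min.out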