Tuesday, December 9, 2008

renewing proxy

more /usr/local/gateway/RenewCred/renewCred.sh
#!/bin/bash
export GLOBUS_LOCATION=$HOME/globus-condor/globus
source $GLOBUS_LOCATION/etc/globus-user-env.sh
myproxy-logon -s myproxy.teragrid.org -l quakesim -t 5000 -S << EOF
PUT_PASSWORD_HERE
EOF

Monday, December 1, 2008

Writing a simple Java application with the HBase APIs

To write a simple Java application with the HBase APIs, you need Hadoop and HBase installed on your machine. For this example, I used a Hadoop installation with a pseudo-distributed setup on localhost.
The code is mostly taken from the HBase site:
http://hadoop.apache.org/hbase/docs/r0.2.1/api/index.html


import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MySimpleTest {

    public static void main(String args[]) throws IOException {
        // You need a configuration object to tell the client where to connect.
        // But don't worry, the defaults are pulled from the local config file.
        HBaseConfiguration config = new HBaseConfiguration();

        // This instantiates an HTable object that connects you to the "myTable"
        // table.
        HTable table = new HTable(config, "myTable");

        // To do any sort of update on a row, you use an instance of the BatchUpdate
        // class. A BatchUpdate takes a row and optionally a timestamp which your
        // updates will affect.
        BatchUpdate batchUpdate = new BatchUpdate("myRow");

        // The BatchUpdate#put method takes a Text that describes what cell you want
        // to put a value into, and a byte array that is the value you want to
        // store. Note that if you want to store strings, you have to getBytes()
        // from the string for HBase to understand how to store it. (The same goes
        // for primitives like ints and longs and user-defined classes - you must
        // find a way to reduce it to bytes.)
        batchUpdate.put("myColumnFamily:columnQualifier1",
                "columnQualifier1 value!".getBytes());

        // Deletes are batch operations in HBase as well.
        batchUpdate.delete("myColumnFamily:cellIWantDeleted");

        // Once you've done all the puts you want, you need to commit the results.
        // The HTable#commit method takes the BatchUpdate instance you've been
        // building and pushes the batch of changes you made into HBase.
        table.commit(batchUpdate);

        // Now, to retrieve the data we just wrote. The values that come back are
        // Cell instances. A Cell is a combination of the value as a byte array and
        // the timestamp the value was stored with. If you happen to know that the
        // value contained is a string and want an actual string, then you must
        // convert it yourself.
        Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
        String valueStr = new String(cell.getValue());

        // Sometimes, you won't know the row you're looking for. In this case, you
        // use a Scanner. This will give you cursor-like interface to the contents
        // of the table.
        Scanner scanner =
                // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
                table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});

        // Scanners in HBase 0.2 return RowResult instances. A RowResult is like the
        // row key and the columns all wrapped up in a single interface.
        // RowResult#getRow gives you the row key. RowResult also implements
        // Map, so you can get to your column results easily.

        // Now, for the actual iteration. One way is to use a while loop like so:
        RowResult rowResult = scanner.next();

        while (rowResult != null) {
            // print out the row we found and the columns we were looking for
            System.out.println("Found row: " + new String(rowResult.getRow()) + " with value: " +
                    rowResult.get("myColumnFamily:columnQualifier1".getBytes()));

            rowResult = scanner.next();
        }

        // The other approach is to use a foreach loop. Scanners are iterable!
        for (RowResult result : scanner) {
            // print out the row we found and the columns we were looking for
            System.out.println("Found row: " + new String(result.getRow()) + " with value: " +
                    result.get("myColumnFamily:columnQualifier1".getBytes()));
        }

        // Make sure you close your scanners when you are done!
        scanner.close();
    }
}
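
As the comments above say, anything you store has to be reduced to bytes. For primitives, one plain-JDK way to do that conversion (just a sketch; this is not an HBase API) is java.nio.ByteBuffer:

import java.nio.ByteBuffer;

public class IntBytes {
    public static void main(String[] args) {
        // turn an int into the 4 bytes you would hand to BatchUpdate#put
        byte[] asBytes = ByteBuffer.allocate(4).putInt(42).array();

        // and read it back out of a Cell's byte array
        int roundTrip = ByteBuffer.wrap(asBytes).getInt();
        System.out.println(roundTrip); // prints 42
    }
}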

Hadoop: java.io.IOException: Incompatible namespaceIDs

The error java.io.IOException: Incompatible namespaceIDs in the logs of a datanode (/logs/hadoop-hadoop-datanode-.log) might be caused by bug HADOOP-1212. This site explains how to work around it (in short, clear the datanode's data directory, or make the namespaceID in its storage VERSION file match the namenode's, and restart the datanode):
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

The complete error message was,
 ... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
at org.apache.hadoop.dfs.DataNode.&lt;init&gt;(DataNode.java:199)
at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)

Friday, November 21, 2008

Test #4(2mil): Nov.21.2008

Full sequence test for 2million sequences.

(1) Resource setup: BigRed(150), Ornl(80), Cobalt(80)
(2) Service setup: Status check interval: 60 secs
Job queue scan interval: 60 secs
Job queue size: 100
(3) Input files : 2mil.tar copied to each cluster's $TG_CLUSTER_SCRATCH directory.
Input files are referenced in the job arguments by their full paths.
(4) Output files: staged out to swarm host

(5) Client side setup: input files are located on my desktop (the same machine as the Swarm host).
The client scans the directory and finds files which contain more than one sequence (using the Unix grep command through Java Runtime), then sends requests to Swarm in batches of 10 per RPC call.
  • Total duration of the submission: 170364307 milliseconds (around 47.3 hours).
  • Total number of jobs submitted: 75533
  • Total number of files scanned: 536825
(6) Completed Jobs : To be added
(7) Held Jobs: To be added
(8) Open Issues:
Submission time needs to be improved.
Reasons:
  • Loading 536825 objects representing the file names takes too much memory. [Approach]: Use a file filter and load a partial list at a time.
  • Using Java Runtime: Java Runtime requires extra memory to execute the system fork. [Approach]: Try checking the number of sequences with a Java FileInputStream instead.
  • Running the client and the host on the same machine. [Approach]: Try running the client on a different machine.

Friday, November 14, 2008

Java Runtime Class with VM memory

For the Swarm service, I wrote a client kit to crawl a directory and find cluster files which need to be assembled. To access the files and count the number of gene sequences, I used the Java Runtime/Process classes. With the 2 million sequences, I got around 600000 clustered files, which were all visited by the crawler program. When I then tried to create a DOM object to interact with the Web service, the crawler started to throw an IOException (memory allocation error). My Java options were -Xmn 512M -Xmx 1024M.

On Linux, Runtime.exec does a fork and exec. That means you need double the virtual memory (real + swap) that your current Java process is using. Therefore, if I specify a 512M initial heap, I effectively need more than 1024M of virtual memory available.

I changed my Java options to -Xmn 256M -Xmx 1024M. The crawler slowed down quite a bit, but it no longer threw the IOException.
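
As a sketch of the FileInputStream-style approach mentioned in the test notes above (assuming FASTA-style files in which each sequence starts with a '>' header line; the class name and file handling here are illustrative, not the actual Swarm client code), counting sequences inside the JVM avoids the fork/exec entirely:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class SequenceCounter {
    // Counts FASTA sequences by counting header lines that start with '>'.
    // This stays inside the JVM, so there is no fork and no doubled virtual memory.
    public static int countSequences(String path) throws IOException {
        int count = 0;
        BufferedReader reader = new BufferedReader(new FileReader(path));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countSequences(args[0]) + " sequences in " + args[0]);
    }
}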

Friday, November 7, 2008

Test #3(2mil): Nov.07.2008

(1) Starting time: 04:00 pm

(2) Server Setup:

-BigRed : max 5

-Cobalt : max 5

(3) Client Setup:

-Total job max: 20

-Input source: EST Human 2mil

* Note

New Setup:
(1) Increased timeout from 20 secs to 6 mins (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs

Test #2(2mil) : Nov.07.2008

(1) Starting time: 02:05 pm

(2) Server Setup:
-BigRed : max 20
-Cobalt : max 20

(3) Client Setup:
-Total job max: 1000
-Input source: EST Human 2mil

* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs

Result:
(1) Job submission successfully done. (total 1000 jobs)

References:
First condorjob clusterID: 50537

Test #1(2mil): Nov.07.2008

(1) Starting time: 12:54 pm

(2) Server Setup:
-BigRed max 400
-Cobalt max 200

(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil

Note:
(1) Reading the directory with 2 mil sequences takes less than 20 secs.
(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.

Wednesday, October 15, 2008

[ABSTRACT] Scheduling Large-scale Jobs over the Loosely-Coupled HPC Clusters

Compute-intensive scientific applications are heavily reliant on the available quantity of computing resources. The Grid paradigm provides a large-scale computing environment for scientific users. However, conventional Grid job submission tools do not provide a high-level job scheduling environment for these users across multiple institutions. For extremely large numbers of jobs, a more scalable job scheduling framework that can leverage highly distributed clusters and supercomputers is required. In this presentation, we propose a high-level job scheduling Web service framework, Swarm. Swarm is developed for scientific applications that must submit massive numbers of high-throughput jobs or workflows to highly distributed computing clusters. The Swarm service itself is designed to be extensible, lightweight, and easily installable on a desktop or small server. As a Web service, derivative services based on Swarm can be straightforwardly integrated with Web portals and science gateways. In this talk, we present the motivation for this research, the architecture of the Swarm framework, and a performance evaluation of the system prototype.

Tuesday, September 9, 2008

Installing Hadoop

These are my notes, written while following the installation documentation on Hadoop's web page:
http://hadoop.apache.org/core/docs/current/quickstart.html
I installed as root, but I'm not sure that is necessary.

Step 0. You need ssh, rsync, and a Java VM on your machine. I used:
1)ssh OpenSSH_4.3p2, OpenSSL 0.9.8b 04
2)rsync version 2.6.8
3)java 1.5.0_12

Step 1. Download software from a Hadoop distribution site.
http://hadoop.apache.org/core/releases.html

Step 2. Untar file

Step 3. Set JAVA_HOME in your_hadoop_dir/conf/hadoop-env.sh
*note: I had JAVA_HOME defined in my .bashrc file, but I had to specify it again in hadoop-env.sh.

Step 4. Now you can run the standalone operation as-is.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Step 5. For pseudo-distributed operation, which runs each Hadoop daemon in a separate Java process on a single node to imitate a real distributed file system, you have to set up the configuration in
conf/hadoop-site.xml
The 'name' elements are defined by the Hadoop system, so you can just use the names from the example on the Hadoop page. I changed the value of fs.default.name to hdfs://localhost:54310, and mapred.job.tracker to localhost:54311.
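
For reference, the resulting conf/hadoop-site.xml looks roughly like the following (only the two properties mentioned above are shown; dfs.replication is commonly also set to 1 for pseudo-distributed mode):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>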

Step 6. Check ssh localhost
In my case, I could not connect to localhost, but I could connect to my numerical IP address. I changed my /etc/hosts.allow to include ALL:127.0.0.1 and it started to recognize localhost.
If it requires your passphrase:

$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

If you keep a passphrase on the key, it will cause problems when starting the daemons.

Step 7. Formatting namenode and starting the daemon

$ bin/hadoop namenode -format

$ bin/start-all.sh

Now you can check your namenode at http://localhost:50070/

Also your job tracker is available at http://localhost:50030/

Step 8. Test functions

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

Step 9. Stop the daemon

$ bin/stop-all.sh

Monday, September 8, 2008

Update OGCE:file-manager portlet with new tacc classes

This update adds a sorted view of the directory. JSP files and two Java classes are modified: FileManagementConstants.java and FileManagementPortlet.java.
The list of updated files follows.
portlets/comp-file-management/src/main/webapp/jsp/fileBrowser.jsp
portlets/comp-file-management/src/main/webapp/jsp/view.jsp
portlets/comp-file-management/src/main/webapp/css/fileManagement.css
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementConstants.java
portlets/comp-file-management/src/main/java/edu/tacc/gridport/portlets/interactive/FileManagementPortlet.java
portlets/gp-common/src/main/webapp/jsp/fileBrowser.jsp - identical to fileBrowser.jsp above
portlets/gp-common/src/main/webapp/javascript/fileBrowser.js

* edited parts are noted as "lukas edit"

Thursday, August 21, 2008

Teragrid access end-to-end test

Teragrid sites end-to-end test
- condorG (grid universe)
- cap3 apps
- stage output file
(08/22/2008 current)
==============================================================
[bigred] gatekeeper.iu.teragrid.org:2119/jobmanager-loadleveler yes
[steele] tg-steele.purdue.teragrid.org:2119/jobmanager-pbs yes
[sdsc(ds)] dslogin.sdsc.teragrid.org:2119/jobmanager-loadleveler job state write error
[mercury] https://grid-hg.ncsa.teragrid.org:2119/jobmanager-pbs yes
[ornl] tg-login.ornl.teragrid.org:2119/jobmanager-pbs yes
[lonestar] gatekeeper.lonestar.tacc.teragrid.org:2119/jobmanager-lsf job state read error
[cobalt] grid-co.ncsa.teragrid.org:2119/jobmanager-pbs yes
[pople] gram.pople.psc.teragrid.org:2119/jobmanager-pbs cannot login
[sdsc(dtf)]tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs disk quota error

Friday, August 15, 2008

Limit of job submissions?

I'm gathering the information about the limit of job submissions.
http://kb.iu.edu/data/awyt.html
http://kb.iu.edu/data/axal.html
NCSA(cobalt)
http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/Doc/Jobs.html
qstat -Q
BigRed
llclass

Wednesday, August 13, 2008

Java memory setup

To run the Job Submission Service or the Swarm service, please note that your environment variable should be set up as:
export JAVA_OPTS="-server -Xms512m -Xmx1024m -XX:MaxPermSize=256m"

Otherwise, your Java virtual machine will provide 8M of memory, which might be too small for running the server.
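
If you want to verify what a given set of flags actually gives the JVM, a tiny check like this can help (the class name and output format are just illustrative):

public class HeapCheck {
    public static void main(String[] args) {
        // Reports the maximum heap this JVM was started with.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap: " + maxMb + " MB");
    }
}

Run it with the same heap flags you pass to the server, e.g. java -Xms512m -Xmx1024m HeapCheck.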

Wednesday, July 2, 2008

Access to the Job Submission Service with PHP using NuSOAP

If you are new to PHP and NuSOAP, my previous posting may be helpful too.
Here is a simple example of the getStatus operation:
// Pull in the NuSOAP code
require_once('../lib/nusoap.php');
$client = new soapclient("http://your.service.location:8080/axis2/services/JobSubmissionService?wsdl",true);
$TaskId = array('clusterID' => "403", 'jobID' =>"0");
$taskId = array('TaskId' => $TaskId);
$getStatus = array('taskId'=>$TaskId);
$result = $client->call('getStatus',array('parameters'=> $getStatus),
'http://jobsubmissionservice.ogce.org/xsd','',false,null,'rpc','encoded');
if ($client->fault){
echo 'Fault';
}
print_r($result);

Creating a PHP client of the Web Service(Axis2) using NuSOAP

You can download the NuSOAP library from the NuSOAP project hosted on SourceForge. It is a package of PHP files you can keep in your PHP document directory or in a separate directory. In my case, I copied the /lib directory from the NuSOAP project under /usr/local/apache2/htdocs/, which is the default document location for the Apache2 server. Then I copied the /sample directory of the NuSOAP project under the same directory, /usr/local/apache2/htdocs/. You can download the documentation from the same site; it contains a summary of the NuSOAP methods. Now it's time to build a WS client.

Step 1. Pull the NuSOAP code into your PHP source code:
require_once('../lib/nusoap.php');

Step 2. Create a new client and set the WSDL flag to true.
$client = new soapclient("http://service.location.url:8080/axis2/services/YourTargetService?wsdl", true);

Step 3. Send the request using the call() method:
mixed call (string $operation, [mixed $params = array()], [string $namespace = 'http://tempuri.org'], [string $soapAction = ''], [mixed $headers = false], [boolean $rpcParams = null], [string $style = 'rpc'], [string $use = 'encoded'])
Please note that if you don't specify the namespace, your outgoing SOAP message will include a SOAP Body with the namespace 'http://tempuri.org'. This will cause a fault from the server.
Here is my test request:
$result = $client->call('yourMethod',array('parameters'=> $yourType),
'http://yourservice.namespace.org/xsd','',false,null,'rpc','encoded');

Also, $params, the arguments passed to the service, is the element defined in the wsdl:message. If your service has a hierarchical data type structure, you need to track down your WSDL carefully.

Step 4. See the result.
I just used print_r($result).

Wednesday, April 30, 2008

Running PaCE via Teragrid Job submission service

To submit a PaCE job through the Teragrid Job Submission Service, you need:

-- input files (fasta format file(s) and .cfs file) with valid URL(s)
-- number of clusters

Here is sample Java code to access TJSS.
====================================================================
import javax.xml.namespace.QName;

import org.apache.axis2.AxisFault;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.rpc.client.RPCServiceClient;
import org.ogce.jobsubmissionservice.databean.*;

public class SubmitPaCEJob{
public static void main(String[] args) throws AxisFault {
int option = 0;
String serviceLoc = "http://localhost:8080/axis2/services/JobSubmissionService";
String serviceMethod = "submitJob";
String myproxy_username = null;
String myproxy_passwd = null;
for (int j = 0; j < args.length; j++){
String argVal = args[j];
if (argVal.equals("-s")) option = 1;
else if (argVal.equals("-l")) option = 2;
else if (argVal.equals("-p")) option = 3;
else if (option > 0){
if (option ==1)
serviceLoc = argVal;
else if (option ==2)
myproxy_username = argVal;

else if (option ==3)
myproxy_passwd = argVal;
}
}

String [] inputFileString ={
"http://validURLS:8080/tmp/PaCEexample/Brassica_rapa.mRNA.EST.fasta.PaCE",
"http://validURLS:8080/tmp/PaCEexample/Phase.cfg"
};
String rslString =
"(jobtype=mpi)"+
"(count=4)"+
"(hostCount=2)"+
"(maxWallTime=00:15)"+
"(queue=DEBUG)"+
"(arguments= Brassica_rapa.mRNA.EST.fasta.PaCE 33316 Phase.cfg)";

String [] outputFileString = {
"estClust.33316.3.PaCE",
"ContainedESTs.33316.PaCE",
"estClustSize.33316.3.PaCE",
"large_merges.33316.9.PaCE"};

try{
CondorJob cj = new CondorJob();
cj.setExecutable("/N/u/leesangm/BigRed/bin/PaCE_v9");
cj.setTransfer_input_files(inputFileString);
cj.setGrid_resource("gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler");
cj.setTransfer_output_files(outputFileString);
cj.setGlobusrsl(rslString);
cj.setMyProxyHost("myproxy.teragrid.org:7512");
cj.setMyProxyNewProxyLifetime("7200");
cj.setMyProxyCredentialName(myproxy_username);
cj.setMyProxyPassword(myproxy_passwd);
cj.setMyProxyRefreshThreshold("3600");


System.out.println(cj.toString());

RPCServiceClient serviceClient = new RPCServiceClient();

Options options = serviceClient.getOptions();

EndpointReference targetEPR = new EndpointReference(serviceLoc);

options.setTo(targetEPR);

QName query = new QName("http://jobsubmissionservice.ogce.org/xsd", serviceMethod);
Class [] returnTypes = new Class []{JobMessage[].class};
Object[] queryArgs = new Object[] {cj};
Object [] response = serviceClient.invokeBlocking(query,queryArgs,returnTypes);
JobMessage[] result = (JobMessage[])response[0];

System.out.println(result[0].toString());
}catch (Exception e){
e.printStackTrace();
}
}


private static void usage(){
System.out.println("Usage: submit_job -s \n"+
"-l \n"+
"-p \n"+
"==========================================================="+
"\n[Example]:\n"+
"submit_job "+
"-s http://localhost:8080/axis2/services/JobSubmissionService "+
"-l yourusername "+
"-p yourpassword ");
return;
}
}

Tuesday, April 29, 2008

Running PaCE on BigRed

PaCE is software for clustering large collections of Expressed Sequence Tags (ESTs). BigRed provides the PaCE package for internal use only. Here is an example condorG job submission to BigRed for the PaCE package. Note: I set the "OutputFolder" parameter in Phase.cfg to ".". This lets the GRAM job manager put all of the output files into the Globus scratch directory.
============================================================

executable = /N/u/leesangm/BigRed/bin/PaCE_v9
transfer_executable = false
should_transfer_files = true
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/EST/data/Brassica_rapa.mRNA.EST.fasta.PaCE, /home/leesangm/EST/data/Phase.cfg
universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = estClust.33316.3.PaCE
error = PaCE.err.$(Cluster)
err = PaCE.standardErr.$(Cluster)
log = PaCE.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)(queue=DEBUG)(maxWallTime=00:15)\
(count = 4)\
(hostCount = 2)\
(maxWallTime=00:15)\
(arguments= 'Brassica_rapa.mRNA.EST.fasta.PaCE' '33316' 'Phase.cfg')

queue

Wednesday, April 23, 2008

Sample code: Access Job Submission Service from PHP page

If you installed WSF/PHP with your PHP server, you can try simple test PHP pages to access the Job Submission Service. Please make sure the SOAP body follows the style defined in the WSDL file. In my case, I could easily get the XML string from my SOAP monitor. However, you can simply download my examples and use them as your templates.

(1)Example of the SubmitJob operation
source code(mpi job)
source code(perl job)
Please note that you have to replace the username and password with your valid teragrid account.

(2)Example of the job management operations
source code(GetStatus)
source code(GetError)
source code(GetLog)

(3)Example of the retrieving output operation
source code(GetOutput)

Access to a Web Service from your PHP page

If you want to access an Axis2-based Web Service, such as the OGCE local services (Job Submission Service and File Agent Service), from your PHP page, there are several ways to do it. NuSOAP and PEAR SOAP provide APIs for Web Services. PHP also has its own built-in SOAP libraries.

Since I'm a complete beginner in PHP, I'm not that familiar with the advanced design issues of PHP-based applications. However, WSO2's WSF/PHP was a good starting point for me. In particular, it gave us a proof of concept of our service interacting with PHP pages through the standard WSDL interface.

You can use WSF/PHP both to build a Web Service and to build a client of a Web Service. In my case, I wanted to create a PHP client accessing a Web Service running in the Axis2 container. First, WSF/PHP provides quite a simple interface for PHP clients; for examples, please refer to the next blog message. Basically, I needed to provide the EPR of the service and the XML string that goes in the SOAP body. Besides the ease of use, I was happy that WSF/PHP allows users control over the SOAP message when it is needed (such as the SOAP version).

I've tried WSF/PHP with PHP 5.1.1 on a Linux box. Here is a step-by-step guide to the installation.

Step 1. Install the Apache HTTP server (if you don't have one already)
Download PHP 5.1.1 and install it
Download the WSO2 WSF/PHP source from the project web site

Step 2. Go to the directory of the WSF/PHP source code and run:
./configure
make
make install

Step 3. In php.ini (it will be in /usr/local/lib/php.ini if you didn't change the location),
add the following lines:

extension=wsf.so
extension=xsl.so
extension_dir="/usr/local/lib/php/extensions/no-debug-non-zts-***".
include_path = "/home/username/php/wso2-wsf-php-src-1.2.1/script"

Step 4. Copy the sample code included in the code distribution to the Web server's document root and test http://localhost/samples/

Tuesday, March 11, 2008

Running AMBER-pbsa on BigRed:[4] Parallel pmemd through condorG

There were a few obstacles to submitting AMBER jobs through the condorG system.

(1) loadleveler specific commands
BigRed uses LoadLeveler as its batch system. Compared to PBS or LSF, LoadLeveler has some distinctive keywords, such as class instead of queue, and it uses a machine list file. But we didn't have any problems with those when submitting jobs through condorG; the machine list file is generated and used automatically by the job script created by the job manager.

(2) passing arguments
AMBER takes its input/output/-ref files as command-line arguments. You have to include those arguments in the globusrsl string. The condor keyword "arguments" is meant to carry the arguments for mpirun; however, it didn't really work for mpirun arguments either.

(3) specifying number of process and machine
LoadLeveler requires keywords for setting the number of nodes and processes, such as node and tasks_per_node. mpirun also provides the -np argument to specify the total number of processes used for the job. To pass the right values to LoadLeveler and mpirun, you have to specify them in the globusrsl: "count" in the globusrsl string is mapped to the -np value for mpirun, and "hostCount" is mapped to the "node" value in the job script generated by the job manager. I could not find how to specify tasks_per_node; however, the job manager generated it somehow based on the values of node and -np.

(4) script
This is the condorG script for this job submission.
========================================================================
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/pmemd.MPI
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster), ZINC04273785.crd
error = amber.err.$(Cluster)
log = amber.log.$(Cluster)
x509userproxy = /tmp/x509up_u500

globusrsl = (jobtype=mpi)\
(count=16)\
(hostCount=4)\
(maxWallTime=00:15)\
(queue=DEBUG)\
(arguments= -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd )


queue

=========================================================================

Friday, March 7, 2008

Running AMBER-pbsa on SDSC machines:[1] Serial Job interactively

Good news! There are AMBER installations on three of the SDSC machines: DataStar, BlueGene, and TeraGrid. I could not run the serial example on BlueGene, because there was no "sander" executable under the amber9 installation. However, here are some guidelines for running AMBER on SDSC machines.
All of the machines keep the AMBER installation under a directory with the same name:
/usr/local/apps/amber9
Therefore, just set the environment and run the same commands on each account.


leesangm/amber> setenv AMBERHOME /usr/local/apps/amber9
leesangm/amber> set path = ( $path $AMBERHOME/exe )
leesangm/amber> $AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Running AMBER-pbsa on NCSA machines:[1] Serial Job interactively

Running on Tungsten and Cobalt takes much longer than I expected (Tungsten: 55 mins, Cobalt: 32 mins), longer than on BigRed.

1. tungsten
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/AMBER/Amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
And get the same test package of amber test and run the command,
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

2. cobalt
Set environment variables,
[leesangm@tund mm_pbsa]$ setenv AMBERHOME /usr/apps/chemistry/amber/amber9/amber9
[leesangm@tund mm_pbsa]$ set path = ( $path $AMBERHOME/exe )
And get the same test package of amber test and run the command,
$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Running AMBER-pbsa on BigRed:[3] Serial Job submit through CondorG

[Step 1] First check that you have the required environment setup in your .soft file. My .soft file looks like:
#
# This is the .soft file.
# It is used to customize your environment by setting up environment
# variables such as PATH and MANPATH.
# To learn what can be in this file, use 'man softenv'.
#
#
@bigred
@amber9
@teragrid-basic
@globus-4.0
@teragrid-dev
+mpich-mx-ibm-64


[Step 2] Create a condor script including the relevant arguments. I put all the required arguments in the "arguments" line of the script. I got results using both the batch system and system fork. Don't forget to transfer back the output file. My test script file is the following:
executable = /N/soft/linux-sles9-ppc64/amber9-ibm-64/exe/sander
arguments = -O -i amber_min.in -o min.out.$(Cluster) -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/bio/mm_pbsa/amber_min.in, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_com.top, /home/leesangm/bio/mm_pbsa/ZINC04273785.crd, /home/leesangm/bio/mm_pbsa/ZINC04273785_ini.crd

universe = grid
grid_resource = gt2 gatekeeper.bigred.iu.teragrid.org/jobmanager-loadleveler
transfer_output_files = min.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
x509userproxy = /tmp/x509up_u500
queue

Thursday, March 6, 2008

Running AMBER-pbsa on BigRed:[2] Serial-LoadLeveler

IU BigRed's LoadLeveler provides several queues (classes), such as DEBUG, LONG, and NORMAL. The DEBUG queue has a limit of 4 hours maximum job CPU time and 15 minutes maximum processor CPU time. I could run my serial job in the DEBUG queue, and it took 14 minutes. llclass shows the complete list of queues and their descriptions.

Step 1. setup the environment in .soft file
@amber9
+mpich-mx-ibm-64

Step 2. go to the work directory

Step 3. llsubmit serial.job

Running AMBER-pbsa on BigRed:[1] Serial-Interactive

Step 1. Setup the environment in .soft file
@amber9
+mpich-mx-ibm-64

Step 2. Go to the work directory and run the following command

$AMBERHOME/exe/sander -O -i amber_min.in -o min.out -c ZINC04273785_ini.crd -p ZINC04273785_com.top -r ZINC04273785.crd -ref ZINC04273785_ini.crd

Step 3. The output file is updated every 2 minutes; it took 20 minutes for the job to finish completely.

Monday, February 18, 2008

Draft of PolarGrid database table design

CREATE DATABASE PolarGrid;
use PolarGrid


# possible entry unit of dataset
# CREATE TABLE Expedition{
# ExpeditionID bigint,
# }
# possible entry unit of dataset
# CREATE TABLE Radar{
# RadarID bigint,
# }

#
# DataChunk
#
# DataChunk is a unit of dataset which is identified by
# (1) spatial information
# (2) temporal information
# (3) triplet of radar information (waveform, transmit antenna, receive antenna)
#
CREATE TABLE DataChunk(
DataChunkID BIGINT NOT NULL AUTO_INCREMENT,
UUID VARCHAR(255),
Description VARCHAR(255),
SamplingFrequency int,
SampleAverage int,
NumberOfWaveform int,
DSPMode VARCHAR(255),
SystemDelay int,
StartPoint point,
StopPoint point,
StartUTC double,
StopUTC double,
Microformat MEDIUMBLOB,
CreationTimestamp timestamp,
RevisionTimestamp timestamp,
PGContactID bigint,
PRIMARY KEY(DataChunkID),
INDEX(StartPoint),
INDEX(StopPoint),
INDEX(StartUTC),
INDEX(StopUTC)

);

#
# FileObject:
#
# FileObject represents the minimum unit of a dataset. In general
# we assume that this object can be instrument data, an
# output visualization file, or a revised data file.
# Please note that WaveformName, TXAntennaName, and RXAntennaName
# are taken from the file name. There is no validation of these names
# against the antenna/waveform tables.
#
CREATE TABLE FileObject(
FileObjectID bigint NOT NULL AUTO_INCREMENT,
DataChunkID bigint,
UUID VARCHAR(255),
FileName VARCHAR(255),
RecordTimestamp timestamp,
RadarType VARCHAR(255),
DistributionFormat VARCHAR(255),
WaveformName VARCHAR(255),
TXAntennaName VARCHAR(255),
RXAntennaName VARCHAR(255),
OnlineResource VARCHAR(255),
CreationTimestamp timestamp,
RevisionTimestamp timestamp,
PRIMARY KEY (FileObjectID),
INDEX(DataChunkID),
INDEX(WaveformName),
INDEX(TXAntennaName),
INDEX(RXAntennaName),
INDEX(RecordTimestamp)
);

#
# Waveform
#
# This table defines a waveform transmitted between antennas. Each
# radar system can have several different waveforms that it
# can transmit, and a waveform transmitted on a given transmit antenna
# can be received on any combination of antennas. Each row
# describes a single waveform that is used by a data chunk.
#

CREATE TABLE Waveform(
WaveformID bigint NOT NULL AUTO_INCREMENT,
DataChunkID bigint,
WaveformName VARCHAR(255),
StartFrequency int,
StopFrequence int,
PulseWidth double,
ZeroPiMode int,
PRIMARY KEY (WaveformID),
INDEX(DataChunkID)
);

#
# DataAcquisition
#
# This table defines how we describe the setup of an antenna.
# This information is included for the waveform and the data chunk.
# The AssociationType field specifies whether this setup information is
# used for a waveform or a data chunk. Similarly, the AssociationID field
# specifies the ID that identifies the exact item.
#

CREATE TABLE DataAcquisition(
DataAcquisitionID bigint NOT NULL AUTO_INCREMENT,
NumberOfSamples int,
SampleDelay int,
BlankingTime int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (DataAcquisitionID),
INDEX(AssociationType),
INDEX(AssociationID)
);

#
# Antenna
#
# This table specifies how we describe the antenna.
#

CREATE TABLE Antenna(
AntennaID bigint NOT NULL AUTO_INCREMENT,
AntennaName VARCHAR(255),
AntennaType VARCHAR(255),
Antennuation int,
AssociationType VARCHAR(255),
AssociationID bigint,
PRIMARY KEY (AntennaID),
INDEX(AssociationType),
INDEX(AssociationID)
);


#
# PGContact
#
# This table specifies contact information.
#
#

CREATE TABLE PGContact(
PGContactID bigint NOT NULL AUTO_INCREMENT,
IndividualName VARCHAR(255),
UNIXLoginName VARCHAR(255),
Email VARCHAR(255),
OrganizationName VARCHAR(255),
PositionName VARCHAR(255),
Voice VARCHAR(255),
Facsimile VARCHAR(255),
Address VARCHAR(255),
OnlineResource VARCHAR(255),
HoursOfService VARCHAR(255),
ContactInstruction VARCHAR(255),
PRIMARY KEY (PGContactID),
INDEX(UNIXLoginName),
INDEX(Email)
);
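
To make the AssociationType/AssociationID convention concrete, here is a small JDBC sketch (connection details, driver setup, and the inserted values are illustrative only) that records an antenna used by a particular data chunk:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertAntenna {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/PolarGrid", "user", "password");

        // The antenna row points back at the item it belongs to:
        // AssociationType says which table, AssociationID says which row.
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO Antenna (AntennaName, AntennaType, Antennuation, " +
                "AssociationType, AssociationID) VALUES (?, ?, ?, ?, ?)");
        ps.setString(1, "RX-1");
        ps.setString(2, "receive");
        ps.setInt(3, 0);
        ps.setString(4, "DataChunk");   // associated with the DataChunk table
        ps.setLong(5, 42L);             // hypothetical DataChunkID
        ps.executeUpdate();

        ps.close();
        conn.close();
    }
}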

Friday, February 15, 2008

PolarGrid database table (initial draft)

These are the basic database tables that are directly related to the RSS-feed microformat. Marie and I went through the workflow for generating the jpg images and finally agreed on this table design. Thank you so much, Marie!! There are still some parts which are not that clear to us. As things are clarified, I'll incorporate them into these tables.

#CREATE TABLE Expedition{
# ExpeditionID bigint,
#}

CREATE TABLE DataChunk{
DataChunkID bigint NOT NULL,
UUID VARCHAR(255),
Description VARCHAR(255),
SamplingFrequency int,
SampleAverage int,
NumberOfWaveform int,
DSPMode VARCHAR(255),
StartPoint point,
StopPoint point,
StartUTC double,
StopUTC double,
PRIMARY KEY ('DataChunkID')
}

CREATE TABLE FileObject{
FileObjectID bigint NOT NULL,
DataChunkID bigint,
UUID VARCHAR(255),
FileName VARCHAR(255),
RadarType VARCHAR(255),
Timestamp timestamp,
FileType VARCHAR(255),
WaveformName VARCHAR(255),
TXAntennaName VARCHAR(255),
RXAntennaName VARCHAR(255),
OnLink VARCHAR(255),
PRIMARY KEY ('FileObjectID')
}

CREATE TABLE Waveform{
WaveformID bigint NOT NULL,
DataChunkID bigint,
WaveformName VARCHAR(255),
StartFrequency int,
StopFrequence int,
PulseWidth double,
ZeroPiMode int,
PRIMARY KEY ('WaveformID')
}

CREATE TABLE DataAcquisition{
DataAcquisitionID bigint NOT NULL,
NumberOfSamples int,
SampleDelay int,
BlankingTime int,
AssociationType VARCHAR(255),
AssociationID bigint
PRIMARY KEY ('DataAcquisitionID')
}

CREATE TABLE Antenna{
AntennaID bigint NOT NULL,
AntennaName VARCHAR(255),
AntennaType VARCHAR(255),
Antennuation int,
AssociationType VARCHAR(255),
AssociationID bigint
PRIMARY KEY ('AntennaID')
}

Wednesday, February 13, 2008

[PG] 80 TB of mobile data



40 x 2TB drives... These little guys don't have any idea about being in the -40F for months.. Good luck, boxes!
For more pictures,
http://www.ussg.indiana.edu/~mrlink/gallery/PolarGrid

Wednesday, February 6, 2008

[PG]mysql GIS [1] Creating Spatial data and using functions

With MySQL version 5 or higher, you can store GIS data and issue queries over it. MySQL provides data types supporting the OpenGIS requirements. Although MySQL does not support full spatial analysis, the MBR (minimum bounding rectangle) based support is very useful if you want simple bounding-box style queries. You don't need any additional packages to use this.

I found a useful manual which covers almost everything I was looking for.
http://www.browardphp.com/mysql_manual_en/manual_Spatial_extensions_in_MySQL.html
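
As a sketch of the kind of bounding-box query this enables over the DataChunk table from my draft schema above (MBRContains and GeomFromText are MySQL spatial functions; the JDBC connection details and the coordinates are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class BoundingBoxQuery {
    public static void main(String[] args) throws Exception {
        Class.forName("com.mysql.jdbc.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/PolarGrid", "user", "password");
        Statement stmt = conn.createStatement();

        // Find data chunks whose start point falls inside a bounding box.
        // MBRContains compares minimum bounding rectangles only, which is fast.
        String sql =
                "SELECT DataChunkID, UUID FROM DataChunk " +
                "WHERE MBRContains(GeomFromText(" +
                "'POLYGON((-70 155, -70 160, -75 160, -75 155, -70 155))'), StartPoint)";

        ResultSet rs = stmt.executeQuery(sql);
        while (rs.next()) {
            System.out.println(rs.getLong("DataChunkID") + " " + rs.getString("UUID"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}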

Thursday, January 17, 2008

Running parallel pw.x on the LoneStar of TACC: on site/condorG/condor-birdbath APIs

(1) submit to the LSF queue on Lonestar
bsub -I -n 4 -W 0:05 -q development -o pwscf.out ibrun /home/teragrid/tg459247/vlab/espresso/bin/pw.x < /home/teragrid/tg459247/vlab/__CC5f_7/Pwscf_Input

(2) submit through a condorG script file
Globus RSL parameters are available at http://www.globus.org/toolkit/docs/2.4/gram/gram_rsl_parameters.html
The actual script file is the following:
=============================================
executable = /home/teragrid/tg459247/vlab/bin/pw_mpi.x
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/008-O-ca--bm3.vdb,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/__cc5_7,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/Mg.vbc3
universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-lsf
output = tmpfile.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
input = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/Pwscf_Input
x509userproxy = /tmp/x509up_u500
globusrsl = (environment=(PATH /usr/bin))\
(jobtype=mpi)\
(count=4)\
(queue=development)\
(maxWallTime=5)

queue


(3) submit through condor birdbath APIs
Almost the same as the serial job submission, except for setting the wall clock time. When you generate the globusrsl, add

(maxWallTime=yourWallMaxTime)
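
For example, with the birdbath approach described in the January 4 post below, the corresponding entry in the ClassAdStructAttr[] extraAttributes array might look like the following fragment (mirroring the RSL values from the script above; this is not a complete submission):

new ClassAdStructAttr("GlobusRSL", ClassAdAttrType.value2,
    "\"(environment=(PATH /usr/bin))(jobtype=mpi)(count=4)(queue=development)(maxWallTime=5)\""),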

Friday, January 11, 2008

Job submission to TG machines

Color code
Blue: Serial pw.x is ready to run and accessible by Task Executor
Red: pw.x installation failed.
Green: Serial + MPI pw.x is ready to run and accessed from Task Executor
==================================================
machine hostname architecture job sub job manager
-----------------------------------------------------------------------------------------
BigRed login.bigred.iu.teragrid.org ppc64 GT4 loadleveler
*QueenBee login-qb.lsu-loni.teragrid.org GT4
NCAR tg-login.frost.ncar.teragrid.org i686 GT4
*Abe login-abe.ncsa.teragrid.org Intel64 GT4 pbs
Cobalt login-co.ncsa.teragrid.org ia64 GT4/GT2 pbs/fork
Mercury login-hg.ncsa.teragrid.org ia64 GT4/GT2 pbs/fork
Tungsten login-w.ncsa.teragrid.org ia32 GT4/GT2 LSF/fork
ORNL tg-login.ornl.teragrid.org i686 GT4/GT2 pbs/fork
*BigBen tg-login.bigben.psc.teragrid.org AMD Opteron GT4/GT2 pbs
*Rachel tg-login.rachel.psc.teragrid.org GT4/GT2 pbs
Purdue tg-login.purdue.teragrid.org GT4/GT2 pbs
*sdsc BG bglogin.sdsc.edu ppc64 GT4/GT2 no job manager??
*sdsc DS dslogin.sdsc.edu 002628DA4C00 GT4/GT2 loadleveler/fork
sdsc IBM tg-login.sdsc.teragrid.org ia64 GT4/GT2 pbs/fork
lonestar tg-login.lonestar.tacc.teragrid.org ia64 GT4/GT2 LSF/fork
maverik tg-viz-login.tacc.teragrid.org sun4u GT4/GT2 sge/fork
*ranger tg-login.ranger.tacc.teragrid.org GT4 sge/fork
IA-VIS tg-viz-login.uc.teragrid.org i686 GT4/GT2 pbs
IS-64 tg-login.uc.teragrid.org ia64 GT4/GT2 pbs/fork
=================================================

*QueenBee : could not login
*Abe doesn't support single-sign-on
*BigBen: could not login
*Abe: could not login
*Rachel: could not login
*Purdue: could not login
*sdsc BlueGene: unknown job manager?
*sdsc DataStar: unusual architecture?
*ranger: could not login

Compiling espresso in the TG machines

To run the executables on the TG machines, you first have to get your executables ready on each site.
Here are the instructions for installing espresso for serial runs. README.install was very useful.

* Cobalt, Mercury, and Tungsten NCSA

step 1. copy espressoXXX.tar

step 2. In the espresso directory, set the environment variables to select the architecture.
setenv BIN_DIR /home/ac/quakesim/vlab/espresso/bin
setenv PSEUDO_DIR /home/ac/quakesim/vlab/espresso/pseudo
setenv TMP_DIR /home/ac/quakesim/vlab/espresso/tmp
setenv ARCH linux64
setenv PARA_PREFIX
setenv PARA_POSTFIX

note: for serial runs, PARA_PREFIX MUST be left empty. For parallel runs,

setenv PARA_PREFIX "mpirun -np 2"
setenv PARA_POSTFIX

step 2.5 Make sure you have tmp, pseudo, and bin directories under your espresso directory.

step 3. ./configure

step 4. make all

* Lonestar parallel pw.x, ph.x
step 1.
setenv PARA_PREFIX "mpirun"
step 2. setenv ARCH linux64
step 3. ./configure
step 4. make all

Submit job to pbs[1]: on site with command line

(0) Create a script file which displays the hostname of the machine. Name the file "test".
#!/bin/sh
/bin/hostname
(1) submit job test to the pbs queue.
qsub -o test.out -e test.err test
(2) check result file

*Useful guide
http://www.teragrid.org/userinfo/jobs/pbs.php

Friday, January 4, 2008

Submitting a job to LSF job queue [3]: through CondorG with birdbath APIs

To submit a job to LSF through condorG with the birdbath APIs, your ClassAdStructAttr array should contain the required attributes. I retrieved the list of keywords from my previous example: Submitting a job to LSF job queue [2]. After submitting a condor job as in example [2], run condor_q -l and you will get a list of valid keywords. Please note that even if you use a wrong keyword, condor WON'T throw any exception, and your job WON'T go through (now I'm pulling my hair out). Therefore, get something running correctly first, even if it's not with the birdbath APIs, and start from the valid keywords generated by that example.

* The In, Out, and Err attributes specify standard input, output, and error redirections. Therefore, if your executable uses standard input/output redirected to files, those files should be specified with these attributes.

* In this case, pw.x generates multiple files besides the stdout output. The TransferOutput attribute specifies the files that should be transferred after the process is done.

* The GlobusRSL attribute is equivalent to the globusrsl keyword in the script file for command-line submission with condor_submit.

* Many many thanks to Marlon for helping me out!!


The actual ClassAdStructAttr[] is the following:
------------------------------------------------------------------------------------------------------------------------------
ClassAdStructAttr[] extraAttributes =
{
new ClassAdStructAttr("GridResource", ClassAdAttrType.value3, gridResourceVal),
new ClassAdStructAttr("TransferExecutable",ClassAdAttrType.value4,"FALSE"),
new ClassAdStructAttr("Out", ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".out"),
new ClassAdStructAttr("UserLog",ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".log"),
new ClassAdStructAttr("Err",ClassAdAttrType.value3, tmpDir+"/"+"pwscf-"+clusterId+".err"),
new ClassAdStructAttr("In",ClassAdAttrType.value3, workDir+"/"+"Pwscf_Input"),
new ClassAdStructAttr("ShouldTransferFiles", ClassAdAttrType.value2,"\"YES\""),
new ClassAdStructAttr("WhenToTransferOutput", ClassAdAttrType.value2,"\"ON_EXIT\""),
new ClassAdStructAttr("StreamOut", ClassAdAttrType.value4, "TRUE"),
new ClassAdStructAttr("StreamErr",ClassAdAttrType.value4,"TRUE"),

new ClassAdStructAttr("TransferOutput",ClassAdAttrType.value2,
"\"pwscf.pot, pwscf.rho, pwscf.wfc, pwscf.md, pwscf.oldrho, pwscf.save, pwscf.update\""),

new ClassAdStructAttr("TransferOutputRemaps",ClassAdAttrType.value2,
"\"pwscf.pot="+tmpDir+"/"+"pwscf-"+clusterId+
".pot; pwscf.rho="+tmpDir+"/"+"pwscf-"+clusterId+
".rho;pwscf.wfc="+tmpDir+"/"+"pwscf-"+clusterId+
".wfc; pwscf.md="+tmpDir+"/"+"pwscf-"+clusterId+
".md; pwscf.oldrho="+tmpDir+"/"+"pwscf-"+clusterId+
".oldrho; pwscf.save="+tmpDir+"/"+"pwscf-"+clusterId+
".save; pwscf.update="+tmpDir+"/"+"pwscf-"+clusterId+".update\""),

new ClassAdStructAttr("GlobusRSL", ClassAdAttrType.value2,
"\"(queue=development)(environment=(PATH /usr/bin))(jobtype=single)(count=1)\""),

new ClassAdStructAttr("x509userproxy",ClassAdAttrType.value3,proxyLocation),

};
------------------------------------------------------------------------------------------------------------------------------

Pwscf output files?

On my local machine, pw.x generates these output files besides the standard output:
pwscf.pot, pwscf.rho, pwscf.wfc
unless I reuse the tmp directory.

However, on Lonestar, it generates:
pwscf.md pwscf.oldrho pwscf.pot pwscf.rho pwscf.save pwscf.update pwscf.wfc

To be safe, I transfer all of the possible files from the remote machine.

Submitting a job to LSF job queue [2]: through CondorG with condor_submit

Next I tried to submit the same pw.x job through the condorG command line, condor_submit. In this example, I use input files stored on my local machine, so the script file has to specify them, as in the following lines.

---------------------------------------------------------------------------------------------------------------------------------
executable = /home/teragrid/tg459282/vlab/pw.x
transfer_executable = false
should_transfer_files = yes
when_to_transfer_output = ON_EXIT
transfer_input_files = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/008-O-ca--bm3.vdb,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/__cc5_7,/home/leesangm/catalina/VLAB_Codes/__CC5f_7/Mg.vbc3

universe = grid
grid_resource = gt2 tg-login.tacc.teragrid.org/jobmanager-lsf
output = tmpfile.out.$(Cluster)
error = condorG.err.$(Cluster)
log = condorG.log.$(Cluster)
input = /home/leesangm/catalina/VLAB_Codes/__CC5f_7/Pwscf_Input

globusrsl = (queue=development)\
(environment=(PATH /usr/bin))\
(jobtype=single)\
(count=1)

queue
---------------------------------------------------------------------------------------------------------------------------------

This script file is almost the same as a normal condor submit script except for the globusrsl keyword. This is a simple case for a serial job; for parallel jobs, it should be modified.

Then submit the condor job:
condor_submit script_file_name

Submitting a job to LSF job queue [1] : On the Cluster

* Useful LSF commands:
bsub: submit jobs
bjobs: display information about jobs
bkill: send a signal to kill a job
For more commands,
http://its.unc.edu/dci/dci_components/lsf/lsf_commands.htm

* Useful options of bsub command
-q : name of the queue
-n: desired number of processors
-W: Walltime limit in batch jobs -W[hours]:[minutes]
-i : input file
-o : output file
-e: error file

Example LSF submission of pw.x on Lonestar:
bsub -q development -n 1 -W 15 -i "Pwscf_Input" -o "myout.out" ../pw.x

Thursday, January 3, 2008

Building a Client of the Task Executor

To generate a client of the vlab Task Executor service, we first have to create the stub code and compile it.
If the service is running on localhost, WSDL file is located at,
http://localhost:8080/task-executor/services/TaskExecutor?wsdl
With this WSDL file, we can generate Java classes with WSDL2Java, which is included in the Axis package.
java org.apache.axis.wsdl.WSDL2Java http://localhost:8080/task-executor/services/TaskExecutor?wsdl
Then compile and jar the Java code.
The following jar files are required to run WSDL2Java:
  • axis-1.4.jar
  • activation-1.1.jar
  • commons-discovery-0.2.jar
  • saaj.jar
  • jaxrpc.jar
  • mail-1.4.jar
  • wsdl4j-1.5.1.jar
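
Once the stubs are generated and compiled, a client can use the generated locator classes. The sketch below is hypothetical: the locator, port, and operation names depend entirely on what WSDL2Java derives from the Task Executor WSDL, so treat every identifier here as a placeholder.

// Hypothetical names; WSDL2Java derives the actual class and method names
// from the service, port, and portType names in the WSDL.
public class TaskExecutorClient {
    public static void main(String[] args) throws Exception {
        TaskExecutorServiceLocator locator = new TaskExecutorServiceLocator();
        // Point the generated stub at the running service.
        locator.setTaskExecutorEndpointAddress(
                "http://localhost:8080/task-executor/services/TaskExecutor");
        TaskExecutor port = locator.getTaskExecutor();

        // Call an operation defined in the WSDL (the name here is illustrative only).
        String result = port.executeTask("someTaskDescription");
        System.out.println(result);
    }
}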