Friday, November 21, 2008

Test #4(2mil): Nov.21.2008

Full sequence test for 2 million sequences.

(1) Resource setup: BigRed (150), Ornl (80), Cobalt (80)
(2) Service setup: Status check interval: 60 secs
Job queue scan interval: 60 secs
Job queue size: 100
(3) Input files: 2mil.tar copied to each cluster's $TG_CLUSTER_SCRATCH directory.
Referenced in the job arguments by the full path to the input files
(4) Output files: staged out to the swarm host

(5) Client side setup: input files are located on my desktop (the same machine as the swarm host).
The client scans the directory and finds files that contain more than one sequence (using the Unix grep command through Java Runtime), then sends requests to swarm in batches of 10 per RPC call (sketched after the open-issues list below).
  • Total duration of the submission: 170364307 milliseconds (around 47.3 hours).
  • Total number of jobs submitted: 75533
  • Total number of files scanned: 536825
(6) Completed Jobs : To be added
(7) Held Jobs: To be added
(8) Open Issues:
Submission time needs to be improved.
Reasons:
  • Loading 536825 objects representing the filenames takes too much memory. [Approach]: Use a FileFilter and load a partial list at a time.
  • Using Java Runtime: Java Runtime requires extra memory to execute the system fork. [Approach]: Try counting the sequences with a Java FileInputStream instead (see the sketch after this list).
  • Running the client and the host on the same machine. [Approach]: Try running the client on a different machine.
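The sketch below (class and method names are illustrative, not the actual client code) shows the direction the open issues point to: listing only part of the directory through a FileFilter, counting sequences by reading each file directly in Java instead of forking grep, and grouping candidates into batches of 10 per RPC call. It assumes the cluster files are FASTA-formatted (one '>' header line per sequence) and share a common file suffix; whether this is actually faster than grep for 536825 small files would still need to be measured.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

public class ClusterScanner {

    // Count '>' header lines by reading the file directly in Java
    // (no Runtime.exec, so no fork of the whole JVM).
    static int countSequences(File f) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f)));
        try {
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    count++;
                }
            }
            return count;
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args[0]);
        // FileFilter keeps the listing down to candidate cluster files only;
        // the ".clu" suffix is an assumed naming scheme.
        File[] clusterFiles = dir.listFiles(new FileFilter() {
            public boolean accept(File f) {
                return f.isFile() && f.getName().endsWith(".clu");
            }
        });

        List<String> batch = new ArrayList<String>();
        for (File f : clusterFiles) {
            if (countSequences(f) > 1) {       // only multi-sequence clusters get assembled
                batch.add(f.getAbsolutePath());
                if (batch.size() == 10) {      // 10 jobs per RPC call, as in this test
                    submit(batch);
                    batch.clear();
                }
            }
        }
        if (!batch.isEmpty()) {
            submit(batch);
        }
    }

    // Stand-in for the actual swarm submission call.
    static void submit(List<String> jobs) {
        System.out.println("submitting batch of " + jobs.size() + " jobs");
    }
}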

Friday, November 14, 2008

Java Runtime class and VM memory

For the swarm service, I wrote a client kit to crawl a directory and find cluster files that need to be assembled. To access the files and count the number of gene sequences, I used the Java Runtime/Process classes. With the 2 million sequences, I got around 600000 clustered files, which the crawler program visited. In particular, when I tried to create the DOM object to interact with the Web service, the crawler started to throw an IOException (memory allocation error). My Java options were -Xmn 512M -Xmx 1024M.

On Linux, Runtime.exec does a fork and exec. That means the child process momentarily needs as much virtual memory (physical + swap) as the current Java process is using, so you effectively need double. Therefore, if I specify a 512M initial heap, the total virtual memory available must be larger than 1024M.
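For reference, a stripped-down version of that kind of call looks like the following (illustrative names; it assumes FASTA-style files, so "grep -c" on the '>' header character gives the sequence count). Every call forks a child that briefly mirrors the parent JVM's virtual memory, which is what makes it expensive.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class GrepCount {

    // Forks grep to count lines containing '>' (one per sequence in FASTA files).
    // The fork momentarily needs as much virtual memory as the parent JVM.
    static int countSequences(String path) throws IOException, InterruptedException {
        Process p = Runtime.getRuntime().exec(new String[] { "grep", "-c", ">", path });
        BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line = out.readLine();
            p.waitFor();
            return (line == null) ? 0 : Integer.parseInt(line.trim());
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countSequences(args[0]));
    }
}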

I tried setting my Java options to -Xmn 256M -Xmx 1024M. The crawler slowed down quite a bit, but it no longer threw the IOException.

Friday, November 7, 2008

Test #3(2mil): Nov.07.2008

(1) Starting time: 04:00 pm

(2) Server Setup:

-BigRed : max 5

-Cobalt : max 5

(3) Client Setup:

-Total job max: 20

-Input source: EST Human 2mil

* Note

New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs
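For the record, a minimal sketch of the equivalent client-side setting, assuming the standard Axis2 client API (org.apache.axis2.client.Options); these tests raised the timeout in axis2.xml, but the same six-minute value can also be applied per client in code:

import org.apache.axis2.AxisFault;
import org.apache.axis2.client.Options;
import org.apache.axis2.client.ServiceClient;

public class ClientTimeout {

    // Builds a client whose calls may take up to 6 minutes before timing out,
    // instead of the 20 seconds used in the earlier tests.
    static ServiceClient newClientWithTimeout() throws AxisFault {
        ServiceClient client = new ServiceClient();
        Options options = new Options();
        options.setTimeOutInMilliSeconds(6L * 60L * 1000L);
        client.setOptions(options);
        return client;
    }
}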

Test #2(2mil): Nov.07.2008

(1) Starting time: 02:05 pm

(2) Server Setup:
-BigRed : max 20
-Cobalt : max 20

(3) Client Setup:
-Total job max: 1000
-Input source: EST Human 2mil

* Note
New Setup:
(1) Increased timeout from 20 secs to 6 mins. (axis2.xml)
(2) Decreased resource pool size
(3) Decreased client jobs

Result:
(1) Job submission completed successfully (1000 jobs in total).

References:
First Condor job clusterID: 50537

Test #1(2mil): Nov.07.2008

(1) Starting time: 12:54 pm

(2) Server Setup:
-BigRed max 400
-Cobalt max 200

(3) Client Setup:
-Max job submission: 2000000
-Input files source: EST Human 2mil

Note:
(1) Reading the directory with the 2mil sequences takes less than 20 secs.

(2) Client hung with an HTTP timeout.
(3) Cobalt started to hold jobs with Globus error code 17.