Tuesday, May 12, 2009

Adding EC2 computing nodes to your Hadoop Cluster

To add EC2 instances as slave nodes to an existing Hadoop cluster:

Step1] Launch an EC2 instance from an image that has Hadoop installed. You can list the public images that mention hadoop with:
ec2-describe-images -x all | grep hadoop

Step2] Make sure the Hadoop version is identical on every machine in your cluster.
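
A quick way to check this, as a rough sketch only. It assumes Hadoop lives in the same directory on every machine (the HADOOP_HOME default is a placeholder), that you can already ssh to each host, and that the first line printed by `hadoop version` identifies the release. The function names are made up for illustration.

```shell
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"   # assumed install path, adjust to yours

# Print the first line of `hadoop version` as reported on a remote host.
remote_hadoop_version() {
  ssh "$1" "$HADOOP_HOME/bin/hadoop version" | head -n 1
}

# Compare every given host against the local version and report mismatches.
# Returns non-zero if at least one host differs.
check_versions() {
  local_version=$("$HADOOP_HOME/bin/hadoop" version | head -n 1)
  rc=0
  for host in "$@"; do
    remote_version=$(remote_hadoop_version "$host")
    if [ "$remote_version" != "$local_version" ]; then
      echo "MISMATCH $host: '$remote_version' (local: '$local_version')"
      rc=1
    fi
  done
  return $rc
}

# Example:
# check_versions ec2-111-222-333-444.compute-1.amazonaws.com
```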

Step3] Generate an SSH key pair on your master machine:

ssh-keygen -t rsa

By default, your public key is then stored in:

~/.ssh/id_rsa.pub

Step4] Append your public key (id_rsa.pub) to ~/.ssh/authorized_keys on your EC2 instance. After that you can ssh from the master to the instance without specifying your EC2 keypair.
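
One way to do this, as a sketch: log in once with the EC2 keypair and append the public key over that connection. The function name, the keypair path and the root login are assumptions, so use whatever user your image expects.

```shell
# install_pubkey HOST KEYPAIR_FILE [USER]
# Appends the master's ~/.ssh/id_rsa.pub to authorized_keys on HOST,
# logging in with the EC2 keypair. USER defaults to root (an assumption).
install_pubkey() {
  host=$1
  keypair=$2
  user=${3:-root}
  ssh -i "$keypair" "$user@$host" \
    'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys' \
    < "$HOME/.ssh/id_rsa.pub"
}

# Example (placeholder keypair path):
# install_pubkey ec2-111-222-333-444.compute-1.amazonaws.com ~/my-keypair.pem
```

Appending with >> rather than overwriting keeps any key the instance already trusts, including the EC2 keypair itself.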

Step5] Add your new slave node to hadoop_loc/conf/slaves on the master (hadoop_loc being your Hadoop install directory).
Mine looks like:

localhost
ec2-111-222-333-444.compute-1.amazonaws.com
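
If you add nodes often, a small helper can append a hostname only when it is not listed yet. This is just a sketch; add_slave is a made-up name, and it assumes the slaves file already ends with a newline.

```shell
# add_slave SLAVES_FILE HOSTNAME
# Appends HOSTNAME to the slaves file unless an identical line already exists.
add_slave() {
  slaves_file=$1
  host=$2
  # -x: match the whole line, -F: treat the hostname as a literal string
  if ! grep -qxF "$host" "$slaves_file" 2>/dev/null; then
    printf '%s\n' "$host" >> "$slaves_file"
  fi
}

# Example:
# add_slave hadoop_loc/conf/slaves ec2-111-222-333-444.compute-1.amazonaws.com
```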

Step6] Synchronize or copy your configuration files to the slave nodes:
-hadoop-site.xml
-slaves
-masters
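
The copy can be done with plain scp. The sketch below assumes the conf directory sits at the same path on every machine and that you log in to the slaves as the same user as on the master. The function name is hypothetical, and masters is the file name as it ships in Hadoop's conf directory.

```shell
# sync_conf CONF_DIR HOST...
# Copies the three config files to the same directory on each host.
# Stops at the first host where the copy fails.
sync_conf() {
  conf_dir=$1
  shift
  for host in "$@"; do
    scp "$conf_dir/hadoop-site.xml" "$conf_dir/slaves" "$conf_dir/masters" \
        "$host:$conf_dir/" || return 1
  done
}

# Example:
# sync_conf /usr/local/hadoop/conf ec2-111-222-333-444.compute-1.amazonaws.com
```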

Step7] Format your namenode (only if you need to, e.g. for a brand-new cluster; formatting erases existing HDFS data).

Step8] Start your cluster!
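
For reference, the last two steps map to the stock scripts in Hadoop's bin directory. A minimal sketch, run on the master; the HADOOP_HOME default and the wrapper function names are placeholders of mine.

```shell
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"   # assumed install path, adjust to yours

# Format HDFS. Only for a new cluster: this wipes the namenode's metadata.
format_namenode() {
  "$HADOOP_HOME/bin/hadoop" namenode -format
}

# Start the HDFS and MapReduce daemons on the master and on the hosts in conf/slaves.
start_cluster() {
  "$HADOOP_HOME/bin/start-all.sh"
}

# Example:
# format_namenode     # skip this on a cluster that already holds data
# start_cluster
# "$HADOOP_HOME/bin/hadoop" dfsadmin -report   # the new EC2 datanode should be listed
```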