Sangmi Lee Pallickara: December 2008

To write a simple Java application with HBase APIs, you surely need hadoop and hbase installation on your machine. For this example, I used hadoop installation with pseudo distributed setup on the localhost.
The code is mainly downloaded from the hbase site,
http://hadoop.apache.org/hbase/docs/r0.2.1/api/index.html


import java.io.IOException;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scanner;
import org.apache.hadoop.hbase.io.BatchUpdate;
import org.apache.hadoop.hbase.io.Cell;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MySimpleTest {

 public static void main(String args[]) throws IOException {
   // You need a configuration object to tell the client where to connect.
   // But don't worry, the defaults are pulled from the local config file.
   HBaseConfiguration config = new HBaseConfiguration();

   // This instantiates an HTable object that connects you to the "myTable"
   // table.
   HTable table = new HTable(config, "myTable");

   // To do any sort of update on a row, you use an instance of the BatchUpdate
   // class. A BatchUpdate takes a row and optionally a timestamp which your
   // updates will affect.
   BatchUpdate batchUpdate = new BatchUpdate("myRow");

   // The BatchUpdate#put method takes a Text that describes what cell you want
   // to put a value into, and a byte array that is the value you want to
   // store. Note that if you want to store strings, you have to getBytes()
   // from the string for HBase to understand how to store it. (The same goes
   // for primitives like ints and longs and user-defined classes - you must
   // find a way to reduce it to bytes.)
   batchUpdate.put("myColumnFamily:columnQualifier1",
     "columnQualifier1 value!".getBytes());

   // Deletes are batch operations in HBase as well.
   batchUpdate.delete("myColumnFamily:cellIWantDeleted");

   // Once you've done all the puts you want, you need to commit the results.
   // The HTable#commit method takes the BatchUpdate instance you've been
   // building and pushes the batch of changes you made into HBase.
   table.commit(batchUpdate);

   // Now, to retrieve the data we just wrote. The values that come back are
   // Cell instances. A Cell is a combination of the value as a byte array and
   // the timestamp the value was stored with. If you happen to know that the
   // value contained is a string and want an actual string, then you must
   // convert it yourself.
   Cell cell = table.get("myRow", "myColumnFamily:columnQualifier1");
   String valueStr = new String(cell.getValue());
 
   // Sometimes, you won't know the row you're looking for. In this case, you
   // use a Scanner. This will give you cursor-like interface to the contents
   // of the table.
   Scanner scanner =
     // we want to get back only "myColumnFamily:columnQualifier1" when we iterate
     table.getScanner(new String[]{"myColumnFamily:columnQualifier1"});
 
 
   // Scanners in HBase 0.2 return RowResult instances. A RowResult is like the
   // row key and the columns all wrapped up in a single interface.
   // RowResult#getRow gives you the row key. RowResult also implements
   // Map, so you can get to your column results easily.
 
   // Now, for the actual iteration. One way is to use a while loop like so:
   RowResult rowResult = scanner.next();
 
   while(rowResult != null) {
     // print out the row we found and the columns we were looking for
     System.out.println("Found row: " + new String(rowResult.getRow()) + " with value: " +
      rowResult.get("myColumnFamily:columnQualifier1".getBytes()));
   
     rowResult = scanner.next();
   }
 
   // The other approach is to use a foreach loop. Scanners are iterable!
   for (RowResult result : scanner) {
     // print out the row we found and the columns we were looking for
     System.out.println("Found row: " + new String(result.getRow()) + " with value: " +
      result.get("myColumnFamily:columnQualifier1".getBytes()));
   }
 
   // Make sure you close your scanners when you are done!
   scanner.close();
 }
}

The error java.io.IOException: Incompatible namespaceIDs in the logs of a datanode (/logs/hadoop-hadoop-datanode-.log) might be caused by bug HADOOP-1212. Here is a site which provides how to get around this bug,
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Multi-Node_Cluster)

The complete error message was,

 ... ERROR org.apache.hadoop.dfs.DataNode: java.io.IOException: Incompatible namespaceIDs in /usr/local/hadoop-datastore/hadoop-hadoop/dfs/data: namenode namespaceID = 308967713; datanode namespaceID = 113030094
      at org.apache.hadoop.dfs.DataStorage.doTransition(DataStorage.java:281)
      at org.apache.hadoop.dfs.DataStorage.recoverTransitionRead(DataStorage.java:121)
      at org.apache.hadoop.dfs.DataNode.startDataNode(DataNode.java:230)
      at org.apache.hadoop.dfs.DataNode.(DataNode.java:199)
      at org.apache.hadoop.dfs.DataNode.makeInstance(DataNode.java:1202)
      at org.apache.hadoop.dfs.DataNode.run(DataNode.java:1146)
      at org.apache.hadoop.dfs.DataNode.createDataNode(DataNode.java:1167)
      at org.apache.hadoop.dfs.DataNode.main(DataNode.java:1326)

Sangmi Lee Pallickara

Tuesday, December 9, 2008

renewing proxy

Monday, December 1, 2008

Writing simple Java application with HBase APIs[0]

Hadoop: java.io.IOException: Incompatible namespaceIDs

Related Blogs

Blog Archive

About Me