-- Yi-Kai Tsai (cuma) <yikai@xxxxxxxxxxxxx>, Asia Regional Search Engineering.
--- Begin Message ---
- From: Ryan Rawson <ryanobjc@xxxxxxxxx>
- To: "hbase-user@xxxxxxxxxxxxxxxxx" <hbase-user@xxxxxxxxxxxxxxxxx>
- Date: Mon, 12 Jan 2009 07:11:46 +0900
Hi all,

New user of HBase here. I've been trolling about in IRC for a few days and have been getting great help all around so far. The topic turns to importing data into HBase: I have largeish datasets I want to evaluate HBase performance on, so I've been working at importing said data. I've managed to get some impressive performance speedups, and I chronicled them here:

http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html

To summarize:

- Use the native HBase API in Java or Jython (or presumably any JVM language)
- Disable table auto flush and set a large write buffer (12 MB for me)

At this point I can import an 18 GB, 440M-row comma-separated flat file in about 72 minutes using map-reduce. This is on a 3-node cluster, with all nodes running HDFS, HBase, and MapReduce, and 12 map tasks (4 per node). This hardware is loaner DB hardware, so once I get my real cluster I'll revise and publish new data.

I look forward to meeting some of you next week at the HBase meetup at Powerset!

-ryan
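[Editor's sketch] The two client-side settings mentioned above can be illustrated against the 0.19-era HBase client API; the table name, column family, and row key below are illustrative assumptions, not values from the post, and this fragment needs a running HBase cluster to actually execute:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.BatchUpdate;

public class BulkImportSketch {
    public static void main(String[] args) throws IOException {
        HBaseConfiguration conf = new HBaseConfiguration();
        // "mytable" and the "data:" family are assumed names for illustration.
        HTable table = new HTable(conf, "mytable");

        // Buffer writes client-side instead of flushing one row at a time.
        table.setAutoFlush(false);
        table.setWriteBufferSize(12 * 1024 * 1024); // 12 MB, as in the post

        // Each input row becomes one BatchUpdate; commits accumulate in the
        // client write buffer and are sent in bulk once it fills.
        BatchUpdate update = new BatchUpdate("rowkey-0001");
        update.put("data:value", "example".getBytes());
        table.commit(update);

        // Flush whatever remains in the buffer at the end of the task.
        table.flushCommits();
    }
}
```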
--- End Message ---