[hadoop-taiwan] [Fwd: Performance of hbase importing]

  • From: Yi-Kai Tsai <yikai@xxxxxxxxxxxxx>
  • To: "hadoop-taiwan@xxxxxxxxxxxxx" <hadoop-taiwan@xxxxxxxxxxxxx>
  • Date: Wed, 14 Jan 2009 16:21:25 +0800


--
Yi-Kai Tsai (cuma) <yikai@xxxxxxxxxxxxx>, Asia Regional Search Engineering.

--- Begin Message ---
  • From: Ryan Rawson <ryanobjc@xxxxxxxxx>
  • To: "hbase-user@xxxxxxxxxxxxxxxxx" <hbase-user@xxxxxxxxxxxxxxxxx>
  • Date: Mon, 12 Jan 2009 07:11:46 +0900
Hi all,

New user of hbase here. I've been hanging around in IRC for a few days and
have been getting great help all around so far.

My topic is importing data into hbase: I have largish datasets I want to
evaluate hbase performance on, so I've been working on importing that data.
I've managed to get some impressive speedups, which I chronicled here:

http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html

To summarize:
- Use the native HBase API from Java or Jython (or presumably any JVM
language)
- Disable table auto flush and set a large write buffer (12 MB for me); see
the sketch below
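For the archive, here is a minimal sketch of that buffered-write pattern
using the HTable client. The table and column names are placeholders, and
the method names follow the later 0.20-style client (the 0.19 API current
at the time used BatchUpdate instead of Put), but the same two knobs are
the point:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedImport {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");  // placeholder table name

        // The two settings from the post: stop flushing on every put,
        // and let the client buffer 12 MB of edits before each RPC.
        table.setAutoFlush(false);
        table.setWriteBufferSize(12 * 1024 * 1024);

        byte[] family = Bytes.toBytes("cf");  // placeholder column family
        byte[] qual = Bytes.toBytes("col");   // placeholder qualifier
        for (long i = 0; i < 1000000L; i++) {
          Put put = new Put(Bytes.toBytes(i));
          put.add(family, qual, Bytes.toBytes("value-" + i));
          table.put(put);  // buffered client-side, not sent yet
        }

        table.flushCommits();  // push out whatever is left in the buffer
        table.close();
      }
    }

With auto flush on, every put() is a separate round trip to the region
server; buffering 12 MB of edits per RPC is where the bulk of the speedup
comes from.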

At this point I can import an 18 GB, 440-million-row comma-separated flat
file in about 72 minutes using MapReduce, which works out to roughly
100,000 rows per second. This is on a 3-node cluster, each node running
HDFS, HBase, and MapReduce, with 12 map tasks (4 per node). This is loaner
DB hardware, so once I get my real cluster I'll revise and publish new
numbers.

I look forward to meeting some of you next week at the hbase meetup at
powerset!

-ryan

--- End Message ---
