BigData Flume AsyncHbaseEventSerializer implemented with a CSV reader and RowKey creator
Most CSV handlers in Flume split each line on the comma delimiter ",", so a string with a quoted comma, for example a,b,"c,c2",d, is not parsed correctly.
This project uses the Apache Commons CSV library for parsing, which handles those cases.
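To see why the plain split fails, here is a minimal sketch in plain Java. It is not the code of this project (which relies on Apache Commons CSV); the class CsvSplitDemo and both helper methods are hypothetical names used only for illustration, and the quote-aware splitter covers just double-quoted fields and doubled quotes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of this project.
class CsvSplitDemo {

    // Naive approach: split on every comma, ignoring quotes.
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(",", -1));
    }

    // Minimal quote-aware split: commas inside double quotes are kept,
    // and a doubled quote ("") inside a quoted field is one literal quote.
    static List<String> quoteAwareSplit(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "a,b,\"c,c2\",d";
        // The naive split breaks the quoted field in two (5 fields).
        System.out.println(naiveSplit(line).size() + " fields: " + naiveSplit(line));
        // The quote-aware split keeps "c,c2" together (4 fields).
        System.out.println(quoteAwareSplit(line).size() + " fields: " + quoteAwareSplit(line));
    }
}
```

A real CSV library also deals with things this sketch ignores, such as multi-line fields and other quoting rules, which is why the project delegates parsing to Commons CSV.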
There are two types of serializers:
AsyncHbaseCSVEventSerializer
HbaseCSVEventSerializer
# Build
mvn clean install
# Copy the jar to the lib directory of the Flume agent
cp target/
When the input CSV looks like this:
ts,36882775649162,5610505046685714
ti,3528885721650035,4903559682490813
ti,30049400044021,6374608174602560
Use it in the Flume configuration:
agent007.sinks.sink1.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agent007.sinks.sink1.channel = c1
agent007.sinks.sink1.table = transactions
agent007.sinks.sink1.columnFamily = clients
agent007.sinks.sink1.batchSize = 5000
# Add this
agent007.sinks.sink1.serializer = com.alefbt.bigdata.flume.sink.hbase.AsyncHbaseCSVEventSerializer
agent007.sinks.sink1.serializer.columns = type,id1,id2
agent007.sinks.sink1.serializer.key = type,id2
For each column, it runs the equivalent of hbase put 'table', 'rowkey', 'column', 'value'.
So each row in the example results in 3 put commands.
The RowKey is generated from the type and id2 fields.
For example:
# the config
agent007.sinks.sink1.serializer.columns = type,id1,id2
agent007.sinks.sink1.serializer.key = type,id2
# csv value :
ts,36882775649162,5610505046685714
# row key:
ts:5610505046685714
# The actions on HBase will be:
hbase put 'transactions', 'ts:5610505046685714', 'clients:type', 'ts'
hbase put 'transactions', 'ts:5610505046685714', 'clients:id1', '36882775649162'
hbase put 'transactions', 'ts:5610505046685714', 'clients:id2', '5610505046685714'
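The mapping from the columns and key settings to the row key and the puts can be sketched in plain Java as below. This is only an illustration of the example above, not the serializer's actual code: the class RowKeyDemo and its methods are hypothetical, and the ":" separator is assumed from the row key shown in the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of this project.
class RowKeyDemo {

    // Pair the configured column names with the parsed CSV values, in order.
    static Map<String, String> toRecord(String columns, List<String> values) {
        String[] names = columns.split(",");
        if (names.length != values.size()) {
            throw new IllegalArgumentException("column count does not match value count");
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            record.put(names[i].trim(), values.get(i));
        }
        return record;
    }

    // Build the row key from the configured key fields.
    // Assumption: fields are joined with ":" as in the example row key.
    static String rowKey(String keyFields, Map<String, String> record) {
        List<String> parts = new ArrayList<>();
        for (String name : keyFields.split(",")) {
            String value = record.get(name.trim());
            if (value == null) {
                throw new IllegalArgumentException("unknown key field: " + name);
            }
            parts.add(value);
        }
        return String.join(":", parts);
    }

    // One put per column, written in the same notation as the example above.
    static List<String> puts(String table, String family, String rowKey,
                             Map<String, String> record) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            out.add(String.format("hbase put '%s', '%s', '%s:%s', '%s'",
                    table, rowKey, family, e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> record = toRecord("type,id1,id2",
                Arrays.asList("ts", "36882775649162", "5610505046685714"));
        String key = rowKey("type,id2", record);
        for (String put : puts("transactions", "clients", key, record)) {
            System.out.println(put);
        }
    }
}
```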
Me: Yehuda Korotkin yehuda@alefbt.com
- On Cloudera 5.10 in AsyncHBase mode there were a few issues:
org.hbase.async.RemoteException: Call queue is full on /0.0.0.0:60020, too many items queued ?
According to OpenTSDB/opentsdb#783, you should increase hbase.regionserver.handler.count.
I found that this solved the issue (but opened other issues).
- https://flume.apache.org/FlumeUserGuide.html#asynchbasesink
- https://commons.apache.org/proper/commons-csv/
- https://flume.apache.org/
- http://hbase.apache.org/
MIT. (see LICENSE file)