Spark Environment Setup 03

The previous post covered how to integrate Spark with Hadoop; this one walks through integrating Spark with HBase.

1. Get the HBase classpath

#remove the netty and jetty jars from this classpath first, otherwise there will be jar conflicts
HBASE_PATH=`/home/hadoop/Deploy/hbase-1.1.2/bin/hbase classpath`

2. Start Spark

bin/spark-shell --driver-class-path $HBASE_PATH
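
Once the shell is up, a quick sanity check (just a sketch) is to ask the driver to load an HBase class; if the classpath was not picked up, this throws ClassNotFoundException:

// throws ClassNotFoundException if the HBase jars did not make it onto the driver classpath
Class.forName("org.apache.hadoop.hbase.HBaseConfiguration")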

3. Run a few simple operations

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "inpatient_hb")
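
By default TableInputFormat scans every column of every row. If you only need a few columns, you can optionally narrow the scan before building the RDD. In the sketch below, info:name is a hypothetical family:qualifier pair; replace it with columns that actually exist in inpatient_hb.

// optional: space-delimited list of family:qualifier columns to scan
// ("info:name" is hypothetical, for illustration only)
conf.set(TableInputFormat.SCAN_COLUMNS, "info:name")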

val admin = new HBaseAdmin(conf)
admin.isTableAvailable("inpatient_hb")
res1: Boolean = true
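
HBaseAdmin is deprecated in HBase 1.x; the same check can go through the Connection/Admin API instead. A minimal sketch using the same conf and table:

import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.ConnectionFactory

val connection = ConnectionFactory.createConnection(conf)
val admin2 = connection.getAdmin
admin2.tableExists(TableName.valueOf("inpatient_hb"))
// close admin2 and connection when you are done with them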

val hBaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])

hBaseRDD.count()
2017-01-03 20:46:29,854 INFO  [main] scheduler.DAGScheduler (Logging.scala:logInfo(58)) - Job 0 finished: count at <console>:36, took 23.170739 s
res2: Long = 115077
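
Counting rows only proves the scan works end to end. To read actual cell values, map each Result into plain Scala values before collecting; the raw (ImmutableBytesWritable, Result) pairs themselves are not serializable, so they cannot be collected directly. A minimal sketch, assuming a hypothetical column info:name in inpatient_hb:

import org.apache.hadoop.hbase.util.Bytes

val rows = hBaseRDD.map { case (key, result) =>
  val rowKey = Bytes.toString(key.copyBytes())
  // getValue returns null if the row has no such column; Bytes.toString(null) yields null
  val name = Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name")))
  (rowKey, name)
}
rows.take(5).foreach(println)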
