HBase

Preparation

Conventions

  • [M] MODIFY
  • [A] ADD
  • [D] DELETE
  • All operations are performed as the hadoop user

Packages

  • jdk1.8
  • hadoop-2.7.5
  • hbase-1.4.8
  • zookeeper-3.4.10
  • apache-phoenix-4.14.1-HBase-1.4-bin

Machines

[HOST-43] 172.16.11.43 hostname host-172-16-11-43
[HOST-44] 172.16.11.44 hostname host-172-16-11-44

http://hbase.apache.org/book.html#configuration
ssh, dns, ntp, ulimit

edit /etc/hosts on [HOST-43] and [HOST-44]
[A] 172.16.11.43 host-172-16-11-43
[A] 172.16.11.44 host-172-16-11-44

Limits on Number of Files and Processes (ulimit)

ulimit -n

To configure ulimit settings on Ubuntu, edit /etc/security/limits.conf,
which is a space-delimited file with four columns.
Refer to the man page for limits.conf for details
about the format of this file. In the following example,
the first line sets both soft and hard limits for
the number of open files (nofile) to 32768 for
the operating system user with the username hadoop.
The second line sets the number of processes to 32000 for the same user.

hadoop  -       nofile  32768
hadoop - nproc 32000
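Since limits.conf is applied at login (via pam_limits), the new values only appear in a fresh session; a quick check as the hadoop user (a sketch, assuming PAM limits are enabled on the system) looks like:

```shell
# Start a new login shell as hadoop -- limits.conf takes effect at login
su - hadoop

# Soft limit on open file descriptors (nofile); expect 32768 after the change
ulimit -Sn

# Limit on user processes (nproc); expect 32000
ulimit -u
```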

Installation

zookeeper-3.4.10

[HOST-43]

Configuration

clientPort: 2222

cp conf/zoo_sample.cfg conf/zoo.cfg

[M] conf/zoo.cfg
[M] clientPort=2222
[M] dataDir=<path>
[A] dataLogDir=<path>
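With these edits applied, the resulting standalone zoo.cfg looks roughly like this (the tickTime/initLimit/syncLimit values are the zoo_sample.cfg defaults; <path> stands for the data directories chosen for this setup):

```
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2222
dataDir=<path>
dataLogDir=<path>
```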

Start

bin/zkServer.sh start

start, stop, status, etc.
bin/zkServer.sh {start|start-foreground|stop|restart|status|upgrade|print-cmd}

hadoop-2.7.5 (Cluster)

Master:[HOST-43]
Slave:[HOST-44]

Configuration

edit ~/.bash_profile on [HOST-43] and [HOST-44]
[A] export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.7.5
[A] export PATH=$HADOOP_HOME/bin:$PATH

source ~/.bash_profile

HDFS

edit <HADOOP_HOME>/etc/hadoop/core-site.xml on [HOST-43] and [HOST-44]
[A]

<property>
<name>fs.defaultFS</name>
<value>hdfs://172.16.11.43:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.7.5-data/tmp/</value>
</property>

edit <HADOOP_HOME>/etc/hadoop/hdfs-site.xml on [HOST-43] and [HOST-44]
[A]


<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.7.5-data/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop/hadoop-2.7.5-data/dfs/data</value>
</property>

edit <HADOOP_HOME>/etc/hadoop/slaves on [HOST-43], like this:

[D] #localhost
[A] host-172-16-11-44

(optional) scp -pqr hadoop-2.7.5 hadoop-2.7.5-data hadoop@172.16.11.44:~/hadoop/
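Before the first start, the NameNode has to be formatted; this is a one-time step on a fresh cluster (it initializes dfs.namenode.name.dir and must never be re-run once the cluster holds data):

```shell
# On [HOST-43] only: initialize the HDFS metadata directory
hdfs namenode -format
```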

Start

Start the NameNode on [HOST-43]:
<HADOOP_HOME>/sbin/hadoop-daemon.sh start namenode

Start the DataNode on [HOST-44]:
<HADOOP_HOME>/sbin/hadoop-daemon.sh start datanode

VIEW

http://172.16.11.43:50070
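Besides the web UI, a command-line smoke test (a sketch; /tmp/smoke.txt is an arbitrary path) confirms that the DataNode registered and end-to-end writes work:

```shell
# The report should list one live DataNode (host-172-16-11-44)
hdfs dfsadmin -report

# Write a small file from stdin, read it back, then clean up
echo hello | hdfs dfs -put - /tmp/smoke.txt
hdfs dfs -cat /tmp/smoke.txt
hdfs dfs -rm /tmp/smoke.txt
```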

YARN

mapred-site.xml configuration (if absent, copy it from mapred-site.xml.template first)

<property>
<name>mapreduce.framework.name</name> <!-- run MapReduce on YARN -->
<value>yarn</value>
</property>

yarn-site.xml configuration

<property>
<name>yarn.resourcemanager.address</name>
<value>172.16.11.43:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>172.16.11.43:18030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>172.16.11.43:18088</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>172.16.11.43:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>172.16.11.43:18141</value>
</property>

<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>

PS: the slaves file serves as both the DataNode list and the NodeManager list.

Start

Start the ResourceManager on [HOST-43]:
<HADOOP_HOME>/sbin/yarn-daemon.sh start resourcemanager

Start the NodeManager on [HOST-44]:
<HADOOP_HOME>/sbin/yarn-daemon.sh start nodemanager

Or, on [HOST-43]: <HADOOP_HOME>/sbin/start-yarn.sh


Verify MapReduce

hadoop jar /home/hadoop/hadoop/hadoop-2.7.5/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 5 5

VIEW

YARN-ResourceManager http://172.16.11.43:18088

hbase-1.4.8

master:[HOST-43]
regionserver:[HOST-44]

Configuration

edit ~/.bash_profile on [HOST-43] and [HOST-44]
[A] export HBASE_HOME=/home/hadoop/hbase/hbase-1.4.8
[A] export PATH=$HBASE_HOME/bin:$PATH

source ~/.bash_profile

edit <HBASE_HOME>/conf/hbase-env.sh on [HOST-43] and [HOST-44]
[M] # export JAVA_HOME=/usr/java/jdk1.6.0/ --> export JAVA_HOME=<JAVA_HOME>
[M] # export HBASE_MANAGES_ZK=true --> export HBASE_MANAGES_ZK=false

edit <HBASE_HOME>/conf/hbase-site.xml on [HOST-43] and [HOST-44]
[A]

<property>
<name>hbase.rootdir</name>
<value>hdfs://172.16.11.43:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>172.16.11.43</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/home/hadoop/hbase/hbase-1.4.8-data/tmp</value>
</property>

edit <HBASE_HOME>/conf/regionservers on [HOST-43], like this:
[D] #localhost
[A] host-172-16-11-44


(optional) scp -pqr hbase-1.4.8 hbase-1.4.8-data hadoop@172.16.11.44:~/hbase

Start

Start the Master on [HOST-43]:
<HBASE_HOME>/bin/hbase-daemon.sh start master

Start the RegionServer on [HOST-44]:
<HBASE_HOME>/bin/hbase-daemon.sh start regionserver

Or, on [HOST-43]: <HBASE_HOME>/bin/start-hbase.sh

VIEW

HBase-Master http://172.16.11.43:16010
HBase-RegionServer http://172.16.11.44:16030
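A short hbase shell session is a simple end-to-end check (the table and column-family names here are arbitrary):

```shell
# Create a table, write one cell, read it back, then drop the table
hbase shell <<'EOF'
status
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF
```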

apache-phoenix-4.14.1-HBase-1.4

  • expand installation tar

  • copy the phoenix server jar that is compatible with your HBase installation into the lib directory of every region server

  • restart the region servers

  • add the phoenix client jar to the classpath of your HBase client

Configuration

copy phoenix-[version]-server.jar to <HBASE_HOME>/lib on [HOST-43] and [HOST-44]:

cp phoenix-4.14.1-HBase-1.4-server.jar <HBASE_HOME>/lib/

python <PHOENIX_HOME>/bin/sqlline.py 172.16.11.43:2222:/hbase

Test

SELECT DISTINCT(TABLE_NAME) FROM "SYSTEM"."CATALOG";

create table ABC(id integer PRIMARY KEY, a varchar(500), b date, c integer);

Note: a primary key is required, because it maps to the HBase row key.
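A quick sanity check on the new table might look like this (note that Phoenix uses UPSERT rather than INSERT/UPDATE; the sample values are arbitrary):

```shell
# Pipe a few statements through sqlline non-interactively
python <PHOENIX_HOME>/bin/sqlline.py 172.16.11.43:2222:/hbase <<'EOF'
UPSERT INTO ABC (id, a, b, c) VALUES (1, 'hello', CURRENT_DATE(), 42);
SELECT * FROM ABC;
EOF
```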

JDBC executeBatch

edit hbase-site.xml
http://phoenix.apache.org/tuning.html

<property>
<name>phoenix.mutate.maxSize</name>
<value>20000000</value>
</property>
<property>
<name>phoenix.mutate.batchSize</name>
<value>20000000</value>
</property>
<property>
<name>phoenix.mutate.maxSizeBytes</name>
<value>1048576000</value>
</property>
<property>
<name>phoenix.schema.isNamespaceMappingEnabled</name>
<value>true</value>
</property>

Secondary index

edit hbase-site.xml
http://phoenix.apache.org/secondary_indexing.html

<property>
<name>hbase.regionserver.wal.codec</name>
<value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
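With the WAL codec configured (and the region servers restarted), a global secondary index can be created and verified from sqlline (a sketch; the index name is arbitrary, and the ABC table comes from the test above):

```shell
python <PHOENIX_HOME>/bin/sqlline.py 172.16.11.43:2222:/hbase <<'EOF'
-- Global index on column b; Phoenix keeps it in sync on writes
CREATE INDEX ABC_B_IDX ON ABC (b) INCLUDE (a);
-- EXPLAIN should show the query being served from ABC_B_IDX
EXPLAIN SELECT a, b FROM ABC WHERE b = CURRENT_DATE();
EOF
```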