Step 1: Configure passwordless SSH login
Set up passwordless SSH login between the servers (reference: https://atbulbs.github.io/2018/02/03/Hexo%E5%8D%9A%E5%AE%A2%E7%9A%84%E4%BD%BF%E7%94%A8/).
Prerequisites
- SSH is installed on the system.
- The operating user should have read/write access to /etc/hosts, or sudo privileges.
- The operating user can run the ssh-keygen command.
Procedure
- Edit the hosts file (/etc/hosts):

```bash
sudo vim /etc/hosts
```

Add the following entries:

```
192.168.126.150 hadoop1
192.168.126.151 hadoop2
192.168.126.152 hadoop3
```
- Generate an SSH key pair on every server:

```bash
ssh-keygen -t rsa
```

This creates the hidden .ssh directory under ~ (the current user's home directory) containing id_rsa (the private key) and id_rsa.pub (the public key).
- Concatenate the contents of id_rsa.pub from all servers (here, all three) into a single file named authorized_keys, which will look like this:

```
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA9/09V+rJxmK1Qpoh3q1egXc1XLaBhO/gZCvQYzdC3CqN3Bv+IYduernvSXCLGZGweDP3Caw223/QiDDe2g2FB3+XmUSlkldpaRS5fGMIO4OAM5MjNtmkiruLuH0d8OQubxkzT8TTCfJLosfnT0jbls6cYPHmidcy1jAU82TpExCgz73zrJNLRXnS7tPPMW4IK9A0mChg11Ohn+ldZOqK7P80kR5951rHBd97fCLl8xl5UZ1Ep5XslT+Q+DLUhYPXk0NWNnDCPNsnNEAdF/jfBOOscrZkU0ahz1rYP6Zz9xDcC2kxhEDwf9aXD/wLOXJv4B/hv6/RUtrbYVrl3Fk30w== test@hadoop01
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4Sd0LKEhnuHhYdI/nM8diQZSKjeJ4LYIjyVqqdp1igKXFBxU2EcVADkq4S/Uupx5GBlTxmVWWREp5W3pcK0Z+I8FCpr4AAVpKqOOua5RUJg7NzmTiWf9tUnVKerwa7IYi/n0wuwBNyps123ajOtNkC4oFez3NQgXywpMX7wIQVLCldtRQCm2UHZQHMU55qHnEj3BzZqNg4vdPRygxDZubB65pQJVIBWp03LsKUahm3bL3hUL7A2mlUwCz/mXZrZUc1q/DSmGIGpxc2jc/ukSV/gG5APgyxTiEkvSIECPWhP9fJCGlTbng+NXVFtbPQ6vI0Mblb02s4G8tMWvRB9Jrw== test@hadoop02
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA3GLFdnz5aOG3XLT+8GR/YzqJXG2K+b7M+CRiVRPcqJJvqe9UFu8/IydM/jQTIi5OhHBXqnGMcAjK76fEkMiWaPdmILbiMYr11Lx56ZhqRqsVvE8ndO2f5j45vvADZbGAJWGEVIr6tnf6QBocBg/j4h9BWryFvgfhdzM+C3CZmyRyOUJwz8NNmkXySIH/1EUHOSrwmOnO12HwLW3/nXaowR9KfyYbz2tjNJdGApeQwVQgkKeFZ8Pqq8UZBcFZeg3Zbzdwxo86y1lUxL896Wgh++jhys5eyKgruOgnSgbqy0kOu32R/uRaSt9IrnkPMPGaEi340x9+mcm9c8/PoU36Cw== test@hadoop03
```
- Upload the authorized_keys file to ~/.ssh/ on every server.
- Fix the permissions of the .ssh directory and the authorized_keys file (the gathering and distribution steps can also be scripted; see the sketch below):

```bash
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
```
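Below is a minimal sketch of such a script, assuming the same username on all three hosts and that password-based SSH still works for the initial copy; the hostnames follow the /etc/hosts entries added earlier:

```bash
#!/usr/bin/env bash
# Collect every host's public key into one authorized_keys file,
# then push the combined file back out and fix permissions.
hosts="hadoop1 hadoop2 hadoop3"

# Gather the public keys (each ssh prompts for a password once).
: > /tmp/authorized_keys
for h in $hosts; do
    ssh "$h" "cat ~/.ssh/id_rsa.pub" >> /tmp/authorized_keys
done

# Distribute the combined file and set the required permissions.
for h in $hosts; do
    scp /tmp/authorized_keys "$h":~/.ssh/authorized_keys
    ssh "$h" 'chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys'
done
```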
Verification
Use the ssh command to log in to the other two servers; if you get in without a password prompt, the setup works. When the username is the same on all servers, it can be omitted from the ssh command.
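For instance, from hadoop1 (a quick check using the hostnames configured above):

```bash
ssh hadoop2 hostname   # should print "hadoop2" without asking for a password
ssh hadoop3 hostname
```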
Step 2: JDK installation and configuration
Download and install the JDK, then configure the environment variables.
Prerequisites
- The JDK package has been downloaded (here, the tar.gz archive).
- The operating user should have read/write access to /etc/profile, or sudo privileges.
Installation and configuration
- Unpack the archive:

```bash
mkdir -p /opt/java
tar -zxvf jdk-8u201-linux-x64.tar.gz -C /opt/java
```
- Edit the environment variables:

```bash
sudo vim /etc/profile
```

Append the following at the end of profile:

```bash
export JAVA_HOME=/opt/java/jdk1.8.0_201
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```

Make the changes take effect immediately:

```bash
source /etc/profile
```
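To confirm the setup took effect, a quick sanity check (the exact version string depends on the package you installed):

```bash
java -version      # should report java version "1.8.0_201"
echo $JAVA_HOME    # should print /opt/java/jdk1.8.0_201
```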
Step 3: ZooKeeper installation and configuration
Configure the ZooKeeper cluster.
Prerequisites
- The ZooKeeper package has been downloaded.
Installation and configuration
- Unpack the archive:

```bash
mkdir -p /opt/zookeeper
tar -zxvf zookeeper.tar.gz -C /opt/zookeeper
```
- Configure zoo.cfg:
In the zookeeper/conf directory, copy zoo_sample.cfg to zoo.cfg and configure it as below. Note: zoo.cfg must be identical on every node in the ZooKeeper cluster.

```properties
tickTime=4000
initLimit=20
dataDir=/opt/zookeeper/data
clientPort=2181
server.1=192.168.126.150:2888:3888
server.2=192.168.126.151:2888:3888
server.3=192.168.126.152:2888:3888
```
- In the dataDir configured in zoo.cfg, create a myid file containing a single number: the server id of the current node, matching the server.N entries in zoo.cfg (see the example below).
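For example, on the node registered as server.1 (run the matching command with the appropriate id on the other nodes):

```bash
mkdir -p /opt/zookeeper/data
echo 1 > /opt/zookeeper/data/myid   # use 2 on hadoop2, 3 on hadoop3
```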
- Change the log output location so that logs roll daily into a dedicated directory:

```bash
mkdir -p /opt/log/zookeeper
vim /opt/zookeeper/bin/zkEnv.sh
```

In zkEnv.sh, change the defaults to:

```bash
if [ "x${ZOO_LOG_DIR}" = "x" ]
then
    ZOO_LOG_DIR="/opt/log/zookeeper/"
fi

if [ "x${ZOO_LOG4J_PROP}" = "x" ]
then
    ZOO_LOG4J_PROP="INFO,ROLLINGFILE"
fi
```

```bash
vim /opt/zookeeper/conf/log4j.properties
```

In log4j.properties, set:

```properties
zookeeper.root.logger=INFO, ROLLINGFILE
log4j.appender.ROLLINGFILE=org.apache.log4j.DailyRollingFileAppender
```
- Manage the ZooKeeper service (when starting, start all nodes at roughly the same time):
  Start: zkServer.sh start
  Stop: zkServer.sh stop
  Status: zkServer.sh status
  Restart: zkServer.sh restart
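Once all three nodes are up, exactly one of them should report Mode: leader and the other two Mode: follower. A quick check loop, assuming the passwordless SSH from Step 1:

```bash
for h in hadoop1 hadoop2 hadoop3; do
    echo "== $h =="
    ssh "$h" /opt/zookeeper/bin/zkServer.sh status
done
```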
Step 4: Hadoop installation and configuration
- Install and configure Hadoop on the hadoop1 node first.
- Configure hadoop-env.sh:
  - set the JDK installation directory
  - set the Hadoop configuration directory

```bash
export JAVA_HOME=/opt/java/jdk1.8.0_201
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
```
- Configure core-site.xml:

```xml
<configuration>
  <!-- Logical name of the HA nameservice -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns</value>
  </property>
  <!-- Base directory for Hadoop working files -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/log/hadoop</value>
  </property>
  <!-- ZooKeeper quorum used for automatic NameNode failover -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
```
- Configure hdfs-site.xml:

```xml
<configuration>
  <!-- Name of the HDFS nameservice, matching fs.defaultFS -->
  <property>
    <name>dfs.nameservices</name>
    <value>ns</value>
  </property>
  <!-- The two NameNodes in the nameservice -->
  <property>
    <name>dfs.ha.namenodes.ns</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn1</name>
    <value>hadoop1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn1</name>
    <value>hadoop1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns.nn2</name>
    <value>hadoop3:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns.nn2</name>
    <value>hadoop3:50070</value>
  </property>
  <!-- JournalNode quorum holding the shared edit log -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/ns</value>
  </property>
  <!-- Local directory where each JournalNode stores edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop/journal</value>
  </property>
  <!-- Enable ZKFC-driven automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <!-- Class clients use to locate the active NameNode -->
  <property>
    <name>dfs.client.failover.proxy.provider.ns</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Fence a failed NameNode over SSH -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/sysmon/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- Disable HDFS permission checking -->
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/log/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/log/hadoop/datanode</value>
  </property>
</configuration>
```
- Configure mapred-site.xml:

```xml
<configuration>
  <!-- Run MapReduce on YARN -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```
- Configure yarn-site.xml:

```xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Enable ResourceManager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>hadoop2</value>
  </property>
  <!-- Recover ResourceManager state from ZooKeeper after a restart -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yarn-ha</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <!-- Shuffle service required by MapReduce -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Resources each NodeManager offers -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
  </property>
</configuration>
```
- Configure the environment variables:

```bash
export JAVA_HOME=/opt/java/jdk1.8.0_201
export ZOO_HOME=/opt/zookeeper
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$ZOO_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
```
- Create the directories the configuration files refer to, for storing the corresponding data (see the commands below):
  - the journal directory under the Hadoop installation directory
  - the namenode and datanode directories under /opt/log/hadoop/
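The directories can be created in one go; the paths follow the configuration files above:

```bash
mkdir -p /opt/hadoop/journal                                  # dfs.journalnode.edits.dir
mkdir -p /opt/log/hadoop/namenode /opt/log/hadoop/datanode    # name.dir / data.dir
```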
- Copy the Hadoop installation directory to the other nodes with scp, as shown below.
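For example (assuming the same /opt layout and write permission on the target nodes):

```bash
scp -r /opt/hadoop hadoop2:/opt/
scp -r /opt/hadoop hadoop3:/opt/
```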
- Start the ZooKeeper cluster:

```bash
zkServer.sh start
```
- Format ZooKeeper. On a node that runs a NameNode, execute:

```bash
hdfs zkfc -formatZK   # creates the HA znodes on the ZooKeeper cluster
```
- Start the JournalNode cluster. On each of the three nodes, execute:

```bash
hadoop-daemon.sh start journalnode
```
- Format the NameNode on the hadoop1 node:

```bash
hadoop namenode -format
```

- Start the NameNode on hadoop1:

```bash
hadoop-daemon.sh start namenode
```
- Turn the hadoop3 node into the standby NameNode. On hadoop3, execute:

```bash
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
```
- Start the DataNodes. On each of the three nodes, execute:

```bash
hadoop-daemon.sh start datanode
```
- Start the ZKFC (DFSZKFailoverController). On hadoop1 and hadoop3, execute:

```bash
hadoop-daemon.sh start zkfc
```
- Start the ResourceManager on the hadoop1 node:

```bash
start-yarn.sh
```

- Start the standby ResourceManager on the hadoop2 node:

```bash
yarn-daemon.sh start resourcemanager
```
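For reference, the whole first-time startup sequence above can be driven from hadoop1 in one script. This is a sketch: it assumes the passwordless SSH from Step 1 and that the PATH exports are visible to non-interactive SSH sessions (e.g. also set in ~/.bashrc); later restarts only need start-dfs.sh and start-yarn.sh:

```bash
#!/usr/bin/env bash
# First-time startup order for the HA cluster, run from hadoop1.
nodes="hadoop1 hadoop2 hadoop3"

for h in $nodes; do ssh "$h" zkServer.sh start; done                 # ZooKeeper
hdfs zkfc -formatZK                                                  # HA znodes
for h in $nodes; do ssh "$h" hadoop-daemon.sh start journalnode; done
hadoop namenode -format                                              # only ever once
hadoop-daemon.sh start namenode
ssh hadoop3 'hdfs namenode -bootstrapStandby && hadoop-daemon.sh start namenode'
for h in $nodes; do ssh "$h" hadoop-daemon.sh start datanode; done
for h in hadoop1 hadoop3; do ssh "$h" hadoop-daemon.sh start zkfc; done
start-yarn.sh                                                        # active RM + NodeManagers
ssh hadoop2 yarn-daemon.sh start resourcemanager                     # standby RM
```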
- After startup, the jps process listing on each node should look like the following (QuorumPeerMain is ZooKeeper; the Kafka process appears after Step 5).

Process status on the hadoop1 node:

```
15376 ResourceManager
20610 Jps
20468 Kafka
18838 QuorumPeerMain
14999 DataNode
14809 NameNode
15484 NodeManager
14590 JournalNode
15182 DFSZKFailoverController
```

Process status on the hadoop2 node:

```
23554 DataNode
23842 ResourceManager
24068 NodeManager
24965 Jps
25690 QuorumPeerMain
23324 JournalNode
```

Process status on the hadoop3 node:

```
22340 Jps
16391 JournalNode
16872 DataNode
17081 DFSZKFailoverController
22092 Kafka
25932 QuorumPeerMain
17437 NodeManager
16719 NameNode
```
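The listings can be reproduced with jps on each node (PIDs will of course differ):

```bash
for h in hadoop1 hadoop2 hadoop3; do
    echo "== $h =="
    ssh "$h" jps
done
```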
Step 5: Kafka installation
- Unpack the archive:

```bash
mkdir -p /opt/kafka
tar -zxvf kafka_2.11-1.0.2\ .tgz -C /opt/kafka
```
- Edit config/server.properties; modify or add the following entries:

```properties
# broker.id must be unique for every Kafka instance in the cluster
broker.id=0
# IP address of the local machine; set per broker
host.name=192.168.126.150
log.retention.hours=168
message.max.bytes=5242880
default.replication.factor=2
replica.fetch.max.bytes=5242880
```
- Start Kafka (on every server in the cluster):

```bash
cd /opt/kafka/kafka_2.11-1.0.2/bin
./kafka-server-start.sh -daemon ../config/server.properties
```
- Test: type text into the producer and check that it shows up in the consumer:

```bash
# On one server: create the topic test1 with 2 replicas and 1 partition
./kafka-topics.sh --create --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --replication-factor 2 --partitions 1 --topic test1
./kafka-console-consumer.sh --zookeeper hadoop1:2181,hadoop2:2181,hadoop3:2181 --topic test1
# Run the following on another server
./kafka-console-producer.sh --broker-list hadoop1:9092,hadoop2:9092,hadoop3:9092 --topic test1
```
Step 6: Hive installation and configuration
- Hive runs without any configuration out of the box, but here the metastore is switched to MySQL.
- Delete /user/hive from HDFS (this directory only exists if Hive has been run before):

```bash
hadoop fs -rmr /user/hive
```
- Copy hive/conf/hive-default.xml.template to hive-site.xml, then modify the following properties. Also copy the MySQL connector JAR into Hive's lib directory.

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://192.168.126.154:3306/hive?useSSL=false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>account</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>password</value>
</property>
```

In addition, change every value in hive-site.xml that contains a colon-style variable to an absolute path (replace every ${system:java.io.tmpdir} with /tmp/); otherwise Hive errors out, because Hadoop does not accept paths containing a colon.
- Create the hive database in MySQL by hand:

```sql
create database hive character set latin1;
```

If you run into permission problems, grant privileges in MySQL (execute on the machine where MySQL is installed):

```bash
mysql -u root -p
```

```sql
grant all privileges on *.* to 'root'@'%' identified by 'root' with grant option;
grant all privileges on *.* to 'root'@'hadoop1' identified by 'root' with grant option;
flush privileges;
```
- Initialize the metastore schema:

```bash
schematool -initSchema -dbType mysql
```
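After initialization, a quick smoke test of the MySQL-backed metastore (a sketch; smoke_test is an arbitrary example table name):

```bash
hive -e 'create table smoke_test (id int); show tables; drop table smoke_test;'
```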
Step 7: HBase installation and configuration
- Edit conf/hbase-env.sh:

```bash
export JAVA_HOME=/opt/java/jdk1.8.0_201
export HBASE_MANAGES_ZK=false   # use the external ZooKeeper cluster, not the bundled one
```
- Edit hbase-site.xml:

```xml
<configuration>
  <!-- HDFS directory where HBase stores its data -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop1:9000/hbase</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Run HBase in fully distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
  </property>
</configuration>
```
- Edit the conf/regionservers file, which lists every HBase host, one host per line; HBase starts and stops the hosts in this order. See the example below.
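For this cluster the file would presumably list the three hosts from Step 1:

```
hadoop1
hadoop2
hadoop3
```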
- Startup and shutdown:
On the master node, start the cluster:

```bash
start-hbase.sh
```

For high availability, start a backup HMaster on one of the slave nodes:

```bash
hbase-daemon.sh start master
```

Stop the cluster:

```bash
stop-hbase.sh
```
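Once started, a quick way to confirm the cluster is healthy (HMaster should appear on the master node and HRegionServer on every regionserver host):

```bash
jps                        # look for HMaster / HRegionServer
echo status | hbase shell  # prints the number of live servers
```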