
一、Install Docker Desktop

1、Download and install

Download: https://hub.docker.com/editions/community/docker-ce-desktop-mac/

After installation, start Docker; clicking the whale icon in the menu bar opens the Docker menu.
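To confirm the installation works, a quick sanity check from a terminal (minimal sketch):

docker version                # both the Client and Server sections should be reported
docker run --rm hello-world   # pulls and runs a small test image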

2、Configure a registry mirror

Configure the mirror accelerator in Docker's Preferences.

Aliyun accelerator address: https://754jn7no.mirror.aliyuncs.com
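In Docker Desktop this normally goes into Preferences → Docker Engine as a registry-mirrors entry; a minimal sketch of the JSON, using the mirror above:

{
  "registry-mirrors": ["https://754jn7no.mirror.aliyuncs.com"]
}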

二、Enable Docker's built-in Kubernetes

1、Pull the Kubernetes system images

Networks inside China often cannot pull the images the Kubernetes cluster needs. A GitHub Actions workflow keeps the k8s.gcr.io images that Kubernetes depends on synced to designated repositories on Docker Hub. The k8s-docker-desktop-for-mac project pulls the required images back from those Docker Hub mirrors and re-tags them with their original names.

git clone https://github.com/maguowei/k8s-docker-desktop-for-mac.git
cd k8s-docker-desktop-for-mac
./load_images.sh
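After the script finishes, the re-tagged images should be visible locally; a quick check (sketch):

docker images | grep k8s.gcr.io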
2、Enable Kubernetes

Tick the Enable Kubernetes option in Docker's Preferences and wait for the cluster to start.
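Once the cluster shows as running, kubectl (bundled with Docker Desktop) should be able to reach it; a quick check (minimal sketch):

kubectl config use-context docker-desktop
kubectl get nodes        # should list the single docker-desktop node as Ready
kubectl cluster-info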

三、Install Kubernetes productivity tools

1、Command-line helpers: kubectx and kubens

Download: https://github.com/ahmetb/kubectx

mv kubectx_v0.9.3_darwin_x86_64/kubectx /usr/local/bin
mv kubens_v0.9.3_darwin_x86_64/kubens /usr/local/bin
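Typical usage (a small sketch): kubectx switches kubectl contexts and kubens switches the default namespace.

kubectx                  # list contexts
kubectx docker-desktop   # switch to the docker-desktop context
kubens kube-system       # make kube-system the default namespace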
2、Install Kuboard, a Kubernetes cluster management UI

1)Install

kubectl apply -f https://kuboard.cn/install-script/kuboard.yaml
kubectl apply -f https://addons.kuboard.cn/metrics-server/0.3.7/metrics-server.yaml

2)Check that Kuboard is running

kubectl get pods -l k8s.kuboard.cn/name=kuboard -n kube-system
# NAME READY STATUS RESTARTS AGE
# kuboard-54c9c4f6cb-6lf88 1/1 Running 0 45s

3)Configure access

traefik-kuboard.yaml

# apiVersion assumed here for Traefik v2.x CRDs
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: ingresskuboard
  namespace: kube-system
spec:
  entryPoints:
    - web
  routes:
    - match: Host(`lka.imind.tech`)
      kind: Rule
      services:
        - name: kuboard
          port: 80

Create the Traefik route for Kuboard:

kubectl apply -f traefik-kuboard.yaml

4)Get the login token

echo $(kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}') -o go-template='{{.data.token}}' | base64 -d)
5)Add a local hosts entry

echo '127.0.0.1 lka.imind.tech' >> /etc/hosts   # append; a single '>' would overwrite the file

6)Log in to Kuboard with the token field from the output above

Open Kuboard in a browser:

http://lka.imind.tech

This article describes in detail how to set up a fully distributed three-node Hadoop cluster. The Linux distribution is CentOS 7.9, the Hadoop version is 3.2.1, and the JDK version is 1.8.

一、Prepare the environment

1、Edit the hosts file on every node and add the hostname-to-IP mappings for the master and worker nodes.
vi /etc/hosts
172.16.50.2 hadoop01
172.16.50.3 hadoop02
172.16.50.4 hadoop03
2、Disable the firewall (on every node)
systemctl stop firewalld      # stop the service

systemctl disable firewalld   # disable start at boot
Removed symlink /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
3、Configure passwordless SSH login

For how to configure passwordless login, see here.

4、Configure the Java environment (on every node)

For how to configure the Java environment, see here.

二、Set up the fully distributed Hadoop cluster

Installing and configuring Hadoop is essentially the same on every node, so you can install Hadoop on each node, do all of the configuration once on the master node hadoop01, and then copy the modified configuration files to each worker node with scp (see the sketch below).
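A minimal sketch of that distribution step, assuming the configuration lives under /opt/modules/hadoop-3.2.1/etc/hadoop as in this article:

scp -r /opt/modules/hadoop-3.2.1/etc/hadoop/ root@hadoop02:/opt/modules/hadoop-3.2.1/etc/
scp -r /opt/modules/hadoop-3.2.1/etc/hadoop/ root@hadoop03:/opt/modules/hadoop-3.2.1/etc/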

1、Download the Hadoop package, extract it, and configure the Hadoop environment variables

This article uses Hadoop 3.2.1. Extract it to a directory of your choice (for example /opt/modules), configure the Hadoop environment variables, and make them take effect. The commands are as follows:

tar -zxvf hadoop-3.2.1.tar.gz -C /opt/modules   # extract to /opt/modules

vi /etc/profile.d/env.sh   # configure the Hadoop environment variables

#Hadoop
export HADOOP_HOME=/opt/modules/hadoop-3.2.1   # the directory Hadoop was extracted to
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

source /etc/profile.d/env.sh
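To confirm the environment variables took effect, a quick check (sketch):

hadoop version   # should report Hadoop 3.2.1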
2、Configure JAVA_HOME in the Hadoop environment scripts
cd /opt/modules/hadoop-3.2.1/etc/hadoop   # enter etc/hadoop under the Hadoop installation directory

# Add or modify the following parameters in hadoop-env.sh, mapred-env.sh, and yarn-env.sh:
vi hadoop-env.sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_201   # path of the JDK installation
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
3、Configure core-site.xml

vi /opt/modules/hadoop-3.2.1/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/modules/hadoop-3.2.1/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>
</configuration>
4、Configure hdfs-site.xml

vi /opt/modules/hadoop-3.2.1/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop02:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///opt/modules/hadoop-3.2.1/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///opt/modules/hadoop-3.2.1/dfs/data</value>
    </property>
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///opt/modules/hadoop-3.2.1/dfs/namesecondary</value>
        <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>When set to false, files can be written to HDFS without permission checks; convenient, but take care to guard against accidental deletion.</description>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>21</value>
    </property>
</configuration>
5、Configure mapred-site.xml

vi /opt/modules/hadoop-3.2.1/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Run MapReduce on YARN.</description>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>
            ${HADOOP_HOME}/etc/hadoop,
            ${HADOOP_HOME}/share/hadoop/common/*,
            ${HADOOP_HOME}/share/hadoop/common/lib/*,
            ${HADOOP_HOME}/share/hadoop/hdfs/*,
            ${HADOOP_HOME}/share/hadoop/hdfs/lib/*,
            ${HADOOP_HOME}/share/hadoop/mapreduce/*,
            ${HADOOP_HOME}/share/hadoop/mapreduce/lib/*,
            ${HADOOP_HOME}/share/hadoop/yarn/*,
            ${HADOOP_HOME}/share/hadoop/yarn/lib/*
        </value>
    </property>
</configuration>
6、Configure yarn-site.xml

vi /opt/modules/hadoop-3.2.1/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
        <description>Hostname of the YARN ResourceManager; without this setting the web UI shows 0 active nodes.</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>How reducers fetch map output.</description>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop01:8088</value>
        <description>For external access replace the hostname with the real IP; otherwise it defaults to localhost:8088.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1024</value>
        <description>Maximum memory per container allocation, in MB; the default is 8192 MB.</description>
    </property>
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
        <description>Skip the virtual-memory check; useful when running in virtual machines, and avoids containers being killed later on.</description>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop01:19888/jobhistory/logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>
7、Configure the start scripts: add the HDFS and YARN user variables
# Add the HDFS user variables: edit the following scripts and add these lines near the top (e.g., on the second line)
vi sbin/start-dfs.sh
vi sbin/stop-dfs.sh

HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

# Add the YARN user variables: edit the following scripts and add these lines near the top (e.g., on the second line)
vi sbin/start-yarn.sh
vi sbin/stop-yarn.sh

YARN_RESOURCEMANAGER_USER=root
HDFS_DATANODE_SECURE_USER=yarn
YARN_NODEMANAGER_USER=root
8、Format and start
# Format the NameNode
bin/hdfs namenode -format

# Start the cluster (either option works)
# Option 1:
sbin/start-all.sh

# Option 2:
sbin/start-dfs.sh
sbin/start-yarn.sh
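
As a quick sanity check once startup finishes, jps should list the daemons; a minimal sketch (the exact set depends on the node, e.g. SecondaryNameNode runs on hadoop02 per the hdfs-site.xml above):

jps
# On the master, expect roughly: NameNode, DataNode, ResourceManager, NodeManager
# On the workers, expect: DataNode, NodeManager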

9、Configure the worker nodes
vi /opt/modules/hadoop-3.2.1/etc/hadoop/workers

hadoop01
hadoop02
hadoop03
10、Access the web UIs
# Check the firewall status
firewall-cmd --state
# Stop it for the current session
systemctl stop firewalld
# Disable start at boot
systemctl disable firewalld

Open http://hadoop01:8088 in a browser for the ResourceManager page.

Open http://hadoop01:50070 in a browser for the Hadoop NameNode page.

一、Install Hive

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/modules/
cd /opt/modules/
mv apache-hive-3.1.2-bin hive-3.1.2

vi /etc/profile.d/env.sh
#Hive
export HIVE_HOME=/opt/modules/hive-3.1.2
export PATH=$PATH:$HIVE_HOME/bin

source /etc/profile.d/env.sh

cp mysql-connector-java-5.1.48.jar /opt/modules/hive-3.1.2/lib/

Configure the metastore to use MySQL

cd /opt/modules/hive-3.1.2/conf/
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop03:3306/metastore?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=utf8mb4</value>
        <description>
            JDBC connect string for a JDBC metastore.
            To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
            For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
        </description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hx123456</value>
        <description>password to use against metastore database</description>
    </property>
    <property>
        <name>metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>hive default warehouse; change it if necessary</description>
    </property>
    <property>
        <name>metastore.schema.verification</name>
        <value>false</value>
        <description>
            Enforce metastore schema version consistency.
            True: Verify that version information stored in is compatible with one from Hive jars. Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
            False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
        </description>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
        <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop01</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
        <description>
            Should metastore do authorization against database notification related APIs such as get_next_notification.
            If set to true, then only the superusers in proxy settings have the permission
        </description>
    </property>
    <property>
        <name>metastore.local</name>
        <value>false</value>
    </property>
    <property>
        <name>metastore.thrift.uris</name>
        <value>thrift://hadoop01:9083</value>
    </property>
    <property>
        <name>metastore.thrift.port</name>
        <value>9083</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
        <description>Whether to print the names of the columns in query output.</description>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
        <description>Whether to include the current database in the Hive prompt.</description>
    </property>
</configuration>
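
With hive-site.xml in place, the metastore schema usually has to be initialized in MySQL before Hive can run, and the metastore service started. A minimal sketch, assuming the schematool and hive binaries that ship with Hive 3.1.2 (already on PATH via env.sh):

schematool -dbType mysql -initSchema                     # create the metastore tables in MySQL
nohup hive --service metastore > metastore.log 2>&1 &    # start the metastore service (thrift://hadoop01:9083)
hive -e 'show databases;'                                # quick smoke test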


1、Download the JDK

Go to the Oracle website and download a suitable JDK version.
Note: download the Linux build. This article uses jdk-8u201-linux-x64.tar.gz as the example; your file may be a different version, which is fine as long as it has the same .tar.gz suffix.

2、Create the directory

Create a java directory under /usr/local:

mkdir /usr/local/java
3、Extract the JDK

Place the downloaded jdk-8u201-linux-x64.tar.gz file in the /usr/local/java/ directory and extract it there.

tar -zxvf jdk-8u201-linux-x64.tar.gz -C /usr/local/java/
4、Set the environment variables

vi /etc/profile.d/env.sh

Add the following and save:

#Set Java Environment
export JAVA_HOME=/usr/local/java/jdk1.8.0_201
export JRE_HOME=/usr/local/java/jdk1.8.0_201/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

Note: set JAVA_HOME and JRE_HOME according to your actual installation path and JDK version.

Apply the changes:

source /etc/profile.d/env.sh
5、Test
java -version

If the Java version information is displayed, the JDK was installed successfully:

java version "1.8.0_201"

Java(TM) SE Runtime Environment (build 1.8.0_201-b09)

Java HotSpot(TM) 64-Bit Server VM (build 25.201-b09, mixed mode)

1、Install the necessary system tools
yum install -y yum-utils device-mapper-persistent-data \
lvm2 bash-completion
2、Add the repository information
yum-config-manager --add-repo \
http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
yum install -y docker-ce
3、Start Docker and enable it at boot
systemctl start docker
systemctl enable docker
4、Modify the docker-ce configuration
vi /etc/docker/daemon.json

{"registry-mirrors": ["https://7j4ic6a5.mirror.aliyuncs.com"]}
5、Start the MySQL container
docker run -d --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=hx123456 -v /opt/modules/mysql/data:/var/lib/mysql mysql/mysql-server:5.7
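
A quick status check (sketch). Note that with the mysql/mysql-server image the root account is bound to localhost by default, so if other hosts (for example the Hive metastore on hadoop01) need to connect, adding -e MYSQL_ROOT_HOST=% when creating the container, or creating a remote-capable user afterwards, may be required (an assumption about the image defaults, not part of the original steps):

docker ps --filter name=mysql   # the container should be Up (healthy)
docker logs --tail 20 mysql     # inspect the startup logs if it is not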

The Hadoop cluster has one master node and two worker nodes, and the nodes need passwordless SSH login to each other. The steps below show how to set this up.

一、Hadoop cluster environment

Node name    Node IP

hadoop01 172.16.50.2
hadoop02 172.16.50.3
hadoop03 172.16.50.4

二、How passwordless login works

Any host whose SSH public key is listed in a machine's authorized_keys file can log in to that machine without a password. So it is enough to put every other host's public key (for the hosts that need passwordless access) into each machine's authorized_keys file.

三、Steps

1、Configure the hosts file on every node
vim /etc/hosts
172.16.50.2 hadoop01
172.16.50.3 hadoop02
172.16.50.4 hadoop03
2、Generate an SSH key pair on every node
ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
.....................

After running the command, a .ssh directory is created under ~, containing the two files id_rsa and id_rsa.pub.

Note: running ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa avoids the interactive prompts above.

3、Append this machine's public key to root's .ssh/authorized_keys on each node

ssh-copy-id root@hadoop01   # enter the password to confirm
ssh-copy-id root@hadoop02   # enter the password to confirm
ssh-copy-id root@hadoop03   # enter the password to confirm

After that, passwordless login between the nodes works.
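A quick verification (sketch): logging in to another node should no longer prompt for a password.

ssh root@hadoop02 hostname   # should print hadoop02 without a password prompt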

Run the same steps on the other machines as well.