In the previous post we covered the basic concepts of Spark:
Spark Basic Architecture and Principles
https://www.cndba.cn/dave/article/3340
In this post we walk through installing a Spark cluster.
1 Set Up the Hadoop Cluster Environment
Spark depends on HDFS and Zookeeper at runtime. For the installation and configuration of these two components, refer to the following posts:
Hadoop 3.1.1 Cluster Setup Guide on Linux 7.6
https://www.cndba.cn/download/dave/6
Zookeeper Cluster Installation and Configuration
https://www.cndba.cn/dave/article/3295
2 Install the Spark Cluster
2.1 Download Spark
Download it directly from the official site (or a mirror):
http://spark.apache.org/downloads.html
http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
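If you prefer to download directly on the server, wget against the mirror URL above works just as well (a minimal sketch; note that older releases may eventually move to the Apache archive mirror):
[dave@www.cndba.cn ~]$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz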
2.2 Extract the Archive
[dave@www.cndba.cn ~]$ pwd
/home/hadoop
[dave@www.cndba.cn ~]$ ll spark-2.4.0-bin-hadoop2.7.tgz
-rw-r--r--. 1 hadoop hadoop 227893062 Mar 10 02:05 spark-2.4.0-bin-hadoop2.7.tgz
[dave@www.cndba.cn ~]$
[dave@www.cndba.cn ~]$ tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
[dave@www.cndba.cn ~]$ mv spark-2.4.0-bin-hadoop2.7 spark
2.3 Modify Environment Variables
Add the following to /etc/profile:
#Spark
export SPARK_HOME=/home/hadoop/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
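To make the new variables take effect in the current shell, source the profile and confirm that the Spark commands resolve (a quick sanity check, nothing more):
[dave@www.cndba.cn ~]$ source /etc/profile
[dave@www.cndba.cn ~]$ echo $SPARK_HOME
/home/hadoop/spark
[dave@www.cndba.cn ~]$ which spark-shell
/home/hadoop/spark/bin/spark-shell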
2.4 Modify the Spark Configuration File
Create spark-env.sh from its template:
[dave@www.cndba.cn conf]$ pwd
/home/hadoop/spark/conf
[dave@www.cndba.cn conf]$ ls
docker.properties.template log4j.properties.template slaves.template spark-env.sh.template
fairscheduler.xml.template metrics.properties.template spark-defaults.conf.template
[dave@www.cndba.cn conf]$ cp spark-env.sh.template spark-env.sh
[dave@www.cndba.cn conf]$
Then add the following to spark-env.sh:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
export SPARK_MASTER_IP=hadoopMaster
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1
Variable descriptions:
- JAVA_HOME: Java installation directory
- SCALA_HOME: Scala installation directory (not set above; only needed if you use a separately installed Scala, since the prebuilt package already bundles the Scala library)
- HADOOP_HOME: Hadoop installation directory
- HADOOP_CONF_DIR: directory containing the Hadoop cluster's configuration files
- SPARK_MASTER_IP: IP address (or hostname) of the Spark cluster's Master node
- SPARK_WORKER_MEMORY: maximum total memory each worker node may allocate to executors
- SPARK_WORKER_CORES: number of CPU cores each worker node makes available
- SPARK_WORKER_INSTANCES: number of worker instances started on each machine
2.5 Modify the slaves Configuration File
Edit the slaves configuration file and add the list of Worker hosts:
[dave@www.cndba.cn conf]$ cp slaves.template slaves
[dave@www.cndba.cn conf]$ cat slaves|grep -v "#"
hadoopMaster
Slave1
Slave2
[dave@www.cndba.cn conf]$
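start-slaves.sh (and the start-all.sh script renamed below) logs in to every host in this file over SSH to start a Worker, so the hadoop user on the master needs passwordless SSH to each of them. This is usually already in place from the Hadoop setup; if not, a minimal sketch:
[dave@www.cndba.cn ~]$ ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa    # skip if a key already exists
[dave@www.cndba.cn ~]$ ssh-copy-id hadoop@Slave1
[dave@www.cndba.cn ~]$ ssh-copy-id hadoop@Slave2
[dave@www.cndba.cn ~]$ ssh-copy-id hadoop@hadoopMaster             # the master also runs a Worker in this layout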
2.6 重命名启动脚本
把SPARK_HOME/sbin下的start-all.sh和stop-all.sh这两个文件重命名,比如分别把这两个文件重命名为start-spark-all.sh和stop-spark-all.sh。
如果集群中也配置HADOOP_HOME,那么在HADOOP_HOME/sbin目录下也有start-all.sh和stop-all.sh这两个文件,当你执行这两个文件,系统不知道是操作hadoop集群还是spark集群。修改后就不会冲突。
[dave@www.cndba.cn sbin]$ pwd
/home/hadoop/spark/sbin
[dave@www.cndba.cn sbin]$ ls
slaves.sh start-all.sh start-mesos-shuffle-service.sh start-thriftserver.sh stop-mesos-dispatcher.sh stop-slaves.sh
spark-config.sh start-history-server.sh start-shuffle-service.sh stop-all.sh stop-mesos-shuffle-service.sh stop-thriftserver.sh
spark-daemon.sh start-master.sh start-slave.sh stop-history-server.sh stop-shuffle-service.sh
spark-daemons.sh start-mesos-dispatcher.sh start-slaves.sh stop-master.sh stop-slave.sh
[dave@www.cndba.cn sbin]$ mv start-all.sh start-spark-all.sh
[dave@www.cndba.cn sbin]$ mv stop-all.sh stop-spark-all.sh
[dave@www.cndba.cn sbin]$
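Assuming $HADOOP_HOME/sbin is also on the PATH (as in the Hadoop setup referenced above), a quick check confirms which script a bare name now resolves to:
[dave@www.cndba.cn sbin]$ which start-all.sh         # should now find only Hadoop's script
[dave@www.cndba.cn sbin]$ which start-spark-all.sh   # the renamed Spark script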
2.7 Distribute to the Other Nodes
[dave@www.cndba.cn ~]$ scp -r spark Slave1:`pwd`
[dave@www.cndba.cn ~]$ scp -r spark Slave2:`pwd`
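Optionally, confirm the copy reached each worker. Also remember that the /etc/profile change from section 2.3 has to be made on Slave1 and Slave2 as well if you want to run Spark commands on those nodes directly:
[dave@www.cndba.cn ~]$ ssh Slave1 "ls /home/hadoop/spark/conf/spark-env.sh"
[dave@www.cndba.cn ~]$ ssh Slave2 "ls /home/hadoop/spark/conf/spark-env.sh"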
2.8 Start Spark on the Master Node
[dave@www.cndba.cn ~]$ start-spark-all.sh
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoopMaster.out
Slave1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-Slave1.out
Slave2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-Slave2.out
hadoopMaster: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoopMaster.out
[dave@www.cndba.cn ~]$
[dave@www.cndba.cn ~]$ jps
13970 QuorumPeerMain
23522 NameNode
23988 ResourceManager
30709 Worker
23757 SecondaryNameNode
30622 Master
30766 Jps
[dave@www.cndba.cn ~]$
[root@Slave1 ~]# jps
25430 Jps
13800 QuorumPeerMain
18232 DataNode
25369 Worker
19133 NodeManager
[root@Slave1 ~]#
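Besides jps, the standalone Master also serves a web UI (port 8080 by default) where all registered Workers should be listed; a quick check from the command line, assuming curl is available:
[dave@www.cndba.cn ~]$ curl -s http://hadoopMaster:8080 | grep -i worker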
2.9 Verification
[dave@www.cndba.cn ~]$ spark-shell
2019-03-10 02:39:29 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoopMaster:4040
Spark context available as 'sc' (master = local[*], app id = local-1552156778215).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/
Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.
scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line> edit history
:help [command] print this summary or command-specific help
:history [num] show the history (optional num is commands to show)
:h? <string> search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v] show the implicits in scope
:javap <path|class> disassemble a file or class name
:line <id>|<line> place line(s) at the end of history
:load <path> interpret lines in a file
:paste [-raw] [path] enter paste mode or paste a file
:power enable power user mode
:quit exit the interpreter
:replay [options] reset the repl and replay all previous commands
:require <path> add a jar to the classpath
:reset [options] reset the repl to its initial state, forgetting all session entries
:save <path> save replayable session to a file
:sh <command line> run a shell command (result is implicitly => List[String])
:settings <options> update compiler options, if possible; see reset
:silent disable/enable automatic printing of results
:type [-v] <expr> display the type of an expression without evaluating it
:kind [-v] <expr> display the kind of expression's type
:warnings show the suppressed warnings from the most recent line which had any
scala>
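Note that the log above shows master = local[*]: this spark-shell ran in local mode and did not actually use the standalone cluster. To exercise the cluster itself, point the shell (or spark-submit) at the Master's standalone port (7077 by default); a minimal sketch:
[dave@www.cndba.cn ~]$ spark-shell --master spark://hadoopMaster:7077
scala> sc.parallelize(1 to 1000).sum()    // should return 500500.0, computed on the cluster's executors
scala> :quit
[dave@www.cndba.cn ~]$ spark-submit --master spark://hadoopMaster:7077 --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 10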