Spark 2.4 Cluster Installation Guide

2019-03-10 02:43 · Original · Spark
Author: dave

In the previous article we covered the basic concepts of Spark:
Spark Basic Architecture and Principles
https://www.cndba.cn/dave/article/3340

In this article we walk through installing a Spark cluster.

1 Set Up the Hadoop Cluster Environment

Spark depends on HDFS and ZooKeeper at runtime. For the installation and configuration of these two components, see the following posts:

Hadoop 3.1.1 Cluster Setup Guide on Linux 7.6
https://www.cndba.cn/download/dave/6

ZooKeeper Cluster Installation and Configuration
https://www.cndba.cn/dave/article/3295

2 Install the Spark Cluster

2.1 Download Spark

Download directly from the official site:

http://spark.apache.org/downloads.html
http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz
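For example, the mirror tarball can be fetched from the command line (assuming wget is available on the host):

[dave@www.cndba.cn ~]$ wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz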

2.2 Extract the Archive

[dave@www.cndba.cn ~]$ pwd
/home/hadoop
[dave@www.cndba.cn ~]$ ll spark-2.4.0-bin-hadoop2.7.tgz 
-rw-r--r--. 1 hadoop hadoop 227893062 Mar 10 02:05 spark-2.4.0-bin-hadoop2.7.tgz
[dave@www.cndba.cn ~]$
[dave@www.cndba.cn ~]$ tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz
[dave@www.cndba.cn ~]$ mv spark-2.4.0-bin-hadoop2.7 spark

2.3 Update Environment Variables

Add the following to /etc/profile:

#Spark
export SPARK_HOME=/home/hadoop/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
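To apply the new variables in the current session, source the profile and confirm the path resolves (a routine follow-up step, assuming a bash shell; it was not shown in the original transcript):

[dave@www.cndba.cn ~]$ source /etc/profile
[dave@www.cndba.cn ~]$ echo $SPARK_HOME
/home/hadoop/spark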

2.4 Edit the Spark Configuration Files

Create spark-env.sh from its template:

[dave@www.cndba.cn conf]$ pwd
/home/hadoop/spark/conf
[dave@www.cndba.cn conf]$ ls
docker.properties.template  log4j.properties.template    slaves.template               spark-env.sh.template
fairscheduler.xml.template  metrics.properties.template  spark-defaults.conf.template
[dave@www.cndba.cn conf]$ cp spark-env.sh.template spark-env.sh
[dave@www.cndba.cn conf]$

Then add the following to spark-env.sh:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-7.b13.el7.x86_64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
export SPARK_MASTER_IP=hadoopMaster
export SPARK_WORKER_MEMORY=4g
export SPARK_WORKER_CORES=2
export SPARK_WORKER_INSTANCES=1

Variable descriptions:

  • JAVA_HOME: Java installation directory
  • SCALA_HOME: Scala installation directory
  • HADOOP_HOME: Hadoop installation directory
  • HADOOP_CONF_DIR: directory holding the Hadoop cluster's configuration files
  • SPARK_MASTER_IP: IP address (or resolvable hostname) of the Spark cluster's Master node
  • SPARK_WORKER_MEMORY: maximum total memory each worker node may allocate to executors
  • SPARK_WORKER_CORES: number of CPU cores each worker node makes available
  • SPARK_WORKER_INSTANCES: number of worker instances started on each machine
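spark-env.sh accepts further standalone-mode settings as well. For example, the Master's service port and web UI port can be pinned explicitly; the values below are Spark's documented defaults, shown only for illustration and not part of this setup:

export SPARK_MASTER_PORT=7077        # port the Master listens on (default 7077)
export SPARK_MASTER_WEBUI_PORT=8080  # Master web UI port (default 8080)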

2.5 Edit the slaves Configuration File

Edit the slaves configuration file and add the list of Worker hosts:

[dave@www.cndba.cn conf]$ cp slaves.template slaves
[dave@www.cndba.cn conf]$ cat slaves|grep -v "#"

hadoopMaster
Slave1
Slave2
[dave@www.cndba.cn conf]$
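Each hostname in slaves must resolve to the correct address on every node. If DNS is not available, the usual approach is matching /etc/hosts entries on all three machines; the IP addresses below are placeholders:

192.168.56.101 hadoopMaster
192.168.56.102 Slave1
192.168.56.103 Slave2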

2.6 Rename the Startup Scripts

Rename the start-all.sh and stop-all.sh scripts under $SPARK_HOME/sbin, for example to start-spark-all.sh and stop-spark-all.sh.
If HADOOP_HOME is also configured on the cluster, $HADOOP_HOME/sbin contains its own start-all.sh and stop-all.sh; with both directories on the PATH, running either script makes it ambiguous whether you are operating the Hadoop cluster or the Spark cluster. Renaming the Spark copies removes the conflict.

[dave@www.cndba.cn sbin]$ pwd
/home/hadoop/spark/sbin
[dave@www.cndba.cn sbin]$ ls
slaves.sh         start-all.sh               start-mesos-shuffle-service.sh  start-thriftserver.sh   stop-mesos-dispatcher.sh       stop-slaves.sh
spark-config.sh   start-history-server.sh    start-shuffle-service.sh        stop-all.sh             stop-mesos-shuffle-service.sh  stop-thriftserver.sh
spark-daemon.sh   start-master.sh            start-slave.sh                  stop-history-server.sh  stop-shuffle-service.sh
spark-daemons.sh  start-mesos-dispatcher.sh  start-slaves.sh                 stop-master.sh          stop-slave.sh
[dave@www.cndba.cn sbin]$ mv start-all.sh start-spark-all.sh
[dave@www.cndba.cn sbin]$ mv stop-all.sh stop-spark-all.sh
[dave@www.cndba.cn sbin]$

2.7 Distribute to the Other Nodes

[dave@www.cndba.cn ~]$ scp -r spark Slave1:`pwd`
[dave@www.cndba.cn ~]$ scp -r spark Slave2:`pwd`
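The /etc/profile changes from section 2.3 must also be in place on the worker nodes. One blunt but common approach is to copy the file over, assuming the master's profile is otherwise suitable for the slaves (root access assumed):

[dave@www.cndba.cn ~]$ scp /etc/profile root@Slave1:/etc/profile
[dave@www.cndba.cn ~]$ scp /etc/profile root@Slave2:/etc/profile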

2.8 Start Spark on the Master Node

[dave@www.cndba.cn ~]$ start-spark-all.sh 
starting org.apache.spark.deploy.master.Master, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-hadoopMaster.out
Slave1: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-Slave1.out
Slave2: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-Slave2.out
hadoopMaster: starting org.apache.spark.deploy.worker.Worker, logging to /home/hadoop/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-hadoopMaster.out
[dave@www.cndba.cn ~]$

[dave@www.cndba.cn ~]$ jps
13970 QuorumPeerMain
23522 NameNode
23988 ResourceManager
30709 Worker
23757 SecondaryNameNode
30622 Master
30766 Jps
[dave@www.cndba.cn ~]$

[root@Slave1 ~]# jps
25430 Jps
13800 QuorumPeerMain
18232 DataNode
25369 Worker
19133 NodeManager
[root@Slave1 ~]#
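The Master also serves a status page on its web UI, port 8080 by default in standalone mode; the registered Workers should appear there:

http://hadoopMaster:8080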

2.9 Verification

[dave@www.cndba.cn ~]$ spark-shell 
2019-03-10 02:39:29 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://hadoopMaster:4040
Spark context available as 'sc' (master = local[*], app id = local-1552156778215).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any

scala>
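Note that the transcript above reports master = local[*], meaning this shell ran in local mode rather than against the cluster. To exercise the standalone cluster itself, point the shell at the Master explicitly and run a small smoke test (the master URL follows from the configuration above, 7077 being the default Master port; the sum job is purely illustrative):

[dave@www.cndba.cn ~]$ spark-shell --master spark://hadoopMaster:7077

scala> sc.parallelize(1 to 100).sum()
res0: Double = 5050.0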

Copyright notice: this is an original post by the author and may not be reproduced without the author's permission.
