签到成功

知道了

CNDBA社区CNDBA社区

TiSpark 混合部署以及使用

2019-02-27 17:30 6036 0 原创 TiDB
作者: Marvinn

TiSpark 混合部署

其他部署方案参考: https://pingcap.com/docs-cn/tispark/tispark-user-guide/

因为tidb-ansible 部署安装TiDB集群 inventoy.ini并未指定spark配置,所以TiDB集群环境并未安装Spark集群,当前采用

[spark_master]

[spark_slaves]

1、已有 Spark 集群的部署方式

如果在已有 Spark 集群上运行 TiSpark,无需重启集群。可以使用 Spark 的 --jars 参数将 TiSpark 作为依赖引入:

spark-shell --jars $TISPARK_FOLDER/tispark-core-${version}-SNAPSHOT-jar-with-dependencies.jar

2、没有 Spark 集群的部署方式

如果没有使用中的 Spark 集群,推荐使用 Saprk Standalone 方式部署。这里简单介绍下 Standalone 部署方式。如果遇到问题,可以去官网寻求帮助;也欢迎在我们的 GitHub 上提 issue

2.1、下载安装包并安装

你可以在这里下载 Apache Spark https://www.apache.org/dyn/closer.lua/spark/spark-2.3.3/spark-2.3.3-bin-hadoop2.7.tgz。 注意spark版本说明

对于 Standalone 模式且无需 Hadoop 支持,则选择 Spark 2.3.x 且带有 Hadoop 依赖的 Pre-build with Apache Hadoop 2.x 任意版本。如有需要配合使用的 Hadoop 集群,则选择对应的 Hadoop 版本号。你也可以选择从源代码自行构建以配合官方 Hadoop 2.x 之前的版本。

http://www.cndba.cn/Marvinn/article/3283

tidb用户
$ mkdir -p ${deploy_dir}/spark && cd ${deploy_dir}/spark
$ export $SPARKPATH=${deploy_dir}/spark/spark-2.3.3-bin-hadoop2.7
$ tar -xvf spark-2.3.3-bin-hadoop2.7.tgz

如果你已经有了 Spark 二进制文件,并且当前 PATH 为 SPARKPATH,需将 TiSpark jar 包拷贝到 `${SPARKPATH}/jars` 目录下。
$ cd ${tidb-ansible}/resource/bin
$ cp tispark-SNAPSHOT-jar-with-dependencies.jar ${deploy_dir}/spark/spark-2.3.3-bin-hadoop2.7/jars/

3、启动 Master

在选中的 Spark Master 节点执行如下命令:

$ cd $SPARKPATH
$ ./sbin/start-master.sh

在这步完成以后,屏幕上会打印出一个 log 文件。检查 log 文件确认 Spark-Master 是否启动成功。你可以打开 http://spark-master-hostname:8080 查看集群信息(如果你没有改动 Spark-Master 默认 Port Numebr)。在启动 Spark-Slave 的时候,也可以通过这个面板来确认 Slave 是否已经加入集群。http://www.cndba.cn/Marvinn/article/3283

http://www.cndba.cn/Marvinn/article/3283
http://www.cndba.cn/Marvinn/article/3283

4、启动 Slave

类似地,可以用如下命令启动 Spark-Slave 节点:指定spark-master-hostname 主机IP

$ cd $SPARKPATH
$ ./sbin/start-slave.sh spark://spark-master-hostname:7077

命令返回以后,即可通过刚才的面板查看这个 Slave 是否已经正确地加入了 Spark 集群。在所有 Slave 节点重复刚才的命令。确认所有的 Slave 都可以正确连接 Master,这样你就拥有了一个 Standalone 模式的 Spark 集群。

5、样例查询http://www.cndba.cn/Marvinn/article/3283

5.1、部署信息

Spark 部署在 TiDB 实例部署目录下 spark 目录中

TiSpark jar 包默认部署在 Spark 部署目录 jars 文件夹下:
spark/spark-2.3.3-bin-hadoop2.7/jars/tispark-SNAPSHOT-jar-with-dependencies.jar

TiSpark sample data 及导入脚本默认部署在 TiDB-Ansible 目录下:
tidb-ansible/resources/bin/tispark-sample-data

5.2、环境准备http://www.cndba.cn/Marvinn/article/3283

在 TiDB 实例上安装 JDKhttp://www.cndba.cn/Marvinn/article/3283

Oracle JDK 官方下载页面 下载 JDK 1.8 当前最新版,本示例中下载的版本为 jdk-8u141-linux-x64.tar.gzhttps://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html

解压并根据您的 JDK 部署目录设置环境变量, 编辑 ~/.bashrc 文件,比如:http://www.cndba.cn/Marvinn/article/3283

export JAVA_HOME=/home/pingcap/jdk1.8.0_144
export PATH=$JAVA_HOME/bin:$PATH

验证 JDK 有效性:

$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)

6、导入样例数据

http://www.cndba.cn/Marvinn/article/3283

假设 TiDB 集群已启动,其中一台 TiDB 实例服务 IP 为 192.168.0.2,端口为 4000,用户名为 root, 密码为空。

cd tidb-ansible/resources/bin/tispark-sample-data

修改 sample_data.sh 中 TiDB 登录信息,比如:

mysql -h 192.168.0.2 -P 4000 -u root < dss.ddl

执行脚本

./sample_data.sh

执行脚本的机器上需要安装 MySQL client,CentOS 用户可通过 yum -y install mysql来安装。

登录 TiDB 并验证数据包含 TPCH_001 库及以下表:

$ mysql -uroot -P4000 -h192.168.0.2
MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| PERFORMANCE_SCHEMA |
| TPCH_001           |
| mysql              |
| test               |
+--------------------+
5 rows in set (0.00 sec)

MySQL [(none)]> use TPCH_001
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [TPCH_001]> show tables;
+--------------------+
| Tables_in_TPCH_001 |
+--------------------+
| CUSTOMER           |
| LINEITEM           |
| NATION             |
| ORDERS             |
| PART               |
| PARTSUPP           |
| REGION             |
| SUPPLIER           |
+--------------------+
8 rows in set (0.00 sec)

7、样例使用

假设你已经按照上述步骤成功启动了 TiSpark 集群,下面简单介绍如何使用 Spark SQL 来做 OLAP 分析。这里我们用名为 tpch 数据库中的 lineitem 表作为范例。http://www.cndba.cn/Marvinn/article/3283

假设你的 PD 节点位于 172.16.30.87,端口为 2479,在$SPARKPATH/conf/spark-defaults.conf加入

$ cd  ${deploy_dir}/spark/spark-2.3.3-bin-hadoop2.7/conf
$ cp spark-defaults.conf.template spark-defaults.conf
$ vi spark-defaults.conf

spark.tispark.pd.addresses 172.16.30.87:2479,172.16.30.88:2479,172.16.30.89:2479
spark.sql.extensions org.apache.spark.sql.TiExtensions

编辑spark环境变量

$ cd  ${deploy_dir}/spark/spark-2.3.3-bin-hadoop2.7/conf
$ cp spark-env.sh.template spark-env.sh
$ mv spark-env.sh.template spark-env.sh.template.bak

$ vi spark-env.sh               加入如下几行:
export SPARK_MASTER_IP=172.16.30.86
export SPARK_LOCAL_IP=172.16.30.86
export JAVA_HOME=/usr/java/jre1.8.0_181-amd64

进入 spark 部署目录启动 spark-shell:

$ cd ${deploy_dir}/spark/spark-2.3.3-bin-hadoop2.7/
[tidb@ip-172-16-30-86 spark-2.3.3-bin-hadoop2.7]$ bin/spark-shell 
2019-02-27 17:00:56 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://172.16.30.86:4040
Spark context available as 'sc' (master = spark://172.16.30.86:7077, app id = app-20190227170101-0006).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _/ // _ // _ `/ __/  '_/
   /___/ .__//_,_/_/ /_//_/   version 2.3.3
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

然后像使用原生 Spark 一样查询 TiDB 表:

scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
|      marvin|
|       dminc|
|     dmmysql|
|        test|
|    tpch_001|
|        tidb|
|     dm_meta|
|      dmtidb|
|   bikeshare|
|       mysql|
|        bike|
+------------+


scala> spark.sql("use tpch_001").show()
2019-02-27 17:04:59 WARN  ObjectStore:568 - Failed to get database tpch_001, returning NoSuchObjectException
++
||
++
++


scala> spark.sql("select count(*) from lineitem").show
2019-02-27 17:05:02 WARN  ObjectStore:568 - Failed to get database tpch_001, returning NoSuchObjectException
2019-02-27 17:05:02 WARN  ObjectStore:568 - Failed to get database tpch_001, returning NoSuchObjectException
+--------+
|count(1)|
+--------+
|   60175|
+--------+

复杂查询

scala> spark.sql("""select
     |            l_returnflag,
     |            l_linestatus,
     |            sum(l_quantity) as sum_qty,
     |            sum(l_extendedprice) as sum_base_price,
     |            sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
     |            sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
     |            avg(l_quantity) as avg_qty,
     |            avg(l_extendedprice) as avg_price,
     |            avg(l_discount) as avg_disc,
     |            count(*) as count_order
     |         from
     |            lineitem
     |         where
     |            l_shipdate <= date '1998-12-01' - interval '90' day
     |         group by
     |            l_returnflag,
     |            l_linestatus
     |         order by
     |            l_returnflag,
     |            l_linestatus""".stripMargin).show
2019-02-27 17:10:50 WARN  ObjectStore:568 - Failed to get database tpch_001, returning NoSuchObjectException
2019-02-27 17:10:50 WARN  ObjectStore:568 - Failed to get database tpch_001, returning NoSuchObjectException
+------------+------------+---------+--------------+--------------+-----------------+---------+------------+--------+-----------+
|l_returnflag|l_linestatus|  sum_qty|sum_base_price|sum_disc_price|       sum_charge|  avg_qty|   avg_price|avg_disc|count_order|
+------------+------------+---------+--------------+--------------+-----------------+---------+------------+--------+-----------+
|           A|           F|380456.00|  532348211.65|505822441.4861| 526165934.000839|25.575155|35785.709307|0.050081|      14876|
|           N|           F|  8971.00|   12384801.37| 11798257.2080|  12282485.056933|25.778736|35588.509684|0.047759|        348|
|           N|           O|742802.00| 1041502841.45|989737518.6346|1029418531.523350|25.454988|35691.129209|0.049931|      29181|
|           R|           F|381449.00|  534594445.35|507996454.4067| 528524219.358903|25.597168|35874.006533|0.049828|      14902|
+------------+------------+---------+--------------+--------------+-----------------+---------+------------+--------+-----------+

TiDB复杂查询

mysql> select
    ->            l_returnflag,
    ->            l_linestatus,
    ->            sum(l_quantity) as sum_qty,
    ->            sum(l_extendedprice) as sum_base_price,
    ->            sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    ->            sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    ->            avg(l_quantity) as avg_qty,
    ->            avg(l_extendedprice) as avg_price,
    ->            avg(l_discount) as avg_disc,
    ->            count(*) as count_order
    ->         from
    ->            lineitem
    ->         where
    ->            l_shipdate <= date '1998-12-01' - interval '90' day
    ->         group by
    ->            l_returnflag,
    ->            l_linestatus
    ->         order by
    ->            l_returnflag,
    ->            l_linestatus
    -> ;
+--------------+--------------+-----------+----------------+----------------+-------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty   | sum_base_price | sum_disc_price | sum_charge        | avg_qty   | avg_price    | avg_disc | count_order |
+--------------+--------------+-----------+----------------+----------------+-------------------+-----------+--------------+----------+-------------+
| A            | F            | 380456.00 |   532348211.65 | 505822441.4861 |  526165934.000839 | 25.575155 | 35785.709307 | 0.050081 |       14876 |
| N            | F            |   8971.00 |    12384801.37 |  11798257.2080 |   12282485.056933 | 25.778736 | 35588.509684 | 0.047759 |         348 |
| N            | O            | 742802.00 |  1041502841.45 | 989737518.6346 | 1029418531.523350 | 25.454988 | 35691.129209 | 0.049931 |       29181 |
| R            | F            | 381449.00 |   534594445.35 | 507996454.4067 |  528524219.358903 | 25.597168 | 35874.006533 | 0.049828 |       14902 |
+--------------+--------------+-----------+----------------+----------------+-------------------+-----------+--------------+----------+-------------+
4 rows in set (0.28 sec)

版权声明:本文为博主原创文章,未经博主允许不得转载。

用户评论
* 以下用户言论只代表其个人观点,不代表CNDBA社区的观点或立场
Marvinn

Marvinn

关注

路漫漫其修远兮、吾将上下而求索

  • 99
    原创
  • 0
    翻译
  • 2
    转载
  • 36
    评论
  • 访问:458381次
  • 积分:449
  • 等级:中级会员
  • 排名:第12名
精华文章
    最新问题
    查看更多+
    热门文章
      热门用户
      推荐用户
        Copyright © 2016 All Rights Reserved. Powered by CNDBA · 皖ICP备2022006297号-1·

        QQ交流群

        注册联系QQ