
Sqoop: Loading Data from HDFS into MySQL

2019-03-03 02:27 | Original | Sqoop
Author: dave

In the previous two posts we looked at using Sqoop to import data from MySQL into HDFS files. Here we look at how to load data from an HDFS file back into a MySQL database.

Sqoop: Importing Data from MySQL into HDFS
https://www.cndba.cn/dave/article/3305
Sqoop: Exporting All Tables in a Database to HDFS
https://www.cndba.cn/dave/article/3306

1 Viewing the Data in the HDFS File

For convenience, we reuse the test data generated in the earlier posts:

[dave@www.cndba.cn data]$ hdfs dfs -cat /user/hadoop/employees/part-m-00000|head
10001,1953-09-02,Georgi,Facello,M,1986-06-26
10002,1964-06-02,Bezalel,Simmel,F,1985-11-21
10003,1959-12-03,Parto,Bamford,M,1986-08-28
10004,1954-05-01,Chirstian,Koblick,M,1986-12-01
10005,1955-01-21,Kyoichi,Maliniak,M,1989-09-12
10006,1953-04-20,Anneke,Preusig,F,1989-06-02
10007,1957-05-23,Tzvetan,Zielinski,F,1989-02-10
10008,1958-02-19,Saniya,Kalloufi,M,1994-09-15
10009,1952-04-19,Sumant,Peac,F,1985-02-18
10010,1963-06-01,Duangkaew,Piveteau,F,1989-08-24
cat: Unable to write to output stream.
[dave@www.cndba.cn data]$ hdfs dfs -cat /user/hadoop/employees/part-m-00000|wc -l
300024
[dave@www.cndba.cn data]$

The "cat: Unable to write to output stream." message above is harmless; it only means head closed the pipe early. We will now load the data in /user/hadoop/employees/part-m-00000 into a MySQL database.
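Note that sqoop export parses the files under the export directory as delimited text, with a comma as the default field separator, which matches the data shown above. If the files had been written with a different delimiter, it would need to be declared explicitly. A minimal sketch, assuming hypothetical tab-separated input (connection details as used throughout this post):

sqoop export \
  --connect jdbc:mysql://192.168.56.2:3306/employees \
  --username root -P \
  --table dave \
  --export-dir /user/hadoop/employees \
  --input-fields-terminated-by '\t'   # only needed when the input is not comma-separated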


2 Loading the Data into MySQL

2.1 Create the Table

In a real scenario the target table must be created to match the field types of the incoming data; here we simply copy the structure of the original employees table:

CREATE TABLE `dave` (
  `emp_no` int(11) NOT NULL,
  `birth_date` date NOT NULL,
  `first_name` varchar(14) NOT NULL,
  `last_name` varchar(16) NOT NULL,
  `gender` enum('M','F') NOT NULL,
  `hire_date` date NOT NULL,
  PRIMARY KEY (`emp_no`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
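Since the HDFS data was originally exported from the employees table in this same database, the structure could also be cloned directly instead of pasting the DDL. This assumes the source employees table exists on the target server:

mysql> CREATE TABLE dave LIKE employees;   -- copies columns, types, indexes, and the primary key; no rows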

2.2 Load the Data

[dave@www.cndba.cn data]$  sqoop export --connect jdbc:mysql://192.168.56.2:3306/employees --username root --table dave  --export-dir /user/hadoop/employees -P
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
2019-03-03 02:23:14,354 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
2019-03-03 02:23:16,203 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2019-03-03 02:23:16,222 INFO tool.CodeGenTool: Beginning code generation
Sun Mar 03 02:23:16 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2019-03-03 02:23:16,721 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dave` AS t LIMIT 1
2019-03-03 02:23:16,768 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `dave` AS t LIMIT 1
2019-03-03 02:23:16,786 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/fd09937f2198b625a8a66c3104f75e94/dave.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-03-03 02:23:19,013 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/fd09937f2198b625a8a66c3104f75e94/dave.jar
2019-03-03 02:23:19,034 INFO mapreduce.ExportJobBase: Beginning export of dave
2019-03-03 02:23:19,035 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2019-03-03 02:23:19,292 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2019-03-03 02:23:20,614 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
2019-03-03 02:23:20,618 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2019-03-03 02:23:20,618 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2019-03-03 02:23:20,782 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.56.100:8032
2019-03-03 02:23:21,273 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1551513927152_0010
2019-03-03 02:23:25,675 INFO input.FileInputFormat: Total input files to process : 1
2019-03-03 02:23:25,681 INFO input.FileInputFormat: Total input files to process : 1
2019-03-03 02:23:25,807 INFO mapreduce.JobSubmitter: number of splits:4
2019-03-03 02:23:25,856 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
2019-03-03 02:23:25,856 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-03-03 02:23:26,110 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1551513927152_0010
2019-03-03 02:23:26,112 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-03-03 02:23:26,451 INFO conf.Configuration: resource-types.xml not found
2019-03-03 02:23:26,451 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-03-03 02:23:26,536 INFO impl.YarnClientImpl: Submitted application application_1551513927152_0010
2019-03-03 02:23:26,585 INFO mapreduce.Job: The url to track the job: http://Master:8088/proxy/application_1551513927152_0010/
2019-03-03 02:23:26,585 INFO mapreduce.Job: Running job: job_1551513927152_0010
2019-03-03 02:23:36,783 INFO mapreduce.Job: Job job_1551513927152_0010 running in uber mode : false
2019-03-03 02:23:36,784 INFO mapreduce.Job:  map 0% reduce 0%
2019-03-03 02:24:07,055 INFO mapreduce.Job:  map 29% reduce 0%
2019-03-03 02:24:08,065 INFO mapreduce.Job:  map 37% reduce 0%
2019-03-03 02:24:13,104 INFO mapreduce.Job:  map 61% reduce 0%
2019-03-03 02:24:14,110 INFO mapreduce.Job:  map 82% reduce 0%
2019-03-03 02:24:15,152 INFO mapreduce.Job:  map 85% reduce 0%
2019-03-03 02:24:16,174 INFO mapreduce.Job:  map 93% reduce 0%
2019-03-03 02:24:17,184 INFO mapreduce.Job:  map 100% reduce 0%
2019-03-03 02:24:17,202 INFO mapreduce.Job: Job job_1551513927152_0010 completed successfully
2019-03-03 02:24:17,302 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=920724
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=13838998
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=19
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        HDFS: Number of bytes read erasure-coded=0
    Job Counters 
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=148315
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=148315
        Total vcore-milliseconds taken by all map tasks=148315
        Total megabyte-milliseconds taken by all map tasks=151874560
    Map-Reduce Framework
        Map input records=300024
        Map output records=300024
        Input split bytes=606
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=1115
        CPU time spent (ms)=20710
        Physical memory (bytes) snapshot=645414912
        Virtual memory (bytes) snapshot=11055927296
        Total committed heap usage (bytes)=486801408
        Peak Map Physical memory (bytes)=165249024
        Peak Map Virtual memory (bytes)=2770202624
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
2019-03-03 02:24:17,313 INFO mapreduce.ExportJobBase: Transferred 13.1979 MB in 56.6817 seconds (238.4305 KB/sec)
2019-03-03 02:24:17,318 INFO mapreduce.ExportJobBase: Exported 300024 records.
[dave@www.cndba.cn data]$
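Two small refinements are worth noting. The SSL warning in the log can be silenced by making the choice explicit in the JDBC URL, exactly as the driver message suggests, and the interactive -P prompt can be replaced by --password-file for unattended runs. A sketch, where /user/hadoop/.mysql-pass is a hypothetical HDFS path (the file should contain only the password and be readable only by its owner):

sqoop export \
  --connect 'jdbc:mysql://192.168.56.2:3306/employees?useSSL=false' \
  --username root \
  --password-file /user/hadoop/.mysql-pass \
  --table dave \
  --export-dir /user/hadoop/employees
# /user/hadoop/.mysql-pass is an assumed path; create the file first and restrict its permissions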

Verify the data:

mysql> use employees
Database changed
mysql> select count(*) from dave;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.05 sec)
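This matches the 300024 lines counted in HDFS earlier. For a quick one-off comparison of both sides from the shell (same paths and credentials as above):

hdfs dfs -cat /user/hadoop/employees/part-m-* | wc -l
mysql -h 192.168.56.2 -uroot -p -e 'SELECT COUNT(*) FROM employees.dave;'

One caveat: sqoop export only appends rows, so re-running the same export against a table that already contains the data will fail on duplicate primary keys; --update-key emp_no (optionally with --update-mode allowinsert) turns the export into updates or upserts instead.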

Copyright notice: This is an original post by the author; reproduction without the author's permission is prohibited.
