签到成功

知道了

CNDBA社区CNDBA社区

Sqoop 导出库中所有表到 HDFS

2019-03-03 02:09 1892 0 原创 Sqoop
作者: dave

在上一节我们看了sqoop对单表的操作,如下:

http://www.cndba.cn/dave/article/3306
http://www.cndba.cn/dave/article/3306

Sqoop MySQL 导入数据到 HDFS
https://www.cndba.cn/dave/article/3305http://www.cndba.cn/dave/article/3306

这里我们看下sqoop导出整个数据库中的表到HDFS。http://www.cndba.cn/dave/article/3306http://www.cndba.cn/dave/article/3306

1 库中表信息

mysql> use employees
Database changed
mysql> show tables
    -> ;
+----------------------+
| Tables_in_employees  |
+----------------------+
| current_dept_emp     |
| departments          |
| dept_emp             |
| dept_emp_latest_date |
| dept_manager         |
| employees            |
| salaries             |
| titles               |
+----------------------+
8 rows in set (0.00 sec)

2 导入全库表

这里有两种语法,第一种是使用之前单表操作的import,第二种是使用import-all-tables,它是强制性的,在该数据库中的每个表必须有一个主键字段。http://www.cndba.cn/dave/article/3306

如果没有主键,在导入时会报如下错误:http://www.cndba.cn/dave/article/3306

http://www.cndba.cn/dave/article/3306
http://www.cndba.cn/dave/article/3306

2019-03-03 01:52:59,211 ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table current_dept_emp. Please specify one with --split-by or perform a sequential import with '-m 1'.

不过可以通过设置-m 1来解决这个错误。http://www.cndba.cn/dave/article/3306

[dave@www.cndba.cn ~]$ sqoop import-all-tables --connect jdbc:mysql://192.168.56.2:3306/employees --username root  --m 1  -P
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
2019-03-03 01:54:19,807 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password: 
2019-03-03 01:54:22,280 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Sun Mar 03 01:54:22 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2019-03-03 01:54:22,782 INFO tool.CodeGenTool: Beginning code generation
2019-03-03 01:54:22,804 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
2019-03-03 01:54:23,293 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
2019-03-03 01:54:23,747 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-03-03 01:54:26,063 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/current_dept_emp.jar
2019-03-03 01:54:26,080 WARN manager.MySQLManager: It looks like you are importing from mysql.
2019-03-03 01:54:26,080 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2019-03-03 01:54:26,080 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2019-03-03 01:54:26,080 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2019-03-03 01:54:26,091 INFO mapreduce.ImportJobBase: Beginning import of current_dept_emp
…
2019-03-03 01:56:35,784 INFO mapreduce.Job: Counters: 33
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=230433
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=816
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
        HDFS: Number of bytes read erasure-coded=0
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5127
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=5127
        Total vcore-milliseconds taken by all map tasks=5127
        Total megabyte-milliseconds taken by all map tasks=5250048
    Map-Reduce Framework
        Map input records=24
        Map output records=24
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=65
        CPU time spent (ms)=1070
        Physical memory (bytes) snapshot=144064512
        Virtual memory (bytes) snapshot=2763091968
        Total committed heap usage (bytes)=121700352
        Peak Map Physical memory (bytes)=144064512
        Peak Map Virtual memory (bytes)=2763091968
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=816
2019-03-03 01:56:35,793 INFO mapreduce.ImportJobBase: Transferred 816 bytes in 22.5143 seconds (36.2437 bytes/sec)
2019-03-03 01:56:35,807 INFO mapreduce.ImportJobBase: Retrieved 24 records.
2019-03-03 01:56:35,807 INFO tool.CodeGenTool: Beginning code generation
Sun Mar 03 01:56:35 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2019-03-03 01:56:35,832 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
2019-03-03 01:56:35,839 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/employees.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-03-03 01:56:36,358 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/employees.jar
2019-03-03 01:56:36,391 INFO mapreduce.ImportJobBase: Beginning import of employees
2019-03-03 01:56:36,448 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.56.100:8032
2019-03-03 01:56:36,485 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://Master:9000/user/hadoop/employees already exists

验证数据:

[dave@www.cndba.cn data]$ hdfs dfs -ls -R /user/hadoop/
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:31 /user/hadoop/_sqoop
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:54 /user/hadoop/current_dept_emp
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 01:54 /user/hadoop/current_dept_emp/_SUCCESS
-rw-r--r--   2 hadoop supergroup   10110817 2019-03-03 01:54 /user/hadoop/current_dept_emp/part-m-00000
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:55 /user/hadoop/departments
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 01:55 /user/hadoop/departments/_SUCCESS
-rw-r--r--   2 hadoop supergroup        153 2019-03-03 01:55 /user/hadoop/departments/part-m-00000
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:55 /user/hadoop/dept_emp
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 01:55 /user/hadoop/dept_emp/_SUCCESS
-rw-r--r--   2 hadoop supergroup   11175033 2019-03-03 01:55 /user/hadoop/dept_emp/part-m-00000
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date/_SUCCESS
-rw-r--r--   2 hadoop supergroup    8610697 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date/part-m-00000
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 01:56 /user/hadoop/dept_manager
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 01:56 /user/hadoop/dept_manager/_SUCCESS
-rw-r--r--   2 hadoop supergroup        816 2019-03-03 01:56 /user/hadoop/dept_manager/part-m-00000
drwxr-xr-x   - hadoop supergroup          0 2019-03-03 00:43 /user/hadoop/employees
-rw-r--r--   2 hadoop supergroup          0 2019-03-03 00:43 /user/hadoop/employees/_SUCCESS
-rw-r--r--   2 hadoop supergroup   13821993 2019-03-03 00:43 /user/hadoop/employees/part-m-00000
[dave@www.cndba.cn data]$

[dave@www.cndba.cn data]$ hdfs dfs -cat /user/hadoop/dept_emp/part*|wc -l
331603
[dave@www.cndba.cn data]$

版权声明:本文为博主原创文章,未经博主允许不得转载。

用户评论
* 以下用户言论只代表其个人观点,不代表CNDBA社区的观点或立场
dave

dave

关注

人的一生应该是这样度过的:当他回首往事的时候,他不会因为虚度年华而悔恨,也不会因为碌碌无为而羞耻;这样,在临死的时候,他就能够说:“我的整个生命和全部精力,都已经献给世界上最壮丽的事业....."

  • 2261
    原创
  • 3
    翻译
  • 578
    转载
  • 191
    评论
  • 访问:7995480次
  • 积分:4346
  • 等级:核心会员
  • 排名:第1名
精华文章
    最新问题
    查看更多+
    热门文章
      热门用户
      推荐用户
        Copyright © 2016 All Rights Reserved. Powered by CNDBA · 皖ICP备2022006297号-1·

        QQ交流群

        注册联系QQ