In the previous post we looked at Sqoop operations on a single table:
Sqoop: Importing Data from MySQL into HDFS
https://www.cndba.cn/dave/article/3305
Here we look at using Sqoop to import every table in a database into HDFS.
1 Tables in the Database
mysql> use employees
Database changed
mysql> show tables
-> ;
+----------------------+
| Tables_in_employees |
+----------------------+
| current_dept_emp |
| departments |
| dept_emp |
| dept_emp_latest_date |
| dept_manager |
| employees |
| salaries |
| titles |
+----------------------+
8 rows in set (0.00 sec)
2 Importing All Tables
There are two approaches. The first is to run the single-table import command from the previous post once per table; the second is to use import-all-tables, which comes with a mandatory requirement: every table in the database must have a primary key. If a table has no primary key, the import fails with an error like this:
2019-03-03 01:52:59,211 ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table current_dept_emp. Please specify one with --split-by or perform a sequential import with '-m 1'.
This can be worked around by adding -m 1, which runs a single mapper so Sqoop never needs a split column.
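Note that -m 1 disables parallelism for every table, including those that do have a primary key. A gentler alternative, sketched below, is the --autoreset-to-one-mapper option (available since Sqoop 1.4.5, so it applies to the 1.4.7 build used here): it keeps parallel imports for keyed tables and falls back to one mapper only where no primary key exists.

```shell
# Sketch: same connection parameters as the session below, but instead of
# forcing one mapper globally (-m 1), fall back to a single mapper only
# for tables that lack a primary key.
sqoop import-all-tables \
  --connect jdbc:mysql://192.168.56.2:3306/employees \
  --username root -P \
  --autoreset-to-one-mapper
```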
[dave@www.cndba.cn ~]$ sqoop import-all-tables --connect jdbc:mysql://192.168.56.2:3306/employees --username root --m 1 -P
Warning: /home/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
2019-03-03 01:54:19,807 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
Enter password:
2019-03-03 01:54:22,280 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Sun Mar 03 01:54:22 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2019-03-03 01:54:22,782 INFO tool.CodeGenTool: Beginning code generation
2019-03-03 01:54:22,804 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
2019-03-03 01:54:23,293 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `current_dept_emp` AS t LIMIT 1
2019-03-03 01:54:23,747 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/current_dept_emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-03-03 01:54:26,063 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/current_dept_emp.jar
2019-03-03 01:54:26,080 WARN manager.MySQLManager: It looks like you are importing from mysql.
2019-03-03 01:54:26,080 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2019-03-03 01:54:26,080 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2019-03-03 01:54:26,080 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2019-03-03 01:54:26,091 INFO mapreduce.ImportJobBase: Beginning import of current_dept_emp
…
2019-03-03 01:56:35,784 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=230433
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=816
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=5127
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=5127
Total vcore-milliseconds taken by all map tasks=5127
Total megabyte-milliseconds taken by all map tasks=5250048
Map-Reduce Framework
Map input records=24
Map output records=24
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=65
CPU time spent (ms)=1070
Physical memory (bytes) snapshot=144064512
Virtual memory (bytes) snapshot=2763091968
Total committed heap usage (bytes)=121700352
Peak Map Physical memory (bytes)=144064512
Peak Map Virtual memory (bytes)=2763091968
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=816
2019-03-03 01:56:35,793 INFO mapreduce.ImportJobBase: Transferred 816 bytes in 22.5143 seconds (36.2437 bytes/sec)
2019-03-03 01:56:35,807 INFO mapreduce.ImportJobBase: Retrieved 24 records.
2019-03-03 01:56:35,807 INFO tool.CodeGenTool: Beginning code generation
Sun Mar 03 01:56:35 CST 2019 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2019-03-03 01:56:35,832 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1
2019-03-03 01:56:35,839 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop
Note: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/employees.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2019-03-03 01:56:36,358 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a9b27411f6dd185d97d0bcbf21f6380d/employees.jar
2019-03-03 01:56:36,391 INFO mapreduce.ImportJobBase: Beginning import of employees
2019-03-03 01:56:36,448 INFO client.RMProxy: Connecting to ResourceManager at Master/192.168.56.100:8032
2019-03-03 01:56:36,485 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://Master:9000/user/hadoop/employees already exists
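The final error above is not a Sqoop problem: hdfs://Master:9000/user/hadoop/employees was left over from the single-table import in the previous post (note its 00:43 timestamp in the listing below), and a MapReduce job refuses to write to an existing output directory. A minimal recovery sketch, assuming the paths from the log above:

```shell
# Remove the leftover target directory from the earlier single-table import,
# then re-run the all-tables import; Sqoop skips nothing, so tables already
# imported in this run must also have their directories removed first.
hdfs dfs -rm -r /user/hadoop/employees

sqoop import-all-tables \
  --connect jdbc:mysql://192.168.56.2:3306/employees \
  --username root -P \
  --m 1
```

Alternatively, --warehouse-dir can point the whole run at a fresh parent directory in HDFS, so nothing collides with earlier imports.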
Verify the data:
[dave@www.cndba.cn data]$ hdfs dfs -ls -R /user/hadoop/
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:31 /user/hadoop/_sqoop
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:54 /user/hadoop/current_dept_emp
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 01:54 /user/hadoop/current_dept_emp/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 10110817 2019-03-03 01:54 /user/hadoop/current_dept_emp/part-m-00000
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:55 /user/hadoop/departments
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 01:55 /user/hadoop/departments/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 153 2019-03-03 01:55 /user/hadoop/departments/part-m-00000
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:55 /user/hadoop/dept_emp
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 01:55 /user/hadoop/dept_emp/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 11175033 2019-03-03 01:55 /user/hadoop/dept_emp/part-m-00000
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 8610697 2019-03-03 01:56 /user/hadoop/dept_emp_latest_date/part-m-00000
drwxr-xr-x - hadoop supergroup 0 2019-03-03 01:56 /user/hadoop/dept_manager
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 01:56 /user/hadoop/dept_manager/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 816 2019-03-03 01:56 /user/hadoop/dept_manager/part-m-00000
drwxr-xr-x - hadoop supergroup 0 2019-03-03 00:43 /user/hadoop/employees
-rw-r--r-- 2 hadoop supergroup 0 2019-03-03 00:43 /user/hadoop/employees/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 13821993 2019-03-03 00:43 /user/hadoop/employees/part-m-00000
[dave@www.cndba.cn data]$
[dave@www.cndba.cn data]$ hdfs dfs -cat /user/hadoop/dept_emp/part*|wc -l
331603
[dave@www.cndba.cn data]$
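The line count above can be cross-checked against the source row count in MySQL; a sketch, using the dept_emp table and the paths from the listing above:

```shell
# Count lines of all part files for dept_emp in HDFS...
hdfs dfs -cat /user/hadoop/dept_emp/part-m-* | wc -l

# ...and compare with the row count in the source table
# (-N suppresses the column header, -e runs a single statement).
mysql -h 192.168.56.2 -u root -p -N -e \
  "SELECT COUNT(*) FROM employees.dept_emp;"
```

If the import was complete, both numbers should match (331603 here).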
Copyright notice: this is an original article by the author and may not be reproduced without permission.