
HiveQL Data Definition and Manipulation

2019-05-14 22:39 · Original · Hive
Author: dave

In a previous post we covered the basic operations on Hive objects:
Hive Objects: Basic Operations
https://www.cndba.cn/dave/article/3338

This post supplements part of that content.

1 Databases

A database in Hive is essentially a directory, or namespace, which helps avoid table-name collisions when there are many users. If no database is specified when a table is created, Hive uses the default database, default.

[dave@www.cndba.cn ~]# hive
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/jars/hive-common-1.1.0-cdh5.16.1.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
default
Time taken: 2.662 seconds, Fetched: 1 row(s)
hive> create database cndba;
OK
Time taken: 0.105 seconds
hive> show databases;
OK
cndba
default
Time taken: 0.026 seconds, Fetched: 2 row(s)
hive>

Here we created a database named cndba. Hive creates a directory for each database, and the tables in the database are stored as subdirectories of that directory. Note that the default database is an exception: it has no directory of its own.
The default warehouse directory is /user/hive/warehouse. Since we created the cndba database, a new directory was generated there: /user/hive/warehouse/cndba.db. Note that database directory names end with .db.

[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive
drwxrwxrwt   - hive hive          0 2028-11-11 16:15 /user/hive/warehouse
drwxrwxrwt   - hdfs hive          0 2028-11-11 00:56 /user/hive/warehouse/cndba
drwxrwxrwt   - root hive          0 2028-11-11 16:15 /user/hive/warehouse/cndba.db
drwxrwxrwt   - hdfs hive          0 2028-11-11 01:32 /user/hive/warehouse/cndba2
drwxrwxrwt   - hdfs hive          0 2028-11-11 01:38 /user/hive/warehouse/cndba3
drwxrwxrwt   - hdfs hive          0 2028-11-11 01:42 /user/hive/warehouse/cndba4
drwxrwxrwt   - root hive          0 2028-11-10 14:12 /user/hive/warehouse/dave
drwxrwxrwt   - hdfs hive          0 2028-11-10 14:13 /user/hive/warehouse/dave2
[dave@www.cndba.cn ~]#

Of course, a database can also be created under a different directory:

hive> create database dave location '/dave';
OK
Time taken: 0.082 seconds
hive> show databases;
OK
cndba
dave
default
Time taken: 0.036 seconds, Fetched: 3 row(s)
hive> 
hive> desc database dave;
OK
dave        hdfs://nameservice1/dave    root    USER    
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> desc database cndba;
OK
cndba        hdfs://nameservice1/user/hive/warehouse/cndba.db    root    USER    
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive>

When creating Hive objects, simply specify the target database:

hive> use cndba;
OK
Time taken: 0.043 seconds
hive> create table cndba (id int);
OK
Time taken: 0.297 seconds
hive> desc formatted cndba;
OK
# col_name                data_type               comment             

id                      int                                         

# Detailed Table Information          
Database:               cndba                    
OwnerType:              USER                     
Owner:                  root                     
CreateTime:             Sat Nov 11 16:27:54 CST 2028     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://nameservice1/user/hive/warehouse/cndba.db/cndba     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    transient_lastDdlTime    1857544074          

# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    serialization.format    1                   
Time taken: 0.198 seconds, Fetched: 27 row(s)
hive>

2 Creating Tables

Hive's CREATE TABLE syntax is similar to standard SQL, but Hive supports several storage formats; see the earlier post for details:

Hive Storage Formats Explained
https://www.cndba.cn/dave/article/3405

The data types supported by Hive tables are documented in the official manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types

2.1 Managed Tables

By default, tables created in Hive are managed tables, also called internal tables. Hive controls the lifecycle of the data in these tables. By default Hive stores their data under the directory defined by the configuration property hive.metastore.warehouse.dir (/user/hive/warehouse). When a managed table is dropped, Hive also deletes its data.
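As a quick sketch (the table name demo_managed is made up for illustration), both the warehouse directory and the table type can be checked from the Hive CLI:

```sql
-- Show the configured warehouse directory (typically /user/hive/warehouse):
set hive.metastore.warehouse.dir;
-- A table created without the EXTERNAL keyword is a managed table:
create table demo_managed (id int);
describe formatted demo_managed;   -- shows Table Type: MANAGED_TABLE
-- Dropping it removes the metadata AND the data under the warehouse directory:
drop table demo_managed;
```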

2.2 External Tables

The data of a managed table is managed by Hive itself, whereas the data of an external table lives on HDFS outside Hive's control. Dropping a managed table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are left untouched. To create an external table, specify the EXTERNAL keyword along with a LOCATION.

2.3 Examples

hive> show databases;
OK
cndba
dave
default
Time taken: 0.021 seconds, Fetched: 3 row(s)
hive> use default;
OK
Time taken: 0.043 seconds
hive> show tables;
OK
cndba
cndba2
cndba3
cndba4
dave
dave2
Time taken: 0.056 seconds, Fetched: 6 row(s)

#Create a table directly, specifying the columns:
hive> create table anqing(id int);
OK
Time taken: 0.166 seconds

#Create a table from an existing table; only the schema is copied, not the data:
hive> create table huaining like default.cndba;
OK
Time taken: 0.221 seconds
hive> desc anqing;
OK
id                      int                                         
Time taken: 0.177 seconds, Fetched: 1 row(s)
hive> desc huaining;
OK
id                      int                                         
Time taken: 0.163 seconds, Fetched: 1 row(s)
hive> 

#External table example:
hive> create external table hefei(id int,cont string)
    > row format delimited fields terminated by ','
    > location '/dave/hefei';
OK
Time taken: 0.132 seconds
hive> load data inpath '/dave/dave.txt' overwrite into table hefei;
Loading data to table default.hefei
Table default.hefei stats: [numFiles=1, totalSize=87]
OK
Time taken: 0.861 seconds
hive> select * from hefei;
OK
1     http://www.cndba.cn
2     中国DBA社区
3     Oracle
4     Hadoop
88    安徽DBA俱乐部
Time taken: 0.359 seconds, Fetched: 5 row(s)
hive> 
[hdfs@hadoop-master3 ~]$ hdfs dfs -ls -R /dave
drwxr-xr-x   - hdfs supergroup          0 2028-11-11 18:37 /dave/hefei
-rwxr-xr-x   3 hdfs supergroup         87 2028-11-04 14:14 /dave/hefei/dave.txt
-rw-r--r--   3 hdfs supergroup   48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[hdfs@hadoop-master3 ~]$

3 Dropping Tables

The previous section covered creating managed and external tables; here we look at dropping them. As explained above, dropping a managed table deletes the data, while dropping an external table does not. Test:

hive> drop table anqing;
OK
Time taken: 0.265 seconds
hive> drop table hefei;
OK
Time taken: 0.217 seconds
hive> 
hive> create external table hefei(id int,cont string)
    > row format delimited fields terminated by ','
    > location '/dave/hefei';
OK
Time taken: 0.143 seconds
hive> select * from hefei;
OK
1     http://www.cndba.cn
2     中国DBA社区
3     Oracle
4     Hadoop
88    安徽DBA俱乐部
Time taken: 0.179 seconds, Fetched: 5 row(s)
hive>

Note that after dropping the external table hefei we recreated it without inserting any data, yet the data can still be queried. This confirms the earlier claim: dropping an external table does not delete the data stored on HDFS.


4 Altering Tables

Most table properties can be changed with an ALTER TABLE statement. Note, however, that such changes modify only the metadata, not the data itself.

4.1 Renaming a Table

hive> show tables in default;
OK
cndba
cndba2
cndba3
cndba4
dave
dave2
hefei
huaining
Time taken: 0.071 seconds, Fetched: 8 row(s)
hive> select * from hefei;
OK
1     http://www.cndba.cn
2     中国DBA社区
3     Oracle
4     Hadoop
88    安徽DBA俱乐部
Time taken: 0.184 seconds, Fetched: 5 row(s)
hive> alter table hefei rename to anqing;
OK
Time taken: 0.246 seconds
hive> show tables in default;
OK
anqing
cndba
cndba2
cndba3
cndba4
dave
dave2
huaining
Time taken: 0.04 seconds, Fetched: 8 row(s)
hive> 

[hdfs@hadoop-master3 ~]$ hdfs dfs -ls -R /dave
drwxr-xr-x   - hdfs supergroup          0 2028-11-11 18:37 /dave/hefei
-rwxr-xr-x   3 hdfs supergroup         87 2028-11-04 14:14 /dave/hefei/dave.txt
-rw-r--r--   3 hdfs supergroup   48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[hdfs@hadoop-master3 ~]$

4.2 Adding, Modifying and Dropping Partitions

#Create a partitioned table:
hive> create table cndba_partition (
    > id string comment 'id', 
    > name string comment '名字'
    > ) 
    > comment '测试分区' 
    > partitioned by (year int comment '年') 
    > row format delimited fields terminated by ',' ;
OK
Time taken: 2.774 seconds
hive> 
#Insert data into the partitioned table:
hive> insert into table cndba_partition partition(year=2017) values ('001','www.cndba.cn');
hive> insert into table cndba_partition partition(year=2018) values ('002','www.ahdba.com');
hive> insert into table cndba_partition partition(year=2019) values ('003','dave');
hive> select * from cndba_partition;
OK
001    www.cndba.cn    2017
002    www.ahdba.com    2018
003    dave    2019
Time taken: 0.326 seconds, Fetched: 3 row(s)
hive> 
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt   3 hdfs hive         17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt   3 hdfs hive         18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt   3 hdfs hive          9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
[dave@www.cndba.cn ~]# 

#Add a partition:
hive> alter table cndba_partition add partition(year=2016);
OK
Time taken: 0.209 seconds
hive> 
hive> load data  inpath '/dave/hefei/dave.txt' into table cndba_partition partition(year=2016);
Loading data to table default.cndba_partition partition (year=2016)
Partition default.cndba_partition{year=2016} stats: [numFiles=1, totalSize=87]
OK
Time taken: 0.8 seconds
hive> 
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt   3 hdfs hive         87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt   3 hdfs hive         17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt   3 hdfs hive         18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt   3 hdfs hive          9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
[dave@www.cndba.cn ~]# 

#Modify a partition
A partition's path can be changed with the SET LOCATION clause. This command neither moves the data away from the old path nor deletes the old data.

#Note that the location should point to a directory, not a file:
hive> alter table cndba_partition partition(year=2016) set location '/dave/hefei/dave.txt';
OK
Time taken: 0.313 seconds
hive> 
hive> insert into table cndba_partition partition(year=2016) values ('004','www.cndba.cn');
hive> 
#The old directory structure still exists:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt   3 hdfs hive         87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt   3 hdfs hive         17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt   3 hdfs hive         18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt   3 hdfs hive          9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
#The new directory structure:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /dave
drwxr-xr-x   - hdfs supergroup          0 2028-11-11 23:36 /dave/hefei
drwxr-xr-x   - hdfs supergroup          0 2028-11-11 23:36 /dave/hefei/dave.txt
-rwxr-xr-x   3 hdfs supergroup         17 2028-11-11 23:36 /dave/hefei/dave.txt/000000_0
-rw-r--r--   3 hdfs supergroup   48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[dave@www.cndba.cn ~]# 

#Drop the partition:
hive>  alter table cndba_partition drop  partition(year=2016) ;
Dropped the partition year=2016
OK
Time taken: 0.94 seconds
hive> 
#Note that dropping the partition deleted the data in the current location, while the earlier data remains:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /dave
drwxr-xr-x   - hdfs supergroup          0 2028-11-11 23:39 /dave/hefei
-rw-r--r--   3 hdfs supergroup   48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[dave@www.cndba.cn ~]# 
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt   3 hdfs hive         87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt   3 hdfs hive         17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt   3 hdfs hive         18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt   - hdfs hive          0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt   3 hdfs hive          9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0

Dropping a partition behaves like dropping a table: for a managed table, the partition's metadata and data are deleted together; for an external table, only the metadata is removed.
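The external-table case can be sketched as follows (logs_ext and its location are hypothetical names for illustration):

```sql
-- External partitioned table: Hive tracks the partitions, HDFS owns the files.
create external table logs_ext (msg string)
  partitioned by (dt string)
  row format delimited fields terminated by ','
  location '/dave/logs_ext';
alter table logs_ext add partition (dt='2019-05-14');
-- Dropping the partition removes only the metadata; any files under
-- /dave/logs_ext/dt=2019-05-14 are left on HDFS.
alter table logs_ext drop partition (dt='2019-05-14');
```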

4.3 Modifying Column Information

A column in a Hive table can be renamed, and its position, type, and comment can be changed. For example:

hive> create table huaining(id int,name string);
OK
Time taken: 0.12 seconds
hive> desc huaining;
OK
id                      int                                         
name                    string                                      
Time taken: 0.173 seconds, Fetched: 2 row(s)
#Note that CHANGE COLUMN is used here to modify the column:
hive> alter table huaining change column name address int comment 'this is www.cndba.cn';
OK
Time taken: 0.28 seconds
hive> desc huaining;
OK
id                      int                                         
address                 int                     this is www.cndba.cn
Time taken: 0.162 seconds, Fetched: 2 row(s)
hive> 

4.4 Adding Columns

In Hive, new columns can be added; they are placed before the partition columns.
hive> desc cndba_partition;
OK
id                      string                  id                  
name                    string                  ??                  
year                    int                     ?                   

# Partition Information          
# col_name                data_type               comment             

year                    int                     ?                   
Time taken: 0.234 seconds, Fetched: 8 row(s)
hive> alter table cndba_partition add columns(address string);
OK
Time taken: 0.174 seconds
#Note the position of the new column: it comes before the partition column:
hive> desc cndba_partition;
OK
id                      string                  id                  
name                    string                  ??                  
address                 string                                      
year                    int                     ?                   

# Partition Information          
# col_name                data_type               comment             

year                    int                     ?                   
Time taken: 0.19 seconds, Fetched: 9 row(s)
hive>

4.5 Dropping/Replacing Columns

First an example:

hive> select * from huaining;
OK
1    NULL
1    1
1    www.cndba.cn
2    dave from anqing
Time taken: 0.148 seconds, Fetched: 4 row(s)
hive> desc huaining;
OK
id                      int                                         
name                    string                                      
Time taken: 0.143 seconds, Fetched: 2 row(s)
hive> alter table huaining replace columns(name string,id int);
OK
Time taken: 0.202 seconds
hive> select * from huaining;
OK
1    NULL
1    1
1    NULL
2    NULL
Time taken: 0.144 seconds, Fetched: 4 row(s)
hive>

Here REPLACE COLUMNS dropped all of the previous columns and defined new ones; again, only the metadata was changed. After the change the second column is read as int, so the string values display as NULL.

4.6 Modifying Table Properties

New properties can be added to a table and existing ones modified, but properties cannot be deleted:

hive> desc formatted huaining;
OK
# col_name                data_type               comment             

name                    string                                      
id                      int                                         

# Detailed Table Information          
Database:               default                  
OwnerType:              USER                     
Owner:                  hdfs                     
CreateTime:             Sat Nov 11 23:44:05 CST 2028     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://nameservice1/user/hive/warehouse/huaining     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    COLUMN_STATS_ACCURATE    false               
    last_modified_by        hdfs                
    last_modified_time      1857571205          
    numFiles                4                   
    numRows                 -1                  
    rawDataSize             -1                  
    totalSize               43                  
    transient_lastDdlTime    1857571205          

# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    serialization.format    1                   
Time taken: 0.158 seconds, Fetched: 35 row(s)
hive> alter table huaining set tblproperties('notes'='www.cndba.cn');
OK
Time taken: 0.218 seconds
hive> desc formatted huaining;
OK
# col_name                data_type               comment             

name                    string                                      
id                      int                                         

# Detailed Table Information          
Database:               default                  
OwnerType:              USER                     
Owner:                  hdfs                     
CreateTime:             Sat Nov 11 23:44:05 CST 2028     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://nameservice1/user/hive/warehouse/huaining     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    COLUMN_STATS_ACCURATE    false               
    last_modified_by        hdfs                
    last_modified_time      1857571405       
#Note the new property that appears after the change:
    notes                   www.cndba.cn        
    numFiles                4                   
    numRows                 -1                  
    rawDataSize             -1                  
    totalSize               43                  
    transient_lastDdlTime    1857571405          

# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    serialization.format    1                   
Time taken: 0.158 seconds, Fetched: 36 row(s)
hive>

4.7 Modifying Storage Properties

Hive storage formats were covered in an earlier post:

Hive Storage Formats Explained
https://www.cndba.cn/dave/article/3405


Here we change the format directly:

hive> alter table huaining set fileformat sequencefile;
OK
Time taken: 0.209 seconds
hive> desc formatted huaining;
OK
# col_name                data_type               comment             

name                    string                                      
id                      int                                         

# Detailed Table Information          
Database:               default                  
OwnerType:              USER                     
Owner:                  hdfs                     
CreateTime:             Sat Nov 11 23:44:05 CST 2028     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://nameservice1/user/hive/warehouse/huaining     
Table Type:             MANAGED_TABLE            
Table Parameters:          
    COLUMN_STATS_ACCURATE    false               
    last_modified_by        hdfs                
    last_modified_time      1857571568          
    notes                   www.cndba.cn        
    numFiles                4                   
    numRows                 -1                  
    rawDataSize             -1                  
    totalSize               43                  
    transient_lastDdlTime    1857571568          

# Storage Information          
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
InputFormat:            org.apache.hadoop.mapred.SequenceFileInputFormat     
OutputFormat:           org.apache.hadoop.mapred.SequenceFileOutputFormat     
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:          
    serialization.format    1                   
Time taken: 0.175 seconds, Fetched: 36 row(s)
#The storage format change succeeds, but the existing data can no longer be read:
hive> select * from huaining;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nameservice1/user/hive/warehouse/huaining/000000_0 not a SequenceFile
Time taken: 0.123 seconds
hive>

So change a table's storage format with caution.
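One way to avoid this problem (a sketch; huaining_seq is a hypothetical table name) is to convert the data rather than only the metadata: create a new table with the target format and copy the rows into it.

```sql
-- Create the target table with the desired format up front:
create table huaining_seq (id int, name string)
  stored as sequencefile;
-- Rewriting the rows converts them to the new format:
insert overwrite table huaining_seq select id, name from huaining;
-- Once the copy is verified, the tables can be swapped by renaming:
-- alter table huaining rename to huaining_text;
-- alter table huaining_seq rename to huaining;
```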


5 Data Manipulation

5.1 Inserting Data Directly

See the official Hive manual for the full syntax, which is not repeated here:

https://cwiki.apache.org/confluence/display/HIVE

Example:

hive> create table huaining(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.117 seconds
hive> insert into huaining values(1,'www.cndba.cn');
Query ID = hdfs_20281112002222_b749c427-5e33-4196-a1af-cc304417afc0
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 97931312-f2f1-43aa-a1d6-e7a8dbf52548
2028-11-12 00:22:19,534    Stage-10_0: 1/1 Finished    
Status: Finished successfully in 1.02 seconds
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://nameservice1/user/hive/warehouse/huaining/.hive-staging_hive_2028-11-12_00-22-18_168_5192911748127468369-1/-ext-10000
Loading data to table default.huaining
Table default.huaining stats: [numFiles=1, numRows=1, totalSize=15, rawDataSize=14]
OK
Time taken: 1.805 seconds
hive> select * from huaining;
OK
1    www.cndba.cn
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive>

5.2 Loading Data into a Managed Table

The syntax is as follows:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

With LOCAL, the file is read from the local file system; without it, the file is moved from its HDFS location into the table's directory.

hive> load data local inpath '/tmp/dave.txt' overwrite into table huaining;
Loading data to table default.huaining
Table default.huaining stats: [numFiles=1, totalSize=21]
OK
Time taken: 0.461 seconds
hive> select * from huaining;
OK
1    anqing
2    huaining
NULL    NULL
Time taken: 0.113 seconds, Fetched: 3 row(s)
hive>

5.3 Inserting Data from a Query

hive> insert overwrite table huaining select * from anqing;
hive> select * from huaining;
OK
88    www.cndba.cn
Time taken: 0.137 seconds, Fetched: 1 row(s)
Note that OVERWRITE was specified here, so the original data was replaced.
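The difference between INTO and OVERWRITE can be sketched like this (the values are illustrative):

```sql
insert into table huaining values (1, 'a');            -- appends one row
insert into table huaining values (2, 'b');            -- appends another row
insert overwrite table huaining select * from anqing;  -- replaces all rows
```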

5.4 Creating a Table from a Query (CTAS)

hive> create table yueshan as select * from huaining;
OK
Time taken: 1.455 seconds
hive> select * from yueshan;
OK
88    www.cndba.cn
Time taken: 0.133 seconds, Fetched: 1 row(s)
hive>

5.5 Exporting Data

For tables stored as plain text, the underlying files can be copied out directly (e.g. hdfs dfs -get source_path local_path to fetch them to the local file system). Data can also be exported with Hive SQL:

hive> insert overwrite local directory '/tmp/yueshan' select * from yueshan;
Query ID = hdfs_20281112003636_4c99e88f-401a-4521-87fd-c8e54db4ee76
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Spark Job = 3b5f84a7-ee2c-4b54-afeb-2fe6ec3c3c65
2028-11-12 00:36:42,247    Stage-15_0: 1/1 Finished    
Status: Finished successfully in 1.01 seconds
Copying data to local directory /tmp/yueshan
OK
Time taken: 1.295 seconds
hive> 
#View the exported file:
hive> !cat /tmp/yueshan/000000_0;
88www.cndba.cn
hive>

This post only supplements the basic operations; for more, refer directly to the official Hive manual.


Copyright notice: this is an original article; do not reproduce without the author's permission.
