In an earlier post we covered the basic operations on Hive objects:
Hive Objects: Basic Operations
https://www.cndba.cn/dave/article/3338
This post supplements some of that content.
1 Databases
A database in Hive is essentially a directory, or namespace, which helps avoid table-name collisions when there are many users. If no database is specified when a Hive table is created, the default database, default, is used.
[dave@www.cndba.cn ~]# hive
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.16.1-1.cdh5.16.1.p0.3/jars/hive-common-1.1.0-cdh5.16.1.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> show databases;
OK
default
Time taken: 2.662 seconds, Fetched: 1 row(s)
hive> create database cndba;
OK
Time taken: 0.105 seconds
hive> show databases;
OK
cndba
default
Time taken: 0.026 seconds, Fetched: 2 row(s)
hive>
Here we created a database named cndba. Hive creates a directory for each database, and the tables in a database are stored as subdirectories of that directory. Note that the default database is an exception: it has no directory of its own.
The default warehouse directory is /user/hive/warehouse. Since we created the cndba database, a directory was generated under it: /user/hive/warehouse/cndba.db. Note that database directory names end with .db.
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive
drwxrwxrwt - hive hive 0 2028-11-11 16:15 /user/hive/warehouse
drwxrwxrwt - hdfs hive 0 2028-11-11 00:56 /user/hive/warehouse/cndba
drwxrwxrwt - root hive 0 2028-11-11 16:15 /user/hive/warehouse/cndba.db
drwxrwxrwt - hdfs hive 0 2028-11-11 01:32 /user/hive/warehouse/cndba2
drwxrwxrwt - hdfs hive 0 2028-11-11 01:38 /user/hive/warehouse/cndba3
drwxrwxrwt - hdfs hive 0 2028-11-11 01:42 /user/hive/warehouse/cndba4
drwxrwxrwt - root hive 0 2028-11-10 14:12 /user/hive/warehouse/dave
drwxrwxrwt - hdfs hive 0 2028-11-10 14:13 /user/hive/warehouse/dave2
[dave@www.cndba.cn ~]#
Of course, a database can also be placed under a different directory when it is created:
hive> create database dave location '/dave';
OK
Time taken: 0.082 seconds
hive> show databases;
OK
cndba
dave
default
Time taken: 0.036 seconds, Fetched: 3 row(s)
hive>
hive> desc database dave;
OK
dave hdfs://nameservice1/dave root USER
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> desc database cndba;
OK
cndba hdfs://nameservice1/user/hive/warehouse/cndba.db root USER
Time taken: 0.027 seconds, Fetched: 1 row(s)
hive>
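Putting the options together, CREATE DATABASE also supports a comment and database properties. A sketch of the fuller syntax (the database name demo_db and the property values are illustrative, not from the examples above):

```sql
-- Create a database with a comment, a custom HDFS location, and properties.
-- IF NOT EXISTS avoids an error when the database already exists.
CREATE DATABASE IF NOT EXISTS demo_db
COMMENT 'demo database'
LOCATION '/dave/demo_db'
WITH DBPROPERTIES ('creator' = 'dave');

-- DESCRIBE DATABASE EXTENDED also shows the DBPROPERTIES.
DESCRIBE DATABASE EXTENDED demo_db;

-- Dropping a database that still contains tables requires CASCADE.
DROP DATABASE IF EXISTS demo_db CASCADE;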
To create a Hive object in a particular database, simply specify that database:
hive> use cndba;
OK
Time taken: 0.043 seconds
hive> create table cndba (id int);
OK
Time taken: 0.297 seconds
hive> desc formatted cndba;
OK
# col_name data_type comment
id int
# Detailed Table Information
Database: cndba
OwnerType: USER
Owner: root
CreateTime: Sat Nov 11 16:27:54 CST 2028
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nameservice1/user/hive/warehouse/cndba.db/cndba
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1857544074
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.198 seconds, Fetched: 27 row(s)
hive>
2 Creating Tables
The SQL for creating Hive tables is similar to standard SQL, but Hive supports several storage formats; for details see the earlier post:
Hive Storage Formats Explained
https://www.cndba.cn/dave/article/3405
The data types supported by Hive tables are documented in the official Hive manual:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types
2.1 Managed Tables
By default, the tables created in Hive are managed tables, also called internal tables. For these tables Hive controls the lifecycle of the data. Hive stores their data under the directory defined by the configuration property hive.metastore.warehouse.dir (by default /user/hive/warehouse). When a managed table is dropped, Hive deletes the table's data as well.
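Inside the CLI you can confirm which warehouse directory applies to your installation; SET with a property name but no value prints the current setting:

```sql
-- Print the current warehouse directory.
-- SET <property>; without a value displays it instead of changing it.
SET hive.metastore.warehouse.dir;
```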
2.2 External Tables
The data of an internal table is managed by Hive itself, while the data of an external table stays under HDFS's management. Dropping an internal table deletes both the metadata and the stored data; dropping an external table deletes only the metadata, and the files on HDFS are left intact. To create an external table, specify the EXTERNAL keyword together with a LOCATION.
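An existing table can also be switched between managed and external by flipping the EXTERNAL table property; a sketch (the table name t is illustrative, and note that in many Hive versions the value must be the uppercase string 'TRUE'/'FALSE'):

```sql
-- Mark a managed table as external: a later DROP TABLE will keep the HDFS files.
ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');

-- Switch it back to managed: Hive owns the data lifecycle again.
ALTER TABLE t SET TBLPROPERTIES ('EXTERNAL' = 'FALSE');
```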
2.3 Creation Examples
hive> show databases;
OK
cndba
dave
default
Time taken: 0.021 seconds, Fetched: 3 row(s)
hive> use default;
OK
Time taken: 0.043 seconds
hive> show tables;
OK
cndba
cndba2
cndba3
cndba4
dave
dave2
Time taken: 0.056 seconds, Fetched: 6 row(s)
#Create a Hive table directly, specifying its columns:
hive> create table anqing(id int);
OK
Time taken: 0.166 seconds
#Create a table from an existing table; this copies only the structure, not the data:
hive> create table huaining like default.cndba;
OK
Time taken: 0.221 seconds
hive> desc anqing;
OK
id int
Time taken: 0.177 seconds, Fetched: 1 row(s)
hive> desc huaining;
OK
id int
Time taken: 0.163 seconds, Fetched: 1 row(s)
hive>
#External table example:
hive> create external table hefei(id int,cont string)
> row format delimited fields terminated by ','
> location '/dave/hefei';
OK
Time taken: 0.132 seconds
hive> load data inpath '/dave/dave.txt' overwrite into table hefei;
Loading data to table default.hefei
Table default.hefei stats: [numFiles=1, totalSize=87]
OK
Time taken: 0.861 seconds
hive> select * from hefei;
OK
1 http://www.cndba.cn
2 中国DBA社区
3 Oracle
4 Hadoop
88 安徽DBA俱乐部
Time taken: 0.359 seconds, Fetched: 5 row(s)
hive>
[hdfs@hadoop-master3 ~]$ hdfs dfs -ls -R /dave
drwxr-xr-x - hdfs supergroup 0 2028-11-11 18:37 /dave/hefei
-rwxr-xr-x 3 hdfs supergroup 87 2028-11-04 14:14 /dave/hefei/dave.txt
-rw-r--r-- 3 hdfs supergroup 48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[hdfs@hadoop-master3 ~]$
3 Dropping Tables
The previous section covered creating internal and external tables; here we look at dropping them. As explained above, dropping an internal table deletes its data, while dropping an external table does not. Test:
hive> drop table anqing;
OK
Time taken: 0.265 seconds
hive> drop table hefei;
OK
Time taken: 0.217 seconds
hive>
hive> create external table hefei(id int,cont string)
> row format delimited fields terminated by ','
> location '/dave/hefei';
OK
Time taken: 0.143 seconds
hive> select * from hefei;
OK
1 http://www.cndba.cn
2 中国DBA社区
3 Oracle
4 Hadoop
88 安徽DBA俱乐部
Time taken: 0.179 seconds, Fetched: 5 row(s)
hive>
Note that after dropping the external table hefei we recreated it, and without inserting any data we could still query the data. This confirms the earlier statement: dropping an external table does not delete the data stored on HDFS.
4 Altering Tables
Most table properties can be modified with the ALTER TABLE statement. Note, however, that these changes modify only the metadata, not the data itself.
4.1 Renaming a Table
hive> show tables in default;
OK
cndba
cndba2
cndba3
cndba4
dave
dave2
hefei
huaining
Time taken: 0.071 seconds, Fetched: 8 row(s)
hive> select * from hefei;
OK
1 http://www.cndba.cn
2 中国DBA社区
3 Oracle
4 Hadoop
88 安徽DBA俱乐部
Time taken: 0.184 seconds, Fetched: 5 row(s)
hive> alter table hefei rename to anqing;
OK
Time taken: 0.246 seconds
hive> show tables in default;
OK
anqing
cndba
cndba2
cndba3
cndba4
dave
dave2
huaining
Time taken: 0.04 seconds, Fetched: 8 row(s)
hive>
[hdfs@hadoop-master3 ~]$ hdfs dfs -ls -R /dave
drwxr-xr-x - hdfs supergroup 0 2028-11-11 18:37 /dave/hefei
-rwxr-xr-x 3 hdfs supergroup 87 2028-11-04 14:14 /dave/hefei/dave.txt
-rw-r--r-- 3 hdfs supergroup 48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[hdfs@hadoop-master3 ~]$
4.2 Adding, Modifying, and Dropping Partitions
#Create a partitioned table:
hive> create table cndba_partition (
> id string comment 'id',
> name string comment '名字'
> )
> comment '测试分区'
> partitioned by (year int comment '年')
> row format delimited fields terminated by ',' ;
OK
Time taken: 2.774 seconds
hive>
#Insert data into the partitioned table:
hive> insert into table cndba_partition partition(year=2017) values ('001','www.cndba.cn');
hive> insert into table cndba_partition partition(year=2018) values ('002','www.ahdba.com');
hive> insert into table cndba_partition partition(year=2019) values ('003','dave');
hive> select * from cndba_partition;
OK
001 www.cndba.cn 2017
002 www.ahdba.com 2018
003 dave 2019
Time taken: 0.326 seconds, Fetched: 3 row(s)
hive>
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt 3 hdfs hive 17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt 3 hdfs hive 18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt 3 hdfs hive 9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
[dave@www.cndba.cn ~]#
#Add a partition:
hive> alter table cndba_partition add partition(year=2016);
OK
Time taken: 0.209 seconds
hive>
hive> load data inpath '/dave/hefei/dave.txt' into table cndba_partition partition(year=2016);
Loading data to table default.cndba_partition partition (year=2016)
Partition default.cndba_partition{year=2016} stats: [numFiles=1, totalSize=87]
OK
Time taken: 0.8 seconds
hive>
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt - hdfs hive 0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt 3 hdfs hive 87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt 3 hdfs hive 17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt 3 hdfs hive 18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt 3 hdfs hive 9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
[dave@www.cndba.cn ~]#
#Modify a partition
A partition's path can be changed with the SET LOCATION option. This command does not move the data out of the old path, nor does it delete the old data.
#Note that the path given to LOCATION is treated as a directory, not a file:
hive> alter table cndba_partition partition(year=2016) set location '/dave/hefei/dave.txt';
OK
Time taken: 0.313 seconds
hive>
hive> insert into table cndba_partition partition(year=2016) values ('004','www.cndba.cn');
hive>
#The old directory structure still exists:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt - hdfs hive 0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt 3 hdfs hive 87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt 3 hdfs hive 17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt 3 hdfs hive 18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt 3 hdfs hive 9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
#Check the new directory structure:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /dave
drwxr-xr-x - hdfs supergroup 0 2028-11-11 23:36 /dave/hefei
drwxr-xr-x - hdfs supergroup 0 2028-11-11 23:36 /dave/hefei/dave.txt
-rwxr-xr-x 3 hdfs supergroup 17 2028-11-11 23:36 /dave/hefei/dave.txt/000000_0
-rw-r--r-- 3 hdfs supergroup 48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[dave@www.cndba.cn ~]#
#Drop the partition:
hive> alter table cndba_partition drop partition(year=2016) ;
Dropped the partition year=2016
OK
Time taken: 0.94 seconds
hive>
#Note that when the partition is dropped, the data in its current location is deleted, but the earlier data is still there:
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /dave
drwxr-xr-x - hdfs supergroup 0 2028-11-11 23:39 /dave/hefei
-rw-r--r-- 3 hdfs supergroup 48455680 2028-11-05 23:25 /dave/isd-inventory.csv
[dave@www.cndba.cn ~]#
[dave@www.cndba.cn ~]# hdfs dfs -ls -R /user/hive/warehouse/cndba_partition
drwxrwxrwt - hdfs hive 0 2028-11-11 23:31 /user/hive/warehouse/cndba_partition/year=2016
-rwxrwxrwt 3 hdfs hive 87 2028-11-04 14:14 /user/hive/warehouse/cndba_partition/year=2016/dave.txt
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017
-rwxrwxrwt 3 hdfs hive 17 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2017/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018
-rwxrwxrwt 3 hdfs hive 18 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2018/000000_0
drwxrwxrwt - hdfs hive 0 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019
-rwxrwxrwt 3 hdfs hive 9 2028-11-11 23:21 /user/hive/warehouse/cndba_partition/year=2019/000000_0
Dropping a partition works like dropping a table: for an internal table, the partition's metadata and the data inside the partition are deleted together; for an external table, only the metadata is deleted.
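For an external partitioned table the same rule can be verified with a sketch like this (the table name and paths are illustrative):

```sql
-- External partitioned table; its files live under /dave/ext_part.
CREATE EXTERNAL TABLE ext_part (id INT)
PARTITIONED BY (year INT)
LOCATION '/dave/ext_part';

ALTER TABLE ext_part ADD PARTITION (year = 2016)
LOCATION '/dave/ext_part/year=2016';

-- Dropping the partition removes only the metadata;
-- the files under /dave/ext_part/year=2016 stay on HDFS.
ALTER TABLE ext_part DROP PARTITION (year = 2016);
```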
4.3 Modifying Column Information
A column of a Hive table can be renamed, and its position, type, or comment can be changed. Example:
hive> create table huaining(id int,name string);
OK
Time taken: 0.12 seconds
hive> desc huaining;
OK
id int
name string
Time taken: 0.173 seconds, Fetched: 2 row(s)
#Note that CHANGE COLUMN is used here to modify the column:
hive> alter table huaining change column name address int comment 'this is www.cndba.cn';
OK
Time taken: 0.28 seconds
hive> desc huaining;
OK
id int
address int this is www.cndba.cn
Time taken: 0.162 seconds, Fetched: 2 row(s)
hive>
4.4 Adding Columns
In Hive, new columns can be added; they are placed before the partition columns.
hive> desc cndba_partition;
OK
id string id
name string ??
year int ?
# Partition Information
# col_name data_type comment
year int ?
Time taken: 0.234 seconds, Fetched: 8 row(s)
hive> alter table cndba_partition add columns(address string);
OK
Time taken: 0.174 seconds
#Note the position of the newly added column: before the partition column:
hive> desc cndba_partition;
OK
id string id
name string ??
address string
year int ?
# Partition Information
# col_name data_type comment
year int ?
Time taken: 0.19 seconds, Fetched: 9 row(s)
hive>
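When the table is partitioned, ADD/CHANGE/REPLACE COLUMNS by default updates only the table-level metadata, so existing partitions may still be read with the old schema. Newer Hive versions (1.1.0 and later, if I recall the version correctly) accept a CASCADE keyword to propagate the change to all partition metadata as well; a hedged sketch:

```sql
-- Propagate the new column to the metadata of existing partitions too.
ALTER TABLE cndba_partition ADD COLUMNS (address STRING) CASCADE;
```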
4.5 Dropping/Replacing Columns
First an example:
hive> select * from huaining;
OK
1 NULL
1 1
1 www.cndba.cn
2 dave from anqing
Time taken: 0.148 seconds, Fetched: 4 row(s)
hive> desc huaining;
OK
id int
name string
Time taken: 0.143 seconds, Fetched: 2 row(s)
hive> alter table huaining replace columns(name string,id int);
OK
Time taken: 0.202 seconds
hive> select * from huaining;
OK
1 NULL
1 1
1 NULL
2 NULL
Time taken: 0.144 seconds, Fetched: 4 row(s)
hive>
Here REPLACE removed all the previous columns and then defined the new ones; again, only the metadata is changed. After the change the second column is typed int, so the string values in it display as NULL.
4.6 Modifying Table Properties
New table properties can be added and existing ones modified, but properties cannot be deleted:
hive> desc formatted huaining;
OK
# col_name data_type comment
name string
id int
# Detailed Table Information
Database: default
OwnerType: USER
Owner: hdfs
CreateTime: Sat Nov 11 23:44:05 CST 2028
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nameservice1/user/hive/warehouse/huaining
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
last_modified_by hdfs
last_modified_time 1857571205
numFiles 4
numRows -1
rawDataSize -1
totalSize 43
transient_lastDdlTime 1857571205
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.158 seconds, Fetched: 35 row(s)
hive> alter table huaining set tblproperties('notes'='www.cndba.cn');
OK
Time taken: 0.218 seconds
hive> desc formatted huaining;
OK
# col_name data_type comment
name string
id int
# Detailed Table Information
Database: default
OwnerType: USER
Owner: hdfs
CreateTime: Sat Nov 11 23:44:05 CST 2028
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nameservice1/user/hive/warehouse/huaining
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
last_modified_by hdfs
last_modified_time 1857571405
#Note the extra property that appears after the change:
notes www.cndba.cn
numFiles 4
numRows -1
rawDataSize -1
totalSize 43
transient_lastDdlTime 1857571405
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.158 seconds, Fetched: 36 row(s)
hive>
4.7 Modifying Storage Properties
Hive storage formats were explained in an earlier post:
Hive Storage Formats Explained
https://www.cndba.cn/dave/article/3405
Here we modify the format directly:
hive> alter table huaining set fileformat sequencefile;
OK
Time taken: 0.209 seconds
hive> desc formatted huaining;
OK
# col_name data_type comment
name string
id int
# Detailed Table Information
Database: default
OwnerType: USER
Owner: hdfs
CreateTime: Sat Nov 11 23:44:05 CST 2028
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://nameservice1/user/hive/warehouse/huaining
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
last_modified_by hdfs
last_modified_time 1857571568
notes www.cndba.cn
numFiles 4
numRows -1
rawDataSize -1
totalSize 43
transient_lastDdlTime 1857571568
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat: org.apache.hadoop.mapred.SequenceFileOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.175 seconds, Fetched: 36 row(s)
#Changing the storage format succeeds, but the existing data can no longer be read:
hive> select * from huaining;
OK
Failed with exception java.io.IOException:java.io.IOException: hdfs://nameservice1/user/hive/warehouse/huaining/000000_0 not a SequenceFile
Time taken: 0.123 seconds
hive>
So change the storage format with caution.
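Since ALTER TABLE ... SET FILEFORMAT touches only the metadata, a safer way to change the format of a table that already holds data is to rewrite the rows through a new table. A hedged sketch (the table name huaining_seq and the column list are illustrative):

```sql
-- Create a new table with the target format, then copy the rows through a
-- query, which rewrites them as SequenceFile.
CREATE TABLE huaining_seq (id INT, name STRING) STORED AS SEQUENCEFILE;
INSERT OVERWRITE TABLE huaining_seq SELECT id, name FROM huaining;

-- Once verified, swap the tables:
-- DROP TABLE huaining;
-- ALTER TABLE huaining_seq RENAME TO huaining;
```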
5 Data Operations
5.1 Inserting Data Directly
The full syntax can be found in the official Hive manual and is not repeated here.
Example:
hive> create table huaining(id int,name string) row format delimited fields terminated by ',';
OK
Time taken: 0.117 seconds
hive> insert into huaining values(1,'www.cndba.cn');
Query ID = hdfs_20281112002222_b749c427-5e33-4196-a1af-cc304417afc0
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 97931312-f2f1-43aa-a1d6-e7a8dbf52548
2028-11-12 00:22:19,534 Stage-10_0: 1/1 Finished
Status: Finished successfully in 1.02 seconds
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://nameservice1/user/hive/warehouse/huaining/.hive-staging_hive_2028-11-12_00-22-18_168_5192911748127468369-1/-ext-10000
Loading data to table default.huaining
Table default.huaining stats: [numFiles=1, numRows=1, totalSize=15, rawDataSize=14]
OK
Time taken: 1.805 seconds
hive> select * from huaining;
OK
1 www.cndba.cn
Time taken: 0.12 seconds, Fetched: 1 row(s)
hive>
5.2 Loading Data into an Internal Table
The syntax is:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)
Here LOCAL means the file is read from the local file system; without it, the file is moved from HDFS.
hive> load data local inpath '/tmp/dave.txt' overwrite into table huaining;
Loading data to table default.huaining
Table default.huaining stats: [numFiles=1, totalSize=21]
OK
Time taken: 0.461 seconds
hive> select * from huaining;
OK
1 anqing
2 huaining
NULL NULL
Time taken: 0.113 seconds, Fetched: 3 row(s)
hive>
5.3 Inserting Data via a Query
hive> insert overwrite table huaining select * from anqing;
hive> select * from huaining;
OK
88 www.cndba.cn
Time taken: 0.137 seconds, Fetched: 1 row(s)
Note that OVERWRITE was used here, so the original data was overwritten.
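Without OVERWRITE, INSERT INTO appends instead; a sketch of the difference, using the same tables:

```sql
-- INSERT INTO appends the query result to the table's existing data.
INSERT INTO TABLE huaining SELECT * FROM anqing;

-- INSERT OVERWRITE first clears the table (or partition), then writes the result.
INSERT OVERWRITE TABLE huaining SELECT * FROM anqing;
```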
5.4 Inserting Data at Table Creation
hive> create table yueshan as select * from huaining;
OK
Time taken: 1.455 seconds
hive> select * from yueshan;
OK
88 www.cndba.cn
Time taken: 0.133 seconds, Fetched: 1 row(s)
hive>
5.5 Exporting Data
For a table stored as text files, the data files can be copied out directly (e.g. hdfs dfs -get source_path target_path to copy them to the local file system). The data can also be exported with Hive SQL, for example:
hive> insert overwrite local directory '/tmp/yueshan' select * from yueshan;
Query ID = hdfs_20281112003636_4c99e88f-401a-4521-87fd-c8e54db4ee76
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Spark Job = 3b5f84a7-ee2c-4b54-afeb-2fe6ec3c3c65
2028-11-12 00:36:42,247 Stage-15_0: 1/1 Finished
Status: Finished successfully in 1.01 seconds
Copying data to local directory /tmp/yueshan
OK
Time taken: 1.295 seconds
hive>
#View the exported file:
hive> !cat /tmp/yueshan/000000_0;
88www.cndba.cn
hive>
This post only supplements the basic operations; for more, refer directly to the official Hive manual.