ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

hive内部表和外部表的区别以及应用场景

2022-02-05 18:31:31  阅读:199  来源: 互联网

标签:terminated 场景 managed string 外部 hive user table


写在前面
ps:干货很多…
我们都知道,Hive基本上就是内部表和外部表两种类型,在面试的时候,常会问到这种题目:
1.hive内部表和外部表的区别
2.什么时候使用内部表,什么时候使用外部表

来自官网的定义:

Managed tables
A managed table is stored under the hive.metastore.warehouse.dir path property, by default in a folder path similar to /user/hive/warehouse/databasename.db/tablename/. The default location can be overridden by the location property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration.

Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables.

External tables
An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an MSCK REPAIR TABLE table_name statement can be used to refresh metadata information.

Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.

创建外部表的同时,语句末尾一般要自己指定 数据文件存储路径 location ‘/AUTO/PATH’
内部表不用特殊指定,默认为/user/hive/warehouse,
可配置:hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
</property>

内部表数据由Hive自身管理,外部表数据由HDFS管理;
DTOP TABLE,
内部表:元数据和数据文件都会被删除掉
外部表:元数据被删除,数据文件任然保留 ,此时重建表都是可以的,还是可以直接查数据的
LOAD DATA ,
加载HDFS DATA 都是会将HDFS数据进行移动到对应的表目录,类似 mv 命令

二 、使用场景
同样来自官网的说明:

ARCHIVE/UNARCHIVE/TRUNCATE/MERGE/CONCATENATE only work for managed tables
DROP deletes data for managed tables while it only deletes metadata for external ones
ACID/Transactional only works for managed tables
Query Results Caching only works for managed tables
Only the RELY constraint is allowed on external tables
Some Materialized View features only work on managed tables

简单来说:

每天采集的ng日志和埋点日志,在存储的时候建议使用外部表,因为日志数据是采集程序实时采集进来的,一旦被误删,恢复起来非常麻烦。而且外部表方便数据的共享。

抽取过来的业务数据,其实用外部表或者内部表问题都不大,就算被误删,恢复起来也是很快的,如果需要对数据内容和元数据进行紧凑的管理, 那还是建议使用内部表

在做统计分析时候用到的中间表,结果表可以使用内部表,因为这些数据不需要共享,使用内部表更为合适。并且很多时候结果分区表我们只需要保留最近3天的数据,用外部表的时候删除分区时无法删除数据。

内部表

create  table test(
user_id string,
user_name string,
hobby array<string>,
scores map<string,int>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

外部表

create external table external_test(
user_id string,
user_name string,
hobby array<string>,
scores map<string,int>
)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n'
 location '/warehouse/db01.db';

分区表

create  table partition_test(
user_id string,
user_name string,
hobby array<string>,
scores map<string,int>
)
partitioned by (time string)
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

分桶表

create  table bucket_test(
user_id string,
user_name string,
hobby array<string>,
scores map<string,int>
)
clustered by (user_name) sorted by (user_name) into 3 buckets
row format delimited
fields terminated by '|'
collection items terminated by ','
map keys terminated by ':'
lines terminated by '\n';

装载数据

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

 

 

 

 

原文链接:https://blog.csdn.net/liuge36/article/details/111425996

标签:terminated,场景,managed,string,外部,hive,user,table
来源: https://www.cnblogs.com/to-here/p/15864552.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有