
Data Cleaning: Region Dimension

2021-10-10 15:03:34  Source: Internet

Tags: aa, varchar, String, region, 2019, dimension, QB16, cleaning, 255


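1. Data import

Requirement: import the AA_GXJSQYDC2019 data from the sample table file into the Hive data warehouse, and import the region dimension table into the warehouse as a separate table.

(1) Rename the file, set its character set to UTF-8, and upload it to the server's local file system.

(2) Create the table aa_2019 in Hive: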

create table aa_2019(
  ID String,
  QA04 String,
  QA05 String,
  QA07 String,
  QA15 String,
  QA19 String,
  Hangye String,
  QB03 String,
  QB03ONE String,
  QB03TWO String,
  QB03_1 String,
  QB06 String,
  QB16 String,
  QB16V String,
  Gaoxin String,
  QB16_1 String,
  QB16_1V String,
  QC02 String,
  QC05_0 String,
  QC24 String,
  QC40 String,
  QD01 String,
  QD28 String,
  QJ09 String,
  QJ20 String,
  QJ55 String,
  QJ74 String,
  Diyu String,
  SYEAR String
)
ROW format delimited fields terminated by ','
STORED AS TEXTFILE;

 

Load the local file into Hive:

 

load data local inpath '/kkb/install/apache-hive-3.1.2-bin/testdate/aa_2019.csv' into table aa_2019;

Check that the data was loaded correctly:
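The source shows no query for this table at this step. A quick check, sketched here under the assumption that the load above succeeded, could preview a few rows and count them; the columns selected are only an illustrative choice:

```sql
-- Preview a few rows to confirm that the columns line up with the CSV fields.
select ID, QA19, Diyu, SYEAR from aa_2019 limit 10;

-- Count the loaded rows; compare the result with the line count of aa_2019.csv.
select count(*) as row_count from aa_2019;
```

If values appear shifted into the wrong columns, a likely cause is a comma inside a field, since the table splits fields on every ','.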

(3) Create the table diyu in Hive:

create table diyu(
  dm String,
  dmms String
)
ROW format delimited fields terminated by ','
STORED AS TEXTFILE;

 

Load the local file into Hive:

 

load data local inpath '/kkb/install/apache-hive-3.1.2-bin/testdate/diyu.csv' into table diyu;

 

 

 

Check that the data was loaded correctly:

select * from diyu limit 10;

 

 

 


 

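2. Data cleaning

Clean the region dimension field according to the standard dimension table.

(1) Skip the first row (the header row) of the diyu table: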

 

alter table diyu set TBLPROPERTIES ('skip.header.line.count'='1');
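This property does not delete anything from the underlying file; Hive simply skips the first line when it reads the table. If aa_2019.csv also begins with a header row (an assumption; the source does not say), the same setting could be applied to aa_2019, roughly as follows:

```sql
-- Assumption: aa_2019.csv has exactly one header line, like diyu.csv.
alter table aa_2019 set TBLPROPERTIES ('skip.header.line.count'='1');

-- The first rows returned should now be data rows, not column names.
select * from aa_2019 limit 3;
```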

 

(2) Create the table aa_19 to hold the data after the region dimension has been cleaned:

 

create table aa_19(
  ID String, QA04 String, QA05 String,
  QA07 String, QA15 String, QA19 String,
  Hangye String, QB03 String, QB03ONE String,
  QB03TWO String, QB03_1 String, QB06 String,
  QB16 String, QB16V String, Gaoxin String,
  QB16_1 String, QB16_1V String, QC02 String,
  QC05_0 String, QC24 String, QC40 String,
  QD01 String, QD28 String, QJ09 String,
  QJ20 String, QJ55 String, QJ74 String,
  Diyu String, SYEAR String
)
ROW format delimited fields terminated by ','
STORED AS TEXTFILE;

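(3) Clean the data. Each row's region code QA19 is matched against diyu.dm, and the Diyu column is filled with the code concatenated with its description dmms: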

insert into table aa_19
select
  aa_2019.ID as ID, aa_2019.QA04 as QA04, aa_2019.QA05 as QA05,
  aa_2019.QA07 as QA07, aa_2019.QA15 as QA15, aa_2019.QA19 as QA19,
  aa_2019.Hangye as Hangye, aa_2019.QB03 as QB03, aa_2019.QB03ONE as QB03ONE,
  aa_2019.QB03TWO as QB03TWO, aa_2019.QB03_1 as QB03_1, aa_2019.QB06 as QB06,
  aa_2019.QB16 as QB16, aa_2019.QB16V as QB16V, aa_2019.Gaoxin as Gaoxin,
  aa_2019.QB16_1 as QB16_1, aa_2019.QB16_1V as QB16_1V, aa_2019.QC02 as QC02,
  aa_2019.QC05_0 as QC05_0, aa_2019.QC24 as QC24, aa_2019.QC40 as QC40,
  aa_2019.QD01 as QD01, aa_2019.QD28 as QD28, aa_2019.QJ09 as QJ09,
  aa_2019.QJ20 as QJ20, aa_2019.QJ55 as QJ55, aa_2019.QJ74 as QJ74,
  concat(aa_2019.QA19, diyu.dmms) as Diyu,
  aa_2019.SYEAR as SYEAR
from aa_2019
join diyu on (aa_2019.QA19 = diyu.dm);
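Because the statement uses an inner join, any row whose QA19 code has no match in diyu is silently dropped, and a code that appears more than once in diyu would duplicate rows. The following checks are a sketch, not part of the original article, that could be run before the insert to see whether either case occurs:

```sql
-- Region codes in aa_2019 that have no entry in the dimension table.
select a.QA19, count(*) as cnt
from aa_2019 a
left join diyu d on a.QA19 = d.dm
where d.dm is null
group by a.QA19
order by cnt desc
limit 20;

-- Codes that occur more than once in diyu and would multiply joined rows.
select dm, count(*) as cnt
from diyu
group by dm
having count(*) > 1;
```

If both queries return no rows, the inner join neither loses nor duplicates records.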

(4) Cleaning result:

select * from aa_19 limit 10;
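Beyond eyeballing ten rows, comparing row counts gives a rough signal of whether the cleaning step kept every record. This is a minimal sketch; the alias names tbl and row_count are illustrative:

```sql
-- Row counts before and after cleaning; they should match if every
-- QA19 code was found exactly once in diyu.
select 'aa_2019' as tbl, count(*) as row_count from aa_2019
union all
select 'aa_19' as tbl, count(*) as row_count from aa_19;
```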

 


 

3. Data export

(1) Create the table in MySQL:

create table aa_19(
  ID varchar(255),
  QA04 varchar(255),
  QA05 varchar(255),
  QA07 varchar(255),
  QA15 varchar(255),
  QA19 varchar(255),
  Hangye varchar(255),
  QB03 varchar(255),
  QB03ONE varchar(255),
  QB03TWO varchar(255),
  QB03_1 varchar(255),
  QB06 varchar(255),
  QB16 varchar(255),
  QB16V varchar(255),
  Gaoxin varchar(255),
  QB16_1 varchar(255),
  QB16_1V varchar(255),
  QC02 varchar(255),
  QC05_0 varchar(255),
  QC24 varchar(255),
  QC40 varchar(255),
  QD01 varchar(255),
  QD28 varchar(255),
  QJ09 varchar(255),
  QJ20 varchar(255),
  QJ55 varchar(255),
  QJ74 varchar(255),
  Diyu varchar(255),
  SYEAR varchar(255)
);

 

(2) Export the table to MySQL with Sqoop:

bin/sqoop export \
--connect "jdbc:mysql://node01:3306/hive2?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password wyhhxx \
--table aa_19 \
--num-mappers 1 \
--export-dir /user/hive/warehouse/aa_19 \
--input-fields-terminated-by ","

 

(3) Export result:
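The source shows no query for this step. One way to confirm the export, sketched here on the assumption that the Sqoop job finished without errors and that the MySQL database is hive2 as in the connect string, is to run the following in the MySQL client:

```sql
-- Select the database named in the Sqoop connect string.
use hive2;

-- The count should equal the row count of aa_19 in Hive.
select count(*) as row_count from aa_19;

-- Spot-check that the region text arrived with the right encoding.
select ID, QA19, Diyu, SYEAR from aa_19 limit 10;
```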

 

 


4. Data visualization
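The source gives no details for this step. As one possible starting point, and purely an assumption about what a chart might show, a per-region row count from the exported MySQL table could feed a bar chart or a map:

```sql
-- Number of rows per cleaned region value, largest first.
select Diyu, count(*) as cnt
from aa_19
group by Diyu
order by cnt desc;
```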

 

Source: https://www.cnblogs.com/znjy/p/15389448.html
