ICode9

R in Action: How to Handle Missing Values in Data

2020-11-29 13:02:19  Views: 685  Source: Internet

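The following shows how to handle a data set that contains some missing values. The code is written out in full and sticks to the basics: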
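
A minimal sketch of how base R treats NA, which is why the steps below test with is.na() rather than == and pass na.rm = TRUE to summary functions. It uses only a toy vector x (the same values the article uses later); the commented results are what base R returns:

```r
# Toy vector with one missing value (illustration only)
x <- c(1, 2, NA, 3)

x == NA                 # NA NA NA NA: any comparison with NA is itself NA
is.na(x)                # FALSE FALSE TRUE FALSE: the reliable way to test for NA
sum(x)                  # NA: missing values propagate through arithmetic
sum(x, na.rm = TRUE)    # 6: NAs are dropped before summing
mean(x, na.rm = TRUE)   # 2: the mean of 1, 2 and 3
```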

The basic code:

manager <- c(1,2,3,4,5)
date <- c("10/24/08","10/28/08","10/1/08","10/12/08","5/1/09")
country <- c("US","US","UK","UK","UK")
gender <- c("M","F","F","M","F")
age <- c(32,45,25,39,99)
q1 <- c(5,3,3,3,2)
q2 <- c(4,5,5,3,2)
q3 <- c(5,2,5,4,1)
q4 <- c(5,5,5,NA,2)
q5 <- c(5,5,2,NA,1)
leadership <- data.frame(manager, date, country, gender, age,
                         q1,q2,q3,q4,q5,stringsAsFactors = FALSE)
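# stringsAsFactors = FALSE keeps character columns (date, country, gender) as character vectors instead of converting them to factors (already the default in R 4.0.0 and later)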
leadership

leadership$age[leadership$age == 99] <- NA
# Mark the age value 99 as missing (NA)
# Derive an age-category column; a row whose age is NA keeps agecat = NA
leadership <- within(leadership,{
  agecat <- NA
  agecat[age > 75] <- "elder"
  agecat[age >= 55 & age <=75] <- "Middle Aged"
  agecat[age < 55] <- "Young"})

leadership

library("plyr")
#fix(leadership)
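# fix() opens a data editor for interactive changes; the same edits can be made in code. plyr's rename() takes a named vector of old = new names: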
leadership <- rename(leadership, c(manager = "Manager ID", date = "Testdate"))
names(leadership)[6:10] <- c("item1","item2","item3","item4","item5")
leadership
is.na(leadership)
# Check every cell for missing values (TRUE marks an NA)

x <- c(1,2,NA,3)
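y <- sum(x, na.rm = TRUE)
y
# na.rm = TRUE drops missing values before summing, so y is 6
# na.omit() removes every row that contains at least one missing value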
leadership
newdata <- na.omit(leadership)
newdata
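
Printing the whole is.na() matrix works for five rows but not for real data. A minimal sketch, on a small hypothetical data frame df rather than leadership, that recodes a sentinel value the same way the script recodes age 99 and then summarizes missingness with base R's is.na(), colSums() and complete.cases():

```r
# Hypothetical toy data: 99 is used as a code for "age unknown"
df <- data.frame(
  age = c(32, 45, 25, 39, 99),
  q4  = c(5, 5, 5, NA, 2),
  q5  = c(5, 5, 2, NA, 1)
)

# Recode the sentinel to NA, as done for leadership$age
df$age[df$age == 99] <- NA

any(is.na(df))                 # TRUE: at least one NA anywhere
colSums(is.na(df))             # NA count per column: age 1, q4 1, q5 1
which(!complete.cases(df))     # rows containing at least one NA: 4 5
```

colSums() works here because is.na() on a data frame returns a logical matrix, and TRUE counts as 1 when summed.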

Running the script in the R console produces the following session:

> manager <- c(1,2,3,4,5)
> date <- c("10/24/08","10/28/08","10/1/08","10/12/08","5/1/09")
> country <- c("US","US","UK","UK","UK")
> gender <- c("M","F","F","M","F")
> age <- c(32,45,25,39,99)
> q1 <- c(5,3,3,3,2)
> q2 <- c(4,5,5,3,2)
> q3 <- c(5,2,5,4,1)
> q4 <- c(5,5,5,NA,2)
> q5 <- c(5,5,2,NA,1)
> leadership <- data.frame(manager, date, country, gender, age,
+                          q1,q2,q3,q4,q5,stringsAsFactors = FALSE)
> # stringsAsFactors = FALSE keeps character columns as character vectors instead of converting them to factors
> leadership
  manager     date country gender age q1 q2 q3 q4 q5
1       1 10/24/08      US      M  32  5  4  5  5  5
2       2 10/28/08      US      F  45  3  5  2  5  5
3       3  10/1/08      UK      F  25  3  5  5  5  2
4       4 10/12/08      UK      M  39  3  3  4 NA NA
5       5   5/1/09      UK      F  99  2  2  1  2  1
> 
> leadership$age[leadership$age == 99] <- NA
> # Mark the age value 99 as missing (NA)
> leadership <- within(leadership,{
+   agecat <- NA
+   agecat[age > 75] <- "elder"
+   agecat[age >= 55 & age <=75] <- "Middle Aged"
+   agecat[age < 55] <- "Young"})
> 
> leadership
  manager     date country gender age q1 q2 q3 q4 q5 agecat
1       1 10/24/08      US      M  32  5  4  5  5  5  Young
2       2 10/28/08      US      F  45  3  5  2  5  5  Young
3       3  10/1/08      UK      F  25  3  5  5  5  2  Young
4       4 10/12/08      UK      M  39  3  3  4 NA NA  Young
5       5   5/1/09      UK      F  NA  2  2  1  2  1   <NA>
> 
> library("plyr")
> #fix(leadership)
> # fix() opens a data editor for interactive changes; the same edits can be made in code, for example:
> leadership <- rename(leadership, c(manager = "Manager ID", date = "Testdate"))
> names(leadership)[6:10] <- c("item1","item2","item3","item4","item5")
> leadership
  Manager ID Testdate country gender age item1 item2 item3 item4 item5 agecat
1          1 10/24/08      US      M  32     5     4     5     5     5  Young
2          2 10/28/08      US      F  45     3     5     2     5     5  Young
3          3  10/1/08      UK      F  25     3     5     5     5     2  Young
4          4 10/12/08      UK      M  39     3     3     4    NA    NA  Young
5          5   5/1/09      UK      F  NA     2     2     1     2     1   <NA>
> is.na(leadership)
     Manager ID Testdate country gender   age item1 item2 item3 item4 item5 agecat
[1,]      FALSE    FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE  FALSE
[2,]      FALSE    FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE  FALSE
[3,]      FALSE    FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE FALSE FALSE  FALSE
[4,]      FALSE    FALSE   FALSE  FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  FALSE
[5,]      FALSE    FALSE   FALSE  FALSE  TRUE FALSE FALSE FALSE FALSE FALSE   TRUE
> # Check every cell for missing values (TRUE marks an NA)
> 
> x <- c(1,2,NA,3)
> y <- sum(x, na.rm = TRUE)
> y
[1] 6
> # na.rm = TRUE drops missing values before summing
> # na.omit() removes every row that contains at least one missing value
> leadership
  Manager ID Testdate country gender age item1 item2 item3 item4 item5 agecat
1          1 10/24/08      US      M  32     5     4     5     5     5  Young
2          2 10/28/08      US      F  45     3     5     2     5     5  Young
3          3  10/1/08      UK      F  25     3     5     5     5     2  Young
4          4 10/12/08      UK      M  39     3     3     4    NA    NA  Young
5          5   5/1/09      UK      F  NA     2     2     1     2     1   <NA>
> newdata <- na.omit(leadership)
> newdata
  Manager ID Testdate country gender age item1 item2 item3 item4 item5 agecat
1          1 10/24/08      US      M  32     5     4     5     5     5  Young
2          2 10/28/08      US      F  45     3     5     2     5     5  Young
3          3  10/1/08      UK      F  25     3     5     5     5     2  Young
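
na.omit() performs listwise deletion: rows 4 and 5 are dropped even though each is missing only some of its fields. A minimal sketch on a hypothetical data frame df shows that complete.cases() selects the same rows, how to read which rows na.omit() removed from its na.action attribute, and how the deletion can shift a column statistic compared with na.rm = TRUE:

```r
# Hypothetical toy data: row 4 lacks q4, row 5 lacks age
df <- data.frame(
  id  = 1:5,
  q4  = c(5, 5, 5, NA, 2),
  age = c(32, 45, 25, 39, NA)
)

kept <- na.omit(df)
nrow(kept)                             # 3
as.vector(attr(kept, "na.action"))     # dropped rows: 4 5

# The same row filter written with complete.cases()
kept2 <- df[complete.cases(df), ]
identical(kept$id, kept2$id)           # TRUE

# Column-wise removal keeps row 5's q4 value; listwise deletion loses it
mean(df$q4, na.rm = TRUE)              # 4.25, the mean of 5, 5, 5, 2
mean(kept$q4)                          # 5, because row 5 was dropped for its missing age
```

Which approach fits depends on the analysis: na.omit() gives every later step the same set of rows, while na.rm = TRUE uses as much data as each individual statistic allows.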

That's all. Let's keep learning together.

Source: https://blog.csdn.net/Math_is_hard/article/details/110309622
