task6b - Oh, stop dreaming - Paired plot of TP53 transcriptome expression in TCGA liver cancer patients with paired samples

2021-10-16 00:01:19

Tags: tcga TP53 BC DD TCGA 11A paired


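Assignment link

0. Assignment

  • Download the expression matrix (counts) for a cancer of interest, e.g. liver cancer, from the UCSC Xena browser
  • Using the sample names, pick out the tumor and normal-control data for the few dozen patients who have paired samples (some tumor samples have no matching control)
  • Extract the expression of a gene of interest (e.g. TP53)
  • Finally, plug the data into the plotting code above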

1. Data download


Download URL

Then find LIHC

Click through and download the file (TCGA-LIHC.htseq_counts.tsv.gz, used below)

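2. Data extraction and simple statistics


Extract the TP53 expression data
# The Ensembl ID of TP53 is ENSG00000141510
# Shell: keep the header line and the TP53 row
zcat TCGA-LIHC.htseq_counts.tsv.gz | grep -E 'Ensembl_ID|ENSG00000141510' >TP53_tcga_expression.txt
# R: read the extracted row; check.names = FALSE keeps the hyphens in the sample barcodes
library(dplyr)
tp53_tcga = read.table('TP53_tcga_expression.txt',header = TRUE,check.names = FALSE)
rownames(tp53_tcga) = 'TP53'
# Drop the first column (the Ensembl ID), leaving one column per sample
tp53_tcga = tp53_tcga[,-1]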
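
Where `zcat` is not available (for example on Windows), the same row can be pulled out inside R. The following is only a minimal sketch, not the method used in this post: the helper name `read_gene_row` is hypothetical, the toy barcodes and values are made up, and it assumes a tab-separated file whose first line is the header and whose first column holds the Ensembl ID. It reads the whole file into memory, which is acceptable for a single cohort but not efficient.

```r
# Hypothetical helper: read the header plus the row(s) whose first field
# starts with a given Ensembl gene ID from a gzipped TSV file.
read_gene_row <- function(path, gene_id) {
  con <- gzfile(path, "rt")
  on.exit(close(con))
  lines <- readLines(con)
  # Keep the header (line 1) and every data line that starts with gene_id
  keep <- c(TRUE, startsWith(lines[-1], gene_id))
  read.table(text = lines[keep], header = TRUE, sep = "\t",
             check.names = FALSE, row.names = 1)
}

# Toy demonstration: made-up values written to a temporary gzip file
tmp <- tempfile(fileext = ".tsv.gz")
out <- gzfile(tmp, "wt")
writeLines(c("Ensembl_ID\tTCGA-XX-0001-01A\tTCGA-XX-0001-11A",
             "ENSG00000000003.13\t10.5\t9.5",
             "ENSG00000141510.15\t11.5\t9.8"), out)
close(out)

tp53_toy <- read_gene_row(tmp, "ENSG00000141510")
rownames(tp53_toy) <- "TP53"
print(tp53_toy)
```

Because `row.names = 1` already moves the Ensembl ID out of the table, the separate "drop the first column" step is not needed with this variant.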
Count the normal and tumor samples
> table(colnames(tp53_tcga) %>% sub('TCGA-\\w+-\\w+-','',.) )
01A 01B 02A 02B 11A 
369   2   2   1  50 
# Tumor samples: 369+2+2+1=374
# Normal samples: 50
# 424 samples in total

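Sample-type codes 01-09 denote tumor samples and 10-19 denote normal samples (20-29 are control samples).
Only 01A and 11A, the two most common sample types, are kept here (the trailing letter is the vial identifier).
01 stands for Primary Solid Tumor and 11 for Solid Tissue Normal; for details see https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/sample-type-codes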
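
The sample-type rule can be written as a small function. This is a sketch that follows the GDC code table (01-09 tumor, 10-19 normal, 20-29 control) and assumes every column name has the four-field form seen in this file, such as `TCGA-BC-A10Q-01A`. The helper name `sample_group` is hypothetical, and the third barcode in the demonstration is made up.

```r
# Hypothetical helper: map TCGA sample barcodes to tumor / normal / control
# using the two-digit sample-type code at the start of the fourth field.
sample_group <- function(barcodes) {
  fourth <- vapply(strsplit(barcodes, "-"), `[`, character(1), 4)
  code <- as.integer(substr(fourth, 1, 2))   # "01A" -> 1, "11A" -> 11
  ifelse(code <= 9, "tumor",
         ifelse(code <= 19, "normal", "control"))
}

barcodes <- c("TCGA-BC-A10Q-01A", "TCGA-BC-A10Q-11A", "TCGA-XX-0000-02B")
print(table(sample_group(barcodes)))
```

Applied to `colnames(tp53_tcga)`, a function like this would give the tumor and normal totals directly instead of adding up the per-code counts by hand.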

# Keep only the 01A and 11A samples
tp53_tcga <- tp53_tcga[, grepl('-(01|11)A$', colnames(tp53_tcga))]
# Count the samples that remain
> table(colnames(tp53_tcga) %>% sub('TCGA-\\w+-\\w+-','',.) )
01A 11A 
369  50 
Extract the IDs of patients who have both a normal and a tumor sample

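As shown below, the shared A10Q means that both samples come from the same donor.
01A is the sample taken from the tumor site; 11A is the sample taken from normal tissue.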

TCGA-BC-A10Q-01A 11.514714054138487
TCGA-BC-A10Q-11A 9.843921051289035
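
These numbers are not raw read counts. To the best of my knowledge the Xena GDC hub stores this HTSeq counts matrix as log2(count + 1); treat that as an assumption and confirm it on the dataset's description page. Under that assumption, a sketch of the back-transformation (the helper name `log2_to_count` is hypothetical):

```r
# Assumption: values are log2(count + 1), so the inverse is 2^x - 1.
log2_to_count <- function(x) 2^x - 1

# The two TP53 values shown above for patient TCGA-BC-A10Q
tp53_log2 <- c(tumor = 11.514714054138487, normal = 9.843921051289035)
tp53_count <- round(log2_to_count(tp53_log2))
print(tp53_count)
```

The paired plot itself works on the values as stored; the conversion only matters when a tool expects integer counts.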

Extract the IDs of the patients who have such a pair, 50 in total

> names(which((colnames(tp53_tcga) %>% sub('-[01]1A','',.) %>% table(.))==2))
 [1] "TCGA-BC-A10Q" "TCGA-BC-A10R" "TCGA-BC-A10T" "TCGA-BC-A10U" "TCGA-BC-A10W" "TCGA-BC-A10X" "TCGA-BC-A10Y" "TCGA-BC-A10Z" "TCGA-BC-A110"
[10] "TCGA-BC-A216" "TCGA-BD-A2L6" "TCGA-BD-A3EP" "TCGA-DD-A113" "TCGA-DD-A114" "TCGA-DD-A116" "TCGA-DD-A118" "TCGA-DD-A119" "TCGA-DD-A11A"
[19] "TCGA-DD-A11B" "TCGA-DD-A11C" "TCGA-DD-A11D" "TCGA-DD-A1EB" "TCGA-DD-A1EC" "TCGA-DD-A1EE" "TCGA-DD-A1EG" "TCGA-DD-A1EH" "TCGA-DD-A1EI"
[28] "TCGA-DD-A1EJ" "TCGA-DD-A1EL" "TCGA-DD-A39V" "TCGA-DD-A39W" "TCGA-DD-A39X" "TCGA-DD-A39Z" "TCGA-DD-A3A1" "TCGA-DD-A3A2" "TCGA-DD-A3A3"
[37] "TCGA-DD-A3A4" "TCGA-DD-A3A5" "TCGA-DD-A3A6" "TCGA-DD-A3A8" "TCGA-EP-A12J" "TCGA-EP-A26S" "TCGA-EP-A3RK" "TCGA-ES-A2HT" "TCGA-FV-A23B"
[46] "TCGA-FV-A2QR" "TCGA-FV-A3I0" "TCGA-FV-A3I1" "TCGA-FV-A3R2" "TCGA-G3-A3CH"

Using the patient IDs extracted above, pull out the corresponding tumor and normal expression data

tumor_and_normal = names(which((colnames(tp53_tcga) %>% sub('-[01]1A','',.) %>% table(.))==2))
normal = tp53_tcga[,paste0(tumor_and_normal,"-11A")]
tumor = tp53_tcga[,paste0(tumor_and_normal,"-01A")]
> normal
     TCGA-BC-A10Q-11A TCGA-BC-A10R-11A TCGA-BC-A10T-11A TCGA-BC-A10U-11A TCGA-BC-A10W-11A TCGA-BC-A10X-11A TCGA-BC-A10Y-11A
TP53         9.843921         10.06474         10.13955         9.896332          9.79279         10.36304         10.56986
     TCGA-BC-A10Z-11A TCGA-BC-A110-11A TCGA-BC-A216-11A TCGA-BD-A2L6-11A TCGA-BD-A3EP-11A TCGA-DD-A113-11A TCGA-DD-A114-11A
TP53         10.71167         9.810572         9.575539         10.80574         9.930737         9.623881         10.87498
     TCGA-DD-A116-11A TCGA-DD-A118-11A TCGA-DD-A119-11A TCGA-DD-A11A-11A TCGA-DD-A11B-11A TCGA-DD-A11C-11A TCGA-DD-A11D-11A
TP53         9.847057         8.839204         10.01262         10.59246         9.259743         8.668885         9.971544
     TCGA-DD-A1EB-11A TCGA-DD-A1EC-11A TCGA-DD-A1EE-11A TCGA-DD-A1EG-11A TCGA-DD-A1EH-11A TCGA-DD-A1EI-11A TCGA-DD-A1EJ-11A
TP53         10.42731         10.11894         9.609179         10.03067         10.28309         10.29806         10.68212
     TCGA-DD-A1EL-11A TCGA-DD-A39V-11A TCGA-DD-A39W-11A TCGA-DD-A39X-11A TCGA-DD-A39Z-11A TCGA-DD-A3A1-11A TCGA-DD-A3A2-11A
TP53         10.13699         9.544964         10.26796         10.41574         9.854868         8.897845         10.16993
     TCGA-DD-A3A3-11A TCGA-DD-A3A4-11A TCGA-DD-A3A5-11A TCGA-DD-A3A6-11A TCGA-DD-A3A8-11A TCGA-EP-A12J-11A TCGA-EP-A26S-11A
TP53         9.346514         11.37829         9.529431         9.278449         9.544964         9.764872         9.949827
     TCGA-EP-A3RK-11A TCGA-ES-A2HT-11A TCGA-FV-A23B-11A TCGA-FV-A2QR-11A TCGA-FV-A3I0-11A TCGA-FV-A3I1-11A TCGA-FV-A3R2-11A
TP53          10.0348         10.89709         10.39553         10.47675         9.642052         10.02375          10.0348
     TCGA-G3-A3CH-11A
TP53         9.368506
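
Because the plot pairs the two sets of values purely by position, it is worth confirming that the normal and tumor tables list the patients in the same order. The sketch below reproduces the pairing logic on a toy one-row table and adds that check. It uses base R only; the `TCGA-XX-...` barcodes, the values, and the `toy_` variable names are all made up for illustration.

```r
# Toy one-row expression table; barcodes and values are made up
expr <- data.frame(10.2, 9.1, 11.0, 10.4, 9.9, check.names = FALSE)
colnames(expr) <- c("TCGA-XX-0001-01A", "TCGA-XX-0001-11A",
                    "TCGA-XX-0002-01A", "TCGA-XX-0003-11A",
                    "TCGA-XX-0003-01A")
rownames(expr) <- "TP53"

# Patient ID = barcode without the trailing sample-type field
patient <- sub("-[01]1A$", "", colnames(expr))
# Patients that appear twice have both a tumor and a normal sample
paired <- names(which(table(patient) == 2))

toy_normal <- expr[, paste0(paired, "-11A"), drop = FALSE]
toy_tumor  <- expr[, paste0(paired, "-01A"), drop = FALSE]

# Both tables must list the same patients in the same order
stopifnot(identical(sub("-11A$", "", colnames(toy_normal)),
                    sub("-01A$", "", colnames(toy_tumor))))
print(paired)
```

Patient `TCGA-XX-0002` has only a tumor sample and is dropped, which is the same thing that happens to the unpaired 01A samples in the real data.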

3. Visualization


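# One row per patient: paired normal and tumor TP53 values
input <- data.frame(normal = as.numeric(normal), tumor = as.numeric(tumor))
library(ggpubr)
# Paired box plot with a line joining each patient's two samples
ggpaired(input, cond1 = "normal", cond2 = "tumor",
         fill = "condition", palette = "jco")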
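
The paired plot draws one line per patient; the same information can be summarised numerically as per-patient differences. A minimal base-R sketch follows. The data frame mirrors the layout of `input`, but the values are illustrative: the normal column and the first tumor value are rounded from numbers shown earlier, and the remaining tumor values are made up.

```r
# Same layout as `input`: one row per patient, columns normal and tumor.
# Illustrative values only, not the real 50-patient data.
toy_input <- data.frame(normal = c(9.84, 10.06, 10.14, 9.90),
                        tumor  = c(11.51, 10.40, 9.80, 10.75))

# Per-patient change (tumor minus normal), on the scale of the input values
toy_input$diff <- toy_input$tumor - toy_input$normal

# How many patients go up or down in the tumor sample
direction <- table(ifelse(toy_input$diff > 0, "up in tumor", "down in tumor"))
print(direction)
print(median(toy_input$diff))
```

Running the same three lines on the real `input` would show how many of the 50 patients have higher TP53 expression in the tumor sample.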

[Figure: paired plot of TP53 expression, normal vs. tumor]

References
http://rpkgs.datanovia.com/ggpubr/reference/ggpaired.html

Source: https://blog.csdn.net/coding_Joash/article/details/120792599
