
A Linux shell script for converting xx.hmp.txt format data to PLINK format

2022-07-29 01:01:10  Source: Internet

Tags: shell plink NA PC1 result home test format


 

001. Test data

root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  record.sh
root@PC1:/home/test# head -n 5 mdp_genotype_test.hmp.txt | cut -f 1-13   ## test data
rs      alleles chrom   pos     strand  assembly        center  protLSID        assayLSID       panel   QCcode  33-16   38-11
PZB00859.1      A/C     1       157104  +       AGPv1   Panzea  NA      NA      maize282        NA      CC      CC
PZA01271.1      C/G     1       1947984 +       AGPv1   Panzea  NA      NA      maize282        NA      CC      GG
PZA03613.2      G/T     1       2914066 +       AGPv1   Panzea  NA      NA      maize282        NA      GG      GG
PZA03613.1      A/T     1       2914171 +       AGPv1   Panzea  NA      NA      maize282        NA      TT      TT
root@PC1:/home/test# cat record.sh     ## conversion script
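#!/bin/bash
## Usage: bash record.sh xx.hmp.txt   (writes result.ped and result.map to the current directory)

columns=$(head -n 1 "$1" | awk -F '\t' '{print NF}')   ## total column count; samples start at column 12

: > result.ped   ## empty the output first, because the loop below appends to it

## For each sample column: join it into one row, recode missing NN as 00, split each genotype into two alleles,
## then duplicate the sample name as FID and IID and add father = 0, mother = 0, sex = 1, phenotype = -9.
for i in $(seq 12 "$columns"); do cut -f "$i" "$1" | paste -d "\t" -s | sed 's/\r//g; s/\tNN/\t00/g; s/\t./&\t/g; s/^\S\+\t/&&0\t0\t1\t-9\t/' >> result.ped; done

## .map columns: chromosome, variant ID, genetic distance (0), base-pair position
sed 1d "$1" | awk -F '\t' 'BEGIN {OFS = "\t"} {print $3, $1, 0, $4}' > result.map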
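
The per-sample pipeline in record.sh is easier to follow when run stage by stage on a tiny input. The sketch below is only an illustration: the toy file, the sample names S1 and S2, and the variable names are made up, and it assumes GNU sed (for `\t` in patterns). It takes column 12, joins it into one row, recodes the missing call NN as 00, and then splits each two-letter genotype into two alleles.

```shell
#!/bin/bash
# Toy input (hypothetical data): 11 HapMap metadata columns, then samples S1 and S2.
toy=$(mktemp)
{
  printf 'rs\talleles\tchrom\tpos\tstrand\tassembly\tcenter\tprotLSID\tassayLSID\tpanel\tQCcode\tS1\tS2\n'
  printf 'snp1\tA/C\t1\t100\t+\tNA\tNA\tNA\tNA\tNA\tNA\tCC\tAC\n'
  printf 'snp2\tG/T\t1\t200\t+\tNA\tNA\tNA\tNA\tNA\tNA\tNN\tGT\n'
} > "$toy"

# Stage 1: one sample column (name plus genotypes) joined into a single tab-separated row.
stage1=$(cut -f 12 "$toy" | paste -s -d '\t')
# Stage 2: missing calls NN become 00 (anchored on the preceding tab so sample names are left alone).
stage2=$(printf '%s\n' "$stage1" | sed 's/\tNN/\t00/g')
# Stage 3: a tab is inserted after the first character of every genotype, giving two allele columns.
stage3=$(printf '%s\n' "$stage2" | sed 's/\t./&\t/g')

# Print each stage with tabs shown as "|" so the columns are visible.
printf '%s\n' "$stage1" "$stage2" "$stage3" | tr '\t' '|'
rm -f "$toy"
```

The last stage still starts with the bare sample name; the final sed expression in the script is what turns that name into the six leading .ped columns.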

 

root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  record.sh
root@PC1:/home/test# bash record.sh mdp_genotype_test.hmp.txt    ## convert the format
root@PC1:/home/test# ls   
mdp_genotype_test.hmp.txt  record.sh  result.map  result.ped
root@PC1:/home/test# plink --file result --recode      ## test the conversion result
PLINK v1.90b6.24 64-bit (6 Jun 2021)           www.cog-genomics.org/plink/1.9/
(C) 2005-2021 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to plink.log.
Options in effect:
  --file result
  --recode

15969 MB RAM detected; reserving 7984 MB for main workspace.
.ped scan complete (for binary autoconversion).
Warning: Variant 424 quadallelic; setting rarest alleles missing.
Performing single-pass .bed write (3093 variants, 281 people).
--file: plink-temporary.bed + plink-temporary.bim + plink-temporary.fam
written.
3093 variants loaded from .bim file.
281 people (281 males, 0 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 281 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.963828.
3093 variants and 281 people pass filters and QC.
Note: No phenotypes present.
--recode ped to plink.ped + plink.map ... done.
root@PC1:/home/test# ls
mdp_genotype_test.hmp.txt  plink.log  plink.map  plink.ped  record.sh  result.map  result.ped
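
Besides loading the files in PLINK, a quick structural check can catch a malformed conversion early: every .ped row should have 6 leading columns plus two allele columns per variant listed in the .map file. The sketch below is one way to test that; `check_ped_map` is a hypothetical helper name (not a PLINK command), and the toy files are made-up data.

```shell
#!/bin/bash
# check_ped_map is a hypothetical helper, not part of PLINK.
# It succeeds when every .ped row has 6 + 2 * (number of .map rows) whitespace-separated fields.
check_ped_map() {
  local ped=$1 map=$2 variants
  variants=$(awk 'END {print NR}' "$map")
  awk -v want=$((6 + 2 * variants)) 'NF != want {bad = 1} END {exit bad + 0}' "$ped"
}

# Toy files (hypothetical data): two variants, one sample.
dir=$(mktemp -d)
printf '1\tsnp1\t0\t100\n1\tsnp2\t0\t200\n' > "$dir/toy.map"
printf 'S1\tS1\t0\t0\t1\t-9\tC\tC\t0\t0\n' > "$dir/toy.ped"

if check_ped_map "$dir/toy.ped" "$dir/toy.map"; then
  echo "consistent"
else
  echo "field count mismatch"
fi
```

On the files produced above, the equivalent call would be `check_ped_map result.ped result.map`.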

 

Source: https://www.cnblogs.com/liujiaxin2018/p/16530854.html
