GenomeDISCO

2020-03-19 16:00:46

Tags: file, GenomeDISCO, contact, samples, bins, metadata


 

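GenomeDISCO evaluates the similarity of a pair of contact maps obtained from chromosome conformation capture experiments. It is designed to assess the concordance and reproducibility of chromatin contact maps. It is sensitive to differences in sequencing depth, node and edge dropout noise, changes in domain boundaries, and subtle differences in distance dependence, and it distinguishes biological replicates from samples of different cell types.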

 

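GenomeDISCO (DIfferences between Smoothed COntact maps) is a package for comparing contact maps of 3D genome structures, obtained from experiments such as Hi-C, Capture-C, ChIA-PET, HiChIP, etc. Before comparing two contact maps, it smooths them using random walks on the contact map graph, and then produces a concordance score that can be used for quality control of biological replicates.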
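The smooth-then-compare idea can be illustrated with a short Python sketch. This is not GenomeDISCO's own implementation: the function names (`row_normalize`, `random_walk`, `concordance`), the default of `t=3` random-walk steps, and the final scaling of the score (one minus the summed absolute difference divided by the number of bins that have contacts) are assumptions made for this example, and the package also normalizes raw counts before smoothing.

```python
def row_normalize(m):
    """Turn a count matrix into a row-stochastic transition matrix."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def random_walk(m, t):
    """Smooth a contact map with t >= 1 steps of a random walk on its graph."""
    p = row_normalize(m)
    result = p
    for _ in range(t - 1):
        result = matmul(result, p)
    return result


def concordance(m1, m2, t=3):
    """Score two maps: 1.0 means the smoothed maps are identical."""
    s1, s2 = random_walk(m1, t), random_walk(m2, t)
    # L1 difference between the two smoothed maps.
    diff = sum(abs(a - b) for r1, r2 in zip(s1, s2) for a, b in zip(r1, r2))
    # Number of bins that have at least one contact in either map.
    nonzero = sum(1 for r1, r2 in zip(m1, m2) if sum(r1) > 0 or sum(r2) > 0)
    return 1.0 - diff / nonzero if nonzero else float("nan")


# Two toy 3-bin contact maps (hypothetical counts).
map_a = [[0, 5, 1], [5, 0, 3], [1, 3, 0]]
map_b = [[0, 1, 5], [1, 0, 3], [5, 3, 0]]
print(concordance(map_a, map_a))
print(concordance(map_a, map_b))
```

Because every row of a smoothed map sums to one, the per-bin difference is at most 2, so under these assumptions the score stays between -1 and 1, with higher values indicating more similar maps.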

 

Comparing two contact maps

1. Generate the configuration files for the example

/genomedisco/examples/configure_example.sh

 

2. Run the concordance analysis

cd genomedisco
genomedisco run_all --metadata_samples examples/metadata.samples --metadata_pairs examples/metadata.pairs --bins examples/Bins.w50000.bed.gz --outdir examples/output 

 

Inputs

Before running GenomeDISCO, you need the following files:

  • contact map For each of your samples, you need a file containing the counts assigned to each pair of bins in your contact map, in the format chr1 bin1 chr2 bin2 value. Note: GenomeDISCO assumes that this file contains the contacts for all chromosomes, and will split it into individual files for each chromosome.

  • bins This file contains the full set of genomic regions associated with your contact maps, in the format chr start end name where name is the name of the bin as used in the contact map files above. GenomeDISCO supports both fixed-size bins and variable-sized bins (e.g. obtained by partitioning the genome into restriction fragments).
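Since the bin names in the contact map must match the names in the bins file, a quick consistency check before running the analysis can save time. The sketch below is only an illustration of the two formats described above: the helper names (`read_bins`, `missing_bins`), the whitespace-delimited parsing, and the toy records (bins named by their start coordinate) are assumptions, not part of GenomeDISCO.

```python
def read_bins(lines):
    """Collect (chromosome, bin name) pairs from 'chr start end name' records."""
    known = set()
    for line in lines:
        if not line.strip():
            continue
        chrom, _start, _end, name = line.split()[:4]
        known.add((chrom, name))
    return known


def missing_bins(contact_lines, bin_lines):
    """Return the (chromosome, bin) pairs used in the contact map
    ('chr1 bin1 chr2 bin2 value' records) that are absent from the bins file."""
    known = read_bins(bin_lines)
    missing = set()
    for line in contact_lines:
        if not line.strip():
            continue
        chr1, bin1, chr2, bin2, value = line.split()[:5]
        float(value)  # raises ValueError if the count is not numeric
        for key in ((chr1, bin1), (chr2, bin2)):
            if key not in known:
                missing.add(key)
    return missing


# Hypothetical records; real files would be read line by line instead.
bins = ["chr1\t0\t50000\t0", "chr1\t50000\t100000\t50000"]
contacts = ["chr1\t0\tchr1\t50000\t12", "chr1\t50000\tchr1\t100000\t3"]
print(missing_bins(contacts, bins))
```

For gzipped inputs, the lines can be supplied with Python's standard `gzip.open(path, "rt")`, which yields text lines in the same way as a regular file handle.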

 

GenomeDISCO takes the following inputs:

  • --metadata_samples Information about the samples to compare. Tab-delimited file, with columns "samplename", "samplefile". Note: each samplename should be unique. Each samplefile listed here should follow the format "chr1 bin1 chr2 bin2 value".

  • --metadata_pairs Each line is a pair of sample names to be compared, in the format "samplename1 samplename2". Important: sample names used here need to correspond to the first column of the --metadata_samples file.

  • --bins A (gzipped) BED file of all the bins used in the analysis. It should have 4 columns: "chr start end name", where the name of the bin corresponds to the bins used in the contact maps.

  • --re_fragments Add this flag if the bins are not uniform in length (for example, if they are based on restriction enzyme fragments). By default, the code assumes the bins are of uniform length.

  • --parameters_file Parameters for the replicate and QC analysis. For details see "Parameters file".

  • --outdir Name of output directory. DEFAULT: replicateQC

  • --running_mode The mode in which to run the analysis. This allows you to choose whether the analysis will be run as is, or submitted as a job through sge or slurm. Available options are: "NA" (default, no jobs are submitted). Coming soon: "sge", "slurm"

  • --concise_analysis Set this flag to obtain a concise analysis, which means replicateQC is measured but plots that might be more time/memory consuming are not created. This is useful for quick testing or running large-scale analyses on hundreds of comparisons.

  • --subset_chromosomes Comma-delimited list of chromosomes for which you want to run the analysis. By default, the analysis is run on all chromosomes. This is useful for quick testing.
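The metadata files and the command line described above can be assembled programmatically. The following Python sketch is a hedged illustration: the helper names (`write_metadata`, `build_command`), the sample names and file paths, the absence of a header row, and the use of a tab between the two sample names in the pairs file are assumptions. Only the flags themselves come from the list above.

```python
import os
import tempfile


def write_metadata(directory, samples, pairs):
    """Write metadata.samples and metadata.pairs into the given directory.

    samples: list of (samplename, samplefile); pairs: list of (name1, name2).
    """
    names = [name for name, _ in samples]
    if len(set(names)) != len(names):
        raise ValueError("each samplename must be unique")
    for first, second in pairs:
        if first not in names or second not in names:
            raise ValueError("pair refers to an unknown sample name")
    samples_path = os.path.join(directory, "metadata.samples")
    pairs_path = os.path.join(directory, "metadata.pairs")
    with open(samples_path, "w") as handle:
        for name, path in samples:
            handle.write(f"{name}\t{path}\n")
    with open(pairs_path, "w") as handle:
        for first, second in pairs:
            handle.write(f"{first}\t{second}\n")
    return samples_path, pairs_path


def build_command(samples_path, pairs_path, bins_path, outdir,
                  chromosomes=None, concise=False):
    """Assemble the argument list for 'genomedisco run_all' without running it."""
    cmd = ["genomedisco", "run_all",
           "--metadata_samples", samples_path,
           "--metadata_pairs", pairs_path,
           "--bins", bins_path,
           "--outdir", outdir]
    if chromosomes:
        cmd += ["--subset_chromosomes", ",".join(chromosomes)]
    if concise:
        cmd.append("--concise_analysis")
    return cmd


# Hypothetical sample names and paths, written to a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    s_path, p_path = write_metadata(
        workdir,
        [("rep1", "/data/rep1.counts.gz"), ("rep2", "/data/rep2.counts.gz")],
        [("rep1", "rep2")],
    )
    print(" ".join(build_command(s_path, p_path, "examples/Bins.w50000.bed.gz",
                                 "examples/output", chromosomes=["chr21"])))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` once GenomeDISCO is installed. Checking sample names before writing the files catches the mismatch between the pairs file and the samples file that the "Important" note above warns about.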

 

 

 

 

Sources:

https://omictools.com/genomedisco-tool

 

https://github.com/kundajelab/genomedisco

Source: https://www.cnblogs.com/bio-mary/p/12524878.html
