ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

实验记录 | 6/10

2021-06-10 19:01:59  阅读:198  来源: 互联网

标签:10 STAR 记录 -- RNA 实验 hg19 output


(8:54)早上的时候,来这边发现,gtf转换,但是并没有转换出来(847MB==>43.5GB)。所以,昨天的尝试是失败的。
又及:另外一个平台上尝试下载的gtf文件,也下载失败。

gencode.v19.annotation.gtf.gz 63%[++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++====> ] 22.89M 11.5KB/s in 3m 19s
2021-06-09 21:19:36 (7.90 KB/s) - Connection closed at byte 24001167. Giving up.

所以,整体上,卡在了gtf与基因组序列的不匹配上。之前,我记得我也遇到类似的问题,整体上应该解决起来并不难的。
(9:04)我现在应该怎么办呢?
我找一下,之前我的基因组的参考序列是在哪个平台上下载的:

(5月13日)
mkdir genome/hg19
cd genome/hg19
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz
tar zvfx chromFa.tar.gz

我的参考基因组序列是在UCSC平台上获取的,所以,我的gtf文档,也应该在这个平台上获取(这样才能最大可能的一一对应起来)。
(记得到时候学姐讲的时候,给学姐拍照)
这个界面的打开,是比较慢的,但是不要着急。

Feb. 2009 (GRCh37/hg19)
Genome sequence files and select annotations (2bit, GTF, GC-content, etc)
Sequence data by chromosome
Annotations
GC percent data
Protein database for hg19
SNP-masked fasta files
LiftOver files
Pairwise alignments (primates)
Pairwise alignments (other mammals)
Pairwise alignments (other vertebrates)
Multiple alignments
Patches

终于找到了UCSC上,hg19的gtf注释文档。
参考链接:https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/
目录下面一共有四个选择:

hg19.ensGene.gtf.gz 2020-01-10 09:45 26M
hg19.knownGene.gtf.gz 2020-01-10 09:45 17M
hg19.ncbiRefSeq.gtf.gz 2021-05-17 10:35 19M
hg19.refGene.gtf.gz 2020-01-10 09:45 21M
(可以看到,整体上并不是很大,哪里有那么夸张呢!)

我们选择,下载refGene这个版本。
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz

–2021-06-10 09:29:31-- https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.refGene.gtf.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)… 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 21846291 (21M) [application/x-gzip]
Saving to: ‘hg19.refGene.gtf.gz’
hg19.refGene.gtf.gz 38%[======> ] 8.02M 115KB/s eta 8hg19.hg19.refGene.gh

下载完成之后,我们将其解压,最后查看文件内容。

chr1 refGene transcript 11869 14362 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; gene_name “LOC102725121”;
chr1 refGene exon 11869 12227 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “1”; exon_id “NR_148357.1”; gene_name “LOC102725121”;
chr1 refGene exon 12613 12721 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “2”; exon_id “NR_148357.2”; gene_name “LOC102725121”;
chr1 refGene exon 13221 14362 . + . gene_id “LOC102725121”; transcript_id “NR_148357”; exon_number “3”; exon_id “NR_148357.3”; gene_name “LOC102725121”;
chr1 refGene transcript 11874 14409 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; gene_name “DDX11L1”;
chr1 refGene exon 11874 12227 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “1”; exon_id “NR_046018.1”; gene_name “DDX11L1”;
chr1 refGene exon 12613 12721 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “2”; exon_id “NR_046018.2”; gene_name “DDX11L1”;
chr1 refGene exon 13221 14409 . + . gene_id “DDX11L1”; transcript_id “NR_046018”; exon_number “3”; exon_id “NR_046018.3”; gene_name “DDX11L1”;
chr22 refGene transcript 24666799 24813706 . + . gene_id “SPECC1L”; transcript_id “NM_015330”; gene_name “SPECC1L”;
chr22 refGene exon 24666799 24666951 . + . gene_id “SPECC1L”; transcript_id “NM_015330”; exon_number “1”; exon_id “NM_015330.1”; gene_name “SPECC1L”;

感觉,有点奇怪,这个文件中的染色体号,好像是乱的。
有点意思,我们沿用昨天的方法,把他的前面取出来。

chr17
chr12
chr2
chr6_mcf_hap5
chr2
chr6_mcf_hap5
chr7
chr5
chr6_mcf_hap5
chrX

全部都是乱的(并不是我们理想的状态)。
我想再去看看其他的文件。
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ensGene.gtf.gz

–2021-06-10 09:39:08-- https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/genes/hg19.ensGene.gtf.gz
Resolving hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)… 128.114.119.163
Connecting to hgdownload.soe.ucsc.edu (hgdownload.soe.ucsc.edu)|128.114.119.163|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 27763607 (26M) [application/x-gzip]
Saving to: ‘hg19.ensGene.gtf.gz’
hg19.ensGene.gtf.gz 100%[================================================================================================================>] 26.48M 101KB/s in 3m 11s
2021-06-10 09:42:20 (142 KB/s) - ‘hg19.ensGene.gtf.gz’ saved [27763607/27763607]

gunzip -d hg19.ensGene.gtf.gz
grep -o -E "^\w+([-+.]\w+)*" hg19.ensGene.gtf|uniq

chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr17_gl000204_random
chr17_gl000205_random
chr18
chr19

这个我挺满意的,只是接下来,要把这个文件按照字典排序(chr1,chr2,chr3……)。
可以使用bedtools来对序列进行排序。
参考链接:https://anaconda.org/bioconda/bedtools
conda install -c bioconda bedtools
安装成功,可以继续使用这个工具了。
后来尝试后发现,其实并不需要bedtools这个工具,直接使用sort即可(我对于一些shell指令还是不太熟悉)。
sort -k1,1V -k3,3n -k4,4n hg19.ensGene.gtf >hg19.ensGene_sorted.gtf
grep -o -E "^\w+([-+.]\w+)*" hg19.ensGene_sorted.gtf|uniq
最后的sorted结果,是我们想要的结果:

chr1
chr1_gl000191_random
chr1_gl000192_random
chr2
chr3
chr4
chr4_gl000193_random
chr4_gl000194_random
chr5
chr6
chr6_apd_hap1
chr6_cox_hap2
chr6_dbb_hap3
chr6_mann_hap4
chr6_mcf_hap5
chr6_qbl_hap6
chr6_ssto_hap7
chr7
chr7_gl000195_random
chr8
chr8_gl000196_random
chr9
chr9_gl000199_random
chr9_gl000201_random
chr10
chr11
……
chrUn_gl000213
chrUn_gl000215
chrUn_gl000216
chrUn_gl000218
chrUn_gl000219
……
chrX
chrY

我们可以使用这个gtf注释文档,来进行接下来的star比对工作。
我们把sorted后的文档,上传到服务器中。


上传完成之后,开始运行比对,产生索引。
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runThreadN 32 --runMode genomeGenerate --genomeDir /home/xxzhang/workplace/QBRC/geneome/hg19/STAR --genomeFastaFiles hg19.fa --sjdbGTFfile hg19.gtf --sjdbOverhang 149

STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 10:51:47 … started STAR run
Jun 10 10:51:47 … starting to generate Genome files
Jun 10 10:52:57 … processing annotations GTF(这一步终于过去了,不再卡在原来的地方,说明我的努力是有价值的)
Jun 10 10:53:28 … starting to sort Suffix Array. This may take a long time…
Jun 10 10:53:47 … sorting Suffix Array chunks and saving them to disk…
Jun 10 11:05:51 … loading chunks from disk, packing SA…
Jun 10 11:07:14 … finished generating suffix array
Jun 10 11:07:14 … generating Suffix Array index
Jun 10 11:10:41 … completed Suffix Array index
Jun 10 11:10:42 … inserting junctions into the genome indices
Jun 10 11:14:59 … writing Genome to disk …
Jun 10 11:15:00 … writing Suffix Array to disk …
Jun 10 11:15:10 … writing SAindex to disk
Jun 10 11:15:10 … finished successfully

所以创建成功!比我想象的要快很多!
最后在路径下生成的文件有:

chrLength.txt exonInfo.tab hg19.gtf sjdbList.fromGTF.out.tab
chrNameLength.txt geneInfo.tab Log.out sjdbList.out.tab
chrName.txt Genome SA transcriptInfo.tab
chrStart.txt genomeParameters.txt SAindex
exonGeTrInfo.tab hg19.fa sjdbInfo.txt

索引,创建完成之后,继续运行somatic.pl程序。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline

在比对的过程中,出现错误。

EXITING because of INPUT ERROR: could not open genomeFastaFile: ./geneome/hg19/hg19.fa
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

我们来看一下,在STAR比对那一步,它在代码中是如何写的。

$star_index=$index;
$star_index=~s/\/[^\/]*?$/\/STAR/;
……
system_call("STAR --genomeDir ".$star_index." --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
          " ".$zcat." --outFileNamePrefix ".$type_output."/tmp/pass1 --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

system_call("cd ".$type_output.";
STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
          $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

system_call("STAR --genomeDir ".$type_output."/tmp/pass2 --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
          " ".$zcat." --outFileNamePrefix ".$type_output."/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

通过这段代码,我们可以猜测,主要原因是权限不够。
修改hg19文件夹的权限。
chmod -R 777 geneome/hg19/
重新运行指令。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline

同样的错误,并没有得到很好的解决:

EXITING because of FATAL ERROR: could not open genome file ./output_RNA/tumor/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:34 … FATAL ERROR, exiting
mv ./output_RNA/tumor/Aligned.out.sam ./output_RNA/tumor/alignment.sam
./output_RNA/tumor/SJ.out.tab doesn’t exist!
Jun 10 11:37:46 … finished mapping
Jun 10 11:37:48 … finished successfully
cd ./output_RNA/normal;/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/normal/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/normal/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/normal/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/normal/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 11:37:48 … started STAR run
!!! WARNING: Could not move Log.out file from ./Log.out into ./output_RNA/normal/tmp/pass2/Log.out. Will keep ./Log.out
Jun 10 11:37:48 … starting to generate Genome files
EXITING because of INPUT ERROR: could not open genomeFastaFile: ./geneome/hg19/hg19.fa
Jun 10 11:37:48 … FATAL ERROR, exiting
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/normal/tmp/pass2 --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/normal/tmp/pass2 --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4
STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 11:37:48 … started STAR run
Jun 10 11:37:48 … loading genome
EXITING because of FATAL ERROR: could not open genome file ./output_RNA/normal/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:48 … FATAL ERROR, exiting
mv ./output_RNA/normal/Aligned.out.sam ./output_RNA/normal/alignment.sam
./output_RNA/normal/SJ.out.tab doesn’t exist!
Error: At least one bam file doesn’t exist!

所以,问题出在哪里呢?
我们来一行行的分析:
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./geneome/hg19/STAR --readFilesIn ./output_RNA/normal/fastq1.fastq ./output_RNA/normal/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/normal/tmp/pass1 --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

输入文件:
./output_RNA/normal/fastq1.fastq
./output_RNA/normal/fastq2.fastq
输出文件:
./output_RNA/normal/tmp/pass1

genomeDir ./GenomeDir/
string: path to the directory where genome files are stored (for --runMode alignReads) or will be generated (for --runMode generateGenome)
readFilesIn Read1 Read2
string(s): paths to files that contain input read1 (and, if needed, read2)

所以,这行代码的功能是比对吗?
不是,我觉得它没有任何意义。

进入到tumor文件夹;
cd ./output_RNA/tumor
这里怎么还创建基因组索引了呢(匪夷所思)?这是不对的。
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/tumor/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/tumor/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN 32

接下来
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/tumor/tmp/pass2 --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

所以报出了错误(哪里可能对呢?这个程序文件就是不合理的):

EXITING because of FATAL ERROR: could not open genome file ./output_RNA/tumor/tmp/pass2//genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions
Jun 10 11:37:34 … FATAL ERROR, exiting

……
诸如此类,在结构上挺乱的。作者为什么要做这些操作呢?为什么不直接比对?

我们需要理解,这三行STAR比对代码的意思。
所以,我觉得建库,这些操作可以都不需要,直接就是将normal/tumor序列数据比对到参考基因组上一行指令即可(这块是作者忘记注释掉了吗?)
需要查看一下网上的比对指令,我们这个文件应该怎样比对比较好?

STAR --twopassMode Basic 
--quantMode TranscriptomeSAM GeneCounts 
--runThreadN 6 
--genomeDir index_dir 
--alignIntronMin 20 
--alignIntronMax 50000 
--outSAMtype BAM SortedByCoordinate 
--sjdbOverhang 149 
--outSAMattrRGline ID:sample SM:sample PL:ILLUMINA 
--outFilterMismatchNmax 2 
--outSJfilterReads Unique 
--outSAMmultNmax 1 
--outFileNamePrefix out_prefix 
--outSAMmapqUnique 60 
--readFilesCommand gunzip -c 
--readFilesIn seq1.fq.gz seq2.fq.gz

修改代码:

system_call("STAR --genomeDir ".$star_index." --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
        " ".$zcat." --outFileNamePrefix ".$type_output." --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

#system_call("cd ".$type_output.";
#STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
#          $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

#system_call("STAR --genomeDir ".$type_output."/tmp/pass2 --readFilesIn ".${$fastq{$type}}[0]." ".${$fastq{$type}}[1]." --runThreadN ".$thread.
 #         " ".$zcat." --outFileNamePrefix ".$type_output."/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4");

(以上思考的很乱……,划掉)


我们来,运行一下,正常的比对,看看行不行?
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./geneome/hg19/STAR --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

自动生成了两个文件:

  • SJ.out.tab
  • Aligned.out.sam

接下来,
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ./output_RNA/tumor/tmp/pass2 --genomeFastaFiles ./geneome/hg19/hg19.fa --sjdbFileChrStartEnd ./output_RNA/tumor/SJ.out.tab --sjdbOverhang 100 --runThreadN 32

这个步骤也运行成功了。

STAR version: 2.7.9a compiled: 2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
Jun 10 13:19:50 … started STAR run
!!! WARNING: Could not move Log.out file from ./Log.out into ./output_RNA/tumor/tmp/pass2/Log.out. Will keep ./Log.out
Jun 10 13:19:50 … starting to generate Genome files
Jun 10 13:21:18 … starting to sort Suffix Array. This may take a long time…
Jun 10 13:21:37 … sorting Suffix Array chunks and saving them to disk…
Jun 10 13:33:35 … loading chunks from disk, packing SA…
Jun 10 13:34:59 … finished generating suffix array
Jun 10 13:34:59 … generating Suffix Array index
Jun 10 13:39:05 … completed Suffix Array index
Jun 10 13:39:05 … inserting junctions into the genome indices
Jun 10 13:40:43 … writing Genome to disk …
Jun 10 13:40:44 … writing Suffix Array to disk …
Jun 10 13:40:53 … writing SAindex to disk
Jun 10 13:40:54 … finished successfully

最后,运行最后一个步骤:
/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --genomeDir ./output_RNA/tumor/tmp/pass2 --readFilesIn ./output_RNA/tumor/fastq1.fastq ./output_RNA/tumor/fastq2.fastq --runThreadN 32 --outFileNamePrefix ./output_RNA/tumor/ --outFilterMatchNminOverLread 0.4 --outFilterScoreMinOverLread 0.4

同样的默认生成的文件,文件名为:

  • SJ.out.tab
  • Aligned.out.sam

所以,代码,整体上,没有什么问题,只需要把Aligned.out.sam文件的默认名称修改为alignment.sam即可。

另外,转换到计算节点的方法就是:qsub -I

我仿佛是弄明白了整体的思路,那么接下来,再看一下,哪里出现了新的问题。


切换一下,回到Linux界面下。
最终找到了问题的核心。
问题出在,原始代码中,错误的切换路径:

system_call("cd ".$type_output.";/home/xxzhang/workplace/software/STAR/bin/Linux_x86_64/STAR --runMode genomeGenerate --genomeDir ".$type_output."/tmp/pass2 --genomeFastaFiles ".$index." --sjdbFileChrStartEnd ".
         $type_output."/tmp/pass1SJ.out.tab --sjdbOverhang 100 --runThreadN ".$thread);

我们需要删去cd ".$type_output.";这行代码,因为并不需要。进入到这个路径之后,后面的相对路径就找不到了。相对应地,里面的文件也会检索不到。

修改代码后,重新回到服务器中运行(这次记得在计算节点上去运行)。

perl somatic.pl  RNA:./data/SRR3052083.fastq.gz  NA  RNA:./data/SRR5297065.fastq.gz NA  32 hg19 ./geneome/hg19/hg19.fa /home/xxzhang/workplace/software/java/jdk1.7.0_80/bin/java  ./output_RNA  human 1 ./disambiguate_pipeline>test.txt

(16:41)程序运行完成。
最后生成了两个结果文件(缺少somatic_mutations_hg19.txt):

  • germline_mutations_hg19.txt
  • coverage.txt

说明其他部分的运行都是没有问题的。
但是,在生成somatic_mutations_hg19.txt这个文档的时候,却出现了问题。这是我们接下来,要去解决的问题。

这里是一个细节的地方,我的CosmicCodingMuts.hg19.vcf 文件的原来的名字是,CosmicCodingMuts_hg19.vcf,因此,系统检测不到。所以,我需要对它的名字进行修改。

ERROR MESSAGE: Couldn’t read file /home/xxzhang/workplace/QBRC/./geneome/hg19/hg19.fa_resource/CosmicCodingMuts.hg19.vcf because file does not exist

解决方法:
mv CosmicCodingMuts_hg19.vcf CosmicCodingMuts.hg19.vcf

三个mutation calling的过程,也都没有成功。

Can’t exec “/home/xxzhang/workplace/software/manta/bin/configManta.py”: Permission denied at somatic.pl line 528.

Can’t exec “/home/xxzhang/workplace/software/Shimmer/shimmer.pl”: Permission denied at somatic.pl line 528.

解决方法:
修改文件夹的权限。

chmod -R 777 workplace/software/

/home/xxzhang/workplace/software/speedseq/bin/speedseq.config somatic -F 0.01 -q 10 -t 32 -T ./output_RNA/sptmp -o ./output_RNA/speedseq ./geneome/hg19/hg19.fa /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam

这个没有报错,但是,就是没有结果。不知道为啥?
还是觉得应该修改为:
/home/xxzhang/workplace/software/speedseq/bin/speedseq somatic -F 0.01 -q 10 -t 32 -T ./output_RNA/sptmp -o ./output_RNA/speedseq ./geneome/hg19/hg19.fa /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam /home/xxzhang/workplace/QBRC/output_RNA/tumor/tumor.bam

speedseq.config ===>speedseq,因为speedseq.config 不是一个具体的指令,显示不出内容的。
正确的是,speedseq

Program: speedseq
Version: 0.1.2
Author: Colby Chiang (cc2qe@virginia.edu)
usage: speedseq [options]
command: align align FASTQ files with BWA-MEM
var call SNV and indel variants with FreeBayes
somatic call somatic SNV and indel variants in a tumor/normal pair with FreeBayes
sv call SVs with LUMPY
realign realign from a coordinate sorted BAM file
options: -h show this message

/home/xxzhang/miniconda3/bin/lofreq call-parallel --pp-threads 32 -s --sig 1 --bonf 1 -C 7 -f ./geneome/hg19/hg19.fa -S ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --call-indels -l ./geneome/hg19/hg19.fa.exon.bed -o ./output_RNA/lofreq_n.vcf /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam

Calling external LoFreq script via execvp failed: No such file or directory

这个问题,应该如何解决呢?
后来查看lofreq的路径,发现call-parallel指令,以lofreq2_call_pparallel.py作为呈现方式。
修改指令为:
/home/xxzhang/miniconda3/bin/lofreq2_call_pparallel.py --pp-threads 32 -s --sig 1 --bonf 1 -C 7 -f ./geneome/hg19/hg19.fa -S ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf --call-indels -l ./geneome/hg19/hg19.fa.exon.bed -o ./output_RNA/lofreq_n.vcf /home/xxzhang/workplace/QBRC/output_RNA/normal/normal.bam

与之对应的,就是下游找不到该文件。

ERROR MESSAGE: Could not read file /home/xxzhang/workplace/QBRC/./output_RNA/somatic_diffs.readct.vcf because file ‘./output_RNA/somatic_diffs.readct.vcf’ does not exist

ERROR MESSAGE: Could not read file /home/xxzhang/workplace/QBRC/./output_RNA/speedseq2.vcf because file ‘./output_RNA/speedseq2.vcf’ does not exist

以及,下游进行筛选的过程中,遇到问题:

报错如下:
/home/xxzhang/workplace/software/R/R-3.6.3/bin/Rscript /home/xxzhang/workplace/QBRC//somatic_script/filter_vcf2.R ./output_RNA/somatic_mutations.txt ./output_RNA/somatic_mutations.hg19_multianno.txt ./output_RNA/somatic_mutations_hg19.txt somatic

Error in read.table(args[2], stringsAsFactors = F, sep = “\t”, header = T) :
no lines available in input
Execution halted

(这里就是说,没有输入文件,归根究底还是前面的过程出了问题)

(18:20)根据上面的出错,修改somatic.pl的代码。
回到window平台中。


(18:30)开始运行程序,等待结果。
我觉得我今天有点不舒服,不舒服的原因在于,社交障碍(不愿意说话),觉得自己并不属于这里(踏踏实实的把自己的事情做好,就非常足够了),还有今天好大的雨啊。
究竟什么是做科研呢?为什么文献我一点都看不进去。
现在我的程序,正在服务器的后台运行中,我这段时间干什么呢(文献我看不进去)?
我知道自己要提升,但是眼下,让我有些乱了手脚,觉得自己心里乱糟糟的,我想回去整理一下思路。
亲爱的,你究竟想要什么?
(18:58)我害怕你陷入庸庸碌碌的麻木的忙碌之中,却忘记了自己想要什么?这是最可怕的。
我先回宿舍,把程序挂在这里运行。明天早上过来看结果。

标签:10,STAR,记录,--,RNA,实验,hg19,output
来源: https://blog.csdn.net/weixin_40640700/article/details/117766832

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有