首页 > 代码库 > vcffilter 工具bug以及解决办法
vcffilter 工具bug以及解决办法
1,使用说明:
usage: vcffilter [options] <vcf file>
options:
-f, --info-filter specifies a filter to apply to the info fields of records, removes alleles which do not pass the filter
-g, --genotype-filter specifies a filter to apply to the genotype fields of records
-s, --filter-sites filter entire records, not just alleles
-t, --tag-pass tag vcf records as positively filtered with this tag, print all records
-F, --tag-fail tag vcf records as negatively filtered with this tag, print all records
-A, --append-filter append the existing filter tag, don‘t just replace it
-a, --allele-tag apply -t on a per-allele basis. adds or sets the corresponding INFO field tag
-v, --invert inverts the filter, e.g. grep -v
-o, --or use logical OR instead of AND to combine filters
-r, --region specify a region on which to target the filtering, requires a BGZF compressed file which has been indexed with tabix. any number of regions may be specified.
Filter the specified vcf file using the set of filters. Filters are specified in the form "<ID> <operator> <value>:
-f "DP > 10" # for info fields -g "GT = 1|1" # for genotype fields -f "CpG" # for ‘flag‘ fields
Operators can be any of: =, !, <, >, |, &
Any number of filters may be specified. They are combined via logical AND unless --or is specified on the command line. Obtain logical negation through the use of parentheses, e.g. ! "( DP = 10 )"
For convenience, you can specify "QUAL" to refer to the quality of the site, even though it does not appear in the INFO fields.
2,举例:
以上是说明,我举个例子,假如你对C18.vcf文件筛选的要求是:质量要大于30,深度要大于10,类型为SNP,那么执行如下命令:
vcffilter -f "QUAL > 30 & DP > 10 & TYPE = snp" C18.vcf > C18.filtered.vcf
会得到经过筛选过的vcf文件。
3,BUG:
为了检验经过筛选的文件是否符合要求,自己写一个脚本来进行比较一下,
1 f0 = open(‘C18.vcf‘, ‘r‘) 2 filtername = i.split(‘.‘)[0]+‘.filtered.vcf‘ 3 f1 = open(filtername, ‘w‘) 4 for i in f0: 5 if i.startswith(‘#‘): 6 f1.write(i) 7 else: 8 j = i.split() 9 if (float(j[5]) > 30 and j[7].split(‘=‘)[-1] == ‘snp‘ 10 and int(j[9].split(‘:‘)[1]) > 10) : 11 f1.write(i)
运行脚本后发现,和软件结果不同,脚本产生的文件比软件产生的结果要小。
经过比较发现,软件将变异类型为 complex,snp 转化为了snp,放到了结果文件中。这种类型我们暂时不需要考虑,所以软件是存在bug的!
4,总结。
筛选vcf很简单,只需要对每行进行遍历,将符合条件的行数写入新文件即可。
自己写几行代码就可以了。
by freemao
FAFU.
free_mao@qq.com