首页 > 代码库 > pysam 模块

pysam 模块

pysam is a python module that makes it easy to read and manipulate mapped short read sequence data stored in SAM/BAM files.

 

要读bam文件, 首先实例化一个Samfile对象

import pysam

samfile = pysam.Samfile(‘ex1.bam‘, ‘rb‘)                  

#r 是reading。 w是writing。b是指bam文件, 不加的话默认是sam. 如果有b, 必须有r或者w。

#如果bai文件存在,会自动打开。没有的话 fetch() 和 pileup() 不能用。

文件打开以后, 可以用fetch对指定的region进行循环。

for alignedread in samfile.fetch(‘Chr1‘, 100, 200):         #不指定的话就是全部

    print alignedread

读完以后关闭文件

samfile.close()

 

pysam.Samfile.fetch() returns all reads overlapping a region sorted by the first aligned base in the reference sequence.

Note that it will also return reads that are only partially overlapping with the region. Thus the reads returned might span a region that is larger than the one queried.

 

统计参考序列某个区域内,call beases in each site

The pileup engine of csamtools iterates over all reads that are aligned to a region. In contrast to fetching, the pileup engine returns for each base in the reference sequence the reads that map to that particular position.

eg:

iter = samfile.pileup( ‘seq1‘, 10, 20 )
for x in iter: print str(x)
Aligned reads are returned as a pysam.PileupColumn.

eg:

import pysam

samfile = pysam.Samfile(‘ex1.bam‘, ‘rb‘)

for pileupcolum in samfile.pileup(‘Chr1‘, 100, 200):

    print ‘coverage at base %s = %s‘ % (pileupcolum.pos, pileupcolumn.n)    #pos返回的是位置,n返回有多少个碱基比对到这个位置

    for pileupread in pileupcolumn.pileups:

        print ‘\tbase in read %s = %s‘ %(pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])

samfile.close()

#pileupColumn.tid  指 chromesome ID as is defined in the header

  pileupColumn.pos  指 the target base coordinate(0-based)

  pileupCloumn.n   指number of reads mapping to this column

  pileupCloumn.pileups  指 list of reads aligned to this column

# pileupread.alignment.qname 返回列中read的名字,

   pileupread.alignment.seq[pileupread.qpos] 返回read在某个位点的碱基

 

NOTE:

Coordinates in pysam are always 0-based(following the python convention). SAM text fils use 1-based coordinates.

 

pysam.Samfile, pysam.Fastafile, pysam.Fastqfile各个类的详细使用说明见:http://pysam.readthedocs.org/en/latest/api.html

 

freemao

FAFU

free_mao@qq.com