
Computing the complementary sequences in a FASTA file

Suppose a FASTA file named read_1.fa contains a number of sequences, for example:

>@r1
TGAATGCGAACTCCGGGACGCTCAGTAATGTGACGATAGCTGAAAACTGTACGATAAACNGTACGCTGAGGGCAGAAAAAATCGTCGGGGACATTNTAAAGGCGGCGAGCGCGGCTTTTCCG
>@r2
NTTNTGATGCGGGCTTGTGGAGTTCAGCCGATCTGACTTATGTCATTACCTATGAAATGTGAGGACGCTATGCCTGTACCAAATCCTACAATGCCGGTGAAAGGTGCCGGGATCACCCTGTGGGTTTAT
>@r3
ATCGCCCGCAGACACCTTCACGCTGGACTGTTTCGGCTTTTACAGCGTCGCTTCATAATCCTTTTTCGCCGCCGCCATCAGCGTGTTGTAATCCGCCTGCAGGATTTTCCCGTCTTTCNGTGCCTTGNT
>@r4
GGGCCAATGCGCTTACTGATGCGGAATTACGCCGTAAGGCCGCAGATGAGCTTGTCCATATGACTGCGAGAATTAACNGTGGTGAGGCGATCCCTGAACCAGTAAAACAACTTCCTGTCATGGGCGGTA
>@r5
GTCAGGAAAGTGGTAAAACTGCAACTCAATTACTGCAATGCCCTCGTAATTAAGTGAATTTACAATATCGTCCTGTTCGGAGGGAAGAACGCGGGATGTTCATTCTTCATCACTTTTAATTGATGTATA
>@r6
AGCGACATTCTTCCTCGGTACATAATCTCCTTTGGCGTTTCCCGATGNCCGTCACGCACATGGNATCCCGTGATGACCTCATTAAAAACACGCTGCAATCCCTCCTCATCTTTGCAGGCGTCCGATTTT
>@r7
CCCCGCCACCATCCCGCCGGGCNTGTCCATATCGAGCAGAATGCTGTCCACCATCGGATCGCTGGCAGCCTGTTGCAGACGGGCGATAATGCCGTTGTAACCGGTCATCCCCGAGTACGGCTGCAGCGC
>@r8
NTGAACAGTAAACGTCTGTTGAGCACATCCTTTAATAAGCAGGGCCAGCGCAGTATCNAGTAGCATATTTTTCATGGTGTTATTCCCGATGCTTTTTG
>@r9
CCCGATGCTTTTTGAAGTTCGCAGAATCGTATGTGTAGANAATTAAACAAANCCT
... and so on

The code of complement_seq.py is as follows:

# coding=utf-8

"""
Summary: compute the complement of every sequence in a FASTA file
Author: 刘自军
Date: May 2017, 18:54
"""
import sys
from collections import OrderedDict

args = sys.argv

seq = OrderedDict()
# Maps each base to its complement; N (unknown base) stays N
tmp_dit = {'A': 'T', 'G': 'C', 'C': 'G', 'T': 'A', 'N': 'N'}

with open(args[1]) as f:

    for line in f:
        
        line = line.strip('\n')
        if line.startswith('>'):
            # Header line: start a new record keyed by the full header
            seq_id = line
            seq[seq_id] = ''
        else:
            for i in line:
                seq[seq_id] += tmp_dit[i]

# Print each header followed by its complemented sequence
for seq_id, com_seq in seq.items():
    print('%s\n%s' % (seq_id, com_seq))
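
The per-character dictionary lookup in complement_seq.py can also be written with Python's built-in str.translate. The following is only a minimal sketch, assuming Python 3; the names COMPLEMENT_TABLE and complement are illustrative and do not appear in the original script. Like the script, it returns the complement in the same orientation, not the reverse complement.

```python
# Illustrative alternative (assumes Python 3); names here are hypothetical.
# str.maketrans builds a character-to-character mapping: A<->T, C<->G, N->N.
COMPLEMENT_TABLE = str.maketrans("ACGTN", "TGCAN")


def complement(sequence):
    """Return the base-by-base complement of an uppercase DNA string."""
    return sequence.translate(COMPLEMENT_TABLE)


if __name__ == "__main__":
    print(complement("ATGCN"))  # prints TACGN
```

One behavioral difference to keep in mind: str.translate leaves characters that are not in the table (for example lowercase bases) unchanged, whereas the dictionary lookup in the script raises a KeyError for them.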

Run the script with:

python complement_seq.py read_1.fa

Or redirect the output to a file:

python complement_seq.py read_1.fa > com_read.fa
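
Instead of redirecting standard output, the result could also be written to a file from within Python. The sketch below is one possible way to do that, assuming Python 3; the function name complement_fasta and its parameters are hypothetical and not part of the original script. It processes the input line by line rather than collecting all sequences in a dictionary first, so a sequence that spans several lines keeps its line breaks in the output, whereas the original script joins it into a single line.

```python
# Illustrative sketch (assumes Python 3); function and variable names are hypothetical.
COMPLEMENT = {"A": "T", "G": "C", "C": "G", "T": "A", "N": "N"}


def complement_fasta(in_path, out_path):
    """Write the complement of every sequence line in in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # Header lines are copied through unchanged
                dst.write(line + "\n")
            else:
                # Sequence lines are complemented base by base;
                # a base outside A/C/G/T/N raises KeyError, as in the script
                dst.write("".join(COMPLEMENT[base] for base in line) + "\n")
```

With the file names used above, a call would look like complement_fasta("read_1.fa", "com_read.fa").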

 
