PSL format
PSL lines represent alignments, and are typically taken from files generated by BLAT or psLayout. See the BLAT documentation for more details. All of the following fields are required on each data line within a PSL file:
- matches - Number of bases that match that aren't repeats
- misMatches - Number of bases that don't match
- repMatches - Number of bases that match but are part of repeats
- nCount - Number of 'N' bases
- qNumInsert - Number of inserts in query
- qBaseInsert - Number of bases inserted in query
- tNumInsert - Number of inserts in target
- tBaseInsert - Number of bases inserted in target
- strand - '+' or '-' for query strand. For translated alignments, the second '+' or '-' is for the genomic strand
- qName - Query sequence name
- qSize - Query sequence size
- qStart - Alignment start position in query
- qEnd - Alignment end position in query
- tName - Target sequence name
- tSize - Target sequence size
- tStart - Alignment start position in target
- tEnd - Alignment end position in target
- blockCount - Number of blocks in the alignment (a block contains no gaps)
- blockSizes - Comma-separated list of sizes of each block
- qStarts - Comma-separated list of starting positions of each block in query
- tStarts - Comma-separated list of starting positions of each block in target
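The field list above maps directly onto a small parser. The following is a minimal sketch, not an official implementation: the names `PSL_FIELDS`, `_int_list`, and `parse_psl_line` are hypothetical, and it assumes the 21 fields are separated by whitespace and that sequence names contain no spaces.

```python
def _int_list(text):
    """Turn a PSL comma-separated list such as '48,20,' into [48, 20]."""
    return [int(item) for item in text.split(",") if item]


# Field names in file order, each paired with the converter for its column.
PSL_FIELDS = [
    ("matches", int), ("misMatches", int), ("repMatches", int), ("nCount", int),
    ("qNumInsert", int), ("qBaseInsert", int),
    ("tNumInsert", int), ("tBaseInsert", int),
    ("strand", str), ("qName", str), ("qSize", int),
    ("qStart", int), ("qEnd", int),
    ("tName", str), ("tSize", int), ("tStart", int), ("tEnd", int),
    ("blockCount", int), ("blockSizes", _int_list),
    ("qStarts", _int_list), ("tStarts", _int_list),
]


def parse_psl_line(line):
    """Parse one PSL data line into a dict keyed by field name."""
    cols = line.split()  # assumption: whitespace-separated, no spaces in names
    if len(cols) != len(PSL_FIELDS):
        raise ValueError(f"expected {len(PSL_FIELDS)} fields, found {len(cols)}")
    record = {name: convert(raw) for (name, convert), raw in zip(PSL_FIELDS, cols)}
    # Each per-block list must have exactly blockCount entries.
    for key in ("blockSizes", "qStarts", "tStarts"):
        if len(record[key]) != record["blockCount"]:
            raise ValueError(f"{key} does not have blockCount entries")
    return record


example = ("59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 "
           "chr22 47748585 13073589 13073753 2 48,20, 171,1042, "
           "34674832,34674976,")
record = parse_psl_line(example)
print(record["qName"], record["strand"], record["blockSizes"])
```

Returning a plain dict keeps the sketch short; a dataclass or named tuple would work equally well if attribute access is preferred.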
Example:
Here is an example of an annotation track in PSL format. Each PSL record is shown on its own line, so this example can be pasted into the browser without editing.
browser position chr22:13073000-13074000
browser hide all
track name=fishBlats description="Fish BLAT" visibility=2 useScore=1
59 9 0 0 1 823 1 96 +- FS_CONTIG_48080_1 1955 171 1062 chr22 47748585 13073589 13073753 2 48,20, 171,1042, 34674832,34674976,
59 7 0 0 1 55 1 55 +- FS_CONTIG_26780_1 2825 2456 2577 chr22 47748585 13073626 13073747 2 21,45, 2456,2532, 34674838,34674914,
59 7 0 0 1 55 1 55 -+ FS_CONTIG_26780_1 2825 2455 2676 chr22 47748585 13073727 13073848 2 45,21, 249,349, 13073727,13073827,
Be aware that the coordinates for a negative strand in a PSL line are handled in a special way. In the qStart and qEnd fields, the coordinates indicate the position where the query matches from the point of view of the forward strand, even when the match is on the reverse strand. However, in the qStarts list, the coordinates are reversed.
Example:
Here is a 61-mer containing 2 blocks that align on the minus strand and 2 blocks that align on the plus strand (this sometimes happens due to assembly errors):
0         1         2         3         4         5         6 tens position in query
0123456789012345678901234567890123456789012345678901234567890 ones position in query
                      ++++++++++++++                    +++++ plus strand alignment on query
    ------------------              --------------------      minus strand alignment on query
0987654321098765432109876543210987654321098765432109876543210 ones position in query negative strand coordinates
6         5         4         3         2         1         0 tens position in query negative strand coordinates

Plus strand:
    qStart=22
    qEnd=61
    blockSizes=14,5
    qStarts=22,56

Minus strand:
    qStart=4
    qEnd=56
    blockSizes=20,18
    qStarts=5,39
Essentially, the minus strand blockSizes and qStarts are what you would get if you reverse-complemented the query. However, the qStart and qEnd are not reversed. Use the following formulas to convert one to the other:
Negative-strand-coordinate-qStart = qSize - qEnd = 61 - 56 = 5
Negative-strand-coordinate-qEnd = qSize - qStart = 61 - 4 = 57
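The conversion formulas above can be sketched in a few lines of Python. The function names `to_reverse_coords` and `forward_blocks` are hypothetical helpers introduced for illustration. The sketch assumes that the first character of the strand field is the query strand, as the field list states, and that all intervals are 0-based and half-open.

```python
def to_reverse_coords(q_size, q_start, q_end):
    """Map a forward-strand [q_start, q_end) onto the reverse-complemented query."""
    return q_size - q_end, q_size - q_start


def forward_blocks(q_size, strand, block_sizes, q_starts):
    """Return each block as a forward-strand [start, end) interval on the query."""
    blocks = []
    for size, start in zip(block_sizes, q_starts):
        if strand[0] == "-":
            # qStarts are given on the reverse-complemented query: flip them back.
            blocks.append((q_size - (start + size), q_size - start))
        else:
            blocks.append((start, start + size))
    return blocks


# Values taken from the 61-mer example above.
print(to_reverse_coords(61, 4, 56))                  # (5, 57)
print(forward_blocks(61, "-", [20, 18], [5, 39]))    # [(36, 56), (4, 22)]
print(forward_blocks(61, "+", [14, 5], [22, 56]))    # [(22, 36), (56, 61)]
```

The minus-strand blocks come back as forward intervals 36-56 and 4-22, which together span qStart=4 to qEnd=56, as the example states.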
BLAT this actual sequence against hg19 for a real-world example:
CCCC
GGGTAAAATGAGTTTTTT
GGTCCAATCTTTTA
ATCCACTCCCTACCCTCCTA
GCAAG
Look for the alignment on the negative strand (-) of chr21, which conveniently aligns to the window chr21:10,000,001-10,000,061.
Browser window coordinates are 1-based and closed, [start,end], while PSL coordinates are 0-based and half-open, [start,end), so a start of 10,000,001 in the browser corresponds to a start of 10,000,000 in the PSL. Subtracting 10,000,000 from the target (chromosome) position in the PSL gives the query negative strand coordinate above.
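The relationship between the two coordinate systems can be illustrated with a short sketch. The helper names `browser_to_psl` and `psl_to_browser` are hypothetical. Only the start changes between the two systems, because a closed 1-based end and a half-open 0-based end are the same number.

```python
def browser_to_psl(start, end):
    """Convert a 1-based closed [start, end] window to 0-based half-open [start, end)."""
    return start - 1, end


def psl_to_browser(start, end):
    """Convert a 0-based half-open [start, end) interval to 1-based closed [start, end]."""
    return start + 1, end


# The browser window from the text: chr21:10,000,001-10,000,061.
t_start, t_end = browser_to_psl(10_000_001, 10_000_061)
print(t_start, t_end)    # 10000000 10000061
print(t_end - t_start)   # 61, the length of the query
```

In half-open coordinates the interval length is simply end minus start, which is one reason this convention is convenient for block arithmetic.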
The 4, 14, and 5 bases at beginning, middle, and end were chosen to not match with the genome at the corresponding position.