首页 > 代码库 > LeetCode-Repeated DNA Sequence

LeetCode-Repeated DNA Sequence

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", Return: ["AAAAACCCCC", "CCCCCAAAAA"].

Analysis:

We want to code a 10-letter-long substring into a integer, to perform hashset add and check for duplication.

Since each letter only has 4 cases: A,C,G,T, we can use 2-bit to represent it. Therefore, we can use a 20-bits integer to represent the substring.

Solution:

public class Solution {    // Use mask to only maintain the last 20 bits.    int mask = (1 << 20) - 1;    public List<String> findRepeatedDnaSequences(String s) {        List<String> resList = new ArrayList<String>();        if (s.length() < 10)            return resList;        HashSet<Integer> codeSet = new HashSet<Integer>();        HashSet<Integer> resSet = new HashSet<Integer>();        char[] charArray = s.toCharArray();        // Get code of the first 9 letters.        int code = 0;        for (int i = 0; i < 9; i++) {            code = moveCode(code, charArray[i]);        }        for (int i = 9; i < s.length(); i++) {            // Get code.            code = moveCode(code, charArray[i]);            // if current code has existed and have not appeared twice (i.e.,            // not added to resList), then add it into resList.            if (!codeSet.add(code) && resSet.add(code)) {                resList.add(s.substring(i - 9, i + 1));            }        }        return resList;    }    public int moveCode(int value, char c) {        value <<= 2;        // if (c==‘A‘) value += 0;        if (c == ‘C‘)  value += 1;        if (c == ‘G‘)  value += 2;        if (c == ‘T‘)  value += 3;        value &= mask;        return value;    }}

 

LeetCode-Repeated DNA Sequence