187. Repeated DNA Sequences

2024-10-11 22:11:02

Problem:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Link: http://leetcode.com/problems/repeated-dna-sequences/

6/25/2017

I haven't done practice problems in a while, and for this one I again consulted other people's answers.

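48 ms, 80%. Time complexity is O(N*k) with k = 10: the N comes from looping over the start positions, and the k comes from substring, which copies 10 characters per call (hashing the resulting key is also O(k)). It is not O(N*N*k), because each substring call costs O(k), not O(N).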

Note line 8: the loop condition is i <= s.length() - 10, so that the last window (the one ending at the final character) is included.

 1 public class Solution {
 2     public List<String> findRepeatedDnaSequences(String s) {
 3         List<String> res = new ArrayList<String>();
 4         if (s == null || s.length() < 10) {
 5             return res;
 6         }
 7         Map<String, Integer> substringCount = new HashMap<String, Integer>();
 8         for (int i = 0; i <= s.length() - 10; i++) {
 9             String substring = s.substring(i, i + 10);
10             if (substringCount.containsKey(substring)) {
11                 int count = substringCount.get(substring);
12                 if (count == 1) {
13                     res.add(substring);
14                 }
15                 substringCount.put(substring, count + 1);
16             } else {
17                 substringCount.put(substring, 1);
18             }
19         }
20         return res;
21     }
22 }
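
To see why the boundary on line 8 matters, here is a minimal sketch. The class WindowBoundDemo and the helper repeated are hypothetical names made up for illustration; the helper follows the same idea as the solution above but takes the loop bound as a parameter, so the two conditions can be compared on an 11-character input that has exactly two windows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WindowBoundDemo {
    // Hypothetical helper: same idea as the solution above, but the loop's
    // upper bound is selectable so both conditions can be tried.
    static List<String> repeated(String s, boolean inclusive) {
        Set<String> seen = new HashSet<>();      // windows seen at least once
        Set<String> reported = new HashSet<>();  // windows already added to res
        List<String> res = new ArrayList<>();
        // inclusive: i <= length - 10; otherwise: i < length - 10
        int last = inclusive ? s.length() - 10 : s.length() - 11;
        for (int i = 0; i <= last; i++) {
            String window = s.substring(i, i + 10);
            // seen.add returns false on a repeat; reported.add is true only the first time
            if (!seen.add(window) && reported.add(window)) {
                res.add(window);
            }
        }
        return res;
    }

    public static void main(String[] args) {
        String s = "AAAAAAAAAAA"; // 11 characters: windows start at i = 0 and i = 1
        System.out.println(repeated(s, true));  // prints [AAAAAAAAAA]
        System.out.println(repeated(s, false)); // prints []
    }
}
```

With the strict bound the window starting at i = 1 is never examined, so the repeat is missed.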

Other people's answers:

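This is similar to Rabin-Karp. Because there are only 4 distinct characters, each one can be represented in 2 bits, so a 10-character window fits in 20 bits of an int (4^10 = 2^20 < 2^32). The map then only has to hash and compare integers rather than strings, which is more efficient. The link has an explanation: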

https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
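
Here is a rough sketch of the 2-bit idea as I understand it. It is not the code from the linked post. The class name TwoBitDna, the helper code, and the particular mapping A=0, C=1, G=2, T=3 are my own assumptions; any assignment of the four 2-bit values should work, and the input is assumed to contain only A, C, G, and T.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoBitDna {
    // Assumed mapping from nucleotide to a 2-bit code.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // 'T' (input assumed to be A/C/G/T only)
        }
    }

    static List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<>();
        Map<Integer, Integer> count = new HashMap<>(); // window hash -> occurrences
        int mask = (1 << 20) - 1; // low 20 bits = 10 characters x 2 bits
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the new character and drop the one that left the window.
            hash = ((hash << 2) | code(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // the first full window ends at index 9
            }
            // merge returns the updated count; report only on the second sighting.
            if (count.merge(hash, 1, Integer::sum) == 2) {
                res.add(s.substring(i - 9, i + 1));
            }
        }
        return res;
    }

    public static void main(String[] args) {
        // Prints [AAAAACCCCC, CCCCCAAAAA] for the example from the problem.
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
    }
}
```

Since A maps to 0 here, a partially filled window could collide with a full window that starts with A's, which is why the sketch skips positions before index 9.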

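A similar approach, except that it works in base 8 (3 bits per character). The link has an explanation, but I will write it out in a bit more detail.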

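t holds the int hash of the most recent 10 characters, computed by the algorithm below. Note the 0x3FFFFFFF: after thinking it through, I realized it keeps only the lowest 30 bits. Why? After & 7 each character keeps only its lowest 3 bits (these are distinct for the four letters: A = 1, C = 3, G = 7, T = 4), so 10 characters take exactly 30 bits, and the mask removes the influence of any characters more than 10 positions back.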

https://discuss.leetcode.com/topic/8487/i-did-it-in-10-lines-of-c

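vector<string> findRepeatedDnaSequences(string s) {
    unordered_map<int, int> m;  // window hash -> number of times seen so far
    vector<string> r;
    for (int t = 0, i = 0; i < s.size(); i++)
        // Shift in the low 3 bits of s[i]; the mask keeps the last 10 characters (30 bits).
        if (m[t = ((t << 3) & 0x3FFFFFFF) | (s[i] & 7)]++ == 1)
            r.push_back(s.substr(i - 9, 10));  // record only on the second sighting
    return r;
}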
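
To check the masking argument for myself, here is a small Java sketch of the same update step. The class ThreeBitRollingHash and the helpers roll and hash are hypothetical names of mine, not part of the linked post. It shows that characters more than 10 positions back no longer affect the value.

```java
public class ThreeBitRollingHash {
    // One step of the update used in the C++ loop: shift in 3 bits, keep 30.
    static int roll(int t, char c) {
        return ((t << 3) & 0x3FFFFFFF) | (c & 7);
    }

    // Hypothetical helper: fold roll() over a whole string, starting from 0.
    static int hash(String s) {
        int t = 0;
        for (int i = 0; i < s.length(); i++) {
            t = roll(t, s.charAt(i));
        }
        return t;
    }

    public static void main(String[] args) {
        String window = "ACGTACGTAC"; // exactly 10 characters
        // Anything before the last 10 characters is shifted past bit 29 and
        // masked off, so an extra prefix leaves the hash unchanged.
        System.out.println(hash(window) == hash("GGGTT" + window)); // prints true
        // Two windows that differ in one character get different hashes.
        System.out.println(hash(window) == hash("CCGTACGTAC"));     // prints false
    }
}
```

Because each of the four 3-bit codes is nonzero, a window that is not yet full always has zero bits in its top groups, so it cannot be mistaken for a full 10-character window.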

More discussion:

https://discuss.leetcode.com/category/195/repeated-dna-sequences
