[LeetCode] Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return:

["AAAAACCCCC", "CCCCCAAAAA"].

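Using a Map exceeded the memory limit, so I switched to a bitmap. Since there are only four letters, two bits are enough to encode each one. Ten letters take 20 bits, so an array of size 2^20 is enough to solve the problem.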
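
To make the encoding concrete, here is a minimal sketch of packing one 10-letter window into a 20-bit key. The helper names `nucleotideCode` and `encodeWindow` are hypothetical (they are not part of the solution), and the sketch assumes the input contains only A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: map one nucleotide to a 2-bit code.
// Assumes the input contains only 'A', 'C', 'G', 'T'.
inline std::uint32_t nucleotideCode(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Hypothetical helper: pack the 10 letters starting at `pos` into one key.
// Assumes pos + 10 <= s.size().
inline std::uint32_t encodeWindow(const std::string& s, std::size_t pos) {
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < 10; ++i) {
        // Each letter contributes 2 bits; the first letter ends up highest.
        key = (key << 2) | nucleotideCode(s[pos + i]);
    }
    return key;  // 10 letters * 2 bits = 20 bits, so key < 2^20
}
```

With this scheme "AAAAAAAAAA" maps to 0 and "TTTTTTTTTT" maps to 2^20 - 1, so a table with 2^20 entries has a slot for every possible 10-letter sequence.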

#include <set>
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    // Map a nucleotide to its 2-bit code.
    int getVal(char ch) {
        if (ch == 'A') return 0;
        if (ch == 'C') return 1;
        if (ch == 'G') return 2;
        return 3;  // 'T'; the input is assumed to contain only A, C, G, T
    }

    vector<string> findRepeatedDnaSequences(string s) {
        set<string> st;  // deduplicates sequences that repeat more than twice
        vector<string> res;
        if (s.length() < 10) return res;
        // One counter per 20-bit key; heap-allocated to avoid a 4 MB stack array.
        vector<int> mp(1024 * 1024, 0);
        unsigned int val = 0;
        // Preload the first 9 letters of the first window.
        for (int i = 0; i < 9; ++i) {
            val <<= 2;
            val |= getVal(s[i]);
        }
        for (size_t i = 9; i < s.length(); ++i) {
            // Keep the low 18 bits (the last 9 letters) and shift them left
            // by 2 to make room; this assumes a 32-bit unsigned int.
            val <<= 14;
            val >>= 12;
            val |= getVal(s[i]);
            ++mp[val];
            if (mp[val] > 1) {
                st.insert(s.substr(i - 9, 10));
            }
        }
        for (set<string>::iterator it = st.begin(); it != st.end(); ++it) {
            res.push_back(*it);
        }
        return res;
    }
};
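
The window update in the loop (`val <<= 14; val >>= 12;`) drops the oldest letter only when unsigned int is 32 bits wide. As a hedged alternative, the same step can be sketched with an explicit 20-bit mask, which does not depend on the integer width. The names `baseCode`, `slideWindow`, and `windowKeys` below are hypothetical helpers for illustration, and the input is again assumed to contain only A, C, G, and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: 2-bit code for one nucleotide (A, C, G, T only).
inline std::uint32_t baseCode(char ch) {
    switch (ch) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

// Slide the window by one letter: shift in the new 2-bit code and mask
// off everything above bit 19, which discards the oldest letter.
inline std::uint32_t slideWindow(std::uint32_t key, char next) {
    const std::uint32_t kMask = (1u << 20) - 1;
    return ((key << 2) | baseCode(next)) & kMask;
}

// Return the key of every 10-letter window of s, in order of position.
inline std::vector<std::uint32_t> windowKeys(const std::string& s) {
    std::vector<std::uint32_t> keys;
    if (s.size() < 10) return keys;
    std::uint32_t key = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        key = slideWindow(key, s[i]);
        if (i >= 9) keys.push_back(key);  // the first full window ends at index 9
    }
    return keys;
}
```

For the example string, the windows starting at positions 0 and 10 are both "AAAAACCCCC", so `windowKeys` produces the same key at indices 0 and 10, which is exactly the collision the counter array detects.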

 
