[LeetCode 187] Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

    Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

    Return:
    ["AAAAACCCCC", "CCCCCAAAAA"].
Company Tags: LinkedIn
Tags: Hash Table, Bit Manipulation

If you skip the bit manipulation approach, this problem boils down to a single line of code:

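#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class Solution {
public:
    vector<string> findRepeatedDnaSequences(string s) {
        unordered_map<string,int> mp;  // 10-letter window -> times seen so far
        int n = s.size();
        vector<string> res;
        // n is a signed int, so n-9 is safely non-positive for short inputs.
        for(int i = 0; i<n-9; ++i){
            // Post-increment returns the old count: record on the second sighting only.
            if(mp[s.substr(i,10)]++ == 1) res.push_back(s.substr(i,10));
        }
        return res;
    }
};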
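The whole solution hinges on the `mp[...]++ == 1` test, so here is a minimal sketch of that idiom on its own. The helper name `reportOnSecondSighting` and the sample input are made up for illustration and are not part of the LeetCode solution.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: return each item that appears more than once,
// reported exactly once, in the order of its second appearance.
std::vector<std::string> reportOnSecondSighting(const std::vector<std::string>& items) {
    std::unordered_map<std::string, int> seen;
    std::vector<std::string> out;
    for (const std::string& item : items) {
        // seen[item]++ yields the count *before* this occurrence:
        // 0 on the first sighting, 1 on the second, 2 on the third, ...
        if (seen[item]++ == 1) out.push_back(item);
    }
    return out;
}
```

Comparing against 1 (rather than `>= 1`) is what keeps a sequence that occurs three or more times from being added to the result repeatedly.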

But it's slow. Then I saw an 8 ms solution in the discussion forum. Bit manipulation problems always terrify me! Ugh! Let's work through it properly:

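#include <string>
#include <vector>
using namespace std;

vector<string> findRepeatedDnaSequences(string s) {
    // 10 letters x 2 bits = 20-bit key, so 2^20 = 1048576 slots.
    // Heap-allocated: a 1 MB local array risks overflowing the stack.
    vector<unsigned char> hashMap(1048576, 0);
    vector<string> ans;
    int len = s.size(),hashNum = 0;
    if (len < 11) return ans;  // a repeat needs at least two windows
    // (c - 'A' + 1) % 5 maps A->1, C->3, G->2, T->0: distinct 2-bit codes.
    for (int i = 0;i < 9;++i)
        hashNum = hashNum << 2 | (s[i] - 'A' + 1) % 5;
    for (int i = 9;i < len;++i) {
        // Append the new letter; the mask drops the letter that left the window.
        hashNum = (hashNum << 2 | (s[i] - 'A' + 1) % 5) & 0xfffff;
        // Saturate at 2 so the counter cannot wrap around and report a sequence twice.
        if (hashMap[hashNum] == 1) ans.push_back(s.substr(i-9,10));
        if (hashMap[hashNum] < 2) ++hashMap[hashNum];
    }
    return ans;
}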
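To see why this works, it helps to pull the encoding out of the loop. The formula `(c - 'A' + 1) % 5` happens to give each of the four letters a distinct value in 0..3 (A is 1, C is 3, G is 2, T is 0), so each letter fits in 2 bits and a 10-letter window fits in 20 bits. The sketch below isolates that idea; the helper names `code`, `packWindow` and `slide` are my own and assume the input contains only A, C, G and T.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Map a nucleotide to a 2-bit code with the same formula as the solution:
// A -> 1, C -> 3, G -> 2, T -> 0. Assumes c is one of A, C, G, T.
inline std::uint32_t code(char c) { return (c - 'A' + 1) % 5; }

// Hypothetical helper: pack the 10-letter window starting at `start`
// into a 20-bit key, computed from scratch.
std::uint32_t packWindow(const std::string& s, std::size_t start) {
    std::uint32_t key = 0;
    for (std::size_t k = 0; k < 10; ++k)
        key = key << 2 | code(s[start + k]);
    return key;
}

// Hypothetical helper: slide the window one letter to the right.
// Shifting left by 2 makes room for the new letter; the mask 0xfffff keeps
// only the low 20 bits, which discards the letter that left the window.
std::uint32_t slide(std::uint32_t key, char next) {
    return (key << 2 | code(next)) & 0xfffff;
}
```

Because every distinct window gets a distinct key below 2^20, the key can index directly into a table of 1048576 counters, which is where the array size in the solution comes from. The rolling update also means each step costs a shift, an OR and a mask instead of hashing a fresh 10-character string.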
