187. Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].

Solution1:Hashset 查重

思路:

187. Repeated DNA Sequences_第1张图片
屏幕快照 2017-09-14 上午11.54.20.png

Time Complexity: O(N) Space Complexity: O(10N)

Solution2:先encode 再Hashset 查重

Time Complexity: O(N) Space Complexity: O(N)

Solution1 Code:

public List findRepeatedDnaSequences(String s) {
    Set seen = new HashSet(), repeated = new HashSet();
    for (int i = 0; i + 9 < s.length(); i++) {
        String ten = s.substring(i, i + 10);
        if (!seen.add(ten))
            repeated.add(ten);
    }
    return new ArrayList(repeated);
}

Solution2 Code:

class Solution {
    private char[] encode_map = new char[26];
    
    public List findRepeatedDnaSequences(String s) {
        // init
        Set seen = new HashSet();
        Set repeated = new HashSet();
        //encode_map['A' - 'A'] = 0;
        encode_map['C' - 'A'] = 1;
        encode_map['G' - 'A'] = 2;
        encode_map['T' - 'A'] = 3;
        
        // sliding window
        for (int i = 0; i + 9 < s.length(); i++) {
            String ten = s.substring(i, i + 10);
            int code = encode(ten);
            if (!seen.add(code))
                repeated.add(ten);
        }
        
        // result
        return new ArrayList(repeated);
    }
    
    private int encode(String s) {
        int code = 0;
        for(int j = 0; j < 10; j++) {
            code <<= 2;
            code |= encode_map[s.charAt(j) - 'A'];
        }
        return code;
    }
}

你可能感兴趣的:(187. Repeated DNA Sequences)