Repeated DNA Sequences

Repeated DNA Sequences

问题:

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

思路:

  位操作得到hash

我的代码:

public class Solution {

    public List<String> findRepeatedDnaSequences(String s) {

        List<String> rst = new ArrayList<String>();

        if(s == null || s.length() < 10) return rst;

        int len = s.length();

        Set<Integer> set = new HashSet<Integer>();

        for(int i = 0; i <= len-10; i++)

        {

            String substr = s.substring(i,i+10);

            Integer key = getHash(substr);

            if(set.contains(key))

            {

                if(!rst.contains(substr))

                    rst.add(substr);

            }

            else

            {

                set.add(key);

            }

        }

        return rst;

    }

    public int getCode(char c)

    {

        switch(c)

        {

            case 'A':   return 0;

            case 'C':   return 1;

            case 'G':   return 2;

            default:    return 3;

        }

    }

    public Integer getHash(String s)

    {

        int hash = 0;

        for(int i = 0; i < s.length(); i++)

        {

            hash = hash << 2 | getCode(s.charAt(i));

        }

        return hash;

    }

}
View Code

学习之处:

  • 平常写代码的时候很少用位操作,所以每次看到位操作的问题,都会很生疏,需要加强这方面的内容
  • 常用的位操作
    • 用与操作可以得到一个数中的每一位,n&(0x00000001<<i)
    • 用或操作去构建一个新的数出来,逆向构建:rst |= (0x80000000 >>> i) 正向构建:hash = hash << 2 | getCode(s.charAt(i));

你可能感兴趣的:(sequence)