Problem description:
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
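My own code is below. While writing it I had a feeling it would run into trouble, since it stores every 10-letter substring as a String, but it was accepted anyway. Evidently LeetCode does not enforce a tight memory limit on this problem.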
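// Requires: java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map
public List<String> findRepeatedDnaSequences(String s) {
    List<String> result = new ArrayList<String>();
    int n = s.length();
    // Maps each 10-letter substring to how often it has been seen (capped at 2).
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (int i = 0; i <= n - 10; i++) {
        String substr = s.substring(i, i + 10);
        if (map.containsKey(substr) && map.get(substr) == 1) {
            // Second occurrence: report it once, then bump the count
            // so that later occurrences are not reported again.
            result.add(substr);
            map.put(substr, map.get(substr) + 1);
        } else if (!map.containsKey(substr)) {
            map.put(substr, 1);
        }
    }
    return result;
}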
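There are only four letters here: A, G, C, and T. To save memory, we can number them 00, 01, 10, and 11. A 10-character sequence then needs only 20 bits, so a single int is enough to hold it.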
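The packing step on its own can be sketched as follows. This is only an illustration: the class name `DnaPacking` and the helpers `code`, `encode`, and `decode` are hypothetical names introduced here, and the numbering A=0, G=1, C=2, T=3 is assumed to match the one used in this post.

```java
final class DnaPacking {
    // Assumed numbering, matching this post: A=00, G=01, C=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'G': return 1;
            case 'C': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a 10-letter window into the low 20 bits of an int, 2 bits per letter.
    static int encode(String window) {
        if (window.length() != 10) {
            throw new IllegalArgumentException("expected exactly 10 letters");
        }
        int packed = 0;
        for (int i = 0; i < window.length(); i++) {
            packed = (packed << 2) | code(window.charAt(i));
        }
        return packed;
    }

    // Inverse of encode: reads the letters back, last letter first.
    static String decode(int packed) {
        char[] letters = {'A', 'G', 'C', 'T'};
        char[] out = new char[10];
        for (int i = 9; i >= 0; i--) {
            out[i] = letters[packed & 3];
            packed >>>= 2;
        }
        return new String(out);
    }
}
```

Because the encoding is one-to-one, two windows are equal exactly when their packed ints are equal, so a set of Integer values can stand in for a set of 10-character strings.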
The full solution is as follows:
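// Requires: java.util.ArrayList, java.util.HashMap, java.util.HashSet,
//           java.util.List, java.util.Map, java.util.Set
public List<String> findRepeatedDnaSequences(String s) {
    List<String> result = new ArrayList<String>();
    Map<Character, Integer> map = new HashMap<Character, Integer>();
    // Fewer than 11 characters cannot contain two 10-letter windows.
    if (s == null || s.length() < 11) return result;
    map.put('A', 0);
    map.put('G', 1);
    map.put('C', 2);
    map.put('T', 3);
    int hash = 0;
    Set<Integer> set = new HashSet<Integer>();     // windows seen at least once
    Set<Integer> unique = new HashSet<Integer>();  // windows already reported
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        // Shift in the 2-bit code of the new character.
        hash = (hash << 2) + map.get(ch);
        if (i < 9) {
            continue;  // the first full window ends at index 9
        } else {
            // Keep only the low 20 bits, i.e. the last 10 characters.
            hash &= (1 << 20) - 1;
            if (set.contains(hash) && !unique.contains(hash)) {
                result.add(s.substring(i - 9, i + 1));
                unique.add(hash);
            } else {
                set.add(hash);
            }
        }
    }
    return result;
}

The same bit-manipulation idea also works with the low three bits of each letter's ASCII code, which are already distinct for the four letters (A=001, C=011, G=111, T=100). Ten characters then take 30 bits, which still fits in an int, and the character map is no longer needed, which saves a little more space.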
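import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ans = new ArrayList<String>();
        // Maps each 30-bit window key to how often it has been seen (capped at 2).
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift in the low 3 bits of the character and keep the low 30 bits.
            key = ((key << 3) | (s.charAt(i) & 0x7)) & 0x3fffffff;
            if (i < 9) continue;  // the first full window ends at index 9
            if (map.get(key) == null) {
                map.put(key, 1);
            } else if (map.get(key) == 1) {
                // Second occurrence: report it once.
                ans.add(s.substring(i - 9, i + 1));
                map.put(key, 2);
            }
        }
        return ans;
    }
}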
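To see why the `& 0x7` trick is safe, the low three bits of the four letters can be checked directly. The sketch below is illustrative only; `LowBitsDemo`, `lowThreeBits`, and `windowKey` are hypothetical names introduced for this check, and the input is assumed to contain only the uppercase ASCII letters A, C, G, and T.

```java
final class LowBitsDemo {
    // Low three bits of the ASCII code: A(0x41)=1, C(0x43)=3, G(0x47)=7, T(0x54)=4.
    static int lowThreeBits(char c) {
        return c & 0x7;
    }

    // Builds the 30-bit key of a 10-letter window the same way the loop above does.
    static int windowKey(String window) {
        int key = 0;
        for (int i = 0; i < window.length(); i++) {
            key = ((key << 3) | lowThreeBits(window.charAt(i))) & 0x3fffffff;
        }
        return key;
    }
}
```

Since the four values 1, 3, 7, and 4 are pairwise different, distinct 10-letter windows always produce distinct 30-bit keys, so no false matches are possible for valid input.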