All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T,
for example: “ACGAATTCCG”. When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
return:
["AAAAACCCCC", "CCCCCAAAAA"].
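Starting from the first character, take a 10-character window at each position and check whether that window appears again in the rest of the string (everything from the next position onward). If it does, the window is one of the repeated sequences.
The implementation is as follows: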
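// Requires java.util.ArrayList, java.util.HashSet, java.util.List, java.util.Set.
public List<String> findRepeatedDnaSequences(String s) {
    Set<String> set = new HashSet<String>(); // a set removes duplicate results
    int len = s.length();
    int sLen = 10;
    // The last window (i == len - sLen) can be skipped: nothing long enough follows it.
    for (int i = 0; i < len - sLen; i++) {
        String str = s.substring(i, i + sLen);
        // Search the rest of the string, starting one position to the right.
        boolean isExist = isExist(str, s.substring(i + 1));
        if (isExist) {
            set.add(str);
        }
    }
    List<String> list = new ArrayList<String>();
    for (String str : set) {
        list.add(str);
    }
    return list;
}

// Returns true if s occurs anywhere inside str.
public boolean isExist(String s, String str) {
    return str.contains(s);
}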
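Because the approach above searches the rest of the string for every window, it is slow. A faster approach uses a map that stores each substring together with its number of occurrences as a (key, count) pair; the keys with count > 1 are the result.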
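// Requires java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map, java.util.Set.
public List<String> findRepeatedDnaSequences(String s) {
    Map<String, Integer> map = new HashMap<String, Integer>();
    int len = s.length();
    int sLen = 10;
    for (int i = 0; i <= len - sLen; i++) { // note the boundary: <= includes the last window
        String str = s.substring(i, i + sLen); // the end index is exclusive
        Integer count = map.getOrDefault(str, 0);
        count++;
        map.put(str, count);
    }
    List<String> list = new ArrayList<String>();
    Set<String> set = map.keySet();
    for (String str : set) {
        int count = map.get(str);
        if (count > 1) { // keep only substrings seen more than once
            list.add(str);
        }
    }
    return list;
}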
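
As a variation on the counting idea, the sketch below replaces the map with two sets: one for windows already seen and one for windows seen more than once. This is not the code from the post above, just a minimal self-contained example. The class name RepeatedDna and the use of LinkedHashSet (to return results in order of first repetition) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class name, chosen only for this example.
class RepeatedDna {

    // Returns every 10-letter substring that occurs more than once in s.
    static List<String> findRepeatedDnaSequences(String s) {
        final int windowLen = 10;                      // window length from the problem statement
        Set<String> seen = new HashSet<>();            // windows encountered at least once
        Set<String> repeated = new LinkedHashSet<>();  // windows encountered again, in insertion order

        // i + windowLen <= s.length() keeps the last full window in range.
        for (int i = 0; i + windowLen <= s.length(); i++) {
            String window = s.substring(i, i + windowLen);
            // Set.add returns false when the element was already present.
            if (!seen.add(window)) {
                repeated.add(window);
            }
        }
        return new ArrayList<>(repeated);
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedDnaSequences("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

Like the map version, this makes a single pass over the string, but it skips the second loop over the map's keys because repeated windows are collected as soon as they are seen a second time.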