Programmers often have a preference among program constructs. For example, some may prefer "if(0==a)", while others may prefer "if(!a)". Analyzing such patterns can help to narrow down a programmer's identity, which is useful for detecting plagiarism.
Now given some text sampled from someone's program, can you find the person's most commonly used pattern of a specific length?
Input Specification:
Each input file contains one test case. For each case, there is one line consisting of the pattern length N (1<=N<=1048576), followed by one line no less than N and no more than 1048576 characters in length, terminated by a newline character '\n'. The entire input is case sensitive.
Output Specification:
For each test case, print in one line the length-N substring that occurs most frequently in the input, followed by a space and the number of times it has occurred in the input. If there are multiple such substrings, print the lexicographically smallest one.
Whitespace characters in the input should be printed as they are. Also note that there may be multiple occurrences of the same substring overlapping each other.
Sample Input 1:
4
//A can can can a can.
Sample Output 1:
 can 4
Sample Input 2:
3
int a=~~~~~~~~~~~~~~~~~~~~~0;
Sample Output 2:
~~~ 19
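As a reading aid for the specification and the samples, here is a minimal brute-force reference. It is only a sketch for small inputs, not a solution that fits the limits: it materializes every length-N window as a `std::string`, so time and memory grow roughly with N times the text length. The function name `mostFrequentNaive` is made up for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Counts every (possibly overlapping) window of length n and returns the most
// frequent one together with its count. Hypothetical helper for illustration.
std::pair<std::string, int> mostFrequentNaive(const std::string& text, std::size_t n) {
    assert(n >= 1 && n <= text.size());
    std::map<std::string, int> counts;
    // Advancing the start by one position means overlapping occurrences are counted.
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        ++counts[text.substr(i, n)];
    }
    std::pair<std::string, int> best{"", 0};
    // std::map iterates its keys in ascending order, so keeping only strictly
    // larger counts leaves the lexicographically smallest window among ties.
    for (const auto& entry : counts) {
        if (entry.second > best.second) {
            best = {entry.first, entry.second};
        }
    }
    return best;
}
```

Note that in the first sample the winning window begins with a space, which is why the expected output line starts with one.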
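This problem asks for the length-n substring that occurs most often. Brute force will certainly blow the limits, so the intended approach is a suffix array: build it, compute the height array, and look for the longest run of consecutive height values that are at least n. A run of k such values means k+1 suffixes share the same first n characters, and because the suffixes are sorted, the first longest run gives the lexicographically smallest answer.
However, there is one rather nasty test case: building the suffix array with the doubling algorithm times out, so you need a linear-time construction, namely DC3, which makes the whole solution O(n). The code I got accepted with is too ugly, so I won't post it for now; I'll post it once I have cleaned it up.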
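The height-array scan described above can be sketched as follows. This is not the accepted DC3 code mentioned in the post: to keep it short, the suffix array here is built by prefix doubling with `std::sort` (O(n log² n)), which the post says is too slow for the judge's worst case. Only the scan over the height array is the point. All function names are made up for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Suffix array by prefix doubling with std::sort. Illustration only, not DC3.
std::vector<int> buildSuffixArray(const std::string& s) {
    const int len = static_cast<int>(s.size());
    std::vector<int> sa(len), rank(len), tmp(len);
    std::iota(sa.begin(), sa.end(), 0);
    for (int i = 0; i < len; ++i) rank[i] = static_cast<unsigned char>(s[i]);
    for (int k = 1;; k <<= 1) {
        // Sort key: rank of the first k characters, then rank of the next k.
        auto key = [&](int i) {
            return std::make_pair(rank[i], i + k < len ? rank[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(), [&](int a, int b) { return key(a) < key(b); });
        tmp[sa[0]] = 0;
        for (int i = 1; i < len; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[len - 1]] == len - 1) break;  // all ranks distinct: done
    }
    return sa;
}

// Kasai's algorithm: height[i] is the longest common prefix of the suffixes
// at sa[i - 1] and sa[i]; height[0] is 0.
std::vector<int> buildHeight(const std::string& s, const std::vector<int>& sa) {
    const int len = static_cast<int>(s.size());
    std::vector<int> pos(len), height(len, 0);
    for (int i = 0; i < len; ++i) pos[sa[i]] = i;
    for (int i = 0, h = 0; i < len; ++i) {
        if (pos[i] == 0) { h = 0; continue; }
        const int j = sa[pos[i] - 1];
        while (i + h < len && j + h < len && s[i + h] == s[j + h]) ++h;
        height[pos[i]] = h;
        if (h > 0) --h;
    }
    return height;
}

// Longest run of height values >= n; a run of k values means k + 1 occurrences.
std::pair<std::string, int> mostFrequentBySuffixArray(const std::string& text, int n) {
    const int len = static_cast<int>(text.size());
    assert(n >= 1 && n <= len);
    const std::vector<int> sa = buildSuffixArray(text);
    const std::vector<int> height = buildHeight(text, sa);
    int bestStart = -1, bestCount = 0;
    for (int i = 0; i < len;) {
        if (sa[i] + n > len) { ++i; continue; }  // suffix shorter than n: skip
        int j = i + 1;
        while (j < len && height[j] >= n) ++j;   // suffixes i..j-1 share n characters
        // Strict '>' keeps the first (lexicographically smallest) group on ties.
        if (j - i > bestCount) { bestCount = j - i; bestStart = sa[i]; }
        i = j;
    }
    return {text.substr(bestStart, n), bestCount};
}
```

Swapping `buildSuffixArray` for a linear-time construction would leave the scan unchanged.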
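For now, here is a hashing version. It uses the BKDR string hash; collisions seem to practically never occur with this hash, and it got AC.
In principle, though, picking the lexicographically smallest candidate by comparing the tied windows character by character could be made to time out; the test data is probably just weak.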
#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;
typedef long long LL;
const int maxn = 1e6 + 5e5;
const int mod = 1e9 + 7;
char s[maxn];
int n, Hash[maxn], x, y, tot, ans, m;
int cnt[maxn], t[maxn], a[maxn];

// Modular inverse of x modulo the prime mod.
int Inv(int x)
{
    if (x == 1) return 1;
    return (LL)Inv(mod % x) * (mod - mod / x) % mod;
}

int main()
{
    scanf("%d", &n);
    getchar();  // consume the newline after n
    // gets() no longer exists in current C++; read with fgets and strip the '\n'
    if (!fgets(s + 1, maxn - 1, stdin)) return 0;
    s[1 + strcspn(s + 1, "\n")] = 0;
    s[0] = 0;
    x = 1;
    y = Inv(131);
    for (int i = 1; i <= n; i++)
    {
        Hash[n] = ((LL)s[i] * x + Hash[n]) % mod;  // BKDR string hash of the first window
        if (i < n) x = (LL)x * 131 % mod;          // x ends up as 131^(n-1)
    }
    a[tot++] = Hash[n];
    // Slide the window: drop s[i - n], divide by 131, add s[i] with weight 131^(n-1).
    for (int i = n + 1; s[i]; i++)
    {
        Hash[i] = ((LL)(Hash[i - 1] - s[i - n] + mod) * y + (LL)s[i] * x) % mod;
        a[tot++] = Hash[i];
    }
    // Compress the hash values into indices 0..m-1.
    sort(a, a + tot);
    m = unique(a, a + tot) - a;
    ans = 0;
    for (int i = n; s[i]; i++)
    {
        x = lower_bound(a, a + m, Hash[i]) - a;
        if (!t[x]) t[x] = i - n + 1;  // first start position of this window
        ans = max(ans, ++cnt[x]);
    }
    // Among the most frequent windows, keep the lexicographically smallest.
    x = 0;
    for (int i = 0; i < m; i++)
    {
        if (cnt[i] == ans)
        {
            if (!x) x = t[i];
            else
            {
                for (int j = 0; j < n; j++)
                {
                    if (s[x + j] > s[t[i] + j]) { x = t[i]; break; }
                    if (s[x + j] < s[t[i] + j]) break;
                }
            }
        }
    }
    for (int i = 0; i < n; i++) printf("%c", s[x + i]);
    printf(" %d\n", ans);
    return 0;
}
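The code above slides the window by multiplying with the modular inverse of 131. A common alternative is to store prefix hashes and powers of the base, so that any window hash comes from one subtraction and no inverse is needed. The sketch below shows that variant, assuming the same base 131 and modulus 1e9+7; the `PrefixHash` type and its members are invented for this illustration, and its hash values differ from those in the code above because the characters are weighted in the opposite order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Prefix-hash variant of the BKDR-style polynomial hash (illustrative sketch).
struct PrefixHash {
    static constexpr std::uint64_t kBase = 131;
    static constexpr std::uint64_t kMod = 1000000007ULL;
    std::vector<std::uint64_t> prefix;  // prefix[i] = hash of the first i characters
    std::vector<std::uint64_t> power;   // power[i]  = kBase^i mod kMod

    explicit PrefixHash(const std::string& s)
        : prefix(s.size() + 1, 0), power(s.size() + 1, 1) {
        for (std::size_t i = 0; i < s.size(); ++i) {
            prefix[i + 1] = (prefix[i] * kBase + static_cast<unsigned char>(s[i])) % kMod;
            power[i + 1] = power[i] * kBase % kMod;
        }
    }

    // Hash of the window s[pos, pos + len).
    std::uint64_t window(std::size_t pos, std::size_t len) const {
        assert(pos + len < prefix.size());
        return (prefix[pos + len] + kMod - prefix[pos] * power[len] % kMod) % kMod;
    }
};
```

As with the original code, equal hashes only suggest equal windows. With a single 1e9+7 modulus, collisions remain possible on adversarial input.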