hash

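   I worked through a few simple hashing problems, using set at first. Here is a summary.

  A hash table accesses a record by mapping its key to a position in the table, which speeds up lookup.

  For example, here is one method: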

Let W consist of positive integers. The problem is to find the smallest positive integer C such that

for all .

C must be a multiple of at least one element of W.

If some

for all ,

then the next largest C that could resolve the conflict is at least

Since all such conflicts must be resolved, it is advantageous to choose the largest candidate from among the conflicts as the next C to test.


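  Of course there are many other methods. A very common one is to treat a state as a number written in some base. Hash values built this way correspond one-to-one with states, so two different states never collide. What if collisions can happen? Then use linked lists: chain together the entries that share a hash value, and search that chain to see whether the new entry is a duplicate.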
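A minimal sketch of the base-encoding idea, assuming a state is a short sequence of digits that are all smaller than the base. The name `encode_state` is made up for illustration and does not come from any particular problem.

```cpp
#include <vector>

// Hypothetical helper: read the digits of a state as one number in the given base.
// Assumes every digit is in [0, base) and that the result fits in 64 bits.
long long encode_state(const std::vector<int>& digits, int base) {
    long long code = 0;
    for (int d : digits)
        code = code * base + d;   // shift left by one digit, then append d
    return code;
}
```

With base 10 the digits 1, 2, 3 become 123. As long as all states have the same length, two different states always get different codes, so the code can index an array directly with no chaining.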

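  First, set. Using the STL set (or map) saves work, but it is slow. Include the header with #include<set> and declare set<string> hash. hash.count(a) returns how many copies of a are stored (0 or 1 for a set), so if(!hash.count(a)) tests that a has not appeared yet, while hash.find(a)!=hash.end() tests that it has. Insert a with hash.insert(a). To walk through the elements, declare an iterator (it works much like a pointer), set<string>::iterator it, loop with for(it=hash.begin();it!=hash.end();it++), and read the current element with string a=*it. After that you can work with a.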
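The set operations above can be put together roughly like this. The function name `distinct_words` and the variable `seen` are illustrative choices; the container is called `seen` rather than `hash` to stay clear of `std::hash`.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the distinct words in sorted order, using count, insert and an iterator.
std::vector<std::string> distinct_words(const std::vector<std::string>& words) {
    std::set<std::string> seen;
    for (const std::string& w : words)
        if (!seen.count(w))        // true when w has not been inserted yet
            seen.insert(w);
    std::vector<std::string> out;
    for (std::set<std::string>::iterator it = seen.begin(); it != seen.end(); ++it)
        out.push_back(*it);        // *it is the current element
    return out;
}
```

The count check is only there to show the test; insert on its own already ignores a value that is present.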

  To join pieces into one string or take a string apart, you can use sprintf and sscanf.
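A small sketch of both directions, assuming a made-up "name:score" format. It uses snprintf, the bounded variant of sprintf, so the buffer cannot overflow; the helper names `join_fields` and `split_fields` are hypothetical.

```cpp
#include <cstdio>
#include <string>

// Build "name:score" in a fixed buffer with snprintf.
std::string join_fields(const char* name, int score) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%s:%d", name, score);
    return buf;
}

// Take "name:score" apart again with sscanf.
// Returns true when both fields were parsed; name must hold at least 32 bytes.
bool split_fields(const char* text, char* name, int* score) {
    return std::sscanf(text, "%31[^:]:%d", name, score) == 2;
}
```

The return value of sscanf is the number of fields it managed to fill, which is the usual way to detect malformed input.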

C has strtok for splitting a string into tokens.

Example:

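#include<cstring>
#include<iostream>
using namespace std;

int main()
{
    char sentence[]="This is a sentence with 7 tokens";
    char *tokenPtr=strtok(sentence," ");   // first call: pass the string itself
    while(tokenPtr!=NULL)
    {
        cout<<tokenPtr<<'\n';
        tokenPtr=strtok(NULL," ");         // later calls: pass NULL to continue
    }
    return 0;
}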
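That is, the first call is strtok(string to split, delimiters), and every later call is strtok(NULL, delimiters). Note that strtok writes '\0' into the string being split, so it must be a modifiable array and not a string literal.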

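The C++ string class has substr: a.substr(0,i) takes i characters starting at a[0]. Strings also support the assignment and plus operators directly, so concatenating two strings is just c=a+b.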
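As a quick illustration of substr and +, here is a sketch that cuts a string after its first i characters and swaps the two parts. The function name `rotate_left` is made up, and it assumes i is not larger than the string length (substr throws std::out_of_range otherwise).

```cpp
#include <string>

// Split s after its first i characters and swap the two halves.
std::string rotate_left(const std::string& s, std::size_t i) {
    std::string front = s.substr(0, i);   // i characters starting at s[0]
    std::string back  = s.substr(i);      // everything from s[i] to the end
    return back + front;                  // operator+ concatenates
}
```

For example, rotate_left("mousecat", 5) gives "catmouse".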


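  set is convenient, but some problems time out with it, so you still have to write your own hash. I have seen many string hashes online. A simple one keeps a running sum: at each step, multiply the sum by a constant and add the value of the current character. Several constants are in use, 131 being one of them. As for why, I cannot give the mathematical argument yet; presumably it spreads the values fairly evenly, so I just use it for now. The sum will overflow. 0x7fffffff has a 0 in its top bit and 1s in all the others, so sum&0x7fffffff clears the sign bit and keeps the result non-negative. It does not prevent the overflow itself, so it is safer to accumulate in an unsigned int, whose wraparound is well defined.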

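int hash_str(const char *str){
    unsigned int sum=0;                 // unsigned: wraparound is well defined
    while(*str)
        sum=sum*131+(unsigned char)*str++;
    return (sum&0x7fffffff)%MAXSTATE;   // clear the top bit, then pick a bucket
}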
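To check by hand what this kind of function computes, here is a self-contained variant of the same multiply-by-131 scheme. Passing the bucket count as a parameter and the name `bucket_of` are assumptions made so the snippet compiles on its own.

```cpp
#include <cstdint>

// Multiply-by-131 string hash, with the bucket count as a parameter
// (hypothetical name bucket_of) instead of a MAXSTATE macro.
// Assumes buckets > 0.
int bucket_of(const char* str, int buckets) {
    std::uint32_t sum = 0;                       // unsigned, so overflow wraps
    while (*str)
        sum = sum * 131u + static_cast<unsigned char>(*str++);
    return static_cast<int>((sum & 0x7fffffffu) % buckets);
}
```

With 10000007 buckets, "a" lands in bucket 97 (the character code of 'a'), and "ab" lands in 97*131+98 = 12805. Longer strings wrap around 32 bits, but the result still falls in [0, buckets).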


Problem A
Concatenation of Languages
Input File: Standard Input

Output: Standard Output

 

A language is a set of strings. And the concatenation of two languages is the set of all strings that are formed by concatenating the strings of the second language at the end of the strings of the first language.

 

For example, if we have two languages A and B such that:

A = {cat, dog, mouse}

B = {rat, bat}

The concatenation of A and B would be:

C = {catrat, catbat, dograt, dogbat, mouserat, mousebat}

 

Given two languages your task is only to count the number of strings in the concatenation of the two languages.

 

Input

There can be multiple test cases. The first line of the input file contains the number of test cases, T (1≤T≤25). Then T test cases follow. The first line of each test case contains two integers, M and N (M,N<1500), the number of strings in each of the languages. Then the next M lines contain the strings of the first language. The N following lines give you the strings of the second language. You can assume that the strings are formed by lower case letters (‘a’ to ‘z’) only, that they are less than 10 characters long and that each string is presented in one line without any leading or trailing spaces. The strings in the input languages may not be sorted and there will be no duplicate string.

 

Output

For each of the test cases you need to print one line of output. The output for each test case starts with the serial number of the test case, followed by the number of strings in the concatenation of the second language after the first language.

 

Sample Input

2
3 2
cat
dog
mouse
rat
bat
1 1
abc
cab

Output for Sample Input

Case 1: 6
Case 2: 1


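  The problem asks how many distinct strings you get by appending each string of the second language to each string of the first. set timed out, so I had to write the hash table myself. This is the first one I have written properly.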

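#include<cstdio>
#include<cstring>
#define MAXSTATE 10000007        // number of hash buckets
#define MAXNODE (1500*1500)      // upper bound on M*N, the strings stored
// Named hash_str and nxt because hash and next clash with std::hash and
// std::next once <algorithm> and "using namespace std" are in play.
char a[2000][20],b[2000][20];
int num,head[MAXSTATE],nxt[MAXNODE];
char c[MAXNODE][20];             // c[i] is the i-th distinct concatenation
int hash_str(const char *str)
{
    unsigned int sum=0;          // unsigned: wraparound is well defined
    while(*str)
        sum=sum*131+(unsigned char)*str++;
    return (int)((sum&0x7fffffff)%MAXSTATE);
}
// Insert c[s] unless an equal string is already in its bucket's chain.
int try_to_insert(int s)
{
    int h=hash_str(c[s]);
    int u=head[h];
    while(u!=-1)
    {
        if(strcmp(c[u],c[s])==0) return 0;   // duplicate, nothing inserted
        u=nxt[u];
    }
    nxt[s]=head[h];              // link s at the front of the chain
    head[h]=s;
    num++;
    return 1;
}
// Read one line, which may be empty, and strip the line ending.
// Replaces gets, which was removed from the language in C++14.
void read_line(char *s,int size)
{
    if(!fgets(s,size,stdin)) s[0]='\0';
    s[strcspn(s,"\r\n")]='\0';
}
int main()
{
    int T,test=0;
    char rest[64];
    scanf("%d",&T);
    while(T--)
    {
        memset(head,-1,sizeof(head));
        num=0;
        int M,N;
        scanf("%d%d",&M,&N);
        read_line(rest,sizeof(rest));        // skip the rest of the "M N" line
        for(int i=0; i<M; i++)
            read_line(a[i],sizeof(a[i]));
        for(int i=0; i<N; i++)
            read_line(b[i],sizeof(b[i]));

        for (int i = 0; i < M ; i ++)
            for (int j = 0; j < N; j ++)
            {
                sprintf(c[num],"%s%s",a[i],b[j]);   // candidate goes in the next free slot
                try_to_insert(num);                 // num advances only if it is new
            }

        printf("Case %d: %d\n",++test,num);
    }
    return 0;
}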

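  The usual pattern is a try_to_insert function that calls the hash function to get the hash value. Each chain links the array indices of the elements that share that hash value, and if the new element is not a duplicate, its index is added at the head of the chain for its hash value. sprintf(c[num],"%s%s",a[i],b[j]); is a convenient way to write a followed by b into c.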
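For comparison, here is a sketch that swaps the hand-written table for std::unordered_set (C++11), the standard library's own hash set. This is not the approach used in this post, and whether it is fast enough for this particular judge has not been checked here; the function name `count_concatenations` is illustrative.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Count the distinct strings x+y with x from first and y from second.
int count_concatenations(const std::vector<std::string>& first,
                         const std::vector<std::string>& second) {
    std::unordered_set<std::string> seen;
    seen.reserve(first.size() * second.size());  // avoid rehashing while filling
    for (const std::string& x : first)
        for (const std::string& y : second)
            seen.insert(x + y);                  // duplicates are ignored
    return static_cast<int>(seen.size());
}
```

On the two sample cases it gives 6 and 1. It does the same job as try_to_insert, with the chaining and the hash function hidden inside the container.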

  There is a great deal more to hashing, and much of it I still do not know. I will write more once I have learned it.
