A simple pattern string search (with support for the wildcard '*')

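Some of the assignments in my data structures course are fairly hard, so I am uploading the ones that are worthwhile or that took real effort. I will add comments later, as a memento of my days as a beginner.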

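【Problem Description】

Search the file string.in in the current directory for a given string, and write the matched strings and their line numbers to the file string.out in the current directory. Requirements:

1) The given string is read from the keyboard. It contains only upper- and lower-case letters, digits, the bracket characters '[' and ']', the character '*', and the character '^'. Its length does not exceed 20.

2) The character '^' may appear only inside the brackets, and only as the first character inside them. Apart from '^', the brackets contain at least one letter or digit.

3) The character '*' does not appear inside the brackets.

4) The brackets appear at most once in the given string. If '^' does not appear inside the brackets, the character at that position matches when it equals any one of the characters inside the brackets. If '^' does appear, the character at that position matches when it differs from every character inside the brackets.

5) The character '*' matches zero or more arbitrary characters.

6) The character '*' appears at most once in the given string.

7) The scope of '*' is limited to a single line; it never matches across lines.

8) When several strings match because of '*', output only one of them: the shortest.

9) The search is case-insensitive.

10) Output the line number first (line numbers start at 1), followed by a colon ':', then the matched strings, separated by commas ','. Output lines are separated by a newline.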
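
Rules 4 and 9 together define how a single character is tested against the bracket expression. As a rough sketch of just that test (the helper name `class_matches` is made up for illustration and is not part of the solution further down; it also uses the standard `tolower` from `<ctype.h>` rather than the hand-written one in the solution):

```c
#include <ctype.h>

/* Hypothetical helper: returns 1 if character c satisfies the bracket
   expression that starts at pattern[0] == '[', and 0 otherwise.
   The comparison ignores case (rule 9); a leading '^' negates the set (rule 4). */
int class_matches(const char *pattern, char c)
{
    const char *p = pattern + 1;   /* skip the opening '[' */
    int negate = 0;
    int found = 0;

    if (*p == '^') {               /* '^' is only allowed as the first character */
        negate = 1;
        p++;
    }
    for (; *p != ']' && *p != '\0'; p++) {
        if (tolower((unsigned char)*p) == tolower((unsigned char)c))
            found = 1;
    }
    return negate ? !found : found;
}
```

For example, `class_matches("[ao]ng", 'A')` accepts the character because 'A' equals 'a' once case is ignored, while `class_matches("[^ab]a", 'B')` rejects it because 'B' is in the negated set.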
 
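【Input Format】

First read the string to search for from standard input (the keyboard). The file to be searched, string.in, is located in the current directory.

【Output Format】

Write the search results to string.out in the current directory.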
 
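【Sample Input 1】

zh[ao]ng
Suppose string.in contains:
Zhang ying ju zhu zai ZhongGuo.
Ta zheng zai du gao zhong.
Bie ren dou jia ta xiao zhang.

【Sample Output 1】

string.out contains:
1:Zhang,Zhong
2:zhong
3:zhang

【Explanation of Sample 1】

The given string contains brackets, meaning that the third character may be either a or o, and the match is case-insensitive. Zhang and Zhong on the first line of the text therefore both match the given string, so the output is 1:Zhang,Zhong. The other lines follow in the same way.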
 
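【Sample Input 2】

a[^ab]a
string.in contains:
Do you like banana?
ABA is the abbreviation of American Bankers Association.

【Sample Output 2】

string.out contains:
1:ana,ana

【Explanation of Sample 2】

The brackets in the given string contain '^', meaning that the first and third characters must both be a, and the second character must be neither a nor b. The word banana on the first line therefore contains two strings ana that match the given string, so the output is 1:ana,ana. On the second line, the second character of ABA is B; because the match is case-insensitive, it equals the b inside the brackets, so ABA does not match.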
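
Sample 2 shows two things at once: matches may overlap (every start position in the line is tried), and a line with no match produces no output at all. A minimal sketch of that scanning and formatting step, assuming a matcher passed in as a function pointer; the names `format_line` and `match_a_not_ab_a` are invented here, and the matcher is hard-wired to the pattern a[^ab]a purely for illustration:

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical matcher for the pattern a[^ab]a: returns the match length (3)
   if a match starts at s[0], or 0 if none does. */
static size_t match_a_not_ab_a(const char *s)
{
    int c0, c1, c2;

    if (s[0] == '\0' || s[1] == '\0' || s[2] == '\0')
        return 0;
    c0 = tolower((unsigned char)s[0]);
    c1 = tolower((unsigned char)s[1]);
    c2 = tolower((unsigned char)s[2]);
    return (c0 == 'a' && c1 != 'a' && c1 != 'b' && c2 == 'a') ? 3 : 0;
}

/* Tries every start position in line and writes "N:m1,m2,..." into out.
   Returns the number of matches; out is left empty when there are none. */
int format_line(int line_no, const char *line,
                size_t (*match_at)(const char *), char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;
    int count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (i = 0; line[i] != '\0'; i++) {
        size_t len = match_at(line + i);
        int n;

        if (len == 0)
            continue;
        if (count == 0)   /* first match on this line: emit the line number */
            n = snprintf(out + used, out_size - used, "%d:%.*s",
                         line_no, (int)len, line + i);
        else              /* later matches are separated by commas */
            n = snprintf(out + used, out_size - used, ",%.*s",
                         (int)len, line + i);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';   /* buffer full: drop the partial entry */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Calling `format_line(1, "Do you like banana?", match_a_not_ab_a, out, sizeof out)` with a sufficiently large `out` buffer fills it with the first line of the sample output; for the second sample line the function returns 0 and leaves `out` empty, so nothing would be written.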
 
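【Sample Input 3】

w*d
string.in contains:
wwwdd
world is a nice word

【Sample Output 3】

string.out contains:
1:wwwd,wwd,wd
2:world,word

【Explanation of Sample 3】

The given string contains '*', meaning that within one line it matches any string that starts with 'w' and ends with 'd'. On the first line, for the first character 'w', both "wwwd" and "wwwdd" are matches; by rule 8 above, the shorter one, "wwwd", is chosen. Continuing in the same way from the second and third 'w' gives "wwd" and "wd". The same rule applied to the second line gives "world" and "word".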
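
The shortest-match rule can be sketched on its own for the simple case where only literal characters surround the '*'. The helper below is an illustration under that assumption, not the solution's method: `shortest_star_match` and `starts_with_nocase` are invented names, and the pattern is assumed to be already split into the part before the '*' (`head`) and the part after it (`tail`).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test that lit occurs in s starting at s[0]. */
static int starts_with_nocase(const char *s, const char *lit)
{
    for (; *lit != '\0'; s++, lit++) {
        if (*s == '\0' ||
            tolower((unsigned char)*s) != tolower((unsigned char)*lit))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: tries to match head*tail starting exactly at
   line[start]. On success copies the shortest match into out and returns 1;
   returns 0 if there is no match or the result does not fit in out. */
int shortest_star_match(const char *line, size_t start,
                        const char *head, const char *tail,
                        char *out, size_t out_size)
{
    size_t line_len = strlen(line);
    size_t head_len = strlen(head);
    size_t tail_len = strlen(tail);
    size_t pos, len;

    if (start > line_len || !starts_with_nocase(line + start, head))
        return 0;

    /* The first position where tail fits gives the shortest match (rule 8). */
    for (pos = start + head_len; pos <= line_len; pos++) {
        if (starts_with_nocase(line + pos, tail)) {
            len = pos + tail_len - start;
            if (len + 1 > out_size)
                return 0;
            memcpy(out, line + start, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

With the line "wwwdd", `head` "w" and `tail` "d", start positions 0, 1 and 2 yield "wwwd", "wwd" and "wd", and start position 3 fails because the line does not have a 'w' there. Because `line` is a single line of input, the search cannot cross a line boundary (rule 7).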


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

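// Local ASCII-only lowercase conversion. It has the same name as tolower() from <ctype.h>, which this file does not include.
char tolower(char s)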
{
	if (s >= 'A'&&s <= 'Z')
		s += 'a' - 'A';
	return s;
}

// This function judges whether from a given position(pos_scans), in the string(scans[]),
// the following letters can match the pattern given in the regular expression(regex[]).
// If so, the string matching the pattern is to be stored in the string(prints[]), and return 1
int regex_match(char scans[], int pos_scans, char regex[], char prints[]) 
{
	int iter_regex = 0; // iter_regex records the position of scanner in regex[]
	int iter_scans = 0; // iter_scans records the position of scanner in scans[]
	int len_regex = strlen(regex); 
	char dic[81];       // dic[] stores the pattern in a wildcard box "[]"
	int i, j;

	while (iter_regex < len_regex)
	{ 
		if (regex[iter_regex] != '[' && regex[iter_regex] != '*') 
		{// the scanner in regex[] gets a letter (']' is not included. this is guaranteed in '[' case)
			if (tolower(regex[iter_regex]) == tolower(scans[pos_scans + iter_scans])) 
			{// simply check whether the same letter appears in scans[]
				iter_regex++;
				iter_scans++;
			}
			else break;
		}
		else if (regex[iter_regex] == '[')
		{// the scanner starts a wildcard box "[]"
			i = 0;
			iter_regex++;
			while (regex[iter_regex] != ']')
			{// store the pattern in this box into dic[]	
				dic[i++] = regex[iter_regex];	
				iter_regex++;
			}
			dic[i] = '\0';
			if (dic[0] == '^')
			{// if '^' is there in the box, the criteria is opposite
				for (j = 1; j < i; j++)
				{// the letter scanned in scans[] cannot appear in the box
					if (tolower(scans[pos_scans + iter_scans]) == tolower(dic[j]))
						break;
				}
				if (j == i)
				{// "j" reaches "i", meaning a success
					iter_scans++;
					iter_regex++;
				}
				else break; 
			}
			else
			{// no '^' is there in the box
				int flag = 0;
				for (j = 0; j < i; j++)
				{
					if (tolower(scans[pos_scans + iter_scans]) == tolower(dic[j]))
					{// it is a match only if the letter scanned in scans[] appears in the box
						flag = 1;
						break;
					}
				}
				if (flag)
				{
					iter_regex++;
					iter_scans++;
				}
				else break;
			}
		}
		else if (regex[iter_regex] == '*')
		{// '*' means any letter (or letters) can match
			if (iter_regex == len_regex - 1)
			{// '*' is the last character of regex[]
				iter_regex++;
				// by rule 8 the shortest match wins, so a trailing '*' matches zero letters
				break;
			}
			else if (regex[iter_regex + 1] != '[')
			{// if the scanner gets a letter following '*'
				while (tolower(scans[pos_scans + iter_scans]) != tolower(regex[iter_regex + 1]))
				{// scanner in scans[] can go forward until it gets the same letter as scanned in regex[]
					iter_scans++;
					if (scans[pos_scans + iter_scans] == '\0') break;
				}
				if (tolower(scans[pos_scans + iter_scans]) == tolower(regex[iter_regex + 1]))
				{// if the scanner in scans[] meets the same letter as scanned in regex[], the match is a success
					iter_scans++;
					iter_regex+=2;
				}
				else break;// otherwise the scanner goes to the end of scans[], meaning the match is a failure
			}
			else if (regex[iter_regex + 1] == '[')
			{// if the scanner finds a '[' following '*'
				i = 0;
				iter_regex += 2; // skip both '*' and '[' so that dic[] starts with the first character inside the box
				while (regex[iter_regex] != ']')
				{// store the pattern into dic[]
					dic[i++] = regex[iter_regex];
					iter_regex++;
				}
				dic[i] = '\0';
				while (scans[pos_scans + iter_scans] != '\0')
				{// check the scanner has not reached the end of scans[]
					if (dic[0] == '^')
					{// if '^' starts this "[]" box
						for (j = 1; j < i; j++)
						{// if the letter scanned in scans[] does not appear in the box
						 // it means a success of matching "*[]"
							if (tolower(scans[pos_scans + iter_scans]) == tolower(dic[j]))
							{// if the letter appears, we should scan the next letter in scans[]
								iter_scans++;
								break;
							}
						}
						if (j == i)
						{// the letter scanned in scans[] does not appear in the box
							iter_scans++;
							iter_regex++;
							break;
						}
					}
					else
					{
						int flag = 0;
						for (j = 0; j < i; j++)
						{
							if (tolower(scans[pos_scans + iter_scans]) == tolower(dic[j]))
							{// if the letter appears in the box, meaning the match is a success
								flag = 1;
								break;
							}
						}
						if (flag)
						{
							iter_regex++;
							iter_scans++;
							break;
						}
						else iter_scans++;// if not, we scan the next letter in scans[]
					}
				}
			}
		}
		if (scans[pos_scans + iter_scans] == '\0') break; // the scanning of scans[] ends
	}

	if (iter_regex == len_regex)
	{// if the scanning of regex is finished, it means the match of regex[] is a success
		for (j = 0; j < iter_scans; j++)
			prints[j] = scans[pos_scans + j];
		prints[j] = '\0';
		return 1;
	}
	else return 0;
}

int main()
{
	FILE *fin, *fout;
	char regex[21];
	char scans[81];
	char prints[161];
	int line = 0;
	int i;

	if ((fin = fopen("string.in", "r")) == NULL)
		exit(1);
	if ((fout = fopen("string.out", "w")) == NULL)
		exit(1);
	scanf("%20s", regex); // regex[] holds at most 20 characters plus the terminator
	while (fgets(scans, 81, fin) != NULL)
	{
		line++;
		int flag = 1;
		for (i = 0; scans[i] != '\0'; i++)
		{
			if (regex_match(scans, i, regex, prints))
			{
				if(flag) fprintf(fout, "%d:", line);
				else fprintf(fout, ",");
				fprintf(fout, "%s", prints);
				flag = 0;
			}
		}
		if (!flag) fprintf(fout,"\n");
	}

	fclose(fin);
	fclose(fout);
	return 0;
}
