The solution to search-related problems in GeoNetwork: operators, quotes, phrases, Chinese.

Have you run into search-related problems in the Advanced Search of the web interface?
I did.
1. Problem.
These are the search-related problems I found:
1) Operators: the operators (and, or, not) have no effect.
2) Quotes: quotes have no effect either.
3) Phrase queries: a phrase query requires quotes, but quotes do not work (see 2).
4) Queries in Asian languages such as Chinese:
GN does not return exact results. It finds metadata that contains each individual character of the query, not the query phrase.
The effect is as if you searched for "any more" and GeoNetwork matched "any" and "more" separately.
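To make problem 4 concrete, here is a small illustration (hypothetical names, not GeoNetwork or Lucene code) of the difference between "every token occurs somewhere in the record" and "the tokens occur together as a phrase". It assumes, as described above, that a Chinese query ends up as one token per character.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of problem 4; not GeoNetwork or Lucene code.
class PhraseVsTerms {

    // One token per character, mimicking how a Chinese query
    // ends up as single-character terms.
    static List<String> chars(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    // "AND of terms": every query token occurs somewhere in the record.
    static boolean matchesAllTerms(List<String> record, List<String> query) {
        return record.containsAll(query);
    }

    // Phrase match: the query tokens occur contiguously and in order.
    static boolean matchesPhrase(List<String> record, List<String> query) {
        return Collections.indexOfSubList(record, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = chars("地图");        // "map"
        List<String> record = chars("地址和图书");  // "address and books"
        System.out.println(matchesAllTerms(record, query)); // true: both characters occur
        System.out.println(matchesPhrase(record, query));   // false: they are never adjacent
        System.out.println(matchesPhrase(chars("中国地图"), query)); // true

        // The English example from the text behaves the same way.
        List<String> any = Arrays.asList("any", "more");
        List<String> doc = Arrays.asList("more", "data", "than", "any", "other");
        System.out.println(matchesAllTerms(doc, any) + " " + matchesPhrase(doc, any)); // true false
    }
}
```

A search that behaves like `matchesAllTerms` returns the "address and books" record for the query "map", which is exactly the inexact result described in problem 4.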
2. WHY?
OK, why? What are the reasons?
The analyzer is the main reason for these problems.
In the Java class services.main.Search,
I saw that the query string is sent to the MainUtil.splitWord function to split it into words, as below.
if (any != null)
    any.setText(MainUtil.splitWord(any.getText()));
Take a look at the splitWord function: it uses StandardAnalyzer.
public static String splitWord(String requestStr)
{
    Analyzer a = new StandardAnalyzer();
    .....
}
As we know, StandardAnalyzer filters out stop words such as "and", "or", "not", "as"...,
and it also strips the quotes ("), so the string this function returns has lost both the operators and the quotes.
Because "and" is the default operator, GN then combines the remaining words with "and" when it queries Lucene.
That is how the problems arise.
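The effect can be reproduced without Lucene. The class below is a deliberately simplified stand-in for an analyzer with an English stop-word list, not StandardAnalyzer itself: it drops punctuation (so the quotes vanish) and removes stop words (so the operators vanish), then joins what is left with a default AND. The stop-word subset and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for a stop-word analyzer (NOT Lucene's StandardAnalyzer).
class StopWordDemo {

    // Small subset of a typical English stop-word list (assumption for illustration).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("and", "or", "not", "as", "the", "a"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit, so quotes disappear.
        for (String raw : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            // Skip empty pieces and stop words, so the operators disappear too.
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Recombine the surviving tokens with the default operator.
    static String withDefaultAnd(List<String> tokens) {
        return String.join(" AND ", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = analyze("\"world bank\" or water not lake");
        System.out.println(tokens);                 // [world, bank, water, lake]
        System.out.println(withDefaultAnd(tokens)); // world AND bank AND water AND lake
    }
}
```

The user asked for the phrase "world bank", or water, but not lake; what survives is four independent terms that must all match.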
3. Solution.
How do we resolve this?
Just stop using StandardAnalyzer? No, we still need it to analyze the query string, for example
the phrase inside the quotes. So we must find the quotes before analyzing, and send only the phrase between
the quotes to the analyzer. My solution makes quotes, operators, and phrases take effect;
it resolves the problems above and also handles Chinese search. Below is my solution:
if (any != null)
{
    any.setText( splitWord(any.getText()) );
}
The new splitWord replaces the call to MainUtil.splitWord; MainUtil.splitWord is still called inside it, on the text between quotes.
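The key idea, finding the quotes before the analyzer runs, can also be expressed with a regular expression. This is an alternative sketch with hypothetical names, not the code used in Search.java: it splits a query into quoted phrases, operators, and plain words, so each phrase could be handed to the analyzer as one unit while the operators are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternative sketch (hypothetical names): segment a query before analysis.
class QuoteSegmenter {

    // Either a double-quoted phrase (group 1) or a single bare word (group 2).
    private static final Pattern SEGMENT = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> segments(String query) {
        List<String> out = new ArrayList<>();
        Matcher m = SEGMENT.matcher(query);
        while (m.find()) {
            if (m.group(1) != null) {
                // Quoted phrase: kept whole so it can be analyzed as one unit.
                out.add("PHRASE:" + m.group(1).trim());
            } else {
                // Bare word: no analyzer has run yet, so operators are still visible.
                String word = m.group(2).toLowerCase();
                boolean isOperator = word.equals("and") || word.equals("or") || word.equals("not");
                out.add((isOperator ? "OP:" : "WORD:") + word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segments("\"world bank\" OR water"));
        // [PHRASE:world bank, OP:or, WORD:water]
    }
}
```

The regex approach treats an unterminated quote as part of an ordinary word; the author's code below instead treats everything after an unmatched quote as a phrase.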
Below is the splitWord function, with its helper methods, in Search.java:
//code from here; this code goes in the .service.main.Search.java file
//author: [email protected]
private static final String OPER_AND = " and ";
private static final String OPER_OR = " or ";
private static final String OPER_NOT = " not ";

private String splitWord( String strValue )
{
    //basic string processing: quote normalization, trim, collapse whitespace
    String strQuoteSg = "'";
    String strQuoteDb = "\"";
    //single quote to double quote mark
    strValue = strValue.replaceAll( strQuoteSg, strQuoteDb );
    //trim
    strValue = strValue.trim();
    //collapse runs of whitespace into a single space
    strValue = strValue.replaceAll( "\\s\\s+", " " );
    //toLowerCase, the search is not case sensitive
    strValue = strValue.toLowerCase();
    if( strValue.length()>0 )
    {
        int nFirstIndex = strValue.indexOf( strQuoteDb );
        if( nFirstIndex<0 )
        {
            //no quotes: add them ourselves, splitting at the operators and, or, not
            strValue = replaceComponent( strValue );
        }
        return splitString( strValue );
    }
    else
        return strValue;
}

// " " --> " and "
private String replaceComponent( String strValue )
{
    String strQuoteDb = "\"";
    String strWhitespace = " ";
    //add quotes to head and tail
    strValue = strQuoteDb + strValue + strQuoteDb;
    //find the whitespace index
    int nIndex = strValue.indexOf( strWhitespace );
    if( nIndex<0 )
        return strValue;
    else
    {
        //and, or, not
        strValue = checkKeyword( strValue );
        //if no operator is included, just use "and" as the default
        if( strValue.contains( OPER_AND ) || strValue.contains( OPER_OR )
            || strValue.contains( OPER_NOT ) )
        {
            return strValue;
        }
        else
        {
            return strValue.replace( strWhitespace,
                strQuoteDb + strWhitespace + "and" + strWhitespace + strQuoteDb );
        }
    }
}
private String checkKeyword( String strValue )
{
    strValue = checkKeywordComponent( strValue, OPER_AND );
    strValue = checkKeywordComponent( strValue, OPER_OR );
    strValue = checkKeywordComponent( strValue, OPER_NOT );
    return strValue;
}

//add quotes before and after each occurrence of the keyword
//strValue and keyword must both be lowercase
private String checkKeywordComponent( String strValue, String keyword )
{
    StringBuffer sb = new StringBuffer();
    sb.append( strValue );
    int nIndex = sb.indexOf( keyword );
    int offset = keyword.length();
    String strQuoteDb = "\"";
    while( nIndex >= 0 )
    {
        //check the quote before the keyword
        if( !sb.substring( nIndex-1, nIndex ).equals( strQuoteDb ) )
        {
            sb.insert( nIndex, strQuoteDb );
            offset++;
        }
        //check the quote after the keyword
        if( !sb.substring( nIndex+offset, nIndex+offset+1 ).equals( strQuoteDb ) )
        {
            sb.insert( nIndex+offset, strQuoteDb );
        }
        nIndex = sb.indexOf( keyword, nIndex+2 );
        offset = keyword.length();
    }
    return sb.toString();
}
private String splitString( String strValue )
{
    //clear the whitespace at head and tail
    strValue = strValue.trim();
    //collapse runs of whitespace into a single space
    strValue = strValue.replaceAll( "\\s\\s+", " " );
    //add quotes around the operators: and, or, not
    strValue = checkKeyword( strValue );
    String strQuoteDb = "\"";
    StringBuffer sb = new StringBuffer();
    int nStartIndex = 0;
    int nFirstIndex = strValue.indexOf( strQuoteDb );
    while( nFirstIndex>=0 )
    {
        //copy everything up to and including the opening quote
        sb.append( strValue.substring( nStartIndex, nFirstIndex+1 ) );
        int nSecondQuote = strValue.indexOf( strQuoteDb, nFirstIndex+1 );
        //the phrase starts right after the opening quote
        nStartIndex = nFirstIndex+1;
        if( nSecondQuote<0 ) //the closing quote does not exist
        {
            String strLast = strValue.substring( nStartIndex, strValue.length() );
            strLast = MainUtil.splitWord( strLast );
            sb.append( strLast );
            sb.append( strQuoteDb );
            nStartIndex = strValue.length();
            break;
        }
        else
        {
            //analyze only the phrase between the two quotes
            String strLast = strValue.substring( nStartIndex, nSecondQuote );
            strLast = MainUtil.splitWord( strLast );
            sb.append( strLast );
            sb.append( strQuoteDb );
            nStartIndex = nSecondQuote+1;
        }
        //find the next opening quote
        nFirstIndex = strValue.indexOf( strQuoteDb, nStartIndex );
    }
    //append whatever follows the last closing quote, including its leading space
    if( nStartIndex < strValue.length() )
    {
        sb.append( strValue.substring( nStartIndex ) );
    }
    return sb.toString();
}
You can try it out yourself.
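To experiment outside GeoNetwork, the methods above can be copied into one standalone class. In this sketch MainUtil.splitWord is replaced by a hypothetical stub, analyzerStub, that only trims and collapses whitespace, because the real method needs Lucene; with the real analyzer, stop words inside a phrase would also be removed. Text after the last closing quote is copied starting immediately after that quote, so a trailing word keeps its separating space.

```java
// Standalone harness for the splitWord logic; analyzerStub is a hypothetical
// stand-in for MainUtil.splitWord (the real one runs Lucene's StandardAnalyzer).
class SplitWordHarness {
    private static final String OPER_AND = " and ";
    private static final String OPER_OR = " or ";
    private static final String OPER_NOT = " not ";
    private static final String QUOTE = "\"";

    // Hypothetical stub: trims and collapses whitespace, nothing else.
    static String analyzerStub(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    static String splitWord(String strValue) {
        // Single to double quotes, trim, collapse whitespace, lowercase.
        strValue = strValue.replaceAll("'", QUOTE).trim().replaceAll("\\s\\s+", " ").toLowerCase();
        if (strValue.isEmpty()) return strValue;
        if (strValue.indexOf(QUOTE) < 0) strValue = replaceComponent(strValue);
        return splitString(strValue);
    }

    static String replaceComponent(String strValue) {
        strValue = QUOTE + strValue + QUOTE;
        if (strValue.indexOf(' ') < 0) return strValue;
        strValue = checkKeyword(strValue);
        if (strValue.contains(OPER_AND) || strValue.contains(OPER_OR) || strValue.contains(OPER_NOT)) {
            return strValue;
        }
        // No explicit operator: join every word with the default "and".
        return strValue.replace(" ", QUOTE + " and " + QUOTE);
    }

    static String checkKeyword(String s) {
        return checkKeywordComponent(checkKeywordComponent(checkKeywordComponent(s, OPER_AND), OPER_OR), OPER_NOT);
    }

    // Put a quote before and after every occurrence of the keyword, if missing.
    static String checkKeywordComponent(String strValue, String keyword) {
        StringBuilder sb = new StringBuilder(strValue);
        int nIndex = sb.indexOf(keyword);
        while (nIndex >= 0) {
            int offset = keyword.length();
            if (!sb.substring(nIndex - 1, nIndex).equals(QUOTE)) {
                sb.insert(nIndex, QUOTE);
                offset++;
            }
            if (!sb.substring(nIndex + offset, nIndex + offset + 1).equals(QUOTE)) {
                sb.insert(nIndex + offset, QUOTE);
            }
            nIndex = sb.indexOf(keyword, nIndex + 2);
        }
        return sb.toString();
    }

    // Send only the text between quotes through the (stubbed) analyzer.
    static String splitString(String strValue) {
        strValue = checkKeyword(strValue.trim().replaceAll("\\s\\s+", " "));
        StringBuilder sb = new StringBuilder();
        int nStartIndex = 0;
        int nFirstIndex = strValue.indexOf(QUOTE);
        while (nFirstIndex >= 0) {
            sb.append(strValue, nStartIndex, nFirstIndex + 1);
            int nSecondQuote = strValue.indexOf(QUOTE, nFirstIndex + 1);
            nStartIndex = nFirstIndex + 1;
            if (nSecondQuote < 0) {
                // Unterminated quote: treat the rest as one phrase and close it.
                sb.append(analyzerStub(strValue.substring(nStartIndex))).append(QUOTE);
                nStartIndex = strValue.length();
                break;
            }
            sb.append(analyzerStub(strValue.substring(nStartIndex, nSecondQuote))).append(QUOTE);
            nStartIndex = nSecondQuote + 1;
            nFirstIndex = strValue.indexOf(QUOTE, nStartIndex);
        }
        if (nStartIndex < strValue.length()) sb.append(strValue.substring(nStartIndex));
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String q : new String[] {"Water Quality", "water OR lake", "'world bank' water"}) {
            System.out.println(q + " -> " + splitWord(q));
        }
    }
}
```

With this stub, `Water Quality` becomes `"water" and "quality"`, `water OR lake` becomes `"water" or "lake"`, and `'world bank' water` becomes `"world bank" water`.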
4. COMMIT?
Who can commit this to the GN source?
Or how can I commit it myself?
