Java: Find the 10 Most Frequent Words in an (English) Text

Requirement: find the 10 most frequent words in an (English) text.

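Approach: 1. Read the text file with Java I/O.

                  2. Use BufferedReader to read the text one line at a time (each line is one String).

                  3. Split each String on " " (a space) to get a String[].

                  4. If a word contains punctuation such as "." or ",", strip it out.

                  5. Store each word (except words such as "this", "is", "a", "to") together with its occurrence count as Map<key,value> pairs, that is, in a Map<String,Integer>.

                  6. Sort by count and print the top 10 words.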
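
Steps 3 to 5 above can be sketched for a single line of text as follows. This is only a minimal illustration, not the implementation class from this post: the class name LineCountSketch and the method countLine are hypothetical, and it uses Map.merge (Java 8 or later) instead of an explicit containsKey check.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class for illustration; not part of the original code.
class LineCountSketch {
    // Counts the words of one line into counts, skipping any word in stopWords.
    static void countLine(String line, Set<String> stopWords, Map<String, Integer> counts) {
        // Step 3: split the line on single spaces.
        for (String token : line.split(" ")) {
            // Step 4: strip "." and "," from the token.
            String word = token.replace(".", "").replace(",", "");
            // Skip empty tokens (from consecutive spaces) and excluded words.
            if (word.isEmpty() || stopWords.contains(word)) {
                continue;
            }
            // Step 5: store 1 for a new word, otherwise add 1 to the existing count.
            counts.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("this", "is", "a", "to"));
        Map<String, Integer> counts = new HashMap<>();
        countLine("this is a test, a simple test.", stopWords, counts);
        // The map now holds test=2 and simple=1 (HashMap iteration order is unspecified).
        System.out.println(counts);
    }
}
```

Keeping the excluded words in a Set means only whole words are matched, so a short word such as "as" is not dropped just because it happens to be part of "has".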

Implementation class:

 

package com.homework;
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class CountWordsReader {
	private BufferedReader bufferReader;
	private Set<String> exceptWords;             // words to leave out of the count
	private Map<String,Integer> allWordsMap;
	public CountWordsReader(BufferedReader bufferReader,String[] exceptWords) {
		super();
		this.bufferReader = bufferReader;
		this.allWordsMap=new HashMap<String,Integer>();
		this.setExceptWords(exceptWords);
	}
	
	public Set<String> getExceptWords() {
		return exceptWords;
	}

	public void setExceptWords(String[] exceptWords) {
		// Stored as a Set so that only whole words are matched, never substrings.
		this.exceptWords=new HashSet<String>(Arrays.asList(exceptWords));
	}

	public void countWords(){
		String line;                 // each line of the text, treated as one String
		try {
			while((line=this.bufferReader.readLine())!=null){
				String[] words=line.split(" ");
				for(String aWord:words){
					// strip "." and "," from the word
					String afterHandle=aWord.replace(".","").replace(",","");
					// skip empty tokens (e.g. from consecutive spaces) and excluded words
					if(!afterHandle.isEmpty()&&!this.exceptWords.contains(afterHandle)){
						if(allWordsMap.containsKey(afterHandle)){
							int value=allWordsMap.get(afterHandle);
							value++;
							allWordsMap.put(afterHandle, value);       // word already in the Map: increment its count
						}else{
							allWordsMap.put(afterHandle, 1);           // word not yet in the Map: add it
						}
					}
					
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
	
	/**
	 * Prints the 10 words with the highest counts, from highest to lowest.
	 */
	public void printTopWords(){
		final Map<String,Integer> map=this.allWordsMap;
		List<String> keys=new ArrayList<String>(this.allWordsMap.keySet());
		
		// Sort with an anonymous inner class.
		Collections.sort(keys, new Comparator<String>(){
			@Override
			public int compare(String o1, String o2) {
				// Descending by count; swap the two arguments for ascending order.
				// Integer.compare compares values, unlike == on boxed Integers.
				return Integer.compare(map.get(o2), map.get(o1));
			}
		});
		// Print the top 10 words.
		int count=0;
		for(String aKey:keys){
			System.out.println(aKey+" "+this.allWordsMap.get(aKey));
			count++;
			if(count==10){
				break;
			}
		}
	}
}
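
As an alternative to sorting the key list with an anonymous Comparator, the top-N selection can also be written with the Java 8 stream API. The sketch below is only an illustration under stated assumptions: the class name TopWordsSketch and the method topN are hypothetical, and breaking ties alphabetically is an added choice that the code above does not make.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration; not part of the original code.
class TopWordsSketch {
    // Returns at most n entries, highest count first; ties are ordered by word.
    static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    // Descending by count.
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    // Assumption: equal counts are ordered alphabetically.
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("something", 3);
        counts.put("life", 2);
        counts.put("do", 2);
        counts.put("different", 1);
        // Prints "something 3", "do 2", "life 2", one per line.
        for (Map.Entry<String, Integer> e : topN(counts, 3)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With an explicit tie-breaker the output order no longer depends on the iteration order of the HashMap, which makes results reproducible between runs.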

 

Client code:

package com.homework;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class Test {
	public static void main(String[] args) {
		String[] exceptWords={"this","is","a","the","to","of","in","be","has"};
		// Note: test.txt must be in the project's root directory.
		// try-with-resources closes the reader when the block exits.
		try (BufferedReader buffer=new BufferedReader(new FileReader("test.txt"))) {
			CountWordsReader reader=new CountWordsReader(buffer,exceptWords);
			
			reader.countWords();
			reader.printTopWords();
			
		} catch (IOException e) {   // also covers FileNotFoundException
			e.printStackTrace();
		}
		
	}
}

Result:

something 3
life 2
May 2
do 2
mind 2
feel 2
oneself 2
my 2
different 1
