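While working on my graduation project recently, I ran into a very common problem:
counting how many times each word occurs.
Here are the answers I found on Stack Overflow.
There are generally a few ways to implement this efficiently: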
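[1] Use a HashMap
Note: do not check with containsKey(X) and then call get(X) to see whether a word is already present. containsKey is itself a hash lookup (it does not scan the whole map), but the pair costs two lookups per word.
Instead, call get(X) once and test the result for null. That needs only one lookup, so it is faster.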
    Integer count = map.get(word);
    if (count == null) {
        count = 0;
    }
    map.put(word, count + 1);
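To try the single-lookup pattern end to end, here is a minimal sketch of a complete program. The class name WordCountSketch, the method countWords, and the sample sentence are my own placeholders, not part of the Stack Overflow answer.

```java
import java.util.HashMap;
import java.util.Map;

class WordCountSketch {
    // Counts each word with one get() and one put() per occurrence.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> map = new HashMap<String, Integer>();
        for (String word : words) {
            Integer count = map.get(word); // single lookup; null means "not seen yet"
            if (count == null) {
                count = 0;
            }
            map.put(word, count + 1);      // boxes the new count into an Integer
        }
        return map;
    }

    public static void main(String[] args) {
        String[] words = "the quick fox jumps over the lazy dog the end".split(" ");
        System.out.println(countWords(words).get("the")); // prints 3
    }
}
```

Each occurrence still costs one get and one put, and the put boxes a fresh Integer, which is what the later approaches try to avoid.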
[2] Use AtomicLong
    final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();
    ...
    map.putIfAbsent(word, new AtomicLong(0));
    map.get(word).incrementAndGet();
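The point of the AtomicLong version is that several threads can update the same map safely. Below is a rough sketch of that, assuming each worker thread simply counts the whole word array once. The class ConcurrentWordCountSketch and the method countInParallel are hypothetical names I made up for the illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

class ConcurrentWordCountSketch {
    // Every thread counts the full array, so each final count is
    // (occurrences in the array) * (number of threads).
    static ConcurrentMap<String, AtomicLong> countInParallel(final String[] words, int threads) {
        final ConcurrentMap<String, AtomicLong> map =
                new ConcurrentHashMap<String, AtomicLong>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (String word : words) {
                        // Only the first thread to arrive installs the counter.
                        map.putIfAbsent(word, new AtomicLong(0));
                        // The increment is atomic, so no updates are lost.
                        map.get(word).incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] words = {"a", "b", "a"};
        System.out.println(countInParallel(words, 4).get("a").get()); // prints 8
    }
}
```

Note that putIfAbsent allocates a new AtomicLong on every call, even when the key already exists, so this version trades some garbage for thread safety.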
[3] Use Trove (high-performance collections for Java)
    TObjectIntHashMap<String> freq = new TObjectIntHashMap<String>();
    ...
    freq.adjustOrPutValue(word, 1, 1);
As for why this is faster than HashMap<String, Integer>, I still need to look into it.
[4] Use a MutableInt holder
    class MutableInt {
        int value = 1; // note that we start at 1 since we're counting
        public void increment() { ++value; }
        public int get() { return value; }
    }
    ...
    Map<String, MutableInt> freq = new HashMap<String, MutableInt>();
    ...
    MutableInt count = freq.get(word);
    if (count == null) {
        freq.put(word, new MutableInt());
    } else {
        count.increment();
    }
With HashMap<String, Integer>, memory churn may be an issue, since every boxing of an int outside the cached range of -128 to 127 (in practice, any count of 128 or more) causes an object allocation (see Integer.valueOf(int)). Although the garbage collector deals very efficiently with short-lived objects, performance will suffer to some degree.
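The caching behaviour of Integer.valueOf(int) can be observed with a small experiment. This is only a sketch: the class BoxingCacheSketch and the method sameBoxedObject are names I invented, and the result for 128 depends on the JVM (the cache may be larger than the guaranteed range), so I do not state an expected output for it.

```java
class BoxingCacheSketch {
    // Returns true when two boxings of the same value share one object,
    // meaning the second boxing did not allocate.
    static boolean sameBoxedObject(int value) {
        Integer a = Integer.valueOf(value);
        Integer b = Integer.valueOf(value);
        return a == b; // deliberate reference comparison, not equals()
    }

    public static void main(String[] args) {
        // Values from -128 to 127 are always served from the cache.
        System.out.println(sameBoxedObject(127)); // prints true
        // Outside that range a new Integer is typically allocated each time,
        // but a JVM is allowed to cache more values.
        System.out.println(sameBoxedObject(128));
    }
}
```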
If you know that the number of increments made will largely outnumber the number of keys (=words in this case), consider using an int holder instead. Phax already presented code for this. Here it is again, with two changes (holder class made static and initial value set to 1):
    static class MutableInt {
        int value = 1;
        void inc() { ++value; }
        int get() { return value; }
    }
    ...
    Map<String, MutableInt> map = new HashMap<String, MutableInt>();
    MutableInt value = map.get(key);
    if (value == null) {
        value = new MutableInt();
        map.put(key, value);
    } else {
        value.inc();
    }
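Putting the holder class into a complete program might look like the sketch below. The wrapper class MutableIntCountSketch, the method countWords, and the sample input are my own assumptions. The MutableInt class itself follows the quoted answer.

```java
import java.util.HashMap;
import java.util.Map;

class MutableIntCountSketch {
    // Holder whose value is updated in place, so no boxing happens per increment.
    static class MutableInt {
        int value = 1; // starts at 1 because it is created on the first occurrence
        void inc() { ++value; }
        int get() { return value; }
    }

    static Map<String, MutableInt> countWords(String[] words) {
        Map<String, MutableInt> map = new HashMap<String, MutableInt>();
        for (String key : words) {
            MutableInt value = map.get(key);     // one lookup per word
            if (value == null) {
                map.put(key, new MutableInt());  // put() only for new words
            } else {
                value.inc();                     // repeated words touch no map entry
            }
        }
        return map;
    }

    public static void main(String[] args) {
        String[] words = "to be or not to be".split(" ");
        System.out.println(countWords(words).get("to").get()); // prints 2
    }
}
```

Compared with the Integer version, a repeated word costs one get and an in-place increment, with no put and no new object.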
If you need extreme performance, look for a Map implementation which is directly tailored towards primitive value types. jrudolph mentioned GNU Trove.
By the way, a good search term for this subject is "histogram".
======