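July 15, 2011, 23:34:18
Performance showdown: counting character occurrences in a string
Original post: "javascript 统计哪个字符出现的次数最多–修正版" (JavaScript: finding which character occurs most often, revised version)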
var str = "The officials say tougher legislation is needed because some \
telecommunications companies in recent years have begun new services and made \
system upgrades that create technical obstacles to surveillance. They want to \
increase legal incentives and penalties aimed at pushing carriers like Verizon, \
AT&T, and Comcast to ensure that any network changes will not disrupt their \
ability to conduct wiretaps." +
"An Obama administration task force that includes officials from the \
Justice and Commerce Departments, the F.B.I. and other agencies recently began \
working on draft legislation to strengthen and expand a 1994 law requiring \
carriers to make sure their systems can be wiretapped. There is not yet \
agreement over the details, according to officials familiar with the \
deliberations, but they said the administration intends to submit a package to \
Congress next year." +
"To bolster their case, security agencies are citing two previously \
undisclosed episodes in which major carriers were stymied for weeks or even \
months when they tried to comply with court-approved wiretap orders in criminal \
or terrorism investigations, the officials said.",
    count = 0,
    index = 0,
    arrStr = [],
    oLetter = {};

// Strip all whitespace. Method_3 and normal gave wrong results earlier;
// it turned out this line was missing.
str = str.replace(/\s/g, '');

for (var i = 0; i < 5000; i++) { // create a long text
    arrStr.push(str);
}
// Why did the original code join with ","? I found that its code also
// counted the ",", so I removed the separator.
str = arrStr.join("");

if (!('console' in this || 'console' in window)) { // for engines without a console
    console = {
        stacks: [],
        log: function () {
            // Store each message (all arguments joined) until show() is called.
            console.stacks.push(Array.prototype.slice.call(arguments).join(' '));
        },
        show: function () {
            alert(console.stacks.join('\n'));
            console.stacks = [];
        }
    };
}
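The effect of the `join` separator is easy to check on a tiny input. The following is only an illustrative sketch, not code from the original post; `countChars` is a hypothetical helper introduced here.

```javascript
// Hypothetical helper: count every character of a string in one pass.
function countChars(text) {
  var counts = {};
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  return counts;
}

var parts = ["ab", "ab", "ab"];
var withComma = countChars(parts.join(","));    // counts "ab,ab,ab"
var withoutComma = countChars(parts.join(""));  // counts "ababab"

console.log(withComma[","]);     // 2: the separators are counted too
console.log(withoutComma[","]);  // undefined: no separator, nothing extra to count
```

With 5000 copies joined by ",", the comma would show up 4999 times in the tally, which is why the separator was dropped.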
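My method: traverse the string with str.replace(RegExp, Function)
For how str.replace(RegExp, function) works, see my previous post, 《JavaScript replace(RegExp, Function)详解》 (a detailed look at JavaScript replace(RegExp, Function)).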
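As a quick reminder of the mechanism, here is a minimal sketch on a short string (the sample text and variable names are made up for illustration). With a global regex, the callback runs once per match and receives the matched text and its offset; whatever it returns is used as the replacement.

```javascript
var seen = [];
var result = "a b\tc".replace(/\S/g, function (match, offset) {
  seen.push(match + "@" + offset);  // record each non-whitespace match
  return match.toUpperCase();       // the return value replaces the match
});

console.log(seen.join(" "));  // a@0 b@2 c@4
console.log(result);          // "A B\tC"

// A callback that returns nothing yields the string "undefined" per match.
var noReturn = "ab".replace(/\S/g, function () {});
console.log(noReturn);        // undefinedundefined
```

The last case matters when the callback is used only for its side effects: the replaced string is still built, even if it is thrown away.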
function method_replace_RegExp_function() {
    function counter(match) { // tallies each matched character
        if (visited[match]) {
            visited[match]++;
        } else {
            visited[match] = 1;
        }
        // Nothing is returned, so each match is replaced by the string
        // "undefined"; the resulting string is simply discarded.
    }
    var count = 0,
        index = 0,
        visited = {};
    var begin = (new Date()).getTime();
    str.replace(/\S/g, counter);
    for (var i in visited) {
        if (visited[i] > count) {
            count = visited[i];
            index = i;
        }
    }
    var end = +new Date();
    console.log("Method_replace_RegExp_Function:\nMost frequent character: " + index + ", occurring " + count + " times", "Elapsed: " + (end - begin) + " ms");
}
// Another approach I thought of: the plain loop ("Normal") method
function method_normal() {
    var count = 0,
        index = 0,
        visited = {},
        tmp = '';
    var begin = (new Date()).getTime();
    for (var i = 0; i < str.length; i++) {
        tmp = str.charAt(i);
        if (visited[tmp]) {
            visited[tmp]++;
        } else {
            visited[tmp] = 1;
        }
    }
    for (var key in visited) {
        if (visited[key] > count) {
            count = visited[key];
            index = key;
        }
    }
    var end = +new Date();
    console.log("Method_normal:\nMost frequent character: " + index + ", occurring " + count + " times", "Elapsed: " + (end - begin) + " ms");
}

// method_2 and method_3 are not defined in this post.
method_2();
method_3();
method_replace_RegExp_function();
method_normal();
if (console.show) { // for browsers without a native console
    console.show();
}
Output in several environments:
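Maxthon 3.1.3.600
Method_2:
Most frequent character: e, occurring 610000 times Elapsed: 7128 ms
Method_3:
Most frequent character: e, occurring 610000 times Elapsed: 6757 ms
Method_replace_RegExp_Function:
Most frequent character: e, occurring 610000 times Elapsed: 4399 ms
Method_normal:
Most frequent character: e, occurring 610000 times Elapsed: 5925 ms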
Node.exe 2011.07.14 v0.5.1 http://nodejs.org
> method_2();
Method_2:
Most frequent character: e, occurring 610000 times Elapsed: 3141 ms
> method_3();
Method_3:
Most frequent character: e, occurring 610000 times Elapsed: 1560 ms
> //method_replace_RegExp_function(); // this one just dies outright...
> method_normal();
Method_normal:
Most frequent character: e, occurring 610000 times Elapsed: 1045 ms
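FireFox 3.6.3, FireBug 1.7.3
Method_2:
Most frequent character: e, occurring 610000 times Elapsed: 12046 ms
Method_3:
Most frequent character: e, occurring 610000 times Elapsed: 10488 ms
Method_replace_RegExp_Function:
Most frequent character: e, occurring 610000 times Elapsed: 6836 ms
Method_normal:
Most frequent character: e, occurring 610000 times Elapsed: 5351 ms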
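IE9:
LOG: Method_2:
Most frequent character: e, occurring 610000 times Elapsed: 18411 ms
LOG: Method_3:
Most frequent character: e, occurring 610000 times Elapsed: 10968 ms
LOG: Method_replace_RegExp_Function:
Most frequent character: e, occurring 610000 times Elapsed: 1651 ms
LOG: Method_normal:
Most frequent character: e, occurring 610000 times Elapsed: 12339 ms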
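To make the comparison easier to read, the reported timings can be tabulated and the fastest method picked per environment. This is just a small sketch over the numbers listed above; the `timings` object and the `fastest` helper are names introduced here, and `null` marks the Node run that did not finish.

```javascript
// Timings in milliseconds, copied from the runs reported above.
var timings = {
  "Maxthon 3.1.3.600": { Method_2: 7128, Method_3: 6757, Method_replace_RegExp_Function: 4399, Method_normal: 5925 },
  "Node v0.5.1": { Method_2: 3141, Method_3: 1560, Method_replace_RegExp_Function: null, Method_normal: 1045 },
  "FireFox 3.6.3": { Method_2: 12046, Method_3: 10488, Method_replace_RegExp_Function: 6836, Method_normal: 5351 },
  "IE9": { Method_2: 18411, Method_3: 10968, Method_replace_RegExp_Function: 1651, Method_normal: 12339 }
};

// Hypothetical helper: name of the method with the smallest finished time.
function fastest(results) {
  var bestName = null;
  for (var name in results) {
    if (results[name] !== null &&
        (bestName === null || results[name] < results[bestName])) {
      bestName = name;
    }
  }
  return bestName;
}

var winners = {};
for (var env in timings) {
  winners[env] = fastest(timings[env]);
  console.log(env + ": " + winners[env] + " (" + timings[env][winners[env]] + " ms)");
}
```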
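Conclusion: don't put blind faith in the powerful search capabilities of regular expressions. Every regex match is itself a loop over the string.
So regex matches should not be run too many times; making good use of String.replace(RegExp, Function), which covers the whole string in a single pass, is the efficient choice. In the runs above it was the fastest method in Maxthon and IE9, while the plain loop was faster in FireFox 3.6.3 and in Node v0.5.1, where the replace version did not finish.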
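The point about repeated matching can be sketched as follows. This is an illustration under assumptions, not the Method_2 or Method_3 code from the original post: `countWithManyScans` is a hypothetical counter that runs one global regex scan per distinct character, while `countWithOneScan` uses a single replace pass. `escapeRegExp` is a helper added so that characters such as "." or "+" are matched literally.

```javascript
// Escape regex metacharacters so a character is matched literally.
function escapeRegExp(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical multi-pass counter: one full regex scan per distinct character.
function countWithManyScans(text) {
  var counts = {};
  var scans = 0;
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (counts[ch] === undefined) {
      counts[ch] = text.match(new RegExp(escapeRegExp(ch), "g")).length;
      scans++;
    }
  }
  return { counts: counts, scans: scans };
}

// Single-pass counter: the callback tallies every non-whitespace character.
function countWithOneScan(text) {
  var counts = {};
  text.replace(/\S/g, function (match) {
    counts[match] = (counts[match] || 0) + 1;
    return match;  // keep the text unchanged
  });
  return { counts: counts, scans: 1 };
}

var sample = "a.b.a+a";
var many = countWithManyScans(sample);
var one = countWithOneScan(sample);
console.log(many.scans, one.scans);              // 4 1
console.log(many.counts["a"], one.counts["a"]);  // 3 3
console.log(many.counts["."], one.counts["."]);  // 2 2
```

Both versions produce the same tallies; the difference is that the first walks the whole string once per distinct character, so its cost grows with the size of the alphabet as well as the length of the text.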