First, two articles I recommend:
http://www.wagerlabs.com/blog/2008/02/parsing-text-an.html
http://ppolv.wordpress.com/2008/02/25/parsing-csv-in-erlang/
(Both require getting over the wall to read. Damn that "Kung Fu Net", i.e. the GFW.)
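Binaries are undoubtedly an efficient choice for parsing text protocols in Erlang. Yet I noticed that when those articles assemble the individual bytes of a binary into strings, they always do it with a list, prepending each byte and reversing once at the end:
NewList = [Char|OldList]          % for each byte; lists:reverse(NewList) once the field is complete
rather than appending to a binary and converting at the end:
NewBin = <<OldBin/binary, Char>>  % for each byte; binary_to_list(NewBin) once the field is complete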
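So I ran a test, and it indicated that when building large numbers of short strings, for example parsing <<"GET /index.html HTTP/1.1">> into ["GET","/index.html","HTTP/1.1"], lists perform somewhat better.
Here is the simple loop-based benchmark I wrote: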
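To make the request-line example concrete, here is a minimal sketch of both styles applied to it. It is my own illustration, not code from the linked articles: the module name request_line and the functions split_list/1 and split_bin/1 are hypothetical, and it assumes fields are separated by single spaces with no trailing CRLF (consecutive spaces yield empty strings).

```erlang
-module(request_line).
-export([split_list/1, split_bin/1]).

%% List style: prepend each byte to the current field, and reverse the
%% field once when a space (or the end of input) is reached.
split_list(Bin) -> split_list(Bin, [], []).

split_list(<<>>, Field, Fields) ->
    lists:reverse([lists:reverse(Field) | Fields]);
split_list(<<$\s, Rest/binary>>, Field, Fields) ->
    split_list(Rest, [], [lists:reverse(Field) | Fields]);
split_list(<<C, Rest/binary>>, Field, Fields) ->
    split_list(Rest, [C | Field], Fields).

%% Binary style: append each byte to the current field binary, and
%% convert it with binary_to_list/1 when the field is complete.
split_bin(Bin) -> split_bin(Bin, <<>>, []).

split_bin(<<>>, Field, Fields) ->
    lists:reverse([binary_to_list(Field) | Fields]);
split_bin(<<$\s, Rest/binary>>, Field, Fields) ->
    split_bin(Rest, <<>>, [binary_to_list(Field) | Fields]);
split_bin(<<C, Rest/binary>>, Field, Fields) ->
    split_bin(Rest, <<Field/binary, C>>, Fields).
```

Both calls on <<"GET /index.html HTTP/1.1">> return ["GET","/index.html","HTTP/1.1"]; they differ only in how each field is accumulated.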
%% Module name assumed; the original post shows only the functions.
-module(append_test).
-export([test_append/0]).

test_append() ->
    test_char_append(100),
    test_char_append(1000),
    test_char_append(10000),
    test_char_append(100000),
    test_char_append(1000000),
    test_char_append(10000000),
    test_field_append(10000),
    test_field_append(100000),
    test_field_append(200000),
    test_field_append(300000).

%% Build one Loop-character string each way and time each build.
%% statistics(wall_clock) returns {Total, SinceLastCall} in milliseconds,
%% so the first call only resets the "since last call" counter.
test_char_append(Loop) ->
    erlang:statistics(wall_clock),
    test_char_append_by_list(Loop, []),
    {_, T1} = erlang:statistics(wall_clock),
    test_char_append_by_binary(Loop, <<>>),
    {_, T2} = erlang:statistics(wall_clock),
    io:format("~p loops, test_char_append_by_list using time: ~pms~n", [Loop, T1]),
    io:format("~p loops, test_char_append_by_binary using time: ~pms~n~n", [Loop, T2]),
    ok.

%% Build Loop separate 100-character fields each way and time each run.
test_field_append(Loop) ->
    erlang:statistics(wall_clock),
    test_field_append_by_list(Loop, []),
    {_, T1} = erlang:statistics(wall_clock),
    test_field_append_by_binary(Loop, []),
    {_, T2} = erlang:statistics(wall_clock),
    io:format("~p loops, test_field_append_by_list using time: ~pms~n", [Loop, T1]),
    io:format("~p loops, test_field_append_by_binary using time: ~pms~n~n", [Loop, T2]),
    ok.

%% Prepend to a list, reverse once at the end.
test_char_append_by_list(0, List) -> lists:reverse(List);
test_char_append_by_list(N, List) -> test_char_append_by_list(N-1, [$!|List]).

%% Append to a binary, convert to a list once at the end.
test_char_append_by_binary(0, Bin) -> binary_to_list(Bin);
test_char_append_by_binary(N, Bin) -> test_char_append_by_binary(N-1, <<Bin/binary, $!>>).

test_field_append_by_list(0, List) -> lists:reverse(List);
test_field_append_by_list(N, List) ->
    Field = test_char_append_by_list(100, []),
    test_field_append_by_list(N-1, [Field|List]).

test_field_append_by_binary(0, List) -> lists:reverse(List);
test_field_append_by_binary(N, List) ->
    Field = test_char_append_by_binary(100, <<>>),
    test_field_append_by_binary(N-1, [Field|List]).
The output looked roughly like this:
100 loops, test_char_append_by_list using time: 0ms
100 loops, test_char_append_by_binary using time: 0ms
1000 loops, test_char_append_by_list using time: 0ms
1000 loops, test_char_append_by_binary using time: 0ms
10000 loops, test_char_append_by_list using time: 0ms
10000 loops, test_char_append_by_binary using time: 0ms
100000 loops, test_char_append_by_list using time: 16ms
100000 loops, test_char_append_by_binary using time: 16ms
1000000 loops, test_char_append_by_list using time: 203ms
1000000 loops, test_char_append_by_binary using time: 156ms
10000000 loops, test_char_append_by_list using time: 2922ms
10000000 loops, test_char_append_by_binary using time: 1594ms
10000 loops, test_field_append_by_list using time: 62ms
10000 loops, test_field_append_by_binary using time: 172ms
100000 loops, test_field_append_by_list using time: 1109ms
100000 loops, test_field_append_by_binary using time: 1860ms
200000 loops, test_field_append_by_list using time: 2672ms
200000 loops, test_field_append_by_binary using time: 4937ms
300000 loops, test_field_append_by_list using time: 3438ms
300000 loops, test_field_append_by_binary using time: 7062ms
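So for short strings, lists are faster than binaries. In the field test, where every string is 100 characters, the list version was roughly 1.7 to 2.8 times as fast (for example 3438ms vs 7062ms for 300,000 fields). In the single-string test the two tie at 100,000 characters, and binaries only pull ahead from around a million characters (and who builds lists that long anyway?). When you are building lots of short strings, just stick with prepending to a list and reversing it.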
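The 0ms readings for 100 to 10,000 loops are below the millisecond resolution of statistics(wall_clock), so they say nothing about which style wins at those sizes. One option, which is my suggestion and not part of the original test, is timer:tc/1: it runs a fun and returns {Microseconds, Result}. The sketch below repeats each build many times to get measurable numbers; the module name append_tc and Reps = 1000 are arbitrary assumptions.

```erlang
-module(append_tc).
-export([compare/1]).

%% Build a Len-character string Reps times with each style and return
%% {ListMicros, BinMicros}. timer:tc/1 reports elapsed microseconds.
compare(Len) ->
    Reps = 1000,
    {ListUs, _} = timer:tc(fun() -> repeat(Reps, fun() -> by_list(Len, []) end) end),
    {BinUs, _} = timer:tc(fun() -> repeat(Reps, fun() -> by_binary(Len, <<>>) end) end),
    {ListUs, BinUs}.

%% Call F() N times, discarding the results.
repeat(0, _F) -> ok;
repeat(N, F) -> _ = F(), repeat(N - 1, F).

%% Same two accumulation styles as the benchmark above.
by_list(0, Acc) -> lists:reverse(Acc);
by_list(N, Acc) -> by_list(N - 1, [$! | Acc]).

by_binary(0, Bin) -> binary_to_list(Bin);
by_binary(N, Bin) -> by_binary(N - 1, <<Bin/binary, $!>>).
```

Calling append_tc:compare(100) in the shell gives a pair of microsecond counts; absolute numbers will vary with machine and OTP release.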