Reading complex CSV files with a Tcl script

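I wrote a test tool in Tcl/Tk that needs to read CSV files from a Tcl script. In a complex CSV file, however, a cell may contain commas, double quotes, line breaks, or line breaks inside double quotes, which makes reading it difficult. Most of the examples I found online read the file one character at a time; after using them for a while, I found them rather slow. After looking into it, I wrote my own Tcl code for reading CSV files. It works line by line instead of character by character: it keeps a running count of double quotes, and while that count is odd, the line break or separator just seen belongs to a quoted cell and is kept as data. The code is as follows: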

proc readCSV { channel { header 1 } { symbol , } } {
	set quote 0
	set row_temp ""
	set final {}
	# Read the whole file and split it into physical lines;
	# one logical record may span several lines
	set data [ split [ read -nonewline $channel ] "\n" ]
	foreach line $data {
		# Running count of double quotes in the current record
		incr quote [ regexp -all {"} $line ]
		if { $quote % 2 == 0 } {
			# Even count: the record is complete
			set quote 0
			append row_temp $line
			set cell {}
			set cell_temp ""
			# Split on the configured separator, not a hard-coded comma
			foreach section [ split $row_temp $symbol ] {
				incr quote [ regexp -all {"} $section ]
				if { $quote % 2 == 0 } {
					# Even count: the cell is complete
					append cell_temp $section
					# Strip the enclosing quotes, then unescape "" to "
					regsub {^"(.*)"$} $cell_temp {\1} cell_temp
					lappend cell [ string map {{""} {"}} $cell_temp ]
					set cell_temp ""
					set quote 0
				} else {
					# Odd count: this separator is inside quotes, put it back
					append cell_temp $section$symbol
				}
			}
			lappend final $cell
			set row_temp ""
		} else {
			# Odd count: this line break is inside a quoted cell, keep it
			append row_temp $line\n
		}
	}
	# Return $final here instead if a plain list of rows is enough
	set row [ llength $final ]
	set column [ llength [ lindex $final 0 ] ]
	if { $header == 1 } {
		# Key: "<header text>,<row index>"; row 0 is the header row itself
		set names [ lindex $final 0 ]
		for { set i 0 } { $i < $row } { incr i } {
			for { set j 0 } { $j < $column } { incr j } {
				set csvData([ lindex $names $j ],$i) [ lindex $final $i $j ]
			}
		}
	} else {
		# Key: "<row index>,<column index>"
		for { set i 0 } { $i < $row } { incr i } {
			for { set j 0 } { $j < $column } { incr j } {
				set csvData($i,$j) [ lindex $final $i $j ]
			}
		}
	}
	return [ array get csvData ]
}
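To make the first stage easier to follow, here is a minimal standalone sketch of the quote-parity idea the proc relies on. splitRecords is a hypothetical helper, not part of readCSV: it only groups physical lines into logical records. A line break ends a record only when the number of double quotes seen so far is even; an escaped quote ("") adds two, so it never changes the parity.

```tcl
# Hypothetical helper (not part of readCSV): group physical lines into
# logical CSV records. A line break ends a record only when the number of
# double quotes seen since the record started is even.
proc splitRecords {text} {
    set records {}
    set pending ""
    set quotes 0
    set inQuotes 0   ;# 1 while a quoted cell is still open across lines
    foreach line [split $text "\n"] {
        incr quotes [regexp -all {"} $line]
        if {$inQuotes} {
            # Continue the current record, restoring the line break
            append pending "\n" $line
        } else {
            set pending $line
        }
        if {$quotes % 2 == 0} {
            lappend records $pending
            set quotes 0
            set inQuotes 0
        } else {
            set inQuotes 1
        }
    }
    # Unterminated quote at end of input: keep the partial record
    if {$inQuotes} {
        lappend records $pending
    }
    return $records
}
```

This corresponds to the outer foreach loop in readCSV; the proc simply does the cell splitting immediately instead of collecting the records first.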

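The function returns a flat list of key/value pairs (the output of array get), which you load into an array with array set. By default the first row of the CSV file is used as the header and the separator is ","; both can be changed through the optional header and symbol arguments.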

It correctly handles commas, double quotes, and line breaks inside cells.
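The second stage, splitting one record into cells, can be sketched the same way. splitRecord is again a hypothetical helper and assumes a single-character separator: the pieces produced by a plain split are glued back together while the quote count is odd, then the enclosing quotes are stripped and "" is turned back into ".

```tcl
# Hypothetical helper: split one logical record into cells, assuming a
# single-character separator. Pieces from a plain split are re-joined while
# the quote count is odd, because that separator sat inside a quoted cell.
proc splitRecord {record {sep ,}} {
    set cells {}
    set cell ""
    set quotes 0
    set joining 0
    foreach piece [split $record $sep] {
        incr quotes [regexp -all {"} $piece]
        if {$joining} {
            append cell $sep $piece
        } else {
            set cell $piece
        }
        if {$quotes % 2 == 0} {
            # Strip the enclosing quotes, then unescape "" to "
            regsub {^"(.*)"$} $cell {\1} cell
            lappend cells [string map {{""} {"}} $cell]
            set quotes 0
            set joining 0
        } else {
            set joining 1
        }
    }
    # Unterminated quote: keep the partial cell as-is
    if {$joining} {
        lappend cells $cell
    }
    return $cells
}
```

For example, splitRecord {a,"b,c","d ""e"""} returns three cells: a, b,c and d "e".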

 

Example:

The following prints one cell, addressed by header name and row number:

set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv ]
close $csv
puts $csvData(Name,1)    ;# assumes the header row contains a cell "Name"; row 0 is the header row itself

The following prints one cell, addressed by row number and column number:

set csv [ open c:/testcase.csv r ]
array set csvData [ readCSV $csv 0 ]
close $csv
puts $csvData(3,1)    ;# row 3, column 1 (both counted from 0)
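The key layout of the returned list can be illustrated without a file. rowsToArrayList below is a hypothetical helper that assumes the rows are already parsed into a list of lists; it builds the same keys readCSV uses in its two modes.

```tcl
# Hypothetical helper: build the same flat key/value list readCSV returns,
# from rows already parsed into a list of lists (row 0 = header row).
#   header mode (1): key "<header text>,<row index>"
#   index mode  (0): key "<row index>,<column index>"
proc rowsToArrayList {rows {header 1}} {
    set result {}
    set names [lindex $rows 0]
    for {set i 0} {$i < [llength $rows]} {incr i} {
        for {set j 0} {$j < [llength $names]} {incr j} {
            if {$header} {
                set key "[lindex $names $j],$i"
            } else {
                set key "$i,$j"
            }
            lappend result $key [lindex $rows $i $j]
        }
    }
    return $result
}
```

Because header-mode keys are built from the header text, two columns with the same header would overwrite each other in the array; the index mode (readCSV $csv 0) avoids that.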

 

Efficiency:
In my tests, a test-case file of 2000 rows × 4 columns took about 100 ms to process.
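Figures like these can be reproduced with Tcl's built-in time command. The sketch below assumes two hypothetical helpers: makeTestCsv writes a synthetic file whose cells all contain a comma, escaped quotes and a line break, and benchmark averages the wall time of any reader that takes an open channel, such as readCSV. Results will vary with the machine and the data.

```tcl
# Hypothetical helpers for timing a CSV reader; not part of readCSV.

# Write a synthetic CSV file: every cell is quoted and contains a comma,
# an escaped double quote pair and an embedded line break.
proc makeTestCsv {path rows cols} {
    set ch [open $path w]
    for {set i 0} {$i < $rows} {incr i} {
        set cells {}
        for {set j 0} {$j < $cols} {incr j} {
            lappend cells "\"r$i c$j, \"\"q\"\"\nline2\""
        }
        puts $ch [join $cells ,]
    }
    close $ch
}

# Average wall time in milliseconds of {*}$reader over $runs runs.
# The reader must accept an open channel as its last argument.
proc benchmark {reader path {runs 5}} {
    set usec [lindex [time {
        set ch [open $path r]
        {*}$reader $ch
        close $ch
    } $runs] 0]
    return [expr {$usec / 1000.0}]
}
```

For example, makeTestCsv c:/bench.csv 2000 4 followed by puts [benchmark readCSV c:/bench.csv] would time readCSV on a 2000 × 4 file (the path is only an illustration).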

-----------------------------------

Test machine:

CPU: Dual-Core 3.20 GHz

Memory: 2 GB

System type: 32-bit

-----------------------------------

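Tcl has a dedicated package for CSV files, called csv (part of Tcllib), so I compared the two. When readCSV is made to return the same list of parsed rows (that is, it returns $final), it is faster than the csv package, and the gap grows with file size (see the table below).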

Usage of the csv package:

package require csv
package require struct::queue

set csv [ open c:/testcase.csv r ]

::struct::queue q
::csv::read2queue $csv q
close $csv
# Fetch all records at once as a list of rows (peek leaves the queue unchanged)
set final [ q peek [ q size ]]

Size (rows × columns)   readCSV   csv package   File size
2000 × 4                103 ms    170 ms        768 KB
2000 × 8                200 ms    335 ms        1534 KB
2000 × 16               382 ms    770 ms        3065 KB
2000 × 32               760 ms    2088 ms       6127 KB
2000 × 64               1501 ms   6411 ms       12252 KB
2000 × 128              2995 ms   21841 ms      24501 KB

Output:

The output matches the content of the CSV file as displayed in Excel.


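Class version (using [incr Tcl]). Note that it defines a command named readCSV, the same name as the proc above, so the two should not be loaded into the same interpreter: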

package require Itcl

itcl::class readCSV {
	# Per-object state; "common" would share one copy among all objects
	variable final {}
	variable anchor 1

	constructor { path } {
		set quote 0
		set row_temp ""
		set channel [ open $path r ]
		set data [ split [ read -nonewline $channel ] "\n" ]
		close $channel
		foreach line $data {
			incr quote [ regexp -all {"} $line ]
			if { $quote % 2 == 0 } {
				# Even quote count: the record is complete
				set quote 0
				append row_temp $line
				set cell {}
				set cell_temp ""
				foreach section [ split $row_temp , ] {
					incr quote [ regexp -all {"} $section ]
					if { $quote % 2 == 0 } {
						append cell_temp $section
						# Strip the enclosing quotes, then unescape "" to "
						regsub {^"(.*)"$} $cell_temp {\1} cell_temp
						lappend cell [ string map {{""} {"}} $cell_temp ]
						set cell_temp ""
						set quote 0
					} else {
						# The comma was inside quotes: put it back
						append cell_temp $section,
					}
				}
				lappend final $cell
				set row_temp ""
			} else {
				# The line break was inside a quoted cell: keep it
				append row_temp $line\n
			}
		}
	}

	# Cell at an absolute position (row 0 is the header row)
	method getCell { row col } {
		return [ lindex $final $row $col ]
	}

	# Cell in column $header of the current row
	method getValue { header } {
		set col [ lsearch -exact [ lindex $final 0 ] $header ]
		return [ getCell $anchor $col ]
	}

	# Move to the next data row (stops at the last row)
	method next { } {
		if { ![ done ] } {
			incr anchor
		}
	}

	# Move to the previous data row (stops at the first data row)
	method pre { } {
		if { $anchor > 1 } {
			incr anchor -1
		}
	}

	# Jump to the last row
	method end { } {
		set anchor [ expr { [ llength $final ] - 1 } ]
	}

	# 1 if the current row is the last row, else 0
	method done { } {
		return [ expr { $anchor >= [ llength $final ] - 1 } ]
	}

	# Go back to the first data row
	method reset { } {
		set anchor 1
	}
}

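Sample CSV data used in the example below, as displayed in Excel (three columns; the Address cells contain commas, double quotes and line breaks):

Name        Age   Address
Zhang_san   13    Address1:
                  1. aaaaa
                  2. aaad "bbbb",
                  3. bacad,
                  adfa"aaa".
Li_si       14    Address2, xxxx
                  aaaa"
                  bbbbb".,
Wang_wu     15    Address3
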
Example:

readCSV f c:/csvfile.csv
f getValue Name
output:

Zhang_san

f next
f getValue Name
output:

Li_si

f pre
f getValue Name
output:

Zhang_san

f end
f getValue Name
output:

Wang_wu

f getCell 1 0
output:

Zhang_san
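If Itcl is not available, the same cursor can be sketched with TclOO, which is built into Tcl 8.6. This swaps in a different object system and is not the class above: CsvCursor is a hypothetical name, and to keep the sketch self-contained it takes rows already parsed into a list of lists (row 0 being the header), such as the $final list readCSV builds, instead of a file path.

```tcl
# Hypothetical TclOO version of the row cursor (requires Tcl 8.6).
# Row 0 holds the headers; the cursor starts on row 1, the first data row.
oo::class create CsvCursor {
    variable rows anchor
    constructor {data} {
        set rows $data
        set anchor 1
    }
    # Cell at an absolute position
    method getCell {row col} {
        return [lindex $rows $row $col]
    }
    # Cell in the named column of the current row
    method getValue {header} {
        set col [lsearch -exact [lindex $rows 0] $header]
        return [my getCell $anchor $col]
    }
    method next {} {
        if {![my done]} { incr anchor }
    }
    method pre {} {
        if {$anchor > 1} { incr anchor -1 }
    }
    method end {} {
        set anchor [expr {[llength $rows] - 1}]
    }
    # 1 when the cursor is on the last row
    method done {} {
        return [expr {$anchor >= [llength $rows] - 1}]
    }
    method reset {} {
        set anchor 1
    }
}
```

With the sample rows, [CsvCursor new $rows] followed by getValue Name, next, pre and end gives the same sequence as the Itcl example above.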
