>> help textscan
TEXTSCAN Read formatted data from text file or string.
C = TEXTSCAN(FID,'FORMAT') reads data from an open text file identified
by FID into cell array C. Use FOPEN to open the file and obtain FID.
The FORMAT is a string of conversion specifiers enclosed in single
quotation marks. The number of specifiers determines the number of
cells in the cell array C. For more information, see "Format Options."
C = TEXTSCAN(FID,'FORMAT',N) reads data from the file, using the FORMAT
N times, where N is a positive integer. To read additional data from
the file after N cycles, call TEXTSCAN again using the original FID.
C = TEXTSCAN(FID,'FORMAT','PARAM',VALUE) accepts one or more
comma-separated parameter name/value pairs. For a list of parameters
and values, see "Parameter Options."
C = TEXTSCAN(FID,'FORMAT',N,'PARAM',VALUE) reads data from the
file, using the FORMAT N times, and using settings specified by pairs
of PARAM/VALUE arguments.
C = TEXTSCAN(STR,...) reads data from string STR. You can use the
FORMAT, N, and PARAM/VALUE arguments described above with this syntax.
However, for strings, repeated calls to TEXTSCAN restart the scan from
the beginning each time. (To restart a scan from the last position,
request a POSITION output. See also Example 3.)
[C, POSITION] = TEXTSCAN(...) returns the file or string position at
the end of the scan as the second output argument. For a file, this is
the value that FTELL(FID) would return after calling TEXTSCAN. For a
string, POSITION indicates how many characters TEXTSCAN read.
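
The restart-versus-resume behavior for strings described above can be sketched as follows. This is a minimal illustration, not part of the help text; the variable names and the sample string are made up for the example.

```matlab
% Hypothetical sample string, used only for illustration.
str = '1 2 3 4 5 6';

% Two identical calls on a string both start from the beginning.
a = textscan(str, '%f', 2);
b = textscan(str, '%f', 2);

% To continue instead, request POSITION and scan the remainder.
[first, pos] = textscan(str, '%f', 2);
rest = textscan(str(pos+1:end), '%f', 2);

disp(a{1}');     % same values as b{1}
disp(rest{1}');  % the next two values
```

For a file identifier the second call would continue from the current file position instead, so no POSITION bookkeeping is needed there.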
Notes:
When TEXTSCAN reads a specified file or string, it attempts to match
the data to the format string. If TEXTSCAN fails to convert a data
field, it stops reading and returns all fields read before the failure.
Format Options:
The FORMAT string is of the form: %<WIDTH>.<PREC><SPECIFIER>
<SPECIFIER> is required; <WIDTH> and <PREC> are optional.
<WIDTH> is the number of characters or digits to read.
<PREC> applies only to the family of %f specifiers, and specifies
the number of digits to read to the right of the decimal point.
Supported values for SPECIFIER:
    Numeric Input Type      Specifier   Output Class
    ------------------      ---------   ------------
    Integer, signed         %d          int32
                            %d8         int8
                            %d16        int16
                            %d32        int32
                            %d64        int64
    Integer, unsigned       %u          uint32
                            %u8         uint8
                            %u16        uint16
                            %u32        uint32
                            %u64        uint64
    Floating-point number   %f          double
                            %f32        single
                            %f64        double
                            %n          double
TEXTSCAN converts numeric fields to the specified output type
according to MATLAB rules regarding overflow, truncation, and the
use of NaN, Inf, and -Inf. For example, MATLAB represents an
integer NaN as zero.
TEXTSCAN imports any complex number as a whole into a complex
numeric field, converting the real and imaginary parts to the
specified type (such as %d or %f). Do not include embedded white
space in a complex number.
    Character Strings   Specifier   Details
    -----------------   ---------   -------------------------
    Strings             %s          String
                        %q          String, possibly double-quoted
                        %c          Single character, including delimiter
    Pattern-matching    %[...]      Read only characters in the brackets,
                                    until the first nonmatching
                                    character. To include ] in the set,
                                    specify it first: %[]...].
                        %[^...]     Read only characters not in the
                                    brackets, until the first matching
                                    character. To exclude ], specify it
                                    first: %[^]...].
For each character (%c) specifier, TEXTSCAN returns a char array.
Other string specifiers return a cell array of strings.
Skipping fields or parts of fields:
    Specifier   Action Taken
    ---------   ------------
    %*...       Skip the field. TEXTSCAN does not create an output cell.
    %*N...      Ignore N characters of the field, where N is an integer
                less than or equal to the number of characters in the
                field.
Alternatively, include literal text to ignore in the specifier.
For example, 'Level%u8' reads 'Level1' as 1.
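
The two ways of discarding text described above (a %* specifier, or literal text in the format) might look like this in practice. The input strings are invented for the example.

```matlab
% Literal text in the format is matched and dropped; only the number is kept.
levels = textscan('Level1 Level2 Level3', 'Level%u8');

% %*s skips every first field, so only the numeric field produces a cell.
vals = textscan('a 1 b 2', '%*s %f');

disp(class(levels{1}));  % class follows the %u8 specifier
disp(numel(vals));       % one output cell, because %*s creates none
```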
TEXTSCAN does not include leading white-space characters in the
processing of any data fields. When processing numeric data,
TEXTSCAN also ignores trailing white space.
If you use the default (white space) field delimiter, TEXTSCAN
interprets repeated white-space characters as a single delimiter.
If you specify a nondefault delimiter, TEXTSCAN interprets repeated
delimiter characters as separate delimiters, and returns an empty
value to the output cell.
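
A small sketch of the delimiter rules above, using made-up input strings: repeated white space collapses under the default delimiter, while a repeated explicit delimiter yields an empty field unless 'MultipleDelimsAsOne' is set.

```matlab
% Default delimiter: the run of spaces counts as one separator.
w = textscan('1    2', '%f');

% Explicit delimiter: the two commas enclose an empty field (NaN by default).
c = textscan('1,,3', '%f %f %f', 'Delimiter', ',');

% Same input, but consecutive commas are merged into one delimiter.
m = textscan('1,,3', '%f', 'Delimiter', ',', 'MultipleDelimsAsOne', true);

disp(w{1}');
disp([c{:}]);
disp(m{1}');
```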
Parameter Options:
    Parameter            Value                                Default
    ---------            -----                                -------
    BufSize              Maximum string length in bytes       4095
    CollectOutput        If true, TEXTSCAN concatenates       0 (false)
                         consecutive output cells with the
                         same data type into a single array.
    CommentStyle         Symbol(s) designating text to        None
                         ignore. Specify a single string
                         (such as '%') to ignore characters
                         following the string on the same
                         line. Specify a cell array of two
                         strings (such as {'/*', '*/'}) to
                         ignore characters between the
                         strings. TEXTSCAN checks for
                         comments only at the start of each
                         field, not within a field.
    Delimiter            Field delimiter character(s)         White space
    EmptyValue           Value to return for empty numeric    NaN
                         fields in delimited files
    EndOfLine            End-of-line character                Determined
                                                              from file:
                                                              \n, \r, or \r\n
    ExpChars             Exponent characters                  'eEdD'
    Headerlines          Number of lines to skip. Includes    0
                         the remainder of the current line.
    MultipleDelimsAsOne  If true, TEXTSCAN treats             0 (false)
                         consecutive delimiters as a single
                         delimiter. Only valid if you
                         specify the 'Delimiter' option.
    ReturnOnError        Determines behavior when TEXTSCAN    1 (true)
                         fails to read or convert. If true,
                         TEXTSCAN terminates without error
                         and returns all fields read. If
                         false, TEXTSCAN terminates with an
                         error and does not return an output
                         cell array.
    TreatAsEmpty         String(s) in the data file to        None
                         treat as an empty value. Can be a
                         single string or cell array of
                         strings. Only applies to numeric
                         fields.
    Whitespace           White-space characters               ' \b\t'
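
As a rough illustration of how a few of these parameters combine, the sketch below reads a small comma-separated string in which 'NA' marks a missing value. The data and the 'NA' marker are assumptions chosen for the example.

```matlab
% Two rows of three numeric fields; 'NA' stands in for a missing value.
str = sprintf('1,NA,3\n4,5,6');

C = textscan(str, '%f %f %f', ...
    'Delimiter', ',', ...
    'TreatAsEmpty', 'NA', ...    % read 'NA' as an empty numeric field
    'CollectOutput', true);      % merge the three double columns into one array

disp(size(C{1}));  % one 2-by-3 array instead of three 2-by-1 columns
disp(C{1});
```

Without 'CollectOutput', the same call would return three separate cells, one per specifier.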
Examples:
Example 1: Read each column of a text file.
Suppose the text file 'mydata.dat' contains the following:
Sally Level1 12.34 45 1.23e10 inf Nan Yes 5.1+3i
Joe Level2 23.54 60 9e19 -inf 0.001 No 2.2-.5i
Bill Level3 34.90 12 2e5 10 100 No 3.1+.1i
Read the file:
fid = fopen('mydata.dat');
C = textscan(fid, '%s%s%f32%d8%u%f%f%s%f');
fclose(fid);
TEXTSCAN returns a 1-by-9 cell array C with the following cells:
C{1} = {'Sally'; 'Joe'; 'Bill'} %class cell
C{2} = {'Level1'; 'Level2'; 'Level3'} %class cell
C{3} = [12.34;23.54;34.9] %class single
C{4} = [45;60;12] %class int8
C{5} = [4294967295; 4294967295; 200000] %class uint32
C{6} = [Inf;-Inf;10] %class double
C{7} = [NaN;0.001;100] %class double
C{8} = {'Yes'; 'No'; 'No'} %class cell
C{9} = [5.1+3.0i; 2.2-0.5i; 3.1+0.1i] %class double
The first two elements of C{5} are the maximum values for a 32-bit
unsigned integer, or intmax('uint32').
Example 2: Read a string, truncating each value to one decimal digit.
str = '0.41 8.24 3.57 6.24 9.27';
C = textscan(str, '%3.1f %*1d');
TEXTSCAN returns a 1-by-1 cell array C:
C{1} = [0.4; 8.2; 3.5; 6.2; 9.2]
Example 3: Resume a text scan of a string.
lyric = 'Blackbird singing in the dead of night';
[firstword, pos] = textscan(lyric,'%9c', 1); %first word
lastpart = textscan(lyric(pos+1:end), '%s'); %remaining text
For additional examples, type "doc textscan" at the command prompt.
=========
Experiments
>> fid=fopen('e:\readme.txt')
fid =
4
>> c{:}
ans =
'Apache'
>> c=textscan(fid,'%10s',2)
c =
{2x1 cell}
>> c{:}
ans =
'Lucene'
'README'
>> fclose(fid)
ans =
0
>>
>> c=textscan(fid,'%7.2f32')
c =
[3x1 single]
>> c{:}
ans =
1.0e+004 *
0.1321
3.2423
0.1000
>>
To skip a field, use a percent sign followed by *
>> fid=fopen('e:\readme.txt')
fid =
7
>> c=textscan(fid,'%7.2f32 %*n')
c =
[2x1 single]
>> fclose(fid)
ans =
0
>> c{:}
ans =
1.0e+003 *
1.3210
0.9999
The contents of readme.txt are:
1321 32423 999.89
The contents of test1.txt are:
资产负债表
2004年12月31日
资 产 期末数 负债及所有者权益 期末数
流动资产: 流动负债:
货币资金 6865234.00 短期借款 120000.00
应收票据 72120.00 应付票据 85500.00
应收账款 38050.00 应付账款 80200.00
减:坏账准备 预收账款
应收账款净额 38050.00 应付工资
其他应收款 应付福利费 8430.00
预付账款 26600.00 应交税金 24420.00
存 货 281950.00 应付股利 34000.00
待摊费用 100.00 其他应付款
流动资产合计 7284054.00 预提费用 1600.00
长期投资: 流动负债合计 354150.00
长期股权投资 长期负债:
固定资产: 长期借款 804800.00
固定资产原价 2370000.00 应付债券
减:累计折旧 41200.00 长期负债合计 804800.00
固定资产净值 2328800.00
所有者权益:
无形资产及其他资产: 实收资本 8900000.00
无形资产 500000.00 资本公积
长期待摊费用 盈余公积 28290.40
其他长期资产 未分配利润 25613.60
无形资产及其他资产合计 500000.00 所有者权益合计 8953904.00
资产总计 10112854.00 负债及所有者权益总计 10112854.00
>> fclose(fid)
ans =
0
>> fid=fopen('e:\test1.txt')
fid =
7
>> c1=textscan(fid,'%s',5)
c1 =
{5x1 cell}
>> c1{:}
ans =
'资产负债表'
'2004年12月31日'
'资'
'产'
'期末数'
>> c1=textscan(fid,'%s',5)
c1 =
{5x1 cell}
>> c1{:}
ans =
'负债及所有者权益'
'期末数'
'流动资产:'
'流动负债:'
' 货币资金'
>> c1=textscan(fid,'%s',5)
c1 =
{5x1 cell}
>> c1{:}
ans =
'6865234.00'
' 短期借款'
'120000.00'
' 应收票据'
'72120.00'
>>
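
The three calls above each pick up where the previous one stopped, because the file identifier keeps its position between calls. A self-contained sketch of the same behavior, using a temporary file with made-up contents instead of e:\test1.txt:

```matlab
% Write a small throwaway file (hypothetical contents).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'a b c d e f');
fclose(fid);

% Read it back two fields at a time; each call continues from the last position.
fid = fopen(fname);
first  = textscan(fid, '%s', 2);
second = textscan(fid, '%s', 2);
fclose(fid);
delete(fname);

disp(first{1}');
disp(second{1}');
```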
>> c1=textscan(fid,'%s %d64 %s %d64 %s %d64',1)
Change the contents of test1.txt to:
货币资金 6865234.00 短期借款 120000.00
应收票据 72120.00 应付票据 85500.00
应收账款 38050.00 应付账款 80200.00
减:坏账准备 预收账款
应收账款净额 38050.00 应付工资
其他应收款 应付福利费 8430.00
预付账款 26600.00 应交税金 24420.00
存 货 281950.00 应付股利 34000.00
待摊费用 100.00 其他应付款
流动资产合计 7284054.00 预提费用 1600.00
Read characters that do not match the characters in the brackets, until the first matching character is encountered:
%[^...]
>> c1=textscan(fid,'%s %[^\n]')
c1 =
{10x1 cell} {10x1 cell}
>> c1{:}
ans =
' 货币资金'
' 应收票据'
' 应收账款'
'减:坏账准备'
'应收账款净额'
' 其他应收款'
' 预付账款'
' 存'
' 待摊费用'
' 流动资产合计'
ans =
'6865234.00 短期借款 120000.00 '
'72120.00 应付票据 85500.00 '
'38050.00 应付账款 80200.00 '
' 预收账款 '
'38050.00 应付工资 '
' 应付福利费 8430.00 '
'26600.00 应交税金 24420.00 '
'货 281950.00 应付股利 34000.00 '
'100.00 其他应付款 '
'7284054.00 预提费用 1600.00 '
>> c1{1}
ans =
' 货币资金'
' 应收票据'
' 应收账款'
'减:坏账准备'
'应收账款净额'
' 其他应收款'
' 预付账款'
' 存'
' 待摊费用'
' 流动资产合计'
>> c1{2}
ans =
'6865234.00 短期借款 120000.00 '
'72120.00 应付票据 85500.00 '
'38050.00 应付账款 80200.00 '
' 预收账款 '
'38050.00 应付工资 '
' 应付福利费 8430.00 '
'26600.00 应交税金 24420.00 '
'货 281950.00 应付股利 34000.00 '
'100.00 其他应付款 '
'7284054.00 预提费用 1600.00 '
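
The '%s %[^\n]' pattern used above splits each line into its first white-space-delimited token and the rest of the line. A reduced version on a made-up two-line string, which can be run without the data file:

```matlab
% Hypothetical two-line input: a label followed by free-form text.
str = sprintf('a 1 2\nb 3 4');

% %s takes the first token; %[^\n] takes everything up to the line break.
C = textscan(str, '%s %[^\n]');

disp(C{1}');  % first token of each line
disp(C{2}');  % remainder of each line
```

This also explains the '存' / '货 281950.00 ...' split in the output above: the label '存 货' contains a space, so %s stops after the first character.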
Change test1.txt to:
货币资金,6865234.00 , 短期借款,120000.00
应收票据,72120.00 , 应付票据,85500.00
应收账款,38050.00 , 应付账款,80200.00
减:坏账准备,, 预收账款,
应收账款净额,38050.00 , 应付工资,
其他应收款,, 应付福利费,8430.00
预付账款,26600.00 , 应交税金,24420.00
存 货,281950.00 , 应付股利,34000.00
待摊费用,100.00 , 其他应付款,
流动资产合计,7284054.00 , 预提费用,1600.00
Observe how empty fields and delimiters behave:
>> fclose(fid)
ans =
0
>> fid=fopen('e:\test1.txt')
fid =
7
>> c1=textscan(fid,'%s %d64 %s %d64','delimiter',',','emptyValue',-Inf)
c1 =
{10x1 cell} [10x1 int64] {10x1 cell} [10x1 int64]
>> c1{:}
ans =
' 货币资金'
' 应收票据'
' 应收账款'
'减:坏账准备'
'应收账款净额'
' 其他应收款'
' 预付账款'
' 存 货'
' 待摊费用'
' 流动资产合计'
ans =
6865234
72120
38050
-9223372036854775808
38050
-9223372036854775808
26600
281950
100
7284054
ans =
' 短期借款'
'应付票据'
' 应付账款'
' 预收账款'
' 应付工资'
' 应付福利费'
' 应交税金'
' 应付股利'
' 其他应付款'
' 预提费用'
ans =
120000
85500
80200
-9223372036854775808
-9223372036854775808
8430
24420
34000
-9223372036854775808
1600
>>
Try again:
>> fid=fopen('e:\test1.txt')
fid =
7
>> c1=textscan(fid,'%s %d64 %s %d64','delimiter',',','emptyValue',NaN)
c1 =
{10x1 cell} [10x1 int64] {10x1 cell} [10x1 int64]
>> c1{:}
ans =
' 货币资金'
' 应收票据'
' 应收账款'
'减:坏账准备'
'应收账款净额'
' 其他应收款'
' 预付账款'
' 存 货'
' 待摊费用'
' 流动资产合计'
ans =
6865234
72120
38050
0
38050
0
26600
281950
100
7284054
ans =
' 短期借款'
'应付票据'
' 应付账款'
' 预收账款'
' 应付工资'
' 应付福利费'
' 应交税金'
' 应付股利'
' 其他应付款'
' 预提费用'
ans =
120000
85500
80200
0
0
8430
24420
34000
0
1600
>>
>> fid=fopen('e:\test1.txt')
fid =
7
>> c1=textscan(fid,'%s %d64 %s %d64','delimiter',',')
c1 =
{10x1 cell} [10x1 int64] {10x1 cell} [10x1 int64]
>> c1{:}
ans =
' 货币资金'
' 应收票据'
' 应收账款'
'减:坏账准备'
'应收账款净额'
' 其他应收款'
' 预付账款'
' 存 货'
' 待摊费用'
' 流动资产合计'
ans =
6865234
72120
38050
0
38050
0
26600
281950
100
7284054
ans =
' 短期借款'
'应付票据'
' 应付账款'
' 预收账款'
' 应付工资'
' 应付福利费'
' 应交税金'
' 应付股利'
' 其他应付款'
' 预提费用'
ans =
120000
85500
80200
0
0
8430
24420
34000
0
1600
>>
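
The three runs above differ only in what the empty fields become. Because the columns are read as int64, the 'EmptyValue' is converted to that class: NaN (the default) turns into 0 and -Inf saturates at the smallest int64. A short sketch of that conversion, with a made-up one-line input; reading with %f is shown as one possible way to keep empty fields distinguishable from real zeros.

```matlab
% Integer classes cannot hold NaN or Inf, so the values are converted.
disp(int64(NaN));    % becomes 0
disp(int64(-Inf));   % saturates at intmin('int64')

% The same empty field read as an integer and as a double.
asInt = textscan('10,,30', '%d64', 'Delimiter', ',');
asDbl = textscan('10,,30', '%f',   'Delimiter', ',');

disp(asInt{1}');     % empty field shows up as 0
disp(asDbl{1}');     % empty field shows up as NaN
```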