12.1 Data Files
Table constructors provide an interesting alternative for file formats. With a
little extra work when writing data, reading becomes trivial. The technique is to
write our data file as Lua code that, when run, builds the data into the program. (In other words, running the file is what loads the data.)
With table constructors, these chunks can look remarkably like a plain data file.
If our data file is in a predefined format, such as CSV (Comma-Separated Values) or XML, we have
little choice: the format is dictated to us. However, if we are going to create the file for our own use, the choice is ours, and we can use Lua constructors as our format.
In this format, we represent each data record as a Lua constructor.
Instead of writing in our data file something like
Donald E. Knuth,Literate Programming,CSLI,1992
Jon Bentley,More Programming Pearls,Addison-Wesley,1990
we write
Entry{"Donald E. Knuth",
"Literate Programming",
"CSLI",
1992}
Entry{"Jon Bentley",
"More Programming Pearls",
"Addison-Wesley",
1990}
Remember that Entry{code} is the same as Entry({code}), that is, a call to
some function Entry with a table as its single argument. (When the only argument of a call is a table constructor, the parentheses can be omitted.)
So, that previous piece of data is a Lua program. (The data has become a program: running it is all it takes to get the data.) To read that file, we only need to run it, with a sensible
definition for Entry. For instance, the following program counts the number of
entries in a data file:
local count = 0
-- Entry must be a global function: the loaded chunk looks up the name
-- Entry as a global, so a local Entry would not be visible when f() runs.
function Entry()
  count = count + 1
end
local f = assert(loadfile("data"))  -- assert raises an error if the file cannot be loaded
f()
print(count)
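As a self-contained variant of the counting program, the sketch below loads the same two entries from a string instead of a file named data, so it can be run as is. The name chunk_text and the use of "loadstring or load" (to cover both Lua 5.1 and later versions) are assumptions made for illustration, not part of the original program. The second entry is written with explicit parentheses to show that both call forms behave the same.

```lua
-- Hypothetical in-memory stand-in for the contents of the "data" file.
local chunk_text = [[
Entry{"Donald E. Knuth", "Literate Programming", "CSLI", 1992}
Entry({"Jon Bentley", "More Programming Pearls", "Addison-Wesley", 1990})
]]

local count = 0

-- Entry must be a global: the loaded chunk resolves the name Entry
-- through the global environment, not through our local variables.
function Entry(record)
  count = count + 1
end

-- loadstring exists in Lua 5.1; load accepts strings from Lua 5.2 on.
local compile = loadstring or load
local f = assert(compile(chunk_text))
f()

print(count)  --> 2
```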
----------------------------
The next program collects in a set the names of all authors found in the file, and
then prints them (not necessarily in the same order as in the file):
local authors = {} -- a set to collect authors
function Entry (b)
authors[b[1]] = true
end
dofile("data")
for name in pairs(authors) do
print(name)
end
-------------------
When file size is not a big concern, we can use name-value pairs for our
representation (see note 1):
Entry{
author = "Donald E. Knuth",
title = "Literate Programming",
publisher = "CSLI",
year = 1992
}
Entry{
author = "Jon Bentley",
title = "More Programming Pearls",
year = 1990,
publisher = "Addison-Wesley",
}
This format is what we call a self-describing data format, because each field carries its own name alongside its value.
Note 1: If this format reminds you of BibTeX, it is not a coincidence. BibTeX was one of the inspirations
for the constructor syntax in Lua.
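With named fields, a reader can pick out exactly the fields it cares about. The following is a minimal sketch of the author-listing program adapted to the name-value format. It assumes the data is held in a string (chunk_text is a hypothetical stand-in for the data file) and it adds one made-up entry without an author to show that missing fields are harmless.

```lua
-- Hypothetical in-memory stand-in for a data file in name-value format.
-- The second entry is made up to illustrate a record with no author.
local chunk_text = [[
Entry{ author = "Donald E. Knuth", title = "Literate Programming",
       publisher = "CSLI", year = 1992 }
Entry{ title = "An Anonymous Report", year = 2001 }
Entry{ year = 1990, author = "Jon Bentley",
       title = "More Programming Pearls", publisher = "Addison-Wesley" }
]]

local authors = {}   -- set of author names

function Entry(b)
  -- Field order is irrelevant, and a missing field simply reads as nil.
  if b.author then
    authors[b.author] = true
  end
end

local compile = loadstring or load
assert(compile(chunk_text))()

-- Sort the names so the output order is deterministic.
local names = {}
for name in pairs(authors) do
  names[#names + 1] = name
end
table.sort(names)
for _, name in ipairs(names) do
  print(name)
end
```

Compared with the positional format, the reader no longer depends on field order, at the cost of a larger file.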
Lua not only runs fast, but it also compiles fast. For instance, the above
program for listing authors processes 1 MB of data in a tenth of a second. This
is not by chance. Data description has been one of the main applications of
Lua since its creation and we took great care to make its compiler fast for large
programs.
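To get a feeling for this on your own machine, a rough timing sketch follows. It generates about 1 MB of made-up entries in memory and measures the CPU time needed to compile and run them with the author-listing Entry function. The generated data, the entry count n, and the use of os.clock are assumptions of this sketch; the timing will vary with hardware and Lua version, so no figure is claimed here.

```lua
-- Build roughly 1 MB of made-up entries in memory (hypothetical data).
local parts = {}
local n = 12000
for i = 1, n do
  parts[i] = string.format(
    'Entry{"Author %d", "A Fairly Long Title Number %d", "Some Publisher", %d}\n',
    i, i, 1900 + i % 100)
end
local text = table.concat(parts)

local authors = {}
local count = 0
function Entry(b)
  authors[b[1]] = true
  count = count + 1
end

local compile = loadstring or load
local start = os.clock()
assert(compile(text))()      -- compile and run the whole chunk
local elapsed = os.clock() - start

print(string.format("%d entries, %d bytes, %.3f s of CPU time",
                    count, #text, elapsed))
```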
12.2 Serialization
We can represent serialized data as Lua code in such a
way that, when we run the code, it reconstructs the saved values into the reading
program.
function serialize (o)
if type(o) == "number" then
io.write(o)
else <other cases>
end
end
By writing a number in decimal format, however, you risk losing some precision.
In Lua 5.2, you can use a hexadecimal format to avoid this problem:
1. Serializing a number:
if type(o) == "number" then
  io.write(string.format("%a", o))
end
2. Serializing a string:
a. Naive attempt: wrap the string in single quotes.
if type(o) == "string" then
  io.write("'", o, "'")  -- writes 'o'; breaks if o contains a single quote, a backslash, or a newline
end
b. Second attempt: wrap the string in long brackets.
if type(o) == "string" then
  io.write("[[", o, "]]")  -- [[...]] is the long-string notation, so quotes inside o are harmless
end
But suppose a malicious user provides o as
 ]]..os.execute('rm *')..[[
The saved chunk then reads
b=[[ ]]..os.execute('rm *')..[[ ]] -- oops: loading this chunk runs the command
c. The robust solution:
A simple way to quote a string in a secure way is with the option “%q” from
the string.format function. It surrounds the string with double quotes and
properly escapes double quotes, newlines, and some other characters inside the
string:
a = 'a "problematic" \\string'
print(string.format("%q", a)) --> "a \"problematic\" \\string"
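The two format options can be checked with a small round trip: write the value as a literal, read it back by compiling it, and compare. The sketch below assumes Lua 5.2 or later (for “%a” and for reading hexadecimal floats); to_literal and from_literal are hypothetical helper names introduced only for this illustration.

```lua
local compile = loadstring or load

-- Serialize a number or a string as a Lua expression (hypothetical helper).
local function to_literal(o)
  if type(o) == "number" then
    return string.format("%a", o)      -- hexadecimal float, needs Lua 5.2+
  elseif type(o) == "string" then
    return string.format("%q", o)
  else
    error("cannot serialize a " .. type(o))
  end
end

-- Read a literal back by compiling and running "return <literal>".
local function from_literal(text)
  return assert(compile("return " .. text))()
end

local x = 0.1
local s = 'a "problematic" \\string with ]] and \n a newline'

print(from_literal(to_literal(x)) == x)  --> true
print(from_literal(to_literal(s)) == s)  --> true
```

Note that the string contains ]] and a quote, which would defeat both naive attempts above, yet it survives the “%q” round trip unchanged.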
Since version 5.1 Lua offers another option to quote arbitrary strings in a
secure way, with the notation [=[...]=] for long strings (so this counts as a second robust solution). However, this new
notation is mainly intended for hand-written code, where we do not want to
change a literal string in any way. In automatically generated code, it is easier
to escape problematic characters, as the option “%q” from string.format does.
If you nevertheless want to use the long-string notation for automatically
generated code, you must take care of some details. The first one is that you must choose a proper number of equal signs.
A good number is one more than the maximum that appears in the original string, that is, one more than the longest run of equal signs found between two closing brackets.
The second detail is that Lua always ignores a newline at the beginning of a long string; a simple way to avoid this problem is to always add a newline to be ignored.
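-- quote finds the longest run of equal signs inside any "]=*]" sequence
-- and wraps the string in long brackets that use one more equal sign.
local function quote(orgstring)
  local equalSignCount = -1
  for s in string.gmatch(orgstring, "]=*]") do
    equalSignCount = math.max(equalSignCount, #s - 2)
  end
  local equalSign = string.rep("=", equalSignCount + 1)
  -- The newline right after the opening bracket is the one Lua ignores.
  -- No other padding is added, so the string is reproduced exactly.
  return string.format("[%s[\n%s]%s]", equalSign, orgstring, equalSign)
end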
abc=[======[
adfasdf
asdfasf
asdfsadf
asdf==
asdf]==]
[====[
asdfsafd]====]
]======];
print(quote(abc));
string.gmatch creates an iterator to traverse all occurrences of the pattern
‘]=*]’, that is, a closing bracket, zero or more equal signs (the * means "zero or more"), and another closing bracket.
string.rep then replicates an equal sign n+1 times, which is
one more than the maximum occurring in the string. Note that rep stands for replicate, not replace.
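A round trip is a convenient way to check a quoting function: compile the quoted text and compare the result with the original string. The sketch below is a variation of my own rather than the function above. It assumes the slightly more conservative pattern "]=*" (any closing bracket followed by equal signs), which may use more equal signs than strictly needed but also covers a string that ends in a closing bracket.

```lua
local compile = loadstring or load

local function quote(s)
  -- Longest run of "=" that directly follows a "]" anywhere in s.
  local n = -1
  for w in string.gmatch(s, "]=*") do
    n = math.max(n, #w - 1)
  end
  -- One more "=" than that, so no closing delimiter can occur inside s.
  local eq = string.rep("=", n + 1)
  -- The newline after the opening bracket is the one Lua will ignore.
  return string.format("[%s[\n%s]%s]", eq, s, eq)
end

-- Compile the quoted form and compare it with the original string.
local function round_trips(s)
  local f = assert(compile("return " .. quote(s)))
  return f() == s
end

print(round_trips("plain text"))                              --> true
print(round_trips("\nleading newline, ]] and ]==] inside"))  --> true
print(round_trips("ends with a bracket ]"))                  --> true
```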
--------
Saving tables without cycles
local function serializeWithoutcyle(s)
  if type(s) == 'number' then
    io.write(string.format("%a", s))
  elseif type(s) == 'string' then
    io.write(string.format("%q", s))
  elseif type(s) == 'table' then
    io.write("{\n")
    for k, v in pairs(s) do
      io.write("  [")
      serializeWithoutcyle(k)
      io.write("] = ")
      serializeWithoutcyle(v)  -- writes directly and returns nothing, so do not pass it to io.write
      io.write(",\n")          -- entries must be separated by commas to form a valid constructor
    end
    io.write("}")
  else
    error("cannot serialize a " .. type(s))
  end
end
serializeWithoutcyle({a=12, b='Lua', key='another "one"'})
io.write("\n")
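To confirm that the serialized text really is loadable Lua, the sketch below collects the output in a table instead of writing it with io.write, then compiles it and inspects the copy. The names serialize, out, original, and copy are hypothetical. Writing integers with "%d" when math.type is available (Lua 5.3+) is my own addition, so that integer values come back as integers rather than floats.

```lua
local compile = loadstring or load

-- Hypothetical variant that appends output pieces to `out`
-- instead of calling io.write directly.
local function serialize(o, out)
  local t = type(o)
  if t == "number" then
    if math.type and math.type(o) == "integer" then
      out[#out + 1] = string.format("%d", o)   -- keep integers exact (Lua 5.3+)
    else
      out[#out + 1] = string.format("%a", o)   -- hexadecimal float (Lua 5.2+)
    end
  elseif t == "string" then
    out[#out + 1] = string.format("%q", o)
  elseif t == "table" then
    out[#out + 1] = "{\n"
    for k, v in pairs(o) do
      out[#out + 1] = "["
      serialize(k, out)
      out[#out + 1] = "] = "
      serialize(v, out)
      out[#out + 1] = ",\n"
    end
    out[#out + 1] = "}"
  else
    error("cannot serialize a " .. t)
  end
end

local original = {a = 12, b = "Lua", key = 'another "one"', {3, 4, 5}}
local out = {}
serialize(original, out)
local text = table.concat(out)

-- Rebuild the table by running the serialized text as an expression.
local copy = assert(compile("return " .. text))()
print(copy.a == 12, copy.b, copy.key, copy[1][3] == 5)
```

This version still recurses forever on a table that contains itself, which is the limitation the next section addresses.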
Saving tables with cycles
a = {x=1, y=2; {3,4,5}}
a[2] = a -- cycle
a.z = a[1] -- shared subtable
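A table like this contains a cycle (a self-reference) and a shared subtable a[1], so we cannot use the function serializeWithoutcyle on it.
Saving it in the form below works: once a = {} has been written, a later a[2] = a can refer to the variable a. A single table constructor can no longer represent the table.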
a = {}
a[1] = {}
a[1][1] = 3
a[1][2] = 4
a[1][3] = 5
a[2] = a
a["y"] = 2
a["x"] = 1
a["z"] = a[1]
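The key point is that we need a variable name for each table. (We rely on the fact that a table can be used as a table key.) Then, when a table item is itself a table that has already been saved, we can assign it through that variable name.
See the implementation below: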
a = {x=1, y=2, {3,4,5}}
a[2] = a -- cycle
a.z = a[1] -- shared subtable
local serializedWithCyle
local basicSerizled
-- basicSerizled returns the Lua literal for a number or a string
basicSerizled = function(s)
  if type(s) == 'number' then
    return tostring(s)
  elseif type(s) == 'string' then
    return string.format("%q", s)
  else
    error("cannot serialize type=" .. type(s))
  end
end
-- saveVarTable maps each table already written to its variable name
serializedWithCyle = function(varName, value, saveVarTable)
  saveVarTable = saveVarTable or {}
  io.write(varName .. "=")
  if type(value) == 'number' or type(value) == 'string' then
    io.write(basicSerizled(value) .. "\n")
  elseif type(value) == 'table' then
    if saveVarTable[value] then  -- already saved: refer to it by name
      io.write(saveVarTable[value] .. "\n")
    else  -- not saved yet
      saveVarTable[value] = varName
      io.write("{};\n")  -- create a new table
      for k, v in pairs(value) do
        local kname = string.format("%s[%s]", varName, basicSerizled(k))
        serializedWithCyle(kname, v, saveVarTable)
      end
    end
  else
    error("cannot serialize type=" .. type(value))  -- report the type of value, not of an undefined name
  end
end
serializedWithCyle("a", a)
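-- If the recursive function above is confusing, look at its output below first. When we meet a table, we record the pair (table, varname) in saveVarTable.
-- We also emit varname={} to create a new table, so that any later item that refers to this table can be assigned through varname.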
a={};
a[1]={};
a[1][1]=3
a[1][2]=4
a[1][3]=5
a[2]=a --a is a table whose variable was already defined above
a["z"]=a[1] --a[1] is a table whose variable was already defined above
a["y"]=2
a["x"]=1
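Because a table can be used as a table key, we can store the variable name that corresponds to each table. When a cycle or a shared reference reaches the same table again, we simply assign through that name. The logic is quite clear once the idea is understood.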
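As a check that the generated assignments rebuild both the cycle and the shared subtable, the following sketch writes them into a buffer, runs them, and inspects the result. The names save, basic, saved, and out are hypothetical. The rebuilt table is stored under the global name t, chosen here only to keep it apart from the original a.

```lua
local compile = loadstring or load

-- Literal for a number or a string; other types are rejected.
local function basic(o)
  if type(o) == "number" then
    return tostring(o)
  elseif type(o) == "string" then
    return string.format("%q", o)
  else
    error("cannot serialize a " .. type(o))
  end
end

-- Append assignments that rebuild `value` under the name `name`.
-- `saved` maps each table already written to the name it was written under.
local function save(name, value, saved, out)
  out[#out + 1] = name .. " = "
  if type(value) == "table" then
    if saved[value] then
      out[#out + 1] = saved[value] .. "\n"   -- reuse the earlier name
    else
      saved[value] = name
      out[#out + 1] = "{}\n"
      for k, v in pairs(value) do
        save(string.format("%s[%s]", name, basic(k)), v, saved, out)
      end
    end
  else
    out[#out + 1] = basic(value) .. "\n"
  end
end

local a = {x = 1, y = 2, {3, 4, 5}}
a[2] = a        -- cycle
a.z = a[1]      -- shared subtable

local out = {}
save("t", a, {}, out)
local text = table.concat(out)

assert(compile(text))()   -- running the text defines the global t
local b = t
print(b[2] == b, b.z == b[1], b[1][3], b.x)  --> true true 5 1
```

Whichever of a[1] and a.z the traversal reaches first receives the new table, and the other is assigned by name, so the result does not depend on the order in which pairs visits the keys.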