Chapter 12 Data Files and Persistence

12.1 Data Files

   Table constructors provide an interesting alternative for file formats. With a
little extra work when writing data, reading becomes trivial. The technique is to
write our data file as Lua code that, when run, builds the data into the program.
With table constructors, these chunks can look remarkably like a plain data file.


If our data file is in a predefined format, such as CSV (Comma-Separated Values) or XML, we have
little choice. However, if we are going to create the file for our own use, we are free to choose, and we can use Lua constructors as our format.

In this format, we represent each data  record as a Lua constructor.


Instead of writing in our data file something like 

Donald E. Knuth,Literate Programming,CSLI,1992
Jon Bentley,More Programming Pearls,Addison-Wesley,1990
we write

Entry{"Donald E. Knuth",
"Literate Programming",
"CSLI",
1992}
Entry{"Jon Bentley",
"More Programming Pearls",
"Addison-Wesley",
1990}



Remember that Entry{code} is the same as Entry({code}), that is, a call to
some function Entry with a table as its single argument. (When the single argument of a call is a table constructor or a literal string, the parentheses can be omitted.)
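
The equivalence of the two call forms can be checked with a small sketch. The Entry function here is a hypothetical stand-in that only counts the fields it receives; it is not part of the data-file programs in this chapter.

```lua
-- Hypothetical Entry: returns how many positional fields the record has.
local function Entry (t)
  return #t
end

-- Call without parentheses: the table constructor is the single argument.
local n1 = Entry{"Donald E. Knuth", "Literate Programming", "CSLI", 1992}
-- The same call written with explicit parentheses.
local n2 = Entry({"Donald E. Knuth", "Literate Programming", "CSLI", 1992})

print(n1, n2)
```

Both calls pass one table with four fields, so both results are 4.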

So the data file shown above is a Lua program. (The data has become a program: as soon as we run it, we have the data.) To read that file, we only need to run it, with a sensible
definition for Entry. For instance, the following program counts the number of
entries in a data file:


local count = 0

-- Entry must be global, not local: the chunk loaded from the data file
-- looks up Entry as a global name when f() runs.
function Entry ()
  count = count + 1
end

-- loadfile returns nil plus an error message on failure; assert reports it
local f = assert(loadfile("data"))
f()
print(count)


----------------------------


The next program collects in a set the names of all authors found in the file, and
then prints them (not necessarily in the same order as in the file):

local authors = {} -- a set to collect authors
function Entry (b)
  authors[b[1]] = true
end
dofile("data")
for name in pairs(authors) do
  print(name)
end

-------------------

When file size is not a big concern, we can use name-value pairs for our
representation [1]:
Entry{
author = "Donald E. Knuth",
title = "Literate Programming",
publisher = "CSLI",
year = 1992
}
Entry{
author = "Jon Bentley",
title = "More Programming Pearls",
year = 1990,
publisher = "Addison-Wesley",
}

This format is what we call a self-describing data format, because each piece of data carries a short description of its meaning.

[1] If this format reminds you of BibTeX, it is not a coincidence. BibTeX was one of the inspirations
for the constructor syntax in Lua.
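
With name-value pairs, the reading program can refer to fields by name and can tolerate records that lack a field. The following is a minimal sketch under some assumptions: the data is loaded from a string instead of a file so that the example is self-contained, the third record is a made-up one added only to show a missing author, and load_chunk is a hypothetical helper name that picks loadstring (Lua 5.1) or load (Lua 5.2 and later).

```lua
local load_chunk = loadstring or load  -- hypothetical helper name

local data = [[
Entry{ author = "Donald E. Knuth", title = "Literate Programming",
       publisher = "CSLI", year = 1992 }
Entry{ author = "Jon Bentley", title = "More Programming Pearls",
       year = 1990, publisher = "Addison-Wesley" }
Entry{ title = "An illustrative record with no author" }
]]

local authors = {}   -- a set to collect authors

-- Global on purpose: the loaded chunk looks up Entry as a global name.
function Entry (b)
  if b.author then authors[b.author] = true end
end

assert(load_chunk(data))()

local count = 0
for name in pairs(authors) do
  print(name)
  count = count + 1
end
```

Field order inside a record does not matter, and the record without an author is simply skipped.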

Lua not only runs fast, but it also compiles fast. For instance, the above
program for listing authors processes 1 MB of data in a tenth of a second. This
is not by chance. Data description has been one of the main applications of
Lua since its creation, and we took great care to make its compiler fast for large
programs.


12.2 Serialization

We can represent serialized data as Lua code in such a
way that, when we run the code, it reconstructs the saved values into the reading
program.


function serialize (o)
  if type(o) == "number" then
    io.write(o)
  else
    -- other cases
  end
end

By writing a number in decimal format, however, you risk losing some precision.
In Lua 5.2, you can use a hexadecimal format to avoid this problem:

1. Serializing numbers

if type(o) == "number" then
  io.write(string.format("%a", o))
end
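
The difference between the two number formats can be seen in a small round trip. This sketch assumes Lua 5.2 or later, where "%a" is accepted by string.format and tonumber reads hexadecimal floats, and it assumes the usual default number format "%.14g", which is what io.write applies to a float.

```lua
local x = 1 / 3

local dec = string.format("%.14g", x)  -- the default decimal rendering
local hex = string.format("%a", x)     -- hexadecimal floating-point rendering

-- Reading the decimal text back gives a slightly different number;
-- reading the hexadecimal text back gives exactly the original.
print(dec, tonumber(dec) == x)
print(hex, tonumber(hex) == x)
```

The hexadecimal form is harder for humans to read, so it is a trade-off between exactness and readability of the saved file.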

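2. Serializing strings

a. Naive approach: single quotes

if type(o) == "string" then
  io.write("'", o, "'")
end

This wraps o in single quotes to mark it as a string, producing something like b='o'. It is risky: if o contains a single quote, a newline, or a backslash, the resulting code is invalid or reads back as a different string.

b. Long-string brackets

if type(o) == "string" then
  io.write("[[", o, "]]")
end

The [[...]] notation is Lua's long-string syntax, which does not care whether the string contains ' or ". However, suppose a malicious user provides o as

]]..os.execute('rm *')..[[

Then the saved chunk becomes

b=[[ ]]..os.execute('rm *')..[[ ]]   -- Oops: loading the file runs the command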

c. The robust solution


A simple way to quote a string in a secure way is with the option “%q” from
the string.format function. It surrounds the string with double quotes and
properly escapes double quotes, newlines, and some other characters inside the
string:
a = 'a "problematic" \\string'
print(string.format("%q", a)) --> "a \"problematic\" \\string"
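
A quick way to gain confidence in “%q” is a round trip: quote a string, load the result as Lua code, and compare with the original. This is a minimal sketch; load_chunk is a hypothetical helper name that picks loadstring (Lua 5.1) or load (Lua 5.2 and later), and the sample string is made up to include quotes, a backslash, a newline, and closing brackets.

```lua
local load_chunk = loadstring or load  -- hypothetical helper name

local original = 'a "problematic" \\string\nwith a second line and ]] brackets'

-- Build a chunk that returns the quoted string, then run it.
local code = "return " .. string.format("%q", original)
local restored = assert(load_chunk(code))()

print(restored == original)   --> true
```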


Since version 5.1 Lua offers another option to quote arbitrary strings in a
secure way, with the notation [=[...]=] for long strings. (So this counts as a second robust approach.) However, this new
notation is mainly intended for hand-written code, where we do not want to
change a literal string in any way. In automatically generated code, it is easier
to escape problematic characters, as the option “%q” from string.format does.


If you nevertheless want to use the long-string notation for automatically
generated code, you must take care of some details. The first one is that you must choose a proper number of equal signs.

A good number is one more than the maximum that appears in the original string, that is, one more than the longest run of equal signs found between two closing brackets.

The second detail is that Lua always ignores a newline at the beginning of a long string; a simple way to avoid losing a real leading newline is to always add a newline to be ignored.


-- quote finds the longest run of equal signs inside a "]=*]" sequence in the
-- original string and wraps the string in long brackets with one more equal sign.
local function quote(orgstring)
    local equalSignCount = -1
    for s in string.gmatch(orgstring, "]=*]") do
        equalSignCount = math.max(equalSignCount, #s - 2)   -- -2 drops the two ']'
    end
    local equalSign = string.rep("=", equalSignCount + 1)
    -- the newline right after the opening bracket is the one Lua will ignore
    return string.format("[%s[\n%s]%s]", equalSign, orgstring, equalSign)
end

local abc = [======[
adfasdf
asdfasf
asdfsadf
asdf==
asdf]==]
[====[
asdfsafd]====]
]======]

print(quote(abc))


string.gmatch creates an iterator to traverse all occurrences of the pattern
‘]=*]’ in the string; the * means zero or more equal signs between the two brackets.

string.rep replicates an equal sign n+1 times, which is
one more than the maximum occurring in the string. Note that rep stands for replicate, not replace.
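
The two details (the number of equal signs and the ignored leading newline) can be checked with a round trip. The sketch below repeats a compact version of quote so that it runs on its own; load_chunk is a hypothetical helper name that picks loadstring (Lua 5.1) or load (Lua 5.2 and later), and the sample strings are made up for illustration.

```lua
local load_chunk = loadstring or load  -- hypothetical helper name

local function quote (s)
  local n = -1
  for w in string.gmatch(s, "]=*]") do
    n = math.max(n, #w - 2)            -- count the '=' between the brackets
  end
  local eq = string.rep("=", n + 1)    -- one more than the maximum found
  return string.format("[%s[\n%s]%s]", eq, s, eq)
end

local samples = {
  "plain text",
  "contains ]] and ]==] delimiters",
  "\nstarts with a newline",           -- the extra newline protects this one
}

local allEqual = true
for _, s in ipairs(samples) do
  local restored = assert(load_chunk("return " .. quote(s)))()
  if restored ~= s then allEqual = false end
end
print(allEqual)
```

The third sample shows why the extra newline matters: without it, Lua would drop the string's own leading newline when reading the literal back.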



--------

Saving tables without cycles


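-- Writes s as a Lua expression. Tables must not contain cycles.
local function serializeWithoutcyle(s)
  if type(s) == 'number' then
    io.write(string.format("%a", s))
  elseif type(s) == 'string' then
    io.write(string.format("%q", s))
  elseif type(s) == 'table' then
    io.write("{\n")
    for k, v in pairs(s) do
      io.write("  [")
      serializeWithoutcyle(k)
      io.write("] = ")
      serializeWithoutcyle(v)
      io.write(",\n")   -- fields must be separated, or the output is not valid Lua
    end
    io.write("}")
  else
    error("cannot serialize a " .. type(s))
  end
end

-- the function writes its output itself and returns nothing, so we do not print it
serializeWithoutcyle({a=12, b='Lua', key='another "one"'})
io.write("\n")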
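
To confirm that a cycle-free table survives saving and loading, it helps to have a variant that returns the text instead of writing it. The following is only a sketch: serializeToString and load_chunk are hypothetical names introduced here, the nested subtable is added for illustration, and "%a" assumes Lua 5.2 or later.

```lua
local load_chunk = loadstring or load  -- hypothetical helper name

-- Hypothetical variant: builds and returns the serialized text.
local function serializeToString (o)
  local t = type(o)
  if t == "number" then
    return string.format("%a", o)
  elseif t == "string" then
    return string.format("%q", o)
  elseif t == "table" then
    local parts = {}
    for k, v in pairs(o) do
      parts[#parts + 1] = string.format("[%s] = %s",
        serializeToString(k), serializeToString(v))
    end
    return "{" .. table.concat(parts, ", ") .. "}"
  else
    error("cannot serialize a " .. t)
  end
end

local original = {a = 12, b = 'Lua', key = 'another "one"', {1, 2.5}}
local copy = assert(load_chunk("return " .. serializeToString(original)))()
print(copy.a == 12, copy.b, copy.key, copy[1][2])
```

Like the writing version, this variant recurses forever on a table that contains itself, which is the problem the next section addresses.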



Saving tables with cycles

a = {x=1, y=2; {3,4,5}}
a[2] = a -- cycle
a.z = a[1] -- shared subtable 

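A table like this has a cycle (it refers to itself) and a shared subtable, a[1], so we cannot use the function serializeWithoutcyle: it would recurse forever.

Saving it in the form below works. Because a = {} comes first, the variable a already exists by the time we reach a[2] = a. Such a table can no longer be written as a single table constructor.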

a = {}
a[1] = {}
a[1][1] = 3
a[1][2] = 4
a[1][3] = 5
a[2] = a  
a["y"] = 2
a["x"] = 1
a["z"] = a[1]

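The key point is that we need a variable name for each table. (We rely on the fact that a table can be used as a key in another table.) Then, when an item of a table is itself a table that has already been saved, we can assign it through that variable name.

See the implementation below: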



a = {x=1, y=2, {3,4,5}}
a[2] = a -- cycle
a.z = a[1] -- shared subtable  

local serializedWithCyle
local basicSerizled

-- Serialize a number or a string; anything else is an error.
basicSerizled = function(s)
    if type(s) == 'number' then
        return tostring(s)
    elseif type(s) == 'string' then
        return string.format("%q", s)
    else
        error("cannot serialize a value of type " .. type(s))
    end
end

-- Write assignments that rebuild 'value' under the name 'varName'.
-- saveVarTable maps each table already saved to the name it was saved under.
serializedWithCyle = function(varName, value, saveVarTable)
    saveVarTable = saveVarTable or {}
    io.write(varName .. "=")
    if type(value) == 'number' or type(value) == 'string' then
        io.write(basicSerizled(value) .. "\n")
    elseif type(value) == 'table' then
        if saveVarTable[value] then
            -- already saved: refer to it by the recorded name
            io.write(saveVarTable[value] .. "\n")
        else
            saveVarTable[value] = varName   -- record the name for later references
            io.write("{};\n")               -- create a new table
            for k, v in pairs(value) do
                local kname = string.format("%s[%s]", varName, basicSerizled(k))
                serializedWithCyle(kname, v, saveVarTable)
            end
        end
    else
        error("cannot serialize a value of type " .. type(value))
    end
end

serializedWithCyle("a", a)


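If the recursive function above is confusing, look at the output below first. Whenever we meet a table, we record the pair (table, variable name) in the bookkeeping table.

At the same time we emit a new table, varname={}, so that if a later item refers to this table, we can assign it using that variable name.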


a={};

a[1]={};
a[1][1]=3
a[1][2]=4
a[1][3]=5
a[2]=a  -- a is a table whose variable was already defined above
a["z"]=a[1] -- a[1] is a table whose variable was already defined above
a["y"]=2
a["x"]=1

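Because a table can be used as a key in another table, we can store the variable name that corresponds to each table. When a later reference (a cycle or a shared subtable) reaches that table again, we assign through the stored name. The logic is straightforward once the idea is clear.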
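
The idea can be verified end to end by collecting the generated assignments in a buffer, loading them, and checking that the cycle and the shared subtable are rebuilt. This is a sketch under assumptions: basic, save, and load_chunk are hypothetical names, and the generated chunk is wrapped with "local a" and "return a" so that it can be loaded from a string.

```lua
local load_chunk = loadstring or load  -- hypothetical helper name

local function basic (v)
  if type(v) == "number" then return tostring(v)
  elseif type(v) == "string" then return string.format("%q", v)
  else error("cannot serialize a " .. type(v)) end
end

-- Hypothetical variant: appends one assignment per line to 'out'.
-- 'saved' maps each table already emitted to the name it was emitted under.
local function save (name, value, saved, out)
  if type(value) == "table" then
    if saved[value] then
      out[#out + 1] = name .. " = " .. saved[value]
    else
      saved[value] = name
      out[#out + 1] = name .. " = {}"
      for k, v in pairs(value) do
        save(string.format("%s[%s]", name, basic(k)), v, saved, out)
      end
    end
  else
    out[#out + 1] = name .. " = " .. basic(value)
  end
end

local a = {x = 1, y = 2, {3, 4, 5}}
a[2] = a       -- cycle
a.z = a[1]     -- shared subtable

local out = {"local a"}
save("a", a, {}, out)
out[#out + 1] = "return a"

local b = assert(load_chunk(table.concat(out, "\n")))()
print(b[2] == b, b.z == b[1], b[1][3], b.x, b.y)   --> true true 5 1 2
```

The order of the generated assignments depends on the traversal order of pairs, but the rebuilt structure is the same in every case.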



















 

















