My Python Learning Journey: Day 5

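What I studied:

1. Introduction to modules

2. The time & datetime modules

3. The random module

4. The os module

5. The sys module

6. The shutil module

7. The json & pickle modules

8. The shelve module

9. XML processing

10. YAML processing

11. The configparser module

12. The hashlib module

13. The subprocess module

14. The logging module

15. re: regular expressions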

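1. Introduction to modules

Definition:

  A module is a way to organize Python code logically (variables, functions, classes, and the logic that implements a feature). In essence it is a Python file ending in .py (for example, the file test.py corresponds to the module named test).
Definition of a package: a package is a way to organize modules logically. In essence it is a directory, conventionally one that contains an __init__.py file (since Python 3.3 a directory without one can still be imported as a namespace package).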

Ways to import:

import module_name
import module1_name, module2_name
from module_name import *
from module_name import m1, m2, m3

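What import really does (finding the path, and the search path)

Importing a module essentially means running the Python file through the interpreter once. For example:

import module_name ---> module_name.py ---> the path of module_name.py ---> sys.path

In other words, the interpreter has to locate module_name.py, and it looks for it in the directories listed in sys.path.

Importing a package essentially means executing the __init__.py file in that package.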
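To make the search-path idea concrete, here is a minimal sketch. It assumes a throwaway module named demo_mod_day5 (a hypothetical name, not from the notes above) written into a temporary directory; the directory is added to sys.path so the import can find the file. The first import runs the file, and the second is served from the sys.modules cache.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name and contents, used only for this demonstration.
MODULE_NAME = "demo_mod_day5"
MODULE_SOURCE = "print('demo_mod_day5 is being executed')\nGREETING = 'hello'\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write the module file into a directory that is not yet on sys.path.
    with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
        f.write(MODULE_SOURCE)

    sys.path.insert(0, tmp_dir)  # make the directory searchable
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module(MODULE_NAME)   # runs the file once
        again = importlib.import_module(MODULE_NAME)  # served from sys.modules
    finally:
        sys.path.remove(tmp_dir)

greeting = demo.GREETING
same_object = demo is again
print(greeting, same_object)
```

The message from the module body is printed only once, even though the module is imported twice.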

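Import optimization

  from module_test import test

Binding the name directly means that calling test() does not repeat the module_test.test attribute lookup on every call, which helps when the function is called many times.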

Kinds of modules

  A. The standard library

  B. Open-source (third-party) modules

  C. Your own modules

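2. The time and datetime modules

  In Python there are usually three ways to represent a time:
    1) a timestamp
    2) a formatted time string
    3) a tuple (struct_time) with nine elements.
  Because the time module is implemented mainly by calling the C library, behavior may differ between platforms.
UTC (Coordinated Universal Time) is the world's standard time; for everyday purposes it matches Greenwich Mean Time.
  China is at UTC+8.
DST (Daylight Saving Time) is summer time.
  The timestamp form:
  Generally speaking, a timestamp is the offset in seconds from 00:00:00 UTC on January 1, 1970.
  Running "type(time.time())" returns float. The main function that returns a timestamp is time(). (Older tutorials also list clock(), but it did not return the current timestamp and was removed in Python 3.8.)
  The tuple (struct_time) form: a struct_time has 9 elements. The main functions that return a struct_time are gmtime(), localtime(), and strptime().
  The elements are, in order: tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_wday, tm_yday, tm_isdst.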

The time module:

>>> import time
>>> time.time()  # return the current timestamp
1522035652.215034
>>> time.localtime() # return the local time as a struct_time object
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=11, tm_min=45, tm_sec=8, tm_wday=0, tm_yday=85, tm_isdst=0)
>>>
>>> time.gmtime()   # convert the current timestamp to UTC
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=3, tm_min=45, tm_sec=54, tm_wday=0, tm_yday=85, tm_isdst=0)
>>> time.localtime() # the current local time, here UTC+8
time.struct_time(tm_year=2018, tm_mon=3, tm_mday=26, tm_hour=11, tm_min=46, tm_sec=2, tm_wday=0, tm_yday=85, tm_isdst=0)
>>> x = time.localtime()
>>> time.mktime(x)  # struct_time to timestamp
1522036050.0
>>>
>>> time.strftime('%Y-%m-%d %H:%M:%S',time.localtime())  # struct_time to formatted string
'2018-03-26 11:48:14'
>>>
>>> time.strptime('2016-08-23 16:06:54','%Y-%m-%d %H:%M:%S') # formatted string to struct_time
time.struct_time(tm_year=2016, tm_mon=8, tm_mday=23, tm_hour=16, tm_min=6, tm_sec=54, tm_wday=1, tm_yday=236, tm_isdst=-1)
>>>

strftime("format", struct_time) ---> "formatted string"
strptime("formatted string", "format") ---> struct_time

>>> time.asctime()
'Mon Mar 26 11:50:02 2018'
>>> time.ctime()
'Mon Mar 26 11:50:10 2018'
>>> time.ctime(1522035652.215034) # timestamp to the fixed asctime-style format
'Mon Mar 26 11:40:52 2018'
>>>

The datetime module:

>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2018, 3, 26, 12, 22, 37, 766518)
>>> print(datetime.datetime.now())
2018-03-26 12:22:45.683983
>>> print(datetime.datetime.now()+datetime.timedelta(+3))      # three days from now
2018-03-29 12:23:08.399578
>>> print(datetime.datetime.now()+datetime.timedelta(-3))      # three days ago
2018-03-23 12:23:11.624363
>>> print(datetime.datetime.now()+datetime.timedelta(hours=3)) # three hours from now
2018-03-26 15:23:20.175275
>>> print(datetime.datetime.now()+datetime.timedelta(hours=-3))# three hours ago
2018-03-26 09:23:27.564384
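The same arithmetic is easier to check against a fixed datetime than against now(). The sketch below assumes a hard-coded sample moment (my choice, taken from the date used in the session above) and also shows that subtracting two datetimes yields a timedelta.

```python
import datetime

# A fixed sample moment instead of now(), so the results are reproducible.
base = datetime.datetime(2018, 3, 26, 12, 22, 37)

three_days_later = base + datetime.timedelta(days=3)
three_hours_ago = base - datetime.timedelta(hours=3)

# Subtracting two datetimes gives a timedelta.
gap = three_days_later - three_hours_ago

label = three_days_later.strftime("%Y-%m-%d %H:%M:%S")
print(label, three_hours_ago, gap)
```

Passing keyword arguments such as days=3 reads more clearly than the positional timedelta(+3), although both mean three days.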

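Format reference

%a    locale's abbreviated weekday name
%A    locale's full weekday name
%b    locale's abbreviated month name
%B    locale's full month name
%c    locale's date and time representation
%d    day of the month (01 - 31)
%H    hour of the day (24-hour clock, 00 - 23)
%I    hour (12-hour clock, 01 - 12)
%j    day of the year (001 - 366)
%m    month (01 - 12)
%M    minute (00 - 59)
%p    locale's equivalent of AM or PM
%S    second (00 - 61)
%U    week number of the year (00 - 53, with Sunday as the first day of the week); all days before the first Sunday are in week 0
%w    day of the week (0 - 6, where 0 is Sunday)
%W    mostly the same as %U, except that %W treats Monday as the first day of the week
%x    locale's date representation
%X    locale's time representation
%y    year without the century (00 - 99)
%Y    full year
%Z    time zone name (empty string if there is no time zone)
%%    a literal '%' character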
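A few of the locale-independent directives can be tried on a fixed date. This is only a sketch; the sample moment is the Monday that appears in the time examples above.

```python
import datetime

moment = datetime.datetime(2018, 3, 26, 11, 48, 14)  # a Monday

numeric = moment.strftime("%Y-%m-%d %H:%M:%S")  # full date and time
day_of_year = moment.strftime("%j")             # zero-padded day of the year
weekday_number = moment.strftime("%w")          # 0 is Sunday, so Monday is 1
twelve_hour = moment.strftime("%I")             # hour on the 12-hour clock
short_year = moment.strftime("%y")              # year without the century
literal_percent = moment.strftime("100%%")      # %% produces a single %

print(numeric, day_of_year, weekday_number, twelve_hour, short_year, literal_percent)
```

Directives such as %a, %b, %c, %x, and %X are left out here because their output depends on the locale.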

Time conversion relationships

[Figures 1 and 2: diagrams of the time conversion relationships; images not preserved]
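The conversions covered in this section can be chained into a round trip. This is a sketch under the assumption of an arbitrary sample timestamp; it only uses functions shown earlier (localtime, mktime, strftime, strptime, gmtime).

```python
import time

FMT = "%Y-%m-%d %H:%M:%S"
timestamp = 1522035652  # arbitrary sample value

local_struct = time.localtime(timestamp)  # timestamp -> struct_time (local time)
round_trip = time.mktime(local_struct)    # struct_time -> timestamp

text = time.strftime(FMT, local_struct)   # struct_time -> formatted string
parsed = time.strptime(text, FMT)         # formatted string -> struct_time

utc_text = time.strftime(FMT, time.gmtime(0))  # the epoch itself, in UTC

print(text, round_trip, utc_text)
```

The value of text depends on the machine's time zone, while utc_text is the same everywhere because gmtime() always works in UTC.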

3. The random module

Random floats

>>> import random
>>> random.random()
0.38741916300777435
>>> random.random()
0.2726009482506605
>>> random.random()
0.8928518510787847
>>> random.random()
0.12703455294635024
>>> random.random()
0.054001403811667514

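Random integers (and floats in a range)

>>> random.randint(1,9) # random integer from 1 to 9, both ends included
3
>>> random.uniform(1,9) # random float within the given interval
6.532363738442411
>>> random.randrange(0,5) # random integer from range(0, 5): 0 to 4, 5 is excluded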
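The difference between the two integer functions is easy to get wrong, so here is a small check. It is a sketch that assumes a seeded random.Random instance (the seed value is arbitrary) so that repeated runs draw the same numbers.

```python
import random

rng = random.Random(5)  # fixed, arbitrary seed so reruns give the same draws

# Collect the distinct values each function produces over many draws.
randint_values = {rng.randint(1, 9) for _ in range(2000)}
randrange_values = {rng.randrange(0, 5) for _ in range(2000)}
uniform_values = [rng.uniform(1, 9) for _ in range(2000)]

print(sorted(randint_values))    # includes both 1 and 9
print(sorted(randrange_values))  # stops at 4, never 5
```

randint(a, b) includes both endpoints, while randrange(a, b) follows range() and excludes b.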

Picking a random element

>>> random.choice('csqzyy')
'y'

Taking a random sample of a given length from a sequence

>>> random.sample('csqzyy', 5)
['c', 'z', 'y', 's', 'y']

Shuffling

import random
items = [1,2,3,4,5,6,7]
print(items) # [1, 2, 3, 4, 5, 6, 7]
random.shuffle(items)  # shuffles the list in place
print(items) # for example [1, 4, 7, 2, 5, 3, 6]

Exercise: generate a random code

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
# Author: cs
# Generate a 4-character random verification code
import random
checkcode = ""
for i in range(4):
    current = random.randrange(0, 4)  # random number to compare with the loop index
    current1 = random.randrange(0, 4)
    if current == i:
        tmp = chr(random.randint(65, 90))  # 65-90 are A-Z in the ASCII table
    elif current1 == i:
        tmp = chr(random.randint(97, 122))   # 97-122 are a-z in the ASCII table
    else:
        tmp = random.randint(0, 9)
    checkcode += str(tmp)
print(checkcode)

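Note: I had originally named this Python file "random" (random.py). Running it can then fail with AttributeError: module 'random' has no attribute 'randrange', because import random picks up the script itself instead of the standard-library module. Renaming the file to "random1" fixed it.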
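As a possible alternative to the loop above, the code can be drawn directly from a combined alphabet. This is a sketch, not the exercise's original approach: make_code is a hypothetical helper name, and it relies on random.choices (Python 3.6+) and the string module.

```python
import random
import string

def make_code(length=4):
    """Return a random code of `length` characters drawn from A-Z, a-z, 0-9."""
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choices(alphabet, k=length))

code = make_code()
print(code)
```

Every character here has the same chance of being a given letter or digit, whereas the loop version favors digits. For codes that guard anything sensitive, the standard-library secrets module is the safer choice.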

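4. The os module

Provides an interface for calling into the operating system

os.getcwd()  Get the current working directory, i.e. the directory the Python script is working in
os.chdir("dirname")  Change the script's current working directory; like cd in a shell
os.curdir  The string for the current directory: ('.')
os.pardir  The string for the parent of the current directory: ('..')
os.makedirs('dirname1/dirname2')    Create nested directories recursively
os.removedirs('dirname1')    Remove the directory if it is empty, then move up to the parent and remove it too if it is empty, and so on
os.mkdir('dirname')    Create a single directory; like mkdir dirname in a shell
os.rmdir('dirname')    Remove a single empty directory; raises an error if the directory is not empty; like rmdir dirname in a shell
os.listdir('dirname')    Return a list of all files and subdirectories in the given directory, including hidden files
os.remove()  Delete a file
os.rename("oldname","newname")  Rename a file or directory
os.stat('path/filename')  Get information about a file or directory
os.sep    The OS-specific path separator: "\\" on Windows, "/" on Linux
os.linesep    The line terminator of the current platform: "\r\n" on Windows, "\n" on Linux
os.pathsep    The string that separates entries in a search path such as PATH: ";" on Windows, ":" on Linux
os.name    A string naming the current platform: Windows -> 'nt'; Linux -> 'posix'
os.system("bash command")  Run a shell command; its output goes straight to the terminal
os.environ  The system environment variables
os.path.abspath(path)  Return the normalized absolute version of path
os.path.split(path)  Split path into a (directory, file name) pair
os.path.dirname(path)  Return the directory part of path, i.e. the first element of os.path.split(path)
os.path.basename(path)  Return the final file name in path, i.e. the second element of os.path.split(path). If path ends with a path separator, the result is an empty string
os.path.exists(path)  Return True if path exists, False if it does not
os.path.isabs(path)  Return True if path is an absolute path
os.path.isfile(path)  Return True if path is an existing file, False otherwise
os.path.isdir(path)  Return True if path is an existing directory, False otherwise
os.path.join(path1[, path2[, ...]])  Join several path components and return the result; components before the last absolute path are discarded
os.path.getatime(path)  Return the last access time of the file or directory that path points to
os.path.getmtime(path)  Return the last modification time of the file or directory that path points to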
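Several of these calls can be exercised together inside a temporary directory. The following is a sketch; the directory and file names (dirname1, dirname2, notes.txt) are placeholders chosen for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    previous_cwd = os.getcwd()
    os.chdir(root)  # work with relative paths inside the temp directory
    try:
        nested = os.path.join("dirname1", "dirname2")
        os.makedirs(nested)  # creates both levels at once

        file_path = os.path.join(nested, "notes.txt")
        with open(file_path, "w") as f:
            f.write("hello")

        listing = os.listdir(nested)
        directory, filename = os.path.split(file_path)
        is_file = os.path.isfile(file_path)

        os.remove(file_path)
        os.removedirs(nested)  # removes dirname2, then the now-empty dirname1
        still_there = os.path.exists("dirname1")
    finally:
        os.chdir(previous_cwd)  # always restore the working directory

print(listing, directory, filename, is_file, still_there)
```

Building paths with os.path.join instead of hard-coding "/" or "\\" keeps the code portable across platforms.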

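5. The sys module

sys.argv           List of command-line arguments; the first element is the path of the program itself
sys.exit(n)        Exit the program; use exit(0) for a normal exit
sys.version        Version information of the Python interpreter
sys.maxsize        The largest value of the platform's Py_ssize_t type, usually 2**31 - 1 or 2**63 - 1 (Python 2 had sys.maxint, the largest int value; it was removed in Python 3)
sys.path           The module search path; initialized from the PYTHONPATH environment variable plus installation-dependent defaults
sys.platform       The name of the operating-system platform
sys.stdout.write('please:')
val = sys.stdin.readline()[:-1]    # read one line from standard input and drop the trailing newline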
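A short sketch of how these attributes are typically used. The describe helper and the sample command line are hypothetical; a real script would pass sys.argv instead of the hard-coded list.

```python
import sys

def describe(argv):
    """Split an argv-style list into the script path and its arguments."""
    script, args = argv[0], argv[1:]
    return script, args

# Hypothetical command line; a real script would call describe(sys.argv).
script, args = describe(["demo.py", "--verbose", "data.txt"])

sys.stdout.write("script=%s args=%s\n" % (script, args))
sys.stdout.write("platform=%s maxsize=%d\n" % (sys.platform, sys.maxsize))
```

Unlike print(), sys.stdout.write() does not append a newline, which is why the strings end with "\n".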

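6. The shutil module

A high-level module for working with files, folders, and archives. The function bodies quoted in this section are excerpts from the Python 2.7 standard-library source, so they use Python 2 syntax (for example "except OSError, why") and are shown for reference only.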

shutil.copyfileobj(fsrc, fdst[, length])
Copy the contents of one file object into another, starting at fsrc's current position; length is the size of the chunk read on each pass

def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
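Because copyfileobj works on any file-like object, it can be tried without touching the disk. This sketch assumes in-memory io.BytesIO objects and made-up sample data.

```python
import io
import shutil

payload = b"0123456789" * 1000  # 10,000 bytes of sample data
source = io.BytesIO(payload)
target = io.BytesIO()

# Copy in 1 KiB chunks; `length` only sets the read size, not how much is copied.
shutil.copyfileobj(source, target, length=1024)
full_copy = target.getvalue()

# The copy starts at the source's current position, so seeking first
# copies only the tail of the data.
source.seek(9990)
tail = io.BytesIO()
shutil.copyfileobj(source, tail)
tail_copy = tail.getvalue()

print(len(full_copy), tail_copy)
```

To copy only part of a file, position the source with seek() first; the length argument does not limit the amount copied.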

shutil.copyfile(src, dst)
Copy a file's contents

def copyfile(src, dst):
    """Copy data from src to dst"""
    if _samefile(src, dst):
        raise Error("`%s` and `%s` are the same file" % (src, dst))

    for fn in [src, dst]:
        try:
            st = os.stat(fn)
        except OSError:
            # File most likely does not exist
            pass
        else:
            # XXX What about other special files? (sockets, devices...)
            if stat.S_ISFIFO(st.st_mode):
                raise SpecialFileError("`%s` is a named pipe" % fn)

    with open(src, 'rb') as fsrc:
        with open(dst, 'wb') as fdst:
            copyfileobj(fsrc, fdst)

shutil.copymode(src, dst)
Copy only the permission bits. The file contents, group, and owner are unchanged

shutil.copystat(src, dst)
Copy only the status information: mode bits, atime, mtime, flags

def copystat(src, dst):
    """Copy all stat info (mode bits, atime, mtime, flags) from src to dst"""
    st = os.stat(src)
    mode = stat.S_IMODE(st.st_mode)
    if hasattr(os, 'utime'):
        os.utime(dst, (st.st_atime, st.st_mtime))
    if hasattr(os, 'chmod'):
        os.chmod(dst, mode)
    if hasattr(os, 'chflags') and hasattr(st, 'st_flags'):
        try:
            os.chflags(dst, st.st_flags)
        except OSError, why:
            for err in 'EOPNOTSUPP', 'ENOTSUP':
                if hasattr(errno, err) and why.errno == getattr(errno, err):
                    break
            else:
                raise

shutil.copy(src, dst)
Copy a file's contents and permission bits

def copy(src, dst):
    """Copy data and mode bits ("cp src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copymode(src, dst)

shutil.copy2(src, dst)
Copy a file's contents and status information

def copy2(src, dst):
    """Copy data and all stat info ("cp -p src dst").

    The destination may be a directory.

    """
    if os.path.isdir(dst):
        dst = os.path.join(dst, os.path.basename(src))
    copyfile(src, dst)
    copystat(src, dst)
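The practical difference between copy and copy2 shows up in the timestamps. In this sketch the file names and the old timestamp are assumptions made for the demonstration; the source file is given an old modification time so the effect is visible.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "src.txt")
    with open(src, "w") as f:
        f.write("data")

    old_time = 1_000_000_000  # a moment in 2001, clearly in the past
    os.utime(src, (old_time, old_time))  # set atime and mtime on the source

    plain = shutil.copy(src, os.path.join(root, "plain.txt"))  # data + mode bits
    full = shutil.copy2(src, os.path.join(root, "full.txt"))   # data + stat info

    plain_mtime = os.path.getmtime(plain)
    full_mtime = os.path.getmtime(full)
    with open(full) as f:
        full_text = f.read()

print(plain_mtime, full_mtime, full_text)
```

The copy made with copy2 keeps the source's modification time, while the one made with copy gets the time at which the copy was written.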

shutil.ignore_patterns(*patterns)
shutil.copytree(src, dst, symlinks=False, ignore=None)
Copy a directory tree recursively

For example: copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*'))

def ignore_patterns(*patterns):
    """Function that can be used as copytree() ignore parameter.

    Patterns is a sequence of glob-style patterns
    that are used to exclude files"""
    def _ignore_patterns(path, names):
        ignored_names = []
        for pattern in patterns:
            ignored_names.extend(fnmatch.filter(names, pattern))
        return set(ignored_names)
    return _ignore_patterns

def copytree(src, dst, symlinks=False, ignore=None):
    """Recursively copy a directory tree using copy2().

    The destination directory must not already exist.
    If exception(s) occur, an Error is raised with a list of reasons.

    If the optional symlinks flag is true, symbolic links in the
    source tree result in symbolic links in the destination tree; if
    it is false, the contents of the files pointed to by symbolic
    links are copied.

    The optional ignore argument is a callable. If given, it
    is called with the `src` parameter, which is the directory
    being visited by copytree(), and `names` which is the list of
    `src` contents, as returned by os.listdir():

        callable(src, names) -> ignored_names

    Since copytree() is called recursively, the callable will be
    called once for each directory that is copied. It returns a
    list of names relative to the `src` directory that should
    not be copied.

    XXX Consider this example code rather than the ultimate tool.

    """
    names = os.listdir(src)
    if ignore is not None:
        ignored_names = ignore(src, names)
    else:
        ignored_names = set()

    os.makedirs(dst)
    errors = []
    for name in names:
        if name in ignored_names:
            continue
        srcname = os.path.join(src, name)
        dstname = os.path.join(dst, name)
        try:
            if symlinks and os.path.islink(srcname):
                linkto = os.readlink(srcname)
                os.symlink(linkto, dstname)
            elif os.path.isdir(srcname):
                copytree(srcname, dstname, symlinks, ignore)
            else:
                # Will raise a SpecialFileError for unsupported file types
                copy2(srcname, dstname)
        # catch the Error from the recursive copytree so that we can
        # continue with other files
        except Error, err:
            errors.extend(err.args[0])
        except EnvironmentError, why:
            errors.append((srcname, dstname, str(why)))
    try:
        copystat(src, dst)
    except OSError, why:
        if WindowsError is not None and isinstance(why, WindowsError):
            # Copying file access times may fail on Windows
            pass
        else:
            errors.append((src, dst, str(why)))
    if errors:
        raise Error, errors
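Here is a runnable version of the copytree example, as a sketch. The small source tree (main.py, main.pyc, tmp_scratch.txt, pkg/util.py) is invented for the demonstration and lives in a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "source")
    destination = os.path.join(root, "destination")  # must not exist yet

    # Build a small sample tree to copy.
    os.makedirs(os.path.join(source, "pkg"))
    for relative in ("main.py", "main.pyc", "tmp_scratch.txt",
                     os.path.join("pkg", "util.py")):
        with open(os.path.join(source, relative), "w") as f:
            f.write("sample")

    # Skip compiled files and anything whose name starts with "tmp".
    shutil.copytree(source, destination,
                    ignore=shutil.ignore_patterns("*.pyc", "tmp*"))

    # Collect the copied files as paths relative to the destination.
    copied = sorted(
        os.path.relpath(os.path.join(folder, name), destination)
        for folder, _dirs, files in os.walk(destination)
        for name in files
    )

print(copied)
```

The ignore callable is applied in every directory that copytree visits, so the patterns also filter files in subdirectories.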

shutil.rmtree(path[, ignore_errors[, onerror]])
Delete a directory tree recursively

def rmtree(path, ignore_errors=False, onerror=None):
    """Recursively delete a directory tree.

    If ignore_errors is set, errors are ignored; otherwise, if onerror
    is set, it is called to handle the error with arguments (func,
    path, exc_info) where func is os.listdir, os.remove, or os.rmdir;
    path is the argument to that function that caused it to fail; and
    exc_info is a tuple returned by sys.exc_info().  If ignore_errors
    is false and onerror is None, an exception is raised.

    """
    if ignore_errors:
        def onerror(*args):
            pass
    elif onerror is None:
        def onerror(*args):
            raise
    try:
        if os.path.islink(path):
            # symlinks to directories are forbidden, see bug #1669
            raise OSError("Cannot call rmtree on a symbolic link")
    except OSError:
        onerror(os.path.islink, path, sys.exc_info())
        # can't continue even if onerror hook returns
        return
    names = []
    try:
        names = os.listdir(path)
    except os.error, err:
        onerror(os.listdir, path, sys.exc_info())
    for name in names:
        fullname = os.path.join(path, name)
        try:
            mode = os.lstat(fullname).st_mode
        except os.error:
            mode = 0
        if stat.S_ISDIR(mode):
            rmtree(fullname, ignore_errors, onerror)
        else:
            try:
                os.remove(fullname)
            except os.error, err:
                onerror(os.remove, fullname, sys.exc_info())
    try:
        os.rmdir(path)
    except os.error:
        onerror(os.rmdir, path, sys.exc_info())

shutil.move(src, dst)
Move a file or directory recursively

def move(src, dst):
    """Recursively move a file or directory to another location. This is
    similar to the Unix "mv" command.

    If the destination is a directory or a symlink to a directory, the source
    is moved inside the directory. The destination path must not already
    exist.

    If the destination already exists but is not a directory, it may be
    overwritten depending on os.rename() semantics.

    If the destination is on our current filesystem, then rename() is used.
    Otherwise, src is copied to the destination and then removed.
    A lot more could be done here...  A look at a mv.c shows a lot of
    the issues this implementation glosses over.

    """
    real_dst = dst
    if os.path.isdir(dst):
        if _samefile(src, dst):
            # We might be on a case insensitive filesystem,
            # perform the rename anyway.
            os.rename(src, dst)
            return

        real_dst = os.path.join(dst, _basename(src))
        if os.path.exists(real_dst):
            raise Error, "Destination path '%s' already exists" % real_dst
    try:
        os.rename(src, real_dst)
    except OSError:
        if os.path.isdir(src):
            if _destinsrc(src, dst):
                raise Error, "Cannot move a directory '%s' into itself '%s'." % (src, dst)
            copytree(src, real_dst, symlinks=True)
            rmtree(src)
        else:
            copy2(src, real_dst)
            os.unlink(src)
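move and rmtree can be tried together safely inside a temporary directory. This is a sketch; the project, logs, and archive names are placeholders for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    project = os.path.join(root, "project")
    os.makedirs(os.path.join(project, "logs"))
    with open(os.path.join(project, "logs", "a.log"), "w") as f:
        f.write("entry")

    archive_dir = os.path.join(root, "archive")
    os.mkdir(archive_dir)

    # dst is an existing directory, so the source is moved inside it.
    moved_to = shutil.move(project, archive_dir)
    moved_ok = os.path.isfile(os.path.join(archive_dir, "project", "logs", "a.log"))
    source_gone = not os.path.exists(project)

    # rmtree deletes the directory and everything below it.
    shutil.rmtree(archive_dir)
    removed = not os.path.exists(archive_dir)

print(moved_to, moved_ok, source_gone, removed)
```

rmtree deletes without asking for confirmation, so it is worth double-checking the path before calling it on real data.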

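shutil.make_archive(base_name, format,...)

Create an archive (for example zip or tar) and return its file path.

    • base_name: the file name of the archive, optionally with a path, without the format-specific extension. With a bare file name the archive is saved in the current directory; otherwise it is saved at the given path, for example:
      www                        => saved in the current directory
      /Users/wupeiqi/www => saved in /Users/wupeiqi/
    • format: the archive type: "zip", "tar", "bztar", or "gztar"
    • root_dir: the path of the folder to archive (defaults to the current directory)
    • owner: the owner, defaults to the current user
    • group: the group, defaults to the current group
    • logger: used for logging, usually a logging.Logger object

# Archive the files under /Users/wupeiqi/Downloads/test and put the archive in the program's current directory

import shutil
ret = shutil.make_archive("wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')


# Archive the files under /Users/wupeiqi/Downloads/test and put the archive in /Users/wupeiqi/
import shutil
ret = shutil.make_archive("/Users/wupeiqi/wwwwwwwwww", 'gztar', root_dir='/Users/wupeiqi/Downloads/test')
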
def make_archive(base_name, format, root_dir=None, base_dir=None, verbose=0,
                 dry_run=0, owner=None, group=None, logger=None):
    """Create an archive file (eg. zip or tar).

    'base_name' is the name of the file to create, minus any format-specific
    extension; 'format' is the archive format: one of "zip", "tar", "bztar"
    or "gztar".

    'root_dir' is a directory that will be the root directory of the
    archive; ie. we typically chdir into 'root_dir' before creating the
    archive.  'base_dir' is the directory where we start archiving from;
    ie. 'base_dir' will be the common prefix of all files and
    directories in the archive.  'root_dir' and 'base_dir' both default
    to the current directory.  Returns the name of the archive file.

    'owner' and 'group' are used when creating a tar archive. By default,
    uses the current owner and group.
    """
    save_cwd = os.getcwd()
    if root_dir is not None:
        if logger is not None:
            logger.debug("changing into '%s'", root_dir)
        base_name = os.path.abspath(base_name)
        if not dry_run:
            os.chdir(root_dir)

    if base_dir is None:
        base_dir = os.curdir

    kwargs = {'dry_run': dry_run, 'logger': logger}

    try:
        format_info = _ARCHIVE_FORMATS[format]
    except KeyError:
        raise ValueError, "unknown archive format '%s'" % format

    func = format_info[0]
    for arg, val in format_info[1]:
        kwargs[arg] = val

    if format != 'zip':
        kwargs['owner'] = owner
        kwargs['group'] = group

    try:
        filename = func(base_name, base_dir, **kwargs)
    finally:
        if root_dir is not None:
            if logger is not None:
                logger.debug("changing back to '%s'", save_cwd)
            os.chdir(save_cwd)

    return filename
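The make_archive calls above use paths from the author's machine. The sketch below does the same thing inside a temporary directory and then restores the archive with shutil.unpack_archive (available in Python 3); the file and archive names are assumptions made for the demonstration.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as root:
    data_dir = os.path.join(root, "test")
    os.mkdir(data_dir)
    with open(os.path.join(data_dir, "a.log"), "w") as f:
        f.write("hello archive")

    # base_name carries no extension; make_archive adds ".tar.gz" for gztar.
    base_name = os.path.join(root, "backup")
    archive_path = shutil.make_archive(base_name, "gztar", root_dir=data_dir)
    archive_name = os.path.basename(archive_path)

    # Restore the archive into a fresh directory and read the file back.
    extract_dir = os.path.join(root, "restored")
    shutil.unpack_archive(archive_path, extract_dir)
    with open(os.path.join(extract_dir, "a.log")) as f:
        restored_text = f.read()

print(archive_name, restored_text)
```

The return value of make_archive is the full path of the archive that was created, which is convenient to pass straight to unpack_archive.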

shutil handles archives by calling the zipfile and tarfile modules. In detail:

Compressing and extracting with zipfile

import zipfile

# Compress
z = zipfile.ZipFile('laxi.zip', 'w')
z.write('a.log')
z.write('data.data')
z.close()

# Extract
z = zipfile.ZipFile('laxi.zip', 'r')
z.extractall()
z.close()

Compressing and extracting with tarfile

import tarfile

# Compress
tar = tarfile.open('your.tar','w')
tar.add('/Users/wupeiqi/PycharmProjects/bbs2.zip', arcname='bbs2.zip')
tar.add('/Users/wupeiqi/PycharmProjects/cmdb.zip', arcname='cmdb.zip')
tar.close()

# Extract
tar = tarfile.open('your.tar','r')
tar.extractall()  # an extraction path can be given
tar.close()

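Both ZipFile and TarFile objects also work as context managers, which closes the archive even if an error occurs. The sketch below reuses the a.log, laxi.zip, and your.tar names from the examples above but creates everything inside a temporary directory.

```python
import os
import tarfile
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as root:
    log_path = os.path.join(root, "a.log")
    with open(log_path, "w") as f:
        f.write("log line")

    # Write a zip archive; the with-block closes it automatically.
    zip_path = os.path.join(root, "laxi.zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(log_path, arcname="a.log")

    # Read it back without extracting to disk.
    with zipfile.ZipFile(zip_path) as z:
        zip_names = z.namelist()
        zip_text = z.read("a.log").decode("utf-8")

    # The same pattern with tarfile.
    tar_path = os.path.join(root, "your.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(log_path, arcname="a.log")
    with tarfile.open(tar_path) as tar:
        tar_names = tar.getnames()

print(zip_names, zip_text, tar_names)
```

Passing arcname stores the file under a short name instead of its full path on disk.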
  1 class ZipFile(object):
  2     """ Class with methods to open, read, write, close, list zip files.
  3 
  4     z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)
  5 
  6     file: Either the path to the file, or a file-like object.
  7           If it is a path, the file will be opened and closed by ZipFile.
  8     mode: The mode can be either read "r", write "w" or append "a".
  9     compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
 10     allowZip64: if True ZipFile will create files with ZIP64 extensions when
 11                 needed, otherwise it will raise an exception when this would
 12                 be necessary.
 13 
 14     """
 15 
 16     fp = None                   # Set here since __del__ checks it
 17 
 18     def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
 19         """Open the ZIP file with mode read "r", write "w" or append "a"."""
 20         if mode not in ("r", "w", "a"):
 21             raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')
 22 
 23         if compression == ZIP_STORED:
 24             pass
 25         elif compression == ZIP_DEFLATED:
 26             if not zlib:
 27                 raise RuntimeError,\
 28                       "Compression requires the (missing) zlib module"
 29         else:
 30             raise RuntimeError, "That compression method is not supported"
 31 
 32         self._allowZip64 = allowZip64
 33         self._didModify = False
 34         self.debug = 0  # Level of printing: 0 through 3
 35         self.NameToInfo = {}    # Find file info given name
 36         self.filelist = []      # List of ZipInfo instances for archive
 37         self.compression = compression  # Method of compression
 38         self.mode = key = mode.replace('b', '')[0]
 39         self.pwd = None
 40         self._comment = ''
 41 
 42         # Check if we were passed a file-like object
 43         if isinstance(file, basestring):
 44             self._filePassed = 0
 45             self.filename = file
 46             modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
 47             try:
 48                 self.fp = open(file, modeDict[mode])
 49             except IOError:
 50                 if mode == 'a':
 51                     mode = key = 'w'
 52                     self.fp = open(file, modeDict[mode])
 53                 else:
 54                     raise
 55         else:
 56             self._filePassed = 1
 57             self.fp = file
 58             self.filename = getattr(file, 'name', None)
 59 
 60         try:
 61             if key == 'r':
 62                 self._RealGetContents()
 63             elif key == 'w':
 64                 # set the modified flag so central directory gets written
 65                 # even if no files are added to the archive
 66                 self._didModify = True
 67             elif key == 'a':
 68                 try:
 69                     # See if file is a zip file
 70                     self._RealGetContents()
 71                     # seek to start of directory and overwrite
 72                     self.fp.seek(self.start_dir, 0)
 73                 except BadZipfile:
 74                     # file is not a zip file, just append
 75                     self.fp.seek(0, 2)
 76 
 77                     # set the modified flag so central directory gets written
 78                     # even if no files are added to the archive
 79                     self._didModify = True
 80             else:
 81                 raise RuntimeError('Mode must be "r", "w" or "a"')
 82         except:
 83             fp = self.fp
 84             self.fp = None
 85             if not self._filePassed:
 86                 fp.close()
 87             raise
 88 
 89     def __enter__(self):
 90         return self
 91 
 92     def __exit__(self, type, value, traceback):
 93         self.close()
 94 
 95     def _RealGetContents(self):
 96         """Read in the table of contents for the ZIP file."""
 97         fp = self.fp
 98         try:
 99             endrec = _EndRecData(fp)
100         except IOError:
101             raise BadZipfile("File is not a zip file")
102         if not endrec:
103             raise BadZipfile, "File is not a zip file"
104         if self.debug > 1:
105             print endrec
106         size_cd = endrec[_ECD_SIZE]             # bytes in central directory
107         offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
108         self._comment = endrec[_ECD_COMMENT]    # archive comment
109 
110         # "concat" is zero, unless zip was concatenated to another file
111         concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
112         if endrec[_ECD_SIGNATURE] == stringEndArchive64:
113             # If Zip64 extension structures are present, account for them
114             concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)
115 
116         if self.debug > 2:
117             inferred = concat + offset_cd
118             print "given, inferred, offset", offset_cd, inferred, concat
119         # self.start_dir:  Position of start of central directory
120         self.start_dir = offset_cd + concat
121         fp.seek(self.start_dir, 0)
122         data = fp.read(size_cd)
123         fp = cStringIO.StringIO(data)
124         total = 0
125         while total < size_cd:
126             centdir = fp.read(sizeCentralDir)
127             if len(centdir) != sizeCentralDir:
128                 raise BadZipfile("Truncated central directory")
129             centdir = struct.unpack(structCentralDir, centdir)
130             if centdir[_CD_SIGNATURE] != stringCentralDir:
131                 raise BadZipfile("Bad magic number for central directory")
132             if self.debug > 2:
133                 print centdir
134             filename = fp.read(centdir[_CD_FILENAME_LENGTH])
135             # Create ZipInfo instance to store file information
136             x = ZipInfo(filename)
137             x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
138             x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
139             x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
140             (x.create_version, x.create_system, x.extract_version, x.reserved,
141                 x.flag_bits, x.compress_type, t, d,
142                 x.CRC, x.compress_size, x.file_size) = centdir[1:12]
143             x.volume, x.internal_attr, x.external_attr = centdir[15:18]
144             # Convert date/time code to (year, month, day, hour, min, sec)
145             x._raw_time = t
146             x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
147                                      t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )
148 
149             x._decodeExtra()
150             x.header_offset = x.header_offset + concat
151             x.filename = x._decodeFilename()
152             self.filelist.append(x)
153             self.NameToInfo[x.filename] = x
154 
155             # update total bytes read from central directory
156             total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
157                      + centdir[_CD_EXTRA_FIELD_LENGTH]
158                      + centdir[_CD_COMMENT_LENGTH])
159 
160             if self.debug > 2:
161                 print "total", total
162 
163 
164     def namelist(self):
165         """Return a list of file names in the archive."""
166         l = []
167         for data in self.filelist:
168             l.append(data.filename)
169         return l
170 
171     def infolist(self):
172         """Return a list of class ZipInfo instances for files in the
173         archive."""
174         return self.filelist
175 
176     def printdir(self):
177         """Print a table of contents for the zip file."""
178         print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
179         for zinfo in self.filelist:
180             date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
181             print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)
182 
183     def testzip(self):
184         """Read all the files and check the CRC."""
185         chunk_size = 2 ** 20
186         for zinfo in self.filelist:
187             try:
188                 # Read by chunks, to avoid an OverflowError or a
189                 # MemoryError with very large embedded files.
190                 with self.open(zinfo.filename, "r") as f:
191                     while f.read(chunk_size):     # Check CRC-32
192                         pass
193             except BadZipfile:
194                 return zinfo.filename
195 
196     def getinfo(self, name):
197         """Return the instance of ZipInfo given 'name'."""
198         info = self.NameToInfo.get(name)
199         if info is None:
200             raise KeyError(
201                 'There is no item named %r in the archive' % name)
202 
203         return info
204 
205     def setpassword(self, pwd):
206         """Set default password for encrypted files."""
207         self.pwd = pwd
208 
209     @property
210     def comment(self):
211         """The comment text associated with the ZIP file."""
212         return self._comment
213 
214     @comment.setter
215     def comment(self, comment):
216         # check for valid comment length
217         if len(comment) > ZIP_MAX_COMMENT:
218             import warnings
219             warnings.warn('Archive comment is too long; truncating to %d bytes'
220                           % ZIP_MAX_COMMENT, stacklevel=2)
221             comment = comment[:ZIP_MAX_COMMENT]
222         self._comment = comment
223         self._didModify = True
224 
225     def read(self, name, pwd=None):
226         """Return file bytes (as a string) for name."""
227         return self.open(name, "r", pwd).read()
228 
229     def open(self, name, mode="r", pwd=None):
230         """Return file-like object for 'name'."""
231         if mode not in ("r", "U", "rU"):
232             raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
233         if not self.fp:
234             raise RuntimeError, \
235                   "Attempt to read ZIP archive that was already closed"
236 
        # Only open a new file for instances where we were not
        # given a file object in the constructor
        if self._filePassed:
            zef_file = self.fp
            should_close = False
        else:
            zef_file = open(self.filename, 'rb')
            should_close = True

        try:
            # Make sure we have an info object
            if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

    def _extract_member(self, member, targetpath, pwd):
        """Extract the ZipInfo object 'member' to a physical
           file on the path targetpath.
        """
        # build the destination pathname, replacing
        # forward slashes to platform specific separators.
        arcname = member.filename.replace('/', os.path.sep)

        if os.path.altsep:
            arcname = arcname.replace(os.path.altsep, os.path.sep)
        # interpret absolute pathname as relative, remove drive letter or
        # UNC path, redundant separators, "." and ".." components.
        arcname = os.path.splitdrive(arcname)[1]
        arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
                    if x not in ('', os.path.curdir, os.path.pardir))
        if os.path.sep == '\\':
            # filter illegal characters on Windows
            illegal = ':<>|"?*'
            if isinstance(arcname, unicode):
                table = {ord(c): ord('_') for c in illegal}
            else:
                table = string.maketrans(illegal, '_' * len(illegal))
            arcname = arcname.translate(table)
            # remove trailing dots
            arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
            arcname = os.path.sep.join(x for x in arcname if x)

        targetpath = os.path.join(targetpath, arcname)
        targetpath = os.path.normpath(targetpath)

        # Create all upper directories if necessary.
        upperdirs = os.path.dirname(targetpath)
        if upperdirs and not os.path.exists(upperdirs):
            os.makedirs(upperdirs)

        if member.filename[-1] == '/':
            if not os.path.isdir(targetpath):
                os.mkdir(targetpath)
            return targetpath

        with self.open(member, pwd=pwd) as source, \
             file(targetpath, "wb") as target:
            shutil.copyfileobj(source, target)

        return targetpath

    def _writecheck(self, zinfo):
        """Check for errors before writing a file to the archive."""
        if zinfo.filename in self.NameToInfo:
            import warnings
            warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
        if self.mode not in ("w", "a"):
            raise RuntimeError, 'write() requires mode "w" or "a"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to write ZIP archive that was already closed"
        if zinfo.compress_type == ZIP_DEFLATED and not zlib:
            raise RuntimeError, \
                  "Compression requires the (missing) zlib module"
        if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
            raise RuntimeError, \
                  "That compression method is not supported"
        if not self._allowZip64:
            requires_zip64 = None
            if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
                requires_zip64 = "Files count"
            elif zinfo.file_size > ZIP64_LIMIT:
                requires_zip64 = "Filesize"
            elif zinfo.header_offset > ZIP64_LIMIT:
                requires_zip64 = "Zipfile size"
            if requires_zip64:
                raise LargeZipFile(requires_zip64 +
                                   " would require ZIP64 extensions")

    def write(self, filename, arcname=None, compress_type=None):
        """Put the bytes from filename into the archive under the name
        arcname."""
        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        st = os.stat(filename)
        isdir = stat.S_ISDIR(st.st_mode)
        mtime = time.localtime(st.st_mtime)
        date_time = mtime[0:6]
        # Create ZipInfo instance to store file information
        if arcname is None:
            arcname = filename
        arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
        while arcname[0] in (os.sep, os.altsep):
            arcname = arcname[1:]
        if isdir:
            arcname += '/'
        zinfo = ZipInfo(arcname, date_time)
        zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
        if compress_type is None:
            zinfo.compress_type = self.compression
        else:
            zinfo.compress_type = compress_type

        zinfo.file_size = st.st_size
        zinfo.flag_bits = 0x00
        zinfo.header_offset = self.fp.tell()    # Start of header bytes

        self._writecheck(zinfo)
        self._didModify = True

        if isdir:
            zinfo.file_size = 0
            zinfo.compress_size = 0
            zinfo.CRC = 0
            zinfo.external_attr |= 0x10  # MS-DOS directory flag
            self.filelist.append(zinfo)
            self.NameToInfo[zinfo.filename] = zinfo
            self.fp.write(zinfo.FileHeader(False))
            return

        with open(filename, "rb") as fp:
            # Must overwrite CRC and sizes with correct data later
            zinfo.CRC = CRC = 0
            zinfo.compress_size = compress_size = 0
            # Compressed size can be larger than uncompressed size
            zip64 = self._allowZip64 and \
                    zinfo.file_size * 1.05 > ZIP64_LIMIT
            self.fp.write(zinfo.FileHeader(zip64))
            if zinfo.compress_type == ZIP_DEFLATED:
                cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                     zlib.DEFLATED, -15)
            else:
                cmpr = None
            file_size = 0
            while 1:
                buf = fp.read(1024 * 8)
                if not buf:
                    break
                file_size = file_size + len(buf)
                CRC = crc32(buf, CRC) & 0xffffffff
                if cmpr:
                    buf = cmpr.compress(buf)
                    compress_size = compress_size + len(buf)
                self.fp.write(buf)
        if cmpr:
            buf = cmpr.flush()
            compress_size = compress_size + len(buf)
            self.fp.write(buf)
            zinfo.compress_size = compress_size
        else:
            zinfo.compress_size = file_size
        zinfo.CRC = CRC
        zinfo.file_size = file_size
        if not zip64 and self._allowZip64:
            if file_size > ZIP64_LIMIT:
                raise RuntimeError('File size has increased during compressing')
            if compress_size > ZIP64_LIMIT:
                raise RuntimeError('Compressed size larger than uncompressed size')
        # Seek backwards and write file header (which will now include
        # correct CRC and file sizes)
        position = self.fp.tell()       # Preserve current position in file
        self.fp.seek(zinfo.header_offset, 0)
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.seek(position, 0)
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
        """Write a file into the archive.  The contents is the string
        'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
        the name of the file in the archive."""
        if not isinstance(zinfo_or_arcname, ZipInfo):
            zinfo = ZipInfo(filename=zinfo_or_arcname,
                            date_time=time.localtime(time.time())[:6])

            zinfo.compress_type = self.compression
            if zinfo.filename[-1] == '/':
                zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
                zinfo.external_attr |= 0x10           # MS-DOS directory flag
            else:
                zinfo.external_attr = 0o600 << 16     # ?rw-------
        else:
            zinfo = zinfo_or_arcname

        if not self.fp:
            raise RuntimeError(
                  "Attempt to write to ZIP archive that was already closed")

        if compress_type is not None:
            zinfo.compress_type = compress_type

        zinfo.file_size = len(bytes)            # Uncompressed size
        zinfo.header_offset = self.fp.tell()    # Start of header bytes
        self._writecheck(zinfo)
        self._didModify = True
        zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
        if zinfo.compress_type == ZIP_DEFLATED:
            co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                 zlib.DEFLATED, -15)
            bytes = co.compress(bytes) + co.flush()
            zinfo.compress_size = len(bytes)    # Compressed size
        else:
            zinfo.compress_size = zinfo.file_size
        zip64 = zinfo.file_size > ZIP64_LIMIT or \
                zinfo.compress_size > ZIP64_LIMIT
        if zip64 and not self._allowZip64:
            raise LargeZipFile("Filesize would require ZIP64 extensions")
        self.fp.write(zinfo.FileHeader(zip64))
        self.fp.write(bytes)
        if zinfo.flag_bits & 0x08:
            # Write CRC and file sizes after the file data
            fmt = '<LQQ' if zip64 else '<LLL'
            self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
                  zinfo.file_size))
        self.fp.flush()
        self.filelist.append(zinfo)
        self.NameToInfo[zinfo.filename] = zinfo

    def __del__(self):
        """Call the "close()" method in case the user forgot."""
        self.close()

    def close(self):
        """Close the file, and for mode "w" and "a" write the ending
        records."""
        if self.fp is None:
            return

        try:
            if self.mode in ("w", "a") and self._didModify: # write ending records
                pos1 = self.fp.tell()
                for zinfo in self.filelist:         # write central directory
                    dt = zinfo.date_time
                    dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
                    dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
                    extra = []
                    if zinfo.file_size > ZIP64_LIMIT \
                            or zinfo.compress_size > ZIP64_LIMIT:
                        extra.append(zinfo.file_size)
                        extra.append(zinfo.compress_size)
                        file_size = 0xffffffff
                        compress_size = 0xffffffff
                    else:
                        file_size = zinfo.file_size
                        compress_size = zinfo.compress_size

                    if zinfo.header_offset > ZIP64_LIMIT:
                        extra.append(zinfo.header_offset)
                        header_offset = 0xffffffffL
                    else:
                        header_offset = zinfo.header_offset

                    extra_data = zinfo.extra
                    if extra:
                        # Append a ZIP64 field to the extra's
                        extra_data = struct.pack(
                                '<HH' + 'Q'*len(extra),
                                1, 8*len(extra), *extra) + extra_data

                        extract_version = max(45, zinfo.extract_version)
                        create_version = max(45, zinfo.create_version)
                    else:
                        extract_version = zinfo.extract_version
                        create_version = zinfo.create_version

                    try:
                        filename, flag_bits = zinfo._encodeFilenameFlags()
                        centdir = struct.pack(structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                    except DeprecationWarning:
                        print >>sys.stderr, (structCentralDir,
                        stringCentralDir, create_version,
                        zinfo.create_system, extract_version, zinfo.reserved,
                        zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
                        zinfo.CRC, compress_size, file_size,
                        len(zinfo.filename), len(extra_data), len(zinfo.comment),
                        0, zinfo.internal_attr, zinfo.external_attr,
                        header_offset)
                        raise
                    self.fp.write(centdir)
                    self.fp.write(filename)
                    self.fp.write(extra_data)
                    self.fp.write(zinfo.comment)

                pos2 = self.fp.tell()
                # Write end-of-zip-archive record
                centDirCount = len(self.filelist)
                centDirSize = pos2 - pos1
                centDirOffset = pos1
                requires_zip64 = None
                if centDirCount > ZIP_FILECOUNT_LIMIT:
                    requires_zip64 = "Files count"
                elif centDirOffset > ZIP64_LIMIT:
                    requires_zip64 = "Central directory offset"
                elif centDirSize > ZIP64_LIMIT:
                    requires_zip64 = "Central directory size"
                if requires_zip64:
                    # Need to write the ZIP64 end-of-archive records
                    if not self._allowZip64:
                        raise LargeZipFile(requires_zip64 +
                                           " would require ZIP64 extensions")
                    zip64endrec = struct.pack(
                            structEndArchive64, stringEndArchive64,
                            44, 45, 45, 0, 0, centDirCount, centDirCount,
                            centDirSize, centDirOffset)
                    self.fp.write(zip64endrec)

                    zip64locrec = struct.pack(
                            structEndArchive64Locator,
                            stringEndArchive64Locator, 0, pos2, 1)
                    self.fp.write(zip64locrec)
                    centDirCount = min(centDirCount, 0xFFFF)
                    centDirSize = min(centDirSize, 0xFFFFFFFF)
                    centDirOffset = min(centDirOffset, 0xFFFFFFFF)

                endrec = struct.pack(structEndArchive, stringEndArchive,
                                    0, 0, centDirCount, centDirCount,
                                    centDirSize, centDirOffset, len(self._comment))
                self.fp.write(endrec)
                self.fp.write(self._comment)
                self.fp.flush()
        finally:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
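
The listing above is the Python 2 library source, so it is not meant to be run directly. As a rough illustration of how its public methods fit together (write, writestr, namelist, read, testzip, extractall), here is a minimal sketch written for Python 3. The helper name zip_round_trip and the file names hello.txt, demo.zip and unpacked are hypothetical, chosen only for this example, and ZIP_DEFLATED assumes the zlib module is available.

```python
import os
import tempfile
import zipfile


def zip_round_trip(workdir):
    """Create an archive, list it, read one member back, and extract it.

    All file names here are hypothetical and exist only for this sketch.
    """
    src = os.path.join(workdir, "hello.txt")
    archive = os.path.join(workdir, "demo.zip")
    out_dir = os.path.join(workdir, "unpacked")

    with open(src, "w") as f:
        f.write("hello zip")

    # Mode "w" creates the archive. Leaving the with-block calls close(),
    # which writes the central directory and the end-of-archive record.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.write(src, arcname="hello.txt")                    # add a file from disk
        z.writestr("notes/readme.txt", "made by writestr")   # add data directly

    with zipfile.ZipFile(archive, "r") as z:
        names = z.namelist()         # member names in the order they were added
        data = z.read("hello.txt")   # bytes in Python 3
        bad = z.testzip()            # None when every member passes its CRC check
        z.extractall(out_dir)        # recreates notes/ under out_dir

    return names, data, bad, out_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names, data, bad, out_dir = zip_round_trip(tmp)
        print(names)
        print(data)
        print(bad)
```

Using the with statement relies on the __enter__/__exit__ pair of the class, so close() runs even if an exception is raised while writing.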
ZipFile
class ZipFile(object):
    """ Class with methods to open, read, write, close, list zip files.

    z = ZipFile(file, mode="r", compression=ZIP_STORED, allowZip64=False)

    file: Either the path to the file, or a file-like object.
          If it is a path, the file will be opened and closed by ZipFile.
    mode: The mode can be either read "r", write "w" or append "a".
    compression: ZIP_STORED (no compression) or ZIP_DEFLATED (requires zlib).
    allowZip64: if True ZipFile will create files with ZIP64 extensions when
                needed, otherwise it will raise an exception when this would
                be necessary.

    """

    fp = None                   # Set here since __del__ checks it

    def __init__(self, file, mode="r", compression=ZIP_STORED, allowZip64=False):
        """Open the ZIP file with mode read "r", write "w" or append "a"."""
        if mode not in ("r", "w", "a"):
            raise RuntimeError('ZipFile() requires mode "r", "w", or "a"')

        if compression == ZIP_STORED:
            pass
        elif compression == ZIP_DEFLATED:
            if not zlib:
                raise RuntimeError,\
                      "Compression requires the (missing) zlib module"
        else:
            raise RuntimeError, "That compression method is not supported"

        self._allowZip64 = allowZip64
        self._didModify = False
        self.debug = 0  # Level of printing: 0 through 3
        self.NameToInfo = {}    # Find file info given name
        self.filelist = []      # List of ZipInfo instances for archive
        self.compression = compression  # Method of compression
        self.mode = key = mode.replace('b', '')[0]
        self.pwd = None
        self._comment = ''

        # Check if we were passed a file-like object
        if isinstance(file, basestring):
            self._filePassed = 0
            self.filename = file
            modeDict = {'r' : 'rb', 'w': 'wb', 'a' : 'r+b'}
            try:
                self.fp = open(file, modeDict[mode])
            except IOError:
                if mode == 'a':
                    mode = key = 'w'
                    self.fp = open(file, modeDict[mode])
                else:
                    raise
        else:
            self._filePassed = 1
            self.fp = file
            self.filename = getattr(file, 'name', None)

        try:
            if key == 'r':
                self._RealGetContents()
            elif key == 'w':
                # set the modified flag so central directory gets written
                # even if no files are added to the archive
                self._didModify = True
            elif key == 'a':
                try:
                    # See if file is a zip file
                    self._RealGetContents()
                    # seek to start of directory and overwrite
                    self.fp.seek(self.start_dir, 0)
                except BadZipfile:
                    # file is not a zip file, just append
                    self.fp.seek(0, 2)

                    # set the modified flag so central directory gets written
                    # even if no files are added to the archive
                    self._didModify = True
            else:
                raise RuntimeError('Mode must be "r", "w" or "a"')
        except:
            fp = self.fp
            self.fp = None
            if not self._filePassed:
                fp.close()
            raise

    def __enter__(self):
        return self

    def __exit__(self, type, value, traceback):
        self.close()

    def _RealGetContents(self):
        """Read in the table of contents for the ZIP file."""
        fp = self.fp
        try:
            endrec = _EndRecData(fp)
        except IOError:
            raise BadZipfile("File is not a zip file")
        if not endrec:
            raise BadZipfile, "File is not a zip file"
        if self.debug > 1:
            print endrec
        size_cd = endrec[_ECD_SIZE]             # bytes in central directory
        offset_cd = endrec[_ECD_OFFSET]         # offset of central directory
        self._comment = endrec[_ECD_COMMENT]    # archive comment

        # "concat" is zero, unless zip was concatenated to another file
        concat = endrec[_ECD_LOCATION] - size_cd - offset_cd
        if endrec[_ECD_SIGNATURE] == stringEndArchive64:
            # If Zip64 extension structures are present, account for them
            concat -= (sizeEndCentDir64 + sizeEndCentDir64Locator)

        if self.debug > 2:
            inferred = concat + offset_cd
            print "given, inferred, offset", offset_cd, inferred, concat
        # self.start_dir:  Position of start of central directory
        self.start_dir = offset_cd + concat
        fp.seek(self.start_dir, 0)
        data = fp.read(size_cd)
        fp = cStringIO.StringIO(data)
        total = 0
        while total < size_cd:
            centdir = fp.read(sizeCentralDir)
            if len(centdir) != sizeCentralDir:
                raise BadZipfile("Truncated central directory")
            centdir = struct.unpack(structCentralDir, centdir)
            if centdir[_CD_SIGNATURE] != stringCentralDir:
                raise BadZipfile("Bad magic number for central directory")
            if self.debug > 2:
                print centdir
            filename = fp.read(centdir[_CD_FILENAME_LENGTH])
            # Create ZipInfo instance to store file information
            x = ZipInfo(filename)
            x.extra = fp.read(centdir[_CD_EXTRA_FIELD_LENGTH])
            x.comment = fp.read(centdir[_CD_COMMENT_LENGTH])
            x.header_offset = centdir[_CD_LOCAL_HEADER_OFFSET]
            (x.create_version, x.create_system, x.extract_version, x.reserved,
                x.flag_bits, x.compress_type, t, d,
                x.CRC, x.compress_size, x.file_size) = centdir[1:12]
            x.volume, x.internal_attr, x.external_attr = centdir[15:18]
            # Convert date/time code to (year, month, day, hour, min, sec)
            x._raw_time = t
            x.date_time = ( (d>>9)+1980, (d>>5)&0xF, d&0x1F,
                                     t>>11, (t>>5)&0x3F, (t&0x1F) * 2 )

            x._decodeExtra()
            x.header_offset = x.header_offset + concat
            x.filename = x._decodeFilename()
            self.filelist.append(x)
            self.NameToInfo[x.filename] = x

            # update total bytes read from central directory
            total = (total + sizeCentralDir + centdir[_CD_FILENAME_LENGTH]
                     + centdir[_CD_EXTRA_FIELD_LENGTH]
                     + centdir[_CD_COMMENT_LENGTH])

            if self.debug > 2:
                print "total", total


    def namelist(self):
        """Return a list of file names in the archive."""
        l = []
        for data in self.filelist:
            l.append(data.filename)
        return l

    def infolist(self):
        """Return a list of class ZipInfo instances for files in the
        archive."""
        return self.filelist

    def printdir(self):
        """Print a table of contents for the zip file."""
        print "%-46s %19s %12s" % ("File Name", "Modified    ", "Size")
        for zinfo in self.filelist:
            date = "%d-%02d-%02d %02d:%02d:%02d" % zinfo.date_time[:6]
            print "%-46s %s %12d" % (zinfo.filename, date, zinfo.file_size)

    def testzip(self):
        """Read all the files and check the CRC."""
        chunk_size = 2 ** 20
        for zinfo in self.filelist:
            try:
                # Read by chunks, to avoid an OverflowError or a
                # MemoryError with very large embedded files.
                with self.open(zinfo.filename, "r") as f:
                    while f.read(chunk_size):     # Check CRC-32
                        pass
            except BadZipfile:
                return zinfo.filename

    def getinfo(self, name):
        """Return the instance of ZipInfo given 'name'."""
        info = self.NameToInfo.get(name)
        if info is None:
            raise KeyError(
                'There is no item named %r in the archive' % name)

        return info

    def setpassword(self, pwd):
        """Set default password for encrypted files."""
        self.pwd = pwd

    @property
    def comment(self):
        """The comment text associated with the ZIP file."""
        return self._comment

    @comment.setter
    def comment(self, comment):
        # check for valid comment length
        if len(comment) > ZIP_MAX_COMMENT:
            import warnings
            warnings.warn('Archive comment is too long; truncating to %d bytes'
                          % ZIP_MAX_COMMENT, stacklevel=2)
            comment = comment[:ZIP_MAX_COMMENT]
        self._comment = comment
        self._didModify = True

    def read(self, name, pwd=None):
        """Return file bytes (as a string) for name."""
        return self.open(name, "r", pwd).read()

    def open(self, name, mode="r", pwd=None):
        """Return file-like object for 'name'."""
        if mode not in ("r", "U", "rU"):
            raise RuntimeError, 'open() requires mode "r", "U", or "rU"'
        if not self.fp:
            raise RuntimeError, \
                  "Attempt to read ZIP archive that was already closed"

        # Only open a new file for instances where we were not
        # given a file object in the constructor
        if self._filePassed:
            zef_file = self.fp
            should_close = False
        else:
            zef_file = open(self.filename, 'rb')
            should_close = True

        try:
            # Make sure we have an info object
            if isinstance(name, ZipInfo):
                # 'name' is already an info object
                zinfo = name
            else:
                # Get info object for name
                zinfo = self.getinfo(name)

            zef_file.seek(zinfo.header_offset, 0)

            # Skip the file header:
            fheader = zef_file.read(sizeFileHeader)
            if len(fheader) != sizeFileHeader:
                raise BadZipfile("Truncated file header")
            fheader = struct.unpack(structFileHeader, fheader)
            if fheader[_FH_SIGNATURE] != stringFileHeader:
                raise BadZipfile("Bad magic number for file header")

            fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])
            if fheader[_FH_EXTRA_FIELD_LENGTH]:
                zef_file.read(fheader[_FH_EXTRA_FIELD_LENGTH])

            if fname != zinfo.orig_filename:
                raise BadZipfile, \
                        'File name in directory "%s" and header "%s" differ.' % (
                            zinfo.orig_filename, fname)

            # check for encrypted flag & handle password
            is_encrypted = zinfo.flag_bits & 0x1
            zd = None
            if is_encrypted:
                if not pwd:
                    pwd = self.pwd
                if not pwd:
                    raise RuntimeError, "File %s is encrypted, " \
                        "password required for extraction" % name

                zd = _ZipDecrypter(pwd)
                # The first 12 bytes in the cypher stream is an encryption header
                #  used to strengthen the algorithm. The first 11 bytes are
                #  completely random, while the 12th contains the MSB of the CRC,
                #  or the MSB of the file time depending on the header type
                #  and is used to check the correctness of the password.
                bytes = zef_file.read(12)
                h = map(zd, bytes[0:12])
                if zinfo.flag_bits & 0x8:
                    # compare against the file type from extended local headers
                    check_byte = (zinfo._raw_time >> 8) & 0xff
                else:
                    # compare against the CRC otherwise
                    check_byte = (zinfo.CRC >> 24) & 0xff
                if ord(h[11]) != check_byte:
                    raise RuntimeError("Bad password for file", name)

            return ZipExtFile(zef_file, mode, zinfo, zd,
                    close_fileobj=should_close)
        except:
            if should_close:
                zef_file.close()
            raise

    def extract(self, member, path=None, pwd=None):
        """Extract a member from the archive to the current working directory,
           using its full name. Its file information is extracted as accurately
           as possible. `member' may be a filename or a ZipInfo object. You can
           specify a different directory using `path'.
        """
        if not isinstance(member, ZipInfo):
            member = self.getinfo(member)

        if path is None:
            path = os.getcwd()

        return self._extract_member(member, path, pwd)

    def extractall(self, path=None, members=None, pwd=None):
        """Extract all members from the archive to the current working
           directory. `path' specifies a different directory to extract to.
           `members' is optional and must be a subset of the list returned
           by namelist().
        """
        if members is None:
            members = self.namelist()

        for zipinfo in members:
            self.extract(zipinfo, path, pwd)

334     def _extract_member(self, member, targetpath, pwd):
335         """Extract the ZipInfo object 'member' to a physical
336            file on the path targetpath.
337         """
338         # build the destination pathname, replacing
339         # forward slashes to platform specific separators.
340         arcname = member.filename.replace('/', os.path.sep)
341 
342         if os.path.altsep:
343             arcname = arcname.replace(os.path.altsep, os.path.sep)
344         # interpret absolute pathname as relative, remove drive letter or
345         # UNC path, redundant separators, "." and ".." components.
346         arcname = os.path.splitdrive(arcname)[1]
347         arcname = os.path.sep.join(x for x in arcname.split(os.path.sep)
348                     if x not in ('', os.path.curdir, os.path.pardir))
349         if os.path.sep == '\\':
350             # filter illegal characters on Windows
351             illegal = ':<>|"?*'
352             if isinstance(arcname, unicode):
353                 table = {ord(c): ord('_') for c in illegal}
354             else:
355                 table = string.maketrans(illegal, '_' * len(illegal))
356             arcname = arcname.translate(table)
357             # remove trailing dots
358             arcname = (x.rstrip('.') for x in arcname.split(os.path.sep))
359             arcname = os.path.sep.join(x for x in arcname if x)
360 
361         targetpath = os.path.join(targetpath, arcname)
362         targetpath = os.path.normpath(targetpath)
363 
364         # Create all upper directories if necessary.
365         upperdirs = os.path.dirname(targetpath)
366         if upperdirs and not os.path.exists(upperdirs):
367             os.makedirs(upperdirs)
368 
369         if member.filename[-1] == '/':
370             if not os.path.isdir(targetpath):
371                 os.mkdir(targetpath)
372             return targetpath
373 
374         with self.open(member, pwd=pwd) as source, \
375              file(targetpath, "wb") as target:
376             shutil.copyfileobj(source, target)
377 
378         return targetpath
379 
380     def _writecheck(self, zinfo):
381         """Check for errors before writing a file to the archive."""
382         if zinfo.filename in self.NameToInfo:
383             import warnings
384             warnings.warn('Duplicate name: %r' % zinfo.filename, stacklevel=3)
385         if self.mode not in ("w", "a"):
386             raise RuntimeError, 'write() requires mode "w" or "a"'
387         if not self.fp:
388             raise RuntimeError, \
389                   "Attempt to write ZIP archive that was already closed"
390         if zinfo.compress_type == ZIP_DEFLATED and not zlib:
391             raise RuntimeError, \
392                   "Compression requires the (missing) zlib module"
393         if zinfo.compress_type not in (ZIP_STORED, ZIP_DEFLATED):
394             raise RuntimeError, \
395                   "That compression method is not supported"
396         if not self._allowZip64:
397             requires_zip64 = None
398             if len(self.filelist) >= ZIP_FILECOUNT_LIMIT:
399                 requires_zip64 = "Files count"
400             elif zinfo.file_size > ZIP64_LIMIT:
401                 requires_zip64 = "Filesize"
402             elif zinfo.header_offset > ZIP64_LIMIT:
403                 requires_zip64 = "Zipfile size"
404             if requires_zip64:
405                 raise LargeZipFile(requires_zip64 +
406                                    " would require ZIP64 extensions")
407 
408     def write(self, filename, arcname=None, compress_type=None):
409         """Put the bytes from filename into the archive under the name
410         arcname."""
411         if not self.fp:
412             raise RuntimeError(
413                   "Attempt to write to ZIP archive that was already closed")
414 
415         st = os.stat(filename)
416         isdir = stat.S_ISDIR(st.st_mode)
417         mtime = time.localtime(st.st_mtime)
418         date_time = mtime[0:6]
419         # Create ZipInfo instance to store file information
420         if arcname is None:
421             arcname = filename
422         arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
423         while arcname[0] in (os.sep, os.altsep):
424             arcname = arcname[1:]
425         if isdir:
426             arcname += '/'
427         zinfo = ZipInfo(arcname, date_time)
428         zinfo.external_attr = (st[0] & 0xFFFF) << 16L      # Unix attributes
429         if compress_type is None:
430             zinfo.compress_type = self.compression
431         else:
432             zinfo.compress_type = compress_type
433 
434         zinfo.file_size = st.st_size
435         zinfo.flag_bits = 0x00
436         zinfo.header_offset = self.fp.tell()    # Start of header bytes
437 
438         self._writecheck(zinfo)
439         self._didModify = True
440 
441         if isdir:
442             zinfo.file_size = 0
443             zinfo.compress_size = 0
444             zinfo.CRC = 0
445             zinfo.external_attr |= 0x10  # MS-DOS directory flag
446             self.filelist.append(zinfo)
447             self.NameToInfo[zinfo.filename] = zinfo
448             self.fp.write(zinfo.FileHeader(False))
449             return
450 
451         with open(filename, "rb") as fp:
452             # Must overwrite CRC and sizes with correct data later
453             zinfo.CRC = CRC = 0
454             zinfo.compress_size = compress_size = 0
455             # Compressed size can be larger than uncompressed size
456             zip64 = self._allowZip64 and \
457                     zinfo.file_size * 1.05 > ZIP64_LIMIT
458             self.fp.write(zinfo.FileHeader(zip64))
459             if zinfo.compress_type == ZIP_DEFLATED:
460                 cmpr = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
461                      zlib.DEFLATED, -15)
462             else:
463                 cmpr = None
464             file_size = 0
465             while 1:
466                 buf = fp.read(1024 * 8)
467                 if not buf:
468                     break
469                 file_size = file_size + len(buf)
470                 CRC = crc32(buf, CRC) & 0xffffffff
471                 if cmpr:
472                     buf = cmpr.compress(buf)
473                     compress_size = compress_size + len(buf)
474                 self.fp.write(buf)
475         if cmpr:
476             buf = cmpr.flush()
477             compress_size = compress_size + len(buf)
478             self.fp.write(buf)
479             zinfo.compress_size = compress_size
480         else:
481             zinfo.compress_size = file_size
482         zinfo.CRC = CRC
483         zinfo.file_size = file_size
484         if not zip64 and self._allowZip64:
485             if file_size > ZIP64_LIMIT:
486                 raise RuntimeError('File size has increased during compressing')
487             if compress_size > ZIP64_LIMIT:
488                 raise RuntimeError('Compressed size larger than uncompressed size')
489         # Seek backwards and write file header (which will now include
490         # correct CRC and file sizes)
491         position = self.fp.tell()       # Preserve current position in file
492         self.fp.seek(zinfo.header_offset, 0)
493         self.fp.write(zinfo.FileHeader(zip64))
494         self.fp.seek(position, 0)
495         self.filelist.append(zinfo)
496         self.NameToInfo[zinfo.filename] = zinfo
497 
498     def writestr(self, zinfo_or_arcname, bytes, compress_type=None):
499         """Write a file into the archive.  The contents is the string
500         'bytes'.  'zinfo_or_arcname' is either a ZipInfo instance or
501         the name of the file in the archive."""
502         if not isinstance(zinfo_or_arcname, ZipInfo):
503             zinfo = ZipInfo(filename=zinfo_or_arcname,
504                             date_time=time.localtime(time.time())[:6])
505 
506             zinfo.compress_type = self.compression
507             if zinfo.filename[-1] == '/':
508                 zinfo.external_attr = 0o40775 << 16   # drwxrwxr-x
509                 zinfo.external_attr |= 0x10           # MS-DOS directory flag
510             else:
511                 zinfo.external_attr = 0o600 << 16     # ?rw-------
512         else:
513             zinfo = zinfo_or_arcname
514 
515         if not self.fp:
516             raise RuntimeError(
517                   "Attempt to write to ZIP archive that was already closed")
518 
519         if compress_type is not None:
520             zinfo.compress_type = compress_type
521 
522         zinfo.file_size = len(bytes)            # Uncompressed size
523         zinfo.header_offset = self.fp.tell()    # Start of header bytes
524         self._writecheck(zinfo)
525         self._didModify = True
526         zinfo.CRC = crc32(bytes) & 0xffffffff       # CRC-32 checksum
527         if zinfo.compress_type == ZIP_DEFLATED:
528             co = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
529                  zlib.DEFLATED, -15)
530             bytes = co.compress(bytes) + co.flush()
531             zinfo.compress_size = len(bytes)    # Compressed size
532         else:
533             zinfo.compress_size = zinfo.file_size
534         zip64 = zinfo.file_size > ZIP64_LIMIT or \
535                 zinfo.compress_size > ZIP64_LIMIT
536         if zip64 and not self._allowZip64:
537             raise LargeZipFile("Filesize would require ZIP64 extensions")
538         self.fp.write(zinfo.FileHeader(zip64))
539         self.fp.write(bytes)
540         if zinfo.flag_bits & 0x08:
541             # Write CRC and file sizes after the file data
542             fmt = '<LQQ' if zip64 else '<LLL'
543             self.fp.write(struct.pack(fmt, zinfo.CRC, zinfo.compress_size,
544                   zinfo.file_size))
545         self.fp.flush()
546         self.filelist.append(zinfo)
547         self.NameToInfo[zinfo.filename] = zinfo
548 
549     def __del__(self):
550         """Call the "close()" method in case the user forgot."""
551         self.close()
552 
553     def close(self):
554         """Close the file, and for mode "w" and "a" write the ending
555         records."""
556         if self.fp is None:
557             return
558 
559         try:
560             if self.mode in ("w", "a") and self._didModify: # write ending records
561                 pos1 = self.fp.tell()
562                 for zinfo in self.filelist:         # write central directory
563                     dt = zinfo.date_time
564                     dosdate = (dt[0] - 1980) << 9 | dt[1] << 5 | dt[2]
565                     dostime = dt[3] << 11 | dt[4] << 5 | (dt[5] // 2)
566                     extra = []
567                     if zinfo.file_size > ZIP64_LIMIT \
568                             or zinfo.compress_size > ZIP64_LIMIT:
569                         extra.append(zinfo.file_size)
570                         extra.append(zinfo.compress_size)
571                         file_size = 0xffffffff
572                         compress_size = 0xffffffff
573                     else:
574                         file_size = zinfo.file_size
575                         compress_size = zinfo.compress_size
576 
577                     if zinfo.header_offset > ZIP64_LIMIT:
578                         extra.append(zinfo.header_offset)
579                         header_offset = 0xffffffffL
580                     else:
581                         header_offset = zinfo.header_offset
582 
583                     extra_data = zinfo.extra
584                     if extra:
585                         # Append a ZIP64 field to the extra's
586                         extra_data = struct.pack(
587                                 '<HH' + 'Q'*len(extra),
588                                 1, 8*len(extra), *extra) + extra_data
589 
590                         extract_version = max(45, zinfo.extract_version)
591                         create_version = max(45, zinfo.create_version)
592                     else:
593                         extract_version = zinfo.extract_version
594                         create_version = zinfo.create_version
595 
596                     try:
597                         filename, flag_bits = zinfo._encodeFilenameFlags()
598                         centdir = struct.pack(structCentralDir,
599                         stringCentralDir, create_version,
600                         zinfo.create_system, extract_version, zinfo.reserved,
601                         flag_bits, zinfo.compress_type, dostime, dosdate,
602                         zinfo.CRC, compress_size, file_size,
603                         len(filename), len(extra_data), len(zinfo.comment),
604                         0, zinfo.internal_attr, zinfo.external_attr,
605                         header_offset)
606                     except DeprecationWarning:
607                         print >>sys.stderr, (structCentralDir,
608                         stringCentralDir, create_version,
609                         zinfo.create_system, extract_version, zinfo.reserved,
610                         zinfo.flag_bits, zinfo.compress_type, dostime, dosdate,
611                         zinfo.CRC, compress_size, file_size,
612                         len(zinfo.filename), len(extra_data), len(zinfo.comment),
613                         0, zinfo.internal_attr, zinfo.external_attr,
614                         header_offset)
615                         raise
616                     self.fp.write(centdir)
617                     self.fp.write(filename)
618                     self.fp.write(extra_data)
619                     self.fp.write(zinfo.comment)
620 
621                 pos2 = self.fp.tell()
622                 # Write end-of-zip-archive record
623                 centDirCount = len(self.filelist)
624                 centDirSize = pos2 - pos1
625                 centDirOffset = pos1
626                 requires_zip64 = None
627                 if centDirCount > ZIP_FILECOUNT_LIMIT:
628                     requires_zip64 = "Files count"
629                 elif centDirOffset > ZIP64_LIMIT:
630                     requires_zip64 = "Central directory offset"
631                 elif centDirSize > ZIP64_LIMIT:
632                     requires_zip64 = "Central directory size"
633                 if requires_zip64:
634                     # Need to write the ZIP64 end-of-archive records
635                     if not self._allowZip64:
636                         raise LargeZipFile(requires_zip64 +
637                                            " would require ZIP64 extensions")
638                     zip64endrec = struct.pack(
639                             structEndArchive64, stringEndArchive64,
640                             44, 45, 45, 0, 0, centDirCount, centDirCount,
641                             centDirSize, centDirOffset)
642                     self.fp.write(zip64endrec)
643 
644                     zip64locrec = struct.pack(
645                             structEndArchive64Locator,
646                             stringEndArchive64Locator, 0, pos2, 1)
647                     self.fp.write(zip64locrec)
648                     centDirCount = min(centDirCount, 0xFFFF)
649                     centDirSize = min(centDirSize, 0xFFFFFFFF)
650                     centDirOffset = min(centDirOffset, 0xFFFFFFFF)
651 
652                 endrec = struct.pack(structEndArchive, stringEndArchive,
653                                     0, 0, centDirCount, centDirCount,
654                                     centDirSize, centDirOffset, len(self._comment))
655                 self.fp.write(endrec)
656                 self.fp.write(self._comment)
657                 self.fp.flush()
658         finally:
659             fp = self.fp
660             self.fp = None
661             if not self._filePassed:
662                 fp.close()
663 
664 ZipFile
665 
666 tarfile
TarFile

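7. json & pickle modules

Two modules used for serialization:

  • json: converts between strings and Python data types
  • pickle: converts between a Python-specific byte format and Python data types

The json module provides four functions: dumps, dump, loads, load

The pickle module provides four functions: dumps, dump, loads, load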
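
The difference between the two modules is easiest to see side by side. The following is a minimal sketch; the sample dictionary is made up for illustration. dumps/loads work on in-memory values, while dump/load do the same against an open file object. Only unpickle data you trust, since pickle can execute arbitrary code while loading.

```python
import json
import pickle

# Hypothetical sample data used only for this illustration.
data = {"name": "alex", "age": 22, "langs": ["python", "go"]}

# json: Python object <-> str (text that other languages can read too)
text = json.dumps(data)
restored_from_json = json.loads(text)

# pickle: Python object <-> bytes (a Python-specific binary format)
blob = pickle.dumps(data)
restored_from_pickle = pickle.loads(blob)

print(type(text).__name__, type(blob).__name__)
print(restored_from_json == data, restored_from_pickle == data)
```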

8. shelve module

The shelve module is a simple key-value store that persists in-memory data to a file. It can persist any Python data type that pickle supports.

import shelve

d = shelve.open('shelve_test')  # open (or create) the shelf file

class Test(object):
    def __init__(self, n):
        self.n = n

t = Test(123)
t2 = Test(123334)

name = ["alex", "rain", "test"]
d["test"] = name  # persist a list
d["t1"] = t       # persist a class instance
d["t2"] = t2

d.close()
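
Reading the data back works the same way as reading a dictionary. The sketch below assumes a throwaway temporary directory (the path and key names are made up for illustration) and uses only built-in types as values.

```python
import os
import shelve
import tempfile

# Hypothetical location: a temporary directory, so the sketch leaves the
# current directory untouched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "shelve_demo")

# Write: the shelf behaves like a dict whose values are pickled to disk.
with shelve.open(path) as db:
    db["names"] = ["alex", "rain", "test"]
    db["info"] = {"age": 22, "job": "it"}

# Read: reopen the same path and look values up by key.
with shelve.open(path) as db:
    names = db["names"]
    info = db.get("info")
    keys = sorted(db.keys())

print(names, info, keys)
```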

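9. XML processing

XML is a format for exchanging data between different languages or programs. It serves much the same purpose as json, although json is simpler to use. Back in the dark ages before json existed, XML was the only choice, and to this day the interfaces of many systems at traditional companies, for example in finance, are still mainly XML.

The XML format looks like the following; the data structure is expressed through <> nodes:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>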

XML is supported in every language. In Python you can work with it using the following module:

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()
print(root.tag)

# walk the whole XML document
for child in root:
    print(child.tag, child.attrib)
    for i in child:
        print(i.tag, i.text)

# iterate over the year nodes only
for node in root.iter('year'):
    print(node.tag, node.text)
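
The same traversal can be tried without a file on disk. This is a minimal sketch that parses a shortened, made-up document of the same shape with ET.fromstring; the XML_TEXT name and its contents are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# A shortened, hypothetical document shaped like the sample above.
XML_TEXT = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2008</year>
    </country>
    <country name="Panama">
        <rank>69</rank>
        <year>2011</year>
    </country>
</data>"""

root = ET.fromstring(XML_TEXT)  # returns the root Element directly

# findall() looks at direct children; find() returns the first matching child.
ranks = {c.get("name"): int(c.find("rank").text) for c in root.findall("country")}

# iter() walks the whole tree, at any depth.
years = [node.text for node in root.iter("year")]

print(root.tag, ranks, years)
```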

Modifying and deleting XML content

import xml.etree.ElementTree as ET

tree = ET.parse("xmltest.xml")
root = tree.getroot()

# modify
for node in root.iter('year'):
    new_year = int(node.text) + 1
    node.text = str(new_year)
    node.set("updated", "yes")

tree.write("xmltest.xml")


# delete nodes
for country in root.findall('country'):
    rank = int(country.find('rank').text)
    if rank > 50:
        root.remove(country)

tree.write('output.xml')

Creating an XML document yourself

import xml.etree.ElementTree as ET


new_xml = ET.Element("namelist")
name = ET.SubElement(new_xml, "name", attrib={"enrolled": "yes"})
age = ET.SubElement(name, "age", attrib={"checked": "no"})
sex = ET.SubElement(name, "sex")
sex.text = '33'
name2 = ET.SubElement(new_xml, "name", attrib={"enrolled": "no"})
age = ET.SubElement(name2, "age")
age.text = '19'

et = ET.ElementTree(new_xml)  # build the document object
et.write("test.xml", encoding="utf-8", xml_declaration=True)

ET.dump(new_xml)  # print the generated XML

10. PyYAML module

Python can also handle the YAML document format easily; you just need to install an extra module. Reference documentation: http://pyyaml.org/wiki/PyYAMLDocumentation

11. ConfigParser module

Used to generate and modify common configuration files. In Python 3.x the module was renamed to configparser.

Here is a configuration file format that many programs use:

[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no

How would you generate a file like this with Python?

import configparser

config = configparser.ConfigParser()
config["DEFAULT"] = {'ServerAliveInterval': '45',
                     'Compression': 'yes',
                     'CompressionLevel': '9'}

config['bitbucket.org'] = {}
config['bitbucket.org']['User'] = 'hg'
config['topsecret.server.com'] = {}
topsecret = config['topsecret.server.com']
topsecret['Port'] = '50022'     # mutates the parser
topsecret['ForwardX11'] = 'no'  # same here
config['DEFAULT']['ForwardX11'] = 'yes'
with open('example.ini', 'w') as configfile:
    config.write(configfile)

Once written, the file can be read back in.

>>> import configparser
>>> config = configparser.ConfigParser()
>>> config.sections()
[]
>>> config.read('example.ini')
['example.ini']
>>> config.sections()
['bitbucket.org', 'topsecret.server.com']
>>> 'bitbucket.org' in config
True
>>> 'bytebong.com' in config
False
>>> config['bitbucket.org']['User']
'hg'
>>> config['DEFAULT']['Compression']
'yes'
>>> topsecret = config['topsecret.server.com']
>>> topsecret['ForwardX11']
'no'
>>> topsecret['Port']
'50022'
>>> for key in config['bitbucket.org']: print(key)
...
user
serveraliveinterval
compression
compressionlevel
forwardx11
>>> config['bitbucket.org']['ForwardX11']
'yes'

configparser syntax for adding, deleting, modifying, and querying

Sample configuration file (i.cfg):

[section1]
k1 = v1
k2:v2
[section2]
k1 = v1

import configparser
config = configparser.ConfigParser()
config.read('i.cfg')
# ########## read ##########
#secs = config.sections()
#print(secs)
#options = config.options('section2')
#print(options)
#item_list = config.items('section2')
#print(item_list)
#val = config.get('section1', 'k1')
#val = config.getint('section1', 'k1')   # only works when the value is an integer
# ########## modify ##########
#sec = config.remove_section('section1')
#config.write(open('i.cfg', "w"))
#sec = config.has_section('wupeiqi')
#sec = config.add_section('wupeiqi')
#config.write(open('i.cfg', "w"))
#config.set('section2', 'k1', '11111')   # values must be strings in Python 3
#config.write(open('i.cfg', "w"))
#config.remove_option('section2', 'k1')
#config.write(open('i.cfg', "w"))
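
The read, modify, and write calls listed above can be exercised without touching the disk. This is a minimal sketch, assuming a made-up SAMPLE string in place of i.cfg and an in-memory buffer in place of the output file; the option value 42 and the section3 name are also invented for the illustration.

```python
import configparser
import io

# Hypothetical configuration text standing in for the i.cfg file.
SAMPLE = """\
[section1]
k1 = v1
k2 = 42

[section2]
k1 = v1
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# read
sections = config.sections()
number = config.getint("section1", "k2")  # converts the stored string to int

# modify (all values are stored as strings)
config.set("section2", "k1", "11111")
config.add_section("section3")
config.remove_option("section1", "k1")

# write to an in-memory buffer instead of open('i.cfg', 'w')
buffer = io.StringIO()
config.write(buffer)
result = buffer.getvalue()
print(result)
```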

12. hashlib module

Used for hashing operations. It replaced the md5 and sha modules, which no longer exist in 3.x, and mainly provides the SHA1, SHA224, SHA256, SHA384, SHA512, and MD5 algorithms.

import hashlib

m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
print(m.hexdigest())
m.update(b"It's been a long time since last time we ...")

print(m.digest())          # hash as raw bytes
print(len(m.hexdigest()))  # length of the hexadecimal hash string
'''
def digest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of binary data. """
    pass

def hexdigest(self, *args, **kwargs): # real signature unknown
    """ Return the digest value as a string of hexadecimal digits. """
    pass

'''
import hashlib

# In Python 3, update() requires bytes, hence the b'' literals below.

# ######## md5 ########

hash = hashlib.md5()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha1 ########

hash = hashlib.sha1()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha256 ########

hash = hashlib.sha256()
hash.update(b'admin')
print(hash.hexdigest())


# ######## sha384 ########

hash = hashlib.sha384()
hash.update(b'admin')
print(hash.hexdigest())

# ######## sha512 ########

hash = hashlib.sha512()
hash.update(b'admin')
print(hash.hexdigest())
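
Two points from the examples above are worth checking directly: repeated update() calls hash the concatenation of everything fed in so far, and text has to be encoded to bytes first. A minimal sketch, with variable names chosen for this illustration:

```python
import hashlib

# Incremental updates are equivalent to hashing the concatenated input once.
m = hashlib.md5()
m.update(b"Hello")
m.update(b"It's me")
incremental = m.hexdigest()
one_shot = hashlib.md5(b"HelloIt's me").hexdigest()

# str values must be encoded to bytes before hashing.
admin_md5 = hashlib.md5("admin".encode("utf-8")).hexdigest()
admin_sha256 = hashlib.sha256("admin".encode("utf-8")).hexdigest()

print(incremental == one_shot)
print(admin_md5)
print(len(admin_md5), len(admin_sha256))  # hex length is twice the digest size
```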

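Not enough? Python also has an hmac module, which internally combines the key we supply with the message and then hashes the result.

A hash-based message authentication code, HMAC for short, is an authentication mechanism built on the message authentication code (MAC). When HMAC is used, the two communicating parties judge whether a message is genuine by checking a value computed with the shared authentication key K.

It is typically used to authenticate messages in network communication. The two parties must first agree on a key, like a secret passphrase. The sender computes the HMAC of the message with the key and sends it along with the message; the receiver computes the HMAC of the received plaintext with the same key and compares the result with the sender's value. If the two are equal, both the authenticity of the message and the legitimacy of the sender are verified.

import hmac
h = hmac.new(b'wueiqi', digestmod='md5')  # key must be bytes; digestmod is required since Python 3.8
h.update(b'hellowo')
print(h.hexdigest())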
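
The sender and receiver steps described above can be sketched as follows. The key, the helper names sign and verify, and the choice of SHA-256 are assumptions made for this illustration; hmac.compare_digest is used for the comparison because it takes constant time.

```python
import hashlib
import hmac

SHARED_KEY = b"agreed-secret"  # hypothetical key both sides agreed on in advance


def sign(message: bytes) -> str:
    """Sender side: compute the HMAC of the message with the shared key."""
    return hmac.new(SHARED_KEY, message, digestmod=hashlib.sha256).hexdigest()


def verify(message: bytes, signature: str) -> bool:
    """Receiver side: recompute the HMAC and compare it with the received one."""
    return hmac.compare_digest(sign(message), signature)


msg = b"hellowo"
sig = sign(msg)
ok = verify(msg, sig)                # unchanged message
tampered = verify(b"hellowo!", sig)  # message altered in transit
print(ok, tampered)
```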

13. subprocess module

Examples of commonly used subprocess functions

# run a command and return its exit status (0 or non-zero)
>>> retcode = subprocess.call(["ls", "-l"])

# run a command; return normally if the exit status is 0, otherwise raise an exception
>>> subprocess.check_call(["ls", "-l"])
0

# take a command as a string and return a tuple: the first element is the exit status, the second is the command output
>>> subprocess.getstatusoutput('ls /bin/ls')
(0, '/bin/ls')

# take a command as a string and return its output
>>> subprocess.getoutput('ls /bin/ls')
'/bin/ls'

# run a command and return its output; note that the output is returned, not printed. Here it is stored in res
>>> res=subprocess.check_output(['ls','-l'])
>>> res
b'total 0\ndrwxr-xr-x 12 alex staff 408 Nov 2 11:05 OldBoyCRM\n'

# all of the functions above are wrappers around subprocess.Popen
poll()
 Check if child process has terminated. Returns returncode

wait()
 Wait for child process to terminate. Returns returncode attribute.


terminate() kills the process that was started
communicate() waits for the task to finish

stdin standard input

stdout standard output

stderr standard error

pid
 The process ID of the child process.

# example
>>> p = subprocess.Popen("df -h|grep disk",stdin=subprocess.PIPE,stdout=subprocess.PIPE,shell=True)
>>> p.stdout.read()
b'/dev/disk1 465Gi 64Gi 400Gi 14% 16901472 104938142 14% /\n'
>>> subprocess.run(["ls", "-l"])  # doesn't capture output
CompletedProcess(args=['ls', '-l'], returncode=0)
>>> subprocess.run("exit 1", shell=True, check=True)
Traceback (most recent call last):
  ...
subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1
>>> subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0,
stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n')
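
The subprocess.run calls above depend on ls and a Unix shell. The sketch below shows the same two ideas (capturing output, and check=True raising on failure) in a portable form by running a child Python interpreter through sys.executable; the child commands are made up for illustration.

```python
import subprocess
import sys

# Capture the child's output as text instead of letting it print to the terminal.
completed = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,  # decode bytes to str
)
output = completed.stdout.strip()
returncode = completed.returncode

# check=True turns a non-zero exit status into CalledProcessError.
try:
    subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"], check=True)
    failed_code = None
except subprocess.CalledProcessError as exc:
    failed_code = exc.returncode

print(output, returncode, failed_code)
```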

Calling subprocess.run(...) is the recommended approach and covers most needs. If you need more complex interaction with the system, you can use subprocess.Popen() instead, with the following syntax:

p = subprocess.Popen(r"find / -size +1000000 -exec ls -shl {} \;",shell=True,stdout=subprocess.PIPE)
print(p.stdout.read())

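Available parameters:

    • args: the command to run; can be a string or a sequence type (e.g. list, tuple)
    • bufsize: buffering. 0 = unbuffered, 1 = line buffered, other positive values = buffer size, negative = system default buffering
    • stdin, stdout, stderr: the program's standard input, output, and error handles, respectively
    • preexec_fn: Unix only; specifies a callable object that is called in the child process just before the command is executed
    • close_fds: on Windows, if close_fds is set to True, the newly created child process does not inherit the parent's input, output, and error pipes.
      So (before Python 3.7) you cannot set close_fds to True and also redirect the child's standard input, output, and error (stdin, stdout, stderr).
    • shell: if True, the command is executed through the shell
    • cwd: sets the child process's working directory
    • env: specifies the child process's environment variables. If env = None, the child inherits the parent's environment.
    • universal_newlines: line endings differ between systems; True -> use \n uniformly
    • startupinfo and creationflags: Windows only
      They are passed to the underlying CreateProcess() function to set properties of the child process, such as the appearance of the main window, the process priority, and so on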

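Commands typed at a terminal fall into two kinds:

  • Type the command and get the output directly, e.g. ifconfig
  • Type the command to enter an environment that then expects further input, e.g. python

Example of a command that needs interaction

import subprocess
obj = subprocess.Popen(["python"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
obj.stdin.write('print(1)\n')
obj.stdin.write('print(2)\n')
obj.stdin.write('print(3)\n')
obj.stdin.write('print(4)\n')
out_error_list = obj.communicate(timeout=10)  # returns (stdout, stderr)
print(out_error_list)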
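
An alternative to writing to stdin line by line is to hand the whole input to communicate(), which sends it, closes stdin, and collects both output streams. A minimal sketch, assuming the child is the same interpreter that runs the script (sys.executable) and a made-up three-line script as input:

```python
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable],          # the interpreter running this script
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    universal_newlines=True,   # exchange str instead of bytes
)

# With stdin connected to a pipe, the child interpreter reads its program from it.
script = "print(1)\nprint(2)\nprint(3)\n"
out, err = child.communicate(input=script, timeout=30)
lines = out.split()

print(lines, child.returncode)
```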

Using subprocess to enter the sudo password automatically

import subprocess
def mypass():
    mypass = '123'  # or get the password from anywhere
    return mypass
echo = subprocess.Popen(['echo', mypass()],
                        stdout=subprocess.PIPE,
                        )
sudo = subprocess.Popen(['sudo', '-S', 'iptables', '-L'],
                        stdin=echo.stdout,
                        stdout=subprocess.PIPE,
                        )
end_of_pipe = sudo.stdout
print("Password ok \n Iptables Chains %s" % end_of_pipe.read())

14. logging module

Many programs need to keep logs, and the logs contain both normal access records and output such as errors and warnings. Python's logging module provides a standard logging interface through which you can store logs in various formats. Logging has five levels: debug(), info(), warning(), error(), and critical(). Let's see how to use it.

Simplest usage

import logging
logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")
# output
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down

Writing the log to a file is just as simple

import logging
logging.basicConfig(filename='example.log',level=logging.INFO)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')

In the line below, level=logging.INFO sets the logging level to INFO, which means only messages at INFO level or higher are written to the file. In this example the first message is not recorded; if you want debug messages recorded as well, change the level to DEBUG.

logging.basicConfig(filename='example.log',level=logging.INFO)

The log format above is missing the time. A log without timestamps is not much use, so let's add it!

import logging
logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p')
logging.warning('is when this event was logged.')
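
The level filtering described above can be observed without creating a log file. This is a minimal sketch that attaches a handler writing to an in-memory stream; the logger name "demo" and the format string are assumptions chosen for the illustration.

```python
import io
import logging

stream = io.StringIO()  # stands in for the log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("demo")  # hypothetical logger name
logger.setLevel(logging.INFO)       # anything below INFO is discarded
logger.propagate = False            # keep these records away from the root logger
logger.addHandler(handler)

logger.debug("This message is below INFO and is dropped")
logger.info("So should this")
logger.warning("And this, too")

captured = stream.getvalue().splitlines()
print(captured)
```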

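15. re regular expressions

Basic regular expression metacharacters and syntax

[Image: table of basic regular expression metacharacters and syntax]

Common regular expression symbols

'.'     matches any single character except \n by default; with the DOTALL flag it matches any character, including newline
'^'     matches the start of the string; with the MULTILINE flag it also matches at the start of each line, e.g. re.search(r"^a","\nabc\neee",flags=re.MULTILINE)
'$'     matches the end of the string; with MULTILINE it also matches at the end of each line, e.g. re.search("foo$","bfoo\nsdfsf",flags=re.MULTILINE).group()
'*'     matches the preceding character 0 or more times; re.findall("ab*","cabb3abcbbac") gives ['abb', 'ab', 'a']
'+'     matches the preceding character 1 or more times; re.findall("ab+","ab+cd+abb+bba") gives ['ab', 'abb']
'?'     matches the preceding character 1 or 0 times
'{m}'   matches the preceding character exactly m times
'{n,m}' matches the preceding character n to m times; re.findall("ab{1,3}","abb abc abbcbbb") gives ['abb', 'ab', 'abb']
'|'     matches the pattern on the left or on the right of |; re.search("abc|ABC","ABCBabcCD").group() gives 'ABC'
'(...)' grouping; re.search("(abc){2}a(123|456)c", "abcabca456c").group() gives 'abcabca456c'
'\A'    matches only at the start of the string; re.search(r"\Aabc","alexabc") finds nothing
'\Z'    matches only at the end of the string, similar to $
'\d'    matches a digit 0-9
'\D'    matches a non-digit
'\w'    matches [A-Za-z0-9_]
'\W'    matches anything not in [A-Za-z0-9_]
'\s'    matches whitespace such as space, \t, \n, \r; re.search(r"\s+","ab\tc1\n3").group() gives '\t'
'(?P<name>...)' named group; re.search("(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})","371481199306143242").groupdict("city") gives {'province': '3714', 'city': '81', 'birthday': '1993'}

The most commonly used matching functions

re.match    matches from the beginning of the string
re.search   matches anywhere in the string
re.findall  returns all matches as the elements of a list
re.split    splits the string, using the matches as separators
re.sub      matches and replaces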
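
The five functions can be compared on a single input. A minimal sketch; the sample string reuses one from the examples in this section and the variable names are chosen for illustration.

```python
import re

text = "fa123uu888asf"

at_start = re.match(r"\d+", text)        # None: the string does not start with a digit
first = re.search(r"\d+", text).group()  # first run of digits anywhere in the string
every = re.findall(r"\d+", text)         # all runs of digits
pieces = re.split(r"\d+", text)          # text between the runs of digits
masked = re.sub(r"\d+", "#", text)       # each run of digits replaced by '#'

print(at_start, first, every, pieces, masked)
```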

1) match(pattern, string, flags=0)

Matches the pattern against the string starting from the beginning and returns a single match

  • pattern: the regular expression
  • string: the string to match against
  • flags: flags that control how the regular expression is matched

import re

obj = re.match(r'\d+', '123uuasf')
if obj:
    print(obj.group())

# flags
I = IGNORECASE = sre_compile.SRE_FLAG_IGNORECASE # ignore case
L = LOCALE = sre_compile.SRE_FLAG_LOCALE # assume current 8-bit locale
U = UNICODE = sre_compile.SRE_FLAG_UNICODE # assume unicode locale
M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline
X = VERBOSE = sre_compile.SRE_FLAG_VERBOSE # ignore whitespace and comments

2) search(pattern, string, flags=0)

Scans the string for the pattern and returns the first match

import re

obj = re.search(r'\d+', 'u123uu888asf')
if obj:
    print(obj.group())

3) group and groups

a = "123abc456"
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group())

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(0))
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(1))
print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).group(2))

print(re.search("([0-9]*)([a-z]*)([0-9]*)", a).groups())
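
What each call returns may be clearer with the values spelled out. A minimal sketch: group() and group(0) are the whole match, group(n) is the n-th parenthesized group, and groups() is a tuple of all groups. The named-group pattern uses the names province, city, and birthday taken from the symbol table earlier in this section.

```python
import re

m = re.search(r"([0-9]*)([a-z]*)([0-9]*)", "123abc456")
whole = m.group()        # same as m.group(0): the entire match
first = m.group(1)       # first parenthesized group
second = m.group(2)      # second parenthesized group
all_groups = m.groups()  # every group, as a tuple

# Named groups are read back as a dict with groupdict().
named = re.search(
    r"(?P<province>[0-9]{4})(?P<city>[0-9]{2})(?P<birthday>[0-9]{4})",
    "371481199306143242",
)
fields = named.groupdict()

print(whole, first, second, all_groups, fields)
```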

4) findall(pattern, string, flags=0)

The two approaches above each return a single match, that is, only one match in the string. To get every element in the string that matches, use findall.

import re

obj = re.findall(r'\d+', 'fa123uu888asf')
print(obj)

5) sub(pattern, repl, string, count=0, flags=0)

Replaces the matched text

content = "123abc456"
new_content = re.sub(r'\d+', 'sb', content)
# new_content = re.sub(r'\d+', 'sb', content, 1)
print(new_content)

More powerful than str.replace

6) split(pattern, string, maxsplit=0, flags=0)

Splits the string at each match of the pattern

content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'\*', content)
# new_content = re.split(r'\*', content, 1)
print(new_content)
content = "'1 - 2 * ((60-30+1*(9-2*5/3+7/3*99/4*2998+10*568/14))-(-4*3)/(16-3*2) )'"
new_content = re.split(r'[\+\-\*\/]+', content)
# new_content = re.split(r'\*', content, 1)
print(new_content)
inpp = '1-2*((60-30 +(-40-5)*(9-2*5/3 + 7 /3*99/4*2998 +10 * 568/14 )) - (-4*3)/ (16-3*2))'
inpp = re.sub(r'\s*', '', inpp)
new_content = re.split(r'\(([\+\-\*\/]?\d+[\+\-\*\/]?\d+){1}\)', inpp, 1)
print(new_content)

More powerful than str.split
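
The long expressions above make the results hard to read, so here is a minimal sketch on a short, made-up expression. It shows splitting on several operators at once, limiting the number of splits with maxsplit, and keeping the separators by wrapping the pattern in a capturing group.

```python
import re

expr = "1-2*60+30/5"  # hypothetical short expression

# Split on any single arithmetic operator.
operands = re.split(r"[+\-*/]", expr)

# maxsplit=1 stops after the first separator.
first_only = re.split(r"[+\-*/]", expr, maxsplit=1)

# A capturing group keeps the separators in the result list.
with_ops = re.split(r"([+\-*/])", expr)

print(operands)
print(first_only)
print(with_ops)
```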

Calculator source code

Reposted from: https://www.cnblogs.com/cs-qzyy/p/8652672.html
