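Python notes

Common usage

Writing Python more efficiently

Assignment

List comprehensions

A list comprehension looks like [i*2 for i in items]

Ternary expression

1 if 5>3 else 0 evaluates to 1

Format placeholders

print "Printing the numeric argument num: %d" % (num)
Integer: %d, float: %f, string: %s, generic (repr) format: %r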

yield

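yield is a keyword similar to return, except that a function containing it returns a generator.
A generator is iterable, but you can only read it once, because it does not keep all of its values in memory; it produces them on the fly:

# [] builds a list, () builds a generator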
mygenerator = (x*x for x in range(3))
for i in mygenerator:
    print i
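
The generator expression above also has a function form. A minimal sketch, assuming a hypothetical helper named squares (not part of the original notes), showing yield and the read-once behaviour:

```python
def squares(n):
    """Yield x*x for x in range(n), one value at a time."""
    for x in range(n):
        yield x * x  # execution pauses here until the next value is requested

gen = squares(3)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # the generator is already exhausted, so this is empty
```

This runs unchanged on Python 2 and Python 3.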

Data structures

Tuples

Python tuples are similar to lists, except that the elements of a tuple cannot be modified.
tup1 = ('physics', 'chemistry', 1997, 2000)

Dictionaries

Iteration

d = {"name":"python","english":33,"math":35}

print "##for in "
for i in d:
    print "d[%s]=" % i,d[i]

print "##items"
for (k,v) in d.items():
    print "d[%s]=" % k,v

print "##iteritems"  # Python 2 only
for k,v in d.iteritems():
    print "d[%s]=" % k,v

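dict1.update(dict2) copies dict2's key-value pairs into dict1, overwriting the values of keys that already exist

Checking whether a key exists

d = {'name':'bob','value':'5'}
#1 (Python 2 only)
print d.has_key('name') #True
#2 (preferred)
print 'name' in d #True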

Merging (summing the values of shared keys)

x = {'apple':1,'banana':2}
y = {'banana':10,'pear':11}
for k, v in y.items():
    if k in x.keys():
        x[k] += v
    else:
        x[k] = v
#>>>x  {'pear':11,'apple':1,'banana':12}
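
The loop above changes x in place. A minimal sketch of a non-mutating variant, assuming a hypothetical helper named merge_sum and numeric values:

```python
def merge_sum(x, y):
    """Return a new dict in which the values of shared keys are added together."""
    merged = dict(x)  # copy, so neither input is modified
    for k, v in y.items():
        merged[k] = merged.get(k, 0) + v  # a missing key counts as 0
    return merged

result = merge_sum({'apple': 1, 'banana': 2}, {'banana': 10, 'pear': 11})
```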

list

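Concatenating lists

list1+list2            # returns a new list
list1.extend(list2)    # modifies list1 in place

Removing duplicates (the original order is not kept)

list(set(list1))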
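
Converting to a set discards the original order. A minimal sketch of an order-preserving alternative; the helper name unique_in_order is hypothetical and the items are assumed to be hashable:

```python
def unique_in_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

deduped = unique_in_order([3, 1, 3, 2, 1])
```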

List difference

Convert both lists to sets, subtract, then convert the result back to a list
newlist = list(set(list1) - set(list2))

Counting how many times each element occurs

my_dict = {i:MyList.count(i) for i in MyList}
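
The comprehension above calls count() once per element, so it rescans the list many times. A sketch of the same count using collections.Counter from the standard library (Python 2.7+); the sample list is made up:

```python
from collections import Counter

my_list = ['a', 'b', 'a', 'c', 'a']
counts = Counter(my_list)            # a single pass over the list
as_dict = dict(counts)               # plain dict, same shape as my_dict above
top = counts.most_common(1)          # the most frequent element with its count
```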

Appending many items to a list

list.extend(addlist)

Iterating over two lists

Zip the two lists together, then iterate over the pairs

list1 = ['a','b','c']
list2 = [1,2,3]
pairs = zip(list1,list2)
#[('a',1),('b',2),('c',3)]
for i,j in pairs:
    print i,j

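Getting a random sub-list

rand_items = [items[random.randrange(len(items))] for _ in range(4)]  # may pick the same item twice
# or, without repeats
rand_items = random.sample(items, n)

To shuffle a list in place, use random.shuffle

Sets: bulk update and single add

a = []
b = set(a)
b.update(list1)  # add every element of an iterable
b.add('x')       # add a single element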

list.pop()

Removes the last element and returns its value

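Common functions

Math

Rounding

import math
# In Python 2 all three return floats (in Python 3 they return ints)
math.ceil(2.3)   # round up    3.0
math.floor(2.3)  # round down  2.0
round(2.3)       # round to nearest (built-in, not in math)  2.0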

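Multiprocessing and multithreading

Multiprocessing

My impression: suited to running the same task many times
Standard library: multiprocessing
Python process pool: multiprocessing.Pool
Liao Xuefeng: Python processes
Introduction to Python's multiprocessing module
Related reading: the GIL

Reportedly gevent is a better replacement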

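Multithreading

My impression: suited to running the different subtasks of one job at the same time
Zhang Yanan: the threadpool thread pool
threadpool GitHub page (newer than the version on pip)

threading

setDaemon(True) marks the thread as unimportant: the main thread does not need to wait for daemon threads and can exit directly. If you want to wait for child threads to finish before exiting, leave it unset or explicitly call setDaemon(False). (When using threads to traverse a text file, setting it to True can give fewer results than expected, because the main thread ended too early.)
join() waits until the thread ends or, when a timeout argument is given, until the timeout expires. If the main thread has other work to do besides waiting for the thread, there is no need to call join(); call it only when you want to wait for the thread to end (http://honglei.blog.51cto.com/3143746/933118). (PS: but does not calling it cause a deadlock???)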

queue

http://python.usyiyi.cn/python_278/library/queue.html
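queue.join() blocks until every item put on the queue has been marked done with task_done(), then lets the caller continue

Exiting threads

1. In the class constructor add:
self.setDaemon(True)
2. In run(), call queue.task_done() to signal that one queue item has been handled (for example once per item taken in the loop)
3. In the main thread, call queue.join() to wait until all queued items are handled, then carry on with later steps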
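
The three steps above can be sketched as follows. This is only an illustration: the Worker class, the squaring task and the names tasks and results are hypothetical, not taken from the original notes.

```python
import threading

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class Worker(threading.Thread):
    """Daemon thread that squares numbers taken from a queue."""

    def __init__(self, tasks, results):
        threading.Thread.__init__(self)
        self.daemon = True          # step 1: same effect as setDaemon(True)
        self.tasks = tasks
        self.results = results

    def run(self):
        while True:
            n = self.tasks.get()    # blocks until an item is available
            try:
                self.results.append(n * n)
            finally:
                self.tasks.task_done()  # step 2: one call per item taken

tasks = queue.Queue()
results = []
for _ in range(3):
    Worker(tasks, results).start()
for n in range(10):
    tasks.put(n)
tasks.join()  # step 3: returns once task_done() was called for every item
```

Because the workers are daemon threads, they do not keep the program alive once the main thread finishes after tasks.join().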

Advanced: coroutines with gevent (to be added)

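Files and paths

Rename: os.rename(old, new)
Delete a file: os.remove(file)
List the entries in a directory: os.listdir(path)
Get the current working directory: os.getcwd()
Change the working directory: os.chdir(newdir)
Create nested directories: os.makedirs(r"c:\python\test")
Create a single directory: os.mkdir("test")
Remove nested directories: os.removedirs(r"c:\python") # removes the last directory in the path, then each parent directory that is left empty
Remove a single directory: os.rmdir("test")
Get file attributes: os.stat(file)
Change file permissions: os.chmod(file, mode); change timestamps: os.utime(file, times)
Run an operating system command: os.system("dir")
Replace the current process with a new program: os.execv(), os.execvp()
Run a program in the background: os.spawnv()
Terminate the current process: sys.exit(), os._exit()
Split off the file name: os.path.split(r"c:\python\hello.py") --> ("c:\python", "hello.py")
Split off the extension: os.path.splitext(r"c:\python\hello.py") --> ("c:\python\hello", ".py")
Get the directory name: os.path.dirname(r"c:\python\hello.py") --> "c:\python"
Get the file name: os.path.basename(r"c:\python\hello.py") --> "hello.py"
Check whether a file or directory exists: os.path.exists(r"c:\python\hello.py") --> True
Check whether a path is absolute: os.path.isabs(r".\python") --> False
Check whether a path is a directory: os.path.isdir(r"c:\python") --> True
Check whether a path is a file: os.path.isfile(r"c:\python\hello.py") --> True
Check whether a path is a symbolic link: os.path.islink(r"c:\python\hello.py") --> False
Get the file size: os.path.getsize(filename)
Walk all files under a directory: os.walk() (os.path.walk() exists only in Python 2)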
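
A short sketch that exercises a few of the calls listed above on a throwaway directory. The directory layout and the file name hello.py are made up for the example:

```python
import os
import tempfile

# Build a small temporary tree: <root>/sub/hello.py
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
path = os.path.join(root, "sub", "hello.py")
with open(path, "w") as f:
    f.write("print('hello')\n")

directory, filename = os.path.split(path)   # parent directory and file name
stem, ext = os.path.splitext(filename)      # name without extension, extension

# os.walk yields (dirpath, dirnames, filenames) for every directory under root
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        found.append(os.path.join(dirpath, name))
```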

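CSV handling

Python CSV file handling: reading and writing

Errors

ValueError: too many values to unpack split

If a line is malformed, i,j,k = line.split(',') raises this error; catch the exception and skip the line

for line in open('csvname.csv'):
    try:
        i,j,k = line.split(',')
    except ValueError:
        continue  # skip malformed lines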
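
A plain split(',') also breaks on fields that contain a quoted comma. A minimal sketch using the standard csv module instead, written for Python 3; the sample rows are made up:

```python
import csv
import io

# Hypothetical sample data: the second row has too few fields,
# the third row has a comma inside a quoted field
data = io.StringIO(u'a,1,x\nbroken line\n"b,c",2,y\n')

rows = []
for fields in csv.reader(data):
    if len(fields) != 3:
        continue  # skip malformed rows instead of raising ValueError
    i, j, k = fields
    rows.append((i, j, k))
```

With a real file, open('csvname.csv', newline='') would take the place of the StringIO object.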

Serialization and deserialization

json

# serialize: object to JSON
import json
d = dict(name='Bob', age=20, score=88)
json.dumps(d)
# deserialize
json_str = '{"age":20, "score":88, "name": "Bob"}'
json.loads(json_str)

Reading a Scrapy JSON Lines file

import json
lines = []
with open('lines.json','r') as f:
    for line in f:
        lines.append(json.loads(line))

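Writing JSON that contains Chinese text to a file

By default json.dumps escapes non-ASCII characters, producing output like "\u535a\u5ba2\u56ed".
To emit the Chinese characters themselves, set ensure_ascii=False; drop the indent argument to get single-line output, as in this snippet: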
json.dumps({'text':"中文"},ensure_ascii=False,indent=2)
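
A minimal sketch of the full round trip to a file, written for Python 3. The output path is a temporary location chosen for the example; opening the file with an explicit UTF-8 encoding is an assumption about what the target file should contain.

```python
import io
import json
import os
import tempfile

data = {'text': u'中文'}
path = os.path.join(tempfile.mkdtemp(), 'out.json')  # hypothetical output file

# ensure_ascii=False keeps the characters readable in the file
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(json.dumps(data, ensure_ascii=False))

with io.open(path, 'r', encoding='utf-8') as f:
    raw = f.read()
loaded = json.loads(raw)
```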

pickle

# cPickle is faster than pickle (Python 2 only; in Python 3, pickle already uses the C implementation when available)
try:
    import cPickle as pickle
except ImportError:
    import pickle

# serialize
table = {
    'a':[1,2,3],
    'b':['spam','eggs'],
    'c':{'name':'bob'}
}
mydb = open('dbase','wb')  # pickle files should be opened in binary mode
pickle.dump(table,mydb)
mydb.close()
# deserialize
mydb = open('dbase','rb')
table = pickle.load(mydb)
mydb.close()

shelve

shelve keys must be strings, while values can be any picklable Python object; it serializes objects to a file as key-value pairs

# store data
import shelve
dbase = shelve.open('mydbase')
object1 = ['The','bright',('side','of'),['life']]
object2 = {'name':'Brian','age':33,'motto':object1}
dbase['brian'] = object2
dbase['knight'] = {'name':'Knight','motto':'Ni!'}
dbase.close()
#
# retrieve data
dbase = shelve.open("mydbase")
len(dbase) #2#
dbase.keys()  #['knight','brian']#
dbase['knight']  #{'motto':'Ni!','name':'Knight'}#

sql

Inserting a time value with SQL

import time

a = time.strptime('my date', "%b %d %Y %H:%M")

cursor.execute('INSERT INTO myTable (Date) VALUES(%s)', (time.strftime('%Y-%m-%d %H:%M:%S', a),))

import datetime

a = datetime.datetime.strptime('my date', "%b %d %Y %H:%M")

cursor.execute('INSERT INTO myTable (Date) VALUES(%s)', (a.strftime('%Y-%m-%d %H:%M:%S'),))

SQL select with a list: where ... in (list)

http://blog.rizauddin.com/2008/11/python-list-in-sql-query-as-parameter_7496.html

id = [10, 20, 4, 6, 9]
sql = 'select * from students where id in %s' % str(tuple(id))

id = [10, 20, 4, 6, 9]
xx = ', '.join(map(str, id))  # join needs strings, so convert the ints first
sql = 'select * from students where id in (%s)' % xx
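
Both variants above paste values straight into the SQL text. A sketch of the same query with placeholders, so the driver handles quoting. It uses an in-memory sqlite3 database purely so the example is runnable; the table contents are made up, and sqlite3 uses ? as its placeholder where psycopg2 uses %s.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE students (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO students VALUES (?, ?)',
                 [(4, 'a'), (5, 'b'), (10, 'c')])

ids = [10, 20, 4, 6, 9]
placeholders = ', '.join('?' for _ in ids)   # one ? per value
sql = 'SELECT id FROM students WHERE id IN (%s) ORDER BY id' % placeholders
matched = [row[0] for row in conn.execute(sql, ids)]  # values are passed separately
conn.close()
```

Only the placeholder string is interpolated into the SQL; the values themselves never are.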

postgresql

http://www.yiibai.com/html/postgresql/2013/080998.html

import psycopg2
conn = psycopg2.connect(database="testdb", user="postgres", password="password", host="127.0.0.1", port="5432")
cur = conn.cursor()
cur.execute('''sql''')
conn.commit()
conn.close()

connection url

postgresql://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]

postgresql://
postgresql://localhost
postgresql://localhost:5433
postgresql://localhost/mydb
postgresql://user@localhost
postgresql://user:secret@localhost
postgresql://other@localhost/otherdb?connect_timeout=10&application_name=myapp

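Inserting a list of Chinese strings into PostgreSQL

# a single list
dbtag = [u'\u52a8\u753b',u'\u52a8\u753b',u'\u52a8\u753b']
cur.execute("""insert into doubansimple(dbtag) values(%s) """ , [dbtag])

# cur.mogrify lets you test a statement: it returns the final SQL string that would be sent
# multiple parameters, including lists and JSON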
cur.mogrify('''insert into douban(title, htitle, dbid, dbtag, dbrating, mainpic, desc_info, info_pic, staff_info, staff, related_info, related_pic) VALUES( %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)''', (title, htitle, dbid, dbtag, dbrating, mainpic, desc_info, info_pic, staff_info, staff, related_info, related_pic))

PostgreSQL: check the user's permissions on the tablespace

Regular expressions

Guide to Python regular expressions

r = re.compile(r'regex', re.MULTILINE)
r.sub("",str)
# equivalent to
re.sub(r'regex',"",str, flags=re.MULTILINE)

Backreferences in re.sub

p = re.compile(r'(\w+) (\w+)')
s = 'i say, hello world!'
print p.subn(r'\2 \1', s)
# ('say i, world hello!', 2)

Performing multiple replacements in one pass

http://book.51cto.com/art/201005/198272.htm

import re
def multiple_replace(text, adict):
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    return rx.sub(one_xlat, text)

import re
def make_xlat(*args, **kwds):
    adict = dict(*args, **kwds)
    rx = re.compile('|'.join(map(re.escape, adict)))
    def one_xlat(match):
        return adict[match.group(0)]
    def xlat(text):
        return rx.sub(one_xlat, text)
    return xlat

For the detailed differences in usage, see the link above (still to read)

if __name__ == "__main__":
    text = "Larry Wall is the creator of Perl"
    adict = {
        "Larry Wall" : "Guido van Rossum",
        "creator" : "Benevolent Dictator for Life",
        "Perl" : "Python",
    }
    print multiple_replace(text, adict)
    translate = make_xlat(adict)
    print translate(text)

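Searching

re.search(r'regex',str)
# equivalent to
r = re.compile(r'regex')
r.search(str)

search scans the whole string and matches at most once; it returns a match object, and match.group() gives the matched text
match tries to match once, only at the start of the string, and returns a match object; if the start does not match, it fails and returns None
findall returns a list of all matching strings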
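
A small sketch contrasting the three calls described above; the pattern and the sample strings are made up:

```python
import re

text = 'cat bat cat'
pattern = re.compile(r'cat')

at_start = pattern.match(text)          # succeeds: the string begins with "cat"
not_at_start = pattern.match('a cat')   # None: "cat" is not at position 0
anywhere = pattern.search('a cat')      # finds the first occurrence anywhere
every = pattern.findall(text)           # list of all matched strings
```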

Characters and encodings

http://www.jianshu.com/p/53bb448fe85b

Opening and creating files with an explicit encoding via codecs

codecs.open('filename.txt','ab+','utf-8')

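Encoding conversion

Checking the string type

isinstance(s, str)
isinstance(s, unicode)
Each returns True or False

String handling

str.strip() removes the given characters from both ends of a string; by default it removes whitespace
str.split('/')[1] splits the string into a list on '/' and takes the second item
str.replace("aabb","") replaces aabb with an empty string (deleting it)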

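urlencode and urldecode

Turn a percent-encoded string in a link back into readable text; Chinese text may come out as GBK after unquote.
str = urllib.unquote(str).encode('utf8')
If the output is garbled after unquoting urlencode-encoded text:

urllib.unquote(split_link[2].decode('utf-8').encode('utf-8')).decode("utf-8")

Encode a string
str = urllib.quote(str)
To encode key-value pairs use urlencode; to decode, still use unquote

from urllib import urlencode, unquote  # urllib has no urldecode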
data = {
    'a':'text',
    'name':'魔兽'
}
encodedata = urlencode(data)
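
For comparison, a sketch of the same operations on Python 3, where these functions live in urllib.parse and work on str with UTF-8 as the default encoding. The notes above target Python 2, so treat this as a separate illustration:

```python
from urllib.parse import quote, unquote, urlencode, parse_qs

encoded = quote('魔兽')          # percent-encodes the UTF-8 bytes
decoded = unquote(encoded)       # back to the original text

query = urlencode({'a': 'text', 'name': '魔兽'})   # key=value pairs joined by &
params = parse_qs(query)         # decodes a query string into a dict of lists
```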

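unicode and UTF-8

unicode can be encoded as UTF-8
ustr =  u' \u65e5\u672c'
ustr.encode('utf8')

Converting unicode escapes or UTF-8 escapes to Chinese characters

Useful when writing to files
The string_escape and unicode_escape codecs decode escaped UTF-8 byte strings and escaped unicode strings respectively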

f = '\u53eb\u6211'
print f
print (f.decode('unicode_escape'))
# Output:
# \u53eb\u6211
# 叫我

Converting to UTF-8 after json.dumps()

After json.dumps(dict), non-ASCII characters come back as \uXXXX escape sequences
json.dumps(dict).decode('unicode_escape').encode('utf-8')

Checking database and terminal encodings

Writing JSON to a database

Serialize with json.dumps(dict) before writing

Python terminal encoding

print sys.stdout.encoding

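PostgreSQL terminal encoding

Show the server encoding
show server_encoding
Show the current client encoding
show client_encoding
Set the encoding
\encoding CLIENT-ENCODING

Windows

The default cmd encoding is GBK (on Chinese-language Windows)

Current SQLite encoding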

pragma encoding

Multithreading: queue

De-duplicating a queue

Subclass Queue and override its internal methods to get de-duplication
http://stackoverflow.com/questions/16506429/check-if-element-is-already-in-a-queue/16506527#16506527

class SetQueue(Queue.Queue):
    def _init(self, maxsize):
        self.queue = set()
    def _put(self, item):
        self.queue.add(item)
    def _get(self):
        return self.queue.pop()
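
A usage sketch for the class above, repeated here with an import that works on both Python 2 and Python 3 so the block runs on its own. The sample items are made up:

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class SetQueue(queue.Queue):
    """Queue backed by a set, so duplicate items are stored only once."""

    def _init(self, maxsize):
        self.queue = set()

    def _put(self, item):
        self.queue.add(item)

    def _get(self):
        return self.queue.pop()  # arbitrary element: FIFO order is not kept

q = SetQueue()
for url in ['a', 'b', 'a', 'a']:
    q.put(url)

size = q.qsize()                                 # duplicates have collapsed
drained = sorted(q.get() for _ in range(size))
```

One caveat: put() still counts every call as an unfinished task, duplicates included, so combining this class with task_done() and join() may block unless that bookkeeping is adjusted as well.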

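Exception handling

Inside try, you can combine a condition check with raise to jump to the except: block on purpose
A raised exception is handled only by an except clause that matches its type; otherwise it propagates as an error instead of jumping to except
try ... except ... else (the else block runs only when no exception was raised)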
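
A minimal sketch of the pattern described above: a condition plus raise inside try, a matching except clause, and an else branch. The function name parse_positive is hypothetical.

```python
def parse_positive(text):
    """Return text as a positive int, or None if it is not one."""
    try:
        value = int(text)              # may raise ValueError by itself
        if value <= 0:
            raise ValueError("not positive: %d" % value)  # deliberate jump to except
    except ValueError:
        return None                    # handles both the implicit and the explicit raise
    else:
        return value                   # runs only when no exception was raised

results = [parse_positive(s) for s in ['7', '-3', 'abc']]
```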

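Commonly used libraries

The requests library

Quickstart
Advanced usage
Downloading images with requests
Downloading images with requests and urllib2
To skip SSL certificate verification, pass verify=False to get()
Too many connections ("Failed to establish a new connection"): send headers={'Connection':'close'}

To pass a dict of parameters, use params= with get and data= with post; there are others such as json=

bs4

Beautiful Soup 4.2.0 documentation

A function for stripping HTML tags in Python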

https://gist.github.com/dndn/859717

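# Strip HTML <> tags with a simple regex
import re
str = "<a>srcd</a>hello<br/>"
str = re.sub(r'<[^>]*>','',str)
# Improved version: also removes script blocks and 4-letter entities such as &nbsp;
# (html is the page source to clean)
str = re.sub(r'(<script.*?</script>)|(<[^>]+>)|(&\w{4};)','',html,flags=re.DOTALL)
print str

# Uses HTMLParser; is there a simpler way? Regex?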
def strip_tags(html):
    """
    Strip HTML tags from a string.
    >>> str_text=strip_tags("<font color=red>hello</font>")
    >>> print str_text
    hello
    """
    from HTMLParser import HTMLParser
    html = html.strip()
    html = html.strip("\n")
    result = []
    parser = HTMLParser()
    parser.handle_data = result.append
    parser.feed(html)
    parser.close()
    return ''.join(result)
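
The function above imports from the Python 2 module HTMLParser. A rough Python 3 equivalent, assuming the standard html.parser module; the helper class name _TextCollector is hypothetical:

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    """Return the text content of html with all tags removed."""
    parser = _TextCollector()
    parser.feed(html.strip())
    parser.close()  # flushes any text still buffered by the parser
    return ''.join(parser.parts)

text = strip_tags('<p>hello <b>world</b></p>')
```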

Installing dependencies in bulk with pip

pip install -r requirements.txt

Generating random values

Random item from a list of letters

http://stackoverflow.com/questions/306400/how-do-i-randomly-select-an-item-from-a-list-using-python

import random
foo = ['a', 'b', 'c', 'd', 'e']
print(random.choice(foo))

List of random numbers

http://stackoverflow.com/questions/16655089/python-random-numbers-into-a-list

# range 1-100
import random
my_randoms = random.sample(xrange(1, 101), 10)

Listing files sorted by creation time

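
A minimal sketch of one way to do this with the standard library; the function name list_files_by_time and its parameters are hypothetical. Note that os.path.getctime is the creation time on Windows but the time of the last metadata change on Unix, so the key function is left configurable.

```python
import os

def list_files_by_time(directory, key=os.path.getctime, newest_first=False):
    """Return the file names in directory, sorted by a timestamp function."""
    names = [n for n in os.listdir(directory)
             if os.path.isfile(os.path.join(directory, n))]
    return sorted(names,
                  key=lambda n: key(os.path.join(directory, n)),
                  reverse=newest_first)
```

Passing key=os.path.getmtime sorts by modification time instead.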

Performance optimization

Timing small code snippets with timeit

http://python.usyiyi.cn/python_278/library/timeit.html
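
A short sketch of timeit.timeit, which runs a statement a given number of times and returns the total elapsed seconds. The two statements compared here are made-up examples:

```python
import timeit

# Total time for 1000 runs of each statement, in seconds
join_time = timeit.timeit("''.join(str(n) for n in range(100))", number=1000)
concat_time = timeit.timeit(
    "s = ''\nfor n in range(100):\n    s += str(n)", number=1000)
```

The absolute numbers depend on the machine, so they are only meaningful relative to each other.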

Memory size of a variable

http://outofmemory.cn/code-snippet/1776/Python-how-look-variable-involve-space-size
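
One standard-library option is sys.getsizeof, sketched below. It reports the size of the object itself in bytes and is shallow: for a container it does not include the objects the container refers to.

```python
import sys

small = sys.getsizeof([])                  # an empty list object
large = sys.getsizeof(list(range(1000)))   # grows with the number of slots
```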