a = complex(1,0.4)
unicode()
while:
if: if xxx: ... elif yyy: ... elif zzz: ... else: ...
for: range(), break, continue, the else clause of a loop, pass
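A minimal Python 3 sketch of the loop keywords listed above: continue skips to the next item, break leaves the loop, and the loop's else clause runs only when the loop finished without a break. find_first_negative is a hypothetical example function.

```python
def find_first_negative(numbers):
    """Return the index of the first negative number, or -1 if there is none."""
    for i, n in enumerate(numbers):
        if n == 0:
            continue        # skip zeros and go on with the next item
        if n < 0:
            break           # found one: leave the loop, the else clause is skipped
    else:
        return -1           # runs only when the loop ends without break
    return i


print(find_first_negative([3, 0, -2, 5]))   # 2
print(find_first_negative([1, 2, 3]))       # -1
print(find_first_negative([]))              # -1
```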
5) Anonymous functions: lambda arg1, arg2, ...: <expression>
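10) Two ways to check a variable's type: isinstance(var,int) and type(var).__name__=="int". isinstance also accepts subclasses, so it is usually the better choice.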
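A small Python 3 sketch of why isinstance is usually preferred over comparing type names: it accepts subclasses and a tuple of types. MyInt is a hypothetical subclass used only for the demonstration.

```python
class MyInt(int):
    """Hypothetical int subclass, only used to show the difference."""
    pass


x = MyInt(5)
print(isinstance(x, int))            # True: isinstance accepts subclasses
print(type(x).__name__ == "int")     # False: the type name is "MyInt"
print(type(x) is int)                # False
print(isinstance(x, (int, float)))   # True: a tuple of types is allowed
```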
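11) Be careful when deleting list elements inside a loop. for i in listA: ... listA.remove(i) is buggy: once an element is removed, the elements after it shift forward, so the loop skips one. for i in range(len(listA)): ... del listA[i] is buggy too: the list shrinks while the index range does not, so the loop eventually runs past the end (IndexError).
filter(lambda x:x !=4,listA) is a more elegant way to do it (in Python 2 it returns a new list).
listA = [ i for i in listA if i !=4] also works; or simply build a new list and be done with it.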
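A Python 3 sketch that reproduces the remove-while-iterating bug and shows the safer alternatives. In Python 3 filter returns a lazy iterator, hence the list() call; remove_while_iterating is a hypothetical helper.

```python
def remove_while_iterating(items, value):
    """Buggy on purpose: removing items during iteration skips elements."""
    for x in items:
        if x == value:
            items.remove(x)   # later elements shift left, so the next one is skipped
    return items


print(remove_while_iterating([4, 4, 1, 4], 4))   # [1, 4]: one 4 survives

listA = [4, 4, 1, 4, 2]
print([x for x in listA if x != 4])            # [1, 2]
print(list(filter(lambda x: x != 4, listA)))   # [1, 2]
listA[:] = [x for x in listA if x != 4]        # in place: other references see the change

# Deleting by index is safe when walking from the end,
# because the shifts only affect items that were already visited
b = [4, 4, 1, 4, 2]
for i in range(len(b) - 1, -1, -1):
    if b[i] == 4:
        del b[i]
print(b)   # [1, 2]
```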
2) "for k in my_dict" is better than "for k in my_dict.keys()" (which in Python 2 first builds a list of all keys), and also better than "for k in [....]".
12) set is implemented with a hash table, much like dict (set elements behave like dict keys without values): https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset
>>> s1 = set([1,2,3,4,5])
>>> s2 = set([3,4,5,6,7,8])
>>> s1|s2
set([1, 2, 3, 4, 5, 6, 7, 8])
>>> s1-s2
set([1, 2])
>>> s2-s1
set([8, 6, 7])
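The same set operations in Python 3 syntax, plus intersection and symmetric difference. sorted() is used only because set iteration order is not guaranteed.

```python
s1 = {1, 2, 3, 4, 5}
s2 = {3, 4, 5, 6, 7, 8}
print(sorted(s1 | s2))        # union: [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(s1 & s2))        # intersection: [3, 4, 5]
print(sorted(s1 - s2))        # difference: [1, 2]
print(sorted(s2 - s1))        # difference: [6, 7, 8]
print(sorted(s1 ^ s2))        # symmetric difference: [1, 2, 6, 7, 8]
print(s1.issubset(s1 | s2))   # True
```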
1) str() unicode() repr() print rjust() ljust() center() zfill() xxx%v xxx%(v1,v2); to print complex objects, use the pprint module (very handy when debugging)
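A quick Python 3 sketch of the string helpers listed above; the sample values are arbitrary.

```python
from pprint import pformat

word = "42"
print(repr("a\tb"))                      # 'a\tb': repr shows the escape sequence
print(word.rjust(6, "."))                # ....42
print(word.ljust(6, "."))                # 42....
print(word.center(6, "*"))               # **42**
print(word.zfill(5))                     # 00042
print("%s=%d" % ("x", 7))                # x=7
print("%-5s|%5.2f" % ("ab", 3.14159))    # ab   | 3.14

# pformat returns what pprint would print; long structures wrap over lines
nested = {"name": "xds", "tags": ["a", "b"], "meta": {"k": list(range(10))}}
text = pformat(nested, width=40)
print(text)
```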
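7. Classes
2) Calling a parent-class method: 1> ParentClass.FuncName(self,args) 2> super(ChildName,self).FuncName(args). The second form only works if the class is derived from object (a new-style class); otherwise super raises a TypeError.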
#!/bin/python
#encoding=utf8
class A(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def show(self):
        print "A::show() a=%s b=%s" % (self.a,self.b)

class B(A):
    def __init__(self, a, b, c):
        #A.__init__(self,a,b)
        super(B,self).__init__(a,b) #this use of super requires the parent class to inherit from object
        self.c = c

if __name__ == "__main__":
    b = B(1,2,3)
    print b.a,b.b,b.c
    b.show()

#output
xudongsong@sysdev:~$ python class_test.py
1 2 3
A::show() a=1 b=2
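For comparison, a Python 3 sketch of the same classes: every class is new-style there, and super() can be called without arguments. Unlike the original, show() returns its string instead of printing it.

```python
class A(object):
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def show(self):
        return "A::show() a=%s b=%s" % (self.a, self.b)


class B(A):
    def __init__(self, a, b, c):
        super().__init__(a, b)   # Python 3: no class/self arguments needed
        self.c = c


b = B(1, 2, 3)
print(b.a, b.b, b.c)   # 1 2 3
print(b.show())        # A::show() a=1 b=2
```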
>>> str2 = u"sfdasfafasf"
>>> type(str2)
<type 'unicode'>
>>> isinstance(str2,str)
False
>>> isinstance(str2,unicode)
True
>>> type(str2)
<type 'unicode'>
>>> str3 = "safafasdf"
>>> type(str3)
<type 'str'>
>>> isinstance(str3,unicode)
False
>>> isinstance(str3,str)
True
>>> str4 = r'asdfafadf'
>>> isinstance(str4,str)
True
>>> isinstance(str4,unicode)
False
>>> type(str4)
<type 'str'>
Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can’t be used as a means of inter-thread communication. Use locks instead.
Threads interact strangely with interrupts: the KeyboardInterrupt exception will be received by an arbitrary thread. (When the signal module is available, interrupts always go to the main thread.)
In other words: once the signal module is imported, KeyboardInterrupt is always received by the main thread; without it, any thread may receive it.
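Pressing Ctrl+C makes Python receive SIGINT, which it turns into a KeyboardInterrupt raised in one thread; any threads that were not made daemon threads with setDaemon keep running regardless. If kill sends a signal other than SIGINT, no handler is installed for it, and its default action is to terminate (as with SIGTERM), the whole process dies, no matter how many threads have not finished yet.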
>>> import signal, time
>>> def f():
...     signal.signal(signal.SIGINT, sighandler)
...     signal.signal(signal.SIGTERM, sighandler)
...     while True:
...         time.sleep(1)
...
>>> def sighandler(signum,frame):
...     print signum,frame
...
>>> f()
^C2 <frame object at 0x15b2a40>
^C2 <frame object at 0x15b2a40>
^C2 <frame object at 0x15b2a40>
^C2 <frame object at 0x15b2a40>
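A Python 3 sketch of the same flag-style approach that signal_test.py uses, assuming a POSIX system (SIGUSR1 does not exist on Windows). Instead of waiting for Ctrl+C it sends the signal to its own process with os.kill, and the handler only records the signal number so the main code can react to it.

```python
import os
import signal
import time

received = []   # signal numbers recorded by the handler


def handler(signum, frame):
    # Keep handlers short: record the event and let the main loop act on it
    received.append(signum)


old = signal.signal(signal.SIGUSR1, handler)   # only allowed in the main thread
try:
    os.kill(os.getpid(), signal.SIGUSR1)        # send the signal to ourselves
    # Python-level handlers run in the main thread between bytecodes,
    # so poll briefly until the handler has run
    deadline = time.time() + 2
    while not received and time.time() < deadline:
        time.sleep(0.01)
finally:
    signal.signal(signal.SIGUSR1, old)          # restore the previous handler

print(received == [signal.SIGUSR1])   # True
```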
import signal, time

term = False

def sighandler(signum, frame):
    global term
    print "terminate signal received..."
    term = True

def set_signal():
    signal.signal(signal.SIGTERM, sighandler)
    signal.signal(signal.SIGINT, sighandler)

def clear_signal():
    # restore the default action (signal.SIG_DFL, whose value is 0)
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    signal.signal(signal.SIGINT, signal.SIG_DFL)

set_signal()
while not term:
    print "hello"
    time.sleep(1)
print "jumped out of while loop"

clear_signal()
term = False
for i in range(5):
    if term:
        break
    else:
        print "hello, again"
        time.sleep(1)
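[dongsong@bogon python_study]$ python signal_test.py
hello
hello
hello
^Cterminate signal received...
jumped out of while loop
hello, again
hello, again
^C
[dongsong@bogon python_study]$

When a multi-process program uses signals and the parent process should catch a signal and then act on its children, register the signal handlers only after the child processes have been started. Otherwise the children inherit the parent's address space, handlers included, and the program becomes a mess.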
from multiprocessing import Process, Pipe
import logging, time, signal

g_logLevel = logging.DEBUG
g_logFormat = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d]%(message)s"

def f(conn):
    conn.send([42, None, 'hello'])
    #conn.close()
    logging.basicConfig(level=g_logLevel,format=g_logFormat,stream=None)
    logging.debug("hello,world")

def f2():
    while True:
        print "hello,world"
        time.sleep(1)

termFlag = False
def sighandler(signum, frame):
    print "terminate signal received..."
    global termFlag
    termFlag = True

if __name__ == '__main__':
    # parent_conn, child_conn = Pipe()
    # p = Process(target=f, args=(child_conn,))
    # p.start()
    # print parent_conn.recv()   # prints "[42, None, 'hello']"
    # print parent_conn.recv()
    # p.join()
    p = Process(target=f2)
    p.start()
    signal.signal(signal.SIGTERM, sighandler)
    signal.signal(signal.SIGINT, sighandler)
    while not termFlag:
        time.sleep(0.5)
    print "jump out of the main loop"
    p.terminate()
    p.join()
10. Python's built-in function locals() returns a dict that maps the names of all local variables to their values.
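A minimal Python 3 sketch of locals() inside a function; describe is a hypothetical name. Writing into the returned dict does not create or change the function's local variables.

```python
def describe(x, y=2):
    z = x + y
    # locals() maps each local name (parameters included) to its current value
    return locals()


print(describe(1))   # {'x': 1, 'y': 2, 'z': 3}
```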
def func(*args): ...
def accept(**kwargs): ...
>>> def func(arg1, arg2 = "hello", *arg3, **arg4):
...     print arg1
...     print arg2
...     print arg3
...     print arg4
...
>>> func("xds","t1",t2="t2",t3="t3")
xds
t1
()
{'t2': 't2', 't3': 't3'}
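A Python 3 version of func that returns its arguments instead of printing them, also showing the reverse operation: unpacking a tuple and a dict at the call site with * and **.

```python
def func(arg1, arg2="hello", *arg3, **arg4):
    # arg3 collects extra positional arguments as a tuple,
    # arg4 collects extra keyword arguments as a dict
    return arg1, arg2, arg3, arg4


print(func("xds", "t1", t2="t2", t3="t3"))
# ('xds', 't1', (), {'t2': 't2', 't3': 't3'})
print(func("xds", "t1", 1, 2, k=3))
# ('xds', 't1', (1, 2), {'k': 3})

# The same syntax works in reverse when calling a function
args = ("a", "b", "c")
kwargs = {"k": "v"}
print(func(*args, **kwargs))
# ('a', 'b', ('c',), {'k': 'v'})
```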
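13. Decorators: putting @another_method before a function definition wraps an existing function, for example to add precondition checks. This article explains it thoroughly: http://daqinbuyi.iteye.com/blog/1161274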
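A minimal decorator sketch in Python 3 that performs a precondition check; require_positive and square are hypothetical names. functools.wraps copies the wrapped function's name and docstring onto the wrapper.

```python
import functools


def require_positive(func):
    """Hypothetical decorator: check the first argument before calling func."""
    @functools.wraps(func)   # keep func.__name__ and func.__doc__ on the wrapper
    def wrapper(n, *args, **kwargs):
        if n <= 0:
            raise ValueError("%s() needs a positive number, got %r" % (func.__name__, n))
        return func(n, *args, **kwargs)
    return wrapper


@require_positive            # same as: square = require_positive(square)
def square(n):
    return n * n


print(square(4))         # 16
print(square.__name__)   # square
try:
    square(-1)
except ValueError as e:
    print(e)             # square() needs a positive number, got -1
```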
import sys

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError, (errno, strerror):
    print "I/O error(%s): %s" % (errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
>>> try:
...     raise Exception('spam', 'eggs')
... except Exception, inst:
...     print "error %s" % str(inst)
...     print type(inst)     # the exception instance
...     print inst.args      # arguments stored in .args
...     print inst           # __str__ allows args to be printed directly
...     x, y = inst          # __getitem__ allows args to be unpacked directly
...     print 'x =', x
...     print 'y =', y
...
error ('spam', 'eggs')
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs
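The same session in Python 3 syntax, as a sketch: except ... as replaces the comma, exception instances are no longer indexable (so the values are unpacked from inst.args), and the name bound by as is deleted when the except block ends.

```python
try:
    raise Exception("spam", "eggs")
except Exception as inst:        # Python 3: "as" instead of a comma
    print("error %s" % inst)     # error ('spam', 'eggs')
    print(type(inst))            # <class 'Exception'>
    print(inst.args)             # ('spam', 'eggs')
    x, y = inst.args             # unpack .args; the instance itself is not iterable
print("x =", x)                  # x = spam
print("y =", y)                  # y = eggs
# "inst" is unbound here: Python 3 deletes it at the end of the except block
```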
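15. Command-line arguments: process them with Python's optparse library (deprecated since Python 2.7 in favor of argparse). For details see this article: http://blog.chinaunix.net/space.php?uid=16981447&do=blog&id=2840082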
from optparse import OptionParser
[...]
def main():
    usage = "usage: %prog [options] arg"
    parser = OptionParser(usage)
    parser.add_option("-f", "--file", dest="filename",
                      help="read data from FILENAME")
    parser.add_option("-v", "--verbose",
                      action="store_true", dest="verbose")
    parser.add_option("-q", "--quiet",
                      action="store_false", dest="verbose")
    [...]
    (options, args) = parser.parse_args()
    if len(args) != 1:
        parser.error("incorrect number of arguments")
    if options.verbose:
        print "reading %s..." % options.filename
    [...]

if __name__ == "__main__":
    main()

In plain terms: make_option() and add_option() define how one command-line option of the script is parsed. After parse_args(), positional arguments end up in the args list and option values in options. dest is the name under which the value is stored in options; if omitted, the long option name is used. help is the text shown for that option when the script is called with --help/-h. action says how the option is handled: the default, store, saves the value that follows the option under dest; store_true saves True under dest when the option appears; store_false saves False under dest when the option appears.
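A sketch of the same options with argparse, the Python 3 replacement for optparse. It parses a hard-coded argument list instead of sys.argv so the result is easy to inspect; build_parser is a hypothetical helper.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(usage="%(prog)s [options] arg")
    parser.add_argument("-f", "--file", dest="filename",
                        help="read data from FILENAME")
    parser.add_argument("-v", "--verbose", action="store_true", dest="verbose")
    parser.add_argument("-q", "--quiet", action="store_false", dest="verbose")
    parser.add_argument("arg")   # exactly one positional argument is required
    return parser


opts = build_parser().parse_args(["-f", "data.txt", "-v", "input"])
print(opts.filename, opts.verbose, opts.arg)   # data.txt True input
```

Unlike optparse, argparse checks the positional argument count itself, so the manual len(args) != 1 test is no longer needed.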
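16. I recently switched to BeautifulSoup v4 and got the following error (the older BeautifulSoup version I used before never produced it):
HTMLParser.HTMLParseError: malformed start tag
Fix: install html5lib with easy_install html5lib and use it instead of HTMLParser as the parser (e.g. BeautifulSoup(markup, "html5lib")).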
[dongsong@localhost boosenspider]$ vpython
Python 2.6.6 (r266:84292, Dec 7 2011, 20:48:22)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup as soup
>>> s = soup('<li class="dk_dk" id="dkdk"><a href="javascript:;" onclick="MOP.DZH.clickDaka();" class="btn_dk">打卡</a></li>')
>>> s
<html><head></head><body><li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">打卡</a></li></body></html>
>>> type(s)
<class 'bs4.BeautifulSoup'>
>>> t = s.body.contents[0]
>>> t
<li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">打卡</a></li>
>>> import re
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dks")})
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")})
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">打卡</a>]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk"),'href':None})
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk"),'href':re.compile('')})
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">打卡</a>]
>>> t.contents[0]
<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">打卡</a>
>>> t.contents[0].string = "hello"
>>> t
<li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a></li>
>>> t.contents[0].text
u'hello'
>>> t.contents[0].string
u'hello'
#'text' inside attrs is treated as an HTML attribute named text, which this tag does not have, so nothing matches
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk"),'text':re.compile('')})
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk"),'text':re.compile('h')})
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk"),'text':re.compile('^h')})
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")})
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")},text=re.compile(r''))
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")},text=re.compile(r'a'))
[]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")},text=re.compile(r'^hell'))
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>]
>>> t.findAll(name='a',attrs={'class':re.compile(r"btn_dk")},text=re.compile(r'^hello$'))
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>]
>>> t.findAll(name='a',attrs={},text=re.compile(r'^hello$'))
[<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>]
>>> t
<li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a></li>
>>> t1 = soup('<li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a></li>').body.contents[0]
>>> t1
<li class="dk_dk" id="dkdk"><a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a></li>
>>> t == t1
True
>>> re.search(r'(^hello)|(^bbb)','hello')
<_sre.SRE_Match object at 0x25ef718>
>>> re.search(r'(^hello)|(^bbb)','hellosdfsd')
<_sre.SRE_Match object at 0x25ef7a0>
>>> re.search(r'(^hello)|(^bbb)','bbbsdfsdf')
<_sre.SRE_Match object at 0x25ef718>
>>> t2 = t1.contents[0]
>>> t2
<a class="btn_dk" href="javascript:;" onclick="MOP.DZH.clickDaka();">hello</a>
#findAll only searches the descendants of a tag, so t2 does not match itself
>>> t2.findAll(name='a')
[]

[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from bs4 import BeautifulSoup as soup
>>> s = soup('<li><a href="http://www.tianya.cn/techforum/articleslist/0/24.shtml" id="item天涯婚礼堂">天涯婚礼堂</a></li>')
>>> s.findAll(name='a',attrs={'href':None})
[]
>>> s.findAll(name='a',attrs={'href':True})
[<a href="http://www.tianya.cn/techforum/articleslist/0/24.shtml" id="item天涯婚礼堂">天涯婚礼堂</a>]
>>> import re
>>> s.findAll(name='a',attrs={'href':re.compile(r'')})
[<a href="http://www.tianya.cn/techforum/articleslist/0/24.shtml" id="item天涯婚礼堂">天涯婚礼堂</a>]
>>> s1 =s
>>> s1
<html><head></head><body><li><a href="http://www.tianya.cn/techforum/articleslist/0/24.shtml" id="item天涯婚礼堂">天涯婚礼堂</a></li></body></html>
>>> id(s1)
140598579280080
>>> id(s)
140598579280080
>>> s1.body.contents[0].contents[0]['href']=None
>>> s1
<html><head></head><body><li><a href id="item天涯婚礼堂">天涯婚礼堂</a></li></body></html>
>>> s
<html><head></head><body><li><a href id="item天涯婚礼堂">天涯婚礼堂</a></li></body></html>
>>> id(s)
140598579280080
>>> id(s1)
140598579280080
>>> s.findAll(name='a',attrs={'href':re.compile(r'')})
[]
>>> s.findAll(name='a',attrs={'href':True})
[]
>>> s.findAll(name='a',attrs={'href':None})
[<a href id="item天涯婚礼堂">天涯婚礼堂</a>]
>>> s.findAll(name='a')
[<a href id="item天涯婚礼堂">天涯婚礼堂</a>]
#text is the argument used to search for NavigableString objects. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString as its argument
#None, False and '' mean "no requirement"; re.compile('') and True mean a NavigableString must be present (unlike attrs, where an attribute given as False must be absent)
#Note how the text argument of findAll behaves:
>>> rts = s2.findAll(name=u'ul',attrs={u'id': u'contentbar', u'st_type': 'nav'}, text=re.compile(r''))
>>> len(rts)
0
>>> rts = s2.findAll(name=u'ul',attrs={u'id': u'contentbar', u'st_type': 'nav'}, text='')
>>> len(rts)
1
>>> rts = s2.findAll(name=u'ul',attrs={u'id': u'contentbar', u'st_type': 'nav'}, text=True)
>>> len(rts)
0
>>> rts = s2.findAll(name=u'ul',attrs={u'id': u'contentbar', u'st_type': 'nav'}, text=False)
>>> len(rts)
1
>>> rts = s2.findAll(name=u'ul',attrs={u'id': u'contentbar', u'st_type': 'nav'}, text=None)
>>> len(rts)
1
#How the string attribute works, and which kinds of element have it (a Tag with more than one child has string == None, as soup1 shows)
>>> from bs4 import BeautifulSoup as soup
>>> soup1 = soup('<b>hello,<img href="sfdsf">aaaa</img></b>').body.contents[0]
>>> soup1
<b>hello,<img href="sfdsf"/>aaaa</b>
>>> soup1.string
>>> soup1.name
u'b'
>>> soup1.text
u'hello,aaaa'
>>> type(soup1)
<class 'bs4.element.Tag'>
>>> soup1.contents[0]
u'hello,'
>>> type(soup1.contents[0])
<class 'bs4.element.NavigableString'>
>>> soup1.contents[0].string
u'hello,'
>>> soup2 = soup('<b>hello</b>').body.contents[0]
>>> type(soup2)
<class 'bs4.element.Tag'>
>>> soup2.string
u'hello'
#limit: 0 means no limit
>>> soup2.findAll(name='a',text=False,limit=0)
[<a href="http://book.douban.com/subject/4172417/"><img class="m_sub_img" src="http://img1.douban.com/spic/s4424194.jpg"/></a>, <a href="http://book.douban.com/subject/4172417/">匆匆那年</a>]
>>> soup2.findAll(name='a',text=False,limit=1)
[<a href="http://book.douban.com/subject/4172417/"><img class="m_sub_img" src="http://img1.douban.com/spic/s4424194.jpg"/></a>]
[dongsong@bogon boosenspider]$ cat bs_constrator.py
#encoding=utf-8
from bs4 import BeautifulSoup as soup
from bs4 import Tag

if __name__ == '__main__':
    sou = soup('<div></div>')
    tag1 = Tag(sou, name='div')
    tag1['id'] = 'gentie1'
    tag1.string = 'hello,tag1'
    sou.div.insert(0,tag1)
    tag2 = Tag(sou, name='div')
    tag2['id'] = 'gentie2'
    tag2.string = 'hello,tag2'
    sou.div.insert(1,tag2)
    print sou
[dongsong@bogon boosenspider]$ vpython bs_constrator.py
<html><head></head><body><div><div id="gentie1">hello,tag1</div><div id="gentie2">hello,tag2</div></div></body></html>
>>> t = Tag(name='t')
>>> t.string="<img src='www.baidu.com'/>"
>>> t
<t>&lt;img src='www.baidu.com'/&gt;</t>
>>> str(t)
"<t>&lt;img src='www.baidu.com'/&gt;</t>"
>>> t.string
u"<img src='www.baidu.com'/>"
>>> HTMLParser.HTMLParser().unescape(str(t))
u"<t><img src='www.baidu.com'/></t>"
>>> s1
u"<t><img src='www.baidu.com'/></t>"
>>> s2 = cgi.escape(s1)
>>> s2
u"&lt;t&gt;&lt;img src='www.baidu.com'/&gt;&lt;/t&gt;"
>>> HTMLParser.HTMLParser().unescape(s2)
u"<t><img src='www.baidu.com'/></t>"
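In Python 3, cgi.escape (removed in 3.8) and HTMLParser().unescape (removed in 3.9) are replaced by html.escape and html.unescape. A sketch; note that html.escape also escapes quotes unless quote=False is passed.

```python
import html

s1 = "<t><img src='www.baidu.com'/></t>"
s2 = html.escape(s1)                  # quote=True by default: ' and " are escaped too
print(s2)   # &lt;t&gt;&lt;img src=&#x27;www.baidu.com&#x27;/&gt;&lt;/t&gt;
print(html.escape(s1, quote=False))   # &lt;t&gt;&lt;img src='www.baidu.com'/&gt;&lt;/t&gt;
print(html.unescape(s2) == s1)        # True: unescape reverses escape
```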
>>> md5.md5("asdfadf").hexdigest()
'aee0014b14124efe03c361e1eed93589'
>>> import hashlib
>>> hashlib.md5("asdfadf").hexdigest()
'aee0014b14124efe03c361e1eed93589'
(The md5 module has been deprecated since Python 2.5; use hashlib instead.)
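The md5 module no longer exists in Python 3; a sketch of the hashlib equivalent. Hash objects take bytes there, so text has to be encoded first.

```python
import hashlib

data = "asdfadf"
# Python 3 hash objects take bytes, so encode the text first
h = hashlib.md5(data.encode("utf-8")).hexdigest()
print(h)   # 32 hex characters

# hashlib.new() picks the algorithm by name and gives the same digest
print(hashlib.new("md5", data.encode("utf-8")).hexdigest() == h)   # True

# Large inputs can be fed in chunks with update()
inc = hashlib.md5()
for chunk in (b"asd", b"fadf"):
    inc.update(chunk)
print(inc.hexdigest() == h)   # True
```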
urlFile = urllib2.urlopen(url, timeout=g_url_timeout)
urlData = urlFile.read()
19. Regular expressions: the re module
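>>> ss = '''
... hell0,a
... shhh
... liumingdong
... xudongsong
... hello
... '''
>>> ss
'\nhell0,a\nshhh\nliumingdong\nxudongsong\nhello\n'
SyntaxError: EOL while scanning string literal
>>> sss = 'aaaa\
... bbbb\
... cccccc'
>>> sss
'aaaabbbbcccccc'
>>> s3 = r'(^hello)|\
... (abc$)'
>>>
>>> re.search(s3,'hello,world')
<_sre.SRE_Match object at 0x7f95233047a0>    #the pattern on the first line matches
>>> re.search(s3,'aaa,hello,worldabc')    #the pattern on the second line fails: in a raw string the backslash and the newline both stay in the pattern, so this branch requires a newline before "abc"
>>> s4 = r'(^hello)|(abc$)'    #s4 is written on one line, without the backslash continuation, so both alternatives match
>>> re.search(s4,"hello,world")
<_sre.SRE_Match object at 0x182e690>
>>> re.search(s4,"aaa,hello,worldabc")
<_sre.SRE_Match object at 0x7f95233047a0>
>>>
#How to extract matched substrings: wrap each part you want in parentheses; group(1), group(2), ... return those groups in order (group(0) is the whole match)
>>> re.search(r'^(\d+)abc(\d+)$','232abc1').group(0,1,2)
('232abc1', '232', '1')

#An example combining re and lambda
#encoding=utf-8
import re
f = lambda arg: re.search(u'^(\d+)\w+',arg).group(1)
print f(u'1111条评论')
try:
    f(u'aaaa')
except AttributeError,e:
    print str(e)

:!python re_lambda.py
111
'NoneType' object has no attribute 'group'
#It prints 111, not 1111: without re.UNICODE, \w does not match the Chinese characters, so \d+ gives back one digit for \w+ to match. For u'aaaa' re.search returns None, hence the AttributeError.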
>>> re.findall(r'\\@[A-Za-z0-9]+', s)
['\\@userA', '\\@userB']
>>> s
'hello,world,\\@userA\\@userB'
>>> re.findall(r'\\@([A-Za-z0-9]+)', s)    #with one group in the pattern, findall returns only the captured part
['userA', 'userB']
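A Python 3 sketch of the points above: what the two-line raw string s3 actually contains, how groups are extracted, and how \w treats Chinese characters. Python 3 str patterns are Unicode-aware, so \w matches them; the re.ASCII flag restores the Python 2 behaviour.

```python
import re

# What r'(^hello)|\<newline>(abc$)' typed over two lines really contains:
# a raw string keeps both the backslash and the newline.
s3 = "(^hello)|\\\n(abc$)"
print(re.search(s3, "hello,world") is not None)   # True: first branch
print(re.search(s3, "aaa,hello,worldabc"))        # None: second branch needs a newline

s4 = r'(^hello)|(abc$)'
print(re.search(s4, "aaa,hello,worldabc").group(2))   # abc

m = re.search(r'^(\d+)abc(\d+)$', '232abc1')
print(m.group(0, 1, 2))   # ('232abc1', '232', '1')

# Python 3 str patterns are Unicode-aware, so \w matches the Chinese characters
print(re.search(r'^(\d+)\w+', '1111条评论').group(1))             # 1111
# re.ASCII restores the Python 2 behaviour, where \d+ has to give back a digit
print(re.search(r'^(\d+)\w+', '1111条评论', re.ASCII).group(1))   # 111

s = 'hello,world,\\@userA\\@userB'
print(re.findall(r'\\@([A-Za-z0-9]+)', s))   # ['userA', 'userB']
```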
20. I wrote a crawler. When joining URLs I used to handle every case myself (./xxx, #xxxx, /xxx and so on), which was tedious; later I found there is a ready-made tool for this:
>>> from urlparse import urljoin
>>> import urllib
>>> url = urljoin(r"http://book.douban.com/tag/?view=type",u"./网络小说")
>>> url
u'http://book.douban.com/tag/\u7f51\u7edc\u5c0f\u8bf4'
>>> conn2 = urllib.urlopen(url)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib64/python2.6/urllib.py", line 179, in open
    fullurl = unwrap(toBytes(fullurl))
  File "/usr/lib64/python2.6/urllib.py", line 1041, in toBytes
    " contains non-ASCII characters")
UnicodeError: URL u'http://book.douban.com/tag/\u7f51\u7edc\u5c0f\u8bf4' contains non-ASCII characters
>>> conn2 = urllib.urlopen(url.encode('utf-8'))
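The Python 3 equivalents live in urllib.parse. A sketch of urljoin with the same douban URL plus two other relative forms, and of percent-encoding the result before requesting it: quote encodes non-ASCII text as UTF-8 by default, and safe=":/" keeps the scheme separator and the slashes.

```python
from urllib.parse import urljoin, quote

base = "http://book.douban.com/tag/?view=type"
url = urljoin(base, "./网络小说")
print(url)   # http://book.douban.com/tag/网络小说

# Percent-encode before handing the URL to an HTTP client
print(quote(url, safe=":/"))
# http://book.douban.com/tag/%E7%BD%91%E7%BB%9C%E5%B0%8F%E8%AF%B4

print(urljoin("http://a.com/x/y.html", "#top"))   # http://a.com/x/y.html#top
print(urljoin("http://a.com/x/y.html", "/z"))     # http://a.com/z
```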
>>> request = urllib2.Request("http://img1.gtimg.com/finance/pics/hv1/46/178/1031/67086211.jpg",headers={'If-Modified-Since':'Wed, 02 May 2012 18:32:20 GMT'})    #equivalent to request.add_header('If-Modified-Since','Wed, 02 May 2012 18:32:20 GMT')
>>> urllib2.urlopen(request)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib64/python2.6/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib64/python2.6/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.6/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.6/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 304: Not Modified
>>> urllib.urlencode({"aaa":"bbb"})
'aaa=bbb'
>>> urllib.urlencode([("aaa","bbb")])
'aaa=bbb'
#Using urlencode: when submitting a POST form, encode the key/value pairs with urlencode and pass the result as the request body (the data argument)
#urllib2.urlopen(url,data=urllib.urlencode(...))
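A Python 3 sketch of urlencode: it accepts a dict or a list of pairs and percent-encodes the values (a space becomes +). The POST body is only built here, no request is sent; the field names are hypothetical.

```python
from urllib.parse import urlencode, parse_qs

print(urlencode({"aaa": "bbb"}))                        # aaa=bbb
print(urlencode([("q", "python c++"), ("page", 2)]))    # q=python+c%2B%2B&page=2

# For a POST form the encoded pairs are the request body, passed as bytes, e.g.
# urllib.request.urlopen(url, data=body); no request is sent here.
body = urlencode({"user": "xds", "lang": "中文"}).encode("ascii")
print(parse_qs(body.decode("ascii")))   # {'user': ['xds'], 'lang': ['中文']}
```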
[dongsong@localhost python_study]$ cat cookie.py
from urllib2 import Request, build_opener, HTTPCookieProcessor, HTTPHandler
import httplib, urllib, cookielib, Cookie, os

conn = httplib.HTTPConnection('webapp.pucrs.br')

#COOKIE FINDER
cj = cookielib.CookieJar()
opener = build_opener(HTTPCookieProcessor(cj),HTTPHandler())
req = Request('http://webapp.pucrs.br/consulta/principal.jsp')
f = opener.open(req)
html = f.read()
#import pdb
#pdb.set_trace()
for cookie in cj:
    c = cookie
#FIM COOKIE FINDER

params = urllib.urlencode ({'pr1':111049631, 'pr2':'sssssss'})
#a request carries cookies in a "Cookie" header ("Set-Cookie" is a response header),
#and a urlencoded form body needs the matching Content-type
headers = {"Content-type":"application/x-www-form-urlencoded", "Cookie" : "JSESSIONID=70E78D6970373C07A81302C7CF800349"}
# I couldn't set the value automatically here, the cookie object can't be converted to string, so I change this value on every session to the new cookie's value. Any solutions?
#(one option: send the POST through the same opener, e.g. opener.open(url, params); HTTPCookieProcessor then adds the cookies from cj automatically)
conn.request ("POST", "/consulta/servlet/consulta.aluno.ValidaAluno",params, headers) # Validation page
resp = conn.getresponse()
resp.read()    #the previous response must be read before the connection can be reused
temp = conn.request("GET","/consulta/servlet/consulta.aluno.Publicacoes") # desired content page
resp = conn.getresponse()
print resp.read()
def change_log_file(fileName):
    h = logging.FileHandler(fileName)
    h.setLevel(g_logLevel)
    h.setFormatter(logging.Formatter(g_logFormat))
    logger = logging.getLogger()
    #print logger.handlers
    for handler in logger.handlers:
        handler.close()
    while len(logger.handlers) > 0:
        logger.removeHandler(logger.handlers[0])
    logger.addHandler(h)
[dongsong@localhost python_study]$ cat logging_test.py
#encoding=utf-8
import logging, sys

if __name__ == '__main__':
    logger = logging.getLogger('test')
    logger.setLevel(logging.DEBUG)
    print 'log handlers: %s' % str(logger.manager.loggerDict)
    logger.error('here')
    logger.warning('here')
    logger.info('here')
    logger.debug('here')

    #handler = logging.FileHandler('test.log')
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    #logging.getLogger('test').addHandler(logging.NullHandler()) # python 2.7+
    logger.error('here')
    logger.warning('here')
    logger.info('here')
    logger.debug('here')
[dongsong@localhost python_study]$ vpython logging_test.py
log handlers: {'test': <logging.Logger instance at 0x7f1dde0c2758>}
No handlers could be found for logger "test"
2012-12-26 11:30:48,725 - test - ERROR - here
2012-12-26 11:30:48,725 - test - WARNING - here
2012-12-26 11:30:48,725 - test - INFO - here
2012-12-26 11:30:48,725 - test - DEBUG - here
import multiprocessing
from multiprocessing import Process
import time

def func():
    for i in range(3):
        print "hello"
        time.sleep(1)

proc = Process(target = func)
proc.start()
while True:
    childList = multiprocessing.active_children()
    print childList
    if len(childList) == 0:
        break
    time.sleep(1)
[dongsong@bogon python_study]$ python multiprocessing_children.py
[<Process(Process-1, started)>]
hello
[<Process(Process-1, started)>]
hello
[<Process(Process-1, started)>]
hello
[<Process(Process-1, started)>]
[]
[dongsong@bogon python_study]$ fg
[dongsong@bogon python_study]$ vpython
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Pool
>>> import time
>>> poolObj = Pool(processes = 10)
>>> procObj = poolObj.apply_async(time.sleep, (20,))
>>> procObj.get(timeout = 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/multiprocessing/pool.py", line 418, in get
    raise TimeoutError
multiprocessing.TimeoutError
>>> print procObj.get(timeout = 21)
None
>>> poolObj.__dict__['_pool']
[<Process(PoolWorker-1, started daemon)>, <Process(PoolWorker-2, started daemon)>, <Process(PoolWorker-3, started daemon)>, <Process(PoolWorker-4, started daemon)>, <Process(PoolWorker-5, started daemon)>, <Process(PoolWorker-6, started daemon)>, <Process(PoolWorker-7, started daemon)>, <Process(PoolWorker-8, started daemon)>, <Process(PoolWorker-9, started daemon)>, <Process(PoolWorker-10, started daemon)>]
>>> poolObj.close()
>>> poolObj.join()
#encoding=utf-8
from bs4 import BeautifulSoup as soup

tag = soup((u"<p>白痴代码</p>"),from_encoding='unicode').body.contents[0]
newStr = str(tag)    #Tag.__str__() returns a UTF-8 encoded byte string (if Tag did not implement __str__(), it would behave as described in item 38 of this article)
print type(newStr),isinstance(newStr,unicode),newStr
try:
    print u"[unicode]hello," + newStr    #newStr is implicitly decoded with the ASCII codec to build a unicode string, which raises an error
except Exception,e:
    print str(e)
print "[utf-8]hello," + newStr
print u"[unicode]hello," + newStr.decode('utf-8')
[dongsong@bogon python_study]$ vpython tag_str_test.py
<type 'str'> False <p>白痴代码</p>
'ascii' codec can't decode byte 0xe7 in position 3: ordinal not in range(128)
[utf-8]hello,<p>白痴代码</p>
[unicode]hello,<p>白痴代码</p>
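In Python 3 the implicit ASCII decoding shown above no longer happens: str (text) and bytes never mix, and combining them raises TypeError right away. A sketch of the same three cases:

```python
text = "<p>白痴代码</p>"        # str: Unicode text in Python 3
data = text.encode("utf-8")     # bytes: what a Python 2 utf-8 str held

try:
    "[unicode]hello," + data    # str + bytes is never converted implicitly
except TypeError as e:
    print("TypeError:", e)

print("[unicode]hello," + data.decode("utf-8"))   # decode first, then concatenate
print(b"[utf-8]hello," + data)                    # or stay in bytes throughout
```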
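3> The pitfalls of query statements are shown in the test code below. The recommended form is cursor.execute(sql, args), because the driver then quotes and escapes the values for you.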
If you're not familiar with the Python DB-API, note that the SQL statement in cursor.execute() uses placeholders, "%s", rather than adding parameters directly within the SQL. If you use this technique, the underlying database library will automatically add quotes and escaping to your parameter(s) as necessary. (Also note that Django expects the "%s" placeholder, not the "?" placeholder, which is used by the SQLite Python bindings. This is for the sake of consistency and sanity.)
#encoding=utf-8
import MySQLdb

conn = MySQLdb.connect(host = "", port = 3306, user = "xds", passwd = "xds", db = "xds_db", charset = 'utf8')
cursor = conn.cursor()
print cursor

siteName = u"百度贴吧"
bbsNames = [u"明星", u"影视"]
siteName = siteName.encode('utf-8')
for index in range(len(bbsNames)):
    bbsNames[index] = bbsNames[index].encode('utf-8')

#correct usage (recommended: the driver quotes and escapes every argument)
#args = tuple([siteName] + bbsNames)
#sql = "select bbs from t_site_bbs where site = %s and bbs in (%s,%s)"
#rts = cursor.execute(sql,args)
#print rts

#also works, but the values are pasted into the SQL without escaping, so it is unsafe with untrusted input
args = tuple([siteName] + bbsNames)
sql = "select bbs from t_site_bbs where site = '%s' and bbs in ('%s','%s')" % args
print sql
rts = cursor.execute(sql)
print rts

#wrong usage, raises an error (the values end up in the SQL without quotes)
#args = tuple([siteName] + bbsNames)
#sql = "select bbs from t_site_bbs where site = %s and bbs in (%s,%s)" % args
#rts = cursor.execute(sql)
#print rts

#wrong usage: no error, but no rows are found (it only works when the bbsNames entries are digit or plain ASCII strings, because str(tuple(...)) writes non-ASCII bytes as '\xe6...' escapes)
#sql = "select bbs from t_site_bbs where site = '%s' and bbs in %s" % (siteName, str(tuple(bbsNames)))
#print sql
#rts = cursor.execute(sql)
#print rts

rts = cursor.fetchall()
for rt in rts:
    print rt[0]
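A self-contained sketch of the recommended parameterized form, using the standard library's sqlite3 instead of MySQLdb so it runs without a server. sqlite3 uses ? placeholders where MySQLdb uses %s, and the table is a hypothetical in-memory stand-in for t_site_bbs. One placeholder is generated per IN value and the driver does the quoting, so a value containing quotes is matched literally instead of being executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table t_site_bbs (site text, bbs text)")
cur.executemany("insert into t_site_bbs values (?, ?)",
                [("百度贴吧", "明星"), ("百度贴吧", "影视"), ("百度贴吧", "体育")])

site = "百度贴吧"
bbs_names = ["明星", "影视", "x' or '1'='1"]   # the last entry tries SQL injection

# One "?" per value; only the placeholders are formatted into the SQL text
placeholders = ",".join("?" * len(bbs_names))
sql = "select bbs from t_site_bbs where site = ? and bbs in (%s)" % placeholders
cur.execute(sql, [site] + bbs_names)
rows = [r[0] for r in cur.fetchall()]
print(sorted(rows))   # ['影视', '明星']: the injection string matched nothing
conn.close()
```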
[dongsong@bogon boosencms]$ vpython
Python 2.6.6 (r266:84292, Dec 7 2011, 20:48:22)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import time
>>> time.gmtime()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=18, tm_hour=4, tm_min=14, tm_sec=55, tm_wday=4, tm_yday=139, tm_isdst=0)
>>> time.localtime()
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=18, tm_hour=12, tm_min=15, tm_sec=2, tm_wday=4, tm_yday=139, tm_isdst=0)
>>> time.time()
1337314595.7790151
>>> time.timezone
-28800
>>> time.gmtime(time.time())
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=18, tm_hour=4, tm_min=19, tm_sec=45, tm_wday=4, tm_yday=139, tm_isdst=0)
>>> time.localtime(time.time())
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=18, tm_hour=12, tm_min=19, tm_sec=54, tm_wday=4, tm_yday=139, tm_isdst=0)
>>> time.strftime("%a, %d %b %Y %H:%M:%S +0800", time.localtime(time.time()))
'Fri, 18 May 2012 12:21:20 +0800'
>>> time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(time.time()))
'Fri, 18 May 2012 04:21:36 +0000'
#How is %Z actually supposed to be used? The lines below did not make it clear to me
>>> time.strftime("%a, %d %b %Y %H:%M:%S %Z", time.gmtime(time.time()))
'Fri, 18 May 2012 04:23:09 CST'
>>> time.strftime("%a, %d %b %Y %H:%M:%S %Z", time.localtime(time.time()))
'Fri, 18 May 2012 12:23:31 CST'
>>> timeStr = time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(time.time()))
>>> timeStr
'Fri, 18 May 2012 04:24:29 +0000'
>>> t = time.strptime(timeStr, "%a, %d %b %Y %H:%M:%S %Z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "/usr/lib64/python2.6/_strptime.py", line 325, in _strptime
    (data_string, format))
ValueError: time data 'Fri, 18 May 2012 04:24:29 +0000' does not match format '%a, %d %b %Y %H:%M:%S %Z'
>>> t = time.strptime(timeStr, "%a, %d %b %Y %H:%M:%S +0000")
>>> t
time.struct_time(tm_year=2012, tm_mon=5, tm_mday=18, tm_hour=4, tm_min=24, tm_sec=29, tm_wday=4, tm_yday=139, tm_isdst=-1)
#datetime usage below
>>> import datetime
>>> datetime.datetime.today()
datetime.datetime(2012, 5, 18, 12, 28, 25, 892141)
>>> datetime.datetime(2012,12,12,23,54)
datetime.datetime(2012, 12, 12, 23, 54)
>>> datetime.datetime(2012,12,12,23,54,32)
datetime.datetime(2012, 12, 12, 23, 54, 32)
>>> datetime.datetime.fromtimestamp(time.time())
datetime.datetime(2012, 5, 18, 12, 29, 15, 130257)
>>> datetime.datetime.utcfromtimestamp(time.time())
datetime.datetime(2012, 5, 18, 4, 29, 34, 897017)
>>> datetime.datetime.now()
datetime.datetime(2012, 5, 18, 12, 29, 52, 558249)
>>> datetime.datetime.utcnow()
datetime.datetime(2012, 5, 18, 4, 30, 6, 164009)
>>> datetime.datetime.fromtimestamp(time.time()).strftime("%a, %d %b %Y %H:%M:%S")
'Fri, 18 May 2012 17:05:30'
>>> datetime.datetime.today().strftime("%a, %d %b %Y %H:%M:%S")
'Fri, 18 May 2012 17:05:44'
>>> datetime.datetime.strptime('Fri, 18 May 2012 04:24:29', "%a, %d %b %Y %H:%M:%S")
datetime.datetime(2012, 5, 18, 4, 24, 29)
>>> datetime.datetime.fromtimestamp(time.time()).strftime('%X')
'17:07:14'
>>> datetime.datetime.fromtimestamp(time.time()).strftime('%x')
'02/28/15'
>>> datetime.datetime.fromtimestamp(time.time()).strftime('%c')
'Sat Feb 28 17:07:24 2015'
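%M minute, 0-59
%S second, 0-61 (that is what the official docs say: 60 covers leap seconds and 61 is kept for historical reasons)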
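Python 3's strptime understands %z (a numeric UTC offset such as +0000), which removes the need to hard-code +0000 in the format as in the session above. A sketch, assuming the default C/English locale for the %a and %b names:

```python
import calendar
from datetime import datetime, timezone

time_str = "Fri, 18 May 2012 04:24:29 +0000"
# %z parses the offset and produces a timezone-aware datetime
dt = datetime.strptime(time_str, "%a, %d %b %Y %H:%M:%S %z")
print(dt)               # 2012-05-18 04:24:29+00:00
print(dt.utcoffset())   # 0:00:00

# Convert to a Unix timestamp and back without involving the local timezone
ts = dt.timestamp()
print(ts == calendar.timegm(dt.utctimetuple()))           # True
print(datetime.fromtimestamp(ts, tz=timezone.utc) == dt)  # True
print(dt.astimezone(timezone.utc).strftime("%a, %d %b %Y %H:%M:%S %z"))
# Fri, 18 May 2012 04:24:29 +0000
```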
>>> l = ['a','b','c','d']
>>> '&'.join(l)
'a&b&c&d'
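b line_number    set a breakpoint; you can also give a file and a function (b file:line, b function)
b 180, childWeiboRt.retweetedId == 3508203280986906    conditional breakpoint
b    list all breakpoints
cl breakpoint_number    clear that breakpoint
cl    clear all breakpoints (pdb asks for confirmation)
c    continue
n    next line (steps over function calls)
s    step into a function
bt    print the call stack
whatis obj    show the type of a variable (equivalent to Python's built-in type())
up    move one level up the call stack (to the caller's frame) to look at the code and variables at that call site (this does not change where the program actually is, of course)
down    move one level down the call stack to look at the code and variables there (again, the actual point of execution does not change)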
#print every attribute of instanceObj together with its value
for it in [(attr,getattr(instanceObj,attr)) for attr in dir(instanceObj)]:
    print it[0],'-->',it[1]
>>> import sys
>>> def f2():
...     print sys._getframe().f_code.co_name
...
>>> f2()
f2
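A Python 3 sketch that moves the sys._getframe trick into a helper returning the name of the function that called it; current_name is a hypothetical helper, and sys._getframe is a CPython implementation detail (hence the leading underscore).

```python
import sys


def current_name():
    # _getframe(0) would be this helper's own frame; _getframe(1) is the caller's
    return sys._getframe(1).f_code.co_name


def f2():
    return current_name()


print(f2())   # f2
```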
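Character    Meaning in a URL    Encoded form
+    stands for a space (in query strings / form data)    %2B
space    may be written as + or percent-encoded    %20
/    separates directories and subdirectories    %2F
?    separates the URL proper from its parameters    %3F
%    introduces an encoded special character    %25
#    marks a bookmark (fragment)    %23
&    separates parameters in a URL    %26
=    separates a parameter name from its value    %3D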
>>> import urllib
>>> import urlparse
>>> urlparse.urljoin('http://s.weibo.com/weibo/',urllib.quote('python c++'))
'http://s.weibo.com/weibo/python%20c%2B%2B'
[dongsong@bogon python_study]$ cat url.py
#encoding=utf-8
import urllib, urlparse

if __name__ == '__main__':
    baseUrl = 'http://s.weibo.com/weibo/'
    url = urlparse.urljoin(baseUrl, urllib.quote(urllib.quote('python c++')))
    print url
    conn = urllib.urlopen(url)
    data = conn.read()
    f = file('/tmp/d.html', 'w')
    f.write(data)
    f.close()
[dongsong@bogon python_study]$ vpython url.py
http://s.weibo.com/weibo/python%2520c%252B%252B
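A Python 3 sketch of what the double quote() in url.py produces: the second pass encodes each % as %25, and decoding twice gets the original text back.

```python
from urllib.parse import quote, unquote, urljoin

once = quote("python c++")
twice = quote(once)      # the second pass encodes every '%' as '%25'
print(once)              # python%20c%2B%2B
print(twice)             # python%2520c%252B%252B
print(urljoin("http://s.weibo.com/weibo/", twice))
# http://s.weibo.com/weibo/python%2520c%252B%252B
print(unquote(unquote(twice)) == "python c++")   # True
```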
_mysql_exceptions.Warning: Incorrect string value: '\xF0\x9F\x91\x91\xE7\xAC...' for column 'detail' at row 1
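Storing the data the first way works fine, though. My initial conclusion was that json.dumps(ensure_ascii = False) mishandles the encoding of traditional Chinese characters. (In fact the bytes \xF0\x9F\x91\x91 are the 4-byte UTF-8 encoding of U+1F451, an emoji outside the Basic Multilingual Plane. MySQL's utf8 character set stores at most 3 bytes per character, so such characters need utf8mb4; with the default ensure_ascii=True, json.dumps writes the character as the ASCII escape \ud83d\udc51, which is presumably why that variant can be stored.)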
myString = jsonStr.decode('utf-8', 'ignore') #convert to unicode, dropping bytes that cannot be decoded
jsonObj = json.loads(myString)
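A Python 3 sketch of the 4-byte issue behind the MySQL warning: a character outside the Basic Multilingual Plane takes 4 bytes in UTF-8, which MySQL's utf8 (at most 3 bytes per character) rejects, whereas json.dumps with the default ensure_ascii=True turns it into an ASCII surrogate-pair escape. max_utf8_width is a hypothetical helper for spotting such characters before an insert.

```python
import json

s = "\U0001F451"                     # U+1F451 CROWN, outside the BMP
print(s.encode("utf-8"))             # b'\xf0\x9f\x91\x91': 4 bytes
print(json.dumps(s))                 # "\ud83d\udc51": ASCII-only surrogate pair
print(json.dumps(s, ensure_ascii=False).encode("utf-8"))   # b'"\xf0\x9f\x91\x91"'


def max_utf8_width(text):
    """Longest UTF-8 encoding of any character in text (MySQL utf8 allows 3)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


print(max_utf8_width("北京"), max_utf8_width("北京" + s))   # 3 4
```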
#encoding=utf-8
import json
from pprint import pprint

def show_rt(rt):
    pprint(rt)
    print rt
    print "type(rt) is %s" % type(rt)

if __name__ == '__main__':
    unDic = {
        u'中国':u'北京',
        u'日本':u'东京',
        u'法国':u'巴黎'
    }
    utf8Dic = {
        r'中国':r'北京',
        r'日本':r'东京',
        r'法国':r'巴黎'
    }
    pprint(unDic)
    pprint(utf8Dic)

    print "\nunicode instance dumps to string:"
    rt = json.dumps(unDic)
    show_rt(rt)
    print "utf-8 instance dumps to string:"
    rt = json.dumps(utf8Dic)
    show_rt(rt)

    #encoding is the character encoding for str instances, default is UTF-8
    #If ensure_ascii is False, then the return value will be a unicode instance, default is True
    #(in practice only when the data contains unicode objects; see the output below)
    print "\nunicode instance dumps(ensure_ascii=False) to string:"
    rt = json.dumps(unDic,ensure_ascii=False)
    show_rt(rt)
    print "utf-8 instance dumps(ensure_ascii=False) to string:"
    rt = json.dumps(utf8Dic,ensure_ascii=False)
    show_rt(rt)

    print "\n-----------------mixed encodings in one structure-----------------"
    unDic[u'日本'] = r'东京'
    utf8Dic[r'日本'] = u'东京'
    pprint(unDic)
    pprint(utf8Dic)

    print "\nunicode instance dumps to string:"
    try:
        rt = json.dumps(unDic)
    except Exception,e:
        print "%s:%s" % (type(e),str(e))
    else:
        show_rt(rt)
    print "utf-8 instance dumps to string:"
    try:
        rt = json.dumps(utf8Dic)
    except Exception,e:
        print "%s:%s" % (type(e),str(e))
    else:
        show_rt(rt)

    print "\nunicode instance dumps(ensure_ascii=False) to string:"
    try:
        rt = json.dumps(unDic, ensure_ascii=False)
    except Exception,e:
        print "%s:%s" % (type(e),str(e))
    else:
        show_rt(rt)
    print "utf-8 instance dumps(ensure_ascii=False) to string:"
    try:
        rt = json.dumps(utf8Dic, ensure_ascii=False)
    except Exception,e:
        print "%s:%s" % (type(e),str(e))
    else:
        show_rt(rt)

[dongsong@bogon python_study]$ vpython json_test.py
{u'\u4e2d\u56fd': u'\u5317\u4eac', u'\u65e5\u672c': u'\u4e1c\u4eac', u'\u6cd5\u56fd': u'\u5df4\u9ece'}
{'\xe4\xb8\xad\xe5\x9b\xbd': '\xe5\x8c\x97\xe4\xba\xac', '\xe6\x97\xa5\xe6\x9c\xac': '\xe4\xb8\x9c\xe4\xba\xac', '\xe6\xb3\x95\xe5\x9b\xbd': '\xe5\xb7\xb4\xe9\xbb\x8e'}

unicode instance dumps to string:
'{"\\u4e2d\\u56fd": "\\u5317\\u4eac", "\\u65e5\\u672c": "\\u4e1c\\u4eac", "\\u6cd5\\u56fd": "\\u5df4\\u9ece"}'
{"\u4e2d\u56fd": "\u5317\u4eac", "\u65e5\u672c": "\u4e1c\u4eac", "\u6cd5\u56fd": "\u5df4\u9ece"}
type(rt) is <type 'str'>
utf-8 instance dumps to string:
'{"\\u4e2d\\u56fd": "\\u5317\\u4eac", "\\u6cd5\\u56fd": "\\u5df4\\u9ece", "\\u65e5\\u672c": "\\u4e1c\\u4eac"}'
{"\u4e2d\u56fd": "\u5317\u4eac", "\u6cd5\u56fd": "\u5df4\u9ece", "\u65e5\u672c": "\u4e1c\u4eac"}
type(rt) is <type 'str'>

unicode instance dumps(ensure_ascii=False) to string:
u'{"\u4e2d\u56fd": "\u5317\u4eac", "\u65e5\u672c": "\u4e1c\u4eac", "\u6cd5\u56fd": "\u5df4\u9ece"}'
{"中国": "北京", "日本": "东京", "法国": "巴黎"}
type(rt) is <type 'unicode'>
utf-8 instance dumps(ensure_ascii=False) to string:
'{"\xe4\xb8\xad\xe5\x9b\xbd": "\xe5\x8c\x97\xe4\xba\xac", "\xe6\xb3\x95\xe5\x9b\xbd": "\xe5\xb7\xb4\xe9\xbb\x8e", "\xe6\x97\xa5\xe6\x9c\xac": "\xe4\xb8\x9c\xe4\xba\xac"}'
{"中国": "北京", "法国": "巴黎", "日本": "东京"}
type(rt) is <type 'str'>

-----------------mixed encodings in one structure-----------------
{u'\u4e2d\u56fd': u'\u5317\u4eac', u'\u65e5\u672c': '\xe4\xb8\x9c\xe4\xba\xac', u'\u6cd5\u56fd': u'\u5df4\u9ece'}
{'\xe4\xb8\xad\xe5\x9b\xbd': '\xe5\x8c\x97\xe4\xba\xac', '\xe6\x97\xa5\xe6\x9c\xac': u'\u4e1c\u4eac', '\xe6\xb3\x95\xe5\x9b\xbd': '\xe5\xb7\xb4\xe9\xbb\x8e'}

unicode instance dumps to string:
'{"\\u4e2d\\u56fd": "\\u5317\\u4eac", "\\u65e5\\u672c": "\\u4e1c\\u4eac", "\\u6cd5\\u56fd": "\\u5df4\\u9ece"}'
{"\u4e2d\u56fd": "\u5317\u4eac", "\u65e5\u672c": "\u4e1c\u4eac", "\u6cd5\u56fd": "\u5df4\u9ece"}
type(rt) is <type 'str'>
utf-8 instance dumps to string:
'{"\\u4e2d\\u56fd": "\\u5317\\u4eac", "\\u6cd5\\u56fd": "\\u5df4\\u9ece", "\\u65e5\\u672c": "\\u4e1c\\u4eac"}'
{"\u4e2d\u56fd": "\u5317\u4eac", "\u6cd5\u56fd": "\u5df4\u9ece", "\u65e5\u672c": "\u4e1c\u4eac"}
type(rt) is <type 'str'>

unicode instance dumps(ensure_ascii=False) to string:
<type 'exceptions.UnicodeDecodeError'>:'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
utf-8 instance dumps(ensure_ascii=False) to string:
<type 'exceptions.UnicodeDecodeError'>:'ascii' codec can't decode byte 0xe4 in position 1: ordinal not in range(128)
>>> import json
>>> d = {1:[1,2,3,4],0:()}
>>> d
{0: (), 1: [1, 2, 3, 4]}
>>> s = json.dumps(d)
>>> s
'{"0": [], "1": [1, 2, 3, 4]}'
>>> json.loads(s)
{u'1': [1, 2, 3, 4], u'0': []}
Keys in key/value pairs of JSON are always of the type str. When a dictionary is converted into JSON, all the keys of the dictionary are coerced to strings. As a result of this, if a dictionary is converted into JSON and then back into a dictionary, the dictionary may not equal the original one. That is, loads(dumps(x)) != x if x has non-string keys.
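A small Python 3 sketch of both JSON points above (the sample dicts are made up for illustration). Non-string keys come back as strings and tuples come back as lists, so a round trip is not an identity. And because every text value in Python 3 is a str, the str/unicode mix from json_test.py goes away; ensure_ascii then only decides whether non-ASCII characters are escaped.

```python
import json

def roundtrip(obj, **dumps_kwargs):
    """Serialize obj to JSON text and parse it back."""
    return json.loads(json.dumps(obj, **dumps_kwargs))

original = {1: [1, 2, 3, 4], 0: ()}
restored = roundtrip(original)                        # keys became '1'/'0', the tuple became a list
int_keys = {int(k): v for k, v in restored.items()}   # key types can be restored by hand...
# ...but the tuple stays a list, so int_keys still differs from original

text = {'中国': '北京'}
escaped = json.dumps(text)                         # ensure_ascii=True (default): \uXXXX escapes
readable = json.dumps(text, ensure_ascii=False)    # characters kept as-is, still a str
payload = readable.encode('utf-8')                 # encode explicitly when bytes are needed
```

If exact types matter, convert keys and sequences back explicitly after loads(), or pick a serializer that preserves them.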
>>> obj2 = subprocess.Popen('python /home/dongsong/python_study/child2.py', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> dir(obj2)
['__class__', '__del__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_check_timeout', '_child_created', '_close_fds', '_communicate', '_communicate_with_poll', '_communicate_with_select', '_communication_started', '_execute_child', '_get_handles', '_handle_exitstatus', '_input', '_internal_poll', '_remaining_time', '_set_cloexec_flag', '_translate_newlines', 'communicate', 'kill', 'pid', 'poll', 'returncode', 'send_signal', 'stderr', 'stdin', 'stdout', 'terminate', 'universal_newlines', 'wait']
>>> dir(obj2.stdout)
['__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'closed', 'encoding', 'errors', 'fileno', 'flush', 'isatty', 'mode', 'name', 'newlines', 'next', 'read', 'readinto', 'readline', 'readlines', 'seek', 'softspace', 'tell', 'truncate', 'write', 'writelines', 'xreadlines']
>>> obj2.stdout.read()
'[<logging.StreamHandler instance at 0x7fdb0ad63248>]\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\naaaaa\n'
>>> obj2.stdout.read()
''
>>> obj2.communicate()[0]
''
>>> obj2.communicate()[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/subprocess.py", line 729, in communicate
    stdout, stderr = self._communicate(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1310, in _communicate
    stdout, stderr = self._communicate_with_poll(input, endtime)
  File "/usr/lib64/python2.6/subprocess.py", line 1364, in _communicate_with_poll
    register_and_append(self.stdout, select_POLLIN_POLLPRI)
  File "/usr/lib64/python2.6/subprocess.py", line 1343, in register_and_append
    poller.register(file_obj.fileno(), eventmask)
ValueError: I/O operation on closed file
>>> obj2.stderr.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: I/O operation on closed file
>>> args = shlex.split('python /home/dongsong/python_study/child2.py')
>>> obj = subprocess.Popen(args)
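In the session above the first communicate() call closes the pipes, which is why the second call and the later stderr.read() raise ValueError. A minimal Python 3 sketch of the usual pattern, calling communicate() once and using the tuple it returns; the inline child program is a hypothetical stand-in for child2.py.

```python
import subprocess
import sys

# Hypothetical stand-in for child2.py: three lines on stdout, one on stderr.
CHILD = "import sys\nfor _ in range(3):\n    print('aaaaa')\nsys.stderr.write('warn\\n')\n"

proc = subprocess.Popen([sys.executable, '-c', CHILD],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                        universal_newlines=True)     # text mode: str instead of bytes
out, err = proc.communicate()   # reads both pipes to EOF, waits for exit, then closes the pipes
lines = out.splitlines()
```

Prefer communicate() over reading one pipe to EOF and then the other: if the child fills the pipe you are not reading, both processes can block forever.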
flags = fcntl.fcntl(procObj.stdout.fileno(), fcntl.F_GETFL)
fcntl.fcntl(procObj.stdout.fileno(), fcntl.F_SETFL, flags|os.O_NONBLOCK)
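A self-contained sketch of the two fcntl lines above (Unix only, Python 3). In Python 3, reading an empty non-blocking pipe raises BlockingIOError instead of blocking; the sleeping child process and the helper names are hypothetical.

```python
import fcntl
import os
import subprocess
import sys

def set_nonblocking(fd):
    """Add O_NONBLOCK to the descriptor's status flags (the same two calls as above)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def read_available(fd, size=4096):
    """Return the bytes that are ready now, or None if the pipe is empty but still open."""
    try:
        return os.read(fd, size)
    except BlockingIOError:
        return None

proc = subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(0.5); print("done")'],
                        stdout=subprocess.PIPE)
fd = proc.stdout.fileno()
set_nonblocking(fd)
early = read_available(fd)   # child is still sleeping: returns None instead of blocking
proc.wait()
late = read_available(fd)    # child has exited: its output is waiting in the pipe
proc.stdout.close()
```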
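The principle is described in the encyclopedia entry on zombie processes: fork twice. The parent forks a child and keeps working; the child forks a grandchild and then exits, so the grandchild is adopted by init, which reaps it when it finishes. The parent still has to reap the child itself, though.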
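A minimal Unix-only sketch of the double fork (Python 3). spawn_detached, write_flag and the marker file are hypothetical names for illustration. Using os._exit() in the children keeps them from running the parent's remaining code or flushing its buffers.

```python
import os
import tempfile
import time

def spawn_detached(work):
    """Run work() in a grandchild that init (or a subreaper) adopts; reap the child here."""
    pid = os.fork()
    if pid == 0:                       # first child
        if os.fork() == 0:             # grandchild: does the real work
            try:
                work()
            finally:
                os._exit(0)            # never return into the parent's code
        os._exit(0)                    # first child exits at once, orphaning the grandchild
    os.waitpid(pid, 0)                 # parent reaps the first child, so no zombie remains
    return pid

# Demo: the grandchild writes a hypothetical marker file.
flag_path = os.path.join(tempfile.mkdtemp(), 'done')

def write_flag():
    with open(flag_path, 'w') as fh:
        fh.write('ok')

def read_flag():
    try:
        with open(flag_path) as fh:
            return fh.read()
    except FileNotFoundError:
        return None

spawn_detached(write_flag)
for _ in range(100):                   # the grandchild runs asynchronously; poll for up to ~5 s
    if read_flag() == 'ok':
        break
    time.sleep(0.05)
```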
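I recommend not changing the system's default encoding, because it affects how some libraries behave; if you really must, these are the ways to do it. Note that sys.setdefaultencoding() does not work everywhere: site.py deletes it from sys at startup, so ordinary code cannot call it, and it is only usable from a sitecustomize.py (for example python-installed-dir/site-packages/pyanaconda/sitecustomize.py).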
[dongsong@bogon python_study]$ vpython
Python 2.6.6 (r266:84292, Dec 7 2011, 20:48:22)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> s = u'中国'
>>> str(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> s.encode('utf-8')
'\xe4\xb8\xad\xe5\x9b\xbd'
>>> sys.setdefaultencoding('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'setdefaultencoding'
>>> d = {u'中国':u'北京'}
>>> d
{u'\u4e2d\u56fd': u'\u5317\u4eac'}
>>> str(d)
"{u'\\u4e2d\\u56fd': u'\\u5317\\u4eac'}"

# change the default encoding
[dongsong@bogon python_study]$ cat ~/venv/lib/python2.6/site-packages/sitecustomize.py
import sys
sys.setdefaultencoding('utf-8')
[dongsong@bogon python_study]$ vpython -c 'import sys; print sys.getdefaultencoding();'
utf-8
[dongsong@bogon python_study]$ vpython
Python 2.6.6 (r266:84292, Dec 7 2011, 20:48:22)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = u'中国'
>>> str(s)
'\xe4\xb8\xad\xe5\x9b\xbd'
>>> import sys
>>> print sys.getdefaultencoding()
utf-8
>>> d = {u'中国':u'北京'}
>>> d
{u'\u4e2d\u56fd': u'\u5317\u4eac'}
>>> str(d)
"{u'\\u4e2d\\u56fd': u'\\u5317\\u4eac'}"
Alternatively, run python -S to skip site.py (the site.py in the Python source is worth a look); sys then still exposes setdefaultencoding() directly.
...
except Exception,e:
    if not isinstance(e, APIError):
        traceback.print_exc(file=sys.stderr)
import sys
tp,val,td = sys.exc_info()
traceback ---- an object that holds the call stack information.
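A minimal Python 3 sketch combining the two snippets above: sys.exc_info() returns the (type, value, traceback) triple, and traceback.format_exception() turns it into the text print_exc() would print. APIError and the two sample functions are hypothetical.

```python
import sys
import traceback

class APIError(Exception):
    """Hypothetical application error that is reported without a stack trace."""

def handle(func):
    try:
        return func()
    except Exception:
        tp, val, tb = sys.exc_info()        # (type, value, traceback) of the exception being handled
        if isinstance(val, APIError):
            return 'api error: %s' % val
        # format_exception produces the same text traceback.print_exc() would write
        return ''.join(traceback.format_exception(tp, val, tb))

def divide_by_zero():
    return 1 / 0

def raise_api_error():
    raise APIError('quota exceeded')

report = handle(divide_by_zero)
api_report = handle(raise_api_error)
```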
40. Some options for GUI development in Python: GUI Programming in Python (http://wiki.python.org/moin/GuiProgramming)
cocos2d: The past and present of the Cocos2D family
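1> A method decorated with @classmethod is a class method: its first parameter, cls, receives the class. When the method is called through a subclass, cls is the subclass, not the parent class. This is different from a static method in C++. Call it as ClassA.func() or ClassA().func() (in the latter form the instance is ignored and only its class is passed). classmethod() is useful for creating alternate class constructors.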
>>> class A:
...     @classmethod
...     def func(cls):
...         import pdb
...         pdb.set_trace()
...         pass
...
>>> A.func()
> <stdin>(6)func()
(Pdb) cls
<class __main__.A at 0x7fc8b056ea70>
(Pdb) type(cls)
<type 'classobj'>
(Pdb)
>>> type(A())
<type 'instance'>

2> A method decorated with @staticmethod is a static method: it receives no implicit first argument. It is essentially a plain global function and closely resembles a static method of a C++ class. Call it as ClassA.func() or ClassA().func() (in the latter form the instance is ignored).
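A Python 3 sketch of both decorators, including the "alternate constructor" use of classmethod. Point, LabeledPoint, from_string and parse_pair are made-up names for illustration.

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    @classmethod
    def from_string(cls, text):
        """Alternate constructor: cls is the class the call went through (maybe a subclass)."""
        return cls(*cls.parse_pair(text))

    @staticmethod
    def parse_pair(text):
        """No implicit first argument: a plain function living in the class namespace."""
        x, y = text.split(',')
        return int(x), int(y)

class LabeledPoint(Point):
    pass

p = LabeledPoint.from_string('1,2')     # builds a LabeledPoint, not a Point
q = Point(0, 0).from_string('3,4')      # called on an instance: the instance is ignored, cls is Point
pair = Point(0, 0).parse_pair('5,6')    # static method via an instance: nothing is passed implicitly
```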
>>> d1
{1: 6, 11: 12, 12: 13, 13: 14}
>>> d2
{1: 2, 2: 3, 3: 4}
>>> dict(d2, **d1)
{1: 6, 2: 3, 3: 4, 11: 12, 12: 13, 13: 14}
>>> dict(d1,**d2)
{1: 2, 2: 3, 3: 4, 11: 12, 12: 13, 13: 14}
>>> d = dict(d1)
>>> d
{1: 6, 11: 12, 12: 13, 13: 14}
>>> d2
{1: 2, 2: 3, 3: 4}
>>> d.update(d2)
>>> d
{1: 2, 2: 3, 3: 4, 11: 12, 12: 13, 13: 14}
>>> d = dict(d2)
>>> d
{1: 2, 2: 3, 3: 4}
>>> d1
{1: 6, 11: 12, 12: 13, 13: 14}
>>> d.update(d1)
>>> d
{1: 6, 2: 3, 3: 4, 11: 12, 12: 13, 13: 14}
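The dict(d2, **d1) trick relies on CPython 2 accepting non-string keys in **kwargs; Python 3 raises TypeError for them. A Python 3 sketch of merges that work with any key type (merged is a hypothetical helper; Python 3.9+ also offers d1 | d2):

```python
d1 = {1: 6, 11: 12, 12: 13, 13: 14}
d2 = {1: 2, 2: 3, 3: 4}

def merged(base, override):
    """Copy base, then let override win on duplicate keys (works for any key type)."""
    result = dict(base)
    result.update(override)
    return result

unpacked = {**d1, **d2}          # Python 3.5+: the later mapping wins on duplicate keys

kw_error = None
try:
    dict(d2, **d1)               # worked in CPython 2 with int keys, rejected in Python 3
except TypeError as exc:
    kw_error = exc
```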
2>> socket.setdefaulttimeout(xx)  # (global socket timeout setting)
from urllib2 import urlopen
from threading import Timer

url = "http://www.python.org"

def handler(fh):
    fh.close()

fh = urlopen(url)
t = Timer(20.0, handler,[fh])
t.start()
data = fh.read()
t.cancel()
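A per-socket alternative to the global default and to the Timer trick: settimeout() bounds each blocking call, and urllib2.urlopen() (Python 2.6+) and urllib.request.urlopen() (Python 3) also take a timeout argument directly. A Python 3 sketch against a local listening socket that never sends anything (recv_or_timeout is a hypothetical helper):

```python
import socket

def recv_or_timeout(sock, timeout):
    """Wait at most `timeout` seconds for data; return None on timeout."""
    sock.settimeout(timeout)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))      # port 0: let the OS pick a free port
server.listen(1)
client = socket.create_connection(server.getsockname(), timeout=2.0)
result = recv_or_timeout(client, 0.2)   # the server never sends, so this times out
client.close()
server.close()
```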
The versions I used: (xlwt-0.7.4, xlrd-0.8.0, xlutils-1.5.2)
To set a row's height, use sheetObj.row(index).set_style(easyxf('font:height 720;')); to set a column's width, use sheetObj.col(index).width = 1000. Almost all the other methods are buggy and have no effect: http://reliablybroken.com/b/2011/10/widths-heights-with-xlwt-python/
#encoding=utf-8
from xlwt import Workbook, easyxf

book = Workbook(encoding='utf-8')
sheet1 = book.add_sheet('Sheet 1')
sheet1.col_width(20000)
book.add_sheet('Sheet 2')
sheet1.write(0,0,'起点')
sheet1.write(0,1,'B1')
row1 = sheet1.row(1)
row1.write(0,'A2')
row1.write(1,'B2')
sheet1.col(0).width = 10000
sheet1.col(1).width = 20000
#sheet1.default_col_width = 20000 # bug: has no effect
#sheet1.col_width(30000) # bug: has no effect
#sheet1.default_row_height = 5000 # bug: has no effect
#sheet1.row(0).height = 5000 # bug: has no effect
sheet1.row(0).set_style(easyxf('font:height 400;'))
style = easyxf('pattern: pattern solid, fore_colour red;'
               'align: vertical center, horizontal center;'
               'font: bold true;')
sheet1.write_merge(2,5,2,5,'Merged',style)

sheet2 = book.get_sheet(1)
sheet2.row(0).write(0,'Sheet 2 A1')
sheet2.row(0).write(1,'Sheet 2 B1')
sheet2.flush_row_data()
sheet2.write(1,0,'Sheet 2 A3')
sheet2.col(0).width = 5000
sheet2.col(0).hidden = True
book.save('simple.xls')
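45. When the machine has several IP addresses, how do you choose which one urllib2 uses for an HTTP request? There are two ways. The convenient, slightly hacky one is to monkey-patch the socket module's socket function (the code below does this). The other: a better way is to extend the connect() method in a subclass of HTTPConnection and redefine the http_open() method in a subclass of HTTPHandler.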
import socket

def bind_alt_socket(alt_ip):
    true_socket = socket.socket
    def bound_socket(*a, **k):
        sock = true_socket(*a, **k)
        sock.bind((alt_ip, 0))
        return sock
    socket.socket = bound_socket

Reference: http://www.rossbates.com/2009/10/urllib2-with-multiple-network-interfaces/
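For the subclassing approach: Python 3's http.client.HTTPConnection (like httplib.HTTPConnection since Python 2.7) takes a source_address argument, so the handler only needs to pass it through. A sketch assuming Python 3; SourceAddressHTTPHandler and build_bound_opener are hypothetical names, and HTTPS would need the same override of https_open with HTTPSConnection. The demo runs against a throwaway local server.

```python
import functools
import http.client
import http.server
import threading
import urllib.request

class SourceAddressHTTPHandler(urllib.request.HTTPHandler):
    """HTTPHandler whose connections bind to a chosen local IP before connecting."""

    def __init__(self, source_ip, debuglevel=0):
        super().__init__(debuglevel)
        self.source_address = (source_ip, 0)       # port 0: any free local port

    def http_open(self, req):
        conn_class = functools.partial(http.client.HTTPConnection,
                                       source_address=self.source_address)
        return self.do_open(conn_class, req)

def build_bound_opener(source_ip):
    # An empty ProxyHandler disables environment proxies, so the bind applies to a direct connection.
    return urllib.request.build_opener(urllib.request.ProxyHandler({}),
                                       SourceAddressHTTPHandler(source_ip))

# Demo: a local server that answers with the client's IP address.
class EchoClientHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = self.client_address[0].encode()
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                                       # keep the demo quiet

server = http.server.HTTPServer(('127.0.0.1', 0), EchoClientHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_address[1]
seen_ip = build_bound_opener('127.0.0.1').open(url, timeout=5).read().decode()
server.shutdown()
```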
1. Install sip
wget http://sourceforge.net/projects/pyqt/files/sip/sip-4.14.1/sip-4.14.1.tar.gz
vpython configure.py
make
sudo make install
2. sudo yum install qt qt-devel -y
sudo yum install qtwebkit qtwebkit-devel -y   // without this step, the configure below will not generate the Makefile for QtWebKit
3. Install PyQt
wget http://sourceforge.net/projects/pyqt/files/PyQt4/PyQt-4.9.5/PyQt-x11-gpl-4.9.5.tar.gz
vpython configure.py -q/usr/bin/qmake-qt4 -g
make
make install
A module that doesn't show up in dir(PyQt4) is not necessarily missing! The .so extension modules can be imported with from PyQt4 import QtGui or import PyQt4.QtGui. Argh, I kept assuming the installation had failed and tried everything to track down the cause; I was going crazy...
The example below imports, from the default Python environment, the callme module that was installed into a virtualenv Python:
[dongsong@localhost ~]$ python
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import callme
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named callme
>>> activate_this = '/home/dongsong/venv/bin/activate_this.py'
>>> execfile(activate_this, dict(__file__=activate_this))
>>> import callme
>>>
As for making mod_python use the virtualenv Python environment, see the link mentioned earlier:
#myvirtualdjango.py
activate_this = '/home/django/progopedia.ru/ve/bin/activate_this.py'
execfile(activate_this, dict(__file__=activate_this))
from django.core.handlers.modpython import handler
<VirtualHost>
    ServerName progopedia.ru
    ServerAdmin [email protected]
    <Location "/">
        SetHandler python-program
        PythonPath "['/home/django/progopedia.ru/ve/bin', '/home/django/progopedia.ru/src/progopedia_ru_project/'] + sys.path"
        PythonHandler myvirtualdjango
        SetEnv DJANGO_SETTINGS_MODULE settings
        SetEnv PYTHON_EGG_CACHE /var/tmp/egg
        PythonInterpreter polyprog_ru
    </Location>
</VirtualHost>
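execfile() no longer exists in Python 3; for an existing activate_this.py the usual replacement is exec(open(activate_this).read(), dict(__file__=activate_this)). In the virtualenv versions of that era, activate_this.py essentially called site.addsitedir() on the venv's site-packages and moved the new sys.path entries to the front. A rough Python 3 sketch of only that path manipulation (it does not touch sys.prefix); the helper and the demo module callme_demo are hypothetical.

```python
import os
import site
import sys
import tempfile

def activate_site_packages(site_packages_dir):
    """Prepend a (virtualenv) site-packages dir to sys.path, processing its .pth files."""
    before = list(sys.path)
    site.addsitedir(site_packages_dir)              # appends the dir (and .pth entries) to sys.path
    added = [p for p in sys.path if p not in before]
    sys.path[:] = added + before                    # move the new entries to the front

# Demo with a throwaway directory holding a hypothetical module "callme_demo".
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'callme_demo.py'), 'w') as fh:
    fh.write('VALUE = 42\n')
activate_site_packages(demo_dir)
import callme_demo
```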
print automatically appends a newline at the end of the line; to suppress it, just add a trailing comma "," at the end of the print statement.
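In Python 3 print is a function and end='' does the same job. One difference: in Python 2 the trailing comma also makes the next print insert a space (print 'a', followed by print 'b' gives "a b"), while end='' adds nothing. A small sketch capturing the output in a StringIO buffer:

```python
import io

buf = io.StringIO()
print('a', end='', file=buf)        # Python 3: end='' suppresses the newline
print('b', file=buf)                # default end='\n'
print('x', 'y', sep='', file=buf)   # sep controls what goes between arguments
captured = buf.getvalue()
```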
49. finally is easy to get wrong!
[dongsong@localhost python_study]$ cat finally_test.py
#encoding=utf-8
def func():
    a = 1
    try:
        return a
    except Exception,e:
        print '%r' % e
    else:
        print 'no exception'
    finally:
        print 'finally'
        a += 1

a = func()
print 'func returned %s' % a
[dongsong@localhost python_study]$ vpython finally_test.py
finally
func returned 1
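Why func() returns 1: the return expression is evaluated before finally runs, so rebinding a in finally cannot change the pending value (and the else branch is skipped because try already returned). Mutating a returned object, however, is visible. A Python 3 sketch with hypothetical function names:

```python
def value_then_finally():
    a = 1
    try:
        return a        # the return value (1) is evaluated here, before finally runs
    finally:
        a += 1          # rebinding a afterwards cannot change the pending return value

def mutate_in_finally():
    items = [1]
    try:
        return items    # the pending return value is this list object...
    finally:
        items.append(2) # ...so mutating it in finally is visible to the caller

results = (value_then_finally(), mutate_in_finally())
```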
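1> When stackless.schedule() is called, the currently active tasklet pauses and re-inserts itself at the end of the scheduler's queue so that the next tasklet can run.
Once all the tasklets ahead of it have run, it continues from where it stopped. This goes on until all active tasklets have finished. This is how stackless achieves cooperative multitasking.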
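Stackless is not part of standard CPython, so this is a plain-Python analogue rather than the stackless API: a round-robin scheduler over generators (run and worker are hypothetical names) in which each yield behaves like stackless.schedule().

```python
from collections import deque

def run(tasks):
    """Round-robin cooperative scheduler over generators.

    Each `yield` plays the role of stackless.schedule(): the task pauses and
    goes to the back of the run queue; finished tasks are dropped.
    """
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))   # run the task until its next yield
        except StopIteration:
            continue                   # task finished: do not requeue it
        queue.append(task)             # yielded: back of the queue
    return trace

def worker(name, steps):
    for i in range(steps):
        yield '%s%d' % (name, i)       # one step of work, then give up control

trace = run([worker('a', 2), worker('b', 3)])
```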
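2> When a receiving tasklet calls channel.receive(), it blocks: the tasklet pauses until a message is sent on that channel. Nothing other than sending on that channel can make it resume.
If another tasklet sends a message on the channel, the receiving tasklet resumes immediately, regardless of where the scheduler currently is, and the sending tasklet is moved to the end of the run queue, just as if it had called stackless.schedule().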
5> Timing: now we time several experimental runs. The Python standard library has the timeit.py module, which can be used for this.
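A minimal sketch of timeit's function interface (Python 3; build_squares is a made-up workload, not the tasklet experiment itself). Taking the minimum of several trials reduces noise from other activity on the machine.

```python
import timeit

def build_squares(n=1000):
    return [i * i for i in range(n)]

# 3 trials of 200 calls each; timeit.repeat returns the total seconds for each trial.
trials = timeit.repeat(build_squares, number=200, repeat=3)
best_per_call = min(trials) / 200
```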
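6> We set the channel's preference to 1, so the sending tasklet is not blocked after calling send and keeps running, which lets it print the correct warehouse (inventory) information afterwards.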
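7> In stackless, the balance of a channel is how many tasklets are waiting to send or receive on it. A positive value is the number of tasklets waiting to send, a negative value is the number waiting to receive, and 0 means no tasklet is waiting.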
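Summary: stackless Python is still bound by the GIL and cannot use multiple cores; it is only an improvement over Python's traditional threads (http://stackoverflow.com/questions/377254/stackless-python-and-multicores). So using multiprocessing to build multiple processes, with stackless microthreads inside each process, is a good combination. The EVE server side is built with stackless (apparently C++ plus stackless Python); I'd love to see their code, haha.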
Installing stackless Python: see http://opensource.hyves.org/concurrence/install.html#installing-stackless
sudo yum install readline-devel -y
./configure --prefix=/opt/stackless --with-readline --with-zlib=/usr/include
make
make install
[dongsong@localhost python_study]$ touch mds/__init__.py
[dongsong@localhost python_study]$ vpython
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> m = __import__('mds.m1', globals(), locals(), fromlist=[], level = 0)
>>> m
<module 'mds' from 'mds/__init__.py'>
class RobotMeta(type):
    def __new__(cls, name, bases, attrs):
        newbases = list(bases)
        import testcase
        import pkgutil
        for importer, modname, ispkg in pkgutil.iter_modules(testcase.__path__):
            if ispkg: continue
            mod = __import__('testcase.'+modname, globals(), locals(), fromlist=(modname,), level=1)
            if hasattr(mod, 'Robot'):
                newbases.append(mod.Robot)
        return super(RobotMeta, cls).__new__(cls, name, tuple(newbases), attrs)

The importlib library: importlib.import_module()
[dongsong@localhost python_study]$ touch mds/__init__.py
[dongsong@localhost python_study]$ vpython
Python 2.6.6 (r266:84292, Jun 18 2012, 14:18:47)
[GCC 4.4.6 20110731 (Red Hat 4.4.6-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import importlib
>>> m = importlib.import_module('mds.m1')
>>> m
<module 'mds.m1' from 'mds/m1.py'>
>>>
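The two sessions differ because __import__ with an empty fromlist returns the top-level package, while a non-empty fromlist (as RobotMeta passes) or importlib.import_module() returns the named submodule. A Python 3 sketch using the standard xml.dom package in place of the author's mds.m1:

```python
import importlib

top = __import__('xml.dom', globals(), locals(), [], 0)         # empty fromlist: top-level package
leaf = __import__('xml.dom', globals(), locals(), ['Node'], 0)  # non-empty fromlist: xml.dom itself
direct = importlib.import_module('xml.dom')                     # always the named module
names = (top.__name__, leaf.__name__, direct.__name__)
```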
52. How can a user-defined class be made to support pickle and cPickle? (Below is the change made in a project to a dict subclass that holds a decoded JSON string; see http://stackoverflow.com/questions/5247250/why-does-pickle-getstate-accept-as-a-return-value-the-very-instance-it-requi)
def __getstate__(self):
    return dict(self)
def __setstate__(self, state):
    return self.update(state)
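A minimal Python 3 sketch of such a class. The project class itself is not shown, so JsonObject, its attribute-style access and the sample data are hypothetical; __getstate__/__setstate__ are the two methods above. Note that pickle probes optional hooks with getattr, so a __getattr__ that raised KeyError instead of AttributeError would break pickling.

```python
import pickle

class JsonObject(dict):
    """Hypothetical dict subclass giving attribute access to decoded JSON fields."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)   # pickle and getattr(obj, name, default) expect AttributeError

    def __setattr__(self, name, value):
        self[name] = value

    def __getstate__(self):
        return dict(self)                # a plain dict copy of the contents

    def __setstate__(self, state):
        self.update(state)               # no return value is needed

obj = JsonObject(user='dongsong', tags=['python'])
clone = pickle.loads(pickle.dumps(obj))
clone_hi = pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))
```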
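s.isalnum() all characters are letters or digits
s.isalpha() all characters are letters
s.isdigit() all characters are digits
s.islower() all cased characters are lowercase (and there is at least one)
s.isupper() all cased characters are uppercase (and there is at least one)
s.istitle() every word starts with an uppercase letter, like a title
s.isspace() all characters are whitespace (spaces, \t, \n, \r)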
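A quick Python 3 check of the predicates above (classify and the sample strings are made up for illustration). Every predicate returns False for the empty string, and islower()/isupper() ignore characters that have no case, so 'abc123'.islower() is True.

```python
PREDICATES = ('isalnum', 'isalpha', 'isdigit', 'islower', 'isupper', 'istitle', 'isspace')

def classify(s):
    """Names of the str.is*() predicates that return True for s."""
    return [name for name in PREDICATES if getattr(s, name)()]

results = {s: classify(s) for s in ['abc123', 'abc', '123', 'hello world',
                                    'HELLO', 'Hello World', ' \t\n', '']}
```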
54. Python networking frameworks: Python concurrency can't be covered in a few words, so I started a separate post: http://blog.csdn.net/xiarendeniao/article/details/9143059
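Twisted is fairly common and widely used (module index)
concurrence has close ties to stackless (it combines stackless and libevent), which is why it appeals to me
cogen is similar to the previous one and somewhat more portable
gevent combines greenlet and libevent (greenlet is a spin-off of stackless, more primitive than stackless and better at satisfying a coder's urge to control coroutines directly); seen this way, it works on much the same principle as concurrence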
55. Python environment variables
import os

if not os.environ.has_key('DJANGO_SETTINGS_MODULE'):
    os.environ['DJANGO_SETTINGS_MODULE'] = 'boosencms.settings'
else:
    print 'DJANGO_SETTINGS_MODULE: %s' % os.environ['DJANGO_SETTINGS_MODULE']
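dict.has_key() is gone in Python 3; os.environ.setdefault() expresses "set only if missing" in one call, and assigning through os.environ also exports the value to child processes. A Python 3 sketch (ensure_env and 'other.settings' are hypothetical):

```python
import os

def ensure_env(name, default):
    """Set name to default only if it is not already set; return (previous, current)."""
    previous = os.environ.get(name)
    current = os.environ.setdefault(name, default)
    return previous, current

os.environ.pop('DJANGO_SETTINGS_MODULE', None)        # start clean for the demo
first = ensure_env('DJANGO_SETTINGS_MODULE', 'boosencms.settings')
second = ensure_env('DJANGO_SETTINGS_MODULE', 'other.settings')
```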
[dongsong@localhost python-study]$ !cat
cat yield.py
def echo(value=None):
    print "Execution starts when 'next()' is called for the first time."
    try:
        while True:
            try:
                value = (yield value)
            except Exception, e:
                print "caught an exception", e
                value = e
            else:
                print "yield received ", value
    finally:
        print "Don't forget to clean up when 'close()' is called."

generator = echo(1)
print generator.next()
print generator.next()
print generator.send(2)
generator.throw(TypeError, "spam")
generator.close()
[dongsong@localhost python-study]$ !python
python yield.py
Execution starts when 'next()' is called for the first time.
1
yield received None
None
yield received 2
2
caught an exception spam
Don't forget to clean up when 'close()' is called.
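The same generator protocol in Python 3, recording events in a list so the order is easy to check (the log parameter is an addition for illustration). next(gen) is send(None), throw() raises at the paused yield, and close() raises GeneratorExit, which except Exception does not catch, so finally runs.

```python
def echo(log, value=None):
    log.append('start')                        # runs on the first next()
    try:
        while True:
            try:
                value = yield value
            except Exception as e:             # throw() raises here, at the paused yield
                log.append('caught %s' % e)
                value = e
            else:
                log.append('received %r' % (value,))
    finally:
        log.append('cleanup')                  # close() raises GeneratorExit, not an Exception

log = []
gen = echo(log, 1)
first = next(gen)                       # 1
second = next(gen)                      # None: next() is send(None)
third = gen.send(2)                     # 2
thrown = gen.throw(TypeError('spam'))   # the generator yields the caught exception back
gen.close()
```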
57. For a detailed explanation of metaclasses, see http://blog.csdn.net/xiarendeniao/article/details/9232021
[dongsong@localhost python_study]$ cat singleton3.py
#encoding=utf-8
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class MyClass(object):
    __metaclass__ = Singleton

singletonObj = Singleton('Test',(),{})
myClassObj1 = MyClass()
myClassObj2 = MyClass()
print singletonObj, singletonObj.__class__
print id(myClassObj1),myClassObj1,myClassObj1.__class__
print id(myClassObj2),myClassObj2,myClassObj2.__class__
[dongsong@localhost python_study]$ vpython singleton3.py
<class '__main__.Test'> <class '__main__.Singleton'>
139799414931408 <__main__.MyClass object at 0x7f2596777fd0> <class '__main__.MyClass'>
139799414931408 <__main__.MyClass object at 0x7f2596777fd0> <class '__main__.MyClass'>

59. Python magic methods: this one is a bit long, so it has its own post: http://blog.csdn.net/xiarendeniao/article/details/9270407
60. struct (packing binary data), official documentation: http://docs.python.org/3/library/struct.html
| Character | Byte order | Size | Alignment |
|---|---|---|---|
| @ | native | native | native |
| = | native | standard | none |
| < | little-endian | standard | none |
| > | big-endian | standard | none |
| ! | network (= big-endian) | standard | none |
| Format | C Type | Python type | Standard size | Notes |
|---|---|---|---|---|
| x | pad byte | no value | | |
| c | char | bytes of length 1 | 1 | |
| b | signed char | integer | 1 | (1), (3) |
| B | unsigned char | integer | 1 | (3) |
| ? | _Bool | bool | 1 | (1) |
| h | short | integer | 2 | (3) |
| H | unsigned short | integer | 2 | (3) |
| i | int | integer | 4 | (3) |
| I | unsigned int | integer | 4 | (3) |
| l | long | integer | 4 | (3) |
| L | unsigned long | integer | 4 | (3) |
| q | long long | integer | 8 | (2), (3) |
| Q | unsigned long long | integer | 8 | (2), (3) |
| n | ssize_t | integer | | (4) |
| N | size_t | integer | | (4) |
| f | float | float | 4 | (5) |
| d | double | float | 8 | (5) |
| s | char[] | bytes | | |
| p | char[] | bytes | | |
| P | void * | integer | | (6) |
>>> import struct
>>> struct.pack('HH',1,2)
'\x01\x00\x02\x00'
>>> struct.pack('<HH',1,2)
'\x01\x00\x02\x00'
>>> struct.pack('>HH',1,2)
'\x00\x01\x00\x02'
>>> s= struct.pack('HH',1,2)
>>> s
'\x01\x00\x02\x00'
>>> len(s)
4
>>> struct.unpack('HH',s)
(1, 2)
>>> struct.unpack_from('H', s, 2)
(2,)
>>> struct.unpack('H',s[0:2])
(1,)
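A Python 3 sketch of why the byte-order prefix matters for wire formats: with '<', '>' or '!' sizes are standard and there is no padding, while native '@' mode may pad for alignment. The '<HI' header layout is a made-up example.

```python
import struct

HEADER = struct.Struct('<HI')        # little-endian, standard sizes, no padding: 2 + 4 bytes

packed = HEADER.pack(1, 2)
fields = HEADER.unpack(packed)
big = struct.pack('>H', 1)           # '!' (network order) is the same as '>'
second = struct.unpack_from('<I', packed, 2)   # read the I field starting at byte offset 2
native_size = struct.calcsize('@HI')           # native mode may insert padding before the int
```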
[dongsong@localhost python_study]$ cat enclosing_1.py
#encoding=utf8
a = 1
b = 2
def f(v = 0):
    a = 2
    c = list()
    def g():
        print 'a = %s' % a
        print 'b = %s' % b
        print 'c = %r' % c
    if v == 0:
        a += 1
    else:
        a += v
    c.append(111)
    return g

g = f()     # f returns the function object g, assigned to g; g is bound to a (3) and c ([111]), forming a closure
f(10)()     # the inner function is bound to a (12) and c ([111]) to form a closure; output: a = 12, b = 2, c = [111]
f()         # no output: the closure over a/c is created but never called
g()         # output: a = 3, b = 2, c = [111]
b = 3
g()         # output: a = 3, b = 3, c = [111] (b is a global variable)
print a     # prints the global a: 1
[dongsong@localhost python_study]$ vpython enclosing_1.py
a = 12
b = 2
c = [111]
a = 3
b = 2
c = [111]
a = 3
b = 3
c = [111]
1
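An inner function reads enclosing variables when it runs, not when it is defined, which is why g sees a = 3. It can also mutate enclosed objects such as c, but Python 2 gives it no way to rebind an enclosing variable; Python 3 adds nonlocal for that. A sketch with hypothetical function names:

```python
def late_binding():
    a = 2
    def g():
        return a            # looked up in the enclosing scope when g runs
    a += 1                  # so this later rebinding is visible to g (same as f() above)
    return g

def make_counter(start=0):
    count = start
    def increment(step=1):
        nonlocal count      # Python 3: rebind the enclosing variable instead of creating a local
        count += step
        return count
    return increment

g = late_binding()
cell_value = g.__closure__[0].cell_contents   # the cell shared by late_binding and g
counter = make_counter(10)
values = [counter(), counter(5)]
```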
[dongsong@localhost tfengyun_py]$ vpython new_user.py debug 1852589841
/data/weibofengyun/workspace-php/tfengyun_py/utils.py:26: Warning: Incorrect string value: '\xF0\x9F\x92\x91\xE4\xBD...' for column 'description' at row 1
  try: affectCount = self.cursor.execute(sql)
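吓人的鸟(362278013) 11:27:58
I've more or less figured out yesterday's problem of MySQL raising a Warning on insert, so I'm sharing it here. Many thanks to @墨迹!!
http://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
MySQL before 5.5.3 does not support utf8mb4. Last Friday's insert warning came from some Unicode characters (emoji from iOS devices) that take four bytes when encoded as UTF-8, while MySQL's utf8 charset stores at most three bytes per character (enough for everything in the Basic Multilingual Plane):
>>> u'\u8bb0'.encode('utf-8')
'\xe8\xae\xb0'
>>> u'\U0001f497'.encode('utf-8')
'\xf0\x9f\x92\x97'
If you don't want to upgrade MySQL to solve this, you can filter those characters out; there is a related discussion on Stack Overflow: http://stackoverflow.com/questions/10798605/warning-raised-by-inserting-4-byte-unicode-to-mysql
So why can a PHP program insert the same data into the same MySQL database without trouble (no error, no warning, data not truncated)? It turns out PHP silently strips the four-byte UTF-8 sequences: after insertion, querying in the mysql command-line client shows the emoji are gone, and reading the data back with the PHP program confirms it.
PS: "To know what you know and to know what you do not know, that is knowledge." People who come asking are usually in a hurry, so I hope fellow developers will lecture less and offer more practical, useful suggestions.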
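A minimal Python 3 sketch of the "filter them out" option: remove (or replace) code points above U+FFFF, which are exactly the characters that need four bytes in UTF-8. NON_BMP and strip_non_bmp are hypothetical names; on a narrow Python 2 build this character class would not work as written.

```python
import re

# Code points above U+FFFF (outside the Basic Multilingual Plane) take 4 bytes in UTF-8,
# which MySQL's 3-byte "utf8" charset cannot store.
NON_BMP = re.compile('[\U00010000-\U0010FFFF]')

def strip_non_bmp(text, replacement=''):
    """Replace every 4-byte UTF-8 character (e.g. most emoji) with `replacement`."""
    return NON_BMP.sub(replacement, text)

cleaned = strip_non_bmp('\u8bb0\U0001f497')
marked = strip_non_bmp('a\U0001f497b', '?')
max_bytes = max(len(ch.encode('utf-8')) for ch in cleaned)
```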
Cython (http://cython.org/): based on Pyrex, designed for writing C extensions for Python
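Speaking of which, I have to mention pypy (http://pypy.org/), even though pypy is not meant for interfacing with C/C++. pypy is a Python interpreter written in Python (more precisely in RPython, a restricted subset of Python); its JIT (just-in-time compilation) usually makes it run faster than CPython (the official interpreter that most people use). It supports stackless and provides cooperative microthreads, so the outlook seems bright! There are reports that pypy will drop the GIL to improve multithreaded performance, but the official documentation does not seem to say so (http://pypy.org/tmdonate2.html#what-is-the-global-interpreter-lock).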
[dongsong@localhost python-study]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = file("code.py").read()
>>> print s
def func():
    print "i am in function func()"
    return 1,2,3

>>> codeObj = compile(s,"<string>","exec")
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'codeObj', 's']
>>> codeObj
<code object <module> at 0x7f761cd74738, file "<string>", line 1>
>>> eval(codeObj)
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'codeObj', 'func', 's']
>>> func()
i am in function func()
(1, 2, 3)
[dongsong@localhost python-study]$ python
Python 2.6.6 (r266:84292, Jan 22 2014, 09:42:36)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = file("code.py").read()
>>> exec(s)
>>> dir()
['__builtins__', '__doc__', '__name__', '__package__', 'func', 's']
>>> func()
i am in function func()
(1, 2, 3)
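The sessions above execute code straight into the interactive globals. A Python 3 sketch that compiles once and executes into a separate dict instead, so the loaded names do not leak into the caller's namespace (SOURCE mirrors code.py; load_namespace is a hypothetical helper). eval() on an exec-mode code object runs it and returns None.

```python
SOURCE = '''
def func():
    print("i am in function func()")
    return 1, 2, 3
'''

def load_namespace(src, filename='<string>'):
    """Compile src once and execute it in a fresh dict instead of the caller's globals."""
    code = compile(src, filename, 'exec')
    namespace = {}
    exec(code, namespace)
    return namespace

ns = load_namespace(SOURCE)
result = ns['func']()
eval_result = eval(compile(SOURCE, '<string>', 'exec'), {})   # exec-mode code: eval() returns None
```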
66. Random strings: http://stackoverflow.com/questions/2257441/random-string-generation-with-upper-case-letters-and-digits-in-python
>>> import string
>>> import random
>>> def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
...     return ''.join(random.choice(chars) for _ in range(size))
...
>>> id_generator()
'G5G74W'
>>> id_generator(3, "6793YUIO")
'Y3U'
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> string.digits
'0123456789'
>>> string.ascii_uppercase + string.digits
'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
>>> string.lowercase
'abcdefghijklmnopqrstuvwxyz'
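The same generator in Python 3, plus a variant for anything security-sensitive: the random module is not meant for secrets, while the secrets module (Python 3.6+) draws from the OS CSPRNG. string.lowercase no longer exists in Python 3; use string.ascii_lowercase. ALPHABET and secure_id are hypothetical names.

```python
import random
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits

def id_generator(size=6, chars=ALPHABET):
    """Pseudo-random ID from the random module: fine for test data, not for secrets."""
    return ''.join(random.choice(chars) for _ in range(size))

def secure_id(size=6, chars=ALPHABET):
    """Same alphabet, drawn from the OS CSPRNG via the secrets module."""
    return ''.join(secrets.choice(chars) for _ in range(size))

sample = id_generator()
token = secure_id(16)
restricted = id_generator(3, '6793YUIO')
```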
[dongsong@localhost python-study]$ cat hasattr.py
#encoding=utf-8
class A(object):
    def __init__(self):
        self.__a = 100
        self.a = 200
    def test(self):
        if hasattr(self,'__a'):
            print 'found self.__a:',self.__a
        else:
            print 'not found self.__a'
        if hasattr(self,'a'):
            print 'found self.a:', self.a
        else:
            print 'not found self.a'

if __name__ == '__main__':
    t = A()
    t.test()
[dongsong@localhost python-study]$ python hasattr.py
not found self.__a
found self.a: 200
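The reason hasattr(self, '__a') fails is name mangling: inside class A the identifier self.__a is rewritten to self._A__a, but strings passed to hasattr()/getattr() are not rewritten. A Python 3 sketch:

```python
class A(object):
    def __init__(self):
        self.__a = 100    # written inside class A, so the attribute is stored as _A__a
        self.a = 200

    def lookups(self):
        # Mangling rewrites identifiers in the class body, not strings passed to hasattr/getattr.
        return hasattr(self, '__a'), hasattr(self, '_A__a'), self.__a

obj = A()
checks = obj.lookups()
stored = sorted(vars(obj))
```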
68. Circular imports in Python: Circular (or cyclic) imports
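Put simply: if a imports b and b imports a, then code that runs at import time (a module's top-level code, i.e. what executes on "import a") and uses names from the other module (b.xx, from b import xx) can fail, because at that moment the other module may be only partially initialized.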
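Also, with python a.py, a.py is first loaded as the __main__ module, and "import a" then executes a again from the top (this is covered in the Python source-code analysis book, and it is the reason for the if __name__ == '__main__' check).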
[root@test-22 xds]# cat maintest.py
import maintest
print 'main test in ..'
if __name__ == '__main__':
    print 'aaaa'
print 'main test out..'
[root@test-22 xds]# python maintest.py
main test in ..
main test out..
main test in ..
aaaa
main test out..
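A Python 3 sketch that reproduces both effects with throwaway modules written to a temp directory (cyc_a, cyc_b and selfimport are hypothetical names). In the first demo cyc_b runs while cyc_a is only half built, so cyc_a.VALUE does not exist yet; in the second, a script that imports itself runs once as module selfimport and once as __main__, just like maintest.py.

```python
import importlib
import os
import subprocess
import sys
import tempfile
import textwrap

def write_modules(files):
    """Write {filename: source} into a new temp dir and return the dir."""
    tmp = tempfile.mkdtemp()
    for name, body in files.items():
        with open(os.path.join(tmp, name), 'w') as fh:
            fh.write(textwrap.dedent(body))
    return tmp

def demo_circular_import():
    tmp = write_modules({
        'cyc_a.py': '''
            import cyc_b                 # runs cyc_b now, while cyc_a is only partly initialized
            VALUE = 'a'
        ''',
        'cyc_b.py': '''
            import cyc_a                 # gets the half-built cyc_a from sys.modules
            SEEN = getattr(cyc_a, 'VALUE', None)   # VALUE is not defined yet at this point
        ''',
    })
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        a = importlib.import_module('cyc_a')
        return a.VALUE, a.cyc_b.SEEN
    finally:
        sys.path.remove(tmp)
        sys.modules.pop('cyc_a', None)
        sys.modules.pop('cyc_b', None)

def demo_main_reimport():
    """Run a script that imports itself, like maintest.py; return the printed __name__ values."""
    tmp = write_modules({'selfimport.py': 'import selfimport\nprint(__name__)\n'})
    out = subprocess.run([sys.executable, os.path.join(tmp, 'selfimport.py')],
                         stdout=subprocess.PIPE, universal_newlines=True, check=True).stdout
    return out.split()

circular = demo_circular_import()
names = demo_main_reimport()
```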