python & scrapy

  • python

1. staticmethod and classmethod

class MethodTest(object):
    var1 = "class var"

    def __init__(self, var2="object var"):
        self.var2 = var2

    @staticmethod
    def staticFun():
        print 'static method'

    @classmethod
    def classFun(cls):
        print 'class method'


Similarities:

1. Both can be called through the class or through an instance

mt = MethodTest()

MethodTest.staticFun()

mt.staticFun()

MethodTest.classFun()

mt.classFun()

2. Neither can access instance members

    @staticmethod
    def staticFun():
        print var2  # wrong: no way to reach the instance member var2

    @classmethod
    def classFun(cls):
        print var2  # wrong: cls is the class, not an instance

Differences:

1. staticmethod takes no implicit first parameter; classmethod receives the class itself (not an instance of the class) as its first argument

    def classFun(cls):
        print 'class method'  # cls receives the class itself

2. classmethod can access class members; staticmethod cannot

    @staticmethod
    def staticFun():
        print var1  # wrong: var1 is not in scope

    @classmethod
    def classFun(cls):
        print cls.var1  # right: class members are reachable through cls

For details see: http://blog.sina.com.cn/s/blog_45ac0d0a01017mfd.html
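The points above can be condensed into one runnable snippet (Python 3 syntax for the sketch; the class name MethodDemo is made up):

```python
class MethodDemo:
    var1 = "class var"

    @staticmethod
    def static_fun():
        # no implicit first argument: neither self nor cls is available
        return "static method"

    @classmethod
    def class_fun(cls):
        # cls is the class itself, so class attributes are reachable
        return cls.var1

# both kinds are callable on the class or on an instance
print(MethodDemo.static_fun())   # static method
print(MethodDemo().class_fun())  # class var
```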

2. Commonly used functions:

2.1 Decimal and round, for handling decimal places in numbers, etc.

Example:
from decimal import Decimal

quotedPrice = price.get("quoted_price", None)
if quotedPrice is not None:
    return int(round(Decimal(quotedPrice) / 100) * 100)  # round to the nearest 100
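A self-contained version of that rounding snippet (the dict contents are invented for illustration; note Python 3's round uses banker's rounding on exact halves):

```python
from decimal import Decimal

def nearest_hundred(price):
    # amounts arrive as strings; round to the nearest multiple of 100
    quotedPrice = price.get("quoted_price", None)
    if quotedPrice is not None:
        return int(round(Decimal(quotedPrice) / 100) * 100)
    return None

print(nearest_hundred({"quoted_price": "12345"}))  # 12300
print(nearest_hundred({}))                         # None
```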

2.2 db.connection.findAndModify() for MongoDB operations

2.3 Difference between xrange and range: in Python 2, range builds a full list while xrange produces values lazily, like a generator.
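Quick illustration: Python 3's range behaves like the old xrange, so the sketch below (Python 3 syntax) shows the lazy behavior without building a huge list:

```python
r = range(10**9)       # lazy: no billion-element list is built
print(r[999])          # indexing still works without materializing anything
print(list(range(5)))  # materialize explicitly only when a list is needed
```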


3. A universal header to fix character-encoding problems:

import sys

reload(sys)
sys.setdefaultencoding('utf-8')  # the reason for reload(sys): this API is hidden until sys is reloaded

if sys.stdout.encoding is None:
    import codecs

    writer = codecs.getwriter("utf-8")
    sys.stdout = writer(sys.stdout)
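The codecs.getwriter part can be seen in isolation: it wraps a byte stream so that text written to it is encoded on the way out (the sketch uses an in-memory BytesIO in place of sys.stdout):

```python
import codecs
import io

buf = io.BytesIO()                       # stands in for a byte-oriented stdout
writer = codecs.getwriter("utf-8")(buf)  # encodes text to UTF-8 on each write
writer.write(u"编码")
print(buf.getvalue())                    # the UTF-8 bytes of the text
```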

4. Working with CSV files in Python

1. Writing to and creating a CSV file
# coding: utf-8

import csv

csvfile = open('csv_test.csv', 'wb')
writer = csv.writer(csvfile)
writer.writerow(['姓名', '年龄', '电话'])

data = [('小河', '25', '1234567'),('小芳', '18', '789456')]
writer.writerows(data)

csvfile.close()

2. Reading a CSV file

csvfile = open('csv_test.csv', 'rb')
reader = csv.reader(csvfile)

for line in reader:
    print line

csvfile.close()
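Note the 'wb'/'rb' modes are Python 2 specific; under Python 3 the csv module wants text mode with newline=''. The same round trip, sketched against an in-memory buffer so it is self-contained:

```python
import csv
import io

buf = io.StringIO()  # stands in for a text-mode file opened with newline=''
writer = csv.writer(buf)
writer.writerow(['name', 'age', 'phone'])
writer.writerows([('Alice', '25', '1234567'), ('Bob', '18', '789456')])

buf.seek(0)
rows = list(csv.reader(buf))
print(rows[0])  # ['name', 'age', 'phone']
```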

3. Fixing garbled output (encoding issues)

import codecs

with codecs.open('/home/yjy/workspace/savedfile/1.csv', 'wb', 'cp936') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)
    writer.writerows(result_list)
    # no explicit close needed: the with block closes the file

5. JSON conversion in Python

5.1 JSON string -> Python object:
result = json.loads(item)
picInfo = result[3]

5.2 dict -> JSON string:
carinfo['salesperson'] = salesperson
carinfoJson = json.dumps(carinfo)  # carinfo is a dict
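A runnable round trip of both directions (the carinfo contents are invented for the sketch):

```python
import json

carinfo = {"brand": "demo", "price": 12300}  # hypothetical dict
carinfo['salesperson'] = "Alice"
carinfoJson = json.dumps(carinfo)            # dict -> JSON string
restored = json.loads(carinfoJson)           # JSON string -> dict
print(restored['salesperson'])               # Alice
```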


  • scrapy

1. Installing the Scrapy crawler:

  • Python 2.7

  • pip and setuptools Python packages. Nowadays pip requires and installs setuptools if not installed.

  • lxml. Most Linux distributions ship prepackaged versions of lxml. Otherwise refer to http://lxml.de/installation.html

sudo apt-get install libxml2-dev libxslt-dev python-dev
sudo pip install Scrapy
Upgrade an existing install: sudo pip install Scrapy --upgrade

2. Running the crawler:

Create a crawler project: scrapy startproject pro_name
Change the settings, then create a file under the spiders directory, e.g. dianzan.py, configured as follows:
from scrapy.spider import BaseSpider

class Dianzan(BaseSpider):
    name = "dianzan"
    MAX_PAGE = 10
    start_urls = [
    'http://beijing.baixing.com/'
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        open(filename, 'wb').write(response.body)

Start the crawler: scrapy crawl dianzan
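The filename logic in parse above just takes the second-to-last path segment of the URL; standalone:

```python
url = 'http://beijing.baixing.com/'
# split("/") -> ['http:', '', 'beijing.baixing.com', ''], so [-2] is the host
filename = url.split("/")[-2]
print(filename)  # beijing.baixing.com
```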

3. Installing spidermonkey:

1. sudo apt-get install pkg-config
2. sudo apt-get install python2.7-dev
3. sudo apt-get install libnspr4-dev
4. sudo easy_install python-spidermonkey

4. Scrapy logging:
4.1
In settings.py: LOG_LEVEL = "DEBUG"
At the top of the module:
import logging
logger = logging.getLogger()
LOG_FILENAME = "./log_test.txt"
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG)
To log a message:
logger.debug(string)

4.2
From the console: scrapy crawl ganji --logfile=view.log
The log file is created automatically.
To log a message: self.debug(string)

