Saving files to MongoDB with Python: a few details to watch for


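When a single file is smaller than 16 MB, it can be stored directly as BSON binary data inside an ordinary document (16 MB is MongoDB's size limit for a single BSON document).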

Libraries to import:

import pymongo
import bson.binary

Complete code:

def lead_in_Mongodb(filename_lst, id_lst, url_lst):
    # Connect to the local MongoDB server (default host and port)
    client = pymongo.MongoClient()
    db = client.Spyder
    collection = db['QuestMobile']
    for i in range(len(filename_lst)):
        # Each file is expected on disk as <id>.pdf
        with open(id_lst[i] + '.pdf', 'rb') as f:
            # Wrap the raw bytes in a BSON Binary and store them with the metadata
            collection.insert_one(dict(
                content=bson.binary.Binary(f.read()),
                filename=filename_lst[i],
                url_id=id_lst[i],
                url=url_lst[i]
            ))
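Since the choice between the two approaches depends on file size, it can help to check the size before deciding. The following is only a minimal sketch: the function name `choose_storage` and the one-megabyte safety margin are assumptions made for illustration, not part of the original code. The margin leaves room for the other fields (filename, url and so on) that share the same document.

```python
import os

# MongoDB's limit for a single BSON document: 16 MiB
MAX_BSON_SIZE = 16 * 1024 * 1024
# Hypothetical safety margin for the metadata fields stored next to the content
METADATA_MARGIN = 1024 * 1024


def choose_storage(path, limit=MAX_BSON_SIZE, margin=METADATA_MARGIN):
    """Return 'bson' if the file fits in one document, otherwise 'gridfs'."""
    size = os.path.getsize(path)
    return 'bson' if size <= limit - margin else 'gridfs'
```

A caller could use the result to decide, per file, whether to store it as a plain document or hand it to GridFS.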

When a file is larger than 16 MB, however, it has to be stored with GridFS:

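import pymongo
from gridfs import GridFS

Complete code:

def insertFile(filename_lst, id_lst, url_lst):
    client = pymongo.MongoClient()
    db = client.Spyder
    # GridFS splits each file into chunks stored under the "QuestMobile2" prefix
    fs = GridFS(db, collection="QuestMobile2")
    for i in range(len(id_lst)):
        with open(id_lst[i] + '.pdf', 'rb') as myimage:
            data = myimage.read()
            # Extra keyword arguments are saved as fields of the file document
            file_id = fs.put(data, filename=filename_lst[i], id=id_lst[i], url=url_lst[i])
            print(file_id)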
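Reading a file back out of GridFS is the natural counterpart to the upload above. The sketch below is an illustration, not part of the original post: `export_file` and its parameters are hypothetical names, and `fs` is assumed to be a `GridFS` instance created as in `insertFile`. It relies on `GridFS.find_one`, which returns a file object or `None`, and on the file object's `read()` method.

```python
def export_file(fs, filename, out_path):
    """Copy a GridFS file named `filename` to `out_path` on disk.

    `fs` is assumed to be a gridfs.GridFS instance.
    Returns the number of bytes written, or None if no such file exists.
    """
    grid_out = fs.find_one({'filename': filename})
    if grid_out is None:
        return None
    # Read the whole file into memory; fine for a sketch, costly for huge files
    data = grid_out.read()
    with open(out_path, 'wb') as out:
        out.write(data)
    return len(data)
```

With the database from the example above, this might be called as `export_file(GridFS(db, collection="QuestMobile2"), filename_lst[0], 'copy.pdf')`, where `'copy.pdf'` is an arbitrary output path.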
