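This example is a Python script I actually used at work to scrape product details from a shopping site. I am recording it here simply to leave a footprint: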
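1. Visit and analyze the site
a. Requirements: extract the price, color, size, the image for each color, name, inventory, and currency unit. The hardest part is working out how these fields relate to one another; here everything is keyed by color.
b. From experience, a site's product details in JSON format usually either sit in the HTML document itself or arrive through a separate Ajax request and are stored in JS. Failing that, the site has a single price, color, and size per product (one color, one price, one size).
c. Analysis shows that this site hides its JSON in an Ajax request, for example: http://www.xxx.com/ajax/productDetails.jsp?productCode=23813&xxx=xxx..; testing and simplifying the request shows that only one parameter (prodCode) really matters.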
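As a rough illustration of that simplification, the request URL can be rebuilt with just the one parameter. This is a minimal Python 3 sketch: the helper name `build_details_url` is hypothetical, and the path is the one the script in this post requests, where the parameter is spelled `prodCode`.

```python
from urllib.parse import urlencode

# Path taken from the script in this post; the helper itself is hypothetical.
DETAILS_PATH = "/browse2/ajax/product_details_ajax.jsp"


def build_details_url(domain, prod_code):
    """Return the Ajax URL for one product, keeping only the prodCode parameter."""
    return domain + DETAILS_PATH + "?" + urlencode({"prodCode": prod_code})


print(build_details_url("https://factory.jcrew.com", "09256"))
```

Using `urlencode` rather than string formatting keeps the query string valid if a product code ever contains characters that need escaping.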
2. Analyze the content returned by the Ajax request:
<script>
  window.getJcrewNameSpaceLegacy('globalObj.jcrew.browse.fullscreen')
  globalObj.jcrew.browse.fullscreen.products = globalObj.jcrew.browse.fullscreen.products || [{
    images: []
  }]
  globalObj.jcrew.browse.fullscreen.products[0].productName = 'Factory heathered sweatshirt sweater'
</script>
<section id="description" class="description">
  <span class="item-num">item 09256</span>
  <div id="BVRRSummaryContainer"></div>
  <div id="variants">
    <div class="variant-wrapper">
      <div class="float-left">
        <input type="radio" name="variants" data-variant="09256" data-varianturl="https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp" data-index="" value="845524441838050" class="product-details-variants" checked />
      </div>
      <div class="product-pricing">
        <span class="notranslate"> Regular</span>
        <br />
        <div class="product-pricing-wrapper">
          <span class="text-was">valued at</span>
          <span class=" full-price price-soldout notranslate"> $59.50</span>
          <span class="selected-color">
            your price
            <span class="selected-color-price notranslate">$29.50</span>
          </span>
        </div>
      </div>
      <div class="clear"></div>
    </div>
    <div class="variant-wrapper">
      <div class="float-left">
        <input type="radio" name="variants" data-variant="B7151" data-varianturl="https://factory.jcrew.com/mens_special_sizes/tall/sweaters/PRD~B7151/B7151.jsp" data-index="" value="845524441838050" class="product-details-variants" />
      </div>
      <div class="product-pricing">
        <span class="notranslate"> Tall</span>
        <br />
        <div class="product-pricing-wrapper">
          <span class="text-was">valued at</span>
          <span class=" price-soldout notranslate"> $64.50</span>
          <span class="selected-color">
            your price
            <span class="selected-color-price notranslate">$32.00</span>
          </span>
        </div>
      </div>
      <div class="clear"></div>
    </div>
  </div>
</section>
<div class="color-title">Color:
  <span class="color-name">
    hthr indigo
  </span>
</div>
<div id="priceWrapper0" class="price-wrapper">
  <div class="product-detail-price sale-price first-item notranslate">
    $29.50
  </div>
  <section id="color1" class="color-row last-row">
    <div class="color-box " data-color="MF3369" data-productcode="09256" data-index="">
      <a id="MF3369">
        <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
      </a>
    </div>
    <script>
      globalObj.jcrew.browse.fullscreen.products[0].images.push({
        type: 'color',
        identifier: 'MF3369',
        url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$',
        thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_tn75$'
      })
    </script>
    <div class="color-box " data-color="GY7314" data-productcode="09256" data-index="">
      <a id="GY7314">
        <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
      </a>
    </div>
    <script>
      globalObj.jcrew.browse.fullscreen.products[0].images.push({
        type: 'color',
        identifier: 'GY7314',
        url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$',
        thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_tn75$'
      })
    </script>
    <div class="color-box selected" data-color="BL8362" data-productcode="09256" data-index="">
      <a id="BL8362">
        <img data-imgurl="https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_fs418$" src="https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362_sw?$pdp_sw20$" class="product-detail-images" data-productcode="09256" data-index="" />
      </a>
    </div>
    <script>
      globalObj.jcrew.browse.fullscreen.products[0].images.push({
        type: 'color',
        identifier: 'BL8362',
        url: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$',
        thumbUrl: 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_tn75$'
      })
    </script>
    <div class="clear"></div>
  </section>
  <hr class="last-row">
</div>
<section id="sizes" class="sizes">
  <header>
    <h2>Size:</h2>
    <span><a class="product-details-sizechart" data-sizechart="0,0" href="javascript:void(0);">size charts</a></span>
    <div class="clear"></div>
  </header>
  <div class="size-box notranslate" data-size="X-SMALL" data-productcode="09256" data-index="">
    <a id="X-SMALL">
      <span>X-SMALL</span>
    </a>
  </div>
  <div class="size-box notranslate" data-size="SMALL" data-productcode="09256" data-index="">
    <a id="SMALL">
      <span>SMALL</span>
    </a>
  </div>
  <div class="size-box notranslate" data-size="MEDIUM" data-productcode="09256" data-index="">
    <a id="MEDIUM">
      <span>MEDIUM</span>
    </a>
  </div>
  <div class="size-box notranslate" data-size="LARGE" data-productcode="09256" data-index="">
    <a id="LARGE">
      <span>LARGE</span>
    </a>
  </div>
  <div class="size-box notranslate" data-size="X-LARGE" data-productcode="09256" data-index="">
    <a id="X-LARGE">
      <span>X-LARGE</span>
    </a>
  </div>
  <div class="size-box notranslate" data-size="XX-LARGE" data-productcode="09256" data-index="">
    <a id="XX-LARGE">
      <span>XX-LARGE</span>
    </a>
  </div>
</section>
<div class="clear"></div>
<hr>
<section id="quantity" class="quantity">
  <h2 class="quantity-header">Quantity:</h2>
  <select id="selectBox" data-index="" class="select-box">
    <option value="1">1</option>
    <option value="2">2</option>
    <option value="3">3</option>
    <option value="4">4</option>
    <option value="5">5</option>
    <option value="6">6</option>
    <option value="7">7</option>
    <option value="8">8</option>
    <option value="9">9</option>
  </select>
  <div class="clear"></div>
</section>
<section id="messaging" class="messaging">
  <!-- Not showing backordered and final sale message if sku is out of stock -->
</section>
<section id="actions" class="actions">
</section>
<script>
  var productDetailsJSON = '{"sizeset":[{"colors":[{"skuLongId":1689949373559852,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373559917,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":4},{"skuLongId":1689949373559858,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373253180,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373253179,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":6,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1}],"size":"X-SMALL"},{"colors":[{"skuLongId":1689949373559850,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246447,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373559857,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373246443,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559916,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1}],"size":"SMALL"},{"colors":[{"skuLongId":1689949373559919,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373559915,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559849,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246441,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373246446,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1}],"size":"MEDIUM"},{"colors":[{"skuLongId":1689949373559854,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559918,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4},{"skuLongId":1689949373246440,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559848,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373246445,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1}],"size":"LARGE"},{"colors":[{"skuLongId":1689949373246448,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373246444,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559851,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373559855,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559920,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4}],"size":"X-LARGE"},{"colors":[{"skuLongId":1689949373559853,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr cove","backordered":false,"preordered":false,"colorlabel":"BL8032","skuInventoryStatus":4},{"skuLongId":1689949373253823,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"colorlabel":"GY7314","skuInventoryStatus":1},{"skuLongId":1689949373253822,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"colorlabel":"BL8362","skuInventoryStatus":1},{"skuLongId":1689949373559856,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":5,"outofstock":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"colorlabel":"MF3369","skuInventoryStatus":1},{"skuLongId":1689949373559921,"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":0,"outofstock":true,"colordisplayname":"hthr prospect green","backordered":false,"preordered":false,"colorlabel":"MF3372","skuInventoryStatus":4}],"size":"XX-LARGE"}],"colorset":[{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":6,"outofstock":false,"sizelabel":"X-SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false}],"color":"BL8362","fullydomqty":false,"colordisplayname":"hthr indigo","backordered":false,"preordered":false,"finalsale":false},{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-SMALL","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false}],"color":"GY7314","fullydomqty":false,"colordisplayname":"hthr ebony","backordered":false,"preordered":false,"finalsale":false},{"sizes":[{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"X-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":5,"outofstock":false,"sizelabel":"XX-LARGE","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"MEDIUM","backordered":false,"preordered":false},{"podate":"","onlyFewLeft":false,"fullydomqty":false,"inventory":9,"outofstock":false,"sizelabel":"SMALL","backordered":false,"preordered":false}],"color":"MF3369","fullydomqty":false,"colordisplayname":"hthr cabernet","backordered":false,"preordered":false,"finalsale":false}]}';
  var imgSelectedColor = 'https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_fs418$';
  var wishlistSize = '0';
  var wishlistIsDefaultList = 'false';
  var editFlagAjax = 'false';
  var editWishListFlagAjax = 'false';
  var isSaleProduct = 'false';
  var editDomRtlItem = 'false';
  if(editDomRtlItem === 'true') {
    $('#monogram' + '').hide();
  }
</script>
The productDetailsJSON variable in the last <script> tag is exactly what we need; once formatted, it gives the relationships among color, size, and inventory.
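A minimal sketch of that extraction step, in Python 3 with only the standard library. The `sample` string is a shortened, hypothetical stand-in for the real response, and the function name `color_size_inventory` is an assumption made for illustration; the regular expression and the `colorset` fields are the ones that appear in the response above.

```python
import json
import re

# Shortened, hypothetical stand-in for the Ajax response; the real one carries more fields.
sample = (
    "<script> var productDetailsJSON = '"
    '{"colorset":[{"color":"BL8362","colordisplayname":"hthr indigo",'
    '"sizes":[{"sizelabel":"X-SMALL","inventory":6},'
    '{"sizelabel":"SMALL","inventory":9}]}]}'
    "'; </script>"
)


def color_size_inventory(response_text):
    """Map each color code to its display name and per-size inventory."""
    match = re.search(r"productDetailsJSON = '(.*?)';", response_text, re.DOTALL)
    if match is None:
        raise ValueError("productDetailsJSON not found in response")
    details = json.loads(match.group(1))
    return {
        color["color"]: {
            "colorName": color["colordisplayname"],
            "sizes": {s["sizelabel"]: s["inventory"] for s in color["sizes"]},
        }
        for color in details["colorset"]
    }


result = color_size_inventory(sample)
print(result)
```

Checking the match before calling `json.loads` gives a readable error if the site changes its markup, instead of an `AttributeError` on `None`.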
Here is the source code:
# -*- coding: utf-8 -*-
__author__ = ''

import json
import re
import sys

import requests as req
from pyquery import PyQuery

import spiderBase

req.packages.urllib3.disable_warnings()  # suppress the HTTPS warnings


class drag(spiderBase.spiderBase):

    def __init__(self, url):
        self.headers = {
            'Referer': 'https://www.jcrew.com/',
            'Connection': 'Keep-Alive',
            'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
            'Accept': 'text/html, application/xhtml+xml, */*',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.132 Safari/537.36'
        }
        self.session = req.session()
        back_data = self.session.get(url, headers=self.headers, verify=False)
        self.data = PyQuery(back_data.text)
        self.status_code = back_data.status_code
        self.retutn_url = back_data.history  # attribute name kept as in the original
        url_list = url.split('/')
        self.url_protocol = url_list[0]
        self.domain = url_list[0] + '//' + url_list[2]
        self.url = url

    def list(self):
        data = self.data
        self.urlList = []
        page_num = data(".product-image-wrap").items()
        for page_t in page_num:
            name = page_t.parent("div").parent("div div.plus_detail_wrap a span:eq(0)").text()
            price = page_t.parent("div").parent("div div.plus_detail_wrap a span:eq(1)").text()
            self.urlList.append(dict(url=page_t.attr("href"), img=page_t("img").attr("src"), name=name, price=price))
        return json.dumps(self.urlList)

    def page(self, gid=0, channel_id=0):
        pyhtml = self.data
        # Everything below was modified by cc
        # print pyhtml('script').text()
        self.descr = pyhtml('div#prodDtlBody li').text()
        # Match the product code
        prodCode = re.search(r'data-productcode="(\d*?)"', str(pyhtml), re.DOTALL).groups()[0]
        # Currency unit: show '$' for USD, otherwise keep the currency code (looked up once)
        currency = re.search(r'current_currency=\'(.*?)\'', pyhtml('script').text(), re.DOTALL).groups()[0]
        self.unit = '$' if currency == 'USD' else currency
        # Request the product details
        prodDetailsTxt = self.session.get(self.domain + '/browse2/ajax/product_details_ajax.jsp?prodCode=%s' % prodCode, headers=self.headers, verify=False).text
        self.name = re.search(r'productName = \'(.*?)\'', prodDetailsTxt, re.DOTALL).groups()[0]
        self.brand = 'J.CREW'
        # Fit variants (e.g. Regular / Tall): [{'old_price': 'xxx', 'price': 'xxx', 'name': 'xxx'}, {'old_price': 'xxx', 'price': 'xxx', 'name': 'xxx'}]
        variants = [
            {
                'name': PyQuery(variant)('span.notranslate').eq(0).text(),
                'price': PyQuery(variant)('span.selected-color-price').text()[1:],
                # Fall back to the current price when no "valued at" price is shown
                'old_price': PyQuery(variant)('span.price-soldout').text()[1:]
                if PyQuery(variant)('span.price-soldout').text() != ''
                else PyQuery(variant)('span.selected-color-price').text()[1:],
            }
            for variant in PyQuery(prodDetailsTxt)('#variants').children('div')
        ]
        # Color ID -> image (the site shows only one image per color by default)
        coloridImg = dict([
            (PyQuery(box)('a').attr('id'), PyQuery(box)('img').attr('data-imgurl').replace('$pdp_fs418$', '$pdp_enlarge$'))
            for box in PyQuery(prodDetailsTxt)('div.color-box')
        ])
        # Match the color/inventory relationship
        productDetailsJSON = json.loads(re.search(r'productDetailsJSON = \'(.*?)\';', prodDetailsTxt, re.DOTALL).groups()[0])
        map_colorSizeInventory = {}
        for color in productDetailsJSON['colorset']:
            map_colorSizeInventory[color['color']] = {
                'colorName': color['colordisplayname'].replace(' ', '-'),
                'sizes': [{'size': sizeInv['sizelabel'].lower(), 'inventory': sizeInv['inventory']} for sizeInv in color['sizes']],
            }
        # Map color IDs to color names
        imgs_tmp = dict([(map_colorSizeInventory[colorid]['colorName'], [img]) for colorid, img in coloridImg.items()])
        size_tmp = dict([(cs['colorName'], cs['sizes']) for cs in map_colorSizeInventory.values()])
        colors_tmp = imgs_tmp.keys()
        # Map prices
        self.price = dict([(colorName + '_' + variant['name'], variant['price']) for variant in variants for colorName in colors_tmp])
        self.old_price = dict([(colorName + '_' + variant['name'], variant['old_price']) for variant in variants for colorName in colors_tmp])
        self.colors = [colorName + '_' + variant['name'] for variant in variants for colorName in colors_tmp]

        # self.price = {}
        # self.old_price = {}
        # self.colors = []
        # for variant in variants:
        #     for colorName in colors_tmp:
        #         self.price[colorName + '_'+ variant['name']] = variant['price']
        #         self.old_price[colorName + '_'+ variant['name']] = variant['old_price']
        #         self.colors.append(colorName + '_'+ variant['name'])
        # Add the variant to the keys of size and imgs
        self.imgs = dict([(colorName + '_' + variantName, img) for variantName in [variant['name'] for variant in variants] for colorName, img in imgs_tmp.items()])
        self.size = dict([(colorName + '_' + variantName, sizes) for variantName in [variant['name'] for variant in variants] for colorName, sizes in size_tmp.items()])
        # self.imgs = {}
        # self.size = {}
        # for variantName in [ variant['name'] for variant in variants] :
        #     for colorName,img in imgs_tmp.items():
        #         self.imgs[colorName+'_'+variantName] = img
        #     for colorName,sizes in size_tmp.items():
        #         self.size[colorName+'_'+variantName] = sizes
        self.channel_id = channel_id
        self.designer = ''
        self.img = ''
        self.returns = ''
        print(variants)
        print(self.imgs)
        print(self.size)
        print(self.colors)
        print(self.unit)
        return self.returnData()  # implemented in the parent class; can be commented out


if __name__ == '__main__':
    if len(sys.argv) == 3:
        action = sys.argv[1]
        url = sys.argv[2]
    else:
        # Test defaults: the second pair overrides the first, so the "page" action runs
        url = 'https://factory.jcrew.com/mens_clothing/wear_to_work.jsp'
        action = "list"
        url = 'https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo'
        action = "page"
    obj = drag(url)
    # Dispatch by method name; getattr replaces the original eval("obj.%s()" % action)
    print(getattr(obj, action)())
Leaving out the comments, fewer than 100 lines or so are enough to parse a whole site. Python's list comprehensions really are powerful. ^_^
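To make the point about comprehensions concrete, here is a small Python 3 sketch of the same pattern the script uses for its price mapping: one entry per (variant, color) pair. The input lists are trimmed, illustrative stand-ins using values from the sample page, not the script's real intermediate data.

```python
# Trimmed, illustrative inputs; the script builds these from the Ajax response.
variants = [
    {"name": "Regular", "price": "29.50"},
    {"name": "Tall", "price": "32.00"},
]
color_names = ["hthr-indigo", "hthr-ebony"]

# Comprehension form: one entry per (variant, color) pair.
price = {
    color + "_" + variant["name"]: variant["price"]
    for variant in variants
    for color in color_names
}

# Equivalent explicit loops, for comparison.
price_loop = {}
for variant in variants:
    for color in color_names:
        price_loop[color + "_" + variant["name"]] = variant["price"]

print(price == price_loop)
print(price["hthr-indigo_Tall"])
```

The `for` clauses in a comprehension nest in the same order as the explicit loops, which is why the two forms produce the same mapping.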
After the parent class serializes the result to JSON, the returned string, once pretty-printed, looks like this (six products in total):
[
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-cabernet_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "x-large"}, {"inventory": 5, "size": "xx-large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-ebony_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 9, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"29.50",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$",
        "old_price":"59.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-indigo_Regular",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 6, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-cabernet_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_MF3369?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "x-large"}, {"inventory": 5, "size": "xx-large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-ebony_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_GY7314?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 9, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    },
    {
        "designer":"",
        "name":"Factory heathered sweatshirt sweater",
        "descr":"Cotton. Crafted in two-tone yarns for a heathered effect. Machine wash. Import.",
        "price":"32.00",
        "img":"https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$",
        "old_price":"64.50",
        "returns":"",
        "channel_id":0,
        "colors":"hthr-indigo_Tall",
        "link":"https://factory.jcrew.com/mens-clothing/sweaters/cotton/PRD~09256/09256.jsp?color_name=hthr-indigo",
        "imgs":[
            "https://i.s-jcrewfactory.com/is/image/jcrew/09256_BL8362?$pdp_enlarge$"
        ],
        "brand":"J.CREW",
        "unit":"$",
        "size":"[{"inventory": 9, "size": "large"}, {"inventory": 9, "size": "medium"}, {"inventory": 9, "size": "small"}, {"inventory": 9, "size": "x-large"}, {"inventory": 6, "size": "x-small"}, {"inventory": 9, "size": "xx-large"}]"
    }
]
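The `size` value in each record above is wrapped in quotes, which suggests the parent class stores it as a JSON-encoded string rather than as a nested list. That is an assumption, since the parent class is not shown here. Under that assumption, a consumer would decode the record and then decode `size` a second time. A minimal Python 3 sketch with a hypothetical two-size record:

```python
import json

# Hypothetical record shaped like one output entry, with size stored as a JSON string.
sizes = [{"inventory": 9, "size": "large"}, {"inventory": 6, "size": "x-small"}]
record_text = json.dumps({"colors": "hthr-indigo_Regular", "size": json.dumps(sizes)})

record = json.loads(record_text)             # first pass: the record itself
record["size"] = json.loads(record["size"])  # second pass: the embedded size list

# List the sizes that still have stock.
in_stock = [s["size"] for s in record["size"] if s["inventory"] > 0]
print(in_stock)
```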
2015-12-16 14:07:59