The newest and most complete guide to the Python crawling framework Scrapy (systematic study, including full-site image crawling). Noticeably improve your scraping skills, with full source code, and work around common anti-crawling measures.

Personal WeChat official account: yk 坤帝
Reply "scrapy" in the account's backend to get the full source code.

1. Boss Zhipin job spider (bossPro)

2. Douban spider (db)

3. Distributed spider based on scrapy-redis (fbsPro)

4. Baidu spider (firstblood)

5. Campus-beauty full-site image spider (imgsPro)

6. Proxy IP configuration spider (middlePro)

7. Full-site movie spider (moviePro)

8. Qiushibaike video and image downloads (qiubaiPro)

9. Sunshine Hotline government-affairs platform: full-site policy spider (sunPro)

10. NetEase News full-site spider (wangyiPro)

11. xiaohuar.com full-site campus-beauty image crawl (xiaohuaPro)

12. Database example (pymysql)


1. Boss Zhipin job spider (bossPro)
boss.py

# -*- coding: utf-8 -*-
# Sophomore year
# Tuesday, February 23, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy
from bossPro.items import BossproItem


class BossSpider(scrapy.Spider):
    name = 'boss'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.zhipin.com/c101010100/?query=python&ka=sel-city-101010100']

    url = 'https://www.zhipin.com/job_detail/bd1e60815ee5d3741nV92Nu1GVZY.html?ka=search_list_jname_1_blank&lid=8iXPDpH593w.search.%d'
    page_num = 2

    def parse_detail(self,response):
        item = response.meta['item']

        job_desc = response.xpath('//*[@id="main"]/div[3]/div/div[2]/div[2]/div[1]/div//text()').extract()
        job_desc = ''.join(job_desc)
        item['job_desc'] = job_desc
        print(job_desc)

        yield item

    def parse(self, response):

        li_list = response.xpath('//*[@id="main"]/div/div[2]/ul/li')
        print(li_list)
        for li in li_list:
            item = BossproItem()

            job_name = li.xpath('.//span[@class="job-name"]/a/text()').extract_first()
            item['job_name'] = job_name
            print(job_name)
            detail_url = 'https://www.zhipin.com/' + li.xpath('.//span[@class="job-name"]/a/@href').extract_first()

            yield scrapy.Request(detail_url, callback=self.parse_detail, meta={'item': item})

        if self.page_num <= 3:
            new_url = self.url % self.page_num
            self.page_num += 1

            yield scrapy.Request(new_url, callback=self.parse)
        

items.py

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html

import scrapy


class BossproItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    job_name = scrapy.Field()
    job_desc = scrapy.Field()
    #pass

middlewares.py

# -*- coding: utf-8 -*-

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals


class BossproSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request, dict
        # or Item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)


class BossproDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

pipelines.py

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html


class BossproPipeline:
    def process_item(self, item, spider):
        print(item)
        return item
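
The pipeline shipped with the project only prints each item. As a minimal sketch of how the same hook could persist the scraped jobs (not part of the original source; the file name boss.txt is an assumption), a second pipeline class could be added to pipelines.py and enabled in ITEM_PIPELINES:

# Hypothetical file-persistence pipeline (not in the original project).
class BossproFilePipeline:
    fp = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the output file.
        self.fp = open('boss.txt', 'w', encoding='utf-8')

    def process_item(self, item, spider):
        # Write one job posting per block.
        self.fp.write(item['job_name'] + '\n' + item['job_desc'] + '\n\n')
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        self.fp.close()

If used, it would be registered alongside the existing pipeline, e.g. 'bossPro.pipelines.BossproFilePipeline': 301.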

settings.py

# -*- coding: utf-8 -*-

# Scrapy settings for bossPro project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'bossPro'

SPIDER_MODULES = ['bossPro.spiders']
NEWSPIDER_MODULE = 'bossPro.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'

# Obey robots.txt rules
ROBOTSTXT_OBEY = False
LOG_LEVEL = 'ERROR'

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'bossPro.middlewares.BossproSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'bossPro.middlewares.BossproDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
ITEM_PIPELINES = {
    'bossPro.pipelines.BossproPipeline': 300,
}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

2. Douban spider (db)
db.py

# -*- coding: utf-8 -*-
# Sophomore year
# March 4, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code


import scrapy


class DbSpider(scrapy.Spider):
    name = 'db'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.douban.com/group/topic/157797102/']

    def parse(self, response):
        li_list = response.xpath('//*[@id="comments"]/li')
        # print(li_list)
        for li in li_list:
            # print(li)
            #/html/body/div[3]/div[1]/div/div[1]/ul[2]/li[1]/div[2]/p
            comments = li.xpath('./div[2]/p/text()').extract_first()
            print(comments)
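
The spider above only prints each comment. Here is a hedged variant (not part of the original project) that yields plain dicts instead, so Scrapy's built-in feed export can save the results with `scrapy crawl db_save -o comments.json`:

# Hypothetical variant of db.py: yield items instead of printing them.
import scrapy


class DbSaveSpider(scrapy.Spider):
    name = 'db_save'  # assumed name, to avoid clashing with the original 'db' spider
    start_urls = ['https://www.douban.com/group/topic/157797102/']

    def parse(self, response):
        for li in response.xpath('//*[@id="comments"]/li'):
            comment = li.xpath('./div[2]/p/text()').extract_first()
            if comment:
                # Yielded dicts are picked up by the feed exporter (-o option).
                yield {'comment': comment.strip()}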

        

3. Distributed spider based on scrapy-redis (fbsPro)

# -*- coding: utf-8 -*-
# Sophomore year
# Wednesday, February 24, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

from scrapy_redis.spiders import RedisCrawlSpider
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from fbsPro.items import FbsproItem


class FbsSpider(RedisCrawlSpider):
    name = 'fbs'
    #allowed_domains = ['www.xxx.com']
    #start_urls = ['http://www.xxx.com/']
    redis_key = 'sun'

    rules = (
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = FbsproItem()
        #item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
        #item['name'] = response.xpath('//div[@id="name"]').get()
        #item['description'] = response.xpath('//div[@id="description"]').get()

        yield item
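
The RedisCrawlSpider only works together with the scrapy-redis scheduler and dupefilter, which the post does not reproduce. Below is a sketch of the settings.py entries such a project typically needs (the Redis host and port are assumptions):

# Typical scrapy-redis configuration for fbsPro/settings.py (sketch).
# Requests are deduplicated and scheduled through Redis, and scraped items
# are pushed into Redis by the bundled RedisPipeline.
DUPEFILTER_CLASS = 'scrapy_redis.dupefilter.RFPDupeFilter'
SCHEDULER = 'scrapy_redis.scheduler.Scheduler'
SCHEDULER_PERSIST = True          # keep the request queue between runs

ITEM_PIPELINES = {
    'scrapy_redis.pipelines.RedisPipeline': 400,
}

REDIS_HOST = '127.0.0.1'          # assumed local Redis instance
REDIS_PORT = 6379

With this in place, the crawl is started by pushing a seed URL onto the key the spider listens on, e.g. `lpush sun http://wz.sun0769.com/political/index/politicsNewest?id=1&page=1` in redis-cli.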

4. Baidu spider (firstblood)
first.py

# -*- coding: utf-8 -*-
# Sophomore year
# Friday, February 19, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy


class FirstSpider(scrapy.Spider):
    name = 'first'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.baidu.com/','https://www.sogou.com']


    def parse(self, response):
        print(response)

5. Campus-beauty full-site image spider (imgsPro)
img.py

# -*- coding: utf-8 -*-
# Sophomore year
# Tuesday, February 23, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy
from imgsPro.items import ImgsproItem


class ImgSpider(scrapy.Spider):
    name = 'img'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://sc.chinaz.com/tupian/']

    def parse(self, response):
        div_list = response.xpath('//div[@id="container"]/div')
        for div in div_list:
            # The site lazy-loads images, so the real URL sits in the custom
            # "src2" attribute rather than "src".
            src = 'https:' + div.xpath('./div/a/img/@src2').extract_first()
            print(src)

            item = ImgsproItem()
            item['src'] = src

            yield item
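
The spider only yields items carrying the image URL; the download itself is normally handled by a pipeline built on Scrapy's ImagesPipeline, which the post does not show. A sketch of what that pipeline could look like (the class name and storage path are assumptions; IMAGES_STORE must be set in settings.py, Pillow installed, and the class enabled in ITEM_PIPELINES):

# Hypothetical pipelines.py for imgsPro, built on the stock ImagesPipeline.
# Enable it via ITEM_PIPELINES and set IMAGES_STORE = './imgs' in settings.py.
import scrapy
from scrapy.pipelines.images import ImagesPipeline


class ImgsproDownloadPipeline(ImagesPipeline):

    def get_media_requests(self, item, info):
        # Issue one download request per image URL stored on the item.
        yield scrapy.Request(item['src'])

    def file_path(self, request, response=None, info=None, *, item=None):
        # Save each image under its original file name.
        return request.url.split('/')[-1]

    def item_completed(self, results, item, info):
        # Hand the item on to any later pipeline.
        return item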
        

6. Proxy IP configuration spider (middlePro)
middle.py

# -*- coding: utf-8 -*-
# Sophomore year
# February 23, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy


class MiddleSpider(scrapy.Spider):
    name = 'middle'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.baidu.com/s?wd=ip']

    def parse(self, response):
        page_text = response.text 

        with open('ip.html','w',encoding= 'utf-8') as fp:
            fp.write(page_text)
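
middle.py only saves the Baidu result page so the effective outgoing IP can be checked; the proxy itself is applied in a downloader middleware, which is not reproduced here. A sketch of such a middleware follows (the proxy addresses are placeholders, and the class must be enabled in DOWNLOADER_MIDDLEWARES):

# Hypothetical middlewares.py entry for middlePro: rotate proxies per request.
import random


class ProxyDownloaderMiddleware:
    # Placeholder proxy pools; replace with working proxies.
    PROXY_HTTP = ['http://111.111.111.111:8888']
    PROXY_HTTPS = ['https://112.112.112.112:8888']

    def process_request(self, request, spider):
        # Attach a random proxy that matches the request scheme.
        if request.url.startswith('https'):
            request.meta['proxy'] = random.choice(self.PROXY_HTTPS)
        else:
            request.meta['proxy'] = random.choice(self.PROXY_HTTP)
        return None

    def process_exception(self, request, exception, spider):
        # If a proxy fails, pick another one and re-schedule the request.
        request.meta['proxy'] = random.choice(self.PROXY_HTTP + self.PROXY_HTTPS)
        return request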
        

7. Full-site movie spider (moviePro)

movie.py

# -*- coding: utf-8 -*-
# Sophomore year
# Wednesday, February 24, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from moviePro.items import MovieproItem
from redis import Redis


class MovieSpider(CrawlSpider):
    name = 'movie'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.4567kan.com/frim/index1.html']

    rules = (
        Rule(LinkExtractor(allow=r'/frim/index1-\d+\.html'), callback='parse_item', follow=True),
    )
    conn = Redis(host = '127.0.0.1', port = 6379)

    def parse_item(self, response):
        li_list = response.xpath('/html/body/div[1]/div/div/div/div[2]/ul/li')
        for li in li_list:

            detail_url = 'http://www.4567kan.com/' + li.xpath('./div/a/@href').extract_first()

            # sadd returns 1 only if the URL is new to the set, which makes the
            # crawl incremental: already-seen detail pages are skipped.
            ex = self.conn.sadd('urls', detail_url)

            if ex == 1:
                print('This URL has not been crawled yet; fetching its data.')
                yield scrapy.Request(url=detail_url, callback=self.parse_detail)

            else:
                print('No new data yet; nothing new to crawl.')

    def parse_detail(self, response):

        item = MovieproItem()
        item['name'] = response.xpath('/html/body/div[1]/div/div/div/div[2]/h1/text()').extract_first()
        item['desc'] = response.xpath('/html/body/div[1]/div/div/div/div[2]/p[5]/span[2]//text()').extract()
        item['desc'] = ''.join(item['desc'])
        yield item
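
The items produced by parse_detail need somewhere to go; the project's pipeline is not included in the post, so here is a hedged sketch that stores each movie record in the same Redis instance used for URL de-duplication (the key name movieData is an assumption):

# Hypothetical pipelines.py for moviePro: push scraped movies into Redis.
from redis import Redis


class MovieproPipeline:
    conn = None

    def open_spider(self, spider):
        # Reuse the spider's Redis connection if present, otherwise open one.
        self.conn = getattr(spider, 'conn', None) or Redis(host='127.0.0.1', port=6379)

    def process_item(self, item, spider):
        # Store each record as a string on a Redis list.
        self.conn.lpush('movieData', str(dict(item)))
        return item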


8. Qiushibaike video and image downloads (qiubaiPro)
qiubai.py

# -*- coding: utf-8 -*-
# Sophomore year
# Friday, February 19, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy
from qiubaiPro.items import QiubaiproItem

class QiubaiSpider(scrapy.Spider):
    name = 'qiubai'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://www.qiushibaike.com/text/']
    

#     def parse(self, response):
#         div_list = response.xpath('//div[@id="content"]/div[1]/div[2]/div')
#         all_data = []
#         for div in div_list:
#             auther = div.xpath('./div[1]/a[2]/h2/text()')[0].extract()
#             #auther = div.xpath('./div[1]/a[2]/h2/text()').extract_first()
#             content = div.xpath('./a[1]/div/span//text()').extract()
#             content = ','.join(content)
#             print(auther,content)
#             dic = {
#                 'auther': auther,
#                 'content': content
#             }
#             all_data.append(dic)

#         return all_data

# Sophomore year
# Sunday, February 21, 2021
# Winter break ends March 7
    def parse(self, response):
        div_list = response.xpath('//div[@id="content"]/div[1]/div[2]/div')
        
        #all_data = []
        for div in div_list:
            auther = div.xpath('./div[1]/a[2]/h2/text()')[0].extract()
            # Anonymous users keep the author name under a different node:
            #auther = div.xpath('./div[1]/a[2]/h2/text() | ./div[1]/span/h2/text()')[0].extract()
            #auther = div.xpath('./div[1]/a[2]/h2/text()').extract_first()
            content = div.xpath('./a[1]/div/span//text()').extract()
            content = ','.join(content)

            item = QiubaiproItem()
            item['auther'] = auther
            item['content'] = content

            yield item
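
The project's items.py is not reproduced in the post; judging from the fields the spider assigns, it would look roughly like this (the spelling auther is kept as used in the spider code):

# Inferred items.py for qiubaiPro (not shown in the original post).
import scrapy


class QiubaiproItem(scrapy.Item):
    auther = scrapy.Field()   # author name, spelling kept from the spider
    content = scrapy.Field()  # joke text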
    

9. Sunshine Hotline government-affairs platform: full-site policy spider (sunPro)
sun.py

# -*- coding: utf-8 -*-
# Sophomore year
# Wednesday, February 24, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SunSpider(CrawlSpider):
    name = 'sun'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['http://wz.sun0769.com/political/index/politicsNewest?id=1&page=1']

    link = LinkExtractor(allow=r'id=1&page=\d+')
    rules = (
        Rule(link, callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        print(response)
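
As written, the CrawlSpider only follows pagination links and prints each response. Below is a hedged sketch of how a second rule could be added to also visit detail pages; the detail-page URL pattern and the XPath are assumptions about the site, not taken from the original project:

# Hypothetical extension of sun.py with a rule for detail pages.
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class SunDetailSpider(CrawlSpider):
    name = 'sun_detail'  # assumed name, to avoid clashing with 'sun'
    start_urls = ['http://wz.sun0769.com/political/index/politicsNewest?id=1&page=1']

    rules = (
        # Follow pagination, as in the original spider.
        Rule(LinkExtractor(allow=r'id=1&page=\d+'), callback='parse_item', follow=True),
        # Assumed pattern for the per-complaint detail pages.
        Rule(LinkExtractor(allow=r'politics/index\?id=\d+'), callback='parse_detail', follow=False),
    )

    def parse_item(self, response):
        print(response)

    def parse_detail(self, response):
        # Placeholder extraction; the XPath must be adapted to the real page.
        title = response.xpath('//h4/text()').extract_first()
        yield {'title': title}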

10. NetEase News full-site spider (wangyiPro)
wangyi.py

# -*- coding: utf-8 -*-
# Sophomore year
# Tuesday, February 23, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy

from wangyiPro.items import WangyiproItem
from selenium import webdriver


class WangyiSpider(scrapy.Spider):
    name = 'wangyi'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['https://news.163.com/']

    models_urls = []

    def __init__(self):
        self.bro = webdriver.Chrome()

    def parse(self, response):
        li_list = response.xpath('//*[@id="js_festival_wrap"]/div[3]/div[2]/div[2]/div[2]/div/ul/li')
        # Indexes of the five news modules this project scrapes.
        alist = [3, 4, 6, 7, 8]
        for index in alist:
            model_url = li_list[index].xpath('./a/@href').extract_first()
            self.models_urls.append(model_url)

        for url in self.models_urls:
            yield scrapy.Request(url,callback= self.parse_model)

    def parse_model(self,response):
        
        div_list = response.xpath('/html/body/div[1]/div[3]/div[4]/div[1]/div/div/ul/li/div/div')

        for div in div_list:
            title = div.xpath('./div/div[1]/h3/a/text()').extract_first()
            new_detail_url = div.xpath('./div/div[1]/h3/a/@href').extract_first()

            item = WangyiproItem()
            item['title'] = title

            # Pass the item through meta so parse_detail can add the article body.
            yield scrapy.Request(url=new_detail_url, callback=self.parse_detail, meta={'item': item})

    def parse_detail(self,response):
        content = response.xpath('//*[@id="content"]//text()').extract()
        content = ''.join(content)

        item = response.meta['item']

        item['content'] = content

        yield item
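
The five module pages are rendered with JavaScript, which is why the spider opens a Selenium browser in __init__; the downloader middleware that swaps in the rendered page source is not reproduced in the post. Here is a sketch of what it presumably looks like (it has to be enabled in DOWNLOADER_MIDDLEWARES; the two-second sleep is a crude assumption):

# Hypothetical middlewares.py entry for wangyiPro: replace the responses of the
# dynamically rendered module pages with Selenium's page source.
from time import sleep

from scrapy.http import HtmlResponse


class WangyiproDownloaderMiddleware:

    def process_response(self, request, response, spider):
        if request.url in spider.models_urls:
            bro = spider.bro          # the Chrome driver created in the spider
            bro.get(request.url)
            sleep(2)                  # wait for the page to render
            page_text = bro.page_source
            # Hand Scrapy a new response built from the rendered HTML.
            return HtmlResponse(url=request.url, body=page_text,
                                encoding='utf-8', request=request)
        return response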

11. xiaohuar.com full-site campus-beauty image crawl (xiaohuaPro)
xiaohua.py

# -*- coding: utf-8 -*-
# Sophomore year
# Monday, February 22, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import scrapy


class XiaohuaSpider(scrapy.Spider):
    name = 'xiaohua'
    #allowed_domains = ['www.xxx.com']
    start_urls = ['http://www.xiaohuar.com/daxue/']
    url = 'http://www.xiaohuar.com/daxue/index_%d.html'
    page_num = 2

    def parse(self, response):
        li_list = response.xpath('//*[@id="wrap"]/div/div/div')
        for li in li_list:
            img_name = li.xpath('./div/div[1]/a/text()').extract_first()
            print(img_name)
            #li.xpath('./div/a/img/@src')

        # Crawl the remaining list pages by formatting the page number
        # into the URL template.
        if self.page_num <= 11:
            new_url = self.url % self.page_num
            self.page_num += 1

            yield scrapy.Request(new_url, callback=self.parse)

middlewares.py

# -*- coding: utf-8 -*-

# Define here the models for your spider middleware
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/spider-middleware.html

from scrapy import signals


class XiaohuaproSpiderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the spider middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_spider_input(self, response, spider):
        # Called for each response that goes through the spider
        # middleware and into the spider.

        # Should return None or raise an exception.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the results returned from the Spider, after
        # it has processed the response.

        # Must return an iterable of Request, dict or Item objects.
        for i in result:
            yield i

    def process_spider_exception(self, response, exception, spider):
        # Called when a spider or process_spider_input() method
        # (from other spider middleware) raises an exception.

        # Should return either None or an iterable of Request, dict
        # or Item objects.
        pass

    def process_start_requests(self, start_requests, spider):
        # Called with the start requests of the spider, and works
        # similarly to the process_spider_output() method, except
        # that it doesn’t have a response associated.

        # Must return only requests (not items).
        for r in start_requests:
            yield r

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)


class XiaohuaproDownloaderMiddleware:
    # Not all methods need to be defined. If a method is not defined,
    # scrapy acts as if the downloader middleware does not modify the
    # passed objects.

    @classmethod
    def from_crawler(cls, crawler):
        # This method is used by Scrapy to create your spiders.
        s = cls()
        crawler.signals.connect(s.spider_opened, signal=signals.spider_opened)
        return s

    def process_request(self, request, spider):
        # Called for each request that goes through the downloader
        # middleware.

        # Must either:
        # - return None: continue processing this request
        # - or return a Response object
        # - or return a Request object
        # - or raise IgnoreRequest: process_exception() methods of
        #   installed downloader middleware will be called
        return None

    def process_response(self, request, response, spider):
        # Called with the response returned from the downloader.

        # Must either;
        # - return a Response object
        # - return a Request object
        # - or raise IgnoreRequest
        return response

    def process_exception(self, request, exception, spider):
        # Called when a download handler or a process_request()
        # (from other downloader middleware) raises an exception.

        # Must either:
        # - return None: continue processing this exception
        # - return a Response object: stops process_exception() chain
        # - return a Request object: stops process_exception() chain
        pass

    def spider_opened(self, spider):
        spider.logger.info('Spider opened: %s' % spider.name)

settings.py

# -*- coding: utf-8 -*-

# Scrapy settings for xiaohuaPro project
#
# For simplicity, this file contains only settings considered important or
# commonly used. You can find more settings consulting the documentation:
#
#     https://docs.scrapy.org/en/latest/topics/settings.html
#     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

BOT_NAME = 'xiaohuaPro'

SPIDER_MODULES = ['xiaohuaPro.spiders']
NEWSPIDER_MODULE = 'xiaohuaPro.spiders'


# Crawl responsibly by identifying yourself (and your website) on the user-agent
USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36'

# Obey robots.txt rules
LOG_LEVEL = 'ERROR'
ROBOTSTXT_OBEY = False

# Configure maximum concurrent requests performed by Scrapy (default: 16)
#CONCURRENT_REQUESTS = 32

# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
#DOWNLOAD_DELAY = 3
# The download delay setting will honor only one of:
#CONCURRENT_REQUESTS_PER_DOMAIN = 16
#CONCURRENT_REQUESTS_PER_IP = 16

# Disable cookies (enabled by default)
#COOKIES_ENABLED = False

# Disable Telnet Console (enabled by default)
#TELNETCONSOLE_ENABLED = False

# Override the default request headers:
#DEFAULT_REQUEST_HEADERS = {
#   'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
#   'Accept-Language': 'en',
#}

# Enable or disable spider middlewares
# See https://docs.scrapy.org/en/latest/topics/spider-middleware.html
#SPIDER_MIDDLEWARES = {
#    'xiaohuaPro.middlewares.XiaohuaproSpiderMiddleware': 543,
#}

# Enable or disable downloader middlewares
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
#DOWNLOADER_MIDDLEWARES = {
#    'xiaohuaPro.middlewares.XiaohuaproDownloaderMiddleware': 543,
#}

# Enable or disable extensions
# See https://docs.scrapy.org/en/latest/topics/extensions.html
#EXTENSIONS = {
#    'scrapy.extensions.telnet.TelnetConsole': None,
#}

# Configure item pipelines
# See https://docs.scrapy.org/en/latest/topics/item-pipeline.html
#ITEM_PIPELINES = {
#    'xiaohuaPro.pipelines.XiaohuaproPipeline': 300,
#}

# Enable and configure the AutoThrottle extension (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/autothrottle.html
#AUTOTHROTTLE_ENABLED = True
# The initial download delay
#AUTOTHROTTLE_START_DELAY = 5
# The maximum download delay to be set in case of high latencies
#AUTOTHROTTLE_MAX_DELAY = 60
# The average number of requests Scrapy should be sending in parallel to
# each remote server
#AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0
# Enable showing throttling stats for every response received:
#AUTOTHROTTLE_DEBUG = False

# Enable and configure HTTP caching (disabled by default)
# See https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#httpcache-middleware-settings
#HTTPCACHE_ENABLED = True
#HTTPCACHE_EXPIRATION_SECS = 0
#HTTPCACHE_DIR = 'httpcache'
#HTTPCACHE_IGNORE_HTTP_CODES = []
#HTTPCACHE_STORAGE = 'scrapy.extensions.httpcache.FilesystemCacheStorage'

12. Database example (pymysql)

# Sophomore year
# Sunday, February 21, 2021
# Winter break ends March 7
# Personal WeChat official account: yk 坤帝
# Reply "scrapy" to get the full source code

import pymysql

# Connect to the database
#   host:     IP of the machine running the MySQL server
#   user:     user name
#   password: password
#   database: name of the database to connect to
# db = pymysql.connect(host='localhost', user='root', password='200829', database='wj')
db = pymysql.connect(host='192.168.31.19', user='root', password='200829', database='wj')

# Create a cursor object
cursor = db.cursor()

sql = "select version()"

# Execute the SQL statement
cursor.execute(sql)

# Fetch one row of the result
data = cursor.fetchone()
print(data)

# Close the cursor and the connection
cursor.close()
db.close()
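
The snippet above only reads the server version. As a hedged follow-up, writing a row works the same way but needs an explicit commit (the table name and columns below are made up for illustration):

# Hypothetical INSERT example; assumes a table jobs(job_name, job_desc) exists.
import pymysql

db = pymysql.connect(host='192.168.31.19', user='root',
                     password='200829', database='wj')
cursor = db.cursor()

try:
    # Parameterized queries avoid manual string formatting and SQL injection.
    sql = "INSERT INTO jobs (job_name, job_desc) VALUES (%s, %s)"
    cursor.execute(sql, ('python developer', 'example description'))
    db.commit()        # persist the change
except Exception as e:
    db.rollback()      # undo the transaction on error
    print(e)
finally:
    cursor.close()
    db.close()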

Personal WeChat official account: yk 坤帝
Reply "scrapy" in the account's backend to get the full source code.
