Backend anti-crawling with nginx + Lua + Redis (unfinished)

1. Anti-crawling by checking the User-Agent in nginx
Go to the conf directory under the nginx installation directory and save the following code as agent_deny.conf:
cd /usr/local/nginx/conf
vim agent_deny.conf


# Block scraping tools such as Scrapy (case-insensitive match)
if ($http_user_agent ~* (Scrapy|Curl|HttpClient)) {
    return 403;
}
# Block the listed User-Agents and requests with an empty User-Agent
if ($http_user_agent ~ "FeedDemon|JikeSpider|Indy Library|Alexa Toolbar|AskTbFXTV|AhrefsBot|CrawlDaddy|CoolpadWebkit|Java|Feedly|UniversalFeedParser|ApacheBench|Microsoft URL Control|Swiftbot|ZmEu|oBot|jaunty|Python-urllib|lightDeckReports Bot|YYSpider|DigExt|YisouSpider|HttpClient|MJ12bot|heritrix|EasouSpider|Ezooms|^$") {
    return 403;
}
# Block request methods other than GET, HEAD, and POST
if ($request_method !~ ^(GET|HEAD|POST)$) {
    return 403;
}


Then, in the site's configuration, insert the following line right after  location / {  :
include agent_deny.conf;
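For orientation, here is a minimal sketch of where the include ends up inside a server block. The listen port, server_name, root, and index values are placeholders chosen for illustration and do not come from the original setup; only the location / block and the include line are from the text above.

```nginx
# Sketch only: listen, server_name, root and index are hypothetical placeholders.
server {
    listen 80;
    server_name example.com;

    location / {
        # A relative include path is resolved against the directory that holds
        # the main configuration file (/usr/local/nginx/conf in this setup).
        include agent_deny.conf;

        root html;
        index index.html index.htm;
    }
}
```

Because the rules are pulled in per location, other locations in the same server are unaffected unless they include the file as well.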

After saving, run the following command to gracefully reload nginx:
/usr/local/nginx/sbin/nginx -s reload
from: http://support.huawei.com/ecommunity/bbs/10231865.html

2. Anti-crawling with Lua + nginx + Redis
2.1. Install the nginx, Lua, and Redis libraries

2.1.1. Download the libraries directly (I downloaded them to /usr/local/)
git clone https://github.com/simpl/ngx_devel_kit.git
git clone https://github.com/chaoslawful/lua-nginx-module.git
git clone https://github.com/agentzh/redis2-nginx-module.git
git clone https://github.com/agentzh/set-misc-nginx-module.git
git clone https://github.com/agentzh/echo-nginx-module.git
yum -y install pcre pcre-devel
from: http://www.tuicool.com/articles/6NbEbeV

2.1.2. Install the LuaJIT library
wget http://luajit.org/download/LuaJIT-2.0.0-beta9.tar.gz
tar zxvf LuaJIT-2.0.0-beta9.tar.gz
cd LuaJIT-2.0.0-beta9
make PREFIX=/usr/local/luajit
sudo make install PREFIX=/usr/local/luajit
sudo ln -sf /usr/local/luajit/bin/luajit-2.0.0-beta9 /usr/local/bin/luajit

# Tell nginx's build system where to find LuaJIT:
export LUAJIT_LIB=/usr/local/luajit/lib
export LUAJIT_INC=/usr/local/luajit/include/luajit-2.0

from: http://www.tuicool.com/articles/6NbEbeV

2.1.3. Recompile nginx
./configure \
--prefix=/usr/local/nginx \
--with-debug \
--with-ld-opt="-Wl,-rpath,$LUAJIT_LIB" \
--add-module=/usr/local/ngx_devel_kit \
--add-module=/usr/local/echo-nginx-module/ \
--add-module=/usr/local/lua-nginx-module/ \
--add-module=/usr/local/set-misc-nginx-module/ \
--add-module=/usr/local/redis2-nginx-module

make -j2

make install

2.2. Install lua-redis-parser
git clone https://github.com/agentzh/lua-redis-parser.git
cd lua-redis-parser
export LUA_INCLUDE_DIR=/usr/include/lua5.1
make CC=gcc
make install CC=gcc
from: http://www.ttlsa.com/nginx/nginx-lua-redis/
Compilation may fail with an error saying lua.h cannot be found.
Fix:
The Lua development headers are missing. On CentOS, install them with yum install lua-devel
from: https://www.zhihu.com/question/20170104

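Then edit nginx/conf/nginx.conf:
1) In the http block, add lua_package_path "/usr/local/lua-resty-redis/lib/?.lua;;";
This lets require "resty.redis" find the lua-resty-redis library (it assumes the library is checked out at /usr/local/lua-resty-redis).

2) In the server block, add lua_code_cache off;
Effect: Lua files loaded through the *_by_lua_file directives are re-read on every request, so edits to those scripts take effect without restarting nginx. Lua code written inline in nginx.conf still needs nginx -s reload. Disabling the cache slows request handling noticeably, so use it only during development.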
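Since this write-up stops before the Redis logic itself, here is a hedged sketch of how the two directives could fit together with a simple per-IP request counter. It is not the author's implementation. The Redis address 127.0.0.1:6379, the 60-second window, the limit of 120 requests, and the key prefix req_count: are all assumptions made for illustration. It also assumes a lua-nginx-module version that supports access_by_lua_block and a lua-resty-redis checkout at the path given above.

```nginx
# Sketch only: a fragment of nginx.conf (the events block and other settings are omitted).
# Redis address, window length, request limit and key prefix are illustrative assumptions.
http {
    lua_package_path "/usr/local/lua-resty-redis/lib/?.lua;;";

    server {
        listen 80;

        # Development only: disables the Lua code cache.
        lua_code_cache off;

        location / {
            access_by_lua_block {
                local redis = require "resty.redis"
                local red = redis:new()
                red:set_timeout(1000)  -- milliseconds

                local ok, err = red:connect("127.0.0.1", 6379)
                if not ok then
                    -- Fail open: if Redis is unreachable, let the request through.
                    ngx.log(ngx.ERR, "redis connect failed: ", err)
                    return
                end

                -- One counter per client IP; the key expires 60 seconds after the first hit.
                local key = "req_count:" .. ngx.var.remote_addr
                local count, incr_err = red:incr(key)
                if not count then
                    ngx.log(ngx.ERR, "redis incr failed: ", incr_err)
                    return
                end
                if count == 1 then
                    red:expire(key, 60)
                end

                -- Hand the connection back for reuse (10 s idle timeout, pool size 100).
                red:set_keepalive(10000, 100)

                if count > 120 then
                    return ngx.exit(ngx.HTTP_FORBIDDEN)
                end
            }
        }
    }
}
```

The sketch fails open on Redis errors so that a cache outage does not take the site down. A stricter setup might prefer to reject requests instead. The INCR followed by EXPIRE pair is not atomic, which is acceptable for a rough counter but worth knowing about.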

2.3. Install luarocks
Install it directly with yum: yum -y install luarocks

To be continued...
