Blocking rogue crawlers such as MJ12bot, DotBot, BLEXBot, PetalBot, and DataForSeoBot

When you run into rogue crawlers like these, block them at two layers: in robots.txt and at the web server. Do not trust them to obey the robots protocol (don't be naive).

1. Block them in robots.txt

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /
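To sanity-check rules like the ones above before deploying them, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the `example.com` URL are assumptions made for illustration. Note that this parser compares only the product token before the first `/` of the agent string, so the sketch passes bare bot names rather than full User-Agent headers.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt above, inlined so the sketch is self-contained.
ROBOTS_TXT = """\
User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /
"""


def is_allowed(user_agent, url="https://example.com/any/page"):
    """Hypothetical helper: True if robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ["MJ12bot", "DotBot", "BLEXBot", "PetalBot", "DataForSeoBot", "Googlebot"]:
        print(bot, "allowed" if is_allowed(bot) else "disallowed")
```

This only confirms what a well-behaved crawler would conclude from the file. It does nothing against a crawler that ignores robots.txt, which is why the server-side rule is still needed.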

2. Block them with Nginx (or other web server) rules

if ($http_user_agent ~* (MJ12bot|DotBot|BLEXBot|PetalBot|DataForSeoBot)) {
    return 403;
}
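The `~*` operator performs a case-insensitive, unanchored regex match against the User-Agent header. A rough way to preview which requests the rule would reject is to mirror it in Python, as in the sketch below. The function name `status_for` and the sample User-Agent strings are assumptions for illustration, and `200` simply stands for "the request is passed through".

```python
import re

# Same alternation as the Nginx rule; IGNORECASE mirrors the `~*` operator.
BLOCKED_UA = re.compile(
    r"(MJ12bot|DotBot|BLEXBot|PetalBot|DataForSeoBot)", re.IGNORECASE
)


def status_for(user_agent: str) -> int:
    """Hypothetical helper: 403 if the rule would match, otherwise 200."""
    # search() matches anywhere in the string, like an unanchored Nginx regex.
    return 403 if BLOCKED_UA.search(user_agent) else 200


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)",
        "mozilla/5.0 (compatible; dotbot/1.2)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    ]
    for ua in samples:
        print(status_for(ua), ua)
```

Because the match is a plain substring test on a header the client controls, it only stops crawlers that announce themselves honestly.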

3. And finally, a combined list:

if ($http_user_agent ~* (YandexBot|spbot|DnyzBot|Researchscan|semrushbot|yahoo|AhrefsBot|DotBot|Uptimebot|MJ12bot|MegaIndex\.ru|ZoominfoBot|Mail\.Ru|SeznamBot|BLEXBot|ExtLinksBot|aiHitBot|Barkrowler)) {
    return 403;
}
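A long alternation like this is easy to get wrong by hand: the combined list above, for instance, leaves out PetalBot and DataForSeoBot from section 2. One possible approach is to keep the names in a plain list and generate the rule, as in the sketch below. The helper `build_rule` is hypothetical, and its output should be reviewed and checked with `nginx -t` before use.

```python
import re

# Names taken from the combined list, plus the two from section 2 that it omits.
BOT_NAMES = [
    "YandexBot", "spbot", "DnyzBot", "Researchscan", "semrushbot", "yahoo",
    "AhrefsBot", "DotBot", "Uptimebot", "MJ12bot", "MegaIndex.ru",
    "ZoominfoBot", "Mail.Ru", "SeznamBot", "BLEXBot", "ExtLinksBot",
    "aiHitBot", "Barkrowler", "PetalBot", "DataForSeoBot",
]


def build_rule(names):
    """Hypothetical helper: render an Nginx `if` block that returns 403."""
    unique = list(dict.fromkeys(names))  # drop exact duplicates, keep order
    # re.escape turns "." into "\." so it matches a literal dot only.
    pattern = "|".join(re.escape(name) for name in unique)
    return 'if ($http_user_agent ~* "(%s)") {\n    return 403;\n}' % pattern


if __name__ == "__main__":
    print(build_rule(BOT_NAMES))
```

Keeping the names in a list also makes it easier to review entries such as `yahoo` or `YandexBot`, which match real search engines and may not be ones every site wants to turn away.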

 
