To reproduce the problem, I installed Sentry in an offline test environment. See "Sentry Installation and Configuration": https://clevercode.blog.csdn.net/article/details/105880652 .
Analyzing the nginx access logs showed that the Sentry client was calling the server endpoint http://10.1.20.101:9100/api/4/store/ , and that a large number of those requests returned HTTP 403. At first glance the source suggested the 403 came from APIRateLimited being raised, but there are roughly ten places in the code that can raise that kind of API error, so the real question became: which line of code was actually producing this response?
That is when a traffic-capture tool came to mind. See "Traffic Recording and Replay Tool - GoReplay": https://blog.csdn.net/CleverCode/article/details/101423570 .
Capture the HTTP request and response packets on port 9100 of 10.1.20.101:
/Data/apps/gor/gor --input-raw :9100 --output-file /tmp/sentry.gor --input-raw-track-response --http-allow-url /store/
Analyzing the sentry.gor log showed that the 403 responses carried the body {"error":"Event dropped due to filter"}:
2 eb2958ceb5974ace946df51e345cf0edaa8476bf 1588233398959704818 30501033
HTTP/1.0 403 FORBIDDEN
Content-Length: 39
Expires: Thu, 30 Apr 2020 07:56:38 GMT
X-Content-Type-Options: nosniff
Content-Language: en
X-Sentry-Error: Event dropped due to filter
Vary: Accept-Language, Cookie
Last-Modified: Thu, 30 Apr 2020 07:56:38 GMT
X-XSS-Protection: 1; mode=block
Cache-Control: max-age=0
X-Frame-Options: deny
Content-Type: application/json
{"error":"Event dropped due to filter"}
Searching the source for "Event dropped due to filter" leads to /Data/apps/ops4env/lib/python2.7/site-packages/sentry/web/api.py, where raise APIForbidden('Event dropped due to filter') is found.
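The search itself is just a recursive scan of the installed package for the message string, roughly what grep -rn would do. A small sketch (the path matches the installation used in this article):

# find_filter_message.py -- locate which files contain a given error message,
# roughly what `grep -rn "Event dropped due to filter"` would do.
import os

ROOT = '/Data/apps/ops4env/lib/python2.7/site-packages/sentry'
NEEDLE = 'Event dropped due to filter'

for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        if not name.endswith('.py'):
            continue
        path = os.path.join(dirpath, name)
        with open(path) as f:
            for lineno, line in enumerate(f, 1):
                if NEEDLE in line:
                    print('%s:%d: %s' % (path, lineno, line.strip()))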
To confirm which view handles /api/<project_id>/store/, check the URL routing:
vi /Data/apps/ops4env/lib/python2.7/site-packages/sentry/web/urls.py
url(r'^api/(?P<project_id>[\w_-]+)/store/$', api.StoreView.as_view(),
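As a side note, the (?P<project_id>...) named group is what turns the "4" in /api/4/store/ into the project_id keyword argument that is passed down to the view. A quick standalone check of just that pattern, with plain re and outside of Django:

# The (?P<project_id>...) named group maps the "4" in /api/4/store/ to the
# project_id keyword argument that the view receives.
import re

STORE_ROUTE = re.compile(r'^api/(?P<project_id>[\w_-]+)/store/$')

match = STORE_ROUTE.match('api/4/store/')   # Django matches paths without the leading slash
print(match.group('project_id'))            # -> 4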
Look at the corresponding view:
vi /Data/apps/ops4env/lib/python2.7/site-packages/sentry/web/api.py
class StoreView(APIView):

    def post(self, request, **kwargs):
        try:
            data = request.body
        except Exception as e:
            logger.exception(e)
            # We were unable to read the body.
            # This would happen if a request were submitted
            # as a multipart form for example, where reading
            # body yields an Exception. There's also not a more
            # sane exception to catch here. This will ultimately
            # bubble up as an APIError.
            data = None

        response_or_event_id = self.process(request, data=data, **kwargs)
        if isinstance(response_or_event_id, HttpResponse):
            return response_or_event_id
        return HttpResponse(json.dumps({
            'id': response_or_event_id,
        }), content_type='application/json')

    def process(self, request, project, auth, helper, data, **kwargs):
        # .......
        if helper.should_filter(project, data, ip_address=remote_addr):
            app.tsdb.incr_multi([
                (app.tsdb.models.project_total_received, project.id),
                (app.tsdb.models.project_total_blacklisted, project.id),
                (app.tsdb.models.organization_total_received, project.organization_id),
                (app.tsdb.models.organization_total_blacklisted, project.organization_id),
            ])
            metrics.incr('events.blacklisted')
            event_filtered.send_robust(
                ip=remote_addr,
                project=project,
                sender=type(self),
            )
            raise APIForbidden('Event dropped due to filter')
        # .......
        return event_id
So raise APIForbidden('Event dropped due to filter') is reached because helper.should_filter(project, data, ip_address=remote_addr) returned True; in other words, the event matched one of the filter rules.
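This also explains what the capture showed: APIForbidden is one of Sentry's APIError subclasses, and the view layer converts such an exception into an HTTP response whose status code and X-Sentry-Error header carry the exception's message. The snippet below is only a simplified model of that mapping, not Sentry's actual code:

# Simplified model of how an APIError subclass becomes the 403 response seen
# in the capture above. Illustration only, not Sentry's actual code.
import json


class APIError(Exception):
    http_status = 400


class APIForbidden(APIError):
    http_status = 403


def to_response(error):
    """Map an APIError into the (status, headers, body) shape of the captured reply."""
    message = str(error) or 'Invalid request'
    headers = {'Content-Type': 'application/json', 'X-Sentry-Error': message}
    return error.http_status, headers, json.dumps({'error': message})


print(to_response(APIForbidden('Event dropped due to filter')))
# (403, {..., 'X-Sentry-Error': 'Event dropped due to filter'}, '{"error": "Event dropped due to filter"}')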
In the offline environment, patch should_filter in the Sentry source to log some debugging information:
vi /Data/apps/ops4env/lib/python2.7/site-packages/sentry/coreapi.py
def add_log(self, msg):
    """Append a debug line to /tmp/sentry.log (temporary debugging helper)."""
    f = open("/tmp/sentry.log", 'a+')
    f.write(msg)
    f.close()
def should_filter(self, project, data, ip_address=None):
    # TODO(dcramer): read filters from options such as:
    # - ignore errors from spiders/bots
    # - ignore errors from legacy browsers
    if ip_address and not is_valid_ip(ip_address, project):
        return True

    for filter_cls in filters.all():
        filter_obj = filter_cls(project)
        self.add_log("s1 should_filter:" + str(filter_obj) + "\n")
        if filter_obj.is_enabled():
            self.add_log("m1 is_enabled:" + str(filter_obj) + "\n")
        if filter_obj.is_enabled() and filter_obj.test(data):
            return True
    return False
The log shows there are four filter rules in total, but only WebCrawlersFilter is enabled:
s1 should_filter:
s1 should_filter:
m1 is_enabled:
s1 should_filter:
s1 should_filter:
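For reference, the same enabled/disabled question can also be answered interactively from a Django shell on the Sentry host (e.g. sentry django shell) instead of patching coreapi.py. This sketch assumes sentry.filters is the registry that coreapi.py iterates as filters.all(), and that project id 4 is the project behind /api/4/store/:

# Run inside a Django shell on the Sentry host (e.g. `sentry django shell`).
# Assumptions: `sentry.filters` is the registry coreapi.py iterates as
# filters.all(), and project id 4 is the project behind /api/4/store/.
from sentry import filters
from sentry.models import Project

project = Project.objects.get(id=4)
for filter_cls in filters.all():
    filter_obj = filter_cls(project)
    print('%s enabled=%s' % (filter_obj.id, filter_obj.is_enabled()))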
Looking at the WebCrawlersFilter source shows default = True, i.e. the web-crawler filter is enabled by default:
vim /Data/apps/ops4env/lib/python2.7/site-packages/sentry/filters/web_crawlers.py
from __future__ import absolute_import

import re

from .base import Filter

# not all of these agents are guaranteed to execute JavaScript, but to avoid
# overhead of identifying which ones do, and which ones will over time we simply
# target all of the major ones
CRAWLERS = re.compile(r'|'.join((
    # various Google services
    r'AdsBot',
    # Google Adsense
    r'Mediapartners',
    # Google+ and Google web search
    r'Google',
    # Bing search
    r'BingBot',
    # Baidu search
    r'Baiduspider',
    # Yahoo
    r'Slurp',
    # Sogou
    r'Sogou',
    # facebook
    r'facebook',
    # Alexa
    r'ia_archiver',
    # Generic bot
    r'bot[\/\s\)\;]',
    # Generic spider
    r'spider[\/\s\)\;]',
)), re.I)


class WebCrawlersFilter(Filter):
    id = 'web-crawlers'
    name = 'Filter out known web crawlers'
    description = 'Some crawlers may execute pages in incompatible ways which then cause errors that are unlikely to be seen by a normal user.'
    default = True

    def get_user_agent(self, data):
        try:
            for key, value in data['sentry.interfaces.Http']['headers']:
                if key.lower() == 'user-agent':
                    return value
        except LookupError:
            return ''

    def test(self, data):
        # TODO(dcramer): we could also look at UA parser and use the 'Spider'
        # device type
        user_agent = self.get_user_agent(data)
        if not user_agent:
            return False
        return bool(CRAWLERS.search(user_agent))
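To get a feel for which User-Agent values this pattern actually catches, the regex can be exercised on its own. The UA strings below are illustrative examples, not taken from the real traffic:

# Standalone check of the same CRAWLERS pattern against a few User-Agent
# strings. The UA values are illustrative, not taken from the real traffic.
import re

CRAWLERS = re.compile(r'|'.join((
    r'AdsBot', r'Mediapartners', r'Google', r'BingBot', r'Baiduspider',
    r'Slurp', r'Sogou', r'facebook', r'ia_archiver',
    r'bot[\/\s\)\;]', r'spider[\/\s\)\;]',
)), re.I)

SAMPLES = [
    'Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)',
    'MyMonitorBot/1.0 (+http://example.com/bot)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0 Safari/537.36',
    'curl/7.64.1',
]

for ua in SAMPLES:
    print('%-5s %s' % (bool(CRAWLERS.search(ua)), ua))
# True  ...Baiduspider/2.0...     (matches "Baiduspider")
# True  MyMonitorBot/1.0 ...      (matches bot[\/\s\)\;])
# False ...Chrome/80.0 Safari...  (normal browser)
# False curl/7.64.1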
Change default = False in the source to turn off the web-crawler filter, then restart Sentry:
# export SENTRY_CONF="/Data/apps/sentry"
# sentry run web
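After the restart, a rough way to verify the change end to end is to submit an event whose Http interface carries a crawler-like User-Agent and check that it is now accepted (HTTP 200 plus an event id) instead of rejected with 403. This is only a sketch: it assumes store protocol version 7, the requests library available in the Sentry virtualenv, and placeholder client keys for project 4; the exact auth format can differ between Sentry versions:

# Rough post-restart check: an event carrying a crawler-like User-Agent in its
# Http interface should now get HTTP 200 and an event id instead of 403.
# Assumptions: store protocol version 7, `requests` available, and
# <public_key>/<secret_key> are placeholders for project 4's client keys.
import json
import time

import requests

SENTRY_URL = 'http://10.1.20.101:9100/api/4/store/'
SENTRY_KEY = '<public_key>'      # placeholder
SENTRY_SECRET = '<secret_key>'   # placeholder

auth = ('Sentry sentry_version=7, sentry_client=filter-check/0.1, '
        'sentry_timestamp=%d, sentry_key=%s, sentry_secret=%s'
        % (int(time.time()), SENTRY_KEY, SENTRY_SECRET))

event = {
    'message': 'web crawler filter check',
    'sentry.interfaces.Http': {
        'url': 'http://example.com/',
        'method': 'GET',
        'headers': [['User-Agent', 'Baiduspider/2.0']],
    },
}

resp = requests.post(
    SENTRY_URL,
    data=json.dumps(event),
    headers={'X-Sentry-Auth': auth, 'Content-Type': 'application/json'},
)
print('%s %s' % (resp.status_code, resp.text))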