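Reference: Offline Usage
1. About data downloads
At run time, Rally needs to download data from the public internet:
- Benchmark scenario (track) configuration files from GitHub. Even with network access this download is basically unusable, so fetch them manually.
- Compressed sample-data archives from AWS S3. Because the AWS S3 link can no longer be downloaded with curl, these also have to be fetched manually.
Downloading data from GitHub fails
Standard output shows: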
[WARNING] No Internet connection detected. Automatic download of track data sets etc. is disabled.
Download manually instead:
git clone git@...
or
git clone https://...
or
download the zip archive and extract it into the target directory
Prefer the git protocol for the download; https requires authentication.
Downloading sample data from AWS S3 fails
Take the geopoint sample data as an example.
Standard output shows:
[ERROR] Cannot race. Error in track preparator (('Cannot find /home/apps/.rally/benchmarks/data/geopoint/documents.json.bz2\. Please disable offline mode and retry again.', None))
With DEBUG logging turned on, the following errors appear in the log:
2019-07-09 10:29:02,78 -not-actor-/PID:1800 esrally.racecontrol ERROR A benchmark failure has occurred
2019-07-09 10:29:02,79 -not-actor-/PID:1800 esrally.racecontrol INFO Telling benchmark actor to exit.
2019-07-09 10:29:00,659 ActorAddr-(T|:38646)/PID:2134 esrally.track.loader INFO Downloading data from [http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint/documents.json.bz2] (482 MB) to [/home/apps/.rally/benchmarks/data/geopoint/documents.json.bz2].
2019-07-09 10:29:02,64 ActorAddr-(T|:38646)/PID:2134 esrally.actor ERROR Error in track preparator
Traceback (most recent call last):
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/actor.py", line 84, in guard
return f(self, msg, sender)
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/driver/driver.py", line 307, in receiveMsg_PrepareTrack
track.prepare_track(msg.track, cfg)
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/track/loader.py", line 286, in prepare_track
prep.prepare_document_set(document_set, data_root[0])
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/track/loader.py", line 424, in prepare_document_set
self.download(document_set.base_url, target_path, expected_size, msg)
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/track/loader.py", line 345, in download
net.download(data_url, target_path, size_in_bytes, progress_indicator=progress)
File "/apps/svr/python-3.5.2/lib/python3.5/site-packages/esrally/utils/net.py", line 156, in download
(local_path, download_size, expected_size_in_bytes))
esrally.exceptions.DataError: ('Download of [/home/apps/.rally/benchmarks/data/geopoint/documents.json.bz2] is corrupt. Downloaded [2548] bytes but [505295401] bytes are expected. Please retry.', None)
This means that downloading the sample data file documents.json.bz2 with curl failed:
# 482M
http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint/documents.json.bz2
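Because this link redirects, curl ends up with a 2548-byte HTML page instead of the archive. In that case, download the file with a browser and then upload it to the data directory with the rz command: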
~/.rally/benchmarks/data/geopoint/
Another file, documents-2.json.bz2, hits the same problem and is handled the same way:
# 252M
http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint/documents-2.json.bz2
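Since a failed download leaves a small HTML page where the archive should be, a quick check of the file header can catch it before Rally does. The following is a minimal sketch; is_bzip2 is a hypothetical helper name and not part of Rally. It relies only on the fact that every bzip2 stream starts with the three bytes "BZh".

```shell
#!/usr/bin/env bash
# Sketch: tell a real bzip2 archive apart from an HTML redirect page.
# is_bzip2 is a hypothetical helper name, not a Rally command.
is_bzip2() {
    # A bzip2 stream always begins with the magic bytes "BZh".
    [ "$(head -c 3 "$1" 2>/dev/null)" = "BZh" ]
}

# Example call (the path is the data directory used in this article):
# is_bzip2 ~/.rally/benchmarks/data/geopoint/documents.json.bz2 || echo "not a bzip2 file"
```

The check only looks at the header, so it does not replace a full size or integrity check of the archive.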
2. Manually downloading resources
2.1. Download the benchmark scenario configuration (tracks)
mkdir -p ~/.rally/benchmarks
cd ~/.rally/benchmarks
sudo update-ca-trust
git clone https://github.com/elastic/rally-tracks.git
or
git clone git@github.com:elastic/rally-tracks.git (requires a public key to be set up)
List the ready-made benchmark scenarios provided by the tracks project:
esrally list tracks
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
[WARNING] No Internet connection detected. Automatic download of track data sets etc. is disabled.
Available tracks:
Name Description Documents Compressed Size Uncompressed Size Default Challenge All Challenges
------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------- ----------------- ------------------- ----------------------- ---------------------------------------------------------------------------------------------------------------------------
eventdata This benchmark indexes HTTP access logs generated based sample logs from the elastic.co website using the generator available in https://github.com/elastic/rally-eventdata-track 20,000,000 755.1 MB 15.3 GB append-no-conflicts append-no-conflicts
geonames POIs from Geonames 11,396,505 252.4 MB 3.3 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts
geopoint Point coordinates from PlanetOSM 60,844,404 481.9 MB 2.3 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geopointshape Point coordinates from PlanetOSM indexed as geoshapes 60,844,404 470.5 MB 2.6 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-fast-with-conflicts
geoshape Shapes from PlanetOSM 60,523,283 13.4 GB 45.4 GB append-no-conflicts append-no-conflicts
http_logs HTTP server log data 247,249,096 1.2 GB 31.1 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-index-only-with-ingest-pipeline,update
metricbeat Metricbeat data 1,079,600 87.6 MB 1.2 GB append-no-conflicts append-no-conflicts
nested StackOverflow Q&A stored as nested docs 11,203,029 663.1 MB 3.4 GB nested-search-challenge nested-search-challenge,index-only
noaa Global daily weather measurements from NOAA 33,659,481 947.3 MB 9.0 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only
nyc_taxis Taxi rides in New York in 2015 165,346,692 4.5 GB 74.3 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts-index-only,update,append-ml
percolator Percolator benchmark based on AOL queries 2,000,000 102.7 kB 104.9 MB append-no-conflicts append-no-conflicts
pmc Full text benchmark with academic papers from PMC 574,199 5.5 GB 21.7 GB append-no-conflicts append-no-conflicts,append-no-conflicts-index-only,append-sorted-no-conflicts,append-fast-with-conflicts
so Indexing benchmark using up to questions and answers from StackOverflow 36,062,278 8.9 GB 33.1 GB append-no-conflicts append-no-conflicts
-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------
When the list command runs, Rally automatically performs a copy:
cd ~/.rally/benchmarks; mkdir tracks; cp -r rally-tracks tracks/default
so the rally-tracks directory can now be deleted:
rm -rf rally-tracks
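Be sure to pick one track from the rally-tracks repository and work out what each of its files does; that makes the whole benchmark flow clear.
As an aside, a benchmark boils down to the following steps:
- Specify or create the target ES cluster
- Create the indices and mappings
- Import the sample data
- Run read and write operations
- Report the benchmark results
With this sequence in mind, a track's configuration is easy to follow.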
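2.2. Manually download the sample data (data)
The list command shows that the sample data size differs from one benchmark scenario to another, so download only what you need. geopoint is used as the example here.
# Add python3 and git 1.9 to PATH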
export PATH=/apps/svr/python-3.5.2/bin:$PATH
export PATH=/apps/svr/git/bin:/apps/svr/git/libexec/git-core:$PATH
# List all default tracks
esrally list tracks
# Get a track's base-url, using geopoint as the example
grep base-url ~/.rally/benchmarks/tracks/default/geopoint/track.json
"base-url": "http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint",
# Get the file names
cat ~/.rally/benchmarks/tracks/default/geopoint/files.txt
documents.json.bz2
documents-1k.json.bz2
# Compose the download URLs: base-url + file name
http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint/documents.json.bz2
http://benchmarks.elasticsearch.org.s3.amazonaws.com/corpora/geopoint/documents-1k.json.bz2
# After downloading with a browser, upload the files to the data directory with rz
cd ~/.rally/benchmarks/data/geopoint/
rz
du -sh *.bz2
253M documents-2.json.bz2
482M documents.json.bz2
# Verify: check that the files match the information in track.json
vim ~/.rally/benchmarks/tracks/default/geopoint/track.json
"documents": [
{
"source-file": "documents.json.bz2",
"document-count": 60844404,
"compressed-bytes": 505295401,
"uncompressed-bytes": 2448564579
}
cd ~/.rally/benchmarks/data/geopoint
bzip2 -dk documents.json.bz2
wc -l documents.json
60844404 documents.json
du -b documents.json.bz2 documents.json
505295401 documents.json.bz2
2448564579 documents.json
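The manual comparison above can also be scripted. The following is only a sketch under stated assumptions: expected_bytes and verify_corpus are hypothetical helper names, and the parsing assumes track.json is laid out as in the excerpt above, with "compressed-bytes" appearing within three lines after the matching "source-file" entry.

```shell
#!/usr/bin/env bash
# Sketch: compare a downloaded archive's size with the value in track.json.
# expected_bytes and verify_corpus are hypothetical helper names.

# Print the "compressed-bytes" value recorded for one source file.
# Assumes the field follows "source-file" within the next three lines.
expected_bytes() {
    local track_json=$1 source_file=$2
    grep -F -A 3 "\"source-file\": \"$source_file\"" "$track_json" \
        | sed -n 's/.*"compressed-bytes": *\([0-9]*\).*/\1/p' | head -n 1
}

# Return 0 if the file size matches track.json, 1 otherwise.
verify_corpus() {
    local track_json=$1 data_file=$2 expected actual
    expected=$(expected_bytes "$track_json" "$(basename "$data_file")")
    actual=$(wc -c < "$data_file" | tr -d ' ')
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK $data_file ($actual bytes)"
    else
        echo "MISMATCH $data_file: expected ${expected:-unknown}, got $actual" >&2
        return 1
    fi
}

# Example call, using the paths from this article:
# verify_corpus ~/.rally/benchmarks/tracks/default/geopoint/track.json \
#     ~/.rally/benchmarks/data/geopoint/documents.json.bz2
```

This covers only the compressed size; the document count still needs the bzip2 -dk and wc -l steps shown above.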
A script that lists the download URLs of all required sample data:
listfiles.sh
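#!/usr/bin/env bash
# Run from the tracks directory, e.g. ~/.rally/benchmarks/tracks/default
for track_file in */track.json; do
    # Directory name before the first "/" is the track name
    track_name=${track_file%%/*}
    echo "$track_name"
    # First base-url in track.json, with the quotes and trailing comma stripped
    baseurl=$(grep base-url "$track_file" | awk '{print $2}' | sed -e 's/,//g' -e 's/"//g' | head -n 1)
    # One URL per entry in files.txt: base-url + "/" + file name
    for data_file in $(cat "$track_name/files.txt"); do
        echo "$baseurl/$data_file"
    done | sort | uniq
    echo
done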
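Once some archives have been uploaded, it helps to see which files from files.txt are still absent from the data directory. The following is a minimal sketch; list_missing is a hypothetical helper name, and it assumes the layout used in this article, where each track has a files.txt and its data lives under data/<track name>/.

```shell
#!/usr/bin/env bash
# Sketch: print "<track>/<file>" for every entry in files.txt that is not
# yet present in the data directory. list_missing is a hypothetical helper.
list_missing() {
    local tracks_dir=$1 data_dir=$2 files track_name data_file
    for files in "$tracks_dir"/*/files.txt; do
        [ -f "$files" ] || continue
        track_name=$(basename "$(dirname "$files")")
        # Read line by line; also handle a last line without a newline.
        while IFS= read -r data_file || [ -n "$data_file" ]; do
            [ -n "$data_file" ] || continue
            [ -f "$data_dir/$track_name/$data_file" ] || echo "$track_name/$data_file"
        done < "$files"
    done
}

# Example call, using the paths from this article:
# list_missing ~/.rally/benchmarks/tracks/default ~/.rally/benchmarks/data
```

It only checks for presence, so a truncated or corrupt file still counts as downloaded.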
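2.3. Download the ES configuration (teams) [optional]
By default, Rally benchmarks a local ES instance that it sets up itself, which requires the cars configuration (a car represents one particular ES configuration).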
cd ~/.rally/benchmarks/
mkdir teams
git clone https://github.com/elastic/rally-teams.git
or
git clone git@github.com:elastic/rally-teams.git (requires a public key to be set up)
esrally list cars
    ____        ____
   / __ \____ _/ / /_  __
  / /_/ / __ `/ / / / / /
 / _, _/ /_/ / / / /_/ /
/_/ |_|\__,_/_/_/\__, /
                /____/
Available cars:
Name                     Type    Description
-----------------------  ------  ----------------------------------
16gheap                  car     Sets the Java heap to 16GB
1gheap                   car     Sets the Java heap to 1GB
24gheap                  car     Sets the Java heap to 24GB
2gheap                   car     Sets the Java heap to 2GB
4gheap                   car     Sets the Java heap to 4GB
8gheap                   car     Sets the Java heap to 8GB
defaults                 car     Sets the Java heap to 1GB
basic-license            mixin   Basic License
debug-non-safepoints     mixin   More accurate CPU profiles
ea                       mixin   Enables Java assertions
fp                       mixin   Preserves frame pointers
g1gc                     mixin   Enables the G1 garbage collector
trial-license            mixin   Trial License
unpooled                 mixin   Enables Netty's unpooled allocator
x-pack-ml                mixin   X-Pack Machine Learning
x-pack-monitoring-http   mixin   X-Pack Monitoring (HTTP exporter)
x-pack-monitoring-local  mixin   X-Pack Monitoring (local exporter)
x-pack-security          mixin   X-Pack Security
-------------------------------
[INFO] SUCCESS (took 3 seconds)
-------------------------------
As with tracks, running the list cars command performs the following copy:
cp -r rally-teams teams/default
so the rally-teams directory can now be deleted:
rm -rf rally-teams
Next, look at what the default ES configuration is:
cd ~/.rally/benchmarks/teams/default/cars/v1; ll; cat defaults.ini
[meta]
description=Sets the Java heap to 1GB
type=car
[config]
base=vanilla
[variables]
heap_size=1g
- The heap size is 1 GB
- It uses the configuration in the vanilla directory
The output of tree vanilla:
vanilla
├── config.ini
├── README.md
└── templates
    └── config
        ├── elasticsearch.yml
        ├── jvm.options
        └── log4j2.properties
The elasticsearch.yml and jvm.options files are worth a look; they are not covered in detail here.
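To compare several cars quickly, the values in their .ini files can be read with a small helper instead of opening each file. The following is a minimal sketch; ini_get is a hypothetical helper name, not a Rally command, and it assumes plain key=value lines under [section] headers as in defaults.ini above, with no "=" inside the value.

```shell
#!/usr/bin/env bash
# Sketch: print the value of KEY inside [SECTION] of a simple INI file.
# ini_get is a hypothetical helper; it assumes key=value lines without
# spaces around "=" and without "=" inside the value.
ini_get() {
    local file=$1 section=$2 key=$3
    awk -F '=' -v section="[$section]" -v key="$key" '
        /^\[/ { in_section = ($0 == section); next }   # track the current section
        in_section && $1 == key { print $2; exit }      # first match wins
    ' "$file"
}

# Example call, using the path from this article:
# ini_get ~/.rally/benchmarks/teams/default/cars/v1/defaults.ini variables heap_size
```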
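3. Downloading the ES distribution
When Rally runs, the ES version is specified with the parameter --distribution-version=5.5.2
Rally then automatically downloads the matching ES release archive into the distributions directory, for example:
~/.rally/benchmarks/distributions/elasticsearch-5.5.2.tar.gz
It is then unpacked and installed under the races directory: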
~/.rally/benchmarks/races/2019-07-10-11-42-55/rally-node-0/install/elasticsearch-5.5.2
ES log directory:
~/.rally/benchmarks/races/2019-07-10-11-42-55/rally-node-0/logs
Heap dump directory:
~/.rally/benchmarks/races/2019-07-10-11-42-55/rally-node-0/heapdump
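Because every run gets its own timestamped directory under races, finding the logs of the most recent run means finding the newest directory first. The following is a minimal sketch; latest_race_dir is a hypothetical helper name, and it assumes race directories are named with a sortable timestamp such as 2019-07-10-11-42-55, as in the paths above.

```shell
#!/usr/bin/env bash
# Sketch: print the most recent race directory under a races root.
# latest_race_dir is a hypothetical helper. It relies on the shell expanding
# globs in sorted order, so the last timestamped name is the newest.
latest_race_dir() {
    local races_root=$1 dir latest=""
    for dir in "$races_root"/*/; do
        [ -d "$dir" ] || continue
        latest=${dir%/}
    done
    # Returns non-zero and prints nothing when no race directory exists.
    [ -n "$latest" ] && echo "$latest"
}

# Example call, using the paths from this article:
# race=$(latest_race_dir ~/.rally/benchmarks/races) && ls "$race/rally-node-0/logs"
```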