As we know, the Presto coordinator is a single point of failure. The sections below illustrate an approach to achieving high availability with HAProxy.
0. Cluster roles
| node | worker | coordinator | haproxy | LB | clush master |
| --- | --- | --- | --- | --- | --- |
| node-01 | √ | | | | |
| node-02 | √ | | | | |
| node-03 | √ | | | | |
| node-04 | √ | | | | |
| node-05 | √ | | | | |
| node-06 | √ | | | | |
| node-07 | √ | | | | |
| node-08 | √ | | | | |
| node-09 | √ | | | | |
| node-10 | √ | | | | |
| node-11 | √ | | | | |
| node-12 | √ | | | √ | √ |
| node-13 | | √ | √ | | |
| node-14 | | √ | √ | | |
1. Setup clush groups
$ vim /etc/clustershell/groups
# the "all" group is defined so that later "clush -g all" and "clush -a" calls resolve
all:node-[01-14]
presto-worker:node-[01-12]
presto-coordinator:node-[13,14]
2. Cluster diagram
client   ---\
worker1  ----\
worker2  -----+--> LB ---+--> HAProxy (node-13) --> coordinator1 (active)
  ...    ----/           |
                         +--> HAProxy (node-14) --> coordinator2 (backup)
3. Git clone presto repo
$ git clone https://github.com/prestosql/presto.git /data/app/presto
$ cd /data/app/presto
$ git checkout -b presto-310 tags/310
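The clone above is only the source tree; the copy in step 5 assumes a built, unpacked server already staged at /data/presto. Mirroring the upgrade script in step 12, the initial build might look like the sketch below (the `-T 4` parallel-build flag is an assumption; adjust to your machine):

```shell
# build the server and CLI from the checked-out tag (skipping tests)
cd /data/app/presto
./mvnw -T 4 clean install -DskipTests
# stage the unpacked server and the executable CLI under /data/presto
cp -r presto-server/target/presto-server-310 /data/presto
cp presto-cli/target/presto-cli-310-executable.jar /data/presto/bin/presto
chmod +x /data/presto/bin/presto
```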
4. Presto Config
4.1. Node config
$ mkdir -p /data/etc/project/presto-coordinator/catalog /data/etc/project/presto-worker/catalog
$ vim /data/etc/project/presto-coordinator/node.properties
$ vim /data/etc/project/presto-worker/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/data/data/presto
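Because the same node.properties template is pushed to every host, node.id is left as a placeholder here and rewritten per host with sed in step 9. A local dry run of that substitution against a scratch copy (the temporary path is illustrative):

```shell
# create a scratch copy of the template with the placeholder id
tmpdir=$(mktemp -d)
cat > "$tmpdir/node.properties" <<'EOF'
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/data/data/presto
EOF
# same substitution the clush command in step 9 runs on each node
sed -i "s/ffffffff-ffff-ffff-ffff-ffffffffffff/$HOSTNAME/" "$tmpdir/node.properties"
grep '^node.id=' "$tmpdir/node.properties"   # node.id now carries this host's name
```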
4.2 JVM config
$ vim /data/etc/project/presto-coordinator/jvm.config
$ vim /data/etc/project/presto-worker/jvm.config
-server
-Xss8M
-Xmx120G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-DHADOOP_USER_NAME=hdfs_user
-Djdk.nio.maxCachedBufferSize=2000000
Replace hdfs_user above with the appropriate HDFS username. Note that each line of jvm.config is passed to the JVM as-is, so keep inline comments out of the file.
4.3 Config properties
See the coordinator and worker config.properties in sections 6 and 9 below.
4.4 Hadoop conf
$ vim /data/etc/project/hadoop/conf/core-site.xml
$ vim /data/etc/project/hadoop/conf/hdfs-site.xml
# You can copy these files from hadoop nodes
4.5 Log levels
$ vim /data/etc/project/presto-coordinator/log.properties
$ vim /data/etc/project/presto-worker/log.properties
io.prestosql=INFO
4.6 Catalog Properties
$ vim /data/etc/project/presto-coordinator/catalog/hive.properties
$ vim /data/etc/project/presto-worker/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://*******:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
5. Distribute presto to all nodes
# create separate data folder for presto
$ clush -g all -b 'mkdir -p /data/data/presto'
# copy unpacked presto to all nodes
$ clush -g all -b --copy /data/presto/ --dest /data/
6. Install coordinator
$ vim /data/etc/project/presto-coordinator/config.properties
coordinator=true
# keep the coordinator from scheduling work on itself
node-scheduler.include-coordinator=false
http-server.http.port=8321
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
# each coordinator runs its own discovery server and reports only to itself,
# so the two coordinators do not interfere with each other
discovery.uri=http://localhost:8321
$ clush -g presto-coordinator -b --copy /data/etc/project/presto-coordinator/* --dest /data/presto/etc/
7. Install haproxy on coordinator nodes
# create presto haproxy config
$ mkdir -p /data/etc/project/presto-haproxy
$ vim /data/etc/project/presto-haproxy/haproxy.cfg
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  forwardfor except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    5s
    timeout queue           1m
    timeout connect         10s
    timeout client          10s
    timeout server          10s
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000
    # expose the stats page queried by the curl commands below
    stats enable
    stats uri /haproxy?stats

frontend presto
    bind *:8385 name presto
    default_backend presto_rest

backend presto_rest
    # pin all traffic to the primary; the backup only takes over on failure
    stick-table type ip size 1000
    stick on dst
    server discoveryserver1 node-13:8321 check
    server discoveryserver2 node-14:8321 check backup
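The `check` keyword only verifies that a TCP connect succeeds. If you want HAProxy to confirm the coordinator is actually serving HTTP, a layer-7 check against Presto's /v1/info endpoint can be layered on (a sketch; add the `option httpchk` line inside the backend):

```
backend presto_rest
    option httpchk GET /v1/info
    stick-table type ip size 1000
    stick on dst
    server discoveryserver1 node-13:8321 check
    server discoveryserver2 node-14:8321 check backup
```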
# install haproxy on coordinator nodes
$ clush -g presto-coordinator -b 'yum -y install haproxy'
# copy and overwrite haproxy config files on coordinator nodes
$ clush -g presto-coordinator -b --copy /data/etc/project/presto-haproxy/haproxy.cfg --dest /etc/haproxy/
# start haproxy service
$ clush -g presto-coordinator -b 'systemctl enable haproxy'
$ clush -g presto-coordinator -b 'systemctl start haproxy'
$ clush -g presto-coordinator -b 'systemctl status haproxy'
# test haproxy stats
$ curl 'http://node-13:8385/haproxy?stats'
$ curl 'http://node-14:8385/haproxy?stats'
8. Load balance across the two HAProxy instances
Use node-12 (the LB node in the roles table) as the load balancer in front of the two coordinator-side HAProxy instances.
# install haproxy on node-12 and configure it as below
frontend presto
    bind *:80 name presto
    default_backend haproxies

backend haproxies
    balance roundrobin
    server haproxy_1 node-13:8385 check
    server haproxy_2 node-14:8385 check
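With the load balancer in place, the full path (client → LB → HAProxy → coordinator) can be smoke-tested against Presto's /v1/info endpoint, using the hostnames and ports from this cluster:

```shell
# through the front door (LB on port 80)
curl -s http://node-12/v1/info
# directly against each coordinator-side HAProxy
curl -s http://node-13:8385/v1/info
curl -s http://node-14:8385/v1/info
```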
9. Install worker nodes
$ vim /data/etc/project/presto-worker/config.properties
coordinator=false
http-server.http.port=8321
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
# point workers at the load balancer on node-12 (port 80)
discovery.uri=http://node-12
$ clush -g presto-worker -b --copy /data/etc/project/presto-worker/* --dest /data/presto/etc/
# replace worker id with hostname
$ clush -g all -b 'sed -i "s/ffffffff-ffff-ffff-ffff-ffffffffffff/"$HOSTNAME"/" /data/presto/etc/node.properties'
# copy hadoop conf to every node
$ clush -g all -b --copy /data/etc/project/hadoop --dest /etc/
# start/restart presto on all nodes
$ clush -g all -b 'export PATH=$PATH:/usr/local/jdk1.8.0_191/bin/; /data/presto/bin/launcher restart'
10. Add presto hive shortcut
$ clush -a -b "echo 'alias presto_hive=\"/data/presto/bin/presto --server http://node-12 --catalog hive --schema default\"' >> ~/.bashrc"
11. Try it!
$ presto_hive
> show schemas;
> show tables;
> select * from system.runtime.nodes;
12. Rolling upgrade presto
$ vim ~/presto-upgrade.sh
#!/usr/bin/env bash
cd /data/app/presto || exit 1
rev=/tmp/presto.revision
git pull
latest=$(git rev-parse --short HEAD)
curr=$(cat ${rev} 2>/dev/null)
if [[ "$curr" != "$latest" ]]; then
    # record the new revision only after a successful build
    if ! ./mvnw -T 4 clean install -DskipTests; then
        echo 'build failed'
        exit 1
    fi
    echo "$latest" > ${rev}
    clush -a -b 'rm -rf /data/presto'
    clush -a -b --copy /data/app/presto/presto-server/target/presto-server-310 --dest /data/presto
    clush -a -b --copy /data/app/presto/presto-cli/target/presto-cli-310-executable.jar --dest /data/presto/bin/presto
    clush -g presto-coordinator -b --copy /data/etc/project/presto-coordinator/* --dest /data/presto/etc/
    clush -g presto-worker -b --copy /data/etc/project/presto-worker/* --dest /data/presto/etc/
    clush -g all -b 'sed -i "s/ffffffff-ffff-ffff-ffff-ffffffffffff/"$HOSTNAME"/" /data/presto/etc/node.properties'
    clush -g all -b 'export PATH=$PATH:/usr/local/jdk1.8.0_191/bin/; /data/presto/bin/launcher restart'
fi
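The script restarts every node at once; for a softer rollout you could drain each worker before restarting it, using Presto's graceful-shutdown endpoint (PUT /v1/info/state). A sketch, with hostname and port assumed from this cluster's config:

```shell
# ask a worker to finish its running tasks and then shut itself down
drain_worker() {
    local node=$1
    curl -s -X PUT -H 'Content-Type: application/json' \
         -d '"SHUTTING_DOWN"' "http://${node}:8321/v1/info/state"
}
# usage: drain_worker node-01, wait for the process to exit, restart it, move on
```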
13. References
- https://coding-stream-of-consciousness.com/2018/12/29/presto-coordinator-high-availability-ha/
- https://linuxhandbook.com/load-balancing-setup/