Slurm Workload Manager: Installation and Configuration
(1) Topology and Node Roles
| Node | Role | Address |
|---|---|---|
| node1 | Management node | 192.168.101.1 |
| node2 | Compute node | 192.168.101.2 |
| node3 | Compute node | 192.168.101.3 |
| node4 | Compute node | 192.168.101.4 |
| node5 | Compute node | 192.168.101.5 |
(2) Preparation (on every machine)
- Install the required tools
yum install net-tools wget vim nfs-utils rpcbind ntp ntpdate
- Configure hostnames and host-to-address mappings
vim /etc/hostname   (set it to the node's own name, e.g. node2)
vim /etc/hosts   (add the following lines)
192.168.101.1 node1
192.168.101.2 node2
192.168.101.3 node3
192.168.101.4 node4
192.168.101.5 node5
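The /etc/hosts edit has to be repeated on all five machines, so it helps to make it idempotent. Below is a minimal bash sketch; the function name `add_cluster_hosts` and the `HOSTS_FILE` override are hypothetical, and it assumes the node1 to node5 addresses from the table above.

```shell
# Hypothetical helper: append the node1..node5 mappings from the topology table
# to a hosts file, skipping any line that is already present.
# HOSTS_FILE defaults to /etc/hosts; point it at a scratch file to try it safely.
add_cluster_hosts() {
  local hosts_file="${HOSTS_FILE:-/etc/hosts}" i entry
  for i in 1 2 3 4 5; do
    entry="192.168.101.$i node$i"
    # -x: match the whole line, -F: fixed string (dots are not regex wildcards)
    if ! grep -qxF "$entry" "$hosts_file" 2>/dev/null; then
      printf '%s\n' "$entry" >> "$hosts_file"
    fi
  done
}
```

Running `add_cluster_hosts` twice as root leaves exactly one copy of each mapping in /etc/hosts.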
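- Configure passwordless root SSH login
vim /etc/ssh/sshd_config   (set the following options)
PermitRootLogin yes
PasswordAuthentication yes
PubkeyAuthentication yes
Restart sshd so the changes take effect:
systemctl restart sshd
Generate an SSH key pair:
ssh-keygen -t rsa
Copy the public key to every node (ssh-copy-id does not expand node[1-5], so loop over the names):
for i in 1 2 3 4 5; do ssh-copy-id -i ~/.ssh/id_rsa.pub root@node$i; done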
- Open the firewall to the cluster subnet
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.101.0/24" port port="1-65535" protocol="udp" accept'
firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.101.0/24" port port="1-65535" protocol="tcp" accept'
Permanent rules are only applied after a reload:
firewall-cmd --reload
- Disable SELinux
vim /etc/selinux/config   (change SELINUX=enforcing to SELINUX=disabled)
Reboot for the change to take effect.
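The SELinux edit can also be made non-interactively with `sed`. This is a small sketch; `disable_selinux_config` and the `SELINUX_CONFIG` override are hypothetical names. A reboot is still required; `setenforce 0` only switches the running system to permissive mode until then.

```shell
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# SELINUX_CONFIG defaults to /etc/selinux/config (takes effect after a reboot).
disable_selinux_config() {
  local cfg="${SELINUX_CONFIG:-/etc/selinux/config}"
  # Only the "SELINUX=" line matches; "SELINUXTYPE=" is left untouched because
  # the pattern requires "=" directly after "SELINUX".
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```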
(3) Configure NFS Storage
- Management node
Enable the related services at boot:
systemctl enable rpcbind
systemctl enable nfs-server
systemctl enable nfs-lock
systemctl enable nfs-idmap
Start the services:
systemctl start rpcbind
systemctl start nfs-server
systemctl start nfs-lock
systemctl start nfs-idmap
Configure the shared directories:
mkdir /workspace
mkdir /rhome
vim /etc/exports   (add the following lines)
/workspace 192.168.101.0/24(rw)
/rhome 192.168.101.0/24(rw)
Apply the exports:
exportfs -a
- Compute nodes
mkdir /workspace
mkdir /rhome
vim /etc/fstab   (add the following lines)
192.168.101.1:/workspace /workspace nfs defaults 0 0
192.168.101.1:/rhome /rhome nfs defaults 0 0
Mount the directories:
mount -a
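The two NFS edits follow the same "add this line once" pattern on the server and on the clients. A minimal bash sketch, assuming the layout above (node1 at 192.168.101.1 exporting /workspace and /rhome to 192.168.101.0/24); the function names and the `EXPORTS_FILE`/`FSTAB_FILE` overrides are hypothetical.

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
  grep -qxF "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Management node: export both directories read-write to the cluster subnet.
configure_exports() {
  local f="${EXPORTS_FILE:-/etc/exports}" dir
  for dir in /workspace /rhome; do
    append_once "$f" "$dir 192.168.101.0/24(rw)"
  done
}

# Compute nodes: mount both directories from node1 at boot.
configure_fstab() {
  local f="${FSTAB_FILE:-/etc/fstab}" dir
  for dir in /workspace /rhome; do
    append_once "$f" "192.168.101.1:$dir $dir nfs defaults 0 0"
  done
}
```

On node1 this would be used as `configure_exports && exportfs -a`; on a compute node as `mkdir -p /workspace /rhome && configure_fstab && mount -a`.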
(4) Install MUNGE
- Management node
Download the MUNGE source code (the steps below use munge-0.5.15.tar.xz).
- Build the RPM packages
Install the build tools:
yum install rpmdevtools gcc bzip2-devel openssl-devel zlib-devel
Build and install the RPMs:
rpmbuild -tb --without verify munge-0.5.15.tar.xz
rpm -ivh ~/rpmbuild/RPMS/x86_64/munge*
Copy the generated RPMs to /workspace so the other nodes can install them:
cp -r ~/rpmbuild/RPMS/x86_64/ /workspace
- Generate munge.key
sudo -u munge /usr/sbin/mungekey --verbose
chown munge:munge /etc/munge/munge.key
Copy the key to /workspace for the other nodes:
cp /etc/munge/munge.key /workspace/munge.key
Start the service and enable it at boot:
systemctl start munge
systemctl enable munge
- Compute nodes
rpm -ivh /workspace/x86_64/*
cp /workspace/munge.key /etc/munge/
chown munge:munge /etc/munge/munge.key
systemctl start munge
systemctl enable munge
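One caveat with staging the key on the share: with the default `root_squash` export option, root on a compute node may be unable to read a root-owned, 0600 copy of munge.key under /workspace. Since passwordless root SSH is already configured, the key can instead be pushed from node1. This is a hedged sketch: `run`, `distribute_munge_key` and `DRY_RUN` are hypothetical names, and it assumes the MUNGE RPMs are already installed on node2 to node5.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Hypothetical helper: copy node1's key to each compute node over SSH,
# then fix ownership/permissions and restart munged there.
distribute_munge_key() {
  local key=/etc/munge/munge.key node
  for node in node2 node3 node4 node5; do
    run scp -p "$key" "root@$node:$key" || return 1
    # munged expects the key to be owned by munge and unreadable by other users
    run ssh "root@$node" "chown munge:munge $key && chmod 400 $key && systemctl restart munge" || return 1
  done
}
```

`DRY_RUN=1 distribute_munge_key` shows the eight commands without touching any node.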
- Test MUNGE
munge -n | ssh [compute-node] unmunge
A working setup reports STATUS: Success (0).
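(5) Configure NTP Time Synchronization (on every host)
Synchronize the clock once (run this before starting ntpd, which holds the NTP port):
ntpdate ntp.aliyun.com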
systemctl start ntpd
systemctl enable ntpd
(6) Install MySQL
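- Management node
rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022
yum install mysql80-community-release-el7-6.noarch.rpm
yum install mysql-community-{server,client,common,libs,devel}-*
systemctl start mysqld
systemctl enable mysqld
On first start, MySQL 8.0 writes a temporary root password to /var/log/mysqld.log:
grep 'temporary password' /var/log/mysqld.log
Create the accounting database and the slurm MySQL user (the password must match StoragePass in slurmdbd.conf):
mysql> create database slurm_acct_db;
mysql> create user 'slurm'@'localhost' identified by 'password';
mysql> grant all on slurm_acct_db.* TO 'slurm'@'localhost';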
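The SQL for the accounting database can be kept in one place and piped into the `mysql` client. A minimal sketch; `slurm_db_sql` is a hypothetical helper, it uses `IF NOT EXISTS` so it can be re-run, and it assumes the password matches `StoragePass` in slurmdbd.conf and contains no single quote.

```shell
# Hypothetical helper: print the SQL that creates slurm_acct_db and the
# 'slurm'@'localhost' account used by slurmdbd (StorageUser/StorageLoc).
slurm_db_sql() {
  local pass="$1"
  [ -n "$pass" ] || { echo "usage: slurm_db_sql PASSWORD" >&2; return 1; }
  # The password is embedded in a quoted SQL string, so reject single quotes
  case "$pass" in *"'"*) echo "password must not contain a single quote" >&2; return 1 ;; esac
  cat <<EOF
CREATE DATABASE IF NOT EXISTS slurm_acct_db;
CREATE USER IF NOT EXISTS 'slurm'@'localhost' IDENTIFIED BY '$pass';
GRANT ALL ON slurm_acct_db.* TO 'slurm'@'localhost';
EOF
}
```

Usage on node1: `slurm_db_sql 'your-password' | mysql -u root -p`.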
- Compute nodes
rpm --import https://repo.mysql.com/RPM-GPG-KEY-mysql-2022
yum install mysql80-community-release-el7-6.noarch.rpm
yum install mysql-community-devel
(7) Install Slurm
- Create the user
Create the slurm user on every node (it should have the same UID and GID on all nodes):
useradd slurm
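- Build the Slurm RPM packages
Install the dependencies:
yum install hwloc hwloc-devel
Download the Slurm source tarball from SchedMD, then build it:
rpmbuild -ta slurm*.tar.bz2
(If rpmbuild reports missing dependencies, install them and run the build again.)
Copy only the Slurm packages to the share (~/rpmbuild/RPMS/x86_64 also holds the MUNGE packages built earlier):
mkdir -p /workspace/slurm_rpm
cp ~/rpmbuild/RPMS/x86_64/slurm-* /workspace/slurm_rpm/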
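- Install and configure Slurm
rpm -ivh /workspace/slurm_rpm/*
slurm.conf (all nodes) can be generated with the Slurm configuration tool (configurator.html on the Slurm website).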
Reference configuration:
ControlMachine=node1
ControlAddr=192.168.101.1 # controller IP
ClusterName=cluster
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
SlurmdUser=root
StateSaveLocation=/var/spool/slurmctld
SwitchType=switch/none
TaskPlugin=task/cgroup
PrologFlags=CONTAIN
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
#SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurmctld.log
#SlurmdDebug=info
SlurmdLogFile=/var/log/slurmd.log
#
#
# ACCOUNTING
# Host running slurmdbd (this file is copied to every node, so 127.0.0.1 would point at the local node)
AccountingStorageHost=node1
#AccountingStoragePass=
AccountingStoragePort=6819
AccountingStorageType=accounting_storage/slurmdbd
#
#
# COMPUTE NODES
NodeName=node[1-5] CPUs=40 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 State=UNKNOWN
PartitionName=workspace Nodes=ALL Default=YES MaxTime=INFINITE State=UP
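The NodeName line must describe the real hardware, and its CPU count should equal Sockets × CoresPerSocket × ThreadsPerCore (here 2 × 10 × 2 = 40). Running `slurmd -C` on a node prints the hardware it detects in the same key=value form. The sketch below checks that arithmetic; `check_node_line` is a hypothetical helper, it accepts either `Sockets` or `SocketsPerBoard`, and it assumes a single board per node.

```shell
# Hypothetical helper: verify CPUs = Sockets * CoresPerSocket * ThreadsPerCore
# for one slurm.conf NodeName line (or the first line of `slurmd -C`).
# Prints OK (status 0), MISMATCH... (status 1) or "missing field" (status 2).
check_node_line() {
  printf '%s\n' "$1" | awk '
    {
      # Split every key=value token; awk does not glob "node[1-5]"
      for (i = 1; i <= NF; i++) {
        n = split($i, kv, "=")
        if (n == 2) v[kv[1]] = kv[2]
      }
    }
    END {
      s = ("Sockets" in v) ? v["Sockets"] : v["SocketsPerBoard"]
      c = v["CoresPerSocket"]; t = v["ThreadsPerCore"]; cpus = v["CPUs"]
      if (cpus == "" || s == "" || c == "" || t == "") { print "missing field"; exit 2 }
      if (cpus == s * c * t) { print "OK"; exit 0 }
      printf "MISMATCH: CPUs=%s but %s*%s*%s=%d\n", cpus, s, c, t, s * c * t
      exit 1
    }'
}
```

Example: `check_node_line "$(slurmd -C | head -n 1)"` on a compute node.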
slurmdbd.conf (management node only) reference configuration:
#
# Example slurmdbd.conf file.
#
# See the slurmdbd.conf man page for more information.
#
# Archive info
#ArchiveJobs=yes
#ArchiveDir="/tmp"
#ArchiveSteps=yes
#ArchiveScript=
#JobPurge=12
#StepPurge=1
#
# Authentication info
AuthType=auth/munge
AuthInfo=/var/run/munge/munge.socket.2
#
# slurmDBD info
DbdAddr=127.0.0.1
DbdHost=localhost
DbdPort=6819
SlurmUser=slurm
#MessageTimeout=300
DebugLevel=verbose
#DefaultQOS=normal,standby
LogFile=/opt/slurm/log/slurmdbd.log
PidFile=/opt/slurm/log/slurmdbd.pid
#PluginDir=/usr/lib/slurm
#PrivateData=accounts,users,usage,jobs
#TrackWCKey=yes
#
# Database info
StorageType=accounting_storage/mysql
# database server
StorageHost=127.0.0.1
StoragePort=3306
StoragePass=[PASSWORD]
StorageUser=slurm
# database name
StorageLoc=slurm_acct_db
cgroup.conf (all nodes) reference configuration:
###
#
# Slurm cgroup support configuration file
#
# See man slurm.conf and man cgroup.conf for further
# information on cgroup configuration parameters
#--
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=no
CgroupMountpoint=/sys/fs/cgroup
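- After editing, copy slurm.conf and cgroup.conf to /etc/slurm/ on every node, and place slurmdbd.conf in /etc/slurm/ on the management node only. Then set ownership and permissions (slurmdbd.conf contains the database password):
chown slurm:slurm /etc/slurm/slurm.conf
chown slurm:slurm /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf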
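Several paths named in the configs must exist with the right owner before the daemons start: StateSaveLocation=/var/spool/slurmctld, SlurmctldLogFile=/var/log/slurmctld.log (slurmctld runs as SlurmUser=slurm), and /opt/slurm/log for the slurmdbd LogFile and PidFile. A hedged sketch for node1; `prepare_slurm_paths` and `ROOT_PREFIX` are hypothetical, and recent Slurm releases are assumed to reject a slurmdbd.conf readable by group or others.

```shell
# Hypothetical helper: create the directories/files referenced by slurm.conf and
# slurmdbd.conf on the management node. ROOT_PREFIX (default empty) allows a dry
# run inside a scratch directory tree.
prepare_slurm_paths() {
  local root="${ROOT_PREFIX:-}"
  # StateSaveLocation, written by slurmctld as the slurm user
  mkdir -p "$root/var/spool/slurmctld"
  # Directory for slurmdbd's LogFile and PidFile
  mkdir -p "$root/opt/slurm/log"
  # Pre-create SlurmctldLogFile so it can be handed to the slurm user
  mkdir -p "$root/var/log"
  touch "$root/var/log/slurmctld.log"
  # Change ownership only when running as root and the slurm user exists
  if [ "$(id -u)" -eq 0 ] && id slurm >/dev/null 2>&1; then
    chown slurm:slurm "$root/var/spool/slurmctld" "$root/opt/slurm/log" \
      "$root/var/log/slurmctld.log"
  fi
  # slurmdbd.conf holds StoragePass: keep it readable by its owner only
  if [ -f "$root/etc/slurm/slurmdbd.conf" ]; then
    chmod 600 "$root/etc/slurm/slurmdbd.conf"
  fi
}
```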
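- Start the services
Management node (start slurmdbd before slurmctld so the controller can register with the accounting daemon):
systemctl start slurmdbd && systemctl start slurmctld && systemctl start slurmd
Enable them at boot:
systemctl enable slurmdbd slurmctld slurmd
Compute nodes:
systemctl start slurmd
Enable it at boot:
systemctl enable slurmd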
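Because root SSH to every node is already in place, the whole startup can be driven from node1 in dependency order. A minimal sketch; `run`, `enable_start`, `start_slurm_cluster` and `DRY_RUN` are hypothetical names, and the node list is taken from the topology table.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Enable a local unit at boot, then start it now.
enable_start() {
  run systemctl enable "$1" && run systemctl start "$1"
}

start_slurm_cluster() {
  local node
  # Accounting daemon first, so slurmctld can register with it
  enable_start slurmdbd || return 1
  enable_start slurmctld || return 1
  # node1 is part of NodeName=node[1-5], so it runs slurmd as well
  enable_start slurmd || return 1
  for node in node2 node3 node4 node5; do
    run ssh "root@$node" "systemctl enable slurmd && systemctl start slurmd" || return 1
  done
}
```

`DRY_RUN=1 start_slurm_cluster` prints the planned sequence; without it, the function stops at the first command that fails.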
- Test
sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
workspace*    up   infinite      5   idle  node[1-5]
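For a scripted check, `sinfo -h -N -o "%N %t"` prints one line per node (no header) with its name and compact state. The sketch below flags nodes that are not idle, mixed or allocated (for example `down*` or `drain`); `report_unhealthy_nodes` is a hypothetical helper, and duplicate lines are dropped because a node in several partitions is listed once per partition.

```shell
# Hypothetical helper: read "node state" lines on stdin and report nodes whose
# state is not idle/mix/alloc. Exit status is 1 if any such node was found.
report_unhealthy_nodes() {
  sort -u | awk '
    $2 !~ /^(idle|mix|alloc)$/ { print $1 " is " $2; bad++ }
    END { exit (bad > 0) }'
}
```

Usage: `sinfo -h -N -o "%N %t" | report_unhealthy_nodes && echo "all nodes usable"`.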
OpenMPI Installation
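Open MPI is an open-source, high-performance implementation of the MPI (Message Passing Interface) standard. An MPI program runs as multiple cooperating processes, either on the cores of a single machine or across the nodes of a cluster, and Open MPI has good support for C, C++ and Fortran.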
Preparation (all nodes)
yum install gcc gcc-c++ gcc-gfortran make
Download the OpenMPI source code (openmpi-4.1.4.tar.gz) and extract it:
tar -zxf openmpi-4.1.4.tar.gz
Build
cd openmpi-4.1.4
./configure --prefix=/opt/openmpi/4.1.4/ CC=gcc CXX=g++ FC=gfortran
make -j40 && make install
Set the environment variables
vim /etc/profile
Append the following at the end of the file:
OPENMPI=/opt/openmpi/4.1.4
PATH=$OPENMPI/bin:$PATH
LD_LIBRARY_PATH=$OPENMPI/lib:$LD_LIBRARY_PATH
INCLUDE=$OPENMPI/include:$INCLUDE
CPATH=$OPENMPI/include:$CPATH
MANPATH=$OPENMPI/share/man:$MANPATH
export PATH
export LD_LIBRARY_PATH
export INCLUDE
export CPATH
export MANPATH
source /etc/profile
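A quick end-to-end check is to run an MPI job through Slurm. The sketch writes a small batch script that runs `hostname` on two nodes of the `workspace` partition; `write_mpi_test_job`, the job name and the task counts are hypothetical, and it assumes Open MPI was built with its default Slurm support so that `mpirun` picks up the allocation from the job environment.

```shell
# Hypothetical helper: write a Slurm batch script that launches `hostname`
# with Open MPI on 2 nodes x 4 tasks in the "workspace" partition.
write_mpi_test_job() {
  cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=mpi-hostname
#SBATCH --partition=workspace
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --output=mpi-hostname-%j.out

# Load PATH/LD_LIBRARY_PATH for /opt/openmpi/4.1.4
source /etc/profile
# mpirun reads the node list and task count from the Slurm allocation
mpirun hostname
EOF
}
```

Usage: `cd /workspace && write_mpi_test_job mpi-hostname.sbatch && sbatch mpi-hostname.sbatch`; writing the script on the shared /workspace keeps the output file visible from every node, and it should list the hostnames of the two allocated nodes.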