Slurm Job Scheduling Cluster Setup and Configuration

Contents

1. Environment Preparation

2. Time Synchronization

3. MUNGE Authentication

4. Database Installation

5. Slurm Setup

6. Cluster User Management and Initial Configuration

QOS Configuration


1. Environment Preparation

Host plan

master  192.168.220.128

node1 192.168.220.129

Disable the firewall on every node and add each host to /etc/hosts so the nodes can reach each other by name.
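A minimal sketch of those two steps, using the addresses from the host plan above (run on both nodes):

systemctl stop firewalld && systemctl disable firewalld
cat >> /etc/hosts << EOF
192.168.220.128 master
192.168.220.129 node1
EOF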

2. Time Synchronization

1. Install the time synchronization packages
yum -y install ntp.x86_64
yum -y install ntpdate.x86_64

2. Sync against the Aliyun time server
ntpdate ntp.aliyun.com

3. Enable the services at boot
systemctl stop firewalld    # the firewall must remain off (see section 1)
systemctl enable ntpd
systemctl restart ntpd
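To confirm the node is actually syncing, query the configured peers (ntpq ships with the ntp package installed in step 1):

ntpq -p    # the peer marked with * is the current sync source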

3. MUNGE Authentication

Master node configuration

1. Make sure MUNGE and a leftover munge user are not already present
yum remove -y munge munge-libs munge-devel
userdel -r munge

2. Add the munge user

export MUNGEUSER=1120
groupadd -g $MUNGEUSER munge 
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge 

3. Install the MUNGE packages
yum install munge munge-devel munge-libs rng-tools -y

4. Generate the MUNGE key and fix permissions

rngd -r /dev/urandom    # feed the entropy pool
create-munge-key        # either let MUNGE generate the key...
dd if=/dev/urandom bs=1 count=1024 > /etc/munge/munge.key    # ...or create it by hand; this overwrites the key above, so pick one method
chown munge: /etc/munge/munge.key 
chmod 400 /etc/munge/munge.key
chown -R munge: /var/lib/munge
chown -R munge: /var/run/munge
chown -R munge: /var/log/munge

5. Copy the key to the compute node

scp /etc/munge/munge.key root@node1:/etc/munge/

6. Start the service

systemctl start munge
systemctl enable munge

node1 configuration

1. Make sure MUNGE and a leftover munge user are not already present
yum remove -y munge munge-libs munge-devel
userdel -r munge

2. Add the munge user (the UID/GID must match the master, hence the same MUNGEUSER)

export MUNGEUSER=1120
groupadd -g $MUNGEUSER munge 
useradd -m -c "MUNGE Uid 'N' Gid Emporium" -d /var/lib/munge -u $MUNGEUSER -g munge -s /sbin/nologin munge 

3. Install the MUNGE packages
yum install munge munge-devel munge-libs rng-tools -y

4. Fix directory ownership and permissions

rngd -r /dev/urandom
chmod 700 /etc/munge
chown -R munge: /etc/munge
chown -R munge: /var/lib/munge
chown -R munge: /var/run/munge
chown -R munge: /var/log/munge

5. Start the services
systemctl start rngd
systemctl start munge
systemctl enable rngd
systemctl enable munge
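With munge running on both nodes, it is worth verifying that they accept each other's credentials (standard MUNGE checks; assumes root SSH from master to node1):

munge -n | unmunge              # local round trip, should report STATUS: Success (0)
munge -n | ssh node1 unmunge    # run on master: node1 must decode master's credential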

4. Database Installation

1. Install the database
yum install mariadb-server mariadb-devel

2. Set the root password
mysql_secure_installation
mysql -u root -ppassword   # "password" is the root password you set during mysql_secure_installation

3. Create the Slurm databases and user
# Create the slurm user so it can manage the slurm_acct_db database; its password is SomePassWD and can be changed as you like
create user 'slurm'@'localhost' identified by 'SomePassWD';

# Create the accounting database slurm_acct_db
create database slurm_acct_db;
# Grant slurm, connecting from localhost with password SomePassWD, full privileges on all tables in slurm_acct_db
grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'SomePassWD' with grant option;
# Grant slurm, connecting from node1 with password SomePassWD, full privileges on all tables in slurm_acct_db
grant all on slurm_acct_db.* TO 'slurm'@'node1' identified by 'SomePassWD' with grant option;

# Create the job completion database slurm_jobcomp_db
create database slurm_jobcomp_db;
# Grant slurm, connecting from localhost with password SomePassWD, full privileges on all tables in slurm_jobcomp_db
grant all on slurm_jobcomp_db.* TO 'slurm'@'localhost' identified by 'SomePassWD' with grant option;
# Grant slurm, connecting from node1 with password SomePassWD, full privileges on all tables in slurm_jobcomp_db
grant all on slurm_jobcomp_db.* TO 'slurm'@'node1' identified by 'SomePassWD' with grant option;

flush privileges;  # apply the changes
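A quick check that the user and grants took effect (SomePassWD as set above):

mysql -u slurm -pSomePassWD -e 'show databases;'   # should list slurm_acct_db and slurm_jobcomp_db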

Edit the configuration file my.cnf:

# The following options will be passed to all MySQL clients
[client]
port=3306
socket=/var/lib/mysql/mysql.sock
default-character-set=utf8mb4
 
# Here follows entries for some specific programs
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
 
# The MySQL server
[mariadb]
# explicit_defaults_for_timestamp = true
datadir=/var/lib/mysql
port = 3306
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
#
sql_mode='STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
 
interactive_timeout=1200
wait_timeout=1800
 
skip_name_resolve=OFF
innodb_file_per_table=ON
max_connections=2048
max_connect_errors=1000000
max_allowed_packet=16M
sort_buffer_size=512K
net_buffer_length=16K
read_buffer_size=512K
read_rnd_buffer_size=512K
character_set_server=utf8mb4
collation_server=utf8mb4_bin
thread_stack=256K
thread_cache_size=384
tmp_table_size=96M
max_heap_table_size=96M
 
# slow query log (OFF here; set to ON to enable)
slow_query_log=OFF
slow_query_log_file=/var/lib/mysql/mysql-slow-query.log
# log queries slower than 4 seconds (server default 10, minimum 0)
long_query_time=4
 
local_infile=OFF
 
# binary logging is required for replication
#log_bin=mysql-bin
 
#master - slave synchronized settings
log_slave_updates=ON
 
server-id=1
 
log-bin=mysql-bin
sync_binlog=1
binlog_checksum = none
binlog_format = mixed
auto-increment-increment = 2
auto-increment-offset = 1
slave-skip-errors = all
 
 
# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = /var/lib/mysql/
#innodb_data_file_path = ibdata1:10M:autoextend
#innodb_log_group_home_dir = /var/lib/mysql/
#innodb_log_arch_dir = /var/lib/mysql/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
event_scheduler=ON
default_storage_engine=InnoDB
innodb_buffer_pool_size=1024M
innodb_purge_threads=1
innodb_log_file_size=128M
innodb_log_buffer_size=2M
innodb_lock_wait_timeout=900
 
bulk_insert_buffer_size=32M
myisam_sort_buffer_size=8M
 
# maximum temporary file size allowed when MySQL rebuilds an index
myisam_max_sort_file_size=4G
myisam_repair_threads=1
lower_case_table_names=0
 
[mysqldump]
quick
max_allowed_packet=16M
 
#[isamchk]
#key_buffer = 16M
#sort_buffer_size = 16M
#read_buffer = 4M
#write_buffer = 4M
 
[myisamchk]
key_buffer=16M
sort_buffer_size=16M
read_buffer=4M
write_buffer=4M
 
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d 

4. Restart the database
# innodb_log_file_size was changed above, so move the old InnoDB redo logs aside before restarting:
cd /var/lib/mysql
mv ib_logfile0 ib_logfile0.bak
mv ib_logfile1 ib_logfile1.bak

systemctl restart mariadb && systemctl enable mariadb

5. Slurm Setup

1. Install the build dependencies
yum install -y readline-devel perl-ExtUtils* perl-Switch pam-devel lua-devel hwloc-devel


2. Download the Slurm source tarball (the build step below uses 21.08.8-2; adjust the filename if you download a newer release such as 22.05)

Downloads | SchedMD: https://www.schedmd.com/downloads.php

3. Build the RPM packages

rpmbuild -ta --with lua slurm-21.08.8-2.tar.bz2

When built as root, the Slurm RPMs are generated under /root/rpmbuild/RPMS/x86_64.
4. Add the slurm user

export SLURMUSER=1121
groupadd -g $SLURMUSER slurm 
useradd -m -c "SLURM workload manager" -d /var/lib/slurm -u $SLURMUSER -g slurm -s /bin/bash slurm

5. Install the RPM packages

rpm -ivh slurm*.rpm    # run in /root/rpmbuild/RPMS/x86_64, on both master and node1

Master node operations

6. Create the configuration files

# slurmdbd.conf is the configuration file for the slurmdbd service; it must be owned by the slurm user
touch /etc/slurm/slurmdbd.conf
chown slurm:slurm /etc/slurm/slurmdbd.conf
chmod 600 /etc/slurm/slurmdbd.conf

# slurm.conf is the configuration file for slurmd and slurmctld; it must be owned by root
touch /etc/slurm/slurm.conf
chown root:root /etc/slurm/slurm.conf

# Create the directory where slurmctld stores its state, defined by the StateSaveLocation parameter in slurm.conf:
mkdir /var/spool/slurmctld
chown slurm:slurm /var/spool/slurmctld

# Create the log directory and fix its ownership
mkdir /var/log/slurm
cd /var/log/slurm/
touch slurmd.log
touch slurmctld.log
touch slurmdbd.log
chown -R slurm:slurm /var/log/slurm

Edit /etc/slurm/slurmdbd.conf and add the following:

AuthType=auth/munge  # authentication method; MUNGE is used here
AuthInfo=/var/run/munge/munge.socket.2 # extra authentication info (the MUNGE socket) used when talking to slurmctld

# slurmDBD info
DbdAddr=localhost # address the slurmdbd daemon listens on
DbdHost=localhost # hostname of the machine running slurmdbd
DbdPort=6819 # slurmdbd port, default 6819
SlurmUser=slurm # user the slurmdbd daemon runs as
MessageTimeout=60 # seconds allowed for a round-trip message, default 10
DebugLevel=5 # log verbosity: quiet (none), fatal, error, info, verbose, debug, debug2 ... debug5; the higher the debug number, the more detail
LogFile=/var/log/slurm/slurmdbd.log # absolute path of the slurmdbd log file
PidFile=/var/run/slurmdbd.pid # absolute path of the slurmdbd PID file

# Database info 
StorageType=accounting_storage/mysql # storage backend
StorageHost=localhost # database host
StoragePort=3306 # database port
StoragePass=SomePassWD # password of the slurm database user created earlier
StorageUser=slurm # database user name
StorageLoc=slurm_acct_db # name of the accounting database created earlier

Save the file, then start slurmdbd and enable it at boot:

systemctl enable slurmdbd
systemctl restart slurmdbd
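A quick sanity check that slurmdbd started and reached the database (standard systemd and log checks):

systemctl status slurmdbd
grep -i error /var/log/slurm/slurmdbd.log   # should print nothing on a clean start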

Configure the slurmd service
        Configless mode is a newer Slurm feature (available since version 20.02) that lets compute nodes and login nodes fetch their configuration from the slurmctld daemon instead of keeping local copies under /etc/slurm. It requires setting SlurmctldParameters=enable_configless in slurm.conf on the management node.

Edit /etc/slurm/slurm.conf and add the following.

For more detail on each option, see the official documentation: Slurm Workload Manager - slurm.conf

# slurm.conf file. Please run configurator.html
# (in doc/html) to build a configuration file customized
# for your environment.
#
 
################################################
#                   CONTROL                    #
################################################
ClusterName=hgy       # cluster name
SlurmctldHost=master  # hostname of the control (management) node
SlurmUser=root        # main user for slurm
SlurmdUser=root       # user the slurmd service runs as
SlurmctldPort=6817    # slurmctld service port
SlurmdPort=6818       # slurmd service port
AuthType=auth/munge   # MUNGE authentication for traffic with the compute nodes
 
################################################
#            LOGGING & OTHER PATHS             #
################################################
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmctld
SlurmctldParameters=enable_configless  # run in configless mode
 
################################################
#                  ACCOUNTING                  #
################################################
AccountingStorageEnforce=associations,limits,qos  # what the accounting layer enforces
AccountingStorageHost=master  # node running slurmdbd
AccountingStoragePass=/var/run/munge/munge.socket.2 # MUNGE socket; must match AuthInfo in slurmdbd.conf
AccountingStoragePort=6819    # slurmdbd listen port, default 6819
AccountingStorageType=accounting_storage/slurmdbd   # record accounting through slurmdbd
AccountingStorageTRES=cpu,mem,energy,node,billing,fs/disk,vmem,pages,gres/gpu:tesla  # trackable resources to record
AcctGatherEnergyType=acct_gather_energy/none   # job energy accounting; none = not collected
AcctGatherFilesystemType=acct_gather_filesystem/none
AcctGatherInterconnectType=acct_gather_interconnect/none
AcctGatherNodeFreq=0
AcctGatherProfileType=acct_gather_profile/none
ExtSensorsType=ext_sensors/none
ExtSensorsFreq=0
 
################################################
#                      JOBS                    #
################################################
JobCompHost=localhost   # host of the job completion database
#JobCompLoc=
JobCompPass=SomePassWD  # password of the slurm database user
JobCompPort=3306        # MySQL port: jobcomp/mysql writes straight to the database, so this is 3306, not slurmdbd's 6819
JobCompType=jobcomp/mysql  # store job completion records in MySQL
JobCompUser=slurm        # database user for job completion records
JobContainerType=job_container/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
PrivateData=jobs,usage
DisableRootJobs=NO
 
#
################################################
#           SCHEDULING & ALLOCATION            #
################################################
PreemptMode=OFF
PreemptType=preempt/none
PreemptExemptTime=00:00:00
PriorityType=priority/multifactor
SchedulerTimeSlice=300
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
SlurmSchedLogLevel=0
 
################################################
#                   TOPOLOGY                   #
################################################
TopologyPlugin=topology/none
 
################################################
#                    TIMERS                    #
################################################
BatchStartTimeout=100
CompleteWait=0
EpilogMsgTime=2000
GetEnvTimeout=10
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=600
SlurmdTimeout=600
WaitTime=0
MessageTimeout=30
TCPTimeout=10
 
################################################
#                    POWER                     #
################################################
ResumeRate=300
ResumeTimeout=120
SuspendRate=60
SuspendTime=NONE
SuspendTimeout=60
#
################################################
#                    DEBUG                     #
################################################
DebugFlags=NO_CONF_HASH
 
################################################
#               PROCESS TRACKING               #
################################################
ProctrackType=proctrack/linuxproc
 
################################################
#             RESOURCE CONFINEMENT             #
################################################
TaskPlugin=task/affinity
TaskPluginParam=threads
 
 
################################################
#                    PRIORITY                  #
################################################
#PrioritySiteFactorPlugin=
PriorityDecayHalfLife=7-00:00:00
PriorityCalcPeriod=00:05:00
PriorityFavorSmall=No
#PriorityFlags=
PriorityMaxAge=7-00:00:00
PriorityUsageResetPeriod=NONE
PriorityWeightAge=0
PriorityWeightAssoc=0
PriorityWeightFairShare=0
PriorityWeightJobSize=0
PriorityWeightPartition=0
PriorityWeightQOS=1000
 
################################################
#                    OTHER                     #
################################################
AllowSpecResourcesUsage=No
CoreSpecPlugin=core_spec/none
CpuFreqGovernors=Performance,OnDemand,UserSpace
CredType=cred/munge
EioTimeout=120
EnforcePartLimits=NO
MpiDefault=none
FirstJobId=2
JobFileAppend=0
JobRequeue=1
MailProg=/bin/mail
MaxArraySize=1001
MaxDBDMsgs=24248
MaxJobCount=10000
MaxJobId=67043328
MaxMemPerNode=UNLIMITED
MaxStepCount=40000
MaxTasksPerNode=512
MCSPlugin=mcs/none
ReturnToService=2
RoutePlugin=route/default
TmpFS=/tmp
TrackWCKey=no
TreeWidth=50
UsePAM=0
SwitchType=switch/none
UnkillableStepTimeout=60
VSizeFactor=0
 
################################################
#                    NODES                     #
################################################
NodeName=master CPUs=3 Boards=1 SocketsPerBoard=1 CoresPerSocket=3 ThreadsPerCore=1 RealMemory=2827   State=UNKNOWN  # node definition
################################################
#                  PARTITIONS                  #
################################################
PartitionName=debug MaxCPUsPerNode=2  Nodes=ALL Default=YES MaxTime=INFINITE State=UP # partition definition
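The NodeName line does not have to be written by hand: slurmd -C, run on any node, prints that node's detected hardware in slurm.conf syntax, ready to paste into this section.

slurmd -C    # prints e.g. NodeName=master CPUs=3 ... RealMemory=2827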


Compute node configuration

1. Create the log directory and log file:

mkdir -p /var/log/slurm
cd /var/log/slurm/
touch slurmd.log
2. Edit /lib/systemd/system/slurmd.service and change the ExecStart line as follows:

# before
ExecStart=/usr/sbin/slurmd -D -s $SLURMD_OPTIONS

# after
ExecStart=/usr/sbin/slurmd --conf-server master:6817 -D -s $SLURMD_OPTIONS
        With this change slurmd pulls its configuration from the controller; now start the service and enable it at boot.
3. Restart the service
systemctl daemon-reload
systemctl enable slurmd
systemctl start slurmd

4. Master node configuration

[root@master x86_64]# vim /lib/systemd/system/slurmctld.service
[Unit]
Description=Slurm controller daemon
After=network-online.target munge.service mariadb.service
## mariadb.service must be added to the After= line, otherwise slurmctld fails to come up after a reboot
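After editing the unit file, reload systemd and start slurmctld on the master:

systemctl daemon-reload
systemctl enable slurmctld
systemctl start slurmctld

If accounting complains that the cluster is unknown, it usually needs to be registered in the accounting database once (ClusterName=hgy from slurm.conf):

sacctmgr -i add cluster hgy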


        Running sinfo should now show the partition information; the Slurm installation and configuration are complete.
If a node shows as drained, bring it back with:
[root@master x86_64]# scontrol update NodeName=node1 State=resume    # refresh the node

6. Cluster User Management and Initial Configuration

If every node shows idle, the cluster is healthy:
[root@master x86_64]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug*       up   infinite      2   idle master,node1


1. Add an account to the cluster (the default account)

sacctmgr add account normal Description="Default account"

2. Add a Linux user
UIDNOW=1300
useradd test -d /public/home/test -u ${UIDNOW} -s /bin/bash
passwd test    # useradd -p expects an already-hashed password, so set it interactively instead
scp /etc/passwd /etc/shadow /etc/group node1:/etc/

3. Add the user to the Slurm accounting database

sacctmgr -i add user test DefaultAccount=normal
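To confirm the association was created, query the accounting database (a standard sacctmgr view):

sacctmgr show associations format=cluster,account,user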

4. Test
bash-4.2$ srun -n 6 hostname
master
node1
node1
master
node1
master
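Beyond srun, a minimal batch script exercises the same scheduling path; the job name and output pattern below are arbitrary examples:

cat > test.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=debug
#SBATCH --ntasks=2
#SBATCH --output=%j.out
srun hostname
EOF
sbatch test.sh
squeue    # watch the job; its output lands in <jobid>.out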

QOS Configuration

1. Add a QOS
[root@master x86_64]# sacctmgr add qos ceshi
## add the QOS
[root@master x86_64]# sacctmgr show qos format=name,priority,user
## show QOS names and priorities
[root@master x86_64]# sacctmgr modify qos ceshi set priority=10 
## set the priority of QOS ceshi to 10
[root@master x86_64]# sacctmgr modify user test set qos=ceshi
## attach the QOS to user test
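Once attached, the QOS can be requested explicitly at submission time, and its effect on job priority inspected:

[root@master x86_64]# srun --qos=ceshi -n 2 hostname
[root@master x86_64]# sprio -l
## sprio shows the QOS contribution to each pending job's priority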

User operations

Query a user

sacctmgr show user ***

Add a user

sacctmgr add user sghpc2 DefaultAccount=acct02 Qos=test_qos

Modify a user

sacctmgr modify user sghpc2 set QoS=normal

Delete a user

sacctmgr delete user username

This write-up builds on the Slurm cluster guide by R★F on CSDN (https://blog.csdn.net/xhk12345678/article/details/124710528?spm=1001.2014.3001.5502), fixing the problems encountered while following that setup; sincere thanks to the author for providing the template.
