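I. Background
Overview:
WIEN2k calculates the electronic structure of solids using density functional theory (DFT). It is based on the full-potential (linearized) augmented plane-wave ((L)APW) + local orbitals (lo) method, one of the most accurate schemes for band-structure calculations. Within DFT, either the local (spin) density approximation (LDA) or the generalized gradient approximation (GGA) can be used. WIEN2k is an all-electron scheme that includes relativistic effects.
Features:
Calculates properties of solids, including:
energy bands and density of states;
electron densities and spin densities, X-ray structure factors;
Bader's "atoms in molecules" concept;
total energy, forces, equilibrium geometries, structure optimization, molecular dynamics;
electric field gradients, isomer shifts, hyperfine fields;
spin polarization (ferromagnetic and antiferromagnetic structures), spin-orbit coupling;
X-ray emission and absorption spectra, electron energy loss spectra;
optical properties of solids;
Fermi surfaces;
LDA, GGA, meta-GGA, LDA+U, orbital polarization;
centrosymmetric and non-centrosymmetric cells, with all 230 space groups built in.
A graphical user interface and a user's guide are provided. The user-friendly environment w2web (WIEN to WEB) makes it easy to generate and modify input files, and it also helps the user carry out various tasks (such as electron density or density of states calculations).
Platform:
Unix/Linux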
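II. Software Installation and Setup
1. Hardware environment
Shanghai/Suse 10u2
2. Software version
Version: WIEN2k_09
3. Install the Intel compilers
ifort/icc
Version: 11.0.083
4. Install Intel MKL
Version: 10.1.2.024
5. Install mpich v1.2.7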
./configure -c++=icpc -cc=icc -f77=ifort -f90=ifort --prefix=/home/soft/mpi/mpich-1.2.7-intel
make
make install
6. Set the environment variables
vi ~/.bashrc
Add the following:
##############MPICH###########
export PATH=/home/soft/mpi/mpich-1.2.7-intel/bin:$PATH
################intel compiler###################
. /home/soft/intel/Compiler/11.0/083/bin/intel64/ifortvars_intel64.sh
. /home/soft/intel/Compiler/11.0/083/bin/intel64/iccvars_intel64.sh
###############intel mkl###################
export LD_LIBRARY_PATH=/home/soft/intel/mkl/10.1.2.024/lib/em64t/:$LD_LIBRARY_PATH
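After reloading ~/.bashrc, it may be worth confirming that the compilers and MPI wrappers are actually found on PATH before continuing. The following is only a minimal sketch; the helper name check_tools is hypothetical and is not part of WIEN2k, MPICH, or the Intel tools:

```shell
#!/bin/sh
# check_tools: hypothetical helper (an assumption, not a WIEN2k command).
# Prints "missing: <name>" for every command that is not found on PATH.
# Returns 0 if all commands were found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (tool names taken from the steps above):
#   . ~/.bashrc
#   check_tools ifort icc mpicc mpirun
```

If nothing is printed, all listed tools were found.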
7. Install the FFTW library
tar zxf fftw-2.1.5.tar.gz
cd fftw-2.1.5/
export F77=ifort
export CC=icc
./configure --prefix=/home/soft/mathlib/fftwv215-mpich --enable-mpi
make
make install
8. Create the build directory
Switch to the account of the installing user:
su - mjhe
mkdir ~/WIEN2k_09
cp WIEN2k_09.tar ~/WIEN2k_09
9. Unpack the archive
cd ~/WIEN2k_09
tar xf WIEN2k_09.tar
./expand_lapw
10. Compile
./siteconfig_lapw
Several build parameters need to be changed (the settings below can be used as a reference):
specify a system
K Linux (Intel ifort 10.1 compiler + mkl 10.0 )
specify compiler
Current selection: ifort
Current selection: icc
specify compiler options, BLAS and LAPACK
Current settings:
O Compiler options: -FR -mp1 -w -prec_div -pc80 -pad -align -DINTEL_VML -traceback
L Linker Flags: $(FOPT) -L/home/soft/intel/mkl/10.1.2.024/lib/em64t/ -pthread -i-static
P Preprocessor flags '-DParallel'
Link the MKL libraries statically:
R R_LIB (LAPACK+BLAS): /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_lapack.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libguide.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_core.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_em64t.a
configure Parallel execution
Shared Memory Architecture? (y/n):n
Remote shell (default is ssh) = ssh
Do you have MPI and Scalapack installed and intend to run
finegrained parallel? (This is usefull only for BIG cases)!
(y/n) y
Current selection: mpiifort
Current settings:
Use static libraries:
RP RP_LIB(SCALAPACK+PBLAS): -lmkl_intel_lp64 /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_scalapack_lp64.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_sequential.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_blacs_lp64.a /home/soft/mathlib/fftwv215-mpich/lib/libfftw_mpi.a /home/soft/mathlib/fftwv215-mpich/lib/libfftw.a -lmkl /home/soft/intel/mkl/10.1.2.024/lib/em64t/libguide.a
Alternatively, with FFTW linked via -L/-l from a different install prefix:
RP RP_LIB(SCALAPACK+PBLAS): -lmkl_intel_lp64 /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_scalapack_lp64.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_sequential.a /home/soft/intel/mkl/10.1.2.024/lib/em64t/libmkl_blacs_lp64.a -L/data1/soft/lib/lib/ -lfftw_mpi -lfftw -lmkl /data1/soft/intel/mkl/10.0.3.020/lib/em64t/libguide.a
FP FPOPT(par.comp.options): $(FOPT)
MP MPIRUN commando : mpirun -np _NP_ -machinefile _HOSTS_ _EXEC_
Dimension Parameters
The default values can be kept here; on machines with more than 4 GB of memory, the following can be set instead:
PARAMETER (NMATMAX= 30000)
PARAMETER (NUME= 1000)
Proceed to the compilation step:
Compile/Recompile
A Compile all programs (suggested)
Errors occur mainly while building the five MPI-parallel executables, so after compiling, check that the following files exist:
./SRC_lapw0/lapw0_mpi
./SRC_lapw1/lapw1_mpi
./SRC_lapw1/lapw1c_mpi
./SRC_lapw2/lapw2_mpi
./SRC_lapw2/lapw2c_mpi
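The check above can be scripted. This is a minimal sketch, assuming it is given the WIEN2k build directory (the one containing SRC_lapw0 and so on); the function name check_mpi_binaries is hypothetical:

```shell
#!/bin/sh
# check_mpi_binaries: hypothetical helper (an assumption, not shipped with WIEN2k).
# $1 = WIEN2k build directory (defaults to the current directory).
# Reports each of the five MPI executables listed above as "ok" or "MISSING"
# and returns 1 if at least one of them is absent or not executable.
check_mpi_binaries() {
    root=${1:-.}
    status=0
    for f in SRC_lapw0/lapw0_mpi SRC_lapw1/lapw1_mpi SRC_lapw1/lapw1c_mpi \
             SRC_lapw2/lapw2_mpi SRC_lapw2/lapw2c_mpi; do
        if [ -x "$root/$f" ]; then
            echo "ok      $f"
        else
            echo "MISSING $f"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_mpi_binaries ~/WIEN2k_09
```

Any line marked MISSING points at the SRC_* directory whose build output should be inspected.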
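11. Post-installation setup
./userconfig_lapw
editor shall be: vi
Press Enter to accept the defaults for everything else.
Edit .bashrc and comment out the following line:
#ulimit -s unlimited
Edit parallel_options:
setenv WIEN_MPIRUN "mpirun -machinefile _HOSTS_ -np _NP_ _EXEC_"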
12. Configure the web interface
As root, start the Apache service:
service apache2 start
As a regular user, run:
w2web
This starts the WIEN2k web interface on port 7890.
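13. Test case
Serial run:
Using the bundled TiC example:
mkdir TiC
cd TiC
cp ../TiC.struct .
Generate the atomic configuration input:
instgen_lapw
Initialize the case:
init_lapw -b
Run the calculation:
run_lapw
Program output is written to the *.output files; if an error occurs, check TiC.dayfile.
Parallel run:
Check that the parallel environment is configured:
testpara_lapw
Check the progress of a running calculation:
testpara1_lapw
testpara2_lapw
The contents of the .machines file determine whether k-point or MPI parallelism is used:
k-point: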
granularity:1
1:node31:1
1:node31:1
1:node32:1
1:node32:1
lapw0:node31:2 node32:2
extrafine:1
MPI:
granularity:1
1:node31:2
1:node32:2
lapw0:node31:2 node32:2
extrafine:1
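The two layouts differ only in how the "1:host:N" lines are written: k-point parallelism uses one line per process, while MPI parallelism uses one line per node with the process count attached. A minimal sketch that prints either layout, assuming a hypothetical helper named make_machines (it is not a WIEN2k tool):

```shell
#!/bin/sh
# make_machines: hypothetical helper that prints a .machines file to stdout.
#   $1 = "kpoint" or "mpi"
#   $2 = processes per node
#   remaining arguments = host names
make_machines() {
    mode=$1
    ppn=$2
    shift 2
    echo "granularity:1"
    for host in "$@"; do
        if [ "$mode" = "mpi" ]; then
            # one line per node: its processes work together via MPI
            echo "1:$host:$ppn"
        else
            # one line per process: each process handles its own k-points
            n=0
            while [ "$n" -lt "$ppn" ]; do
                echo "1:$host:1"
                n=$((n + 1))
            done
        fi
    done
    # lapw0 is run MPI-parallel in both layouts shown above
    line="lapw0:"
    for host in "$@"; do
        line="$line$host:$ppn "
    done
    echo "${line% }"
    echo "extrafine:1"
}

# Usage (reproduces the two examples above):
#   make_machines kpoint 2 node31 node32 > .machines
#   make_machines mpi    2 node31 node32 > .machines
```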
Run the calculation:
run_lapw -p
14. Submitting jobs through the job scheduler
cat wien2k.pbs
###########################################################################
# #
# Script for submitting parallel wien2k_09 jobs to Dawning cluster. #
# #
###########################################################################
###########################################################################
# Lines that begin with #PBS are PBS directives (not comments).
# True comments begin with "# " (i.e., # followed by a space).
###########################################################################
#PBS -S /bin/bash
#PBS -N TiO2
#PBS -j oe
#PBS -l nodes=1:ppn=8
#PBS -V
#############################################################################
# -S: shell the job will run under
# -N: name of the job
# -j: merges stdout and stderr to the same file
# -l: resources required by the job: number of nodes and processors per node
# -V: exports the submitting shell's environment variables to the job
#############################################################################
#########parallel mode is mpi/kpoint############
PARALLEL=mpi   # parallel mode: "mpi" or "kpoint"
echo $PARALLEL
################################################
NP=`cat ${PBS_NODEFILE} | wc -l`
NODE_NUM=`cat $PBS_NODEFILE|uniq |wc -l`
NP_PER_NODE=`expr $NP / $NODE_NUM`
username=`whoami`
export WIENROOT=/home/users/mjhe/wien2k_09/
export PATH=$PATH:$WIENROOT:.
WIEN2K_RUNDIR=/scratch/${username}.${PBS_JOBID}
export SCRATCH=${WIEN2K_RUNDIR}
# create scratch dir
if [ ! -d "$WIEN2K_RUNDIR" ]; then
mkdir -p "$WIEN2K_RUNDIR"
echo "Scratch directory $WIEN2K_RUNDIR created."
fi
cd $PBS_O_WORKDIR
###############creating .machines################
case $PARALLEL in
mpi)
echo "granularity:1" >.machines
for i in `cat $PBS_NODEFILE |uniq `
do
echo "1:"$i":"$NP_PER_NODE >> .machines
done
printf "lapw0:">> .machines
##### run lapw0 MPI-parallel #####
for i in `cat ${PBS_NODEFILE}|uniq`
do
printf '%s:%s ' "$i" "$NP_PER_NODE" >>.machines
done
#################################
#### if MPI-parallel lapw0 fails for a case, use the following line instead ####
# printf `cat ${PBS_NODEFILE}|uniq|head -1`:1>>.machines
#############end#################
printf "\n" >>.machines
echo "extrafine:1">>.machines
;;
kpoint)
echo "granularity:1" >.machines
for i in `cat $PBS_NODEFILE`
do
echo "1:"$i":"1 >> .machines
done
printf "lapw0:">> .machines
##### run lapw0 MPI-parallel #####
for i in `cat ${PBS_NODEFILE}|uniq`
do
printf '%s:%s ' "$i" "$NP_PER_NODE" >>.machines
done
#################################
#### if MPI-parallel lapw0 fails for a case, use the following line instead ####
# printf `cat ${PBS_NODEFILE}|uniq|head -1`:1>>.machines
#############end#################
printf "\n" >>.machines
echo "extrafine:1">>.machines
;;
esac
#################end creating####################
####### Run the parallel executable "WIEN2K"#########
instgen_lapw
init_lapw -b
clean_lapw -s
echo "##################start time is `date`########################"
run_lapw -p
echo "###################end time is `date`########################"
rm -rf $WIEN2K_RUNDIR
########################END########################
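The parts that usually need to be modified are marked in red.
The script also initializes the case, so a *.struct file must already exist in the working directory.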
15. Performance benchmarks
CB65
Shanghai 2382:16GB 147GB SAS
1000Gb/mpich v1.2.7
TiO2 case:
NMATMAX=30000
Processes  Mode     lapw0  lapw1/lapw2  Time
2          k-point  MPI    k-point      4m44s
4          k-point  MPI    k-point      4m30s
8          k-point  MPI    k-point      6m29s
2          MPI      MPI    MPI          7m53s
4          MPI      MPI    MPI          6m56s
8          MPI      MPI    MPI          9m5s
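To compare the TiO2 timings above, it can help to convert them to seconds and look at ratios. The sketch below assumes times written as XmYs; the helper names to_seconds and speedup are hypothetical:

```shell
#!/bin/sh
# to_seconds: hypothetical helper converting "XmYs" (e.g. 4m44s) to seconds.
to_seconds() {
    t=$1
    min=${t%%m*}      # text before the "m"
    sec=${t#*m}       # text after the "m" ...
    sec=${sec%s}      # ... without the trailing "s"
    echo $(($min * 60 + $sec))
}

# speedup: hypothetical helper printing time($1) / time($2) with two decimals.
# A value above 1 means the second run was faster than the first.
speedup() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# Usage with the k-point timings above:
#   to_seconds 4m44s        # prints 284
#   speedup 4m44s 4m30s     # 2 -> 4 processes, prints 1.05
#   speedup 4m44s 6m29s     # 2 -> 8 processes, prints 0.73
```

In both modes the 8-process run is slower than the 4-process run, so for this case adding processes beyond 4 does not pay off on this hardware.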
Standard benchmark cases:
Benchmark cases provided by the WIEN2k developers:
Serial:
test_case
export OMP_NUM_THREADS=1
time x lapw1 -c
SUM OF WALL CLOCK TIMES: 135.0 (INIT = 1.0 + K-POINTS = 133.9)
export OMP_NUM_THREADS=4
time x lapw1 -c
SUM OF WALL CLOCK TIMES: 62.0 (INIT = 1.0 + K-POINTS = 61.0)
export OMP_NUM_THREADS=8
time x lapw1 -c
SUM OF WALL CLOCK TIMES: 56.2 (INIT = 1.0 + K-POINTS = 55.2)
Parallel:
time x lapw1 -p
test_case
2 kpoint:
test_case.output1: SUM OF WALL CLOCK TIMES: 62.0 (INIT = 1.0 + K-POINTS = 61.0)
test_case.output1_1: SUM OF WALL CLOCK TIMES: 138.5 (INIT = 1.0 + K-POINTS = 137.5)
4 kpoint:
test_case.output1: SUM OF WALL CLOCK TIMES: 62.0 (INIT = 1.0 + K-POINTS = 61.0)
test_case.output1_1: SUM OF WALL CLOCK TIMES: 134.9 (INIT = 1.0 + K-POINTS = 133.9)
mpi-benchmark
2process:
mpi-benchmark.output1_1: TIME HAMILT (CPU) = 134.1, HNS = 116.4, HORB =0.0, DIAG=697.5
mpi-benchmark.output1_1: TOTAL CPU TIME: 950.0 (INIT = 1.9 + K-POINTS = 948.1)
mpi-benchmark.output1_1: SUM OF WALL CLOCK TIMES: 1138.9 (INIT =2.2 + K-POINTS =1136.7)
4process:
mpi-benchmark.output1_1: TIME HAMILT (CPU) = 67.8, HNS = 70.5, HORB = 0.0, DIAG = 420.6
mpi-benchmark.output1_1: TOTAL CPU TIME: 560.7 (INIT = 1.8 + K-POINTS = 558.9)
mpi-benchmark.output1_1: SUM OF WALL CLOCK TIMES: 643.2 (INIT = 2.2 + K-POINTS = 640.9)
8process:
mpi-benchmark.output1_1: TIME HAMILT (CPU) = 40.4, HNS = 44.9, HORB = 0.0, DIAG = 422.0
mpi-benchmark.output1_1: TOTAL CPU TIME: 509.3 (INIT = 1.9 + K-POINTS = 507.4)
mpi-benchmark.output1_1: SUM OF WALL CLOCK TIMES: 614.3 (INIT = 2.2 + K-POINTS = 612.0)
16process:
mpi-benchmark.output1_1: TIME HAMILT (CPU) = 22.6, HNS = 32.5, HORB = 0.0, DIAG = 140.5
mpi-benchmark.output1_1: TOTAL CPU TIME: 197.5 (INIT = 1.9 + K-POINTS = 195.7)
mpi-benchmark.output1_1: SUM OF WALL CLOCK TIMES: 1190.0 (INIT =2.8 + K-POINTS =1187.2)
The timings can be displayed with: grep TIME *output1*
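When only the total wall-clock time per output file is of interest, the grep output can be reduced further. A minimal sketch, assuming the files contain a "SUM OF WALL CLOCK TIMES:" line in the form quoted above; the helper name wall_times is hypothetical:

```shell
#!/bin/sh
# wall_times: hypothetical helper (an assumption, not a WIEN2k command).
# For each file given, prints "<file> <seconds>" taken from its
# "SUM OF WALL CLOCK TIMES:" line; files without such a line print nothing.
wall_times() {
    for f in "$@"; do
        awk -v name="$f" '/SUM OF WALL CLOCK TIMES:/ {
            # drop everything up to and including the label
            sub(/.*SUM OF WALL CLOCK TIMES:[ \t]*/, "")
            # the first remaining field is the total
            print name, $1
        }' "$f"
    done
}

# Usage:
#   wall_times *output1*
```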
16. Miscellaneous
III. Troubleshooting
1. A local scratch directory /scratch must be created on every compute node:
mkdir /scratch
chmod 777 /scratch
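With more than a few nodes, the two commands above can be generated per host instead of typed by hand. The sketch below only prints the commands so they can be reviewed first; the helper name scratch_commands is hypothetical, and it assumes password-less ssh as root to each node:

```shell
#!/bin/sh
# scratch_commands: hypothetical helper that prints, for each host given,
# an ssh command creating /scratch with the permissions used above.
# Nothing is executed here; the output is meant to be inspected and then run.
scratch_commands() {
    for host in "$@"; do
        echo "ssh $host 'mkdir -p /scratch && chmod 777 /scratch'"
    done
}

# Usage (host names as in the .machines examples):
#   scratch_commands node31 node32          # review the commands
#   scratch_commands node31 node32 | sh     # then run them
```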
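2. Before each calculation, the case must first be cleaned and then re-initialized.
3. Other
IV. Miscellaneous
1 Commands, code, and hyperlinks in this document are set in italic, size-5 type.
2 References
2.1 WIEN2k User's Guide, February 5, 2009
2.2 http://www.wien2k.at/reg_user/benchmark/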