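To begin with: this is my first systematic look at Ubuntu and numerical models, so this post is likely to contain basic mistakes. Corrections are very welcome.
The main purpose of this post is to record the porting process so that I do not forget it the next time I port the model.
By writing down my experience step by step, I also hope to help others who, like me, are starting from zero get the model running quickly.
This post describes the porting steps in detail, together with the problems and pitfalls I ran into. It is rather long-winded, but it has commemorative value for me, so please be kind.
This is my first post. If you spot typos or unclear descriptions, please point them out; I would be glad to discuss and learn together, [email protected].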
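Thanks to our lab's infrastructure work, the lab's small server cluster was put to use, and Ubuntu 18.04 was reinstalled on every node of the cluster. The port therefore started completely from scratch (no prior experience, blank machines).
Special thanks to @creative_peng, @mxj_Bruce and others for their articles, which gave me some understanding before I started and played a key role in the successful port.
The following related articles are listed for reference:
- https://escomp.github.io/CESM/versions/cesm2.1/html/index.html # official documentation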
- https://blog.csdn.net/mayubins/article/details/122190826?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2aggregatepagefirst_rank_ecpm_v1~rank_v31_ecpm-3-122190826.pc_agg_new_rank&utm_term=CESM2%E5%AE%89%E8%A3%85&spm=1000.2123.3001.4430
- https://blog.csdn.net/m0_37388053/article/details/104080143?utm_term=CESM2%E5%AE%89%E8%A3%85&utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2allsobaiduweb~default-2-104080143&spm=3001.4430
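There are many similar articles; thanks again to everyone who shared their experience.
Compute nodes x3, master (management) node x1.
Configuration approach: install the CESM2 main program and its environment files only on the master node, and share them with the compute nodes in real time through NFS shared folders.
PS: the nodes of this platform already have static IPs and can communicate through the switch, i.e. they can ping each other.
First create the storage folders and share them over the LAN
mkdir /app/ # holds the installed environment files
mkdir /BIGDATA/ # holds the CLM main program and CESM2_inputdata
Enable passwordless SSH login between the nodes and set up NFS sharing over the LAN; see: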
https://www.cnblogs.com/xiaohuiji190/p/14973563.html
https://blog.csdn.net/linzhiji/article/details/122539768
https://blog.csdn.net/weixin_32630003/article/details/119447126
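For readers who would rather not follow the links, the NFS part can be sketched roughly as below. This is a minimal sketch and not the exact steps I ran: the subnet `192.168.1.0/24`, the host name `master`, and the helper name `nfs_export_line` are assumptions for illustration, and the export options are just common defaults.

```shell
# Hypothetical helper: print one /etc/exports entry for a shared directory.
# $1 = directory to export, $2 = client subnet (assumed value, adjust to your LAN)
nfs_export_line() {
    printf '%s %s(rw,sync,no_subtree_check)\n' "$1" "$2"
}

# On the master node (needs root; kept as comments so nothing runs by accident):
#   sudo apt-get install nfs-kernel-server
#   nfs_export_line /app     192.168.1.0/24 | sudo tee -a /etc/exports
#   nfs_export_line /BIGDATA 192.168.1.0/24 | sudo tee -a /etc/exports
#   sudo exportfs -ra
# On each compute node ("master" is an assumed host name):
#   sudo apt-get install nfs-common
#   sudo mount -t nfs master:/app /app
#   sudo mount -t nfs master:/BIGDATA /BIGDATA
```

Mounting both folders at the same paths on every node keeps the environment variables used later identical across the cluster.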
Basics such as switching to a domestic package mirror, running update, and setting static IPs are not covered again here.
sudo apt-get install build-essential
sudo apt-get install libc6-dev
sudo apt-get install python
sudo apt-get install cmake
sudo apt-get install libxml2-utils
and so on...
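Before moving on, it can help to check which basic tools are still missing. The snippet below is a small sketch; the helper name `missing_tools` is made up for illustration, and the tool list in the example is only my guess at what the packages above provide.

```shell
# Hypothetical helper: print every command from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
    return 0
}

# Example (xmllint is shipped by libxml2-utils):
#   missing_tools gcc g++ make python cmake xmllint
```

An empty result means every listed command was found.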
Extract the gcc source and enter its directory to build
cd gcc-8.3.0
First install the gcc dependency libraries
sudo apt-get install m4
sudo apt-get install texinfo bison flex
---------------------------------------
./contrib/download_prerequisites # downloads over the network and is a bit slow; once done, enter each folder in turn to configure it.
---------------------------------------
cd gmp-6.1.0
./configure --prefix=/usr/local/gmp-6.1.0 # --prefix sets the install path; not explained again below
make
sudo make install
---------------------------------------
cd ../mpfr-3.1.4
./configure --prefix=/usr/local/mpfr-3.1.4 --with-gmp=/usr/local/gmp-6.1.0
make
sudo make install
---------------------------------------
cd ../mpc-1.0.3
./configure --prefix=/usr/local/mpc-1.0.3 --with-gmp=/usr/local/gmp-6.1.0 --with-mpfr=/usr/local/mpfr-3.1.4
make
sudo make install
---------------------------------------
cd ../isl-0.18
./configure --prefix=/usr/local/isl-0.18 --with-gmp-prefix=/usr/local/gmp-6.1.0
make
sudo make install
Set the environment variable for the dependency libraries
export LD_LIBRARY_PATH=/usr/local/gmp-6.1.0/lib:/usr/local/mpfr-3.1.4/lib:/usr/local/mpc-1.0.3/lib:/usr/local/isl-0.18/lib:$LD_LIBRARY_PATH
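When several library directories are chained by hand, it is easy to lose a `:` separator or a `/lib` suffix. A small helper can build the value instead. This is only a sketch; `join_paths` is a made-up name, and the directories in the usage comment assume the install prefixes chosen above.

```shell
# Hypothetical helper: join directories with ':' and skip empty arguments,
# so an unset variable never leaves a stray or missing separator.
join_paths() (
    out=""
    for dir in "$@"; do
        [ -n "$dir" ] || continue
        out="${out:+$out:}$dir"
    done
    printf '%s\n' "$out"
)

# Usage with the prefixes chosen above:
#   export LD_LIBRARY_PATH="$(join_paths /usr/local/gmp-6.1.0/lib /usr/local/mpfr-3.1.4/lib \
#       /usr/local/mpc-1.0.3/lib /usr/local/isl-0.18/lib "$LD_LIBRARY_PATH")"
```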
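Install gcc itself and make it the default
cd .. # back to the gcc-8.3.0 source root
./configure --prefix=/app/gcc8 --enable-checking=release --enable-languages=c,c++,fortran,obj-c++ --disable-multilib --with-gmp=/usr/local/gmp-6.1.0 --with-mpfr=/usr/local/mpfr-3.1.4 --with-mpc=/usr/local/mpc-1.0.3
**Be sure to enable fortran**
make # very slow; you can try -j4, although some people advise against it. I have not run into problems with it so far.
sudo make install # once this finishes the files are installed; the gcc that came from apt-get install or similar can then be removed.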
Remove the older compilers
sudo apt-get remove gcc
sudo apt-get remove g++
sudo apt-get remove gfortran
Edit the environment variables
vi ~/.bashrc # append the following at the end
### gcc8
export GCC_ROOT=/app/gcc8
export PATH=$GCC_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/app/gcc8/lib64/:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/app/gcc8/lib/:$LD_LIBRARY_PATH
export MANPATH=/app/gcc8/share/man:$MANPATH
:wq # save and quit
source ~/.bashrc # reload the settings
Link the new binaries
sudo rm -rf /usr/bin/gcc
sudo rm -rf /usr/bin/g++
sudo rm -rf /usr/bin/gfortran
sudo ln -s /app/gcc8/bin/gcc /usr/bin/gcc
sudo ln -s /app/gcc8/bin/g++ /usr/bin/g++
sudo ln -s /app/gcc8/bin/gfortran /usr/bin/gfortran
This completes the installation of gcc 8.3
gcc --version # run in the terminal
-----------------------
gcc (GCC) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
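To confirm that the shell really picks up the new compiler rather than a leftover system one, the reported version can be compared with the expected one. The sketch below assumes GNU `sort -V` is available (it is on Ubuntu); the helper name `version_at_least` is made up for illustration.

```shell
# Hypothetical helper: succeed when version $1 is greater than or equal to version $2.
# Relies on GNU sort -V for version ordering.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example checks (-dumpfullversion prints the full x.y.z version on gcc 7 and later):
#   version_at_least "$(gcc -dumpfullversion)" 8.3.0 && echo "gcc OK"
#   version_at_least "$(gfortran -dumpfullversion)" 8.3.0 && echo "gfortran OK"
```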
Enter the MPICH source directory and build
./configure --prefix=/app/mpich3
make -j4
sudo make install
Edit the environment
vi ~/.bashrc
## MPICH3
export MPI_ROOT=/app/mpich3
export PATH=$MPI_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/app/mpich3/lib/:$LD_LIBRARY_PATH
export MANPATH=$MPI_ROOT/man:$MANPATH
:wq
source ~/.bashrc
Once MPI is configured you can try a parallel test across the cluster; see @海岛Blog
https://blog.csdn.net/tigerisland45/article/details/53893350
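A cluster-wide test needs a host file that tells MPICH which nodes to use. The following is a rough sketch: `write_hostfile` is a made-up helper, the node names `node1` to `node3` are assumptions, and 24 processes per node simply mirrors the per-node task limit used later in this post.

```shell
# Hypothetical helper: write an MPICH (Hydra) host file with one "host:slots" line per node.
# $1 = output file, $2 = processes per node, remaining arguments = host names
write_hostfile() (
    file=$1
    slots=$2
    shift 2
    : > "$file"
    for host in "$@"; do
        printf '%s:%s\n' "$host" "$slots" >> "$file"
    done
)

# Example (node1..node3 are assumed host names for the three compute nodes):
#   write_hostfile hosts.txt 24 node1 node2 node3
#   mpiexec -f hosts.txt -n 72 hostname
```

If every node name shows up in the output of the last command, passwordless SSH and the MPICH installation are both reachable from the master node.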
cd zlib
./configure --prefix=/app/netcdf
make
sudo make install
cd curl
./configure --prefix=/app/netcdf
make -j4
sudo make install
Edit the environment
vi ~/.bashrc
## netcdf
export NETCDF_ROOT=/app/netcdf
export PATH=$NETCDF_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/app/netcdf/lib/:$LD_LIBRARY_PATH
export MANPATH=/app/netcdf/share/man:$MANPATH
:wq
source ~/.bashrc
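!!! Before installing HDF5, PnetCDF and NetCDF, set up the compiler environment first, and do not change the order !!!
Some of the settings below may be unnecessary, but doing them anyway guards against files not being found.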
sudo rm -rf /usr/bin/mpicc
sudo rm -rf /usr/bin/mpicxx
sudo rm -rf /usr/bin/mpif77
sudo rm -rf /usr/bin/mpif90
sudo rm -rf /usr/bin/mpifort
sudo ln -s /app/mpich3/bin/mpicc /usr/bin/mpicc
sudo ln -s /app/mpich3/bin/mpicxx /usr/bin/mpicxx
sudo ln -s /app/mpich3/bin/mpif77 /usr/bin/mpif77
sudo ln -s /app/mpich3/bin/mpif90 /usr/bin/mpif90
sudo ln -s /app/mpich3/bin/mpifort /usr/bin/mpifort
export CC=/app/mpich3/bin/mpicc
export CXX=/app/mpich3/bin/mpicxx
export FC=/app/mpich3/bin/mpifort
export F77=/app/mpich3/bin/mpifort
export F90=/app/mpich3/bin/mpifort
cd hdf5
./configure --prefix=/app/netcdf --with-zlib=/app/netcdf --enable-fortran --enable-fortran2003 --enable-parallel --with-pic
make -j4
sudo make install
source ~/.bashrc # zlib, curl, hdf5, netcdf-c and netcdf-f all share one install folder, so a plain source is enough
cd pnetcdf
./configure --prefix=/app/pnetcdf
make -j4
sudo make install
Edit the environment
vi ~/.bashrc
## pnetcdf
export Pnetcdf_ROOT=/app/pnetcdf
export PATH=$Pnetcdf_ROOT/bin:$PATH
export LD_LIBRARY_PATH=/app/pnetcdf/lib/:$LD_LIBRARY_PATH
export MANPATH=/app/pnetcdf/share/man:$MANPATH
:wq
source ~/.bashrc
cd netcdf-c
CFLAGS="-O3 -fPIC -I/app/netcdf/include" CPPFLAGS="-O3 -fPIC -I/app/netcdf/include" FFLAGS="-O3 -fPIC" LDFLAGS=-L/app/netcdf/lib ./configure --prefix=/app/netcdf --enable-static --enable-shared --enable-netcdf4 --enable-largefile --enable-large-file-tests --enable-diskless --enable-mmap --with-zlib=/app/netcdf
make -j4
sudo make install
On success you will see:
+-------------------------------------------------------------+
| Congratulations! You have successfully installed netCDF! |
| |
| You can use script "nc-config" to find out the relevant |
| compiler options to build your application. Enter |
| |
| nc-config --help |
| |
| for additional information. |
| |
| CAUTION: |
| |
| If you have not already run "make check", then we strongly |
| recommend you do so. It does not take very long. |
| |
| Before using netCDF to store important data, test your |
| build with "make check". |
| |
| NetCDF is tested nightly on many platforms at Unidata |
| but your platform is probably different in some ways. |
| |
| If any tests fail, please see the netCDF web site: |
| http://www.unidata.ucar.edu/software/netcdf/ |
| |
| NetCDF is developed and maintained at the Unidata Program |
| Center. Unidata provides a broad array of data and software |
| tools for use in geoscience education and research. |
| http://www.unidata.ucar.edu |
+-------------------------------------------------------------+
cd netcdf-f
./configure --prefix=/app/netcdf --with-netCDF=/app/netcdf --enable-pnetcdf --disable-shared CPPFLAGS="-I/app/netcdf/include" LDFLAGS="-L/app/netcdf/lib"
make -j4
sudo make install
Finally, at last..
+-------------------------------------------------------------+
| Congratulations! You have successfully installed the netCDF |
| Fortran libraries. |
| |
| You can use script "nf-config" to find out the relevant |
| compiler options to build your application. Enter |
| |
| nf-config --help |
| |
| for additional information. |
| |
| CAUTION: |
| |
| If you have not already run "make check", then we strongly |
| recommend you do so. It does not take very long. |
| |
| Before using netCDF to store important data, test your |
| build with "make check". |
| |
| NetCDF is tested nightly on many platforms at Unidata |
| but your platform is probably different in some ways. |
| |
| If any tests fail, please see the netCDF web site: |
| http://www.unidata.ucar.edu/software/netcdf/ |
| |
| NetCDF is developed and maintained at the Unidata Program |
| Center. Unidata provides a broad array of data and software |
| tools for use in geoscience education and research. |
| http://www.unidata.ucar.edu |
+-------------------------------------------------------------+
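Beyond the banner, the `nc-config` and `nf-config` scripts it mentions can report what was actually built. The helper below is a small sketch with a made-up name (`has_feature`); it only wraps the yes/no answers these tools print.

```shell
# Hypothetical helper: succeed when a config tool answers "yes" for a feature flag.
# $1 = tool (for example nc-config or nf-config), $2 = flag (for example --has-nc4)
has_feature() {
    [ "$("$1" "$2" 2>/dev/null)" = "yes" ]
}

# Example checks after installation:
#   has_feature nc-config --has-nc4  || echo "netCDF-4 support missing"
#   has_feature nc-config --has-hdf5 || echo "HDF5 support missing"
#   nf-config --flibs    # link flags for Fortran programs
```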
10. Install the BLAS, CBLAS and LAPACK libraries
http://www.netlib.org/lapack/index.html#_lapack_version_3_5_0
cd lapack-3.5.0
Inside the extracted folder, copy make.inc.example to make.inc
cp make.inc.example make.inc
Then edit the Makefile as shown below: the comment that was originally on the second line moves to the first line
#lib: lapacklib tmglib
lib: blaslib variants lapacklib tmglib
CFLAGS = -O3 -I/home/hanlzh/Desktop/lapack-3.5.0/INCLUDE -fno-stack-protector
# I built this on the Desktop, hence the Desktop path above
:wq
##
If make reports:
--------------------------------------------------------
NEP: Testing Nonsymmetric Eigenvalue Problem routines
./EIG/xeigtstz < nep.in > znep.out 2>&1
Makefile:463: recipe for target 'znep.out' failed
make[1]: *** [znep.out] Error 139
make[1]: Leaving directory '/home/xfbupt/project/other/lapack-3.7.1/TESTING'
Makefile:42: recipe for target 'lapack_testing' failed
make: *** [lapack_testing] Error 2
--------------------------------------------------------
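This is most likely a failure in the tests; the build itself is essentially complete.
In that case, just run the commands below to raise the stack size limit, and the build will succeed.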
ulimit -s 100000
make clean
make
--------------------------------------------------------
Copy the three generated libraries liblapack.a, librefblas.a and libtmglib.a to the target location
sudo cp liblapack.a /app/lapack-3.5.0/liblapack.a
sudo cp librefblas.a /app/lapack-3.5.0/libblas.a # note the file name; requirements differ slightly between systems
sudo cp libtmglib.a /app/lapack-3.5.0/libtmglib.a
--------------------------------------------------------
vi ~/.bashrc
## lapack-3.5.0
export PATH=/app/lapack-3.5.0/lapacke/include:$PATH
export LD_LIBRARY_PATH=/app/lapack-3.5.0/:$LD_LIBRARY_PATH
:wq
source ~/.bashrc
git clone https://github.com/esmf-org/esmf.git
cp -r esmf /app/ # copy the cloned esmf directory from the current folder
vi ~/.bashrc # edit ~/.bashrc and add the following
#---------------ESMF environment variables begin-----------
export ESMF_DIR=/app/esmf
export ESMF_BOPT=g
export ESMF_COMM=mpiuni
export ESMF_COMPILER=gfortran
export ESMF_ABI=64
export ESMF_INSTALL_PREFIX=/app/esmf/esmf_install
#export ESMF_NETCDF=($your path )/esmf/netcdf
export ESMF_NETCDF_INCLUDE=/app/netcdf/include
#export ESMF_NETCDF_LIBPATH=($your path )/netcdf-4.6.1/lib
export ESMF_NETCDF_LIBPATH=/app/netcdf/lib
export ESMF_NETCDF_LIBS="-lnetcdf -lnetcdff"
export ESMF_OS=Linux
export ESMF_TESTMPMD=ON
export ESMF_PTHREADS=ON
export ESMF_OPENMP=ON
export ESMF_TESTEXHAUSTIVE=ON
export ESMF_TESTHARNESS_ARRAY=RUN_ESMF_TestHarnessArrayUNI_2
export ESMF_TESTHARNESS_FIELD=RUN_ESMF_TestHarnessFieldUNI_1
export ESMF_NO_INTEGER_1_BYTE=FALSE
export ESMF_NO_INTEGER_2_BYTE=FALSE
export ESMF_FORTRANSYMBOLS=default
export ESMF_DEFER_LIB_BUILD=ON
export ESMF_TESTWITHTHREADS=OFF
export ESMF_CXXCOMPILER=g++
export ESMF_CXXLINKER=g++
export ESMF_F90COMPILER=gfortran
export ESMF_F90LINKER=gfortran
export ESMF_INSTALL_BINDIR=bin/bing/Linux.gfortran.64.mpiuni.default
export ESMF_INSTALL_MODDIR=mod/modg/Linux.gfortran.64.mpiuni.default
export ESMF_INSTALL_LIBDIR=lib/libg/Linux.gfortran.64.mpiuni.default
export ESMF_INSTALL_HEADERDIR=include
export ESMF_INSTALL_DOCDIR=doc
export ESMFBIN_PATH=/app/esmf/bin/bing/Linux.gfortran.64.mpiuni.default
export ESMFLIB_PATH=/app/esmf/lib/libg/Linux.gfortran.64.mpiuni.default
export MPIEXEC=/app/mpich3/bin/mpiexec
export MY_ESMF_REGRID=/app/esmf/bin/bing/Linux.gfortran.64.mpiuni.default/ESMF_RegridWeightGen
#----------ESMF environment variables end-------------
:wq
source ~/.bashrc
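sudo chmod -R 777 /app/esmf # grant permissions on the folder, otherwise later steps cannot create files
make # do not use sudo make, otherwise the environment variables from bashrc are not read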
make check
make install
----------------------------
ESMF installation complete.
----------------------------
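The long `Linux.gfortran.64.mpiuni.default` directory name used above is just the ESMF settings joined by dots, so it can be composed rather than typed by hand. A minimal sketch, with a made-up helper name (`esmf_libdir`); the site value `default` is taken from the paths above, not from an exported variable.

```shell
# Hypothetical helper: compose the ESMF library directory from its parts.
# Arguments: install prefix, BOPT (g or O), OS, compiler, ABI, COMM, site
esmf_libdir() {
    printf '%s/lib/lib%s/%s.%s.%s.%s.%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# With the values exported above:
#   libdir=$(esmf_libdir "$ESMF_INSTALL_PREFIX" "$ESMF_BOPT" "$ESMF_OS" \
#       "$ESMF_COMPILER" "$ESMF_ABI" "$ESMF_COMM" default)
#   ls "$libdir"    # I would expect the ESMF library and esmf.mk here after make install
```

The same directory is what the ESMF_LIBDIR entry of the compiler settings needs to point to.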
At this point, most of the CESM2 environment is configured.
First download the CESM2 framework
See the official guide: https://escomp.github.io/CESM/versions/cesm2.1/html/downloading_cesm.html
git clone -b release-cesm2.1.3 https://github.com/ESCOMP/CESM.git my_cesm_sandbox
cd my_cesm_sandbox
./manage_externals/checkout_externals
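# At first this step hung for me; I later learned that you can ask a classmate abroad to download it for you
Copy the main program to /BIGDATA and configure the machine
sudo cp -r /my_cesm_sandbox /BIGDATA/clm5.0 # renamed to suit my own habits
cd /BIGDATA/clm5.0/cime/config/cesm/machines # machine parameters for the model framework
Here, config_batch.xml holds the batch (job submission) system settings, config_compilers.xml holds the compiler settings, and config_machines.xml holds key information such as paths.
config_batch.xml
Add the following with vi: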
<!-- hanlzh -->
<batch_system MACH="lzh" type="none">
</batch_system>
config_compilers.xml
Settings:
<compiler MACH="lzh" COMPILER="gnu">
<CFLAGS>
<append DEBUG="FALSE"> -O2 </append>
</CFLAGS>
<CONFIG_ARGS>
<append> --host=Linux </append>
</CONFIG_ARGS>
<CPPDEFS>
<append> -DLINUX -DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU </append>
</CPPDEFS>
<FFLAGS>
<append DEBUG="FALSE"> -O2 </append>
<append> -ffree-line-length-none </append>
</FFLAGS>
<MPICC> mpicc </MPICC>
<MPICXX> mpicxx </MPICXX>
<MPIFC> mpif90 </MPIFC>
<SCC> gcc </SCC>
<SFC> gfortran </SFC>
<ESMF_LIBDIR>/app/esmf/esmf_install/lib/libg/Linux.gfortran.64.mpiuni.default</ESMF_LIBDIR>
<MPI_LIB_NAME>mpich</MPI_LIB_NAME>
<MPI_PATH>/app/mpich3</MPI_PATH>
<NETCDF_PATH>/app/netcdf</NETCDF_PATH>
<PNETCDF_PATH>/app/pnetcdf</PNETCDF_PATH>
<LAPACK_LIBDIR>/app/lapack-3.5.0</LAPACK_LIBDIR>
<SLIBS>
<append>-L/app/netcdf/lib -lnetcdf -lnetcdff -lhdf5 -lhdf5_hl -lz -lpnetcdf -L/app/lapack-3.5.0 -llapack -lblas</append>
</SLIBS>
</compiler>
</config_compilers>
config_machines.xml
Settings:
<machine MACH="lzh">
<DESC> Ubuntu gcc8.3 mpich3 </DESC>
<NODENAME_REGEX>regex.expression.matching.your.machine</NODENAME_REGEX>
<OS>LINUX</OS>
<PROXY> https://howto.get.out </PROXY>
<COMPILERS>gnu</COMPILERS>
<MPILIBS>mpich</MPILIBS>
<PROJECT>none</PROJECT>
<SAVE_TIMING_DIR> </SAVE_TIMING_DIR>
<CIME_OUTPUT_ROOT>/BIGDATA/clm5.0/cime/scripts/$CASE</CIME_OUTPUT_ROOT>
<DIN_LOC_ROOT>/BIGDATA/cesm/inputdata</DIN_LOC_ROOT>
<DIN_LOC_ROOT_CLMFORC>/BIGDATA/cesm/inputdata/atm/datm7</DIN_LOC_ROOT_CLMFORC>
<DOUT_S_ROOT>/BIGDATA/clm5.0/cime/scripts/output/$CASE</DOUT_S_ROOT>
<BASELINE_ROOT>/BIGDATA/clm5.0/cime/scripts/$CASE</BASELINE_ROOT>
<CCSM_CPRNC>/BIGDATA/clm5.0/cime/scripts/$CASE</CCSM_CPRNC>
<GMAKE>make</GMAKE>
<GMAKE_J>4</GMAKE_J>
<BATCH_SYSTEM>none</BATCH_SYSTEM>
<SUPPORTED_BY>[email protected]</SUPPORTED_BY>
<MAX_TASKS_PER_NODE>24</MAX_TASKS_PER_NODE>
<MAX_MPITASKS_PER_NODE>24</MAX_MPITASKS_PER_NODE>
<PROJECT_REQUIRED>FALSE</PROJECT_REQUIRED>
<mpirun mpilib="default">
<executable>mpirun</executable>
<arguments>
<arg name="ntasks"> -n {{ total_tasks }} </arg>
</arguments>
</mpirun>
<module_system type="none"/>
<environment_variables>
<env name="OMP_STACKSIZE">256M</env>
<env name="NETCDF_PATH">/app/netcdf/</env>
</environment_variables>
<resource_limits>
<resource name="RLIMIT_STACK">-1</resource>
</resource_limits>
</machine>
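Since the machine name has to appear in all three files, a quick consistency check can save a failed create_newcase later. The helper below is a sketch with a made-up name (`files_missing_machine`); it only greps for the `MACH="..."` attribute and does not validate the XML itself.

```shell
# Hypothetical helper: print each file that has no MACH="<name>" entry.
# $1 = machine name, remaining arguments = XML files to check
files_missing_machine() (
    name=$1
    shift
    for file in "$@"; do
        grep -q "MACH=\"$name\"" "$file" || printf '%s\n' "$file"
    done
    exit 0
)

# Example, run inside cime/config/cesm/machines:
#   files_missing_machine lzh config_batch.xml config_compilers.xml config_machines.xml
#   xmllint --noout config_machines.xml    # well-formedness check (from libxml2-utils)
```

No output from the first command means the machine entry is present in every file.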
The inputdata and other related paths must be created (mkdir) before the model is run
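As a rough sketch, the directories named in the machine settings can be created in one go. The helper name `make_case_dirs` is made up, and the two paths are simply the DIN_LOC_ROOT_CLMFORC and DOUT_S_ROOT parents from the settings above; adjust them if your settings differ.

```shell
# Hypothetical helper: create the data directories referenced in config_machines.xml.
# $1 = root directory (/BIGDATA in this post)
make_case_dirs() {
    mkdir -p "$1/cesm/inputdata/atm/datm7" \
             "$1/clm5.0/cime/scripts/output"
}

# Example (may need sudo, depending on who owns /BIGDATA):
#   make_case_dirs /BIGDATA
```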
cd /BIGDATA/clm5.0/cime/scripts
./create_newcase --case mycase1 --res f09_g16 --compset I2000Clm50BgcCru --mach lzh --run-unsupported
# Creates a case named mycase1 at resolution f09_g16 with the I2000Clm50BgcCru compset on machine lzh.
cd mycase1
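./case.setup # set up the case once env_mach_pes.xml and env_run.xml are adjusted.
./case.build --skip-provenance-check # build the case after setting the output variables, output frequency and time interval.
This step takes a long time and most errors show up here. If an error occurs, check the log file it points to; the vast majority of errors come from missing libraries or read/write permissions.
# e.g. sudo ln -s /usr/lib/x86_64-linux-gnu/libmpfr.so.6 /usr/lib/x86_64-linux-gnu/libmpfr.so.4
./check_input_data # check the input data the case needs; anything missing can be downloaded and placed in the configured folder.
./case.submit # submit and run the job.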
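When repeating these steps for several cases, a small wrapper that stops at the first failing step makes it easier to see where things went wrong. This is only a sketch of that idea, not part of CIME: `run_case_steps` is a made-up name, and it runs the same four commands in the same order as above.

```shell
# Hypothetical helper: run the case steps in order and stop at the first failure.
# $1 = case directory (for example /BIGDATA/clm5.0/cime/scripts/mycase1)
run_case_steps() (
    cd "$1" || exit 1
    for step in "./case.setup" "./case.build --skip-provenance-check" \
                "./check_input_data" "./case.submit"; do
        echo "== $step"
        # $step is left unquoted on purpose so that the build flag is split off as an argument
        $step || { echo "failed: $step" >&2; exit 1; }
    done
)

# Example:
#   run_case_steps /BIGDATA/clm5.0/cime/scripts/mycase1
```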