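Installing Linpack
Before installing, a few pieces of software have to be in place. The required software and download locations are listed below.
(1) A Linux platform. A distribution with a recent stable kernel is best, for example Red Hat or CentOS.
(2) MPICH2, an MPI implementation for parallel computing. The latest source package is available at http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads.
(3) GotoBLAS. BLAS (Basic Linear Algebra Subprograms) is a collection of subroutines for vector and matrix operations. We chose GotoBLAS here, which is widely regarded as the best-performing implementation; the latest version can be downloaded from http://www.tacc.utexas.edu/tacc-projects/ (registration required).
(4) HPL, the software that runs the Linpack benchmark. The latest version is available at http://www.netlib.org/benchmark/hpl/.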
I. Installing MPICH2
1. Unpack the source package
tar zxvf mpich2-1.1.1p1.tar.gz
cd mpich2-1.1.1p1
Configure with an explicit install directory, then build and install:
./configure --prefix=/root/linpack/mpi --with-pm=smpd --enable-f77
make
make install
2. Set the environment variable
vim ~/.bashrc
PATH="$PATH:/root/linpack/mpi/bin"
source ~/.bashrc
3. Check the environment variable
which smpd
which mpiexec
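Because ~/.bashrc is sourced again for every new shell, a plain append can leave PATH with duplicate entries. A minimal sketch of an idempotent append follows; path_append is a hypothetical helper name, and /root/linpack/mpi/bin is an assumption taken from the --prefix value passed to ./configure.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$PATH:$1" ;;       # not present yet, append it
    esac
}

# Assumed install location: <prefix>/bin for the prefix given to ./configure.
MPI_BIN=/root/linpack/mpi/bin
path_append "$MPI_BIN"
path_append "$MPI_BIN"              # the second call changes nothing
export PATH
```

Placing the function and the single path_append call in ~/.bashrc keeps PATH stable however often the file is sourced.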
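The next two steps are said to set a password that has to be entered during testing, but for some reason the password never took effect for us. A likely explanation is that .mpd.conf belongs to the MPD process manager, whereas this build was configured with --with-pm=smpd.
4. Edit /root/.mpd.conf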
secretword=myword
chmod 600 /root/.mpd.conf
5. Edit /etc/mpd.conf
secretword=myword
chmod 600 /etc/mpd.conf
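Both files hold the same single line and must be readable by their owner only. As a sketch, the helper below (write_mpd_conf, a hypothetical name) writes such a file under a restrictive umask. It is demonstrated on a scratch directory rather than on the real /root/.mpd.conf or /etc/mpd.conf.

```shell
#!/bin/sh
# Hypothetical helper: write "secretword=<word>" to a file that only its owner can read.
write_mpd_conf() {
    conf_file=$1
    secret=$2
    # Create the file under a restrictive umask so that it is never
    # world-readable; the subshell keeps the umask change local.
    ( umask 077 && printf 'secretword=%s\n' "$secret" > "$conf_file" )
    # Also tighten the mode in case the file already existed.
    chmod 600 "$conf_file"
}

# Demonstration on a scratch directory, not on the real configuration files.
demo_dir=$(mktemp -d)
write_mpd_conf "$demo_dir/.mpd.conf" myword
ls -l "$demo_dir/.mpd.conf"
```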
6. Check that the MPICH2 process manager smpd starts
[root@LG01 linpack]# which smpd
/root/linpack/mpi/bin/smpd
[root@LG01 linpack]# smpd -s
[root@LG01 linpack]# ps -ef | grep smpd
Check that MPI works
[root@LG01 linpack]# mpiexec -n 1 hostname
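Note that ps -ef | grep smpd also lists the grep command itself, so a matching line does not prove that smpd is running. One way around this, sketched here under the assumption of a Linux /proc filesystem, is to compare process names exactly; is_running is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: succeed if some process has exactly the given command name.
# Assumes a Linux /proc filesystem, where /proc/<pid>/comm holds that name.
is_running() {
    for comm_file in /proc/[0-9]*/comm; do
        [ -r "$comm_file" ] || continue
        if [ "$(cat "$comm_file" 2>/dev/null)" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

if is_running smpd; then
    echo "smpd is running"
else
    echo "smpd is not running; start it with: smpd -s"
fi
```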
II. Installing GotoBLAS
The package used here is GotoBLAS2-1.13_bsd.tar.gz.
tar -xzvf GotoBLAS2-1.13_bsd.tar.gz
cd GotoBLAS2
vi Makefile.rule
Change the four settings marked "# modified":
#
# Beginning of user configuration
#
# This library's version
VERSION = 1.13
# You can specify the target architecture, otherwise it's
# automatically detected.
TARGET = NEHALEM
# If you want to support multiple architecture in one binary
# DYNAMIC_ARCH = 1
# C compiler including binary type(32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
# modified (sets the C compiler; the marker sits on its own line because make
# keeps whitespace before an inline comment as part of the value)
CC = gcc
# Fortran compiler. Default is g77.
# modified (sets the Fortran compiler)
FC = gfortran
# Even you can specify cross compiler
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran
# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
# modified (64 for a 64-bit Linux operating system)
BINARY=64
# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi threaded, specify USE_THREAD = 1
# USE_THREAD = 0
# If you're going to use this library with OpenMP, please comment it in.
# USE_OPENMP = 1
# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the the script.
# modified (run single-threaded; raise the value for multithreading if needed)
NUM_THREADS = 1
# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1
# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1
# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure(equivalent to "-i8" option).
# INTERFACE64 = 1
# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
# NO_WARMUP = 1
# If you want to disable CPU/Memory affinity on Linux.
# NO_AFFINITY = 1
# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1
# Support for IEEE quad precision(it's *real* REAL*16)( under testing)
# QUAD_PRECISION = 1
# Theads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
# system). Also you can control this mumber by GOTO_THREAD_TIMEOUT
# CCOMMON_OPT += -DTHREAD_TIMEOUT=26
# Using special device driver for mapping physically contigous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1
# If you need to synchronize FP CSR between threads(for x86/x86_64 only).
# CONSISTENT_FPCSR = 1
# If you need santy check by comparing reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1
# Common Optimization Flag; -O2 is enough.
COMMON_OPT += -O2
# Profiling flags
COMMON_PROF = -pg
#
# End of user configuration
#
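The four edits can also be applied without opening an editor. The sketch below uses a hypothetical helper, set_make_var, which rewrites the first line that assigns a given variable (commented out or not) and appends the assignment if none exists. It runs on a scratch file containing a few lines in the style of Makefile.rule, not on the real file.

```shell
#!/bin/sh
# Hypothetical helper: set KEY = VALUE in a makefile-style file.
set_make_var() {
    file=$1
    key=$2
    value=$3
    awk -v key="$key" -v value="$value" '
        # Replace only the first assignment of key, commented out or not.
        !found && $0 ~ ("^#?[ \t]*" key "[ \t]*=") {
            print key " = " value
            found = 1
            next
        }
        { print }
        # No assignment was found: append one at the end.
        END { if (!found) print key " = " value }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Scratch file with a few lines shaped like those in Makefile.rule.
rule_file=$(mktemp)
cat > "$rule_file" <<'EOF'
# CC = gcc
# CC = x86_64-w64-mingw32-gcc
# BINARY=64
# NUM_THREADS = 24
EOF

set_make_var "$rule_file" CC gcc
set_make_var "$rule_file" FC gfortran
set_make_var "$rule_file" BINARY 64
set_make_var "$rule_file" NUM_THREADS 1
cat "$rule_file"
```

The helper writes each value without a trailing comment, which avoids stray whitespace ending up in the variable's value.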
Enter the directory and run: ./quickbuild.64bit
If errors like the following appear:
../kernel/x86_64/gemm_ncopy_4.S:192: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:193: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:194: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:195: Error: undefined symbol `RPREFETCHSIZE' in operation
then run:
gmake clean
make BINARY=64 TARGET=NEHALEM
These errors occur because the CPU is too new for the build configuration to recognize, so the CPU type has to be specified explicitly.
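III. Installing HPL
Enter the hpl directory and copy the Make.<arch> file from the setup directory that is closest to your platform into the hpl top-level directory. Our platform is Intel Xeon, so we chose Make.Linux_PII_FBLAS, which stands for the Linux operating system, the Pentium II platform, and the FBLAS library.
tar xzvf hpl-2.0.tar.gz
vi Make.Linux_PII_FBLAS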
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_PII_FBLAS
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /root/linpack/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /root/linpack/mpi
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /root/linpack/GotoBLAS2
LAinc        =
LAlib        = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the BLAS Fortran 77 interface,
# *) not display detailed timing information.
#
HPL_OPTS     =
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = /root/linpack/mpi/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
# On some platforms, it is necessary to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = /root/linpack/mpi/bin/mpif77
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------
make arch=Linux_PII_FBLAS
Change into bin/Linux_PII_FBLAS under the HPL top-level directory. If HPL.dat and xhpl are both there, the build succeeded.
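The check for the two build products can be scripted. The sketch below uses a hypothetical helper, hpl_build_ok, and assumes the output directory that follows from TOPdir and ARCH in the makefile above. HPL_BIN_DIR is an illustrative override variable, not something HPL defines. HPL names its input file HPL.dat, in capitals.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory holds an executable xhpl
# together with the HPL.dat input file.
hpl_build_ok() {
    [ -x "$1/xhpl" ] && [ -f "$1/HPL.dat" ]
}

# Assumed location: $(TOPdir)/bin/$(ARCH) with the values from the makefile above.
bin_dir=${HPL_BIN_DIR:-/root/linpack/hpl/bin/Linux_PII_FBLAS}
if hpl_build_ok "$bin_dir"; then
    echo "HPL build looks complete in $bin_dir"
else
    echo "xhpl or HPL.dat is missing in $bin_dir"
fi
```

From that directory a run can then be started through mpiexec, for example mpiexec -n 4 ./xhpl, where the process count has to be at least the P x Q grid size set in HPL.dat.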
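The procedure above is the usual way to benchmark an ordinary server, but it did not work well on IBM blade servers. With this build we reached only about 20% of the theoretical peak, whereas a healthy result should be above 50%. In the end we went to Intel's official website and got their own Linpack.
- TLF-SOFT-Intel.C.Plus.Plus.Composer.XE.2011.5.220.LINUX.ISO-TBE.iso (Intel's C/C++ compiler, which includes Intel's math library MKL)
- TLF-SOFT-Intel.Fortran.Composer.XE.2011.5.220.LINUX.ISO-TBE.iso (Intel's Fortran compiler)
- l_mpi_p_4.0.3.008.tgz (Intel's MPI software)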
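The 20% and 50% figures are efficiencies: the measured HPL result divided by the theoretical peak, where the peak is cores x clock frequency x floating-point operations per cycle. A minimal sketch of that arithmetic follows. hpl_efficiency is a hypothetical helper, and all numbers are made up for illustration rather than taken from the blade system described here.

```shell
#!/bin/sh
# Hypothetical helper: HPL efficiency in percent.
# Arguments: measured GFLOPS, core count, clock in GHz, flops per cycle per core.
hpl_efficiency() {
    LC_ALL=C awk -v rmax="$1" -v cores="$2" -v ghz="$3" -v fpc="$4" 'BEGIN {
        rpeak = cores * ghz * fpc            # theoretical peak in GFLOPS
        printf "%.1f\n", 100 * rmax / rpeak
    }'
}

# Made-up numbers: 8 cores at 2.5 GHz with 4 flops per cycle give a peak of
# 80 GFLOPS, so 16 GFLOPS measured is 20% and 40 GFLOPS measured is 50%.
hpl_efficiency 16 8 2.5 4
hpl_efficiency 40 8 2.5 4
```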
Reference: http://book.51cto.com/art/200911/162338.htm