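Linpack Installation Process

Before installing Linpack, some software has to be prepared. The packages and where to download them are listed below.

1. Linux platform: a distribution with the latest stable kernel works best, such as Red Hat or CentOS.

2. MPICH2: software for parallel computing. The latest source package can be downloaded from http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads.

3. GotoBLAS: BLAS (Basic Linear Algebra Subprograms) is a collection of subroutines for vector and matrix operations. Here we chose GotoBLAS, widely regarded as the best-performing implementation. The latest version can be downloaded from http://www.tacc.utexas.edu/tacc-projects/ (registration required).

4. HPL: the Linpack benchmark software. The latest version can be downloaded from http://www.netlib.org/benchmark/hpl/.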
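Before building anything, it can be worth confirming that the basic tools are installed. The shell function below is a minimal sketch: missing_tools is a hypothetical helper name, and the tool list in the example (gcc, gfortran, make, tar) is our assumption, based on the compilers configured later in Makefile.rule rather than on the list above.

```sh
#!/bin/sh
# missing_tools CMD...
# Print each command name that cannot be found on PATH, one per line;
# print nothing if all of them are available.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example (the tool list is an assumption, not from the original article):
#   missing_tools gcc gfortran make tar
```

An empty result means every listed tool was found.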

I. Installing MPICH2

1. Extract the package

tar zxvf mpich2-1.1.1p1.tar.gz

cd mpich2-1.1.1p1

Configure with the installation directory (--prefix), then build and install:

./configure --prefix=/root/linpack/mpi --with-pm=smpd --enable-f77

make

make install

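2. Configure the environment variables

vim ~/.bashrc

Add the bin directory under the --prefix used above:

export PATH="$PATH:/root/linpack/mpi/bin"

source ~/.bashrc

3. Test the environment variables

which smpd

which mpiexec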
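Each time ~/.bashrc is sourced, a plain PATH=... line appends the MPICH2 bin directory again. A minimal sketch of a guard that appends a directory only when it is not already on PATH (path_append is a hypothetical helper name, not part of MPICH2):

```sh
#!/bin/sh
# path_append DIR
# Append DIR to PATH unless it is already one of PATH's entries, so that
# sourcing ~/.bashrc repeatedly does not keep lengthening PATH.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                       # already present: leave PATH alone
        *) PATH="${PATH:+$PATH:}$1" ;;     # append, avoiding a leading ':'
    esac
    export PATH
}

# Example line for ~/.bashrc (after the function definition):
#   path_append /root/linpack/mpi/bin
```

Wrapping PATH in colons before matching ensures only whole entries match, so /root/linpack/mpi/bin is not mistaken for a prefix of some longer path.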

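The next two steps reportedly set the password that is asked for during testing, but for some reason the password did not take effect.

4. Edit /root/.mpd.conf and add:

secretword=myword

chmod 600 /root/.mpd.conf

5. Edit /etc/mpd.conf and add:

secretword=myword

chmod 600 /etc/mpd.conf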
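One plausible explanation for the password having no effect, offered as an assumption rather than something verified here: the secretword entries in .mpd.conf are read by the mpd process manager, but MPICH2 was configured above with --with-pm=smpd. Our understanding is that smpd instead reads a line of the form phrase=<passphrase> from ~/.smpd, which must likewise be mode 600; check the MPICH2 documentation for your version. The sketch below writes either kind of file with the right permissions (write_secret_file is a hypothetical helper name):

```sh
#!/bin/sh
# write_secret_file FILE KEY VALUE
# Write the single line KEY=VALUE to FILE and make FILE readable and
# writable by its owner only (mode 600).
write_secret_file() {
    file=$1 key=$2 value=$3
    # A new file is created under a restrictive umask; chmod also fixes
    # the mode of a file that already existed with looser permissions.
    ( umask 077 && printf '%s=%s\n' "$key" "$value" > "$file" )
    chmod 600 "$file"
}

# Examples:
#   write_secret_file /root/.mpd.conf secretword myword   # mpd, as in steps 4-5
#   write_secret_file ~/.smpd phrase myword               # smpd (assumed format)
```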

6. Check whether the MPICH2 process manager smpd is running

[root@LG01 linpack]# which smpd

/root/linpack/mpi/bin/smpd

[root@LG01 linpack]# smpd -s

[root@LG01 linpack]# ps -ef | grep smpd

Test whether MPI works:

[root@LG01 linpack]# mpiexec -n 1 hostname

II. Installing GotoBLAS (GotoBLAS2-1.13_bsd.tar.gz)

#tar -xzvf GotoBLAS2-1.13_bsd.tar.gz

#cd GotoBLAS2

#vi Makefile.rule

Change the four settings on the lines marked "# modified":

#
#  Beginning of user configuration
#

# This library's version
VERSION = 1.13

# You can specify the target architecture, otherwise it's
# automatically detected.
# TARGET = PENRYN        (TARGET = NEHALEM was needed on our machines; see the build error below)

# If you want to support multiple architecture in one binary
# DYNAMIC_ARCH = 1

# C compiler including binary type (32bit / 64bit). Default is gcc.
# Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
CC = gcc                 # modified (set the C compiler)

# Fortran compiler. Default is g77.
FC = gfortran            # modified (set the Fortran compiler)

# Even you can specify cross compiler
# CC = x86_64-w64-mingw32-gcc
# FC = x86_64-w64-mingw32-gfortran

# If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
BINARY=64                # modified (64-bit Linux OS)

# About threaded BLAS. It will be automatically detected if you don't
# specify it.
# For force setting for single threaded, specify USE_THREAD = 0
# For force setting for multi  threaded, specify USE_THREAD = 1
# USE_THREAD = 0

# If you're going to use this library with OpenMP, please comment it in.
# USE_OPENMP = 1

# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by the script.
NUM_THREADS = 1          # modified (run single-threaded; can be set higher if needed)

# If you don't need CBLAS interface, please comment it in.
# NO_CBLAS = 1

# If you want to use legacy threaded Level 3 implementation.
# USE_SIMPLE_THREADED_LEVEL3 = 1

# If you want to drive whole 64bit region by BLAS. Not all Fortran
# compiler supports this. It's safe to keep comment it out if you
# are not sure (equivalent to "-i8" option).
# INTERFACE64 = 1

# Unfortunately most of kernel won't give us high quality buffer.
# BLAS tries to find the best region before entering main function,
# but it will consume time. If you don't like it, you can disable one.
# NO_WARMUP = 1

# If you want to disable CPU/Memory affinity on Linux.
# NO_AFFINITY = 1

# If you would like to know minute performance report of GotoBLAS.
# FUNCTION_PROFILE = 1

# Support for IEEE quad precision (it's *real* REAL*16) (under testing)
# QUAD_PRECISION = 1

# Threads are still working for a while after finishing BLAS operation
# to reduce thread activate/deactivate overhead. You can determine
# time out to improve performance. This number should be from 4 to 30
# which corresponds to (1 << n) cycles. For example, if you set to 26,
# thread will be running for (1 << 26) cycles (about 25ms on 3.0GHz
# system). Also you can control this number by GOTO_THREAD_TIMEOUT
# CCOMMON_OPT  += -DTHREAD_TIMEOUT=26

# Using special device driver for mapping physically contiguous memory
# to the user space. If bigphysarea is enabled, it will use it.
# DEVICEDRIVER_ALLOCATION = 1

# If you need to synchronize FP CSR between threads (for x86/x86_64 only).
# CONSISTENT_FPCSR = 1

# If you need sanity check by comparing reference BLAS. It'll be very
# slow (Not implemented yet).
# SANITY_CHECK = 1

# Common Optimization Flag; -O2 is enough.
COMMON_OPT += -O2

# Profiling flags
COMMON_PROF = -pg

#
#  End of user configuration
#
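Because Makefile.rule is mostly comments, it is easy to lose track of which settings are actually active. A minimal sketch (active_settings is a hypothetical helper name) that prints only the uncommented, non-blank lines, so the four modified settings can be checked at a glance:

```sh
#!/bin/sh
# active_settings FILE
# Print only the lines of a Makefile-style file that take effect:
# skip blank lines and lines whose first non-blank character is '#'.
active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Example, run in the GotoBLAS2 directory:
#   active_settings Makefile.rule
```

With the edits above, the output should list VERSION, CC, FC, BINARY, NUM_THREADS, COMMON_OPT and COMMON_PROF.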

Then, in the GotoBLAS2 directory, run: ./quickbuild.64bit

If errors like the following appear:

../kernel/x86_64/gemm_ncopy_4.S:192: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:193: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:194: Error: undefined symbol `RPREFETCHSIZE' in operation
../kernel/x86_64/gemm_ncopy_4.S:195: Error: undefined symbol `RPREFETCHSIZE' in operation

then run:

gmake clean

make BINARY=64 TARGET=NEHALEM

These errors occur because the CPU is too new for the configuration scripts to recognize, so the CPU type has to be specified explicitly with TARGET.
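If the build is run non-interactively, the RPREFETCHSIZE failure can be detected from a saved log. The sketch below (needs_explicit_target and build.log are hypothetical names) looks for that error and, in the usage comment, falls back to the explicit-TARGET rebuild; NEHALEM is the CPU type used in this article and should be replaced with your own.

```sh
#!/bin/sh
# needs_explicit_target LOGFILE
# Succeeds (exit status 0) if the GotoBLAS2 build log contains the
# "undefined symbol `RPREFETCHSIZE'" assembler error, i.e. the CPU type was
# not recognized and TARGET has to be set explicitly.
needs_explicit_target() {
    grep -q "undefined symbol \`RPREFETCHSIZE'" "$1"
}

# Example use (build.log is a hypothetical file name):
#   ./quickbuild.64bit 2>&1 | tee build.log
#   if needs_explicit_target build.log; then
#       gmake clean
#       make BINARY=64 TARGET=NEHALEM
#   fi
```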

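III. Installing HPL

Extract HPL and enter its directory. The Make file below sets TOPdir to /root/linpack/hpl, so either rename the extracted hpl-2.0 directory to hpl or change TOPdir to match. From the setup directory, copy the Make.<arch> file closest to your platform into the top-level HPL directory. Our platform is Intel Xeon, so we chose Make.Linux_PII_FBLAS, which stands for Linux OS, PII platform, and the FBLAS library.

tar xzvf hpl-2.0.tar.gz

cp setup/Make.Linux_PII_FBLAS .

vi Make.Linux_PII_FBLAS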

#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -s
MKDIR        = mkdir
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = Linux_PII_FBLAS
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = /root/linpack/hpl
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir        = /root/linpack/mpi
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpich.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /root/linpack/GotoBLAS2
LAinc        =
LAlib        = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the structure is  of the form:
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     =
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC           = /root/linpack/mpi/bin/mpicc
CCNOOPT      = $(HPL_DEFS)
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = /root/linpack/mpi/bin/mpif77
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------

make arch=Linux_PII_FBLAS

Go to $(TOPdir)/bin/Linux_PII_FBLAS (here /root/linpack/hpl/bin/Linux_PII_FBLAS). If HPL.dat and xhpl are there, the installation succeeded.
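The HPL.dat file next to xhpl holds the run parameters; the problem size N and the process grid P x Q matter most for the result. The sketch below uses a common rule of thumb that is not from the original article: size the N x N double-precision matrix (8 x N x N bytes) to about 80% of total memory, round N down to a multiple of the block size NB, and pick P x Q close to square with P <= Q. hpl_n and hpl_grid are hypothetical helper names; memory is given in MB and should be the total memory of all nodes taking part in the run.

```sh
#!/bin/sh
# hpl_n MEM_MB NB [PERCENT]
# Suggest a problem size N for HPL.dat. The N x N matrix of doubles needs
# 8*N*N bytes, so N = sqrt(PERCENT% of MEM_MB / 8), then rounded down to a
# multiple of the block size NB. PERCENT defaults to 80 (rule of thumb).
hpl_n() {
    awk -v m="$1" -v nb="$2" -v p="${3:-80}" 'BEGIN {
        n = int(sqrt(m * 1048576 * p / 100 / 8))  # doubles in PERCENT% of memory, then sqrt
        n -= n % nb                                # round down to a multiple of NB
        print n
    }'
}

# hpl_grid NPROCS
# Suggest "P Q" with P * Q = NPROCS and P <= Q, choosing the largest such P
# (the factorization closest to a square grid).
hpl_grid() {
    n=$1 p=1 i=1
    while [ $((i * i)) -le "$n" ]; do
        if [ $((n % i)) -eq 0 ]; then
            p=$i
        fi
        i=$((i + 1))
    done
    echo "$p $((n / p))"
}

# Example (assumed values: one node with 16 GB of RAM, NB = 192, 8 MPI processes):
#   hpl_n 16384 192    # prints 41280
#   hpl_grid 8         # prints "2 4"
```

The printed values go into the Ns, Ps and Qs lines of HPL.dat; leaving some memory free keeps the run from swapping.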

 

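The method above is the usual way to benchmark ordinary servers, but it did not work well on IBM blade servers: our first tests with it reached only about 20% of the theoretical peak, whereas a normal result should be above 50%. In the end we went to Intel's official website and used Intel's own Linpack, with the following software: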
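To put the 20% and 50% figures in concrete terms: the theoretical peak (Rpeak) is total cores x clock frequency x double-precision floating-point operations per cycle per core, and efficiency is the Gflops value reported by xhpl (Rmax) divided by Rpeak. A minimal sketch follows; rpeak_gflops and efficiency_pct are hypothetical helper names, the FLOPs-per-cycle value is hardware-dependent, and the 4 used in the example assumes an SSE-era Xeon such as Nehalem.

```sh
#!/bin/sh
# rpeak_gflops CORES GHZ FLOPS_PER_CYCLE
# Theoretical peak in GFLOPS: total cores x clock in GHz x double-precision
# floating-point operations per cycle per core.
rpeak_gflops() {
    awk -v c="$1" -v f="$2" -v k="$3" 'BEGIN { printf "%.2f\n", c * f * k }'
}

# efficiency_pct RMAX RPEAK
# HPL efficiency: the Gflops value reported by xhpl (Rmax) as a percentage
# of the theoretical peak (Rpeak).
efficiency_pct() {
    awk -v r="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * r / p }'
}

# Example (assumed hardware: 2 sockets x 4 cores at 2.93 GHz, 4 FLOPs/cycle):
#   rpeak_gflops 8 2.93 4        # prints 93.76
#   efficiency_pct 46.88 93.76   # prints 50.0
```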

- TLF-SOFT-Intel.C.Plus.Plus.Composer.XE.2011.5.220.LINUX.ISO-TBE.iso (Intel C++ compiler, which includes Intel's math library MKL)

- TLF-SOFT-Intel.Fortran.Composer.XE.2011.5.220.LINUX.ISO-TBE.iso (Intel Fortran compiler)

- l_mpi_p_4.0.3.008.tgz (Intel MPI)

Reference article: http://book.51cto.com/art/200911/162338.htm

 

 

 