Brief notes on compiling HPL (hpl-2.0_FERMI_v08.tar)

HPL: A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers

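Before installing HPL for GPU (hpl-2.0_FERMI_v08.tar), the machine needs a compiler, an MPI parallel environment, and either BLAS (Basic Linear Algebra Subprograms) or VSIPL (Vector Signal Image Processing Library).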

I installed BLAS, and CBLAS as well. I also installed LAPACK (Linear Algebra PACKage, http://www.netlib.org/lapack/), although I do not remember whether it is actually required.

1. BLAS

    Straightforward: just run make.

2. CBLAS

    The make file contains a BLLIB path set to ../../librefblas.a; change it to the actual location of the BLAS library you built, blas.LINUX.a.
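The BLLIB change can also be scripted instead of edited by hand. The following is only a sketch: set_bllib is a hypothetical helper name, and the file name and BLAS path in the usage note are assumptions, not taken from my setup.

```shell
# Hypothetical helper: point BLLIB in a CBLAS make file at a BLAS archive.
#   $1 = make file to edit
#   $2 = path of the BLAS library (must not contain '|' or '&')
set_bllib() {
    makefile=$1
    blas_lib=$2
    # Replace the whole "BLLIB = ..." line. '|' is used as the sed delimiter
    # because the path contains '/'. A backup is kept as <makefile>.bak.
    sed -i.bak "s|^BLLIB[[:space:]]*=.*|BLLIB = $blas_lib|" "$makefile"
}
```

For example, assuming the make file is called Makefile.in and BLAS was built under $HOME/SHOC/BLAS, the call would be: set_bllib Makefile.in "$HOME/SHOC/BLAS/blas.LINUX.a"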

3. LAPACK

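    Just point the make file at the BLAS library. After a successful build and test run, you should see a summary roughly like the one below.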

                        -->   LAPACK TESTING SUMMARY  <--
                Processing LAPACK Testing output found in the TESTING direcory
SUMMARY                 nb test run     numerical error         other error
================        ===========     =================       ================
REAL                    1064911         39      (0.004%)        0       (0.000%)
DOUBLE PRECISION        1052315         203     (0.019%)        0       (0.000%)
COMPLEX                 508588          2       (0.000%)        0       (0.000%)
COMPLEX16               530862          28      (0.005%)        0       (0.000%)

--> ALL PRECISIONS      3156676         272     (0.009%)        0       (0.000%)

4.  HPL for GPU

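  1.     The key is to set these two variables correctly in the make file and leave everything else at its default.
        LAdir: the directory containing the CBLAS or VSIPL library
        LAlib: the CBLAS or VSIPL library file to link against (header locations go in LAinc)
        After the build finishes, the executable xhpl is generated under bin/<ARCH> in the HPL directory (bin/CUDA_pinned for me, since I used the default CUDA_pinned configuration).
        The make file templates for various platforms in the setup directory are a useful reference; I consulted Make.Linux_PII_CBLAS.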
       
       
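  2.     Several times the build stopped because libhpl.a, needed by 'dexe.grd', could not be found. After I wrote the location of libhpl.a in Make.CUDA_pinned as an absolute path, the error disappeared.
        Original: HPLlib       = $(LIBdir)/libhpl.a
        Changed to: HPLlib       = /home/michaelchen/SHOC/hpl-2.0_FERMI_v08/lib/CUDA_pinned/libhpl.a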

  3.     The MPI paths must be correct as well. I use Open MPI, and the library I link is lib/libmpi.so.

  4.     Make.CUDA_pinned sets LINKER to g77, which is not available on my machine, so I changed it to gfortran.

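  5.     If you get a "Make.inc access permission denied" error, the most likely cause is that the Make.inc symlink still points at its original target, a path beginning with root. Re-create the link so that it points to our own Make.CUDA_pinned:
        ln -sf source/Make.CUDA_pinned Make.inc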
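Before re-creating the link it can help to see where Make.inc currently points. A minimal sketch follows; relink is a hypothetical helper name, and it simply wraps the same ln -sf call with readlink before and after.

```shell
# Hypothetical helper: show where a symlink points, then repoint it.
#   $1 = new target (e.g. the path to Make.CUDA_pinned)
#   $2 = link name   (e.g. Make.inc)
relink() {
    target=$1
    link=$2
    if [ -L "$link" ]; then
        echo "old target: $(readlink "$link")"
    fi
    ln -sf "$target" "$link"
    echo "new target: $(readlink "$link")"
}
```

If the old target turns out to be a path you cannot read, that would explain the permission error.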



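  6.     Here is my make file, Make.CUDA_pinned.
        The parts I modified (marked in bold in the original post) include HPLlib, the MPI and linear algebra library paths, and LINKER, as described in the steps above.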
    # ----------------------------------------------------------------------
    # - shell --------------------------------------------------------------
    # ----------------------------------------------------------------------
    #
    SHELL        = /bin/sh
    #
    CD           = cd
    CP           = cp
    LN_S         = ln -s
    MKDIR        = mkdir
    RM           = /bin/rm -f
    TOUCH        = touch
    #
    # ----------------------------------------------------------------------
    # - Platform identifier ------------------------------------------------
    # ----------------------------------------------------------------------
    #
    ARCH         = CUDA_pinned
    #
    # ----------------------------------------------------------------------
    # - HPL Directory Structure / HPL library ------------------------------
    # ----------------------------------------------------------------------
    #
    TOPdir       = /home/michaelchen/SHOC/hpl-2.0_FERMI_v08
    INCdir       = $(TOPdir)/include
    BINdir       = $(TOPdir)/bin/$(ARCH)
    LIBdir       = $(TOPdir)/lib/$(ARCH)
    #
    HPLlib       = /home/michaelchen/SHOC/hpl-2.0_FERMI_v08/lib/CUDA_pinned/libhpl.a
    #
    # ----------------------------------------------------------------------
    # - Message Passing library (MPI) --------------------------------------
    # ----------------------------------------------------------------------
    # MPinc tells the  C  compiler where to find the Message Passing library
    # header files,  MPlib  is defined  to be the name of  the library to be
    # used. The variable MPdir is only used for defining MPinc and MPlib.
    #
    MPdir        = /opt/openmpi-1.4.3
    MPinc        = -I$(MPdir)/include
    MPlib        = $(MPdir)/lib/libmpi.so
    #
    # ----------------------------------------------------------------------
    # - Linear Algebra library (BLAS or VSIPL) -----------------------------
    # ----------------------------------------------------------------------
    # LAinc tells the  C  compiler where to find the Linear Algebra  library
    # header files,  LAlib  is defined  to be the name of  the library to be
    # used. The variable LAdir is only used for defining LAinc and LAlib.
    #
    LAdir        = $(HOME)/SHOC/CBLAS
    LAinc        =
    LAlib        = $(LAdir)/lib/cblas_LINUX.a
    #
    # ----------------------------------------------------------------------
    # - F77 / C interface --------------------------------------------------
    # ----------------------------------------------------------------------
    # You can skip this section  if and only if  you are not planning to use
    # a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
    # necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
    # options.  **One and only one**  option should be chosen in **each** of
    # the 3 following categories:
    #
    # 1) name space (How C calls a Fortran 77 routine)
    #
    # -DAdd_              : all lower case and a suffixed underscore  (Suns,
    #                       Intel, ...),                           [default]
    # -DNoChange          : all lower case (IBM RS6000),
    # -DUpCase            : all upper case (Cray),
    # -DAdd__             : the FORTRAN compiler in use is f2c.
    #
    # 2) C and Fortran 77 integer mapping
    #
    # -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
    # -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
    # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
    #
    # 3) Fortran 77 string handling
    #
    # -DStringSunStyle    : The string address is passed at the string loca-
    #                       tion on the stack, and the string length is then
    #                       passed as  an  F77_INTEGER  after  all  explicit
    #                       stack arguments,                       [default]
    # -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
    #                       Fortran 77  string,  and the structure is of the
    #                       form: struct {char *cp; F77_INTEGER len;},
    # -DStringStructVal   : A structure is passed by value for each  Fortran
    #                       77 string,  and  the  structure is  of the form:
    #                       struct {char *cp; F77_INTEGER len;},
    # -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
    #                       Cray  fcd  (fortran  character  descriptor)  for
    #                       interoperation.
    #
    F2CDEFS      =
    #
    # ----------------------------------------------------------------------
    # - HPL includes / libraries / specifics -------------------------------
    # ----------------------------------------------------------------------
    #
    HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
    HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
    #
    # - Compile time options -----------------------------------------------
    #
    # -DHPL_COPY_L           force the copy of the panel L before bcast;
    # -DHPL_CALL_CBLAS       call the cblas interface;
    # -DHPL_CALL_VSIPL       call the vsip  library;
    # -DHPL_DETAILED_TIMING  enable detailed timers;
    #
    # By default HPL will:
    #    *) not copy L before broadcast,
    #    *) call the BLAS Fortran 77 interface,
    #    *) not display detailed timing information.
    #
    HPL_OPTS     = -DHPL_CALL_CBLAS
    #
    # ----------------------------------------------------------------------
    #
    HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
    #
    # ----------------------------------------------------------------------
    # - Compilers / linkers - Optimization flags ---------------------------
    # ----------------------------------------------------------------------
    #
    CC           = /usr/bin/gcc
    CCNOOPT      = $(HPL_DEFS)
    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops
    #
    # On some platforms,  it is necessary  to use the Fortran linker to find
    # the Fortran internals used in the BLAS library.
    #
    LINKER       = /usr/bin/gfortran
    LINKFLAGS    = $(CCFLAGS)
    #
    ARCHIVER     = ar
    ARFLAGS      = r
    RANLIB       = echo
    #
    # ----------------------------------------------------------------------

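  7. At run time I was told several times that certain .so or .a files could not be found. These locations can all be set in the path file in the same directory; remember to source it so that the settings take effect.
    Alternatively, copy or link the required files directly into the directory.
    For example, I use CUDA 4.0, but HPL for CUDA was built against CUDA 3.0. So when it looks for libcublas.so.3 and similar files in /usr/local/cuda/lib64, it reports that they cannot be found, because only the .so.4 versions are there.
    Either of the methods above resolves this.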
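Both workarounds can be sketched in a few lines of shell. The helper names not_found and alias_lib are hypothetical, and exposing a .so.4 library under a .so.3 name assumes the two versions are compatible enough for xhpl, which happened to work for me but is not guaranteed.

```shell
# Read `ldd` output on stdin and print the names of libraries the loader
# reports as "not found".
not_found() {
    awk '/=> not found/ { print $1 }'
}

# Make an existing library visible under the name the binary asks for.
#   $1 = existing library file
#   $2 = directory in which to create the link
#   $3 = library name the binary expects
alias_lib() {
    ln -sf "$1" "$2/$3"
}
```

Typical use would be: ldd bin/CUDA_pinned/xhpl | not_found to list what is missing, then either add /usr/local/cuda/lib64 to LD_LIBRARY_PATH, or run alias_lib /usr/local/cuda/lib64/libcublas.so.4 . libcublas.so.3 in a directory the loader searches.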

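  8. Another error that may appear at run time:
    hpl-2.0_FERMI_v08/bin/CUDA_pinned/xhpl: error while loading shared libraries: libcublas.so.3: wrong ELF class: ELFCLASS32
    ELFCLASS32 means the loader found a 32-bit libcublas.so.3 while running a 64-bit xhpl, i.e. the wrong build of the CUDA libraries was picked up. Use the .so files in /usr/local/cuda/lib64, not those in /usr/local/cuda/lib.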
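To check which class a library is before copying or linking it, one option is to read the class byte from its ELF header. This is a sketch only: elf_class is a hypothetical helper name, and it assumes the file really is an ELF object (the magic bytes are not verified).

```shell
# Print 32 or 64 according to the ELF class byte, which sits at offset 4
# of the ELF header (1 = ELFCLASS32, 2 = ELFCLASS64).
#   $1 = library or executable to inspect
elf_class() {
    # od prints the single byte at offset 4 as an unsigned decimal number.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

Running elf_class on xhpl and on the libcublas file the loader picks up should give the same answer; if they differ, the library comes from the wrong directory.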

    After that, running mpirun -np 1 xi03 succeeded.

    If you see an error such as "Error allocating scratch space 2048.00 MB", find the corresponding source file under src/cuda (for example, the '2048' on line 229 of cuda_dgemm.c) and lower the value (for example, to 1024).
    e.g.: err1=cudaMalloc((void**)&dev_scratch[0], (size_t)(1024.0*1024.0*1024.0) );


    For details on tuning the values in HPL.dat, see the TUNING file in the HPL root directory.


Ref:  http://blog.sina.com.cn/s/blog_442806280100mxbu.html
