Required packages:
1> An MPI runtime environment; here we use mpich2-1.5.tar.gz
2> The GotoBLAS matrix library; we use GotoBLAS2-1.13.tar.gz
3> The Linpack benchmark package: hpl-2.1.tar.gz
Installation:
1> Installing the GotoBLAS2 algebra library
Check the CPU architecture: cat /proc/cpuinfo
    [root@compute-0 ~]# cat /proc/cpuinfo
    processor       : 0
    vendor_id       : AuthenticAMD
    cpu family      : 16
    model           : 5
    model name      : AMD Athlon(tm) II X4 620 Processor   # Note: mine is an AMD architecture; on an Intel CPU the target would be an Intel one such as CORE2
    stepping        : 2
    cpu MHz         : 2600.147
    cache size      : 512 KB
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 5
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc up rep_good tsc_reliable nonstop_tsc unfair_spinlock pni cx16 x2apic popcnt hypervisor lahf_lm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw
    bogomips        : 5200.29
    TLB size        : 1024 4K pages
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 40 bits physical, 48 bits virtual
    power management:
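The fields that matter when choosing a build target can be pulled out of the cpuinfo text with a few lines of Python. This is only a sketch: the function names `parse_cpuinfo` and `summarize` are made up for illustration, and the sample is an abbreviated copy of the listing above (the flags line is shortened).

```python
def parse_cpuinfo(text):
    """Return the key/value pairs of the first processor entry in cpuinfo text."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor entry
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def summarize(info):
    """Pick out the fields that help when choosing a GotoBLAS2 target."""
    flags = info.get("flags", "").split()
    return {
        "vendor": info.get("vendor_id", "unknown"),
        "model": info.get("model name", "unknown"),
        "has_sse2": "sse2" in flags,
        "is_64bit": "lm" in flags,  # "lm" (long mode) marks an x86_64-capable CPU
    }


# Abbreviated sample; on a real machine read the file instead:
#     text = open("/proc/cpuinfo").read()
SAMPLE = """\
processor\t: 0
vendor_id\t: AuthenticAMD
cpu family\t: 16
model\t\t: 5
model name\t: AMD Athlon(tm) II X4 620 Processor
flags\t\t: fpu vme de pse tsc msr pae mce cx8 apic sse sse2 syscall nx lm 3dnowext 3dnow
"""

if __name__ == "__main__":
    print(summarize(parse_cpuinfo(SAMPLE)))
```

The vendor and the 64-bit flag are what decide the architecture target and the BINARY setting in the next step.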
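Edit Makefile.rule. (Note: by editing Makefile.rule you can tailor the build to your own hardware platform, which makes the library considerably more efficient.
The architecture, the compiler choice, the threading settings and so on are all set in this file.)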
    #
    #  Beginning of user configuration
    #

    # This library's version
    VERSION = 1.13

    # You can specify the target architecture, otherwise it's
    # automatically detected.
    # TARGET = PENRYN

    # If you want to support multiple architecture in one binary
    # DYNAMIC_ARCH = 1

    # C compiler including binary type(32bit / 64bit). Default is gcc.
    # Don't use Intel Compiler or PGI, it won't generate right codes as I expect.
    CC = gcc

    # Fortran compiler. Default is g77.
    # FC = gfortran

    # Even you can specify cross compiler
    # CC = x86_64-w64-mingw32-gcc
    # FC = x86_64-w64-mingw32-gfortran

    # If you need 32bit binary, define BINARY=32, otherwise define BINARY=64
    BINARY=64

    # About threaded BLAS. It will be automatically detected if you don't
    # specify it.
    # For force setting for single threaded, specify USE_THREAD = 0
    # For force setting for multi threaded, specify USE_THREAD = 1
    USE_THREAD = 1

    # If you're going to use this library with OpenMP, please comment it in.
    USE_OPENMP = 1

    # You can define maximum number of threads. Basically it should be
    # less than actual number of cores. If you don't specify one, it's
    # automatically detected by the the script.
    # NUM_THREADS = 24

    # If you don't need CBLAS interface, please comment it in.
    # NO_CBLAS = 1

    # If you want to use legacy threaded Level 3 implementation.
    # USE_SIMPLE_THREADED_LEVEL3 = 1

    # If you want to drive whole 64bit region by BLAS. Not all Fortran
    # compiler supports this. It's safe to keep comment it out if you
    # are not sure(equivalent to "-i8" option).
    # INTERFACE64 = 1

    # Unfortunately most of kernel won't give us high quality buffer.
    # BLAS tries to find the best region before entering main function,
    # but it will consume time. If you don't like it, you can disable one.
    # NO_WARMUP = 1

    # If you want to disable CPU/Memory affinity on Linux.
    # NO_AFFINITY = 1

    # If you would like to know minute performance report of GotoBLAS.
    # FUNCTION_PROFILE = 1

    # Support for IEEE quad precision(it's *real* REAL*16)( under testing)
    # QUAD_PRECISION = 1

    # Theads are still working for a while after finishing BLAS operation
    # to reduce thread activate/deactivate overhead. You can determine
    # time out to improve performance. This number should be from 4 to 30
    # which corresponds to (1 << n) cycles. For example, if you set to 26,
    # thread will be running for (1 << 26) cycles(about 25ms on 3.0GHz
    # system). Also you can control this mumber by GOTO_THREAD_TIMEOUT
    # CCOMMON_OPT += -DTHREAD_TIMEOUT=26

    # Using special device driver for mapping physically contigous memory
    # to the user space. If bigphysarea is enabled, it will use it.
    # DEVICEDRIVER_ALLOCATION = 1

    # If you need to synchronize FP CSR between threads (for x86/x86_64 only).
    # CONSISTENT_FPCSR = 1

    # If you need santy check by comparing reference BLAS. It'll be very
    # slow (Not implemented yet).
    # SANITY_CHECK = 1

    # Common Optimization Flag; -O2 is enough.
    COMMON_OPT += -O2

    # Profiling flags
    COMMON_PROF = -pg

    #
    #  End of user configuration
    #
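Most entries in this file are commented out, so it is easy to lose track of which settings are actually in effect. The sketch below lists only the uncommented assignments. The helper name `active_settings` is hypothetical, and the sample is a shortened excerpt of the file above, not the whole file.

```python
import re

# Matches "NAME = value", "NAME=value" and "NAME += value".
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\s*(\+?=)\s*(.*?)\s*$")


def active_settings(text):
    """Return (name, operator, value) for every uncommented assignment."""
    settings = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out options are not in effect
        match = ASSIGNMENT.match(line)
        if match:
            settings.append(match.groups())
    return settings


# Shortened excerpt; for the real file use open("Makefile.rule").read().
SAMPLE = """\
# TARGET = PENRYN
CC = gcc
# FC = gfortran
BINARY=64
USE_THREAD = 1
USE_OPENMP = 1
# NUM_THREADS = 24
COMMON_OPT += -O2
"""

if __name__ == "__main__":
    for name, op, value in active_settings(SAMPLE):
        print(f"{name} {op} {value}")
```

For the excerpt, this prints CC, BINARY, USE_THREAD, USE_OPENMP and COMMON_OPT, and skips TARGET, FC and NUM_THREADS.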
Run make. When the build succeeds, it prints:

    GotoBLAS build complete.

      OS               ... Linux
      Architecture     ... x86_64
      BINARY           ... 64bit
      C compiler       ... GCC  (command line : gcc)
      Fortran compiler ... G77  (command line : g77)
      Library Name     ... libgoto2_barcelona-r1.13.a (Single threaded)
Several new files appear in the gotoblas2 directory. The two links libgoto2.a and libgoto2.so are the library files we use later.

    lrwxrwxrwx 1 root root      23 Mar 28 22:14 libgoto2.a -> libgoto2_athlon-r1.13.a
    -rw-r--r-- 1 root root 5235402 Mar 28 22:18 libgoto2_athlon-r1.13.a
    -rwxr-xr-x 1 root root 2503038 Mar 28 22:18 libgoto2_athlon-r1.13.so
    lrwxrwxrwx 1 root root      24 Mar 28 22:18 libgoto2.so -> libgoto2_athlon-r1.13.so
2> Installing the MPI runtime environment
Omitted.
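3> Installing HPL
1. From the setup directory, copy the Make file that matches your system up to the parent directory. My CPU is AMD, so I copied Make.Linux_ATHLON_CBLAS; on Intel you would copy Make.Linux_PII_CBLAS.
2. Edit the Make.Linux_ATHLON_CBLAS file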
    #
    # ----------------------------------------------------------------------
    # - HPL Directory Structure / HPL library ------------------------------
    # ----------------------------------------------------------------------
    #
    # TOPdir: the HPL directory
    TOPdir       = /home/houqd/hpl-2.1
    INCdir       = $(TOPdir)/include
    BINdir       = $(TOPdir)/bin/$(ARCH)
    LIBdir       = $(TOPdir)/lib/$(ARCH)
    #
    HPLlib       = $(LIBdir)/libhpl.a

    # ----------------------------------------------------------------------
    # - MPI directories - library ------------------------------------------
    # ----------------------------------------------------------------------
    # MPinc tells the  C  compiler where to find the Message Passing library
    # header files,  MPlib  is defined  to be the name of  the library to be
    # used. The variable MPdir is only used for defining MPinc and MPlib.
    #
    # MPdir: the MPI path
    MPdir        = /home/houqd/mpich2.1.5
    MPinc        = -I$(MPdir)/src/include
    # MPlib: check your MPI directory after installing it; the location of the library may differ
    MPlib        = $(MPdir)/lib/.libs/libmpich.a

    #
    # ----------------------------------------------------------------------
    # - Linear Algebra library (BLAS or VSIPL) -----------------------------
    # ----------------------------------------------------------------------
    # LAinc tells the  C  compiler where to find the Linear Algebra  library
    # header files,  LAlib  is defined  to be the name of  the library to be
    # used. The variable LAdir is only used for defining LAinc and LAlib.
    #
    # LAdir: the GotoBLAS2 install directory
    LAdir        = /home/houqd/gotoblas2
    LAinc        =
    LAlib        = $(LAdir)/libgoto2.a $(LAdir)/libgoto2.so

    #
    # ----------------------------------------------------------------------
    # - Compilers / linkers - Optimization flags ---------------------------
    # ----------------------------------------------------------------------
    #
    # CC: the path to mpicc
    CC           = /usr/local/bin/mpicc
    CCNOOPT      = $(HPL_DEFS)
    CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
    #
    LINKER       = /usr/local/bin/mpicc
    LINKFLAGS    = $(CCFLAGS)
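3. Run make arch=Linux_ATHLON_CBLAS
When the build finishes, the test files HPL.dat and xhpl are created under bin/Linux_ATHLON_CBLAS,
and the library file libhpl.a is created under lib/Linux_ATHLON_CBLAS.
When the benchmark has run to completion, xhpl prints output like this: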
    T/V                N    NB     P     Q               Time                 Gflops
    --------------------------------------------------------------------------------
    WR00R2R4          35     4     4     1               0.61              5.008e-05
    HPL_pdgesv() start time Thu Mar 28 20:47:20 2013

    HPL_pdgesv() end time   Thu Mar 28 20:47:21 2013

    --------------------------------------------------------------------------------
    ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0164438 ...... PASSED
    ================================================================================

    Finished    864 tests with the following results:
                864 tests completed and passed residual checks,
                  0 tests completed and failed residual checks,
                  0 tests skipped because of illegal input values.
    --------------------------------------------------------------------------------
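The Gflops column can be cross-checked by hand. The sketch below assumes the operation count usually quoted for HPL, 2/3·N³ + 3/2·N² floating-point operations for solving a system of order N; the function name `hpl_gflops` is made up for illustration. Because the printed time is rounded to two decimals, the result only approximately matches the reported figure.

```python
def hpl_gflops(n, seconds):
    """Estimate the HPL rate in Gflops for a system of order n solved in `seconds`.

    Assumes the commonly quoted operation count 2/3*n^3 + 3/2*n^2.
    """
    flops = (2.0 / 3.0) * n**3 + (3.0 / 2.0) * n**2
    return flops / seconds / 1.0e9


if __name__ == "__main__":
    # Values from the WR00R2R4 result line above: N = 35, Time = 0.61 s.
    print(f"{hpl_gflops(35, 0.61):.3e} Gflops")
```

With such a tiny N the rate says nothing about the machine; the default HPL.dat only exercises the code paths. A meaningful measurement needs a much larger N in HPL.dat.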
Run the benchmark from the bin/Linux_ATHLON_CBLAS directory:
    mpirun -np 4 ./xhpl
If the MPI program compiles successfully but fails at run time with an error like this:

    [proxy:0:[email protected]] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
    [proxy:0:[email protected]] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
    [proxy:0:[email protected]] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)
    [proxy:0:[email protected]] HYDU_create_process (./utils/launch/launch.c:75): execvp error on file xhpl (No such file or directory)

    ===================================================================================
    =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
    =   EXIT CODE: 255
    =   CLEANING UP REMAINING PROCESSES
    =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
    ===================================================================================

then the launcher could not find the xhpl executable. Change to the directory that contains xhpl and give the path explicitly (./xhpl), or pass the absolute path to mpirun.
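The "No such file or directory" failure can be caught before any process is launched by checking the executable first. The following is a sketch under assumptions: the helper name `mpirun_command` is hypothetical, and only the `-np` option already used in this article is passed to mpirun.

```python
import os


def mpirun_command(executable, nprocs):
    """Build an mpirun command line, failing early if the binary is unusable."""
    path = os.path.abspath(executable)
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        raise FileNotFoundError(f"{path} is missing or not executable")
    # An absolute path avoids depending on the current directory or PATH.
    return ["mpirun", "-np", str(nprocs), path]


if __name__ == "__main__":
    try:
        print(" ".join(mpirun_command("./xhpl", 4)))
    except FileNotFoundError as err:
        print(f"cannot start benchmark: {err}")
```

The returned list can be handed to `subprocess.run(cmd, check=True)`. Note that xhpl also reads HPL.dat from its working directory, so the launch should still happen from the directory that holds both files.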
xhpl begins its output by explaining the parameters and listing the values read from HPL.dat:

    An explanation of the input/output parameters follows:
    T/V    : Wall time / encoded variant.
    N      : The order of the coefficient matrix A.
    NB     : The partitioning blocking factor.
    P      : The number of process rows.
    Q      : The number of process columns.
    Time   : Time in seconds to solve the linear system.
    Gflops : Rate of execution for solving the linear system.

    The following parameter values will be used:

    N      :      29       30       34       35
    NB     :       1        2        3        4
    PMAP   : Row-major process mapping
    P      :       2        1        4
    Q      :       2        4        1
    PFACT  :    Left    Crout    Right
    NBMIN  :       2        4
    NDIV   :       2
    RFACT  :    Left    Crout    Right
    BCAST  :   1ring
    DEPTH  :       0
    SWAP   : Mix (threshold = 64)
    L1     : transposed form
    U      : transposed form
    EQUIL  : yes
    ALIGN  : 8 double precision words
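These lists explain where the 864 tests in the summary come from. Assuming HPL runs one test for every combination of the listed values, with P and Q paired column by column into process grids, the count is the product of the list lengths. The sketch below also checks that each P x Q grid fits into the 4 processes started by mpirun. The names `total_tests` and `grids_fit` are illustrative only.

```python
from math import prod

# Values copied from the parameter listing above.
PARAMS = {
    "N": [29, 30, 34, 35],
    "NB": [1, 2, 3, 4],
    "grids": [(2, 2), (1, 4), (4, 1)],  # (P, Q) pairs, taken column by column
    "PFACT": ["Left", "Crout", "Right"],
    "NBMIN": [2, 4],
    "NDIV": [2],
    "RFACT": ["Left", "Crout", "Right"],
    "BCAST": ["1ring"],
    "DEPTH": [0],
}


def total_tests(params):
    """Number of combinations, assuming one test per combination of values."""
    return prod(len(values) for values in params.values())


def grids_fit(grids, nprocs):
    """True if every P x Q grid needs no more than nprocs processes."""
    return all(p * q <= nprocs for p, q in grids)


if __name__ == "__main__":
    print("tests:", total_tests(PARAMS))
    print("grids fit in 4 processes:", grids_fit(PARAMS["grids"], 4))
```

The product 4 x 4 x 3 x 3 x 2 x 1 x 3 x 1 x 1 equals 864, which matches the "Finished 864 tests" line of the run shown earlier.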