HPL Tuning
After having built the executable hpl/bin/<arch>/xhpl, one may want to modify the input data file HPL.dat. This file should reside in the same directory as the executable hpl/bin/<arch>/xhpl. An example HPL.dat file is provided by default. This file contains information about the problem sizes, machine configuration, and algorithm features to be used by the executable. It is 31 lines long. All the selected parameters will be printed in the output generated by the executable.
We first describe the meaning of each line of this input file. A few useful experimental guidelines for setting up the file are given at the end of this page.
--------------------------------------------------------------------------------
Description of the HPL.dat File
Line 1: (unused) Typically one would use this line for one's own purposes. For example, it could be used to summarize the content of the input file. By default this line reads:
HPL Linpack benchmark input file
--------------------------------------------------------------------------------
Line 2: (unused) same as line 1. By default this line reads:
Innovative Computing Laboratory, University of Tennessee
--------------------------------------------------------------------------------
Line 3: the user can choose where the output should be redirected to. In the case of a file, a name is necessary, and this is the line where one wants to specify it. Only the first name on this line is significant. By default, the line reads:
HPL.out output file name (if any)
This means that if one chooses to redirect the output to a file, the file will be called "HPL.out". The rest of the line is unused, and this space can hold an informative comment on the meaning of the line.
--------------------------------------------------------------------------------
Line 4: This line specifies where the output should go. The line is formatted: it must begin with a positive integer, and the rest is insignificant. Three choices are possible for the positive integer: 6 means that the output will go to the standard output, 7 means that the output will go to the standard error, and any other integer means that the output should be redirected to a file whose name has been specified in the line above. This line by default reads:
6 device out (6=stdout,7=stderr,file)
which means that the output generated by the executable should be redirected to the standard output.
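The rule for lines 3 and 4 can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not code taken from HPL; the function names `first_token` and `output_target` are hypothetical.

```python
def first_token(line: str) -> str:
    """Return the first whitespace-separated token; the rest of the line is comment."""
    return line.split()[0]


def output_target(line3: str, line4: str) -> str:
    """Resolve the output destination from lines 3 and 4 of HPL.dat (illustrative only)."""
    filename = first_token(line3)      # only the first name on line 3 is significant
    device = int(first_token(line4))   # line 4 must begin with an integer
    if device == 6:
        return "stdout"
    if device == 7:
        return "stderr"
    return filename                    # any other integer selects the file from line 3


print(output_target("HPL.out output file name (if any)",
                    "6 device out (6=stdout,7=stderr,file)"))
```

With the default lines shown above, the sketch resolves to the standard output; replacing the 6 with, say, 8 would resolve to "HPL.out".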
--------------------------------------------------------------------------------
Line 5: This line specifies the number of problem sizes to be executed. This number should be less than or equal to 20. The first integer is significant, the rest is ignored. If the line reads:
3 # of problems sizes (N)
this means that the user is willing to run 3 problem sizes that will be specified in the next line.
--------------------------------------------------------------------------------
Line 6: This line specifies the problem sizes one wants to run. Assuming the line above started with 3, the first 3 positive integers are significant and the rest is ignored. For example:
3000 6000 10000 Ns
means that one wants xhpl to run 3 (specified in line 5) problem sizes, namely 3000, 6000 and 10000.
--------------------------------------------------------------------------------
Line 7: This line specifies the number of block sizes to be run. This number should be less than or equal to 20. The first integer is significant, the rest is ignored. If the line reads:
5 # of NBs
this means that the user is willing to use 5 block sizes that will be specified in the next line.
--------------------------------------------------------------------------------
Line 8: This line specifies the block sizes one wants to run. Assuming the line above started with 5, the first 5 positive integers are significant and the rest is ignored. For example:
80 100 120 140 160 NBs
means that one wants xhpl to use 5 (specified in line 7) block sizes, namely 80, 100, 120, 140 and 160.
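Lines 5-6 and 7-8 follow the same pattern: a count line followed by a line of values, of which only the first "count" integers matter. A minimal Python sketch of that pattern is given here, assuming whitespace-separated tokens; `read_counted_values` and `MAX_ENTRIES` are hypothetical names, and this is not how xhpl itself reads the file.

```python
MAX_ENTRIES = 20  # upper limit stated in the text for lines 5 and 7


def read_counted_values(count_line: str, values_line: str) -> list:
    """Return the first `count` integers of values_line, where `count`
    is the first integer of count_line. Trailing text is ignored."""
    count = int(count_line.split()[0])
    if count > MAX_ENTRIES:
        raise ValueError(f"at most {MAX_ENTRIES} values are allowed, got {count}")
    tokens = values_line.split()
    if len(tokens) < count:
        raise ValueError(f"expected {count} values, found only {len(tokens)} tokens")
    return [int(tok) for tok in tokens[:count]]


problem_sizes = read_counted_values("3 # of problems sizes (N)", "3000 6000 10000 Ns")
block_sizes = read_counted_values("5 # of NBs", "80 100 120 140 160 NBs")
print(problem_sizes)
print(block_sizes)
```

Because only the leading integers are read, the trailing labels ("Ns", "NBs") act purely as comments.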
--------------------------------------------------------------------------------
Line 9: This line specifies how the MPI processes should be mapped onto the nodes of your platform. There are currently two possible mappings, namely row- and column-major. This feature is mainly useful when these nodes are themselves multi-processor computers. A row-major mapping is recommended.
--------------------------------------------------------------------------------
Line 10: This line specifies the number of process grids to be run. This number should be less than or equal to 20. The first integer is significant, the rest is ignored. If the line reads:
2 # of process grids (P x Q)
this means that you are willing to try 2 process grid sizes that will be specified in the next line.
--------------------------------------------------------------------------------
Lines 11-12: These two lines specify the number of process rows and columns of each grid you want to run on. Assuming the line above (10) started with 2, the first 2 positive integers of each of those two lines are significant and the rest is ignored. For example:
1 2 Ps
6 8 Qs
means that one wants to run xhpl on 2 process grids (line 10), namely 1-by-6 and 2-by-8. Note: in this example, xhpl must be started on at least 16 nodes (the maximum of Pi x Qi over the grids). The runs on the two grids are consecutive. If xhpl were started on more than 16 nodes, say 52, only 6 would be used for the first grid (1x6) and then 16 for the second grid (2x8). Starting the MPI job on 52 nodes does not make HPL use all of them; in this example, at most 16 would be used. To run xhpl with 52 processes, one needs to specify a grid of 52 processes. For example, the following lines would do the job:
4 2 Ps
13 8 Qs
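The arithmetic behind these examples can be checked with a short Python sketch. It pairs the i-th entry of the Ps line with the i-th entry of the Qs line and reports how many processes each grid uses; the helper names `pair_grids` and `processes_required` are hypothetical and are not part of HPL.

```python
def pair_grids(ps, qs):
    """Pair the i-th P with the i-th Q to form the list of P x Q grids."""
    if len(ps) != len(qs):
        raise ValueError("the Ps and Qs lines must list the same number of values")
    return list(zip(ps, qs))


def processes_required(grids):
    """Minimum number of processes to start: the largest P * Q over all grids."""
    return max(p * q for p, q in grids)


# First example from the text: 1-by-6 and 2-by-8.
grids = pair_grids([1, 2], [6, 8])
for p, q in grids:
    print(f"grid {p}x{q} uses {p * q} processes")
print("processes required:", processes_required(grids))

# Second example from the text: 4-by-13 and 2-by-8.
print("processes required:", processes_required(pair_grids([4, 2], [13, 8])))
```

The first example needs 16 processes and the second 52, which matches the discussion above: any processes started beyond P x Q for a given grid are simply not used by that run.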
--------------------------------------------------------------------------------
Line 13: This line specifies the threshold against which the residuals are compared. The residuals should be of order 1, but in practice they are smaller than this, typically 0.001. The line consists of a real number; the rest is not significant. For example:
16.0 threshold
In practice, a value of 16.0 will cover most cases. For various reasons, it is possible that some of the residuals become slightly larger, for example 35.6. xhpl will flag those runs as failed, although they can be considered correct. A run should be considered failed if the residual is a few orders of magnitude bigger than 1, for example 10^6 or more. Note: if one were to specify a threshold of 0.0, all tests would be flagged as failed, even though the answer is likely to be correct. A negative value is allowed for this threshold, in which case the checks are bypassed, whatever the magnitude of the value. This feature saves time when performing many experiments, for instance during the tuning phase. Example:
-16.0 threshold
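The threshold rule can be summarized in a short Python sketch. It assumes a strict "less than" comparison, which is consistent with the remark that a threshold of 0.0 fails every test but is otherwise an assumption; the function name `check_residual` and the returned labels are hypothetical and do not reproduce xhpl's actual output.

```python
def check_residual(residual: float, threshold: float) -> str:
    """Classify a run according to the threshold on line 13 (illustrative only)."""
    if threshold < 0.0:
        return "skipped"   # any negative threshold bypasses the checks
    # Assumption: strict comparison, so a threshold of 0.0 fails every run.
    return "passed" if residual < threshold else "failed"


for residual, threshold in [(0.001, 16.0), (35.6, 16.0), (0.001, 0.0), (1.0e6, -16.0)]:
    print(residual, threshold, check_residual(residual, threshold))
```

Note that the 35.6 case is reported as failed by this rule even though, as explained above, such a run can still be considered correct.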
--------------------------------------------------------------------------------
The remaining lines specify algorithmic features. xhpl will run all possible combinations of those for each combination of problem size, block size, and process grid. This is handy when one looks for an "optimal" set of parameters. To understand this a little better, let us first say a few words about the algorithm implemented in HPL. Basically, this is a right-looking version of the LU factorization with row partial pivoting. The panel factorization is matrix-matrix operation based and recursive, dividing the panel into NDIV subpanels at each step. This part of the panel factorization is denoted below by "recursive panel fact. (RFACT)". The recursion stops when the current panel consists of NBMIN columns or fewer. At that point, xhpl uses a matrix-vector operation based factorization denoted below by "PFACTs". Classic recursion would then use NDIV=2, NBMIN=1. There are essentially 3 numerically equivalent LU factorization algorithm variants (left-looking, Crout and right-looking). In HPL, one can choose any one of those for the RFACT, as well as for the PFACT. The following lines of HPL.dat allow you to set those parameters.