For faster R use OpenBLAS

From: http://www.stat.cmu.edu/~nmv/2013/07/09/for-faster-r-use-openblas-instead-better-than-atlas-trivial-to-switch-to-on-ubuntu/

Installing additional BLAS libraries on Ubuntu

For Ubuntu, there are currently three different BLAS options that can be easily chosen: "libblas" the reference BLAS, "libatlas" the ATLAS BLAS, and "libopenblas" the OpenBLAS. Their package names are

$ apt-cache search libblas
libblas-dev - Basic Linear Algebra Subroutines 3, static library
libblas-doc - Basic Linear Algebra Subroutines 3, documentation
libblas3gf - Basic Linear Algebra Reference implementations, shared library
libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
libatlas3gf-base - Automatically Tuned Linear Algebra Software, generic shared
libblas-test - Basic Linear Algebra Subroutines 3, testing programs
libopenblas-base - Optimized BLAS (linear algebra) library based on GotoBLAS2
libopenblas-dev - Optimized BLAS (linear algebra) library based on GotoBLAS2

Since libblas already comes with Ubuntu, we only need to install the other two for our tests. (NOTE: In the following command, delete 'libatlas3gf-base' if you don't want to experiment with ATLAS.):

$ sudo apt-get install libopenblas-base libatlas3gf-base

Switching between BLAS libraries

Now we can switch between the different BLAS options that are installed:

$ sudo update-alternatives --config libblas.so.3
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/openblas-base/libopenblas.so.0 40 auto mode
1 /usr/lib/atlas-base/atlas/libblas.so.3gf 35 manual mode
2 /usr/lib/libblas/libblas.so.3gf 10 manual mode
3 /usr/lib/openblas-base/libopenblas.so.0 40 manual mode

Press enter to keep the current choice[*], or type selection number:
Side note: If the above returned:
update-alternatives: error: no alternatives for libblas.so.3gf

Try

$ sudo update-alternatives --config libblas.so.3

instead. See the comments at the end of the post for further details.

From the selection menu, I picked 3, so it now shows that choice 3 (OpenBLAS) is selected:

$ sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/openblas-base/libopenblas.so.0 40 auto mode
1 /usr/lib/atlas-base/atlas/libblas.so.3gf 35 manual mode
2 /usr/lib/libblas/libblas.so.3gf 10 manual mode
* 3 /usr/lib/openblas-base/libopenblas.so.0 40 manual mode

And we can pull the same trick to choose between LAPACK implementations. From the output we can see that OpenBLAS does not provide a new LAPACK implementation, but ATLAS does:

$ sudo update-alternatives --config liblapack.so.3
There are 2 choices for the alternative liblapack.so.3gf (providing /usr/lib/liblapack.so.3gf).

Selection Path Priority Status
------------------------------------------------------------
* 0 /usr/lib/atlas-base/atlas/liblapack.so.3gf 35 auto mode
1 /usr/lib/atlas-base/atlas/liblapack.so.3gf 35 manual mode
2 /usr/lib/lapack/liblapack.so.3gf 10 manual mode

So we will do nothing in this case, since OpenBLAS is supposed to use the reference implementation (which is already selected).

Checking that R is using the right BLAS

Now we can check that everything is working by starting R in a new terminal:

$ R

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...snip...
Type 'q()' to quit R.

>

Great. Let's see if R is using the BLAS and LAPACK libraries we selected. To do so, we open another terminal so that we can run a few more shell commands. First, we find the PID of the R process we just started. Your output will look something like this:

$ ps aux | grep exec/R
1000 18065 0.4 1.0 200204 87568 pts/1 Sl+ 09:00 0:00 /usr/lib/R/bin/exec/R
root 19250 0.0 0.0 9396 916 pts/0 S+ 09:03 0:00 grep --color=auto exec/R

The PID is the second number on the '/usr/lib/R/bin/exec/R' line. To see
which BLAS and LAPACK libraries are loaded with that R session, we use the "list open files" command:

$ lsof -p 18065 | grep 'blas\|lapack'
R 18065 nathanvan mem REG 8,1 9342808 12857980 /usr/lib/lapack/liblapack.so.3gf.0
R 18065 nathanvan mem REG 8,1 19493200 13640678 /usr/lib/openblas-base/libopenblas.so.0

As expected, the R session is using the reference LAPACK (/usr/lib/lapack/liblapack.so.3gf.0) and OpenBLAS (/usr/lib/openblas-base/libopenblas.so.0)

Testing the different BLAS/LAPACK combinations

I used Simon Urbanek's most recent benchmark script. To follow along, first download it to your current working directory:

$ curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O

And then run it:

$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called SuppDists
...snip...

Ooops. I don't have the SuppDists package installed. I can easily load it via Michael Rutter's ubuntu PPA:

$ sudo apt-get install r-cran-suppdists

Now Simon's script works wonderfully. Full output

$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning messages:
1: In remove("a", "b") : object 'a' not found
2: In remove("a", "b") : object 'b' not found

R Benchmark 2.5
===============
Number of times each test is run__________________________: 3

I. Matrix calculation
---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec): 1.36566666666667
2400x2400 normal distributed random matrix ^1000____ (sec): 0.959
Sorting of 7,000,000 random values__________________ (sec): 1.061
2800x2800 cross-product matrix (b = a' * a)_________ (sec): 1.777
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec): 1.00866666666667
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.13484335940626

II. Matrix functions
--------------------
FFT over 2,400,000 random values____________________ (sec): 0.566999999999998
Eigenvalues of a 640x640 random matrix______________ (sec): 1.379
Determinant of a 2500x2500 random matrix____________ (sec): 1.69
Cholesky decomposition of a 3000x3000 matrix________ (sec): 1.51366666666667
Inverse of a 1600x1600 random matrix________________ (sec): 1.40766666666667
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.43229160585452

III. Programmation
------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec): 1.10533333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec): 1.169
Grand common divisors of 400,000 pairs (recursion)__ (sec): 2.267
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec): 1.213
Escoufier's method on a 45x45 matrix (mixed)________ (sec): 1.32600000000001
--------------------------------------------
Trimmed geom. mean (2 extremes eliminated): 1.23425893178325

Total time for all 15 tests_________________________ (sec): 19.809
Overall mean (sum of I, II and III trimmed means/3)_ (sec): 1.26122106386747
--- End of test ---

134.75user 16.06system 1:50.08elapsed 137%CPU (0avgtext+0avgdata 1949744maxresident)k
448inputs+0outputs (3major+1265968minor)pagefaults 0swaps

Where the elapsed time at the very bottom is the part that we care about. With OpenBLAS and the reference LAPACK, the script took 1 minute and 50 seconds to run. By changing around the selections with update-alternatives, we can test out R with ATLAS (3:21) or R with the reference BLAS (9:13). For my machine, OpenBLAS is a clear winner.

你可能感兴趣的:(Linux)