March 2009 | My Outsourced Brain

C++ Libraries for Scientific Computing

For my work, I have been comparing different methods for matrix and vector manipulations in C/C++ and other programming languages. In this post, I will focus on three packages in C/C++, GSL, armadillo, and IT++. I compare them by their utility and the ease of using them.

C++ is a very fast language for numerical and scientific computing, however first compilers were not efficient. By mid-90s it caught up with fortran in speed, with some c++ libraries (valarray mentioned) performing still slow ([1], [2]).

More recent benchmarks show that - depending on the implementations (and libraries) you use - C++ can be faster than fortran ([3], [4] includes python). Of course, benchmarks are often skewed. A recurrent criticism is that the implementation for one language is sub-optimal. This is where the Computer Language Benchmark Game comes in. They define different problems from scientific computing and ask people to submit their code in about 30 different languages. They check the programs for correctness, benchmark the code, and publish the best results for each language. C++ is one of the best languages compared. Maybe not surprisingly java also fares pretty well. Bytecode compilers for java have been improving dramatically since 1995. While there have been claims that java code would outperform C++ it seems that C++ code is still faster [5].

Now, for C++, if there are big differences in speed between libraries, which one to use?

Many of them use wrappers for LAPACK, ATLAS, and other linear algebra packages or are otherwise optimized for speed.

You can define vectors and matrices in C++, however they do not provide a lot of comfort. You need to run loops and index the old way and do bounds checking. Of course, there is GNU Scientific Library (gsl), written like much GNU software in C, although there exist several wrappers for C++. The GSL syntax is very tedious and counter-intuitive for people used to algebra software such as Matlab/Octave or R. For example, instead of writing a(i)+=b you have to use this wordy statement to add scalar b to float vector a at position i:

gsl_vector_float_set(a,i,gsl_vector_float_get(a,i)+b)


Adding of vectors or matrices C=A+B is not even implemented. Instead there is A+=B [6]. In GSL matrix addition C=A+B is like this:

void gsl_matrix_Add( gsl_matrix* C,const gsl_matrix* A, const gsl_matrix* B ){  
  gsl_matrix_memcpy( C, A );
  gsl_matrix_add( C, B);
}


I found two packages for numerical computing that look very neat and boasted with efficient implementations: Armadillo and IT++. Armadillo seems to be the project of mainly one Australian researcher and IT++ is a project of Chalmers University, Sweden. Both libraries are very easy to install on linux systems and both projects give example codes. The presentation of armadillo is excellent and IT++ gives a conversion table for matlab code. Armadillo is the newer project, so it's maybe not as mature (?), it doesn't have the many methods for signal processing that IT++ has, however the project claims it is much faster than IT++.

I installed both and made some attempts at benchmarking. I tried to compare IT++, Armadillo, and GSL for double matrix addition. I tested all three for addition of quadratic matrices with sides of 1 to 2000 (step size 100), doing 1000 repetitions. This gave nice plots of matrix size against seconds. First it seemed armadillo was fastest, however there was a lot of variation across trials for all 3 packages, so I think, results are not conclusive.

What I did however conclude was that all were surprisingly fast. I found especially Armadillo very easy to use and it also offers a very good online reference. Armadillo offers import/export to text files. IT++ uses its own binary format, but they offer a matlab script to read the data. You can actually use both packages together. Armadillo offers a library for conversions to and from IT++ C++ matrix and vector formats. It's my favorite of the three.

[ Read more... ]

Line styles, colors, and markers in complex plots in Matlab

For creating figures for print, lines and markers should be distinguishable by both, color and style. Matlab offers a variety of styles and colors, but the plot command when given a matrix as y input, only changes colors. You can spend a lot of time checking what looks good and what not. Here I present a script which can help to generate plots where each curve is distinguishable by marker, color, and line style.

This script is for generating plots with many curves, where each curve has its own marker, color, and line style.

Something like the following does the job:


You might also be interested in my tutorial of how to produce first quality figures with matlab.

[ Read more... ]