Apr 20, 2009

Class Dependency Graphs

There are several tools for creating UML models of program source code. Some work for Java, some for C++, and some not at all. Unfortunately, the last case seems to be the most common. I searched Google and Wikipedia and saw many forum comments lamenting that there are no useful UML diagram utilities on Linux. In this post I comment on some of these tools and present two simple shell scripts for generating class dependency graphs.

One tool I tried is Umbrello, which worked nicely on the first try but then crashed (maybe there was too much source code) and never launched again. I have seen other people report similar behavior with Umbrello. ArgoUML, a Java-based tool, did produce diagrams from my code, but they were hardly legible: class names overlapped and could not be read at all.

Actually, if you just want to see the dependencies between your classes, this should not be so difficult. Dependency diagrams can be generated from source code very simply, with a three-line bash script and Graphviz for the visualization (see my introduction to Graphviz).

For C++, you search the header files for lines such as "class bar: public foo {". Each such line is rewritten as a graph edge and written to a file (here diagram.dot):

echo "digraph G {"> diagram.dot
sed -n -e "s/class [ ]*\([a-zA-Z0-9_]*\)[ ]*:[ ]*public [ ]*\([a-zA-Z0-9_]*\)[ ]*[{][ ]*/\1 -> \2 ;/p" *.h >> diagram.dot
echo "}" >> diagram.dot

Compile to PDF:
dot -Tpdf diagram.dot -o diagram.pdf

And you are finished. You can see the inheritance of your classes in the PDF. Note that this script, as it is, only handles single public inheritance, and only when the opening brace is on the same line as the class declaration.
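To see what the script picks up and what it misses, here is a small trial run. The header file shapes.h and its class names are made up for this illustration, and everything is written to a temporary directory so no existing diagram.dot is touched.

```shell
# Hypothetical sample header, invented for this illustration.
demo_dir=$(mktemp -d)
cat > "$demo_dir/shapes.h" <<'EOF'
class Shape {
};
class Circle : public Shape {
};
class Square: public Shape {
};
class Triangle : public Shape
{
};
EOF

# Same three steps as above, writing into the temporary directory.
echo "digraph G {" > "$demo_dir/diagram.dot"
sed -n -e "s/class [ ]*\([a-zA-Z0-9_]*\)[ ]*:[ ]*public [ ]*\([a-zA-Z0-9_]*\)[ ]*[{][ ]*/\1 -> \2 ;/p" "$demo_dir"/*.h >> "$demo_dir/diagram.dot"
echo "}" >> "$demo_dir/diagram.dot"

# Show the generated graph description.
cat "$demo_dir/diagram.dot"
```

The resulting file contains edges for Circle and Square only. Triangle is skipped because its opening brace sits on the next line, and Shape produces no edge because it has no base class.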

You can use very similar commands for Java:

echo "digraph G {"> diagram.dot
sed -n -e "s/class [ ]*\([a-zA-Z0-9_]*\)[ ]* extends [ ]*\([a-zA-Z0-9_]*\)[ ]*[{][ ]*/\1 -> \2 ;/p" *.java >> diagram.dot
echo "}" >> diagram.dot
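A possible extension, sketched here under my own assumptions and not part of the script above: a second sed pass can pick up classes that implement a single interface and draw those edges dashed. The file Animals.java and its class names are invented for the example, and classes that combine extends with implements, or list several interfaces, are not handled.

```shell
# Hypothetical sample source file, invented for this illustration.
java_dir=$(mktemp -d)
cat > "$java_dir/Animals.java" <<'EOF'
class Animal {
}
class Dog extends Animal {
}
class Cat implements Pet {
}
EOF

{
  echo "digraph G {"
  # Inheritance edges, same pattern as in the script above.
  sed -n -e "s/class [ ]*\([a-zA-Z0-9_]*\)[ ]* extends [ ]*\([a-zA-Z0-9_]*\)[ ]*[{][ ]*/\1 -> \2 ;/p" "$java_dir"/*.java
  # Assumed extension: one interface per class, drawn as a dashed edge.
  sed -n -e "s/class [ ]*\([a-zA-Z0-9_]*\)[ ]* implements [ ]*\([a-zA-Z0-9_]*\)[ ]*[{][ ]*/\1 -> \2 [style=dashed] ;/p" "$java_dir"/*.java
  echo "}"
} > "$java_dir/diagram.dot"

# Show the generated graph description.
cat "$java_dir/diagram.dot"
```

The dashed style is only a visual choice to tell interfaces apart from base classes; any other Graphviz edge attribute would do.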

Enjoy. Please leave a comment for questions or suggestions.

Graph Visualization Software (Graphviz)

I have wanted to post about this cool tool for a long time. If you haven't heard of it yet, maybe you should. Graph Visualization Software (Graphviz for short) is an open-source software package that visualizes graphs, structures, networks, and dependency diagrams.

To show how neat, simple, and useful it is, here are two examples:

An Undirected Graph
Write a text file "graph.dot". We'll define two nodes in our graph, n1 and n2, which are connected.
graph G {
n1 -- n2 ;
}

We compile the file:
neato -Tpdf graph.dot -o undirgraph.pdf

A Directed Graph
Now we try a directed graph: n1 connects to n2, but n2 doesn't connect back to n1.
digraph G {
n1 -> n2 ;
}

Again, we compile:
neato -Tpdf graph.dot -o dirgraph.pdf

You might want to try out the different layout algorithms that come with Graphviz: dot, neato, twopi, circo, and fdp.
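One way to compare the layouts side by side is to render the same file once per program. This is only a sketch: the output names graph-dot.pdf, graph-neato.pdf and so on are my own choice, the files go to a temporary directory, and programs that are not installed are skipped.

```shell
# Write the directed example graph to a temporary directory.
gv_dir=$(mktemp -d)
cat > "$gv_dir/graph.dot" <<'EOF'
digraph G {
n1 -> n2 ;
}
EOF

# Render the same graph with every layout program that is available.
for engine in dot neato twopi circo fdp; do
  if command -v "$engine" >/dev/null 2>&1; then
    "$engine" -Tpdf "$gv_dir/graph.dot" -o "$gv_dir/graph-$engine.pdf" \
      || echo "$engine could not render the graph"
  else
    echo "skipping $engine (not installed)"
  fi
done

# List whatever was produced.
ls "$gv_dir"
```

With only two nodes the layouts differ little; the comparison becomes more interesting on a larger graph such as the class diagram from the other post.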

By the way, you might want to try Inkscape to edit the PDF. You can move network nodes around, change fonts, etc. See my other blog post about how to produce high-quality figures for LaTeX on Linux using scalable vector graphics.

Enjoy. Please leave a comment for questions or suggestions.

Apr 10, 2009

C++ Libraries for Scientific Computing

For my work, I have been comparing different methods for matrix and vector manipulation in C/C++ and other programming languages. In this post, I will focus on three C/C++ packages: GSL, Armadillo, and IT++. I compare them by their utility and ease of use.

C++ is a very fast language for numerical and scientific computing, although the first compilers were not efficient. By the mid-90s it had caught up with Fortran in speed, although some C++ library features (valarray is mentioned) were still slow ([1], [2]).

More recent benchmarks show that, depending on the implementations and libraries you use, C++ can be faster than Fortran ([3], [4]; the latter includes Python). Of course, benchmarks are often skewed. A recurrent criticism is that the implementation for one language is sub-optimal. This is where the Computer Language Benchmarks Game comes in. They define different problems from scientific computing and ask people to submit code in about 30 different languages. They check the programs for correctness, benchmark the code, and publish the best results for each language. C++ is one of the best languages in the comparison. Maybe not surprisingly, Java also fares pretty well; bytecode compilers for Java have improved dramatically since 1995. While there have been claims that Java code would outperform C++, it seems that C++ code is still faster [5].

Now, if there are big differences in speed between C++ libraries, which one should you use?

Many of them use wrappers for LAPACK, ATLAS, and other linear algebra packages or are otherwise optimized for speed.

You can define vectors and matrices in plain C++, but they do not provide a lot of comfort. You need to run loops, index the old way, and do your own bounds checking. Of course, there is the GNU Scientific Library (GSL), written in C like much GNU software, although several wrappers for C++ exist. The GSL syntax is very tedious and counter-intuitive for people used to algebra software such as Matlab/Octave or R. For example, instead of writing a(i)+=b, you have to use a wordy statement like this to add scalar b to float vector a at position i:

gsl_vector_float_set( a, i, gsl_vector_float_get( a, i ) + b );

Adding vectors or matrices as C=A+B is not even implemented; there is only the in-place A+=B [6]. In GSL, matrix addition C=A+B therefore looks like this:

void gsl_matrix_Add( gsl_matrix* C, const gsl_matrix* A, const gsl_matrix* B ){
  gsl_matrix_memcpy( C, A );
  gsl_matrix_add( C, B );
}

I found two packages for numerical computing that look very neat and boast efficient implementations: Armadillo and IT++. Armadillo seems to be mainly the project of one Australian researcher, and IT++ is a project of Chalmers University, Sweden. Both libraries are very easy to install on Linux systems and both projects provide example code. The presentation of Armadillo is excellent, and IT++ provides a conversion table for Matlab code. Armadillo is the newer project, so it may not be as mature, and it doesn't have the many signal processing methods that IT++ has; however, the project claims it is much faster than IT++.

I installed both and made some attempts at benchmarking. I tried to compare IT++, Armadillo, and GSL for addition of double-precision matrices. I tested all three on square matrices with side lengths from 1 to 2000 (step size 100), doing 1000 repetitions. This gave nice plots of matrix size against seconds. At first Armadillo seemed fastest, but there was a lot of variation across trials for all three packages, so I think the results are not conclusive.

What I did conclude, however, was that all were surprisingly fast. I found Armadillo especially easy to use, and it also offers a very good online reference. Armadillo offers import/export to text files. IT++ uses its own binary format, but the project offers a Matlab script to read the data. You can actually use both packages together: Armadillo offers a library for conversions to and from the IT++ matrix and vector formats. Armadillo is my favorite of the three.