Popularity of Python, R, and Matlab

Matlab has been used for years. More recently, with the rise of Linux, open-source community projects such as Python and GNU R have found increasing use, and a recent article in the New York Times described the rise of R. I took the time to create graphs comparing trends in the popularity of these three numerical computing platforms. This could help in predicting which will be the future platform for scientific computing.

People use different software packages for data analysis, including Excel, SAS, SPSS, or Perl. I also have a post about C++ software for number crunching. I could have taken all these alternatives into account; however, I think the desiderata for a language should be these three:
  • It should allow fast prototyping. This includes:
    • It should be a scripting language.
    • It should have a lot of available libraries.
  • It should be cross-platform (at least linux and windows) to be portable.
  • It should be fast.
Python (with SciPy and NumPy), R, and Matlab meet these desiderata. All of them are cross-platform scripting languages. All of them are reasonably fast and allow easy integration of C, C++, and Fortran code for further speed-up. There are a lot of libraries available, especially for R and Matlab (I am not knowledgeable about Python libraries).

So, in this post I want to look at the popularity of Python, R, and Matlab. I first look at the number of citations in scientific publications, then I link to Google Trends to compare visits to the project websites.

To look at the scientific zeitgeist, I compiled the number of citations of the three platforms in scholarly articles, counting them in CiteSeerX and Google Scholar.

Matlab is the oldest software package. Matlab v. 1.0 was released in 1984. Guido van Rossum published version 0.9.0 of Python in 1991. The R mailing list started in 1997, with version 0.16.

Google Scholar has a huge index of scholarly articles and searches within the text of the articles (which not all search engines do). CiteSeerX has far fewer articles, and its article count differs hugely between years, with far fewer articles in the last three years. Therefore the curves were normalized by the total article count for each year.
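The normalization can be sketched in a few lines of Python. This is only an illustration, assuming the yearly counts are available as plain dictionaries; the function name `normalize_by_year` and all numbers are made up and are not the data behind the graphs (which were created in R).

```python
def normalize_by_year(hits, totals):
    """Return, for each year, the hits as a fraction of all indexed articles."""
    normalized = {}
    for year, count in hits.items():
        total = totals.get(year, 0)
        if total > 0:  # skip years for which no total article count is known
            normalized[year] = count / total
    return normalized


# Hypothetical counts for illustration only.
example_hits = {2006: 30, 2007: 24, 2008: 9}            # articles mentioning a platform
example_totals = {2006: 30000, 2007: 20000, 2008: 6000}  # all articles indexed that year

shares = normalize_by_year(example_hits, example_totals)
for year in sorted(shares):
    print(year, f"{shares[year]:.4f}")
```

In this made-up example the raw hits fall from 30 to 9, while the normalized share rises, which is why raw counts alone would be misleading when the index shrinks in recent years.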

I am aware that many hits for a search on "python" are not relevant; however, I think the graphs here can nonetheless give a general idea of Python's relative importance. Results should be taken with a grain of salt.

Google Scholar returns different numbers of hits and different hits over several trials, a fact which I ignored for simplicity. There were also inconsistencies in the "recent articles" feature. You can ask for articles published no earlier than some year, and you would expect that the further back you go, the more articles you find. However, this is not always the case, so the Google Scholar graph is restricted to 2001 onwards.

Data as of January 8th, 2009.


Here are the graphs (click to enlarge):

CiteSeerX



Google Scholar



Matlab comes out as the big winner, while both graphs show that R and Python are on the rise. CiteSeerX does not have as many citations as Google Scholar (between 1,500 and 30,000 for each year between 1990 and 2008). Because of big fluctuations in the Matlab curve in the Google Scholar count, I do not dare to draw any conclusion about Matlab's trend.

Alexa and Google Trends, among other services, allow you to compare websites. Here is the link to Google Trends' comparison between R, Matlab, and Python. Surprisingly, Python and Matlab are not that different in page visits.

--
Graphs created in R. Thanks to Statsmethods.net for the explanation of the plot and points functions in R.

Geniuses

Often when I hear people at public events talking admiringly about geniuses, I would like to cover my ears. I think the media promote the image of the child genius, of "the gifted," who were given something by some divinity (or genes) that normal people don't have. I have the impression that many people are too willing to repeat this view because it justifies their denial of the fact that they didn't take responsibility for their lives and didn't work as hard as the so-called genius.

I think the word genius is devoid of meaning, or rather, for me it tells more about the speaker than about the object. I prefer the rags-to-riches archetype, "if you work hard enough, you can make it" (whatever "it" refers to; I never read any novels by Horatio Alger), though I recognize there are many factors in becoming good at something. Francis Galton's exposition in Hereditary Genius forms the other extreme of this conceptual dimension. According to this second view, we are either born with a talent (more generally: IQ) or not. If we are not, then that's the end of the story; some people are, and they are famous, such as Mozart (for many the incarnation of the child genius).

Malcolm Gladwell's book provided the discussion of this idea that I had needed. In Outliers he argues that so-called geniuses, such as Mozart and others (he gives more examples: the Beatles, Bill Joy, Bill Gates; admittedly I didn't like that last example, because of Microsoft's negative corporate image and the perceived worth of their products), could only be successful because of a mix of cultural and family background, hard work, and timely opportunities presenting themselves.

By hard work Malcolm Gladwell refers to the 10,000-hour rule, which says that you need about 10,000 hours of practice to become very proficient in a field. 10,000 hours is about 3 hours daily for 10 years (forget about holidays; you can slack off about every second Sunday). To reach 10,000 hours in little more than 3 years (say, for a PhD) you would need 12 hours daily (with free weekends). Go to work!
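For anyone who wants to check the arithmetic, here is a small Python sketch. The helper name `years_needed` is made up, and the day counts are my reading of the two scenarios: 26 Sundays off per year in the first case, and 5 working days in each of 52 weeks in the second.

```python
TARGET_HOURS = 10_000


def years_needed(hours_per_day, days_per_year):
    """Years required to accumulate TARGET_HOURS at the given pace."""
    return TARGET_HOURS / (hours_per_day * days_per_year)


# 3 hours a day, slacking off every second Sunday (about 26 days a year).
slow = years_needed(3, 365 - 26)

# 12 hours a day with free weekends (5 days a week, 52 weeks a year).
fast = years_needed(12, 5 * 52)

print(f"3 h/day:  {slow:.1f} years")
print(f"12 h/day: {fast:.1f} years")
```

Under these assumptions the first pace takes roughly 9.8 years and the second roughly 3.2 years, which matches the rough figures in the text.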

Edit PDF Files with pdfedit

Collaborative editing can be painful, and in the end you might end up making some changes to PDF versions of submitted files. pdfedit comes in handy. I spent some time searching the internet (very high noise level for this topic) while the program kept complaining "document is read-only." How do you actually change files? Answer: tools > delinearize, open document, save document under new name, open again. The editor is comfortable enough (although the UI is kind of messy).

P.S.: Inkscape is a very comfortable editor. It allows import and export of many image formats including PDF and EPS. See my blog entry about creating print-quality figures with SVG, where I tell more about Inkscape.

LaTeX Poster Template (BAposter)

You can make great posters with LaTeX. However, hand-positioning elements through coordinates and adjusting sizes and margins can be very time-consuming and awkward. For quite some time I have wanted to post about a particular LaTeX poster template. I used it for two posters at ECRO 2008 and later for a poster at FET 2009.

What I like about this template is that it is very elegant and that conversion from article to poster is effortless. You organize the poster in boxes, each with a header, which you align relative to each other using keywords such as below, top, column, row. A code snippet from my FET poster:

\headerbox{Conclusions}{name=conclusions,column=1,below=discussion}{
\begin{itemize}
\compresslist
\item Some properties affect odor coding very strongly (sulfur-containing functional group).
\item Bond saturation also seems very relevant (alkyne, alkane, and alkene).
\item Carboxylic acid and aromatic groups still seem to be important.
\item Results partly confirm relevance of dimensions of molecular properties suggested by Johnson and Leon.
\end{itemize}
}

The image below shows a poster from ECRO 2008. I hope you can see the poster in the correct colors. The title bars should be blue.

The first figure on the left is made in GNU R and is so much better than the others, which were created in Matlab! The main problem with the Matlab figures seems to be Matlab's inability on Linux to save figures with transparent backgrounds. I have learned a lot about exporting figures in the meantime. See my article on exporting figures from LaTeX.


Thanks to Brian Amberg for his template, called baposter (ba stands for his initials, I suppose).

For alternatives, another poster template which looks quite usable (I haven't tried it) is A0-poster. See also the article about Scribus posters on linux.com. I found Scribus poster templates at scribus post and on the official Scribus site.

What is your experience with poster presentations in LaTeX?


You might also be interested in my article on exporting figures from Matlab. I also wrote articles about exporting data from Matlab and about how to get good combinations of linestyles and colors for plots.

Mythical Man Month

I am reading The Mythical Man-Month, a book by Fred Brooks about the management of software projects. The titular man-months refer to calculations in project planning that treat men and months as interchangeable. It is in this book that Brooks famously quips: "Adding manpower to a late software project makes it later."

The book is a great read, short and to the point. Brooks shares his experiences managing software projects, among others OS/360, an operating system. OS/360 was very ambitious and very late. One of Brooks' lessons is the second-system effect, which predicts (I am simplifying) that the second system a designer builds tends to be over-engineered, because it gets loaded with all the ideas that were held back from the first one.

Each chapter starts with a quote, such as this one from Genesis:
Chapter 11: 1 Now all the earth had one language and the words were common [to all].
2 And men moved east, and they found a plain in the land of Shinar and they settled there.
3 And they said to each other, each to their fellow, "Come. Let us make bricks, and let us bake them with fire." They used brick instead of stone and tar instead of mortar.
4 Then they said, "Come. Let us build a city for ourselves, and a tower, the top of which reaches the heavens. So let us make a name for ourselves lest we be scattered over the face of the entire earth."
5 But Yahweh came down to see the city and the tower that the sons of man had built,
6 And Yahweh said, "See! The people are one and their language is the same for all of them. And now they have begun to do this; in the future nothing that they plan to do will be impossible for them.
7 Come. Let us go down and let us confuse their language so that they will not understand each other's language, each will not understand their fellow."
8 So Yahweh scattered them from there over the face of the entire earth, and they stopped building the city.
9 Because of this, the name of the city is called "Babel," because Yahweh confused the language of the entire world, and Yahweh scattered them from there over the face of the entire world.

Bible translation by Richard Hooker.

I wonder what other people think of this Bible passage.