How to Migrate a Subversion Repository

Subversion is a program that allows you to store different versions of files, so you can revert edits, restore deleted files, and much more. If you have never used Subversion before, you might want to read my introduction to Subversion.

Migrating an svn repository from one server to another takes only a few simple steps, which I describe in this post.

As I mentioned in several other posts, we have had severe problems with power outages in Barcelona. For a busy PhD student who needs a usable remote server, that is not an option. I discussed alternatives in my post on commercial remote backup servers, and I am now moving my Subversion repositories to the new server.

First step: copy the repository from oldserver to newserver. There are several ways to do this. The simplest is to use rsync, scp, or similar tools (optionally via a dump file created with svnadmin dump on the old server and loaded with svnadmin load on the new one).

Second step (on each client, inside the working copy):
svn switch --relocate oldurl newurl

See also my posts on basic subversion commands and automated backup using subversion.


Wikipedia Reaches 3 Million Articles

The English Wikipedia reached the 3 million article mark today. This is undoubtedly a sign of its continuous expansion and an opportunity to examine some numbers and talk about Wikipedia's future growth. In this post I look at how Wikipedia's growth has progressed.

First, see Jimmy Wales, its co-founder, talk about the 3 million article milestone.



Recently it was discussed on Slashdot whether Wikipedia is approaching the limit of its growth. Slashdot members were connected to the early growth of Wikipedia, and I think it's safe to say that many of them have contributed to Wikipedia at least once. Many comments in the discussion complained about disruptive behavior by other Wikipedia contributors and the increasingly closed nature of its community.

Has Wikipedia reached its limits? Let's look at the numbers.

I took data from the Wikipedia pools, where people bet on when certain milestones will be reached, and plotted it. The abscissa gives the number of days since Wikipedia's inception (taken as January 16th, 2001), the ordinate the number of articles.


Admittedly, these are very crude statistics consisting of only 5 points, but they were quick to plot. I remember that back in 2005 or early 2006 there was a discussion on the Wikipedia statistics page about whether Wikipedia's growth was exponential, with more credibility given to the exponential-growth hypothesis; however, most of the graphs there are outdated. To get more data I could have downloaded the whole Wikipedia history (several GB) or taken data from the milestone page, but I think this plot warrants the conclusion: the growth is definitely not exponential, at least since early 2006.
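The "not exponential" conclusion can also be checked with a little arithmetic instead of by eye. Under exponential growth N(t) = N0 exp(r t), the rate r = ln(N2/N1) / (t2 - t1) would be the same for every interval between milestones. The following C++ sketch computes r for each interval of the pool data. The day counts are days since 16 January 2001 worked out from the pool dates (leap days included), and the names Milestone, pool_milestones, growth_rates and doubling_time are made up for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One milestone: days since 16 January 2001 and the article count reached.
struct Milestone {
    double day;
    double articles;
};

// Milestones from the Wikipedia pools (17 Mar 2005, 4 Aug 2005,
// 1 Mar 2006, 9 Sep 2007, 17 Aug 2009), converted to days since
// 16 January 2001 with leap days counted.
std::vector<Milestone> pool_milestones() {
    return {{1521, 500000}, {1661, 666666}, {1870, 1000000},
            {2427, 2000000}, {3135, 3000000}};
}

// Continuous growth rate per day, r = ln(N2 / N1) / (t2 - t1), for each
// pair of consecutive milestones. Pure exponential growth would give the
// same r for every interval.
std::vector<double> growth_rates(const std::vector<Milestone>& m) {
    std::vector<double> rates;
    for (std::size_t i = 1; i < m.size(); ++i) {
        assert(m[i].day > m[i - 1].day);  // milestones must be in time order
        rates.push_back(std::log(m[i].articles / m[i - 1].articles) /
                        (m[i].day - m[i - 1].day));
    }
    return rates;
}

// Doubling time in days implied by a growth rate r (per day).
double doubling_time(double r) { return std::log(2.0) / r; }
```

For these five points r falls from one interval to the next, i.e. the doubling time keeps getting longer, which is what slower-than-exponential growth looks like. With so few points this remains a crude check.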

This probably means that there are bottlenecks to its growth. Growth itself, measured here as the accumulation of new articles, is not necessarily positive. Within the Wikipedia community there are essentially two factions regarding new articles:

  • the inclusionists, who want to include more articles (and relax the policies on notability), and
  • the exclusionists, who want stricter controls on which articles are included in Wikipedia. They favor quality over quantity.
Whether quality is actually improving is another question that I will not look at in this post.  

Matlab scripts used to calculate and plot
statcalc.m
% data taken from http://en.wikipedia.org/wiki/Wikipedia:Pools
pools={'March 17, 2005','August 4, 2005','March 1, 2006','September 9, 2007','August 17, 2009'};
number_of_articles=[500000,666666,1000000,2000000,3000000];
% preallocate; the name days_passed avoids shadowing Matlab's days function
days_passed=zeros(size(number_of_articles));
for i=1:numel(pools)
    days_passed(i)=since_project(pools{i});
end
figure; plot(days_passed,number_of_articles,'*-');
ylabel('number of articles reached'); xlabel('days passed since 16 January 2001');


since_project.m
function d=since_project(date_str)
% calculates the time passed (in days) since Wikipedia exists
% uses the time_passed function
% Wikipedia exists since 16 January 2001, see http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_records
% (the argument is called date_str to avoid shadowing Matlab's datestr function)
d=time_passed('16 January 2001',date_str);


time_passed.m
function d=time_passed(date1str,date2str)
% calculates the time passed between two dates in days
% dates are strings GNU date can parse, e.g. '16 January 2001' or 'March 17, 2005'
% used for wikipedia statistics
% each date is converted to seconds since the epoch (UTC), so leap years are counted
[status,s1]=unix(['date -u -d "' date1str '" +%s']);
[status,s2]=unix(['date -u -d "' date2str '" +%s']);
d=(str2double(strtrim(s2))-str2double(strtrim(s1)))/86400;
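Counting days as "years times 365 plus the difference in day of year" drops the leap days of 2004 and 2008 (for 17 August 2009 it gives 3133 instead of 3135). For a cross-check that does not shell out to date, here is a minimal C++ sketch using Howard Hinnant's days_from_civil algorithm, which converts a proleptic Gregorian date into a day count. The function names are illustrative; since_project simply mirrors the Matlab file of the same name.

```cpp
#include <cassert>

// Days from 1970-01-01 to the given proleptic Gregorian date
// (Howard Hinnant's days_from_civil algorithm). Leap days are counted
// exactly by working in 400-year eras that start on 1 March.
long days_from_civil(long y, unsigned m, unsigned d) {
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
    y -= m <= 2;  // Jan and Feb count as months 11 and 12 of the previous year
    const long era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // year of era [0, 399]
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // day of year [0, 365]
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // day of era [0, 146096]
    return era * 146097 + static_cast<long>(doe) - 719468;
}

// Days passed since Wikipedia's start date, 16 January 2001.
long since_project(long y, unsigned m, unsigned d) {
    return days_from_civil(y, m, d) - days_from_civil(2001, 1, 16);
}
```

For example, since_project(2009, 8, 17) yields the day count of the 3 million article milestone.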

Best Games in Linux

While Linux is a great operating system for computer geeks, it has been a recurring tongue-in-cheek joke among them to proclaim the Year of the Linux Desktop, given a market share below 10%. The main obstacles to broader acceptance of Linux used to be usability (especially ease of installation), hardware driver support, and the availability of commercial applications. All these points are changing, and most of them are no longer true. In this post I write about Linux gaming, where great progress has also been made.

While many commercial games, mostly Windows-based, can be played on Linux through the Wine compatibility layer, and older DOS games through DOSBox, there are also a lot of free games out there, free as in speech and as in beer. In this post I link to some of the free games that I liked best.

Many games are already included in the standard installation of Ubuntu and other Linux distributions. If you are looking for great Linux games, the Wikipedia article on Linux gaming is a good starting point. Recommendations and reviews of Linux games can be found, for example, on WHDb, linuxlinks, the Linux Journal, or strangegamer.

My personal favorites include these (some of them are available in the Ubuntu repositories):
Eat the Whistle, a football arcade game
Freeciv, a highly addictive clone of Sid Meier's Civilization
FreeCol, a clone of Sid Meier's Colonization
Bygfoot, a football manager
Ur-Quan Masters (aka Star Control 2), a science fiction strategy adventure
Widelands, a real-time strategy game similar to Settlers II
Battle for Wesnoth, a turn-based strategy game

The Ren'Py game engine makes it easy to create visual novels, and many games have already been developed for it. Many of them are bishōjo games, i.e. they are aimed at young adults; however, a lot of them are quite enjoyable. I especially liked:
Winter Shard
Senior Year
Real Life
Heiress II
Daemonophilia
The Dreaming

ScummVM, a GPL-licensed reimplementation of the SCUMM engine that LucasArts originally designed for its adventure games, lets you play many point-and-click adventure games, mostly from the 90s. Examples include Monkey Island, Maniac Mansion, and Zak McKracken by LucasArts, and Police Quest, King's Quest, and Leisure Suit Larry by Sierra On-Line. Two great free games that run under ScummVM and can be downloaded from the ScummVM project page are Beneath a Steel Sky and Flight of the Amazon Queen.

BTW, many of the old Sierra adventures can be played online, at sarien.net.

To give you more ideas, these two videos show 18 games that can be installed from the Ubuntu package manager.





End of the World Predictions

Popular media are full of predictions of humanity's doom. Many people are concerned about environmental collapse. Most dire predictions, I think, tend to focus on a single threat. In his TED talk, Stephen Petranek (former editor of Discover) compiles several threats to humanity, together with some recipes for avoiding impending catastrophe.



Whether you take it seriously or not, looking into the threats is worth a few minutes. If nothing else, they might be good for some chuckles. Here is Petranek's list in his order of urgency (together with some references to Wikipedia articles):

1. Meteoroid hits the earth.
2. Rogue black hole.
3. Global epidemic.
4. Giant solar flares.
5. Reversal of the earth's magnetic field
6. Biotech Disaster. He mentions genetic manipulation of food.
7. Particle Accelerator Mishap.
8. The Ecosystem Collapses.
9. Aliens invade earth.
10. We lose the will to survive.

The last point was popularized by The Decline and Fall of the Roman Empire, The Decline of the West, and Planet of the Apes.

Is this the end of humanity?

Mex Function for the Mahalanobis Distance

In my last post I showed how to use the Armadillo library to write mex functions that are both fast and clean. Armadillo is a C++ library with a syntax similar to Matlab's.

Here I give an example of a mex function using armadillo. It computes the Mahalanobis distance and is based on the template from the last post.
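The Armadillo-based mex source is not reproduced in this copy of the post. As a rough guide to what such a function computes, here is a plain C++ sketch (no Armadillo, no mex.h; the names SquareMatrix and mahalanobis are hypothetical) of the Mahalanobis distance d(x) = sqrt((x - mu)^T S^-1 (x - mu)), where mu is the mean and S the covariance matrix. Instead of inverting S it factors S = L L^T (Cholesky) and solves L y = x - mu, so d(x) is the Euclidean norm of y. It assumes S is symmetric positive definite and stores it column-major, as Matlab does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Square matrix stored column-major (like Matlab): element (i, j) is data[i + j * n].
struct SquareMatrix {
    std::size_t n;
    std::vector<double> data;
    double operator()(std::size_t i, std::size_t j) const { return data[i + j * n]; }
};

// Mahalanobis distance of x from mu under covariance S.
double mahalanobis(const std::vector<double>& x, const std::vector<double>& mu,
                   const SquareMatrix& S) {
    const std::size_t n = S.n;
    assert(x.size() == n && mu.size() == n && S.data.size() == n * n);

    // Cholesky factor L (lower triangular, column-major) with S = L L^T.
    std::vector<double> L(n * n, 0.0);
    for (std::size_t j = 0; j < n; ++j) {
        double diag = S(j, j);
        for (std::size_t k = 0; k < j; ++k) diag -= L[j + k * n] * L[j + k * n];
        if (diag <= 0.0) throw std::runtime_error("covariance is not positive definite");
        L[j + j * n] = std::sqrt(diag);
        for (std::size_t i = j + 1; i < n; ++i) {
            double s = S(i, j);
            for (std::size_t k = 0; k < j; ++k) s -= L[i + k * n] * L[j + k * n];
            L[i + j * n] = s / L[j + j * n];
        }
    }

    // Forward substitution L y = x - mu, accumulating |y|^2 along the way.
    std::vector<double> y(n);
    double sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double s = x[i] - mu[i];
        for (std::size_t k = 0; k < i; ++k) s -= L[i + k * n] * y[k];
        y[i] = s / L[i + i * n];
        sq += y[i] * y[i];
    }
    return std::sqrt(sq);
}
```

With the identity as covariance this reduces to the Euclidean distance, which makes a handy sanity check for any mex version.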



Enjoy. Please leave a comment below for suggestions and questions.

Fast Scientific Computation in Mex Functions Using Armadillo

Typical situation: you run a Matlab script for some simulation or experiment and need it to run faster. With the profiler you find a function that takes a lot of time, and you want to optimize it by writing it in C/C++ and wrapping it as a mex function.

In an earlier post I reviewed different C/C++ libraries for scientific computing (among them IT++, GSL, and others) and concluded that the Armadillo linear algebra library offers a lot of functions, is very easy to use, and is very fast. So I thought it would be very nice to make it work within mex functions. This would make C++ functions both legible and easy to write while keeping them fast.

I managed to do this and in this post I explain how.

It is somewhat of a hack, but it works smoothly, although getting it to compile was a hassle at first and took some time. See the end of this post on how to compile.

The trick is to first create an Armadillo matrix and then redirect its data pointer to the input array from Matlab (the dimensions of the matrix have to be adapted as well, otherwise you get a segfault). When the function ends, Armadillo wants to destroy its matrices, so before that you have to restore the original pointer and dimensions (or else...).
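The pointer swap works, but it is fragile. Armadillo's documentation also describes an advanced constructor, mat(ptr_aux_mem, n_rows, n_cols, copy_aux_mem = true, strict = false), which with copy_aux_mem set to false uses existing memory directly and may make the swap unnecessary; check the documentation of your Armadillo version. The underlying idea, a matrix object that borrows Matlab's column-major buffer instead of owning it, is sketched below in plain C++ (no Armadillo, no mex.h; ColumnMajorView and scale_in_place are hypothetical names). In a real mex function the pointer would come from mxGetPr and the dimensions from mxGetM and mxGetN.

```cpp
#include <cassert>
#include <cstddef>

// Non-owning view of a column-major matrix, the layout Matlab uses.
// The view only borrows the pointer: it never allocates or frees memory,
// so the owner (Matlab) stays responsible for it and nothing is freed twice.
class ColumnMajorView {
public:
    ColumnMajorView(double* data, std::size_t rows, std::size_t cols)
        : data_(data), rows_(rows), cols_(cols) {}

    // Element (i, j) lives at data[i + j * rows] in column-major order.
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows_ && j < cols_);
        return data_[i + j * rows_];
    }
    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    double* data_;
    std::size_t rows_;
    std::size_t cols_;
};

// Example operation on borrowed memory: scale every element in place.
// The view is passed by value; copies share the same underlying buffer.
void scale_in_place(ColumnMajorView m, double factor) {
    for (std::size_t j = 0; j < m.cols(); ++j)
        for (std::size_t i = 0; i < m.rows(); ++i)
            m(i, j) *= factor;
}
```

Because the view has no destructor that touches the buffer, there is nothing to restore before returning, which is exactly the step that makes the pointer-swapping hack error-prone.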

Here is a toy example of a mex file that illustrates how to use this with Matlab.


Compilation:

If you don't have it yet, take 5 minutes to install the armadillo library.

For compilation you should include the additional compiler switches -lcblas -DARMA_NO_DEBUG. In an ideal world you would only have to type this:

>> mex toy_example.cpp -lcblas -DARMA_NO_DEBUG

Unfortunately this is not an ideal world. While this compiled for me without problems, whenever I executed the function within Matlab I got this error:

invalid mex file.
`GLIBCXX_3.4.9' not found


I first symlinked the Matlab-provided shared libraries (.so files) to the ones of my system. I am not sure this is a good idea, and anyway it didn't solve the problem. I then heavily edited the mexopts.sh file, which passes the parameters to the mex compiler; this did not help either, since mex didn't seem to pick up these parameters. I finally compiled from the bash prompt with a variation of the commands that mex itself runs (use the -v switch with mex to see them). The commands that work for me are these (you might have to adapt them slightly):

$ g++ -c -I/opt/matlab/extern/include -I/opt/matlab/simulink/include -DMATLAB_MEX_FILE -ansi -fPIC -fno-omit-frame-pointer -pthread -DMX_COMPAT_32 -O3 -DNDEBUG -DARMA_NO_DEBUG "toy_example.cpp"
$ g++ -c -I/opt/matlab/extern/include -I/opt/matlab/simulink/include -DMATLAB_MEX_FILE -ansi -D_GNU_SOURCE -fexceptions -fPIC -fno-omit-frame-pointer -pthread -DMX_COMPAT_32 -O3 -DNDEBUG "/opt/matlab/extern/src/mexversion.c"
$ g++ -O3 -pthread -shared -Wl,--version-script,/opt/matlab/extern/lib/glnxa64/mexFunction.map -Wl,--no-undefined -o "toy_example.mexa64" toy_example.o mexversion.o -Wl,-rpath-link,/opt/matlab/bin/glnxa64 -L/opt/matlab/bin/glnxa64 -lmx -lmex -lmat -lm -lcblas

I hope you enjoy using this. Please also see my mex implementation of the Mahalanobis distance.

Happy coding! Please leave comments below.