Jul 29, 2009

Microsoft's Bing will cater to Yahoo Search

Microsoft's search engine Bing, released this year, has been promoted heavily and its market share is on the rise, now at about 5-7 percent. After the failed acquisition attempt, Microsoft has now reached a deal with Yahoo to provide Yahoo's search results. Yahoo will profit by taking over some ad sales (see article). This agreement, if approved by regulators and when ultimately implemented, would boost Microsoft's search market share to about 25-30% (by June/July data). As for Yahoo, PC World's headline reads: "Yahoo is over."

It would be good to have more competition in the search market. Any move by Microsoft to take market share from Google through competition (e.g. Microsoft Office Online) brings an immediate advantage to everybody. One question for me, however: will this move significantly affect (read: slow down) the market introduction of Google Chrome OS, which many hope could bring more competition to the OS market?

Anyway, have you tried the blind test? You enter search terms and get back results in three columns corresponding to Bing, Yahoo, and Google, only you don't know which is which. You choose the best search engine and then see the branding.

Jul 27, 2009

Introduction to Data Mining

I just found a paper called Top 10 algorithms in data mining. Through a mixture of expert committee and voting, the IEEE International Conference on Data Mining (ICDM) identified the 10 most influential data mining algorithms. These are C4.5, k-means, support vector machines, Apriori, expectation maximization, PageRank, AdaBoost, k-nearest neighbors, Naive Bayes, and CART. The algorithms are explained in the paper.

As for other resources, if you only want a very short first introduction, start with the Wikipedia article. One of the reference books on data mining is Pattern Classification by Duda, Hart, and Stork.

Jul 26, 2009

Singularity is Coming (or not)

Some time ago, I read Ray Kurzweil's book The Age of Spiritual Machines. Now I saw his talk at TED. There is also a documentary out, The Transcendent Man (official site). The message is very simple and often repeated: we are developing faster. He is the techno-cult guru, and his message may be a consolation for people afraid of technological change or afraid of dying. In this post I will look at some points of his talk at TED.

Watch Ray Kurzweil talk at TED. His talk is titled "how technology will transform us."

I found a joke while I was peering through some articles and I found it very to the point. Credit goes to slashdot commentator L. VeGas [1]:
Pessimist: "That glass is half empty."
Optimist: "That glass is half full."
Kurzweil: "The self-cloning milk in that glass will replicate thanks to nanobots and end world hunger."

Kurzweil basically shows graphs to prove that we are approaching the "singularity". The concept of singularity denotes a point in time where change accelerates very very fast (sloppy wording intended). It seems obvious that growth is accelerating (maybe every generation experiences it anew), and the notion of singularity is very fuzzy. Basically Kurzweil is doing a technical chart analysis, extrapolating the graphs into the future and saying "hey, see how fast!" This was criticized as an exponential growth fallacy by Paul Davies in Nature, reviewing Kurzweil's book The Singularity is Near.
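To see why extrapolating a growth chart can mislead, here is a toy sketch in Python. It is not Davies's argument and not Kurzweil's data. It simply assumes a process that is actually logistic (it saturates at a made-up capacity of 100), fits an exponential through two early points, and compares both curves later on. All names and numbers are hypothetical.

```python
import math


def logistic(t, capacity=100.0, rate=1.0, midpoint=10.0):
    """A saturating growth curve; all parameters are made up for illustration."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


# "Chart analysis": fit an exponential through two early observations.
t0, t1 = 0.0, 2.0
y0, y1 = logistic(t0), logistic(t1)
growth = math.log(y1 / y0) / (t1 - t0)


def extrapolated(t):
    """Exponential that passes exactly through the two early observations."""
    return y0 * math.exp(growth * (t - t0))


# Early on the two curves agree; later the extrapolation runs away
# while the real process levels off near its capacity.
for t in (2, 5, 10, 15, 20):
    print(t, round(logistic(t), 3), round(extrapolated(t), 3))
```

Early in the curve the two are practically indistinguishable, which is the whole problem: the early data alone cannot tell you whether growth will keep accelerating or level off.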

However I like to see the graphs and since they are licensed under creative commons (attribution), I wanted to put one here on my blog to see it more often. So the graph below was created by Ray Kurzweil and I found it on wikipedia.

Jürgen Schmidhuber, an eminent German scientist in artificial intelligence, also shows historic dates on his history page and argues for an omega point. However, he cautions that this could be a cognitive bias, since we remember recent dates better than more ancient ones, and notes that historians in the 16th century thought that the speedup of breakthroughs, namely Western bookprint (around 1444), the re-discovery of America (1492), and the Reformation (1517), indicated a convergence of history.

The growth in knowledge is due to so-called second order feedback factors of technological growth (increase in the carrying capacity of land, better health care, demographic growth, more inventions, ..., see wikipedia entry on Andrey Korotayev). I found graphs of market penetration of technologies together with an analysis of these factors such as found in The Skeptical Environmentalist, see earlier post, far more useful than chart hanami.

An interesting point he makes is that transhumanism will come unavoidably with technological progress, but this will be the topic of another post.

Jul 21, 2009

Produce Print-Quality Figures from Matlab

This post explains how to create print-quality figures in Matlab. The process is efficient and results in editable vector-graphics that you can include in any document including Microsoft Word, Open Office, or latex. Before you create your figures you might also want to check my article on combinations of linestyles and markers to have clear plots in black-and-white and color.


There are several ways to export figures from Matlab for a publication. On Windows, you can use the figure menu in Matlab to copy the figure to the clipboard, edit any figure property you like in PowerPoint or Word, and then paste the result into your Word document. On Linux you would usually export to EPS to get the highest quality images.

EPS offers potentially the best quality images, however it brings problems especially with font size, figure size, and transparency. Increasing font sizes might overlay multiple layers of the same text. The bounding box is often messed up and you might want transparency. Laprint, an export tool for matlab, is a work-around for the font size problem, however legends do not get exported correctly and the figures still miss transparency. In this post I discuss a way to produce high-quality figures avoiding all those problems.

One way around the problems with Matlab is not using Matlab. GNU R produces very high quality figures, however R also has its own limitations. How to get the right font size and transparency with Matlab? Don't worry, there is a good solution that gives high quality figures.


I just rediscovered the SVG format. It is really easy to retouch EPS images or import SVGs. SVG, short for scalable vector graphics, is an XML-based file format for two-dimensional vector graphics. This makes it really easy to read the files and edit them from the command line using sed, grep, awk, etc.

Why should you learn to use a vector format? Because the quality is very high!

If you export your Matlab figure to EPS, the problem is then how to edit the image files. If you edit them with a raster editor, for example the GIMP, the quality gets completely lost. In contrast, if you export from Matlab to SVG, you can edit your SVG file with Inkscape or one of the other editors, much as in any desktop publishing tool. You can directly change line attributes, change text, and change the bounding box very comfortably.

There are free vector graphics editors with functionality similar to Adobe InDesign, such as Inkscape, and many other tools to edit and create SVGs, including OpenOffice, Xfig, Scribus, and gnuplot. Konqueror, Firefox, and, since recently, Internet Explorer can display SVGs. Editing SVGs in Inkscape offers all the comforts of a modern desktop publishing tool. See this Inkscape tutorial for more info about editing.

SVG in Matlab

For matlab, there is the package plot2svg, which allows you to export figures from matlab to SVG. In order to export your figure instead of exportfig(1,'filename.eps','width',5,'color','rgb'); you do plot2svg('filename.svg',1).

There are small caveats to using Inkscape that bring complications, which however can be easily overcome. If you export directly from Inkscape to EPS, transparent colors might turn black, so that you don't see anything. This also occurs when exporting to LaTeX (pdftricks).

There is a solution: convert the edited SVG first to PDF and then convert from pdf to eps using e.g. ImageMagick. You can integrate eps figures with most word processors and it would be the default image format with latex (texi2dvi).

Breakdown so far:
  1. (matlab) export to svg
    >> plot2svg('filename.svg',1)
  2. edit in svg editor (e.g. inkscape), export to pdf
    inkscape -f filename.svg --export-pdf=filename.pdf
  3. convert pdf to eps
    convert filename.pdf filename.eps
This doesn't help with the font size problem yet. But you can automate the editing and conversion process, which reduces the problem considerably.
  1. Change font size: sed -i "/font-size/s/[0-9][0-9]*pt/14pt/g" filename.svg. In this case, you would change your font size to 14pt.
  2. Convert from SVG to pdf: inkscape -f filename.svg --export-pdf=filename.pdf
  3. Convert from pdf to eps: convert filename.pdf filename.eps
Using this method, you'll have nearly the same comfort as with Matlab's copy-figure function on Windows and get figures of much higher quality because of the vector format.
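If sed is not available (for example on Windows), the font-size step can be sketched in Python. This mirrors the sed expression above rather than parsing the XML: on every line that mentions font-size, it replaces each number followed by pt. The function name and the sample SVG snippet are made up for illustration.

```python
import re


def set_svg_font_size(svg_text: str, new_size_pt: int) -> str:
    """Rewrite 'NNpt' values on every line that mentions font-size.

    Like the sed one-liner, this touches all 'NNpt' values on a matching
    line, not only the one that follows 'font-size:'.
    """
    out_lines = []
    for line in svg_text.splitlines(keepends=True):
        if "font-size" in line:
            line = re.sub(r"[0-9]+pt", f"{new_size_pt}pt", line)
        out_lines.append(line)
    return "".join(out_lines)


# Hypothetical two-line SVG fragment, only for demonstration.
sample = (
    '<text style="font-size:8pt;fill:#000000">x axis</text>\n'
    '<rect width="20pt" height="10pt"/>\n'
)
print(set_svg_font_size(sample, 14))
```

To apply it to a real file, read the file into a string, pass it through the function, and write the result back before running the Inkscape and convert steps.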

Alternative EPS Export

In a comment to my post LaTex Poster Template, Marcus vLW pointed out a method for exporting to EPS from matlab, which impressed me a lot, so I want to reshare his method here. It seems to give proper transparency.
    h=figure('position',[100 0 1100 1100],'paperpositionmode','auto', 'color','none','InvertHardcopy','off');
    % plot something...
Enjoy. Please vote this post up if you liked it and leave a comment below for questions and suggestions. 

Jul 16, 2009

Collaborative Recommender System

I was thinking about some business ideas recently.

One idea was to give users recommendations based on collaborative ratings. It would use nearest neighbor search to find items which are probably liked by similar people. Similarity is evaluated by metrics.
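A minimal sketch of the idea in Python, under several assumptions: the ratings are a tiny made-up dictionary, the similarity metric is cosine similarity (one choice among many), and unseen items are scored by the similarity-weighted ratings of the k nearest users. None of this reflects how Amazon or Netflix actually do it.

```python
from math import sqrt

# Made-up ratings (user -> item -> score from 1 to 5), for illustration only.
ratings = {
    "ann": {"Dune": 5, "Neuromancer": 4, "Emma": 1},
    "bob": {"Dune": 4, "Neuromancer": 5, "Snow Crash": 5},
    "cat": {"Emma": 5, "Persuasion": 4, "Dune": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=1):
    """Rank items the user has not rated, using the k most similar users."""
    others = [
        (cosine_similarity(ratings[user], their_ratings), name)
        for name, their_ratings in ratings.items()
        if name != user
    ]
    neighbours = sorted(others, reverse=True)[:k]
    scores = {}
    for similarity, name in neighbours:
        for item, rating in ratings[name].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann", ratings, k=2))
```

With real data the brute-force comparison against every other user would be too slow, which is where a proper nearest neighbor index would come in.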

This is already implemented for books on amazon.com, where you can get a message "users who looked at [...] also looked at." Netflix, a US movie rental service, uses the Cinematch algorithm and offers a prize of $1,000,000 for the first algorithm that beats Cinematch by at least 10% in accuracy.

I haven't seen this functionality offered for free for book lists or movie lists. I want a service where you rate some items in a category (movies, books, or music groups; say books) and get back recommendations for books that people with similar preferences enjoy. The emphasis would be on usability, no cost, and speed.

I thought about writing a database backend and integrating the software with Facebook, Android, and iPhone. For Android there would be the incentive of $250,000; however, there's little time left now until the Android Developer Challenge ends in mid August.

I decided to focus on other things now, anybody interested?

Jul 15, 2009

Algebraic Sets in Latex

Again, it took me several minutes to find the command for element of in latex. For the record, it's \in. Here are some more useful commands for common sets in algebra.

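\newcommand\mathnumsetfont{\mathbb} % assumption: blackboard bold; the original definition is not shown here
\newcommand\Nset{\mathnumsetfont N} % set of positive integer numbers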
\newcommand\Zset{\mathnumsetfont Z} % set of integer numbers
\newcommand\Qset{\mathnumsetfont Q} % set of rational numbers
\newcommand\Rset{\mathnumsetfont R} % set of real numbers
\newcommand\Cset{\mathnumsetfont C} % set of complex numbers
\newcommand\Hset{\mathnumsetfont H} % set of quaternions

$M\in \Rset{}^{308,1834}$

You need to load the packages amsmath and amssymb to use this.
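For completeness, a minimal document that should compile on its own might look like the sketch below. Note the assumption: the macro \mathnumsetfont is not defined anywhere in this post, so the sketch maps it to \mathbb, which is only a guess at the intended font. The example sentences are made up.

```latex
\documentclass{article}
\usepackage{amsmath}
\usepackage{amssymb} % provides \mathbb

% Assumption: \mathnumsetfont is meant to select blackboard bold.
\newcommand\mathnumsetfont{\mathbb}
\newcommand\Nset{\mathnumsetfont N} % set of positive integer numbers
\newcommand\Rset{\mathnumsetfont R} % set of real numbers

\begin{document}
For every $n \in \Nset$ there is an $x \in \Rset$ with $x > n$.
A vector with three real entries: $v \in \Rset^{3}$.
\end{document}
```

Keeping the font in one macro means you can switch all set symbols to, say, bold face by changing a single line.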

Jul 14, 2009

Best Movies Ever

I am not planning to write a flame here. I am not going to advocate a best movie list. This post is about the feeling of closure when refreshing memories.

Over the last few weeks I have been feeling nostalgic for some films I watched as a child and a teenager. For example, I remember laughing my head off at Terence Hill in They Call Me Trinity. Watching it again now, older and with more distance, I still found a few scenes funny, but on the whole the film felt like a waste of my time, although I can see its appeal for audiences of young boys. The second example is The Good, the Bad and the Ugly, directed by Sergio Leone, with a music score by Ennio Morricone. I liked this film very much when I saw it as a boy, although I can't remember finding it funny. Watching it again, the three hours of the film flew by. A very pleasurable experience.

I was following the Top 250 list on IMDb and found there were still some films I didn't know. Sometimes I just didn't know the international title, but in some cases I had actually missed the film when it came out. Other lists of best movies include the American Film Institute's (find it on Wikipedia).

Worthwhile reading about good films: Roger Ebert - The Great Movies.

This would be a great subject for a wikibook, which could start by compiling the articles on some movies, together with some history of movie making (e.g. history of cinema). I just discovered that Wikipedia now has the functionality to compile books from articles. When reading an article, there's an "add to book" link in the left panel. Press it and you have a new section in your book. You can print the book and order it by mail.

On a side note, it's a great relief that Adobe finally released the flash plugins for amd64. It's more comfortable to watch movies in a browser than downloading them via youtube-download.