Nov 21, 2008

Receiver Operating Characteristic (ROC) and Area Under the Curve (AUC) in Matlab

You want to use a classifier, for example a support vector machine, to distinguish between different classes. When making binary decisions you want to make sure that your decision criterion is sensitive to your target class (it catches most true cases) and that it is specific to the target class (it rarely fires for anything else). Say you want to detect cancer, but only cancer. ROC curves and the area under the curve (AUC) describe exactly this trade-off between sensitivity and specificity.

I searched for Matlab implementations for calculating the area under the curve (AUC) and found several; unfortunately, they had shortcomings. Some were incorrect. Most of them took the output of the decision criterion, Y, and the target vector, T, as input, which doesn't make sense, because the AUC is a summary statistic over more than just one pair of vectors Y and T. The idea is that you take the true positive rate (TPR) and false positive rate (FPR) from each {Y,T} pair and summarize over several runs of the decision criterion with different parameters (for example, different decision thresholds).

Here is my implementation corresponding to Tom Fawcett's Algorithm 3 in "ROC Graphs: Notes and Practical Considerations for Researchers" (2004).
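A minimal Python sketch of the area computation (not the original Matlab code), assuming you already have one (FPR, TPR) point per run: it adds the corners (0,0) and (1,1), sorts the points by FPR, and sums the trapezoids between neighbouring points. This is the same trapezoid rule that Fawcett's Algorithm 3 applies while it sweeps a score threshold. The function name auc_from_rates is my own.

```python
def auc_from_rates(fpr, tpr):
    """Area under an ROC curve given matching lists of FPR and TPR values.

    Each (fpr[i], tpr[i]) pair is one operating point, e.g. one run of the
    classifier with a particular parameter setting or threshold.
    """
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")
    # Anchor the curve at the trivial classifiers (0,0) and (1,1), then
    # order the points along the FPR axis (ties are broken by TPR).
    points = sorted([(0.0, 0.0), *zip(fpr, tpr), (1.0, 1.0)])
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        # Trapezoid between consecutive points: width times mean height.
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area
```

For example, auc_from_rates([0.1, 0.3], [0.6, 0.9]) gives about 0.845, while a run set that only contains the chance diagonal gives 0.5.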

Here is how you get the true positive rate and false positive rate (I assume that the elements of T and Y are in {-1,1}):
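The rates themselves are simple counts. A Python sketch under the same assumption (labels in {-1,1}); the function name rates is hypothetical, and returning NaN when a class is missing is my own choice:

```python
def rates(targets, predictions):
    """Return (tpr, fpr) for two equally long sequences with entries in {-1, 1}.

    targets are the true labels T, predictions the classifier decisions Y.
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, y in pairs if t == 1 and y == 1)   # hits
    fp = sum(1 for t, y in pairs if t == -1 and y == 1)  # false alarms
    pos = sum(1 for t in targets if t == 1)
    neg = sum(1 for t in targets if t == -1)
    # TPR = TP / P (sensitivity), FPR = FP / N (1 - specificity);
    # a rate is undefined (NaN) if its class does not occur in targets.
    tpr = tp / pos if pos else float("nan")
    fpr = fp / neg if neg else float("nan")
    return tpr, fpr
```

Calling rates once per run (for example once per decision threshold) yields the (TPR, FPR) points that the AUC summarizes.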

Enjoy. Please comment below for any questions or suggestions.

Nov 20, 2008

Matlab Svmlight Interface

The most widely used SVM (support vector machine) implementations are currently SVMlight and LIBSVM. While LIBSVM comes with interfaces for many different programming languages, SVMlight (more precisely, SVMperf) has the advantage that you can specify the loss function. The classes in my training data are very unbalanced, so optimizing the area under the receiver operating characteristic (ROC) curve, the AUC, brings a great improvement.

I couldn't find a good matlab interface, so I wrote one. Note that it is functional, but quite simple.
function Y=svmlight(training,test,params)
% (very) simple wrapper for svmlight
% Writes matrices in sparse format to data file that can be used by svmlight.
% Columns are variables, rows are observations.
% It is assumed that the first column of the matrix is the target. Targets are elements of {-1,1}.
% These steps are made:
% 1. output matlab matrix to text file
% 2. format text file for svm (awk)
% 3. create classification model (svm_perf_learn)
% 4. apply classification model (svm_perf_classify)
% All files are written in the /tmp/ directory
% Example:
% Y=svmlight(data(traininds,:),data(testinds,:),'-c 1 -w 3 -l 10 ');
% (if you set parameters for svmlight don't forget to include the learning options!)
% (c) Benjamin Auffarth, 2008
% licensed under CC-by-sa (creative commons attribution share-alike)
if nargin<3
   params='-c 1 ';
end
% steps 1 and 2: write training and test data in svmlight format
trainfile=sparse_write(training);
testfile=sparse_write(test);
% step 3: learn the model
[s,w]=unix(['svmlight/svm_perf_learn ' params trainfile '.svm2 ' trainfile '.model']);
if s
   disp('error in executing svm-light!'); disp(w);
   error('svm_perf_learn not found or returned error');
end
% step 4: classify the test data
[s,w]=unix(['svmlight/svm_perf_classify -v 0 ' testfile '.svm2 ' trainfile '.model ' testfile '.dat']);
if s
   disp('error in executing svm-light!'); disp(w);
   error('svm_perf_classify not found or returned error');
end
% one decision value per test observation
Y=dlmread([testfile '.dat']);
end

function fname=sparse_write(M)
% unique file name in /tmp, built from the current date and time (GNU date)
[~,fname]=unix('date +/tmp/_svm_%F_-%H:%M_%S%N');
fname=fname(1:end-1); % get rid of newline character
% space-separated values with an explicit precision instead of dlmwrite's default
dlmwrite([fname '.svm1'],M,'delimiter',' ','precision','%.10g');
% prefix each feature with its 1-based index: "target 1:v1 2:v2 ..."
unix(['awk -F" " ''{printf $1" "; for (i=2;i<=NF;i++) {printf i-1":"$i " "}; print ""}'' ' fname '.svm1 > ' fname '.svm2']);
end
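The awk step turns each row "target v1 v2 ..." into the SVMlight line "target 1:v1 2:v2 ...". A rough Python equivalent, assuming the same layout (target in the first column); unlike the awk one-liner it leaves out zero-valued features, which SVMlight treats as zero anyway. The names to_svmlight_line and write_svmlight are hypothetical.

```python
def to_svmlight_line(row):
    """Format one data row (target first, then features) as an SVMlight line."""
    target, *features = row
    # The wrapper assumes targets in {-1, 1}.
    if target not in (-1, 1):
        raise ValueError("target must be -1 or 1")
    # Feature indices start at 1 and increase; zero entries are omitted
    # because features that are not listed count as zero.
    pairs = [f"{i}:{v:.10g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([f"{int(target):d}", *pairs])


def write_svmlight(rows, path):
    """Write all rows to a text file, one SVMlight line per observation."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write(to_svmlight_line(row) + "\n")
```

For example, to_svmlight_line([1, 0.5, 0, 2]) produces "1 1:0.5 3:2".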


Some Explanations

You need awk installed (obviously). If you are working in a Windows environment, you can get awk through Cygwin, run Linux alongside Windows with Wubi, or install a native Windows port of awk.

Temporary files are generated and stored in the /tmp/ directory, and their names are built with the GNU date command. If you are on Windows, you might want to change the directory to the current one (".").

You need SVMperf installed. The function expects svm_perf_learn and svm_perf_classify in a subdirectory called svmlight below the current directory. You might want to adapt these paths to point to your local installation.

Enjoy. Please leave comments below.

Nov 7, 2008

Export Data from Matlab to Text Files


There are many different ways to export and import data from Matlab. You can import and export data from and to Matlab binary formats (MAT files), text files, Excel spreadsheets (works only on Windows), XML, several special-purpose formats, and many image, audio, and video file formats. Text formats are very useful because they are portable: many different applications can read and write them.

In this post I will give some examples of exporting to different text formats. I will also mention how to import data using complementary commands.

The simplest way to export data to text format is this:
>> save('attr20.ascii','A','-ascii');

By default, save -ascii writes numbers in 8-digit scientific notation (e.g. 1.0000000e+00). If you want to use these data with some other program outside Matlab, this can lead to problems. Although most programs nowadays can read scientific notation (C's strtod and C++ input streams parse it directly, for example), it is sometimes better to write to a fixed-digits format.

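With save, adding the -double option (together with -ascii) gives you 16 significant digits instead of 8, still in scientific notation. For fixed-point output, dlmwrite accepts a format string as precision, e.g. dlmwrite('attr20.ascii',A,'precision','%.6f').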
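The difference between the styles is easiest to see side by side. This Python sketch only approximates Matlab's output (Matlab also pads columns with leading spaces); the function name format_row and the style labels are my own.

```python
def format_row(values, style="ascii", delimiter=" "):
    """Format one row of numbers in one of three text styles.

    "ascii":  8 significant digits, scientific notation (like save -ascii)
    "double": 16 significant digits, scientific notation (like -ascii -double)
    "fixed":  6 decimals, no exponent (like a '%.6f' precision format)
    """
    formats = {"ascii": "{:.7e}", "double": "{:.15e}", "fixed": "{:.6f}"}
    fmt = formats[style]
    return delimiter.join(fmt.format(v) for v in values)
```

format_row([1, 0.25]) gives "1.0000000e+00 2.5000000e-01", whereas format_row([1, 0.25], "fixed") gives "1.000000 0.250000", which any program can read without exponent handling.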

To write matrix A to a comma-separated values (CSV) file, there are several alternatives. dlmwrite is one possibility:
>> dlmwrite('attr20.ascii',A,'delimiter',',');

The default delimiter is already the comma, so the last parameter is unnecessary. If you want your data tab-separated, this command is your friend:
>> dlmwrite('attr20.ascii',A,'delimiter','\t');

If you use the tab as delimiter, you can also use save:

>> save('attr20.ascii','A','-ascii','-double','-tabs');

The last option I give here is csvwrite (note that it writes at most five significant digits):
>> csvwrite('attr20.ascii',A); 

These commands work for vectors and two-dimensional matrices.
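For comparison outside Matlab, the same kind of comma- or tab-separated text can be produced with Python's standard csv module. A small sketch (the helper name to_delimited is hypothetical) that returns the text instead of writing a file, so the output is easy to inspect:

```python
import csv
import io


def to_delimited(matrix, delimiter=","):
    """Write a 2-D list of numbers as delimited text, one row per line."""
    buf = io.StringIO()
    # lineterminator="\n" gives Unix line endings (csv defaults to "\r\n").
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    for row in matrix:
        writer.writerow(row)
    return buf.getvalue()
```

to_delimited([[1, 2.5], [3, 4]]) returns "1,2.5\n3,4\n"; passing "\t" as delimiter gives the tab-separated variant.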

Also sometimes useful is diary, which records your session (commands and their output) to a disk file. You can view and edit the resulting text file with any text editor.

For importing data into Matlab you can use the corresponding commands dlmread, load, and csvread. Some files you might have to filter before reading them into Matlab, for example to get rid of comments. Say the files contain comment lines starting with the percent sign (%). Then you can delete those lines in place with sed:
> sed -i '/^%/d' *
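If sed is not at hand, the same filtering can happen while parsing. A minimal Python sketch, assuming whitespace- or comma-delimited numeric rows; read_numeric is a hypothetical name, and unlike the sed command it also skips blank lines and leaves the files untouched:

```python
def read_numeric(lines, delimiter=None):
    """Parse delimited numeric text, skipping lines that start with '%'.

    delimiter=None splits on any whitespace; pass "," for CSV-style rows.
    """
    rows = []
    for line in lines:
        # Same filter as the sed command, plus skipping empty lines.
        if line.startswith("%") or not line.strip():
            continue
        rows.append([float(x) for x in line.split(delimiter)])
    return rows
```

Passing an open file object works too, e.g. read_numeric(open("attr20.ascii"), ",").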

Enjoy. Please leave a comment below for questions and suggestions.

You might also be interested in my article on exporting figures from matlab. I also wrote an article about creating videos in matlab.

Nov 6, 2008

Handy Nix Shell Commands - Alarm Clock Oneliners and More

Some linux commands are fantastic!

I found some nice ones in a Slashdot discussion about surprising Linux commands; others come from Slashdot user tpwch.

Alarm Clocks and Notices

Want a simple alarm clock? Try this one:
echo "cat /dev/urandom > /dev/dsp" | at 7am tomorrow

Your food is in the oven?
sleep $((20*60)); xmessage "Dinner is done"

You can also use zenity to get a popup message:
sleep $((20*60)); zenity --info --text "Dinner is done"

Or, if you prefer a voice message, you can use espeak or festival, which I explain in Free Text-to-Speech.
sleep $((20*60)); echo "Take the food out of the oven" | espeak

Notification of Events

For example, notifying me when some specific thing changed on a website:
while true; do
  CHECKLINE="$(curl -s [] | grep "currently undergoing maintenance")"
  [ -z "$CHECKLINE" ] && xmessage "somewebsite is open again" && exit
  sleep 120
done

Checking for changes on a website:
while true; do
  CONTENT=$(elinks -dump 1 -dump-charset iso-8859-1 "")
  MD5=$(echo -n "$CONTENT" | md5sum -)
  [ "${MD5}" != "${OLD_MD5}" ] && {
    xmessage "$(printf "New action: :\n\n%s" "${CONTENT}")"
    OLD_MD5="${MD5}"
  }
  sleep 120
done
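The change check boils down to "hash the page, compare with the previous hash". A Python stand-in for the shell loop, sketched under my own assumptions: you supply the URL (e.g. watch("https://example.org/")), notification is a plain print instead of xmessage, and the names digest and watch are hypothetical.

```python
import hashlib
import time
import urllib.request


def digest(content):
    """MD5 hex digest of a page body; only used to notice that it changed."""
    return hashlib.md5(content).hexdigest()


def watch(url, interval=120, notify=print):
    """Fetch url every interval seconds and call notify when the body changes.

    Like the shell loop, this also notifies on the very first fetch,
    because there is no previous digest to compare against yet.
    """
    old = None
    while True:
        with urllib.request.urlopen(url) as response:
            new = digest(response.read())
        if new != old:
            notify("New content at " + url)
            old = new
        time.sleep(interval)
```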

Prank Greetings

Some nice greetings to your colleague?
cat /dev/random | write colleague

Nov 4, 2008

Watching the US Presidential Elections 2008

Is it the maverick soldier and his Christian beauty queen? Or rather that Barack Hussein Arab Muslim Obama? See and watch for yourself the Americans' wonder of democracy and the final decision.

Here you find a schedule (CST is CET - 7). See also the map on

You can watch live streams directly in your browser (if you have the plugins), or in a player such as VLC or Totem. On Linux, Totem has the advantage that it prompts you to install missing plugins.

For CNN the address is

At CSPAN you can also find 3 streams.

BTW, Linus Torvalds is endorsing Obama as you can see in his blog.