When making binary decisions, you want your decision criterion to be both sensitive to the target class (it finds the positives) and specific to it (it does not fire on anything else). Say you want to detect cancer, but only cancer. That is what ROC curves and the area under the curve (AUC) are for.
I searched for Matlab implementations that calculate the area under the curve and found several; unfortunately, they had shortcomings. Some were incorrect. Most of them took the outcome of the decision criterion, Y, and the target vector, T, as input, which doesn't make sense, because the AUC is a summary statistic over more than a single pair of vectors Y and T. The idea is that you take the true positive rate (TPR) and the false positive rate (FPR) from each {Y,T} pair and summarize over several runs of the decision criterion with different parameters.
Here is my implementation, corresponding to Algorithm 3 in Tom Fawcett's "ROC Graphs: Notes and Practical Considerations for Researchers" (2004).
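The trapezoidal idea behind an AUC computation can be sketched as follows. This is a minimal sketch and not the original listing: the function name auc_trapz is hypothetical, and adding the end points (0,0) and (1,1) is an assumption so that the curve spans the whole FPR axis.

```matlab
function A = auc_trapz(fpr, tpr)
% Sketch: area under an ROC curve from matched FPR/TPR vectors,
% one entry per run (or threshold) of the decision criterion.
fpr = fpr(:);
tpr = tpr(:);
% Assumption: close the curve with the trivial points (0,0) and (1,1).
pts = sortrows([0 0; fpr tpr; 1 1]);   % sort by FPR, ties broken by TPR
A = trapz(pts(:,1), pts(:,2));         % trapezoidal rule over the sorted points
end
```

A single run at (FPR, TPR) = (0.5, 0.5) lies on the diagonal and gives an area of 0.5, the chance level.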
Here is how you get the true positive rate and the false positive rate (I assume that T and Y have elements in {-1,1}):
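A minimal sketch of that computation, assuming T and Y are vectors of equal length with entries in {-1,1}. The function name roc_point is hypothetical and the code is an illustration, not the original listing.

```matlab
function [tpr, fpr] = roc_point(T, Y)
% Sketch: one ROC point from targets T and decisions Y, both in {-1,1}.
tp = sum(Y ==  1 & T ==  1);   % true positives
fn = sum(Y == -1 & T ==  1);   % false negatives
fp = sum(Y ==  1 & T == -1);   % false positives
tn = sum(Y == -1 & T == -1);   % true negatives
tpr = tp / (tp + fn);          % sensitivity; NaN if T contains no positives
fpr = fp / (fp + tn);          % 1 - specificity; NaN if T contains no negatives
end
```

Calling it once per parameter setting of the decision criterion yields the FPR/TPR pairs that an AUC routine then summarizes.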
Enjoy. Please comment below for any questions or suggestions.
receiver operating characteristic (roc) and area under the curve (AUC) in matlab
Labels: area under the curve auc matlab numerical computing receiver operating characteristics roc software
Benjamin Auffarth, Nov 21, 2008
Matlab Svmlight Interface
Labels: auc classifier interface loss function matlab numerical computing roc skewed classes software svm svm light svmlight
The most widely used implementations of SVMs (support vector machines) are currently svmlight and libsvm. While libsvm comes with interfaces for many different programming languages, svmlight (svm-light perf) has the advantage that you can specify the loss function. I have very disproportionate classes in my training data, so using the area under the receiver operating characteristic (ROC) curve, the AUC, as the loss brings a great improvement.
I couldn't find a good matlab interface, so I wrote one. Note that it is functional, but quite simple.
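% [...] the listing starts mid-function; the header and the svm_perf_learn call are not shown
if s
    disp('error in executing svm-light!'); disp(w);
    error('svm_perf_learn not found or returned error');
end
% write the test set to disk and classify it with the trained model
testfile=sparse_write(test);
[s,w]=unix(['svmlight/svm_perf_classify -v 0 ' testfile '.svm2 ' trainfile '.model ' testfile '.dat']);
if s % a non-zero exit status signals failure
    disp('error in executing svm-light!'); disp(w); % show the output of the tool
    error('svm_perf_classify not found or returned error');
end
Y=dlmread([testfile '.dat']); % predictions written by svm_perf_classify
end
function fname=sparse_write(M)
% write M (first column: label, remaining columns: features) in svm-light format
[a,fname]=unix('date +/tmp/_svm_%F_-%H:%M_%S%N'); % timestamp-based file name
fname=fname(1:end-1); % get rid of newline character
dlmwrite([fname '.svm1'],M,'delimiter',' ');
% turn "label v1 v2 ..." into "label 1:v1 2:v2 ..."
unix(['awk -F" " ''{printf $1" "; for (i=2;i<=NF;i++) {printf i-1":"$i " "}; print ""}'' ' fname '.svm1 > ' fname '.svm2']);
end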
Some Explanations
You need awk installed. If you are working in a Windows environment, you can get awk through Cygwin or Wubi, or install a native Windows build of awk. Temporary files are generated and stored in the /tmp/ directory. If you are on Windows you might want to change that to "."
You need svm-light perf installed. The function searches for it in the svmlight subdirectory. You might want to adapt that to point to your local installation of svmlight.
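If awk is not available, the conversion step can also be done in Matlab itself. The following is a minimal sketch under stated assumptions, not part of the wrapper above: write_svmlight is a hypothetical name, M is assumed to hold the label in its first column and at least one feature column after it, and values are written with %g (about six significant digits).

```matlab
function write_svmlight(fname, M)
% Sketch: write M as lines of the form "label 1:v1 2:v2 ...".
fid = fopen(fname, 'w');
if fid < 0
    error('could not open %s for writing', fname);
end
idx = 1:(size(M, 2) - 1);                        % 1-based feature indices
for r = 1:size(M, 1)
    fprintf(fid, '%g', M(r, 1));                 % label
    fprintf(fid, ' %d:%g', [idx; M(r, 2:end)]);  % interleaved index:value pairs
    fprintf(fid, '\n');
end
fclose(fid);
end
```

Like the awk call, this writes every feature, zeros included, so the output is dense rather than truly sparse.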
Enjoy. Please leave comments below.
Export Data from Matlab to Text Files
Labels: ascii file column separated values csv csvread csvwrite dlmread dlmwrite import load matlab matrices matrix save scientific format software text file vector
There are many different ways to export and import data from Matlab. You can import and export data from and to Matlab binary formats (MAT files), text files, Excel spreadsheets (works only on Windows), XML, several special-purpose formats, and a lot of image, audio, and video file formats. Text formats are very useful because they are very portable: they can be read and written by many different applications.
In this post I will give some examples of exporting to different text formats. I will also mention how to import data using complementary commands.
The simplest way to export data to text format is this:
save -ascii
Matlab exports data by default in the scientific numeric format. If you want to use these data with some other program outside Matlab, this can lead to problems. Although nowadays many programs use libraries that permit reading scientific notation (e.g. the Boost regex library for C++), it is sometimes better to write to a fixed-digits format.
Using save, the -double option says that you want the numbers in 16-digit format. To write matrix A to a comma-separated values (CSV) file, there are several alternatives.
dlmwrite is one possibility:
>> dlmwrite('attr20.ascii',A,'delimiter',',');
The default delimiter is already the comma, so the delimiter argument is unnecessary here. If you want your data tab-separated, this command is your friend:
>> dlmwrite('attr20.ascii',A,'delimiter','\t');
If you use the tabulator as delimiter, you can also use save:
>> save('attr20.ascii','A','-ascii','-double','-tabs');
The last option I give here is csvwrite:
>> csvwrite('attr20.ascii',A);
These commands work for vectors and two-dimensional matrices.
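To get a fixed-digits format rather than scientific notation, dlmwrite accepts a 'precision' option with a C-style format string. The sketch below is an illustration only: the matrix, the scratch file name, and the choice of ten decimals are assumptions. Newer Matlab releases recommend writematrix and readmatrix over dlmwrite and dlmread, but the older functions are used here to match the commands above.

```matlab
A = [1.5 2e-7; 3 4];                 % example data (assumed)
fname = [tempname '.csv'];           % scratch file in the system temp directory
% fixed-point output with 10 decimals instead of scientific notation
dlmwrite(fname, A, 'delimiter', ',', 'precision', '%.10f');
B = dlmread(fname, ',');             % read back with the same delimiter
maxdiff = max(abs(A(:) - B(:)));     % round-trip error, bounded by the chosen precision
delete(fname);
```

The number of decimals is a trade-off: values smaller than the last written digit are rounded to zero, so pick it to suit the range of your data.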
Also sometimes useful is diary, which logs your session (the commands you type and their output) to a disk file. You can view and edit the resulting text file using any text editor.
For importing data to Matlab you can use the corresponding commands dlmread, load, and csvread. Some files you might have to filter before reading them into Matlab, for example to get rid of comments. Say the files mark comments with a percent sign (%) at the start of the line. Then filtering can be done with sed:
> sed -i '/^%/d' *
Enjoy. Please leave a comment below for questions and suggestions.
Benjamin Auffarth, Nov 7, 2008
Alarm clock in linux
Some linux commands are fantastic! See slashdot postings about surprising linux commands.
Want a simple alarm clock? Try this one:
echo "cat /dev/urandom > /dev/dsp" | at 7am tomorrow
Some nice greetings to your colleague?
cat /dev/random | write colleague
Benjamin Auffarth, Nov 6, 2008
Watching the US Presidential Elections 2008
Labels: us elections 2008
Is it the maverick soldier and his Christian beauty queen? Or rather that Barack Hussein Arab Muslim Obama? See and watch for yourself the American wonder of democracy and the final decision.
Here you find a schedule (CST is CET - 7). See also the map on cnn.com.
You can watch live streams directly in your browser (if you have the plugins), or in a player such as vlc or totem. On Linux, totem has the advantage of prompting you to install missing plugins.
For CNN the address is http://www.cnn.com/video/live/cnnlive_1.asx.
At CSPAN you can also find 3 streams.
BTW, Linus Torvalds is endorsing Obama as you can see in his blog.
Benjamin Auffarth, Nov 4, 2008