Nov 20, 2008

Matlab Svmlight Interface

The most widely used SVM (support vector machine) implementations are currently svmlight and libsvm. While libsvm comes with interfaces for many different programming languages, svmlight (in its svm-perf variant, whose binaries the code below calls) has the advantage that you can specify the loss function. My training data has very unbalanced classes, so optimizing for the area under the receiver operating characteristic (ROC) curve (AUC) brings a great improvement.

I couldn't find a good MATLAB interface, so I wrote one. It is functional, but quite simple.
function Y=svmlight(training,test,params)
% (very) simple wrapper for svmlight
% Writes matrices in sparse format to data file that can be used by svmlight.
% Columns are variables, rows are observations.
% It is assumed that the first column of the matrix is the target. Targets are elements of {-1,1}.
%
% These steps are made:
% 1. output matlab matrix to text file
% 2. format text file for svm (awk)
% 3. create classification model (svm_learn)
% 4. apply classification model (svm_classify)
%
% All files are written in the /tmp/ directory
%
% Example:
% Y=svmlight(data(traininds,:),data(testinds,:),'-c 1 -w 3 -l 10 ');
% (if you set parameters for svmlight don't forget to include the learning options!)
%
% (c) Benjamin Auffarth, 2008
% licensed under CC-by-sa (creative commons attribution share-alike)
if nargin<3
   params='-c 1 ';
end
trainfile=sparse_write(training);
[s,w]=unix(['svmlight/svm_perf_learn ' params trainfile '.svm2 ' trainfile '.model']);
if s
   disp('error in executing svm-light!'); disp(w);
   error('svm_perf_learn not found or returned error');
end
testfile=sparse_write(test);
[s,w]=unix(['svmlight/svm_perf_classify -v 0 ' testfile '.svm2 ' trainfile '.model ' testfile '.dat']);
if s
   disp('error in executing svm-light!'); disp(w);
   error('svm_perf_classify not found or returned error');
end
Y=dlmread([testfile '.dat']);
end

function fname=sparse_write(M)
[a,fname]=unix('date +/tmp/_svm_%F_-%H:%M_%S%N');
fname=fname(1:end-1); % get rid of newline character
dlmwrite([fname '.svm1'],M,'delimiter',' ');
unix(['awk -F" " ''{printf $1" "; for (i=2;i<=NF;i++) {printf i-1":"$i " "}; print ""}'' ' fname '.svm1 > ' fname '.svm2']);
end

Download

Some Explanations

You need awk installed. If you are working in a Windows environment, you can get awk through Cygwin or Wubi, or install awk natively on Windows.
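
To see what the awk step does, here is a minimal shell sketch that runs the same awk program as sparse_write on two made-up rows. The sample values and the function name to_svmlight are only for illustration. The first field is kept as the target, and every later field becomes an index:value pair.

```shell
# Convert space-delimited rows (target in the first column) into
# svmlight's "target index:value index:value ..." format.
# to_svmlight is a hypothetical helper name; the awk program is the one used in sparse_write.
to_svmlight() {
  awk -F" " '{printf $1" "; for (i=2;i<=NF;i++) {printf i-1":"$i " "}; print ""}'
}

# Two illustrative observations with three variables each.
printf '1 0.5 0 2\n-1 0 3.5 1\n' | to_svmlight
```

The first row comes out as `1 1:0.5 2:0 3:2` (with a trailing space). Note that zeros are written out explicitly, so the file is in svmlight's format but not actually sparse.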

Temporary files are generated and stored in the /tmp/ directory. If you are on Windows, you might want to change that to ".".
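
The function above builds its file names from the output of date. As a possible alternative (not what the original code does), the following sketch uses mktemp to get a unique base name and falls back to the current directory when the temp directory does not exist. The helper name make_svm_base and the fallback rule are assumptions for illustration.

```shell
# Create a unique base name for the temporary svm files.
# make_svm_base is a hypothetical helper; it uses mktemp instead of date.
make_svm_base() {
  dir="${TMPDIR:-/tmp}"
  # Fall back to the current directory if the temp directory is missing.
  [ -d "$dir" ] || dir="."
  mktemp "$dir/_svm_XXXXXX"
}

base=$(make_svm_base)
echo "data file:  $base.svm2"
echo "model file: $base.model"
rm -f "$base"   # remove the placeholder file that mktemp created
```

The suffixes .svm2 and .model match the ones the wrapper appends to its base name.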

You need svm-light perf installed. The function searches for it in the svmlight subdirectory. You might want to adapt that to point to your local installation of svmlight.
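
Before calling the wrapper, it can help to check that the two binaries are really where it looks for them. This is a small sketch, assuming the layout described above (a svmlight subdirectory containing svm_perf_learn and svm_perf_classify); the function name check_svm_perf is made up for this example.

```shell
# Report whether the svm_perf binaries exist and are executable.
# The directory defaults to "svmlight", the subdirectory the wrapper uses;
# pass another directory as the first argument for a different layout.
check_svm_perf() {
  dir="${1:-svmlight}"
  missing=0
  for bin in svm_perf_learn svm_perf_classify; do
    if [ -x "$dir/$bin" ]; then
      echo "found: $dir/$bin"
    else
      echo "missing: $dir/$bin"
      missing=1
    fi
  done
  return "$missing"
}

check_svm_perf svmlight || echo "adjust the path in the wrapper or install svm-perf there"
```

Run it from the directory in which MATLAB executes the wrapper, since the path in the unix() calls is relative.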

Enjoy. Please leave comments below.

16 comments:

  1. Please don't copy-paste this code if you are in a windows environment. Download it from snipplr.

  2. This is exactly what I need. Thank you very much!

  3. @Dylan: You are welcome. Glad it helped.

  4. Thanks so much. This information could help me fix my computer. Blogs like this can help people.

  5. Just for my own reference, you can do this without matlab using standard *nix commands.

    To convert from csv to svmlight format:
    awk -F"," '{printf $1" "; for(i=2;i<=NF;i++) {printf i-1":"$i " "}; print ""}' hepatitis.data > hepatitis.svmlight

    Separate into training and test if needed:
    head -n 155 hepatitis.svmlight > hepatitis.svmlight.training
    tail -n 155 hepatitis.svmlight > hepatitis.svmlight.test

    Run training and classification:
    ./svm_learn -c 1 -# 1 -w 3 -l 10 hepatitis.svmlight.training .model
    ./svm_classify hepatitis.svmlight.test .model .outputfile

  6. Hi, would you please describe how I can use this code for a KNN classifier? Or do you have code for the ROC of KNN on the Glass data set? I need it for part of my project. Please help me, thank you.

  7. This is for SVM, not for KNN. For KNN you can use MATLAB's knn functions. For ROC/AUC statistics see my post on ROC.

  8. Can you give an example of use?
    For instance, what format should the training parameter have?

    Thanks!!

  9. training is a matrix. See the commented text in the function:
    % Columns are variables, rows are observations.
    % It is assumed that the first column of the matrix is the target. Targets are elements of {-1,1}.

  10. What is the function sparse_write? Matlab doesn't recognize it.

  11. Just a question about the installation of SVMlight: which directory should it go in (maybe the MATLAB work folder)? And can you please explain exactly what the role of awk is?

  12. I tried the native mex interface for svmlight, but the performance is awful (many orders of magnitude slower than the libsvm interface), and I suspect it is due to the mex interface itself. Not having the time to redevelop it myself, I came searching for another. This is something along the lines of what I would have tried to do myself, but now I don't need to understand the svmlight doc format. I still feel dirty doing it this way though.

    Many thanks.

  13. Hello,
    I am working on SVM and would like to ask for your help. I want to compile and invoke the SVMlight algorithm ( http://svmlight.joachims.org/ ) as a MEX function
    from within the MATLAB environment.
    cd('C:\Users\hp\Documents\MATLAB\svm_struct')
    addpath('C:\Users\hp\Documents\MATLAB\svm_mex601\matlab');
    addpath ('C:\Users\hp\Documents\MATLAB\svm_mex601\bin');
    compilemex();
    cmd=['-c 1 -w 3 -l 10 '];
    model=svm_learn(X,Y,cmd);
    It gives me:

    compile failed
    Undefined function or method 'svm_learn'
    Thank you for helping me!