This is the companion website to the tutorial Support Vector Machines and Kernels for Computational Biology, which takes the reader through the basics of machine learning, support vector machines (SVMs) and kernels for real-valued and sequence data. The example of splice site prediction is used to illustrate the main ideas.
Many problems in computational biology take the form of prediction: predicting a gene's structure, its function, its interactions, and its role in disease. SVMs and related kernel methods are extremely good at solving such problems. In the simplest form, one tries to discriminate between objects that belong to one of two categories.
SVMs use two key concepts to solve this problem: large-margin separation and kernel functions. The idea of large-margin separation can be motivated by the classification of points in two dimensions (see figure on right). A simple way to classify the points is to draw a straight line as the separation boundary. Intuitively, one would draw the line so that it is as far away as possible from the points of both sets.
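The idea above can be sketched in a few lines of code. This is a minimal illustration using scikit-learn rather than the tutorial's own easysvm software; the point coordinates and parameter values are made up for demonstration:

```python
# Large-margin separation of 2D points, sketched with scikit-learn
# (an illustrative stand-in for the tutorial's easysvm software).
import numpy as np
from sklearn.svm import SVC

# Two linearly separable point clouds in the plane (made-up data).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],   # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A linear SVM finds the separating line with the largest margin.
clf = SVC(kernel="linear", C=1000.0)  # large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]  # boundary: w . x + b = 0
print("boundary normal w:", w, "offset b:", b)
print("support vectors:\n", clf.support_vectors_)
print("prediction for (2, 2):", clf.predict([[2.0, 2.0]])[0])
```

Only the points closest to the boundary (the support vectors) determine where the line is drawn; moving the other points has no effect.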
For large-margin separation, it turns out that only the relative positions, or similarities, of the points to each other matter. Such a similarity is computed by a so-called kernel function. The simplest kernel function is the dot-product between two feature vectors (known as the linear kernel), which leads to a linear decision boundary between the two classes. Nonlinear kernels, such as the polynomial kernel (see figure on the left), provide additional flexibility.
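The two kernels named above can be written down directly. A minimal sketch with NumPy, where the vectors and the polynomial degree and offset are illustrative choices, not values from the tutorial:

```python
# Two basic kernel functions on real-valued feature vectors.
import numpy as np

def linear_kernel(x, z):
    # The simplest kernel: the dot-product between two feature vectors.
    return float(np.dot(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    # Polynomial kernel (x . z + c)^d: implicitly compares all monomials
    # of the input features up to the given degree, yielding a nonlinear
    # decision boundary in the original space.
    return float((np.dot(x, z) + c) ** degree)

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])
print(linear_kernel(x, z))      # 1*3 + 2*0.5 = 4.0
print(polynomial_kernel(x, z))  # (4.0 + 1.0)**2 = 25.0
```

Replacing the dot-product with a nonlinear kernel is all that is needed to turn a linear SVM into a nonlinear classifier; the large-margin machinery itself is unchanged.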
We provide software with which the reader can follow the examples in the tutorial. One may also try some of the algorithms on our Galaxy-based web service, which is built on easysvm.
Copyright © 2008 A. Ben-Hur, C.S. Ong, S. Sonnenburg, B. Schölkopf, and G. Rätsch