SVMs, Gaussian processes and several other core machine learning methods make use of kernels. There are many good, detailed tutorials out there, but most of them are quite involved and aimed at advanced users. I wanted to summarize the basics for machine learning practitioners. In fact this article is not very novel or different :) I just wanted to summarize some basics.
Code on GitHub: Link
Below is some basic terminology.
Gaussian process: a Gaussian process is a stochastic process (a collection of random variables) such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed.
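The definition above is easy to see in code: pick any finite set of input points, build a covariance matrix with a kernel, and you get a multivariate normal you can sample from. A minimal sketch (the squared-exponential covariance and the parameter values are my own choices for illustration):

```python
import numpy as np

def rbf_cov(X, length_scale=1.0):
    # Squared-exponential covariance between all pairs of 1-D inputs
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-d2 / (2 * length_scale ** 2))

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 50)            # a finite collection of inputs
K = rbf_cov(X) + 1e-8 * np.eye(len(X))  # small jitter for numerical stability
# One draw from the multivariate normal defined by the kernel:
sample = rng.multivariate_normal(mean=np.zeros(len(X)), cov=K)
print(sample.shape)  # (50,)
```

Plotting `sample` against `X` would show one smooth random function drawn from the GP prior; the length scale controls how wiggly it is.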
Kernel function
A kernel function (or covariance function, in the Gaussian process context) is a similarity measure: if two input vectors are similar, the kernel outputs a higher value.
Kernels can turn a linear model into a nonlinear one by computing only inner products: instead of calculating the exact positions of points in the new (higher-dimensional) space, we just need their relative positions, i.e. their inner products, in that space. This is known as the kernel trick.
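The kernel trick can be verified numerically: for the degree-2 polynomial kernel (x·y + 1)², there is an explicit 6-dimensional feature map φ, and the kernel value computed in the original 2-D space equals the inner product φ(x)·φ(y) in the 6-D space. A small check:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for a 2-D input
    x1, x2 = x
    return np.array([1.0, np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

explicit = phi(x) @ phi(y)   # inner product computed in 6-D space
trick = (x @ y + 1) ** 2     # same quantity, computed entirely in 2-D
print(explicit, trick)       # both equal 4.0
```

The kernel never materializes the 6-D vectors, which is why kernel methods stay cheap even when the implicit feature space is huge (or infinite-dimensional, as with RBF).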
Types of Kernel
· The linear kernel: this is the same as the plain support vector classifier, a hyperplane with no transformation at all.
· The polynomial kernel: capable of creating nonlinear, polynomial decision boundaries.
· The radial basis function (RBF) kernel: capable of mapping highly nonlinear feature spaces to linearly separable ones, and can even create elliptical (i.e. enclosed) decision boundaries. It has a localized, finite response along the entire x-axis: it produces a bounded similarity value controlled by its variance and length-scale parameters, so it maps everything into an interval you choose.
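The three kernels above are all available in scikit-learn as pairwise functions, which makes it easy to inspect the Gram (similarity) matrices they produce. A short sketch on a tiny made-up dataset:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0]])

K_lin = linear_kernel(X)                    # plain inner products x_i . x_j
K_poly = polynomial_kernel(X, degree=3)     # (gamma * x_i . x_j + coef0) ** 3
K_rbf = rbf_kernel(X, gamma=0.5)            # values in (0, 1], 1 on the diagonal
print(K_lin, K_poly, K_rbf, sep="\n")
```

Note that the RBF Gram matrix always has ones on its diagonal (each point is maximally similar to itself), illustrating the bounded response mentioned above.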
Below is a super simple piece of code. The function apply_nonlinear applies a nonlinear transformation to the data, and the function apply_SVC_kernel tries to fit it. With these simple functions I will visualize different datasets under different transformations.
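The actual code lives in the linked repo, so the sketch below is only my reconstruction of what the two helpers might look like; the sign-preserving power transform and the train/test split are assumptions on my part:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def apply_nonlinear(X, power=2):
    # Hypothetical transform: raise each feature to `power` while
    # preserving its sign (so pow2/pow3 keep the shape recognizable)
    return np.sign(X) * np.abs(X) ** power

def apply_SVC_kernel(X, y, kernel="rbf"):
    # Fit an SVC with the chosen kernel and return test accuracy
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    return SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)

print(apply_nonlinear(np.array([[-2.0, 3.0]]), 2))  # [[-4. 9.]]
```

Any transform/classifier pair with this shape would reproduce the experiments below; only the exact transform differs from repo to repo.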
Let's generate random data with the function make_gaussian_quantiles. The data looks as below.
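For reference, generating such a dataset takes one call; the sample count and random seed here are arbitrary choices of mine:

```python
from sklearn.datasets import make_gaussian_quantiles

# 2-D points drawn from a Gaussian, labeled by concentric quantile shells
X, y = make_gaussian_quantiles(n_samples=200, n_features=2,
                               n_classes=2, random_state=0)
print(X.shape, y.shape)  # (200, 2) (200,)
```

Because the class boundary is a ring around the origin, this dataset is a natural stress test for the linear kernel.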
You can see the pow2 and pow3 shapes and the kernel scores below. For pow2 the linear kernel seems to perform best; for pow3 RBF performs much better.
Let's check a shape with clear boundaries. The data below was generated with the library function make_circles.
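You can reproduce this comparison directly on the raw circles data; the noise level, ring spacing and seed below are my own choices:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: an enclosed boundary no single hyperplane can draw
X, y = make_circles(n_samples=300, noise=0.05, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for kernel in ("linear", "poly", "rbf"):
    scores[kernel] = SVC(kernel=kernel).fit(X_tr, y_tr).score(X_te, y_te)
print(scores)
```

On the untransformed rings, RBF scores near 100% while the linear kernel hovers around chance, which is exactly the enclosed-boundary behavior described for RBF above.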
You can see that RBF reached 100% accuracy on both pow2 and pow3, and the linear kernel also reached 100% on pow2.
I also want to show, very simply, an RBF kernel applied to the data. Below you can see the application of the RBF kernel.
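A minimal sketch of what "applying the RBF kernel to the data" means in practice: evaluating the kernel between each point and every other point maps the data from its original dimension to n_samples dimensions. The tiny 1-D dataset here is made up for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# 1-D data that is not linearly separable: class 1 sits between class 0
X = np.array([[-3.0], [-2.5], [-0.5], [0.0], [0.5], [2.5], [3.0]])
y = np.array([0, 0, 1, 1, 1, 0, 0])

# Kernel of the data against itself: each point becomes a 7-D vector of
# similarities to all points
K = rbf_kernel(X, X, gamma=0.5)
print(X.shape, "->", K.shape)  # (7, 1) -> (7, 7)
```

In the 7-D similarity space, the middle class stands out by its high similarity to other middle points, so a hyperplane can separate what a 1-D threshold never could.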
As you can see, applying a kernel changes the dimension of the data. When you increase the dimension, the distribution of points changes, and an unsolvable problem can become solvable. I suggest checking the distribution of your data in different dimensions, with all the kernels available to you, to see whether your problem becomes more separable by a hyperplane.