Visualizing SVD for Recommender Systems

mustafac
5 min read · Apr 15, 2021

In this post I will try to visualize what SVD does for us, and use it to predict my personal likes. I will use the standard MovieLens dataset. This will be a series of posts on Recommender Systems covering:
SVD
Surprise Recommender Library
Neural Network

The dataset everyone uses for Recommender Systems

The code is on GitHub: Link
I also uploaded my combined CSV to GitHub.

SVD, at its core, performs a factorization into latent factors. What does this mean? Well, how many features does a movie have?
Genre: Comedy, Adventure, War, Fiction …
Actors: Brad Pitt, Leonardo DiCaprio …
Visuals: Nature, fiction, …

So we have thousands of features. But humans are complicated: you may not like DiCaprio, yet you like fiction, so you like “Inception”. Our problem is difficult because we also need to learn combinations of features. So instead of learning thousands of features, we ask: what if our thousands of features were really only 50? Would those 50 explain the variance in this data? That is basically what SVD does for us.
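As a minimal sketch of the idea (the toy user-by-movie matrix and the variable names here are my own illustration, not the post’s actual code):

```python
import numpy as np

# Toy user-x-movie rating matrix (6 users, 5 movies); 0 = not rated.
R = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 0, 0, 1],
    [1, 0, 5, 4, 0],
    [0, 1, 4, 5, 0],
    [0, 0, 5, 4, 1],
    [5, 5, 0, 0, 2],
], dtype=float)

# Full SVD: R = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only k latent factors: the "what if thousands of features were only k" step.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Share of the variance the k factors explain.
print("explained:", (s[:k] ** 2).sum() / (s ** 2).sum())
```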

There are lots of very nice tutorials on the internet, so I want to do something different. We do not know who the people in this dataset are; the only person whose likes I am sure of is “ME”. So I will add myself to the dataset.
I take the max user id in the set and add 1 to get my id.
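A minimal version of that step, assuming the MovieLens ratings are loaded into a pandas DataFrame named `ratings` (file and column names are my assumption, matching the standard MovieLens CSV layout):

```python
import pandas as pd

# Load MovieLens ratings (file name assumed; adjust to your copy).
ratings = pd.read_csv("ratings.csv")  # columns: userId, movieId, rating, timestamp

# My id = max existing user id + 1, so it cannot collide with anyone.
my_user_id = ratings["userId"].max() + 1
print(my_user_id)
```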

Now I define my likes as below. I just add some movies rated 5 or 2; you can add different ratings.
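A sketch of how those rows could be appended; the movie ids and ratings below are placeholders, pick movies you actually know:

```python
# Hypothetical picks: map movieId -> my rating (5 = like, 2 = dislike).
my_ratings = {1: 5, 296: 5, 589: 5, 2959: 2, 356: 2}

my_rows = pd.DataFrame({
    "userId": my_user_id,
    "movieId": list(my_ratings.keys()),
    "rating": list(my_ratings.values()),
    "timestamp": 0,  # placeholder, unused by SVD
})

ratings = pd.concat([ratings, my_rows], ignore_index=True)
```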

You can check your ratings.
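For example:

```python
# Sanity check: the rows I just added, joined with the movie titles.
movies = pd.read_csv("movies.csv")  # columns: movieId, title, genres (assumed)
mine = ratings[ratings["userId"] == my_user_id].merge(movies, on="movieId")
print(mine[["title", "rating"]])
```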

The SVD algorithm itself is simple, essentially one line. Below I have 3 utility methods.
The 1st applies SVD for the requested number of dimensions, the 2nd makes predictions with the calculated matrices, and the 3rd returns the factor matrices.
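A possible shape for those three helpers, using scipy’s truncated SVD (`svds`). This is my reconstruction under assumed names, not the exact repo code; the per-user mean-centering is a common trick I added, drop it if you want the plain factorization:

```python
import numpy as np
from scipy.sparse.linalg import svds

def apply_svd(rating_matrix, k):
    """Factorize the (mean-centered) user-x-movie matrix into k latent factors."""
    user_means = rating_matrix.mean(axis=1, keepdims=True)
    U, s, Vt = svds(rating_matrix - user_means, k=k)
    return U, np.diag(s), Vt, user_means

def predict_all(U, S, Vt, user_means):
    """Reconstruct the full matrix: every user's predicted rating for every movie."""
    return U @ S @ Vt + user_means

def get_factors(U, S, Vt):
    """Return the user-x-feature and movie-x-feature matrices."""
    return U @ S, Vt.T
```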

Now call this for different dimensions. 5 and 20 features will probably be too simple and underfit, but the others should yield acceptable results.
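Something like this, with the dimension list assumed:

```python
# Build the dense user-x-movie matrix, then factorize at several ranks.
pivot = ratings.pivot_table(index="userId", columns="movieId",
                            values="rating", fill_value=0)
movie_ids = list(pivot.columns)  # column order, to map matrix rows back to movies
R = pivot.values

results = {}
for k in (5, 20, 50, 100):
    U, S, Vt, means = apply_svd(R, k)
    results[k] = get_factors(U, S, Vt)  # (user_features, movie_features)
```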

You can think of the above matrices as regression coefficients: they show how important each feature is.

Now let’s evaluate the quality of these factorizations. I will write a method that finds the most similar movies to a given movie and apply it for the different dimensions above, so we can check which ones make better guesses. Below, for a given movie (title), I find its movieId and index, then compute the cosine similarity of all movies against it using the Movie&Feature result of the SVD. Then I sort by score and keep only the top_n results. At the top we expect the movie itself (the cosine of something with itself is 1, or near 1 because of rounding).
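A sketch of how that helper could look, using scikit-learn’s cosine_similarity; the names (`movie_features` from get_factors, `movie_ids` from the pivot columns) are my assumed reconstruction, not the exact code from the repo:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def most_similar_movies(title, movie_features, movies_df, movie_ids, top_n=10):
    """Rank movies by cosine similarity to `title` in latent feature space."""
    movie_id = movies_df.loc[movies_df["title"] == title, "movieId"].iloc[0]
    idx = movie_ids.index(movie_id)  # row of this movie in movie_features
    sims = cosine_similarity(movie_features[idx:idx + 1], movie_features)[0]
    top_indexes = np.argsort(-sims)[:top_n]  # position 0 should be the movie itself
    id_to_title = movies_df.set_index("movieId")["title"]
    return [(id_to_title[movie_ids[i]], round(sims[i], 3)) for i in top_indexes]
```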

Now it is time to call this method for all SVD dimensions and check the results.
Below you can see the call and the results.
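In code, the sweep could look like this (the exact MovieLens title string is an assumption; check how “Terminator” appears in your movies.csv):

```python
for k, (user_features, movie_features) in results.items():
    print(f"--- {k} features ---")
    for title, score in most_similar_movies("Terminator 2: Judgment Day (1991)",
                                            movie_features, movies, movie_ids):
        print(score, title)
```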

If you check above, nearly all of the dimensions give good results. But I suspect this is partly because some similar, popular movies are rated very often. In production I would compute a Bayesian-adjusted rating instead. Example:
Bayesian prior average = 3.5
Average number of ratings per item = 50
Number of ratings for the current item = 10
Sum of its ratings = 20
Bayesian adjusted = (50 × 3.5 + 20) / (50 + 10) = 3.25

So a handful of low ratings does not set the score directly; it only contributes to the combined average. In other words, we do not want rarely rated items to look confident: an item needs a rating count near or above the per-item average before its own ratings can move the adjusted score away from the prior.
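As a sketch, the formula from the example translates directly (the parameter names are mine):

```python
def bayesian_adjusted(rating_sum, num_ratings, prior_mean=3.5, prior_count=50):
    """Shrink an item's average toward the global prior when it has few ratings."""
    return (prior_count * prior_mean + rating_sum) / (prior_count + num_ratings)

# The example from the text: 10 ratings summing to 20 (raw average 2.0).
print(bayesian_adjusted(rating_sum=20, num_ratings=10))  # 3.25
```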

We can do the same to find the users most similar to us. This time I use the User&Feature matrix and compute the cosine similarity of the current user against all others. Then I look at their highest rated movies. In fact this would work better with KNN algorithms; I am simplifying. Note that the most similar user to me is “ME” itself, so I skip the first index with top_indexes[1:]; be careful about that.
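A possible reconstruction under the same assumed names (`user_features` from results, `pivot` from earlier):

```python
def most_similar_users(user_id, user_features, user_ids, top_n=5):
    """Rank users by cosine similarity to `user_id` in latent feature space."""
    idx = user_ids.index(user_id)
    sims = cosine_similarity(user_features[idx:idx + 1], user_features)[0]
    top_indexes = np.argsort(-sims)
    return [(user_ids[i], round(sims[i], 3))
            for i in top_indexes[1:top_n + 1]]  # [1:] skips myself

user_ids = list(pivot.index)
user_features, movie_features = results[50]
for uid, score in most_similar_users(my_user_id, user_features, user_ids):
    # That neighbor's highest rated movies:
    top = (ratings[ratings["userId"] == uid]
           .nlargest(5, "rating").merge(movies, on="movieId"))
    print(uid, score, list(top["title"]))
```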

Top ratings of people who are similar to me

Let me do another visualization, to show the movies similar to a given movie. Below I take the Movie&Feature matrix from the SVD and apply PCA dimensionality reduction to it. Then I return the coordinates as X, Y together with the movie name, so we can check which movies land near each other.
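A sketch of that step with scikit-learn’s PCA and matplotlib, reusing the assumed most_similar_movies helper and the 50-feature movie matrix from above:

```python
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the 50-dimensional movie features down to 2 coordinates.
coords = PCA(n_components=2).fit_transform(movie_features)

def plot_similar(title, top_n=10):
    """Scatter the top_n most similar movies in the 2D PCA plane."""
    id_to_idx = {mid: i for i, mid in enumerate(movie_ids)}
    title_to_id = movies.set_index("title")["movieId"]
    for t, _ in most_similar_movies(title, movie_features, movies,
                                    movie_ids, top_n=top_n):
        i = id_to_idx[title_to_id[t]]
        plt.scatter(coords[i, 0], coords[i, 1], color="tab:blue")
        plt.annotate(t, (coords[i, 0], coords[i, 1]), fontsize=8)
    plt.show()
```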

10 most similar movies to Terminator for 50 features
40 most similar movies to Terminator for 50 features

Here I showed multiple visualizations for the SVD method. Since it is also at the core of other algorithms, it is good to understand it first. In later posts I will compare their results against these SVD results.
