So begins the paper written by Yehuda Koren (Yahoo Research), Robert Bell and Chris Volinsky (AT&T Labs--Research) and published by the IEEE Computer Society.
Companies like Netflix and Amazon use what they know about their subscribers and visitors to optimize recommendations and create more value. This paper surveys recommender systems and the main approaches behind them, which is where I'll start for this post.
Recommendation systems are based on two main strategies: content filtering and collaborative filtering.
Content Filtering
Content filtering focuses on creating a profile for each user or product that characterizes its nature. Movies can be described by genre, rating, actors, box-office success, and so on; people might be characterized by their demographics or their answers to survey questions.
One drawback here is that this profile data can be hard to find or collect. For example, a movie-watcher sees only a small fraction of all movies, and may not rate the ones they do see.
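The profile idea above can be sketched in a few lines. In this hypothetical example, movies and users share a hand-built feature space (the genre features and scores are made up for illustration), and an item is recommended when its features best match the user's profile:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Feature order: [action, drama, comedy] -- an assumed toy profile space.
movies = {
    "war_epic": [0.9, 0.7, 0.0],
    "romcom":   [0.0, 0.3, 0.9],
}
# A user profile expressed in the same features (e.g. from survey answers).
user = [0.8, 0.6, 0.1]

# Score every movie against the user's profile and pick the best match.
scores = {title: feats for title, feats in movies.items()}
scores = {title: cosine(feats, user) for title, feats in movies.items()}
best = max(scores, key=scores.get)  # -> "war_epic" for this profile
```

The key point is that both sides must be described in the same feature space, which is exactly the data-collection burden mentioned above.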
Collaborative Filtering
Collaborative filtering relies on past user behavior, such as previous transactions and product ratings. One positive aspect is that no explicit profile has to be engineered: the approach analyzes the relationships between users and the interdependencies among products in order to make recommendations.
A drawback is that collaborative filtering must address what to do with a new product or user, since it takes time to collect enough data to model them well. This is called the "cold start" problem.
There are two main areas of collaborative filtering: neighborhood methods and latent factor methods.
Neighborhood Methods
Neighborhood methods look at the relationships between items (or, alternatively, between users). An item's neighbors are other items that tend to receive similar ratings from the same users.
An example in the paper is the movie Saving Private Ryan, whose neighbors might be other war movies, Steven Spielberg movies, or Tom Hanks movies. To predict a specific user's preference for the movie, one looks at how that user rated its neighboring movies.
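The neighbor idea can be sketched with a toy item-item example (the ratings below are made up, not the paper's data): two movies are similar if the same users rated them similarly, and a user's predicted rating is a similarity-weighted average of their ratings on the target's neighbors.

```python
# user -> {movie: rating}; hypothetical toy data for illustration.
ratings = {
    "alice": {"saving_private_ryan": 5, "band_of_brothers": 5, "romcom": 2},
    "bob":   {"saving_private_ryan": 4, "band_of_brothers": 5, "romcom": 1},
    "carol": {"band_of_brothers": 4, "romcom": 2},
}

def item_sim(m1, m2):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u in ratings if m1 in ratings[u] and m2 in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][m1] * ratings[u][m2] for u in common)
    n1 = sum(ratings[u][m1] ** 2 for u in common) ** 0.5
    n2 = sum(ratings[u][m2] ** 2 for u in common) ** 0.5
    return dot / (n1 * n2)

def predict(user, target):
    """Similarity-weighted average of the user's ratings on neighbors."""
    num = den = 0.0
    for movie, r in ratings[user].items():
        if movie == target:
            continue
        s = item_sim(target, movie)
        num += s * r
        den += s
    return num / den if den else 0.0

# Carol never rated Saving Private Ryan; her prediction is pulled toward
# her ratings of its neighbors (band_of_brothers: 4, romcom: 2).
pred = predict("carol", "saving_private_ryan")
```

Real neighborhood models refine this with rating normalization and neighbor selection, but the weighted-average structure is the same.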
Latent Factor Methods
This method tries to characterize the items and the users by learning a number of "latent" (hidden) factors. The number of factors is a modeling choice: choosing a number that is too small may hurt the prediction engine, because it may not learn enough detail about items and users; choosing a number that is too high adds little helpful information while making the run time longer.
The idea of latent variables is interesting to think about and understand. By their very nature (being hidden), we will not know what they are; they are simply the way the learning system encodes the patterns it encounters.
This may be a little clearer with some examples. If we could know what the latent variables were, we might see some recognizable ones, like genre or movie duration. We might also see some that are less recognizable or measurable, such as "depth of character" or "quirkiness". Still other latent factors might not be recognizable at all. All of these factors are treated as orthogonal dimensions and are represented numerically.
Very interesting.
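A common way to learn such factors is matrix factorization trained with stochastic gradient descent, which the paper discusses at length. Here is a minimal sketch in plain Python on made-up ratings; the paper's full models add bias terms and other refinements beyond this:

```python
import random

random.seed(0)
k = 2                  # number of latent factors -- a modeling choice
lr, reg = 0.01, 0.02   # learning rate and L2 regularization strength

# Toy (user, item, rating) triples -- assumed for illustration.
ratings = [
    (0, 0, 5.0), (0, 1, 1.0),
    (1, 0, 4.0), (1, 1, 2.0),
    (2, 0, 1.0), (2, 1, 5.0),
]
n_users, n_items = 3, 2

# Each user u gets a k-dimensional vector P[u]; each item i gets Q[i].
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# SGD: for each observed rating, nudge both vectors to shrink the
# squared prediction error, with a small L2 penalty to avoid overfitting.
for _ in range(3000):
    for u, i, r in ratings:
        err = r - dot(P[u], Q[i])
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

# After training, dot(P[u], Q[i]) approximates user u's rating of item i,
# and the k coordinates are exactly the latent factors discussed above.
```

Nothing in the training loop names the factors; whatever the two learned dimensions end up encoding, they are simply the directions that best explain the observed ratings.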