CPSC 330 Lecture 16: Recommendation systems

Focus on the breath!

Announcements

  • HW6 was due yesterday.
  • No classes or OH during the midterm break.
  • Midterm 2 coming up next week!
  • If you find the breathing exercises helpful, you’re very welcome to join our weekly meditation sessions.
    • 🕑 When? Every Wednesday at 2 PM
    • 📍 Where? ICCS 146

iClicker question 💡

What percentage of watch time on YouTube do you think comes from recommendations?

    1. 50%
    2. 60%
    3. 20%
    4. 90%

Based on Google Developers (2022). The number may have changed since then!

What is a recommendation system?

A recommendation system suggests products, services, or content that a user is likely to consume or enjoy.

Example: Recommender systems

  • A user visits Amazon to shop.
  • Amazon knows:
    • what the user viewed or purchased before
    • what similar users bought
  • The goal: recommend items that maximize user engagement or sales.
  • There’s no single “right” label. The goal is to model user behaviour.

Why should we care?

  • Recommendations shape almost everything we buy or watch.
  • Central to the success of companies like Amazon, Netflix, YouTube, and Spotify.

Why recommendation systems?

  • They help users navigate information overload.
  • Without them, finding the right item would require sifting through thousands of options.
  • Recommendations reduce effort and improve user experience.

The other side: Filter bubbles

  • Recommenders often amplify what users already like or what similar users like.
  • This can create filter bubbles, limiting exposure to diverse content.
  • Probably harmless in shopping, but can have serious consequences in domains like news, politics, or science, where diverse viewpoints matter.

Data and problem setup

What data do we need?

To build a recommender, we typically use:

  • User–item interactions (ratings, clicks, views, purchases)
  • Item or user features (e.g., genre, price, age)
  • Historical data (purchase or viewing history)

Problem formulation

  • We have \(N\) users and \(M\) items.
  • Observed data = interactions:
    • Movie ratings (Netflix)
    • Song plays (Spotify)
    • Product purchases (Amazon)

Goal: predict unobserved interactions.

The utility matrix

  • Rows = users, columns = items
  • Each entry \(y_{ij}\) = user \(i\)’s interaction (e.g., rating) with item \(j\)

Sparsity

  • The utility matrix is mostly empty.
  • Each user interacts with only a small fraction of all items.
  • Examples:
    • Netflix users rate only a few shows out of thousands.
    • Amazon shoppers review only a handful of products.
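
A minimal sketch of both ideas, using made-up users, items, and ratings: build the utility matrix from a log of interactions with pandas, then measure how sparse it is. The NaN entries are exactly the unobserved interactions we want to predict.

```python
import pandas as pd

# Toy ratings log: each row is one observed user-item interaction.
ratings = pd.DataFrame({
    "user": ["sam", "sam", "eva", "eva", "kim"],
    "item": ["Movie A", "Movie B", "Movie B", "Movie C", "Movie A"],
    "rating": [5, 3, 4, 2, 1],
})

# Pivot into the utility matrix: rows = users, columns = items,
# and entry y_ij = user i's rating of item j. Unobserved entries become NaN.
Y = ratings.pivot(index="user", columns="item", values="rating")
print(Y)

# Sparsity: the fraction of entries that are actually observed.
print(f"Observed fraction: {Y.notna().sum().sum() / Y.size:.2f}")
```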

What do we predict?

We aim to fill in the missing entries. In other words, predict ratings or preferences the user hasn’t expressed yet.

Rating prediction \(\neq\) regression

Regression

\[ \begin{bmatrix} \checkmark & \checkmark & \checkmark & \checkmark & \checkmark\\ \checkmark & \checkmark & \checkmark & \checkmark & \checkmark\\ \checkmark & \checkmark & \checkmark & \checkmark & \checkmark\\ \checkmark & \checkmark & \checkmark & \checkmark & ?\\ \checkmark & \checkmark & \checkmark & \checkmark & ?\\ \checkmark & \checkmark & \checkmark & \checkmark & ?\\ \end{bmatrix} \]

Rating prediction

\[ \begin{bmatrix} ? & ? & \checkmark & ? & \checkmark\\ \checkmark & ? & ? & ? & ?\\ ? & \checkmark & \checkmark & ? & \checkmark\\ ? & ? & ? & ? & ?\\ ? & ? & ? & \checkmark & ?\\ ? & \checkmark & \checkmark & ? & \checkmark \end{bmatrix} \]

In regression, only the target column of new examples is missing (the ? entries in the last column); all features are observed. In rating prediction, missing entries are scattered across the whole matrix, and any entry may be the one we need to predict.

Main approaches

  • Collaborative filtering
    • “Unsupervised” learning
    • We only have labels \(y_{ij}\) (rating of user \(i\) for item \(j\)).
    • We learn latent features.
  • Content-based recommenders (today’s focus)
    • Supervised learning
    • Extract features \(x_i\) of users and/or items and build a model to predict the rating \(y_i\) given \(x_i\) (see the sketch after this list).
    • Apply model to predict for new users/items.
  • Hybrid
    • Combining collaborative filtering with content-based filtering
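
A minimal sketch of the content-based setup, with hypothetical genre features and ratings: fit one regression model per user on the items they have rated, then use it to score the items they haven't.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Hypothetical item features (e.g., genre indicators), one row per item.
item_feats = pd.DataFrame(
    [[1, 0], [1, 1], [0, 1]],
    index=["Movie A", "Movie B", "Movie C"],
    columns=["comedy", "drama"],
)

# One user's observed ratings; NaN marks the items they haven't rated.
user_ratings = pd.Series([5.0, 3.0, np.nan], index=item_feats.index)

# Supervised learning per user: X = features of the items this user rated,
# y = the ratings they gave those items.
rated = user_ratings.notna()
model = Ridge().fit(item_feats[rated], user_ratings[rated])

# Predict this user's ratings for the unrated items.
preds = model.predict(item_feats[~rated])
print(dict(zip(item_feats.index[~rated], preds)))
```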

Evaluating Recommender Systems

How do we evaluate recommendations?

  • Is there a single correct answer for what should be recommended or what rating should be predicted?
    • Not really!
  • Still, we need ways to compare different methods and measure their usefulness.

Why does evaluation matter?

  • We’ll experiment with different ways to fill in missing entries in the utility matrix.
  • Even though recommendations are subjective, we still need quantitative metrics to judge:
    • How well do our predictions match real user behavior?
    • Which model performs better?

RMSE for rating prediction

  • RMSE, which measures how close predicted ratings are to actual ratings, is one of the most commonly used metrics for evaluating recommendation systems.
  • In 2006, Netflix launched the Netflix Prize competition.
  • They released a dataset of 100 million movie ratings and offered $1 million to the first team that improved Netflix’s existing algorithm by at least 10% in RMSE on a held-out test set.

Source: Netflix Tech Blog
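
Concretely, RMSE is computed only over the set \(\mathcal{O}\) of observed (user, item) pairs:

\[ \text{RMSE} = \sqrt{\frac{1}{\lvert\mathcal{O}\rvert} \sum_{(i,j) \in \mathcal{O}} \left(y_{ij} - \hat{y}_{ij}\right)^2} \]

A minimal sketch of this computation on a toy utility matrix (all numbers made up):

```python
import numpy as np

def rmse_on_observed(Y_true, Y_pred):
    """RMSE over the observed (non-NaN) entries of the utility matrix only."""
    mask = ~np.isnan(Y_true)
    return np.sqrt(np.mean((Y_true[mask] - Y_pred[mask]) ** 2))

# 2 users x 3 items; NaN = unobserved.
Y_true = np.array([[5.0, np.nan, 3.0],
                   [np.nan, 4.0, 2.0]])
Y_pred = np.array([[4.5, 3.0, 3.5],
                   [2.0, 4.0, 1.0]])
print(rmse_on_observed(Y_true, Y_pred))  # error on the 4 observed entries only
```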

Class demo

iClicker Exercise

Select all of the following statements which are True

    1. In the context of recommendation systems, the shapes of the validation utility matrix and the train utility matrix are the same.
    2. RMSE perfectly captures what we want to measure in the context of recommendation systems.
    3. It would be reasonable to impute missing values in the utility matrix by taking the average of the ratings given to an item by similar users.
    4. In KNN-type imputation, if a user has not rated any items yet, a reasonable strategy would be recommending them the most popular item.

Baseline approaches

  • Global average baseline
  • Per-user average baseline
  • Per-item average baseline
  • Average of the per-user and per-item baselines
    • For each missing entry, take the average of the corresponding per-user and per-item averages.
  • \(k\)-Nearest Neighbours imputation (all of these are sketched below)
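
A minimal sketch of these baselines on a toy utility matrix (the data is made up; KNNImputer is scikit-learn's k-nearest-neighbours imputer):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy utility matrix: rows = users, columns = items, NaN = missing.
Y = np.array([[5.0, np.nan, 3.0],
              [4.0, 2.0, np.nan],
              [np.nan, 1.0, 4.0]])

global_avg = np.nanmean(Y)                       # one number for every entry
per_user = np.nanmean(Y, axis=1, keepdims=True)  # each user's mean rating
per_item = np.nanmean(Y, axis=0, keepdims=True)  # each item's mean rating

# Baseline predictions for every entry; in practice we evaluate them
# only on the held-out (missing/validation) entries.
pred_global = np.full_like(Y, global_avg)
pred_user = np.broadcast_to(per_user, Y.shape)
pred_item = np.broadcast_to(per_item, Y.shape)
pred_user_item = (pred_user + pred_item) / 2     # average of the two baselines

# k-NN imputation: fill a user's missing entries from similar users' ratings.
Y_knn = KNNImputer(n_neighbors=2).fit_transform(Y)
print(Y_knn)
```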

Content-based filtering

iClicker

Select all of the following statements which are True

    1. In content-based filtering we leverage available item features in addition to similarity between users.
    2. In content-based filtering you represent each user in terms of known features of items.
    3. In the setup of content-based filtering we discussed, if you have a new movie, you would have problems predicting ratings for that movie.
    4. In content-based filtering, if a user has a number of ratings in the training utility matrix but does not have any ratings in the validation utility matrix, then we won't be able to calculate RMSE for the validation utility matrix.