CPSC 330 Lecture 17: Introduction to Natural Language Processing

Focus on the breath!

Announcements

  • No classes next week.
  • Midterm 2 coming up next week.
  • Do you need office hours (OH) to get ready for the midterm? If so, I can hold OH on Monday.

iClicker question

Do you need office hours to help you prepare for the midterm?

    1. Monday at 2 PM works well for me
    2. Monday at 4 PM works well for me
    3. No, I don’t need office hours

Recap

Imagine that you recently watched Minimalism: A Documentary About the Important Things (2015) and rated it highly.

Suppose a recommendation system suggests items based on item similarity. Which movie would be recommended to you?

    1. 🎬 Avatar: The Way of Water (2022) (Highly popular sci-fi movie with themes of nature and connection.)
    2. 🎬 The Art of Effortless Living (2021) (A lesser-known documentary about mindfulness and simple living.)

Similarity metrics

  • Similarity based on Euclidean distance

\[distance(vec1, vec2) = \sqrt{\sum_{i =1}^{n} (vec1_i - vec2_i)^2}\]

  • Dot product similarity:

\[similarity_{\text{dot product}}(vec1, vec2) = vec1 \cdot vec2\]

  • Cosine similarity: the dot product normalized by the lengths of the two vectors, so only their direction matters.

\[similarity_{\text{cosine}}(vec1, vec2) = \frac{vec1 \cdot vec2}{\left\lVert vec1\right\rVert_2 \left\lVert vec2\right\rVert_2}\]
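
As a minimal sketch, the three formulas above can be written directly in plain Python. The function names and the vectors `u` and `v` are hypothetical, chosen only for illustration; `v` is `u` scaled by 2, so the two vectors point in the same direction.

```python
import math

def euclidean_distance(vec1, vec2):
    """Square root of the summed squared differences; smaller means more similar."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec1, vec2)))

def dot_product(vec1, vec2):
    """Sum of elementwise products; grows with both alignment and vector length."""
    return sum(a * b for a, b in zip(vec1, vec2))

def cosine_similarity(vec1, vec2):
    """Dot product divided by the product of the two L2 norms."""
    norm1 = math.sqrt(dot_product(vec1, vec1))
    norm2 = math.sqrt(dot_product(vec2, vec2))
    return dot_product(vec1, vec2) / (norm1 * norm2)

# Hypothetical example vectors (not from the lecture).
u = [1.0, 2.0]
v = [2.0, 4.0]

print(euclidean_distance(u, v))  # sqrt(1 + 4), about 2.236
print(dot_product(u, v))         # 2 + 8 = 10.0
print(cosine_similarity(u, v))   # approximately 1.0 (same direction)
```

Note that the Euclidean distance is nonzero even though the cosine similarity is essentially 1: distance reacts to the difference in length, while cosine similarity ignores it.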

Which metric in what context?

Given a query vector “Query” in the picture below and the three item vectors, determine the ranking of the items for the three similarity measures below:

  • Example: Similarity based on Euclidean distance: item B > item C > item A

  • Similarity based on dot product: ?

  • Cosine similarity: ?



Adapted from here.
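
To see how the choice of metric can change a ranking, here is a small sketch with made-up vectors. These are assumptions for illustration only and are not the vectors from the picture, so the output is not the answer to the exercise. The helper `rank` is also hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # math.hypot gives the L2 norm of a 2-D vector.
    return dot(u, v) / (math.hypot(*u) * math.hypot(*v))

# Made-up 2-D vectors: A is long, B is near the query, C is short but
# points in exactly the same direction as the query.
query = [2.0, 2.0]
items = {"A": [6.0, 5.0], "B": [2.0, 1.0], "C": [1.0, 1.0]}

def rank(score, higher_is_better):
    """Return item names ordered from most to least similar to the query."""
    return sorted(items, key=lambda name: score(query, items[name]),
                  reverse=higher_is_better)

by_euclidean = rank(math.dist, higher_is_better=False)  # small distance = similar
by_dot = rank(dot, higher_is_better=True)
by_cosine = rank(cosine, higher_is_better=True)

print(by_euclidean)  # ['B', 'C', 'A']
print(by_dot)        # ['A', 'B', 'C']
print(by_cosine)     # ['C', 'A', 'B']
```

With these vectors each metric produces a different ordering: Euclidean distance favours the nearest point, the dot product favours the longest vector, and cosine similarity favours the best-aligned direction.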

What is NLP?

  • Natural Language Processing (NLP) is a field at the intersection of computer science, linguistics, and artificial intelligence.
  • It focuses on enabling computers to understand, interpret, and generate human language.

Examples of NLP applications

Key challenges in NLP

  • Ambiguity: words can have multiple meanings, and the intended meaning depends on the surrounding words and sentences
    • “I had toast with jam” vs. “We got stuck in a traffic jam.”
    • “If the baby does not thrive on raw milk, boil it.” (What does “it” refer to?)
  • Structure: syntax and grammar vary widely
    • Time flies like an arrow vs Fruit flies like a banana.
  • World knowledge: understanding beyond text
    • Olive oil: oil made from olives
    • Baby oil: oil made for babies

Goal of this lecture

NLP is a broad field. In this lecture I’ll give you a high-level introduction to:

  • Topic modeling
  • Word embeddings
  • Language models and large language models

Class demo