CPSC 330 Lecture 17: Introduction to Natural Language Processing
Announcements
- No classes next week.
- Midterm 2 coming up next week.
- Do you need OH to get ready for the midterm? If yes, I can hold OH on Monday.
iClicker question
Do you need office hours to help you prepare for the midterm?
- Monday at 2 PM works well for me
- Monday at 4 PM works well for me
- No, I don’t need office hours
Recap
Imagine that you recently watched Minimalism: A Documentary About the Important Things (2015) and rated it highly.
Suppose a recommendation system suggests items based on item similarity. Which movie would be recommended to you?
- 🎬 Avatar: The Way of Water (2022) (Highly popular sci-fi movie with themes of nature and connection.)
- 🎬 The Art of Effortless Living (2021) (A lesser-known documentary about mindfulness and simple living.)
Similarity metrics
- Similarity based on Euclidean distance (a smaller distance means higher similarity)
\[\text{distance}(vec1, vec2) = \sqrt{\sum_{i=1}^{n} (vec1_i - vec2_i)^2}\]
- Similarity based on dot product
\[\text{similarity}_{\text{dot product}}(vec1, vec2) = vec1 \cdot vec2\]
- Cosine similarity: the dot product normalized by the lengths of the two vectors.
\[\text{similarity}_{\text{cosine}}(vec1, vec2) = \frac{vec1 \cdot vec2}{\left\lVert vec1\right\rVert_2 \left\lVert vec2\right\rVert_2}\]
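The three formulas above can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference code; the function names and the example vectors `u` and `v` are made up for this sketch.

```python
import math

def euclidean_distance(vec1, vec2):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec1, vec2)))

def dot_product(vec1, vec2):
    # Sum of coordinate-wise products; grows with vector length.
    return sum(a * b for a, b in zip(vec1, vec2))

def cosine_similarity(vec1, vec2):
    # Dot product divided by the product of the two L2 norms,
    # so only the angle between the vectors matters.
    norm1 = math.sqrt(dot_product(vec1, vec1))
    norm2 = math.sqrt(dot_product(vec2, vec2))
    return dot_product(vec1, vec2) / (norm1 * norm2)

# Hypothetical vectors: v points in the same direction as u but is twice as long.
u = [3.0, 4.0]
v = [6.0, 8.0]

print(euclidean_distance(u, v))  # 5.0
print(dot_product(u, v))         # 50.0
print(cosine_similarity(u, v))   # 1.0
```

Note that `u` and `v` are 5 units apart and yet have a cosine similarity of 1, because cosine similarity ignores vector length.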
Which metric in what context?
Given a query vector “Query” in the picture below and the three item vectors, determine the ranking of the items for the three similarity measures below:

- Example: Similarity based on Euclidean distance: item B > item C > item A
- Similarity based on dot product: ?
- Cosine similarity: ?
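To see why the choice of metric matters, here is a small sketch that ranks three items against a query under each measure. The vectors and the names `item_1`, `item_2`, `item_3` are invented for illustration and are not the ones from the picture, so this does not answer the exercise above.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

# Hypothetical 2-D vectors, chosen so the three rankings disagree.
query = (2.0, 2.0)
items = {
    "item_1": (6.0, 6.0),  # same direction as the query, but far away
    "item_2": (2.0, 1.0),  # close to the query, slightly different direction
    "item_3": (1.0, 4.0),  # moderately close, more different direction
}

# Euclidean: smaller distance means more similar, so sort ascending.
by_euclidean = sorted(items, key=lambda n: math.dist(query, items[n]))

# Dot product and cosine: larger value means more similar, so sort descending.
by_dot = sorted(items, key=lambda n: dot(query, items[n]), reverse=True)
by_cosine = sorted(
    items,
    key=lambda n: dot(query, items[n]) / (norm(query) * norm(items[n])),
    reverse=True,
)

print("Euclidean:", by_euclidean)  # ['item_2', 'item_3', 'item_1']
print("Dot product:", by_dot)      # ['item_1', 'item_3', 'item_2']
print("Cosine:", by_cosine)        # ['item_1', 'item_2', 'item_3']
```

With these made-up vectors, the dot product favors the long vector, cosine similarity favors the best-aligned vector, and Euclidean distance favors the nearest one.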
Adapted from here.
What is NLP?
- Natural Language Processing (NLP) is a field at the intersection of computer science, linguistics, and artificial intelligence.
- It focuses on enabling computers to understand, interpret, and generate human language.
Examples of NLP applications
Key challenges in NLP
- Ambiguity: a word can have multiple meanings, and the intended meaning depends on context
  - I had toast with jam vs. We got stuck in a traffic jam.
  - If the baby does not thrive on raw milk, boil it. (What does “it” refer to?)
- Structure: syntax and grammar vary widely
  - Time flies like an arrow vs. Fruit flies like a banana.
- World knowledge: understanding beyond text
  - Olive oil: oil made from olives
  - Baby oil: oil made for babies
Goal of this lecture
NLP is a broad field. In this lecture I’ll give you a high-level introduction to:
- Topic modeling
- Word embeddings
- Language models and large language models