By the end of this module, you will be able to:
Each of you will receive a sticky note with a word on it. Here’s what you’ll do:
Check out this recent BMO ad.
A common application of next-word prediction is the ‘smart compose’ feature in your emails, text messages, and search engines.
The sentiment of the sentence “I like machine learning” is:
Q: Who won the Nobel Prize in 2024 for their work in deep learning? A:
Q: Who won the Nobel Prize in 2024 for their work in deep learning? A: Geoffrey
What are some reasonable predictions for the next word in the sequence?
I am studying law at the University of British Columbia Point Grey campus in Vancouver because I want to work as a ___
A Markov model is unable to capture such long-distance dependencies in language, because its prediction is conditioned only on a fixed window of preceding words.
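To see this limitation concretely, here is a minimal sketch of a bigram Markov model (the toy corpus is our own illustration). A bigram model conditions each prediction on just the single preceding word:

```python
from collections import Counter, defaultdict

# Toy corpus for a bigram Markov model (our own illustration)
corpus = ("i want to work as a lawyer . "
          "she wants to work as a judge . "
          "he hopes to work as a doctor .").split()

# Count how often each word follows the previous one
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

# The prediction for "... as a ___" uses ONLY the previous word "a";
# the model cannot see "law" or "University of British Columbia"
# from earlier in the sentence
print(bigrams["a"].most_common())
```

Given only the word ‘a’, this model rates ‘lawyer’, ‘judge’, and ‘doctor’ as equally likely; it has no way to use the word ‘law’ from earlier in the sentence.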
Enter attention and transformer models! Transformer models are at the core of all state-of-the-art generative AI models (e.g., BERT, GPT-3, GPT-4, Gemini, DALL-E, Llama, GitHub Copilot)!
Source: GPT-4 Technical Report
When we process information, we often selectively focus on specific parts of the input, giving more attention to relevant information and less attention to irrelevant information. This is the core idea of attention.
Consider the examples below:
Example 1: She left a brief note on the kitchen table, reminding him to pick up groceries.
Example 2: The diplomat’s speech struck a positive note in the peace negotiations.
Example 3: She plucked the guitar strings, ending with a melancholic note.
The word note takes on quite distinct meanings in these examples, each tied to a different context. To capture how a word’s meaning varies across contexts, we need a mechanism that considers the wider context when computing each word’s representation.
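One such mechanism is self-attention: each word’s new representation is a weighted average of all the words in the context, with the weights reflecting relevance. Here is a minimal sketch of scaled dot-product attention; the random vectors and projection matrices stand in for parameters a real transformer would learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings for the context "she plucked a melancholic note"
# (in a real transformer these come from learned embedding layers)
tokens = ["she", "plucked", "a", "melancholic", "note"]
d = 8                                   # embedding dimension
X = rng.normal(size=(len(tokens), d))

# Project into queries, keys, and values (random placeholders for the
# learned projection matrices of a real transformer)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: score all word pairs, softmax, then mix
scores = Q @ K.T / np.sqrt(d)
scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
contextual = weights @ V                # one context-aware vector per word

# Attention weights for "note": how much it draws on each context word
print(dict(zip(tokens, weights[tokens.index("note")].round(2))))
```

Because the weights depend on the whole context, the vector computed for “note” will differ when the surrounding words change, which is exactly the contextual behaviour the examples above call for.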
If you want to use pre-trained LLMs, it’s useful to know that there are three main types of LLMs.
Feature | Decoder-only (e.g., GPT-3) | Encoder-only (e.g., BERT, RoBERTa) | Encoder-decoder (e.g., T5, BART) |
---|---|---|---|
Output Computation is based on | Information earlier in the context | Entire context (bidirectional) | Encoded input context |
Text Generation | Can naturally generate text completion | Cannot directly generate text | Can generate outputs naturally |
Example | Our ML workshop audience is ___ | Our ML workshop audience is the best! → positive | Input: Translate to Mandarin: Long but productive day! Output: 漫长而富有成效的一天! |
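To make the table concrete, here is a sketch of one Hugging Face pipeline per type. The checkpoints (gpt2, bert-base-uncased, t5-small) are our own choices, and since t5-small was not trained on English-to-Mandarin, the sketch translates to French instead:

```python
from transformers import pipeline

# Decoder-only: predicts a continuation from the left context
generator = pipeline("text-generation", model="gpt2")
print(generator("Our ML workshop audience is", max_new_tokens=5)[0]["generated_text"])

# Encoder-only: fills in a masked token using the full bidirectional context
filler = pipeline("fill-mask", model="bert-base-uncased")
print(filler("Our ML workshop audience is the [MASK]!")[0]["token_str"])

# Encoder-decoder: encodes the input, then decodes a new output sequence
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Long but productive day!")[0]["translation_text"])
```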
```python
from transformers import pipeline

# Sentiment analysis pipeline (DistilBERT fine-tuned on SST-2)
analyzer = pipeline("sentiment-analysis",
                    model="distilbert-base-uncased-finetuned-sst-2-english")
analyzer(["I asked my model to predict my future, and it said '404: Life not found.'",
          '''Machine learning is just like cooking—sometimes you follow the recipe,
          and other times you just hope for the best!.'''])
```
```
[{'label': 'NEGATIVE', 'score': 0.995707631111145},
 {'label': 'POSITIVE', 'score': 0.9994770884513855}]
```
A few sample texts from the emotion dataset used below:

```
['i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived',
 'i was feeling a little vain when i did this one',
 'i cant walk into a shop anywhere where i do not feel uncomfortable',
 'i felt anger when at the end of a telephone call',
 'i explain why i clung to a relationship with a boy who was in many ways immature and uncommitted despite the excitement i should have been feeling for getting accepted into the masters program at the university of virginia',
 'i like to have the same breathless feeling as a reader eager to see what will happen next',
 'i jest i feel grumpy tired and pre menstrual which i probably am but then again its only been a week and im about as fit as a walrus on vacation for the summer',
 'i don t feel particularly agitated',
 'i feel beautifully emotional knowing that these women of whom i knew just a handful were holding me and my baba on our journey',
 'i pay attention it deepens into a feeling of being invaded and helpless',
 'i just feel extremely comfortable with the group of people that i dont even need to hide myself',
 'i find myself in the odd position of feeling supportive of']
```
```python
from datasets import load_dataset
from transformers import pipeline

# Load the emotion dataset (assumption: the sample texts above come from
# the "dair-ai/emotion" dataset on the Hugging Face Hub)
dataset = load_dataset("dair-ai/emotion")

# Load a pretrained NLI model for zero-shot classification
model_name = "facebook/bart-large-mnli"
classifier = pipeline("zero-shot-classification", model=model_name)

exs = dataset["test"]["text"][:10]
candidate_labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
outputs = classifier(exs, candidate_labels)
```
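Each entry of outputs is a dictionary with the sequence, the candidate labels (sorted by descending score), and the scores. The table below can be reproduced by loading the outputs into a DataFrame, assuming pandas is installed:

```python
import pandas as pd

# Each output dict has keys "sequence", "labels", and "scores"
pd.DataFrame(outputs)
```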
idx | sequence | labels | scores
---|---|---|---
0 | im feeling rather rotten so im not very ambiti... | [sadness, anger, surprise, fear, joy, love] | [0.7367963194847107, 0.10041721910238266, 0.09... |
1 | im updating my blog because i feel shitty | [sadness, surprise, anger, fear, joy, love] | [0.7429746985435486, 0.13775986433029175, 0.05... |
2 | i never make her separate from me because i do... | [love, sadness, surprise, fear, anger, joy] | [0.3153638243675232, 0.22490324079990387, 0.19... |
3 | i left with my bouquet of red and yellow tulip... | [surprise, joy, love, sadness, fear, anger] | [0.42182087898254395, 0.3336702883243561, 0.21... |
4 | i was feeling a little vain when i did this one | [surprise, anger, fear, love, joy, sadness] | [0.5639430284500122, 0.17000176012516022, 0.08... |
5 | i cant walk into a shop anywhere where i do no... | [surprise, fear, sadness, anger, joy, love] | [0.37033382058143616, 0.36559492349624634, 0.1... |
6 | i felt anger when at the end of a telephone call | [anger, surprise, fear, sadness, joy, love] | [0.9760521054267883, 0.01253431849181652, 0.00... |
7 | i explain why i clung to a relationship with a... | [surprise, joy, love, sadness, fear, anger] | [0.4382022023200989, 0.232231006026268, 0.1298... |
8 | i like to have the same breathless feeling as ... | [surprise, joy, love, fear, anger, sadness] | [0.7675782442092896, 0.13846899569034576, 0.03... |
9 | i jest i feel grumpy tired and pre menstrual w... | [surprise, sadness, anger, fear, joy, love] | [0.7340186834335327, 0.11860235780477524, 0.07... |
While these models are super powerful and useful, be mindful of the harms they can cause. Some of these harms are: