Lecture 1: Introduction to CPSC 330

Varada Kolhatkar

🎯 Learning Outcomes

By the end of this module, you will be able to:

  • Explain the difference between AI, ML, and DL.
  • Describe what machine learning is and when it is appropriate to use ML-based solutions.
  • Identify different types of machine learning problems, such as classification, regression, clustering, and time series forecasting.
  • Recognize common data types in machine learning, including tabular, text, and image data.
  • Evaluate whether a machine learning solution is suitable for your problem or whether a rule-based or human-expert solution is more appropriate.

CPSC 330 website


  • Course Jupyter book: https://ubc-cs.github.io/cpsc330-2025W1
  • Course GitHub repository: https://github.com/UBC-CS/cpsc330-2025W1

🤝 Introductions 🤝

Meet your instructor

  • Varada Kolhatkar [ʋəɾəda kɔːlɦəʈkər]
  • You can call me Varada, V, or Ada.
  • Associate Professor of Teaching in the Department of Computer Science.
  • Ph.D. in Computational Linguistics at the University of Toronto.
  • I primarily teach machine learning courses in the Master of Data Science (MDS) program.
  • Contact information
    • Email: kvarada@cs.ubc.ca
    • Office: ICCS 237

Meet Eva (a fictitious persona)!

Eva is one of you. She has some experience in Python programming and knows machine learning only as a buzzword. During her recent internship, she developed an interest and curiosity in the field. She wants to learn what it is and how to use it. She is a curious person and usually has a lot of questions!

You all

  • Introduce yourself to your neighbour.
  • Since we’re going to spend the semester with each other, I would like to know you a bit better.
  • Please fill out the Getting to know you survey when you get a chance.

Asking questions during class

  • You are welcome to ask questions by raising your hand.
  • No question is a stupid question.
  • Recommended reading as you begin your learning journey: The Fear of Publicly Not Knowing

What I quickly came to realize was that publicly not knowing wasn’t an indicator of stupidity, it was an indicator of understanding. And from what I’ve seen, it is one of the clearest indicators of success in people — more than school prestige, more than GPA.

Activity 1


Discuss with your neighbour

  • What do you know about machine learning?
  • What would you like to get out of this course?
  • Are there any particular topics or aspects of this course that you are especially excited or anxious about? Why?

Which cat do you think is AI-generated?

  • A
  • B
  • Both
  • Neither
  • What clues did you use to decide?

What is AI, ML, DL?

  • Artificial Intelligence (AI): Making computers act smart
  • Machine Learning (ML): Learning patterns from data
  • Deep Learning (DL): Using neural networks to learn complex patterns

Let’s walk through an example

  • Have you used search in Google Photos? You can search for “cat” and it will retrieve photos containing cats from your library.
  • This can be done using image classification.

Image classification

  • Imagine we want a system that can tell cats and foxes apart.
  • How might we do this with traditional programming? With ML?

Image ID  Whiskers Present  Ear Size  Face Shape  Fur Color  Eye Shape  Label
1         Yes               Small     Round       Mixed      Round      Cat
2         Yes               Medium    Round       Brown      Almond     Cat
3         Yes               Large     Pointed     Red        Narrow     Fox
4         Yes               Large     Pointed     Red        Narrow     Fox
5         Yes               Small     Round       Mixed      Round      Cat
6         Yes               Large     Pointed     Red        Narrow     Fox
7         Yes               Small     Round       Grey      Round      Cat
8         Yes               Small     Round       Black     Round      Cat
9         Yes               Large     Pointed     Red       Narrow     Fox

Traditional programming: example

  • You hard-code rules. If all of the following are satisfied, it’s a fox:
    • pointed face ✅
    • red fur ✅
    • narrow eyes ✅
  • This works for typical cases, but what about exceptions? (A sketch of such hard-coded rules follows below.)
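Here is a minimal sketch of such a rule-based classifier (the feature names are hypothetical, not from the course code):

def classify_animal(animal):
    # Hard-coded rules: every condition must hold to predict "Fox"
    if (animal["face_shape"] == "Pointed"
            and animal["fur_color"] == "Red"
            and animal["eye_shape"] == "Narrow"):
        return "Fox"
    return "Cat"

classify_animal({"face_shape": "Pointed", "fur_color": "Red", "eye_shape": "Narrow"})   # 'Fox'
classify_animal({"face_shape": "Pointed", "fur_color": "Grey", "eye_shape": "Narrow"})  # 'Cat' (the rule misses unusual foxes)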

ML approach: example

  • We don’t tell the model the exact rule. Instead, we give it many labeled images, and it learns probabilistic patterns across multiple features, not rigid rules (see the sketch below).
    • For example: if the fur is red, there is a 90% chance it’s a fox.
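As a rough sketch (not the lecture’s actual code), a decision tree can learn such patterns from labeled examples like the cat-vs-fox table above, assuming the categorical features are one-hot encoded:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# A few labeled examples, mirroring the cat-vs-fox table above
df = pd.DataFrame({
    "ear_size":  ["Small", "Medium", "Large", "Large", "Small", "Large"],
    "fur_color": ["Mixed", "Brown", "Red", "Red", "Grey", "Red"],
    "label":     ["Cat", "Cat", "Fox", "Fox", "Cat", "Fox"],
})
X = pd.get_dummies(df.drop(columns=["label"]))  # one-hot encode the categorical features
model = DecisionTreeClassifier(random_state=123).fit(X, df["label"])
model.predict(X[:2])  # the rule is learned from data, not hand-coded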

DL approach: example

  • A neural network automatically learns which features to look at (edges → textures → objects).
  • No need to even specify face shape or fur colour. It learns relevant features on its own.

What is ML?

  • Learn patterns from examples (data)
  • Make predictions, identify patterns, generate content
  • ML systems improve over time with more data
  • No single model fits all problems

When to use ML?

  • When the problem can’t be solved with a fixed set of rules
  • When you have lots of data and complex relationships
  • When human decision-making is too slow or inconsistent
Approach                 Best for
Traditional Programming  Rules are known, data is clean/predictable
Machine Learning         Rules are complex/unknown, data is noisy

When to use Machine Learning (ML) solutions?

Example: Supervised classification

  • We want to predict liver disease from tabular features:
Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio Target
40 14.5 6.4 358 50 75 5.7 2.1 0.50 Disease
33 0.7 0.2 256 21 30 8.5 3.9 0.80 Disease
24 0.7 0.2 188 11 10 5.5 2.3 0.71 No Disease
60 0.7 0.2 171 31 26 7.0 3.5 1.00 No Disease
18 0.8 0.2 199 34 31 6.5 3.5 1.16 No Disease

Model training

from lightgbm.sklearn import LGBMClassifier

# X_train and y_train are assumed to come from a train/test split of the liver-disease data above
model = LGBMClassifier(random_state=123, verbose=-1)
model.fit(X_train, y_train)
LGBMClassifier(random_state=123, verbose=-1)

New examples

  • Given the features of the new patients below, we’ll use this model to predict whether each patient has liver disease.
Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio
19 1.4 0.8 178 13 26 8.0 4.6 1.30
12 1.0 0.2 719 157 108 7.2 3.7 1.00
60 5.7 2.8 214 412 850 7.3 3.2 0.78
42 0.5 0.1 162 155 108 8.1 4.0 0.90

Model predictions on new examples

  • Let’s examine predictions
import pandas as pd
from IPython.display import HTML

pred_df = pd.DataFrame({"Predicted_target": model.predict(X_test).tolist()})
df_concat = pd.concat([pred_df, X_test.reset_index(drop=True)], axis=1)
HTML(df_concat.to_html(index=False))
Predicted_target Age Total_Bilirubin Direct_Bilirubin Alkaline_Phosphotase Alamine_Aminotransferase Aspartate_Aminotransferase Total_Protiens Albumin Albumin_and_Globulin_Ratio
No Disease 19 1.4 0.8 178 13 26 8.0 4.6 1.30
Disease 12 1.0 0.2 719 157 108 7.2 3.7 1.00
Disease 60 5.7 2.8 214 412 850 7.3 3.2 0.78
Disease 42 0.5 0.1 162 155 108 8.1 4.0 0.90

Example: Supervised regression

Suppose we want to predict housing prices given a number of attributes associated with houses. The target here is continuous and not discrete.

target bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
509000.0 2 1.50 1930 3521 2.0 0 0 3 8 1930 0 1989 0 98007 47.6092 -122.146 1840 3576
675000.0 5 2.75 2570 12906 2.0 0 0 3 8 2570 0 1987 0 98075 47.5814 -122.050 2580 12927
420000.0 3 1.00 1150 5120 1.0 0 0 4 6 800 350 1946 0 98116 47.5588 -122.392 1220 5120
680000.0 8 2.75 2530 4800 2.0 0 0 4 7 1390 1140 1901 0 98112 47.6241 -122.305 1540 4800
357823.0 3 1.50 1240 9196 1.0 0 0 3 8 1240 0 1968 0 98072 47.7562 -122.094 1690 10800

Building a regression model

from lightgbm.sklearn import LGBMRegressor

X_train, y_train = train_df.drop(columns=["target"]), train_df["target"]
X_test, y_test = test_df.drop(columns=["target"]), test_df["target"]

model = LGBMRegressor()
model.fit(X_train, y_train);

Predicting prices of unseen houses

pred_df = pd.DataFrame(
    {"Predicted_target": model.predict(X_test[0:4]).tolist()}
)
df_concat = pd.concat([pred_df, X_test[0:4].reset_index(drop=True)], axis=1)
HTML(df_concat.to_html(index=False))
Predicted_target bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement yr_built yr_renovated zipcode lat long sqft_living15 sqft_lot15
345831.740542 4 2.25 2130 8078 1.0 0 0 4 7 1380 750 1977 0 98055 47.4482 -122.209 2300 8112
601042.018745 3 2.50 2210 7620 2.0 0 0 3 8 2210 0 1994 0 98052 47.6938 -122.130 1920 7440
311310.186024 4 1.50 1800 9576 1.0 0 0 4 7 1800 0 1977 0 98045 47.4664 -121.747 1370 9576
597555.592401 3 2.50 1580 1321 2.0 0 2 3 8 1080 500 2014 0 98107 47.6688 -122.402 1530 1357

Here we are predicting continuous values, as opposed to the discrete values in the disease vs. no disease example.

Text data

Example: Text classification

  • Suppose you are given data where messages are labeled as spam or non-spam, and you want to predict whether a new message is spam or not.
import pandas as pd
from sklearn.model_selection import train_test_split

sms_df = pd.read_csv(DATA_DIR + "spam.csv", encoding="latin-1")  # DATA_DIR is assumed to point to the course data folder
sms_df = sms_df.drop(columns=["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"])
sms_df = sms_df.rename(columns={"v1": "target", "v2": "sms"})
train_df, test_df = train_test_split(sms_df, test_size=0.10, random_state=42)
target sms
spam LookAtMe!: Thanks for your purchase of a video clip from LookAtMe!, you've been charged 35p. Think you can do better? Why not send a video in a MMSto 32323.
ham Aight, I'll hit you up when I get some cash
ham Don no da:)whats you plan?
ham Going to take your babe out ?
ham No need lar. Jus testing e phone card. Dunno network not gd i thk. Me waiting 4 my sis 2 finish bathing so i can bathe. Dun disturb u liao u cleaning ur room.

Let’s train a model

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X_train, y_train = train_df["sms"], train_df["target"]
X_test, y_test = test_df["sms"], test_df["target"]
clf = make_pipeline(CountVectorizer(max_features=5000), LogisticRegression(max_iter=5000))
clf.fit(X_train, y_train)  # Training the model
Pipeline(steps=[('countvectorizer', CountVectorizer(max_features=5000)),
                ('logisticregression', LogisticRegression(max_iter=5000))])

Unseen messages

  • Now use the trained model to predict targets of unseen messages:
sms
3245 Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens
944 I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one m...
1044 We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p
2484 Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok.

Predicting on unseen data

The model is accurately predicting labels for the unseen text messages above!

  sms spam_predictions
3245 Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens ham
944 I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one me the less expensive ones ham
1044 We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p spam
2484 Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok. ham

Example: Text classification with LLMs

 

from transformers import pipeline

# Sentiment analysis pipeline
analyzer = pipeline("sentiment-analysis", model='distilbert-base-uncased-finetuned-sst-2-english')
analyzer(["I asked my model to predict my future, and it said '404: Life not found.'",
          '''Machine learning is just like cooking—sometimes you follow the recipe, 
            and other times you just hope for the best!.'''])
[{'label': 'NEGATIVE', 'score': 0.995707631111145},
 {'label': 'POSITIVE', 'score': 0.9994770884513855}]

Zero-shot learning


  • Now suppose you want to identify the emotion expressed in the text rather than just positive or negative.
['im feeling rather rotten so im not very ambitious right now',
 'im updating my blog because i feel shitty',
 'i never make her separate from me because i don t ever want her to feel like i m ashamed with her',
 'i left with my bouquet of red and yellow tulips under my arm feeling slightly more optimistic than when i arrived',
 'i was feeling a little vain when i did this one',
 'i cant walk into a shop anywhere where i do not feel uncomfortable',
 'i felt anger when at the end of a telephone call',
 'i explain why i clung to a relationship with a boy who was in many ways immature and uncommitted despite the excitement i should have been feeling for getting accepted into the masters program at the university of virginia',
 'i like to have the same breathless feeling as a reader eager to see what will happen next',
 'i jest i feel grumpy tired and pre menstrual which i probably am but then again its only been a week and im about as fit as a walrus on vacation for the summer']

Zero-shot learning for emotion detection


from transformers import pipeline

# Load the pretrained model
model_name = "facebook/bart-large-mnli"
classifier = pipeline('zero-shot-classification', model=model_name)

# `dataset` is assumed to be loaded earlier (e.g., the "emotion" dataset from Hugging Face datasets)
exs = dataset["test"]["text"][10:20]
candidate_labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
outputs = classifier(exs, candidate_labels)

Zero-shot learning for emotion detection


sequence labels scores
0 i don t feel particularly agitated [surprise, anger, joy, sadness, fear, love] [0.360086590051651, 0.3019047975540161, 0.11901281774044037, 0.11381449550390244, 0.060391612350940704, 0.04478970915079117]
1 i feel beautifully emotional knowing that these women of whom i knew just a handful were holding me and my baba on our journey [joy, love, surprise, fear, sadness, anger] [0.36994314193725586, 0.2887154817581177, 0.25607964396476746, 0.04292308911681175, 0.03344891220331192, 0.008889704942703247]
2 i pay attention it deepens into a feeling of being invaded and helpless [fear, surprise, sadness, anger, joy, love] [0.34146907925605774, 0.3088078498840332, 0.25616830587387085, 0.07989830523729324, 0.007844790816307068, 0.005811698734760284]
3 i just feel extremely comfortable with the group of people that i dont even need to hide myself [joy, surprise, love, sadness, anger, fear] [0.3305220901966095, 0.29472312331199646, 0.15343144536018372, 0.07691436260938644, 0.07596738636493683, 0.06844153255224228]
4 i find myself in the odd position of feeling supportive of [surprise, joy, fear, love, sadness, anger] [0.8287988901138306, 0.04317942634224892, 0.039773669093847275, 0.031413134187459946, 0.031412459909915924, 0.025422409176826477]
5 i was feeling as heartbroken as im sure katniss was [sadness, surprise, fear, love, anger, joy] [0.7667969465255737, 0.1818471997976303, 0.025871198624372482, 0.011756820604205132, 0.008171578869223595, 0.005556156858801842]
6 i feel a little mellow today [surprise, joy, love, fear, sadness, anger] [0.49373722076416016, 0.26321911811828613, 0.113678477704525, 0.06402145326137543, 0.050955016165971756, 0.01438874565064907]
7 i feel like my only role now would be to tear your sails with my pessimism and discontent [sadness, anger, surprise, fear, joy, love] [0.6992796063423157, 0.20048746466636658, 0.06185894086956978, 0.03287427872419357, 0.00364686525426805, 0.0018528560176491737]
8 i feel just bcoz a fight we get mad to each other n u wanna make a publicity n let the world knows about our fight [anger, surprise, sadness, fear, joy, love] [0.6029903292655945, 0.19827111065387726, 0.10198860615491867, 0.08116946369409561, 0.01011708565056324, 0.005463303066790104]
9 i feel like reds and purples are just so rich and kind of perfect [joy, surprise, love, anger, fear, sadness] [0.3644145727157593, 0.3051210045814514, 0.19462504982948303, 0.055566269904375076, 0.05413524806499481, 0.026137862354516983]

Image data

Example: Predicting labels of a given image

  • Suppose you have a collection of animal images with no labels associated with them, and you want to predict labels for these images.
  • We can use machine learning to predict labels of these images using a technique called transfer learning (a minimal sketch follows below).
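Below is a minimal sketch of how such predictions might be produced with a pretrained ImageNet model in torchvision; this is not the exact course code, and the image path is hypothetical. The notebook output below shows the top predicted classes and probability scores for four sample images.

import torch
from PIL import Image
from torchvision import models
from torchvision.models import DenseNet121_Weights

weights = DenseNet121_Weights.IMAGENET1K_V1
model = models.densenet121(weights=weights).eval()
preprocess = weights.transforms()  # resizing/normalization bundled with the weights

img = preprocess(Image.open("animal.jpg")).unsqueeze(0)  # hypothetical image file
with torch.no_grad():
    probs = model(img).softmax(dim=1)
top = probs.topk(4)  # four most likely ImageNet classes
for score, idx in zip(top.values[0], top.indices[0]):
    print(f"{weights.meta['categories'][int(idx)]:>30}  {score:.3f}")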

                         Class  Probability score
                     tiger cat              0.636
              tabby, tabby cat              0.174
Pembroke, Pembroke Welsh corgi              0.081
               lynx, catamount              0.011
--------------------------------------------------------------

                                     Class  Probability score
         cheetah, chetah, Acinonyx jubatus              0.994
                  leopard, Panthera pardus              0.005
jaguar, panther, Panthera onca, Felis onca              0.001
       snow leopard, ounce, Panthera uncia              0.000
--------------------------------------------------------------

                                   Class  Probability score
                                 macaque              0.885
patas, hussar monkey, Erythrocebus patas              0.062
      proboscis monkey, Nasalis larvatus              0.015
                       titi, titi monkey              0.010
--------------------------------------------------------------

                        Class  Probability score
Walker hound, Walker foxhound              0.582
             English foxhound              0.144
                       beagle              0.068
                  EntleBucher              0.059
--------------------------------------------------------------


Clustering images

Finding groups in food images


K-Means on food dataset


import torch
from torchvision import models
from sklearn.cluster import KMeans

# Extract image features with a pretrained network, then cluster the features
densenet = models.densenet121(weights="DenseNet121_Weights.IMAGENET1K_V1")
densenet.classifier = torch.nn.Identity()  # remove the last "classification" layer
Z_food = get_features_unsup(densenet, food_inputs)  # course helper; assumed defined earlier
k = 5
km = KMeans(n_clusters=k, n_init='auto', random_state=123)
km.fit(Z_food)
KMeans(n_clusters=5, random_state=123)

Examining food clusters


for cluster in range(k):
    get_cluster_images(km, Z_food, X_food, cluster, n_img=6)  # course helper that displays sample images from each cluster
Image indices per cluster (the displayed cluster images are omitted here):
Cluster 0:  [ 39 197  12  14 138 181]
Cluster 1:  [228  65 128  54 175 260]
Cluster 2:  [138  54 185 278  39  89]
Cluster 3:  [193  39 145 212 169 108]
Cluster 4:  [120 268 244  94  72  87]

Example: Finding most similar items

  • Consider the following titles and queries.
# Corpus of existing paper titles or abstracts
corpus = [
    "Mapping eelgrass beds in British Columbia using remote sensing",
    "Bayesian optimization for reaction discovery and yield improvement",
    "RNA-seq analysis of microbiome interactions and infection susceptibility",
    "Using YOLOv8 for automatic object detection in microscopy images",
    "Anomaly detection in ocean temperature sensor data with machine learning",
    "Embedding-based literature recommendation using scientific abstracts"
]

# List of new queries to compare
queries = [
    "Predicting yield of chemical reactions using optimization techniques",
    "Tracking changes in marine vegetation through drone imagery",
    "Detecting patterns of infection from gene expression data",
    "Literature discovery using sentence embeddings and neural search",
    "Outlier detection in sensor measurements from ocean buoys"
]

Which queries are similar to which titles?

from sentence_transformers import SentenceTransformer, util

# Load the MiniLM model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode corpus and queries
corpus_embeddings = model.encode(corpus, convert_to_tensor=True).cpu()
query_embeddings = model.encode(queries, convert_to_tensor=True).cpu()

# Set number of top matches to show
top_k = 1

# Retrieve the top-k most similar titles for each query (ranked by cosine similarity)
hits = util.semantic_search(query_embeddings, corpus_embeddings, top_k=top_k)
for i, query_hits in enumerate(hits):
    print(f"🔍 Query {i + 1}: {queries[i]}")
    for rank, hit in enumerate(query_hits):
        print(f"   ➤ Match {rank + 1}: {corpus[hit['corpus_id']]}")
        print(f"     Cosine similarity: {hit['score']:.4f}")

Which queries are similar to which titles?


🔍 Query 1: Predicting yield of chemical reactions using optimization techniques
   ➤ Match 1: Bayesian optimization for reaction discovery and yield improvement
     Cosine similarity: 0.7420

🔍 Query 2: Tracking changes in marine vegetation through drone imagery
   ➤ Match 1: Mapping eelgrass beds in British Columbia using remote sensing
     Cosine similarity: 0.5305

🔍 Query 3: Detecting patterns of infection from gene expression data
   ➤ Match 1: RNA-seq analysis of microbiome interactions and infection susceptibility
     Cosine similarity: 0.4292

🔍 Query 4: Literature discovery using sentence embeddings and neural search
   ➤ Match 1: Embedding-based literature recommendation using scientific abstracts
     Cosine similarity: 0.5649

🔍 Query 5: Outlier detection in sensor measurements from ocean buoys
   ➤ Match 1: Anomaly detection in ocean temperature sensor data with machine learning
     Cosine similarity: 0.6847

Interactive: Is ML appropriate?

❓❓ Questions for you

iClicker cloud join link:

Select all that apply: Which problems are suitable for ML?

    1. Checking if a UBC email address ends with @student.ubc.ca before allowing login
    1. Deciding which students should be awarded a scholarship based on their personal essays
    1. Predicting which songs you’ll like based on your Spotify listening history
    1. Detecting plagiarism by checking if two essays are exactly identical
    1. Automatically tagging photos of your friends on Instagram

Summary: When is ML suitable?

Approach           Best Used When…
Machine Learning   The dataset is large and complex, and the decision rules are unknown, fuzzy, or too complex to define explicitly
Rule-based System  The logic is clear and deterministic, and the rules or thresholds are known and stable
Human Expert       The problem involves ethics, creativity, emotion, or ambiguity that can’t be formalized easily

Activity 2

Think of a problem you have come across in the past which could be solved using machine learning.

  • What would be the input and output?
  • How do humans solve this now? Are there heuristics or rules?
  • What kind of data do you have or could you collect?

Types of machine learning

Here are some typical learning problems.

  • Supervised learning (Gmail spam filtering)
    • Training a model from input data and its corresponding targets to predict targets for new examples.
  • Unsupervised learning (Google News)
    • Training a model to find patterns in a dataset, typically an unlabeled dataset.
  • Reinforcement learning (AlphaGo)
    • A family of algorithms for finding suitable actions to take in a given situation in order to maximize a reward.
  • Recommendation systems (Amazon item recommendation system)
    • Predict the “rating” or “preference” a user would give to an item.

What is supervised learning?

  • Training data comprises a set of observations (X) and their corresponding targets (y).
  • We wish to find a model function f that relates X to y.
  • We use the model function to predict targets of new examples.
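A minimal sketch of this fit/predict pattern with toy data (not course code):

from sklearn.tree import DecisionTreeClassifier

X = [[20, 0], [25, 1], [60, 1], [65, 0]]    # observations (two numeric features per example)
y = ["no", "no", "yes", "yes"]              # corresponding targets
model = DecisionTreeClassifier().fit(X, y)  # learn a function f relating X to y
model.predict([[70, 1]])                    # predict the target of a new example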

🤔 Eva’s questions


At this point, Eva is wondering about many questions.

  • How exactly are we “learning” whether a message is spam or ham?
  • Are we expected to get correct predictions for all possible messages? How does it predict the label for a message it has not seen before?
  • What if the model mislabels an unseen example? For instance, what if the model incorrectly predicts a non-spam message as spam? What would be the consequences?
  • How do we measure the success or failure of spam identification?
  • If you want to use this model in the wild, how do you know how reliable it is?
  • Would it be useful to know how confident the model is about the predictions rather than just a yes or a no?

It’s great to think about these questions right now. But Eva has to be patient. By the end of this course, you’ll know the answers to many of these questions!

Break

Surveys

  • Please complete the “Getting to know you” survey on Canvas.
  • Also, please complete the anonymous restaurant survey on Qualtrics here.
    • We will try to analyze this data set in the coming weeks.

About this course

Important

Course website: https://github.com/UBC-CS/cpsc330-2025W1 is the most important link. Please read everything on this GitHub page!

Important

Make sure you go through the syllabus thoroughly and complete the syllabus quiz before Sept 19th at 11:59pm.

CPSC 330 vs. 340

Read https://github.com/UBC-CS/cpsc330-2025W1/blob/main/docs/330_vs_340.md which explains the difference between two courses.

TLDR:

  • 340: how do ML models work?
  • 330: how do I use ML models?
  • CPSC 340 has many prerequisites.
  • CPSC 340 goes deeper but has a more narrow scope.
  • I think CPSC 330 will be more useful if you just plan to apply basic ML.

Registration, waitlist and prerequisites

Important

Please go through this document carefully before contacting your instructors about these issues. Even then, we are very unlikely to be able to help with registration, waitlist or prerequisite issues.

  • There are still seats available in Section 103.
  • If you are on the waitlist and would like to try your chances, you should already have access to Canvas and Piazza.
  • Please note that it is your responsibility to complete and submit all assessments while you are on the waitlist. No concessions will be made for students who are waitlisted.
  • If you are unable to secure a seat this term, the course will be offered again with two sections next semester, and once more in the summer.

Lecture format

  • In-person lectures on Tuesdays/Thursdays.
  • Sometimes there will be videos to watch before lecture. You will find the list of pre-watch videos in the schedule on the course webpage.
  • We will also try to work on some questions and exercises together during the class.
  • All materials will be posted in this GitHub repository.

Tutorials

  • Weekly tutorials will be run by the TAs.
  • There is a small participation grade associated with attending tutorials.
  • Make use of this helpful resource.

Homework assignments

  • The first homework assignment is due this coming Tuesday, September 9, at midnight. This is a relatively straightforward assignment on Python.
  • If you struggle with this assignment, that could be a sign that you will struggle later on in the course.
  • You must do the first two homework assignments on your own.

Exams

  • We’ll have two self-scheduled midterms and one final in Computer-based Testing Facility (CBTF).

Course calendar

Here is our course Calendar. Make sure you check it on a regular basis:

https://htmlpreview.github.io/?https://github.com/UBC-CS/cpsc330-2025W1/blob/main/docs/calendar.html

Course structure

  • Introduction
    • Week 1
  • Part I: ML fundamentals, preprocessing, midterm 1
    • Weeks 2, 3, 4, 5, 6, 7, 8
  • Part II: Unsupervised learning, transfer learning, common special cases, midterm 2
    • Weeks 8, 9, 10, 11, 12
  • Part III: Communication and ethics
    • ML skills are not beneficial if you can’t use them responsibly and communicate your results. In this module we’ll talk about these aspects.
    • Weeks 13, 14

Code of conduct

  • Our main forum for getting help will be Piazza.

Important

Please read this entire document about asking for help. TLDR: Be nice.

Homework format: Jupyter notebooks

  • Our notes are created in a Jupyter notebook, with file extension .ipynb.
  • Also, you will complete your homework assignments using Jupyter notebooks.
  • Confusingly, “Jupyter notebook” is also the name of the original application that opens .ipynb files, which has since been largely replaced by JupyterLab.
    • I am using JupyterLab; some things might not work with the Jupyter notebook application.
    • You can also open these files in Visual Studio Code.

Jupyter notebooks

  • Notebooks contain a mix of code, code output, markdown-formatted text (including LaTeX equations), and more.
  • When you open a Jupyter notebook in one of these apps, the document is “live”, meaning you can run the code.

For example:

1 + 1
2
x = [1, 2, 3]
x[0] = 9999
x
[9999, 2, 3]

Jupyter

  • By default, Jupyter prints out the result of the last line of code, so you don’t need as many print statements.
  • In addition to the “live” notebooks, Jupyter notebooks can be statically rendered in the web browser, e.g. this.
    • This can be convenient for quick read-only access, without needing to launch the Jupyter notebook/lab application.
    • But you need to launch the app properly to interact with the notebooks.

Lecture notes

  • All the lectures from last year are available here.
  • We cannot promise anything will stay the same from last year to this year, so read them in advance at your own risk.
  • A “finalized” version will be pushed to GitHub and the Jupyter book right before each class.
  • Each instructor will have slightly adapted versions of notes to present slides during lectures.
  • You will find the link to these slides in our repository: https://github.com/UBC-CS/cpsc330-2025W1/tree/main/lectures/102-Varada-lectures

Grades

  • The grading breakdown is here.
  • The policy on challenging grades is here.

Setting up your computer for the course

Course conda environment

  • Follow the setup instructions here to create a course conda environment on your computer.
  • If you do not have your computer with you, you can partner up with someone and set up your own computer later.

Python requirements/resources

We will primarily use Python in this course.

Here is the basic Python knowledge you’ll need for the course:

  • Basic Python programming
  • Numpy
  • Pandas
  • Basic matplotlib
  • Sparse matrices
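For instance, sparse matrices show up as soon as you vectorize text, as CountVectorizer did earlier. A small illustration (the exact printed class name may vary by SciPy version):

from sklearn.feature_extraction.text import CountVectorizer

X = CountVectorizer().fit_transform(["good movie", "bad movie", "good good"])
print(type(X))      # a SciPy sparse matrix (CSR format)
print(X.shape)      # (3, 3): 3 documents, 3 vocabulary words
print(X.toarray())  # dense view; only practical for tiny matrices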

Homework 1 is all about Python.

Note

We do not have time to teach all the Python we need, but you can find some useful Python resources here.



Checklist for you before the next class