sklearn CountVectorizer

We can use scikit-learn's `CountVectorizer` to encode text data.

- `CountVectorizer`: Transforms text into a matrix of token counts.
- `max_features`: Controls the number of features used in the model.
- `max_df`, `min_df`: Control document frequency thresholds.
- `ngram_range`: Defines the range of n-grams to be extracted.
- `stop_words`: Enables the removal of common words that are typically uninformative in most applications, such as “and”, “the”, etc.
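To make these hyperparameters concrete, here is a minimal usage sketch; the toy corpus and parameter values are made up for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer

# A tiny toy corpus (hypothetical, not from the original activity)
X_train = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
]

# Limit the vocabulary size, extract unigrams and bigrams,
# and drop common English stop words.
vec = CountVectorizer(max_features=10, ngram_range=(1, 2), stop_words="english")
X_counts = vec.fit_transform(X_train)   # sparse matrix of token counts

print(vec.get_feature_names_out())      # learned vocabulary (the columns)
print(X_counts.toarray())               # dense view of the count matrix
```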
Select all of the following statements which are TRUE.

- `handle_unknown="ignore"` would treat all unknown categories equally.
- As you increase the `max_features` hyperparameter of `CountVectorizer`, the training score is likely to go up.
- Suppose we are encoding text data using `CountVectorizer`. If we encounter a word in the validation or the test split that's not available in the training data, we'll get an error.
- With `cross_validate`, each fold might have a slightly different number of features (columns) in the fold.
- The relationship between `X` and `y` is linear.

`Ridge` vs. `LinearRegression`
- `Ridge` adds a parameter to control the complexity of a model. It finds a line that balances fit and prevents overly large coefficients.
- `LinearRegression` is ordinary linear regression without this control; it behaves like `Ridge` with no regularization. In this course, we will mostly use `Ridge`.
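To make the role of the regularization parameter concrete, here is a minimal sketch of how `Ridge`'s `alpha` hyperparameter affects the learned coefficients; the synthetic data is made up for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))                                   # 5 features
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123)

# Larger alpha -> stronger regularization -> smaller coefficients, simpler model
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(alpha, model.coef_.round(2), round(model.score(X_test, y_test), 3))
```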
Select all of the following statements which are TRUE.

- Increasing the hyperparameter `alpha` of `Ridge` is likely to decrease model complexity.
- `Ridge` can be used with datasets that have multiple features.
- With `Ridge`, we learn one coefficient per training example.

Select all of the following statements which are TRUE.

- Increasing the `C` hyperparameter increases model complexity.

Given an input \(x_i\), the probability that it belongs to class \(j \in \{1, 2, \dots, K\}\) is calculated using the softmax function:
\(P(y = j \mid x_i) = \frac{e^{w_j^\top x_i + b_j}}{\sum_{k=1}^{K} e^{w_k^\top x_i + b_k}}\)
1. Compute Probabilities: For each class \(j\), compute the probability \(P(y = j \mid x_i)\) using the softmax function.
2. Select the Class with the Highest Probability: The predicted class \(\hat{y}\) is:

   \(\hat{y} = \arg \max_{j \in \{1, \dots, K\}} P(y = j \mid x_i)\)
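As a concrete illustration of these two steps, here is a minimal NumPy sketch; the weight matrix, bias vector, and input below are made-up values, not taken from any fitted model:

```python
import numpy as np

# Hypothetical parameters for K = 3 classes and d = 2 features
W = np.array([[ 0.5, -1.0],
              [ 1.2,  0.3],
              [-0.7,  0.8]])      # one weight vector per class (K x d)
b = np.array([0.1, -0.2, 0.0])    # one bias per class

x_i = np.array([2.0, 1.0])        # a single input example

# Raw scores z_j = w_j^T x_i + b_j, one per class
z = W @ x_i + b

# Softmax turns the scores into a probability distribution over classes
probs = np.exp(z) / np.sum(np.exp(z))

y_hat = np.argmax(probs)          # predicted class = highest probability
print(probs, y_hat)
```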
Aspect | Binary Logistic Regression | Multinomial Logistic Regression |
---|---|---|
Target variable | 2 classes (binary) | More than 2 classes (multi-class) |
Getting probabilities | Sigmoid | Softmax |
Parameters | \(d\) weights (one per feature) and a bias term | \(d\) weights and a bias term per class |
Output | Single probability | Probability distribution over classes |
Use case | Binary classification (e.g., spam detection) | Multi-class classification (e.g., flower species) |
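One way to see the difference in parameter counts and outputs from the table above is to inspect fitted scikit-learn models. The sketch below uses randomly generated data purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 examples, d = 4 features

# Binary target: d weights plus one bias term
y_bin = rng.integers(0, 2, size=100)
binary = LogisticRegression().fit(X, y_bin)
print(binary.coef_.shape, binary.intercept_.shape)    # (1, 4) (1,)

# Multi-class target with K = 3: one weight vector and one bias per class
y_multi = rng.integers(0, 3, size=100)
multi = LogisticRegression().fit(X, y_multi)
print(multi.coef_.shape, multi.intercept_.shape)      # (3, 4) (3,)

# predict_proba returns a probability distribution over the K classes
print(multi.predict_proba(X[:1]))                     # one row summing to 1
```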
So far, we have worked with various transformers and supervised machine learning models. The goal of this activity is to collaboratively complete tables that provide an overview of these models and transformations.
(This will serve as a handy reference for your upcoming exam and beyond!)
Fill in the following table with at least one entry per box.
Model | Strengths | Weaknesses | Key hyperparameters |
---|---|---|---|
decision tree | | | |
\(k\)-NN | | | |
RBF SVM | | | |
linear models | | | |
Transformation | Purpose | Use cases | Key consideration |
---|---|---|---|
Imputation | | | |
Scaling | | | |
One-hot encoding | | | |
Ordinal encoding | | | |
Bag-of-words encoding | | | |
A few big questions remain: