Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
---|---|---|---|
Main hyperparameters | Max depth, min samples split | Number of neighbors (\(k\)) | C (regularization), gamma (RBF kernel width) |
Interpretability | | | |
Handling of non-linearity | | | |
Scalability | | | |
Aspect | Decision Trees | K-Nearest Neighbors (KNN) | Support Vector Machines (SVM) with RBF Kernel |
---|---|---|---|
Sensitivity to outliers | | | |
Memory usage | | | |
Training time | | | |
Prediction time | | | |
Multiclass support | | | |
iClicker cloud join link: https://join.iclicker.com/VYFJ
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
iClicker cloud join link: https://join.iclicker.com/VYFJ
Select all of the following statements which are TRUE.

- `StandardScaler` ensures a fixed range (i.e., minimum and maximum values) for the features.
- `StandardScaler` calculates mean and standard deviation for each feature separately.
- With `SimpleImputer`, the transformed data has a different shape than the original data.

iClicker cloud join link: https://join.iclicker.com/VYFJ
Select all of the following statements which are TRUE.

- Once you have a `scikit-learn` pipeline object with an estimator as the last step, you can call `fit`, `predict`, and `score` on it.
- You can carry out data splitting within a `scikit-learn` pipeline.

You’re trying to find a suitable date based on two features: age and number of Facebook friends. Suppose you are 30 years old with 250 Facebook friends, and these are your candidates:
Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
---|---|---|---|---|
A | 25 | 400 | √(5² + 150²) | 150.08 |
B | 27 | 300 | √(3² + 50²) | 50.09 |
C | 30 | 500 | √(0² + 250²) | 250.00 |
D | 60 | 250 | √(30² + 0²) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (30.00) and B (50.09).

What’s the problem here? The two features are on very different scales: a difference of 150 Facebook friends contributes far more to the distance than a difference of 30 years of age, so the friend count dominates. That is how D, who is 30 years older than you, ends up as your closest match.
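To see this concretely, here is a minimal sketch (assuming, as the table implies, that you are the query point at age 30 with 250 friends; fitting the scaler on the four candidates is an illustrative choice) that recomputes the distances before and after standardizing the features:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Candidates as [age, #FB friends]; you are (30, 250)
X = np.array([[25, 400],    # A
              [27, 300],    # B
              [30, 500],    # C
              [60, 250]],   # D
             dtype=float)
you = np.array([[30, 250]], dtype=float)

# Raw Euclidean distances: dominated by the #FB friends column
print(np.linalg.norm(X - you, axis=1))   # approx [150.08  50.09 250.  30.]

# Standardize both columns, then recompute the distances
scaler = StandardScaler().fit(X)
dists = np.linalg.norm(scaler.transform(X) - scaler.transform(you), axis=1)
print(dists.round(2))                    # approx [1.6  0.56  2.6  2.1]
```

With standardized features, B and A become the two nearest neighbors, and D, who is 30 years older, drops to third.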
### Imputation

Fill in missing data using a chosen strategy (e.g., mean, median, most frequent).

Imputation is like filling in the grade for an assessment you missed with your average, median, or most frequent grade.
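As a sketch of how this looks in code, `SimpleImputer` fills each missing entry with a statistic learned from its column (the toy matrix below is made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix [age, #FB friends] with missing entries
X = np.array([[25.0, 400.0],
              [27.0, np.nan],    # missing friend count
              [np.nan, 500.0]])  # missing age

# Replace each NaN with that column's median, learned during fit
imputer = SimpleImputer(strategy="median")
print(imputer.fit_transform(X))
# [[ 25. 400.]
#  [ 27. 450.]   <- median of [400, 500]
#  [ 26. 500.]]  <- median of [25, 27]
```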
### Scaling

Ensure all features have a comparable range.

Scaling is like adjusting everyone’s number of Facebook friends so that both the number of friends and age are on a comparable scale. This way, one feature doesn’t dominate the other when making comparisons.
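For example, `StandardScaler` rescales each feature to mean 0 and standard deviation 1; a minimal sketch, reusing the dating-example numbers:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)

# For each column: subtract its mean, divide by its standard deviation
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0).round(6))  # ~[0. 0.]
print(X_scaled.std(axis=0))            # [1. 1.]
```

If you need a fixed range instead, `MinMaxScaler` maps each feature to [0, 1] by default; `StandardScaler` does not guarantee any fixed minimum or maximum.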
### One-hot encoding

Convert categorical features into binary columns.

Turn “Apple, Banana, Orange” into binary columns:
Fruit | 🍎 | 🍌 | 🍊 |
---|---|---|---|
Apple 🍎 | 1 | 0 | 0 |
Banana 🍌 | 0 | 1 | 0 |
Orange 🍊 | 0 | 0 | 1 |
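`OneHotEncoder` reproduces exactly this table; a minimal sketch (note that `sparse_output=False` assumes scikit-learn ≥ 1.2; older versions spell it `sparse=False`):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"fruit": ["Apple", "Banana", "Orange"]})

# One binary column per category; sparse_output=False returns a dense array
ohe = OneHotEncoder(sparse_output=False)
print(ohe.fit_transform(df))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
print(ohe.get_feature_names_out())  # ['fruit_Apple' 'fruit_Banana' 'fruit_Orange']
```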
### Ordinal encoding

Convert categories into integer values that have a meaningful order.

Turn “Poor, Average, Good” into 1, 2, 3:
Rating | Ordinal |
---|---|
Poor | 1 |
Average | 2 |
Good | 3 |
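`OrdinalEncoder` implements this, with two caveats: it numbers categories from 0 rather than 1, and you should pass the intended order explicitly or it will sort categories alphabetically. A minimal sketch:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"rating": ["Poor", "Average", "Good", "Average"]})

# Without `categories`, the order would be alphabetical (Average < Good < Poor)
enc = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
print(enc.fit_transform(df))
# [[0.]
#  [1.]
#  [2.]
#  [1.]]
```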
### `sklearn` Transformers vs Estimators

Transformers have `fit` and `transform` methods.

- `fit(X)`: Learns parameters from the data.
- `transform(X)`: Applies the learned transformation to the data.
- Examples: `SimpleImputer` (fills missing values), `StandardScaler` (standardizes features).

Estimators have `fit` and `predict` methods.

- `fit(X, y)`: Learns from labeled data.
- `predict(X)`: Makes predictions on new data.
- Examples: `DecisionTreeClassifier`, `SVC`, `KNeighborsClassifier`.
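A minimal sketch contrasting the two interfaces on the dating-example features (the labels `y` are hypothetical, added only for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)
y = np.array([1, 1, 0, 0])  # hypothetical labels

# Transformer: fit(X) learns column means/stds; transform(X) applies them
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Estimator: fit(X, y) learns from labeled data; predict(X) labels new points
knn = KNeighborsClassifier(n_neighbors=2).fit(X_scaled, y)
print(knn.predict(scaler.transform([[30, 250]])))  # [1]
```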
### `sklearn` Pipelines

Chaining a `StandardScaler` with a `KNeighborsClassifier` model.
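A minimal sketch of that chain using `make_pipeline` (same toy data and hypothetical labels as above):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]], dtype=float)
y = np.array([1, 1, 0, 0])  # hypothetical labels

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=2))

pipe.fit(X, y)     # fits the scaler on X, transforms X, then fits the classifier
print(pipe.predict([[30, 250]]))  # the query is scaled with the training statistics
print(pipe.score(X, y))           # accuracy of the whole pipeline on (X, y)
```

Because the last step is an estimator, the pipeline object itself supports `fit`, `predict`, and `score`, which is exactly the point of the pipeline iClicker statement above.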