Select all of the following statements which are TRUE.
- Higher values of gamma tend to result in a higher training score but probably a lower validation score.
- If we increase both gamma and C, we can't be certain whether the model becomes more complex or less complex.

| Model | Parameters and hyperparameters | Strengths | Weaknesses |
|---|---|---|---|
| Decision Trees | | | |
| KNNs | | | |
| SVM RBF | | | |
You’re trying to find a suitable date based on age and number of Facebook friends (query point: age 30, 250 FB friends):
| Person | Age | #FB Friends | Euclidean Distance Calculation | Distance |
|---|---|---|---|---|
| A | 25 | 400 | \(\sqrt{5^2 + 150^2}\) | 150.08 |
| B | 27 | 300 | \(\sqrt{3^2 + 50^2}\) | 50.09 |
| C | 30 | 500 | \(\sqrt{0^2 + 250^2}\) | 250.00 |
| D | 60 | 250 | \(\sqrt{30^2 + 0^2}\) | 30.00 |
Based on the distances, the two nearest neighbors (2-NN) are D (30.00) and B (50.09).
What’s the problem here?
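The distances can be reproduced in a few lines. The query point (age 30, 250 friends) is inferred from the table, and the raw distances make the problem visible: #FB Friends spans a much larger range than Age, so it dominates the metric.

```python
import numpy as np

# Query point inferred from the table: age 30, 250 FB friends
query = np.array([30, 250])
people = {
    "A": np.array([25, 400]),
    "B": np.array([27, 300]),
    "C": np.array([30, 500]),
    "D": np.array([60, 250]),
}

# Raw Euclidean distances: the friend counts swamp the age differences
distances = {name: np.linalg.norm(p - query) for name, p in people.items()}
nearest_two = sorted(distances, key=distances.get)[:2]  # ['D', 'B']
```

Person D is 30 years older than the query but still comes out as the nearest neighbor, because a 30-year age gap is tiny compared to typical friend-count gaps.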
Take a guess: In your machine learning project, how much time will you typically spend on data preparation and transformation?
The question is adapted from here.
Select all of the following statements which are TRUE.
- StandardScaler ensures a fixed range (i.e., minimum and maximum values) for the features.
- StandardScaler calculates the mean and standard deviation for each feature separately.
- After applying SimpleImputer, the transformed data has a different shape than the original data.

Select all of the following statements which are TRUE.
- For a scikit-learn pipeline object with an estimator as the last step, you can call fit, predict, and score on it.
- … scikit-learn pipeline.

Fill in missing data using a chosen strategy:
Imputation is like filling in your average or median or most frequent grade for an assessment you missed.
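A minimal sketch of imputation with SimpleImputer, using the grades analogy above (the numbers are illustrative):

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Grades for four assessments; one is missing (np.nan)
X = np.array([[80.0], [90.0], [np.nan], [70.0]])

# strategy can be "mean", "median", or "most_frequent"
imputer = SimpleImputer(strategy="mean")
X_imputed = imputer.fit_transform(X)  # the nan becomes the column mean, 80.0
```

Note that the transformed data keeps the same shape as the original; only the missing entries change.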
Ensure all features have a comparable range.
Scaling is like adjusting the number of everyone’s Facebook friends so that both the number of friends and their age are on a comparable scale. This way, one feature doesn’t dominate the other when making comparisons.
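Using the Age and #FB Friends columns from the dating table, StandardScaler puts both features on a comparable scale; this is a quick sketch:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Age and #FB Friends for persons A-D from the table above
X = np.array([[25, 400], [27, 300], [30, 500], [60, 250]])

scaler = StandardScaler()        # learns mean and std for each feature separately
X_scaled = scaler.fit_transform(X)

# Each column now has mean 0 and standard deviation 1,
# so neither feature dominates a distance calculation.
```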
Convert categorical features into binary columns.
Turn “Apple, Banana, Orange” into binary columns:
| Fruit | 🍎 | 🍌 | 🍊 |
|---|---|---|---|
| Apple 🍎 | 1 | 0 | 0 |
| Banana 🍌 | 0 | 1 | 0 |
| Orange 🍊 | 0 | 0 | 1 |
Convert categories into integer values that have a meaningful order.
Turn “Poor, Average, Good” into 1, 2, 3:
| Rating | Ordinal |
|---|---|
| Poor | 1 |
| Average | 2 |
| Good | 3 |
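The same mapping with OrdinalEncoder, as a sketch; note that scikit-learn numbers categories from 0, so "Poor, Average, Good" becomes 0, 1, 2 rather than the 1, 2, 3 shown in the table:

```python
from sklearn.preprocessing import OrdinalEncoder

X = [["Poor"], ["Average"], ["Good"]]

# Pass the order explicitly; otherwise categories are sorted alphabetically
encoder = OrdinalEncoder(categories=[["Poor", "Average", "Good"]])
X_encoded = encoder.fit_transform(X)  # Poor -> 0, Average -> 1, Good -> 2
```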
sklearn Transformers vs Estimators

Transformers have fit and transform methods.

- fit(X): Learns parameters from the data.
- transform(X): Applies the learned transformation to the data.
- Examples: imputation (SimpleImputer) fills missing values; scaling (StandardScaler) standardizes features.

Estimators have fit and predict methods.

- fit(X, y): Learns from labeled data.
- predict(X): Makes predictions on new data.
- Examples: DecisionTreeClassifier, SVC, KNeighborsClassifier

sklearn Pipelines

Chaining a StandardScaler with a KNeighborsClassifier model:
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Scaling and classification are chained into a single object
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipeline.fit(X_train, y_train)     # fits the scaler, then the classifier
y_pred = pipeline.predict(X_test)  # transforms X_test, then predicts
```
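A self-contained run of the same pipeline, with a synthetic dataset standing in for real data (dataset sizes and names are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# Synthetic classification data in place of a real dataset
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
pipeline.fit(X_train, y_train)             # scaler is fit on training data only
accuracy = pipeline.score(X_test, y_test)  # estimator last step, so score works
```

Because the estimator is the last step, fit, predict, and score all work on the pipeline object directly, which is exactly the quiz statement above.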