sklearn CountVectorizer

Use scikit-learn’s CountVectorizer to encode text data.

- CountVectorizer: Transforms text into a matrix of token counts
- max_features: Controls the number of features used in the model
- max_df, min_df: Control document frequency thresholds
- ngram_range: Defines the range of n-grams to be extracted
- stop_words: Enables the removal of common words that are typically uninformative in most applications, such as “and”, “the”, etc.

iClicker cloud join link: https://join.iclicker.com/VYFJ
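The CountVectorizer hyperparameters listed above can be tried out on a tiny corpus. The sketch below is only an illustration: the three sentences and the specific hyperparameter values are made up, not taken from the course data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus invented for illustration (an assumption, not course data).
train_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are friends",
]

# Default settings: one column per distinct token seen during fit.
vec_all = CountVectorizer()
X_all = vec_all.fit_transform(train_docs)

# max_features keeps only the most frequent tokens;
# stop_words="english" drops common words such as "and" and "the".
vec_small = CountVectorizer(max_features=3, stop_words="english")
X_small = vec_small.fit_transform(train_docs)

# min_df=2 keeps only tokens that appear in at least two documents.
vec_common = CountVectorizer(min_df=2)
X_common = vec_common.fit_transform(train_docs)

# ngram_range=(1, 2) extracts single words and adjacent word pairs.
vec_bigram = CountVectorizer(ngram_range=(1, 2))
X_bigram = vec_bigram.fit_transform(train_docs)

# The vocabulary is fixed at fit time, so a word that was never seen
# ("zebra") simply gets no column when new text is transformed.
X_new = vec_all.transform(["the zebra sat"])

print("all tokens:", X_all.shape)
print("max_features=3:", X_small.shape)
print("min_df=2:", sorted(vec_common.vocabulary_))
print("with bigrams:", X_bigram.shape)
print("new document:", X_new.shape, "total count:", X_new.sum())
```

Comparing the printed shapes shows how each hyperparameter changes the number of columns, while the last line shows that transforming unseen text keeps the column count learned during fit.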
Select all of the following statements which are TRUE.
- handle_unknown="ignore" would treat all unknown categories equally.
- As you increase the max_features hyperparameter of CountVectorizer, the training score is likely to go up.
- When encoding text with CountVectorizer, if you encounter a word in the validation or the test split that’s not available in the training data, we’ll get an error.
- Inside cross_validate, each fold might have a slightly different number of features (columns).

Linear models assume that the relationship between X and y is linear.

Ridge vs. LinearRegression

- Ridge adds a parameter to control the complexity of a model. It finds a line that balances fit and prevents overly large coefficients.
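To make the Ridge vs. LinearRegression comparison concrete, here is a minimal sketch on synthetic data. The data, the true coefficients, and the choice alpha=10.0 are assumptions made for illustration; in scikit-learn the complexity parameter of Ridge is called alpha.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data invented for illustration: y is a noisy linear
# function of three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Ordinary least squares: no control over coefficient size.
lr = LinearRegression().fit(X, y)

# Ridge: alpha penalizes large coefficients (alpha=10.0 is arbitrary).
ridge = Ridge(alpha=10.0).fit(X, y)

# Compare the overall size (L2 norm) of the learned coefficients.
lr_norm = np.linalg.norm(lr.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)

print("LinearRegression coefficients:", lr.coef_)
print("Ridge coefficients:           ", ridge.coef_)
print("coefficient norms:", lr_norm, ridge_norm)
```

The Ridge coefficients come out smaller in norm than the LinearRegression ones, which is the "prevents overly large coefficients" behaviour described above.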
Select all of the following statements which are TRUE.
- Increasing the hyperparameter alpha of Ridge is likely to decrease model complexity.
- Ridge can be used with datasets that have multiple features.
- With Ridge, we learn one coefficient per training example.
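Statements like these can be checked empirically. The following sketch, again on made-up synthetic data with arbitrarily chosen alpha values, fits Ridge on a dataset with several features and prints the shape and size of the learned coefficients for each alpha.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data invented for illustration: 100 examples, 4 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([3.0, 0.0, -2.0, 1.0]) + rng.normal(scale=0.5, size=100)

norms = []
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # coef_ holds the learned coefficients; its shape shows how many there are.
    norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha}: coef_ shape={model.coef_.shape}, norm={norms[-1]:.4f}")
```

Looking at `coef_.shape` next to the shape of `X`, and at how the norm changes as alpha grows, gives direct evidence for or against each statement.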
Select all of the following statements which are TRUE.
- Increasing the C hyperparameter increases model complexity.
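The statement above does not name the model. As a hedged illustration, the sketch below assumes C refers to the inverse regularization strength of scikit-learn's LogisticRegression, and uses a synthetic dataset and arbitrary C values to show how the size of the learned coefficients changes with C.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data, invented for illustration.
X, y = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

norms = []
for C in [0.001, 0.1, 10.0]:
    # Assumption: C is LogisticRegression's inverse regularization strength,
    # so a smaller C means a stronger penalty on the coefficients.
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    norms.append(float(np.linalg.norm(clf.coef_)))
    print(f"C={C}: coefficient norm={norms[-1]:.4f}, "
          f"train accuracy={clf.score(X, y):.3f}")
```

Note the direction: C works opposite to Ridge's alpha, since a larger C weakens the penalty and lets the coefficients grow.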