Why does reality bore me so? Does this stem from my distaste for effort?
Today's topic is feature selection: the process of selecting a subset of relevant features from the original data for use in model training. The more features you have, the more information you can obtain, but without careful filtering the model can become overly complex and prone to overfitting.
For example, in a sports event, the team leader wants to select suitable players based on their results in a set of trials. The trials include a 100-meter race and a 50-meter swim, both of which are related to explosive power and cardiorespiratory endurance. If a runner happens to perform well in the 100-meter race but poorly in the 50-meter swim, the team leader's judgment can easily be biased. Therefore, if the 50-meter swim better reflects the athletes' physical fitness, the 100-meter race can be ignored.
Below is a diagram showing the main components of feature selection. Keep it in mind as a reference, since some of the methods we explore later do not appear in it.
Before I list the primary feature selection methods, there is one topic I need to cover first: the variety of machine learning models. Here is a brief description of two of them:
Extreme Gradient Boosting (XGB): an algorithm based on gradient boosting. Composed of many weak learners, usually decision trees, XGB trains a sequence of trees iteratively, allowing each new tree to correct the errors in the predictions of the previous ones. At each iteration, XGB optimizes the model by minimizing the loss function, using gradient information to set the weights of the newly added tree.
K-Nearest Neighbors (KNN): a classification approach that involves determining an appropriate value for K, the number of neighboring samples considered during classification. For every new sample, KNN calculates the distances between that sample and all samples in the training set. It then selects the K nearest neighbors based on these distances and classifies the new sample by a majority vote among the labels of those K neighbors.
Now that you have learned about these two prediction models, you have the foundations for creating and training your own machine learning models.
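To make this concrete, here is a minimal sketch (not part of the original write-up) of how the two models might be set up and trained. It assumes the `xgboost` and `scikit-learn` packages are installed and uses a synthetic dataset in place of real data; the parameter values are illustrative, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Synthetic data: 500 samples, 10 features, only 5 of which are informative.
X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# XGB: an ensemble of boosted decision trees; each new tree corrects the
# errors of the trees added before it.
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
xgb.fit(X_train, y_train)

# KNN: classifies a new sample by majority vote among its K nearest
# training samples (here K = 5, an arbitrary choice for illustration).
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

print("XGB accuracy:", xgb.score(X_test, y_test))
print("KNN accuracy:", knn.score(X_test, y_test))
```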
With the above out of the way, we can dive into the many methods of feature selection for prediction model training:
Exhaustive selection: evaluates every possible feature combination, fitting a model for each, to find the best one. It is the most time-consuming method, but it is the one most certain to find the optimal combination.
Sequential forward selection (SFS): sequentially adds features to an empty candidate set until adding further features no longer improves the criterion (a code sketch for this and several of the methods below follows the list).
Sequential backward selection (SBS): sequentially removes features from the full candidate set until removing further features worsens the criterion.
Sequential floating forward selection (SFFS): in each iteration, SFFS considers not only adding the best new feature but also removing any existing feature that might be hindering performance.
Sequential floating backward selection (SFBS): after removing a feature in the backward selection process, SFBS also evaluates the possibility of adding back a previously removed feature if it could improve performance in combination with the remaining features.
Recursive Feature Elimination (RFE): utilizes a user-specified estimator, like a decision tree or linear model, to evaluate the importance of features and iteratively removes the least important ones until a desired number of features remain.
RFE with Cross-Validation (RFECV): automatically selects the number of features to keep using cross-validation, in contrast to RFE, which requires that number to be specified in advance.
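Here is a minimal sketch of several of these methods, assuming scikit-learn is installed and using synthetic data in place of your own. It demonstrates SFS, SBS, RFE, and RFECV; the floating variants (SFFS/SFBS) are not shown here, but are available in, for example, the mlxtend package. The estimator and the target of 4 features are illustrative choices, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, RFECV, SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
estimator = LogisticRegression(max_iter=1000)

# SFS: start from an empty set and greedily add the feature that improves
# the cross-validated score the most, until 4 features are selected.
sfs = SequentialFeatureSelector(estimator, n_features_to_select=4,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("SFS selected:", sfs.get_support(indices=True))

# SBS: start from the full set and greedily remove the least useful feature.
sbs = SequentialFeatureSelector(estimator, n_features_to_select=4,
                                direction="backward", cv=5)
sbs.fit(X, y)
print("SBS selected:", sbs.get_support(indices=True))

# RFE: rank features by the estimator's coefficients and recursively drop
# the least important until the requested number remains.
rfe = RFE(estimator, n_features_to_select=4)
rfe.fit(X, y)
print("RFE selected:", rfe.get_support(indices=True))

# RFECV: same idea, but cross-validation chooses how many features to keep.
rfecv = RFECV(estimator, cv=5)
rfecv.fit(X, y)
print("RFECV kept", rfecv.n_features_, "features:",
      rfecv.get_support(indices=True))
```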
Below is a table summarizing the logic of each of these methods: