Feature selection
This transform selects relevant features from the test and train datasets that have strong correlation with the target column. This drops the null, constant and highly correlated columns.
Parameters
The table gives a brief description about each parameter in Encode Column transform.
- Name:
By default, the transform name is populated. You can also add a custom name for the transform.
- Train Dataset:
The file name of the train dataset. You can select the dataset that was uploaded from the drop-down list to select relevant features.(Required: True, Multiple: False)
- Test Dataset:
The file name of the test dataset. You can select the dataset that was uploaded from the drop-down list to select relevant features.(Required: True, Multiple: False)
- Target Column:
The name of the target column for which the strongly correlated features must be selected. (Required: True, Multiple: False, Options: [‘FIELDS’], Datasets: [‘df’])
- Output Train Dataset:
The file name with which the train dataset is created with features that are correlated with the target column. (Required: True, Multiple: False)
- Output Test Dataset:
The file name with which the test output dataset is created with features that are correlated with the target column. (Required: True, Multiple: False)
Example of feature selection transform inputted with data:
The train output after running the Feature selection transform on the dataset appears as below:
The test output after running the Feature selection transform on the dataset appears as below:
The feature selection summary gives the list of columns that are removed from the dataset.
How to use it in Notebook
The following is the code snippet you must use in the Jupyter Notebook editor to run the Feature selection transform:
transform = Transform()
transform.name = "feature selection"
transform.templateId = feature_selection.id
transform.variables = {
"inputTrainDataset": train_w_mte.name,
"inputTestDataset": test_w_mte.name,
"targetCol": targetCol,
"origionalDatasetName": dataset_input_name,
"outputTrainDataset": fs_train_ds_name,
"outputTestDataset": fs_test_ds_name
}
recipe_fs = project.addRecipe([train_w_mte, test_w_mte], name="feature_selection")
#recipe_fs.prepareForLocal(transform, contextId="recipe_fs")
recipe_fs.addTransform(transform)
recipe_fs.run()
Requirements
scikit-learn pandas