Feature selection

This transform selects the features in the train and test datasets that are strongly correlated with the target column. It drops null columns, constant columns, and highly correlated columns.
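As a rough illustration of the column-dropping step described above, the following pandas sketch removes all-null columns, constant columns, and one column from each highly correlated pair of numeric features, then keeps the same columns in the test data. It is not the transform's actual implementation: the helper name `select_features`, the 0.95 threshold, and the reading of "highly correlated" as correlation between features are assumptions made for the example. It also does not score features against the target column.

```python
import numpy as np
import pandas as pd


def select_features(train, test, target_col, corr_threshold=0.95):
    """Hypothetical helper: returns (train_out, test_out, removed_columns)."""
    features = train.drop(columns=[target_col])

    # Columns that contain only missing values.
    null_cols = [c for c in features.columns if features[c].isna().all()]

    # Columns with a single distinct non-null value.
    constant_cols = [
        c for c in features.columns
        if c not in null_cols and features[c].nunique(dropna=True) <= 1
    ]

    remaining = features.drop(columns=null_cols + constant_cols)

    # Among numeric columns, flag the later column of each pair whose
    # absolute correlation exceeds the (assumed) threshold.
    corr = remaining.select_dtypes(include="number").corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    correlated_cols = [c for c in upper.columns if (upper[c] > corr_threshold).any()]

    removed = null_cols + constant_cols + correlated_cols
    kept = [c for c in features.columns if c not in removed]

    # Apply the same column selection to both datasets.
    train_out = train[kept + [target_col]]
    test_out = test[[c for c in kept if c in test.columns]]
    return train_out, test_out, removed
```

Computing the selection on the train data only and reusing it for the test data keeps both outputs aligned on the same feature columns.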

Parameters

Each parameter of the Feature selection transform is briefly described below.

Name:

By default, the transform name is populated. You can also add a custom name for the transform.

Train Dataset:

The file name of the train dataset. Select the uploaded dataset from the drop-down list; relevant features are selected from it. (Required: True, Multiple: False)

Test Dataset:

The file name of the test dataset. Select the uploaded dataset from the drop-down list; relevant features are selected from it. (Required: True, Multiple: False)

Target Column:

The name of the target column for which the strongly correlated features must be selected. (Required: True, Multiple: False, Options: [‘FIELDS’], Datasets: [‘df’])

Output Train Dataset:

The file name of the output train dataset, which contains only the features that are correlated with the target column. (Required: True, Multiple: False)

Output Test Dataset:

The file name of the output test dataset, which contains only the features that are correlated with the target column. (Required: True, Multiple: False)

Example input for the Feature selection transform:

../../../_images/featureselection_input.png

The train output after running the Feature selection transform on the dataset is shown below:

../../../_images/featureselectiontrain_output.png

The test output after running the Feature selection transform on the dataset is shown below:

../../../_images/featureselectiontest_output.png

The Feature selection summary lists the columns that were removed from the dataset.

../../../_images/featureselectiondashboard.png
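When the datasets are available as pandas DataFrames, a similar list can be approximated by comparing column names before and after the transform. This is only a sketch: the helper name `removed_columns` and the sample frames are hypothetical and are not part of the transform.

```python
import pandas as pd


def removed_columns(before, after):
    """Return columns present in `before` but missing from `after`, in original order."""
    after_cols = set(after.columns)
    return [c for c in before.columns if c not in after_cols]


# Hypothetical data: "flag" is constant, so assume the transform dropped it.
before = pd.DataFrame({"age": [31, 45], "flag": [1, 1], "income": [40, 52]})
after = before[["age", "income"]]
print(removed_columns(before, after))  # prints ['flag']
```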

How to use it in Notebook

Use the following code snippet in the Jupyter Notebook editor to run the Feature selection transform:

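# Assumes Transform, project, feature_selection, and the dataset variables
# (train_w_mte, test_w_mte, targetCol, and so on) are already defined in the notebook.
transform = Transform()
transform.name = "feature selection"
transform.templateId = feature_selection.id
# Map the transform parameters to the notebook's datasets and names.
transform.variables = {
    "inputTrainDataset": train_w_mte.name,
    "inputTestDataset": test_w_mte.name,
    "targetCol": targetCol,
    "origionalDatasetName": dataset_input_name,
    "outputTrainDataset": fs_train_ds_name,
    "outputTestDataset": fs_test_ds_name
}

# Create a recipe on the train and test datasets, attach the transform, and run it.
recipe_fs = project.addRecipe([train_w_mte, test_w_mte], name="feature_selection")
#recipe_fs.prepareForLocal(transform, contextId="recipe_fs")
recipe_fs.addTransform(transform)
recipe_fs.run()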

Requirements

scikit-learn
pandas