Feature selection

This transform selects the features in the train and test datasets that correlate strongly with the target column. It drops null, constant, and highly correlated columns.
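
The column-dropping steps described above can be sketched with pandas. This is a minimal illustration, not the transform's actual implementation. It assumes that "highly correlated" means numeric feature columns correlated with one another. The function name `drop_uninformative_columns`, the `corr_threshold` value of 0.9, and the sample data are all hypothetical. How the transform scores features against the target column is not documented here, so the sketch leaves that step out.

```python
import pandas as pd


def drop_uninformative_columns(train, test, target_col, corr_threshold=0.9):
    """Illustrative only: not the transform's actual implementation."""
    features = train.drop(columns=[target_col])

    # 1. Columns with no values at all.
    null_cols = [c for c in features.columns if features[c].isna().all()]
    features = features.drop(columns=null_cols)

    # 2. Columns holding a single distinct value.
    constant_cols = [c for c in features.columns if features[c].nunique() <= 1]
    features = features.drop(columns=constant_cols)

    # 3. For each pair of numeric columns whose absolute Pearson correlation
    #    exceeds the (assumed) threshold, drop the column that appears later.
    corr = features.select_dtypes(include="number").corr().abs()
    correlated_cols = []
    numeric_cols = list(corr.columns)
    for i, col in enumerate(numeric_cols):
        if col in correlated_cols:
            continue
        for other in numeric_cols[i + 1:]:
            if other not in correlated_cols and corr.loc[col, other] > corr_threshold:
                correlated_cols.append(other)

    removed = null_cols + constant_cols + correlated_cols
    kept = [c for c in features.columns if c not in correlated_cols]

    # Apply the column choice made on the train data to both datasets.
    train_out = train[kept + [target_col]]
    test_out = test[[c for c in kept + [target_col] if c in test.columns]]
    return train_out, test_out, removed


# Made-up data for demonstration.
train = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],      # perfectly correlated with "a"
    "c": [5, 5, 5, 5],      # constant
    "d": [None] * 4,        # entirely null
    "e": [4, 1, 3, 2],
    "y": [0, 1, 0, 1],      # target
})
test = train.drop(columns=["y"])

train_out, test_out, removed = drop_uninformative_columns(train, test, "y")
print(removed)
```

The column choice is made on the train data only and then applied to the test data, so both outputs end up with the same feature columns.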

Parameters

The table gives a brief description of each parameter in the Feature selection transform.

Name:: By default, the transform name is populated. You can also add a custom name for the transform.
Train Dataset:: The file name of the train dataset. From the drop-down list, select the uploaded dataset from which relevant features are to be selected. (Required: True, Multiple: False)
Test Dataset:: The file name of the test dataset. From the drop-down list, select the uploaded dataset from which relevant features are to be selected. (Required: True, Multiple: False)
Target Column:: The name of the target column against which strongly correlated features are selected. (Required: True, Multiple: False, Options: ['FIELDS'], Datasets: ['df'])
Output Train Dataset:: The file name for the output train dataset, which contains the features that are correlated with the target column. (Required: True, Multiple: False)
Output Test Dataset:: The file name for the output test dataset, which contains the features that are correlated with the target column. (Required: True, Multiple: False)

Example input for the Feature selection transform:

../../../_images/featureselection_input.png

The train output after running the Feature selection transform on the dataset is shown below:

../../../_images/featureselectiontrain_output.png

The test output after running the Feature selection transform on the dataset is shown below:

../../../_images/featureselectiontest_output.png

The feature selection summary lists the columns that were removed from the dataset.

../../../_images/featureselectiondashboard.png
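
If the input and output datasets are loaded as pandas DataFrames, the list of removed columns can be cross-checked locally by comparing their column names. The helper below is a small sketch under that assumption. The name `removed_columns` and the sample frames are hypothetical and not part of the transform.

```python
import pandas as pd


def removed_columns(before, after):
    """Return the columns of `before` that are missing from `after`, in order."""
    after_cols = set(after.columns)
    return [c for c in before.columns if c not in after_cols]


# Made-up frames standing in for the input and output train datasets.
before = pd.DataFrame({"a": [1, 2], "b": [2, 4], "c": [5, 5], "y": [0, 1]})
after = before[["a", "y"]]
print(removed_columns(before, after))
```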

How to use it in Notebook

Use the following code snippet in the Jupyter Notebook editor to run the Feature selection transform:

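# Assumes Transform, project, feature_selection, and the dataset variables
# used below were defined earlier in the notebook.
transform = Transform()
transform.name = "feature selection"
transform.templateId = feature_selection.id

# Map the transform parameters to the input datasets, the target column,
# and the names of the output datasets.
transform.variables = {
    "inputTrainDataset": train_w_mte.name,
    "inputTestDataset": test_w_mte.name,
    "targetCol": targetCol,
    "origionalDatasetName": dataset_input_name,
    "outputTrainDataset": fs_train_ds_name,
    "outputTestDataset": fs_test_ds_name
}

# Create a recipe over the train and test datasets, add the transform, and run it.
recipe_fs = project.addRecipe([train_w_mte, test_w_mte], name="feature_selection")
#recipe_fs.prepareForLocal(transform, contextId="recipe_fs")
recipe_fs.addTransform(transform)
recipe_fs.run()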

Requirements

scikit-learn
pandas