Train Test Split
This transform splits a dataset into a training set and a testing set so that you can build and evaluate a predictive model. You specify the share of the data to reserve for testing; the remainder is used for training.
Parameters
The following list briefly describes each parameter of the Train Test Split transform.
- Name:
By default, the transform name is populated. You can also add a custom name for the transform.
- Input Dataset:
The file name of the input dataset to which the Train Test Split transform is applied. You can select any uploaded dataset from the drop-down list. (Required: True, Multiple: False)
- Target column:
The target column used for predictions.
- Test size:
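The share of the data to be used for testing the model. The data is split into two parts based on this value: the test set receives this share and the training set receives the remainder. In the notebook snippet the value is given as a fraction, for example 0.2 for 20%.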
- Output Train Dataset:
The file name of the output dataset that holds the training portion of the split. (Required: True, Multiple: False)
- Output Test Dataset:
The file name of the output dataset that holds the testing portion of the split. (Required: True, Multiple: False)
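To make the effect of the Test size parameter concrete, here is a minimal sketch of a random split written directly with pandas. It is an illustration only and not the transform's actual implementation: the helper name split_train_test, the seed argument, and the sample column names are all hypothetical.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, test_size: float, seed: int = 0):
    """Randomly split df into (train, test); test_size is a fraction in (0, 1).

    Hypothetical helper for illustration; assumes df has a unique index.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction between 0 and 1")
    # Draw the test rows at random, then keep every remaining row for training.
    test = df.sample(frac=test_size, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


# Hypothetical 10-row dataset with a target column named "label".
data = pd.DataFrame({"feature": range(10), "label": [0, 1] * 5})
train_df, test_df = split_train_test(data, test_size=0.2)
print(len(train_df), len(test_df))
```

With a test size of 0.2 on ten rows, two rows go to the test set and the remaining eight to the training set.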
The sample input for this transform looks as shown in the screenshot:
The output after running the Train Test Split transform on the dataset appears as below. This is the training dataset.
This is the testing dataset.
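Once both output datasets are loaded as pandas DataFrames, a quick sanity check can confirm that they partition the input. The sketch below is illustrative: check_split is a hypothetical helper, and it assumes that the outputs keep the row index of the input, which the transform may or may not do.

```python
import pandas as pd


def check_split(full: pd.DataFrame, train: pd.DataFrame, test: pd.DataFrame) -> float:
    """Verify that train and test partition full; return the observed test fraction.

    Hypothetical helper; assumes both outputs preserve the input's row index.
    """
    # Every input row should land in exactly one of the two outputs.
    assert len(train) + len(test) == len(full), "row counts do not add up"
    assert train.index.intersection(test.index).empty, "train and test overlap"
    return len(test) / len(full)


# Hypothetical data standing in for the input and the two output datasets.
full_df = pd.DataFrame({"feature": range(5), "label": [0, 1, 0, 1, 0]})
test_part = full_df.iloc[[1, 3]]
train_part = full_df.drop(index=test_part.index)
observed = check_split(full_df, train_part, test_part)
```

The returned fraction can then be compared with the configured test size.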
How to use it in Notebook
Use the following code snippet in the Jupyter Notebook editor to run the Train Test Split transform:
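# Assumes Transform, project, train_test_split, dataset_input_name,
# dataset_w_bin_cols, and targetCol are already defined earlier in the notebook.

# Names of the two output datasets
train_ds_name = dataset_input_name + "_train"
test_ds_name = dataset_input_name + "_test"

# Configure the transform from the Train Test Split template
transform = Transform()
transform.name = "train test split"
transform.templateId = train_test_split.id
transform.variables = {
    "inputDataset": dataset_w_bin_cols.name,
    "targetCol": targetCol,
    "test_size": 0.2,  # fraction of rows reserved for testing (20%)
    "output_train": train_ds_name,
    "output_test": test_ds_name
}

# Create a recipe on the input dataset, attach the transform, and run it
recipe_split = project.addRecipe([dataset_w_bin_cols], name="train test split")
# recipe_split.prepareForLocal(transform, contextId="recipe_split")
recipe_split.addTransform(transform)
recipe_split.run()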
Requirements
pandas