Sample Dataset

This function creates a random sample of rows from a dataset.

tags: [“Data Preparation”]
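As a rough illustration, and assuming the function is built on pandas (listed under Requirements), the two sampling modes correspond to the `frac` and `n` arguments of `DataFrame.sample`. The ten-row table and the `random_state` seed below are made up for the example; a seed is not one of the template's parameters.

```python
import pandas as pd

# Made-up ten-row table standing in for the input dataset
df = pd.DataFrame({"id": range(10), "horsepower": [100.0 + 10 * i for i in range(10)]})

# Proportion of rows ("frac"): 0.8 of 10 rows gives 8 rows
frac_sample = df.sample(frac=0.8, random_state=0)

# Fixed number of rows ("n"): exactly 3 rows
n_sample = df.sample(n=3, random_state=0)

print(len(frac_sample), len(n_sample))  # 8 3
```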

Parameters

Input Dataset: The dataset to sample from (Required: True, Multiple: False)

Sampling Type: Pick “frac” to sample a proportion of the dataset, or “n” to sample a fixed number of rows (Required: True, Multiple: False, Datatypes: [‘STRING’], Options: [‘CONSTANT’], Default_value: ‘frac’, Constant_options: [‘frac’, ‘n’])

Sample Rows/Proportion: The number of rows to sample when Sampling Type is “n”, or the proportion of rows to sample when it is “frac” (Required: True, Multiple: False, Datatypes: [‘FLOAT’], Options: [‘CONSTANT’], Default_value: ‘0.8’)

Replace: Set to “True” to sample with replacement, so that the same record can appear more than once in the sample (Required: True, Multiple: False, Datatypes: [‘STRING’], Options: [‘CONSTANT’], Default_value: ‘False’, Constant_options: [‘False’, ‘True’])

Weight Column: Optional numeric column whose values are used as sampling weights; rows with larger weights are more likely to be selected (Required: False, Multiple: False, Datatypes: [‘FLOAT’], Options: [‘FIELDS’], Datasets: [‘df’])

Output Dataset: Output Dataset to be created (Required: True, Multiple: False)
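The parameters above map naturally onto pandas `DataFrame.sample`. The sketch below is a hypothetical reimplementation under that assumption, not the template's actual source; the function name `sample_dataset` and its argument names are invented for illustration. It shows the string constants for Sampling Type and Replace being translated into pandas arguments, the Weight Column being passed as weights, and (as a design choice of this sketch) the FLOAT-typed row count being checked for a whole number before it is used as `n`.

```python
from typing import Optional

import pandas as pd


def sample_dataset(
    df: pd.DataFrame,
    sampling_type: str = "frac",
    number: float = 0.8,
    replace: str = "False",
    weight_column: Optional[str] = None,
) -> pd.DataFrame:
    """Hypothetical sketch: apply the Sample Dataset parameters via DataFrame.sample."""
    if sampling_type not in ("frac", "n"):
        raise ValueError(f"sampling_type must be 'frac' or 'n', got {sampling_type!r}")
    if replace not in ("False", "True"):
        raise ValueError(f"replace must be 'False' or 'True', got {replace!r}")

    # Replace is passed as a string constant, so convert it to a real bool
    replace_flag = replace == "True"
    # No weight column means every row is equally likely to be drawn
    weights = df[weight_column] if weight_column else None

    if sampling_type == "frac":
        # Proportion of rows, e.g. 0.8 keeps round(0.8 * len(df)) rows
        return df.sample(frac=number, replace=replace_flag, weights=weights)

    # "n" is typed FLOAT in the template, so require a whole number of rows
    if not float(number).is_integer():
        raise ValueError(f"number must be a whole row count when sampling_type='n', got {number}")
    return df.sample(n=int(number), replace=replace_flag, weights=weights)


# Usage with a small, made-up car table
cars = pd.DataFrame(
    {"model": ["a", "b", "c", "d", "e"], "horsepower": [90.0, 110.0, 150.0, 200.0, 300.0]}
)
weighted = sample_dataset(cars, sampling_type="n", number=3, weight_column="horsepower")
print(len(weighted))  # 3
```

With Replace left at “False”, pandas raises a `ValueError` when more rows are requested than the dataset contains (or when the proportion exceeds 1); setting Replace to “True” allows this kind of upsampling.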

How to use it in Notebook

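# Assumes TemplateV2, Transform, project and the dataset handles
# (car_data, etc.) are already available in the RC notebook session
template = TemplateV2.get_template_by('Sample Dataset')

recipe_Sample_Dataset = project.addRecipe([car_data, employee_data, temperature_data, only_numeric], name='Sample Dataset')

# Configure the Sample Dataset transform
transform = Transform()
transform.templateId = template.id
transform.name = 'Sample Dataset'
transform.variables = {
    'input_dataset': 'car',          # Input Dataset
    'type': "frac",                  # Sampling Type: "frac" or "n"
    'number': 0.8,                   # Sample Rows/Proportion
    'replacen': "False",             # Replace: "False" or "True"
    'weight': "horsepower",          # Weight Column (optional)
    'output_dataset': 'car_sample'}  # Output Dataset
recipe_Sample_Dataset.add_transform(transform)
recipe_Sample_Dataset.run()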
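To draw a fixed number of rows instead of a proportion, only the Sampling Type, the row count and (if repeats are wanted) Replace need to change. The fragment below is a sketch of such a variables dictionary: the keys are copied from the notebook example, while the row count 500 and the output name 'car_sample_n' are made-up values.

```python
# Variant of transform.variables for a fixed-size sample drawn with replacement.
# Keys are copied from the notebook example; 500 and 'car_sample_n' are illustrative values.
variables = {
    'input_dataset': 'car',
    'type': "n",                       # fixed number of rows instead of a proportion
    'number': 500,                     # rows to draw; may exceed the row count since replacement is on
    'replacen': "True",                # sample with replacement
    'weight': "horsepower",            # optional weight column, as in the original example
    'output_dataset': 'car_sample_n',
}
```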

How to use it in RC UI

(Screenshot: configuring the Sample Dataset template in the RC UI.)

Requirements

pandas