Intersection

This transform finds the intersection of two datasets based on a column that is common to both, and returns the rows that are present in both datasets. It uses an inner join operation.

tags: [“Data Preparation”]
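As a rough illustration of the inner join described above, the sketch below uses pandas directly. The two sample DataFrames and their column names (`car_ID`, `price`, `horsepower`) are made up for this example, and the transform's internal implementation may differ.

```python
import pandas as pd

# Hypothetical sample data; the values and column names are illustrative only.
cars_a = pd.DataFrame({"car_ID": [1, 2, 3, 4], "price": [13495, 16500, 13950, 17450]})
cars_b = pd.DataFrame({"car_ID": [3, 4, 5], "horsepower": [102, 115, 110]})

# An inner join keeps only the rows whose car_ID appears in both frames.
intersection = pd.merge(cars_a, cars_b, on="car_ID", how="inner")
print(intersection)
```

Only the rows with `car_ID` 3 and 4 survive, because those are the only IDs present in both inputs.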

Parameters

The table gives a brief description of each parameter in the Intersection transform.

Name:

By default, the transform name is populated. You can also add a custom name for the transform.

First Input Dataset:

The file name of the first dataset. You can select the dataset that was uploaded from the drop-down list. (Required: True, Multiple: False)

Second Input Dataset:

The file name of the second dataset. You can select the dataset that was uploaded from the drop-down list to perform the intersection operation. (Required: True, Multiple: False)

Intersection:

The file name of the output dataset, which contains the rows that are present in both datasets. (Required: True, Multiple: False)

ID:

The ID column, common to both datasets, that is used to intersect them. (Required: True, Multiple: False)
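The parameters above can be pictured as the arguments of a small pandas helper. This is a minimal sketch under stated assumptions, not the transform's actual code: `intersect_on_id`, the up-front check on the ID column, and the `_first`/`_second` suffixes are all hypothetical choices made for illustration.

```python
import pandas as pd


def intersect_on_id(first: pd.DataFrame, second: pd.DataFrame, id_column: str) -> pd.DataFrame:
    """Hypothetical helper: inner-join two datasets on a shared ID column."""
    # The ID column is required in both inputs, so check for it up front.
    for label, frame in (("first", first), ("second", second)):
        if id_column not in frame.columns:
            raise KeyError(f"ID column {id_column!r} is missing from the {label} input dataset")
    # Suffixes (an assumption here) disambiguate non-ID columns that share a name.
    return first.merge(second, on=id_column, how="inner", suffixes=("_first", "_second"))
```

Calling `intersect_on_id(first_df, second_df, "car_ID")` would return one row per matching `car_ID`, with any identically named non-ID columns kept side by side under the two suffixes.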

The sample input for this transform looks as shown in the screenshot.

../../../_images/intersection_input.png

The output after running the Intersection transform on the dataset appears as below:

../../../_images/intersection_output.png

How to use it in Notebook

The following is the code snippet you must use in the Jupyter Notebook editor to run the Intersection transform:

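# Assumes TemplateV2, Transform, project, and the dataset objects are already defined in the notebook.
template = TemplateV2.get_template_by('Intersection')

recipe_Intersection = project.addRecipe([car_data, employee_data, temperature_data, only_numeric], name='Intersection')

transform = Transform()
transform.templateId = template.id
transform.name = 'Intersection'
# A Python dict keeps only the last value for a repeated key, so the second
# 'input_dataset' entry overrides the first. Confirm the key for the second
# input dataset against the template's variables.
transform.variables = {
    'input_dataset': 'car',
    'input_dataset': 'car',
    'output_dataset': 'intersec',
    'id': 'car_ID'}
recipe_Intersection.add_transform(transform)
recipe_Intersection.run()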

Requirements

pandas