Undersample timeseries data

This transform reduces the volume of data collected over time while retaining its general pattern. Values are aggregated at a chosen frequency, for example hourly, weekly, monthly, or yearly. One aggregation method (mean, median, max, or min) is applied to the numerical value column of the grouped data.
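As a rough illustration of the idea, the following sketch undersamples a small, made-up dataset with pandas. It is not the transform's actual implementation; the column names (`Turbine_ID`, `Timestamp`, `Power`) and the values are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical readings for two turbines, two per day (illustrative data only).
readings = pd.DataFrame({
    "Turbine_ID": ["T1"] * 4 + ["T2"] * 4,
    "Timestamp": pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 12:00",
        "2023-01-02 00:00", "2023-01-02 12:00",
    ] * 2),
    "Power": [10.0, 20.0, 30.0, 50.0, 1.0, 3.0, 5.0, 7.0],
})

# Group by the ID column, then average each group's values per calendar day.
daily_mean = (
    readings
    .set_index("Timestamp")
    .groupby("Turbine_ID")["Power"]
    .resample("D")
    .mean()
    .reset_index()
)

# Eight input rows become four: one row per turbine per day.
print(daily_mean)
```

Here the eight input rows collapse to four, and each output value is the mean of that turbine's readings for the day.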

Parameters

The following entries briefly describe each parameter of the Undersample timeseries data transform.

Name:

The transform name is populated by default. You can replace it with a custom name.

Input dataset:

The file name of the input dataset to which this transform is applied. Select an uploaded dataset from the drop-down list.

Group by:

The column by which the time series data is grouped.

Timestamp:

The name of the timestamp column in the dataset. The name must match the column name in the dataset exactly.

Frequency:

The frequency at which the time series data is undersampled. Possible values:

  • B - business day frequency

  • C - custom business day frequency

  • D - calendar day frequency

  • W - weekly frequency

  • M - month end frequency

  • SM - semi-month end frequency (15th and end of month)

  • BM - business month end frequency

  • CBM - custom business month end frequency

  • MS - month start frequency

  • SMS - semi-month start frequency (1st and 15th)

  • BMS - business month start frequency

  • CBMS - custom business month start frequency

  • Q - quarter end frequency

  • BQ - business quarter end frequency

  • QS - quarter start frequency

  • BQS - business quarter start frequency

  • Y - year end frequency

  • BY - business year end frequency

  • YS - year start frequency

  • BYS - business year start frequency

  • BH - business hour frequency

  • H - hourly frequency

  • min - minutely frequency

  • S - secondly frequency

  • ms - milliseconds

  • us - microseconds

  • N - nanoseconds
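The codes above resemble the pandas offset aliases. Assuming the transform interprets them the same way (this page does not confirm it), the sketch below shows how the choice of frequency changes the number of output rows. The data is made up, and only `D`, `W`, and `MS` are used because newer pandas releases rename some of the other aliases.

```python
import pandas as pd

# Illustrative daily series: 28 consecutive days starting on Monday 2023-01-02.
index = pd.date_range("2023-01-02", periods=28, freq="D")
series = pd.Series(range(28), index=index, dtype="float64")

# Row counts after resampling with three of the frequency codes listed above.
row_counts = {freq: len(series.resample(freq).mean()) for freq in ("D", "W", "MS")}
print(row_counts)

# Weekly means; with "W", pandas labels each bin by the Sunday that ends it.
weekly = series.resample("W").mean()
print(weekly)
```

A coarser frequency produces fewer rows: 28 daily values become 4 weekly values or a single monthly value.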

Resample Type:

The method used to calculate the aggregate value for the selected frequency. Possible values:

  • Mean

  • Median

  • Max

  • Min
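To make the difference between the four methods concrete, the sketch below applies each one to the same day of made-up readings using pandas. The `RESAMPLE_TYPES` mapping is hypothetical and only serves the example; it is not taken from the transform.

```python
import pandas as pd

# Hypothetical mapping from Resample Type values to pandas aggregation names.
RESAMPLE_TYPES = {"MEAN": "mean", "MEDIAN": "median", "MAX": "max", "MIN": "min"}

# Four illustrative readings that all fall on the same calendar day.
values = pd.Series(
    [1.0, 2.0, 3.0, 10.0],
    index=pd.to_datetime([
        "2023-01-01 00:00", "2023-01-01 06:00",
        "2023-01-01 12:00", "2023-01-01 18:00",
    ]),
)

# Aggregate the day once per method and keep the single resulting value.
daily = {
    name: float(values.resample("D").agg(func).iloc[0])
    for name, func in RESAMPLE_TYPES.items()
}
print(daily)  # {'MEAN': 4.0, 'MEDIAN': 2.5, 'MAX': 10.0, 'MIN': 1.0}
```

The outlier reading of 10.0 pulls the mean up to 4.0 while the median stays at 2.5, which is one reason to prefer the median for noisy sensor data.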

Output dataset:

The file name of the dataset that the transform creates. This file contains the aggregate values for the selected frequency.

Sample input for this transform is shown in the following screenshot:

../../../_images/undersampleintput1.png

The output of running the Undersample timeseries data transform on this dataset is shown below:

../../../_images/undersample_output.png

How to use it in Notebook

Use the following code snippet in the Jupyter Notebook editor to run the Undersample timeseries data transform:

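# Assumes project, fill_null_predict_dataset, and undersample_timeseries_template
# were created earlier in the notebook.
sensor_cleaned_recipe_predict = project.addRecipe([fill_null_predict_dataset], name='sensor_cleaned_recipe_predict')

# Configure the transform from its template.
undersample_timeseries = Transform()
undersample_timeseries.templateId = undersample_timeseries_template.id
undersample_timeseries.name = 'undersample_timeseries'
undersample_timeseries.variables = {
    'inputDataset': 'fill_null_output_predict',
    'Col_to_undersample_by': 'Turbine_ID',  # Group by column
    'Timestamp': 'Timestamp',               # timestamp column name
    'Frequency': 'D',                       # calendar day frequency
    'Resample_type': 'MEAN',                # aggregation method
    'outputDataset': 'sensor_cleaned_predict'
}

# Add the transform to the recipe and run it.
sensor_cleaned_recipe_predict.add_transform(undersample_timeseries)
sensor_cleaned_recipe_predict.run()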