Scheduler overview
Schedulers are tasks that automate the execution of project flows at a specified frequency, such as daily, weekly, or recurring intervals using a cron schedule expression. You can schedule project flows for specific scenarios within a project to run at the defined time. If a project does not include scenarios, the default scenario can be selected to execute the flow.
Schedulers also provide the flexibility to configure parameters for storing newly generated models and output datasets after each execution. When a scheduler is triggered, the project flow runs and produces the latest output dataset, which is stored on the platform by default. Furthermore, you can configure an external data source to save both input and output datasets generated during each run by specifying it during scheduler creation.
Creating a scheduler
Use this procedure to create a scheduler for a scenario within a project.
From the left navigation menu, select Projects. The Projects dashboard is displayed.
Select the project for which you want to create a scheduler. You can create schedulers for different scenarios in a project.
The Canvas page is displayed.
Click the Scheduler tab on the project navigation menu on the left to open the Scheduler page.
data:image/s3,"s3://crabby-images/b9585/b9585670399da0aa0a3b45e5ce3dd60a787af921" alt="../_images/jobs.gif"
Do one of the following:
Click the plus icon on the top-right corner of the page.
Click the +New Scheduler option to create a scheduler. This option appears only when no schedulers have been created in this project.
data:image/s3,"s3://crabby-images/16677/16677ab980b4d6bbcb996e313619944cbf3ce672" alt="../_images/newjob.png"
The following page is displayed where you can create a scheduler to run this data pipeline at the set time interval.
data:image/s3,"s3://crabby-images/49068/490683b65e87b1f733615a711eb204be3f0606b6" alt="../_images/jobspage.png"
Click the default scheduler name to provide a custom name on the top.
data:image/s3,"s3://crabby-images/1c4f1/1c4f1faba7bfe66fbf9147dda9ba1b921459a9ce" alt="../_images/customjobname.png"
Select the scenario on which you want to run the project flow at the scheduled frequency.
Select the scheduler frequency. Possible values:
Daily - Displays the Hrs and Min. drop-downs, where you select the time at which the job should be triggered.
Weekly - Displays the days of the week and the time at which the scheduler should run.
Cron - Displays the Unix cron format, in which you define the schedule as a cron expression.
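As a rough illustration of the Cron option, the following Python sketch shows how the Daily and Weekly choices could be written as standard five-field Unix cron expressions (minute, hour, day of month, month, day of week). The helper names are hypothetical, and the exact cron dialect accepted by the platform is not stated in this guide, so treat the field layout as an assumption to verify against the cron field in the product.

```python
# Sketch only: assumes the standard five-field Unix cron layout
# (minute hour day-of-month month day-of-week). The helper names are
# hypothetical and are not part of the platform.

CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def daily_cron(hour: int, minute: int) -> str:
    """Cron expression for a run every day at hour:minute."""
    return f"{minute} {hour} * * *"


def weekly_cron(day_of_week: int, hour: int, minute: int) -> str:
    """Cron expression for a weekly run (0 = Sunday ... 6 = Saturday)."""
    return f"{minute} {hour} * * {day_of_week}"


def split_cron(expression: str) -> dict:
    """Label each field of a five-field cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


print(daily_cron(6, 30))           # every day at 06:30
print(weekly_cron(1, 9, 0))        # every Monday at 09:00
print(split_cron("*/15 * * * *"))  # every 15 minutes
```

Writing a Daily or Weekly schedule as a cron expression is mainly useful when you need an interval that those two options cannot express, such as every 15 minutes.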
View the project canvas.
Click Save to create the scheduler. This also enables the +Destination option to configure the data connector to which you can publish the generated output datasets or the input dataset.
You can see the project variables button only if the variables are defined at the project level. After creating the scheduler, you can change the value in project variables.
data:image/s3,"s3://crabby-images/5bf0e/5bf0e4cfd9e1be20da4d03d325947afebd239f09" alt="../_images/destinationbutton.png"
Click + Destination. This opens the Destinations side panel.
Note
This button is enabled only if you have configured external datasources in your tenant.
data:image/s3,"s3://crabby-images/1ad99/1ad9970a7cef58e15cec040f59f2293502d87974" alt="../_images/destinationsidepanel.png"
Click + DESTINATION.
Select the dataset that you want to add to the destination. If the dataset list is long, use the search option to find the dataset you want.
Select the destination from the drop-down list. The list shows only the external data sources configured under this tenant, excluding Snowflake and Fivetran connectors.
When you select the SQL connector to synchronize or copy the output dataset generated after running the project, the table name column is displayed. Here, you can provide the table name and select either “Append” or “Replace”. Opting for the “Append” option will append the dataset to the existing one, provided both datasets share the same schema. Alternatively, selecting the “Replace” option will replace the existing dataset with the new one.
If you choose the data connector as MongoDB, you can provide the database name and collection. In the event that the provided collection name already exists, the new dataset will be appended to the existing collection.
Provide the destination folder and destination file name. After every scheduled run, the file is saved in the destination folder under this file name.
data:image/s3,"s3://crabby-images/a4e3d/a4e3d995c82d185f095a4e479040b3ffa29d4fc5" alt="../_images/destinationdetails.png"
Select the To create new file for every run check box to create a new file after every job run. The new file is saved with the RUN ID. If you clear this check box, every run overwrites the existing file.
Click Save to save this destination. This button is enabled only after you select all the required destination fields.
data:image/s3,"s3://crabby-images/a5570/a5570052b63bc94e0a5d7adceb143c90c0f1d871" alt="../_images/externaldatasourceconf.png"
Note
You can store files in multiple destinations. To add another destination, click + DESTINATION. If you no longer want to save the output to a configured destination, click the delete icon to remove that destination.
Close the window after configuring the destination for the job.
Click GLOBAL VARIABLES to change the configured parameters for this job.
Note: The GLOBAL VARIABLES button is enabled only when global variables are declared at the project level. To configure global variables, refer to Configuring global variables at a project level.
Change the value for the key. You cannot change the key itself.
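The destination options in this procedure (Append versus Replace for SQL tables, and a new file for every run versus overwriting) can be sketched in Python as follows. This is only a model of the behavior described above, not the platform's implementation: the function names are hypothetical, and the `<name>_<run id>` file pattern is an assumption, since this guide only states that the new file is saved with the RUN ID.

```python
from pathlib import PurePosixPath


def destination_path(folder: str, file_name: str, run_id: str,
                     new_file_per_run: bool) -> str:
    """Return the path a run would write to under the two file options."""
    name = PurePosixPath(file_name)
    if new_file_per_run:
        # Assumed naming pattern: the run ID is appended to the file stem.
        name = name.with_name(f"{name.stem}_{run_id}{name.suffix}")
    # With the check box cleared, every run targets the same path,
    # so the previous file is overwritten.
    return str(PurePosixPath(folder) / name)


def write_table(existing_rows: list, new_rows: list, mode: str) -> list:
    """Model Append and Replace for a table held as a list of dict rows."""
    if mode == "replace":
        return list(new_rows)
    if mode == "append":
        # Append is described as valid only when both schemas match.
        if existing_rows and new_rows and set(existing_rows[0]) != set(new_rows[0]):
            raise ValueError("Append requires both datasets to share a schema")
        return list(existing_rows) + list(new_rows)
    raise ValueError(f"unknown mode: {mode}")


print(destination_path("exports", "output.csv", "run-42", True))
print(destination_path("exports", "output.csv", "run-42", False))
print(write_table([{"id": 1}], [{"id": 2}], "append"))
```

Keeping a new file for every run preserves the history of outputs at the cost of storage, whereas the overwrite behavior keeps only the most recent result.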
Running the scheduler manually
Use this procedure to run or re-run a scheduler manually. Manual runs are in addition to the automatic runs, which continue on the recurring schedule.
To run a specific scheduler:
Select the project to run the job.
Click the Scheduler tab to open the list of jobs created in this project. The jobs list page appears only if jobs have been created for this project.
Click the name of the scheduler that you want to run manually. This takes you to the page of that scheduler.
data:image/s3,"s3://crabby-images/ce151/ce151c23514779ec2f2ba0d91418962772ffbd9d" alt="../_images/jobrun.png"
Click Run to run the job manually. This opens the Manual Run Configuration side panel.
data:image/s3,"s3://crabby-images/98999/989995214e10f5d2e01d7441b6ee118ba9c52ee6" alt="../_images/runmanually.png"
Provide the run name and click RUN.
data:image/s3,"s3://crabby-images/1012c/1012c5c7aedc5bf6d7b0c5f9dee0ae1449e975fe" alt="../_images/manualrun.png"
When the job run is in progress, the status changes from Created to Entity Loading and then to Running. Once the job has run successfully, you can view the output on the Job Run History page.
Click the ellipsis icon that appears in the Run Name column on the Job History page and select RE-RUN to re-run the job.
Managing schedulers in a project
Use this procedure to manage all the jobs scheduled in a project.
Hover over the menu icon and select Projects. The Projects dashboard is displayed.
Select the project for which you want to schedule or create a job. You can create jobs for different scenarios in a project.
Click the Scheduler tab on the left navigation menu of the project to open the schedulers page and view the list of schedulers you have already created.
data:image/s3,"s3://crabby-images/7a608/7a6087501b9a23bf9b1cfd2205969bd400418a20" alt="../_images/listofjobs.png"
Note
If there are multiple schedulers, you can use the search option to find the scheduler you want.
You can also create a new scheduler using the plus icon. For details, see Creating a scheduler.
Click the name of the job that you want to edit. This takes you to the Jobs page, where you can edit the job details.
Modify the required details.
Click Save to apply the changes.
On this Jobs page, you can also:
Run this job manually by clicking the Run button.
View the run history by clicking the Run history icon. This shows the history of job runs to date, up to 300 records from the last 30 days.
Pause a running job by clicking the Pause icon. Click the same icon again to resume the paused job.
Delete this job permanently by selecting the Delete option from the Actions drop-down.
Change the timeout duration of the job by selecting the Timeout 1hr option from the Actions drop-down. By default, the timeout duration is 1 hour. The job is terminated once it has run for the configured duration.
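The run history limit mentioned above can be read in more than one way. The Python sketch below models one plausible reading, in which the page lists the most recent runs from the last 30 days and caps the list at 300 records. The function name and parameters are hypothetical, and the actual platform behavior may differ.

```python
from datetime import datetime, timedelta


def visible_history(run_times, now, window_days=30, max_records=300):
    """Return the run start times shown under the assumed limit.

    Assumption: runs older than window_days are hidden, and only the
    max_records most recent of the remaining runs are listed.
    """
    cutoff = now - timedelta(days=window_days)
    recent = sorted((t for t in run_times if t >= cutoff), reverse=True)
    return recent[:max_records]


now = datetime(2024, 1, 31, 12, 0)
runs = [now - timedelta(days=d) for d in (1, 10, 45)]
# The run from 45 days ago falls outside the 30-day window.
print(visible_history(runs, now))
```

Under this reading, a scheduler that runs more often than ten times a day would reach the 300-record cap before the 30-day window is exhausted.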
Viewing the schedulers in a project
Use this procedure to view all the jobs in a project and see the output generated after every job run.
Hover over the menu icon and select Projects. The Projects dashboard is displayed.
Select the project for which you want to schedule or create a job. You can create jobs for different scenarios in a project.
Click the Scheduler tab on the left navigation menu of the project to open the jobs page and view the list of jobs you have already created.
data:image/s3,"s3://crabby-images/4624c/4624ce320da7bff7e929354af043a1c6143d30db" alt="../_images/jobsnew.png"
Review this information:
- Job Name:
The name of the job.
- Status:
The status of this scheduler. Possible values:
Scheduler Active - The default status of a scheduler.
Scheduler Inactive - Indicates that the manual run has been paused.
- Last run by:
Indicates whether a business user or the scheduler triggered the most recent run.
- Last Run:
The date and time at which the scheduler was last run.
- Last 5 Runs:
The statuses of the last five scheduler runs. Possible values:
Failed - The scheduler has failed to run.
Success - The scheduler has been run successfully.
Created - The scheduler has been created.
- Last Run Output:
Click to view the output generated after running the scheduler. This option remains disabled until the scheduler has run at least once.
- Last Run Log:
Click to view the logs. You can check the logs to understand the errors in jobs that have failed to run.
- Last Run Canvas:
Click to view the canvas page. The page is view-only and shows the failed and successful blocks in the data pipeline.
- Last Run Project Variables:
Click to view the last run project variables in this job.
- Updated on:
The date and time when the job was last updated.
- Updated by:
The user who last modified the job details.
Click the table settings icon to reorder the columns and to select or deselect the columns you want to view.
Viewing the run history of a specific scheduler
Use this procedure to view the run history of a particular scheduler in the project.
Select the project and click the Scheduler tab to view the list of schedulers created within this project.
Do one of the following to access the scheduler run history page:
Click the scheduler name under the Scheduler Name column in the table to navigate to the respective Scheduler page, and then click the Run History icon.
data:image/s3,"s3://crabby-images/9697f/9697ff54cd5571670a4a87f2ab27e8bb331ee327" alt="../_images/specificjob.png"
data:image/s3,"s3://crabby-images/8ca4a/8ca4ac4b534dc4c98ec4aa899a2add22855dc278" alt="../_images/runhistory.png"
Click the ellipsis icon in the Job Name column and select the RUN HISTORY option to view the list of completed scheduler runs for the selected scheduler.
data:image/s3,"s3://crabby-images/61d58/61d581409e793b196713f30e09014f84721185f9" alt="../_images/runhistory12.png"
The Run History page appears.
data:image/s3,"s3://crabby-images/c5b72/c5b72612b324197b1dd0903cbd8a90770c58a6c2" alt="../_images/runhistory1.png"
Review the following details displayed for every job run:
- Run Name:
The name of the scheduler run.
- Triggered by:
The user or scheduler that triggered it.
- Started at:
The time at which the scheduler or user started the run.
- Runtime:
The duration of the run.
- Status:
The status of the scheduler after the run. Possible values:
Success - Indicates that the job run is successful.
Started - Indicates that the job run is in progress.
Failed - Indicates that the job has failed to run.
Timed out - Indicates that the job has timed out.
Recipe Timed out - Indicates that a recipe within the job has timed out.
Created - Indicates that the job has been created.
Recipe Running - Indicates that a recipe within the flow has started to run.
Entity loading - Indicates that the data is being loaded for entities with an external data source.
- Output:
The output generated after running the scheduler automatically or manually. Click to view and download the input dataset on which the model was created and the generated output datasets. You can also view the generated artifacts and models, but you cannot download them.
- Log:
The log of this particular scheduler. Click to view the logs to debug the issues in the job run.
- Canvas:
The canvas generated after running the job. Click to view the canvas on which the job was run.
- Project variables:
The variables used in this job run. Click to view the variables linked to this scheduler.
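If you keep your own notes or exports of run history, the fields and statuses listed above map naturally onto a small record type. The Python sketch below is one way to model them. The class and field names are hypothetical, and the grouping of statuses into finished and in-progress is an assumption inferred from the descriptions rather than something the platform documents.

```python
from dataclasses import dataclass
from enum import Enum


class RunStatus(Enum):
    CREATED = "Created"
    ENTITY_LOADING = "Entity loading"
    STARTED = "Started"
    RECIPE_RUNNING = "Recipe Running"
    SUCCESS = "Success"
    FAILED = "Failed"
    TIMED_OUT = "Timed out"
    RECIPE_TIMED_OUT = "Recipe Timed out"


# Assumption: these statuses mean the run has ended; the rest are in progress.
FINISHED = {
    RunStatus.SUCCESS,
    RunStatus.FAILED,
    RunStatus.TIMED_OUT,
    RunStatus.RECIPE_TIMED_OUT,
}


@dataclass
class RunRecord:
    run_name: str
    triggered_by: str        # a user name or the scheduler
    started_at: str
    runtime_seconds: float
    status: RunStatus

    def is_finished(self) -> bool:
        return self.status in FINISHED

    def needs_log_review(self) -> bool:
        # A finished run that did not succeed is a candidate for the Log column.
        return self.is_finished() and self.status is not RunStatus.SUCCESS


run = RunRecord("nightly-1", "scheduler", "2024-01-31 02:00", 512.0,
                RunStatus.RECIPE_TIMED_OUT)
print(run.is_finished(), run.needs_log_review())
```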
Publishing the updated data pipeline to selected jobs from canvas
Use this procedure to republish the data pipeline to schedulers. When you update the dataset, delete a recipe, or add a new recipe to the data pipeline, you can republish the new flow to the schedulers using the republish option on the canvas. This updates the canvas on the selected schedulers.
To publish the changes made in the data pipeline to all or specific scheduler(s) in a project:
Select the project to navigate to the canvas view page.
Click the Actions drop-down on the canvas and select Publish to Schedulers. This displays the Republish canvas to schedulers dialog.
data:image/s3,"s3://crabby-images/c40ae/c40aee0d96b669952a2224119c3d6d44e8981b24" alt="../_images/republishjobs.png"
The dialog lists the schedulers to which you can publish the latest or updated data pipeline.
data:image/s3,"s3://crabby-images/ff457/ff45714814ed77ae3c4b3387ef39f902e6b52bda" alt="../_images/republishjobs1.png"
Select the check boxes corresponding to the jobs that you want to update with the latest canvas. This enables the Yes, Republish button.
data:image/s3,"s3://crabby-images/87d42/87d4283c2844f08dcf81f3c1e85c658e453cf001" alt="../_images/selectedjobstorepublish.png"
Click Yes, Republish to republish or update the latest data pipeline to the selected jobs.
You can now navigate to the respective job page to see the latest data pipeline. From the next scheduled run onward, the job runs on the updated pipeline.
Fetching the latest data pipeline to a specific scheduler
Use this procedure to fetch the changes made to the data pipeline on the canvas to the data pipeline in a specific scheduler.
To publish the changes made to the data pipeline on the canvas to a specific scheduler from the scheduler page:
Select the project to navigate to the canvas view page.
Select Scheduler from the project level navigation. This takes you to the Schedulers page where you can view the list of schedulers created for this project.
Select the scheduler to which you want to publish the changes made to the data pipeline. This takes you to the selected scheduler page.
Click the Republish button in the canvas section to incorporate all the changes that were made to the canvas at the project level to this pipeline.
data:image/s3,"s3://crabby-images/e20cd/e20cd64aee687b6420020e48dfef4531cdea4b1f" alt="../_images/republishbutton.png"
The Republish Canvas to scheduler window appears.
Click Yes, Republish to republish the project canvas to the scheduler.
Comparing the canvas of the scheduler with current canvas of the project
Use this procedure to compare the current canvas of the project and the canvas of the scheduler side by side to track changes.
To compare the canvas of the scheduler with the current canvas of the project:
Select the project to navigate to the canvas view page.
Select Scheduler from the project level navigation. This takes you to the schedulers page where you can view the list of schedulers created for this project.
Select the scheduler that you want to compare with the current canvas of the project. This opens the scheduler page.
data:image/s3,"s3://crabby-images/ff6f3/ff6f3bfd21df9f0d486c896fc97b136a47c22f93" alt="../_images/compare.png"
Click Compare to compare the canvas of this scheduler with the canvas of the project. The two canvases are displayed side by side so that you can see the differences between them.
data:image/s3,"s3://crabby-images/bac58/bac5849ad3fc6b8890544ba0e9301a7eb2940842" alt="../_images/comparejobs.png"
If the canvas of the scheduler is not up to date, click Republish to fetch the latest canvas of the project.
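Conceptually, the comparison answers three questions: which blocks exist only in the project canvas, which exist only in the scheduler canvas, and which exist in both but differ. The Python sketch below illustrates that idea on a simplified, hypothetical representation in which each canvas is a dictionary that maps a block name to its configuration. It is not how the platform stores or compares canvases.

```python
def compare_canvases(project_canvas: dict, scheduler_canvas: dict) -> dict:
    """Summarize differences between two canvases.

    Each canvas is assumed to map a block name (dataset or recipe)
    to an arbitrary, comparable configuration value.
    """
    project_blocks = set(project_canvas)
    scheduler_blocks = set(scheduler_canvas)
    shared = project_blocks & scheduler_blocks
    return {
        "only_in_project": sorted(project_blocks - scheduler_blocks),
        "only_in_scheduler": sorted(scheduler_blocks - project_blocks),
        "changed": sorted(
            name for name in shared
            if project_canvas[name] != scheduler_canvas[name]
        ),
    }


def needs_republish(diff: dict) -> bool:
    """True when the scheduler canvas differs from the project canvas."""
    return any(diff.values())


project = {"sales.csv": "v2", "clean_recipe": "v1", "forecast_recipe": "v1"}
scheduler = {"sales.csv": "v1", "clean_recipe": "v1", "old_recipe": "v1"}
diff = compare_canvases(project, scheduler)
print(diff)
print(needs_republish(diff))
```

In this example the scheduler would be a candidate for Republish, because a dataset changed, a recipe was added, and a recipe was removed at the project level.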