Building a solution

This guide walks you through building a solution end-to-end on RapidCanvas using the notebook interface. Make sure you have the latest SDK installed before you start.

Building a solution on RapidCanvas involves the following steps:

  • Import functions

  • Authenticate your client

  • Create a Custom Environment

  • Create a new project

  • Build a flow file for the project

  • Execute the project

  • Publish the project as a solution

  • Update solution documentation

The next sections walk through these steps using a sample project. Download the project files here: Reference Project

After unzipping, move the employee project folder to the root folder where you have installed your SDK.
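Later cells read files through relative paths such as data/employee_promotion_case.csv and transforms/timediff.ipynb. As an optional sanity check, the sketch below lists any of those files that are missing from a folder. It assumes the notebook runs from inside the employee project folder, which is what those relative paths suggest. The helper name missing_project_files is hypothetical and is not part of the SDK.

```python
from pathlib import Path
from typing import List

# Paths used by later cells in this walkthrough (relative to the employee project folder)
EXPECTED_FILES = [
    "data/employee_promotion_case.csv",
    "data/Region_States.csv",
    "transforms/timediff.ipynb",
    "transforms/GeoMap.ipynb",
]


def missing_project_files(project_root: str = ".") -> List[str]:
    """Hypothetical helper: return the expected files that are not present under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_project_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All project files found.")
```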

Opening Jupyter Notebook

To open Jupyter Notebook, run the following command:

jupyter-notebook

In Jupyter Notebook, you should see the employee_flow.ipynb file. Click it to open the notebook that contains the following steps.

ℹ️ Note: RapidCanvas supports only the default Python kernel (ipykernel) in Jupyter Notebook.
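Because the SDK has to be importable from whichever interpreter backs the notebook kernel, it can help to confirm this before running the import cell. The sketch below uses only the standard library. It assumes the SDK is exposed as the utils.rc package, as in the imports that follow; the helper name sdk_available is hypothetical.

```python
import importlib.util
import sys


def sdk_available(module_name: str = "utils.rc") -> bool:
    """Return True if module_name can be found by the interpreter running this kernel."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # For dotted names, find_spec imports the parent package; a missing parent raises here
        return False


if __name__ == "__main__":
    print("Kernel interpreter:", sys.executable)
    print("RapidCanvas SDK importable:", sdk_available())
```

If this prints False, the kernel is most likely using a different Python environment from the one where the SDK was installed.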

Import functions

# Run this cell to import the required modules before moving to the next step

import sys

from utils.rc.client.requests import Requests
from utils.rc.client.auth import AuthClient

from utils.rc.dtos.project import Project
from utils.rc.dtos.dataset import Dataset
from utils.rc.dtos.recipe import Recipe
from utils.rc.dtos.transform import Transform
from utils.rc.dtos.template_v2 import TemplateV2, TemplateTransformV2
from utils.rc.dtos.solution import Solution
from utils.rc.dtos.env import Env
from utils.rc.dtos.env import EnvType
from utils.rc.dtos.dataSource import DataSource
from utils.rc.dtos.dataSource import DataSourceType
from utils.rc.dtos.dataSource import GcpConfig

import json
import os
import pandas as pd
import logging
from utils.utils.log_util import LogUtil
LogUtil.set_basic_config(format='%(levelname)s:%(message)s', level=logging.INFO)

Authenticate your client

Authenticate your client using a token or your user credentials.

# Requests.setRootHost("https://test.dev.rapidcanvas.net/api/")
AuthClient.setToken()

Creating a custom environment

A custom environment lets you choose the infrastructure used to execute your project. The available environment types and their resources are:

SMALL: 1 Core, 2GB Memory
MEDIUM: 2 Cores, 4GB Memory
LARGE: 4 Cores, 8GB Memory
CPU_LARGE: 8 Cores, 16GB Memory
MAX_LARGE: 12 Cores, 32GB Memory
EXTRA_MAX_LARGE: 12 Cores, 48GB Memory
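If you know roughly how many cores and how much memory your flow needs, the sizes above can be encoded in a small lookup that picks the smallest option that fits. This is a local sketch, not an SDK call: the dictionary simply mirrors the list above, and smallest_env is a hypothetical helper.

```python
from typing import Dict, Tuple

# (cores, memory in GB) for each environment type, copied from the list above
ENV_SIZES: Dict[str, Tuple[int, int]] = {
    "SMALL": (1, 2),
    "MEDIUM": (2, 4),
    "LARGE": (4, 8),
    "CPU_LARGE": (8, 16),
    "MAX_LARGE": (12, 32),
    "EXTRA_MAX_LARGE": (12, 48),
}


def smallest_env(min_cores: int, min_memory_gb: int) -> str:
    """Return the smallest environment (by memory, then cores) that meets both minimums."""
    fits = [
        (memory, cores, name)
        for name, (cores, memory) in ENV_SIZES.items()
        if cores >= min_cores and memory >= min_memory_gb
    ]
    if not fits:
        raise ValueError(f"No environment offers {min_cores} cores and {min_memory_gb}GB memory")
    return min(fits)[2]
```

For example, smallest_env(3, 6) returns "LARGE". If EnvType is a standard Python Enum whose member names match these strings (an assumption this guide does not confirm), the result could be passed as envType=EnvType[smallest_env(3, 6)].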

You can create a new env by executing the cell below:

env = Env.createEnv(
    name="new_custom_env",
    description="env for my projects",
    envType=EnvType.SMALL, #pick one of the pre-defined configs
    requirements="jq==1.2.2 yq==3.0.2 plotly==5.8.0" #additional packages to be installed for your custom env
)

Creating a new project

Create a new project under your tenant:

# Create project on platform
project = Project.create(
    name='Employee',
    description='Employee_promotion',
    createEmpty=True,
    envId=env.id
)
project.id

This creates a new project named “Employee” under your tenant. You can verify it in the RapidCanvas UI by logging in here: RapidCanvas UI

Building a flow file for the project

Building a flow file for the project involves the following steps:

  • Upload your dataset

  • Create a new template, or use an existing template provided by RapidCanvas, for data modification

  • Create a transform from the template

  • Create a recipe

  • Add a transform or a list of transforms to your recipe

  • Run your recipe

  • Push output of your recipe to a new table

Uploading your dataset

Execute the cell below to create new datasets and upload your data:

#This creates a dataset on RapidCanvas called "employee" and uploads the employee_promotion_case.csv file to it.
employee = project.addDataset(
    dataset_name='employee',
    dataset_description='Employee Promotion Dataset',
    dataset_file_path='data/employee_promotion_case.csv' # path as per your folder structure in Jupyter
)
#This creates a dataset on RapidCanvas called "region" and uploads the Region_States.csv file to it.
region = project.addDataset(
    dataset_name='region',
    dataset_description='Region States Dataset',
    dataset_file_path='data/Region_States.csv' # path as per your folder structure in Jupyter
)
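Before uploading, you can preview a CSV locally with pandas (already imported above) to confirm it contains the columns that later cells reference. The column list below is an assumption drawn from the transform variables further down (birth_date and start_date for the age calculation, lat and long for the map); adjust it to your data. missing_columns is a hypothetical helper.

```python
import pandas as pd

# Columns referenced by later transforms in this walkthrough (assumed to come from the employee CSV)
REQUIRED_EMPLOYEE_COLUMNS = ["birth_date", "start_date", "lat", "long"]


def missing_columns(csv_source, required):
    """Read only the first rows of a CSV (path or file-like object) and return required columns it lacks."""
    header = pd.read_csv(csv_source, nrows=5)
    return [col for col in required if col not in header.columns]
```

For the sample project, this would be called as missing_columns('data/employee_promotion_case.csv', REQUIRED_EMPLOYEE_COLUMNS); an empty list means every listed column is present.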

Uploading your dataset - Google Cloud

This step lets you create a custom data source. In this example, you connect to a Google Cloud Platform (GCP) bucket, which you can upload local data to and download data from.

# dataSource = DataSource.createDataSource(
#     "gcp-custom-source",
#     DataSourceType.GCP_STORAGE,
#     {
#         GcpConfig.BUCKET: "YOUR BUCKET NAME HERE", #Get in touch with your RapidCanvas POC for your GCP bucket name, service account and access key
#         GcpConfig.ACCESS_KEY: "/Users/../../access_key.json"} #Local path to your access key
# )

# dataSource.id

You can use the following commands to upload data from your local machine to your RapidCanvas bucket in Google Cloud, and to download it back:

# from utils.notebookhelpers.gcs import GCSHelper
# gcs_helper = GCSHelper.init('/path/to/key.json', '<your-root-dir-name>')
# gcs_helper.list_files()
# gcs_helper.upload_file('/path/to/the/file/to/be/uploaded', '/relative/remote/dir/path') # if /relative/remote/dir/path is not passed, the file is uploaded to the root of the directory
# gcs_helper.download_file('/path/to/remote/file', '/path/to/dir/to/be/downloaded') # To download files in Gcloud bucket to local

Upload your dataset to your Google Cloud bucket before executing this step.

# region_gcp = project.addDataset(
#     dataset_name="region_gcp",
#     dataset_description="region data from gcp",
#     data_source_id=dataSource.id,
#     data_source_options={GcpConfig.FILE_PATH: "region_states_gcp.csv"} #provide the file path as per your bucket
# )
# you can review a sample of data here
# region_gcp.getData()

Uploading your dataset - S3

#Update for S3

Template Usage

You can create a new template or use an existing template provided by RapidCanvas for data modification. Execute the cell below to publish the time difference template included with the project files (transforms/timediff.ipynb):

# Define a custom template backed by the timediff.ipynb notebook
time_diff_template = TemplateV2(
    name="time_diff", description="Calculate the time difference between two dates", project_id=project.id,
    source="CUSTOM", status="ACTIVE", tags=["UI", "Scalar"]
)
# The template runs a single Python transform defined in that notebook
time_diff_template_transform = TemplateTransformV2(
    type="python", params=dict(notebookName="timediff.ipynb"))

time_diff_template.base_transforms = [time_diff_template_transform]
# Publish the template from the local notebook path
time_diff_template.publish("transforms/timediff.ipynb")

List existing templates

List the existing templates in the RapidCanvas library:

templates = TemplateV2.get_all()
TemplateV2.clean_view(templates)

To read more about RapidCanvas templates, see this section: Building a template

Create a transform from the template

A transform can be created from a template using the following:

calculate_age_transform = Transform()
calculate_age_transform.templateId = time_diff_template.id
calculate_age_transform.name = 'age'
# Compute the years between birth_date and start_date in the "employee" dataset
# and write them to a new "age" column in the "employee_with_age" output dataset
calculate_age_transform.variables = {
    'inputDataset': 'employee',
    'start_date': 'birth_date',
    'end_date': 'start_date',
    'how': 'years',
    'outputcolumn': 'age',
    'outputDataset': 'employee_with_age'
}
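The contents of timediff.ipynb are not shown in this guide, so its exact computation is not documented here. As an assumption, the variables above suggest something like the pandas sketch below: parse two date columns, take the difference, express it in the unit given by how, and store it in outputcolumn. Only how='years' appears in this guide; the 'days' and 'weeks' options and the 365.25-day year are choices made for this sketch. Running it on a local sample is one way to sanity-check the age values the recipe returns.

```python
import pandas as pd

# Divisors for converting a difference in days into the requested unit (assumed options)
DAYS_PER_UNIT = {"days": 1, "weeks": 7, "years": 365.25}


def time_diff(df: pd.DataFrame, start_date: str, end_date: str,
              how: str = "years", outputcolumn: str = "time_diff") -> pd.DataFrame:
    """Local approximation of the time_diff template; returns a copy with a whole-unit difference column."""
    start = pd.to_datetime(df[start_date])
    end = pd.to_datetime(df[end_date])
    days = (end - start).dt.days
    out = df.copy()
    # Floor to whole units (e.g. completed years); assumes no missing dates
    out[outputcolumn] = (days // DAYS_PER_UNIT[how]).astype(int)
    return out
```

With the variables used in this walkthrough, the call would be time_diff(df, 'birth_date', 'start_date', how='years', outputcolumn='age').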

Create a recipe

To create your recipe, execute the following:

calculate_age_recipe = project.addRecipe([employee], name='calculate_age_recipe')

Add a transform to your recipe

You can add a single transform or multiple transforms to your recipe.

calculate_age_recipe.add_transform(calculate_age_transform)
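To add several transforms in one step, one option is to call add_transform for each of them in order, as in the sketch below. It relies only on the add_transform method used above; whether the SDK also accepts a list directly is not covered in this guide, and add_transforms is a hypothetical helper name.

```python
from typing import Any, Iterable


def add_transforms(recipe: Any, transforms: Iterable[Any]) -> None:
    """Add each transform to the recipe, in the given order, using recipe.add_transform."""
    for transform in transforms:
        recipe.add_transform(transform)
```

In this walkthrough that would be add_transforms(calculate_age_recipe, [calculate_age_transform]); the transforms are added in the order they appear in the list.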

Run your recipe

To run your recipe, execute the following:

calculate_age_recipe.run()

Output dataset and review sample

To fetch the output dataset and review a sample, execute the following:

employee_with_age = calculate_age_recipe.getChildrenDatasets()['employee_with_age']
employee_with_age.getData(5)

All these changes are automatically reflected in the RapidCanvas UI. To review the flow created in your project, click your project name on the Dashboard page: RapidCanvas UI

Template to build a visualisation

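Here is another example of using templates in RapidCanvas, this time to build a visualisation.

Create a new template or use an existing template provided by RapidCanvas. The cell below publishes the geolocation map template included with the project files (transforms/GeoMap.ipynb):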

geo_map_template = TemplateV2(
    name="GeolocationMap", description="Plot map based on geolocation", project_id=project.id,
    source="CUSTOM", status="ACTIVE", tags=["UI", "Visualization"]
)
geo_map_template_transform = TemplateTransformV2(
    type="python", params=dict(notebookName="GeoMap.ipynb"))

geo_map_template.base_transforms = [geo_map_template_transform]
geo_map_template.publish("transforms/GeoMap.ipynb")

Create a transform from the template

geo_map_transform = Transform()
geo_map_transform.templateId = geo_map_template.id
geo_map_transform.name = 'geomap_employee_location'
geo_map_transform.variables = {
    'GeoDataset': 'employee_with_age',
    'Lat': 'lat',
    'Long': 'long',
    'GeoChartName': 'employee_map_location'
}
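The map template reads the lat and long columns named in the variables above. Rows with missing or out-of-range coordinates are unlikely to plot correctly, so you may want to check them first. The sketch assumes the data you check is a pandas DataFrame (for example, a sample from employee_with_age.getData(5), whose return type this guide does not confirm); invalid_coordinates is a hypothetical helper.

```python
import pandas as pd


def invalid_coordinates(df: pd.DataFrame, lat_col: str = "lat", long_col: str = "long") -> pd.DataFrame:
    """Return rows whose latitude/longitude are non-numeric, missing, or outside valid ranges."""
    lat = pd.to_numeric(df[lat_col], errors="coerce")
    lon = pd.to_numeric(df[long_col], errors="coerce")
    # Latitude must lie in [-90, 90] and longitude in [-180, 180]; NaN fails both checks
    valid = lat.between(-90, 90) & lon.between(-180, 180)
    return df[~valid]
```

An empty result means every row has usable coordinates.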

Create a recipe

geo_map_recipe = project.addRecipe([employee_with_age], name='geo_map_recipe')

Add a transform or a list of transforms to your recipe

geo_map_recipe.add_transform(geo_map_transform)

# geo_map_recipe.prepareForLocal(geo_map_transform, contextId='new_transform', template_id=geo_map_template.id, nb_name="GeoMap.ipynb")

Run your recipe

geo_map_recipe.run()

You can view the output dashboard on RapidCanvas UI in your project: RapidCanvas UI

Publishing the project as a solution

List of existing solutions

You can list the available RapidCanvas solutions with:

solutions = Solution.get_all()
Solution.clean_view(solutions)

Publishing a new solution

An end-to-end project can be converted and published as a solution, which other users can then consume.

# solution = Solution.create(name="Sample Employee Solution", sourceProjectId=project.id, description="Sample Solution built on Employee Project", tags=["Sample", "New Users"], isGlobal=False, icon="icon_url")

#Solutions are published locally to your tenant and are accessible by other users in your tenant
#A published solution can be used to create a new project

Your published solution is now accessible from the Solutions UI. You can review the solution details here: RapidCanvas UI

Update solution documentation

A published solution should be documented to inform users about its use case and business impact.

Sample solution documentation

You can refer to the documentation of this sample project here: Sample Project Documentation

We recommend following the documentation structure used in the sample project documentation.

You can update your solution's documentation in GitHub at https://github.com/../../projects/your_projects/project_name/doc/info.rst
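If you are starting info.rst from scratch, a skeleton like the one produced below may help. The section names come from the guidance above (use case and business impact); the sample project documentation may use a different structure, so treat this as a hypothetical starting point and follow the sample wherever they differ.

```python
from typing import Sequence


def info_rst_skeleton(title: str, sections: Sequence[str] = ("Use case", "Business impact")) -> str:
    """Build a minimal reStructuredText document with a title and one placeholder per section."""
    lines = [title, "=" * len(title), ""]
    for section in sections:
        # reStructuredText requires the underline to be at least as long as the heading text
        lines += [section, "-" * len(section), "", "TODO: describe the " + section.lower() + ".", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    print(info_rst_skeleton("Sample Employee Solution"))
```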

Reference Notebooks