utils.rcclient.entities.dataset

Module Contents

Classes

DatasetMeta

Dataset

Attributes

logger

utils.rcclient.entities.dataset.logger
class utils.rcclient.entities.dataset.DatasetMeta
entity_type: str = 'BASE'
entity_ontology: str = 'NONE'
entity_view_type: str
encoding: str
separator: str
header: str
data_type_map: str
ontology_map: str
classmethod from_(res_dto)
class utils.rcclient.entities.dataset.Dataset
id: str
name: str
project_id: str
display_name: str
description: str
file_path: str
status: utils.rcclient.enums.DatasetStatus
data_source_id: str
data_source_options: str
datasetMeta: DatasetMeta
entityMeta: DatasetMeta
dataset_meta: DatasetMeta
metadata: str
markdown: str
icon: str
image: str
_dataset_service: ClassVar[utils.rcclient.services.dataset_service.DatasetService]
__post_init__()
classmethod from_(res_dto)
from_res(res_dto)
classmethod create(name, project_id, description, display_name, file_path, data_source_id, data_source_options, metadata, dataset_meta, force_upload, markdown, icon, image)
add_markdown(markdown: str)
ensure_schema_download(save_to)
getData(num_rows=10, scenario_id=None, project_run_entry_id=None) pandas.DataFrame

Get sample data of the dataset

Parameters:
  • num_rows (int, optional) – Number of rows to show; the maximum is 100. Defaults to 10. Pass -1 to get the full data.

  • scenario_id (string, optional)

  • project_run_entry_id (string, optional)

Returns:

pd.DataFrame
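The num_rows rules above (default 10, maximum 100, -1 for the full data) can be checked on the caller's side before the request is made. The following is a minimal sketch, not part of the client: fetch_rows, MAX_SAMPLE_ROWS, FULL_DATA, and _StubDataset are hypothetical names introduced for illustration. The stub returns a plain list rather than a pandas.DataFrame so the example runs without the client installed. Rejecting out-of-range values with ValueError is a choice made in this sketch; how the real method treats such values is not documented here.

```python
from typing import Any

MAX_SAMPLE_ROWS = 100  # documented maximum for a sample request
FULL_DATA = -1         # documented sentinel meaning "return the full data"


def fetch_rows(dataset: Any, num_rows: int = 10) -> Any:
    """Check num_rows against the documented limits, then call getData."""
    if num_rows == FULL_DATA:
        return dataset.getData(num_rows=FULL_DATA)
    if not 1 <= num_rows <= MAX_SAMPLE_ROWS:
        raise ValueError(
            f"num_rows must be -1 or between 1 and {MAX_SAMPLE_ROWS}, got {num_rows}"
        )
    return dataset.getData(num_rows=num_rows)


class _StubDataset:
    """Hypothetical stand-in for Dataset; returns lists, not DataFrames."""

    def __init__(self, rows):
        self._rows = rows

    def getData(self, num_rows=10, scenario_id=None, project_run_entry_id=None):
        # Mirrors only the documented signature, not the real behaviour.
        return self._rows if num_rows == -1 else self._rows[:num_rows]


stub = _StubDataset([{"id": i} for i in range(250)])
sample = fetch_rows(stub)          # uses the documented default of 10 rows
everything = fetch_rows(stub, -1)  # -1 asks for the full data
```

With a real Dataset instance, the same helper would be called as fetch_rows(dataset, 50), assuming the instance exposes getData as documented above.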

get_full_data(scenario_id=None, project_run_entry_id=None) pandas.DataFrame

Get full data of the dataset

Parameters:
  • scenario_id (string, optional)

  • project_run_entry_id (string, optional)

Returns:

pd.DataFrame

getCols(scenario_id=None, project_run_entry_id=None) List[str]

Get column names of the dataset

Returns:

List of all the column names

Return type:

List[str]

classmethod __deserialize_res(res_json, response_class)
refresh()
get_stats_status()
static getDataset(datasetId: str) Dataset

Get dataset object using id of the dataset

Parameters:

datasetId (str) – id of the dataset

Return type:

Dataset

static deleteDataset(datasetId: str)

Delete the dataset using id of the dataset

Parameters:

datasetId (str) – id of the dataset

download_dataset(folder_path: str, project_run_entry_id: str = None, scenario_id: str = None, file_type: utils.rcclient.enums.DatasetFileType = FileType.CSV, separator: str = ',')

Download the dataset to the local file system.

Parameters:
  • folder_path (str) – folder in which the file is downloaded

  • project_run_entry_id (str, optional) – job run id

  • scenario_id (str, optional) – scenario id

  • file_type (utils.rcclient.enums.DatasetFileType, optional) – type of the file. Defaults to FileType.CSV.

  • separator (str, optional) – separator, used only for CSV files. Defaults to ','.
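As a usage sketch for download_dataset, the helper below prepares the target folder and passes a custom separator while leaving file_type at its documented CSV default. download_csv and _StubDataset are hypothetical names. Creating the folder in advance is a precaution taken in this sketch, since the reference does not say whether the method creates it. The stub only records the call and downloads nothing.

```python
import tempfile
from pathlib import Path
from typing import Any


def download_csv(dataset: Any, folder: Path, separator: str = ",") -> Path:
    """Create the target folder if needed, then call download_dataset."""
    folder.mkdir(parents=True, exist_ok=True)
    # file_type stays at its documented default (CSV), so separator applies.
    dataset.download_dataset(folder_path=str(folder), separator=separator)
    return folder


class _StubDataset:
    """Hypothetical stand-in that records the call instead of downloading."""

    def __init__(self):
        self.calls = []

    def download_dataset(self, folder_path, project_run_entry_id=None,
                         scenario_id=None, file_type=None, separator=","):
        self.calls.append({"folder_path": folder_path, "separator": separator})


with tempfile.TemporaryDirectory() as tmp:
    stub = _StubDataset()
    target = download_csv(stub, Path(tmp) / "exports" / "daily", separator=";")
    folder_was_created = target.is_dir()
```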

update_sync_options(sync_data_source_id: str, sync_data_source_options: dict)
sync()
download_dataset_schema(folder_path: str, file_name: str = None)
get_dataset_schema()
get_dataset_ontologies_data_types()
saveParquet(filePath: str, limit: int = 1000000)
saveCSV(filePath: str, limit=10000)
get_all_segments(segment_name: str = None) List[utils.rcclient.entities.segment.Segment]

Get all the created segments of the dataset

Parameters:

segment_name (str, optional) – name of the segment; if given, only that segment is fetched. Defaults to None.

Returns:

List of the segments

Return type:

List[seg_entity.Segment]
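Because get_all_segments returns a list even when segment_name is given, callers that want a single segment have to unwrap the result. A minimal sketch follows, assuming the method returns an empty list when no segment matches; that behaviour is not stated in this reference. find_segment, _StubSegment, and _StubDataset are hypothetical names, and the name attribute on the stub segment is an assumption made for the example.

```python
from typing import Any, Optional


def find_segment(dataset: Any, name: str) -> Optional[Any]:
    """Return the segment called `name`, or None if the list comes back empty."""
    segments = dataset.get_all_segments(segment_name=name)
    return segments[0] if segments else None


class _StubSegment:
    """Hypothetical stand-in for Segment; only carries a name."""

    def __init__(self, name: str):
        self.name = name


class _StubDataset:
    """Hypothetical stand-in for Dataset holding a fixed list of segments."""

    def __init__(self, names):
        self._segments = [_StubSegment(n) for n in names]

    def get_all_segments(self, segment_name=None):
        if segment_name is None:
            return list(self._segments)
        return [s for s in self._segments if s.name == segment_name]


stub = _StubDataset(["high_value", "churn_risk"])
found = find_segment(stub, "high_value")
absent = find_segment(stub, "not_defined")
```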

add_segment(name: str, description: str, condition: utils.rcclient.entities.segment.ItemExpression | utils.rcclient.entities.segment.GroupExpression | utils.rcclient.entities.segment.GlobalRefExpression | utils.rcclient.entities.segment.DataLabelExpression | utils.rcclient.entities.segment.BooleanExpression, row_limit: int = None) utils.rcclient.entities.segment.Segment

Add a segment to the dataset

Parameters:
  • name (str) – name of the segment

  • description (str) – description of the segment

  • condition (Union[ seg_entity.ItemExpression, seg_entity.GroupExpression, seg_entity.GlobalRefExpression, seg_entity.DataLabelExpression, seg_entity.BooleanExpression ])

  • row_limit (int, optional) – Defaults to None.

Returns:

created segment

Return type:

seg_entity.Segment

update_dataset(dataset_meta=None, metadata=None)