utils.rcclient.entities.dataset
Module Contents
Classes
Attributes
- utils.rcclient.entities.dataset.logger
- class utils.rcclient.entities.dataset.DatasetMeta
- entity_type: str = 'BASE'
- entity_ontology: str = 'NONE'
- entity_view_type: str
- encoding: str
- separator: str
- header: str
- data_type_map: str
- ontology_map: str
- classmethod from_(res_dto)
- class utils.rcclient.entities.dataset.Dataset
- id: str
- name: str
- project_id: str
- display_name: str
- description: str
- file_path: str
- data_source_id: str
- data_source_options: str
- datasetMeta: DatasetMeta
- entityMeta: DatasetMeta
- dataset_meta: DatasetMeta
- metadata: str
- markdown: str
- icon: str
- image: str
- _dataset_service: ClassVar[utils.rcclient.services.dataset_service.DatasetService]
- __post_init__()
- classmethod from_(res_dto)
- from_res(res_dto)
- classmethod create(name, project_id, description, display_name, file_path, data_source_id, data_source_options, metadata, dataset_meta, force_upload, markdown, icon, image)
- askai(query: str, chart_name: str = None)
- add_markdown(markdown: str)
- ensure_schema_download(save_to)
- getData(num_rows=10, scenario_id=None, project_run_entry_id=None) pandas.DataFrame
Get sample data of the dataset
- Parameters:
num_rows (int, optional) – Number of rows to show; the maximum is 100. Defaults to 10. Pass -1 to get the full data.
scenario_id (string, optional)
project_run_entry_id (string, optional)
- Returns:
pd.DataFrame
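The `num_rows` semantics above (default 10, cap of 100, -1 for the full data) can be sketched with a minimal stand-in. The real `getData` fetches a `pandas.DataFrame` from the dataset service; the stub class below is purely illustrative and models rows as plain dicts to keep the sketch dependency-free.

```python
# Illustrative stand-in for Dataset.getData's documented row-limit
# semantics (default 10, cap 100, -1 = full data). The real method
# returns a pandas.DataFrame from the dataset service.
class _DatasetStub:
    def __init__(self, rows):
        self._rows = rows

    def getData(self, num_rows=10, scenario_id=None, project_run_entry_id=None):
        if num_rows == -1:                       # -1 means "return everything"
            return list(self._rows)
        return self._rows[: min(num_rows, 100)]  # capped at 100 rows

ds = _DatasetStub([{"x": i} for i in range(250)])
print(len(ds.getData()))               # 10  (default)
print(len(ds.getData(num_rows=500)))   # 100 (capped)
print(len(ds.getData(num_rows=-1)))    # 250 (full data)
```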
- get_full_data(scenario_id=None, project_run_entry_id=None) pandas.DataFrame
Get full data of the dataset
- Parameters:
scenario_id (string, optional)
project_run_entry_id (string, optional)
- Returns:
pd.DataFrame
- getCols(scenario_id=None, project_run_entry_id=None) List[str]
Get column names of the dataset
- Returns:
List of all the column names
- Return type:
List[str]
- classmethod __deserialize_res(res_json, response_class)
- refresh()
- get_stats_status()
- static getDataset(datasetId: str) Dataset
Get a dataset object using the id of the dataset
- Parameters:
datasetId (str) – id of the dataset
- Return type:
Dataset
- static deleteDataset(datasetId: str)
Delete the dataset using the id of the dataset
- Parameters:
datasetId (str) – id of the dataset
- download_dataset(folder_path: str, project_run_entry_id: str = None, scenario_id: str = None, file_type: utils.rcclient.enums.DatasetFileType = FileType.CSV, separator: str = ',')
Downloads the dataset to the local file system
- Parameters:
folder_path (str) – folder in which the file needs to be downloaded
project_run_entry_id (str, optional) – job run id
scenario_id (str, optional) – scenario id
file_type (DatasetFileType, optional) – type of the file. Defaults to FileType.CSV.
separator (str, optional) – separator, used only in the CSV case. Defaults to ','.
- update_sync_options(sync_data_source_id: str, sync_data_source_options: dict)
- sync()
- download_dataset_schema(folder_path: str, file_name: str = None)
- get_dataset_schema()
- get_dataset_ontologies_data_types()
- saveParquet(filePath: str, limit: int = 1000000)
- saveCSV(filePath: str, limit=10000)
- get_all_segments(segment_name: str = None) List[utils.rcclient.entities.segment.Segment]
Get all the created segments of the dataset
- Parameters:
segment_name (str, optional) – name of the segment, if given, it fetches only that segment. Defaults to None.
- Returns:
List of the segments
- Return type:
List[seg_entity.Segment]
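The name-filter semantics documented above (no `segment_name` returns every segment, a name returns only the matching one) can be sketched with plain strings standing in for `Segment` entities; the real method returns `seg_entity.Segment` objects fetched from the service.

```python
# Sketch of get_all_segments' filter semantics: with no segment_name
# every segment is returned, with a name only matching segments are.
# Segments are modeled as bare name strings for illustration.
def get_all_segments(segments, segment_name=None):
    if segment_name is None:
        return list(segments)
    return [s for s in segments if s == segment_name]

all_segs = ["high_value", "churn_risk", "new_users"]
print(get_all_segments(all_segs))                # all three names
print(get_all_segments(all_segs, "churn_risk"))  # ['churn_risk']
```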
- add_segment(name: str, description: str, condition: utils.rcclient.entities.segment.ItemExpression | utils.rcclient.entities.segment.GroupExpression | utils.rcclient.entities.segment.GlobalRefExpression | utils.rcclient.entities.segment.DataLabelExpression | utils.rcclient.entities.segment.BooleanExpression, row_limit: int = None) utils.rcclient.entities.segment.Segment
Add a segment to the dataset
- Parameters:
name (str) – name of the segment
description (str) – description of the segment
condition (Union[ seg_entity.ItemExpression, seg_entity.GroupExpression, seg_entity.GlobalRefExpression, seg_entity.DataLabelExpression, seg_entity.BooleanExpression ])
row_limit (int, optional) – Defaults to None.
- Returns:
created segment
- Return type:
seg_entity.Segment
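The overall shape of `add_segment` can be sketched as below: given a name, a description, and a condition expression, register a segment and return the created entity. The `Segment` dataclass and the dict-shaped condition here are assumptions; the real method builds a `seg_entity.Segment` through the service, and the expression types listed above have their own constructors.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-in for the add_segment flow; the real method
# creates a seg_entity.Segment via the dataset service.
@dataclass
class Segment:
    name: str
    description: str
    condition: object
    row_limit: Optional[int] = None

def add_segment(segments, name, description, condition, row_limit=None):
    seg = Segment(name, description, condition, row_limit)
    segments.append(seg)
    return seg  # the created segment is returned to the caller

segs = []
created = add_segment(
    segs,
    "high_value",
    "orders over 1000",
    {"col": "total", "op": ">", "value": 1000},  # hypothetical condition shape
)
print(created.name, created.row_limit)  # high_value None
```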
- update_dataset(dataset_meta=None, metadata=None)