Data Quality

Label Trust Estimate

Label Trust Estimate identifies records in the dataset that are potentially mislabeled. You can download the dataset, together with these estimates, using the MarkovML SDK.


Prerequisite: the dataset must be registered with MarkovML and must show an overall label quality estimate at the top right. You can find the steps to register a dataset with MarkovML here. Currently, only text datasets are supported.

Code

import markov

dataset = markov.dataset.get_by_name(dataset_name="Sentiment Analysis Tweets")

# Access the data quality information
data_quality = dataset.quality

# Access the data quality metrics as a DataFrame
data_quality.df

# Retrieve a direct download link for the data quality DataFrame
data_quality.url

Sample Result

The DataFrame contains the following columns:

  • is_label_issue: A boolean indicating whether the record's label is flagged as a potential issue (that is, possibly mislabeled).

  • label_quality: A numerical score representing the estimated quality of the record's label.

  • The remaining columns are the original label (target) and feature columns that the user selected during dataset registration.
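A common next step is to review the flagged records. The sketch below assumes `data_quality.df` is a pandas DataFrame with the columns listed above; it uses a small hypothetical stand-in with made-up `text` and `sentiment` columns, and it assumes that a lower `label_quality` score means a less trustworthy label. The helper `flagged_records` is illustrative, not part of the MarkovML SDK. In practice you would pass `data_quality.df` in place of `df`.

```python
import pandas as pd

# Hypothetical stand-in for data_quality.df (assumed to be a pandas DataFrame).
# "text" and "sentiment" are illustrative feature/target column names only.
df = pd.DataFrame(
    {
        "text": ["great product", "awful service", "love it", "not bad"],
        "sentiment": ["positive", "positive", "positive", "negative"],
        "is_label_issue": [False, True, False, True],
        "label_quality": [0.97, 0.08, 0.91, 0.35],
    }
)


def flagged_records(quality_df: pd.DataFrame) -> pd.DataFrame:
    """Return rows flagged as potential label issues, lowest score first.

    Assumption: a lower label_quality score means a less trustworthy label.
    """
    flagged = quality_df[quality_df["is_label_issue"]]
    return flagged.sort_values("label_quality", ascending=True)


issues = flagged_records(df)
share = df["is_label_issue"].mean()  # fraction of records flagged
print(f"{len(issues)} of {len(df)} records flagged ({share:.0%})")
print(issues[["text", "sentiment", "label_quality"]])
```

Sorting ascending puts the records most likely to need relabeling at the top, which is a convenient order for manual review.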
