Compare Datasets
Compare datasets using the MarkovML Python SDK.
MarkovML lets you compare a primary dataset against multiple other datasets. The comparison consists of the following:
Basic information, such as the number of records in each dataset, data family names, and dataset source
Class distribution for the datasets
Dataset segment similarity, computed from statistical measures
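To make the class distribution item concrete, here is a minimal sketch in plain Python of how two datasets' label distributions could be compared. It is not MarkovML's implementation and does not use the SDK. The helper names are hypothetical, and total variation distance is only an illustrative choice, since this page does not say which statistical measures MarkovML computes.

```python
from collections import Counter


def class_distribution(labels):
    """Return the fraction of records belonging to each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two distributions.

    Illustrative measure only: 0.0 means identical class mixes,
    1.0 means the two datasets share no classes.
    """
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Toy label columns standing in for a primary and a secondary dataset.
primary_labels = ["spam", "ham", "ham", "ham"]
secondary_labels = ["spam", "spam", "ham", "ham"]

primary_dist = class_distribution(primary_labels)
secondary_dist = class_distribution(secondary_labels)
distance = total_variation_distance(primary_dist, secondary_dist)
```

A small distance suggests the secondary dataset has a class mix similar to the primary one. A large distance is a hint to look at the per-class breakdown before combining or swapping the datasets.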
Steps
Follow the Register a Dataset guide to register your datasets before starting a comparison. Only registered datasets can be compared, and you will need their dataset IDs (ds_id) to trigger a comparison.
Trigger a comparison run between a primary dataset and multiple secondary datasets.
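Before triggering the run, it can help to check the dataset IDs you plan to pass in. The sketch below is plain Python and does not call the MarkovML SDK. `ComparisonRequest` and `build_comparison_request` are hypothetical names introduced for illustration, and the ID strings are placeholders. Refer to the SDK reference for the actual call that starts a comparison.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ComparisonRequest:
    """Hypothetical container for one comparison run (not an SDK type)."""
    primary_ds_id: str
    secondary_ds_ids: List[str]


def build_comparison_request(
    primary_ds_id: str, secondary_ds_ids: Sequence[str]
) -> ComparisonRequest:
    """Validate dataset IDs and return a cleaned request.

    Drops empty IDs, duplicates, and any secondary ID equal to the primary,
    since comparing a dataset with itself adds nothing.
    """
    if not primary_ds_id:
        raise ValueError("A primary dataset ID (ds_id) is required.")

    seen = set()
    cleaned = []
    for ds_id in secondary_ds_ids:
        if ds_id and ds_id != primary_ds_id and ds_id not in seen:
            seen.add(ds_id)
            cleaned.append(ds_id)

    if not cleaned:
        raise ValueError("At least one secondary dataset ID is required.")
    return ComparisonRequest(primary_ds_id, cleaned)


# Placeholder IDs; real ds_id values come from dataset registration.
request = build_comparison_request(
    "ds_primary", ["ds_a", "ds_b", "ds_a", "ds_primary"]
)
```

The cleaned `request` holds one primary ID and a de-duplicated list of secondary IDs, which matches the shape of a comparison described in this step.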
The comparison takes some time to finish. Once it is complete, you can see the results on the Runs page. You will also receive an email notification when the comparison is complete.
View dataset comparison jobs on the Runs page