# Datasets & Data Families

<figure><img src="https://3247973094-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F2vGu2QFqzSu6XdcGUFmu%2Fuploads%2FpkaRkDm8MqiZ0sU3KPXD%2FMarkovML%20Dataset.png?alt=media&#x26;token=6f76b3a2-b9ee-4585-8569-f1496b6e41b4" alt=""><figcaption><p>MarkovML Datasets may be segmented to distinguish data used to train, test, or validate a model.</p></figcaption></figure>

A **Dataset** is a collection of examples. Each example, in turn, comprises one or more **variables** and possibly a **label** or **target**. When you register your datasets with MarkovML, MarkovML analyzes your data and helps you understand key characteristics such as value distributions, column correlations, and the frequency of empty values.
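To make the terms above concrete, here is a minimal sketch in plain Python of a dataset as a list of examples, along with two of the characteristics mentioned: empty value frequency and the label distribution. It is an illustration only and does not use the MarkovML SDK; the column names, the `examples` data, and both helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical toy dataset: each example is a dict of variables plus a "label".
examples = [
    {"age": 34, "income": 52000, "label": "approved"},
    {"age": None, "income": 48000, "label": "denied"},
    {"age": 29, "income": None, "label": "approved"},
    {"age": 41, "income": 61000, "label": "approved"},
]


def empty_value_frequency(rows):
    """Return the fraction of missing (None) values for each column."""
    columns = rows[0].keys()
    return {col: sum(r[col] is None for r in rows) / len(rows) for col in columns}


def label_distribution(rows, label_key="label"):
    """Count how often each label value occurs."""
    return Counter(r[label_key] for r in rows)


missing = empty_value_frequency(examples)
labels = label_distribution(examples)
print(missing)
print(labels)
```

In this toy data, one of the four `age` values and one of the four `income` values are missing, so each of those columns has an empty value frequency of 0.25.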

A single MarkovML Dataset can be **segmented** or **unsegmented**. ML engineers frequently divide a dataset into segments used to train, test, and validate a model. MarkovML lets you specify these segments and provides insights into how your **train**, **test**, and **validate** segments compare.
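As a rough illustration of what segmenting means, the sketch below shuffles a dataset and splits it into train, test, and validate segments. It is plain Python, not the MarkovML API; the `split_segments` function, the 80/10/10 ratios, and the fixed seed are assumptions chosen for the example.

```python
import random


def split_segments(rows, train=0.8, test=0.1, seed=0):
    """Shuffle rows and split them into train/test/validate segments.

    Whatever is left after the train and test shares goes to validate.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return {
        "train": shuffled[:n_train],
        "test": shuffled[n_train:n_train + n_test],
        "validate": shuffled[n_train + n_test:],
    }


# Stand-in for 100 examples; real rows would be records, not integers.
segments = split_segments(range(100))
print({name: len(rows) for name, rows in segments.items()})
```

Every example lands in exactly one segment, which is what makes a later comparison between segments meaningful.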

<figure><img src="https://3247973094-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F2vGu2QFqzSu6XdcGUFmu%2Fuploads%2FqhiW7T3rhgRJS0UMXM9y%2FScreenshot%202023-05-01%20at%2010.45.52%20AM.png?alt=media&#x26;token=9c08e0f6-31b9-455b-b21f-31987ed56597" alt=""><figcaption><p>A Data Family is a collection of datasets.</p></figcaption></figure>

#### Data Family

To help keep your datasets organized, each dataset you register with MarkovML is associated with a *data family*. A **data family** is a set of one or more datasets that share a similar schema. Grouping datasets this way makes it much easier to locate a particular dataset when needed.
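The idea of grouping by schema can be sketched as follows. This is a simplified, hypothetical illustration in plain Python, not how MarkovML assigns data families: it treats two datasets as belonging together only when their column names match exactly (ignoring order), which is stricter than "similar". The dataset names and the `group_by_schema` helper are invented for the example.

```python
from collections import defaultdict

# Hypothetical datasets, described only by name and column names.
datasets = {
    "loans_2022": ["age", "income", "label"],
    "loans_2023": ["income", "age", "label"],
    "reviews_v1": ["text", "rating"],
}


def group_by_schema(named_columns):
    """Group dataset names by their set of column names (order-insensitive)."""
    families = defaultdict(list)
    for name, columns in named_columns.items():
        families[frozenset(columns)].append(name)
    return dict(families)


families = group_by_schema(datasets)
for schema, members in families.items():
    print(sorted(schema), "->", members)
```

Here the two loan datasets fall into one family because they have the same columns, while the reviews dataset forms a family of its own.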
