MS Data Alliance

MSDA Core Dataset

One of the challenges in promoting the use of real-world data in multiple sclerosis (MS) is the heterogeneity within and between MS real-world data sources. Each data source collects data in its own data acquisition system, for the set of variables that it or its parent initiative has defined.

As a result, MS real-world data sources differ in content and, especially, in how that content is represented semantically and syntactically. This heterogeneity complicates collaboration between data sources, particularly at large scale, as in global joint analyses. There is therefore a particular need for alignment in real-world data collection in MS.

Recently, the MSDA, together with a dedicated task force, agreed on a global dataset recommendation that covers the core data elements in MS: the MSDA Core Dataset (CDS). The MSDA CDS aims to be as lean as possible while still representing the population and clinical care of people with MS well. It has to cover the key data points for care and research while keeping data collection feasible, so that high data coverage can be achieved. No single, have-it-all core dataset can serve every research question or purpose; such a dataset would grow without bound. The decision was therefore made to develop the core dataset in versions, so that updates remain possible. The MSDA Core Dataset v2022 is accordingly not an ultimate, set-in-stone definition.

The CDS represents the common denominator of existing initiatives: a list of the variables that recur across them. This list is research-question agnostic, i.e. it forms the basic level of data collection (the core dataset). Additional layers, e.g. for data needed for other research topics or by regulators, can be built on top of that core level.

The CDS also serves as a guideline for existing and emerging registries and real-world MS cohorts, helping to reduce heterogeneity as early as possible in the data collection process. It is not a minimal dataset: it does not claim that the defined variables must all be implemented exactly as described to meet a "bare minimum" of (good) real-world data collection in MS. Rather, it is a landscaping of existing datasets, experiences and guidelines, tied together with a concrete recommendation on the core content of MS care and on its structure and format, in the form of a data dictionary.
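
To illustrate how a data dictionary can reduce syntactic heterogeneity at the point of collection, the following is a minimal Python sketch. It is not the MSDA CDS data dictionary: the variable names (`sex`, `date_of_birth`, `edss_score`), the allowed values and the formats are hypothetical, chosen only to show the idea of checking a record against agreed definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Variable:
    """One entry of a (hypothetical) data dictionary."""
    name: str
    description: str
    parse: Callable[[str], object]  # raises ValueError on invalid input


def parse_sex(raw: str) -> str:
    # Hypothetical value set; a real dictionary defines its own codes.
    allowed = {"female", "male", "unknown"}
    if raw not in allowed:
        raise ValueError(f"expected one of {sorted(allowed)}, got {raw!r}")
    return raw


def parse_iso_date(raw: str) -> date:
    # Assumes the dictionary prescribes ISO 8601 calendar dates (YYYY-MM-DD).
    return date.fromisoformat(raw)


def parse_edss(raw: str) -> float:
    # Only the overall 0 to 10 range of the EDSS scale is checked here.
    value = float(raw)
    if not 0.0 <= value <= 10.0:
        raise ValueError(f"expected a value between 0 and 10, got {raw!r}")
    return value


# Illustrative entries only; not taken from the MSDA Core Dataset.
DATA_DICTIONARY = [
    Variable("sex", "Sex of the person with MS", parse_sex),
    Variable("date_of_birth", "Date of birth", parse_iso_date),
    Variable("edss_score", "EDSS score at the visit", parse_edss),
]


def validate_record(record: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for var in DATA_DICTIONARY:
        raw = record.get(var.name)
        if raw is None:
            problems.append(f"{var.name}: missing")
            continue
        try:
            var.parse(raw)
        except ValueError as exc:
            problems.append(f"{var.name}: {exc}")
    return problems
```

A record such as `{"sex": "F", "date_of_birth": "12/03/1980"}` would be flagged on all three variables under these assumed definitions, which is the kind of mismatch that otherwise only surfaces when data sources are pooled.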