Three Metric-Based Method for Data Compatibility Calculation

Daniel Vodňanský


Issue: 1/2021
Periodical: Acta Informatica Pragensia
DOI: 10.18267/j.aip.145

Keywords: Data metrics, Amount of information, Metadata, Relational database, XML, JSON, RDF, Ontology, Transformation, Structuredness, Hierarchicality, Normalization, Visualization


Abstract: This article analyzes ways of calculating characteristics of data and of the most common data structure types that allow comparing them with each other or over time. To achieve this, it studies the key aspects of the relational database, XML, JSON and RDF structure types and compares these data structure types with multiple isolated approaches to measuring data quality and other data characteristics. The article has two goals: the calculation method itself and a storage structure for the calculated values. It presents a method for characterizing data and data structure types based on three metrics: the amount of structuredness, the amount of hierarchicality and the amount of information. This triad of metrics allows comparing various data sets (objects) with one another, for example to evaluate the complexity of transforming data from one data object to another, as well as comparing data with data structure types (such as those listed above). Based on this three-metric vector, a method for calculating the compatibility between data and a data structure type is proposed. This method can help select the most compatible data format for existing data. The calculated metric values can also detect non-optimal storage design and classify data transformations. The method was evaluated in a case study, which demonstrated its usability on a demonstration data set. It can be used in data modelling to help select the optimal data structure type, to design data transformation processes and to optimize existing data storages.
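The abstract does not give the compatibility formula, so the following is only a minimal sketch of the general idea of comparing a data set's three-metric vector with the vectors of candidate data structure types. It assumes that each metric is normalized to [0, 1] and it swaps in a plain Euclidean-distance score; `MetricVector`, `compatibility`, `most_compatible` and the profile names and values are hypothetical illustrations, not the article's method or measured values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricVector:
    """Hypothetical three-metric vector; each value is ASSUMED normalized to [0, 1]."""
    structuredness: float
    hierarchicality: float
    information: float

    def __post_init__(self) -> None:
        # Reject values outside the assumed normalized range.
        for value in self.as_tuple():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"metric value {value} outside [0, 1]")

    def as_tuple(self) -> tuple[float, float, float]:
        return (self.structuredness, self.hierarchicality, self.information)


def compatibility(data: MetricVector, structure_type: MetricVector) -> float:
    """Illustrative score in [0, 1]: 1 minus the normalized Euclidean distance.

    This is an ASSUMED stand-in, not the formula proposed in the article.
    """
    distance = math.dist(data.as_tuple(), structure_type.as_tuple())
    # sqrt(3) is the largest possible distance inside the unit cube.
    return 1.0 - distance / math.sqrt(3)


def most_compatible(data: MetricVector, profiles: dict[str, MetricVector]) -> str:
    """Return the name of the profile with the highest illustrative score."""
    return max(profiles, key=lambda name: compatibility(data, profiles[name]))


# Placeholder profiles with made-up numbers, NOT values from the article.
profiles = {
    "format_a": MetricVector(1.0, 0.0, 0.5),
    "format_b": MetricVector(0.2, 0.9, 0.5),
}
data_set = MetricVector(0.9, 0.1, 0.5)
print(most_compatible(data_set, profiles))
```

A distance-based score is used here only because it is the simplest way to turn two vectors into one comparable number. Weighting the metrics differently, or leaving the amount of information unnormalized, would require a different scoring function.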