Which Data Format To Store Scientific Data Should I Use? A Performance Analysis

Daniel Gecášek, Michal Solanik, Ján Genči

Issue: 3/2022
Journal: Acta Electrotechnica et Informatica
DOI: 10.2478/aei-2022-0015

Keywords: scientific data; data storage; compression; performance

Abstract: A lot of scientific work is dedicated to data analysis. Most of the analyzed data, such as data from space missions, are structured. The choice of data format can affect several characteristics: the read/write speed of standard files, the read/write speed of small files, and the read/write speed of compressed data formats. In this paper, we analyze binary data formats, propose test types and testing methods, and compare the formats' performance with a human-readable text format. We also discuss the compressed and uncompressed modes available for data formats such as HDF5 and netCDF. When precision is disregarded, the best data format from the size perspective is lossy HDF5 without compression. Lossless HDF5 without compression shows the best speed performance. Lossy HDF5 without compression offers the best balance between size reduction and speed. However, for specific criteria and file types, there may be better candidates, as detailed in the conclusion.