István Pölöskei

Spark-Based Digital Factory Design

Issue: 2/2022
Periodical: Acta Electrotechnica et Informatica
DOI: 10.2478/aei-2022-0008

Keywords: Spark, big data, pipeline, cloud, ETL


Abstract: Big data processing often follows the paradigm of parallelism, computing directly on top of distributed data storage. Existing big data workflows unify data processing practices to exploit the cloud's native computational potential and offer advanced machine learning and BI capabilities. Spark, an open-source, massively parallel, in-memory data processing framework, represents the current state of the art. The primary approach is to break a job down into granular tasks, which enables parallel execution. In the discussed case study, IoT-cloud solutions convert plant data into an analyzable form, allowing further machine learning modules to produce added value. Cloud-based components are introduced to maximize the efficiency of data processing and accumulation. Based on the insights gained from the data, appropriate operative actions can be taken. Cost and performance optimization methods are also discussed. By achieving a higher degree of digitalization, control over production increased.
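As a rough illustration of the task-decomposition idea in the abstract, the following plain-Python sketch splits a dataset into partitions, runs one small task per partition, and merges the partial results. This is an analogy only, not Spark's API or the author's implementation; the function names (`split_into_partitions`, `partition_task`, `parallel_mean`) are hypothetical. Spark similarly splits data into partitions and schedules one task per partition, but it distributes those tasks across cluster executors rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_partitions(records: Sequence[float], num_partitions: int) -> List[List[float]]:
    """Hypothetical helper: split input into contiguous, roughly equal partitions."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    size, remainder = divmod(len(records), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` partitions receive one extra record.
        end = start + size + (1 if i < remainder else 0)
        partitions.append(list(records[start:end]))
        start = end
    return partitions


def partition_task(partition: List[float]) -> Tuple[float, int]:
    """One granular task: a partial (sum, count) over its own partition only."""
    return sum(partition), len(partition)


def parallel_mean(records: Sequence[float], num_partitions: int = 4) -> float:
    """Run one task per partition concurrently, then merge the partial results."""
    partitions = split_into_partitions(records, num_partitions)
    # Threads keep the sketch self-contained; real CPU parallelism would need
    # processes or a cluster executor, but the decomposition is the same.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partition_task, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no records to average")
    return total / count


if __name__ == "__main__":
    # Hypothetical sensor readings from a plant.
    readings = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(parallel_mean(readings, num_partitions=4))
```

Merging partial (sum, count) pairs rather than partial means keeps the final result exact regardless of how unevenly the records are partitioned.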