Designing a Data Warehouse for Collected Data About User Activity in Social Networks Using Elasticsearch

Iryna Mysiuk

Issue: 7/2023
Periodical: Path of Science
DOI: 10.22178/pos.94-13

Keywords: social networks; data warehouse; data analytics; big data processing; system design

Abstract: In this paper, a data warehouse is designed to store data collected from social networks. The paper describes creating indexes with data and selecting a configuration with the appropriate number of shards and replicas, along with the primary states of the cluster and the possibilities for scaling it. It describes the features of working with the non-relational Elasticsearch database when handling data on user activity in social network posts. Facebook and Instagram were chosen as the social networks for analysis. The paper describes the advantages and disadvantages of using such a data store compared to Apache Kafka.
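
As an illustration of the shard and replica configuration mentioned above, the following minimal sketch builds the JSON body that Elasticsearch's create-index API accepts. The field names (`network`, `post_id`, `likes`, `comments`, `created_at`) and the values of three shards and one replica are assumptions chosen for the example; they are not taken from the paper.

```python
import json


def build_index_body(shards: int, replicas: int) -> dict:
    """Build a create-index request body with explicit shard/replica counts."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": shards,      # primary shards, fixed at creation
            "number_of_replicas": replicas,  # copies per primary, adjustable later
        },
        # Hypothetical mapping for social network post activity.
        "mappings": {
            "properties": {
                "network": {"type": "keyword"},
                "post_id": {"type": "keyword"},
                "likes": {"type": "integer"},
                "comments": {"type": "integer"},
                "created_at": {"type": "date"},
            }
        },
    }


def total_shard_copies(shards: int, replicas: int) -> int:
    """Each primary shard is stored once plus once per replica."""
    return shards * (1 + replicas)


body = build_index_body(shards=3, replicas=1)
print(json.dumps(body, indent=2))
print("shard copies to allocate:", total_shard_copies(3, 1))
```

The body would be sent with `PUT /<index-name>`. Because a replica is never allocated to the same node as its primary, a single-node cluster with one or more replicas configured stays in the yellow state until another node joins.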

Existing data insertion Application Programming Interfaces (APIs) and data visualisation tools integrated with Elasticsearch are analysed. The study describes the use of the Bulk API to insert many records into the database at once. The designed data warehouse uses Kibana, a data visualisation and analytics tool integrated with the selected database. The paper also shows the ability to insert and view logs using Elasticsearch, Logstash, and Kibana (the ELK stack). Data ingestion was tested by sending logs to the database using Beats. The obtained results can help implement a system for analysing user activity from social network data with Elasticsearch as a central component.
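
To make the Bulk API step concrete, the sketch below assembles the newline-delimited JSON (NDJSON) payload that the `_bulk` endpoint expects: one action line followed by one document line per record, with a trailing newline. The index name `social-posts` and the sample records are hypothetical, and the sketch only builds the payload rather than contacting a cluster.

```python
import json
from typing import Iterable


def build_bulk_payload(index_name: str, docs: Iterable[dict]) -> str:
    """Return an NDJSON body that indexes every document into index_name."""
    lines = []
    for doc in docs:
        # Action line: tells Elasticsearch what to do with the next line.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, serialised on a single line.
        lines.append(json.dumps(doc))
    if not lines:
        raise ValueError("at least one document is required")
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical sample records, not data from the paper.
sample_posts = [
    {"network": "facebook", "post_id": "p1", "likes": 12, "comments": 3},
    {"network": "instagram", "post_id": "p2", "likes": 40, "comments": 7},
]

payload = build_bulk_payload("social-posts", sample_posts)
print(payload, end="")
```

The resulting string would be sent as the body of `POST /_bulk` with the header `Content-Type: application/x-ndjson`. Batching records this way replaces one HTTP round trip per document with one per batch.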