Monitoring of apartment prices in the Czech Republic through parsing a web advertising server

Alena POZDÍLKOVÁ, Jaroslav MAREK, Marie NEDVĚDOVÁ

Issue: 1/2020
Journal: Acta Electrotechnica et Informatica
DOI: 10.15546/aeei-2020-0002

Keywords: web page parsing, real estate market, time series, apartment prices, floor area, purchased price, cluster analysis

Abstract: Time series of apartment prices in the Czech Republic are available only in the partial statistics of the Statistical Office. Apartment prices are presented mainly in articles and commentary by real estate agents. This lack of data leads to a small number of statistically oriented publications on the real estate market. The main aim of our paper is thus to introduce a software solution for parsing real estate websites. We can retrieve only asking prices from advertisements; actual sale prices are not available. By automatic polling, we obtain the floor area of each advertised apartment and its asking purchase price. A Python script was written to retrieve data from sreality.cz. A MongoDB database is used to store the ads, and new ads are saved directly to the database. Then the daily average apartment price per square meter is calculated for each municipality. The filtered data can be displayed or exported to a file via the web interface. In the statistical analyses, we present graphs showing the development of apartment prices and the number of advertisements in various municipalities of the Czech Republic in the period 09/2018 – 12/2019. Next, we address the clustering of municipalities with regard to the similarity of their relative price changes.
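
The daily aggregation step mentioned in the abstract could look roughly like the following minimal Python sketch. It is not the authors' script: the record fields (`municipality`, `date`, `price_czk`, `floor_area_m2`) and the function name are hypothetical, the sample records are made up, and the average is assumed to be the mean of per-ad price-to-area ratios (the abstract does not say whether that or total price divided by total area is used). Ads are passed in as plain dictionaries, so no database is needed to run it.

```python
from collections import defaultdict
from statistics import mean


def daily_avg_price_per_m2(ads):
    """Return {(municipality, date): mean asking price per square meter}.

    Assumption: each ad is a dict with the hypothetical keys
    'municipality', 'date', 'price_czk' and 'floor_area_m2'.
    """
    ratios = defaultdict(list)
    for ad in ads:
        area = ad.get("floor_area_m2")
        price = ad.get("price_czk")
        # Skip ads without a usable (positive) price or floor area.
        if not area or not price or area <= 0 or price <= 0:
            continue
        ratios[(ad["municipality"], ad["date"])].append(price / area)
    # Average the per-ad ratios within each municipality and day.
    return {key: mean(values) for key, values in ratios.items()}


# Made-up records for illustration only; not data from the paper.
sample_ads = [
    {"municipality": "Pardubice", "date": "2019-12-01",
     "price_czk": 3_000_000, "floor_area_m2": 60},
    {"municipality": "Pardubice", "date": "2019-12-01",
     "price_czk": 2_000_000, "floor_area_m2": 50},
    # Zero floor area: this ad is skipped by the filter above.
    {"municipality": "Brno", "date": "2019-12-01",
     "price_czk": 4_500_000, "floor_area_m2": 0},
]

averages = daily_avg_price_per_m2(sample_ads)
```

Averaging per-ad ratios gives every advertisement equal weight regardless of apartment size; dividing total price by total area would instead weight large apartments more heavily, so the choice should match whichever definition the analysis intends.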