Abstract (English)

In the Gola field area, the Upper Miocene deposits of the western part of the Drava depression consist of sandstone, siltstone, marl and their transitional lithofacies. The sandstones are the main reservoir rocks. They mostly occur as lenticular bodies (KRPAN et al., 2018), which makes them difficult to correlate spatially. These deposits formed in a brackish lake environment, in the littoral zone and part of the sublittoral zone, and represent turbidites deposited as channel fills and underwater fans (TADEJ, 2011). To improve the interpretation of well logging data in the study area, several machine learning models were tested. In this study, regression analysis was used to predict acoustic log and porosity values. Regression is a supervised learning method, meaning that the model is trained on samples for which the values of the target variable are already known. Algorithmic processing of large amounts of data is only possible if the input data matrix is not singular, i.e., if its determinant is not zero. The difficulty in data preparation is that most well log data sets form a singular matrix, that is, most wells lack values over certain depth intervals (MCDONALD, 2021). For this reason, the missing values within the existing well depth intervals were filled in with values predicted by machine learning models. Various statistical methods used in data analysis, including correlation and regression, assume that the data are normally distributed. Most algorithms also assume that the variables lie roughly in the range from zero to one and are on comparable scales. For this reason, before the algorithms were tested, the data were standardized, normalized or transformed, depending on the algorithm used (SEMENIKHIN & BELOZEROV, 2019).
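The scaling step can be illustrated with a minimal sketch. It assumes the logs are stored in a NumPy array with one row per depth sample and one column per log, and that missing readings are recorded as NaN. The helper names (`standardize`, `normalize`, `log_transform`), the use of a base-10 logarithm for a skewed, resistivity-type log, and the sample values are all hypothetical. This is not the workflow used in the study.

```python
import numpy as np


def standardize(x: np.ndarray) -> np.ndarray:
    """Z-score each column (zero mean, unit standard deviation), ignoring NaNs."""
    mean = np.nanmean(x, axis=0)
    std = np.nanstd(x, axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant logs
    return (x - mean) / std


def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max scale each column to the range [0, 1], ignoring NaNs."""
    lo = np.nanmin(x, axis=0)
    hi = np.nanmax(x, axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)  # guard against constant logs
    return (x - lo) / span


def log_transform(x: np.ndarray) -> np.ndarray:
    """Base-10 log for strictly positive, right-skewed logs (NaNs pass through)."""
    return np.log10(x)


# Hypothetical example: 5 depth samples x 3 logs; NaN marks a missing reading.
logs = np.array([
    [60.0, 2.0, 2.35],
    [75.0, 20.0, 2.40],
    [90.0, np.nan, 2.45],
    [45.0, 200.0, 2.30],
    [80.0, 2.0, np.nan],
])

# Depth samples with at least one missing log (candidates for prediction).
missing_rows = np.isnan(logs).any(axis=1)

scaled = logs.copy()
scaled[:, 1] = log_transform(scaled[:, 1])  # reduce skew before scaling
z = standardize(scaled)   # for methods that assume zero-centred inputs
mm = normalize(scaled)    # for methods that expect values in [0, 1]
print(missing_rows)
```

Standardization suits methods that assume roughly normal, zero-centred inputs, while min-max normalization maps every log to the same [0, 1] range. Distance-based methods such as KNN depend on one of these steps, because a log with a large numeric range would otherwise dominate the distance.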
Based on the correlation matrix, 14 variables (logs) were selected as inputs to the prediction model. A total of six regression models were used to predict acoustic log and porosity values. The baseline model was built with the Support Vector Regression (SVR) algorithm (Fig. 1) to establish the limits of acceptable predictions. One of the accuracy measures for regression models is the coefficient of determination, R², which indicates how closely the predicted values match the measured ones. For the baseline model, the predicted values follow the trend of the measured values at shallower depths but deviate from it at greater depths, which makes the results unreliable. The most accurate model was obtained with the K-Nearest Neighbour (KNN) algorithm, with R² above 0.9 (Fig. 1). The main advantages of predicting values with trained models are the ability to fill in missing logging data and the short time needed to obtain an initial image of the subsurface. The predicted values also make it possible to choose further processing and interpretation methods more quickly.
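The selection, prediction and gap-filling steps can be sketched with NumPy alone. This is an illustrative sketch, not the authors' implementation: the data are synthetic, the correlation threshold (|r| > 0.3) and k = 5 are assumed values, the SVR baseline is omitted, and the helper functions are hypothetical. R² is computed as 1 − SS_res/SS_tot. KNN predicts each missing sample as the mean target value of its k nearest training samples in the (already standardized) input space.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def knn_predict(x_train, y_train, x_query, k=5):
    """Predict each query as the mean target of its k nearest training samples."""
    # Pairwise Euclidean distances, shape (n_query, n_train).
    d = np.linalg.norm(x_query[:, None, :] - x_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def select_by_correlation(x, y, threshold=0.3):
    """Keep the input columns whose |Pearson r| with the target exceeds threshold."""
    r = np.array([np.corrcoef(x[:, j], y)[0, 1] for j in range(x.shape[1])])
    return np.flatnonzero(np.abs(r) > threshold), r


# Synthetic stand-in for four standardized logs; NOT data from the Gola field.
rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=(n, 4))
# Hypothetical target ("acoustic log") depends on column 0 (non-linearly) and
# column 1 (linearly); columns 2 and 3 carry no information about it.
y = np.sin(x[:, 0]) + x[:, 1] + 0.1 * rng.normal(size=n)

has_target = np.arange(n) < 300           # samples where the target was measured
y_obs = np.where(has_target, y, np.nan)   # simulate a gap in the target log

cols, r = select_by_correlation(x[has_target], y_obs[has_target])
pred = knn_predict(x[has_target][:, cols], y_obs[has_target],
                   x[~has_target][:, cols], k=5)

y_filled = y_obs.copy()
y_filled[~has_target] = pred              # fill the gap with KNN predictions

# Because the data are synthetic, the true values in the gap are known.
print("selected columns:", cols.tolist())
print("R^2 in the gap:", round(r2_score(y[~has_target], pred), 3))
```

The R² printed here reflects only the synthetic data and says nothing about the field results. The brute-force distance matrix keeps the sketch short. For long well logs, a tree-based neighbour search would scale better.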