ПРИМЕНЕНИЕ МЕТОДА K-СРЕДНИХ В ЗАДАЧЕ ОЦЕНКИ ХАРАКТЕРИСТИК ПРОЦЕССА ДЛЯ ВЕБ-ПРИЛОЖЕНИЙ

Standard

ПРИМЕНЕНИЕ МЕТОДА K-СРЕДНИХ В ЗАДАЧЕ ОЦЕНКИ ХАРАКТЕРИСТИК ПРОЦЕССА ДЛЯ ВЕБ-ПРИЛОЖЕНИЙ. / Евстратов, Виктор Владимирович ; Ананьевский, Михаил Сергеевич.

In: НАУЧНО-ТЕХНИЧЕСКИЙ ВЕСТНИК ИНФОРМАЦИОННЫХ ТЕХНОЛОГИЙ, МЕХАНИКИ И ОПТИКИ, Vol. 20, No. 5, 2020, p. 755-760.

Research output: Contribution to journal › Article › peer-review

BibTeX

@article{db99da3268a54cc5ad7b95f1f9c79a2d,

title = "ПРИМЕНЕНИЕ МЕТОДА K-СРЕДНИХ В ЗАДАЧЕ ОЦЕНКИ ХАРАКТЕРИСТИК ПРОЦЕССА ДЛЯ ВЕБ-ПРИЛОЖЕНИЙ",

abstract = "Subject of Research. The paper studies the problem of estimating process characteristics for the particular case of predicting user activity in online computer games. Various machine learning methods are considered, and the advantages of clustering-based approaches are identified. A variety of metrics for estimating clustering quality is studied. Method. A clustering-based approach to the estimation of process characteristics was developed on the basis of a hypothesis proposed during the preliminary analysis of user activity data. Data were collected on the activity of users for whom the values to be predicted were already known. Each user was represented as a pair of vectors: the first vector corresponded to the user{\textquoteright}s first days of activity, and the second one corresponded to the days with predicted performance. The vectors representing users{\textquoteright} activity in the first days were used as training data for the K-means algorithm. A specially developed entropy-like loss function was used to find a value of K suitable for the problem under consideration. The clusters were matched with vectors of predicted process characteristics averaged over all users in the cluster. These matches were used as predictions of new users{\textquoteright} characteristics. Main Results. An approach to determining a suitable number of clusters is proposed, taking into account the specifics of the considered data. A numerical experiment is carried out, demonstrating the applicability of the developed method. Practical Relevance. The proposed approach allows for the simultaneous prediction of multiple characteristics of online-game users and, therefore, for the solution of various planning and analytics problems during online-game development. For example, the method developed in the present work was used to analyze the development payback of new game elements and to predict server load in order to increase available computational resources beforehand. The advantages of the developed method include no need for expert tagging of the training set and relatively low computational cost due to the low computational complexity of the proposed loss function used to estimate the hyperparameter K.",

keywords = "Algorithms, Clustering, Clustering quality assessment, Entropy, K-means, K-means algorithm, Machine learning, Web",

author = "Евстратов, {Виктор Владимирович} and Ананьевский, {Михаил Сергеевич}",

note = "Funding Information: This study has been supported by the Russian Foundation for Basic Research, grant no. 19-08-00865 А. Publisher Copyright: {\textcopyright} 2020, ITMO University. All rights reserved.",

year = "2020",

doi = "10.17586/2226-1494-2020-20-5-755-760",

language = "Russian",

volume = "20",

pages = "755--760",

journal = "Scientific and Technical Journal of Information Technologies, Mechanics and Optics",

issn = "2226-1494",

publisher = "НИУ ИТМО",

number = "5",

}

RIS

TY - JOUR

T1 - ПРИМЕНЕНИЕ МЕТОДА K-СРЕДНИХ В ЗАДАЧЕ ОЦЕНКИ ХАРАКТЕРИСТИК ПРОЦЕССА ДЛЯ ВЕБ-ПРИЛОЖЕНИЙ

AU - Евстратов, Виктор Владимирович

AU - Ананьевский, Михаил Сергеевич

N1 - Funding Information: This study has been supported by the Russian Foundation for Basic Research, grant no. 19-08-00865 А. Publisher Copyright: © 2020, ITMO University. All rights reserved.

PY - 2020

Y1 - 2020

N2 - Subject of Research. The paper studies the problem of estimating process characteristics for the particular case of predicting user activity in online computer games. Various machine learning methods are considered, and the advantages of clustering-based approaches are identified. A variety of metrics for estimating clustering quality is studied. Method. A clustering-based approach to the estimation of process characteristics was developed on the basis of a hypothesis proposed during the preliminary analysis of user activity data. Data were collected on the activity of users for whom the values to be predicted were already known. Each user was represented as a pair of vectors: the first vector corresponded to the user’s first days of activity, and the second one corresponded to the days with predicted performance. The vectors representing users’ activity in the first days were used as training data for the K-means algorithm. A specially developed entropy-like loss function was used to find a value of K suitable for the problem under consideration. The clusters were matched with vectors of predicted process characteristics averaged over all users in the cluster. These matches were used as predictions of new users’ characteristics. Main Results. An approach to determining a suitable number of clusters is proposed, taking into account the specifics of the considered data. A numerical experiment is carried out, demonstrating the applicability of the developed method. Practical Relevance. The proposed approach allows for the simultaneous prediction of multiple characteristics of online-game users and, therefore, for the solution of various planning and analytics problems during online-game development. For example, the method developed in the present work was used to analyze the development payback of new game elements and to predict server load in order to increase available computational resources beforehand. The advantages of the developed method include no need for expert tagging of the training set and relatively low computational cost due to the low computational complexity of the proposed loss function used to estimate the hyperparameter K.

KW - Algorithms

KW - Clustering

KW - Clustering quality assessment

KW - Entropy

KW - K-means

KW - K-means algorithm

KW - Machine learning

KW - Web

UR - http://www.scopus.com/inward/record.url?scp=85097533237&partnerID=8YFLogxK

UR - https://www.mendeley.com/catalogue/57a5952d-8e29-3429-a832-c83b4ceb60a1/

U2 - 10.17586/2226-1494-2020-20-5-755-760

DO - 10.17586/2226-1494-2020-20-5-755-760

M3 - Article

VL - 20

SP - 755

EP - 760

JO - Scientific and Technical Journal of Information Technologies, Mechanics and Optics

JF - Scientific and Technical Journal of Information Technologies, Mechanics and Optics

SN - 2226-1494

IS - 5

ER -

ID: 71554629