Form of presentation | Articles in foreign journals and collections |
Year of publication | 2023 |
Language | English |
|
Невзорова Ольга Авенировна, author
|
|
Гизатуллин Булат Тимурович, author
|
Bibliographic description in the original language |
O. A. Nevzorova and B. T. Gizatullin, Analysis of the cluster structure of collections of mathematical papers with different UDC codes // Lobachevskii Journal of Mathematics, 2022, Vol. 43, No. 12, pp. 3597–3604. |
Abstract |
Lobachevskii Journal of Mathematics |
Keywords |
clustering, universal decimal classification, UDC code, mathematical paper |
Journal title |
Lobachevskii Journal of Mathematics
|
URL |
https://link.springer.com/article/10.1134/S1995080222150239#citeas |
Please use this identifier to cite or link to this record |
https://repository.kpfu.ru/?p_id=282670 |
Full metadata record |
DC field |
Value |
Language |
dc.contributor.author |
Невзорова Ольга Авенировна |
ru_RU |
dc.contributor.author |
Гизатуллин Булат Тимурович |
ru_RU |
dc.date.accessioned |
2023-01-01T00:00:00Z |
ru_RU |
dc.date.available |
2023-01-01T00:00:00Z |
ru_RU |
dc.date.issued |
2023 |
ru_RU |
dc.identifier.citation |
O. A. Nevzorova and B. T. Gizatullin, Analysis of the cluster structure of collections of mathematical papers with different UDC codes // Lobachevskii Journal of Mathematics, 2022, Vol. 43, No. 12, pp. 3597–3604. |
ru_RU |
dc.identifier.uri |
https://repository.kpfu.ru/?p_id=282670 |
ru_RU |
dc.description.abstract |
Lobachevskii Journal of Mathematics |
ru_RU |
dc.description.abstract |
Clustering is the task of dividing data objects into groups of similar objects. How the specific features of scientific articles within a single subject area affect clustering has so far received little study. This article addresses the problem of clustering collections of mathematical papers that have different Universal Decimal Classification (UDC) codes. The study was carried out on a collection of mathematical papers published in the journal "Izvestiya VUZov. Matematika" over 10 years. The collection contains about 1000 original papers with different UDC codes.
The collection contains subcollections of papers that share the same UDC code. The objective of our research is to analyze the cluster structure of representative subcollections of papers with the same UDC code, which will allow us to evaluate various parameters of the constructed clusters in the future.
We performed standard preprocessing (tokenization, lemmatization) of the Russian mathematical texts. The following text vectorization methods were investigated: tf-idf, doc2vec trained on the original data, and a pretrained word2vec model. All vectors were normalized using the L2 norm.
To identify optimal hyperparameters and check the quality of clustering, we used internal quality measures such as the Silhouette coefficient and the Calinski-Harabasz index. We also used the elbow method for hyperparameter tuning. The following clustering algorithms were investigated: k-means, agglomerative clustering, affinity propagation, DBSCAN, and spectral clustering. The optimal hyperparameters were selected for each method, and then the clustering results were compared. As a result, we selected the optimal methods for clustering mathematical papers.
|
ru_RU |
dc.language.iso |
ru |
ru_RU |
dc.subject |
clustering |
ru_RU |
dc.subject |
universal decimal classification |
ru_RU |
dc.subject |
UDC code |
ru_RU |
dc.subject |
mathematical paper |
ru_RU |
dc.title |
Analysis of the cluster structure of collections of mathematical papers with different UDC codes |
ru_RU |
dc.type |
Articles in foreign journals and collections |
ru_RU |
|
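The pipeline summarized in the abstract (tf-idf vectorization, L2 normalization, k-means, and selection of the number of clusters by the Silhouette coefficient) can be illustrated with a minimal sketch. This is not the authors' implementation. The token lists are hypothetical stand-ins for lemmatized papers, the smoothed idf formula is one common variant chosen here as an assumption, and the farthest-point initialization is a deterministic substitute for the random initialization usually used with k-means. Only the Python standard library is used so that the example stays self-contained.

```python
import math
from collections import Counter

def tfidf_l2(docs):
    """Return dense, L2-normalized tf-idf vectors for tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed idf variant: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster loses all of its members.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return labels

def silhouette(points, labels):
    """Mean Silhouette coefficient; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the Silhouette coefficient needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        def mean_dist(c):
            ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            return sum(ds) / len(ds) if ds else None
        a = mean_dist(labels[i])
        if a is None:
            scores.append(0.0)
            continue
        b = min(mean_dist(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical lemmatized "papers": three on algebra, three on equations.
docs = [
    ["group", "ring", "ideal", "module"],
    ["ring", "ideal", "field", "module"],
    ["group", "field", "ring", "homomorphism"],
    ["equation", "boundary", "solution", "operator"],
    ["boundary", "solution", "equation", "integral"],
    ["operator", "integral", "equation", "solution"],
]
vectors = tfidf_l2(docs)
# Compare candidate cluster counts by their mean Silhouette coefficient.
scores = {k: silhouette(vectors, kmeans(vectors, k)) for k in (2, 3)}
best_k = max(scores, key=scores.get)
print(best_k, kmeans(vectors, best_k))
```

On a real collection, the same comparison would be repeated for each vectorization method and clustering algorithm, with the Calinski-Harabasz index and the elbow method as additional checks.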