Form of presentation: Articles in international journals and collections
Year of publication: 2020
  • Ivanov Vladimir Vladimirovich, author
  • Solnyshkina Marina Ivanovna, author
  • Solovyev Valery Dmitrievich, author
  • Bibliographic description in the original language: Solovyev V., Ivanov V., Solnyshkina M. (2020) Thesaurus-Based Methods for Assessment of Text Complexity in Russian. In: Martínez-Villaseñor L., Herrera-Alcántara O., Ponce H., Castro-Espinoza F.A. (eds) Advances in Computational Intelligence. MICAI 2020. Lecture Notes in Computer Science, vol 12469. Springer, Cham. https://doi.org/10.1007/978-3-030-60887-3_14
    Abstract: The study explores the problem of assessing the complexity of Russian educational texts. In this paper, we focus on measuring conceptual complexity, which is rarely selected as a research question, and propose to use a thesaurus (or a linguistic ontology) to this end. We also compiled an original corpus of Social Studies and History textbooks used in high school, together with textbooks for elementary school, specifically for this set of text complexity experiments. At the first stage of the research, the RuThes-Lite thesaurus, a linguistic knowledge base with a total size of 100,000 concepts, was used to elicit concepts in the texts of schoolbooks and represent them as graphs. To the best of our knowledge, we propose a new method for text complexity assessment using RuThes-Lite graphs and identify graph-based semantic characteristics of texts that impact complexity. The most significant findings of the research include the identification of statistically significant correlations between the selected features, such as node degree, and the complexity of educational texts.
    Keywords: Text complexity, Thesaurus, Russian language
    Journal title: Lecture Notes in Computer Science Proceedings, Springer. - LNCS 3777.
    URL: https://link.springer.com/chapter/10.1007/978-3-030-60887-3_14
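
    Illustration: the abstract mentions graph features such as node degree and their correlation with text complexity. The sketch below shows one way such a feature could be computed. It is not the authors' implementation. The toy English thesaurus, the whitespace tokenizer, and the helper names `mean_node_degree` and `pearson` are all hypothetical, and taking the degree within the whole thesaurus graph (rather than within a per-text subgraph) is an assumption.

```python
from math import sqrt

# Hypothetical toy thesaurus: concept -> set of related concepts (undirected).
# A real experiment would load RuThes-Lite; this stand-in only illustrates the idea.
TOY_THESAURUS = {
    "state": {"law", "government", "society"},
    "law": {"state", "court"},
    "government": {"state"},
    "society": {"state", "family"},
    "family": {"society"},
    "court": {"law"},
}


def mean_node_degree(text, thesaurus):
    """Average thesaurus degree of the distinct concepts mentioned in `text`."""
    # Naive tokenization (assumption): split on whitespace, strip punctuation.
    tokens = {t.strip(".,;:!?").lower() for t in text.split()}
    found = {t for t in tokens if t in thesaurus}
    if not found:
        return 0.0
    return sum(len(thesaurus[c]) for c in found) / len(found)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)
```

    With per-text feature values and a complexity proxy such as school grade level, `pearson(features, grades)` gives the kind of correlation the abstract refers to; significance testing is left out of this sketch.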
