Kazan (Volga Region) Federal University, KFU
 
PROBABILITY ANALYSIS OF THE VOCABULARY SIZE DYNAMICS USING GOOGLE BOOKS NGRAM CORPUS
Publication type: Articles in international journals and collections
Year of publication: 2018
Language: English
  • Bochkarev Vladimir Vladimirovich, author
  • Maslennikova Yulia Sergeevna, author
  • Bibliographic description in the original language: Pekina A, Maslennikova Y, Bochkarev V., Probability analysis of the vocabulary size dynamics using google books ngram corpus//CEUR Workshop Proceedings. - 2018. - Vol.2268, Is.. - P.202-207.
    Abstract: The article introduces a method for determining the rate of appearance of new words in a language. The method is based on probabilistic estimates of the vocabulary size of a large text corpus. Backward-predicted frequencies of rare words are estimated using linear models that are optimized by the maximum likelihood criterion. This approach provides more accurate frequency estimates for earlier periods; the lower the frequency of the word during the analyzed period, the greater the benefit. A posteriori estimates of the probability of appearance of new words were used to refine the vocabulary size for different years and the rate of appearance of new words. According to the proposed probabilistic model, more than 30% of the investigated English and Russian words appeared in the language before the moment when they were first identified in the Google Books Ngram Corpus.
    Keywords: Word usage frequencies, prediction, Google Books Ngram
    Journal title: CEUR Workshop Proceedings
    URL https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058976417&partnerID=40&md5=9a11efbe0d7295409759459f5ff2e650
    Please use this identifier to cite or link to this record: https://repository.kpfu.ru/?p_id=194049
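
The estimation idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the likelihood, so the Poisson count model, the function names `fit_linear_poisson` and `prob_unseen`, and the toy data below are all assumptions made here for illustration. The sketch fits a linear model of a rare word's relative frequency by maximum likelihood, extrapolates it backward to a year before the word's first recorded occurrence, and computes the probability that a word in use at that frequency would still be absent from the corpus.

```python
import math


def fit_linear_poisson(years, counts, sizes, iters=50, floor=1e-12):
    """Fit f(t) = a + b * (t - t_ref) by maximum likelihood.

    Assumption (not from the source): counts[i] ~ Poisson(sizes[i] * f(years[i])).
    The fit uses iteratively reweighted least squares with weights size / f,
    whose fixed point satisfies the Poisson score equations.
    """
    t_ref = years[0]
    x = [t - t_ref for t in years]
    y = [c / n for c, n in zip(counts, sizes)]  # observed relative frequencies
    a = sum(counts) / sum(sizes)                # start from a constant frequency
    b = 0.0
    for _ in range(iters):
        # Current model frequencies, kept strictly positive.
        f = [max(a + b * xi, floor) for xi in x]
        w = [n / fi for n, fi in zip(sizes, f)]
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
        det = sw * swxx - swx * swx
        if det == 0:
            break
        # Weighted least-squares solution for intercept and slope.
        a = (swy * swxx - swx * swxy) / det
        b = (sw * swxy - swx * swy) / det
    return a, b, t_ref


def prob_unseen(a, b, t_ref, year, size, floor=1e-12):
    """Probability of zero occurrences in a year, given the extrapolated frequency."""
    f = max(a + b * (year - t_ref), floor)
    return math.exp(-size * f)


# Toy data (hypothetical): a word first recorded in 1900, one million tokens per year.
years = [1900, 1901, 1902, 1903, 1904]
sizes = [1_000_000] * 5
counts = [2, 3, 4, 5, 6]

a, b, t_ref = fit_linear_poisson(years, counts, sizes)
p_unseen = prob_unseen(a, b, t_ref, 1899, 1_000_000)
```

In this toy case the fitted line extrapolates to an expected count of about one occurrence in 1899, so the probability of seeing no occurrences is about exp(-1), roughly 0.37. Absence from the corpus in that year is therefore weak evidence that the word did not yet exist, which is the intuition behind the abstract's claim that many words predate their first recorded appearance.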
