Form of presentation | Articles in international journals and collections |
Year of publication | 2018 |
Language | English |
|
Bochkarev Vladimir Vladimirovich, author
Maslennikova Yuliya Sergeevna, author
|
Bibliographic description in the original language |
Pekina A, Maslennikova Y, Bochkarev V., Probability analysis of the vocabulary size dynamics using Google Books Ngram corpus//CEUR Workshop Proceedings. - 2018. - Vol.2268, Is.. - P.202-207. |
Annotation |
The article introduces a method for determining the rate of appearance of new words in a language. The method is based on probabilistic estimates of the vocabulary size of a large text corpus. Backward-predicted frequencies of rare words are estimated using linear models optimized by the maximum likelihood criterion. This approach provides more accurate frequency estimates for earlier periods; the lower the frequency of a word during the analyzed period, the greater the benefit. A posteriori estimates of the frequency probability of appearance of new words were used to refine the vocabulary size for different years and the rate of appearance of new words. According to the proposed probabilistic model, more than 30% of the investigated English and Russian words appeared in the language before the moment when they were identified in the Google Books Ngram Corpus. |
Keywords |
Word usage frequencies, prediction, Google Books Ngram |
The name of the journal |
CEUR Workshop Proceedings
|
URL |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058976417&partnerID=40&md5=9a11efbe0d7295409759459f5ff2e650 |
Please use this ID to quote from or refer to the card |
https://repository.kpfu.ru/eng/?p_id=194049&p_lang=2 |
Full metadata record |
Field DC |
Value |
Language |
dc.contributor.author |
Bochkarev Vladimir Vladimirovich |
ru_RU |
dc.contributor.author |
Maslennikova Yuliya Sergeevna |
ru_RU |
dc.date.accessioned |
2018-01-01T00:00:00Z |
ru_RU |
dc.date.available |
2018-01-01T00:00:00Z |
ru_RU |
dc.date.issued |
2018 |
ru_RU |
dc.identifier.citation |
Pekina A, Maslennikova Y, Bochkarev V., Probability analysis of the vocabulary size dynamics using Google Books Ngram corpus//CEUR Workshop Proceedings. - 2018. - Vol.2268, Is.. - P.202-207. |
ru_RU |
dc.identifier.uri |
https://repository.kpfu.ru/eng/?p_id=194049&p_lang=2 |
ru_RU |
dc.description.abstract |
CEUR Workshop Proceedings |
ru_RU |
dc.description.abstract |
The article introduces a method for determining the rate of appearance of new words in a language. The method is based on probabilistic estimates of the vocabulary size of a large text corpus. Backward-predicted frequencies of rare words are estimated using linear models optimized by the maximum likelihood criterion. This approach provides more accurate frequency estimates for earlier periods; the lower the frequency of a word during the analyzed period, the greater the benefit. A posteriori estimates of the frequency probability of appearance of new words were used to refine the vocabulary size for different years and the rate of appearance of new words. According to the proposed probabilistic model, more than 30% of the investigated English and Russian words appeared in the language before the moment when they were identified in the Google Books Ngram Corpus. |
ru_RU |
dc.language.iso |
ru |
ru_RU |
dc.subject |
Word usage frequencies |
ru_RU |
dc.subject |
prediction |
ru_RU |
dc.subject |
Google Books Ngram |
ru_RU |
dc.title |
Probability analysis of the vocabulary size dynamics using Google Books Ngram corpus |
ru_RU |
dc.type |
Articles in international journals and collections |
ru_RU |
|