Publication type | Articles in international journals and collections |
Year of publication | 2018 |
Language | English |
|
Бочкарев Владимир Владимирович, author
Масленникова Юлия Сергеевна, author
|
Bibliographic description in the original language |
Pekina A, Maslennikova Y, Bochkarev V., Probability analysis of the vocabulary size dynamics using google books ngram corpus//CEUR Workshop Proceedings. - 2018. - Vol.2268, Is.. - P.202-207. |
Abstract |
The article introduces a method for determining the rate at which new words appear in a language. The method is based on probabilistic estimates of the vocabulary size of a large text corpus. Backward-predicted frequencies of rare words are estimated using linear models optimized by the maximum likelihood criterion. This approach provides more accurate frequency estimates for earlier periods; the lower the frequency of a word during the analyzed period, the greater the benefit. A posteriori estimates of the probability of appearance of new words were used to refine the vocabulary size for different years and the rate of appearance of new words. According to the proposed probabilistic model, more than 30% of the investigated English and Russian words had appeared in the language before the moment when they were identified in the Google Books Ngram Corpus. |
Keywords |
Word usage frequencies, prediction, Google Books Ngram |
Journal title |
CEUR Workshop Proceedings
|
URL |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85058976417&partnerID=40&md5=9a11efbe0d7295409759459f5ff2e650 |
Please use this identifier to cite or link to this record |
https://repository.kpfu.ru/?p_id=194049 |
Full metadata record |
DC field |
Value |
Language |
dc.contributor.author |
Бочкарев Владимир Владимирович |
ru_RU |
dc.contributor.author |
Масленникова Юлия Сергеевна |
ru_RU |
dc.date.accessioned |
2018-01-01T00:00:00Z |
ru_RU |
dc.date.available |
2018-01-01T00:00:00Z |
ru_RU |
dc.date.issued |
2018 |
ru_RU |
dc.identifier.citation |
Pekina A, Maslennikova Y, Bochkarev V., Probability analysis of the vocabulary size dynamics using google books ngram corpus//CEUR Workshop Proceedings. - 2018. - Vol.2268, Is.. - P.202-207. |
ru_RU |
dc.identifier.uri |
https://repository.kpfu.ru/?p_id=194049 |
ru_RU |
dc.description.abstract |
CEUR Workshop Proceedings |
ru_RU |
dc.description.abstract |
The article introduces a method for determining the rate at which new words appear in a language. The method is based on probabilistic estimates of the vocabulary size of a large text corpus. Backward-predicted frequencies of rare words are estimated using linear models optimized by the maximum likelihood criterion. This approach provides more accurate frequency estimates for earlier periods; the lower the frequency of a word during the analyzed period, the greater the benefit. A posteriori estimates of the probability of appearance of new words were used to refine the vocabulary size for different years and the rate of appearance of new words. According to the proposed probabilistic model, more than 30% of the investigated English and Russian words had appeared in the language before the moment when they were identified in the Google Books Ngram Corpus. |
ru_RU |
dc.language.iso |
ru |
ru_RU |
dc.subject |
Word usage frequencies |
ru_RU |
dc.subject |
prediction |
ru_RU |
dc.subject |
Google Books Ngram |
ru_RU |
dc.title |
Probability analysis of the vocabulary size dynamics using google books ngram corpus |
ru_RU |
dc.type |
Articles in international journals and collections |
ru_RU |
|