How do we know when, say, the Early Modern period of a given language ends and the Late Modern period begins? Coarse-grained periodizations are typically based on changes in the grammatical system, whereas fine-grained ones rely on sociolinguistic or philological arguments. Instead, we propose a corpus-driven approach. Using text categorisation methods, we divide a diachronic corpus, in a stepwise fashion, into two subcorpora that are as different as possible (Eder & Górski 2016). This allows us to identify quantitatively distinct stages in the development of a language. The underlying assumption is that effective categorisation is possible only if two requirements are satisfied: there is a true difference (be it lexical or grammatical) between older and newer texts, and each of the two subcorpora is homogeneous.
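The boundary search described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of Eder & Górski (2016). It assumes each text is a `(year, tokens)` pair and represents texts by z-scored relative frequencies of the most frequent words. It uses a nearest-centroid classifier with Manhattan distance, similar in spirit to Burrows's Delta, and reads "stepwise" as sliding a candidate boundary year through the timeline. Leave-one-out accuracy serves as the measure of how different the two subcorpora are. All function names (`best_split` and its helpers) are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

def feature_vectors(texts, n_words):
    """Z-scored relative frequencies of the n_words most frequent words (assumed features)."""
    totals = Counter(tok for tokens in texts for tok in tokens)
    vocab = [w for w, _ in totals.most_common(n_words)]
    rows = []
    for tokens in texts:
        counts = Counter(tokens)
        size = len(tokens) or 1
        rows.append([counts[w] / size for w in vocab])
    # Columns with zero spread are divided by 1.0, so they become all zeros.
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def centroid(rows):
    return [mean(col) for col in zip(*rows)]

def loo_accuracy(vectors, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (v, y) in enumerate(zip(vectors, labels)):
        groups = {}
        for j, (u, z) in enumerate(zip(vectors, labels)):
            if j != i:
                groups.setdefault(z, []).append(u)
        pred = min(groups, key=lambda g: manhattan(v, centroid(groups[g])))
        correct += pred == y
    return correct / len(vectors)

def best_split(corpus, n_words=100, min_size=2):
    """Slide a boundary year through the corpus and return (year, accuracy)
    for the split whose older and newer parts are easiest to tell apart."""
    corpus = sorted(corpus, key=lambda item: item[0])
    years = [year for year, _ in corpus]
    vectors = feature_vectors([tokens for _, tokens in corpus], n_words)
    best = None
    for boundary in sorted(set(years))[1:]:
        labels = ["older" if year < boundary else "newer" for year in years]
        # Skip splits that leave too few texts on either side.
        if min(labels.count("older"), labels.count("newer")) < min_size:
            continue
        acc = loo_accuracy(vectors, labels)
        if best is None or acc > best[1]:
            best = (boundary, acc)
    return best

# Toy usage: texts before 1700 use "thou hath", later ones "you has".
old = "thou hath the king".split()
new = "you has the king".split()
corpus = [(1600, old), (1610, old * 2), (1620, old), (1630, old * 3),
          (1700, new), (1710, new * 2), (1720, new), (1730, new * 3)]
print(best_split(corpus))  # (1700, 1.0)
```

A high accuracy at a given boundary is taken as a sign of a real difference between older and newer texts. As the abstract notes, this also presupposes that each side is internally homogeneous. One could therefore repeat the same search within each half to look for further, finer-grained stages.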
Event details
- Event start: 12. 10. 2016, 17:30 - 18:30
- Venue: Faculty of Arts, Jan Palach Square 2, Prague 1 (room 104)
- Website: https://www.korpus.cz/files/Gorski-poster.pdf
- Organizing institution: Department of Czech National Corpus
- Event type: Lecture