| Main Authors: | , , |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2018 |
| Subjects: | |
| Online access: | https://doi.org/10.5281/zenodo.2605352 |
Table of Contents:
- <p>Models for diachronic lexical semantics used by the <a href="http://jeseme.org">Jena Semantic Explorer (JeSemE)</a> website, which is described in our <a href="http://aclweb.org/anthology/C18-2003">COLING 2018 paper "JeSemE: A Website for Exploring Diachronic Changes in Word Meaning and Emotion"</a>.</p> <p>The models are also described and applied in Johannes Hellrich's Ph.D. thesis "Word Embeddings: Reliability & Semantic Change"; his work was funded by the Deutsche Forschungsgemeinschaft (DFG) within the graduate school "The Romantic Model" (GRK 2041/1).</p> <p>There is one ZIP file per corpus, each containing the following CSV files:</p> <ul> <li>CHI.csv with χ<sup>2</sup> word association values (structure: word-id, word-id, time, value)</li> <li>EMBEDDING.csv with SVD-PPMI word embeddings (aligned; structure: word-id, time, values)</li> <li>EMOTION.csv with VAD (valence, arousal, dominance) word emotion values (structure: word-id, time, values)</li> <li>FREQUENCY.csv with relative word frequency values (structure: word-id, time, value)</li> <li>PPMI.csv with PPMI word association values (structure: word-id, word-id, time, value)</li> <li>SIMILARITY.csv with word similarity values derived from the word embeddings (structure: word-id, word-id, time, value)</li> <li>WORDIDS.csv mapping words to their corpus-specific IDs</li> </ul> <p>The corpora are:</p> <ul> <li> <p>coha: Corpus of Historical American English</p> </li> <li> <p>dta: Deutsches Textarchiv (German Text Archive)</p> </li> <li> <p>google_fiction: Google Books N-Gram corpus, English fiction subcorpus</p> </li> <li> <p>google_german: Google Books N-Gram corpus, German subcorpus</p> </li> <li> <p>rsc: Royal Society Corpus</p> </li> </ul>
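CHI.csv, PPMI.csv and SIMILARITY.csv share one long-format layout (word-id, word-id, time, value), so a single loader can serve all three once WORDIDS.csv has been read. The Python sketch below is a minimal, hypothetical example, not part of the dataset: it assumes comma-separated files without a header row, WORDIDS.csv rows of the form `word,id`, and integer time values. None of these format details is confirmed by the description, and the function names are illustrative only.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple

# ASSUMPTIONS (not confirmed by the dataset description):
#   * files are comma-separated and have no header row
#   * WORDIDS.csv rows look like "word,id"
#   * the time column holds integers (e.g. a year)

Series = List[Tuple[int, float]]


def load_word_ids(f: TextIO) -> Dict[str, int]:
    """Map each word to its corpus-specific integer ID (from WORDIDS.csv)."""
    return {word: int(word_id) for word, word_id in csv.reader(f)}


def load_pair_values(f: TextIO) -> Dict[Tuple[int, int], Series]:
    """Group rows of a word-pair file (word-id, word-id, time, value),
    such as CHI.csv, PPMI.csv or SIMILARITY.csv, by their ID pair."""
    series: Dict[Tuple[int, int], Series] = defaultdict(list)
    for id1, id2, time, value in csv.reader(f):
        series[(int(id1), int(id2))].append((int(time), float(value)))
    for points in series.values():
        points.sort()  # chronological order
    return dict(series)


def pair_series(word1: str, word2: str, ids: Dict[str, int],
                series: Dict[Tuple[int, int], Series]) -> Series:
    """Return the (time, value) series for a word pair, or [] if absent."""
    key = (ids[word1], ids[word2])
    # Assumption: a pair may be stored in only one order, so try both.
    return series.get(key) or series.get((key[1], key[0]), [])


if __name__ == "__main__":
    # Toy, made-up data standing in for WORDIDS.csv and SIMILARITY.csv.
    ids = load_word_ids(io.StringIO("heart,1\nsoul,2\n"))
    sims = load_pair_values(io.StringIO("1,2,1850,0.41\n1,2,1800,0.37\n"))
    print(pair_series("soul", "heart", ids, sims))  # [(1800, 0.37), (1850, 0.41)]
```

For the real files, pass an object opened with `open("SIMILARITY.csv", newline="")` instead of the in-memory `io.StringIO` data; if the actual delimiter, header or column order differs, only the `csv.reader` calls and the tuple unpacking need to change.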
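SIMILARITY.csv is described only as "word embedding derived". For SVD-PPMI vectors, cosine similarity is the usual measure, so this sketch assumes it; whether JeSemE computes its stored values exactly this way is not stated in the description. The code reads EMBEDDING.csv (word-id, time, values) under hypothetical format assumptions (comma-separated, no header row, integer time) and compares two vectors; `load_embeddings` and `cosine` are illustrative names, not part of the dataset.

```python
import csv
import io
import math
from typing import Dict, List, Sequence, TextIO, Tuple

# ASSUMPTIONS: comma-separated, no header row, integer time column,
# and cosine as the similarity measure (the description does not name it).


def load_embeddings(f: TextIO) -> Dict[Tuple[int, int], List[float]]:
    """Read EMBEDDING.csv rows (word-id, time, v1, v2, ...) into
    a mapping {(word_id, time): vector}."""
    vectors: Dict[Tuple[int, int], List[float]] = {}
    for word_id, time, *values in csv.reader(f):
        vectors[(int(word_id), int(time))] = [float(v) for v in values]
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Toy, made-up two-dimensional vectors for word IDs 1 and 2 in 1850.
    vecs = load_embeddings(io.StringIO("1,1850,3,4\n2,1850,4,3\n"))
    print(cosine(vecs[(1, 1850)], vecs[(2, 1850)]))  # 0.96
```

Because the description calls the embeddings aligned, comparing one word's vectors from two different time slices in the same way is a plausible use as well, though that is also an assumption rather than a documented procedure.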