Analysis-ready data set of strategy documents produced by municipalities, tokenized and annotated in the CoNLL-U format. For more information on the format, see https://universaldependencies.org/format.html
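In a CoNLL-U file, each token line holds ten tab-separated fields. These correspond to the columns ID through MISC listed under Format, and the data frame adds kunta, sent and doc. The R sketch below splits a hypothetical token line into those fields and unpacks the FEATS string into name/value pairs. The example line and the helpers parse_token and parse_feats are illustrative assumptions, not part of the package.

```r
# Hypothetical CoNLL-U token line (10 tab-separated fields); not taken from `strategia`
line <- "3\tAkaa\tAkaa\tPROPN\t_\tCase=Nom|Number=Sing\t9\tnsubj\t_\t_"

conllu_fields <- c("ID", "FORM", "LEMMA", "UPOSTAG", "XPOSTAG",
                   "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")

# Split one token line into a named list of its ten fields (all character)
parse_token <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  stopifnot(length(fields) == length(conllu_fields))
  setNames(as.list(fields), conllu_fields)
}

# FEATS is "_" when empty, otherwise Name=Value pairs joined by "|"
parse_feats <- function(feats) {
  if (identical(feats, "_")) return(character(0))
  pairs <- strsplit(strsplit(feats, "|", fixed = TRUE)[[1]], "=", fixed = TRUE)
  setNames(vapply(pairs, `[`, character(1), 2),
           vapply(pairs, `[`, character(1), 1))
}

tok <- parse_token(line)
parse_feats(tok$FEATS)  # named character vector: Case = "Nom", Number = "Sing"
```

The same parse_feats helper can be applied to the FEATS column of the data frame, for example with lapply over its values.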
Format
A data frame (tibble) with 391,985 rows and 13 columns:
- kunta
Municipality name
- sent
Sentence number, counted per document (municipality)
- ID
Word index, an integer starting at 1 for each new sentence (stored as character)
- FORM
Word form or punctuation symbol
- LEMMA
Lemma or stem of the word form
- UPOSTAG
Universal part-of-speech tag
- XPOSTAG
Language-specific part-of-speech tag (an underscore, _, marks an unspecified value)
- FEATS
List of morphological features, given as Name=Value pairs separated by |
- HEAD
Head of the current word: the ID of another word in the same sentence, or zero (0) for the sentence root
- DEPREL
Universal dependency relation to the HEAD (root when HEAD is 0)
- DEPS
Enhanced dependency graph in the form of a list of head-deprel pairs
- MISC
Any other annotation
- doc
Name of the source document the text was read from
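Because HEAD refers to an ID in the same sentence, the governing word of each token can be looked up by matching HEAD against ID within each municipality, document and sentence. The following is a minimal base-R sketch on a hypothetical three-token sentence. The toy data and the helpers token_key and resolve_heads are illustrative, not part of the package.

```r
# Hypothetical three-token sentence laid out like `strategia`
# (ID and HEAD are character columns, as in the printed tibble)
toy <- data.frame(
  kunta   = "Akaa",
  doc     = "doc1",   # hypothetical document name
  sent    = 1L,
  ID      = c("1", "2", "3"),
  FORM    = c("Akaa", "on", "helppo"),
  LEMMA   = c("Akaa", "olla", "helppo"),
  UPOSTAG = c("PROPN", "AUX", "ADJ"),
  HEAD    = c("3", "3", "0"),
  DEPREL  = c("nsubj", "cop", "root"),
  stringsAsFactors = FALSE
)

# Build a lookup key from municipality, document, sentence and a word index
token_key <- function(d, index) paste(d$kunta, d$doc, d$sent, index, sep = "\t")

resolve_heads <- function(d) {
  # HEAD = "0" (the root) has no matching ID, so head_lemma becomes NA
  d$head_lemma <- d$LEMMA[match(token_key(d, d$HEAD), token_key(d, d$ID))]
  d
}

resolve_heads(toy)[, c("FORM", "DEPREL", "head_lemma")]
```

Keying on doc as well as kunta keeps the lookup correct even if a municipality contributes more than one document. The same call should apply to strategia directly, since it carries all the columns used here.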
Examples
strategia
#> # A tibble: 391,985 × 13
#> kunta sent ID FORM LEMMA UPOSTAG XPOSTAG FEATS HEAD DEPREL DEPS MISC
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Akaa 1 1 STRAT… stra… NOUN _ Case… 0 root _ "Spa…
#> 2 Akaa 2 1 Sujuv… suju… ADJ _ Case… 2 amod _ "_"
#> 3 Akaa 2 2 arjen arki NOUN _ Case… 3 nmod:… _ "_"
#> 4 Akaa 2 3 Akaa Akaa PROPN _ Case… 9 nsubj… _ "_"
#> 5 Akaa 2 4 2026 2026 NUM _ NumT… 3 nummod _ "Spa…
#> 6 Akaa 2 5 Sujuv… suju… ADJ _ Case… 6 amod _ "_"
#> 7 Akaa 2 6 arjen arki NOUN _ Case… 7 nmod:… _ "_"
#> 8 Akaa 2 7 Akaas… Akaa PROPN _ Case… 3 nmod _ "_"
#> 9 Akaa 2 8 on olla AUX _ Mood… 9 cop _ "_"
#> 10 Akaa 2 9 helpp… help… ADJ _ Case… 0 root _ "_"
#> # ℹ 391,975 more rows
#> # ℹ 1 more variable: doc <chr>
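The sentences can be turned back into running text by concatenating FORM within each sentence. The sketch below assumes that MISC follows the CoNLL-U convention SpaceAfter=No for tokens not followed by a space. The truncated Spa… values in the printout may be such entries, but that is not confirmed here. The toy rows and the sentence_text helper are hypothetical.

```r
# Hypothetical tokens in the `strategia` layout; "SpaceAfter=No" in MISC means
# the next token follows without a space (CoNLL-U convention)
toy <- data.frame(
  kunta = "Akaa",
  doc   = "doc1",  # hypothetical document name
  sent  = c(1L, 1L, 1L, 1L, 2L, 2L),
  FORM  = c("Akaa", "on", "helppo", ".", "Tervetuloa", "!"),
  MISC  = c("_", "_", "SpaceAfter=No", "_", "SpaceAfter=No", "_"),
  stringsAsFactors = FALSE
)

sentence_text <- function(d) {
  # A token is followed by a space unless its MISC field contains SpaceAfter=No
  no_space <- vapply(strsplit(d$MISC, "|", fixed = TRUE),
                     function(x) "SpaceAfter=No" %in% x, logical(1))
  pieces <- paste0(d$FORM, ifelse(no_space, "", " "))
  # One group per municipality, document and sentence, in order of appearance
  groups <- paste(d$kunta, d$doc, d$sent, sep = "\t")
  text <- vapply(split(pieces, factor(groups, levels = unique(groups))),
                 paste, character(1), collapse = "")
  trimws(unname(text))
}

sentence_text(toy)  # "Akaa on helppo." "Tervetuloa!"
```

Applied to strategia, this returns one string per sentence. Any other MISC values, such as spacing annotations other than SpaceAfter=No, are treated as an ordinary single space.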