
Analysis-ready data set of strategy documents written by municipalities, with one row per token. For more information on the format, see https://universaldependencies.org/format.html

Usage

strategia

Format

A data frame (tibble) with 391,985 rows and 13 columns:

kunta

Municipality name

sent

Sentence number within each document/municipality

ID

Word index, an integer starting at 1 for each new sentence (stored as a character column)

FORM

Word form or punctuation symbol

LEMMA

Lemma or stem of word form

UPOSTAG

Universal part-of-speech tag

XPOSTAG

Language-specific part-of-speech tag

FEATS

List of morphological features

HEAD

Head of the current word, which is either a value of ID or zero (0) for the root of the sentence (stored as a character column)

DEPREL

Universal dependency relation to the HEAD

DEPS

Enhanced dependency graph in the form of a list of head-deprel pairs

MISC

Any other annotation

doc

Name of the document the rows were read from
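Because ID and HEAD are stored as character, following a dependency from a word to its head takes a small conversion step. The sketch below is one possible way to do it in base R. It uses a made-up three-word sentence (`toy`, with the hypothetical municipality "Esimerkki" and document "esimerkki.pdf") in place of `strategia`, and it assumes that kunta, doc and sent together identify a sentence.

```r
# Toy stand-in for `strategia`: hypothetical rows, same column names
toy <- data.frame(
  kunta   = "Esimerkki",
  doc     = "esimerkki.pdf",
  sent    = 1L,
  ID      = c("1", "2", "3"),
  FORM    = c("Kunta", "on", "pieni"),
  UPOSTAG = c("NOUN", "AUX", "ADJ"),
  HEAD    = c("3", "3", "0"),
  DEPREL  = c("nsubj:cop", "cop", "root"),
  stringsAsFactors = FALSE
)

# Convert the indices to integers. CoNLL-U also allows range IDs such as
# "1-2" for multiword tokens; those would become NA here (with a warning).
toy$ID   <- as.integer(toy$ID)
toy$HEAD <- as.integer(toy$HEAD)

# Build a key per token and per head, then look up the head's word form.
# Assumption: kunta + doc + sent uniquely identify a sentence.
token_key <- paste(toy$kunta, toy$doc, toy$sent, toy$ID)
head_key  <- paste(toy$kunta, toy$doc, toy$sent, toy$HEAD)
toy$HEAD_FORM <- toy$FORM[match(head_key, token_key)]

# Root words have HEAD == 0, which matches no token, so HEAD_FORM is NA
toy[, c("FORM", "DEPREL", "HEAD_FORM")]
```

The same steps should carry over to `strategia` itself, provided the sentence key assumption holds for the real data.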

Source

Finnish municipalities

Examples

strategia
#> # A tibble: 391,985 × 13
#>    kunta  sent ID    FORM   LEMMA UPOSTAG XPOSTAG FEATS HEAD  DEPREL DEPS  MISC 
#>    <chr> <int> <chr> <chr>  <chr> <chr>   <chr>   <chr> <chr> <chr>  <chr> <chr>
#>  1 Akaa      1 1     STRAT… stra… NOUN    _       Case… 0     root   _     "Spa…
#>  2 Akaa      2 1     Sujuv… suju… ADJ     _       Case… 2     amod   _     "_"  
#>  3 Akaa      2 2     arjen  arki  NOUN    _       Case… 3     nmod:… _     "_"  
#>  4 Akaa      2 3     Akaa   Akaa  PROPN   _       Case… 9     nsubj… _     "_"  
#>  5 Akaa      2 4     2026   2026  NUM     _       NumT… 3     nummod _     "Spa…
#>  6 Akaa      2 5     Sujuv… suju… ADJ     _       Case… 6     amod   _     "_"  
#>  7 Akaa      2 6     arjen  arki  NOUN    _       Case… 7     nmod:… _     "_"  
#>  8 Akaa      2 7     Akaas… Akaa  PROPN   _       Case… 3     nmod   _     "_"  
#>  9 Akaa      2 8     on     olla  AUX     _       Mood… 9     cop    _     "_"  
#> 10 Akaa      2 9     helpp… help… ADJ     _       Case… 0     root   _     "_"  
#> # ℹ 391,975 more rows
#> # ℹ 1 more variable: doc <chr>
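A typical first step with token-level data like this is counting lemmas by part of speech. The following is a minimal sketch in base R on a made-up data frame (`toy`, with hypothetical municipalities "A" and "B"), not on `strategia` itself. It counts noun lemmas per municipality using the kunta, LEMMA and UPOSTAG columns.

```r
# Toy stand-in for `strategia`: hypothetical rows, only the columns used here
toy <- data.frame(
  kunta   = c("A", "A", "A", "B", "B"),
  LEMMA   = c("kunta", "palvelu", "kunta", "palvelu", "olla"),
  UPOSTAG = c("NOUN", "NOUN", "NOUN", "NOUN", "AUX"),
  stringsAsFactors = FALSE
)

# Keep nouns only, then count occurrences of each lemma per municipality
nouns  <- toy[toy$UPOSTAG == "NOUN", ]
counts <- aggregate(n ~ kunta + LEMMA,
                    data = transform(nouns, n = 1L),
                    FUN  = sum)

# Most frequent lemmas first within each municipality
counts <- counts[order(counts$kunta, -counts$n), ]
counts
```

Replacing `toy` with `strategia` should give the same kind of table for the full data set, since the column names are the same.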