Tidy document-term table of Finnish housing policy documents

aspol and aspol_filtered are analysis-ready data sets of housing policy documents. The token-level columns follow the CoNLL-U format; for details see https://universaldependencies.org/format.html

Usage

aspol

Format

A data frame with 451,660 rows and 13 columns, one row per token:

kunta: Municipality name
sent: Sentence number within each document/municipality
ID: Word index, starting at 1 for each new sentence (stored as character)
FORM: Word form or punctuation symbol
LEMMA: Lemma or stem of word form
UPOSTAG: Universal part-of-speech tag
XPOSTAG: Language-specific part-of-speech
FEATS: List of morphological features
HEAD: Head of the current word, which is either a value of ID or zero (0) for the root (stored as character)
DEPREL: Universal dependency relation to the HEAD
DEPS: Enhanced dependency graph in the form of a list of head-deprel pairs
MISC: Any other annotation
doc: Name of the source document the row was read from

Source

Finnish municipalities

Examples

aspol
#> # A tibble: 451,660 × 13
#>    kunta   sent ID    FORM  LEMMA UPOSTAG XPOSTAG FEATS HEAD  DEPREL DEPS  MISC 
#>    <chr>  <int> <chr> <chr> <chr> <chr>   <chr>   <chr> <chr> <chr>  <chr> <chr>
#>  1 Enont…     1 1     Khall Khall PROPN   _       Case… 0     root   _     "_"  
#>  2 Enont…     1 2     19.4… 19.4… NUM     _       _     1     nmod   _     "_"  
#>  3 Enont…     1 3     $     $     PUNCT   _       _     4     punct  _     "_"  
#>  4 Enont…     1 4     126   126   NUM     _       NumT… 1     nummod _     "Spa…
#>  5 Enont…     2 1     (     (     PUNCT   _       _     2     punct  _     "Spa…
#>  6 Enont…     2 2     N     N     NOUN    _       Abbr… 0     root   _     "Spa…
#>  7 Enont…     3 1     Enon… Enon… PROPN   _       Case… 0     root   _     "Spa…
#>  8 Enont…     4 1     KUNTA kunta NOUN    _       Case… 0     root   _     "Spa…
#>  9 Enont…     5 1     VUOK… vuok… NOUN    _       Case… 2     nmod:… _     "Spa…
#> 10 Enont…     5 2     KEHI… kehi… NOUN    _       Case… 0     root   _     "Spa…
#> # ℹ 451,650 more rows
#> # ℹ 1 more variable: doc <chr>
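
The following is a minimal sketch of how the columns described under Format might be used, not part of the package itself. It builds a small made-up data frame (toy, with invented tokens and a hypothetical municipality name) that mimics a subset of the aspol columns, counts noun lemmas, and looks up the word form of each token's head. It assumes that kunta and sent together identify a sentence; if sentence numbers restart per document, doc would need to be added to the key.

```r
# Toy stand-in for aspol: same column names and types, invented values
toy <- data.frame(
  kunta   = "Esimerkkikunta",          # hypothetical municipality
  sent    = c(1L, 1L, 1L),
  ID      = c("1", "2", "3"),          # character, as in aspol
  FORM    = c("Kunnan", "asunnot", "."),
  LEMMA   = c("kunta", "asunto", "."),
  UPOSTAG = c("NOUN", "NOUN", "PUNCT"),
  HEAD    = c("2", "0", "2"),          # character, as in aspol
  DEPREL  = c("nmod:poss", "root", "punct"),
  stringsAsFactors = FALSE
)

# ID and HEAD are stored as character, so convert them before comparing
toy$ID   <- as.integer(toy$ID)
toy$HEAD <- as.integer(toy$HEAD)

# Lemma frequencies for nouns only
noun_counts <- table(toy$LEMMA[toy$UPOSTAG == "NOUN"])

# Look up the FORM of each token's head within the same sentence.
# Root tokens have HEAD == 0, which matches no ID, so they get NA.
key      <- paste(toy$kunta, toy$sent, toy$ID)
head_key <- paste(toy$kunta, toy$sent, toy$HEAD)
toy$head_form <- toy$FORM[match(head_key, key)]
```

The same steps should carry over to aspol by replacing toy with aspol. Note that CoNLL-U allows range IDs such as 1-2 for multiword tokens; if any are present, as.integer() returns NA for them with a warning, and those rows would need separate handling.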