Adds document frequencies as a new column. The document frequency of a term is the number of documents in which the term appears. It is useful for finding very common terms that appear in almost all documents and very rare terms that appear in only one or a few documents.

Usage

calculate_doc_freq(df, doc, term)

Arguments

df

tidy data frame with one term per row

doc

column with document id

term

column with terms

Value

tidy data frame with two added columns: df, giving the document frequency of each term, and df_ratio, giving the relative document frequency
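
Example

The following is a minimal sketch of the computation described above, written with dplyr rather than with calculate_doc_freq() itself, so it does not show the function's actual implementation. The toy data, the column names doc_id and word, and the reading of df_ratio as df divided by the total number of documents are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical tidy data: one term per row, three documents
tokens <- tibble(
  doc_id = c(1, 1, 1, 2, 2, 3),
  word   = c("the", "cat", "the", "the", "dog", "the")
)

# Total number of distinct documents in the data
n_docs <- n_distinct(tokens$doc_id)

result <- tokens %>%
  group_by(word) %>%
  # Document frequency: number of distinct documents containing the term
  mutate(df = n_distinct(doc_id)) %>%
  ungroup() %>%
  # Assumed definition: share of all documents containing the term
  mutate(df_ratio = df / n_docs)

# Terms that occur in every document (candidates for removal as too common)
common_terms <- result %>%
  filter(df_ratio == 1) %>%
  distinct(word)

# Terms that occur in only one document
rare_terms <- result %>%
  filter(df == 1) %>%
  distinct(word)
```

The original rows are kept and the two columns are appended, which matches the description of the return value. A repeated term within one document, such as "the" in document 1, is counted once for that document.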