CoCA outputs schematic classes derived from documents' engagement with multiple bi-polar concepts (in a Likert-style fashion). The function requires (1) a DTM of a corpus, which can be obtained using any popular text analysis package or from the dtm_builder() function, and (2) semantic directions as output by get_direction(). CMDist() works under the hood. Code modified from the corclass package.

CoCA(
  dtm,
  wv = NULL,
  directions = NULL,
  filter_sig = TRUE,
  filter_value = 0.05,
  zero_action = c("drop", "ownclass")
)

coca(
  dtm,
  wv = NULL,
  directions = NULL,
  filter_sig = TRUE,
  filter_value = 0.05,
  zero_action = c("drop", "ownclass")
)

Arguments

dtm

Document-term matrix with words as columns. Works with DTMs produced by any popular text analysis package, or you can use the dtm_builder() function.

wv

Matrix of word embedding vectors (a.k.a embedding model) with rows as words.

directions

Direction vectors output by get_direction().

filter_sig

Logical (default = TRUE). Sets 'insignificant' ties to 0 to decrease noise and increase stability.

filter_value

Minimum significance cutoff. Absolute row correlations below this value will be set to 0.

zero_action

If 'drop', CoCA drops rows with 0 variance from the analyses (default). If 'ownclass', the correlations between 0-variance rows and all other rows are set to 0, and the correlations between all pairs of 0-variance rows are set to 1.
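
The two zero_action options can be illustrated with a small base R sketch. This is not the package's internal code: it only mimics the behaviour described above on a made-up score matrix, assuming rows are documents and columns are semantic directions. The names scores, is_flat, cor_drop, and cor_own are hypothetical.

```r
# Toy scores: rows are documents, columns are three semantic directions.
# Values are invented for illustration only.
scores <- rbind(
  doc1 = c(0.25, -0.5, 1.0),
  doc2 = c(-1.0, 0.5, 0.25),
  doc3 = c(0.5, 0.5, 0.5), # identical scores, so zero variance
  doc4 = c(0, 0, 0)        # identical scores, so zero variance
)

# Flag rows whose scores do not vary across directions
is_flat <- apply(scores, 1, var) < 1e-12

# "drop": remove zero-variance rows, then correlate the remaining rows
cor_drop <- cor(t(scores[!is_flat, , drop = FALSE]))

# "ownclass": keep every row; cor() returns NA (with a warning) for
# zero-variance rows, so those entries are overwritten as described above
cor_own <- suppressWarnings(cor(t(scores)))
cor_own[is_flat, ] <- 0       # zero-variance rows vs. all other rows
cor_own[, is_flat] <- 0
cor_own[is_flat, is_flat] <- 1 # zero-variance rows vs. each other

print(dim(cor_drop))
print(cor_own)
```

Under 'drop' the correlation matrix shrinks to the rows that vary, while under 'ownclass' the zero-variance rows stay in the matrix and are tied only to one another, which is why they end up grouped together.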

Value

Returns a named list object of class CoCA. List elements include:

  • membership: document memberships

  • modules: schematic classes

  • cormat: correlation matrix

References

Taylor, Marshall A., and Dustin S. Stoltz. (2020) 'Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts.' Sociological Science 7:544-569. doi:10.15195/v7.a23.
Boutyline, Andrei. (2017) 'Improving the Measurement of Shared Cultural Schemas with Correlational Class Analysis: Theory and Method.' Sociological Science 4(15):353-393. doi:10.15195/v4.a15.

Author

Dustin Stoltz and Marshall Taylor

Examples


# load example word embeddings
data(ft_wv_sample)

# load example text
data(jfk_speech)

# minimal preprocessing
jfk_speech$sentence <- tolower(jfk_speech$sentence)
jfk_speech$sentence <- gsub("[[:punct:]]+", " ", jfk_speech$sentence)

# create DTM
dtm <- dtm_builder(jfk_speech, sentence, sentence_id)

# create semantic directions
gen <- data.frame(
  add = c("woman"),
  subtract = c("man")
)

die <- data.frame(
  add = c("alive"),
  subtract = c("die")
)

gen_dir <- get_direction(anchors = gen, wv = ft_wv_sample)
die_dir <- get_direction(anchors = die, wv = ft_wv_sample)

sem_dirs <- rbind(gen_dir, die_dir)

classes <- CoCA(
  dtm = dtm,
  wv = ft_wv_sample,
  directions = sem_dirs,
  filter_sig = TRUE,
  filter_value = 0.05,
  zero_action = "drop"
)

print(classes)
#> CoCA found 2 schematic classes in the corpus. Sizes: 45 39