CoCA outputs schematic classes derived from documents' engagement
with multiple bi-polar concepts (in a Likert-style fashion).
The function requires (1) a DTM of a corpus, which can be obtained using any
popular text analysis package or from the dtm_builder() function, and
(2) semantic directions as output by the get_direction() function.
CMDist() works under the hood. Code modified from the corclass package.
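As a rough illustration of what a semantic direction between two poles can look like, the base R sketch below builds one as the difference between two anchor word vectors and scores words by their cosine similarity to it. The embedding values and the names toy_wv, toy_dir, and cosine() are hypothetical and not part of the package; get_direction() and CMDist() may compute these quantities differently.

```r
# Hypothetical 3-dimensional embeddings (real models have hundreds of dimensions)
toy_wv <- rbind(
  woman = c(0.2, 0.9, 0.1),
  man   = c(0.2, 0.1, 0.1),
  queen = c(0.7, 0.8, 0.3),
  king  = c(0.7, 0.2, 0.3)
)

# Assumed construction: direction = "add" anchor minus "subtract" anchor
toy_dir <- toy_wv["woman", ] - toy_wv["man", ]

# Cosine similarity between two numeric vectors
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Score every word against the direction; higher values lean toward "woman"
sims <- apply(toy_wv, 1, cosine, b = toy_dir)
print(round(sims, 3))
```

Words closer to the "add" pole receive higher scores than words closer to the "subtract" pole, which is the sense in which a direction is bi-polar.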
dtm: Document-term matrix with words as columns. Works with DTMs produced by any popular text analysis package, or you can use the dtm_builder() function.
wv: Matrix of word embedding vectors (a.k.a. embedding model) with rows as words.
directions: Direction vectors output from get_direction().
filter_sig: Logical (default = TRUE). Sets 'insignificant' ties to 0 to decrease noise and increase stability.
filter_value: Minimum significance cutoff. Absolute row correlations below this value will be set to 0.
zero_action: If 'drop', CCA drops rows with 0 variance from the analyses (default). If 'ownclass', the correlations between 0-variance rows and all other rows are set to 0, and the correlations between all pairs of 0-variance rows are set to 1.
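The two ways of treating 0-variance rows can be sketched in base R on a toy matrix. This is only an illustration of the behaviour described above, not the package's implementation: the scores are made up and handle_zero_var() is a hypothetical helper.

```r
# Toy matrix: rows are documents, columns are semantic directions.
# The values are hypothetical stand-ins for per-document engagement scores.
scores <- rbind(
  doc1 = c(0.9, -0.4, 0.2),
  doc2 = c(-0.8, 0.5, -0.1),
  doc3 = c(0.5, 0.5, 0.5),    # zero variance across directions
  doc4 = c(0.25, 0.25, 0.25)  # zero variance across directions
)

# handle_zero_var() is a hypothetical helper, not a package function
handle_zero_var <- function(m, zero_action = c("drop", "ownclass")) {
  zero_action <- match.arg(zero_action)
  zero_var <- apply(m, 1, var) == 0
  if (zero_action == "drop") {
    # Remove the 0-variance rows before correlating the remaining rows
    return(abs(cor(t(m[!zero_var, , drop = FALSE]))))
  }
  # "ownclass": correlations with 0-variance rows are undefined (NA),
  # so they are filled in by rule instead
  cm <- suppressWarnings(abs(cor(t(m))))
  cm[zero_var, ] <- 0
  cm[, zero_var] <- 0
  cm[zero_var, zero_var] <- 1
  cm
}

dropped <- handle_zero_var(scores, "drop")
own <- handle_zero_var(scores, "ownclass")
print(dim(dropped))
print(own)
```

With 'drop' the correlation matrix shrinks to the rows that vary, while 'ownclass' keeps every row and ties the 0-variance rows only to each other.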
Returns a named list object of class CoCA. List elements include:
membership: document memberships
modules: schematic classes
cormat: correlation matrix
Taylor, Marshall A., and Dustin S. Stoltz.
(2020) 'Concept Class Analysis: A Method for Identifying Cultural
Schemas in Texts.' Sociological Science 7:544-569.
doi:10.15195/v7.a23
Boutyline, Andrei.
(2017) 'Improving the measurement of shared cultural
schemas with correlational class analysis: Theory and method.'
Sociological Science 4(15):353-393.
doi:10.15195/v4.a15
# load example word embeddings
data(ft_wv_sample)
# load example text
data(jfk_speech)
# minimal preprocessing
jfk_speech$sentence <- tolower(jfk_speech$sentence)
jfk_speech$sentence <- gsub("[[:punct:]]+", " ", jfk_speech$sentence)
# create DTM
dtm <- dtm_builder(jfk_speech, sentence, sentence_id)
# create semantic directions
gen <- data.frame(
  add = c("woman"),
  subtract = c("man")
)
die <- data.frame(
  add = c("alive"),
  subtract = c("die")
)
gen_dir <- get_direction(anchors = gen, wv = ft_wv_sample)
die_dir <- get_direction(anchors = die, wv = ft_wv_sample)
sem_dirs <- rbind(gen_dir, die_dir)
classes <- CoCA(
  dtm = dtm,
  wv = ft_wv_sample,
  directions = sem_dirs,
  filter_sig = TRUE,
  filter_value = 0.05,
  zero_action = "drop"
)
print(classes)
#> CoCA found 2 schematic classes in the corpus. Sizes: 45 39