I want to use trigrams because my corpus contains significant three-word phrases (e.g. "economies in transition", used to refer to developing countries). I used the following R code:
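library(wordVectors)
library(parallel)  # provides detectCores()

# Prepare the corpus, joining frequent phrases with underscores
statements <- prep_word2vec(basePath,
                            "docs.txt",
                            lowercase = TRUE, bundle_ngrams = 3, threshold = 50)

# Train 100-dimensional vectors on the prepared file
w2v <- train_word2vec("docs.txt",
                      output = "./stat_vecs.bin",
                      threads = detectCores(),
                      vectors = 100,
                      window = 7,
                      force = TRUE)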
It worked as expected, except that I got some four-word phrases (e.g. "so_that_they_can"). Why does this happen when bundle_ngrams is set to 3? Thanks!
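
One way to see how common these longer phrases are is to count the underscore-joined words in each token of the prepared file. The sketch below is a rough base R illustration, not part of the wordVectors package. The helper names `words_per_token` and `long_phrases` are hypothetical, and it assumes that underscores appear only where phrase bundling inserted them and that the prepared file is "docs.txt" in the working directory.

```r
# Count the words in each token, treating "_" as the joiner
# that phrase bundling inserts between words (an assumption).
words_per_token <- function(tokens) {
  lengths(strsplit(tokens, "_", fixed = TRUE))
}

# Tabulate the tokens made of more than `max_words` words,
# most frequent first.
long_phrases <- function(tokens, max_words = 3) {
  too_long <- tokens[words_per_token(tokens) > max_words]
  sort(table(too_long), decreasing = TRUE)
}

# Small made-up example, not taken from the real corpus.
example_tokens <- c("economies_in_transition", "so_that_they_can",
                    "the", "so_that_they_can", "united_nations")
print(long_phrases(example_tokens))

# On the real data, if the prepared file is present:
if (file.exists("docs.txt")) {
  tokens <- scan("docs.txt", what = character(), quote = "", quiet = TRUE)
  print(head(long_phrases(tokens), 20))
}
```

Listing the most frequent over-long tokens this way may help show whether they look like two shorter phrases joined together or like something else.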