An Introduction to Latent Semantic Analysis
Thomas K. Landauer, Peter W. Foltz, and Darrell Laham
Abstract
Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. LSA's adequacy as a reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject-matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and, as reported in three following articles in this issue, it accurately estimates passage coherence, the learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.
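The idea that a word's meaning is constrained by the contexts it does and does not appear in can be illustrated with a small sketch. It assumes the usual LSA recipe from Landauer and Dumais (1997), which this abstract does not spell out: build a word-by-context count matrix, weight it, reduce it with a truncated singular value decomposition (SVD), and compare words by the cosine of their reduced vectors. The toy corpus, the log-only weighting (the published method also applies an entropy term), the choice of k = 2 dimensions, the sum-of-word-vectors passage representation, and the helper names (`build_term_doc_matrix`, `lsa_word_vectors`, `passage_vector`) are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: each string is one "context" (a document or passage).
CORPUS = [
    "doctor treats patient in hospital",
    "nurse helps patient in hospital",
    "doctor and nurse work in hospital",
    "car drives on road",
    "truck drives on highway road",
    "driver parks car near road",
]


def build_term_doc_matrix(docs):
    """Return (vocab, index, X) where X[i, j] counts word i in context j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            X[index[w], j] += 1
    return vocab, index, X


def lsa_word_vectors(X, k):
    """Weight the word-by-context matrix and keep its top-k SVD dimensions."""
    # Assumed simplification: log weighting only. Landauer and Dumais (1997)
    # additionally divide each word's row by an entropy term.
    W = np.log1p(X)
    U, s, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k]  # row i is the reduced vector for word i


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def passage_vector(text, index, vecs):
    """Assumed convention: a passage is the sum of its known words' vectors."""
    rows = np.array([index[w] for w in text.split() if w in index], dtype=int)
    return vecs[rows].sum(axis=0)


vocab, index, X = build_term_doc_matrix(CORPUS)
vecs = lsa_word_vectors(X, k=2)

# "treats" and "helps" never share a context, so their raw count rows are orthogonal.
print("raw treats/helps:", cosine(X[index["treats"]], X[index["helps"]]))
# In the reduced space they become similar, because their contexts share other words.
print("LSA treats/helps:", cosine(vecs[index["treats"]], vecs[index["helps"]]))
print("LSA treats/truck:", cosine(vecs[index["treats"]], vecs[index["truck"]]))
print("passages:", cosine(passage_vector("nurse treats patient", index, vecs),
                          passage_vector("doctor helps patient", index, vecs)))
```

Here "treats" and "helps" never occur together, yet both occur alongside "patient", "in", and "hospital", so the reduced representation places them close together, while words from the unrelated vehicle contexts stay nearly orthogonal. On a corpus this small, k = 2 simply separates the two topics; realistic LSA spaces are built from a large corpus and keep a few hundred dimensions.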