Theory-driven text analysis has made extensive use of psychological concept dictionaries, leading to a wide range of important results. These dictionaries have generally been applied through word count methods which have proven to be both simple and effective. In this paper, we introduce Distributed Dictionary Representations (DDR), a method that applies psychological dictionaries using semantic similarity rather than word counts. This allows for the measurement of the similarity between dictionaries and spans of text ranging from complete documents to individual words. We show how DDR enables dictionary authors to place greater emphasis on construct validity without sacrificing linguistic coverage. We further demonstrate the benefits of DDR on two real-world tasks and finally conduct an extensive study of the interaction between dictionary size and task performance. These studies allow us to examine how DDR and word count methods complement one another as tools for applying concept dictionaries and where each is best applied. Finally, we provide references to tools and resources to make this method both available and accessible to a broad psychological audience.

This work has also led to insights which have fed back into both linguistics and computer science. One notable discovery has been the importance of closed class terms to understanding psychological properties from language (Pennebaker 2011). A number of word classes such as determiners, pronouns, and conjunctions, and sub-classes such as modal verbs, are considered to be closed since they are relatively fixed, with words rarely added or removed. Given the Zipfian distribution of language (Powers 1998), these small sets of common words compose around 60% of many English texts. Preferring to focus on content words, many computational approaches dismissed these as “stopwords” (Wilbur and Sirotkin 1992) which could be safely ignored. However, psychological applications of dictionaries and word counts showed these to be essential to understanding a range of phenomena including emotional state (Pennebaker 1997), authorship identification (Boyd and Pennebaker 2015), and social hierarchies (Kacewicz et al.).

We demonstrate a novel method of combining psychological dictionary methods and distributed representations which indicates that these two methods are not only compatible, but that combining the two adds to the flexibility of both and opens new avenues for exploration. Our method, which we term Distributed Dictionary Representation (DDR), averages the representations of the words in a dictionary and uses that average to represent a given concept as a point in the semantic space. We can use this representation to provide a continuous measure for how similar other words are to a given concept. One advantage of this method is to improve the ability to apply dictionaries to small pieces of text (down to individual words). This is critical as more and more social scientific text analysis makes use of social media posts (Mitchell et al., 2013; Kern et al., 2014; Eichstaedt et al., 2015; Dehghani et al., 2016) which are often no more than a few words long. At that length, it is unlikely that any words from an open-class dictionary will be present to be counted. Prior social media research has noted precisely this difficulty (Gunn and Lester 2015) with the common solution being to aggregate multiple short posts into larger documents (Tumasjan et al.). The disadvantage is that we lose post-level granularity and the ability to track changes over time, critical in a number of areas such as clinical psychology.

DDR also has a number of benefits for dictionary development. Since the purpose of the dictionary is now to identify the core of a concept rather than identifying every possible word which might be associated with that concept, it is possible to produce a dictionary with a small list of the most salient words. This makes it easier for researchers to generate new dictionaries and apply them to explore theoretical concepts where the resources may not have been previously available for large-scale text analysis. By making use of distributional semantic similarity, researchers can focus on concept validity rather than dealing with linguistic issues.
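
The core idea of DDR described above, averaging the word representations in a dictionary and scoring text by its semantic similarity to that average, can be illustrated with a minimal sketch. Several things here are assumptions made for illustration only: the three-dimensional `TOY_VECTORS` are invented (real use would load pretrained word embeddings), cosine similarity is assumed as the similarity measure, a span of text is assumed to be represented by the average of its own word vectors, and the names `mean_vector`, `cosine`, and `ddr_similarity` are hypothetical rather than taken from any released tool.

```python
import math

# Toy word vectors, invented purely for illustration (assumption).
# Real use would load pretrained distributed representations instead.
TOY_VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "joy": [0.8, 0.2, 0.1],
    "glad": [0.7, 0.3, 0.1],
    "sad": [-0.8, 0.1, 0.2],
}


def mean_vector(words, vectors):
    """Average the vectors of the words that have a known representation."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None  # nothing to average
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ddr_similarity(dictionary_words, text_tokens, vectors):
    """Score a span of text against a concept dictionary.

    The dictionary is reduced to one point (the mean of its word vectors);
    the text is reduced the same way (an assumption of this sketch).
    Returns None when either side has no known words.
    """
    concept = mean_vector(dictionary_words, vectors)
    text = mean_vector(text_tokens, vectors)
    if concept is None or text is None:
        return None
    return cosine(concept, text)


# A deliberately small dictionary holding only the most salient words.
positive_dictionary = ["happy", "joy"]

# "glad" is not in the dictionary, so a word count would score it zero,
# but it still receives a continuous similarity score here.
score_glad = ddr_similarity(positive_dictionary, ["glad"], TOY_VECTORS)
score_sad = ddr_similarity(positive_dictionary, ["sad"], TOY_VECTORS)
print(score_glad, score_sad)
```

With these toy vectors, the single-word text "glad" scores much closer to the dictionary than "sad" does, even though neither word appears in the dictionary itself. This is the property that lets a short list of salient words stand in for a concept when the text is only a few words long.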