It’s what most people say they want. So how do we know how happy people are? You can’t improve or understand what you can’t measure. In a blow to happiness, we’re very good at measuring economic indices, and this means we tend to focus on them. With hedonometer.org, our flagship instrument, we measure the happiness of large populations in near real time. We also link to applications of the instrument to literature, movies, and news.
Our Hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and we’re measuring how people present themselves to the outside world. For our first version of hedonometer.org, we’re using Twitter as a source but in principle we can expand to any data source in any language (more below). We’ll also be adding an API soon.
So this is just a start. We invite you to explore the Twitter time series and let us know what you think.
Hedonometer.org is based on the research of Peter Dodds and Chris Danforth and their team in the Computational Story Lab, including visualization by Andy Reagan, at the University of Vermont Complex Systems Center, and the technology of Brian Tivnan, Matt McMahon and their team from The MITRE Corporation. Many others have contributed to the research we've conducted using the instrument; their names appear near the bottom of this page.
The economist Francis Edgeworth coined the term in the late 1800s to describe "an ideally perfect instrument, a psychophysical machine, continually registering the height of pleasure experienced by an individual." [wikipedia]
To quantify the happiness of the atoms of language, we merged the 5,000 most frequent words from each of four corpora: Google Books, New York Times articles, music lyrics, and Twitter messages. The result is a composite set of roughly 10,000 unique words. Using Amazon’s Mechanical Turk service, we had each of these words scored on a nine-point scale of happiness: (1) sad to (9) happy. You can explore the average scores of each word on our words page, or download the entire list from the publication supplement here. On a few occasions, we've updated the word list to include new terms that were uncommon when the original survey was conducted.
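As a rough illustration of the merging step, the sketch below takes the most frequent words from each corpus and forms their union. It is not our production code: the function names, the toy corpora, and the whitespace tokenization are all assumptions made for the example.

```python
from collections import Counter

def top_words(tokens, n):
    """Return the n most frequent words in one corpus (hypothetical helper)."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def merge_vocabularies(corpora, n=5000):
    """Union of the top-n words from every corpus.

    `corpora` maps a corpus name to a list of tokens. Words shared
    between corpora are counted once, which is why four lists of
    5,000 words can collapse to a smaller composite set.
    """
    merged = set()
    for tokens in corpora.values():
        merged.update(top_words(tokens, n))
    return merged

# Toy corpora, invented for illustration only.
toy_corpora = {
    "books": "the the the cat cat sat".split(),
    "tweets": "lol lol lol the the happy".split(),
}
vocab = merge_vocabularies(toy_corpora, n=2)
print(sorted(vocab))
```

With `n=2`, "the" appears in both top lists, so the composite set has three words rather than four.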
Hedonometer.org currently measures Twitter’s Decahose API feed (formerly Gardenhose). The stream reflects a 10% random sampling of the roughly 500 million messages posted to the service daily, comprising roughly 100 GB of raw JSON each day. Words in messages we determine to be written in English are thrown into a large bag containing roughly 200 million words per day. This bag is then assigned a happiness score equal to the average happiness score of the words it contains. While "bag-of-words" approaches can be problematic for small collections of text, we have found the methodology to work well at large scale.
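The bag-of-words average can be sketched in a few lines. This is a minimal example under stated assumptions: the `scores` dictionary below holds made-up values rather than the real Mechanical Turk ratings, and words without a score are simply skipped.

```python
from collections import Counter

def average_happiness(tokens, scores):
    """Frequency-weighted mean happiness of a bag of words.

    Words missing from `scores` are ignored. Returns None when no
    scored word is present, since the average is undefined.
    """
    counts = Counter(t for t in tokens if t in scores)
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in counts.items()) / total

# Illustrative scores on the 1 (sad) to 9 (happy) scale; not real ratings.
toy_scores = {"happy": 8.0, "sad": 2.0, "the": 5.0}
bag = ["happy", "happy", "sad", "the", "zzz"]

day_score = average_happiness(bag, toy_scores)
print(day_score)  # (8.0 * 2 + 2.0 + 5.0) / 4 = 5.75
```

Each occurrence of a word counts once, so frequent words pull the average toward their score. The unscored token "zzz" has no effect.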
Is that even a question? Well, we do have a knob. It allows us to tune the relative importance of the most emotionally charged words by removing neutral words from consideration when determining the happiness of a given day. It also allows us to remove words that receive widely varying scores when rated on Mechanical Turk. Many profanities received average ratings between 4 and 6 due to the bimodal nature of their word score distribution. Details on the choice of Δh_avg = 1 can be found in Figure 2 of the foundational publication for Hedonometer. We also mask a small set of words with average sentiment outside of the neutral range. These are words whose scores we determined to be inappropriate for the task, typically because they are highly context dependent.
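One way to picture the knob is as a filter applied to the word list before averaging. The sketch below assumes the neutral point is 5 (the middle of the nine-point scale) and drops any word whose average score lies strictly within Δh_avg of it; how the boundary values are treated, the function name, and the toy scores are assumptions of this example, not a statement of our implementation.

```python
NEUTRAL = 5.0  # assumed midpoint of the 1-to-9 scale

def apply_lens(scores, delta=1.0, masked=frozenset()):
    """Keep only emotionally charged, unmasked words.

    A word is dropped when its score is strictly within `delta` of
    NEUTRAL (for delta = 1, strictly between 4 and 6), or when it
    appears in the hand-picked `masked` set.
    """
    return {
        word: h
        for word, h in scores.items()
        if abs(h - NEUTRAL) >= delta and word not in masked
    }

# Illustrative scores and mask; not the real ratings or mask list.
toy_scores = {"joy": 8.2, "the": 4.98, "grief": 1.9, "maybe": 5.5, "edgy": 3.0}
kept = apply_lens(toy_scores, delta=1.0, masked={"edgy"})
print(sorted(kept))
```

Here "the" and "maybe" fall inside the neutral band and are removed, while "edgy" is removed by the mask even though its score lies outside the band.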
Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people. However, there are hundreds of millions of people presently using the website to express their activities and interests, and as such it is an important social signal. According to Pew, 1 in 5 adult Americans use Twitter, including our current President as you may have heard.
Yes! And Twitter’s demographics have also changed over time. Nevertheless, we’re using Twitter as our initial data source for a few reasons:
Many people presume this day will be one of clear positivity. While we do see positive words such as “celebration” appearing, the overall language of the day on Twitter reflected that a very negatively viewed character met a very negative end. It was a day of complex emotion which is best explored in the word shift for the day, rather than the single number of its average happiness.
In our Computational Story Lab blog we describe research projects in which we use our hedonometer to characterize happiness variations with respect to geography, network topology, demographics, and socio-economic data. For example, here’s a map of the US with cities colored by happiness:
For the full story of our hedonometer algorithm, please read our foundational paper describing its construction:
We have scored 10 languages, revealing a universal positivity bias in human language. For more information, see our paper in PNAS. We have analyzed other languages as well, and are working to get sentiment time series online.
We are currently developing a principled method to identify relevant phrases, for example to deal with the multitude of both positive and negative uses of profanity. We expect to be scoring phrases instead of words, where appropriate, in the near future.
We are currently building a large-scale database of word-based measures for emotions other than happiness and sadness such as fear, anger, and surprise. We intend to incorporate these emotions into future versions of the hedonometer.
In addition to the raw data shown on this site, we are working to provide detailed analysis around brands, financial products, and US politics at Quokka Labs. More information is available on the website: quokkalabs.io.
Peter Dodds, Chris Danforth, Jane Adams, Sharon Alajajian, Nicholas Allgaier, Thayer Alshaabi, Michael Arnold, Catherine Bliss, Eric Clark, Emily Cody, Ethan Davis, Todd DeLuca, Suma Desu, David Dewhurst, Danne Elbers, Kameron Decker Harris, Fletcher Hazlehurst, Sophie Hodson, Kayla Horak, Ben Emery, Mike Foley, Morgan Frank, Ryan Gallagher, Darcy Glenn, Sandhya Gopchandani, Kelly Gothard, Tyler Gray, Max Green, Laura Jennings, Dilan Kiley, Isabel Kloumann, Ben Kotzen, Paul Lessard, Ross Lieb-Lappen, Kelsey Linnell, Ashley McKhann, Andy Metcalf, Tom McAndrew, Sven McCall, Josh Minot, Henry Mitchell, Lewis Mitchell, Kate Morrow, Eitan Pechenick, Michael Pellon, Aaron Powers, Andy Reagan, John Ring IV, Abby Ross, Lindsay Ross, Aaron Schwartz, Anne-Marie Stupinski, Matt Tretin, Lindsay Van Leir, Colin Van Oort, Brendan Whitney, and Jake Williams.
Brian Tivnan, Matt McMahon, Ivan Ramiscal, Mike Shadid, Pete Carrigan, Zach Furness, Zoe Henscheid, Garry Jacyna, Matt Koehler, and Karine Megerdoomian.
Mike Austin, Jim Bagrow, Josh Bongard, Josh Brown, Jim Burgmeier, Melody Burkins, Kate Danforth, Andrea Elledge, Maggie Eppstein, Bill Gottesman, Laurent Hebert-Dufresne, John Kaehny, Jim Lawson, Juniper Lovato, Aimee Picchi, Andrew Reece, Tony Richardson, Taylor Ricketts, Melissa Rubinchuk, John Tucker and Toph Tucker.
Thanks to Thiago Lins and David Peterman for helping identify events to annotate.
And special thanks go to Jonathan Harris and Sep Kamvar for their initial inspiration.