Chapter 11: Big Language
Advances in natural language processing (NLP) and Big Data techniques have allowed us to learn about the human mind through one of its richest outputs: language. In this chapter, we introduce the field of computational linguistics and work through examples of how to find natural language data and how to interpret the complexities present within it. The chapter discusses the major state-of-the-art methods in NLP and how they can be applied to psychological questions, including statistical learning, n-gram models, word embedding models, large language models, topic modeling, and sentiment analysis. The chapter concludes with ethical questions about the proliferation of chat "bots" that pervade our social networks and the importance of balanced training sets for NLP models.
- Learn how large language models work
- Try out examples of LLMs and natural language processing
- ChatGPT - one of the best-known LLMs, developed by OpenAI
- Claude - developed by Anthropic
- Gemini - developed by Google
- Mistral
- AI Dungeon - have an AI-generated choose-your-path adventure
- Dishgen - use AI to do meal-planning
- Visualize N-grams, embeddings, and analogies
- Is the text you're reading AI-generated?
- How can biased training sets impact NLP tools?
Check out this blog post by Microsoft data scientist Andreas Stoffelbauer on how LLMs work.
Check out this blog by Jay Alammar as well, which also has a short YouTube tutorial series and a printed book on LLMs.
Here is a list of current LLMs and LLM tools to try. Because LLMs are exploding, these systems are constantly changing (and the space is getting a little crowded!):
Visualize trends in N-gram usage in text over time with the Google Books Ngram Viewer.
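To make the idea behind an n-gram trend concrete, here is a minimal sketch in Python. It counts how often a phrase occurs in each year's text and divides by the total number of n-grams of the same length in that year. The tiny `corpus` dictionary and the function names are made up for illustration; this is not the Ngram Viewer's actual code or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(text, phrase):
    """Share of same-length n-grams in `text` that match `phrase`."""
    tokens = text.lower().split()          # naive whitespace tokenizer (assumption)
    target = tuple(phrase.lower().split())
    grams = ngrams(tokens, len(target))
    if not grams:
        return 0.0
    return Counter(grams)[target] / len(grams)


# Hypothetical toy corpus: one short "book" per year.
corpus = {
    1900: "the horse and the carriage and the horse",
    2000: "the car and the phone and the car",
}

# One relative frequency per year, which is the kind of value a trend line plots.
for year, text in sorted(corpus.items()):
    print(year, round(relative_frequency(text, "the horse"), 3))
```

A real analysis would need a proper tokenizer and far more text per year, but the normalization step (dividing by the number of n-grams in that year) is what makes years with different amounts of text comparable.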
Visualize word embeddings and analogies in a geometric space with this word embedding demo by Prof. Dave Touretzky.
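The geometry behind embedding analogies can be sketched in a few lines of Python. The three-dimensional vectors below are invented for illustration (real embeddings are learned from text and have hundreds of dimensions), and the `analogy` helper is a hypothetical name, not part of the demo. The sketch uses the common vector-offset approach: to answer "man is to king as woman is to ?", compute king - man + woman and pick the remaining word whose vector has the highest cosine similarity to the result.

```python
import math

# Hypothetical hand-made vectors; real embeddings are learned from large corpora.
vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.2],
    "apple":  [0.2, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity: the cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c):
    """Answer 'a is to b as c is to ?' using the offset b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words, as is conventional for this task.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "king", "woman"))
```

With these toy vectors the nearest remaining word is "queen". With real embeddings the result depends heavily on the training text, which is one way biased training data can surface in analogies.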
Interestingly, since I wrote about bots fighting bots for the book, the two main services that used AI to spot fake AI-generated reviews (Review Meta and FakeSpot) have been shut down. It will be interesting to see whether a new service fills this void. One option appears to be Fake Find, an AI-powered service for spotting fake (AI-generated) reviews. I have not tested its quality, though.
There are also several services aimed at detecting AI-generated text more broadly. The only one I have used is GPTZero, but the space seems to be getting crowded! (For the time being I will not list others here because I haven't tested their quality.)
Tatman (2017) reports that AI-based automatic captioning systems are biased across dialects. She tested captioning on videos of the "accent tag challenge" and found that automatic captioning was worst for Scottish dialects and for women's voices, even though those are equally valid examples of English speech. See an example here: