Is Siri human? — A primer on natural language processing.
Historians point to several moments in time as turning points in our evolution. The Agricultural Revolution and the Scientific Revolution, for example, marked periods when our species developed exponentially. However, the first of these significant moments — the one that set our species, Homo sapiens, apart from other primates — is referred to as the Cognitive Revolution. The Cognitive Revolution marked the point at which our ancestors began establishing the cultural norms that differentiate us as a species. According to researchers, the most important thing that made this possible was language.
If the ability to communicate through speech is one of the things that define us as humans — what does it mean now that machines can communicate with us (and one another)? The purpose of this blog post is to explore the branch of Artificial Intelligence that deals with verbal and written communication — called Natural Language Processing — to understand how this will continue to impact our society as we further evolve along with technology. It also aims to answer the question — if language is the defining trait that made Homo sapiens, then is Siri human?
Artificial Intelligence (AI) is the demonstration by machines of characteristics typically associated with human intelligence, such as logic, reasoning, and problem solving. AI has become increasingly commonplace in recent years; applications include self-driving vehicles, recommendations for whom to follow on social media, and suggestions for what to watch while streaming, among many others.
Some of the sub-branches of natural language processing (NLP) are natural language generation, natural language understanding, and speech recognition. The rest of this post will define each and explain how together they make up the basic logic that defines virtual assistants such as Siri.
Within natural language generation, there is a range of complexity. In its simplest form, we see chatbots, a common tool companies use to automate online support. These often have predefined responses to common questions and will end the conversation with something like “Was this a helpful response?” Your answer helps train the algorithm to better answer future questions for others.
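A keyword-matching chatbot of this kind can be sketched in a few lines of Python. This is a minimal sketch; the questions, answers, and fallback message below are hypothetical placeholders, not any real company's support script:

```python
# Canned answers keyed by a keyword phrase (hypothetical examples).
RESPONSES = {
    "reset password": "You can reset your password from the account settings page.",
    "refund": "Refunds are typically processed within 5-7 business days.",
}

FALLBACK = "I'm not sure about that. Would you like to talk to a human agent?"

def reply(message: str) -> str:
    """Return the first canned answer whose keyword appears in the message."""
    text = message.lower()
    for keyword, answer in RESPONSES.items():
        if keyword in text:
            return answer
    return FALLBACK
```

Real support bots layer intent classification and conversation state on top of this, but the basic pattern of mapping a recognized question to a predefined response is the same.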
On the other end of the spectrum is the kind of text produced by models like Generative Pre-trained Transformer 3 (GPT-3). GPT-3 made headlines this past year when it was released by the company OpenAI as a tool for generating text that is indistinguishable from that of a human. GPT-3 uses a deep learning model trained on a massive corpus of human-written text (deep learning is a subset of machine learning that works on very large, complex datasets). Given a prompt, the model generates text by repeatedly predicting the most likely next word.
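GPT-3 itself is far too large to reproduce here, but the core idea of predicting a plausible next word from the words that came before can be illustrated with a toy bigram model. This is a drastically simplified sketch of next-word prediction, not the actual GPT-3 architecture:

```python
import random
from collections import defaultdict

def train_bigrams(corpus: str) -> dict:
    """Record, for each word, the words that follow it in the corpus."""
    words = corpus.split()
    following = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        following[current].append(nxt)
    return following

def generate(following: dict, start: str, length: int = 8) -> str:
    """Extend the text by repeatedly sampling a word seen after the last one."""
    out = [start]
    for _ in range(length):
        options = following.get(out[-1])
        if not options:
            break
        out.append(random.choice(options))
    return " ".join(out)
```

Where this toy model looks back only one word, GPT-3 conditions on thousands of preceding tokens using a neural network, which is what lets it produce long, coherent passages.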
Natural language understanding is the area of NLP that deals with taking a piece of text and processing it into a form a machine can work with. Two major examples are sentiment analysis and topic detection. Sentiment analysis reads a piece of text and determines whether it expresses a positive or negative view of its subject. A search for “sentiment analysis of tweets” will turn up plenty of examples of how to do this on Twitter data in Python. Topic detection, on the other hand, groups bodies of text based on their keywords. This is important when it comes to taking a large corpus of documents and getting an overview of their subjects.
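As a rough illustration of sentiment analysis, here is a toy lexicon-based scorer. Production systems use trained models rather than fixed word lists, and the lexicons below are tiny hypothetical examples:

```python
# Tiny hand-picked sentiment lexicons (hypothetical, for illustration only).
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"terrible", "hate", "awful", "sad", "bad"}

def sentiment(text: str) -> str:
    """Classify text by counting positive vs. negative lexicon words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even this crude approach captures the essence of the task: mapping free-form text to a structured judgment a machine can act on.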
Finally, speech recognition is something anyone with a smartphone is familiar with; it is the technology on which Siri is built. In the simplest terms, speech recognition turns audio waves into text that the algorithm can comprehend (natural language understanding), which is then turned back into words the user can understand (natural language generation). Lastly, that text is converted back into audio if Siri is giving you a verbal response.
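The round trip described above can be sketched as a chain of functions. Every stage here is a stub with a hard-coded answer, standing in for the large speech and language models a real assistant like Siri would use:

```python
def speech_to_text(audio: bytes) -> str:
    """Speech recognition: audio waves -> text (stubbed)."""
    return "what is the elevation of mount everest"

def understand(text: str) -> dict:
    """Natural language understanding: text -> structured intent (stubbed)."""
    return {"intent": "lookup_fact", "entity": "Mount Everest", "attribute": "elevation"}

def generate_response(intent: dict) -> str:
    """Natural language generation: intent -> answer text (stubbed)."""
    return "Mount Everest is 8,849 meters above sea level."

def text_to_speech(text: str) -> bytes:
    """Speech synthesis: answer text -> audio (stubbed as raw bytes)."""
    return text.encode("utf-8")

def assistant(audio: bytes) -> bytes:
    """Chain the four stages: audio in, spoken answer out."""
    return text_to_speech(generate_response(understand(speech_to_text(audio))))
```

The point of the sketch is the shape of the pipeline, not the stages themselves: a voice assistant is these sub-branches of NLP composed end to end.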
I wanted to write this blog post because I am very interested in natural language processing. I think one of the things that first attracted me to the topic was when I heard of the Chinese room thought experiment.
The idea is that a man sits in a room following an algorithm that, given some Chinese characters, lets him generate a perfect response to a Chinese speaker on the other end of the room (even though he himself does not know any Chinese). Because he is looking up every response, one can conclude that he does not actually know the answers to these questions; he is not truly answering them from his own knowledge.
In my opinion, the man in the room helps answer the question — “How smart is Siri, really?” I just asked her what the elevation of Mt. Everest was, and she gave me a perfect answer. As I mentioned at the beginning of this post, language is one of the defining characteristics that separate us from our earlier primate relatives — it’s what makes us human. However, Siri is just like the man in the Chinese room. She might be able to give us perfect answers, but she doesn’t quite possess the knowledge herself. Not yet, at least.