The AARON Test
I wrote this essay for the course Philosophical Issues of Computer Science. Maybe someone on the internet will appreciate it...
Converted from LaTeX using latex2html.
Abstract
In 1968 Harold Cohen, a British artist, started to work on AARON, a computer program that could create artistic images. This paper proposes a variation of the historical Turing Test, introduced by Alan Turing in 1950, in which AI-generated art pieces are compared to human-created ones. The implications of a scenario in which AI could generate art indistinguishable from human art are then analyzed.
First, in Section II I will illustrate the two main concepts necessary to understand the discussion: the Turing Test and AARON. In Section III I will illustrate the AARON Test, with a practical example and the related considerations. Other examples of artistic imitation games are analyzed in Section IV, and conclusions are drawn in Section V.

The Turing Test [Turing, 1950] is a test proposed by Alan Turing, an English mathematician widely considered the father of computer science and artificial intelligence. While it is often described as a test to evaluate whether a machine can think, that is not the case. A better definition is that the Turing Test evaluates whether a machine is able to reproduce a performance comparable to that of a human in a specific task [Harnad, 2006]. In other words, if the machine is able to pass the test, it has some capabilities that we usually consider exclusive to humans.

The Turing Test takes the form of a game with three players: a man (A), a woman (B), and an interrogator (C). C stays in a room apart from A and B, whom C knows only as X and Y. C has to determine which of the other two is the man and which is the woman just by means of questions. The questions and their answers are transmitted in typewritten form, through two terminals, so that the voice or the handwriting of A and B cannot help C in identifying them. A’s goal is to try to cause C to make the wrong identification, while B’s goal is to help the interrogator. The question proposed by Turing is: “What will happen when a machine takes the part of A in this game? Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?” [Turing, 1950, pp. 433–434].
Discussing Turing's work is outside the scope of this paper; one observation, which will be helpful for our goal, is that this test evaluates, in particular, the ability of the machine to impersonate a human and to self-evaluate its own answers. While these are characteristics that an intelligent machine is likely to have, they cannot be sufficient to define the machine as intelligent [Copeland, 2000]. Intelligence, and human intelligence in particular, is composed of many different parts. Turing's work is nowadays considered limited and not adequate for the complexity of the question “can machines think?”; however, it remains an extraordinary insight and, probably, the starting point of the AI field. From now on, we will indicate with (a) the modality in which the role of A is assigned to a man, and with (b) the modality in which the role of A is assigned to a machine.
First, let me point out the main difference with respect to the Turing Test. In the Turing Test the game is normally played between three humans, with no indication that one of them could be a machine. Only later is a machine introduced into the game, in the role of A. The machine, exactly like the man, will have to convince C that he or she is talking to a woman.
In the AARON Test, instead, I suppose beforehand that C has been told that one of A and B is a machine. In the first case (a) this is actually false, since both A and B are humans. In the second case (b) it is true. The reason for this is that C has to have a well-defined goal. In the Turing Test the goal is to distinguish between the woman and the man: it is a tangible goal, with precise conditions to decide whether it has been reached or not. In the AARON Test, however, it is much more difficult to have an equally effective goal: to some extent art is subjective, and the risk is that questions like “which artwork is more realistic?” or “which artwork do you like the most?” may lead to goals that are not well defined. On the contrary, telling C that one of A and B is a machine forces her or him to take a stand, also from a psychological point of view. For example, Hong and Curran [Hong and Curran, 2019] found that AI-created artworks are generally considered less valuable from an artistic point of view than human-created artworks; if such a prejudice is present, C could be motivated to choose carefully.
It is clear that in case (a) a correct answer exists for the Turing Test, while it does not for the AARON Test, because both A and B are human. However, this is not a problem: case (a) is useful to establish a control group or, in simple words, to count how many times C indicates B as the machine. I will explain the usefulness of this data after having pointed out many possible misunderstandings and gray zones. The rest of this Section will be devoted to strictly delimiting the scope of this test: only in this way will we be able to extract meaningful considerations from it.
I made up these numbers, but I hope that they make sense. Indeed, it is probable (or at least conceivable) that a similar challenge would result in a close tie. In the hypothesised scenario, Menet was identified as the real artist 55 times: he convinced Bill more often than Picarro did. Picarro was classified as the machine: his paintings could not convince Bill enough.
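To make the idea of a "close tie" more concrete, here is a minimal sketch that checks how far a 55-to-45 split is from what an interrogator guessing at random would produce. It assumes that Bill gives one verdict for each of 100 pairs (the 100 is inferred from the description of the game, and the 55 is the invented figure above). The function name `two_sided_p_value` and the choice of an exact two-sided binomial test are my own assumptions for illustration, not part of the original experiment design.

```python
from math import comb


def two_sided_p_value(successes: int, trials: int) -> float:
    """Exact two-sided binomial test against pure guessing (p = 0.5).

    Sums the probabilities of all outcomes that lie at least as far
    from trials / 2 as the observed count does.
    """
    expected = trials / 2
    observed_gap = abs(successes - expected)
    total = 0
    for k in range(trials + 1):
        if abs(k - expected) >= observed_gap:
            total += comb(trials, k)
    return total / 2 ** trials


MENET_WINS = 55  # invented figure from the hypothesised scenario
PAIRS = 100      # assumption: one verdict per pair, 100 pairs in total

p_value = two_sided_p_value(MENET_WINS, PAIRS)
print(f"Menet chosen in {MENET_WINS / PAIRS:.0%} of pairs, p = {p_value:.3f}")
```

Under these assumptions the resulting p-value is well above conventional thresholds such as 0.05, so the control game (a) would not let us say that Bill prefers one human painter over the other. The same function could be applied to the counts collected in modality (b).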
Consider now the modality (b). We would ask Bill to play again. This time, however, the role of A would be taken by AARON, our machine. We would activate it and wait for it to generate 100 paintings without particular constraints. Then, we would ask Menet to produce 100 other paintings, as he did the previous time. We would then pair the pictures of these paintings in the same way as before, and send them to Bill. Again, after some time, we would collect Bill's answers:
Consider the case in which we obtain values similar to these; again, I have invented them, but they are entirely plausible. AARON, the machine, would outperform Menet. In other words, AARON would have fooled C, convincing her or him that a computer was the actual human artist.
What would this entail for us, as humans? First of all, I want to highlight that I wanted C to answer in the most spontaneous way: pushing her or him to take a position on who was the artist and who was the computer is the way I achieved that. Now, at first sight, we would like to say that AARON produced art as a real artist would do, and that this is why it won the competition. I will now propose a counterargument to this statement, and this counterargument is the main message of this paper. To do this, I have to introduce two concepts: the concept of art and the Chinese room argument by Searle.

Obviously, I cannot give a definitive and exhaustive definition of art here, also because such a definition probably does not exist yet. However, there is one definition which I find well suited to this context. In his work “Languages of Art”, Goodman [Goodman, 1976] proposed a new view on the concept of art. According to him, art can be considered a symbolic activity, similar to natural language. Just as we use English or Italian to communicate, paintings, sculptures and other artworks are also composed of symbols. As such, artworks require interpretation, like a sentence written in English or Italian. Despite the great complexity of Goodman's work, this minimal definition is enough for the theory we are interested in.
The Chinese room argument was proposed by the philosopher John Searle in 1980 [Searle, 1980]. In this argument Searle imagines a computer able to behave as if it understood Chinese. In particular, the computer is able to receive questions in Chinese, run an algorithm which produces a reasonable answer to every possible question in Chinese, and return the answer. Searle notices that a human could do the same work as the computer. All that is needed is a book, written in English, which tells the man which Chinese symbols to use as an answer, given a question in Chinese. The man could blindly look up in the book the Chinese symbols written in the question. Then, following the book, he could produce an answer in correct Chinese. Nonetheless, if the man does not speak Chinese, we cannot say that he understood the answers he provided, nor the questions. He just blindly followed a set of rules and, according to them, manipulated some symbols. Exactly the same could be said of the computer.
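The rule-following described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the rule book entries, the fallback reply and the function name `operator_reply` are hypothetical, and a finite lookup table is a drastic simplification of a book that covers every possible question.

```python
# Hypothetical rule book: maps a question (a string of Chinese symbols)
# to the answer to hand back. The entries are illustrative assumptions
# and do not come from Searle's paper.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",
    "你会说中文吗？": "会，我会说中文。",
}

# Hypothetical fallback symbols, returned when the question is not in the book.
DEFAULT_REPLY = "请再说一遍。"


def operator_reply(question: str) -> str:
    """Return the answer that the rule book prescribes for these symbols.

    The lookup only compares sequences of symbols; nothing in this
    function models what the symbols mean.
    """
    return RULE_BOOK.get(question, DEFAULT_REPLY)


if __name__ == "__main__":
    print(operator_reply("你好吗？"))
```

The point of the sketch is that the function returns well-formed answers while containing no representation of meaning at all, which is exactly the situation of the man in the room.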
I hope that the connection I am making is becoming clear. AARON passed the AARON Test in the same way a computer program can pass the Turing Test: by manipulating symbols according to specific rules; in particular, according to an algorithm. Even if AARON passes the AARON Test, this implies neither that the computer can think nor that it really understood the value of what it produced. This can be shown with the Chinese room argument, the only difference being that, instead of the Chinese language, AARON manipulated symbols belonging to the language of art.
While this consequence may look unimportant, I invite the reader to imagine, for a moment, that there exists a computer program capable of winning the AARON Test against any human. To be clearer: a computer program which can win and fool the interrogator regardless of who plays the role of the real artist and who plays the role of the interrogator. The issue becomes more serious: at that point, would we still think that machines cannot be intelligent? My answer does not change: simply passing the AARON Test says little about the intelligence of the machine, even in such an extreme situation.
It may look like I am diminishing my own work. Instead, I am framing the AARON Test in a more precise context. The value of this work lies here: anyone who wants to use art to create a test of a computer's ability to think should consider that something similar to the AARON Test would be subject to the aforementioned limitations. However, if the AARON Test is used with other goals (for example, investigating our perception of art), then it can be valuable.

An artistic version of the Turing Test is discussed in [Bishop and Boden, 2010]. According to them, to pass the test, the machine should be able to produce art which:
- is indistinguishable from one produced by a human being; and/or
- was seen as having as much aesthetic value as one produced by a human being.
However, there are some objections to the use of tests similar to the Turing Test in this area. In particular, [Pease and Colton, 2011] summarizes most of these counterarguments. The strongest one is that tests of this kind differ from the original Turing Test in an important way: they lack interaction between the interrogator (C) and the two other players (A and B). In fact, in the Turing Test, C may ask questions in response to previous answers, while A and B can answer trying to be coherent with what they previously said. This interaction is not present in the AARON Test or in other similar works. I acknowledge this difference, but I do not think it lowers the value of the test. Here, the goal is to have a specific and well-defined way of forcing a human to express, through their actions, their inner idea of what is art and what is not, even if they could not express it in natural language. In the Turing Test the interaction was necessary because, in order to demonstrate the ability to impersonate, it is fundamental to be able to evaluate one's own answers, to maintain semantic coherence between answers, and to recognize pitfalls. In the AARON Test the computer should demonstrate the ability to create visual art; often, this kind of art does not involve interaction with the author.

Another consideration made by Pease and Colton is that tests of this kind can penalize some forms of creativity. In other words, they say that a computer may be creative in a way unknown or inconceivable to humans; in that case, even a really creative computer program may not pass the AARON Test. I accept this critique, and this is the reason why the AARON Test should not be considered an all-in-one creativity test for computers. More complex types of creativity may require other tests, the AARON Test not being one of them.

See also [Hertzmann, 2020]. A closer look at the particular relationship between Harold Cohen and AARON can be found in [Garcia, 2016].
- [Bishop and Boden, 2010] Bishop, M. and Boden, M. A. (2010). The Turing test and artistic creativity. Kybernetes.
- [Cohen, 1995] Cohen, H. (1995). The further exploits of AARON, painter. Stanford Humanities Review, 4(2):141–158.
- [Cohen, 2014] Cohen, H. (2014). ACM SIGGRAPH Awards - Harold Cohen, Distinguished Artist Award for Lifetime Achievement. Accessed May 18, 2020. https://www.youtube.com/watch?v=_Xbt8lzWxIQ.
- [Copeland, 2000] Copeland, B. J. (2000). The Turing Test. Minds and Machines, 10(4):519–539.
- [Garcia, 2016] Garcia, C. (2016). Harold Cohen and AARON—A 40-Year Collaboration. Accessed May 18, 2020. https://computerhistory.org/blog/harold-cohen-and-aaron-a-40-year-collaboration/.
- [Goodman, 1976] Goodman, N. (1976). Languages of Art: An Approach to a Theory of Symbols. Hackett Publishing.
- [Harnad, 2006] Harnad, S. (2006). The annotation game: On Turing (1950) on computing, machinery, and intelligence. In The Turing test sourcebook: philosophical and methodological issues in the quest for the thinking computer. Kluwer.
- [Hertzmann, 2020] Hertzmann, A. (2020). Computers do not make art, people do. Communications of the ACM, 63(5):45–48.
- [Hong and Curran, 2019] Hong, J.-W. and Curran, N. M. (2019). Artificial Intelligence, Artists, and Art: Attitudes Toward Artwork Produced by Humans vs. Artificial Intelligence. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(2s):1–16.
- [Pease and Colton, 2011] Pease, A. and Colton, S. (2011). On impact and evaluation in computational creativity: A discussion of the Turing test and an alternative proposal. In Proceedings of the AISB symposium on AI and Philosophy, volume 39.
- [Searle, 1980] Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3):417–424.
- [Turing, 1950] Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236):433–460.