The AARON Test

Date: 2020-09-14

I wrote this essay for the course Philosophical Issues of Computer Science. Perhaps someone on the internet will appreciate it…

Converted from LaTeX using latex2html.


In 1968 Harold Cohen, a British artist, started working on a computer program, called AARON, that could create artistic images. This paper proposes a variation of the historical Turing Test, introduced by Alan Turing in 1950, in which AI-generated art pieces are compared with human-created ones. It then analyzes the implications of a scenario in which AI could generate art indistinguishable from human art.


In this paper I will try to re-interpret a classic concept in the field of Artificial Intelligence (AI), the Turing Test, and propose a new test that involves AI and art. More specifically, where Turing asked what conclusions we can draw if a machine is able to impersonate a human being and fool another one, I ask: what conclusions can we draw if AI can generate art indistinguishable from art generated by humans? To illustrate my reasoning, I will take as a reference one of the first computer programs able to generate original artistic images: AARON. I will put AARON in competition with an imaginary human artist in the AARON Test, a variation of the Turing Test. Then I will analyze the possible outcomes of the test, along with some considerations.

First of all, in Section II I will illustrate the two main concepts needed to understand what we are talking about: the Turing Test and AARON. In Section III I will illustrate the AARON Test, with a practical example and the related considerations. Section IV analyzes other examples of artistic imitation games, and Section V draws conclusions.


Turing Test

The Turing Test [Turing, 1950] is a test proposed by Alan Turing, an English mathematician widely considered the father of computer science and artificial intelligence. Although it is often described as a test of whether a machine can think, that is not what it measures. A better definition is that the Turing Test evaluates whether a machine can reproduce a performance comparable to that of a human in a specific task [Harnad, 2006]. In other words, if a machine passes the test, it has some capabilities that we usually consider exclusive to humans. The Turing Test takes the form of a game with three players: a man (A), a woman (B), and an interrogator (C). C stays in a separate room and knows A and B only by the labels X and Y. C has to determine which of the two is the man and which is the woman, solely by asking questions. Questions and answers are transmitted in typewritten form through two terminals, so that neither the voices nor the handwriting of A and B can help C identify them. A's goal is to cause C to make the wrong identification, while B's goal is to help the interrogator. The question proposed by Turing is: “What will happen when a machine takes the part of A in this game? Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?” [Turing, 1950, pp. 433–434].

Discussing Turing's work is outside the scope of this paper. One observation that will be helpful for our goal, however, is that this test evaluates, in particular, the ability of the machine to impersonate a human and to evaluate its own answers. While an intelligent machine is likely to have these characteristics, they cannot be sufficient to call the machine intelligent [Copeland, 2000]. Intelligence, and human intelligence in particular, is composed of many different parts.
Turing's test is nowadays considered limited and inadequate for the complexity of the question “can machines think?”. Nevertheless, it remains an extraordinary insight and, arguably, the starting point of the AI field. From now on, we will denote by (a) the modality in which the role of A is assigned to a man, and by (b) the modality in which it is assigned to a machine.

Figure 1: Simple diagram representing the Turing Test. C's goal is to identify who is the man and who is the woman; B's goal is to help C, playing as herself. A's goal is to fool C: to do so, he or it has to impersonate a woman.


AARON is a computer program able to create original artistic images [Cohen, 1995]. It was written by Harold Cohen, a British painter, during his time at the University of California, San Diego. Cohen started from a question: “What are the minimum conditions under which a set of marks functions as an image?” Initially AARON could only create abstract drawings, but its creator later added more capabilities, such as drawing concrete objects and human figures [Cohen, 2014]. A program like AARON raises many philosophical issues, both in computational creativity and in art itself. For example, can a computer program really be creative? I will not consider these questions here. As far as this work is concerned, AARON is a computer program that appears creative, since it can output works (in particular, paintings) that differ from every other existing piece of art. Whether AARON is indeed creative is not relevant here; the technology behind the program is even less so.

Figure 2: Nameless painting generated by AARON, Computer Museum, Boston, MA, 1995.

The AARON Test

Consider now a game with three players: two artists (A and B) and an interrogator (C). C stays in a separate room and knows A and B only by the labels X and Y. C is told that one of A and B is a machine (even when this is not true). C has to determine which of them is the human artist and which is not, solely by looking at pictures of their paintings. The goal of both A and B is to convince C that they are the real artist. The question I propose is: “What will happen when a machine, like AARON, takes the part of A in this game? Will the interrogator indicate B as the computer as often as he does when the game is played between two real artists?” From now on, as we did for the Turing Test, we will denote by (a) the modality in which both A and B are real artists, and by (b) the modality in which A is a machine and B is a real artist.

First, let me point out the main difference from the Turing Test. In the Turing Test, the game is first played between three humans, with no indication that one of them could be a machine. Only later is a machine introduced into the game, in the role of A. The machine, exactly like the man, has to convince C that it is the woman.

In the AARON Test, by contrast, I assume from the start that C is told that one of A and B is a machine. In case (a) this is false, since both A and B are human; in case (b) it is true. The reason is that C needs a well-defined goal. In the Turing Test the goal is to distinguish the woman from the man: it is a tangible goal, with precise conditions for deciding whether it has been reached. In the AARON Test, however, it is much harder to set an equally effective goal. To some extent art is subjective, and questions like “which artwork is more realistic?” or “which artwork do you like the most?” risk producing ill-defined goals. Telling C that one of A and B is a machine, on the other hand, forces her or him to take a stand, also from a psychological point of view. For example, Hong and Curran [Hong and Curran, 2019] found that AI-created artworks are generally considered less valuable from an artistic point of view than human-created artworks. If C shares such a prejudice, C has an incentive to choose carefully.

In case (a) the Turing Test clearly has a correct answer, while the AARON Test does not, because both A and B are human. However, this is not a problem. Case (a) serves as a control group or, in simple words, as a count of how many times C indicates B as the machine when no machine is actually playing. I will explain the usefulness of this data after pointing out several possible misunderstandings and gray zones. The rest of this section is devoted to strictly delimiting the scope of this test: only in this way can we draw meaningful conclusions from it.

Figure 3: Simple diagram representing the AARON Test. C's goal is to identify who is the artist and who is the machine; the goal both for A and B is to convince C that they are the real artist.


As mentioned, A and B are required to be real artists, at least in modality (a); in modality (b), only B is. However, giving a precise definition of artist may be a problem. It may be argued that, in a sense, anybody could be an artist. For this test, let me say that an artist is a person regarded as such by at least some critics. The artist need not be world-famous. C, on the other hand, may be an ordinary person with an ordinary knowledge of art. The only important requirement is that C does not know A and B.


To establish which of A and B is the real artist, C will only have access to a series of pictures of paintings produced by A and B. This prevents C from analyzing the technique of the brush strokes on the canvas: the game between A and B should be played on a semantic plane rather than a mechanical one. The paintings must be original, in the sense that they cannot be considered plagiarism of an existing human-made artwork. However, we are not investigating the algorithm used to produce them. A neural network trained on a set of paintings is acceptable, but other approaches are also viable. Moreover, a few words should be spent on the style of the paintings. Dealing with modern art or very particular categories of paintings can be difficult. Therefore, for the scope of this paper, the artworks should be paintings able to arouse the common sense of beauty and harmony that is typical of the visual arts. As a reference, the two paintings used as an example in Figure 3 (The Starry Night by Vincent van Gogh and Impression, Sunrise by Claude Monet) are suitable for the game. Lastly, C must never have seen the paintings before. Otherwise, she or he would immediately understand which painting is by the human and which is not.

A practical example

Suppose that we want to play this game; we can imagine how it would go. Consider first modality (a). We would ask two real artists for help; let's call them Picarro and Menet. Then we would ask another person, let's call him Bill, to play the part of C. We would commission 100 original paintings from each of Picarro and Menet, without particular instructions: they just have to be good-looking artworks, even conventional ones. After waiting for them to finish, we could start the game. We would send Bill 100 pairs of pictures of paintings. Each pair would consist of a picture of a painting by Picarro (labeled A) and a picture of a painting by Menet (labeled B). For each pair, Bill would have to decide which painting was made by the real artist, the other being, as far as he knows, the one made by the machine. By the end, every painting will have been used exactly once. After some time, we would be able to collect Bill's answers:

Table 1: Possible result of the AARON Test in modality (a): both A and B are real artists.
|                       | A (Picarro) | B (Menet) |
|-----------------------|-------------|-----------|
| Which is the machine? | 55          | 45        |

I made up these numbers, but I hope they are plausible. Indeed, it is probable (or at least conceivable) that such a challenge would end in a near tie. In the hypothesized scenario, Menet was identified as the real artist 55 times, convincing Bill that he was the human. In those same 55 pairs Picarro was indicated as the machine: his paintings were less convincing to Bill.
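The pairing procedure described for modality (a) can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the original protocol. The function names `build_pairs` and `tally` are hypothetical, as are the string identifiers for paintings and the random shuffling that decides which paintings end up paired. The sketch only enforces the rules stated above. Both artists submit the same number of paintings, each painting appears in exactly one pair, and each of C's answers names either A or B as the machine.

```python
import random
from collections import Counter


def build_pairs(paintings_a, paintings_b, seed=None):
    """Pair A's and B's paintings one-to-one (hypothetical helper).

    Each painting appears in exactly one pair; the first element of a
    pair is shown to C labelled A, the second labelled B.
    """
    if len(paintings_a) != len(paintings_b):
        raise ValueError("A and B must submit the same number of paintings")
    if len(set(paintings_a)) != len(paintings_a) or len(set(paintings_b)) != len(paintings_b):
        raise ValueError("each painting may be submitted only once")
    rng = random.Random(seed)
    a, b = list(paintings_a), list(paintings_b)
    # Assumption: shuffle independently so pairing does not follow production order
    rng.shuffle(a)
    rng.shuffle(b)
    return list(zip(a, b))


def tally(answers):
    """Count how many times C indicated A or B as the machine."""
    counts = Counter(answers)
    unexpected = set(counts) - {"A", "B"}
    if unexpected:
        raise ValueError(f"unexpected answers: {unexpected}")
    # Counter returns 0 for a label that never occurs
    return {"A": counts["A"], "B": counts["B"]}


if __name__ == "__main__":
    picarro = [f"picarro_{i:03d}" for i in range(100)]
    menet = [f"menet_{i:03d}" for i in range(100)]
    pairs = build_pairs(picarro, menet, seed=42)
    print(len(pairs), pairs[0])
    # Bill's answers would be collected by hand; these reproduce the invented Table 1
    answers = ["A"] * 55 + ["B"] * 45
    print(tally(answers))
```

Shuffling both lists is only one design choice. Pairing paintings in the order they were produced would also satisfy the "each painting used once" rule, but it might let C pick up on how an artist's style drifts over time.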

Now consider modality (b). We would ask Bill to play again, but this time AARON, our machine, would take the role of A. We would run it and wait for it to generate 100 paintings without particular constraints. Then we would ask Menet to produce 100 new paintings, as he did the previous time. We would pair the pictures of these paintings as before and send them to Bill. Again, after some time, we would collect Bill's answers:

Table 2: Possible result of the AARON Test in modality (b): A is a machine, while B is a real artist.
|                       | A (AARON) | B (Menet) |
|-----------------------|-----------|-----------|
| Which is the machine? | 40        | 60        |

Suppose we obtained values similar to these. Again, I invented them, but they are entirely possible. AARON, the machine, would outperform Menet: Bill would indicate Menet as the machine 60 times out of 100, more often than the 45 times he did so in modality (a). In other words, AARON could fool C, convincing her or him that a computer was the actual human artist.
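Modality (a) works as a control group because the two tables can be compared. The quantity of interest is how often B (Menet) is indicated as the machine when A is a human (45 times out of 100) versus when A is AARON (60 times out of 100). The essay does not prescribe any statistical procedure. As one possible choice, the sketch below applies a standard one-sided two-proportion z-test (normal approximation) to the invented numbers of Tables 1 and 2. The function name `two_proportion_z` is hypothetical. The test assumes that the 100 judgments in each modality are independent, which is only an approximation when the same interrogator judges every pair.

```python
import math


def two_proportion_z(count_a, n_a, count_b, n_b):
    """One-sided two-proportion z-test (normal approximation).

    count_a / n_a: how often B was indicated as the machine in modality (a)
    count_b / n_b: how often B was indicated as the machine in modality (b)
    Returns (z, p), where p is the probability, under the null hypothesis of
    equal rates, of observing a z at least this large.
    """
    p_a = count_a / n_a
    p_b = count_b / n_b
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("test undefined when B is always or never chosen")
    z = (p_b - p_a) / se
    # Upper tail of the standard normal: P(Z >= z) = erfc(z / sqrt(2)) / 2
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Invented counts from Tables 1 and 2: B (Menet) indicated as the machine
z, p = two_proportion_z(45, 100, 60, 100)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With these numbers z is about 2.1 and the one-sided p-value is roughly 0.017. Under the test's assumptions, a gap between 45 and 60 would therefore be unlikely if Bill indicated Menet as the machine at the same rate in both modalities. A near tie like Table 1, taken on its own, says nothing about AARON; only the comparison between the two modalities does.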

What would this entail for us, as humans? First of all, I wanted C to answer as spontaneously as possible, and forcing her or him to take a position on who was the artist and who was the computer is how I achieved that. Now, we might be tempted to say that AARON produced art as a real artist would, and that this is why it won the competition. I will now propose a counterargument to this statement, and this counterargument is the main message of this paper. To do this, I have to introduce two concepts: the concept of art and Searle's Chinese room argument. Obviously, I cannot give a definitive and exhaustive definition of art here, also because such a definition probably does not exist yet. However, there is one definition that I find well suited to this context. In his work “Languages of Art”, Goodman [Goodman, 1976] proposed a new view of the concept of art. According to him, art can be considered a symbolic activity, similar to natural language. Just as we use English or Italian to communicate, paintings, sculptures and other artworks are also composed of symbols. As such, artworks require interpretation, like a sentence written in English or Italian. Despite the great complexity of Goodman's work, this minimal definition is enough for the argument we are interested in.

The Chinese room argument was proposed by the philosopher John Searle in 1980 [Searle, 1980]. In this argument, Searle imagines a computer that behaves as if it understands Chinese. In particular, the computer receives questions in Chinese, runs an algorithm that produces a reasonable answer to every possible question in Chinese, and returns the answer. Searle observes that a human could do the same work as the computer. All that is needed is a book, written in English, that tells the man which Chinese symbols to use as an answer to a given question in Chinese. The man could blindly look up the symbols of the question in the book and, following its instructions, produce an answer in correct Chinese. Nonetheless, if the man does not speak Chinese, we cannot say that he understood the answers he provided, nor the questions. He just blindly followed a set of rules and, according to them, manipulated some symbols. Exactly the same could be said of the computer.
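A deliberately toy sketch makes the rule-following described by Searle concrete. Here the English rule book is modeled as a lookup table from question strings to answer strings. This is a simplifying assumption: a rule book able to answer every possible question could not be a finite table of whole sentences, and Searle's argument does not depend on any particular form of rules. The entries, the fallback answer and the name `chinese_room` are hypothetical. The English translations in the comments are for the reader only. The function itself only compares and copies sequences of characters, and nothing in it represents what they mean.

```python
# Hypothetical rule book: each entry pairs a string of Chinese symbols
# with the string of symbols to return. The program never uses meanings.
RULE_BOOK = {
    "你好吗？": "我很好，谢谢。",            # "How are you?" -> "I'm fine, thanks."
    "你会说中文吗？": "会，我说得很好。",    # "Can you speak Chinese?" -> "Yes, I speak it very well."
}

# Returned for any question the rule book does not cover
FALLBACK = "对不起，我不明白。"              # "Sorry, I don't understand."


def chinese_room(question: str) -> str:
    """Answer by matching symbols against the rule book, as Searle's man does."""
    return RULE_BOOK.get(question.strip(), FALLBACK)


if __name__ == "__main__":
    print(chinese_room("你会说中文吗？"))
```

The irony of the second entry is intentional. The program "claims" to speak Chinese well, yet it produces that claim exactly as it produces every other answer: by copying symbols.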

I hope that the connection I am making is becoming clear. In our hypothetical scenario, AARON passed the AARON Test in the same way a computer program can pass the Turing Test: by manipulating symbols according to specific rules, namely an algorithm. Passing the AARON Test does not imply that the computer can think, nor that it really understood the value of what it produced. The Chinese room argument applies directly. The only difference is that AARON manipulates symbols belonging to the language of art instead of the Chinese language.

While this consequence may look unimportant, I invite the reader to imagine, for a moment, a computer program capable of winning the AARON Test against any human. More precisely, imagine a computer program that fools the interrogator regardless of who plays the real artist and who plays the interrogator. The issue then becomes more serious: at that point, would we still think that machines cannot be intelligent? My answer does not change. Simply passing the AARON Test implies little about the intelligence of the machine, even in such an extreme situation.

It may look as if I am undermining my own work. Instead, I am placing the AARON Test in a more precise context, and this is where the value of this work lies. Anyone who wants to use art to test whether a computer can think should consider that a test similar to the AARON Test would be subject to the limitations described above. However, if the AARON Test is used for other goals (for example, investigating our perception of art), then it can be valuable.

Other examples of artistic imitation games

This work is not the first to propose an imitation game related to the artistic field. Probably the most important is the one by Bishop and Boden, who proposed a variation of the Turing Test [Bishop and Boden, 2010]. According to them, to pass the test, the machine should be able to produce art that:
  1. is indistinguishable from art produced by a human being; and/or
  2. is seen as having as much aesthetic value as art produced by a human being.
There are at least two differences between their proposal and this work. The first is that the AARON Test is not concerned with the aesthetic value of the artwork. This is mainly because I find aesthetic judgments particularly hard to use as data, given the artistic issues they raise. The second difference is the formalization: I designed this test to be quantifiable and to run in a tightly controlled environment. Several features make the AARON Test more precise and well-defined: the interrogator is forced to choose between the two artworks, the number of pictures examined is fixed, and so on. To the best of my knowledge, no similarly formalized test has been presented.

However, there are some objections to the use of Turing-like tests in this area. In particular, Pease and Colton [Pease and Colton, 2011] summarize most of these counterarguments. The strongest one is that tests of this kind differ from the original Turing Test in an important respect: they lack interaction between the interrogator (C) and the two other players (A and B). In the Turing Test, C may ask questions in response to previous answers, while A and B can try to remain consistent with what they said before. This interaction is not present in the AARON Test or in other similar works. I acknowledge this difference, but I do not think it lowers the value of the test. Here, the goal is to have a specific and well-defined way of forcing humans to express, through their actions, their inner idea of what is art and what is not, even if they could not express it in natural language. In the Turing Test the interaction was necessary because, to demonstrate the ability to impersonate someone, one must be able to evaluate one's own answers, to maintain semantic coherence between answers, or to recognize pitfalls. In the AARON Test the computer should demonstrate the ability to create visual art, and this kind of art often does not involve interaction with the author.

Pease and Colton also observe that tests of this kind can penalize some forms of creativity. In other words, a computer may be creative in a way that is unknown or inconceivable to humans. In that case, even a truly creative computer program may fail the AARON Test. I accept this critique, and it is the reason why the AARON Test should not be considered an all-in-one creativity test for computers. More complex types of creativity may require other tests, for which the AARON Test is not suited.


In this paper I proposed a game similar to the Turing Test, called the AARON Test. The test evaluates the ability of a computer program to produce paintings that a human would attribute to a human artist. After a practical example, I argued that passing the AARON Test is not a meaningful demonstration of the machine's intelligence. Like the Turing Test, it demonstrates a particular and reproducible ability: the generation of forms and images able to arouse an artistic reaction in a human being. Indeed, the Chinese room argument, which applies to the Turing Test, is equally relevant to the AARON Test.

Although this is not the main concern of this paper, I would like to express a thought on the artistic value of AI-generated artworks. AARON's artworks should be considered together with the work of Harold Cohen. They are the result of the hard work of the artist, who used his programming skills to develop the algorithm. Today, AI in the artistic field is considered a tool, used by an artist who is increasingly becoming a data scientist or a programmer [Hertzmann, 2020]. An in-depth account of the particular relationship between Harold Cohen and AARON can be found in [Garcia, 2016].


Bishop and Boden, 2010
Bishop, M. and Boden, M. A. (2010). The Turing test and artistic creativity. Kybernetes.
Cohen, 1995
Cohen, H. (1995). The further exploits of AARON, painter. Stanford Humanities Review, 4(2):141–158.
Cohen, 2014
Cohen, H. (2014). ACM SIGGRAPH Awards - Harold Cohen, Distinguished Artist Award for Lifetime Achievement. Accessed May 18, 2020.
Copeland, 2000
Copeland, B. J. (2000). The Turing Test. Minds and Machines, 10(4):519–539.
Garcia, 2016
Garcia, C. (2016). Harold Cohen and AARON—A 40-Year Collaboration. Accessed May 18, 2020.
Goodman, 1976
Goodman, N. (1976). Languages of art: An approach to a theory of symbols. Hackett Publishing.
Harnad, 2006
Harnad, S. (2006). The annotation game: On Turing (1950) on computing, machinery, and intelligence. In The Turing test sourcebook: philosophical and methodological issues in the quest for the thinking computer. Kluwer.
Hertzmann, 2020
Hertzmann, A. (2020). Computers do not make art, people do. Communications of the ACM, 63(5):45–48.
Hong and Curran, 2019
Hong, J.-W. and Curran, N. M. (2019). Artificial Intelligence, Artists, and Art: Attitudes Toward Artwork Produced by Humans vs. Artificial Intelligence. ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), 15(2s):1–16.
Pease and Colton, 2011
Pease, A. and Colton, S. (2011). On impact and evaluation in computational creativity: A discussion of the Turing test and an alternative proposal. In Proceedings of the AISB symposium on AI and Philosophy, volume 39.
Searle, 1980
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3):417–424.
Turing, 1950
Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236):433–460.