The Turing Test: Interpretations and Applications
The Turing test is a much discussed and often misconstrued topic in the philosophy of mind. In particular, there is disagreement over how the extension of the test to include machines should be understood. The differing interpretations are known as the traditional (or standard) reading and the literal reading, and the dividing factor is the role of the interrogator. It is held to be unclear in Turing’s paper whether the interrogator, in the machine version of the test, must identify which of two players is a machine, or whether the interrogator is to continue his original investigation and identify which of two players is the man and which the woman.
I will show that such distinctions are in fact irrelevant for Turing, although in an indirect fashion, and furthermore that such arguments overlook the more practical benefits of the test. I will consider the implications of this in comparison to standard uses of the test, then provide a slightly different method for applying it, and outline some of the benefits of such an application.
Turing was led by his work on mathematics and computing machinery to consider whether machines could think. However, this question is not well defined, given that the terms “machine” and “think” – the latter along with its associated concepts of cognition, mind, consciousness and soul – are poorly specified in language. Instead, it is claimed that Turing offered a description of a test such that machines passing it could be denoted as thinking. The basis of the test is the Imitation Game, and it was described by Turing in “Computing Machinery and Intelligence” (Turing 1950). Although he wrote others, it is this paper that I refer to as “Turing’s paper”.
THE IMITATION GAME
The initial version of the Imitation Game is played between three people – man A, woman B and judge C. It is the job of C to correctly identify the man and the woman. A is to make things difficult for the judge, for example by lying. The job of B is to help C, in any way possible. The players and the judge are each in separate rooms and cannot communicate by any means other than teletype machines. Further detail is given in Turing’s paper about what sort of questions can be asked, and the range of topics that could be discussed.
The game is then adapted by replacing A with a computing machine; the Imitation Game, as played by a computing machine A, a woman B, and a judge C, is commonly known as the Turing test. However, Turing did not clearly specify any further changes to the game; in particular, he did not mention any change in the responsibilities of the judge, and this has resulted in controversy and opposing interpretations.
INTERPRETATIONS OF THE TEST
In what is known as the standard or traditional reading, it is assumed that the role of the judge is to identify the woman in the first version, and so in the second version the judge is to identify the computer. Thus, the Turing test is stipulated as being a test of whether or not the computing machine succeeds in obfuscating the fact that it is a machine. If the judge cannot identify the machine from the other player, the machine passes the test. Proponents of this understanding would claim therefore that a machine that passes the Turing test is intelligent, by virtue of having shown ability similar to a human in use of language and inference. This is how the Turing test is said to suitably replace the question of whether or not machines can think.
As there is no clear specification of a change to the role of the judge in the Turing test, there are those who claim that the role of the judge should remain the same as in version 1: although the computing machine takes the place of the man, there is no reason given in the text to assume that the judge need know this. On this view, known as the literal reading, the judge should continue to try to distinguish between a man and a woman, regardless of the fact that there is no man present. This version could help to eliminate any bias on the part of the judge against machine intelligence.
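The difference between the two readings can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the names `Player`, `Game`, `version_1`, `version_2` and `judge_question`, and the wording of the questions, are hypothetical and do not come from Turing’s paper. It records who takes the parts of A and B in each version, and what the judge is asked under each reading.

```python
from dataclasses import dataclass

# All names here are hypothetical; this is an illustrative model only,
# not a formalism taken from Turing's paper.

@dataclass(frozen=True)
class Player:
    role: str  # "A" (tries to mislead the judge) or "B" (tries to help)
    kind: str  # "man", "woman" or "machine"

@dataclass(frozen=True)
class Game:
    a: Player
    b: Player

def version_1() -> Game:
    """Original Imitation Game: man A, woman B."""
    return Game(Player("A", "man"), Player("B", "woman"))

def version_2() -> Game:
    """Machine version: a computing machine takes the part of A."""
    return Game(Player("A", "machine"), Player("B", "woman"))

def judge_question(game: Game, reading: str) -> str:
    """Return the question put to judge C under the given reading."""
    if reading not in ("standard", "literal"):
        raise ValueError(f"unknown reading: {reading!r}")
    has_machine = "machine" in (game.a.kind, game.b.kind)
    if reading == "standard" and has_machine:
        # Standard reading: the judge's task changes once a machine plays.
        return "Which player is the machine?"
    # Version 1 under either reading, and version 2 under the literal
    # reading: the judge's task is left unchanged.
    return "Which player is the man and which is the woman?"
```

On this sketch the two readings agree about version 1 and differ only in what the judge is asked once the machine has taken the part of A, which is exactly the point on which Turing’s text is said to be silent.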
PICCININI AND THE STANDARD READING
Piccinini, for one, is in the standard camp, and provides various arguments to defend the standard reading against the literal (Piccinini 2000). He argues that the literal reading is not, in fact, literal; it does not specify whether the judge is to know that the machine has joined the game or not. Also, he believes that the test as understood in the literal sense does not provide an obvious reason for accepting it as a substitute for the original question as, for example, it is not clear why the ability to imitate a man imitating a woman is suitable evidence of intelligence. Piccinini claims that Turing, not being a sloppy thinker or writer, would have specified any need to conceal the truth of the matter from the judge, had he intended one.
The best support for the standard reading can be found in section 5 of Turing’s paper where, as Piccinini points out, reconciling Turing’s comments with the literal reading would require that a computing machine and a male playing the test must both imitate a woman. This shows the literal reading to depart even further from what would seem to be a sensible understanding of Turing’s words.
The points that Piccinini raises are reasonable; they do highlight ways in which the literal reading seems incongruous with Turing’s writings. However, Piccinini and indeed other advocates for the standard reading do not actually give any definitive proof. Clearly, despite the claim that Turing was not a sloppy writer, and the arguments given against the literal reading, there are those who find the standard reading suspect enough to warrant considering the literal reading. There must therefore be at least some reason to lend credence to another interpretation, or there would simply not be any debate on the matter.
LEARNING FROM THE LITERAL READING
The shift from the question of whether machines can think, to the specification of a game, leading on to a definition for a test, is a convoluted process during which Turing attempts to avoid certain ambiguities in the concepts both of machines and of thinking. Understanding that process is not easy. Perhaps the standard reading of the test is the reading Turing intended, yet it is still poorly applied. It could then be that the suspicion some have for the standard reading, along with the complex format of Turing’s paper, gives rise to the search for new ways to understand and apply the test.
As Turing predicted, the concept of thinking machines is more widely acceptable these days than it was in the 1950s; many people talk of their computers as succeeding or failing in tasks, and often take occurrences in the modern world to be results of actions taken by some sort of real-life person. For example, when receiving spam email some believe it has in fact been sent to them by someone or, upon receiving an unwanted telemarketing call, may demand to know how their number was discovered (despite the fact that automated calling systems can simply output any sequence of tones). But despite these idiosyncrasies, there is still a possibility for bias in the application of the Turing test, as a judge may be unwilling to accept that a machine can think, or may believe that even a thinking machine cannot trick him.
The ability of the test to actually prove intelligence is always in question. There are various arguments regarding the necessity or sufficiency of the test, and undermining examples such as that of perfectly intelligent beings that simply refuse to take part in it. Further claims are made about whether a device that passes the test could actually be thinking, for instance by declaring brute-force methods to be unacceptable. Block (Block 1981) and others would claim for example that there are limits to what a computing machine can achieve, and that those limits preclude thinking.
The initial Imitation Game clearly states that the judge is to identify which is the woman (and, as a consequence, which is the man) between players A and B. Under the standard reading the second version requires that the judge tries to identify which player of two is the machine. However, it does seem as if something of the essence of the game is lost here, in that the standard reading does not strongly enforce a requirement that either participant attempt to imitate the other in order to deceive the judge. This is quite a large change to a game called “The Imitation Game” and, I think, is the root source of alternative readings.
APPLYING THE TURING TEST
The initial purpose of the Turing test was to replace the unclear question of whether machines can think. Given the points raised from considering the differing readings of the test, it is therefore necessary to:
- discern what role gender identification plays in the game
- analyse the importance of imitation in achieving the aim of replacing the question with the test
- discover if that importance is overlooked in the standard reading
Carrying out these tasks, and considering the findings in light of the claims made for the Turing test and of its ability to prove intelligence, will demonstrate the benefits the Turing test actually provides over the question of whether machines can think, and exactly how those benefits can be realised without suspicion of inadequate interpretation.
There are already arguments against the Turing test as a necessary or sufficient condition for intelligence. But bearing in mind that the Turing test is based on the Imitation Game, and that the Imitation Game requires the judge to distinguish between the male and female players, any pretension of the test to being necessary or sufficient is easily dismissed, because there is no point at which the judge is required to make any claim whatsoever about intelligence. He is simply distinguishing between a man and a woman, and I presume that both were accepted as intelligent when Turing wrote the paper and are still so accepted now. When the game is changed to include a computing machine, the requirements made of the judge may change, but not to the effect that it becomes the responsibility of the judge to be the arbiter of all things intelligent.
If it were necessary for the judge to claim which player is intelligent, he would require criteria for doing so, which would leave him in the same difficult situation he would be in if asked to show how machines can think – or, indeed, if instead of playing version 1 of the game, he were asked to show how women or men think. In the second version of the game, the judge must distinguish between a person and a computing machine (on the standard reading), but he must not be biased and presume to show that one is therefore not intelligent. All he can show is whether or not he can tell the two apart.
Is it true that by modifying … [a] … computer to have an adequate storage, suitably increasing its speed of action, and providing it with an appropriate programme, … [a computing machine] … can be made to play satisfactorily the part of A in the imitation game, the part of B being taken by a man? (Turing 1950)
The Turing test is not a test which one passes or fails. If it were, the first version of the test would imply that a woman failing to identify herself as such to the judge would not be a woman. As womanhood is exemplified by certain physical rather than mental characteristics, it would certainly be an odd test for womanhood. The purpose of the test should therefore be construed only as being to discern whether or not one can take part in it. On this basis, there is no concern over those who choose not to take part (unless choice is an inhibitor of capability).
Furthermore, there is no concern as to whether the second version participants should pretend to be women or not. In fact, even if it were not a requirement of the Turing test, as the standard reading claims, a judge can specify that he wants the participants to partake for a while in a game where one tries to prove they are a woman and the other attempts to hinder their progress. This may not be of benefit to the judge, but it is still possible.
The role of gender identification in version 1 gives the judge a responsibility that is not contingent upon intelligence – as gender plays no role in intelligence. The imitation requirement highlights the necessity not to rely on preconceptions of thinking things when specifying the role of the judge. Thus, the standard reading of the test is acceptable, but it must be made clear in applying it that the role of the judge is not to claim that one of the participants is not a thinking thing.
This may seem, to those who have argued over whether or not the test is a proof for intelligence, to render it worthless, or circular. But this is a backward way of perceiving the situation. It is because of this lack of differentiation that the Turing test bestows the benefit of avoiding the requirement to define “thinking”. It is because the judge must be free from making such decisions, and because he should be part of a society that can accept computing machines as thinking things, that the test can be applied (those who claim that the literal reading removes bias are mistaken on this point: it does not remove bias, but it does imply that the judge has no inherent bias).
The most important and frequently reiterated rules for the test are those governing the available methods of communication. They define the parameters within which the game can take place. For example, in section 6 of his paper, where he discusses extrasensory perception and mind-reading (a passage that readers generally consider very odd), Turing specified that should such things be possible they must be guarded against in any implementation of the test. Presumably, this is because such abilities would enable the judge to know which player was which (assuming that only persons would have ESP – if not, the job of the judge may still be easier in that he may be able to tell if a player was lying). The problem with this is not that the game would be too simple, but that it would negate the game altogether. For example, in version 1, to be able to see that one player is female does not just make the game easy, but makes it pointless. Thus the rules of communication must always be set in such a fashion as to ensure that the role of the judge (noting that his role is not that of identifying intelligence) is not rendered trivial.
Turing did not claim that the test should be applied to each and every machine, but that it should be seen as a suitable example if a machine that could pass it can be reasonably expected to exist. He did mention that machines could be given a rating based on their performance in the test, though such a rating system highlights the fact that the test does not actually prove a machine to be intelligent. The purpose of the test is to show that some machine with sufficient memory and complexity could pass it, not that every machine must pass it. It cannot be a requirement of the test that the things taking part prove their intelligence, as this is just the thing that cannot yet (perhaps ever) be proven. Thus it is the purpose of the test to specify parameters of communication under which intelligent things can interact, for a given purpose; in the case of the Turing test, that of imitating a human person communicating via teletype.
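To illustrate how a rating differs from a verdict of pass or fail, a performance rating of this kind might be pictured as the proportion of rounds in which the judge makes the wrong identification. The Python sketch below rests on that assumption, which is mine rather than a formula given in Turing’s paper; the function name `misidentification_rate` and the record of rounds are hypothetical.

```python
from typing import Sequence

def misidentification_rate(judge_was_right: Sequence[bool]) -> float:
    """Fraction of rounds in which the judge identified the players wrongly.

    This is an assumed way of rating performance, not a definition
    taken from Turing's paper.
    """
    if not judge_was_right:
        raise ValueError("at least one round is required")
    wrong = sum(1 for right in judge_was_right if not right)
    return wrong / len(judge_was_right)

# Hypothetical record of ten rounds: True means the judge was right.
rounds = [True, True, False, True, False, True, True, True, False, True]
rating = misidentification_rate(rounds)
```

A rating of this sort says how well a participant played over a number of rounds; it makes no statement about whether the participant is intelligent.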
As Turing committed suicide only a few years after writing the paper in question, and as none of the records of his discussions of the test prior to his death seem to provide sufficient detail to appease opposing readers, I can offer no better proof for my claims. Instead, I will give examples of ways in which the test has been presented in various popular philosophical discussions, and offer suggestions for altering the application of the test. In so doing, it may be possible to overcome some of the apprehensions that a reader of the test may have.
As a final note on the applications of the Turing test as I have described it, I would claim that it is possible to fit Turing’s descriptions of the test with how I have presented it here. Perhaps Andrew Hodges (Hodges 1997), as something of a Turing biographer, would disagree – but I do not feel that my reading requires a misrepresentation of Turing’s writings, as Hodges has claimed of some of the alternative theories offered. I will return to the topic of making use of my application of the test at a later stage.
THE TURING TEST IN RELATION TO THE MIND / BODY PROBLEM
Since the widespread dissemination of the Turing test in the 1950s, it has become an icon in the realm of philosophy of mind. In particular, some proponents of Strong AI would have the Turing test as a basis for defining success in the field of Artificial Intelligence in creating that for which the field is named. Discussion of such beliefs has been the catalyst for yet more of the most famous artefacts of modern philosophy, such as Searle’s Chinese Room argument (Searle 1980). The aim of such examples is usually to attempt to show that some thing that is intuitively not intelligent, e.g. Searle’s man in a room deciphering Chinese symbols or Ned Block’s “Blockhead” (Block 1981) referencing large datasets, could be claimed to be intelligent by virtue of meeting the requirements of (at least Strong, if not Weak) functionalist AI, by passing the standard version of the Turing test.
Considering Block’s paper as an example, he gives a typical account of the Turing test:
…. popular way of construing Turing’s proposal is as a version of operationalism. “Being intelligent” is defined as passing the Turing Test, if it is administered (or alternatively, a la Carnap: if a system is given the Turing Test, then it is intelligent if and only if it passes). Construed operationally, the Turing Test conception of intelligence shares with other forms of operationalism the flaw of stipulating that a certain measuring instrument (the Turing Test) is infallible. According to the operationalist interpretation of the Turing Test as a definition of intelligence, it is absurd to ask of a device that passes the Turing Test whether it is really intelligent, and it is equally absurd to ask of a device that fails it whether it failed for some extraneous reason, but is nonetheless intelligent. (Block 1981)
I have already detailed why the Turing test should not be regarded as a definition for intelligence, so I will not reiterate that here. However, Block goes on to specify other ways in which the test could be interpreted:
This difficulty can be avoided by going from the crude operationalist formulation to a familiar behavioral disposition formulation. On such a formulation, intelligence is identified not with the property of passing the test (if it is given), but rather with a behavioral disposition to pass the test (if it is given). On this behaviorist formulation, failing the Turing Test is not taken so seriously, since we can ask of a system that fails the test whether the failure really does indicate that the system lacks the disposition to pass the test. Further, passing the test is not conclusive evidence of a disposition to pass it, since, for example, the pass may have been accidental. (Block 1981)
And furthermore, to detail some of the problems that he perceives with such interpretations:
But the new formulation is nonetheless subject to deep difficulties. One obvious difficulty is its reliance on the discriminations of a human judge. Human judges may be able to discriminate too well–that is, they may be able to discriminate some genuinely intelligent machines from humans. Perhaps the responses of some intelligent machines will have a machinish style that a good human judge will be able to detect.
This problem could be avoided by altering the Turing Test so that the judge is not asked to say which is the machine, but rather is asked to say whether one or both of the respondents are, say, as intelligent as the average human. However, this modification introduces circularity, since “intelligence” is defined in terms of the judge’s judgments of intelligence. Further, even ignoring the circularity problem, the modification is futile, since the difficulty just crops up in a different form: perhaps human judges will tend chauvinistically to regard some genuinely intelligent machines as unintelligent because of their machinish style of thought. (Block 1981)
I find views like this to be somewhat backward approaches. The last paragraph above mentions the introduction of circularity to the test, but such circularity is only introduced if one first believes that the test is an attempt to prove intelligence, or that the judge must in some way define the intelligence of the players, whereas neither is the case. Before discussing why, I will consider the behavioural disposition formulation of the test mentioned in the quote, as it does show similarity with points I have made previously, in that it relaxes the requirement to actually pass the test. The behaviourist approach to these matters is an interesting one, in which the focus of the mind / body problem shifts from one of defining an object to be tested to defining the inputs and outputs required for such a test.
the ‘behaviorism’ of the [Turing Test] is put forward as the only empirical basis at our disposal, and those who would reject it in the case of the test are presumed guilty of employing a double standard with respect to human beings and computational artifacts. (Schweizer 1998)
… [the Turing test] … places too great an emphasis on the anonymity of the subject … the point then becomes to fool someone into thinking that the machine is human. … The most natural way to overcome this limitation is to expand the repertoire of relevant behavior to include the full range of intelligent human activities. This will require that the artifact controls not simply a teletype system but rather a well crafted artificial body. (Schweizer 1998)
These quotes from Schweizer’s paper “The Truly Total Turing Test” describe the Total Turing Test (Harnad 1991), a behaviouristic extension of the test to include all the facets of human endeavour. Schweizer proposes going a further step in requiring the full history of the development of human intellect to be taken into consideration:
Even though we do not have a general theory of intelligence, we do have this extensive historical record of its exercise and expression in the case of human beings. If it cannot be demonstrated that the robot’s intellectual type is capable of these (or comparable) achievements, then its cognitive architecture simply does not pass the same behavioral test that ours has. I dub this long-term evolutionary criterion the ‘Truly Total Turing Test’ (TTTT), and take the test to incorporate the full range of behavioral data that must be satisfied, if we are to use the same empirical standards on another type of system that we implicitly utilize in the human case. (Schweizer 1998)
Whilst I would approve of such conceptual applications of the Turing test, there is still a sense in which such approaches consider intelligence as a property of some entity. Although it has become popular in many modern theories of various academic disciplines to favour networks over individual agents, for example interactionism or actor-network theory, such theories still attribute intelligence to some thing or group of things.
I do not intend here to argue against Block and his supporters or opponents, but to use the given example of such work, and the further example of behaviouristic approaches, to further highlight my claim for the Turing test: that it is not a test for the intelligence of a particular entity, but a method for defining interactions between things that could be considered intelligent.
Block highlights the enduring belief that the test is a test for intelligence or, if not that, then a test where the judge has some sort of responsibility or capability to declare a participant as intelligent. But this is not the case. And the example from Schweizer shows that even though the methodology of the test could be expanded to a far greater size, there is still an implied claim that a given entity could in some way be shown to be intelligent.
If the standard Turing test is considered useful for discussion of the mind / body problem, whether taken as proof for intelligent machines or even other things, or used as an example of a bad way to define intelligence, then there is a problem for the application of the Turing test as I have described it. My reading of the Turing test would have it that there is no claim made by the judge, or by the result of the test, regarding the intellect of the participants. Whilst this avoids claims of circularity as expressed by Block (because my judge requires no definition of intelligence to fulfil his duties), opponents of my reading may still claim that I have rendered the test pointless. Hence I should give some sort of outline for how it could be of any use, or there would be little justification for my reasoning.
USING THE TURING TEST
In summary:
- The question of whether a machine can think requires a definition and a proof for intelligence.
- The Turing test cannot be such a proof.
- Imitation is important in the test, because it gives the judge a responsibility other than that of identifying or defining intelligent things.
- The standard reading often overlooks this, and implies that the judge is in some way the arbiter of intelligence.
- The participants in the test are intelligent by virtue of their ability to take part in the test.
- Participants may be able to take part to a greater or lesser extent.
- The range of the test can be altered.
- The defining factor is the restriction on methods of communication.
I do not wish to give the impression that I think there is no mystery about consciousness. There is, for instance, something of a paradox connected with any attempt to localise it. But I do not think these mysteries necessarily need to be solved before we can answer the question with which we are concerned in this paper. (Turing 1950)
As Turing specified, the problem of consciousness (in terms of intelligence, for now) is irrelevant to the task at hand because it is not the aim to question the intelligence of the players – although the test could theoretically be used to grade intelligence, such a grade requires intelligence in the first place. Instead the test gives a particular method that can be used, in this case, as a definition for a machine meeting the intelligence of humans. But the scope can be varied for different purposes.
I believe that the Turing test demonstrates the solution to the paradox Turing described, in that whilst terms such as intelligence are often attributed to players (which are entities or things), this is merely a turn of phrase; it is the methods of communication involved that are well defined. Hence, there is no reason to believe that any thing is intelligent. I would argue that interaction is the intellect, and that facets of intelligence are just facets of interaction; that no thing can be shown to be intelligent, because it is impossible to define the location of an intelligent thing whilst verifying its intelligence.
Such a view may be extreme, and require many sacrifices. I will not argue the point at this juncture, but would state that I hold no allegiance to standards such as intentionality, causality and belief.
CONCLUSION
The debate over how to read Turing’s paper has been going on since his death. No definitive answer has yet been uncovered and, although evidence for the standard reading is very strong, it is now very unlikely that any unequivocal proof will be forthcoming. Despite this, the Turing test and the concepts embedded within it still provide a driving force for research into the mind and Artificial Intelligence. Whilst Turing’s predictions for thinking machines may not yet have come true, perhaps the cause for delay in such a discovery is less to do with technological development than it is to do with the need for a paradigmatic shift in the conceptualisation of intelligence, mind, cognition and similar terms.
Rather than arguing over the changeable records and interpretations of an immutable history, and of what dead people may have meant, we should abandon the relic of distinction between mind and body and the partisan beliefs it has generated.
REFERENCES
- Block, N. (1981), ‘Psychologism and Behaviorism’, Philosophical Review 90, pp. 5–43.
- Copeland, B. J. (2000), ‘The Turing Test’, Minds and Machines 10, pp. 519–39.
- French, R. (2000), ‘The Turing Test: the First 50 Years’, Trends in Cognitive Sciences 4, pp. 115–122.
- Harnad, S. (1991), ‘Other Bodies, Other Minds: A Machine Incarnation of an Old Philosophical Problem’, Minds and Machines 1, pp. 43–54.
- Hodges, A. (1997), ‘Turing: A Natural Philosopher’, London: Phoenix.
- Piccinini, G. (2000), ‘Turing’s Rules for the Imitation Game’, Minds and Machines 10 (4), pp. 573–582.
- Piccinini, G. (2003), ‘Alan Turing and the Mathematical Objection’, Minds and Machines 13 (1), pp. 23–48.
- Schweizer, P. (1998), ‘The Truly Total Turing Test’, Minds and Machines 8, pp. 263–272.
- Searle, J. R. (1980), ‘Minds, brains, and programs’, Behavioral and Brain Sciences 3 (3), pp. 417–457.
- Turing, A. M. (1950), ‘Computing machinery and intelligence’, Mind 59, pp. 433–460.