Second Part
0:07:09 I did not have a scholarship, so I had to pay my own way; I worked as a crayfisherman, a weighbridge officer, an assistant electrician, in an auction room and as a quantity surveyor, so I had a string of experiences in different types of jobs; by this means I paid most of my way through university; at the same time my relationship with my wife went from strength to strength, and we are still together; after graduating we decided that we wanted to leave Western Australia and go back to Europe; before doing so, having a degree in mathematics, I felt I needed a meal ticket, so I enrolled for a year in a diploma course on numerical analysis and automatic computing; thus I retrained as a computing person and, surprisingly, discovered that I liked it; my wife, also with a degree in mathematics, took a diploma of education; we got on a boat with a number of other graduates and came to England; before setting off I wrote to a number of places in England with computing centres asking for a job; I did not come here with the intention of doing a PhD; one letter went to the Computing Service in Cambridge; Eric Mutch was the head of the service and had no job to offer, but he contacted Nick Jardine, and they thought they might be able to offer me a job as a programmer; I arrived in 1969 and was interviewed by Nick Jardine in King's; he said he could give me part-time employment and that the Computer Lab could also offer some work; Nick had a project with Robin Sibson, who was also in King's at that time; they had just finished their PhDs on automatic classification in the biological sciences; Eric Mutch died soon after I arrived, but I continued in the Lab, working on a subroutine library in numerical analysis which ran for many years on Titan; after about nine months I told Nick that I was getting very bored, and he suggested I do a PhD; the idea for it grew out of the work I was doing for Nick and Robin in the King's College Research Centre, which was to try to use automatic classification techniques in information retrieval; I had been interested in information retrieval before I left Australia, so I had already read about it when I arrived in Cambridge; I put these things together, and this was how I got to know Karen Spärck Jones; King's College was going to take me on as a research student and subsidise my fees; I remember assuming that I could do a PhD in information retrieval provided I was not supervised by Karen Spärck Jones; Ken Moody, who at that stage had an interest in databases, took me on; I think that was a stroke of genius, because I collaborated with Karen all along but it was not a student-supervisor arrangement; she was generous in giving me access to her test data but was not my supervisor; I also got a lot of help from Roger Needham
9:06:08 The history of information retrieval in Cambridge is quite interesting; Roger Needham, a famous computer scientist in a totally different area, did his PhD in information retrieval; I suspect that it was the first PhD ever done on that subject; he was supervised by David Wheeler, one of the pioneers, who worked with Maurice Wilkes; so there was a progression from David to Roger, who to a certain extent helped me; Nick and Robin had conversations with Roger about their work in automatic classification, because he and Karen worked together in the Cambridge Language Research Unit on the theory of clumps; I think that Karen probably got her initial ideas about information retrieval from Roger; Karen came in as a linguist, which was the field of her PhD work, and apparently that work is still read and considered very good even now; it was on synonyms; her thesis was republished recently at the urging of Yorick Wilks, who was also a member of the Cambridge Language Research Unit, as were Fred Parker-Rhodes, who may also have done some work on information retrieval, and Margaret Masterman; the group that I knew worked on language and linguistics and shaded into information retrieval; when I started working for Nick and Robin there was a huge intellectual disagreement between them and the Needhams, because the clumping process was considered very unprincipled, whereas the approach adopted by Jardine and Sibson, later published as Jardine & Sibson, 'Mathematical Taxonomy', was considered very principled and mathematically well defined; as it turned out, the automatic classification algorithms that came out of their work were relatively efficient, whereas clumping was horribly inefficient; in the end the Jardine-Sibson work, which turned into my work too, survived and clumping did not; I was caught between these two groups, but I had my office in the Research Centre, where Denis Mollison, a probabilist, was also working at the time; his application was pandemics, so he had mathematical models relevant to the current flu pandemic; he became Professor of Statistics at Heriot-Watt and was on the fringes of our project; the other person who was quite involved with the project was Ken Moody; I don't know whether he was officially written into the project proposal, but he was certainly acting as a consultant; he designed the original algorithm for a sequence of cluster methods called BK, the core of which was based on his work; as a supervisor he took an interest in what I was doing when it really mattered; I was an appalling writer to begin with; my writing was so condensed that if I were to write the same material now, one paragraph would take several pages; Ken got me to think about how to write; I went on to do my PhD thesis on automatic classification techniques; I tested them and showed that, used the way I used them and at least on the data that I had, they could give a major increase in performance; along the way I also invented an evaluation measure which is used to this day, and which has also been adopted by another field: it is used very widely in speech recognition as a measure of performance; it was part of the theoretical work that I really enjoyed doing; I went back to the theory of measurement, which says that if you want to measure objects that stand in a qualitative relationship, you have to define a mapping from them to a numerical representation that preserves the properties of the qualitative structure; I took that approach to measuring performance in information retrieval; I wrote down the intuitive conditions or relationships that were important in IR, the ones that you wanted to measure, and then defined this mapping; I came up with this new way of measuring things and it has persisted; so automatic classification, the methods and algorithms for IR, and this evaluation technique were really the guts of my thesis; Karen, who on the whole was not really mathematical, helped me with the linguistic side; she was the one who explained stemming and stop words to me, and how they were driven by a background in linguistics; she had also built up some test collections and she allowed me to use them; our interactions were mostly around the experimental side of information retrieval
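As an illustration only, and assuming the measure described above is the effectiveness measure usually written E, the sketch below computes it from precision P and recall R as E = 1 - 1/(alpha/P + (1 - alpha)/R), where the weight alpha (strictly between 0 and 1) expresses how much a user cares about precision relative to recall; lower E is better, and 1 - E is the weighted harmonic mean of P and R. The function names and the set-based precision/recall helper are hypothetical, not taken from the thesis.

```python
def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Precision and recall of one retrieved set against the relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def e_measure(precision: float, recall: float, alpha: float = 0.5) -> float:
    """Effectiveness E = 1 - 1 / (alpha/P + (1 - alpha)/R).

    alpha (strictly between 0 and 1) weights precision against recall;
    lower E is better, and 1 - E is the weighted harmonic mean of P and R.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if precision == 0.0 or recall == 0.0:
        # The weighted harmonic mean tends to 0, so E takes its worst value, 1
        return 1.0
    return 1.0 - 1.0 / (alpha / precision + (1.0 - alpha) / recall)


# Hypothetical example: documents 1-4 retrieved, documents 2, 4 and 5 relevant
p, r = precision_recall({1, 2, 3, 4}, {2, 4, 5})
print(p, r, e_measure(p, r))
```

Here P = 0.5 and R = 2/3, giving E = 3/7 (about 0.43). With alpha = 0.5 the two components are weighted equally and 1 - E reduces to 2PR/(P + R); a property of the harmonic mean is that a system cannot score well by maximising one component while the other collapses.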
19:02:06 While I was doing my PhD my wife taught at a number of schools, especially Impington Village College, and after a difficult first six months in Cambridge we ended up being happy here; Maurice Wilkes only entered my life towards the end of my PhD; he really didn't see the point of information retrieval; I thought that information retrieval should be a subject in the computer science curriculum, and he never really allowed that; I think Karen half agreed but was ambivalent about whether she wanted it as part of a computer science degree; she had quite an interesting attitude to IR; she saw it very much as a postgraduate activity, but then she took very few PhD students in it; she had one, David Jackson, who preceded me, and after that there was a huge gap; what she did do was supervise students in natural language processing; she did supervise Martin Porter, but in macro processors; there was a language called SNOBOL, and he invented a comparable language; he was a superb software engineer; Martin came into our lives when I came back from Australia; after my PhD I was head-hunted by the Professor of Computer Science at Monash University in Melbourne, Chris Wallace, who worked on automatic classification; he was a good, able academic, but his automatic classification was different from the work people did here; we went out to Monash, and our daughter, Nicola, was born there five months after we arrived; however, as a lecturer in computer science I got very bored; I had nobody to talk to about information retrieval, so I had the idea of talking to myself and wrote a book about it; the book did well and is still used to this day; half-way through my contract I returned to Cambridge to see whether there was a way I could come back; Karen suggested I apply for a Royal Society Research Fellowship, which I did; I got it, so after two and a half years at Monash I came back to Cambridge and bought a house
24:56:16 Academically, a development was going on in information retrieval which turned out to be extremely significant, theoretically probably one of the most important in the subject; there are various models for information retrieval; the standard one, and probably the oldest, is what we call the vector space model, researched by Gerard Salton at Cornell; the feeling was that there was enough uncertainty in information retrieval processes that we had to use probabilistic approaches; the first major development in that area was a thesis written by an academic, Bill Miller, at Newcastle University; he invented a new model, probabilistic retrieval as it is now called; the trouble was that it was only half a model; Karen was working with Stephen Robertson, but he was in danger of not having a job, so we worked out a way of employing him as a research assistant on a project; Stephen, who was going to be on the project, got a job at the last moment, so he became the co-investigator with me instead, and we employed Martin Porter as the research assistant; Martin was working for our project on probabilistic models, and the suffix-stripping algorithm now called the Porter Algorithm was invented on that project; I gave him the task of building one and he did a fantastic job; he read all the literature on such algorithms, on both the linguistic and the computing side, then put it all together and produced this algorithm; it is still used; in almost any information retrieval experiment, even with commercial systems, people tend to use the Porter Algorithm; Martin was not really a researcher, more of a developer, writing the software and building a system; he and Karen did not get on, so one of the problems we had on the project was that every so often we would need something from Karen and she would give him a hard time; he was not good at coping with that; Karen was very forthright and Martin didn't like to be hassled, so he got very upset; I think he is a brilliant software engineer, one of the best I know in terms of getting things done; I was on the fringes of Muscat: he took an earlier system that he had written for the Museum Documentation people and extended it, as I saw it, with some information retrieval functionality, some of which he probably got from working with me earlier on
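To give a flavour of suffix stripping, the sketch below implements only a few small pieces of the Porter Algorithm as it is usually published: the consonant/vowel "measure" m of a stem (the number of vowel-consonant sequences in the pattern [C](VC)^m[V]), the plural rules of step 1a (SSES to SS, IES to I, SS unchanged, final S removed), and the single step 1b rule (m > 0) EED to EE. It is a partial sketch assuming lowercase ASCII input; the remaining rules and steps are omitted, and the function names are hypothetical.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """A consonant is a letter other than a, e, i, o, u,
    and other than a 'y' that follows a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        # 'y' at the start of a word, or after a vowel, counts as a consonant
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in [C](VC)^m[V]: the number of vowel-to-consonant transitions."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


def step_1a(word: str) -> str:
    """Plural rules: SSES -> SS, IES -> I, SS -> SS, S -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]          # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]          # ponies -> poni
    if word.endswith("ss"):
        return word               # caress -> caress
    if word.endswith("s"):
        return word[:-1]          # cats -> cat
    return word


def step_1b_eed(word: str) -> str:
    """Only the first rule of step 1b: (m > 0) EED -> EE."""
    if word.endswith("eed") and measure(word[:-3]) > 0:
        return word[:-1]          # agreed -> agree
    return word                   # feed is left alone because m("f") == 0


for w in ["caresses", "ponies", "caress", "cats", "agreed", "feed"]:
    print(w, "->", step_1b_eed(step_1a(w)))
```

The measure condition is what stops short words from being over-stripped: "agreed" loses its final d, but "feed" keeps its ending because what would remain, "f", contains no vowel-consonant sequence.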
31:24:10 My relationship with Maurice Wilkes got better when I became interested in Alan Turing; Maurice is very aware of his position in the history of computer science, and rightly so; he is one of the founders and pioneers, and you can certainly make the argument that, with David Wheeler and others, he built the first real computer, the EDSAC; Alan Turing's work became better known towards the end of the 1970s, and especially so now; at that time the Scientific Archives in Oxford rang King's and said they had papers by a man called Turing who seemed to have been at King's; they asked whether King's could send someone to say whether they were worth preserving; the Librarian, Peter Crofts, asked me to go and look at them; like everybody else I had heard of a Turing Machine, but I knew little more than that about him; I saw the papers and suggested they be brought to King's; progressively King's got more and more serious about them, and I later got some money to help with them; at that time most people would not have heard of him, but when I talked to Maurice Wilkes he clearly felt that Turing was being given too much prominence in the history of things; you can see why from his point of view; Alan Turing was a theoretician who wrote an incredibly smart paper in the 1930s on the Entscheidungsproblem; he spent the War at Bletchley Park, involved in the design of code-breaking machines (not the Colossus, which had nothing to do with him); after the war, when working at the NPL, he wrote the ACE Report, a blueprint for designing a computer, which invites comparison with a similar report written by John von Neumann, the EDVAC Report; the EDVAC Report was the first complete design for a computer, and Turing's came a little later; Wilkes, on the other hand, started to think about building EDSAC and got his information from the same sources as von Neumann and Turing: Eckert and Mauchly, who gave courses at the Moore School of Electrical Engineering, University of Pennsylvania, were the source of new ideas about a modern computer; Wilkes's attitude was that he was just going to build one, and as an engineer that is what he did; he tried to collaborate with Turing but they didn't like each other; the EDSAC was built and was a success; the ACE machine was only built much later, in the 1950s, so its impact on the design of the next generation of machines was not very significant; I think that Maurice sees Turing as getting more credit than he deserves, because his work was theoretical and, as far as building computers goes, his engineering skills were not great; nowadays everybody has heard of Turing because of the books and films, which Wilkes would feel overstate what he actually accomplished; I am a great admirer of Turing, but more for his theoretical work; I think the Turing Machine was an incredible breakthrough, but his contribution to the actual building of computers is nowhere near as significant as Maurice's; Winston Churchill said that we were saved from a German invasion by him, but in a letter to him, I think, Churchill said that his work had shortened the War by about a year; Turing started work on decryption using his theoretical knowledge, so the bombe, the mechanical device that did the decoding, was based on the theoretical ideas Turing had at Bletchley Park; after that early period he no longer worked on it and was moved onto speech encryption; the work that led to the Colossus was independent, and Turing was only on the edge of that; his story was very sad; he moved to Manchester and worked in the computer laboratory there; he was forced to undergo drug treatment because of his homosexuality; the book his mother wrote about him is a delight to read; I have interviewed Maurice about Turing, as I wanted to know whether he had read the ACE Report on the design of a modern computer; Maurice claims he did not read it and worked independently of Turing
42:45:17 I was at Monash at the time that Stephen and Karen were working on probabilistic retrieval; I also made the shift in my mind and had a student build a probabilistic search engine as a student project; my approach to it was slightly different; I took a decision-theoretic approach and developed a form of decision theory for it, so when Karen and Stephen produced their draft paper and sent it to me, I reworked it in decision-theoretic terms, which I still think is a good way to do it; there was a kind of debate going on between the three of us about how to formulate it; some people would say that the probabilistic model (which I wrote about in the second edition of my book) was actually developed by the three of us; however, if I am honest, I think that I probably had the same ideas to the same extent; the paper that Karen and Stephen published is the first publication of that work, although it includes a section on the decision-theoretic approach; the other way in which I was in the thick of it was that, together with Martin, I started building implementations of it; for example, relevance feedback comes out of that model; at that stage I was much more interested in getting it working than in defining the theory; like a lot of these things, it is very difficult to pinpoint where the breakthroughs were made; there was the earlier thesis by Bill Miller from Newcastle, who ended up at Glasgow University, which had a pretty good formal development of it, but he didn't seriously take into account the non-occurrence of terms, so it was half a model; it was completed in the paper by Karen and Stephen; I should have had my name on that, but I didn't
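The "non-occurrence" point can be made concrete with the relevance weight usually cited from the Robertson and Spärck Jones paper. For a query term, let N be the number of documents, R the number judged relevant, n the number containing the term and r the number of relevant documents containing it; the weight is the log of an odds ratio over the four cells of that 2x2 table, two of which count documents that do not contain the term. The sketch below uses the common form with 0.5 added to each cell; the function names, the simple scoring function and the example counts are hypothetical, and the relevance judgements (R, r) are the kind of input a relevance-feedback loop would supply.

```python
import math


def rsj_weight(N: int, R: int, n: int, r: int) -> float:
    """Relevance weight of one term from its 2x2 contingency table.

    N: documents in the collection      R: documents judged relevant
    n: documents containing the term    r: relevant documents containing it
    """
    if not (0 <= r <= min(n, R) and n <= N and R <= N and N - n - R + r >= 0):
        raise ValueError("counts do not form a valid contingency table")
    rel_with = r + 0.5                    # relevant, term present
    rel_without = R - r + 0.5             # relevant, term absent
    nonrel_with = n - r + 0.5             # non-relevant, term present
    nonrel_without = N - n - R + r + 0.5  # non-relevant, term absent
    # Log of (odds of the term in relevant docs) / (odds in non-relevant docs)
    return math.log((rel_with / rel_without) / (nonrel_with / nonrel_without))


def score(doc_terms: set, weights: dict) -> float:
    """Sum the weights of the query terms that occur in the document."""
    return sum(w for term, w in weights.items() if term in doc_terms)


# Hypothetical counts: 1000 documents, 10 of them judged relevant
weights = {
    "retrieval": rsj_weight(N=1000, R=10, n=50, r=8),
    "system": rsj_weight(N=1000, R=10, n=400, r=3),
}
print(weights, score({"retrieval", "model"}, weights))
```

Only terms present in a document are summed, yet absence still matters: it enters every weight through the two "term absent" cells. In this example "retrieval", rare overall but common among the relevant documents, gets a large positive weight, while "system", relatively less common among them, gets a negative one.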
47:21:15 I don't have great thoughts sitting at a computer terminal; to this day I still write important things by hand first, and not at home; I am one of those people who actually work in cafes; most of my papers were drafted by hand in a notebook in a cafe somewhere; I worked for a long time in the old 'Copper Kettle', and I wasn't the only person who did; Green, who contributed much to string theory, was working in another corner, though at that time I had no idea who he was; I used to get ideas in the bath and when cycling; it would be unusual for me to sort something out sitting at my desk staring at a blank piece of paper or a screen; I can't sit still for very long either; if I work for an hour and a half I then have to go for a walk; I am a restless researcher
50:35:03 I enjoy music, particularly singing and jazz; my background was such that none of my siblings or parents played a musical instrument; until I was sixteen there was no record player in the house; my mother and father were pretty much unmusical; my brother liked to play the mouth organ, but like me he grew up without any music in the house; I only discovered music gradually; I absolutely adore opera and am a great fan of jazz, especially blues; I use music as a way of relaxing; I used to take a bath to do that, but now if I come home shattered I put on a piece of music, recently Handel's 'Julius Caesar', and just listen to the whole opera; my wife and I share the same taste in music, so wherever we go we try to go to performances, as I also enjoy the staging; I also read a huge amount of fiction, both in English and in Dutch; again it is a way of escaping from the technical work that I do
55:09:17 I have supervised about thirty PhD students, and about 60% of them are now full professors around the world; in my supervisions I try to engineer one thing: I want them to have the central idea themselves; they should come up with it if they possibly can, although I might stimulate it, and then I want them to end up owning it; then, when I start to argue with them, they show clear signs that it is their idea; once they have reached that point they are well on the way; how do you do that? I say that they should follow their noses and choose to do something they are interested in and want to do, not something they were just told to do; it seems to work, although I have had students where it has not, and generally they don't do as well in their careers afterwards; I believe in the academic way of life and have enjoyed it immensely; what I have resented in the last ten to fifteen years is the extent to which bureaucracy has started to drive things in universities; it creates circumstances which academics have to respond to and which basically just stop you from doing the intellectual work; it is not that I feel that academics should do only intellectual work, but the burden of the non-intellectual work, dealing with either national or local bureaucracy, has become excessive, and I think that is very sad; my daughter is an academic, a postdoc in neuroscience, and I can see that the pressure to cope with bureaucratic excesses is already on her