Chapter 6

VIRTUES AND DEFECTS OF THE RECORDS AND THE METHOD

6:1 The records.

A good deal has been said in this book in praise of the historical data for English communities in the past. Although it would be difficult to prove, the experience of one of the authors as a social anthropologist suggests that the amount we can learn about people such as Henry Abbott, a seventeenth-century villager, compares favourably with what a social anthropologist can gather about members of a community in which he resides. Naturally, the data is different in many ways, but in sheer quality and quantity it would be difficult to show that, except with particular informants, the anthropologist can learn more. Furthermore, it would seem likely that we can learn more about individuals living some three hundred years ago than we can find out from written records concerning contemporary inhabitants of modern England. If we take into account the difficulties caused by rules of secrecy and the high geographical mobility of modern populations, it is difficult to see how we could learn as much about people still living as about those dead for several centuries, at least in the absence of full-scale anthropological investigation. The other major difference is the sheer size of the sample. It is seldom possible for one social investigator to learn about more than a few dozen, or perhaps, at the outside, a few hundred, human individuals, in any real detail. Even these individuals are only observed for one or two years of their lives. The historical material enables us to trace thousands of individuals, rather than tens, and to follow them through their whole lives in a number of cases. Both in terms of quality and quantity there is much for a modern sociologist or social anthropologist to envy, though a historian will also envy the sociologist's ability to ask questions, to create his own data.

Yet with all its virtues, the particular material we have been considering has a number of serious defects and it would be unrealistic not to consider them. They can be listed briefly, though each is a serious hindrance. One problem is the archival and technical one of record loss. Even the best documented parish will have large gaps in most sets of records. Though one of the advantages of multi-source work is that it at least makes it possible to gain some idea of the dimensions of the loss, and to cast some thin bridges across the chasm, the holes remain. For example, for Earls Colne, the loss of the burial register for the years 1590-1610 and the loss of the original court rolls for some of the fifteenth and sixteenth centuries are each a cruel blow. At least the social anthropologist carries his most important data in his head, but the fragility of the past is constantly made obvious by the disappearance of documents we know once existed.

Another problem lies in the various ambiguities in the records themselves. Often the ambiguity lies in the eye of the beholder; as frequently stated, we do not as yet know what many of the documents really mean and hence cannot use them with confidence. This can often be resolved: we may not know for a while whether appearance in a Hearth Tax return means that a person is living in a place or merely owns a house, but there are various tests which make it possible to find an answer. Much more difficult is the problem of the extent to which documents mean what they say. A notorious example is the whole area of 'legal fictions', whereby a completely fictitious account of an event that did not occur is devised in order to get round a legal difficulty, as in the case of common recoveries or riot. In fact, all legal records pose enormous problems. Since so much is at stake in many court proceedings, it is often almost impossible to be sure to what extent what is described as happening reflects any kind of objective reality. To use such records for statistical or other purposes is, therefore, extremely difficult. At least a social anthropologist has some faith in the evidence of his own eyes, but his caution with informants' statements needs to be shared, if not doubled, by the historian. These warnings concerning the untrustworthiness of documents are, of course, frequently made.

A separate problem is that some of the more complex documents, for example wills or manorial transfers, are themselves ambiguous. The English language, like any other, has a considerable capacity for ambiguity and it is often quite impossible to decide what a sentence means. For example, if punctuation is not used properly, it may be impossible to decide whether, in the statement 'John son of John the blacksmith', the occupation refers to the father or son. This problem of ambiguity, as well as the problem of meaning, is often partly masked from the investigator until he tries to break down documents in order to feed them into an indexing system, either manual or computerized.
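
The difficulty can be made concrete with a small sketch of what happens when such a phrase is broken down for indexing. The sketch assumes a hypothetical pattern for phrases of the form 'X son of Y the Z'. Rather than choosing one reading, it returns every reading the words allow, so that an ambiguous entry is flagged for a human to resolve instead of being silently indexed one way. The names `Reading`, `PATTERN` and `readings` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Reading:
    """One possible interpretation of a name phrase."""
    person: str
    father: str
    occupation: Optional[str]
    occupation_of: Optional[str]  # "person", "father", or None

# Hypothetical pattern: "<forename> son of <forename> [the <occupation>]".
PATTERN = re.compile(r"^(\w+) son of (\w+)(?: the (\w+))?$")

def readings(phrase: str) -> List[Reading]:
    """Return every reading the words allow; more than one means ambiguity."""
    match = PATTERN.match(phrase.strip())
    if match is None:
        return []  # phrase not of the expected form: leave for hand indexing
    person, father, occupation = match.groups()
    if occupation is None:
        return [Reading(person, father, None, None)]
    # Without punctuation the occupation may describe either man,
    # so both readings are kept rather than guessing.
    return [
        Reading(person, father, occupation, "father"),
        Reading(person, father, occupation, "person"),
    ]

for r in readings("John son of John the blacksmith"):
    print(r.occupation_of, r.occupation)
```

Keeping both readings defers the decision to the investigator, which mirrors the point above: the ambiguity only becomes visible once the document has to be broken into fields.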

Another defect in the data is that it comes in a set of discrete records and before it can be used for many purposes these separate documents need to be linked or matched. An investigator studying a contemporary community will probably have little difficulty in deciding whether two pieces of information relate to the same or different individuals, but it is often much more difficult to do so when there are thousands of small references to people in the past. The name of one individual is often spelt in different ways; there are frequently two or more people of the same name living in a community; the information is sometimes vague; the description of lands and houses often omits names altogether. Considerable thought therefore has to be given to the problems concerning record linkage. Even with much care, it is not possible to identify unambiguously all the individuals or other items mentioned in records, and hence it is impossible to link them all together. Consequently, there is always likely to be ambiguity in the final indexes.

The question of how to link records has received considerable attention and there are now surveys of the methods and literature (Wrigley 1973). In relation to the methods described in this book all that can really confidently be said at the moment is that record linkage by hand does seem to be possible, as the various experiments described in chapters 4 and 5 have shown. Whether it will be possible to achieve as good results using a computer must await a later discussion.
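
What follows is an illustration only. It is not the computer method whose discussion is deferred here. It sketches one principle: a machine links the easy cases and leaves the rest for hand linkage. The spelling key, the `Person` record and the 100-year age limit are all assumptions made for the example. A reference is attached to an individual only when exactly one candidate agrees on the normalised forename and surname and could plausibly have been alive at the date of the reference. Namesakes and unmatched references come back as undecided.

```python
from dataclasses import dataclass, field
from typing import List, Optional

def name_key(name: str) -> str:
    """Crude spelling key (an assumption for illustration): lower-case,
    treat y as i, drop vowels after the first letter, collapse doubled
    letters. 'Abbott', 'Abbot' and 'Abott' all become 'abt'."""
    s = "".join(ch for ch in name.lower().replace("y", "i") if ch.isalpha())
    if not s:
        return ""
    kept = s[0] + "".join(ch for ch in s[1:] if ch not in "aeiou")
    collapsed = [kept[0]]
    for ch in kept[1:]:
        if ch != collapsed[-1]:
            collapsed.append(ch)
    return "".join(collapsed)

@dataclass
class Person:
    """A hypothetical reconstituted individual."""
    pid: int
    forename: str
    surname: str
    born: Optional[int]  # year of baptism, if known
    refs: List[str] = field(default_factory=list)

def link(forename: str, surname: str, year: int, source: str,
         people: List[Person], max_age: int = 100) -> Optional[int]:
    """Attach a reference to a person only if exactly one candidate fits.
    Returns that person's pid, or None when the case is left for hand linkage."""
    key = (name_key(forename), name_key(surname))
    candidates = [
        p for p in people
        if (name_key(p.forename), name_key(p.surname)) == key
        and (p.born is None or 0 <= year - p.born <= max_age)
    ]
    if len(candidates) == 1:
        candidates[0].refs.append(source)
        return candidates[0].pid
    return None  # no candidate, or several namesakes: ambiguous

people = [
    Person(1, "Henry", "Abbott", 1620),
    Person(2, "John", "Abbot", 1600),
    Person(3, "John", "Abbott", 1630),
]
print(link("Henry", "Abot", 1660, "court roll", people))  # 1
print(link("John", "Abbott", 1650, "will", people))       # None: two Johns
```

The individuals and dates above are invented. A real linkage rule would weigh many more clues, such as kin, land and occupation, before committing to a match.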

A further defect in the data is that it is almost all at the level of behaviour, describing events in the past, rather than at the normative or cognitive level. We have a very large amount of information about how people behaved and interacted, but know far too little about what they thought or said they were doing. This means that we can generate very large amounts of statistical information, but the reasons why people behaved in certain patterned ways can only be guessed at. This is a curious reversal of the position of contemporary investigators, who often have a plethora of data at the normative level - people's comments on how one ought to behave, how people are thought to behave, the reasons why people are thought to behave in these ways - but rather little information about how they actually do behave. Thus investigators are forced to infer the statistical level from the normative data, whereas with material of the kind we have been describing we have to deduce the patterns of motivation from the patterns of action. Both types of inference need to be made explicit, for they contain many concealed biases. Theories as to why various patterns and rates occur in our sample population will have to be imported from outside the data base.

It will be obvious that the material we have been discussing represents only a tiny fraction of the past. There are huge areas which are of interest to us and were of importance to those who lived in previous centuries, that are completely omitted in the records. Until we step back from a community study for a moment, we may forget that civil wars, scientific revolutions, the collapse of the established Church, and even such locally important phenomena as the weather or localized disease may leave no obvious and direct trace in the records we have been considering. The topics which never occur in such records are far more numerous than those which do, and encompass most of what is important to us and to our ancestors. Using such records one gains only a very partial picture of some very delimited areas of the past. This may be vividly illustrated, for example, if we compare the account of village life we would obtain from village records with the account which, by chance, we have for seventeenth-century Earls Colne in the shape of the diary kept by the resident clergyman. This diary provides a picture of a world of religious turmoils, political activity, daily disease and illness, which is almost totally absent in the conventional records (Macfarlane 1970b, 1976a).

Another weakness of the data arises from the fact that, in reality, the geographical demarcation of a community is artificial. On the one hand, we are aware that people were highly mobile and consequently we often obtain only a partial description of any single life cycle. People move past our bathyscope window and then disappear into the gloom. A second feature of this boundedness, connected to the high geographical mobility found in English communities as far back as records exist, is the fact that economically, socially, intellectually, and in every other way these parishes were not isolated. Ideas, food, government, kinship sentiments, all overflowed the parish boundaries. Although we may make efforts to follow some of these chains outside the delimited area, we are bound to oversimplify and impoverish the past the moment we adopt the 'community study' approach. These defects have been elaborated at some length in the introduction, where we also explained why both the nature of the data and the need for a finite amount of information force one into adopting such an approach. There is therefore no need to go over the ground again, except to say that the intensive analysis of the historical material, using the methods sketched above, does tend to give a spurious sense of 'community' boundaries which has to be consciously guarded against.

A final bias which may be noted in the data is the discrimination against certain groups in the population. Particular categories, either because of their age, sex, occupation, wealth, or mobility, tend to be less well documented. The most conspicuous examples are women and servants; but children, the poor and others are also less well recorded. Anthropologists often find that certain sections of the population with whom they live force themselves on their attention; they are easier to approach and easier to study. The same is true with the historical data. Although it is hoped that we will no longer suffer under the illusion that large sectors of the population in the past will always remain totally invisible, it is true that in relation to English historical communities, at least, even in the best recorded periods, it is the wealthy and males who crowd onto the stage.

All these defects in the data do not, we believe, invalidate the general approach. They do suggest that the community study approach described here is severely limited. It is one tool, among others, and not an end in itself. At present it offers hope of probing into areas which previous generations have thought were closed for ever. It provides large quantities of data of an unrivalled kind for literate societies stretching over long periods of time. With this material, sociologists, demographers, economists, biologists and others may test out some of their hypotheses. Yet, in the end, it will not, by itself, solve any problems except in combination with the numerous other sets of data and disciplines which are relevant to the study of man.

6:2 General comments on manual analysis.

It would seem that many interesting areas for investigation are closed to us if we do not undertake an extensive restructuring of the raw historical data. To this extent, the method outlined above can be seen as an advance on current techniques of community analysis. But there are constraints of both a practical and theoretical kind in such a hand analysis. These suggest that a complementary analysis using computers would be worthwhile.

The first constraint is the practical one of the time it takes to construct such a hand-indexing system. Earlier we quoted an estimate that for a parish of 1,000 persons over a period of 300 years it would take approximately 1,500 man-hours to undertake a full 'family reconstitution', based on parish registers alone. The description of the nature and quality of the data available has suggested that, in terms of names, perhaps one-tenth of those found in all the local records come from parish registers. Furthermore, register entries are simple and are easily transcribed and indexed. Taking everything into account, we might guess that the information they include constitutes less than a twentieth of all that is contained in the local records. If this is correct, it would seem likely that bringing a 'total reconstitution' up to the same level as that desirable for a family reconstitution, in other words having all possible individuals, pieces of property, families and other features linked together and ready for subsequent analysis, would take anything up to twenty times as long. If a single researcher worked for 30 hours a week, 50 weeks a year, it would still take 20 years to carry out such a task, even for a moderately small parish. Even a team of 5 persons would have to work for 4 years. Such lengthy research and analysis, which has to be done before the real historical work of substantive investigation can begin, is clearly out of the question in terms of the way research in social history is currently organized. It would also make it quite impossible to undertake research on a region rather than a parish, though the region is often a better unit for many social and economic questions.
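
The arithmetic behind these figures can be set out as a small calculation. The function name and its parameters are ours. The defaults are the figures quoted above: 1,500 man-hours for a family reconstitution, a twentyfold multiplier, and 30 hours a week for 50 weeks a year.

```python
def years_needed(base_hours: float = 1_500, multiplier: float = 20,
                 hours_per_week: float = 30, weeks_per_year: float = 50,
                 team_size: int = 1) -> float:
    """Years to complete a 'total reconstitution' by hand.

    base_hours: man-hours for a family reconstitution from registers alone.
    multiplier: how many times more work the full set of records implies.
    """
    total_hours = base_hours * multiplier  # 30,000 man-hours with the defaults
    hours_per_year = hours_per_week * weeks_per_year * team_size
    return total_hours / hours_per_year

print(years_needed())             # 20.0
print(years_needed(team_size=5))  # 4.0
```

Changing any one parameter shows how sensitive the conclusion is to the guessed multiplier.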

In general, it would seem that the creation of the original indexes of unlinked persons and pieces of land would take about half as long by machine as it would by hand. If we make a rough estimate that it would take about 6 man-years to create the indexes for a parish of 1,000 persons for a period of 400 years, then it would take perhaps 3 years to process by computer. As yet, there would be no great saving. In terms of sheer processing of the data, rather than searching, the saving would come when further indexes were needed. For example, if an index by occupation were required, there would be practically no extra time needed if the data were in the machine already, whereas it might take months to create such an index by hand. A number of such more refined indexes will be needed in order to make the material really accessible even by hand. One of these indexes, which is crucial for many types of work, is an index by individuals, rather than by sets of persons who have the same forename and surname. This can only be produced on the basis of record linkage or individual reconstitution. It is this stage of deciding whether two pieces of information refer to the same or different persons which takes at least one-third of the total time in the preparation of data. To do this by hand for a file the size of the one which we have been discussing would probably add another 3 or 4 years onto the task. If it were possible to use the computer to link the records, even if it could only do this for 80% or 90% of the easier cases, this would save perhaps 3 years of work per community study. Taking all this into account, we might guess that such a project would require between 10 and 20 man-years from start to finish by hand, or about 3 to 5 years by machine (assuming that the computer program were available). All these stages, of course, are only preparatory. They only order the material so that interesting questions can be asked.
Searching the data for answers to specific questions is the second area where we can estimate the limitations of a manual-indexing system.

The creation of hand indexes along the lines suggested above makes the searching of data in order to answer questions much faster. It is possible to follow logical chains fairly swiftly. Yet in terms of modern methods of data retrieval, searching is still painfully slow. The historian searches his files in two ways, sequentially and randomly. In the former case, for example, he may want to search through a parish register in order to lift out bastard births, or through a set of inventories in order to see if there is any correlation between the occupation of an inventory-maker and the presence of certain implements in the inventory. Using the files we have created, it would take approximately four hours to search sequentially through the typed parish register for Earls Colne from 1558-1837 in order to look for registered bastards or other information, even if one were reading very rapidly. The IBM 370/165 computer at Cambridge would take approximately six seconds to search the approximately half million characters (letters or numbers or spaces) involved. To search through and copy out the appearances of a certain tool in the roughly 2,000 inventories for the parish of Kirkby Lonsdale would take about twelve hours, assuming that the tool was mentioned on average once in every 10 inventories and that the inventories had already been typed to make them more legible. To search the same file and copy out the same entries would take the computer about thirty seconds in 'real time'. The central processing time is, of course, only a tiny fraction of the total time the job would take on the machine. Realistic estimates would have to include the time taken to write the program to do the search, fetch the results from the lineprinter and so on. Such considerations mean that it is not usually worth undertaking a sequential search with the computer if it would take under an hour to do by hand. The economies of scale, however, are enormous.
For example, to search through the whole name index for some missing cards by hand might take about thirty hours or more. To write the program and locate such cards on the machine (though they would not, of course, have been lost in the first place) would take well under an hour. The actual time taken by the computer to search the whole file of over 100,000 entries would be under twelve seconds.
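
A sequential search of this kind can be sketched in a few lines. The register lines are invented. The list of terms a clerk might have used to mark an illegitimate baptism is also hypothetical, and a real search would need a much fuller list of spellings. The second function only encodes the rule of thumb above: a search taking under about an hour by hand is not usually worth programming.

```python
from typing import Iterable, List, Sequence

# Hypothetical terms a clerk might use to mark an illegitimate baptism.
BASTARDY_TERMS = ("bastard", "base born", "baseborn", "illegitimate", "spurious")

def sequential_search(lines: Iterable[str],
                      terms: Sequence[str] = BASTARDY_TERMS) -> List[str]:
    """Read every line in order and keep those containing any of the terms."""
    hits = []
    for line in lines:
        lowered = line.lower()
        if any(term in lowered for term in terms):
            hits.append(line)
    return hits

def worth_programming(hand_hours: float, threshold_hours: float = 1.0) -> bool:
    """Rule of thumb: a search under about an hour by hand stays manual."""
    return hand_hours >= threshold_hours

# Invented register entries, for illustration only.
register = [
    "1612 Mar 3  bapt. Thomas son of John Abbott",
    "1612 Apr 9  bapt. Mary dau. of Joan Smith, base born",
    "1613 Jan 20 bapt. William a bastard son of Ann Harvey",
]
for entry in sequential_search(register):
    print(entry)
print(worth_programming(4.0))  # True: four hours by hand, as for the register
```

Like the hand search, the program must read every line. The gain lies entirely in reading speed, which is why the fixed cost of writing the program dominates for small jobs.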

One variant of this sequential search frequently employed by historians is the case where two or three variables need to be plotted over time in order to see if there is any correlation between them, perhaps in the form of a moving graph. For example, one might want to test the hypothesis that birth, death and marriage rates are somehow interconnected over long stretches of time. The best way to test this might be to produce a graph based on aggregated figures from the parish register. To produce such a graph, lifting out each fact one by one by hand, then adding the numbers into totals, then drawing a graph, might take one or two weeks to do for a parish of 1,000 for a period of 300 years. To write the program and execute it, once the data was in the machine, would not take more than a morning using a computer. If one had added up the figures by months in the original hand calculation and it was then discovered that the figures were needed week by week, it is quite likely that the whole process would need to be repeated. Another week or two would be needed and the effect on the researcher's enthusiasm would be considerable. If the search had been done by the computer, a very slight modification of the original program, perhaps taking an hour, would make it possible to run the program again to obtain the new results. This discussion of sequential searching merely illustrates the very obvious fact that in terms of sheer speed, a fast reader can read about 2,000 characters a minute. In the same period of time, a large computer can read 5,000,000 characters. In other words, it is some 2,500 times faster than the human reader. Since the historian is likely to want to extract considerable amounts of information from various records in order to ponder over them, the fact that the computer can type at over 1,000 (long) lines per minute, or about two hundred times faster than a very good human typist, is also of relevance.
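
The point about re-running an aggregation with a different time unit can be illustrated with a minimal sketch. It assumes the events have already been transcribed as (kind, date) pairs, and the events themselves are invented. Changing from monthly to weekly totals means passing a different grouping function, not redoing the count. For simplicity the sketch treats all dates as Gregorian and uses ISO weeks. It ignores the Julian calendar and the Lady Day year-start used in English registers before 1752, which a real analysis would have to convert.

```python
from collections import Counter
from datetime import date
from typing import Callable, Dict, Hashable, Iterable, Tuple

Event = Tuple[str, date]  # (kind of event, date of entry)

def by_month(d: date) -> Tuple[int, int]:
    return (d.year, d.month)

def by_week(d: date) -> Tuple[int, int]:
    iso = d.isocalendar()  # ISO year and week number (Gregorian rules)
    return (iso[0], iso[1])

def aggregate(events: Iterable[Event],
              period: Callable[[date], Hashable]) -> Dict[str, Counter]:
    """Count events of each kind per period; `period` maps a date to a key."""
    counts: Dict[str, Counter] = {}
    for kind, d in events:
        counts.setdefault(kind, Counter())[period(d)] += 1
    return counts

# Invented events, for illustration only.
events = [
    ("burial", date(1625, 8, 1)),
    ("burial", date(1625, 8, 3)),
    ("burial", date(1625, 8, 20)),
    ("baptism", date(1625, 9, 2)),
]
monthly = aggregate(events, by_month)
weekly = aggregate(events, by_week)  # the only change: a different grouping key
print(monthly["burial"])  # Counter({(1625, 8): 3})
```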

The most interesting searches through files are not sequential but 'random'. In such cases, the investigator is following a logical chain which cuts across the way in which even the best of indexes is organized. For example, if one is interested in following kinship links, or in discovering who lived in a certain street at a certain time, or how old various people were who happened to die in a particular epidemic, it is necessary to jump around between different kinds of files. Even to answer fairly simple questions, it is often necessary to leap into various different sets of files in order to find the relevant 'fact'. Some examples of how long it takes to perform such random searching by hand may be given. To build up the family tree and the pen portraits of the Abbott family used in chapter 4 took about 50 man-hours of work, including many random searches. It is unlikely that the computer could ever do anything approaching the complexity of this task from beginning to end. Yet it could probably have printed out all the records in which the Abbotts are mentioned, and many background features of the family, in a few seconds. The difference is again a reflection of the relative speed of manual and computer searching. To find something by hand in even the best organized of filing systems takes a surprisingly long time. As an experiment, some 60 'random' searches were made by hand, looking for various different kinds of information. The average time per search was about sixty seconds. The average time per random search using the indexed sequential system developed for our project at Cambridge is 0.2 of a second. This means that, in practice, a researcher has to think very hard indeed before he decides to undertake by hand an analysis of any problem which will involve more than about 1,000 random searches.
With a search time of one minute per fact, it will take a couple of days merely to find the data, let alone copy it out or think about it, when over 1,000 random facts are needed. For example, to conduct a search which involved correlating dates of death with acquisition of landholding by sons, if there were 1,000 cases and each involved 10 random searches, would take 166 man-hours by hand, even if we allowed only one minute to both find and copy each piece of information. To write the program, run it and obtain the results, should not take more than a day using the computer.
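
The contrast between hand and machine searching can be sketched as follows. The sorted index searched by bisection is an in-memory stand-in chosen for illustration. It is not a reconstruction of the indexed sequential system mentioned above, and the record fields are hypothetical. Because an index can be built on any field, an occupation index costs no more than a surname index once the data are in the machine. The last function repeats the arithmetic of the landholding example: 1,000 cases of 10 searches each.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]

def build_index(records: Sequence[Record], key_field: str) -> List[Tuple[str, int]]:
    """Sorted (value, record position) pairs; any field can be indexed."""
    return sorted((rec.get(key_field, ""), i) for i, rec in enumerate(records))

def lookup(index: List[Tuple[str, int]], value: str) -> List[int]:
    """Binary-search the index and return every record position with this value."""
    i = bisect_left(index, (value, -1))  # -1 sorts before any real position
    found = []
    while i < len(index) and index[i][0] == value:
        found.append(index[i][1])
        i += 1
    return found

def search_hours(cases: int, searches_per_case: int, seconds_each: float) -> float:
    """Hours spent purely on finding facts for a chain of random searches."""
    return cases * searches_per_case * seconds_each / 3600

# Hypothetical records, for illustration only.
records = [
    {"surname": "Abbott", "occupation": "weaver"},
    {"surname": "Harvey", "occupation": "yeoman"},
    {"surname": "Abbott", "occupation": "yeoman"},
]
by_surname = build_index(records, "surname")
by_occupation = build_index(records, "occupation")  # no extra transcription
print(lookup(by_surname, "Abbott"))  # [0, 2]
print(round(search_hours(1_000, 10, 60), 1))   # 166.7 hours by hand
print(round(search_hours(1_000, 10, 0.2), 2))  # 0.56 hours at 0.2 s a search
```

At 0.2 of a second a search, the 10,000 look-ups take about half an hour of machine time. Writing and checking the program, rather than searching, becomes the main cost.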

Since there are almost always hitches in running a program and there is the time spent in writing it, it is clear that it is often best to undertake small pilot studies by hand. For this reason alone it is absolutely essential to have a complementary manual system, even if this is partly or wholly produced by the computer. It is necessary because many searches are too complex for anything but the human brain to perform, with its enormous wealth of knowledge about the way other humans behave, the historical context of the documents, and many other sets of implicit information. Yet it would also seem sensible to be able to use both the sorting and searching power of the computer as a tool in historical analysis. Certain tasks are best done by hand, others are quite impracticable without a computer. Yet it is one thing to dream about what a computer might do. It is quite another to be able to convert the records we have illustrated and analysed in this book into a form where they can be sorted and searched by the machine, without losing their complexity. A method of doing this will be described in a sequel to this work.