Introduction

Ancient writings on papyrus are invaluable for our knowledge of history. The writings may be literary texts or official records with various dates and names, sometimes family records or letters, or lists of property. Whatever they contain, they always offer us first-hand knowledge of ancient times. Because the writings are unique and fragile, historians, papyrologists and archaeologists prefer to avoid unnecessary physical handling. So far, the primary method of recording has been photography. Current technology gives the specialists many new means: there are new ways to obtain the images, to distribute them, and to process them. The new technologies are especially needed in cases like carbonized papyri, which are the topic of this report.

This work was undertaken to support the deciphering of the recently found Petra scrolls. The Petra scrolls, found in December 1993 in Jordan, are typical examples of carbonized papyri, where the lampblack text is almost indistinguishable from the carbon-black background. The conservation work, led by Professor Jaakko Frösén, was a very tedious job that took several months; it was finished by spring 1995. Professor Frösén has kindly provided us with samples of similarly carbonized papyrus fragments from another find.

Our primary objective in this project has been to find the best possible way of producing digitized images of carbonized papyrus fragments for further image processing, research and archiving. The secondary objective has been to optimize the photographic methods, which will probably remain one of the most important ways to record carbonized papyri.

Papyrus

Papyrus, manufactured from the papyrus plant (Cyperus papyrus), was widely used in Egypt, Greece, the Middle East and the Roman Empire; its use dates from ca. 3000 B.C. to the beginning of the European Middle Ages. The material is quite durable but, being organic, it degrades gradually unless kept in dark, micro-organism-free conditions at a suitable humidity.

The ink used for writing was a mixture of water and plant fluids with lampblack as the pigment. In many finds the papyrus scrolls have been carbonized: the organic compounds have been charred away and only the most chemically stable material is left. Carbonization is not as rapid or as mechanically destructive as open combustion, so the scrolls have had a chance to survive. As a result, carbonized papyri have sometimes been preserved even better than undamaged ones.

Figure [1] shows the sample plate we used in our tests. The carbonized fragments of a papyrus scroll from Bubastos, Egypt, were kindly provided to us by Professor J. Frösén of the University of Helsinki. Most of the numerous tests concentrated on sample no. 12 [2], in particular on the first three characters of one of its lines [3]. Figure 1 was scanned from a photograph; Figures 2-4 were scanned directly.

Figure 1: The conserved papyrus fragments on a 40x25 cm plate (640x383).
Top row: Fragments 1,2,3,4,5,6,10.
Lower row: Fragments 7,9,11,12,13.

Figure 2: Fragment 12, 6.5x93 cm

Figure 3: The test image from fragment 12, 2.8x0.9 cm (640x208).

Figure 4: A detail from fragment 12, 0.9x0.9 cm (447x447).

Digitizing Images

There are several ways to digitize images. The main issues with digital images and image processing are spatial resolution and grayscale depth. Good spatial resolution is needed for statistical methods to be efficient, and sheer enlargement has proven to be very helpful to the human eye. This is because the human eye cannot distinguish small grayscale differences in small objects: the smaller the grayscale difference, the larger the area needed to see it. A sufficient number of gray levels, in turn, makes it possible to analyze the dynamic properties of the image, for example noise and edges. Good dynamics are essential for finding the best contrast range in a picture. Without good enough contrast we simply do not have readable text.
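To make the notion of spatial resolution concrete, the sampling density of the test images can be estimated from the dimensions quoted in the captions of Figures 3 and 4. The following is only a minimal sketch: the function name is hypothetical, and it assumes that the first pixel dimension and the first centimetre dimension in each caption refer to the same axis.

```python
CM_PER_INCH = 2.54


def sampling_density(pixels, length_cm):
    """Return (pixels per cm, pixels per inch) along one image axis."""
    per_cm = pixels / length_cm
    return per_cm, per_cm * CM_PER_INCH


# Widths taken from the captions of Figures 3 and 4 (assumed to be
# the same axis in pixels and in centimetres).
fig3_per_cm, fig3_dpi = sampling_density(640, 2.8)
fig4_per_cm, fig4_dpi = sampling_density(447, 0.9)

print(f"Figure 3: {fig3_per_cm:.0f} px/cm, {fig3_dpi:.0f} dpi")
print(f"Figure 4: {fig4_per_cm:.0f} px/cm, {fig4_dpi:.0f} dpi")
```

Under these assumptions the detail image of Figure 4 is sampled roughly twice as densely as the test image of Figure 3.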

The following sources for digital images have been tested more or less thoroughly:

Photographs; negatives or paper copies
Digital cameras
CCD cameras
Scanners
X-ray photographs

Photographs can be digitized with a variety of equipment:

Slide scanners
Drum scanners
PhotoCD scanners
Flat scanners
Hand scanners
Video grabbers
Digital cameras

Flat scanners, hand scanners, video grabbers and digital cameras can also be used to record the sample plates directly.

The methods themselves are not directly comparable. They all have their own benefits and weaknesses, which we have tried to identify in our tests. Before judging any method one must be familiar enough with it. For example, taking the best possible photographs requires a lot of time spent experimenting, as the number of variables to control is quite large. The object of these tests has been one plate of carbonized papyrus fragments, shown in Figure [1]. The plates themselves set certain limits: one cannot put a plate into a drum scanner, the glass plate effectively filters out long-wavelength IR radiation, and U.S. letter-sized flat scanners cannot easily be used, because the plates are typically larger.

Image Processing

The black writing on the black, rough background of carbonized papyri is a challenging image processing problem. Several aspects make it very difficult to extract the characters from the background: minimal contrast between writing and background, messy background texture partly visible through the text, unclear character edges, noise, cracks in the material, etc. There is no simple feature that could be used as a perfect classifier. In principle, the ideal result of processing would be a binary (black and white only) image showing only the characters and leaving the background out, with absolute certainty. In practice that is hardly possible, but one can always enhance the images into a more readable form and search for any features that could be extracted with image processing methods.

Image processing easily takes vast amounts of processor time, even on the fastest computers. Our small test image (640x208) consists of 133,120 pixels; at maximum resolution, the size of the test image is almost 1,000,000 pixels. A simple algorithm may require, say, a 5x5 matrix to be applied to every pixel. Complex algorithms may need larger matrices, derivatives, variances, sorting, heuristic methods and dozens of iterations to be applied to a single pixel, and they must be exhaustively tested to find a suitable set of boundaries and values for their variables. So one should keep these computational limits in mind when searching for a good algorithm.
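The scale of the computation can be illustrated with a rough operation count. This is only a back-of-the-envelope sketch: the function name is hypothetical, the 640x208 size is the one given for the test image in Figure 3, the 1000x1000 size merely stands in for "almost 1,000,000 pixels", and the 15x15 matrix with 30 iterations is an arbitrary illustration of a "complex" algorithm, not a method used in this work.

```python
def kernel_operations(width, height, kernel_size, passes=1):
    """Multiply-add count for a square matrix applied to every pixel."""
    return width * height * kernel_size * kernel_size * passes


# Small test image with a simple 5x5 matrix.
small = kernel_operations(640, 208, 5)

# Assumed stand-in for the maximum-resolution image (~1,000,000 pixels).
large = kernel_operations(1000, 1000, 5)

# Hypothetical heavier algorithm: 15x15 matrix, 30 iterations.
heavy = kernel_operations(640, 208, 15, passes=30)

print(f"5x5 on 640x208:            {small:>13,}")
print(f"5x5 on ~1,000,000 pixels:  {large:>13,}")
print(f"15x15, 30 passes, 640x208: {heavy:>13,}")
```

Even the simplest case needs millions of multiply-add operations, and the count grows with the square of the matrix size and linearly with the number of iterations.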

It should also be remembered that the image should not be modified too much: by manipulating the images one can easily create artifacts that show up as parts of characters. It is also very easy to lose essential information. For best results one should combine human expertise and intelligence with powerful computational methods.

There are some basic methods, such as histogram equalization, which might be classified as 'non-altering' or 'non-manipulative' when not taken to the extreme. Manipulated images should always be presented together with the originals or with 'non-manipulated' images to maintain credibility. At best, manipulations help to find new ways to see the images, while the character recognition itself is best left to the human specialists. It is recommended to use two or more differently processed images of difficult-to-read objects.
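As an illustration of histogram equalization, the following minimal sketch remaps a flat list of 8-bit gray values through the cumulative histogram. It is a textbook formulation written for this example, not the implementation used in our tests; the function name and the sample values are hypothetical.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray values in [0, levels)."""
    if not pixels:
        return []

    # Histogram of gray values.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray values.
    cdf = []
    total = 0
    for count in hist:
        total += count
        cdf.append(total)

    # CDF value at the darkest gray level actually present.
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # Only one gray level present: nothing to stretch.
        return list(pixels)

    # Lookup table mapping old gray levels to new ones.
    scale = (levels - 1) / (n - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]


# Hypothetical low-contrast sample: four nearly identical dark grays.
dark = [10, 10, 11, 11, 12, 12, 13, 13]
print(equalize(dark))
```

Because the lookup table never decreases, a pixel that was darker than another before the remapping is never brighter afterwards. This order-preserving property is one reason such a method might be regarded as 'non-manipulative'.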

Hypermedia

The papyrus scrolls and the context of the find itself contain many types of information. The context covers everything about how the find was related to its surroundings: exact location, environment, surrounding buildings, structures and other objects, the depth of the find, and possibly what layers were above and below it. The context provides the basis for deducing various information about the find, one of the most important items being the dating. The scrolls themselves also contain varied information. They may contain dates, names, locations, lists, religious texts, descriptions of local life and many things about the culture, all of which can be related to the previously known or suggested understanding of the ancient world.

The publications of these scrolls [reference] discuss the writing and spelling itself and contain interpretations, notes and related information. Typically only parts of the whole scrolls can be saved, and one must take great care to keep the fragments in the correct context; their relation to each other must be preserved.

The essence of hypermedia is the ability to link different kinds of information together naturally. The multi-layered information structure of the papyrus writings and their whole context is inherently difficult to contain in an ordinary document. A book can contain many links and lists pointing to related information, typically to previous documents and/or other material not included in the current document. The physical format of a book is also fixed, and following links to other pages and back restricts the train of thought somewhat. Hypermedia offers at least a partial solution by freeing the format of a document. All pictures, descriptions and interpretations of the current work can be linked freely to previous knowledge and background material. A good example of the possibilities of hypermedia concerning papyri is the Duke University Papyrus Archive (http://odyssey.lib.duke.edu/papyrus).

To benefit from the possibilities that hypermedia offers, one should be able to form a link directly to the information needed. This can be done in an information network such as the Internet, where a unified standard hypermedia language (HTML) is established, but it will be efficient only after very large amounts of data are made available by transferring them to databases and archives. Providing all documents through the Internet should now become standard practice.

Antti Nurminen, 34044T, andy@cs.hut.fi