It also includes an explanation of the images, French-English-Arabic definitions, an Arabic-French glossary of quality terms, and symbols and abbreviations.
The age of a historical manuscript can be an invaluable source of information for paleographers and historians. The process of automatic manuscript age detection has inherent complexities, which are compounded by the lack of suitable datasets for algorithm testing. This paper presents a dataset of historical handwritten Arabic manuscripts designed specifically to test state-of-the-art authorship and age detection algorithms.
Qatar National Library has been the main source of manuscripts for this dataset, while the remaining manuscripts are open source. The dataset consists of over images taken from various handwritten Arabic manuscripts spanning fourteen centuries. In addition, a sparse representation-based approach for dating historical Arabic manuscripts is proposed. Few existing datasets provide reliable writing dates and author identities as metadata. KERTAS is a new dataset of historical documents that can help researchers, historians and paleographers date Arabic manuscripts automatically, more accurately and more efficiently.
Islamic civilization contributed significantly to modern civilization; the period from the 8th to the 14th century is known as the Islamic golden age of knowledge. This period marked an era in history when culture and knowledge thrived in the Middle East, Africa, Asia and parts of Europe. Arabic was the language of science and the Arab world was the center of knowledge [ 1 ]. Millions of Arabic manuscripts from that era, on a wide variety of topics, are scattered across different collections around the world. Many efforts have been made by numerous contributors to preserve this valuable heritage. Unfortunately, because of the physical degradation of the paper and the ink, processing and studying these documents has proven challenging.
Consequently, these documents are actively being digitized to preserve them. Historians and paleographers are encouraged to work with these digitized versions of the manuscripts. These digital copies are very attractive to researchers because they allow quick and easy access to these historical manuscripts, which in turn provides a way to evaluate, analyze and research these documents without physically handling the delicate and precious works.
The publication or writing date of a historical manuscript has always been important for historians. It can help them understand the sub-textual context of the document and also help in understanding the cultural and historical references that are presented in the text. Knowing when the manuscript was written can also help researchers catalogue and categorize historical documents more accurately and efficiently.
Traditionally, historians and paleographers have used invasive methods, such as identifying the texture and composition of the paper or the components used to make the ink, to estimate the age of a document [ 2 ]. Others look for clues in the written content, such as dates of historical events, or in the handwriting and punctuation [ 3 ]. A few researchers have also studied ornamentation and watermarks to determine the age of these manuscripts [ 4 ]. As mentioned earlier, a large number of ancient manuscripts have been scanned and digitized by libraries and museums.
These scanned images have enticed the pattern recognition community as a whole, and image processing researchers in particular, to try to solve the problem of document age detection using noninvasive techniques [ 5 ]. Classifying ancient documents by writing style is one of the techniques used to date them. The System for Paleographic Inspection (SPI) [ 6 ] is one of the earliest works to employ writing style-based techniques for dating ancient documents. SPI uses tangent distance and statistics-based algorithms to build models of all characters. It then uses these models to measure the similarity between the letters in its dataset and the letters of the document under test.
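The tangent-distance idea behind SPI can be illustrated with a minimal Python sketch of a one-sided tangent distance. This is not SPI's implementation. The following are all illustrative assumptions: using only two tangent vectors (small horizontal and vertical shifts, approximated by image gradients), fitting them with a NumPy least-squares solve, and the function names `translation_tangents`, `one_sided_tangent_distance` and `nearest_prototype`. The distance is the smallest Euclidean distance between the query and the prototype once the prototype may move along its tangent directions, so a letter shifted by a fraction of a pixel is barely penalized.

```python
import numpy as np


def translation_tangents(img: np.ndarray) -> np.ndarray:
    """Tangent vectors for small horizontal/vertical shifts, approximated by image gradients."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows (y) and columns (x)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape: (n_pixels, 2)


def one_sided_tangent_distance(prototype: np.ndarray, query: np.ndarray) -> float:
    """min over alpha of ||prototype + T @ alpha - query||, with T the prototype's tangents."""
    p = prototype.astype(float).ravel()
    q = query.astype(float).ravel()
    T = translation_tangents(prototype)
    # Least squares finds the shift coefficients that best align the prototype with the query.
    alpha, *_ = np.linalg.lstsq(T, q - p, rcond=None)
    return float(np.linalg.norm(p + T @ alpha - q))


def nearest_prototype(query: np.ndarray, prototypes: dict) -> str:
    """Return the label of the character prototype with the smallest tangent distance."""
    return min(prototypes, key=lambda label: one_sided_tangent_distance(prototypes[label], query))
```

Choosing alpha = 0 is always possible, so this distance never exceeds the plain Euclidean distance. Richer tangent sets (rotation, scaling, stroke thickness) would be added as extra columns of `T`.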
Moreover, He et al. Another study on dating ancient manuscripts [ 8 ] suggests using a histogram of stroke orientations as a feature descriptor to represent the document images. The descriptor is then passed to a self-organizing map (SOM) clustering system to match the image with a date label. Similarly, Wahlberg et al. Whereas Howe et al. Although quite a few online libraries host datasets of thousands of manuscripts in various languages, most researchers have had to build their own datasets and find the authorship and age information for verification before they could test and verify their algorithms.
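A stroke-orientation descriptor of the kind described in [ 8 ] might look like the following. This is a hypothetical sketch, not the method of [ 8 ]. It assumes several things: stroke orientation is taken perpendicular to the local intensity gradient, opposite directions are folded into [0, π), each pixel is weighted by its gradient magnitude, and 12 bins is an arbitrary default. The normalized vector it returns is the kind of input a self-organizing map, or any other clustering method, could then group by date. The SOM itself is not shown.

```python
import numpy as np


def stroke_orientation_histogram(gray: np.ndarray, n_bins: int = 12) -> np.ndarray:
    """Normalized histogram of local stroke orientations in [0, pi), weighted by edge strength."""
    gy, gx = np.gradient(gray.astype(float))
    magnitude = np.hypot(gx, gy)
    # A stroke runs perpendicular to the intensity gradient; fold opposite directions together.
    orientation = (np.arctan2(gy, gx) + np.pi / 2) % np.pi
    mask = magnitude > 0  # only pixels on stroke boundaries contribute
    hist, _ = np.histogram(orientation[mask], bins=n_bins, range=(0.0, np.pi),
                           weights=magnitude[mask])
    total = hist.sum()
    return hist / total if total > 0 else hist
```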
A brief review of some existing online datasets is given in Sect. The next section provides a brief history of Arabic handwriting over the centuries and its distinguishing characteristics in each period of Islamic history. Results and discussion are presented in Sect. Then, conclusions are presented in Sect. It is important to note that quite a few manuscript datasets are available for algorithm training and testing, but each has its own advantages and limitations.
The main issue with these datasets is that, with the exception of the MPS dataset [ 7 ], they do not provide proper dating information for their manuscripts. In addition, the Syriac [ 10 ] and IBN SINA [ 19 ] datasets focus on characters and are better suited to word detection, word segmentation and word annotation. Our aim was to provide a dataset for the Arabic language similar to the MPS dataset, one that supports the design, development, training and testing of automatic age detection algorithms for historical Arabic documents.
Deciding on the most suitable feature extraction method is perhaps the most crucial step in achieving a high recognition rate. Age detection of historical manuscripts is a very challenging problem. Besides the complexities inherent in working with noisy images, there is an additional challenge: the class boundaries between documents written in two adjacent centuries are highly nonconvex and nonlinear. This means that very high interclass similarity can exist between documents written in two adjacent centuries, and between documents written by two different authors.
In addition, a single century contains documents handwritten by different authors, which produces very high intraclass variability. Interestingly, a similar combination of interclass similarity and intraclass variability is present in face recognition. Wright et al. addressed this problem in face recognition with a sparse representation-based classification (SRC) approach. In this paper, we follow the same method with a small adjustment. The adjusted method considers all possible similarities of the test image, both within a single class (century) and across all classes, while adaptively selecting the minimum number of training images required to represent each test sample.
This sparse representation provides new insight into the roles of feature extraction and occlusion. Compressed sensing is a technique for finding a sparse solution to an underdetermined linear system. Its theory suggests that the exact choice of feature space is no longer critical, since even random features can suitably represent a test image.
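The sparse representation-based classification idea can be sketched in Python as below. It follows the general SRC recipe of Wright et al.: normalize the training columns, find a sparse code for the test sample, then compare class-wise reconstruction residuals. It does not reproduce the adjustment made in this paper. Two substitutions are assumptions for illustration only. First, the exact l1-minimization is replaced by an ISTA solver for the Lasso objective 0.5·||Ax - y||² + λ||x||₁. Second, the features are random Gaussian projections, in the spirit of the compressed-sensing argument above. The names `random_features`, `sparse_code` and `src_classify`, and the values λ = 0.01 and 1000 iterations, are hypothetical.

```python
import numpy as np


def random_features(raw: np.ndarray, d: int, seed: int = 0) -> np.ndarray:
    """Project D-dimensional column vector(s) onto d random Gaussian directions."""
    R = np.random.default_rng(seed).standard_normal((d, raw.shape[0]))
    return R @ raw


def soft_threshold(v: np.ndarray, t: float) -> np.ndarray:
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_code(A: np.ndarray, y: np.ndarray, lam: float = 0.01, n_iter: int = 1000) -> np.ndarray:
    """ISTA for min_x 0.5*||A x - y||^2 + lam*||x||_1 (a Lasso stand-in for l1-minimization)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the smooth term
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x + step * A.T @ (y - A @ x), lam * step)
    return x


def src_classify(A: np.ndarray, labels, y: np.ndarray, lam: float = 0.01):
    """Assign y to the class whose training columns reconstruct it with the smallest residual."""
    labels = np.asarray(labels)
    A = A / np.linalg.norm(A, axis=0, keepdims=True)  # unit-norm training columns
    y = y / np.linalg.norm(y)
    x = sparse_code(A, y, lam)
    classes = np.unique(labels)
    # Rebuild y using only the coefficients that belong to one class at a time.
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]
```

Each column of `A` is one training image's feature vector, and `labels` gives its century. The same `seed` must be used when projecting training and test images so that both live in the same random feature space.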
The sparse representation-based classification algorithm is, in essence, a nearest-subspace selection algorithm. In simplest terms, it represents a test image as a sparse linear combination of all training images and assigns it to the class (century) whose training images give the smallest reconstruction error.
The similarity of handwriting style among manuscripts from the same period suggests treating their dating as a writing style-based classification problem. To evaluate the sparse representation-based approach, we compared it with several state-of-the-art writing style-based features that have been used in multiple studies [ 22 , 23 , 24 , 25 ].
These features are the run-length feature, as examined in [ 22 ], and the edge-hinge and edge-direction distributions, as studied by Bulacu and Schomaker [ 23 ]. Run length is a multi-scale feature obtained from the probability distribution of the lengths of runs of black and white pixels in a binary image [ 22 ]. It is calculated by scanning the image in four directions: horizontal, vertical, left-diagonal and right-diagonal.
The probability distribution is then estimated from the normalized histogram of the scanned run lengths. The approach is explained in detail in [ 22 ].
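Below is a minimal sketch of the four-direction run-length histogram. It assumes a binary image stored as a NumPy array with ink pixels equal to 1. Several details are assumptions rather than details from [ 22 ]: the helper names, the `max_len` cap of 20 (longer runs share the last bin), and mapping "left-diagonal" to the top-left-to-bottom-right direction. The multi-scale aspect of [ 22 ] is not reproduced.

```python
import numpy as np


def run_lengths(line: np.ndarray, value: int) -> list:
    """Lengths of the maximal runs of `value` in a 1-D binary array."""
    lengths, count = [], 0
    for px in line:
        if px == value:
            count += 1
        elif count:
            lengths.append(count)
            count = 0
    if count:
        lengths.append(count)
    return lengths


def scan_lines(img: np.ndarray, direction: str) -> list:
    """Split the image into 1-D scan lines along one of the four directions."""
    offsets = range(-img.shape[0] + 1, img.shape[1])
    if direction == "horizontal":
        return list(img)
    if direction == "vertical":
        return list(img.T)
    if direction == "left_diagonal":  # top-left to bottom-right
        return [np.diagonal(img, k) for k in offsets]
    if direction == "right_diagonal":  # top-right to bottom-left
        return [np.diagonal(np.fliplr(img), k) for k in offsets]
    raise ValueError(f"unknown direction: {direction}")


def run_length_feature(binary: np.ndarray, value: int = 1, max_len: int = 20) -> np.ndarray:
    """Concatenated normalized run-length histograms for the four scan directions."""
    parts = []
    for direction in ("horizontal", "vertical", "left_diagonal", "right_diagonal"):
        hist = np.zeros(max_len)
        for line in scan_lines(binary, direction):
            for length in run_lengths(line, value):
                hist[min(length, max_len) - 1] += 1  # runs longer than max_len share the last bin
        total = hist.sum()
        parts.append(hist / total if total else hist)
    return np.concatenate(parts)
```

To cover both black and white runs, call the function once with `value=1` (ink) and once with `value=0` (background), then concatenate the two results.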
The edge-hinge feature is obtained by computing a normalized histogram of the curvature of the text edges, that is, the joint distribution of the orientations of two edge fragments meeting at a common pixel. The edge-direction feature is computed from a normalized histogram of edge directions [ 23 ]. Both features have been used for writer identification in several languages [ 22 , 24 , 25 ].
Every attempt has been made to keep this selection process random, the only check being to make sure that all classes are properly represented.
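The two edge-based features of [ 23 ] can be sketched as follows. The sketch is based on the published description of these features as we understand it, not on the authors' code. It assumes the input is already a binary edge map (the edge detector is not shown). It also assumes that an edge fragment is a straight run of edge pixels from a central pixel to the border of a (2n+1)×(2n+1) square. The default n = 3 and all function names are hypothetical. Edge direction counts which fragment angles in [0, π) occur. Edge hinge counts pairs of fragment angles (φ1 < φ2) that leave the same pixel, which captures local curvature.

```python
import numpy as np


def border_offsets(n: int) -> list:
    """(dy, dx) offsets on the border of a (2n+1)x(2n+1) square, sorted by angle in [0, 2*pi)."""
    offs = [(dy, dx) for dy in range(-n, n + 1) for dx in range(-n, n + 1)
            if max(abs(dy), abs(dx)) == n]
    # Image rows grow downward, so -dy makes angles run counterclockwise as usual.
    return sorted(offs, key=lambda o: np.arctan2(-o[0], o[1]) % (2 * np.pi))


def fragment_present(edges: np.ndarray, y: int, x: int, dy: int, dx: int) -> bool:
    """True if every pixel on the straight segment from (y, x) to (y+dy, x+dx) is an edge pixel."""
    steps = max(abs(dy), abs(dx))
    for t in range(1, steps + 1):
        yy, xx = y + round(t * dy / steps), x + round(t * dx / steps)
        inside = 0 <= yy < edges.shape[0] and 0 <= xx < edges.shape[1]
        if not inside or not edges[yy, xx]:
            return False
    return True


def edge_direction_and_hinge(edges: np.ndarray, n: int = 3):
    """Return (edge-direction distribution, edge-hinge joint distribution) for a binary edge map."""
    offs = border_offsets(n)  # 8n directions covering the full circle
    k = len(offs)
    direction = np.zeros(k // 2)  # edge direction only needs the upper half-plane [0, pi)
    hinge = np.zeros((k, k))      # hinge[i, j] is filled only for i < j, i.e. phi1 < phi2
    for y, x in zip(*np.nonzero(edges)):
        present = [i for i, (dy, dx) in enumerate(offs) if fragment_present(edges, y, x, dy, dx)]
        for i in present:
            if i < k // 2:
                direction[i] += 1
        for a, i in enumerate(present):
            for j in present[a + 1:]:
                hinge[i, j] += 1
    if direction.sum():
        direction /= direction.sum()
    if hinge.sum():
        hinge /= hinge.sum()
    return direction, hinge
```

For a fixed-length feature vector, flatten the upper triangle of the hinge matrix with `hinge[np.triu_indices(len(hinge), 1)]`.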
The remaining images are kept as part of the evaluation dataset. Images of different sizes are created by downscaling each image; no cropping takes place.
Evaluation results for the sparse representation-based manuscript age detection algorithm.
The results show that as the image size increases, accuracy initially improves, because more discriminative features become available. However, if the image size keeps increasing, accuracy tends to drop.
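The training/evaluation split and the resizing step described above could look like the following. This is a hypothetical sketch. The per-class training fraction, the seed and the nearest-neighbour downscaling are all assumptions. The text only states that selection was random, that every class was represented, and that images were scaled down without cropping.

```python
import numpy as np


def stratified_random_split(labels, train_fraction: float = 0.5, seed: int = 0):
    """Randomly pick training indices within each class so that every class is represented."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        k = max(1, int(round(train_fraction * len(idx))))  # at least one image per class
        train.extend(rng.choice(idx, size=k, replace=False).tolist())
    train = np.sort(np.array(train))
    evaluation = np.setdiff1d(np.arange(len(labels)), train)  # everything not used for training
    return train, evaluation


def downscale_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Shrink the whole image (no cropping) by nearest-neighbour sampling."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[np.ix_(rows, cols)]
```

Sampling within each class, rather than over the whole set, is what guarantees that no century is missing from the training data.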
In this paper, we presented the KERTAS dataset, compiled specifically to assist researchers designing solutions and algorithms for digital paleography. The dataset consists of over high-quality, high-resolution digital images acquired from multiple historical handwritten Arabic manuscripts from multiple sources. Detailed metadata are provided for each image to support the testing and verification of manuscript author detection and manuscript age detection algorithms.
In addition, we presented a sparse representation-based approach to detecting the age of manuscripts in order to highlight the suitability of the dataset. The algorithm also provides a baseline accuracy against which future algorithms developed with the KERTAS dataset can be compared. Furthermore, we employed several writing style-based features to compare with the proposed approach and to study the consistency of writing style within each century in the KERTAS dataset.
The authors gratefully acknowledge use of the services and facilities of the Qatar National Library. The statements made herein are solely the responsibility of the authors.
- Islamic Codicology by François Déroche.
- [Magazine] Scientific American. Vol. 291. No 2.
- Radioecological Concentration Processes. Proceedings of an International Symposium Held in Stockholm, 25–29 April, 1966.
Open Access. First Online: 08 September
While Arabic script existed before Islam, Arab society was an oral one in that period, and only a few inscriptions dating back to that time have been found. The inscription was written in al-Jazm, one of the earliest known styles of modern Arabic script [ 11 ].
The Islamic (Hijri) calendar started in ... C.E. Dates in the Islamic calendar are denoted A.H. (Latin: Anno Hegirae).
After the Islamic world expanded, many non-Arab Muslims found it difficult to read the Quran and to distinguish between Arabic letters. To avoid distortion of the text, Arab grammarians introduced dots into the script of the Quran. The second half of the second A. The arrival of paper in the region contributed to an increase in the amount of written material during that time.