Un-paralleling the Parallel: A Contrastive Stylometric Analysis of H. G. Wells’ The War of the Worlds Parallel Corpus

Document Type: Original Article

Authors

1 Faculty of Languages, October University for Modern Sciences and Arts (MSA), Giza, Egypt

2 Faculty of Languages, October University for Modern Sciences and Arts

Abstract

Contrastive linguistic studies compare how texts are formed and interpreted across languages and cultures. Recently, computational tools have been used to conduct such linguistic analysis empirically. Stylometry is the quantitative study of literary style through computational text analysis. This study undertakes a parallel-corpus contrastive stylometric analysis of H. G. Wells’ The War of the Worlds (1898) and its Arabic translation (2012). The paper aims to demonstrate the challenges of English/Arabic parallel corpus alignment and to explore how the intricate nature of the Arabic language affects natural language processing (NLP). To that end, it examines English adverbs and automatically recognized named entities of locations, people, and organizations in comparison with their Arabic renditions. For alignment, the heuristic-based NLTK sentence segmenter produces valid alignments, though some discrepancies occur. The part-of-speech (POS) tagger has been trained more extensively on English texts: most English tokens are accurately tagged, whereas the tagger underperforms on Arabic tokens, either misidentifying their parts of speech or labelling them X, which stands for unidentified. The Arabic renditions of adverbs do not parallel the adverbs used in the English source text; instead, they feature a variety of morpho-syntactic alternatives. Named entity recognition (NER) tags show better results in both texts, in line with the translator’s tendency to transliterate named entities. The study concludes by shedding light on some of the factors that might have led to inaccurate alignment and annotation. It also reflects on the translator’s inconsistent choices in translating adverbs and entities of locations and organizations.
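As a rough illustration of why sentence-level alignment of a parallel corpus can drift, the following minimal Python sketch pairs sentences by position and reports any left unpaired. It is not the study's pipeline: the regex splitter is a simplified stand-in for the NLTK segmenter, the function names `split_sentences` and `align_by_position` are hypothetical, and the English and Arabic sentences are invented toy examples rather than text from the novel or its translation.

```python
import re

# Split after sentence-final punctuation, including the Arabic question mark.
# Assumption: a naive rule like this stands in for a real segmenter such as NLTK's.
_SENT_END = re.compile(r"(?<=[.!?؟])\s+")

def split_sentences(text):
    """Return the non-empty sentences found by the naive splitter."""
    return [s.strip() for s in _SENT_END.split(text) if s.strip()]

def align_by_position(src_sents, tgt_sents):
    """Pair sentences by index; also return whatever is left unpaired on each side."""
    pairs = list(zip(src_sents, tgt_sents))
    n = len(pairs)
    return pairs, src_sents[n:], tgt_sents[n:]

# Invented toy data: the Arabic side omits the middle English sentence.
english = "The Martians arrived. The town was silent. People fled."
arabic = "وصل المريخيون. هرب الناس."

pairs, extra_src, extra_tgt = align_by_position(
    split_sentences(english), split_sentences(arabic)
)

for src, tgt in pairs:
    print(src, "<->", tgt)
print("Unpaired source sentences:", extra_src)
print("Unpaired target sentences:", extra_tgt)
```

In this toy case the second English sentence is paired with an Arabic sentence that actually renders the third one, which shows how a single merged or omitted sentence in the translation can misalign every pair that follows.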

Keywords