Institut für Dokumentologie und Editorik

Genre Analysis and Corpus Design: Nineteenth-Century Spanish-American Novels (1830–1910)

 

1 Introduction

If people are asked about the objects, beings, or events around them, they will most probably name the categories that the things belong to, for example, a book on a shelf, a bird in a tree, a dancer on stage, or a thunderstorm in the sky. Research in cognitive development has shown that even small babies begin to recognize what is around them in terms of categories when they are about a year old (Gopnik, Meltzoff, and Kuhl 2009, 79–83). Paradoxically, however, all objects and beings are unique: “All you ever see are individual objects: this particular sweet pea, this individual dollar bill. There is no ‘sweet-peaness’ or ‘dollarhood’ in the world. So how could it ever be informative to say that this individual thing belongs to this nonexistent, mythical category, when the individual thing itself is all we ever actually experience?” (Gopnik, Meltzoff, and Kuhl 2009, 79). Categorizing serves a basic human need to confer meaning on what is perceived and to leave aside the individuality of things. It helps people to grasp the world around them. However, the perception of the individual also depends on the understanding of the general, so that what is special about something emerges against the background of the familiar.

Literary texts are no exception. One type of category that they are commonly associated with is genre. A poem is something different from a drama, and a science fiction novel is not to be confused with a sentimental one. One comes across literary genres in everyday life, for example, as an organizing principle in a bookstore, in a library, or on the covers of the books themselves. Experiences in daily life usually suggest that the assignment of individual texts to genres does not cause particular problems, except that one might be disappointed, surprised, or impressed when the book bought is different from what its genre label led one to expect. In literary theory and its antecedents, however, the “genre problem” has been discussed intensely for thousands of years, starting with the attempts of Aristotle and Plato to formulate a theory of poetry (Zymner 2003, 10). One of the main questions in the debates about genre is what type of category genres can be conceived as: as logical classes with clear boundaries into which all the literary works can neatly be sorted? As prototypical categories with exemplary masterpieces at the center and mediocre imitations at the edge? As networks of related texts that form generic families? Or as some other kind of category that can be described as a combination of necessary and optional features?[1] Moreover, it has been debated whether genres can be assumed to exist at all beyond pure naming conventions, given that the literary works associated with them can be so different. This is connected to the problem of genre change and of dependence on the cultural context, because literary historians must deal with the fact that the same genre names are applied to phenomena with quite distinct textual characteristics across time and place. At times, attempts have been made to avoid the challenges that genres as categories of literary texts pose by denying their relevance altogether (for example, by Croce 1905). However, the practical relevance that genres have, not only in daily life but also for students of literature and literary scholars, cannot be denied. Topics of courses and exams are often defined in terms of genres, for example, a seminar on the Spanish picaresque novel or classic drama. Literary histories are also often structured in terms of genres that were important for certain periods. Finally, the interpretation of individual literary works does not happen in a vacuum. In order to assess the value of single texts, they are often examined with regard to a specific literary tradition or genre (Keckeis and Michler 2020, 7–8). As a way to approach the genre problem theoretically, there is a tendency in recent literary genre theory to see genre as a phenomenon that can be described along cognitive, communicative, social, and textual dimensions that are linked to each other (Gymnich and Neumann 2007).

This dissertation aims to enter the theoretical discussion about genres from an interdisciplinary perspective. It is located in the field of digital literary stylistics, which is part of the wider discipline of digital humanities. In digital humanities, humanities research is combined with methods from information science and computer science, and the discipline includes interdisciplinary fields such as computational linguistics, computational philology, and computational literary studies. The subfield of digital stylistics is concerned with the analysis of linguistic and literary style with computational methods. An important subject is the investigation of the style of individual authors, but genre style has also been a focus of digital stylists.[2] To examine genre on the level of style means that the approach is primarily text-centered, and it also entails empirical work.

Digital literary stylistics is not exclusively but predominantly applied research. It is based on digital corpora of literary texts, which are designed for a specific language, period, set of authors, or genre, or for combinations of several of these if the aim is a contrastive analysis. The topic of this dissertation is, therefore, genre analysis and corpus design, both as a theoretical discussion of genre as a concept in literary theory and digital stylistics and as an empirical corpus study. A specific corpus was built for this purpose as a basis for an analysis of metadata and texts in terms of genre. The genres that the empirical part of this study is concerned with are the novel and its subgenres in the context of nineteenth-century Spanish-American literature, more precisely Argentine, Cuban, and Mexican novels that were published between 1830 and 1910. There is a growing number of digital stylistic studies concerned with texts in Romance languages, as the contributions to the conference “Digital Stylistics in Romance Studies and Beyond”, which took place at the University of Würzburg in 2019, show.[3] Nevertheless, most digital stylistic studies of literary texts are still based on corpora of texts in English.[4] Comprehensive central repositories of digital literary texts that are curated following scholarly standards, such as the Digital Library in the TextGrid Repository (TextGrid n.d.) or the German Text Archive (Berlin-Brandenburgische Akademie der Wissenschaften 2022) for German texts, are not yet available for Spanish literary works. There are, however, initiatives that promote the building of corpora of literary texts in Spanish, many of which go back to individual work, community initiatives, or research projects, for example, the “Corpus of Spanish Golden-Age Sonnets” (Navarro-Colorado, Ribes Lafoz, and Sánchez 2016; Navarro-Colorado 2020) or the multi-language corpora DraCor (Fischer et al. 2019, n.d.) and ELTeC (Odebrecht, Burnard, and Schöch 2021), which include Spanish drama and novels, respectively. In addition, the project “Computational Literary Genre Stylistics” (CLiGS), the context in which this dissertation was written, was concerned with building and analyzing digital corpora of literary texts in French, Spanish, Italian, and Portuguese.[5] Building such corpora is important for several reasons: it strengthens digital quantitative research in the respective disciplines and language areas, and it helps to make the empirical results of digital stylistics in general more reliable if they are based on findings derived from a broad range of different corpora and are not specific to certain languages or genres.

The Spanish-American nineteenth-century novel is well studied, and knowledge about it is consolidated in literary histories and monographs.[6] Various subgenres of this novel have also been analyzed in depth by literary scholars, for instance, the historical novel, the anti-slavery novel, or novels of the romantic, naturalistic, and modernist currents (Löfquist 1995; Peñaranda Medina 1994; Read 1939; Rivas 1990; Schlickers 2003; Suárez-Murias 1963). However, nineteenth-century Spanish-American novels and their subgenres have not yet been analyzed on the basis of a comprehensive digital text corpus and by means of computational stylistic methods. There are several reasons why a quantitative digital analysis of the subgenres of that novel is of interest. First, many studies on nineteenth-century Spanish-American novels focus on selected works that have a canonical status. The effect is that only a specific section of the whole literary production of the time forms the basis of the literary-historical knowledge about the novel and its subgenres in that period.[7] There is also qualitative research based on larger corpora, but in these cases, scholars mostly concentrate either on the novel as a whole or on one specific subgenre.[8]

A digital approach in which several subgenres of the novel are contrasted can contribute new insights into the characteristics of the texts that are distinctive for the different subgenres. Second, more novels can be taken into account if a comparatively large corpus is used: not only the well-known novels but also works that have thus far not received much critical attention. This can shed new light on the concepts of the subgenres, on the one hand because the quantitative relevance of the subgenres becomes clearer, and on the other hand because lesser-known works possibly represent the subgenres that they are associated with in a different way, by use of other textual traits and stylistic means. Third, a quantitative digital study would differ from a qualitative approach even if the same number of novels and subgenres were analyzed, because the way knowledge about the texts is extracted and summarized is not directly dependent on the human reader but results from a mechanical treatment of the texts and computational processing. This can produce new findings about the subgenres that remain unrecognized by close reading methods. Although the nineteenth century is long past, the literature of that time is still of importance because that century marked the rise of the novel as a genre and the beginning of the national literatures of the different Spanish-American countries. Many of the subgenres that were practiced or emerged in the nineteenth century are still relevant in twenty-first-century literature, such as historical or crime novels. For digital genre stylistics in general, the subgenres of the Spanish-American novel are an interesting empirical case because they combined generic concepts of European origin with specific local inventions. Especially regarding literary currents, there is no neat chronological succession, so that several currents were in vogue at once (on these aspects, see, for instance, Varela Jácome [1982] 2000). It is an interesting question to what extent different theoretical concepts of genre categories are suitable to capture the various nineteenth-century Spanish-American subgenres.

Although this dissertation is concerned with texts in Spanish, it is written in English because the field of digital genre stylistics, like digital humanities in general, is highly interdisciplinary. The aim is to provide results that can be appreciated by scholars of Spanish-American literature but also by digital humanists from around the world. A second linguistic and cultural background of this thesis is German, and much research literature from German-speaking countries has been taken into account, especially literature on genre theory but also digital stylistics papers and research on the Spanish-American novel. In general, quotes are not translated, on the assumption that the context provides enough information to grasp their meaning.

Before the specific goals and questions of this thesis and its structure are outlined, it must be clarified what is not covered here. The period that is covered is 1830 to 1910, the whole long nineteenth century, but the corpus of novels that is analyzed is treated as a synchronic one. In the discussion of results, the publication date of the novels has been taken into account to see whether it had an influence on the results, but no inherently diachronic analysis of the subgenres is pursued here. A second aspect that is not fully addressed at the time of publication of this thesis is sustainable research data management in connection with the publication of the corpus. The basic publication strategy for the corpus is presented, and the corpus is published in Open Access and in standard formats, including versioning, in a public code repository on GitHub and on Zenodo, but the publication method is not discussed explicitly in relation to the FAIR principles (Wilkinson et al. 2016) or other best practices for research data publication. In the longer term, it is planned to prepare the corpus for long-term preservation and accessibility in suitable institutional or subject-specific repositories, but in the context of this dissertation, the initial focus was on its creation, analysis, and basic availability for transparency and re-use.

As mentioned above, the main goals of this dissertation lie firstly in the area of genre theory, secondly in the construction of a digital corpus of novels, and thirdly in its computer-assisted analysis. The theoretical foundations of the thesis are clarified in chapter 2, “Concepts”. On the level of genre theory (chapter 2.1, “Literary Genres”), the aim is to work out which of the existing concepts of the ontological status of genres (chapter 2.1.2, “Ontological Status and Relevance of Genres”) and their historical and theoretical nature (chapter 2.1.3, “System and History”) are relevant and applicable and how these concepts need to be adapted for digital genre stylistics. In this context, three aspects are specifically addressed. The first aspect is the question of how generic terms can be modeled and defined. This is an important issue because genre labels are the main feature through which genre conventions enter a digital stylistic text analysis. This aspect is explored in more depth in chapters 2.1.2.1 (“Semiotic Models of Genres”) and 2.1.2.2 (“Genres and Digital Genre Stylistics: The Roles of Corpora, Genre Labels, Features, and Text Style”). Second, definitions of genre and text types stemming from literary theory and linguistics are compared to see to what degree they are suitable for digital stylistics. In particular, the question of whether and how a conventional or historical level of genre and a textual one should be separated is discussed. Building on existing approaches, a proposal for conceptual differentiation is made in chapter 2.1.3.1 (“A Conceptual Proposal for Digital Genre Stylistics: Literary Text Types, Conventional Literary Genres, and Textual Literary Genres”). Third, three main concepts that have been proposed to conceptualize genres as categories, namely logical classes, prototypical structures, and family resemblance networks, are related to the distinction between conventional and textual levels of genre (chapter 2.1.4, “Categorization”). It is then outlined how these three concepts can be implemented in text-based digital genre analyses by referring to computational methods of text classification, clustering, and network analysis. The theoretical part also explains the concept of literary style (chapter 2.2, “Style”) that underlies the analyses in the empirical part. The part on concepts closes with a presentation of three major thematic subgenres and three literary currents chosen for text analysis (chapter 2.3, “Subgenres of the Nineteenth-Century Spanish-American Novel”). These are the historical novel, the sentimental novel, and the novel of customs as thematic subgenres, and the romantic novel, the realist novel, and the naturalistic novel as literary currents. Several hypotheses are formulated regarding the textual and stylistic characteristics and coherence expected for these subgenres and currents.

The empirical part of the work consists of two chapters: chapter 3, “Corpus”, and chapter 4, “Analysis”. The first main goal of this part has been to build a comprehensive digital bibliography of Argentine, Mexican, and Cuban nineteenth-century novels and a corresponding digital corpus of 256 texts. Both have been prepared as a prerequisite and basis for the text analysis of subgenres. The selection of novels from the three countries is motivated, and the diachronic limits of the bibliography and the corpus are clarified. General defining characteristics of the novel are discussed as a basis for the selection of works for both digital resources (chapter 3.1, “Selection Criteria”). A special focus is on how the subgenre labels were collected, modeled, and encoded (chapters 3.2.3 and 3.3.4, “Assignment of Subgenre Labels”, for the bibliography and the corpus, respectively). An empirically based adaptation of the semiotic models of Raible (1980) and Schaeffer (1983), which are also presented in the first chapter on genre theory, provides the theoretical foundation for the organization of the subgenre labels in the bibliography and the corpus. The preparation of the bibliography and corpus is explained in detail, including the availability and usage of bibliographical and full-text sources, the treatment of the extracted full texts, the collection of metadata and text encoding, and the chosen publication strategy. Both resources are published on the web and offered to other scholars for reuse (Henny-Krahmer 2017–2021, 2021a).

The creation of the two collections of data and texts was primarily motivated by the goal of analyzing subgenres of Spanish-American nineteenth-century novels with quantitative methods. Therefore, the selection of the materials was guided by the specific subgenres that are the focus of interest here. However, the bibliography and the corpus also aim to provide a foundation for future analysis in other contexts. There are aspects of the corpus that are not employed in the analyses in this dissertation but are nonetheless presented as relevant for the design of corpora for digital literary genre studies. Examples are chapter structures and paragraphs that were encoded in the corpus but not considered in the analysis. Another example is the separation of direct speech and narrated text, which was carried out only for part of the corpus and was analyzed only on a test basis. Such additional encoding prepares for future analyses beyond the scope of this dissertation. In addition, some structural units of the corpus have already been used for analyses in the CLiGS project, although they are not the focus of the work here.[9] The digital text corpus created here thus claims to go beyond limited, project-specific use. It aims to be a community data collection that can be used by different representatives of a research community, is suitable for addressing different questions from a specific research field, is comprehensive, follows discipline-specific standards, and is designed to be archived and reusable in the medium term (Schöch 2017a, 224; National Science Board 2005, 20–21).

Of the two resources, the bibliography constitutes the sampling frame for the novels in the corpus, which means that it represents the larger population of all the novels that were published between 1830 and 1910 in Argentina, Cuba, and Mexico. Of course, the bibliography does not contain information about all these works, as it cannot be known with certainty how many and which novels were published in that time, but it aspires to approximate that number of novels. It is then possible to compare the novels contained in the bibliography to the ones in the corpus to see how representative the latter is of the novel and its subgenres in the chosen years and countries. This is done in the first part of the analysis chapter, in chapter 4.1, “Metadata Analysis”. That chapter tackles not only the question of representativeness but also the question of which subgenres, on which discursive levels, were quantitatively relevant. In addition, it is analyzed how the novels can be characterized by other parameters that have a possible impact on the analysis of genre style, for instance, the narrative perspective of the novels or the decades that they were published in. The metadata analysis chapter also provides a general overview of which authors and works are included in the bibliography and corpus and to which subgenres the works are assigned. This informs potential subsequent users of the resources in detail about their content and the distribution of the content in quantitative terms.

The second part of the analysis chapter, chapter 4.2, “Text Analysis”, is concerned with the text analysis of the corpus of 256 novels. Two main types of stylistic features are employed in the analysis: most frequent words (MFW) and topics. In the first part of the text analysis chapter (4.2.1, “Features”), both types of features are presented, and it is discussed how they relate to literary concepts of style and theme. In the second part of the text analysis chapter (4.2.2, “Categorization”), the texts are categorized, first by statistical classification and then with a family resemblance network analysis as an alternative categorization approach. The novels are analyzed on two discursive levels of genre: thematic subgenres and literary currents. Only the subgenres and currents that are most relevant in quantitative terms are analyzed in this part. One goal of the text analysis is to show in empirical experiments how statistical classification and network analysis can be employed to analyze genres on the textual level in terms of different categorical concepts. Another goal is to find out whether the conventionally, historically, and theoretically defined thematic subgenres and literary currents can be captured at all on the stylistic level of a group of texts, and, if so, how textually coherent the groups of novels associated with these subgenres are. In the classification setting (chapter 4.2.2.1, “Classification”), textual coherence means the degree to which the communicatively established subgenre classifications of the novels can be captured accurately in terms of textually defined classes, and it is measured in terms of classification accuracy. A further question is what can be learned about the subgenres and the individual texts from the errors that the classifier makes.
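
The classification setting described above can be illustrated with a minimal sketch. It is not the pipeline used in this dissertation: the four token lists are invented toy data, the nearest-centroid classifier is only a simple stand-in for a statistical classifier, and all function names are hypothetical. The sketch merely shows the general logic: texts are represented by the relative frequencies of the most frequent words, each held-out text is assigned to the subgenre whose centroid is closest, and the share of correct assignments is reported as classification accuracy.

```python
from collections import Counter
import math

# Invented toy "novels" with conventional subgenre labels (not corpus data).
TOY_CORPUS = [
    ("historical", "la guerra y la batalla del rey"),
    ("historical", "el rey y la guerra del reino"),
    ("sentimental", "el amor y el llanto del corazón"),
    ("sentimental", "la amada y el amor del corazón"),
]


def most_frequent_words(token_lists, n):
    """Return the n most frequent words over all texts."""
    counts = Counter(token for tokens in token_lists for token in tokens)
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(tokens, vocabulary):
    """Represent one text as relative frequencies of the vocabulary words."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(train_vectors, train_labels, vector):
    """Assign the label whose class centroid is closest to the vector."""
    grouped = {}
    for train_vector, label in zip(train_vectors, train_labels):
        grouped.setdefault(label, []).append(train_vector)
    centroids = {label: centroid(vs) for label, vs in grouped.items()}
    return min(centroids, key=lambda label: euclidean(vector, centroids[label]))


def leave_one_out_accuracy(vectors, labels):
    """Share of texts classified correctly when each is held out once."""
    correct = 0
    for i, (vector, label) in enumerate(zip(vectors, labels)):
        rest_vectors = vectors[:i] + vectors[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if predict(rest_vectors, rest_labels, vector) == label:
            correct += 1
    return correct / len(vectors)


labels = [label for label, _ in TOY_CORPUS]
token_lists = [text.split() for _, text in TOY_CORPUS]
# The MFW list is drawn from all texts; it uses no label information.
vocabulary = most_frequent_words(token_lists, 8)
vectors = [relative_frequencies(tokens, vocabulary) for tokens in token_lists]
accuracy = leave_one_out_accuracy(vectors, labels)
print(f"MFW: {vocabulary}")
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

On toy data of this size, the accuracy value itself carries no meaning. In a real experiment, the misclassified texts would be listed as well, because the errors are the starting point for the interpretive question raised above.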

Besides the statistical classification approach, a family resemblance analysis (chapter 4.2.2.2, “Family Resemblance: Network Analysis”) is pursued. While a classificatory approach assumes strict boundaries between the various groups of texts, in a network structure, the focus is on direct and indirect relationships between groups of novels, and the results are more open. In this context, the question of textual coherence refers to the extent to which textually based groups of novels in the network are also related to the same genre or subgenre of novels from a communicative perspective. In this case, coherence cannot simply be measured with an accuracy value but must be assessed by evaluating and interpreting the clusters found in the network. That way, the family resemblance network analysis can also answer questions about the internal structure of subgenres, and it takes into account factors other than the genre that may influence the groupings of texts found in the network.
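
The difference to the classification setting can again be made concrete with a small sketch under stated assumptions. The feature vectors and labels are invented, cosine similarity and the threshold of 0.8 are arbitrary illustrative choices, and connected components serve as a deliberately simple notion of cluster; none of this is claimed to be the procedure used in chapter 4.2.2.2. The sketch shows how texts can end up in the same group through indirect links only, and why the resulting groups have to be interpreted against the subgenre labels rather than scored with a single accuracy value.

```python
import math
from collections import Counter, deque

# Invented feature vectors (for example topic proportions) and labels.
TOY_NOVELS = {
    "novel_1": ("historical", [1.0, 0.0, 0.0]),
    "novel_2": ("historical", [0.8, 0.5, 0.0]),
    "novel_3": ("sentimental", [0.4, 0.9, 0.0]),
    "novel_4": ("novel of customs", [0.0, 0.1, 1.0]),
    "novel_5": ("novel of customs", [0.1, 0.0, 0.9]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def build_network(novels, threshold):
    """Link two novels if their similarity reaches the threshold."""
    names = sorted(novels)
    neighbors = {name: set() for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = cosine_similarity(novels[first][1], novels[second][1])
            if similarity >= threshold:
                neighbors[first].add(second)
                neighbors[second].add(first)
    return neighbors


def connected_components(neighbors):
    """Group novels that are linked directly or via intermediate novels."""
    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(neighbors[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


network = build_network(TOY_NOVELS, threshold=0.8)
components = connected_components(network)
# Label composition of each group, to be interpreted rather than scored.
profiles = [Counter(TOY_NOVELS[name][0] for name in group) for group in components]
for group, profile in zip(components, profiles):
    print(group, dict(profile))
```

In this toy network, novel_1 and novel_3 are not linked directly but belong to the same group because both resemble novel_2. The group therefore mixes two subgenre labels, which is the kind of result that calls for interpretation.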

Just as for the digital bibliography and text corpus, all Python and XSLT scripts used to perform the analyses and all associated data are published on GitHub and Zenodo in script and data repositories (Henny-Krahmer 2021a, 2021b, 2021c, 2021d). From the text of this dissertation, links are always provided to the relevant individual scripts and data in these repositories. Selected result data are also included directly in the text in the form of XML examples, tables, and figures. This book is, therefore, to be understood as an enhanced monograph: the text is a chain of argumentation and a narrative that leads through the data and scripts and becomes complete only with them. In addition, the text of this dissertation itself has been encoded in TEI and is available in a web-based HTML format and as a PDF.[10] Finally, it must be said that this work was submitted as a dissertation in early 2021. Updates could be made only partly, so in essence, the contents reflect the state of research at the time of submission.[11]

xNote
For an overview of the categorization aspect of genres, see Zymner (2003, 99–104Zymner, Rüdiger. 2003. Gattungstheorie. Probleme und Positionen der Literaturwissenschaft. Paderborn: mentis.).
xNote
For an introduction to the background and goals of digital literary stylistics, see the website (SIG-DLS n.d.SIG-DLS. n.d. “Goals.” Digital Literary Stylistics (SIG-DLS). http://web.archive.org/web/20221023111813/https://dls.hypotheses.org/activities/about/about.) of the corresponding special interest group of the Alliance of Digital Humanities Organizations (ADHO).
xNote
See the call for papers (CLiGS n.d.CLiGS. n.d. “Call for Papers: Digital Stylistics in Romance Studies and beyond.” CLiGS – Computergestützte literarische Gattungsstilistik. Accessed October 23, 2022. http://web.archive.org/web/20221023113851/https://cligs.hypotheses.org/digital-stylistics-in-romance-studies-and-beyond/call-for-papers.) and the conference proceedings to be published in 2023 (Hesselbach et al., forthcomingHesselbach, Robert, José Calvo Tello, Ulrike Henny-Krahmer, Christof Schöch, and Daniel Schlör, eds. Forthcoming. Digital Stylistics in Romance Studies and Beyond. Heidelberg: Heidelberg University Publishing.).
xNote
See, for instance, the influential studies of Jockers (2013Jockers, Matthew L. 2013. Macroanalysis. Digital Methods & Literary History. Topics in the Digital Humanities. Urbana, Chicago, and Springfield: University of Illinois Press.) and Underwood (2019Underwood, Ted. 2019. Distant Horizons: Digital Evidence and Literary Change. Chicago: The University of Chicago Press.).
One outcome of the project is the Textbox, a collection of small to medium-sized corpora of literary texts in Romance languages of different genres, which are published on GitHub and free to reuse (Schöch, Calvo Tello et al. 2018, 2019). Beyond the Textbox, the following more extensive individual corpora resulting from the CLiGS project are worth mentioning: the “Corpus of Novels of the Spanish Silver Age” (CoNSSA, Calvo Tello 2021a) and a text collection of over 800 French dramatic texts (Schöch 2017b) derived from the corpus Théâtre Classique (Fièvre 2007–2022). The latter is also available as part of the multilingual DraCor corpus, where it is called FreDraCor (Milling, Fischer, and Göbel 2021).
For general literary histories on Spanish-American literature that also cover the nineteenth-century novel and for specialized monographs, see, among others, Alegría (1959), Anderson Imbert (1954), Dill (1999), Gálvez (1990), Goić (2009), Íñigo Madrigal, Alvar, and Aínsa (1982), Lindstrom (2004), Rössner (2007), and Sánchez (1953).
Rivas (1990), for instance, establishes the concept of the anti-slavery novel based on seven different novels. Gnutzmann (1998) likewise studies the Argentine naturalistic novel based on a corpus of seven texts.
For example, Löfquist (1995) on the Chilean historical novel, Read (1939) on the Mexican historical novel, or Schlickers (2003) on the Spanish-American naturalistic novel. Another approach is to consider the novel as a whole for an individual country and for a certain period. Lichtblau (1959), for example, studies the nineteenth-century novel in Argentina, and Molina (2011) the Argentine novel between 1838 and 1872.
There are two studies based on subparts of the corpus in which the internal structure of the texts was exploited: Schöch, Henny et al. (2016) on the development of topics in different parts of the novels, depending on the subgenres, and Henny-Krahmer (2018) on the connection of sentiments and direct speech versus narrated text in different subgenres.
The web-based edition of this dissertation can be accessed at https://side17.i-d-e.de/.
In the meantime, for example, the dissertation of my fellow doctoral student José Calvo Tello from the CLiGS project has been published (Calvo Tello 2021b). Its content could not be considered here because the two dissertations were prepared at the same time. Since both theses were written within the same research project, they naturally share common foundations and references.