
Corpus-based discourse analysis: from meta-reflection to accountability

Recent years have seen an increase in data and method reflection in corpus-based discourse analysis. In this article, we first take stock of some of the issues arising from such reflection (covering concepts such as triangulation, objectivity/subjectivity, replication, transparency, reflexivity, consistency). We then introduce a new ‘accountability’ framework for use in corpus-based discourse analysis (and perhaps beyond). We conceptualise such accountability as a multi-faceted phenomenon, covering various aspects of the research process. In the second part of this article, we then link this framework to a new cross-institutional initiative – the Australian Text Analytics Platform (ATAP) – which aims to address a small part of the framework, namely the transparency of analyses through Jupyter notebooks. We introduce the Quotation Tool as an example ATAP notebook of particular relevance to corpus-based discourse analysis. We reflect on how this notebook fosters accountability in relation to transparency of analysis and illustrate key applications using a set of different corpora.

1 Introduction

When we were asked by the editors of this special issue to reflect on trends and new directions in the area where corpus linguistics meets discourse studies, one particular aspect came to mind: the issue of data and method reflection or what we might call “meta-reflection”, i.e. reflection about corpus-based discourse analysis (DA). In this respect, Marchi and Taylor (2018 : 8) speak of a “drive to increase reflexivity across the field”. From the beginning, studies bringing together corpus linguistics with discourse analysis have arguably had a general interest in this area, for example with respect to triangulation. Thus, Hardt-Mautner (1995)  – perhaps the earliest example of corpus-based Critical Discourse Analysis – notes “[e]ven when the computer has entered the fray, triangulation remains a valuable methodological principle” (24). This openness of corpus-based discourse analysis to method reflection is possibly based on “the distinct combination of qualitative and quantitative techniques that has made us more apt to think about method per se” ( Baker 2018 : 291). A recent volume ( Taylor and Marchi 2018 ) is explicitly dedicated to “dusty corners” (neglected aspects), “blind spots” (under-analysed or undetected aspects) and “pitfalls” (in research design choices), reflecting on issues such as partiality, incompleteness, and bias ( Marchi and Taylor 2018 : 9–12).

Such meta-reflection includes reflection about using corpus linguistics to enable corpus-based discourse analysts to achieve the same results with the same or different corpora. This has been discussed with respect to reproducibility or replicability : [1] Marchi and Taylor suggest that “[t]he quantitative techniques of CL give greater generalizability, allow for replicability and strengthen the overall reliability and validity of the research” ( Marchi and Taylor 2009 : 4). Baker states that “replicability is important – if the same results are achieved with the same or different data sets, then we have better evidence for ‘total’ accountability” ( Baker 2018 : 282). Marchi and Taylor (2018 : 7–8) point out the importance of creating a “reproducible” process that is systematically accounted for, so that others can check results and conclusions.

This interest within corpus-based discourse analysis dovetails with recent general interest in linguistics (e.g. Bochynska et al. 2023 ; Sönning and Werner 2021 ) and corpus linguistics (e.g. McEnery and Brezina 2022 ) in reproducibility, repeatability, or replicability. Generally, proposals or critiques concern the availability of data/software and the transparency/clarity of reporting of methods and results (e.g. Doyle 2005 : 2–4; Egbert 2023 ; Hober et al. 2023 ; McEnery and Brezina 2022 : 68, 185; McEnery and Hardie 2012 : 2; Paquot and Callies 2020 : 121; Sönning and Werner 2021 : 1182). Further discussion of replication/reproducibility is provided in Schweinberger and Haugh (in press) . In this article, we will limit our discussion of meta-reflection and related issues such as triangulation and replication specifically to corpus-based discourse analysis . This term refers here to the combination of corpus linguistics and discourse analysis, including but not limited to corpus-assisted discourse studies/CADS (see e.g. Partington 2008 ), and to the combination of corpus linguistics and Critical Discourse Analysis (see e.g. Nartey and Mwinlaaru 2019 ). [2] In Section 2 , we describe some of the relevant method reflection in this field and make our own suggestion for greater accountability, before introducing Jupyter notebooks as part of a new initiative ( Section 3 ) and discussing one relevant notebook ( Section 4 ). Throughout, our focus remains on corpus-based discourse analysis; however, given the above-mentioned general interest in related issues beyond this field, the article is likely to also have relevance for other linguists.

2 Triangulation, transparency, and accountability in corpus-based discourse analysis

As indicated above, method reflection is important for corpus-based DA and one area that has seen particular attention is that of triangulation. In corpus-based DA, triangulation has been recognized as concerning methodologies, data sources, researchers, theories ( Marchi and Taylor 2009 , drawing on Denzin 1970 ) as well as disciplines ( Ancarno 2018 ). For scope, we will not consider the latter here (combining disciplines). Note also that the publications we reference below are illustrative rather than exhaustive.

In general, triangulation is a validation technique which is sometimes favoured as a substitute for replicability (Bloor 1997: 38). Within corpus-based DA, the dominant view of triangulation is that this field of research inherently involves triangulation – namely, the triangulation of (sets of) methodologies used in corpus linguistics with (sets of) methodologies used in discourse analysis. [3] This is also how the first textbook on Using Corpora in Discourse Analysis discusses triangulation (Baker 2006: 15–17) and this is arguably the underlying assumption in Baker et al.’s (2008) “synergy” of corpus-based Critical Discourse Analysis, with a nine-stage model proposed for moving between quantitative and qualitative analysis. In a different vein, Bednarek (2009) suggests a “three-pronged approach” that brings together macro-, meso- and micro-levels of discourse analysis by combining manual analysis of individual texts with semi-automated analysis of small corpora, and with large-scale corpus analyses. This is explicitly discussed in terms of triangulation. There is of course wider interest in how to combine analyses of corpora with analysis of individual texts or other types of closer qualitative analysis (e.g. Baker 2009: 89, 2020; Baker and Levon 2015: 223). Researchers sometimes select (“downsample”) corpus texts for qualitative analysis based on different approaches (see Baker 2020: 88–89), including through the tool ProtAnt, which offers a principled means of downsampling texts, reducing researcher bias, identifying outlier texts, and enabling replication (Anthony and Baker 2015; Anthony et al. 2023). A framework for how to use data triangulation in corpus-based DA has also been proposed (Jaworska and Kinloch 2018).

This interest in triangulation and proposed models/frameworks for such an undertaking has been extended to explicit experimentation. The first such study was undertaken by Marchi and Taylor (2009) , described by the authors as a “quasi-experiment into triangulation” ( Marchi and Taylor 2009 : 1), with a focus on researcher triangulation. The question underpinning their experiment was: “Would two researchers starting with the same corpus and research question and (broadly) theoretical/methodological framework come to the same/similar conclusions?” ( Marchi and Taylor 2009 : 1). Findings are differentiated into convergence (results confirming each other), dissonance (results are incompatible) and complementarity (results illuminate different aspects). This early work inspired similar studies in recent years ( Baker 2015 ; Baker and Egbert 2016 ; Baker and Levon 2015 ). [4] In each study, the same research question is tackled by different scholars although the specifics of the set-up vary, e.g. whether the same data, corpus tool, or techniques are used. On the whole, these studies uncover a large amount of individual variation rather than a standard analysis, and even where results are similar, they may have been uncovered using different techniques. Analyses are shown to result in some converging (shared) as well as a larger number of unique (but mainly complementary) findings.

An important part of this meta-reflection and experimentation in corpus-based discourse analysis has been explicit reflection on subjectivity and objectivity, such as the assumption that corpus linguistics provides greater objectivity or reduces bias (e.g. Baker 2012; Baker and McEnery 2015; Marchi and Taylor 2009, 2018). In general, the various studies demonstrate that corpus and discourse analyses vary depending on multiple choices that can be made during the research process. These include, inter alia, datasets; corpus-based versus corpus-driven directions of research; particular techniques (e.g. keywords vs collocation); specific thresholds/cutoffs and other methodological parameters; extent and type of qualitative analysis; researcher interests; how corpus findings are interpreted, etc. Thus, there appears to be general agreement that corpus-based discourse analysis “cannot in and of itself bring greater objectivity” (Marchi and Taylor 2018: 8). Baker (2012: 255) therefore asks “for an increased commitment to researcher reflexivity” (see also Marchi and Taylor 2018). In addition, Baker and McEnery (2015: 9) suggest that researchers “aim for wider transparency about methodological decisions and a more nuanced set of stated claims about the benefits of using computational methods”. McEnery (2016: 31) recommends accurate reporting to enable replication and critical exploration (see also Egbert and Baker 2016). Marchi and Taylor (2018: 12) call for “a more general set of principles which can guide our work”, to monitor research choices and their impacts and present them in ways that facilitate replication.

To manage subjectivity in corpus-based discourse analysis, Baker (2009) proposes that analysts adopt transparency and consistency . For Baker (2009 : 83–84), transparency refers to being clear about methodological choices as well as making corpora and corpus data (frequency lists, concordances) available. Transparency also covers researchers stating or making explicit their position regarding the object of research, such as their goals, standpoints, and other relevant influences ( Baker 2009 : 87). For him, this also includes reflexivity with respect to how one’s starting position may have changed in the course of the research. Consistency in turn refers to maintaining the same methodological choices regarding corpus techniques throughout the research project. Bednarek and Caple (2017 : 23) use consistency in qualitative discourse analysis of corpus texts – in their case, the coding of news values. [5]

Corpus-based discourse researchers have attempted to increase both transparency and consistency in various ways. Some studies include a step where two or more researchers undertake the same analysis and then measure inter-rater agreement (e.g. Bednarek et al. 2022 ; Jaworska and Kinloch 2018 : 116) or where the same researcher undertakes the same analysis more than once and measures intra-rater reliability (e.g. Gray 2016 : 37). Others create a coding manual (to improve consistency) and, to increase transparency, make this openly available on a website (e.g. Bednarek and Caple 2017 ) or via the Open Science Framework/OSF (e.g. Bednarek et al. 2022 ). Where corpus-based discourse analysts have used programming rather than off-the-shelf software, some have made the code available that underpins the analysis (e.g. McGlashan 2021 ).
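To make the idea of measuring inter-rater agreement more concrete, the following minimal sketch computes Cohen's kappa for two analysts who have coded the same items. Cohen's kappa is only one common agreement statistic and is used here as an assumption; the studies cited above may rely on other measures. The category labels and codings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who coded the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    n = len(labels_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, based on each rater's marginal distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two analysts to the same eight items.
rater_1 = ["elite", "elite", "ordinary", "elite",
           "ordinary", "ordinary", "elite", "ordinary"]
rater_2 = ["elite", "ordinary", "ordinary", "elite",
           "ordinary", "elite", "elite", "ordinary"]

kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2))
```

In this invented example the raters agree on six of eight items (0.75), chance agreement is 0.5, and kappa is therefore 0.5. The same function could be applied to one researcher's first and second coding pass as a rough indication of intra-rater reliability.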

In sum, many important concepts are involved in meta-reflection in corpus-based discourse analysis: triangulation, objectivity/subjectivity, replication, transparency, reflexivity, consistency . However, one concept that is so far missing here is that of accountability . In this context, Marchi and Taylor (2018 : 12) argue for replacing the aim to achieve greater objectivity with the aim of increasing accountability (including consistency, transparency) and self-reflexivity.

Building on the discussion above, Table 1 therefore proposes what we call an “accountability framework”. Accountability is not limited here to Leech’s (1992 : 112) principle of total accountability which refers to using all relevant evidence from the corpus/data ( McEnery and Brezina 2022 ; McEnery and Hardie 2012 : 14–16). Neither is it limited to consistency and transparency. Rather, we use accountability as a broad cover term which includes being accountable for all relevant aspects of corpus-based DA, deliberately drawing on the everyday meaning of accountability . Being accountable means being transparent about research aspects, but also being able to justify them, being responsible for the decisions made or positions taken, and critically reflecting on them. As evident from Table 1 , we conceptualise this as a multi-faceted phenomenon, covering various aspects of the research process. The framework is a first step and likely incomplete; the dot points in the table signal that we hope others will extend this framework – including but not limited to making links with open science initiatives and principles.

Table 1: Accountability framework.

Corpus
Methods, techniques, analyses
Theories
Researcher(s)
Triangulation
Interested/affected parties

Given the scope of this article, we cannot elaborate on all points, but – given the discussion above – most should be relatively clear. [6] A few comments must suffice: When we talk about specifying the language of the corpus and its national context in the “corpus” row, we want to highlight the need to not assume any language or national context as unmarked or as having universal relevance. Corpus-based discourse analysis is arguably dominated by research on British English data, and it is imperative not to treat such data as the unmarked case. Similarly, when we talk about the underpinnings, applicability, and limitations of theories in the “theories” row, we want to encourage reflections regarding the origins (e.g. “Western”) and data (e.g. language variety, text types) from which those theories, concepts, or frameworks derive. This could also mean unpacking the assumptions and presuppositions that are built into such theories. Further, when we talk about supporting analysts in the “researcher(s)” row, one thing to think about is how to support those who analyse challenging/confronting data (e.g. sexual abuse, racism, misogyny, trans- or homophobia, health contexts, etc.). Finally, this row also interacts with the last row (“interested/affected parties”), as the researcher themselves might be an interested/affected party or an “out-group” researcher analysing the representation of social groups to which they do not belong (see Bray 2023 ).

The questions that we ask in Table 1 are meant to act either as a set of guiding principles or as a reflection tool. We acknowledge that not every project will be able to implement all relevant aspects given the constraints and pressures under which researchers operate and the different levels of support and capacity available. We also do not mean to imply that we have always considered all points in Table 1 or that we will always be able to do this in the future! Implementation would require considerable support from different levels. For example, triangulation alone faces significant challenges such as time, expertise and the space to write about different approaches (Egbert and Baker 2016). Some opportunities are already available, for example repositories for depositing supplementary materials (e.g. OSF, GitHub – see Schweinberger and Haugh in press), corpus linguistics journals that encourage methodological commentary (see Egbert and Baker 2016: 206), journals that allow for publication of corpus or data descriptions, and standardised notation systems (e.g. collocation parameters notation; see Brezina 2018: 275). Others still need to be implemented, such as customary procedures for reciprocal double coding (Brezina 2018: 270) or the appropriate institutional recognition of both corpora and corpus manuals as valuable research outputs. In sum, the burden of implementation should not fall on individual researchers; rather, supporting structures and cultural change are needed. In the remainder of this article, we introduce a cross-institutional initiative – the Australian Text Analytics Platform (ATAP) – which aims to address a small part of the framework in Table 1, namely the transparency of analyses through making code available via Jupyter notebooks (explained in 3.1).

3 The Australian Text Analytics Platform

3.1 The initiative

Making code available arguably constitutes a relatively recent trend in corpus-based DA and is hampered by a computational skills shortage. Even where the code is made available, the question is how many corpus-based discourse analysts have the skill to read, evaluate, and/or implement this code? To address this skills shortage and to increase transparency, researchers in Australia are collaborating on the Australian Text Analytics Platform (ATAP). While ATAP does not exclusively focus on corpus-based discourse analysis, we want to take this opportunity to introduce this initiative as potentially helpful to this field.

In brief, ATAP is an initiative which fosters cooperation among data and text analytics users and providers, and supports researchers from diverse academic backgrounds, including beginners, in adopting user-friendly code-based text analysis. It involves collaboration among the Universities of Queensland (including the Language Technology and Data Analytics Laboratory/LADAL), Sydney (Sydney Corpus Lab, Sydney Informatics Hub), and Australia’s Academic and Research Network/AARNet. It is also connected to the Language Data Commons of Australia HASS RDC through which it collaborates with additional institutions/organizations and supports language-related technologies, data collection infrastructure, and Indigenous capability programs. ATAP’s primary goal is to enhance research flexibility, offer valuable upskilling resources, and improve research workflow transparency and reproducibility. It thus offers a range of resources for text data collection, processing, visualization, and analysis (e.g. training, office hours, collaborations). Code-based resources include the development of user-friendly, shareable, transparent, and interactive Jupyter notebooks. Before we introduce an ATAP notebook that may be of particular interest to corpus-based discourse analysts, we provide an explanation of such notebooks given that they may be unfamiliar to many researchers in this field.

3.2 Jupyter notebooks

Jupyter notebooks are free, open-source web applications designed to create and share documents which combine text with executable code and the output of the code. These notebooks support a variety of programming languages including Python, Julia, and R. They are often stored on GitHub and other online repositories. In September 2018, there were more than 2.5 million public Jupyter notebooks hosted on GitHub alone (Perkel 2018: 145). Jupyter notebooks can be uploaded to online platforms which allow the code to be run independently of personal computers, or they can be run locally (i.e. without connecting to the Internet), although extra setup is required for this. It is also possible to convert Jupyter notebooks to other file formats (e.g. html, LaTeX, and pdf), which makes them shareable outside of wherever they are stored (Beg et al. 2021: 38).

In general, a notebook is composed of cells (i.e. areas of the notebook), of which there are three types: code cells, text cells, and raw cells. Code cells contain code which can be executed to run analyses, output results from analysis, produce visualisations, etc. Text cells make use of markdown, a method for formatting text. Raw cells are mainly used for configuration, e.g. for specifying if a notebook should be converted into a pdf or html document ( Pimentel et al. 2021 : 4). Thus, Jupyter notebooks typically contain both computer code (in the relevant programming language) and text elements (paragraphs of text, links, equations etc.). Code cells are (usually) preceded by explanatory text (e.g. description of the code). What the code produces (the output) is presented just below the code cells. This means that notebooks are simultaneously human-readable and executable documents. Thus, a notebook can be summarised as “an interactive literate programming document and an application that executes the document” ( Pimentel et al. 2021 : 3).
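The cell structure described above can be illustrated with a minimal sketch that assembles a notebook document as plain JSON, using only Python's standard library. The field names follow the publicly documented nbformat version 4 layout as we understand it; the helper `make_cell` and the cell contents are hypothetical and serve illustration only.

```python
import json


def make_cell(cell_type, source):
    """Build one notebook cell as a plain dictionary (nbformat 4 layout)."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally record an execution counter and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        # Text cell: markdown explaining what the next code cell does.
        make_cell("markdown", "## Step 1\nCount the tokens in a sentence."),
        # Code cell: executable code; its output would appear directly below it.
        make_cell("code", "tokens = 'the quick brown fox'.split()\nlen(tokens)"),
        # Raw cell: content that is passed through unchanged on conversion.
        make_cell("raw", "Unformatted content for conversion."),
    ],
}

# Serialise to JSON; writing this text to a file ending in .ipynb should
# give a document that notebook software can open (not verified here).
notebook_json = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(notebook_json)["cells"]]
print(cell_types)
```

Seen this way, a notebook is an ordered list of typed cells, which is what allows the explanatory text, the code, and the output to travel together in a single shareable file.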

Jupyter notebooks are used in ATAP because they have several advantages: they make it easy to document, share, and reproduce the code used for analyses (Pimentel et al. 2021; Shen 2014). With notebooks, it is possible to carry out an entire study within a single document while maintaining a complete and executable record of the processes involved (Beg et al. 2021: 38, 40). Because they can contain text and tables, notebooks can also provide information about annotation schemas, provide or discuss examples, and automatically store and document technical information (Beg et al. 2021: 40).

The combination of explanation and code allows analysts to describe the intended process and outcomes of the code which, in turn, allows others to better understand how it works. This is helpful for troubleshooting if any errors are encountered when the code is adapted. Finally, it is possible to modify and then immediately execute/run the code. Users can thus see the results of their modifications straight away, which leads to a better understanding of the code. Because of these features, Schweinberger and Haugh (in press) argue that the use of notebooks in linguistics can assist in making corpus analyses and the workflows they are based on more transparent, reproducible, and efficient.

ATAP notebooks include the following:

Document Similarity Tool – enables the identification of duplicated content among the texts in a corpus and permits the exclusion of duplicates after manual review.

Semantic Tagger – can be used to add semantic tags to the texts in a corpus, undertake analysis/visualisation and export tagged texts.

Keywords Analysis – keywords analysis on two or multiple corpora, as well as statistical tests to investigate word use across corpora.

Quotation Tool – identifies who is quoted as well as quoted content in newspaper corpora, integrating named entity recognition.

While most of the notebooks are relevant to different text types, the Quotation Tool ( Jufri and Sun 2022 ) targets newspaper texts – a very common data source in corpus-based DA. Given that quotation extraction is a type of analysis not normally enabled in “classic” off-the-shelf corpus software but is of clear relevance to corpus-based DA, we will introduce this tool in more detail below.

4 The Quotation Tool

4.1 Introducing the Quotation Tool

We have chosen the Quotation Tool as a sample notebook because of the relevance of quoted speech to corpus-based DA. In Baker et al.’s (2008 : 282) list of strategies of representation (adapted from Wodak 2001 ), quotation falls under the strategies identified as “Perspectivation, framing or discourse representation”. As they note, “Quotation patterns […] play an important role in implementing particular perspectives, and hence, ideologies” ( Baker et al. 2008 : 295). Therefore, it is important to analyse “who is written about, how much space they are given and whether they are directly or indirectly quoted” ( Baker et al. 2008 : 294), and whether sources are represented negatively, for instance as “inarticulate, extremist, illogical, aggressive or threatening” ( Baker et al. 2008 : 295). In other approaches to discourse analysis, reported speech and thought is also regarded as important, in particular in newspaper discourse (see Bednarek 2016 : 31–32; White 2012 ). Fairclough (1988) is a classic CDA paper that introduces some of the relevant issues. Bednarek (2016) suggests that there is general agreement that the following questions can be differentiated in relation to the integration of voices (sources) in news discourse:

Whose voice is it (who is the source)? How are they identified? Where are they located and how do we access them? How are these voices integrated? What reporting expression, if any, is used to introduce the content?

The Quotation Tool is of clear relevance to these questions, because it allows users to upload a news corpus and to automatically identify who is cited (the source), to classify this source (as entity type), to identify which reporting expression is used to cite them (e.g. according to, said, claimed), the type of quotation used to integrate the reported content (e.g. direct, indirect), and to extract the quoted content itself (for additional analyses, e.g. manually or as corpus in another tool). It then becomes possible for discourse analysts who examine larger datasets (which would not be amenable to manual analysis) to identify what sources are cited, how these sources are identified, whether readers hear/experience what sources said or a paraphrase by the journalist, and which sources and which content are associated with neutral versus other reporting expressions (e.g. those expressing attitude like admit, warn or those expressing potential doubt like claim). This enables analysis of whose perspective is included and whether there is any bias in any quotation patterns. For discursive news values analysis (Bednarek and Caple 2014, 2017), the Quotation Tool can inform analysts of whether the cited sources are “ordinary people” (news value of personalisation) or “elites” (news value of eliteness).
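To give a rough sense of what extracting a source, a reporting expression, and quoted content involves, the following sketch uses a single regular expression to find direct quotes that are followed by a capitalised name and a reporting verb. This is a deliberately simplified stand-in and not the Quotation Tool's own method; the verb list, the pattern, and the sample sentences are assumptions made for illustration. It handles only straight double quotation marks and one word order.

```python
import re

# Illustrative list only; real reporting expressions are far more varied.
REPORTING_VERBS = ("said", "claimed", "admitted", "warned")

# Pattern: a double-quoted span, then a capitalised name, then a reporting verb,
# e.g. "We need more funding," Jane Smith said.
QUOTE_THEN_SPEAKER = re.compile(
    r'"(?P<content>[^"]+)"\s+'
    r'(?P<speaker>[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\s+'
    r'(?P<verb>' + "|".join(REPORTING_VERBS) + r')\b'
)


def extract_direct_quotes(text):
    """Return one dictionary per matched direct quote."""
    results = []
    for match in QUOTE_THEN_SPEAKER.finditer(text):
        results.append({
            "speaker": match.group("speaker"),
            "verb": match.group("verb"),
            # Drop the comma that conventionally closes the quoted clause.
            "content": match.group("content").rstrip(","),
            "type": "direct",
        })
    return results


# Invented sample text with hypothetical speakers.
sample = (
    '"We need more funding," Jane Smith said. '
    'The university disagreed. "The budget is final," Alex Lee claimed.'
)
quotes = extract_direct_quotes(sample)
for quote in quotes:
    print(quote["speaker"], "|", quote["verb"], "|", quote["content"])
```

A pattern of this kind misses indirect quotation, speakers placed before the quote, and sources that are not capitalised names, which is part of the reason why dedicated tools combine linguistic analysis with named entity recognition.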

Before we illustrate some of the results from using the Quotation Tool, we first briefly explain its basic features, using a small training dataset/corpus comprising 100 articles (utf-8 encoded text files) from the University of Sydney’s student newspaper Honi Soit . We collected one article per week from the newspaper’s online News section ( http://honisoit.com/category/news/ ) over a period of 2 years (2021–2022), as explained in Lee (2024) . The Honi Soit editors gave us permission to use this corpus for training purposes.

Step 1 – Setup : imports and initiates the Quotation Tool and the necessary libraries;

Step 2 – Load the data : uploads the corpus file, including previewing its contents (cf.  Figure 1 );

Step 3 – Extract the quotes : extracts the quotes and applies named entity recognition to speakers and quoted content (but not other elements); includes preview of the extracted data (not shown here);

Step 4 – Display the quotes : shows the results of the analysis in one text ( Figure 4 ); allows analysis of whole corpus for the most frequent (top) entity names (e.g. John Doe ) and entity types (e.g. PERSON) in speakers and quoted content; visualisations of the latter can be exported ( Figures 2 and 3 );

Step 5 – Save the quotes : saves the results into a spreadsheet for downloading to users’ computers and additional analysis outside the Quotation Tool notebook environment.
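The logic behind Steps 4 and 5 can be sketched in a few lines: once quotes have been extracted, the most frequent speaker names and entity types are counted, and the records are saved in a spreadsheet-readable format. The sketch below uses only Python's standard library rather than the notebook's actual code, and the records and field names (`speaker`, `speaker_type`, `quote`) are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical extraction output: one record per quote.
quotes = [
    {"speaker": "Jane Smith", "speaker_type": "PERSON", "quote": "We need more funding"},
    {"speaker": "Student Council", "speaker_type": "ORG", "quote": "Fees are too high"},
    {"speaker": "Jane Smith", "speaker_type": "PERSON", "quote": "The plan is unclear"},
]


def top_values(records, field, top_n=5):
    """Return the top_n most frequent values of a field as (value, count) pairs."""
    return Counter(record[field] for record in records).most_common(top_n)


def to_csv(records):
    """Serialise the records as CSV text that spreadsheet software can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


top_speakers = top_values(quotes, "speaker")    # most frequent entity names
top_types = top_values(quotes, "speaker_type")  # most frequent entity types
csv_text = to_csv(quotes)                       # could be written to a .csv file

print(top_speakers)
print(top_types)
```

Changing `top_n` changes how many entities are reported, which is the kind of adjustable parameter that a notebook can expose and explain to its users.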

Figure 1: Previewing corpus contents in the notebook.

Figure 2: The five most frequent speaker entity names.

Figure 3: The five most frequent speaker entity types.

The quotation extraction has been adapted (with permission) from code developed by Maite Taboada’s Gender Gap Tracker team (e.g. Asr et al. 2021 ), while other tools used in the notebook include spaCy ( https://spacy.io/ ), Natural Language Toolkit ( https://www.nltk.org/index.html ), pandas ( https://pandas.pydata.org/ ) and Jupyter Widgets ( https://ipywidgets.readthedocs.io/en/stable/ ). Before we demonstrate uses of the Quotation Tool further (in Section 4.3 ), we provide a brief discussion of some of the notebook features with respect to transparency of analysis (falling within the second row “Methods, techniques, analysis” in Table 1 ).

4.2 Reflection on accountability and transparency

How does this notebook foster accountability in relation to transparency of analysis? On a general level, such analytical transparency is facilitated by the fact that this notebook contains both explanations and code. This allows users to understand what action the relevant code performs and would be particularly important for users who do not read code. The explanations also include information on how the code can be changed by users, for example changing the number of rows that are previewed by changing the “n” variable. This makes the code more transparent, as the “n” variable is both explained and made adaptable. However, it must be acknowledged that some of the code will still be non-transparent to those who cannot read it. They can understand what the code achieves, but cannot evaluate the code itself for how it achieves the outcome. A final general feature is the use of previews, which allow users to see the extracted data and identify any potential issues with it. Such previews can also make the code more transparent, as results are displayed and can be inspected (cf. Figure 4).
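The role of an adjustable preview parameter can be illustrated with a small sketch. It is not the notebook's own code: the function `preview`, the field names, and the placeholder corpus are assumptions; only the idea of a user-changeable “n” comes from the description above.

```python
def preview(rows, n=5):
    """Return the first n rows so they can be inspected before further analysis."""
    if n < 0:
        raise ValueError("n must be zero or a positive integer.")
    return rows[:n]


# Placeholder corpus of 100 texts standing in for uploaded files.
corpus = [
    {"text_name": f"text_{i:03d}", "text": f"Placeholder text of article {i}"}
    for i in range(1, 101)
]

n = 3  # users can change this value to preview more or fewer rows
first_rows = preview(corpus, n)
for row in first_rows:
    print(row["text_name"], "|", row["text"])
```

Because the parameter is named, explained, and set in one visible place, a user who does not read code fluently can still change it and immediately see the effect.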

Figure 4: Extract from text preview showing results of quote extraction and named entity recognition.

In addition to these general features, this notebook includes several links to resources where users can read about (aspects of) the underlying code used to extract the quotes. Figure 5 shows the link to the relevant team’s GitHub page as well as to an article that evaluates the Quotation Tool’s accuracy. Elsewhere in the notebook, users are able to directly access an appendix (from the Gender Gap team), which contains technical information about the quotation extraction pipeline.

Figure 5: Introduction to the Quotation Tool notebook.

The notebook also includes various coloured boxes which provide information about the libraries that are installed ( Figure 6 ) and the tools that are used ( Figure 7 ). In some cases, additional information about the relevant tool is also provided, for instance with respect to how texts are split into tokens ( Figure 7 ) or with respect to capitalization ( Figure 8 ). This is important so that users can understand the choices that are integrated within the tools, to be able to interpret the output, and to identify potential limitations.

Figure 6: Information about installed libraries.

Figure 7: Information about tools and tokenization.

Figure 8: Information about capitalization.

Other coloured boxes contain important information for users, such as providing a definition of each column header in a table or information about how to deal with particular issues from a technological perspective (e.g. large file upload).

In sum, transparency is addressed through several different measures, either directly within the notebook or by providing users with links to relevant documentation outside the notebook. When users do not change the code but use it with all default values, this analysis should be reproducible (with the same dataset) and replicable (with a different dataset). Where users change the code – as they are able to – they would need to take responsibility for documenting the changes that have been made in a transparent way to ensure reproducibility and replicability. One remaining disadvantage is that some notebook users will not be able to read and evaluate the code itself, but will have to rely on the accompanying material and its accessibility to them. In addition, there is a trade-off between including too much and too little information in the notebook itself. Too much explicatory text within the notebook may overwhelm the notebook user, while links to documents outside the notebook require additional effort by the user and may deter them from engaging with the relevant information. In general, we have opted for a mix of both.

4.3 Potential uses in corpus-based DA

Having discussed the notebook’s general features, we now illustrate some potential uses of the Quotation Tool in corpus-based DA. One obvious use is to analyse a specific corpus for its quotation patterns and to interpret these patterns. This is a relatively straightforward use of the tool, which we discuss only briefly here. For example, the downloaded spreadsheet (produced in step 5) could be filtered by reporting expression to retrieve all sources and propositions that are cited using negative attitudinal reporting verbs such as WARN or ADMIT. These could then be compared to those cited with neutral reporting expressions. Classification systems that can serve as a starting point for identifying reporting expressions can be found in Caldas-Coulthard (1994) or Bednarek (2006: 57–58). Similarly, one could compare sources cited directly with those cited indirectly, or one could identify and classify all the sources and types of entities that are cited. In this way, relevant quotation patterns could be retrieved in a corpus of news about a particular topic and analysed with respect to bias and ideology.
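
Filtering the spreadsheet by reporting expression might look roughly like the following sketch. It relies only on the Python standard library; the column names (`speaker`, `verb`, `quote`), the sample rows and the hand-listed verb forms are assumptions made for illustration and are not taken from the notebook’s actual output. Treating SAY as a neutral reporting expression is likewise an assumption.

```python
import csv
import io

# Hypothetical excerpt of a downloaded spreadsheet (column names are assumed).
SPREADSHEET = """speaker,verb,quote
Health experts,warned,obesity rates will keep rising
The minister,said,funding will continue
The company,admitted,the label was misleading
Doctors,warn,many cases go undiagnosed
"""

# Inflected forms per lemma, listed by hand for this illustration.
NEGATIVE_FORMS = {
    "WARN": {"warn", "warns", "warned", "warning"},
    "ADMIT": {"admit", "admits", "admitted", "admitting"},
}
NEUTRAL_FORMS = {
    "SAY": {"say", "says", "said", "saying"},
}


def select_rows(rows, forms_by_lemma):
    """Keep the rows whose reporting verb is one of the listed forms."""
    wanted = set().union(*forms_by_lemma.values())
    return [row for row in rows if row["verb"].strip().lower() in wanted]


rows = list(csv.DictReader(io.StringIO(SPREADSHEET)))
negative = select_rows(rows, NEGATIVE_FORMS)
neutral = select_rows(rows, NEUTRAL_FORMS)

for row in negative:
    print(row["speaker"], "->", row["quote"])
```

The two resulting subsets can then be compared, for instance with respect to the types of sources that occur with each group of reporting expressions.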

To illustrate a further use, the comparison of quotation patterns across corpora, we draw on the following five datasets:

The Honi Soit training dataset introduced above (100 articles from an Australian student newspaper);

A subset of the AusBrown corpus (Collins and Yao 2019), namely 169 articles from the Press section (139 articles from 1990; 30 articles from 2006);

The Cycling corpus described in Bednarek and Caple (2017: 138–143) containing 1,687 news items about cycling from 12 Australian, US, and UK newspapers in the years 2004, 2005, 2009, 2013, 2014;

The Diabetes News Corpus described in Bednarek and Carr (2019), consisting of 694 newspaper articles about diabetes (577 news items, 117 other items) from 12 Australian newspapers (2013–2017);

The Australian Obesity Corpus, including approximately 26,000 articles from 12 Australian newspapers that mention the words obese or obesity (both news and other items; see Vanichkina and Bednarek 2022).

These datasets all contain Australian newspaper items, although some include non-news genres (e.g. opinion) and some also include UK and US items. We used the Quotation Tool to compare a range of aspects across these corpora, with the aim of identifying the frequency of entities (as sources and in quoted content), of quotation types, and of reporting expressions. All results in Tables 2–5 were retrieved by working with the downloaded spreadsheet, but can now be easily retrieved in summative form (see Australian Text Analytics Platform 2024).

Table 2: Speaker entities.

| Entity type | AusBrown subset | Cycling | Diabetes | Honi Soit | Obesity |
| --- | --- | --- | --- | --- | --- |
| PERSON | 1,273 (#1) | 3,175 (#1) | 1,707 (#1) | 290 (#1) | 61,481 (#1) |
| ORG | 511 (#2) | 1,384 (#2) | 557 (#2) | 236 (#2) | 30,551 (#2) |
| GPE | 181 (#3) | 639 (#3) | 253 (#3) | 33 (#3) | 10,158 (#3) |
| NORP | 81 (#4) | 70 (#5) | 55 (#4) | 10 (#4) | 3,302 (#4) |
| LOC | 22 (#5) | 70 (#5) | 8 (#5) | 2 (#5) | 375 (#5) |
| FAC | 19 (#6) | 102 (#4) | 4 (#6) | 0 (#6) | 294 (#6) |

Table 3: Quote entities.

| Entity type | AusBrown subset | Cycling | Diabetes | Honi Soit | Obesity |
| --- | --- | --- | --- | --- | --- |
| ORG | 700 (#1) | 999 (#2) | 343 (#2) | 236 (#1) | 17,539 (#1) |
| GPE | 521 (#2) | 1,040 (#1) | 350 (#1) | 84 (#2) | 15,983 (#2) |
| PERSON | 492 (#3) | 780 (#3) | 329 (#3) | 70 (#3) | 12,378 (#3) |
| NORP | 202 (#4) | 147 (#6) | 186 (#4) | 53 (#4) | 7,850 (#4) |
| LOC | 97 (#5) | 228 (#5) | 28 (#5) | 7 (#5) | 1,540 (#5) |
| FAC | 49 (#6) | 353 (#4) | 6 (#6) | 5 (#6) | 657 (#6) |

Table 4: Quotation types.

| Quotation type | AusBrown subset | Cycling | Diabetes | Honi Soit | Obesity |
| --- | --- | --- | --- | --- | --- |
| SVC | 1,591 (#1) | 3,241 (#1) | 1,428 (#1) | 314 (#1) | 63,974 (#1) |
| CSV | 439 (#2) | 742 (#4) | 308 (#5) | 66 (#4) | 15,856 (#5) |
| QCQSV | 407 (#3) | 998 (#3) | 912 (#2) | 81 (#3) | 29,449 (#3) |
| Heuristic | 370 (#4) | 1,038 (#2) | 523 (#3) | 269 (#2) | 34,331 (#2) |
| QCQ | 174 (#5) | 474 (#6) | 490 (#4) | 46 (#5) | 17,192 (#4) |
| Floating quote | 174 (#5) | 474 (#6) | 490 (#4) | 46 (#5) | 17,192 (#4) |
| SVQCQ | 52 (#6) | 664 (#5) | 33 (#8) | 16 (#9) | 2,539 (#9) |
| AccordingTo | 50 (#7) | 149 (#9) | 107 (#6) | 30 (#6) | 5,636 (#6) |
| QCQVS | 38 (#8) | 357 (#7) | 72 (#7) | 29 (#7) | 4,278 (#7) |
| CVS | 30 (#9) | 171 (#8) | 44 (#9) | 23 (#8) | 3,780 (#8) |
| VCS | 1 (#10) | 5 (#11) | 1 (#11) | 2 (#10) | 230 (#10) |
| VSC | 1 (#10) | 6 (#10) | | | 177 (#11) |
| QSCQV | | | | | 2 (#16) |
| SCV | | 1 (#13) | | | 43 (#12) |
| SQCQV | | | 4 (#10) | | 19 (#14) |
| VQCQS | | | | | 3 (#15) |
| VQSCQ | | | | | 2 (#16) |
| VSQCQ | | 2 (#12) | | | 37 (#13) |

Table 5: Ten most frequent reporting expressions.

| Rank | AusBrown | Cycling | Diabetes | Honi Soit | Obesity |
| --- | --- | --- | --- | --- | --- |
| 1 | said (1,943) | said (4,691) | said (1,924) | said (233) | said (54,136) |
| 2 | told (123) | says (365) | says (384) | told (50) | says (31,668) |
| 3 | says (87) | told (237) | according [to] (108) | says (37) | according [to] (5,675) |
| 4 | say (72) | say (164) | say (71) | according [to] (31) | say (4,967) |
| 5 | according [to] (50) | added (160) | told (35) | stated (16) | told (4,618) |
| 6 | saying (28) | according [to] (149) | think (33) | noted (15) | think (2,243) |
| 7 | warned (24) | saying (60) | suggests (24) | argued (14) | suggests (1,588) |
| 8 | thought (18) | wrote (49) | warn (18) | explained (14) | saying (1,425) |
| 9 | confirmed (17) | adding (33) | saying (17); thought (17) | added (12); say (12); saying (12) | writes (1,399) |
| 10 | announced (15); claimed (15) | claimed (30) | warned (16) | announced (11) | wrote (825) |

We start by examining trends in entities (raw frequencies and ranks): Tables 2 and 3 show the results for speaker entities (sources) and quote entities (entities mentioned within quoted content) in the five corpora (excluding blank cells). These results are based on the six spaCy entity types included by default in the notebook (PERSON = people; ORG = companies, agencies, institutions, etc.; GPE = countries, cities, states; NORP = nationalities, religious or political groups; FAC = buildings, airports, highways, etc.; LOC = non-GPE locations, mountain ranges, bodies of water). The numbers derive from the named entity recognition and were retrieved using a text filter. The same source is sometimes classified as more than one entity, for instance in the case of role labels (e.g. NSW Police Minister David Elliott: David Elliott = PERSON; Police = ORG; NSW = GPE). It is evident that the three most highly ranked speaker entities are the same across all corpora, with only small differences among the remaining entities. No doubt there are errors in the entity recognition; however, the ranking conforms to expectations, given that we would expect people to be sources more often than other entity types and given that ORG and GPE/NORP are likely associated with the construction of the news values of eliteness and/or proximity (Bednarek and Caple 2017: 82–83; 91–93). For quote entities, there is similar overlap in the rankings, with three newspaper corpora having identical rankings for all entity types and two corpora (Cycling; Diabetes) having slight variations. Thus, there are journalistic norms concerning the kinds of entities that are quoted as sources as well as those that are mentioned in quoted content.
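
The point that one source can contribute to several entity counts can be made concrete with a small sketch. The data structure below is hypothetical (the notebook’s actual output format is not reproduced here) and the label lists are typed in by hand rather than produced by spaCy; only the role-label example is taken from the discussion above.

```python
from collections import Counter

# Hypothetical speaker annotations: each source is paired with the entity
# labels assigned to it. A role label can receive several labels at once.
speaker_entities = [
    ("NSW Police Minister David Elliott", ["PERSON", "ORG", "GPE"]),
    ("Diabetes Australia", ["ORG"]),
    ("Dr Smith", ["PERSON"]),
    ("researchers", []),  # no named entity recognised
]

# Count every label occurrence, so one source may add to several totals.
label_counts = Counter(
    label for _, labels in speaker_entities for label in labels
)

for label, count in label_counts.most_common():
    print(label, count)
```

Under this way of counting, the sum of all label frequencies can exceed the number of sources, which should be kept in mind when reading entity frequencies.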

We now consider the quotation structure, with Table 4 listing all quotation types and their (raw) frequency and rank in each corpus. The categories come from the quotation extraction developed by the Gender Gap team and are explained in the supporting information of Asr et al. (2021). Briefly, AccordingTo = use of according to, S = speaker, V = verb, C = content of quote, Q = quotation mark; floating quote = quotation by same speaker without new reporting expression. As can be seen, there are also definite norms at play concerning quotation types: SVC is ranked first across all corpora, while most other quotation types also show rankings that are either the same or similar (within one to two ranks) across corpora. Only CSV (ranks 2, 4, 5) and SVQCQ (ranks 5, 6, 8, 9) show slightly more variation, some of which may derive from the diversity of news genres/topics in each corpus.

It is also interesting to identify frequency patterns in the reporting expressions that are used, because such expressions can be tied to evaluation/attitude (e.g. Bednarek 2006 ; White 2012 ). Again, there is much overlap concerning the ranking of the ten most frequent reporting expressions in each corpus (cf. Table 5 , excluding blank cells). For instance, said is ranked first in all corpora and is thus the most commonly used reporting expression, while the present tense form says is ranked second or third across all. Told (ranks 2, 3, 5) and according [to] (ranks 3–6) are also important in all datasets, while say has the same rank (#4) in all but the student newspaper ( Honi Soit ), which may not consistently follow established journalistic norms. Other differences may arise from genre or topic variation. In any case, there are clear usage norms that can be identified with the Quotation Tool, allowing us to identify common and uncommon quotation patterns in news against which a particular news corpus could then be compared.
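
A ranking of reporting expressions with shared ranks for ties could be derived along the following lines. This is a sketch under stated assumptions: the verb list is invented, and the tie rule (tied items share a rank and the next distinct frequency takes the following rank) is one possible reading of how ties are presented, not a documented feature of the Quotation Tool.

```python
from collections import Counter


def dense_ranks(counts):
    """Map each item to a rank; tied frequencies share a rank and the next
    distinct frequency receives the immediately following rank."""
    ranks = {}
    rank = 0
    previous = None
    # Sort by descending frequency, then alphabetically for a stable order.
    for item, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq != previous:
            rank += 1
            previous = freq
        ranks[item] = rank
    return ranks


# Invented values standing in for a column of reporting verbs.
verbs = ["said"] * 5 + ["told"] * 2 + ["says"] * 2 + ["added"]
counts = Counter(verbs)
ranks = dense_ranks(counts)

for verb in sorted(ranks, key=lambda v: (ranks[v], v)):
    print(ranks[verb], verb, counts[verb])
```

Whichever tie rule is chosen, it should be documented, since rank comparisons across corpora depend on it.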

In addition, we can use the notebook for the purpose of triangulation (of methods). We will illustrate this briefly with the Diabetes News Corpus. In a previous study ( Bednarek and Carr 2021 ), we used WordSmith ( Scott 2020 ) to identify the four most frequent reporting expressions in this corpus ( said , says , according to and say ; cf. also Table 5 ), and to classify each source (excluding pronouns) that occurs with these four expressions using a coding manual ( https://osf.io/jrhx2/ ). The results of this original classification are shown in Table A.1 in the appendix. We can now use the output from the Quotation Tool to triangulate this analysis. To do so, one of the authors (Kelvin Lee) used the same coding manual to classify all speakers occurring in the “speaker” column in the downloaded spreadsheet. We filtered out pronouns ( he, I, it, she, some, someone, that, them, they, this, we, which, who, you ), blank cells, and instances associated with said , says , according to and say  – leaving us with 658 instances to analyse (resulting in 659 codings, with one instance double-coded). The purpose of the triangulation was to see if the patterns would be similar with respect to additional reporting expressions; hence the exclusion of the expressions that were previously analysed.
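
The filtering step described above can be sketched as follows. The pronoun list and the four excluded expressions are taken from the text; the column names, the sample rows and the way “according to” is stored in the verb column are assumptions for illustration only.

```python
# Pronouns excluded from coding (list as given in the text).
PRONOUNS = {
    "he", "i", "it", "she", "some", "someone", "that",
    "them", "they", "this", "we", "which", "who", "you",
}
# Reporting expressions already analysed in the earlier study.
ANALYSED_EXPRESSIONS = {"said", "says", "according to", "say"}


def keep_for_coding(row):
    """Return True if the row should be passed on to manual source coding."""
    speaker = row["speaker"].strip()
    verb = row["verb"].strip().lower()
    if not speaker:  # blank cell
        return False
    if speaker.lower() in PRONOUNS:
        return False
    if verb in ANALYSED_EXPRESSIONS:
        return False
    return True


# Hypothetical rows standing in for the downloaded spreadsheet.
rows = [
    {"speaker": "Professor Jones", "verb": "warned"},
    {"speaker": "she", "verb": "added"},
    {"speaker": "", "verb": "explained"},
    {"speaker": "Diabetes Australia", "verb": "said"},
    {"speaker": "The study", "verb": "suggests"},
]

to_code = [row for row in rows if keep_for_coding(row)]
print(len(to_code), "instances left for coding")
```

Keeping the exclusion lists in named constants makes the filtering decisions visible and easy to report alongside the results.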

Since we are reporting these results here purely for the purposes of illustration, inter- or intra-rater reliability was not measured. In addition to the categories from the coding manual we added two additional categories: “error” and “unclear”. The “error” category refers to cases where speakers were incorrectly identified by the Quotation Tool (e.g. mice ), while the “unclear” category refers to cases where the spreadsheet output did not suffice to enable speaker classification. In such cases one would have to use an additional tool for analysis (e.g. a concordancer) or check the relevant text in the corpus. The results of this qualitative analysis are presented in Table A.2 in the appendix and show that frequency patterns of source categories differ somewhat, indicating that particular reporting expressions may be associated with particular source categories. When the results of both codings are combined ( Table 6 ), however, the ranking of the source categories is virtually identical to that based on the most frequent expressions, with only health advocacy groups and lay people changing places (but very close to each other). Triangulation with the Quotation Tool thus gives us additional confidence in the original findings. All three tables are included in the appendix for further consultation.

Table 6: Frequency and percentage for each source category (combined).

| Source category | Total frequency of codings |
| --- | --- |
| Research findings and announcements | 597 (23.26 %) |
| Medical and health experts | 475 (18.50 %) |
| Health advocacy groups | 350 (13.63 %) |
| Lay people | 398 (15.50 %) |
| Politicians, government officials and government initiatives | 205 (7.99 %) |
| Businesses/companies | 97 (3.78 %) |
| Research organisations | 79 (3.08 %) |
| Professional experts | 157 (6.12 %) |
| Celebrities | 19 (0.74 %) |
| Media outlet or story | 14 (0.55 %) |
| Guidelines and information sheets | 8 (0.31 %) |
| Unclear | 107 (4.17 %) |
| Other | 25 (0.97 %) |
| Error | 36 (1.40 %) |
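
The percentages in the combined table follow directly from the raw frequencies, as the short calculation below shows. The counts are copied from the table; the variable names are illustrative.

```python
# Combined codings per source category (frequencies as reported in the table).
codings = {
    "Research findings and announcements": 597,
    "Medical and health experts": 475,
    "Health advocacy groups": 350,
    "Lay people": 398,
    "Politicians, government officials and government initiatives": 205,
    "Businesses/companies": 97,
    "Research organisations": 79,
    "Professional experts": 157,
    "Celebrities": 19,
    "Media outlet or story": 14,
    "Guidelines and information sheets": 8,
    "Unclear": 107,
    "Other": 25,
    "Error": 36,
}

total = sum(codings.values())  # all codings across both analyses

# Share of each category in per cent, rounded to two decimal places.
percentages = {
    category: round(100 * freq / total, 2) for category, freq in codings.items()
}

for category, freq in codings.items():
    print(f"{category}: {freq} ({percentages[category]:.2f} %)")
```

The frequencies sum to 2,567 codings, and each reported percentage is the category frequency divided by this total.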

A final use of the notebook for corpus-based DA is that we can use the quoted content itself as a corpus to identify linguistic characteristics of such content. To do so, we take the column that contains the quoted content, extract this content, and turn it into a corpus. We can then analyse the five quoted content corpora for frequent features, for example part-of-speech tags, semantic tags, grammatical words, etc. (cf. Tables A.4–A.6 in the appendix). There is no room here to discuss these results adequately and further analyses would be necessary, but these tables indicate some similar trends regardless of the topic of the corpus. For instance, common POS tags (Table A.4) across all or most corpora include singular/mass nouns, prepositions/subordinating conjunctions, determiners, adjectives, personal pronouns, singular proper nouns, plural nouns, adverbs, verbs (in base form; past participle; singular present tense; past tense; gerund/present participle), and coordinating conjunctions. Excluding the grammatical bin and unmatched cases, the top 15 semantic tag lists (Table A.5) show a preference for Pronouns, Existing, Time: Period, Getting and possession, Personal names, Evaluation: Authentic (all corpora) as well as People, Likely, and Numbers (four corpora). The overlapping grammatical words (Table A.6) likely reflect general frequency trends in English, but do inform us about the importance of first person plural pronouns (we) in quoted content, as they occur among the top 15 most frequent grammatical words in all but one corpus. That these tendencies occur regardless of the topic informs us about general language trends in newspaper quotes on the whole.
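
Turning the quoted content into a small corpus and counting grammatical words can be sketched as follows. The example quotes, the short word list and the regular-expression tokenizer are all assumptions for illustration; the analyses reported in the appendix were not produced with this code, and a dedicated corpus tool would use a fuller word list and a proper tokenizer.

```python
import re
from collections import Counter

# Hypothetical values from the column that holds the quoted content.
quoted_content = [
    "We need to act now, and we need support.",
    "It is a serious problem for our community.",
]

# Small illustrative list of grammatical words (not exhaustive).
GRAMMATICAL_WORDS = {
    "we", "it", "a", "the", "to", "and", "for", "our", "is", "of", "in",
}


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


# Pool all quotes into one token list, i.e. a small corpus of quoted content.
tokens = [token for quote in quoted_content for token in tokenize(quote)]
grammatical_counts = Counter(t for t in tokens if t in GRAMMATICAL_WORDS)

for word, freq in grammatical_counts.most_common(5):
    print(word, freq)
```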

In sum, the Quotation Tool has a range of potential uses for corpus-based discourse analysis of English-language newspaper texts, and we hope that this notebook – alongside other ATAP notebooks – will prove helpful to other researchers. Of course, manual analysis will be more accurate and is preferable for small text collections. In addition, the tool does work with medium-to-large datasets (e.g. 26,000 articles) but may be slow or require several attempts, and it currently does not work with “big data”. Thus, the tool may be most suitable for medium-to-large corpora that are not amenable to manual coding.

5 Conclusions

The present paper showcased the use of Jupyter notebooks for improving transparency in corpus-based DA, as part of a novel accountability framework we introduced here for the first time. We do want to briefly acknowledge the limitations of such notebooks here.

Although Jupyter notebooks support the reproducibility/repeatability of analyses, this can be difficult to attain if code cells are executed out of the intended top-to-bottom order or if dependencies change over time (see Beg et al. 2021: 41). In the latter case, the results of functions change as the packages or libraries that the notebook uses are edited or updated. Moreover, tracking changes (Vastola 2023) and collaborating on Jupyter notebooks with traditional version control systems like Git can be challenging due to the notebook’s JSON-based format. In addition, many analyses rely on random sampling, which will render analyses non-reproducible unless it is ensured that the same random samples are drawn when the code is re-run (Wang et al. 2020: 289).
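
The random sampling issue can be illustrated with a minimal sketch using Python’s standard `random` module. The article identifiers, the `draw_sample` helper and the seed value are hypothetical; the point is only that a fixed, documented seed yields the same sample each time the code is re-run in the same environment.

```python
import random

# Hypothetical identifiers for the texts in a corpus.
article_ids = [f"article_{i:03d}" for i in range(1, 101)]


def draw_sample(items, k, seed):
    """Draw k items using a local generator initialised with a fixed seed.

    A local generator avoids dependence on the global random state, which
    could otherwise differ depending on which cells were run beforehand.
    """
    rng = random.Random(seed)
    return rng.sample(items, k)


first_run = draw_sample(article_ids, 10, seed=42)
second_run = draw_sample(article_ids, 10, seed=42)

print(first_run == second_run)  # the two draws are identical
```

Recording the seed together with the versions of Python and of the libraries used is advisable, since a seed alone does not guard against changes in dependencies.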

Another suite of limitations relates to the fact that notebooks may represent an unfamiliar, “strange” tool ( Beg et al. 2021 : 41–42) for those with little or no computational experience, and can have a steep learning curve. This issue ties in with another potentially problematic aspect: the quality and tidiness of the code. Jupyter notebooks can become cluttered with outputs, comments, and unorganized code, making them less readable and maintainable. Further, issues emerging from using large datasets and computationally intensive methods remain serious drawbacks, with Jupyter notebooks being unable to handle large datasets or “timing-out” if computational processing of data takes longer than a pre-specified time limit (see Vastola 2023 ). Another problem is the rapidly changing computational ecosystem, since best practices and environments for running and sharing Jupyter notebooks are evolving and adapting at a remarkable speed. A general recommendation is to make them findable, accessible, interoperable and reusable (FAIR), with practical recommendations on how to achieve this available from the Australian Research Data Commons (2023) .

Finally, like many other computational tools, the NLP methods used in Jupyter notebooks are based and tested on large datasets representing major, mostly Western European, languages and alphabets, which can lead to issues when trying to apply them beyond such contexts. In sum, these various limitations of Jupyter notebooks need to be continuously weighed against their advantages (see Section 3.2), and we do not want to rule out that future ATAP tools will be using different applications as new developments occur.

In addition to critical reflection on Jupyter notebooks, there is a need for a more general discussion about the integration of computational methods in corpus-based DA. The limited but increasing reflective work on combining corpus linguistics and discourse analysis shows that the field is now robust enough to handle such reflection (Marchi and Taylor 2018). Baker (2018: 281) confirms that such a reflection “suggests a maturing of the field of corpora and discourse studies”. This should also include critical reflection on other issues, such as those included in the accountability framework we introduced above. Our discussion here was limited in scope and did not encompass all facets of the framework. Rather, we focused on a specific area and introduced a cross-institutional initiative: the Australian Text Analytics Platform, which currently employs Jupyter notebooks. Our emphasis has been on notebooks relevant to corpus-based DA, with the Quotation Tool notebook serving as an example of its link to accountability and potential applications. Of course, Jupyter notebooks do not address all aspects of the accountability framework. Researchers should also consider how they might deal with other elements in corpus-based discourse analysis, including the decisions made when building corpora, annotating data, defining variables or selecting features of interest, generating hypotheses, identifying discourses, and so on (see Table 1). Different options may need to be investigated for bringing together and making accountable the different steps that are combined in a single research project. For instance, Schweinberger and Haugh (in press) propose a GitHub repository comprising documentation, code, and an interactive notebook, but other workflow representations are clearly possible and deserve further exploration.

Could the aim of striving for accountability be considered naïve? We aimed to encourage researchers to at least think about these aspects and see which they can address and how, given the constraints within which they are operating in a particular research project. We anticipate that future research and critical scrutiny can expand upon this article, delving into other aspects of the accountability framework in greater depth and refining it further. Just to give one example, we have not touched upon issues around open access dissemination of project findings. Clearly, there are a number of relevant open scholarship projects and initiatives in both applied linguistics and linguistics that can be considered in this respect. [7] In conclusion, we must acknowledge the pressing concerns surrounding additional developments in corpus-based DA. This includes the imperative to centre marginalized voices (e.g. Nartey 2022, 2023) and promote greater diversity among both practitioners and the data they analyse.

Funding source: Australian Research Data Commons (Platforms Program)

Funding source: Australian Research Data Commons (HASS and Indigenous Research Data Commons)

Acknowledgements

We are very grateful to Maite Taboada for permitting us to adapt the code for quotation extraction developed by the Gender Gap Tracker team at Simon Fraser University for use in the ATAP Quotation Tool. We acknowledge the technical assistance provided by the Sydney Informatics Hub, a Core Research Facility of the University of Sydney. The Jupyter notebook development was rendered possible through the Australian Text Analytics Platform ( https://doi.org/10.47486/PL074 ) and the Language Data Commons of Australia via the HASS Research Data Commons and Indigenous Research Capability Program ( https://doi.org/10.47486/HIR001 ). These projects received investment from the Australian Research Data Commons (ARDC), which is funded by the National Collaborative Research Infrastructure Strategy (NCRIS).

References

Ancarno, Clyde. 2018. Interdisciplinary approaches in corpus linguistics and CADs. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review, 130–156. London & New York: Routledge.

Anthony, Laurence & Paul Baker. 2015. ProtAnt: A tool for analysing the prototypicality of texts. International Journal of Corpus Linguistics 20(3). 273–292. https://doi.org/10.1075/ijcl.20.3.01ant.

Anthony, Laurence, Nicholas Smith, Sebastian Hoffmann & Paul Rayson. 2023. Understanding corpus text prototypicality: A multifaceted problem. Paper presented at ICAME 44 conference. North-West University. 17–21 May.

Applied Linguistics Press. 2024. Applied Linguistics Press [list of open scholarship resources]. https://www.appliedlinguisticspress.org/home/os-resources (accessed 18 March 2024).

Asr, Fatemeh Torabi, Mohammad Mazraeh, Alexandre Lopes, Vasundhara Gautam, Junette Gonzales, Prashanth Rao & Maite Taboada. 2021. The gender gap tracker: Using natural language processing to measure gender bias in media. PLoS One 16(1). e0245533. https://doi.org/10.1371/journal.pone.0245533.

Australian Research Data Commons. 2023. FAIR for Jupyter notebooks: A practical guide. https://ardc.edu.au/resource/fair-for-jupyter-notebooks-a-practical-guide/ (accessed 18 March 2024).

Australian Text Analytics Platform. 2024. Quotation Tool notebook: Help pages. https://github.com/Australian-Text-Analytics-Platform/quotation-tool/blob/main/documents/quotation_help_pages.pdf (accessed 18 March 2024).

Baker, Paul. 2006. Using corpora in discourse analysis. London: Continuum.

Baker, Paul. 2009. Issues in teaching corpus-based discourse analysis. In Linda Lombardo (ed.), Using corpora to learn about language and discourse, 73–79. Bern & New York: Peter Lang.

Baker, Paul. 2012. Acceptable bias? Using corpus linguistics methods with critical discourse analysis. Critical Discourse Studies 9(3). 247–256. https://doi.org/10.1080/17405904.2012.688297.

Baker, Paul. 2015. Does Britain need any more foreign doctors? Inter-analyst consistency and corpus-assisted (critical) discourse analysis. In Maggie Charles, Nicholas Groom & Suganthi John (eds.), Corpora, grammar, text and discourse: In honour of Susan Hunston, 283–300. Amsterdam & Philadelphia: John Benjamins.

Baker, Paul. 2018. Conclusion: Reflecting on reflective research. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review, 281–292. London & New York: Routledge.

Baker, Paul. 2020. Analysing representations of obesity in the Daily Mail via corpus and down-sampling methods. In Jesse Egbert & Paul Baker (eds.), Using corpus methods to triangulate linguistic analysis, 85–108. London & New York: Routledge.

Baker, Paul & Jesse Egbert (eds.). 2016. Triangulating methodological approaches in corpus linguistic research. London & New York: Routledge.

Baker, Paul & Erez Levon. 2015. Picking the right cherries? A comparison of corpus-based and qualitative analyses of news articles about masculinity. Discourse & Communication 9(2). 221–236. https://doi.org/10.1177/1750481314568542.

Baker, Paul & Tony McEnery. 2015. Introduction. In Paul Baker & Tony McEnery (eds.), Corpora and discourse studies: Integrating discourse and corpora, 1–19. Basingstoke & New York: Palgrave Macmillan.

Baker, Paul, Costas Gabrielatos, Majid Khosravinik, Michał Krzyzanowski, Tony McEnery & Ruth Wodak. 2008. A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society 19(3). 273–306. https://doi.org/10.1177/0957926508088962.

Bednarek, Monika. 2006. Evaluation in media discourse: Analysis of a newspaper corpus. London & New York: Continuum.

Bednarek, Monika. 2009. Corpora and discourse: A three-pronged approach to analyzing linguistic data. In Michael Haugh, Kate Burridge, Jean Mulder & Pam Peters (eds.), Selected proceedings of the 2008 HCSNet workshop on designing the Australian national corpus: Mustering languages. Somerville: Cascadilla Proceedings Project.

Bednarek, Monika. 2016. Voices and values in the news: News media talk, news values and attribution. Discourse, Context & Media 11. 27–37. https://doi.org/10.1016/j.dcm.2015.11.004.

Bednarek, Monika & Helen Caple. 2014. Why do news values matter? Towards a new methodological framework for analyzing news discourse in critical discourse analysis and beyond. Discourse & Society 25(2). 135–158. https://doi.org/10.1177/0957926513516041.

Bednarek, Monika & Helen Caple. 2017. The discourse of news values: How news organisations create newsworthiness. Oxford & New York: Oxford University Press.

Bednarek, Monika & Georgia Carr. 2019. Guide to the diabetes news corpus (DNC). https://osf.io/jrhx2/ (accessed 4 July 2023).

Bednarek, Monika & Georgia Carr. 2021. Australian diabetes news media coverage. Australian Diabetes Educator 23(4). https://ade.adea.com.au/australian-diabetes-news-media-coverage/ (accessed 4 July 2023).

Bednarek, Monika, Andrew S. Ross, Olga Boichak, Yaegan J. Doran, Georgia Carr, Eduardo G. Altmann & Tristram J. Alexander. 2022. Winning the discursive struggle? The impact of a significant environmental crisis event on dominant climate discourses on Twitter. Discourse, Context & Media 45(100564). 1–13. https://doi.org/10.1016/j.dcm.2021.100564.

Beg, Marijan, Juliette Taka, Thomas Kluyver, Alexander Konovalov, Min Ragan-Kelley, Nicolas M. Thiéry & Hans Fangohr. 2021. Using Jupyter for reproducible scientific workflows. Computing in Science & Engineering 23(2). 36–46. https://doi.org/10.1109/mcse.2021.3052101.

Bender, Emily M. 2019. The #BenderRule: On naming the languages we study and why it matters. The Gradient. https://thegradient.pub/the-benderrule-on-naming-the-languages-we-study-and-why-it-matters/ (accessed 2 June 2023).

Bloor, Michael. 1997. Techniques of validation in qualitative research: A critical commentary. In Gale Miller & Robert Dingwall (eds.), Context and method in qualitative research, 37–50. London: Sage.

Bochynska, Agata, Liam Keeble, Caitlin Halfacre, Joseph V. Casillas, Irys-Amélie Champagne, Kaidi Chen, Melanie Röthlisberger, Erin M. Buchanan & Timo B. Roettger. 2023. Reproducible research practices and transparency across linguistics. Glossa Psycholinguistics 2(1). 1–36. https://doi.org/10.5070/G6011239.

Bray, Carly. 2023. Applying decolonial research principles in corpus-based critical discourse analysis of Aboriginal and Torres Strait Islander peoples and issues. Paper presented at the 7th meeting of the International Society for the Linguistics of English (ISLE 7). Australia: University of Queensland, 19–22 June 2023.

Brezina, Vaclav. 2018. Statistical choices in corpus-based discourse analysis. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review, 259–280. London & New York: Routledge.

Caldas-Coulthard, Carmen Rosa. 1994. On reporting reporting: The representation of speech in factual and factional narratives. In Malcolm Coulthard (ed.), Advances in written text analysis, 295–308. London: Routledge.

Caple, Helen, Changpeng Huan & Monika Bednarek. 2020. Multimodal news analysis across cultures. Cambridge: Cambridge University Press.

Collins, Peter & Xinyue Yao. 2019. AusBrown: A new diachronic corpus of Australian English. ICAME Journal 43(1). 5–21. https://doi.org/10.2478/icame-2019-0001.

Denzin, Norman K. 1970. The research act in sociology: A theoretical introduction to sociological methods. London & Chicago: Butterworths.

Doyle, Paul. 2005. Replicating corpus-based linguistics: Investigating lexical networks in text. In Proceedings from corpus linguistics 2005. Birmingham: University of Birmingham. https://www.birmingham.ac.uk/documents/college-artslaw/corpus/conference-archives/2005-journal/lexiconodf/coling2005paper.pdf (accessed 16 June 2023).

Egbert, Jesse. 2023. “I tried”: Transparency in reporting methods. Linguistics with a Corpus. https://linguisticswithacorpus.wordpress.com/2023/10/31/i-tried-transparency-in-reporting-methods/ (accessed 18 March 2024).

Egbert, Jesse & Paul Baker. 2016. Research synthesis. In Paul Baker & Jesse Egbert (eds.), Triangulating methodological approaches in corpus-linguistic research , 183–208. London & New York: Routledge. Search in Google Scholar

Egbert, Jesse & Paul Baker (eds.). 2020. Using corpus methods to triangulate linguistic analysis . London & New York: Routledge. Search in Google Scholar

Fairclough, Norman. 1988. Discourse representation in media discourse. SocioLinguistics 17. 125–139. Search in Google Scholar

Gray, Bethany. 2016. Lexical bundles. In Paul Baker & Jesse Egbert (eds.), Triangulating methodological approaches in corpus-linguistic research , 33–56. London & New York: Routledge. Search in Google Scholar

Hardt-Mautner, Gerlinde. 1995. ‘Only connect’: Critical discourse analysis and corpus linguistics. UCREL Technical Paper 6. Lancaster: University of Lancaster. http://ucrel.lancs.ac.uk/papers/techpaper/vol6.pdf (accessed 4 July 2023). Search in Google Scholar

Hober, Nicole, Tülay Dixon & Tove Larsson. 2023. Towards increased reliability and transparency in projects with manual linguistic coding. Corpora 18(2). 245–258. https://doi.org/10.3366/cor.2023.0284 . Search in Google Scholar

Jaworska, Sylvia & Karen Kinloch. 2018. Using multiple data sets. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review , 110–129. London & New York: Routledge. Search in Google Scholar

Jufri, Sony & Chao Sun. 2022. Quotation tool. v1.0.0 Australian text analytics platform. Software. Available at: https://github.com/Australian-Text-Analytics-Platform/quotation-tool . Search in Google Scholar

Lee, Kelvin K. H. 2024. Using constructed week sampling to compile a newspaper corpus. Sydney Corpus Lab. https://sydneycorpuslab.com/using-constructed-week-sampling-to-compile-a-newspaper-corpus/ (accessed 18 March 2024). Search in Google Scholar

Leech, Geoffrey. 1992. Corpora and theories of linguistic performance. In Jan Svartvik (ed.), Directions in corpus linguistics , 105–122. Berlin: De Gruyter Mouton. Search in Google Scholar

Lorenzo-Dus, Nuria. 2023. Digital grooming. Discourses of manipulation and cyber-crime . New York: Oxford University Press. Search in Google Scholar

Marchi, Anna & Charlotte Taylor. 2009. If on a winter’s night two researchers…: A challenge to assumptions of soundness of interpretation. Critical Approaches to Discourse Analysis across Disciplines 3(1). 1–20. Search in Google Scholar

Marchi, Anna & Charlotte Taylor. 2018. Introduction: Partiality and reflexivity. In Charlotte Taylor & Anna Marchi (eds.), Corpus approaches to discourse: A critical review , 1–15. London & New York: Routledge. Search in Google Scholar

McEnery, Tony. 2016. Keywords. In Paul Baker & Jesse Egbert (eds.), Triangulating methodological approaches in corpus-linguistic research , 20–32. London & New York: Routledge. Search in Google Scholar

McEnery, Tony & Vaclav Brezina. 2022. Fundamental principles of corpus linguistics . Cambridge: Cambridge University Press. Search in Google Scholar

McEnery, Tony & Andrew Hardie. 2012. Corpus linguistics: Method, theory and practice . Cambridge: Cambridge University Press. Search in Google Scholar

McGlashan, Mark. 2021. Networked discourses of bereavement in online COVID-19 memorials. International Journal of Corpus Linguistics 26(4). 557–582. https://doi.org/10.1075/ijcl.21135.mcg . Search in Google Scholar

Musgrave, Simon. 2021. What are the FAIR and CARE principles and why should corpus linguists know about them? Sydney Corpus Lab . https://sydneycorpuslab.com/what-are-the-fair-and-care-principles-and-why-should-corpus-linguists-know-about-them/ (accessed 4 July 2023). Search in Google Scholar

Nartey, Mark. 2022. Centering marginalized voices: A discourse analytic study of the Black lives matter movement on Twitter. Critical Discourse Studies 19(5). 523–538. https://doi.org/10.1080/17405904.2021.1999284 . Search in Google Scholar

Nartey, Mark (ed.). 2023. Voice, agency and resistance: Emancipatory discourses in action, [Special issue]. Critical Discourse Studies 19(5). Search in Google Scholar

Nartey, Mark & Isaac N. Mwinlaaru. 2019. Towards a decade of synergizing corpus linguistics and critical discourse analysis: A meta-analysis. Corpora 14(2). 203–235. https://doi.org/10.3366/cor.2019.0169 . Search in Google Scholar

Paquot, Magali & Marcus Callies. 2020. Promoting methodological expertise, transparency, replication, and cumulative learning: Introducing new manuscript types in the International Journal of Learner Corpus Research. International Journal of Learner Corpus Research 6(2). 121–124. https://doi.org/10.1075/ijlcr.00014.edi . Search in Google Scholar

Partington, Alan. 2008. The armchair and the machine: Corpus-assisted discourse studies. In Carol Taylor Torsello, Katherine Ackerley & Erik Castello (eds.), Corpora for university language teachers , 95–118. Bern: Peter Lang. Search in Google Scholar

Perkel, Jeffrey M. 2018. Why Jupyter is data scientists’ computational notebook of choice. Nature 563(7732). 145–146. https://doi.org/10.1038/d41586-018-07196-1 . Search in Google Scholar

Pimentel, João Felipe, Leonardo Murta, Vanessa Braganholo & Juliana Freire. 2021. Understanding and improving the quality and reproducibility of Jupyter notebooks. Empirical Software Engineering 26. 65. https://doi.org/10.1007/s10664-021-09961-9 . Search in Google Scholar

Scott, Mike. 2020. WordSmith Tools (version 8) . Stroud: lexical Analysis Software Ltd. Software. Available at: https://lexically.net/wordsmith/ . Search in Google Scholar

Schweinberger, Martin & Michael Haugh. in press. Reproducibility and transparency in interpretive corpus pragmatics. International Journal of Corpus Linguistics . Search in Google Scholar

Shen, Helen. 2014. Interactive notebooks: Sharing the code. Nature 515. 151–152. https://doi.org/10.1038/515151ax . Search in Google Scholar

Sönning, Lukas & Valentin Werner. 2021. The replication crisis, scientific revolutions, and linguistics. Linguistics 59(5). 1179–1206. https://doi.org/10.1515/ling-2019-0045 . Search in Google Scholar

Stubbs, Michael. 1996. Text and corpus analysis: Computer-assisted studies of language and culture . Oxford: Blackwell. Search in Google Scholar

Taylor, Charlotte & Anna Marchi (eds.). 2018. Corpus approaches to discourse: A critical review . London & New York: Routledge. Search in Google Scholar

Vanichkina, Darya & Monika Bednarek. 2022. Australian obesity corpus manual . https://osf.io/h6n82 (accessed 3 March 2022). Search in Google Scholar

Vastola, John. 2023. Why I stopped using Jupyter notebooks and why you should too. Medium . https://levelup.gitconnected.com/why-i-stopped-using-jupyter-notebook-and-why-you-should-too-b1e564d49ea1 (accessed 12 May 2023). Search in Google Scholar

Wang, Jiawei, Tzu-yang Kuo, Li Li & Andreas Zeller. 2020. Restoring reproducibility of Jupyter notebooks. In ICSE ’20: Proceedings of the ACM/IEEE 42nd international Conference on software engineering: Companion proceedings , 288–289. Search in Google Scholar

White, Peter R. R. 2012. Exploring the axiological workings of ‘reporter voice’ news stories—attribution and attitudinal positioning. Discourse, Context & Media 1(2–3). 57–67. https://doi.org/10.1016/j.dcm.2012.10.004 . Search in Google Scholar

Wodak, Ruth. 2001. The discourse historical approach. In Ruth Wodak & Michael Meyer (eds.), Methods of critical discourse analysis , 63–94. London: SAGE. Search in Google Scholar

This article contains supplementary material ( https://doi.org/10.1515/cllt-2023-0104 ).

© 2024 the author(s), published by De Gruyter, Berlin/Boston

This work is licensed under the Creative Commons Attribution 4.0 International License.


Corpus Linguistics and Linguistic Theory

Language Teaching, Volume 56, Issue 1

Replicating corpus-based research in English for academic purposes: Proposed replication of Cortes (2013) and Biber and Gray (2010)

Published online by Cambridge University Press:  07 October 2021

Accurate description of language use is central to English for academic purposes (EAP) practice. Thanks to the development of corpus tools, it has been possible to undertake systematic studies of language in academic contexts. This line of research aims to provide detailed and accurate characterization of academic communication and to ultimately inform EAP practice. Very few studies, however, have attempted to ascertain whether, and to what extent, corpus-based findings have achieved such goals. The diverse nature of EAP, and the unique methodological challenges involved in compiling and using corpora, provide sufficient incentive for replication research in this area. The present article makes a case for replication of corpus-based studies in the field of EAP. It argues that replication research not only enhances the credibility of corpus linguistics for EAP pedagogy and research but also provides practical advice for EAP teachers and materials designers. It then looks at how two key corpus-based studies on the topic, Cortes (2013) and Biber and Gray (2010), can be replicated with respect to replication approaches described in Porte (2012).


  • Taha Omidian, Oliver James Ballance and Anna Siyanova-Chanturia
  • DOI: https://doi.org/10.1017/S0261444821000367


USING CORPORA TO AID QUALITATIVE TEXT ANALYSIS

An interdisciplinary approach.

  • Jędrzej Olejniczak University of Wrocław https://orcid.org/0000-0003-4977-5459

Aim. The aim of this paper is to present and exemplify a number of basic uses of corpus-based text analysis tools that can supplement and provide additional insight for an otherwise qualitative analysis of a text. I attempt to show that nowadays certain corpus tools are easily accessible to any researcher and can be used to enrich the results of studies concerned with texts. 

Methods. This paper covers the basics of corpus building, the main types of data that can be drawn from a simple corpus, and a detailed description of four methods that can aid text analysis: wordlists, concordances, dispersion plots and keywords. Each of the four methods is described thoroughly, with a number of examples of its applications and an indication of its possible limitations.

Results. The examples provided suggest that even a very simple corpus analysis of a text might unveil certain trends and phenomena not noticeable through classic qualitative text analysis methods (e.g. close reading). The paper argues that corpus research can hence work as an extension of a qualitative analysis (or be its starting point) by examining themes and keywords present in a given text, and can enrich the results of a qualitative study with a fresh perspective. Finally, the paper claims that basic corpus analysis can, in fact, be successfully employed by researchers who do not have any prior experience with statistics or corpora.
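Two of the four methods named in the abstract, wordlists and concordances, are simple enough to sketch in a few lines of Python. The sketch below is only an illustration of the mechanics and is not the procedure or software used in the paper; the function names (`tokenize`, `wordlist`, `concordance`) and the sample text are invented for this example, and the tokenizer is deliberately naive.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())

def wordlist(tokens, top=5):
    """Return the most frequent tokens with their counts."""
    return Counter(tokens).most_common(top)

def concordance(tokens, node, window=3):
    """Return keyword-in-context lines: up to `window` tokens either side of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample text, used only to show the mechanics.
sample = (
    "The war ended. The war had no heroes. "
    "After the war, people spoke of the war quietly."
)
tokens = tokenize(sample)

print(wordlist(tokens, top=3))
for line in concordance(tokens, "war"):
    print(line)
```

A wordlist gives the frequency overview, while the concordance lines put each hit back into its immediate context, which is where a qualitative reading can take over.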

Author Biography

1. Doctoral Student (University of Wrocław) 2. Corpus linguistics, digital humanities, translation studies, literary and academic translation, translation teaching methodology 3. Language editor: Miscellanea Posttotalitariana Wratislaviensia, Between


Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal. All authors agree to the publication of their email addresses, affiliations and short bio statements with their articles during the submission process.


Corpus analysis


Roughly, the data available for linguistic research stem from either of two sources: intuitions about language or observations of linguistic events. Collections of data of the latter kind are called corpora. Although corpus data have been used throughout the history of linguistic research, a real breakthrough in their use came in the course of the 20th century when it became possible to store and search large quantities of text electronically. In the second half of the previous century the use of corpus data in their new form was stimulated by the dissatisfaction felt by some with the preference of the linguistic mainstream for intuitive data. Positions taken with respect to the appropriateness for linguistic research of either corpus data or intuitive data have occasionally been quite extreme, but the best policy for any linguist is probably to regard the two as being complementary, rather than in opposition to each other.

However, it must be borne in mind that corpus data reflect what people actually say and write, and as such provide the most appropriate data for linguists who want to investigate the use of language rather than linguistic competence or linguistic universals. And since the study of language use is not only concerned with the description of what people actually say and write, but also with the question why in a given verbal or situational context they use one linguistic construct rather than another, it follows that for a collection of linguistic events to be a corpus, it has to meet minimally two conditions. The first is that it should present a faithful record of the utterances contained in running texts (rather than, say, a collection of examples of a particular linguistic phenomenon), the second is that it should give information about the questions by whom, where, when and why the texts were produced.

In other words, apart from a record of utterances, a corpus should contain the fullest possible information about the verbal and situational contexts in which the utterances were produced. The fact that corpora are repositories of language use entails that corpus-based studies are naturally biased towards the study of specific languages, genres and language varieties.


Corpus Linguistic Analysis – A Bird’s Eye View of Writing

  • CC BY-NC-ND 4.0 by Laura Aull - University of Michigan

Corpus linguistics fuels AI innovation: Teams of computational linguists, including those at OpenAI, delve into the vast expanse of the internet, amassing an extensive corpus to predict textual patterns. Yet, when classic lines, like T.S. Eliot's 'I have measured out my life with coffee spoons' from 'The Love Song of J. Alfred Prufrock,' are absorbed without proper acknowledgment, pressing ethical questions emerge. This illustration captures that very sentiment, as Eliot's iconic line spirals into the corpus vortex.


What is Corpus Linguistics Analysis?

How we (usually) read and write.

If you are like most people in the United States, you read and write one phrase, sentence, and paragraph at a time. Then, you consider all the words, sentences, and paragraphs of a full individual text, and that tells you what that text is about.

For example, when you read the news, you probably read or skim each news article or post from the beginning onward, and then you think about what each one is about.  For a class or your own purposes, you might also consider the audience of a particular article, such as whether it is international or domestic, or left-leaning or right-leaning. This kind of attention to the rhetoric and rhetorical situation of individual texts is something you have probably practiced a good deal.

Reading one sentence and text at a time is what your teachers tend to do when they read papers, too: they read your paper from start to finish, and then they read your classmate’s paper, and so on.

You and your instructors may also think about some aspects of writing across individual texts, such as genre or purpose. Your teachers might look across a stack of papers, for instance, and consider how well a class of students has used primary evidence in a research paper. In another example, you might look over a Twitter feed to see how often people retweet posts in a particular thread. In such instances, you and your teachers are paying attention to aspects of the rhetorical situation across multiple texts.

By contrast, you probably spend little time thinking about how language —in words, phrases, and sentences—is used across the texts you read and write. That kind of focus, on language across texts, is common in linguistic approaches to writing, which are more popular outside of the U.S. than inside the U.S. Accordingly, if your writing teachers have been trained in U.S. rhetoric and composition rather than linguistics, they know a lot about students’ writing generally but may not know a lot about the specific language that students use across their papers and across courses.

What does all this mean? Most U.S. readers and writers, and most U.S. student writing research, tend to discuss written texts one text at a time. Understanding across texts tends to focus on contextual patterns, such as audience or genre. Most U.S. readers and writers know less about textual patterns, or patterns of language across texts and contexts.

Of course, on some level, you do think about language patterns, maybe without even realizing it. It’s part of why you can recognize a newspaper article and why you know how to write a text message: you have paid attention to how people use language in patterned ways. But this kind of knowledge—the kind we pick up through casual observation—is often subconscious and is rarely systematic. For example, you can probably write a text message that is appropriate for a given rhetorical situation without thinking much about it, because you have picked up on what kind of language is appropriate for the genre (text message) and audience (your recipient, such as a family member or friend). But what do you do when you need to write something unfamiliar to you? If you are writing your first college composition essay, or your first psychology case study, how do you know what language patterns are preferred?

Corpus Linguistic Analysis

This brings us to analysis that uses computer-aided tools to offer us a view of language patterns across texts—a bird’s eye view of written language patterns. This kind of analysis is called corpus linguistic analysis: the term corpus refers to a body of texts, and linguistic analysis , as you saw before, refers to the examination of patterns of language use. As a complement to understanding one text at a time, corpus linguistic analysis can help us systematically analyze and understand written language in terms of patterns across many texts and across time.

Reading so far, you may already be picking up on three premises, or assumptions, related to corpus linguistics:

  • Texts make meaning in patterned ways across texts and contexts.
  • It can be hard to comprehend language patterns if we are trained to read and analyze only one text at a time.
  • Attention to language across texts and contexts can teach us additional information about what is expected in particular rhetorical situations.

You are probably already picking up on a detailed definition of corpus linguistic analysis, too. Corpus linguistic analysis refers to the examination of textual patterns in a selected body of naturally produced texts, usually via computer-aided tools that facilitate searching, sorting, and calculating large-scale textual patterns.

Notice two key terms inside this definition:

  • Textual patterns : lexical or grammatical patterns that persist across texts in a corpus, in contrast to more varied choices or to patterns in other corpora
  • Naturally produced texts : a given corpus consists only of language produced for authentic, real-world purposes

In sum, corpus linguistic analysis is about identifying choices people make (and don’t make) across texts, and we can use the results of such analysis to enhance our understanding of how language and texts work. Corpus linguistic analysis has been used a lot since the mid- to late-20th century, especially outside of the U.S., in places like England, Asia, and Australia, to help teachers and students learn about expert and student writing choices that come up again and again.

The Bird’s-Eye View of Language: Why Corpus Linguistic Analysis?

You may not be convinced yet. If we are most used to reading and writing one text at a time, why introduce something different? Why get a bird’s eye view of language patterns across texts?

Some good reasons include that we get to see different details when we look across texts—details we can miss or misperceive when we read one text at a time. Here are two key reasons why corpus linguistic analysis can be useful, followed by examples from corpus linguistic analysis of academic writing.

  • Our perceptions of language use are often misleading .

It’s easy to come to inaccurate conclusions about language, because some things catch our attention more than others. For instance, people tend to think that language is changing rapidly when they read slang words on the Internet. But actually, there are many more words on the Internet that have been around a long time than there are new words. Corpus linguistic analysis has shown that only around 3% of online language use includes internet-specific slang such as abbreviations. It’s just that the newer words grab our attention more than the old ones. In this example, corpus linguistic analysis helps us quantify what percentage of words on the internet are actually new words, and what percentage are words we have been using for a while. Let’s consider one more example, this one from research on academic writing .

Have you ever found it difficult to read college textbooks? Doug Biber and his research team used corpus linguistic analysis to analyze different kinds of language use on college campuses, including research articles, textbooks, and office hours. One thing they wanted to investigate was how textbooks compared to these other kinds of language use, because instructors often think that textbooks provide easy-to-read narrative descriptions for students.

Based on corpus linguistic analysis of all of these kinds of language, Biber et al. found that textbooks are not characterized by narrative, accessible language like spoken conversation. Instead, they tend to include dense, present-tense discussions of implications, making textbooks challenging to read for students. In some ways, textbooks are just as difficult to parse as research articles.

  • Much of our knowledge about written language is tacit, or unconscious (Odell et al.).

Once we have learned to write in a particular way, it is easy to forget the conscious steps we had to learn to do it in the first place. That is why it can be hard for your teachers to realize what might be challenging about an academic writing task they assign, and why it might be hard for you to explain to a grandparent how to write a tweet or how to use hashtags. Let’s again turn to a more specific example from research on academic writing.

Have you ever felt like you didn’t know what a teacher wanted in your writing? What teachers want can be subtle, or even unstated. Brown and Aull did a corpus analysis of advanced placement English essays that showed two distinct patterns in successful and unsuccessful essays. The successful student writing included specific, detailed phrases, while unsuccessful student writing included generic, emphatic phrases. This means, for instance, that a successful student essay might include the following sentence:

A twentieth-century understanding of grief suggests that it takes time.

In this sentence, a detailed phrase (A twentieth-century understanding of grief) is the subject of the sentence.

By contrast, an unsuccessful student essay might instead say:

Grief obviously takes time.

This sentence includes a simple subject (grief) as well as an emphatic word (obviously). To academic readers, the second sentence can seem too general and too strong.

The bottom line is that our perceptions of language use can miss important patterns, because we tend to read one word, sentence, and text at a time. Getting a bird’s-eye view allows us to understand more about the kinds of choices people tend to make with language, including successful and unsuccessful choices in academic writing. As we learn about such patterns and practice looking for them, we can become more adept at recognizing what characterizes different kinds of written texts.

Example exercise: Words that hang out with one another

Let’s get some practice thinking about language patterns. We’ll do this by considering collocations, or the words that most often hang out with other words. (The technical, fancy-sounding definition of collocations is “the habitual juxtaposition of a particular word with another word or words with a frequency greater than chance.”)

First, try to guess: What words collocate, or hang out, most often with the word idea in U.S. English?

Specifically, what words do you think come just before idea, in all sorts of U.S. English (spoken, fiction, academic, news, and magazine)? List your top 5 guesses.

________________ idea

To test your guesses, we can turn to corpus linguistic analysis, using the Corpus of Contemporary American English (COCA). COCA is an online database where you can search all kinds of patterns in American English, across spoken conversation, fiction, academic writing, news, and magazines. You’ll see COCA listed in the resources below with a URL so that you can check it out yourself.

For this search, we’ll look for all words immediately to the left of idea. These are called 1L collocates, because they appear 1 space to the left.
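If you want to see what a 1L search does under the hood, here is a minimal sketch in Python. It is not how COCA itself works: the function name `collocates`, the simple regular-expression tokenizer, and the tiny sample text are all assumptions made for illustration. It counts the words sitting a fixed number of positions away from a target word.

```python
import re
from collections import Counter


def collocates(text, node, offset=-1, top_n=5):
    """Count words found `offset` positions away from `node`.

    offset=-1 gives 1L collocates (one space to the left);
    offset=+1 gives 1R collocates (one space to the right).
    Hypothetical helper for illustration, not part of any corpus tool.
    """
    # Very rough tokenizer (assumption): lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, token in enumerate(tokens):
        j = i + offset
        if token == node and 0 <= j < len(tokens):
            counts[tokens[j]] += 1
    return counts.most_common(top_n)


# Made-up sample text, not COCA data.
sample = (
    "That is a good idea. I had a good idea yesterday. "
    "What a bad idea! The basic idea is simple."
)
print(collocates(sample, "idea", offset=-1))
```

Calling the same function with `offset=1` would list the words immediately to the right of the target word instead. Note that this sketch ignores sentence boundaries and does not test whether a pairing is more frequent than chance, which a real collocation measure would do.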

Use of the word IDEA in COCA (all registers)

How many of your guesses were right? Did you guess that not only are good idea and bad idea popular, but so too are the expressions (the) very idea, basic idea, and general idea?

Let’s think about these patterns. Several collocations show evaluation of an idea (good idea, bad idea, great idea), including some comparison (better idea, new idea). Others show emphasis on an idea ((the) very idea). Finally, others convey a summary or gist of an idea (whole idea, basic idea, general idea). (Clear idea is used both in evaluation and in summary statements.)

Many people guess that people describe ideas as good and bad, but they don’t realize how often speakers and writers use idea to let their audience know that they are summarizing something. As you read before, this is the kind of thing that corpus linguistic analysis can uncover: common patterns of language use that we don’t necessarily pay attention to but that can tell us what matters to people in a given type of writing. Picking up on these collocates might, for instance, help students begin to notice how often people summarize, and when they tend to do so.

If we use the above examples, for instance, you could consider the following as you begin to read and write in a new course: How do writers describe ideas? Do they evaluate them (e.g., as good, bad, or correct)? Do they describe them (e.g., as theoretical, abstract, or practical)? Do they summarize them (e.g., as general or overall)?

Let’s explore one more example, this one concerning something many students wonder about: the first person in academic writing.

Here’s our question for this one: How do writers draw attention to themselves as writers by using the first person I or we?

Let’s first make a guess about expert academic writing. In academic writing published in the U.S., what words do you think collocate, or hang out, with I? Specifically, what words do you think most often appear right after I, or immediately to the right of the word I, in academic writing? Again, note your top 5 guesses.

I ________________

We can again use corpus linguistic analysis to find out how accurate your guesses are. Specifically, we can use the Corpus of Contemporary American English academic subcorpus (COCAA) and search for words 1 space to the right, or 1R, of I.

Use of the word I in COCA, Academic writing

First of all, using COCAA, we can see that even though lots of students have heard that they shouldn’t use I in academic writing, many published academic writers use I or we.

How do they use it? In these collocates, we can see a clear and consistent pattern: academic writers use I as the subject of verbs, and these verbs tend to help writers describe their processes (consider, for instance, examples like I have observed, I was able to, I had collected). Academic writers also use I to describe their thinking (I think that, I would suggest). They also, though less often, use I to describe beliefs: I believe is the last of the top ten.

How did your guesses hold up? A lot of people guess argue, thinking that academics write I argue a lot, but it is not in the top ten. Conversely, few people guess I have or I had. In addition, many students are surprised to see that academic writers are often tentative rather than explicit about their arguments: as you can see, academic writers use I would, I think, and I could far more often than I argue.

As you can see, sometimes corpus linguistic analysis can surprise us. It shows us that textbooks can be hard to read, that student grades are based in part on the subjects of their sentences, and that academic writers use I to describe steps in their thinking and processes. With more analysis, we learn more.

Try out the resources below, and see what patterns you find with a bird’s-eye view across many texts.

More examples of corpus linguistics research

Written versus spoken English:

  • Very formal, academic writing tends to contain lots of nouns and prepositions, while more informal language, including spoken conversation, tends to contain more pronouns and verbs (Biber; Biber and Gray).

Student writing:

  • Successful writing by late-undergraduate and early-graduate writers shows clear differences depending on the discipline. For example, writing in Philosophy and Education is more narrative and interpersonal than writing in Biology or Physics. Writing in Political Science and Linguistics falls in between (Hardy and Römer).
  • First-year college writers tend to boost, or intensify, their ideas with words such as really, truly, or clearly more than they hedge, or qualify, their ideas with words such as perhaps, might, or possibly. This can make first-year writing seem overstated to many academic readers, who tend to appreciate some space for doubt and exception (Aull First-Year; Aull et al.; Aull and Lancaster; Hyland “Undergraduate Understandings”).
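To get a rough feel for the balance of boosting and hedging in a draft of your own, you could count both kinds of words. The sketch below is a simplified illustration, not the method used in the studies cited above: the word lists contain only the six examples named in this section, and the function name `hedge_boost_counts` is made up.

```python
import re

# Only the example words mentioned in the text (assumption: real studies
# use much longer, carefully checked lists).
BOOSTERS = {"really", "truly", "clearly"}
HEDGES = {"perhaps", "might", "possibly"}


def hedge_boost_counts(text):
    """Return how many hedges, boosters, and total tokens a text contains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "hedges": sum(1 for t in tokens if t in HEDGES),
        "boosters": sum(1 for t in tokens if t in BOOSTERS),
        "tokens": len(tokens),
    }


# Made-up draft sentences for demonstration.
draft = (
    "This clearly shows that grief really takes time. "
    "It might also vary, perhaps by culture."
)
print(hedge_boost_counts(draft))
```

Because the counts depend on text length, dividing each count by the number of tokens makes drafts of different lengths easier to compare. A simple word match like this cannot tell whether a word such as might is actually hedging in context, so treat the numbers as a starting point for rereading, not a verdict.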

Published academic writing across disciplines:

  • Writers in the social and natural sciences tend to use more first person pronouns (I, we) to describe experimental processes, while writers in the humanities tend to use first person pronouns to showcase interpretive reasoning (Hyland “Stance”).
  • Academic writers across all disciplines still tend to hedge, or qualify, more than they boost, or intensify (Hyland Disciplinary Discourses).

Corpus Resources

Corpus of Contemporary American English (COCA): https://www.english-corpora.org/coca/

Details about COCA: Davies, M. (2011). Word frequency data from the Corpus of Contemporary American English (COCA).

Michigan Corpus of Upper-Level Student Papers (MICUSP) :

Details about MICUSP: Römer, Ute and O’Donnell, Matthew. From student hard drive to web corpus (part 1): the design, compilation and genre classification of the Michigan Corpus of Upper-level Student Papers (MICUSP). Corpora, vol. 6, no. 2, 2011: 159-177.

Collocation games: see, e.g., Wu, Franken, and Witten. Collocation games from a language corpus. In Digital Games in Language Learning and Teaching. Palgrave Macmillan, London, 2012: 209-229.

The Grammar Lab: David West Brown’s www.thegrammarlab.com/

Further reading

Corpus linguistic analysis can be particularly valuable for identifying student-specific discourse (Römer and Wulff).

Textual patterns with attention to discipline/ genre/ assignment/ level/ course

  • Multiple features combine to create coherent styles, such as a more persuasive or more formal style, that are equally successful even for the same task (Crossley et al.)
  • Some academic genres and fields (e.g., argumentative essays; humanities) tend to include more features characteristic of informational writing (e.g., nouns and prepositions)
  • Others (e.g., reports and natural sciences) include features more characteristic of interpersonal writing (e.g., adverbs and pronouns)
  • (Aull “Argumentative Versus Explanatory Discourse”; Hardy and Römer; Nesi and Gardner)

Textual patterns with attention to genre/ assignment/ level/ course

  • Students may develop vis-à-vis how they cite, engage with, and project others’ views (Ädel and Garretson; Coffin; Coffin and Hewings)
  • As undergraduate students develop, they hedge more and boost less, and they begin to use certain cohesive strategies more (Aull and Lancaster)
  • Successful advanced student writing includes nouns that are metadiscoursal and methodology-related (Hardy and Römer), versus more generic nouns, such as people or society, that are key in first-year writing (Aull et al.)


Research advancement in forest property rights: a thematic review over half a decade using natural language processing.


1. Introduction

1.1. Forests through Human Societies: From Ancient Stewardship to Global Sustainability

1.2. From Current Forest Property Rights Changes to the Statement of Purpose

2. State of Knowledge and Legal Frameworks

2.1. Understanding Property Rights

2.2. Forests Property Rights: Legal Settings for Forest Management and Policy-Making

3. Methodology

3.1. Proposed Method

3.1.1. Co-Word Analysis Based on a Hierarchical Clustering Algorithm

3.1.2. Assessing the Strategic Importance of Clusters

3.2. Data Handling

3.2.1. Data Collection and Justification of the Period Covered

3.2.2. Data Pre-Processing

  • Word lemmatization: Reduces words to their basic form;
  • Converting keywords to lowercase: Standardizes keywords to avoid duplication due to case differences;
  • Generating N-grams: Creates combinations of consecutive words in the keywords to capture key concepts;
  • Selecting a list of the most frequent 50 N-grams: Enables us to focus on the most frequently discussed concepts. This list identifies the main research topics, with a total of 151 articles, 98 of which cover the 50 most frequently used concepts between 2019 and 2023.
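The pre-processing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the source does not name the lemmatizer, the N-gram sizes, or the order in which steps are applied, so `naive_lemma` is a crude stand-in that only strips a plural "s", N-grams are limited to unigrams and bigrams, lowercasing is applied before lemmatization, and the sample keywords are invented.

```python
from collections import Counter


def naive_lemma(word):
    """Stand-in lemmatizer (assumption): strips a trailing plural 's' only.

    A real pipeline would use a proper lemmatizer; this one mishandles
    words such as 'analysis'.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(keywords, max_n=2, top_k=50):
    """Lowercase, lemmatize, build N-grams, and keep the top_k most frequent."""
    counts = Counter()
    for keyword in keywords:
        tokens = [naive_lemma(t) for t in keyword.lower().split()]
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))
    return counts.most_common(top_k)


# Invented author keywords for demonstration only.
keywords = [
    "Property Rights",
    "forest governance",
    "Forest Governance",
    "property right",
]
print(top_ngrams(keywords))
```

In this toy input, lowercasing merges "Forest Governance" with "forest governance", and lemmatization merges "Property Rights" with "property right", which is the duplication the first two steps are meant to avoid.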

3.2.3. Data Vectorization

4. Findings

Cluster 1

  • Cluster terms: forest policy, private property right, public trust resource, forestry, forest regulation, forest resource, forest management
  • Thematic: Forest policy and resource management.
  • Key trends: This cluster of studies examines forest policy and forest resource governance in different national contexts. It examines forest management policies, regulations and practices, and their impact on private landowners, local communities and the environment. Studies also explore property rights, management decentralization, forest sustainability and implementation challenges. Countries covered include the USA, Bhutan, China, India, Slovenia, the Federation of Bosnia-Herzegovina, Sweden, Turkey, Peru and Zimbabwe.
  • Importance of cluster: This cluster is relatively well connected and centralized, indicating good integration within the network.

Cluster 2

  • Cluster terms: conservation, indigenous community, land tenure, conservation policy, redd+
  • Thematic: Conservation and the rights of indigenous communities.
  • Key trends: This cluster of articles explores challenges and solutions related to forest governance and sustainable management. Studies highlight the impacts of programs such as Payment for Ecosystem Services (PES), land rights reform, and local community participation. Contexts range from Argentina and Brazil to Peru, Cameroon and Sweden, highlighting common challenges around land rights, stakeholder conflicts and alternative governance models.
  • Importance of cluster: This cluster shows good integration and centrality in the network, slightly better than Cluster 1.

Cluster 3

  • Cluster terms: institutional change, forest ecosystem service, protected area, forest governance, forest conservation
  • Thematic: Institutional change and forest conservation.
  • Key trends: This cluster examines the impact of property rights reforms, such as the law on the recognition of the rights of traditional forest dwellers, and the implications of these changes for forest conservation. Studies also include the evaluation of payments for ecosystem services, the analysis of stakeholders’ property rights in natural parks, and the effect of land tenure reforms on forest carbon sequestration. Countries covered include India, China, Croatia, the Philippines, Bolivia, Madagascar, Kenya, Peru and various European countries.
  • Importance of cluster: This cluster is somewhat connected and centralized, but less so than Clusters 1 and 2.

Cluster 4

  • Cluster terms: policy, livelihood, agroforestry
  • Thematic: Policies, livelihoods and agroforestry.
  • Key trends: This cluster of studies explores the impact of land rights on forest management across a variety of contexts. Research focuses on the effects of agriculture-related deforestation in the tropical Congo Basin, and the influence of forest conservation policies in China, Finland, Indonesia, Romania, Peru, Ghana and India. The studies highlight the importance of land tenure rights in guiding agricultural investment and forest land management, emphasizing their role in promoting sustainable agricultural practices and reducing deforestation.
  • Importance of cluster: This cluster is extremely well connected and centralized, occupying a major strategic position of control and strong cohesion within the network.

Cluster 5

  • Cluster terms: governance, restoration, land right, decentralization, ecosystem service, tenure, collective forest tenure reform, common-pool resource, conflict resolution, community forest management, forest, resource, customary right, private forest, property right, guatemala, tenure security
  • Thematic: Forest governance, community management and land conflict resolution.
  • Key trends: This cluster of articles collectively addresses themes related to forest management, tenure and socio-economic factors influencing forest practices. The focus is on how land rights, political recognition and climate-smart agricultural practices affect forest communities, particularly those dependent on traditional systems. Particular attention is paid to the balance between economic viability and sustainable land use, as shown by studies of state forest systems and the implementation of conservation agriculture. In addition, these studies shed light on the challenges and opportunities of protecting customary forest tenure from external pressures, particularly in areas where community forest management is under threat. Geographically, the studies cover a wide range of countries, including Indonesia, the Mekong region (which includes Cambodia, Laos, Myanmar, Thailand and Vietnam), Cameroon, Hungary and India.
  • Importance of cluster: This cluster is the least connected and the least centralized, suggesting a marginal position in the network.

Cluster 6

  • Cluster terms: institution, sustainability, non-timber forest product
  • Thematic: Institutions, sustainability and non-timber forest products.
  • Key trends: This cluster of studies explores the impact of land rights on forest management in different countries and contexts, including China, Finland, Indonesia, Romania, Peru and Ghana. The articles examine the challenges of forest governance and highlight the need to recognize community and individual rights. They highlight the latent opportunities offered by non-timber forest products (NTFPs) to diversify the forest economy and enhance the sustainability of forest ecosystems.
  • Importance of cluster: Similar to Cluster 4, this cluster is fully connected and centralized, indicating a position of control and strong cohesion.

Cluster 7

  • Cluster terms: amazon, deforestation, indigenous land, devolution, community forestry, brazil, china, cadastre
  • Thematic: Deforestation, indigenous rights and community management.
  • Key trends: This thematic group of articles examines the impact of land tenure rights on forest management and conservation in various international contexts, including China, Poland, Brazil, Indonesia, Bolivia, Uganda, and Honduras. The studies analyze how the devolution of forest tenure rights to households or communities affects forest investment, the modernization of land registries and their impact on taxes, and the effectiveness of indigenous territories in promoting secondary forest growth and reducing deforestation.
  • Importance of cluster: This cluster has moderate connectivity and centrality, but lower than Clusters 1, 2 and 3, indicating a relative but apparently growing presence.


5. Discussion

5.1. About the Central and Influential Thematic

5.1.1. Policies, Livelihoods and Agroforestry (Cluster 4)

5.1.2. Institutions, Sustainability and Non-Timber Forest Products (Cluster 6)

5.2. About the Developed Thematic

5.2.1. Forest Policy and Resource Management (Cluster 1)

5.2.2. Conservation and the Rights of Indigenous Communities (Cluster 2)

5.3. About the Emerging Thematic Cluster

5.3.1. Institutional Change and Forest Conservation (Cluster 3)

5.3.2. Deforestation, Indigenous Rights and Community Management (Cluster 7)

5.4. About the Peripheral Thematic Cluster

6. Conclusions

Author Contributions

Data Availability Statement

Conflicts of Interest

  • Wang, Y.; Sarkar, A.; Li, M.; Chen, Z.; Hasan, A.K.; Meng, Q.; Hossain, M.S.; Rahman, M.A. Evaluating the Impact of Forest Tenure Reform on Farmers’ Investment in Public Welfare Forest Areas: A Case Study of Gansu Province, China. Land 2022 , 11 , 708. [ Google Scholar ] [ CrossRef ]
  • Dung, N.V.; Thang, N.N. Forestland Rights Institutions and Forest Management of Vietnamese Households. Post-Communist Econ. 2017 , 29 , 90–105. [ Google Scholar ] [ CrossRef ]
  • Mishra, B. Institutional Inefficiencies and Forest Degradation in Protected Areas of Odisha: Evidence from Lakhari Valley Wildlife Sanctuary. Int. J. Rural Manag. 2018 , 14 , 182–206. [ Google Scholar ] [ CrossRef ]
  • Krul, K.; Ho, P.; Yang, X. Incentivizing Household Forest Management in China’s Forest Reform: Limitations to Rights-Based Approaches in Southwest China. For. Policy Econ. 2020 , 111 , 102075. [ Google Scholar ] [ CrossRef ]
  • Samndong, R.A.; Vatn, A. Competing Tenures: Implications for REDD+ in the Democratic Republic of Congo. Forests 2018 , 9 , 662. [ Google Scholar ] [ CrossRef ]
  • Aggarwal, S.; Elbow, K. The Role of Property Rights in Natural Resource Management, Good Governance, and Empowerment of the Rural Poor; United States Agency for International Development: Burlington, NJ, USA, 2006; Volume 3.
  • Holden, S.; Otsuka, K.; Deininger, K. Land Tenure Reform in Asia and Africa: Assessing Impacts on Poverty and Natural Resource Management; Springer: Berlin/Heidelberg, Germany, 2013; ISBN 1-137-34381-8.
  • Vatn, A. Environmental Resources, Property Regimes, and Efficiency. Environ. Plan. C Gov. Policy 2001, 19, 665–680.
  • Honoré, A.; Guest, A. Ownership. In Oxford Essays on Jurisprudence; Oxford University Press: Oxford, UK, 1961; pp. 106–147.
  • Irimie, D.L.; Essmann, H.F. Forest Property Rights in the Frame of Public Policies and Societal Change. For. Policy Econ. 2009, 11, 95–101.
  • Ostrom, E.; Hess, C. Private and Common Property Rights. In Encyclopedia of Law and Economics; Edward Elgar Publishing Limited: Cheltenham, UK, 2011; ISBN 1-78254-745-2.
  • Kissling-Näf, I.; Bisang, K. Rethinking Recent Changes of Forest Regimes in Europe through Property-Rights Theory and Policy Analysis. For. Policy Econ. 2001, 3, 99–111.
  • Vatn, A. Rationality, Institutions and Environmental Policy. Ecol. Econ. 2005, 55, 203–217.
  • Ostrom, E. Governing the Commons: The Evolution of Institutions for Collective Action; Cambridge University Press: Cambridge, UK, 1990; ISBN 0-521-40599-8.
  • Jagger, P. Confusion vs. Clarity: Property Rights and Forest Use in Uganda. For. Policy Econ. 2014, 45, 32–41.
  • Ostrom, E. Understanding Institutional Diversity; Princeton University Press: Princeton, NJ, USA, 2009; ISBN 1-4008-3173-3.
  • North, D.C. Institutions. J. Econ. Perspect. 1991, 5, 97–112.
  • Pritchard, R.C. Woodland Transitions and Rural Livelihoods: An Interdisciplinary Case Study of Wedza Mountain, Zimbabwe; 2018. Available online: https://era.ed.ac.uk/handle/1842/31427 (accessed on 2 June 2024).
  • Harada, K.; Habib, M.; Sakata, Y.; Maryudi, A. The Role of NGOs in Recognition and Sustainable Maintenance of Customary Forests within Indigenous Communities: The Case of Kerinci, Indonesia. Land Use Policy 2022, 113, 105865.
  • Mudombi-Rusinamhodzi, G.; Thiel, A. Property Rights and the Conservation of Forests in Communal Areas in Zimbabwe. For. Policy Econ. 2020, 121, 102315.
  • Kusters, K.; de Graaf, M.; Ascarrunz, N.; Benneker, C.; Boot, R.; van Kanten, R.; Livingstone, J.; Maindo, A.; Mendoza, H.; Purwanto, E.; et al. Formalizing Community Forest Tenure Rights: A Theory of Change and Conditions for Success. For. Policy Econ. 2022, 141, 102766.
  • Bromley, D.W. The Commons, Common Property, and Environmental Policy. Environ. Resour. Econ. 1992, 2, 1–17.
  • Schlager, E.; Ostrom, E. Property-Rights Regimes and Natural Resources: A Conceptual Analysis. Land Econ. 1992, 68, 249–262.
  • Bamwesigye, D.; Chipfakacha, R.; Yeboah, E. Forest and Land Rights at a Time of Deforestation and Climate Change: Land and Resource Use Crisis in Uganda. Land 2022, 11, 2092.
  • FAO. Land Tenure and Rural Development; FAO Land Tenure Studies; FAO: Rome, Italy, 2002; p. 56.
  • Katila, P.; McDermott, C.; Larson, A.; Aggarwal, S.; Giessen, L. Forest Tenure and the Sustainable Development Goals—A Critical View. For. Policy Econ. 2020, 120, 102294.
  • Miller, D.C.; Rana, P.; Nakamura, K.; Irwin, S.; Cheng, S.H.; Ahlroth, S.; Perge, E. A Global Review of the Impact of Forest Property Rights Interventions on Poverty. Glob. Environ. Chang. 2021, 66, 102218.
  • Libecap, G.D. Contracting for Property Rights; Cambridge University Press: Cambridge, UK, 1993; ISBN 0-521-44904-9.
  • Furubotn, E.G.; Pejovich, S. Property Rights and Economic Theory: A Survey of Recent Literature. J. Econ. Lit. 1972, 10, 1137–1162.
  • He, J.; Sikor, T. Looking beyond Tenure in China’s Collective Forest Tenure Reform: Insights from Yunnan Province, Southwest China. Int. For. Rev. 2017, 19, 29–41.
  • Besley, T. Property Rights and Investment Incentives: Theory and Evidence from Ghana. J. Political Econ. 1995, 103, 903–937.
  • Deininger, K.; Feder, G. Land Registration, Governance, and Development: Evidence and Implications for Policy. World Bank Res. Obs. 2009, 24, 233–266.
  • World Bank. Land Tenure Policy: Securing Rights to Reduce Poverty and Promote Rural Growth; The World Bank: Washington, DC, USA, 2011.
  • United Nations. Sustainable Development Goals; United Nations: New York, NY, USA, 2015.
  • Perkumienė, D.; Atalay, A.; Safaa, L.; Grigienė, J. Sustainable Waste Management for Clean and Safe Environments in the Recreation and Tourism Sector: A Case Study of Lithuania, Turkey and Morocco. Recycling 2023, 8, 56.
  • Emich, K.J.; Kumar, S.; Lu, L.; Norder, K.; Pandey, N. Mapping 50 Years of Small Group Research through Small Group Research. Small Group Res. 2020, 51, 659–699.
  • Lamhour, O.; Safaa, L.; Perkumienė, D. What Does the Concept of Resilience in Tourism Mean in the Time of COVID-19? Results of a Bibliometric Analysis. Sustainability 2023, 15, 9797.
  • Safaa, L.; Khazi, A.; Perkumienė, D.; Labanauskas, V. Arts-Based Management between Actions and Conjunctions: Lessons from a Systematic Bibliometric Analysis. Adm. Sci. 2023, 13, 200.
  • Sampieri, S.; Saoualih, A.; Safaa, L.; de Carnero Calzada, F.M.; Ramazzotti, M.; Martínez-Peláez, A. Tourism Development through the Sense of UNESCO World Heritage: The Case of Hegra, Saudi Arabia. Heritage 2024, 7, 2195–2216.
  • Saoualih, A.; Safaa, L.; Bouhatous, A.; Bidan, M.; Perkumienė, D.; Aleinikovas, M.; Šilinskas, B.; Perkumas, A. Exploring the Tourist Experience of the Majorelle Garden Using VADER-Based Sentiment Analysis and the Latent Dirichlet Allocation Algorithm: The Case of TripAdvisor Reviews. Sustainability 2024, 16, 6378.
  • Blondel, V.D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E. Fast Unfolding of Communities in Large Networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008.
  • Tang, L.; Wang, X.; Liu, H. Community Detection via Heterogeneous Interaction Analysis. Data Min. Knowl. Discov. 2012, 25, 1–33.
  • Girvan, M.; Newman, M.E. Community Structure in Social and Biological Networks. Proc. Natl. Acad. Sci. USA 2002, 99, 7821–7826.
  • Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing Well-Connected Communities. Sci. Rep. 2019, 9, 5233.
  • Newman, M.E. Finding Community Structure in Networks Using the Eigenvectors of Matrices. Phys. Rev. E—Stat. Nonlinear Soft Matter Phys. 2006, 74, 036104.
  • Zhu, J.; Liu, W. A Tale of Two Databases: The Use of Web of Science and Scopus in Academic Papers. Scientometrics 2020, 123, 321–335.
  • Ballew, B.S. Elsevier’s Scopus® Database. J. Electron. Resour. Med. Libr. 2009, 6, 245–252.
  • Pranckutė, R. Web of Science (WoS) and Scopus: The Titans of Bibliographic Information in Today’s Academic World. Publications 2021, 9, 12.
  • Benhaida, S.; Saddou, H.; Safaa, L.; Perkumiene, D.; Labanauskas, V. Acquirements of Three Decades of Literature on Cultural Tourism. J. Infrastruct. Policy Dev. 2024, 8, 3817.
  • Sténs, A.; Mårald, E. “Forest Property Rights under Attack”: Actors, Networks and Claims about Forest Ownership in the Swedish Press 2014–2017. For. Policy Econ. 2020, 111, 102038.
  • Cai, M.; Murtazashvili, I.; Murtazashvili, J.; Salahodjaev, R. Individualism and Governance of the Commons. Public Choice 2020, 184, 175–195.
  • Nelson, H.; Nikolakis, W.; Martin-Chan, K. The Effect of Institutional Arrangements on Economic Performance among First Nations: Evidence from Forestry in BC. For. Policy Econ. 2019, 107, 101922.
  • Aggarwal, A. Revisiting the Land Use Assumptions in Forest Carbon Projects through a Case from India. J. Environ. Manag. 2020, 267, 110673.
  • Correa, J.; Cisneros, E.; Börner, J.; Pfaff, A.; Costa, M.; Rajão, R. Evaluating REDD+ at Subnational Level: Amazon Fund Impacts in Alta Floresta, Brazil. For. Policy Econ. 2020, 116, 102178.
  • Dupuits, E.; Cronkleton, P. Indigenous Tenure Security and Local Participation in Climate Mitigation Programs: Exploring the Institutional Gaps of REDD+ Implementation in the Peruvian Amazon. Environ. Policy Gov. 2020, 30, 209–220.
  • Hein, J.; Del Cairo, C.; Gallego, D.O.; Gutiérrez, T.V.; Velez, J.S.; de Francisco, J.C.R. A Political Ecology of Green Territorialization: Frontier Expansion and Conservation in the Colombian Amazon. DIE ERDE—J. Geogr. Soc. Berl. 2020, 151, 37–57.
  • Boillat, S.; Ceddia, M.G.; Bottazzi, P. The Role of Protected Areas and Land Tenure Regimes on Forest Loss in Bolivia: Accounting for Spatial Spillovers. Glob. Environ. Chang. 2022, 76, 102571.
  • Bruzzese, S.; Tolić Mandić, I.; Tišma, S.; Blanc, S.; Brun, F.; Vuletić, D. A Framework Proposal for the Ex Post Evaluation of a Solution-Driven PES Scheme: The Case of Medvednica Nature Park. Sustainability 2023, 15, 8101.
  • Chand, S.; Behera, B. Does Assignment of Individual Property Rights Improve Forest Conservation Outcomes?: Empirical Evidence from West Bengal, India. Ecol. Econ. Soc. INSEE J. 2023, 6, 7–31.
  • Hu, C.; Zhang, H. The Impact of Collective Forest Tenure Reform on Forest Carbon Sequestration Capacity—An Analysis Based on the Social–Ecological System Framework. Land 2023, 12, 1649.
  • Nichiforel, L.; Deuffic, P.; Thorsen, B.J.; Weiss, G.; Hujala, T.; Keary, K.; Lawrence, A.; Avdibegović, M.; Dobšinská, Z.; Feliciano, D. Two Decades of Forest-Related Legislation Changes in European Countries Analysed from a Property Rights Perspective. For. Policy Econ. 2020, 115, 102146.
  • Paletto, A.; Laktić, T.; Posavec, S.; Dobšinská, Z.; Marić, B.; Đordjević, I.; Trajkov, P.; Kitchoukov, E.; Pezdevšek Malovrh, Š. Nature Conservation versus Forestry Activities in Protected Areas-the Stakeholders’ Point of View. Šumarski List 2019, 143, 307–317.
  • Pulhin, J.M.; Fajardo, A.R.; Predo, C.D.; Sajise, A.J.; De Luna, C.C.; Diona, D.L.Z. Unbundling Property Rights among Stakeholders of Bataan Natural Park: Implications to Protected Area Governance in the Philippines. J. Sustain. For. 2022, 41, 347–369.
  • Rakotonarivo, O.S.; Bell, A.; Dillon, B.; Duthie, A.B.; Kipchumba, A.; Rasolofoson, R.A.; Razafimanahaka, J.; Bunnefeld, N. Experimental Evidence on the Impact of Payments and Property Rights on Forest User Decisions. Front. Conserv. Sci. 2021, 2, 661987.
  • Sears, R.R.; Guariguata, M.R.; Cronkleton, P.; Miranda Beas, C. Strengthening Local Governance of Secondary Forest in Peru. Land 2021, 10, 1286.
  • Gebru, B.; Elofsson, K. The Role of Forest Status in Households’ Fuel Choice in Uganda. Energy Policy 2023, 173, 113390.
  • Kabra, A.; Das, B. Aye for the Tiger: Hegemony, Authority, and Volition in India’s Regime of Dispossession for Conservation. Oxf. Dev. Stud. 2022, 50, 44–61.
  • Leakey, R.R.; Tientcheu Avana, M.-L.; Awazi, N.P.; Assogbadjo, A.E.; Mabhaudhi, T.; Hendre, P.S.; Degrande, A.; Hlahla, S.; Manda, L. The Future of Food: Domestication and Commercialization of Indigenous Food Crops in Africa over the Third Decade (2012–2021). Sustainability 2022, 14, 2355.
  • Molua, E.L.; Sonwa, D.; Bele, Y.; Foahom, B.; Mate Mweru, J.P.; Wa Bassa, S.M.; Gapia, M.; Ngana, F.; Joe, A.E.; Masumbuko, E.M. Climate-Smart Conservation Agriculture, Farm Values and Tenure Security: Implications for Climate Change Adaptation and Mitigation in the Congo Basin. Trop. Conserv. Sci. 2023, 16, 19400829231169980.
  • Rochmayanto, Y.; Nurrochmat, D.R.; Nugroho, B.; Darusman, D.; Satria, A.; Casse, T.; Erbaugh, J.T.; Wicaksono, D. Devolution of Forest Management to Local Communities and Its Impacts on Livelihoods and Deforestation in Berau, Indonesia. Heliyon 2023, 9, e16115.
  • Adulcharoen, W.; Suntornvongsakul, K.; Lee, Y.-S. Assessment of Sustainable Utilization of Ecosystem Services in Different Stages of Mangrove Forest Restoration at Klong Khone Sub-District, Samut Songkhram Province, Thailand. Appl. Environ. Res. 2020, 42, 43–57.
  • Astuti, R. Fixing Flammable Forest: The Scalar Politics of Peatland Governance and Restoration in Indonesia. Asia Pac. Viewp. 2020, 61, 283–300.
  • Atangana Ondoa, H.; Nyebe Andela, B. Are Natural Resources a Blessing or a Curse for Scientific and Technical Research in Africa? Resour. Policy 2023, 85, 103759.
  • Baragwanath, K.; Bayi, E. Collective Property Rights Reduce Deforestation in the Brazilian Amazon. Proc. Natl. Acad. Sci. USA 2020, 117, 20495–20502.
  • Biedenweg, K.; Trimbach, D.; Delie, J.; Schwarz, B. Using Cognitive Mapping to Understand Conservation Planning. Conserv. Biol. 2020, 34, 1364–1372.
  • Chankrajang, T. State-Community Property-Rights Sharing in Forests and Its Contributions to Environmental Outcomes: Evidence from Thailand’s Community Forestry. J. Dev. Econ. 2019, 138, 261–273.
  • Cummins, A.; Yamaji, E. To See Invisible Rights: Quantifying Araman Informal Tenure and Its Immediate Relationship with Social Forestry in Central Java, Indonesia. For. Soc. 2019, 3, 193–201.
  • Fernández Luiña, E.; Fernández Ordóñez, S.; Wang, W.H. The Community Commitment to Sustainability: Forest Protection in Guatemala. Sustainability 2022, 14, 6953.
  • Harbi, J.; Cao, Y.; Milantara, N.; Gamin; Mustafa, A.B.; Roberts, N.J. Understanding People−Forest Relationships: A Key Requirement for Appropriate Forest Governance in South Sumatra, Indonesia. Sustainability 2021, 13, 7029.
  • He, J.; Kebede, B.; Martin, A.; Gross-Camp, N. Privatization or Communalization: A Multi-Level Analysis of Changes in Forest Property Regimes in China. Ecol. Econ. 2020, 174, 106629.
  • He, J.; Wang, J. Certificated Exclusion: Forest Carbon Sequestration Project in Southwest China. J. Peasant Stud. 2023, 50, 2165–2186.
  • Hovis, M.; Frey, G.; McGinley, K.; Cubbage, F.; Han, X.; Lupek, M. Ownership, Governance, Uses, and Ecosystem Services of Community Forests in the Eastern United States. Forests 2022, 13, 1577.
  • Kašubová, M.; Lichý, J.; Šulek, R. Comparison of Legal Aspects of Public Access to Forests in the Slovak Republic and the Czech Republic. Zprávy Lesn. Výzkumu 2021, 66, 11–18.
  • Kaur, K.P.; Chang, K.; Andersson, K.P. Collective Forest Land Rights Facilitate Cooperative Behavior. Conserv. Lett. 2023, 16, e12950.
  • Kottek, P.; Király, É.; Mertl, T.; Borovics, A. Trends of Forest Harvesting Ages by Ownership and Function and the Effects of the Recent Changes of the Forest Law in Hungary. Forests 2023, 14, 679.
  • Lambert, J.; Epstein, G.; Joel, J.; Baggio, J. Identifying Topics and Trends in the Study of Common-Pool Resources Using Natural Language Processing. Int. J. Commons 2021, 15, 206–217.
  • Lawrence, A.; Gatto, P.; Bogataj, N.; Lidestav, G. Forests in Common: Learning from Diversity of Community Forest Arrangements in Europe. Ambio 2021, 50, 448–464.
  • Lewis, S.R.; Faure, N.; Leonard, S.; Pen, R.; Ba Ngai, N.; Phengsopha, K. Safeguarding Customary Forest Tenure in the Mekong Region: A Legal Analysis. J. Land Use Sci. 2023, 18, 84–108.
  • Li, M.; Sarkar, A.; Wang, Y.; Khairul Hasan, A.; Meng, Q. Evaluating the Impact of Ecological Property Rights to Trigger Farmers’ Investment Behavior—An Example of Confluence Area of Heihe Reservoir, Shaanxi, China. Land 2022, 11, 320.
  • Liu, J.; Dong, J.; Long, H.; Xu, T.; Putzel, L. Private vs. Community Management Responses to De-Collectivization: Illustrative Cases from China. Int. J. Commons 2020, 14, 445–464.
  • Liu, P.; Ravenscroft, N. Collective Forests and the Community at the Legal Frontier of Property Rights Reforms in China. J. Leg. Plur. Unoff. Law 2021, 53, 42–59.
  • Mansourian, S.; Walters, G.; Gonzales, E. Identifying Governance Problems and Solutions for Forest Landscape Restoration in Protected Area Landscapes. Parks 2019, 25, 83–96.
  • Miller, S. Causal Forest Estimation of Heterogeneous and Time-Varying Environmental Policy Effects. J. Environ. Econ. Manag. 2020, 103, 102337.
  • Mizuno, K.; Hasibuan, H.S.; Okamoto, M.; Asrofani, F.W. Creation of the State Forest System and Its Hostility to Local People in Colonial Java, Indonesia. Southeast Asian Stud. 2023, 12, 47–87.
  • Nandwani, B. Land Rights Recognition and Political Participation: Evidence from India. J. Dev. Stud. 2023, 59, 1741–1759.
  • Nilsson, J.; Helgesson, M.; Rommel, J.; Svensson, E. Forest-Owner Support for Their Cooperative’s Provision of Public Goods. For. Policy Econ. 2020, 115, 102156.
  • Ribe, R.G.; Nielsen-Pincus, M.; Johnson, B.R.; Enright, C.; Hulse, D. The Consequential Role of Aesthetics in Forest Fuels Reduction Propensities: Diverse Landowners’ Attitudes and Responses to Project Types, Risks, Costs, and Habitat Benefits. Land 2022, 11, 2151.
  • Robinson, E.J.; Somerville, S.; Albers, H.J. The Economics of REDD through an Incidence of Burdens and Benefits Lens. Int. Rev. Environ. Resour. Econ. 2019, 13, 165–202.
  • Shumi, G.; Dorresteijn, I.; Schultner, J.; Hylander, K.; Senbeta, F.; Hanspach, J.; Ango, T.G.; Fischer, J. Woody Plant Use and Management in Relation to Property Rights: A Social-Ecological Case Study from Southwestern Ethiopia. Ecosyst. People 2019, 15, 303–316.
  • Soliev, I.; Theesfeld, I.; Abert, E.; Schramm, W. Benefit Sharing and Conflict Transformation: Insights for and from REDD+ Forest Governance in Sub-Saharan Africa. For. Policy Econ. 2021, 133, 102623.
  • Sommer, J.M.; Burroway, R.; Shandra, J.M. Defend Women’s Rights and Save the Trees: A Cross-National Analysis of Women’s Immovable Property Rights and Forest Loss. Popul. Environ. 2022, 44, 168–192.
  • Tellman, B.; McSweeney, K.; Manak, L.; Devine, J.A.; Sesnie, S.; Nielsen, E.; Dávila, A. Narcotrafficking and Land Control in Guatemala and Honduras; 2021. Available online: https://repository.arizona.edu/handle/10150/665135 (accessed on 2 June 2024).
  • Tramel, S.F. The Tenure Guidelines in Policy and Practice: Democratizing Land Control in Guatemala. Land 2019, 8, 168.
  • van der Zon, M.; de Jong, W.; Arts, B. Community Enforcement and Tenure Security: A Fuzzy-Set Qualitative Comparative Analysis of Twelve Community Forest Management Initiatives in the Peruvian Amazon. World Dev. 2023, 161, 106071.
  • Wainaina, P.; Minang, P.A.; Nzyoka, J.; Duguma, L.; Temu, E.; Manda, L. Incentives for Landscape Restoration: Lessons from Shinyanga, Tanzania. J. Environ. Manag. 2021, 280, 111831.
  • Yang, L.; Ren, Y. Has China’s New Round of Collective Forestland Tenure Reform Caused an Increase in Rural Labor Transfer? Land 2020, 9, 284.
  • Yang, L.; Ren, Y. Property Rights, Village Democracy, and Household Forestry Income: Evidence from China’s Collective Forest Tenure Reform. J. For. Res. 2021, 26, 7–16.
  • Yiwen, Z.; Kant, S.; Dong, J.; Liu, J. How Communities Restructured Forest Tenure throughout the Top-down Devolution Reform: Using the Case of Fujian, China. For. Policy Econ. 2020, 119, 102272.
  • Yiwen, Z.; Kant, S. Secure Tenure or Equal Access? Farmers’ Preferences for Reallocating the Property Rights of Collective Farmland and Forestland in Southeast China. Land Use Policy 2022, 112, 105814.
  • Zhou, Y.; Shi, X.; Ji, D.; Ma, X.; Chand, S. Property Rights Integrity, Tenure Security and Forestland Rental Market Participation: Evidence from Jiangxi Province, China; Wiley Online Library: Hoboken, NJ, USA, 2019; Volume 43, pp. 95–110.
  • Brobbey, L.K.; Hansen, C.P.; Kyereh, B. The Dynamics of Property and Other Mechanisms of Access: The Case of Charcoal Production and Trade in Ghana. Land Use Policy 2021, 101, 105152.
  • Brown, M. Local Governance, Ecological Knowledge, and Spatial Models: Assessing Resource Access in a Forest Commons. Hum. Ecol. 2022, 50, 997–1006.
  • Soekmadi, R.; Hikmat, A.; Kusmana, C. Crafting Local Institution Using Social-Ecological System Framework for Sustainable Rattan Governance in Lore Lindu National Park. J. Manaj. Hutan Trop. 2019, 25, 135.
  • Sorea, D.; Roșculeț, G.; Rățulea, G.G. The Compossessorates in the Olt Land (Romania) as Sustainable Commons. Land 2022, 11, 292.
  • Baragwanath, K.; Bayi, E.; Shinde, N. Collective Property Rights Lead to Secondary Forest Growth in the Brazilian Amazon. Proc. Natl. Acad. Sci. USA 2023, 120, e2221346120.
  • Gutiérrez-Zamora, V.; Estrada, M.H. Responsibilization and State Territorialization: Governing Socio-Territorial Conflicts in Community Forestry in Mexico. For. Policy Econ. 2020, 116, 102188.
  • Reydon, B.; Molendijk, M.; Porras, N.; Siqueira, G. The Amazon Forest Preservation by Clarifying Property Rights and Potential Conflicts: How Experiments Using Fit-for-Purpose Can Help. Land 2021, 10, 225.
  • Reydon, B.; Siqueira, G.P.; Passos, D.S.; Honer, S. Unclear Land Rights and Deforestation: Pieces of Evidence from Brazilian Reality. Land 2022, 12, 89.
  • Xu, Z.; Zhuo, Y.; Liao, R.; Wu, C.; Wu, Y.; Li, G. LADM-Based Model for Natural Resource Administration in China. ISPRS Int. J. Geo-Inf. 2019, 8, 456.
  • Yi, Y. Devolution of Tenure Rights in Forestland in China: Impact on Investment and Forest Growth. For. Policy Econ. 2023, 154, 103025.
  • Zegar, M.; Pęska-Siwik, A.; Maciuk, K. The Problem of the Modernisation of Land and Building Register in Poland as Exemplified by the Village of Rejowiec. Bud. I Archit. 2023, 22, 5–20.

Cluster Detection Algorithm | Modularity Value | Number of Clusters Obtained
Louvain | 0.318 | 7
Girvan–Newman | 0.279 | 10
Leiden | 0.310 | 6
Leading Eigenvector | 0.270 | 5
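
The modularity values reported above score how well a partition separates a network into densely connected clusters. As a minimal sketch, assuming the standard Newman–Girvan definition for an undirected, unweighted graph, the score of any candidate partition can be computed as below. The toy graph, the candidate partitions, and all names are hypothetical illustrations and are not the authors' data or pipeline; in practice the partitions would come from algorithms such as Louvain or Leiden rather than being written by hand.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping every node to a community label.
    Uses Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where
    L_c is the number of edges inside c, d_c the summed degree of its
    nodes, and m the total number of edges.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# Hypothetical candidate partitions standing in for algorithm outputs.
candidates = {
    "two_triangles": {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"},
    "single_cluster": {n: "a" for n in range(6)},
    "singletons": {n: n for n in range(6)},
}

scores = {name: modularity(edges, part) for name, part in candidates.items()}
best = max(scores, key=scores.get)

for name, q in scores.items():
    n_clusters = len(set(candidates[name].values()))
    print(f"{name}: Q={q:.3f}, clusters={n_clusters}")
print("highest modularity:", best)
```

On this toy graph the two-triangle split scores Q = 5/14 (about 0.357), the single cluster scores 0, and the all-singletons partition is negative, which mirrors how a higher modularity value is read as a better-separated clustering.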

Share and Cite

Beriozovas, O.; Perkumienė, D.; Škėma, M.; Saoualih, A.; Safaa, L.; Aleinikovas, M. Research Advancement in Forest Property Rights: A Thematic Review over Half a Decade Using Natural Language Processing. Sustainability 2024, 16, 8280. https://doi.org/10.3390/su16198280

Comments on “Worldwide research on extraction and recovery of cobalt through bibliometric analysis: a review”

  • Letter to the Editor
  • Published: 24 September 2024

  • Yuh-Shan Ho (ORCID: orcid.org/0000-0002-2557-8736)
  • Francis Lwesya (ORCID: orcid.org/0000-0002-0415-1215)

Data availability

Data will be made available upon request.

References

Fu H, Ho Y (2015) Top cited articles in thermodynamic research. J Eng Thermophys 24(1):68–85. https://doi.org/10.1134/s1810232815010075

Giannoudis PV, Chloros GD, Ho Y (2021) A historical review and bibliometric analysis of research on fracture nonunion in the last three decades. Int Orthop 45(7):1663–1676. https://doi.org/10.1007/s00264-021-05020-6

Ho Y (2017) Comments on “Mapping the scientific research on non-point source pollution: a bibliometric analysis” by Yang et al. (2017). Environ Sci Pollut Res 25(30):30737–30738. https://doi.org/10.1007/s11356-017-0381-8

Ho YS (2020a) Rebuttal to: Li et al. “Dynamic analysis of international green behavior from the perspective of the mapping knowledge domain,” Environmental Science and Pollution Research, vol. 26, pp. 6087–6098. Environ Sci Pollut Res 27(17):22127–22128. https://doi.org/10.1007/s11356-020-08728-x

Ho YS (2020b) Rebuttal to: Ma et al. “Past, current, and future research on microalga-derived biodiesel: a critical review and bibliometric analysis”, vol. 25, pp. 10596–10610. Environ Sci Pollut Res 27(7):7742–7743. https://doi.org/10.1007/s11356-020-07836-y

Ho YS (2020c) Comments on “Research on sulfur oxides and nitric oxides released from coal-fired flue gas and vehicle exhaust: a bibliometric analysis” by Wang et al. (2019). Environ Sci Pollut Res 27(6):6714–6720. https://doi.org/10.1007/s11356-019-07338-6

Ho YS (2020d) Some comments on using of Web of Science for bibliometric studies [Environ. Sci. Pollut. Res. Vol. 25]. Environ Sci Pollut Res 27(6):6711–6713. https://doi.org/10.1007/s11356-019-06515-x

Ho YS (2021) Comments on method for the top cited papers. Fresenius Environ Bull 30(7A):9624–9625

Ho YS (2023a) Comments on “Research trends and frontiers on source appointment of soil heavy metal: a scientometric review (2000–2020)” by Wang, Jingyun et al. DOI (10.1007/s11356-021-16151-z). Environ Sci Pollut Res 30(19):57205–57206. https://doi.org/10.1007/s11356-023-26216-w

Ho YS (2023b) Comments on “Unveiling the recycling characteristics and trends of spent lithium-ion battery: a scientometric study” by Li, Guangming et al. DOI (10.1007/s11356-021-17814-7). Environ Sci Pollut Res 30(17):51370. https://doi.org/10.1007/s11356-023-25774-3

Ho YS, Shekofteh M (2021) Performance of highly cited multiple sclerosis publications in the Science Citation Index expanded: a scientometric analysis. Mult Scler Relat Disord 54:103112. https://doi.org/10.1016/j.msard.2021.103112

Ho YS, Al-Moraissi EA, Christidis N, Christidis M (2024) Research focuses and trends in literacy within education: a bibliometric analysis. Cogent Educ 11(1):2287922. https://doi.org/10.1080/2331186X.2023.2287922

Wang MH, Ho YS (2011) Research articles and publication trends in environmental sciences from 1998 to 2009. Arch Environ Sci 5:1–10

Zhou YL, Wei XS, Huang LM, Wang H (2023) Worldwide research on extraction and recovery of cobalt through bibliometric analysis: a review. Environ Sci Pollut Res 30(7):16930–16946. https://doi.org/10.1007/s11356-022-24727-6

Author information

Authors and affiliations.

Trend Research Centre, Asia University, No. 500, Lioufeng Road, Taichung, 41354, Taiwan

Yuh-Shan Ho

Department of Business Administration and Management, The University of Dodoma, Dodoma, Tanzania

Francis Lwesya

Contributions

YH: Study conceptualization, writing and data analysis. FL: Writing, and finalizing the letter.

Corresponding author

Correspondence to Francis Lwesya.

Ethics declarations

Ethical approval.

Not applicable.

Consent to participate

Consent to publish, competing interest, additional information.

Responsible Editor: Philippe Garrigues

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Ho, YS., Lwesya, F. Comments on “Worldwide research on extraction and recovery of cobalt through bibliometric analysis: a review”. Environ Sci Pollut Res (2024). https://doi.org/10.1007/s11356-024-35148-y

Received: 27 May 2024

Accepted: 20 September 2024

Published: 24 September 2024

DOI: https://doi.org/10.1007/s11356-024-35148-y

The Use of Verbs in Research Articles: Corpus Analysis for Scientific Writing and Translation (1)

  • January 2006

Arianne Reimerink, University of Granada

Abstract and Figures

Activation of conceptual areas in the Introduction section

IMAGES

  1. (PDF) The language of public health

  2. Corpus Analysis with Antconc

  3. (PDF) Corpus Analysis

  4. PPT

  5. Corpus analysis results for top 200 articles from "Healthcare and

  6. (PDF) Restating Research Findings in Research Articles Discussion

VIDEO

  1. Corpus Analysis

  2. Corpus Basics II (Analysis)

  3. Unraveling Ancient Religious Traditions: Insights from the Koran Manuscript Corpus

  4. Corpus Composition

  5. Push Small Amount to create 1 Crore Corpus using SleepSIP

  6. Lancaster Summer Schools in Corpus Linguistics and other Digital Methods

COMMENTS

  1. (PDF) Corpus Analysis

    This article provides an introduction to the concept of the "corpus" where language research is at issue and to the field of corpus linguistics. It reviews the main corpus analysis tools and ...

  2. Corpus Analysis

    A corpus is a reasoned and pondered collection of texts of any nature that can be analysed on paper or in audio-video or digital form. It becomes data when it is accessed with a research objective. A digital corpus offers an empirical basis with automated access, but as Clark (Methods in Pragmatics. Mouton De Gruyter, 2018) points out, this ...

  3. PDF An Approach to Corpus-based Discourse Analysis: The Move Analysis ...

    An Approach to Corpus-based Discourse Analysis: The Move Analysis as Example. Thomas A. Upton and Mary Ann Cohen. Abstract: This article presents a seven-step corpus-based approach to discourse analysis that starts with a detailed analysis of each individual text in a corpus that can then be generalized across all texts of a corpus, providing a description o.

  4. Corpus Analysis

    Abstract. Large and small language text corpora have become quite ubiquitous in the broad fields that make up the study of language and social interaction. This article provides an introduction to the concept of the "corpus" where language research is at issue and to the field of corpus linguistics. It reviews the main corpus analysis tools ...

  5. A corpus-based analysis of research article macrostructure patterns

    Abstract. This study investigates how the macrostructure patterns (MSPs) of research articles (RAs) are distributed across different disciplines. The investigation is based on the Elsevier OA CC-BY corpus consisting of 76,835 RAs from 26 disciplines coming from Health Sciences (HS), Social Sciences and Humanities (SH), Life Sciences (LS), and ...

  6. Corpus-Based and Corpus-driven Analyses of Language Variation and Use

    In contrast, "corpus-driven" research is more inductive, so that the linguistic constructs themselves emerge from analysis of a corpus. The availability of very large, representative corpora, combined with computational tools for analysis, make it possible to approach linguistic variation from this radically different perspective.

  7. Corpus Analysis

    Co-word analysis is an established bibliometric technique widely used in scientometric research to describe and interpret the organization of knowledge in a scientific discipline (e.g., Lee & Jeong, 2008). It involves a co-occurrence analysis of keywords or meaningful terms in a selected body of literature.

  8. CORPUS METHODS IN LANGUAGE STUDIES

    Abstract. This chapter offers an introduction to corpus linguistics as a methodology for studying language, literature, and other fields in the humanities. It defines corpus linguistics, explores ...

  9. Epistemologies of corpus linguistics across disciplines

    This paper discusses the scientific impact of corpus linguistics methodology and methods by paying attention to the selection, analysis and reporting in a range of disciplines including philosophy, sociology and, most notably here, education. Initially, a broad focus is offered in Section 2, with a more detailed reflection on the use of corpus ...

  10. Qualitative Corpus Analysis

    Qualitative corpus analysis is a methodology for pursuing in-depth investigations of linguistic phenomena, as grounded in the context of authentic, communicative situations that are digitally stored as language corpora and made available for access, retrieval, and analysis via computer. Researchers using qualitative corpus analysis as the ...

  11. (PDF) Qualitative Corpus Analysis

    Abstract. Qualitative corpus analysis is a methodology for pursuing in-depth investigations of linguistic phenomena, as grounded in the context of authentic, communicative situations that are ...

  12. Corpus Analysis and Corpus-Based Writing Instruction

    This chapter discusses the research and instructional practice on corpus approaches in L2 contexts. The chapter begins with the definition of corpus analysis and the rationales for corpus-based writing instruction. It then explains key texts in this domain of research, with important information presented in illustrative tables.

  13. A Multi-Dimensional Analysis of Research Article Discussion Sections in

    From a small corpus of 24 research articles (RAs) on Random Control Trials (RCTs) in orthopedic medicine, 161 uses of hype language were identified and categorized for functional and linguistic realization. ... Jin B. (2018a). A multi-dimensional analysis of the research article discussion sections in the field of chemical engineering. IEEE ...

  14. Corpus-based discourse analysis: from meta-reflection to accountability

    This openness of corpus-based discourse analysis to method reflection is possibly based on "the distinct combination of qualitative and quantitative techniques that has made us more apt to think about method per se" (Baker 2018: 291). A recent volume (Taylor and Marchi 2018) is explicitly dedicated to "dusty corners" (neglected aspects ...

  15. Diachronic corpus analysis of stance markers in research articles: The

    Motivated by such an ambition, the current research drew on a corpus of 4.3 million words taken from three leading journals of applied linguistics in order to trace the diachronic evolution of stance markers of research articles from 1996 to 2016. Hyland's model of metadiscourse was adopted for the analysis of the selected corpus.

  16. Replicating corpus-based research in English for academic purposes

    The present article makes a case for replication of corpus-based studies in the field of EAP. It argues that replication research not only enhances the credibility of corpus linguistics for EAP pedagogy and research but also provides practical advice for EAP teachers and materials designers.

  17. Using Corpora to Aid Qualitative Text Analysis

    Aim. The aim of this paper is to present and exemplify a number of basic uses of corpus-based text analysis tools that can supplement and provide additional insight for an otherwise qualitative analysis of a text. I attempt to show that nowadays certain corpus tools are easily accessible to any researcher and can be used to enrich the results of studies concerned with texts.  Methods.

  18. Aarts: Corpus analysis

    Corpus analysis. Roughly, the data available for linguistic research stem from either of two sources: intuitions about language or observations of linguistic events. Collections of data of the latter kind are called corpora. Although corpus data have been used throughout the history of linguistic research, a real breakthrough in their use came ...

  19. Digital Humanities Workbench

    Corpus analysis is an empirical research strategy that is widely used within language research, using authentic (real, actually attested) language material. A so-called corpus (also known as a text corpus) is a digital collection of texts, text fragments and/or transcripts (of spoken language), which are selected in such a way that they are the ...

  20. Corpus Linguistic Analysis

    Doug Biber and his research team used corpus linguistic analysis to analyze different kinds of language use on college campuses, including research articles, textbooks, and office hours. One thing they wanted to investigate was how textbooks compared to these other kinds of language use, because instructors often think that textbooks provide ...

  21. How complex is professional academic writing? A corpus-based analysis

    This study focuses on the analysis of linguistic complexity in professional academic writing in light of the empirical evidence provided by a 1,597,000-word corpus of 'hard' (life and physical ...

  22. Research Advancement in Forest Property Rights: A Thematic ...

    This paper proposes a thematic literature review of advances in the literature on forest property rights over the first half of this decade. From a methodological point of view, we exploited a corpus of scientific articles published between 2019 and 2023, extracted from the Scopus and Web of Science databases. We then performed a co-word analysis using the Louvain algorithm to reveal thematic ...

  23. (PDF) A Corpus Analysis of Frequently Occurring Words and their

    A Corpus Analysis of Frequently Occurring Words and their Collocations in High-Impact Research Articles in Education December 2020 3L The Southeast Asian Journal of English Language Studies 26(4 ...

  24. Comments on "Worldwide research on extraction and recovery of cobalt

    Zhou et al. recently published a paper in Environmental Science and Pollution Research entitled 'Worldwide research on extraction and recovery of cobalt through bibliometric analysis: A review'. Zhou et al. stated in Data collection that "The query was limited to Science Citation Index Expanded and was "TS = (((TS = (recycle)) OR TS = (recovery)) OR TS = (extraction)) AND TS = (cobalt)."

  25. (PDF) The Use of Verbs in Research Articles: Corpus Analysis for

    This article describes the results of a study of the use of verbs in the different sections of medical research articles. A corpus of 30 POS-tagged texts was used.
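
Item 7 in the list above describes co-word analysis as a co-occurrence analysis of keywords or meaningful terms in a selected body of literature. The counting step behind that idea can be sketched roughly as follows. This is only a minimal illustration, not the procedure used in any of the listed studies: the function name `coword_counts` and the toy keyword lists are hypothetical, and real studies typically add term cleaning, thresholds, and clustering on top of the raw counts.

```python
from collections import Counter
from itertools import combinations


def coword_counts(keyword_lists):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates so a pair counts once per document.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes each pair a stable (a, b) tuple with a < b.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data: keywords of three made-up articles.
docs = [
    ["corpus linguistics", "discourse analysis", "transparency"],
    ["Corpus Linguistics", "stance markers"],
    ["discourse analysis", "corpus linguistics"],
]
pairs = coword_counts(docs)

# List pairs from most to least frequent.
for pair, n in pairs.most_common():
    print(pair, n)
```

Pairs with high counts are the ones a co-word map would draw as strong links; a community-detection step (such as the Louvain algorithm mentioned in item 22) would then operate on a graph built from these counts.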