Let's automate: Natural language processing tools and their applications
Organizer: Scott Crossley, Georgia State University
Abstract
Natural Language Processing (NLP) focuses on computer programs that analyze large corpora of natural language data for linguistic features (e.g., lexical, syntactic, and cohesion features). The use of NLP in applied linguistics research is steadily increasing as larger corpora and more effective and robust NLP tools become available. This colloquium brings together specialists in the development and application of NLP tools that specifically tackle issues related to language acquisition, use, and education. The colloquium will provide an overview of current trends and themes in NLP from a variety of linguistic perspectives as well as introduce available NLP tools and discuss their use in language research.
For instance, Scott Jarvis will discuss how a new NLP tool that measures lexical diversity (LD) can help provide a construct definition for LD in both first language (L1) and second language (L2) writers. Kristopher Kyle’s presentation will provide an overview of lexical sophistication features that can be used to predict L2 lexical proficiency, writing quality, and speaking proficiency, as well as introduce a new NLP tool for assessing lexical sophistication. Lastly, Xiaofei Lu will discuss how new NLP approaches to syntactic complexity can incorporate functionally appropriate uses of linguistic features within writing contexts.
Overall, this colloquium will provide a synopsis of current NLP trends and tools in applied linguistics, addressing how NLP tools can be used to assess language constructs, the functional effectiveness and limitations of these tools, their implications for language teaching, and the opportunities they open up for future applied linguistics research.
Presenter: Scott Jarvis, University of Utah
Title: Automated tools for investigating lexical diversity: Exploring what writers do differently when they try to increase their LD
Lexical diversity (LD) refers to the variety of words found in samples of speech and writing. LD is of interest to applied linguists because it has been found to serve as a useful proxy for constructs such as language ability (Yu, 2010) and language dominance (Treffers-Daller, 2009). However, existing measures of LD have important shortcomings, and recent research has concentrated on defining the construct (Jarvis, 2013a, 2017) and developing and validating measures that are consistent with the construct definition (Fergadiotis, Wright, & West, 2013; Jarvis, 2013b). The purpose of the present paper is twofold: (a) to contribute to the construct definition of LD by determining which lexical properties of a text change when writers intentionally try to increase its LD, and (b) to introduce a new, automated LD tool that is particularly suited to this purpose.
The writers of the texts analyzed in this study included 26 students (15 native and 11 nonnative speakers of English).
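To make the LD construct concrete, the following is a minimal Python sketch (not Jarvis’s tool, which is introduced in the talk itself) of two familiar LD indices: the simple type-token ratio (TTR), which exhibits the well-known length-sensitivity shortcoming of traditional measures, and the moving-average TTR (MATTR; Covington & McFall, 2010), which mitigates it. The sample text, tokenization, and window size are arbitrary illustrative assumptions.

def ttr(tokens):
    """Type-token ratio: unique words / total words (tends to fall as texts grow longer)."""
    return len(set(tokens)) / len(tokens)

def mattr(tokens, window=50):
    """Moving-average TTR: mean TTR over all overlapping windows of fixed size."""
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)

# Deliberately naive whitespace tokenization, for illustration only.
tokens = "the quick brown fox jumps over the lazy dog and the fox runs".split()
print(f"TTR:   {ttr(tokens):.3f}")            # 10 types / 13 tokens = 0.769
print(f"MATTR: {mattr(tokens, window=5):.3f}")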
Presenter: Kristopher Kyle
Title: Automatically assessing multiple features of lexical sophistication with TAALES
Lexical sophistication is commonly understood as the use of “advanced” words. Sophistication has most often been defined with regard to the proportion of infrequent words in a text (e.g., Laufer & Nation, 1995; Read, 2000), under the generally accepted hypothesis that highly frequent words will be learned earlier and more easily than less frequent words (e.g., Ellis, 2002). While frequency is undoubtedly an important feature of sophistication, a number of recent studies have demonstrated that lexical sophistication is most accurately modeled when multiple complementary features are used (e.g., Kim, Crossley, & Kyle, 2018; Kyle & Crossley, 2015; Kyle, Crossley, & Berger, 2018). Automated text analysis tools such as the Tool for the Automatic Analysis of Lexical Sophistication (TAALES; Kyle & Crossley, 2015) make it possible to measure many of these complementary features automatically.
In this presentation, a review of recent literature will highlight the importance of a number of features of lexical sophistication in predicting L2 lexical proficiency, writing quality, and speaking proficiency.
Additionally, the most recent version of TAALES will be introduced.
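As a rough illustration of the frequency-based notion of sophistication described above (and not of TAALES itself), the sketch below computes the proportion of tokens falling outside a small high-frequency word list, in the spirit of Laufer and Nation’s (1995) Lexical Frequency Profile. The tiny HIGH_FREQ set is a stand-in assumption; a real analysis would draw frequency norms from large reference corpora.

# Placeholder high-frequency list; real analyses use corpus-derived norms
# covering thousands of word families.
HIGH_FREQ = {
    "the", "of", "and", "a", "to", "in", "is", "was", "it", "for",
    "on", "with", "as", "at", "by", "this", "that", "be", "are", "have",
}

def sophistication(tokens, high_freq=HIGH_FREQ):
    """Proportion of tokens NOT in the high-frequency list (higher = more 'advanced' vocabulary)."""
    return sum(1 for t in tokens if t not in high_freq) / len(tokens)

tokens = "the committee scrutinized the ramifications of the proposal".split()
print(f"Proportion of infrequent words: {sophistication(tokens):.3f}")
# committee, scrutinized, ramifications, proposal -> 4 of 8 tokens = 0.500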
Presenter: Xiaofei Lu, Pennsylvania State University
Title: Towards a functional turn in L2 writing syntactic complexity research
Syntactic complexity (SC) is commonly construed as the range and degree of sophistication of the syntactic structures used in language production (Ortega, 2003). With the advent of multiple tools for automating syntactic complexity analysis using various coarse- and fine-grained measures (Biber et al., 1999; Kyle, 2016; Lu, 2010; McNamara et al., 2014), numerous quantitative studies have examined and generated valuable insights into SC features predictive of L2 writing proficiency and development. Such studies, however, have attended primarily to the forms of syntactically complex structures rather than to whether those structures are used in functionally appropriate ways.
In this talk, I argue for the need for a functional approach to SC research that systematically examines the genre appropriateness and functional effectiveness of syntactically complex structures, and I illustrate the resources and insights functional SC research can generate using findings from a recent project. Drawing on several commonly adopted operationalizations of SC (e.g., sentence length, subordination, left embeddedness, nominalizations) and a modified version of Swales’ (2004) Create A Research Space (CARS) model for rhetorical functional analysis, this study systematically aligns the SC features identified in a corpus of social science research article introductions with the rhetorical functions they are deployed to realize. I conclude by discussing the implications of the functional approach for L2 writing pedagogy and assessment and the possibility of building on emerging research on automated rhetorical function annotation to move functional SC research forward.
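To ground the coarse-grained operationalizations mentioned above, here is a minimal sketch of two SC indices: mean sentence length and a crude subordination ratio. The keyword-matching heuristic and the SUBORDINATORS set are illustrative assumptions only; the tools cited above (e.g., Lu, 2010) rely on full syntactic parsing rather than surface matching.

import re

# Illustrative, non-exhaustive set of subordinator cues; an assumption for
# this sketch, not a standard inventory.
SUBORDINATORS = {"because", "although", "while", "since", "if", "when", "that", "which", "who"}

def split_sentences(text):
    """Naive sentence segmentation on terminal punctuation."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def mean_sentence_length(text):
    """Average number of whitespace-delimited tokens per sentence."""
    sentences = split_sentences(text)
    return sum(len(s.split()) for s in sentences) / len(sentences)

def subordination_ratio(text):
    """Heuristic count of subordinator cues per sentence."""
    sentences = split_sentences(text)
    hits = sum(1 for s in sentences for w in s.lower().split() if w in SUBORDINATORS)
    return hits / len(sentences)

sample = ("The study, which examined learner essays, found that complexity "
          "increased. Writers improved because they revised often.")
print(f"Mean sentence length: {mean_sentence_length(sample):.1f}")  # 8.0
print(f"Subordination ratio:  {subordination_ratio(sample):.2f}")   # 1.50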