Degrees of difficulty to learn an additional language: the role of typological distinctions and linguistic distances between the languages involved

Abstract

Transfer plays an important role in second language acquisition (SLA). Humans appear to perform well quickly on new tasks that resemble familiar ones, such as learning an additional language that is similar to a previously learned one. Conversely, the difficulty of learning a new language depends on the typological distinctions and the linguistic distance between the languages involved.

New approaches are currently being developed that might present opportunities for closer understanding of the learning mechanisms underlying transfer in SLA. Big data and its tools often play an important role in these developments and their applications. These include, for instance:

  • Language testing institutions can provide access to language proficiency testing scores for learners with diverse language backgrounds and learning trajectories (age and exposure).
  • Educational technology generates similarly large numbers of constructions from language learners with diverse backgrounds.
  • Typological databases make it easier to quantify linguistic differences across many languages.
  • Computational tools for SLA research, such as mixed-effects modeling, NLP, and Bayesian modeling, are increasingly available.

These innovations have resulted in new theoretical perspectives that quantify the roles of linguistic similarities or distances. Recent research suggests that the linguistic starting points of the learners determine aspects across all domains of language proficiency. Varying types of similarity also seem to have varying impacts on language learnability. This colloquium showcases new research on the different types of similarity and highlights the implications for additional language learning.

Capturing the role of L1 experience in L2 learning
Florian Jaeger, University of Rochester

An adult learner’s native language (L1) has a tremendous influence on the difficulty they experience when acquiring a second or other language (L2). Recent estimates attribute as much as 69% of the explained variance in L2 speaking proficiency to learners’ L1 background (Schepens et al., 2019). However, how to best describe or model the influence of L1 knowledge and experience on L2 learning has remained a challenge. This holds, in particular, with regard to the most fine-grained aspects of L1 knowledge, such as the implicit knowledge about the mapping from phonological categories or words onto acoustic dimensions (e.g., voice onset timing, energy formants), that is, the knowledge that allows listeners to recognize the basic building blocks of language.

I will use a related but simpler learning problem, native speakers’ adaptation to an unfamiliar foreign accent, to demonstrate how Bayesian inference provides an effective way to model previous experience and its effect on learning. Computational models that implement Bayesian inference (ideal adapters, Kleinschmidt & Jaeger, 2015) allow us to make testable predictions about how a learner’s implicit knowledge (or, in Bayesian terminology, beliefs) changes with exposure to unfamiliar input (e.g., from a novel language or an unfamiliar foreign accent), and how these changes are predicted to affect, for example, comprehension (Xie et al., in progress).
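The kind of belief updating described above can be illustrated with a deliberately simplified sketch. It assumes a single acoustic cue (for example, voice onset time in milliseconds), a Normal prior over the category mean, and a known noise variance; these simplifications, the function name, and all numbers are illustrative assumptions and do not reproduce the ideal adapter model itself.

```python
def update_category_mean(prior_mean, prior_var, observations, noise_var):
    """Return the posterior mean and variance of a category mean.

    Assumes a Normal prior over the mean and Normal observation noise
    with known variance (the standard conjugate Normal-Normal update).
    """
    n = len(observations)
    if n == 0:
        # No exposure: beliefs stay at the prior.
        return prior_mean, prior_var
    sample_mean = sum(observations) / n
    # Precisions (inverse variances) add up under this model.
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    # Posterior mean is a precision-weighted average of prior and data.
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / noise_var)
    return post_mean, post_var


# Hypothetical listener: prior belief about a cue mean of 60 ms,
# then four exposures at 40 ms from an unfamiliar talker.
mean, var = update_category_mean(60.0, 100.0, [40.0, 40.0, 40.0, 40.0], 100.0)
print(round(mean, 2), round(var, 2))
```

In this sketch, a stronger prior (smaller prior variance) makes the belief shift more slowly with exposure, which is one simple way to think about how previous experience constrains learning.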

The advantages of explaining learners’ L2 Dutch language variation by means of L1-Ln lexical, morphological, and phonological distance measures
Job Schepens, Freie University; Frans van der Slik, Radboud University; and Roeland van Hout, Radboud University

We studied the impact of three L1 to additional language (Ln) Dutch distance measures on the speaking test scores of more than 50,000 adult learners of Dutch: lexical distance, morphological distance and phonological distance. Lexical distance is an absolute measure that expresses branch lengths in a phylogenetic language tree based on expert cognacy judgements of words in Swadesh lists (Gray & Atkinson, 2003). Morphological distance is a relative measure relating the properties of an L1 to the properties of an Ln (Ln Dutch) that is based on selected morphological features as described in WALS, used by Lupyan and Dale (2010). Phonological distance is a relative measure, too, relating the new features for an L1 to the features of an Ln (Ln Dutch). This measure is based on selected phonological features as described in PHOIBLE (Moran, McCloy & Wright, 2014) (Schepens, Jaeger & van Hout, submitted).

The impact of the three distance measures on the acquisition of Dutch as an additional language was examined in immigrants from 49 mother tongue backgrounds, spoken in 74 countries, 20 of which were Indo-European (IE) and 13 non-Indo-European (non-IE). We found that the combination of lexical, morphological and phonological distance measures successfully yields an accumulative, unbiased, and fairly complete account of differences in Ln Dutch speaking test scores.

Linguistic typology and learnability in second language
Dora Alexopoulou (in collaboration with Xiaobin Chen and Ianthi Tsimpli), University of Cambridge

In this work we exploit recent results from linguistic typology and large datasets from online language learning to provide a typological framework for the investigation of linguistic distance based on an empirical investigation of a set of 10 typologically diverse languages. Our approach to measuring linguistic distance is syntactic, complementing recent approaches relying on lexical, morphological and phonological features (e.g. Schepens et al. 2016; Borin & Saxena 2013). Specifically, to measure linguistic distance between L1 and L2, we adopt the Parametric Comparison Method (PCM) (Longobardi & Guardiano 2009). Following the Principles and Parameters framework, PCM uses binary parameters to model cross-linguistic variation and measures distance through identities and differences in parameter values. It yields measures refined enough to differentiate between as many as 28 languages and successfully distinguish between language genealogies.
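A distance built from identities and differences in binary parameter values can be sketched as follows. The sketch assumes that each language is coded as a sequence of "+", "-", or "0" (with "0" as a hypothetical marker for a parameter that is not comparable in that language) and that distance is normalized as differences divided by the sum of identities and differences. The encoding, the normalization, and the example values are assumptions for illustration, not the authors' actual parameter tables or procedure.

```python
def parametric_distance(lang_a, lang_b):
    """Normalized distance between two binary parameter settings.

    Parameters coded "0" in either language are skipped as not
    comparable; the result is differences / (identities + differences).
    """
    if len(lang_a) != len(lang_b):
        raise ValueError("both languages must be coded for the same parameters")
    identities = 0
    differences = 0
    for a, b in zip(lang_a, lang_b):
        if a == "0" or b == "0":
            continue  # parameter not comparable for this pair
        if a == b:
            identities += 1
        else:
            differences += 1
    comparable = identities + differences
    if comparable == 0:
        raise ValueError("no comparable parameters for this language pair")
    return differences / comparable


# Hypothetical settings for five parameters in two languages.
l1 = ["+", "+", "-", "0", "+"]
l2 = ["+", "-", "-", "+", "+"]
print(parametric_distance(l1, l2))  # 1 difference out of 4 comparable: 0.25
```

Normalizing by the number of comparable parameters keeps distances in the range 0 to 1, so language pairs with different numbers of comparable parameters can still be compared.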

To obtain a dataset rich enough for the investigation of typological effects across developmental stages with significant learner numbers, we exploit advances in online learning technology. Specifically, we use the EF Cambridge Open Language Database (EFCAMDAT), an open access corpus consisting of L2 writings submitted to Englishtown, the online school of EF Education First, an international school of English as a foreign language. EFCAMDAT stands out for its size, with 1.2 million scripts totaling 71.8 million words. Available at http://corpus.mml.cam.ac.uk/efcamdat, it contains 128 distinct tasks across the proficiency spectrum and draws on learners from across the globe (170 nationalities).

Our main research question concerns the impact of linguistic distance on the acquisition of L2 features that are absent from the L1. Specifically, we focus on whether there is evidence for typological effects on the acquisition of individual features rather than language-specific effects only. We draw evidence from two phenomena, the acquisition of relative clauses and the acquisition of articles.

The Hartshorne, Tenenbaum, and Pinker data revisited
Frans van der Slik, Roeland van Hout, Job Schepens & Theo Bongaerts, Radboud University

Based on data from about two-thirds of a million speakers of English, Hartshorne, Tenenbaum and Pinker (2018) reported evidence supporting the Critical Period Hypothesis (CPH): starting to acquire English before the age of 17.4 would be critical for reaching native-like attainment. We re-analyzed the data to investigate whether language distance affects ultimate attainment in Ln acquisition, because ultimate attainment is the level of acquisition they set out to measure.

We first came to the conclusion that their claim about the CPH is unwarranted and rests on at least two fundamental analytical flaws. First, rather than using individual data, Hartshorne, Tenenbaum and Pinker performed their analyses on aggregated data, thereby ignoring the massive random variation in their data. Second, the authors did not analyze the data of immersion learners separately, but also included non-immersion learners (as well as monolinguals and bilinguals). The vast majority of these non-immersion learners presumably learned English in high school in a non-English environment, which fits neatly with the existence of a discontinuity in learning rate at 17 or 18 years of age (see Flege, 2018).

We claim that the evidence supporting the CPH is an artefact of the inclusion of these non-immersion learners. A re-analysis of the data of 14,650 immersion learners did not reveal a discontinuity in learning rate when learners grow into late adolescence. We rather found support for a gradual decrease in learning rate, which is in line with the Life Span Theory of Cognitive Development. In addition, there is no clear distance effect and we will investigate the reasons for its absence in this dataset.

Discussant: Prof. Scott Jarvis, University of Utah