The Swiss Learner Corpus SWIKO
The Swiss Learner Corpus SWIKO is compiled as part of the SWIKO/WETLAND research project at the Institute of Multilingualism in Fribourg (CH). Currently, the corpus contains over 2600 annotated texts comprising more than 170'000 tokens. These are based on oral and written productions by Swiss lower secondary school students in German, French, and English, both as languages of schooling and as foreign languages.
This website provides information about the project, particularly regarding data collection and processing, derived from a number of project-internal documents. Additionally, some use cases for the classroom, material design, and research are suggested.
The online corpus database is currently accessible via a personalized login, which can be obtained by contacting Moritz Sommet.
Introduction
Context and aims
SWIKO is a multilingual corpus currently being developed as a research project at the Institute of Multilingualism in Fribourg (CH). The project investigates selected areas of linguistic competence among Swiss (lower) secondary school language learners. The starting point for the project is the shift in foreign language (FL) education towards a communicative as well as action- and content-oriented approach focusing on language in use, which first emerged as a guideline in the form of the CEFR (Council of Europe, 2001) and is now reflected in the regional curricula (Lehrplan 21, D-EDK, 2023; e.g., Passepartout, Bertschy et al., 2015; or Plan d’études romand, CIIP, 2023) and teaching materials (e.g., New World, Arnet-Clark et al., 2013). In this context, language education “should be directed towards enabling learners to act in real-life situations, expressing themselves, and accomplishing tasks of different natures” (Council of Europe, 2020, p. 29). Language learning is therefore equated with the development of communicative language competencies. Thus, the focus has shifted away from an emphasis on grammar and lies instead on the appropriate use of vocabulary and grammatical structures (Ende et al., 2013).
While national monitoring (ÜGK, 2019) and related projects (e.g., Peyer et al., 2016) assessed the achievement of these communicative goals, SWIKO investigates what students’ linguistic competences look like under these “new” guiding principles in FL education, and which linguistic means students use when solving tasks to achieve these communicative curricular goals. Therefore, in line with classroom practice, learner language is elicited through tasks in which learners carry out linguistic macro-functions such as describing and arguing (e.g., tasks in the teaching material genial klick, Endt & Conférence intercantonale de l’instruction publique de la Suisse romande et du Tessin, 2018). The data is then processed and analysed using corpus-linguistic methods. This provides both an empirical contribution towards a better understanding of linguistic competencies in the context of the communicative approach to FL education and an evidence-based foundation for formulating realistic expectations of students’ performance with regard to formal language aspects at the end of mandatory schooling in Switzerland.
Particular attention is devoted to the interplay of various production conditions – the language (German, French, and English, both as the language of schooling (LoS) and as foreign languages (FL)), the modality (written vs. oral production), and the medium (computer- vs. paper-based) – and task design features, i.e., the text type (descriptive vs. argumentative), topic familiarity (academic vs. leisure), and structure (more or less restrictive input), in order to address the question of how changes in these variables influence the quality of a learner production.
Current scope
As of February 2024, data has been collected in three different curriculum regions and is recorded in four sub-corpora:
- SWIKO17 (French-speaking Switzerland, German 1st FL, English 2nd FL)
- SWIKO18 (German-speaking Switzerland, French 1st FL, English 2nd FL)
- SWIKO19 (German-speaking Switzerland, German and English as LoS)
- SWIKO22 (French-speaking Switzerland, German 1st FL, English 2nd FL).
Participants solved the tasks both in their language of schooling (LoS) and in their foreign languages (FL), resulting in a trilingual learner corpus in which a variety of manually and automatically annotated aspects can be analysed.
Data collection and task variation
Data was collected from students in Grades 10-12 in different Swiss regions. Pupils performed tasks in their language of schooling and foreign languages, under various conditions. Tasks involved different languages, mediums (computer or paper-based), and modalities (oral or written). To reflect the scope of tasks in language classes, eight tasks were systematically varied based on three characteristics: text type, topic familiarity, and structure.
Overview
To reflect the scope of tasks that pupils encounter in their instructed FL classes, eight tasks were developed which systematically varied by text type (descriptive vs. argumentative), topic familiarity (academic vs. leisure), and structure (open-ended vs. restrictive). The tasks were then solved under various conditions: the target language (French, German, and English), modality (speaking and writing), and medium (on the computer and on paper). These factors are further elaborated and illustrated in the next sections based on a few examples.
The tasks are based on the task concept of the TBLT approach (task-based language teaching, e.g., Ellis et al., 2020; Long, 2015), i.e., task as a work plan (in contrast to task as a process). As a result, the tasks can be characterized by the following features: i) a focus on meaning (vs. form) and ii) the gap principle (in a broader sense, e.g., information transfer), and, with certain restrictions, also iii) the mobilization of the entire linguistic repertoire and iv) a clearly defined communicative goal. The restrictions relate to the fact that some tasks were more restricted, allowing learners to adopt more of the input, and that tasks were communicatively situated only in a narrow sense (for differently accentuated task definitions see e.g., Long, 2015, chapter 5; Samuda & Bygate, 2008, chapter 5).
Participants and procedure
Data was collected among 14- to 17-year-old pupils attending Grades 10-12 at mostly mid- and high-performing levels. Pupils as well as parents were informed about the project’s purpose in advance and were given the opportunity to refuse participation.
Each participant was assigned an anonymized ID, consisting of a two-letter code for the location and an individual three-digit numeric code (e.g., Th104), which was then used consistently across modalities and tasks. All participants carried out four tasks each in two of their three languages independently within two lessons (2×45 minutes) during regular school hours. In other words, students were given 45 minutes to complete four tasks in one language, but were otherwise free to manage their time. Most of the learners completed the tasks in writing; about half on paper and the other half on the computer. In addition, four randomly selected learners per class solved the same tasks orally in a separate room. Two of them worked individually on a computer, whereas the other two took part in an interview situation with a researcher.
As of February 2024, data was collected in three different curriculum regions: Four classes in 2017 (student IDs Es and Mo) in French-speaking Switzerland with German as the first and English as the second FL, six classes in 2018 (student IDs Ba, Fr, and Th) in German-speaking Switzerland with French as the first and English as the second FL, one class in 2019 (student IDs Pf) in German-speaking Switzerland with both German and English as the language of instruction, and five classes in 2022 (student IDs Ri) in French-speaking Switzerland with German as the first and English as the second FL.
Production conditions: Language, medium and modality
The SWIKO corpus contains productions in the pupils’ LoS as well as their two FLs. For the two largest language regions of Switzerland, this implies LoS (“L1” for most participants) corpora in both French and German as well as L2/L3 (FL) corpora in German, French, and English. Following a national standardisation of the school system (HarmoS), Swiss cantons’ school systems now adhere to common structures and national standards. As such, most pupils learn one national language and English as FLs in instructional settings (with 2-3 lessons per week), starting in Grades 5 (~ age 8) and 7 (~ age 10), respectively. In the French-speaking part, pupils start to learn German first, followed by English. Pupils in the German-speaking region begin with either French or English, depending on the canton. According to the national standards developed within the HarmoS project, all students are expected to reach at least proficiency level A1.2 after Grade 8 and A2.2 (A2.1 for written skills) after Grade 11 in both FLs. Therefore, SWIKO mostly looks at lower levels of language proficiency, although individual pupils will easily surpass the minimal standards, especially at the higher levels of the educational system.
The instructions for each task were given in writing in the language of schooling, independently of the medium and target language of the task, to avoid misunderstandings due to reading comprehension problems (Barras et al., 2016; Karges et al., in press).
All tasks were developed both as a computer-based version with the software CBA Item Builder (DIPF & Nagarro, 2018b) and as an identical, paper-based (i.e., non-computerised) version.
The written paper-and-pencil tasks were distributed in individualised booklets in which the students wrote their texts (see example task SWI02 below). The prompts include a short introduction as well as instructions in the language of schooling and, where applicable, additional information (such as a graph or ideas for pro and contra arguments). Additionally, the tasks contain lines to write the answer on as well as an indication of the number of words the students are asked to write. As mentioned above, students chose themselves how much of the overall 45 minutes they wanted to allocate to each of the four assigned tasks, in both the written paper- and computer-based tests.
The computer-based written tasks provided the same information through the size of the input field in which the students wrote (see example task SWI02 below). The students completed the computer-based tasks individually in a standard internet browser on identical laptops, and the resulting texts were saved to the delivery server (EE4CBA, DIPF & Nagarro, 2018a).
For the computer-based oral tasks, students again worked individually on identical laptops in a standard internet browser, recording their answers with a headset, which were then also saved to the delivery server (EE4CBA, DIPF & Nagarro, 2018a). They were assigned the same tasks, instructions, and additional information where applicable as in the written formats, and were asked to speak for at least 100 seconds per task. A counter at the bottom helped the students determine when they had talked for long enough. However, participants could start and stop the recording by clicking a red button. Thus, they could choose when they were ready to give an answer, and were free to go on whenever they had nothing more to say, which resulted in most recordings being less than 100 seconds long.
During the oral, paper-based, one-on-one interview sessions, the human interlocutor / researcher assigned the same tasks orally, and where applicable provided additional prompts both orally and in printed form. If necessary, the interlocutor / researcher encouraged the participant to continue to speak, but did not intervene otherwise. These conversations were recorded in full with a Dictaphone.
Tasks
The tasks were systematically varied along three design features: intended text type, topic familiarity, and structure. In contrast to the conditions of production described above, these task features are didactically motivated (e.g., high expectations regarding the successive mastery of different text genres in the Plan d’études romand, CIIP, 2010), de facto more gradual (instead of dichotomous), and theoretically less consistent and uniform.
In particular, the design features are based on several concepts: text type relates to discourse type (Samuda & Bygate, 2008, p. 107), rhetorical mode (Ellis et al., 2020, p. 11), text genres (CIIP, 2010), and linguistic macro-functions (Council of Europe, 2001, p. 123). Topic refers to Hulstijn’s continuum of basic and higher language cognition (BLC and HLC, Hulstijn, 2015, chapter 4), wherein leisure or everyday life is used as a proxy for BLC and academic or school-related topics as a proxy for HLC. Finally, task structure (Samuda & Bygate, 2008, p. 108) describes the more or less restrictive input, e.g., specification of lists vs. a free self-portrait. With regard to the complex interplay of design features, Ellis et al. conclude that “[t]here is currently no theory of how the myriad design variables interact to affect task complexity” (Ellis et al., 2020, p. 348), particularly because the complexity of an individual task depends not only on its design but also on its implementation by a learner (task as a process).
As a result, four descriptive and four argumentative texts were elicited, half of which relate to academic and the other half to everyday topics, respectively. Furthermore, the first four tasks were more restricted in terms of vocabulary, whereas the last four allowed participants to answer more freely.
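The systematic 2×2×2 variation of the task design features can be sketched as follows. Which concrete combination each task ID received is an assumption made for illustration; only the restriction split (the first four tasks more restricted, the last four open-ended) is stated in the text.

```python
from itertools import product

# Hedged sketch of the 2x2x2 task design. The pairing of task IDs with
# text types and topics is illustrative; only the structure split
# (SWI01-04 restricted, SWI05-08 open-ended) follows the description.
structures = ["restricted", "open-ended"]
text_types = ["descriptive", "argumentative"]
topics = ["academic", "leisure"]

design = [
    {"task": f"SWI{i:02d}", "structure": s, "text_type": t, "topic": p}
    for i, (s, t, p) in enumerate(product(structures, text_types, topics), start=1)
]

for row in design:
    print(row)
```

Crossing the three binary features yields exactly the eight tasks described above: four descriptive and four argumentative, half academic and half leisure, with the restriction varying between the first and second half.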
Data processing
Transcriptions followed specific guidelines adapted from other projects. Oral data was transcribed using EXMARaLDA, and written data was transcribed and annotated using XMLmind. The data was tokenized, lemmatized, and POS-tagged using automated tools, with manual corrections where necessary. For all texts written in German, multi-level error annotations were added for further detail.
In a next step, all written productions were rated by trained raters based on four linguistic criteria – vocabulary, grammar, spelling, and text – using CEFR descriptors. The ratings were then analyzed using a Many-facet Rasch analysis to ensure fair and consistent evaluation.
Transcription and tagging
The transcription and annotation guidelines are based on an XML-script, which was kindly made available by the MERLIN project team. The transcription and initial manual markup were implemented by trained student assistants, who followed detailed guidelines and were in regular contact with the research team to discuss and clarify areas of uncertainty. The majority of transcriptions were checked by a second student assistant, and, if necessary, revised accordingly.
Metadata
In addition to the transcribed and annotated texts, each file (a single oral or written learner production) includes metadata on:
- the author’s gender (male / female / unknown);
- the author’s grade (Grade 10-12), where applicable including the performance level;
- the location of schooling (canton and city);
- the year of data collection;
- the languages involved (i.e., language of schooling, of text production, and where applicable additional languages spoken at home);
- the medium of production (computer- or paper-based);
- the task type;
- the transcriber.
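As an illustration, the metadata fields above could be represented as a record like the following. The field names and value conventions are hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical representation of the per-file metadata; all field names
# and example values are illustrative, not the project's actual schema.
@dataclass
class FileMetadata:
    gender: str                  # "male" / "female" / "unknown"
    grade: int                   # 10, 11, or 12
    level: Optional[str]         # performance level, where applicable
    canton: str
    city: str
    year: int
    language_of_schooling: str   # e.g. "de"
    production_language: str     # e.g. "en"
    medium: str                  # "computer" or "paper"
    task: str                    # e.g. "SWI02"
    transcriber: str             # initials, e.g. "JBE"
    home_languages: List[str] = field(default_factory=list)

# Example record for a paper-based English production by a German-schooled pupil
meta = FileMetadata("female", 11, None, "FR", "Fribourg", 2018,
                    "de", "en", "paper", "SWI02", "JBE", ["it"])
```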
Oral data
The EXMARaLDA Partitur-Editor (Schmidt & Wörner, 2009) was used to transcribe the oral data. In line with the research aims, the creation of the corpus involved orthographic (rather than phonetic) transcription. The guidelines were developed in-house based on a comparison of several transcription systems (e.g., GAT2: Selting et al., 2009) and specifically tailored to our multilingual context. Furthermore, the transcription codes capture different features to facilitate the automated POS tagging in the next step.
Each file contains the original audio recording as well as the transcript, an example of which is given in Figure 4. A first track [v] corresponds to a basic transcript (cf. Selting et al., 2009), i.e., an orthographic transcript in lower-case letters and without punctuation, which reflects the original spoken utterances as closely as possible. A second track [v-corr] contains a normalized transcript in preparation for the POS tagging, which includes upper- and lower-case letters as well as punctuation. Silent and filled pauses as well as intonation features are not included. Furthermore, existing tags are adapted to the format <starttag> … </endtag> and expanded in order to allow for better comparability with written texts. A third track [c] contains further comments and observations about the speech situation (e.g., background noises). The “paper-based” interview situations additionally contain two tracks for the interlocutor; again, one containing a basic transcript and a second one with normalized spelling. Here is an example:
Written data
The written texts were transcribed and annotated using XMLmind (Shafie, 2021). Initially, the form contains the transcriber’s initials (e.g., JBE) as well as the initials of the person who double-checked the transcription (e.g., NMU). Furthermore, the author’s ID (one uppercase and one lowercase letter for the school plus three randomly assigned digits, e.g., Th133) is recorded, as well as the task ID (SWI for SWIKO plus two digits ranging from 01 to 08, e.g., SWI02), including the author’s language of schooling (lowercase letter) and the language of the text production (uppercase letter, e.g., dE for a production by an author with German as their language of schooling, writing in English as their foreign language). Finally, the medium records whether a production was computer- or paper-based (e.g., p).
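Taken together, these identifiers fully specify a production. A minimal parsing sketch follows; the underscore-joined label format is an assumption made for illustration, not the project's actual file-naming convention.

```python
import re

# Hypothetical combined label built from the components described above
# (author ID, task ID, language pair, medium). The separator and ordering
# are assumptions, not the project's actual convention.
LABEL = re.compile(
    r"(?P<author>[A-Z][a-z]\d{3})_"      # e.g. Th133: school code + 3 digits
    r"(?P<task>SWI0[1-8])_"              # task ID, SWI01 to SWI08
    r"(?P<los>[a-z])(?P<target>[A-Z])_"  # LoS (lowercase) + target language (uppercase)
    r"(?P<medium>[cp])"                  # computer- or paper-based
)

def parse_label(label: str) -> dict:
    """Split a combined label into its named components."""
    m = LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a valid label: {label}")
    return m.groupdict()

parts = parse_label("Th133_SWI02_dE_p")
# parts["author"] == "Th133", parts["target"] == "E", parts["medium"] == "p"
```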
Each text was transcribed twice. This process made it possible to preserve relevant information from the hand-written texts while also properly preparing the data for the POS tagging. An example is given below:
In the first transcription, the so-called original text remains as close to the author’s intention as possible. The second transcript, the so-called tagged text, is enriched with a variety of tags, which then allows for POS tagging. Among other things, erroneous words (spelling errors and non-target words) are corrected, with both the original and the corrected form noted.
Annotation
Tokenization, lemmatisation, and POS tagging
The transcriptions were then transformed into CSV files using specifically developed R scripts (R Core Team, 2022) and were lemmatized and POS-tagged using TreeTagger (Schmid, 2013) and the koRpus package (Michalke, 2019). Finally, they were converted to EXMARaLDA (Schmidt & Wörner, 2009) for further analyses. For all written German texts, a target hypothesis – an orthographically and grammatically correct version of the learner text (Lüdeling & Hirschmann, 2015) – was defined, which formed the basis for a semi-automated error annotation. Furthermore, all oral German texts were error-annotated manually.
While the automatic annotation allows for an expansion of the range of available linguistic features beyond what would have been possible with time-consuming and expensive manual annotation, it should be kept in mind that automatic annotation is particularly challenging for learner language, since it often deviates considerably from target-language use at various levels of linguistic analysis, such as spelling or syntax.
Each annotated file contains several levels; an overview is given below. In the oral data, the first tier contains the basic transcript [original]. Across both modalities and therefore all data sets, the next two tiers contain a token tier [tok] as well as a lemma tier [lemma], with an additional tier each to allow for manual corrections if necessary, labelled as [ctok] and [clemma].
To allow for comparison across the different languages, all transcripts are enriched with two groups of POS tags: the language-specific POS tags that are usually used in TreeTagger (e.g., the part of speech tag set STTS for German) displayed in one tier [lg-specific POS], and, in extension, cross-language POS tags, which to a certain extent form the smallest common denominator for the German, French, and English tags used, reported in an additional tier [commonPOS]. For example, as can be seen in the transcript below (Figure 4), the common POS tag set does not differentiate between full and auxiliary verbs (VVFIN and VAFIN) as in the German STTS tag set, and therefore uses the more general VER:PRE tag for both. These annotated transcripts can then be supplemented with various annotations at both word and sentence level.
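The collapsing of language-specific tags onto the common tag set can be illustrated with a small lookup table. Apart from the VVFIN/VAFIN → VER:PRE correspondence mentioned above, the entries below are invented for illustration; the project's actual tag mapping may differ.

```python
# Illustrative sketch of mapping language-specific POS tags onto a common
# cross-language tag set. Only the VVFIN/VAFIN -> VER:PRE collapse is taken
# from the description; the other entries are invented examples.
STTS_TO_COMMON = {
    "VVFIN": "VER:PRE",  # finite full verb, collapsed onto a generic verb tag
    "VAFIN": "VER:PRE",  # finite auxiliary verb, collapsed onto the same tag
    "NN":    "NOM",      # common noun (hypothetical correspondence)
    "ART":   "DET",      # article (hypothetical correspondence)
}

def to_common(stts_tags):
    """Map a sequence of German STTS tags onto the common tag set."""
    return [STTS_TO_COMMON.get(t, "OTHER") for t in stts_tags]

print(to_common(["ART", "NN", "VVFIN"]))  # ['DET', 'NOM', 'VER:PRE']
```

The lookup deliberately falls back to a catch-all tag for unmapped input, reflecting that the common set is only a smallest common denominator of the three language-specific sets.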
Target hypothesis and semi-automated error annotation in written German productions
For each written German production, a target hypothesis – i.e., an orthographically and grammatically correct version of the learner text (Lüdeling & Hirschmann, 2015) – was created. Following the manual formulation of the target hypothesis, the error tags were assigned automatically via a specially programmed Python script. Generally, there are five types of differences between the original text and the target hypothesis:
- Change (CHA): one (or more) letter(s) or word(s) has to be changed.
- Insert (INS): one letter or word is missing and has to be added.
- Delete (DEL): one letter or word is redundant and has to be deleted.
- Move (MOVS-MOVT): the word order has to be changed, i.e., one or more words has/have to be moved.
- Word boundaries: two words were erroneously separated (i.e., have to be merged: MERGE) or erroneously written as one (i.e., have to be separated: SPLIT).
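A toy classifier for a single aligned token pair might look as follows. The alignment itself and the movement tags (MOVS/MOVT), which span several positions, are omitted; this is a simplified sketch, not the project's actual Python script.

```python
from typing import Optional

# Simplified sketch of classifying one aligned difference between the
# learner text and the target hypothesis with the tags listed above.
# Movement errors (MOVS/MOVT) require position information across the
# sentence and are therefore not covered here.
def classify(original: Optional[str], target: Optional[str]) -> str:
    if original is None:
        return "INS"    # missing in the learner text, has to be added
    if target is None:
        return "DEL"    # redundant in the learner text, has to be deleted
    if original == target:
        return "OK"     # no difference
    if original.replace(" ", "") == target.replace(" ", ""):
        # same letters, different word boundaries
        return "MERGE" if " " in original else "SPLIT"
    return "CHA"        # one or more letters have to be changed

print(classify("auf gehört", "aufgehört"))  # MERGE
print(classify("gegangt", "gegangen"))      # CHA
```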
At the orthographic level, errors are categorized as capitalization, grapheme, or word boundary errors. Capitalization errors are all errors of the type “change” and can refer to the beginning of sentences, to nouns, or to other cases. Grapheme errors are further subdivided based on whether a grapheme was missing, redundant, or changed, or whether several graphemes are affected. Capitalization and grapheme errors are tagged on a single-token basis. Finally, word boundary errors refer to words which are either erroneously split, merged, or hyphenated, and they are tagged across two or more tokens.
The grammar errors at token level are categorized according to the types of difference, including missing, redundant, and wrongly inflected or chosen words. All of these are tagged on a single token basis and include a reference to the part of speech. Furthermore, position or movement errors are tagged in two ways: once at the token level, and once spanning the whole sentence, which can contain one or more movement errors at the token level.
In order to assign the orthography tags, the script compared the tok track with the TH1 track but only for tokens which had already been marked as containing an orthographic error in the tag track during the initial transcription phase. An example is given below.
In order to assign the grammar tags, the script compared only the remaining token of the tok track with the TH1 track. An example is given below.
Rating
Training modules
Each rater was required to complete two training modules. In the first module, raters completed three tasks. The first aimed at developing a differentiated understanding of the criteria for evaluating the quality of written expression based on excerpts from the Common European Framework of Reference (Council of Europe, 2001, 2020). In order to apply their newly acquired knowledge, the second asked raters to create a self-assessment of their own writing skills in their second-strongest foreign language. Finally, through the third task, raters gained confidence in using the SWIKO rating grid (Table 6), which was compiled from the lingualevel and pre-A1 CEFR descriptors (Council of Europe, 2001, 2020; Lenz & Studer, 2008). After familiarizing themselves with the rating grid, raters solved two LearningApps tasks: first, they categorized the descriptors by content, and then assigned descriptors to levels.
The second module consisted of exemplary ratings. The first task aimed at understanding a reference performance in French as a foreign language (L1 German) based on three tasks, the transcripts of the student’s answers, and the assessment of her writing performance taken from lingualevel (Lenz & Studer, 2008). Raters were then asked to assess 6-10 learner texts using the SWIKO rating grid based on four linguistic categories (Table 6), and to write a short justification of their rating for each text in a Word document. These sample ratings were collected, compared, and discussed across all raters per language (German, English, and French as a foreign language) in order to clarify uncertainties and agree on an unambiguous rating per text.
Procedure
All written productions of the SWIKO corpus were rated between 2020 and 2023: In 2020 and 2022, a total of 42 pre-service German as a foreign language teachers from the University of Fribourg rated the German as a foreign language texts. Additionally, the 2020 group rated a handful of French as a foreign language texts and the 2022 group a handful of English as a foreign language texts to compare rater severity across languages. In order to compare rater severity across the two groups as well as to connect the raters for the Many-facet Rasch analysis, one pre-service teacher additionally rated the majority of texts from both groups in 2022, including all texts where two or more raters disagreed by more than 2 half-levels (e.g., A1.1 and A2.2). The same rater was also part of the team for the French and English as a foreign language ratings.
In 2023, 5 raters with backgrounds in social and educational studies rated all the written French as a foreign language texts, and 3 of them additionally rated all the written English as a foreign language texts. Each rater received a set of anonymized learner texts in .txt format, which had been transcribed as closely to the original as possible, i.e., including spelling errors and non-target words. They were asked to first read through all the texts in each set and then rate them based on four linguistic criteria: vocabulary, grammar, spelling, and text. For the majority of learner texts, the raters simply noted their rating in an Excel sheet, while for a small number of learner texts, they additionally wrote a short text justifying their decision.
- In 2020: per rater approx. 20 texts rating only + 15 texts with justification (mostly DaF + a few FLE; 20 raters), with texts based on two different tasks
- In 2022: per rater approx. 40 texts rating only + 6 texts with justification (mostly DaF + a few EFL; 22 raters), with texts based on three different tasks
- In 2023: per rater
  - approx. 100-130 FLE texts rating only (5 raters), with texts based on 4-6 different tasks
  - approx. 310 EFL texts rating only (3 raters), with texts based on all 8 different tasks
  - 285 DaF texts rating only (1 rater), with texts based on all 8 different tasks
The texts were rated according to four linguistic criteria: vocabulary, grammar, spelling, and text (Table 6) based on CEFR scales ranging from Pre-A1 to B2+ using descriptors from lingualevel (Lenz & Studer, 2008) and the Common European Framework of Reference companion volume (Council of Europe, 2020). Vocabulary focused on the breadth and depth of words used, grammar on grammatical patterns and structures such as conjugation, spelling on orthographic accuracy, and text on cohesion and syntactic patterns.
Many-facet Rasch analysis
The ratings were analysed via a Many-facet Rasch analysis (Eckes, 2015; Linacre, 1994) using the software FACETS (Linacre, 2022). A separate three-facet analysis (texts × raters × the four linguistic criteria) was conducted for each language (English, French, and German as a foreign language). Before the analysis, the CEFR-based rating scores from Pre-A1 to B2+ were converted to a scale from 1 to 8; after the analysis, the fair scores were rounded and converted back to CEFR levels.
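The conversion between CEFR labels and the 1-8 numeric scale can be sketched as follows. The exact labels of the intermediate steps above A2.2 (B1.1, B1.2) are assumptions based on the lingualevel half-level scheme, not stated in the text.

```python
# Sketch of the score conversion: CEFR levels from Pre-A1 to B2+ mapped
# onto 1-8 for the Rasch analysis, and fair scores rounded back to CEFR
# levels afterwards. The B1.1/B1.2 labels are assumptions.
LEVELS = ["Pre-A1", "A1.1", "A1.2", "A2.1", "A2.2", "B1.1", "B1.2", "B2+"]

def to_numeric(level: str) -> int:
    """Convert a CEFR level label to its position on the 1-8 scale."""
    return LEVELS.index(level) + 1

def to_cefr(fair_score: float) -> str:
    """Round a fair score on the 1-8 scale back to a CEFR level label."""
    return LEVELS[round(fair_score) - 1]

print(to_numeric("A1.2"))  # 3
print(to_cefr(5.19))       # A2.2
```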
In German, four raters (R09, R18, R20 and R22) were consistently excluded due to unsatisfactory infit and outfit measures. Furthermore, a comparison of rater severity across the languages revealed that the German raters were less strict overall and the French raters more strict at the A1.2 level. Therefore, the fair scores of all German texts were lowered (by 0.05; for example, from 5.19 to 5.14), whereas the fair scores of French texts were decreased at the lower end of the A1.2 range (by 0.05 between 2.50 and 3.00; for example, from 2.61 to 2.56) and increased at the upper end (by 0.05 between 3.00 and 3.50; for example, from 3.19 to 3.24).
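The post-hoc severity corrections can be expressed as a small function on the 1-8 fair-score scale. The handling of the boundary values 2.50, 3.00, and 3.50 is an assumption, as the text does not specify whether the intervals are inclusive.

```python
def adjust(fair_score: float, language: str) -> float:
    """Apply the cross-language severity corrections (1-8 fair-score scale).

    Boundary handling at 2.50 / 3.00 / 3.50 is an assumption.
    """
    if language == "German":
        # German raters were less strict overall: lower all fair scores.
        return round(fair_score - 0.05, 2)
    if language == "French":
        # French raters were stricter around the A1.2 level.
        if 2.50 <= fair_score < 3.00:
            return round(fair_score - 0.05, 2)
        if 3.00 <= fair_score <= 3.50:
            return round(fair_score + 0.05, 2)
    return fair_score

print(adjust(2.61, "French"))  # 2.56
print(adjust(3.19, "French"))  # 3.24
```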
Publications
Articles and book chapters
Hicks, N. S. & Studer, T. (2024). Language corpus research meets foreign language education: examples from the multilingual SWIKO corpus. Babylonia Multilingual Journal of Language Education, 2, 26–35.
Studer, T. & Hicks, N. S. (in preparation). Zugang zu und Umgang mit fremdsprachlichen Lernertexten unter den Vorzeichen schulisch intendierter Mehrsprachigkeit: Befunde und Herausforderungen am Beispiel des Schweizer Lernerkorpus SWIKO. Book chapter in Schmelter, L. (Ed.).
Karges, K., Studer, T., & Hicks, N. S. (2022). Lernersprache, Aufgabe und Modalität: Beobachtungen zu Texten aus dem Schweizer Lernerkorpus SWIKO. Zeitschrift für germanistische Linguistik, 50(1), 104–130.
Karges, K., Studer, T., & Wiedenkeller, E. (2020). Textmerkmale als Indikatoren von Schreibkompetenz. Bulletin suisse de linguistique appliquée, No spécial Printemps 2020, 117–140.
Karges, K., Studer, T., & Wiedenkeller, E. (2019). On the way to a new multilingual learner corpus of foreign language learning in school: Observations about task variation. In A. Abel, A. Glaznieks, V. Lyding, & L. Nicolas (Eds.), Widening the Scope of Learner Corpus Research. Selected papers from the fourth Learner Corpus Research Conference (pp. 137–165). Presses universitaires de Louvain.
Talks
Studer, T. & Hicks, N. S. (2024). Korpora konkret: Nutzung eines mehrsprachigen Lernerkorpus im schulischen DaF-Unterricht [presentation]. RPFLC, Fribourg.
Hicks, N. S. & Studer, T. (2024). «Da muss einfach mehr Fleisch an den Knochen» - Ein Gespräch über den Nutzen des Lernerkorpus SWIKO für den Fremdsprachenunterricht [Interview]. CEDILE.
Hicks, N. S. (2023). Lexical features in adolescents’ writing: Insights from the trilingual parallel corpus SWIKO [presentation]. Workshop on Profiling second language vocabulary and grammar, University of Gothenburg.
Liste Lamas, E., Runte, M., & Hicks, N. S. (2023). Datenbasiertes Lehren und Lernen mit Korpora im Fremdsprachenunterricht [workshop]. Internationale Delegiertenkonferenz IDK, Winterthur.
Studer, T. & Hicks, N. S. (2022). The interplay of task variables, linguistic measures, and human ratings: Insights from the multilingual learner corpus SWIKO [presentation]. European Second Language Acquisition Conference, Fribourg.
Weiss, Z., Hicks, N. S., Meurers, D., & Studer, T. (2022). Using linguistic complexity to probe into genre differences? Insights from the multilingual SWIKO learner corpus [presentation]. Learner Corpus Research Conference, Padua.
Studer, T., Karges, K., & Wiedenkeller, E. (2019). Machen Tasks den Unterschied? Ein korpuslinguistischer Zugang zur Qualität von Lernertexten in den beiden Fremdsprachen der obligatorischen Schule [presentation]. Studientag VALS-ASLA: Mehrschriftlichkeit im Fremdsprachenerwerb, Brugg.
Karges, K., Wiedenkeller, E., & Studer, T. (2018). Task effects in the assessment of productive skills – a corpus-linguistic approach [poster]. 15th EALTA conference, Bochum.
Teaching and material design
The following article provides an example of how SWIKO can be used to create teaching material for German as a foreign language classes at lower secondary school levels based on the case of negation in German.
Hicks, N. S. & Studer, T. (2024). Language corpus research meets foreign language education: examples from the multilingual SWIKO corpus. Babylonia Multilingual Journal of Language Education, 2, 26–35.
Download: worksheets on negation for German as a foreign language (DaF) classes
...to be continued...
Following Granger’s (2015) Contrastive Interlanguage Analysis, the SWIKO corpus opens up a wide variety of interesting research avenues, considering:
- task design features (text type, topic familiarity, structure) and production conditions (language, medium, modality)
- linguistic properties of the resulting productions (e.g., based on the CAF framework by Bulté et al., 2012: complexity, accuracy, fluency)
- CEFR ratings
...to be continued...