Why I (Almost) Never Assess Speaking

So this was asked on a forum recently and, as usual, it got me thinking.

This is a question about “El Internado,” but, really, it applies to anything we do in a language class.  We read/ask a story/do a Movietalk or Picturetalk, etc, and then we want to assess speaking, comprehension, etc.

My response to this question is don’t bother assessing speaking.

But first, a qualifier:  if our Board/school/dept. etc says we absolutely MUST assess speaking, well, then, go for it.  We do what we have to do to keep our job.  But if we don’t have to assess speaking, don’t.  Here is why.

  1. The info we gain from this cannot generally guide instruction, which is the point of any assessment (other than at the very end of the course).  The reason for this is very simple: what will we do if what we learn from assessment varies wildly (which it almost certainly will)? If Samba has problems with the pretérito verb tense, Max doesn’t understand questions with pronouns, and Sky can fluidly ask and answer anything, how are we going to design future instruction around that info?  How are we going to “customise”  reading/stories, etc to give 30 different kids the input they need?  Answer:  we can’t.
  2. This takes forever.  If we have 30 kids in our class, and we can assess them in three minutes each (which is tough) we are spending 90 min alone on speech assessment.  That’s a period and a half!  During this time, we have to design something else for them to do…and good luck having 29 kids– whose teacher is “distracted” by sitting in the corner assessing speech– staying on task for 60 minutes.
  3. We already know how well they speak.  If we are doing regular PQA– personalised questions and answers (basically, asking the class members the same questions we are asking the actors)– we know exactly how well each kid can talk.  So why waste time with a formal assessment?  In my Spanish 1 right now, Ronnie can only do y/n answers to questions, while Emma Watson (aka Kauthr) speaks fluid sentences, and so does Riya, while Sadhna mixes up present and past tense in her output (but understands tense differences in questions) etc.
    Indeed, this is where feedback to the teacher is useful. If—in the PQA moment—I see that Sadhna mixes up past and present in answers, I can guide PQA around that right then and there.
  4. In terms of bang-for-buck, we are going to get way more results from more input than from assessing speech.  We acquire language not by practising talking etc, but by processing input, as Bill VanPatten endlessly reminds us.  I used to do regular “speaking tests” and they did nothing and the info was useless.  Now, I never test speaking until the end of the course, and the kids speak better, mostly because the wasted time now goes into input.
  5. A question that comes up here, regarding assessing speech post-Internado, is, what are we testing the kids on?  Are they expected to remember content– names, events, “facts” etc– from the show?  Or are we assessing speech generally?  In my opinion, “content” should be off-limits: we are building language ability, not recall.  In terms of language ability, one of the problems with assessing right after specific content (eg some of El Internado) is that, since this input is generally not very targeted, we don’t have much of a guarantee that the kids are getting enough exposure (in a period or two) to “master” or acquire anything new.  This is to say, while an episode may be 90% or even 100% comprehensible, thanks to the teacher’s guidance etc, it almost never focuses on a specific vocab set.  In a classic T.P.R.S. story, the teacher makes sure to restrict (shelter) vocab used in order to maximise the number of times each word/phrase/etc is used.

    This holds whether s/he has a plan, or, as in totally “untargeted” story creation à la Ben Slavic, the kids are totally driving the bus.  As a result, the odds of the kids picking up specific “stuff” from the story– in the short term, which is the focus of the question– are greater (and greater still if the asked story is followed by reading, Movietalk and Picturetalk) than if the input is familiar but untargeted.

  6. What about the kid who missed some of (in this case) El Internado? If the speaking assessment focuses on Internado-specific vocab, it would (in my opinion) be unfair to ask Johnny who was there for all three periods and Maninder, who missed two of three periods, to do the same thing with the “language content” of the episodes.
  7. Kids hate speaking and tests.  Anything I can do to avoid tests, or putting people on the spot– which a one-on-one test does– I do.
  8. “Authentic content” eg El Internado has lots of low-frequency vocabulary. Sure, the teacher can keep things comprehensible, but there is inevitably kids’ mental bandwidth going into processing low-freq vocab…which is exactly what kids don’t need in a speaking assessment, where you want high-freq vocabulary that is easy to recall and applicable to lots of topics.
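
For anyone who wants to plug in their own class size, here is a rough sketch of the time arithmetic from point 2.  The 30 kids and three minutes each come from above; the 60-minute period is my assumption (it is what makes 90 minutes “a period and a half”), and the function names are made up for illustration.

```python
def assessment_minutes(students: int, minutes_each: float) -> float:
    """Total class time spent on one-on-one speaking tests."""
    return students * minutes_each


def periods_needed(total_minutes: float, period_minutes: float = 60) -> float:
    """How many class periods that time eats up (60-minute period is an assumption)."""
    return total_minutes / period_minutes


total = assessment_minutes(students=30, minutes_each=3)
print(total, "minutes, or", periods_needed(total), "periods")
```

Swap in your own numbers: a bigger class or a longer test only makes the cost worse.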

Anyway…this is why I save speaking assessment until the end of the course: I know how well my kids can speak, I can adjust aural input where it matters– right now–, I don’t want assessment to detract from input, and speaking assessment doesn’t really help me or my kids.




More Notes on Feedback

Amy Lenord started a great Twitter discussion about how one encourages language learners to process language.  This eventually led to Martina Bex referring us to her excellent “I am a grammar geek” post, in which she talks about how much she loved– and found effective– the “red ink” from her Spanish profs in Uni. Bex and I very briefly discussed this.  (I will bet that when she has a spare moment– and she is a Mom again, congrats!– she’ll discuss this more.  Ha!)

Now, anyone who knows Bex knows that the basic deal with her is that what she wants done, she gets done.  Bex wants babies? Bex has four (at last count).  Bex wants to acquire Spanish?  Bex signs a months-long “no English” agreement with her room-mate!  Bex wants to master C.I.?  Bex does, in like two years of teaching.

So it is not surprising that she acquired a ton of Spanish in very short order in Uni. 

Again: she wanted, liked & felt she benefited from corrective feedback  in her Spanish classes. 

This raises two questions:  did the feedback she got actually help her, and, if so, why and how?

Well, let’s take Martina’s word for it, and say, sure, corrections and comments helped.  Now, how?

Well, suppose young Bex– or anyone else– wrote this on their Spanish 201 composition:

*  Ayer, yo fue al cine con mis amigos, y vimos una película.

This should be “yo fui,” and say her prof writes that on her paper.  Now, what happens next?

  1. Bex notes there is an error.
  2. Bex re-reads the sentence: yo fui al cine.

Most of our students will not even do #1.  Most will go straight to the mark, wondering  what did I get?  did I get an A?

Some will note, ok, there was an error.

A very few will re-read the corrected sentence, and maybe linger on it, in which case it is functioning as good comprehensible input (albeit not many repetitions).

So, why is the feedback working for Bex?  In my view, it is because

a. Bex is majorly motivated which means,

b. Bex wants feedback, and when she gets it,

c. the feedback provides comprehensible input.

Suppose the prof had written “ser takes an -i in the first-person singular.”  Would this have done Bex any good?  The research says no.  Maybe for Bex it did.  Maybe she went, hmm, yo fui al cine…

I was also recently talking to Adriana Ramírez and Luce Arsenault about giving corrections in their Sp and Fr classes.  Both maintained that their kids got better as a result of having to do corrections.  They obviously haven’t had time to do a controlled study, but we noted a few things:

  1.  Both have very motivated, mostly Asian and wealthy white kids, who have been hearing two things from their literate parents from Day 1 of school: memorise (for many Asian kids, who have had to learn zillions of Chinese characters before coming to and sometimes while in Canada), and edit (for wealthy white kids, whose parents are uber-literate, professional, etc).
  2. My kids– who are generally Hindi, Punjabi and Urdu-speaking, and have less-literate and generally non-English speaking parents, almost none of whom have any formal experience learning additional languages– have not been primed to memorise and relentlessly improve their work.  This is not to say that our parents do not value education– they do, very much– but it is to say that they have not “acquired” some of the academic habits that can sometimes work for kids in language classes.

There is a simple lesson here:  unless people want feedback, and get it, and the feedback is comprehensible input, it is not going to do any good.

So the teacher should focus not on marking and correcting, but on relaxing and reading and being happy in their spare time, so when they show up in class, they have the energy and mood to provide good C.I.– in story asking or reading or MovieTalk form– for kids.  And kids should not be forced to correct work (although if they want to, why not?).  Rather, their work should be hearing C.I. in class, and– if they must have homework– reading or viewing comprehensible and interesting target-language stuff.




Second Language Acquisition Quotes

I’ve been asked a bunch of times for these so here we go: brief quotations about what we know about second language acquisition research.  Many of these, as usual, were compiled by research rounder-upper God Eric Herman, with contributions from Terry Waltz, Stephen Krashen, Beniko Mason, Diane Neubauer, and many others.

These are broadly representative of consensus among S.L.A. researchers.  To see actual research, read this.

Missing something?  Missing or incorrect attribution?  Have something to add?  Put it into the comments or email me.

Organisation of quotes:

1. Acquisition
2. Grammar
3. Compelling Input
4. Attitude
5. Output and Correction
6. Classroom Research
7. Foreign Language Benefits
8. Curriculum
9. Time
10. Reading

1. ACQUISITION



“Comprehensible input remains the foundation of all language acquisition.” – Lightbown and Spada, 2014

“Language acquisition is a subconscious process; while it is happening we are not aware that it is happening, and the competence developed this way is stored in the brain subconsciously.” – Krashen

“We acquire language when we understand messages, when we understand what people tell us and when we understand what we read.” – Krashen

“All cases of successful first and second language acquisition are characterized by the availability of Comprehensible Input.” – Larsen-Freeman & Long, 1991, p. 142

“(T)here is a consensus among second language researchers that input is an essential component of second language acquisition.” – VanPatten, 1996, p. 13

“Language is acoustical, not intellectual.” – Berty Segal

“In underdeveloped third world countries, where bilingualism or even multilingualism is the norm rather than the exception, a second (or third) language is ACQUIRED without any reference to conscious learning or to written material.” – Ellidokuzoglu, IJFLT 2008

“[N]ot only does instruction not alter the order of acquisition, neither does practice”– VanPatten, 2013

“SLA history is not 2,000 years old but almost as old as human history and that throughout this long period, people have acquired rather than learned L2s, considering the rather short history of linguistic sciences.” – Ellidokuzoglu, IJFLT 2008

“[T]he idea that what you teach is what they learn, and when you teach it is when they learn it, is not just simplistic, it is wrong.” — Long, 1997.

“Even after puberty, the brain is elastic enough to internalize a second (or third) language basically in the same manner it picks up the first. However, since muscles regulating the articulators are somewhat fixed after a certain age, attaining a native-like accent may not be possible for some adults.” – Ellidokuzoglu, IJFLT 2008

“Learners […] have demonstrated that acquisition of the tense and aspectual systems (e.g. the use of the preterit/passé composé and the imperfect) is piecemeal and unaffected by instructional intervention.” –VanPatten & Wong, 2003

“The amount of input necessary for L1 acquisition to take place is expressed in thousands of hours of auditory input. We shouldn’t blame our students for not being able to speak when we provide them with so little comprehensible input.” – Ellidokuzoglu, IJFLT 2008

“If someone cannot properly perform a rule that he consciously knows, his performance must be based on a non-conscious knowledge system.” – Ellidokuzoglu, IJFLT 2008

“Real language acquisition develops slowly, and speaking skills emerge significantly later than listening skills, even when conditions are perfect. The best methods are therefore those that supply “comprehensible input” in low anxiety situations, containing messages that students really want to hear. These methods do not force early production in the second language, but allow students to produce when they are “ready,” recognizing that improvement comes from supplying communicative and comprehensible input, and not from forcing and correcting production.” – Krashen, 1982

“Most important, the input hypothesis predicts that the classroom may be an excellent place for second language acquisition, at least up to the “intermediate” level. For beginners, the classroom can be much better than the outside world, since the outside usually provides the beginner with very little comprehensible input, especially for older acquirers (Wagner-Gough and Hatch, 1975). In the classroom, we can provide an hour a day of comprehensible input, which is probably much better than the outside can do for the beginner.” – Krashen, 1982

“There is no need for deliberate memorization; rather, firm knowledge of grammatical rules (a feel for correctness) and a large vocabulary gradually emerge as language acquirers get more “comprehensible input,” aural or written language that is understood.” – Krashen

“Our goal in foreign language pedagogy is to bring students to the point where they are autonomous acquirers, prepared to continue to improve on their own. . . an “autonomous acquirer” has two characteristics:

● The autonomous acquirer has acquired enough of the second language so that at least some authentic input is comprehensible, enough to ensure progress and the ability to acquire still more language.

● The autonomous acquirer will understand the language acquisition process. The autonomous acquirer will know that progress comes from comprehensible input, not from grammar study and vocabulary lists, and will understand ways of making input more comprehensible (e.g. getting background information, avoiding obviously incomprehensible input).

This is, of course, the goal of all education – not to produce masters but to allow people to begin work in their profession and to continue to grow.” – Krashen, 2004

“In the end, acquisition is too complex to reduce to simple ideas. There are no shortcuts.” — Bill VanPatten

2. GRAMMAR



“[T]he brain processes syntactic information implicitly, in the absence of awareness.” (Batterink & Neville, 2013).

“We learn grammar from language, not language from grammar.” – Kató Lomb (from Polyglot: How I Learn Languages, 4th ed., p. 73). She attributes the line to the 19th-century publishers Charles Toussaint and Gustav Langenscheidt (the same), whom she paraphrases as having said “Man lernt Grammatik aus der Sprache, nicht Sprache aus der Grammatik.” (Thanks, Justin Slocum Bailey.)

“Research shows that knowledge of grammar rules is very fragile and is rapidly forgotten.” – Krashen, 1993

“studies have shown a weakening of the impact of learning after three months.” – Krashen, 2002

“Instruction does not appear to influence the order of development. No matter what order grammatical structures are presented and practiced in the classroom, learners will follow their own “built-in” syllabus.” – Ellis, 1984

“As is well-known, studies have shown that we acquire the grammar of a language in a predictable order, and this order cannot be broken.” – Krashen

“it is not at all the case that the more linguistically simple an item is, the earlier it is acquired. Some very “simple” rules may be among the last to be acquired.” – Krashen, 1982

“Teaching complex facts about the second language is not language teaching, but rather is “language appreciation” or linguistics.” – Krashen, 1982

“Consciously learned grammar is only available as a Monitor or an editor, and the constraints on Monitor use are severe: The user has to know the rule (see the complexity argument below), have time to apply the rule, and be thinking about correctness.” – Krashen

“No study has shown that consciously learned rules have an impact on Monitor-free tests over the long term.” – Krashen

“Research on the relationship between formal grammar instruction and performance on measures of writing ability is very consistent: There is no relationship between grammar study and writing.” – Krashen, 1984

“No empirical studies have provided good evidence that form-focused instruction helps learners acquire genuine knowledge of language. Moreover, many studies have found such instruction ineffective.” – John Truscott

“Second language editing actually depends far more on intuitions of well-formedness, coming from the unconscious language system, than on metalinguistic knowledge of points of grammar.” – John Truscott, 1996

“We see performers who have known a (late-acquired) rule for years, but who still fail to consistently “get it right” even after thousands of repetitions . . . On the other hand, we often see performers who have acquired large amounts of a second language with no apparent conscious learning.” – Krashen, 1981

“People who do attempt to think about and utilize conscious rules during conversation run two risks. First, they tend to take too much time when it is their turn to speak, and have a hesitant style that is often difficult to listen to. Other overusers of the Monitor, in trying to avoid this, plan their next utterance while their conversational partner is talking. Their output may be accurate, but they all too often do not pay enough attention to what the other person is saying!” – Krashen, 1982

“No meaningful support has [ever] been provided for the position that grammar should be taught.”– Long (1997)

“Structured input works as well as structured input plus explanation” – Lightbown

3. COMPELLING INPUT



“Optimal input focuses the acquirer on the message and not on form. To go a step further, the best input is so interesting and relevant that the acquirer may even ‘forget’ that the message is encoded in a foreign language.” – Krashen, 1982

“Compelling input appears to eliminate the need for motivation, a conscious desire to improve. When you get compelling input, you acquire whether you are interested in improving or not.” – Krashen

“It is possible that compelling input is not just optimal: It may be the only way we truly acquire language.” –Krashen

4. ATTITUDE


“Savignon (1976) is correct when she says ‘Attitude is the single most important factor in second language learning.’ We might even suggest that one characteristic of the ideal second language class is one in which aptitude will not predict differences in student achievement (S. Sapon, personal communication), because efficient acquisition is taking place for all students.” – Krashen, 1981

“Thus, motivational and attitudinal considerations are prior to linguistic considerations. If the affective filter is ‘up’, no matter how beautifully the input is sequenced, no matter how meaningful and communicative the exercise is intended to be, little or no acquisition will take place.” – Krashen, 1981

“Those whose attitudes are not optimal for second language acquisition will not only tend to seek less input, but they will also have a high or strong Affective Filter–even if they understand the message, the input will not reach the part of the brain responsible for language acquisition, or the language acquisition device.” – Krashen, 1982

“Studies have shown that several affective variables are related to success in language acquisition – anxiety (low anxiety is correlated with more success in language acquisition), self-esteem (more self-esteem is related to success in language acquisition), and motivation, with ‘integrative motivation,’ (a desire to belong to a certain group) related to long-term success in language acquisition (until membership is achieved), and ‘instrumental motivation’ (to accomplish a task) related to shorter term success (until the task is done).” – Krashen

“When asked what aspects of foreign language classes are the most anxiety-provoking, students put “talking” at the top of the list (Young, 1990).” – Krashen

“Finally, many classroom exercises, with their emphasis on correctness, often place the student ‘on the defensive’ (Stevick, 1976), entailing a heightened ‘affective filter’ (Dulay and Burt, 1977), which makes them less than ideal for language acquisition.” – Krashen, 1981

“Learning is most successful when it involves only a limited amount of stress, when students are relaxed and confident and enjoying their learning; but the use of correction encourages exactly the opposite condition.” – John Truscott

“the ‘elusive quality - strong motivation’ (Allen, J.P.B., 1973), combined with the right attitude towards the target language and its culture (Gardner, 1972), sustained by appropriate intellectual and physical efforts taken by the learners themselves (Kaplan, 1997) . . . can lead to successful acquisition of English as a foreign language.” – D. Sankary

“Simply hearing a second language with understanding appears to be necessary but is not sufficient for acquisition to take place. The acquirer must not only understand the input but must also, in a sense, be ‘open’ to it.”– Krashen, 1981

5. OUTPUT AND CORRECTION



“Research conducted since the early 1990s has shown that traditional approaches to teaching grammar that involve the use of mechanical, meaningful and communicative drills do not foster acquisition in the way that practice [listening/reading] with structured input does.” – VanPatten (2013)

“Peer-to-peer communication is the McDonalds of language teaching.” — Terry Waltz

“Students who learn language explicitly or through “skill building” are virtually unable to naturally produce language and rely on memorized rehearsed phrases in order to produce output.” – Dr. Stephen Krashen


“More speaking or writing does not result in more language or literacy development, but more reading does”– Krashen

“[N]ot only does instruction not alter the order of acquisition, neither does practice”– VanPatten, 2013

“Speaking has been found to be the most anxiety-provoking form of communication. (MacIntyre & Gardner, 1991; McCroskey & Richmond, 1987)” from Baker & MacIntyre (2000)

VanPatten (2013): “If input is so important, what does traditional practice do? […] essentially very little, if anything.  It does not help mental representation.  It is not clear it helps skills.”

“Adding output and correction, in fact, has been shown to make progress less efficient, not more.” – Krashen

“More output does not result in more language acquisition. For example, students in classes that demand more writing do not acquire more of the language, and students of English as a foreign language who report more speaking outside of class do not do better on the TOEFL examination; those who read more outside of class, however, do better.” – Krashen

“Children are usually allowed to go through a ‘silent period’, during which they build up acquired competence through active listening. Several scholars have suggested that providing such a silent period for all performers in second language acquisition would be beneficial (see for example, Postovsky, 1977).” – Krashen, 1981

“Thus, feedback on errors was not only unhelpful, but also harmful to learners. Those who received comments on content plus correction were significantly inferior to those who received only comments on content.” – Truscott

“Correction was not only unhelpful in these studies but also actually hindered the learning process.” – Truscott

“Oral grammar correction is a bad idea.” – Truscott, IJFLT 2005

6. CLASSROOM RESEARCH (TPRS, TPR and other C.I. methods)

“The most consistent advantages for TPRS are in developing students’ speaking, writing, vocabulary, and grammar. In all these areas, TPRS has consistently outperformed traditional teaching, and has at least equaled traditional teaching in every study.” – Karen Lichtman & Stephen Krashen

“TPRS should have advantages in retention over time, in comparison to traditional teaching. Compare TPRS students and traditional students on the same measure right before their summer break and right after their summer break.” – Karen Lichtman & Stephen Krashen

“TPR classes had only 20 hours of instruction while controls had 200 hours of instruction . . . All TPR classes, with the exception of grade five, outperformed controls after 100 hours, and the adult class, after only 20 hours, outperformed controls after 200 hours. Similar results were obtained using a reading test.” – Krashen, 1982

“Her experimental group did not speak at all for the first 14 weeks but, instead, had to produce “active responses” that demonstrated comprehension. Also, they were not forced to speak for much of the next seven weeks. The experimental group was shown to be superior to the control group in listening comprehension and equal in speaking, despite the fact that the controls had more ‘practice’ in speaking.” –Krashen, 1982

“In both first and second language development, students who participate in classes that include in-school self-selected reading programs (known as sustained silent reading) typically outperform comparison students, especially when the duration of treatment is longer than an academic year.” – Krashen

“Extremely problematic for output hypotheses was the result that the amount of ‘extracurricular writing’ and ‘extracurricular speaking’ reported were negatively related to TOEFL performance.” – Krashen

“ . . . studies consistently find that older children acquire second languages faster than younger children . . . Older children, it has been argued, have an advantage because of their greater knowledge of the world, which makes input more comprehensible, as well as more advanced levels of literacy, which transfer to the second languages.” – Witton-Davies

7. FOREIGN LANGUAGE BENEFITS



“Children who are considered ‘low achievers, and/or who have a disability,’ seem to benefit the most from foreign language study.” – Wang, Jackson, Mana, Liau, & Evans, 2010

“ . . . increasingly impressive bodies of research that document . . . the great number of cognitive, social, academic, problem-solving and practical benefits that have been observed in children who learn one or more languages in addition to their home language.” – Wang, Jackson, Mana, Liau, & Evans, 2010

“Research Findings: Second Language study:
– benefits academic progress in other subjects
– narrows achievement gaps
– benefits basic skills development
– benefits higher order, abstract and creative thinking
– (early) enriches and enhances cognitive development
– enhances a student’s sense of achievement
– helps students score higher on standardized tests
– promotes cultural awareness and competency
– improves chances of college acceptance, achievement and attainment
– enhances career opportunities
– benefits understanding and security in community and society” – NEA Research. (2007). “The Benefits of Second Language Study.”

8. CURRICULUM



“Given that verbs typically account for 20 percent of all words in a language, this may be a good strategy. Also, a focus on function words may be equally rewarding – 60 percent of speech in English is composed of a mere 50 function words.” – Davies

“Why should one do this? Nation (1990) has shown that the 4,000–5,000 most frequent words account for up to 95 percent of a written text and the 1,000 most frequent words account for 85 percent of speech.” – Davies

“We teach language best when we use it for what it was designed for: communication.” – Krashen, 1981

Below are the most-frequently used words per theme and also the extremely low-frequency words typically taught in that theme. The numbers in parentheses are the rank frequencies as calculated in Davies’ A Frequency Dictionary of Spanish (2006). Words are translated to English.

Theme: most frequent word (rank) / low-frequency word (rank)
Colors: white (250) / orange (8225)
Animals: horse (780) / elephant (4945)
Body: hand (150) / ear (2407)
Food: meat (787) / carrot (7602)
Clothing: suit (1710) / t-shirt (4427)
Family: son (166) / niece (5071)
Days: Sunday (1121) / Tuesday (3490)
Months: August (1244) / September (2574)
Sports: soccer (2513) / hockey (28388)
Weather: heat (989) / breeze (5493)
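
If you want to play with these numbers, here is a minimal sketch that checks which of the listed words fall inside a given frequency cut-off.  The ranks are copied from the list above; the cut-off of 1,000 (borrowed from the Davies quote about the 1,000 most frequent words) and the names `THEME_RANKS` and `within_top` are just my choices for illustration.

```python
# Rank frequencies copied from the theme list above (lower rank = more frequent).
THEME_RANKS = {
    "Colors": [("white", 250), ("orange", 8225)],
    "Animals": [("horse", 780), ("elephant", 4945)],
    "Body": [("hand", 150), ("ear", 2407)],
    "Food": [("meat", 787), ("carrot", 7602)],
    "Clothing": [("suit", 1710), ("t-shirt", 4427)],
    "Family": [("son", 166), ("niece", 5071)],
    "Days": [("Sunday", 1121), ("Tuesday", 3490)],
    "Months": [("August", 1244), ("September", 2574)],
    "Sports": [("soccer", 2513), ("hockey", 28388)],
    "Weather": [("heat", 989), ("breeze", 5493)],
}


def within_top(n, ranks=THEME_RANKS):
    """Return the listed words whose rank is n or better, most frequent first."""
    hits = [
        (rank, word)
        for pairs in ranks.values()
        for word, rank in pairs
        if rank <= n
    ]
    return [word for rank, word in sorted(hits)]


# Illustrative cut-off: which of these 20 words are in the top 1,000?
print(within_top(1000))
```

Even the “most frequent” word in several themes (suit, Sunday, August, soccer) does not make that cut, which is the point of the list.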

More than 300 words are more frequent than the numbers 6 through 10, and the numbers 13 through 19 are not among the 1,000 most frequently used Spanish words. In fact, only the numbers one and two are among the 100 most frequently used words.

9. TIME



“Our research shows that after 630 to 720 hours of instruction, or about midway through the fourth year of study, approximately 14% of students can read at the Intermediate-Mid level or better. Approximately 16% can write and 6% can speak at this level.” – Center for Applied Second Language Studies, 2010

The ACTFL Performance Guidelines for K–12 Learners (Swender & Duncan, 1998) propose elementary programs that meet from 3 to 5 days per week for no less than 30–40 minutes per class; middle school programs that meet daily for no less than 40–50 minutes.

10. READING



“Without a reading habit children simply do not have a chance.” – Krashen

“The best way to improve your knowledge of a foreign language is to go and live among its speakers. The next best way is to read extensively in it.” – Christine Nuttall, 1996

“For maximum vocabulary development, learners need to read all along the way, since most vocabulary development in both L1 and L2 is incidental, meaning that vocabulary is learned as a by-product of some other intention (normally reading).”– VanPatten

“People acquiring a second language have the best chance for success through reading.” – Krashen

“The best way to improve in a foreign language is to do a great deal of comprehensible, interesting reading. The case for self-selected reading for pleasure is overwhelming.” – Mason

“What is probably the best-supported way of improving language competence is rarely mentioned in the professional literature: wide recreational reading, or ‘free voluntary reading.’ ” – Witton-Davies

“Those who read more, write better” – Krashen

“Free voluntary reading may be the most powerful tool we have in language education. In fact, it appears to be too good to be true. It is an effective way of increasing literacy and language development, with a strong impact on reading comprehension, vocabulary, grammar, and writing.” – Krashen

“Incidental learning of words during reading may be the easiest and single most powerful means of promoting large-scale vocabulary growth.” – Nagy & Herman

“The second language student needs massive amounts of comprehensible, interesting reading material, enough so that he can read for pleasure and/or interest for an hour an evening, if he wants to, for several months.” –Krashen

“Picking up word meanings by reading is 10 times faster than intensive vocabulary instruction.” – Krashen

“Free reading is also an excellent source of knowledge: those who read more, know more.” – Krashen

“There is overwhelming evidence for recreational reading as a means of increasing second-language competence. In fact, it is now perhaps the most thoroughly investigated and best-supported technique we have in the field of second-language pedagogy.” – Krashen

“Many studies confirm that those who read more write better . . . it is reading, not instruction, that helps us develop a good writing style.” – Krashen, IJFLT 2005

“The success of pleasure reading thus depends on the reader’s willingness to find material at his level and reject material that is beyond him.” – Krashen, 1982

“Hirsch and Nation (1992) claim that in order to reach text comprehension, readers and listeners need to be familiar with 85% of the words in a text.” – Thornber

“the source of good writing style, the vocabulary, syntax and discourse structure of the written language, is reading.” – Lee & Hsu

“Students who had a pleasure reading habit easily outperformed those who were not readers on a test of grammar and on a test of reading and writing.” – Ponniah, IJFLT 2008

Bad science meets questionable usefulness: Lyster (2004a) on prompting feedback

McGill University professor Roy Lyster gave the British Columbia Language Coordinators’ Association annual conference talk in 2015 about best practices in the French Immersion classroom. He specifically mentioned that form-focused instruction and feedback were essential for acquisition of second languages.  Well, THAT got me wondering so I went and did what a sane guy does of a fine Sunday: I went climbing and then I read his paper.

Lyster has done a very good job in terms of his research, controls, etc.  Unlike Bowles and Montrul (2008), Lyster did very good science.  But, as we shall see, there are a lot of problems with his conclusions.  Let’s have a look.

To sum it up, Lyster– following Ellis, DeKeyser et al.– argues that there needs to be some “focus on form”– explanations about language (as well as activities that make learners process that language)– in a language classroom in addition to meaningful language itself, because without some “focus on form,” acquisition of some items fossilises or goes wrong.

Lyster noted that English-speaking kids in French immersion were not picking up French noun gender very well.  There are a bunch of reasons for this.  Noun gender is of almost zero communicative significance and so acquirers’ brains pay it little attention, and Immersion students are typically exposed to native-speaker generated/targeted materials which do not foreground grammatical features.  Noun gender acquisition is a classic study question because French has it and English does not.  Lyster’s question was, “can form-focused instruction (FFI) centered on noun gender improve noun gender acquisition?”  FFI involved a bunch of instruction about noun gender (basically, how to work out a noun’s gender from its ending, which in French is fairly regular), plus various practice decoding activities.  Lyster set up four groups:

  1. a control group which got regular content teaching.
  2. another group that got (1) plus form-focused instruction (FFI; explanations) only
  3. a second group got (1) plus FFI plus recasts (errors being “properly resaid” by teacher)
  4. a third group got (1) plus FFI (explanations) plus prompts (e.g. the teacher asking un maison ou une maison? after hearing students make noun gender errors); these prompts were designed to get students to reflect on and then output the targeted form

The reasoning for prompts is to “force” the learner to bring “less used” (and improperly or not-yet acquired) stuff into the mental processing loop.  Note that this is a technique for advanced learners– those who have a ton of language skill already built up– and would, as Bill VanPatten has noted, overload any kind of beginner learner.

The results, basically, were that the FFI + prompt group did way better than the others on both the immediate and the 2-month delayed post-test.  Post-tests included both choosing the proper form and producing the proper form.

So, prima facie, Lyster can make the following argument:

“The present study thus contributes to theoretical arguments underpinning FFI by demonstrating its effectiveness when implemented in the context of subject-matter instruction within an iterative process comprising three inter-related pedagogical components:

  1. Learners are led to notice frequent co-occurrences of appropriate gender attribution with selected noun endings, contrived to appear salient by means of typographical enhancement
  2. Learners’ metalinguistic awareness of orthographic and phonological rules governing gender attribution is activated through inductive rule-discovery tasks and metalinguistic explanation
  3. Learners engage in complementary processes of analysis and synthesis (Klein, 1986; Skehan, 1998) through opportunities for practice in associating gender attribution with noun endings.”

Lyster claims that his results contribute to the “theoretical arguments underpinning FFI.”  He is right.  And here is the crux: while he can make theoretical puppets dance on experimental strings, what Lyster does in this paper will never work in a classroom.  Here are the problems:

First, the bandwidth problem, which is that for every acquisitional problem a teacher focuses on “solving,” another problem will receive less attention, because the amount of time/energy we have is limited, and so tradeoffs have to be made.  In this case, Lyster decided that a worthy problem was noun gender acquisition.  So, materials were made for that, time was spent practising that, and teachers focused recasts or prompts on that.  The students got 8-10 hours of FFI.

The question: what did they “de-emphasise” in order to focus on noun gender?  Lyster does not address this.  Was Lyster’s testing instrument designed to catch changes in other errors that students made?  No– it looked specifically at noun gender.  It is possible, indeed, it is almost certain, that the FFI resulted in other grammar or vocab content being downplayed.  Lyster’s testing instrument, in other words, was not holistic: he looked only at one specific aspect of language.

An analogy may be useful here.  A triathlete needs to excel in three sports– swimming, cycling and running– to win.  She may work on the bike until she is a drug-free version of Lance Armstrong. But if she ignores– or undertrains– the swimsuit and the runners, she’ll never podium.  An economist would say there is an opportunity cost: if you invest your money in stocks, you cannot buy the Ferrari, and vice versa.

Second is what Krashen called the constraint on interest problem.  By focusing instruction (or vocab) around a grammar device, we have much less room as teachers to deliver either an interesting variety of traditional “present, practice, produce” lessons or T.P.R.S. or A.I.M.-style stories.   Imagine deciding that since the kids have not acquired the French être avec le passé composé, you must build every activity around that.  How quickly will the kids get bored?  Je suis allé aux toilettes.  Est-ce que tu es allé à l’école? etc.  In T.P.R.S. (and in A.I.M.), stuff like this is in every story, but as background, because it’s boring.   It’s like saying, “paint but you only have red and blue.”

Third is the rule choice problem.  Since, as noted above, we can’t deal with every not-yet-acquired rule, we have to choose some items and rules over others.  Which will they be?  How will we decide?  What if teachers came up with a list of a hundred common errors that 6th grade French immersion kids made?  Which errors should they focus on?  How should materials be built– and paid for– to deal with these?  What if Professeur Stolz couldn’t give a rat’s ass about French noun gender, but Professeur Lyster foams at the mouth on hearing “une garçon”?

Fourth, Lyster’s study does not take into account individual learning needs.  OK, all of the subjects in the 4th group got better with noun genders (temporarily, and with prompting).  But was this the most pressing issue for each person?  What if Max hasn’t acquired the passé composé?  What if Samba is OK with noun gender but terrible with pronouns?  When you use a grammar hammer, everything looks like the same nail.  Noun gender is not very important.  It’s like stripping a car: no brakes and the whole thing crashes; but no hood ornament only looks bad.  Noun gender is the hood ornament of French: looks good but hardly essential.

The problem with a study like Lyster’s– or a legacy-methods program that tries to systematically do what Lyster did– is that it reduces the multidimensionality of both the classroom language and activities and the teacher’s feedback, with the effect of impoverishing input.  If Max needs passé composé and Samba pronom input, and the experiment focuses activities, learning strategy instruction and teacher feedback on noun gender, the experiment’s focus inevitably cuts down on input they need as it plays up noun gender stuff.  As Susan Gross has argued, a comprehensible input classroom is going to solve that problem: by presenting “unsheltered” language– language with no verb tenses, pronouns or other grammatical features edited out– everything learners need is always in the mix.

Fifth, and most seriously, Lyster’s results do not– could not– pass Krashen’s “litmus test” for whether instructional interventions produce legitimate acquisition.  Krashen has said that if you really want to prove that your experimental treatment trying to get language learners to acquire __________ has worked, your results must meet the following criteria:

  • they must be statistically significant not just right after treatment, but three months later
  • they must occur unprompted (what Krashen calls not involving the Monitor)

The three-month delayed post-test is there to show that the intervention was “sticky.”   If it’s been acquired, it will be around for a long time; if it’s consciously learned, it will slowly disappear.  You can check the reasonableness of this by looking at your own experiences– or those of your students– and asking how well does language teaching stick in my or my kids’ heads? (Teachers who use T.P.R.S. know how sticky the results are: we do not need to review.  Legacy-methods teachers have to do review units.)  So what are Lyster’s study’s two most serious problems?

First, Lyster did a two month delayed post-test, so we don’t really know how “sticky” the FFI results were.

Second, Lyster’s assessment of results is largely Monitor-dependent. That is, he tested the students’ acquisition of noun gender when they had time to think about it, and under conditions where the experimenters (or test questions) often explicitly asked whether or not the noun in question was masculine or feminine. Given that the experimental kids had had explicit treatment, explanations etc about what they were learning– noun gender– it is not surprising that they were able to summon conscious knowledge to answer questions when it came assessment time.

At one point in his study, Lyster’s investigators found out that the students being tested had figured out what the investigators were after– noun genders– and had developed a word that sounded like a mix of “un” and “une” specifically to try to “get it right” on the tests. This is not acquisition, but rather conscious learning. 

Indeed, Lyster notes that “it might be argued therefore that […] prompting affects online oral production skills only minimally, serving instead to increase students’ metalinguistic awareness and their ability to draw upon declarative, rule-based representations on tasks where they have sufficient time to monitor their performance” (425).

Now, why does this matter? Why do Krashen and VanPatten insist that tests of true acquisition be Monitor-free? Simple: because any real-world language use happens in real time, without time to think and self-Monitor.  What VanPatten calls “mental representation of language”– an instinctive, unthinking and proper grasp of the language– kicks in without the speaker being aware.  Real acquisition– knowing a language– as opposed to learning, a.k.a. knowing about a language (being able to consciously manipulate vocab and grammar on tests, and for various kinds of performance)– is what we want students to have.

The marvellous Terry Waltz has called kids who are full of grammar rules, mnemonics, games, vocab lists etc “sloshers”: all that stuff has been “put in there” by well-meaning teachers, and the kids have probably “practiced” it through games, role-plays or communicative pair activities, but it hasn’t been presented in meaning-focused, memorable chunks– stories– so it sloshes around.

We also want to avoid teaching with rules, lists, etc, because– as Krashen and VanPatten note– there is only so much room in the conscious mind to “hold and focus on” rules, and because the brain cannot build mental representation– wired-in competence– of language without oceans of input.  If we teach with rules and prompts, and when we assess we examine rules and prompts, we are teaching conscious (read: limited) mind stuff.  We’re teaching to the grammar test.

So…to sum up Lyster’s experiment, he

  • took a bunch of time away from meaningful (and linguistically multidimensional) activities & input, and, in so doing,
  • focused on a low-importance grammar rule, and his results
  • do not show that the learners still had it three months post-treatment,
  • do not show that learners could recognise or produce the form without conscious reminders, and
  • did not measure the opportunity cost of the intervention (the question of what the students lost out on while working on noun gender)

Does this matter?  YES.  Lyster, to the best of my knowledge, is giving bad advice when he recommends “focus on form” interventions.  If you teach Immersion (or just regular language class), doing grammar practice and noticing-style activities is probably a waste of time.   Or, to put it another way, we know that input does a ton of good work, but Lyster has not shown that conscious grammar interventions build cost-free, wired-in, long-term unprompted skill.

My questions to Lyster are these:  on what functionally useful evidence do you base your claim that focus on form is essential for SLA, and how would you suggest dealing with rule choice, bandwidth, opportunity cost and individualisation problems, etc?

Are explicit grammar instruction and feedback effective and worthwhile? A look at bad research & wrong conclusions.

I have been discussing research on grammar teaching and feedback for a while on Twitter with Steve S. and others.  I maintain that there is essentially no value– in terms of acquisitional gains for students– in explicitly teaching grammar or providing corrective feedback.  Steve sent me a paper– Bowles and Montrul (2008)– which seems to suggest the opposite.  This is a classic problem for language teachers:  somebody does (very bad) research about Grammar Intervention Technique X, “finds” that it “works,” and then textbook publishers and grammarians use this to torture their poor students.  SO…

Today’s question:  are grammar instruction and feedback both effective and worthwhile?

Bowles and Montrul took English speakers learning Spanish, and wanted to see whether appropriate forms of the personal a in Spanish could best be acquired (for recognition) via regular exposure to Spanish, or via explicit instruction (“this is the personal a, and ____ is how/where you use it”) plus reading sentences with and without the personal a, some of which were grammatical and others of which weren’t, plus feedback: if they screwed up, they were told so, they got an explanation, and they could do the exercise again as often as they wanted.  They were also told to try to get a score of 90% correct.

When the treatment finished, they were tested, and statistical analyses confirm that, yes, the people who got instructional treatment– instruction, sample sentences, and feedback– did better than the others (and by “did better,” we mean “were able to recognise proper/improper uses of the personal a”).

So, Steve S. appears to be right.  Grammar instruction and feedback are prima facie effective.  BUT…but…but… there are so many problems with this study that, frankly, we might as well throw it out.  Here we go:  Stolzie versus the Professors.

First, Bowles and Montrul made several mistakes with their control group.

1.  Their study compared a treatment group with a non-treatment group, with insufficient differentiation of treatment variables.  This raises the question of cause: whether the treatment group’s gains came from instruction and feedback, or from simple exposure to Spanish.  If the treatment group got exposure to comprehensible language containing the instructional target (the personal a), and instruction and feedback, we do not know whether it was simple exposure to the target, or instruction and feedback about the target that made changes in understanding.

To address a concern like this, study design would have to expose a control group to lots of language containing the target, and the treatment group to that same language, as well as instruction plus feedback, so that the only difference between the groups would be the instruction and feedback.  This would allow us to tell what made the difference.

2.  Their study also failed to account for the quantity of language each group was exposed to.  They note that both groups got regular course instruction, but only the treatment group got the treatment (outside of class time).  So…if the treatment group got more Spanish than the controls, how do we know that the outcomes were a result of treatment?  Perhaps the treatment group’s gains came about from just simply getting more Spanish.  This is a confound: a potential and untested alternative explanation.

To address this concern, both groups should have received the same amount of exposure to Spanish– ideally only in class.

Second, Bowles and Montrul severely limited themselves with their treatment.  If you want to determine the best way to improve language acquisition (even of a simple item), you cannot just take one intervention and compare it to a control, and from that make a general statement such as “grammar interventions work.”  Their experiment does not look at other possibilities.  How about just simple comprehensible input containing the target in class?  Or, how about VanPatten’s processing instruction?  How about free voluntary reading in Spanish?

Norris and Ortega (2000), in their massive study of the effectiveness of instructional intervention (that’s jargon for “does teaching people languages actually help them acquire languages?”), noted that basically any exposure to the target language– if it is meaningful– will produce some acquisition.  The question is not “does _____ work?”, but “how well– compared to other approaches– does _____ work?”  A grammarian who likes his worksheets and a “communicative” teacher who loves having her first-years do “dialogues” will both say “but they are learning!” and they are right.  The question, however, is how MUCH are they learning compared to other methods?

From the teacher’s point of view– outside of the control-group flaws noted above– this study does not provide us with anything useful.  All it (in my view wrongly) claims is that some “focus on form” (allegedly) worked better than whatever else the students were doing.  But since we have a lot of instructional options, research that doesn’t compare them is useless.

A better design would have looked at different ways of helping people acquire the personal a (other than just having it present in input, as it was for the control group) and compared their effectiveness.

Third, there was no examination of durability of intervention.  OK, a week after intervention, tests found the intervention group picked up the personal a.  How about a year later– did they still have it?  If there is no look at durability of intervention, why bother?  If I have to decide what to do with my students, and I have zero guarantee that Intervention ____ will last, why do it– especially if, as we will see, it’s boring?  Krashen proposed a three-months-delayed post-test as one criterion of validity.  This study does not deliver on that.

Fourth, any classroom teacher can see the massive holes in this kind of thing right off the bat.

(A) it’s boring.  Would YOU want to read and listen to two-dimensional writing for days?  Juan vio a Juana.  Juana le dio un regalo a su mamá.  I cannot imagine any set of students paying attention to this.  If you wanted to diversify instruction– i.e. not present just tedious lists of sentences and grammar info– you would also be severely restricted in what you can actually do in the classroom, as you have to build everything around rule ______.  

(B) the “number of rules” problem rears its head.  Bowles and Montrul targeted the personal a because we don’t have that in English.  Spanish also has a ton of other grammar we don’t have in English.  Off the top of my head, umm,

  • subject position in questions
  • differences in use of past tenses with auxiliary verbs
  • major differences in uses of reflexive verbs…e.g. why does a Spanish speaker say comí una pizza, but me comí tres pizzas?

Any Spanish teacher could go on and come up with zillions more “non-Englishy” rules that need to be learned.  If a teacher wants to design teaching around rule-focused input and feedback, the problem is that they will never be able to address all the rules: not only is the number of rules functionally infinite, but nobody knows them all.

Fifth, the opportunity cost of grammar reinforcement etc is both high and unaddressed in this study.  Basically, what we have is a bandwidth problem.  We have X amount of time per day/course/year to teach Spanish (or whatever).  Any focus on Rule A means– by definition– we will have less time to devote to Rule B.  Even the doddering grammarian with his verb charts and grammar notes can see the problem– oh no!  If we spend too much time on the personal a, I won’t be able to benefit the kids with my mesmerising object pronoun worksheets!– but it’s worse than that.

In terms of input, focus on a grammar rule/item/etc means losing out on two crucial things:

1. Language that is multidimensional in terms of content.  As noted, if the personal a is your target, you are seriously restricted in what you can say, write, etc (it’s boring) but, beyond being boring, students are losing out on whatever could be said without using the personal a.

2.  Language that is grammatically multidimensional.  If I must teach focused on the personal a, the other “rules” will be less present in the input, and so we’re starving Peter to feed Paul.

My guess is that– even if you did this study without all the flaws I note above and got positive results– you would find a cost elsewhere, as the quantity and variety of language students would be exposed to would have dropped and been simplified.  So they might master the personal a, but they acquire less of grammar rule ____ or vocab _____.

(Krashen and many others have looked at almost exactly this question for vocab and writing skills: does free voluntary reading (in L1 or L2) or classroom instruction work best?  You can teach people vocab, or phonics, or word-decoding, or writing rules, or you can let them read (or listen to) interesting stuff.  The research is unanimous and clear: free voluntary reading beats everything in terms of how fast things are picked up, how interesting learning is, and how “multidimensional” the learning– measured in various ways, from word recognition to improved writing– is.)

What we need is a holistic look at acquisition, which one-item studies of this kind cannot show us.  What did these students not acquire while they were doing their personal a grammar work?  What did the students who got multidimensional input pick up?  Language is much more complex than knowing Rule ____, and looking at an instructional intervention that targets 0.1% of what needs to be learned– while ignoring the other 99.9%– is silly at best.

If you really want to know whether an instructional intervention, or technique, works, you have to look at all aspects of language use, not just whether or not one rule has been acquired.

SO…do grammar-focused instruction, vocab presentation and corrective feedback work to help people acquire the personal a?

  • maybe (but Bowles and Montrul don’t know why)
  • we have no idea for how long
  • sure…for one item at a time
  • in a boring way
  • in a way that sacrifices essential multidimensional input (of grammar and vocab)

So.  Next?