What should assessment and evaluation look like in the second-language classroom?

Numberz. Kids, parents, Adminz and teachers all want ’em. “What’s my mark?” asks Baninder. “How can Suzy bring her mark up?” asks Mrs Smith. “How do we get marked?” ask the keeners on Day 1.

Well, here we go: here are some ideas about formative assessment (seeing how people are doing during the learning process in order to guide instruction) and summative assessment (a.k.a. evaluation), where we assign a Number to a kid’s performance.


There are a few general principles:

A) we should never use norm-referenced (a.k.a. “curved”) grading, for reasons discussed here.

B) We should be doing criterion-referenced grading– i.e. there should be a rubric, or what have you, which clearly defines what needs to be done to get what mark. There are a bazillion rubrics for evaluating second-language writing, speaking etc out there, from ACTFL guidelines to various State standards to things in texts– I won’t get into details, except to say that any evaluative tool should attempt to assess language use holistically, and should not include things like “students will use _____ verbs and _____ grammar structures.”

C) we should not mix up evaluation (a.k.a. summative assessment = numbers) and formative assessment (feedback). We need to see where learners are, and tailor teaching to what they can/cannot do. This is assessment and we do not “mark” it, as per Rick Wormeli’s (and others’) ideas about “assessment for learning” (start here if you haven’t heard of this, then google away).

D) All evaluation and assessment practices should be explained to students. My kids have criteria in their course outlines, and we “mark” a couple of sample stories a month or so into the course. We do not do this in order to show kids “how to improve their work”– that can’t work for 95% of kids because it’s conscious learning– but rather so they can feel how assessment and eval work, and feel included.

ASSESSMENT (formative evaluation)

Assessment: seeing how people are doing along the learning road in order to steer the class car.

In a comprehensible input classroom, assessment should primarily answer one question: “do the students understand what they are hearing/reading?”

During story asking, a teacher checks choral responses to do this. We can also ask individual kids flat out– “Johnny, what did I just say/ask?”– or we can do P.Q.A. (personalised questions and answers) where we ask students in class the same question we ask the actor. If our story has “the boy owned a horse,” we ask the actor “do you own a horse?” and he has to say “yes, I own a horse.” We might ask a few more questions– “Do you own a dinosaur?” and get an answer like “no, I do not own a dinosaur”– and then we ask our keener kids in class “do YOU, Mandeep, own a crocodile?”

If, as Blaine Ray says, we get strong responses from class, actors or individuals, they are understanding. If we get slow, wrong, weak, or no answers, we have to go back and clarify, because either

1.  they aren’t listening = no input = no acquisition, OR

2. they don’t understand = no comprehensible input = no acquisition

Ben Slavic has advocated using what he calls Jen’s Great Rubric (jGR) which basically evaluates how “tuned in” kids are. The rationale here can feel ambiguous. On one hand, it’s the old “if it’s not for marks, kids won’t do the work” thing: instituted by teacher cos the work is so boring/hard that no sane kid would want to/be able to do it, so marks = carrot and stick.  (But then maybe kids need that if The System prizes Numberz and Markzzz above all else).  On the other hand, if Johnny is failing because he is on his phone, zoned out, or otherwise disengaged, jGR is a great tool for the teacher to say to Johnny’s Mom “look– here is how he acts in class, i.e. he is not focused, and that is why his writing, speaking etc are weak.” Jury is out on this one but lotsa folks like it.

In terms of writing assessment, as Leanda Monro, Adriana Ramírez and a zillion others have pointed out, explicit feedback (in terms of grammar) does very little. Leanda told me last year that the best thing she could do with her French kids’ writing was to ask for more detail. I have found the same: I can blather/write at length about verb tenses, adjective agreement, etc, but the kids simply don’t learn from this (Krashen and many others have repeatedly shown that we cannot transfer conscious knowledge into acquisition).  What does work is writing something like ¿Cuántos hermanos tenía la chica?

I have also found that kids make consistent writing errors– e.g. this year it took them a while to acquire quiero tener (“I want to have”)– and so after each story the top five errors get circled more in the next story.
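
If you like to keep your tally on a computer rather than on paper, here is a minimal sketch in Python of picking out the most common errors from a batch of marked writing. The function name and the sample error labels (other than quiero tener) are made up for illustration; how you label an error is entirely up to you.

```python
from collections import Counter


def top_errors(error_log, n=5):
    """Return the n most frequent errors from one batch of marked writing."""
    return [error for error, _count in Counter(error_log).most_common(n)]


# Hypothetical tally: one entry each time an error shows up in a kid's writing.
errors_seen = [
    "quiero tener", "adjective agreement", "quiero tener",
    "verb tense", "quiero tener", "adjective agreement",
]

# These are the structures that get extra circling in the next story.
print(top_errors(errors_seen, n=2))
```

The point of the tally is only to decide where the next story's input goes; none of it is a mark.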

For speaking: good input = good output. However, Leanda and a few other French (and Chinese) teachers I’ve met have said that a bit of pronunciation work is necessary. This is because– for English speakers– the sound patterns of these languages are easy enough to screw up that seemingly minor pronunciation errors can totally throw their output, even if it’s otherwise rock-solid. Chinese, with its subtle tones, and French, with its various “ay” sounds– é, è, ê etc– are easier for English speakers to botch than, say, Spanish.

Another thing we should not be doing is administering assessment without changes in instruction.  The old pattern– present, practice, produce, quiz on Tues, test on Friday– is useless.  Following a text or test series or a set of DVDs, and dutifully collecting quiz samples, and expecting the kids to look their quizzes over and say “oh my, I clearly need to bone up on pronoun placement and the vocabulary for discussing French art” is a great strategy…for the kids who are getting 95% already.

So what should assessment look like?  It should

  • be comprehension-focused
  • be ongoing: during storyasking and reading, we check for comprehension
  • actually cause us to change what we are doing.  If kids don’t understand something, or make repeated errors, they need more input around that thing

EVALUATION (summative assessment)

One problem– err, I mean, opportunity– we have is that students are never at a fixed point in their acquisition. If they are getting a ton of good comprehensible input, they are acquiring (albeit not all at the same rate, or in the same way: Max may be picking up a few nouns from the most recent story, while Arabella’s brain is soaking up pronouns, or whatever). Students also “acquire” something, forget it, re-learn it, etc, in an ongoing, up-and-down process…so a “snapshot” of their skills is really not very useful or accurate.

For this reason, in my humble opinion, a student’s mark should always be based on their most recent output or skills. We should not be setting up “units” and assigning a mark per “unit.”

Why? Well, maybe Rorie finishes a “unit” on shopping for clothes, and she gets 60%, so goes back and re-reads dialogues or a story, or studies the grammar. And gets better as a result. Maybe also the teacher uses the shopping vocab for the rest of the year. But how does the teacher now assess Rorie? Say the teacher assesses via units (10% of the year per unit, over 6 units = 60% of year, plus final projects or exam(s) worth 40% of year, marks for everything evenly divided between speaking, listening, reading and writing), and by end of year Rorie rocks at shopping for clothes, do they discard her crappy shopping unit mark and give her only the final exam mark? If so, cool, but why then bother with unit marks in the first place?
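
To put rough numbers on the Rorie problem, here is a small Python sketch. The weighting (6 units at 10% each plus a final worth 40%) is the one described above; Rorie’s other unit marks and her final exam mark are invented purely for illustration, and so is the function name.

```python
UNIT_WEIGHT = 10   # percent of the year per unit, as in the example above
FINAL_WEIGHT = 40  # percent of the year for final projects or exam(s)


def unit_based_year_mark(unit_marks, final_mark):
    """Year mark (%) when every unit mark is locked in as it happens."""
    weighted = sum(mark * UNIT_WEIGHT for mark in unit_marks)
    return (weighted + final_mark * FINAL_WEIGHT) / 100


# Hypothetical marks: 60% on the shopping unit, then five stronger units.
rorie_units = [60, 80, 80, 80, 80, 80]
rorie_final = 90  # hypothetical: by June she rocks at shopping for clothes

print(unit_based_year_mark(rorie_units, rorie_final))  # unit-based mark
print(rorie_final)                                     # most-recent-skills mark
```

With these made-up numbers the old 60% still drags the year mark below what Rorie can actually do by the end of the course, which is the whole complaint about unit marks.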

If the answer to this is “accountability,” you have a problem: marks are being used as carrot/stick (read: work is boring and/or not worth doing).  I have argued that topical (sometimes called “thematic”) units are a bad idea– they tie grammar rules to vocab sets, they are boring, they are artificial, they overuse low-frequency vocabulary, they can present grammar that students are not ready to acquire– and they present assessment problems too.

Of course, parents, kids, Adminz, Headz will want to get a rough picture of how kids are doing, so it might not be all bad to have some kind of “rough progress” report.  At my school, we are piloting a program where the kids get an interim report that offers feedback– neither numbers, nor just “good, OK, bad”– which teachers can customise.  Mine gets the kids to evaluate themselves (to what extent do you listen for comprehension, ask for help, co-create stories, etc) and if I agree with their evaluations then that’s what goes home.

My evaluation system this year was super-simple. After a story was asked, and its extended version read, and we did Movietalk around its structures, the kids had to do two things:

A) a speedwrite (5 mins) where they had to describe either themselves or a picture. Their course goal was 100 good words in 5 min. Their “mark” was 1/2 grammar (on a rubric out of 3) and 1/2 wordcount (out of 100). For the first 6 speedwrites, they got a bonus (40, then 35, then 30 etc), and after that no bonus.

(Note: the grammar rubric is out of 3 but is weighted the same as wordcount. A kid that gets 100 words and a 2/3 for grammar gets 83%, i.e. (100% + 66%) / 2.)

For their first speedwrite, they typically wrote 25 words + 40-word bonus, so the average wordcount mark was 65%. Grammar was 1/3 on the first speedwrite, but very rapidly climbed to about 2.2-2.5/3.

B) Relaxed write. For this, they had to re-tell (in writing) the most recent story, but they had to change details and include dialogue, etc. I marked these using grammar (/3) and wordcount (starting at 200 and going up by 50 each time) with no bonus. Their wordcount marks also went steadily up and their grammar got better after first 2 stories.
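
For anybody who wants to drop this into a spreadsheet or script, the marking arithmetic above can be sketched in Python roughly like this. Treat it as a sketch under a few assumptions of mine: that the bonus keeps dropping by 5 words per speedwrite (40, 35, 30, 25, 20, 15), that the wordcount half of a mark tops out at 100%, and that relaxed writes are weighted half wordcount and half grammar just like speedwrites. The function names are made up.

```python
# Assumption: "40, then 35, then 30 etc" keeps dropping by 5 over 6 speedwrites.
SPEEDWRITE_BONUSES = [40, 35, 30, 25, 20, 15]


def percent(score, out_of):
    """Convert a raw score to a percentage."""
    return 100 * score / out_of


def speedwrite_mark(words, grammar, speedwrite_number):
    """Mark (%) for a 5-min speedwrite: half wordcount (/100), half grammar (/3)."""
    if 1 <= speedwrite_number <= len(SPEEDWRITE_BONUSES):
        bonus = SPEEDWRITE_BONUSES[speedwrite_number - 1]
    else:
        bonus = 0  # no bonus after the first 6 speedwrites
    # Assumption: the wordcount half cannot go above 100%.
    word_pct = min(percent(words + bonus, 100), 100)
    return (word_pct + percent(grammar, 3)) / 2


def relaxed_write_target(story_number, start=200, step=50):
    """Wordcount target: starts at 200 and goes up by 50 each story."""
    return start + step * (story_number - 1)


def relaxed_write_mark(words, grammar, story_number):
    """Mark (%) for a relaxed write, no bonus (equal weighting is assumed)."""
    word_pct = min(percent(words, relaxed_write_target(story_number)), 100)
    return (word_pct + percent(grammar, 3)) / 2


print(round(speedwrite_mark(100, 2, 7)))  # 100 words, 2/3 grammar, no bonus
print(round(speedwrite_mark(25, 1, 1)))   # typical first speedwrite
```

The first line printed is the 83% from the note above; the second shows roughly where a typical kid starts before the grammar marks climb.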

So, they had an “ongoing” mark which they could always improve on. I told them that “this is a rough guide to how well you are doing. You can improve, or you can stop paying attention (or miss a bunch of class), and your mark can drop.”

I entered marks into the spreadsheet every time we did a post-story writing assessment, and I’d post a printout, and I made them keep their relaxed writes and freewrites. They all got better with time and it was cool for them to “see” progress:  grammar marks were low for first 2 stories, then went up, and wordcounts steadily climbed.

For finals– with beginners– it was simple. They had two 5-min speedwrites (/100, and with a /3 grammar mark), and one 45-min story (/800, with a /3 grammar mark). These were combined. They had one listening assessment– dictation, where they listened, wrote and translated– and their reading assessment was to go back to stories we’d done and answer questions. Final mark: 100% based on final exam = 1/3 writing, 1/3 reading and 1/3 listening. Also, any kid who wants to re-do their exam can do that no problem.

This system was almost as good as it could be. The kids knew what they had to do, the work was easy, there were no surprises, and even the weakest ones were able to do well (writing functional 300-400 word stories in 3 verb tenses including dialogue), while at the top end Shayla, Manpreet, Khubaib and Jaskarn pumped out amazingly good 600-800 word stories. (Interestingly, I had equal numbers of strong (and weak) students of both genders).

The only things I am going to change next year are

  • I am going to use a more complex rubric for marking final writing. This is mainly because the one I used this year does not adequately distinguish complexity from simplicity. Some kids write a sentence like Juan quería las chicas guapas (“John liked pretty girls”), while others write Juan quería las chicas guapas que tenían perros azules (“John liked pretty girls who had blue dogs”).  In both cases, good Spanish, but the second kid is clearly a notch up.
  • I am going to give them one text they have not yet seen (for reading) and get them to answer comprehension questions on that

With my 2nd years, I’ll do a speaking assessment (3-min interview) and I’ll also do a couple of culture projects, plus Adriana’s movie idea.

So…what should evaluation look like? It should be

— holistic
— based on doing what the kids have done and reading what they have read during the course (no “gotcha” surprises).
— focused on interaction with meaningful whole language (no grammar testing)
— a picture of the kids at their best: at the end of the course, when they have had a TON of good comprehensible input


    1. They have to do it entirely from memory.

      The reason for this (and the fact that there is a time limit) is, we want to see what they have acquired, i.e., what they have “wired” into their brains that they can “spit out” without thinking. Language acquisition is subconscious, and language competency is subconscious, so to get a “real” picture of skill we get rid of the conscious mind and its various strategies.

      If we let them use notes, dictionary, Google, etc, what we get is them thinking in English (or whatever L1 is) and then consciously adding words. The results are predictable: kids will write “Yo lata juego basquetbol” (I tin can I play basketball).

  1. I’ve definitely experienced the same with students using words out of context. The reason I’m asking is that we are fighting a very depressing and quite formidable culture problem with foreign language education in our high school of 1900 in southeastern Pennsylvania. As a first year teacher, it’s very discouraging for me. Even though most of us adhere to comprehensible input methods, I can tell you that we don’t get anything even close to 200 or 300 words from year-one Spanish students in similar exercises from more than a handful of kids. The kids are of average socio-economic status and racially very homogeneous to give you some background. Spanish is seen (by students, the community, guidance, and administrators) as something you have to take at least two years of (though we offer four levels) if you want to go to college; nothing more. We have one section of Spanish 4 in a high school of 1600. It’s very discouraging. The problem is that our kids see Spanish as a subject, not as a skill. It’s obvious that we have a culture problem in our district.

    1. Hey Marc–

      In Canada, quite a few students also take a language just for Uni purposes. It sounds like your admin and counsellors need to get off their mental butts and learn about– and get enthusiastic about– the bazillion benefits of learning another language.

      If you want to boost wordcount, here are some suggestions (things that worked for me)

      a) For stories, their writing assessment is, retell the story (but give it a twist) and include dialogues. The dialogues will have been part of the initial story (the version you ask, where the actors “say” the dialogue), as well as in PQA (where you ask your keener class members the same questions you ask your actors).

      This will give them a “platform” or “structure” which they can use. You can also tell them– after 2-3 stories– to add a second character.

      I found this year that the stories initially were basically re-writes of the asked (or extended (read-only)) version…but they quickly changed into entirely new things.

      b) If they are having writing problems, it is almost certainly because they need to READ more. I can’t stress this enough. My big “a-HA!” moment this year: I gave the kids photocopies of the asked story after I’d asked it. We read it together. THEN, I gave them the extended reading and we read and PQA’d that (we also volleyball-read that). This made a HUGE difference…because they were doing twice the reading.

      c) I articulated course goals– 800 words in 40 min, 100 words in 5– at start of year– and used curves (declining bonuses) for the speedwrites and I upped the wordcount for relaxed writes by 100 per story. This way, they had a clear indication of how they were doing, and they clearly saw progress.

      d) Another cool trick: when you get them to do their relaxed write and speedwrite, read a couple of them back to the class. Obviously you will fix errors as you read aloud, go slow, and you will circle relevant bits. The coolest thing: you can take the weakest kid, read his/her story– fixing the errors and adding a few details– and s/he will be THRILLED cos most kids figure out pretty quick how “good” they are at subjects. Plus, if you go slow, and it’s a decent story, this will be good comprehensible input.

      1. Question:
        How explicit is your student evaluation presented on a course outline? How are you evaluating the four language skills in formative and summative ways? What does this look like for your students and parents? Do you have examples of your evaluation rubrics and course outlines?
        Help! New to TPRS

      2. 1. They have an expected wordcount schedule and a grammar rubric on their course outline.

        2. See the blog for assessment info.

        3. In a C.I. classroom, formative assessment happens every time a kid opens their mouth (or the class answers as a group). I see what they understand and can produce. Slow/weak/wrong response = we need more time on the sentence or with the structure/target in question.

  2. Thanks so much for the response! I have started using a lot more embedded readings about cultural topics during this second semester. I am trying to transition to where homework (when needed) can be a choice of reading assignments, but unfortunately we still have long lists of vocabulary and a mountain of grammar concepts to cover (it gets really bad in year 2) that I’m responsible for making sure they learn (you know, for the mid-term and final! Don’t get me started on that…).

    There is so much that I am going to do differently right out of the gate next year that I have already started doing this year albeit not too recently, for example, TPRS. One thing that I’m interested in learning more about with TPRS is how the kids do with producing the non-third person conjugations since I understand from reading a couple of Ben Slavic’s books that one should ask the stories in the third-person singular. I can hit most of the other forms of each verb in the PQA, but we definitely don’t get as many reps with those compared to the target structures in the story. Also, when I’m trying to teach a verb that might have a stem-change, I’m conflicted as to how I can help them understand when it’s “viene” but why it’s not “vienís” without going back to explicit grammar instruction but also not having to include each of the six forms of the verb as a target structure as that would be impossible.

    I have followed your blog for a couple of months now and really enjoy it. I look forward to sharing it with my colleagues once I feel like I’m adequately prepared to show them how it works and calm their concerns about this different teaching style. I really need to get to a good TPRS/CI workshop and get my hands dirty!

    1. Yeah, I would for sure get to a Blaine Ray workshop. Also get on twitter. Also yahoo moretprs and Ben’s are great.

      As far as the mountain of vocab goes, do you HAVE to teach it? What does “teach” mean? In my experience, there is a small and fixed upper limit on what kids can acquire. If you give them 5,000 words they acquire nothing. If you give them two, they’ll nail ’em. I would take heart: your colleagues, guaranteed, will not be able to teach mountains of vocab any better than you can. Best idea: pick the highest-frequency stuff from the text and use that in stories. Better they can REALLY use 3 verbs than suck at 10.

      Regarding the verbforms

      A) I don’t teach venis or vineis or vos– these are low frequency and used only in a few parts of Spain and not in Latin America, plus all Spanish speakers understand “ustedes” and “tú”

      B) you get some of your reps from narration and the key here is to use parallel characters. So you have un chico que fue a WalMart and una chica que fue a 7-11. You circle each individually but you also circle both– “clase, ¿fueron a Walmart los dos? No, no fueron…” so they hear the plural form.

      You can do the same with the actors when you do present-tense questions to them.

      C) Readings are where you get a LOT of input so do as much as you can, and if you are writing your own stories make sure you add narration and q&a using other forms

      D) another GREAT idea from Bryan Kandel is to use fake text generator software (loads of free sites online) where you can make images of back-and-forth text convos for stories, readings etc.

    1. Thanks. My main point: assessment should focus on how much comprehensible input kids are getting, and evaluation should be holistic and represent a “best of” picture.

  3. Chris, I’m getting ready for finals and thinking about what to do differently next year. Would you mind explaining the grammar grading a little bit more? Word count and grammar are graded on a scale of 3, so 33 words (+ 40 = 73) would get you a 2.5, and not so great beginning of the year grammar (how do you measure it exactly?) would be a 1 (33%), so the student gets a 50 on their first assessment? Does this kind of grading discourage the students at the beginning of the year, or raise the affective filter? Also, I would like to see your more detailed rubric if you come up with one that shows how you grade the students with more complicated sentences. Will you end up bumping down the grades of those with simpler sentences as a result of giving the highest grade to those with fancier sentences??

    I am so curious because now, finishing my second year of TPRS, my grades have really consisted of exit quizzes and fast writes, though I have added some 5-minute writing homeworks which have helped some kids’ numbers. But each year I have had one or two students who I fear have not succeeded, whether for effort or more serious processing issues (one student cannot write more than 50 words, has awful handwriting and calls himself dyslexic, another tries to write but I truly cannot understand her spelling – even when she is copying), though they both show they comprehend on exit quizzes and translating. I really want to add a little more “rigor” or “accountability” to what I am doing.

    I also, thanks to your recommendation, watched Ms. Ramirez’s videos and purchased her teacher’s manual that I am looking through. I would also like to learn more about her assessments as well – I see from the videos that she does longer writings at the end of a story, and includes a cumulative vocab quiz.

    Thanks so much for any assistance!

    1. I don’t do Adriana’s vocab quizzes, because the brain does not store and remember vocab as isolated fragments. Indeed, I once tested this, and the kids massively bombed a discrete-item test, and when I read them sentences containing the same vocab, they all got it.

      I do the story rewrite Adriana does– rewrite the story, but add a few twists.

    2. Another alternative for start of the year is, mark the first 2 speedwrites and freewrites just on wordcount.

      When I give the writing back, I tell them “if you got a 1 on grammar, and you stay tuned in, it will go up” and it ALWAYS does. I literally never get a kid who cannot get a 2/3 for writing. So for say speedwrite 3, the “bonus” is 30 words. If Johnny gets 2/3 = 66% and writes 70 words (easy) he gets 100 + 66 = 166 / 2 = 83%

      I thought about the rubric and yes I am going to modify it when I have time. It does not distinguish between voluminous generic and more finely detailed writing (as my colleague Leanda’s does).

    3. Why do you want “rigor”– whatever that is — and “accountability”?

      If the kids are listening and reading, and they understand, they are acquiring. If the mix of input gets gradually more complex, their vocab will get more complex.

  4. Thanks – for all comments. I agree on vocab tests. For grammar grade do you have a guideline for how many mistakes = a 1, 2 or 3? And as far as rigor I guess I really wish I felt firmer about my grading. Adding a mark for the grammar will help with that.

    Do you teach novels in the first year, and if so which ones do you use?

    1. I use Berto and Pobre Ana and El Nuevo Houdini (tho not always all 3).

      For grading, I don’t think we should match # of mistakes with a mark. First, no kid will ever write a perfect paper. Second, this will create a wordcount disincentive (the more you write the greater your chance of mistakes). The grading should be holistic. Besides, kids’ output is generally very consistent: it’s not like they make no mistakes on the first two sentences then blow it on the rest.

    1. The questions and answers are in Spanish. They will have seen every question. There are no surprises on the reading. Basically I just want them to do another ton of reading comprehensible input. Not playing “gotcha” with the kids

  5. Chris, what happens if kids miss assessments? Do they “bank” their last score? Do they get a 0 for that assessment? I understand it’s a snapshot but if they aren’t there they can neither demonstrate proficiency nor can they be (idealistically) dinged for not being there to demonstrate that proficiency… I would probably just give them an F on that assessment until they do it but that is not ideal either. So what do you do? Thanks!
