Numberz. Kids, parents, Adminz and teachers all want ’em. “What’s my mark?” asks Baninder. “How can Suzy bring her mark up?” asks Mrs Smith. “How do we get marked?” ask the keeners on Day 1.
Well, here we go: here are some ideas about formative assessment (seeing how people are doing during the learning process in order to guide instruction) and summative assessment (a.k.a. evaluation), where we assign a Number to a kid’s performance.

There are a few general principles:
A) we should never use norm-referenced (a.k.a. “curved”) grading, for reasons discussed here.
B) We should be doing criterion-referenced grading– i.e. there should be a rubric, or what have you, which clearly defines what needs to be done to get what mark. There are a bazillion rubrics for evaluating second-language writing, speaking, etc. out there, from ACTFL guidelines to various State standards to things in texts– I won’t get into details, except to say that any evaluative tool should attempt to assess language use holistically, and should not include things like “students will use _____ verbs and _____ grammar structures.”
C) we should not mix up evaluation (a.k.a. summative assessment = numbers) and formative assessment (feedback). We need to see where learners are, and tailor teaching to what they can/cannot do. This is assessment and we do not “mark” it, as per Rick Wormeli’s (and others’) ideas about “assessment for learning” (start here if you haven’t heard of this, then google away).
D) All evaluation and assessment practices should be explained to students. My kids have criteria in their course outlines, and we “mark” a couple of sample stories a month or so into the course. We do not do this to show kids “how to improve their work”– that can’t work for 95% of kids because it’s conscious learning– but rather so they can feel how assessment and eval work, and feel included.
ASSESSMENT (formative)
Assessment: seeing how people are doing along the learning road in order to steer the class car.
In a comprehensible input classroom, assessment should primarily answer one question: “do the students understand what they are hearing/reading?”
During story asking, a teacher checks choral responses to do this. We can also ask individual kids flat out– “Johnny, what did I just say/ask?”– or we can do P.Q.A. (personalised questions and answers) where we ask students in class the same question we ask the actor. If our story has “the boy owned a horse,” we ask the actor “do you own a horse?” and he has to say “yes, I own a horse.” We might ask a few more questions– “Do you own a dinosaur?” and get an answer like “no, I do not own a dinosaur”– and then we ask our keener kids in class “do YOU, Mandeep, own a crocodile?”
If, as Blaine Ray says, we get strong responses from class, actors or individuals, they are understanding. If we get slow, wrong, weak, or no answers, we have to go back and clarify, because either
1. they aren’t listening = no input = no acquisition, OR
2. they don’t understand = no comprehensible input = no acquisition
Ben Slavic has advocated using what he calls Jen’s Great Rubric (jGR), which basically evaluates how “tuned in” kids are. The rationale here can feel ambiguous. On one hand, it’s the old “if it’s not for marks, kids won’t do the work” thing: instituted by the teacher cos the work is so boring/hard that no sane kid would want to/be able to do it, so marks = carrot and stick. (But then, maybe kids need that if The System prizes Numberz and Markzzz above all else.) On the other hand, if Johnny is failing because he is on his phone, zoned out, or otherwise disengaged, jGR is a great tool for the teacher to say to Johnny’s Mom “look– here is how he acts in class, i.e. he is not focused, and that is why his writing, speaking, etc. are weak.” The jury is out on this one, but lotsa folks like it.
In terms of writing assessment, as Leanda Monro, Adriana Ramírez and a zillion others have pointed out, explicit grammar feedback does very little. Leanda told me last year that the best thing she could do with her French kids’ writing was to ask for more detail. I have found the same: I can blather/write at length about verb tenses, adjective agreement, etc., but the kids simply don’t learn from this (Krashen and many others have repeatedly shown that we cannot transfer conscious knowledge into acquisition). What does work is writing something like ¿Cuántos hermanos tenía la chica? (“How many brothers did the girl have?”)
I have also found that kids make consistent writing errors– e.g. this year it took them a while to acquire quiero tener (“I want to have”)– and so after each story the top five errors get circled more in the next story.
For speaking: good input = good output. However, Leanda and a few other French (and Chinese) teachers I’ve met have said that a bit of pronunciation work is necessary. This is because– for English speakers– the sound patterns of these languages are easy enough to screw up that, even if their output is otherwise rock-solid, seemingly minor pronunciation errors can totally throw it off. Chinese, with its subtle tones, and French, with its various “ay” sounds– é, è, ê etc– are easier for English speakers to botch than, say, Spanish.
Another thing we should not be doing is administering assessment without changes in instruction. The old pattern– present, practice, produce, quiz on Tuesday, test on Friday– is useless. Following a text or test series or a set of DVDs, dutifully collecting quizzes, and expecting the kids to look their quizzes over and say “oh my, I clearly need to bone up on pronoun placement and the vocabulary for discussing French art” is a great strategy…for the kids who are already getting 95%.
So what should assessment look like? It should
- be comprehension-focused
- be ongoing: during storyasking and reading, we check for comprehension
- actually cause us to change what we are doing. If kids don’t understand something, or make repeated errors, they need more input around that thing
EVALUATION (summative assessment)
One problem– err, I mean, opportunity– we have is that students are never at a fixed point in their acquisition. If they are getting a ton of good comprehensible input, they are acquiring (albeit not all at the same rate, or in the same way: Max may be picking up a few nouns from the most recent story, while Arabella’s brain is soaking up pronouns, or whatever). Students also “acquire” something, forget it, re-learn it, etc., in an ongoing, up-and-down process…so a “snapshot” of their skills is really not very useful or accurate.
For this reason, in my humble opinion, a student’s mark should always be based on their most recent output or skills. We should not be setting up “units” and assigning a mark per “unit.”
Why? Well, maybe Rorie finishes a “unit” on shopping for clothes and gets 60%, so she goes back and re-reads dialogues or a story, or studies the grammar, and gets better as a result. Maybe the teacher also uses the shopping vocab for the rest of the year. But how does the teacher now assess Rorie? Say the teacher assesses via units (10% of the year per unit, over 6 units = 60% of the year, plus final projects or exam(s) worth 40%, with marks for everything evenly divided between speaking, listening, reading and writing). If by the end of the year Rorie rocks at shopping for clothes, do they discard her crappy shopping-unit mark and give her only the final exam mark? If so, cool, but why then bother with unit marks in the first place?
If the answer to this is “accountability,” you have a problem: marks are being used as carrot/stick (read: the work is boring and/or not worth doing). I have argued that topical (sometimes called “thematic”) units are a bad idea– they tie grammar rules to vocab sets, they are boring, they are artificial, they overuse low-frequency vocabulary, they can present grammar that students are not ready to acquire– and they present assessment problems too.
Of course, parents, kids, Adminz, Headz will want to get a rough picture of how kids are doing, so it might not be all bad to have some kind of “rough progress” report. At my school, we are piloting a program where the kids get an interim report that offers feedback– neither numbers, nor just “good, OK, bad”– which teachers can customise. Mine gets the kids to evaluate themselves (to what extent do you listen for comprehension, ask for help, co-create stories, etc.), and if I agree with their evaluations then that’s what goes home.
My evaluation system this year was super-simple. After a story was asked, its extended version read, and Movietalk done around its structures, the kids had to do two things:
A) a speedwrite (5 min) where they had to describe either themselves or a picture. Their course goal was 100 good words in 5 min. Their “mark” was 1/2 grammar (on a rubric out of 3) and 1/2 wordcount (out of 100). For the first 6 speedwrites, they got a wordcount bonus (40 words, then 35, then 30, etc.), and after that no bonus.
(Note: the grammar rubric is out of 3 but is weighted the same as wordcount. A kid who gets 100 words and a 2/3 for grammar gets 83%: (100% + 67%) / 2.)
For their first speedwrite, they typically wrote 25 words plus the 40-word bonus, so the average wordcount mark was 65%; the grammar mark (for the first) was 1/3, but it very rapidly climbed to about 2.2–2.5/3.
B) a relaxed write. For this, they had to re-tell (in writing) the most recent story, but they had to change details, include dialogue, etc. I marked these using grammar (/3) and wordcount (starting at 200 and going up by 50 each time) with no bonus. Their wordcount marks also went steadily up, and their grammar got better after the first 2 stories.
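For the spreadsheet-inclined, here is a minimal sketch of that arithmetic in Python. The function names are mine, and I am assuming the bonus simply adds to the raw wordcount (capped at the target) and keeps dropping by 5 per write; the equal weighting of grammar and wordcount is as described in the note above.

```python
# A sketch of the speedwrite / relaxed-write marking arithmetic.
# Assumptions (mine): bonuses run 40, 35, 30, ... for the first 6
# speedwrites, counts are capped at the target, and the grammar
# rubric (/3) is weighted equally with the wordcount percentage.

SPEEDWRITE_BONUSES = [40, 35, 30, 25, 20, 15]  # first 6 speedwrites only

def speedwrite_mark(n, words, grammar):
    """Mark (0-1) for the n-th speedwrite: wordcount /100 averaged with grammar /3."""
    bonus = SPEEDWRITE_BONUSES[n - 1] if n <= len(SPEEDWRITE_BONUSES) else 0
    word_pct = min(words + bonus, 100) / 100
    return (word_pct + grammar / 3) / 2

def relaxed_write_mark(n, words, grammar):
    """Mark (0-1) for the n-th relaxed write: target starts at 200, +50 each time."""
    target = 200 + 50 * (n - 1)
    word_pct = min(words, target) / target
    return (word_pct + grammar / 3) / 2

# The note's example: 100 words and 2/3 grammar on a later (no-bonus) speedwrite.
print(round(speedwrite_mark(7, 100, 2) * 100))  # -> 83
# A typical first speedwrite: 25 words + 40-word bonus = 65%, grammar 1/3.
print(round(speedwrite_mark(1, 25, 1) * 100))   # -> 49
```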
So, they had an “ongoing” mark which they could always improve on. I told them that “this is a rough guide to how well you are doing. You can improve, or you can stop paying attention (or miss a bunch of class), and your mark can drop.”
I entered marks into the spreadsheet every time we did a post-story writing assessment, and I’d post a printout, and I made them keep their relaxed writes and freewrites. They all got better with time and it was cool for them to “see” progress: grammar marks were low for first 2 stories, then went up, and wordcounts steadily climbed.
For finals– with beginners– it was simple. They had two 5-min speedwrites (/100, each with a /3 grammar mark) and one 45-min story (/800, with a /3 grammar mark). These were combined. They had one listening assessment– dictation, where they listened, wrote and translated– and for their reading assessment they went back to stories we’d done and answered questions. Final mark: 100% based on the final exam = 1/3 writing, 1/3 reading and 1/3 listening. Also, any kid who wanted to re-do their exam could do so, no problem.
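Sketching the final-exam arithmetic the same way (again, my assumptions: each writing piece combines wordcount and grammar like the in-course writes, the three writing pieces are averaged equally, and reading and listening come in as simple percentages):

```python
# A sketch of the final mark: 1/3 writing, 1/3 reading, 1/3 listening.
# How the writing pieces combine is my assumption, not gospel.

def piece(words, grammar, target):
    """One writing piece: wordcount /target averaged with grammar /3."""
    return (min(words, target) / target + grammar / 3) / 2

def final_mark(sw1, sw2, story, reading, listening):
    """sw1, sw2, story are (words, grammar) pairs; reading/listening are 0-1."""
    writing = (piece(*sw1, target=100) + piece(*sw2, target=100)
               + piece(*story, target=800)) / 3
    return (writing + reading + listening) / 3

# e.g. speedwrites of 90 and 100 words (grammar 2.5/3), a 450-word story
# (grammar 2/3), 80% reading, 85% listening:
print(round(final_mark((90, 2.5), (100, 2.5), (450, 2), 0.80, 0.85) * 100))  # -> 82
```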
This system was almost as good as it could be. The kids knew what they had to do, the work was easy, there were no surprises, and even the weakest ones were able to do well (writing functional 300-400 word stories in 3 verb tenses including dialogue), while at the top end Shayla, Manpreet, Khubaib and Jaskarn pumped out amazingly good 600-800 word stories. (Interestingly, I had equal numbers of strong (and weak) students of both genders).
The only things I am going to change next year are:
- I am going to use a more complex rubric for marking final writing. This is mainly because the one I used this year does not adequately distinguish complexity from simplicity. Some kids write a sentence like Juan quería las chicas guapas (“John liked pretty girls”), while others write Juan quería las chicas guapas que tenían perros azules (“John liked pretty girls who had blue dogs”). Both are good Spanish, but the second kid is clearly a notch up.
- I am going to give them one text they have not yet seen (for reading) and get them to answer comprehension questions on that.
With my 2nd years, I’ll do a speaking assessment (3-min interview) and I’ll also do a couple of culture projects, plus Adriana’s movie idea.
So…what should evaluation look like? It should be
— holistic
— based on what the kids have done and read during the course (no “gotcha” surprises).
— focused on interaction with meaningful whole language (no grammar testing)
— a picture of the kids at their best: at the end of the course, when they have had a TON of good comprehensible input