Teacher Evaluation

Should There Be Awards for Teachers?

I saw a Facebook post recently where a teacher got the “Teacher of the Month” at their school. Which made me wonder, should there be awards for teachers?

What is an award? Well, it’s public recognition that somebody (or an institution) has done something exceptionally, according to a defined set of criteria, and it often comes with one or more of the following: a ceremony, a prize of some kind, special benefits for the recipient, and public acknowledgment of the recipient.

What is the point of an award? All awards basically say, what this person did and how they did it are worthwhile, and what everybody should do. They are norms, and institutional commands. When Johnny gets a star on his spelling test for getting 9/10– and the rest of the class can see that he got the star–the teacher is saying to the class, what Johnny did was good, and everybody should try to do the same thing. When a teacher gets an award, much the same is true.

Now awards, as Alfie Kohn has spent an entire career pointing out, are absolutely toxic for students. Why?
1. They remove intrinsic motivation, ie, they make students work for An Object, not because they find the work interesting.
2. When The Object is no longer offered, why do the work?
3. They automatically make most students not care, because everybody knows who the egg-head/super-jock is, and knows they can’t compete, so why bother trying?
4. While awards may make sense in some adult situations (eg the Superb Owl, the Word Series), the implicit point of awards– compete, so you can be better than the other people, and then be recognised for it– has nothing to do with what we know about how education (and most of life) best works. Ideally, in education, people do work because they like it and they find it personally rewarding.

So…should teachers get– or accept– awards? I would say, generally, absolutely not. To explain why, let’s see what teachers say about this, from C.I. Fight Club.

POV: when you’re the last one to get the award:

You think there’s politics involved in who wins Teacher of the Year or other awards? Hmmmm…

Ah yes the public B.S. of “who will win?” and the ridiculous amount of work required. While this is fair criticism, awards here are a choice.

You mean, teaching awards are sometimes mere popularity contests?

And finally, this sums it all up:

I don’t think teachers should compete for awards, or accept them if offered. Education isn’t sportsball, and schools should not be about competition. The question should always be, how can the energy, resources and time we devote to awards be used to make the system as a whole better?

C.O.F.L.T. Conference Reflections

The energy-loaded Tina Hargaden, vice-president of the C.O.F.L.T. in Portland, organised a conference and I got to do the T.P.R.S. part of it– a one-day workshop with German storyasking demo, reading, Movietalk, Picturetalk, method explanation, Q&A, etc.

To say I had a busy weekend would be an understatement: work Fri, drive 7 hours to Portland (through Seattle traffic, its own special Hell, thank you NPR for making it bearable), have a beer and talk shop with Tina, sleep like a baby at the Kennedy School Hotel (a high school converted to hotel– awesome– “fall asleep in class” is their tag), do presentation, drive back to Canada, time change, it’s now 1 AM, sleep three hours, get on plane to Cuba…where thank God they have mojitos and overhung limestone rock routes.

Anyway, we had the most people of any workshop at the conference (almost 30) and Tina told me that we were the only room where people were regularly laughing. There were a bunch of Chinese student teachers doing their degrees in Portland, a few TPRSers who were in for a tune-up, and a whack of curious rookies.

So I got my evaluations back. You can see the COFLT 2016 Stolz TPRS feedback forms if you want to see how awesome I am 😉 and how much Oregonians appreciate their gluten-free, salad-based, vegan or organic meat, locally-sourced artisanally-cooked dishes, etc 😄. But mostly what is interesting in the comments are the themes that recur.

1. A lot of people said they really appreciated the German demo aspect of the presentation (an idea I got from Blaine Ray). People wrote along the lines of “it was great to experience what it is like to be a student.” I remain convinced that the only way to make any language-instruction method convincing is to teach people part of a language they don’t know. It is so easy for us to forget how tough it is– even with good C.I.– to pick up a new language.

2. Recognising that, and because we had some native Mandarin speakers at the workshop, I asked participant Yuan to teach us some Mandarin (Blaine Ray also does this). She parallel-circled two sentences: Chris climbs mountains and Tina drinks beer.

This put me into the students’ seat and it was enlightening. I noticed two things:

a) I needed a LOT of reps to remember the Mandarin, and I was glad Yuan went s.l.o.w.l.y.

b) Mandarin does not seem very difficult. No articles, verb conjugation, etc, though word order seems crucial.

3. Most people wanted more time with T.P.R.S. (or even me as presenter). There seems to be a need (in OR and WA) for more C.I.-themed language workshops. Luckily, Tina Hargaden and C.O.F.L.T. are on it and there will be a conference Oct 13-15 which will feature Steve Krashen, Karen Rowan, etc.

4. I talked to another presenter who had a workshop called something like “using authentic docs to design authentic tasks for authentic assessment.” He did some explaining and I wondered two things:

a. What do you actually do with the info from an “end of unit” assessment? If Max and Sky do well, and Rorie and Arabella terribly, now what? How does that info shape your next “unit”? I guess if you want a number, awesome, but numbers help neither teachers nor students.

b. How much energy is a teacher productively using when they design #authres-based activities for assessment? I mean, most #authres don’t use high-freq vocab and are often more of a guessing game for students.

As I talked to this guy, it struck me that you would get a lot better assessment with exit quizzes for reading & translating, and with comprehension checks along the way– especially with what Ben Slavic has called “barometer kids”– so that, in the moment, you can provide more input for what the kids are misunderstanding.

5. Laughter matters. Laughing bonds people, lightens any mood, is a brain break, comes from when unexpected ideas are conjoined, etc. So I am glad that we got to laugh at our workshop (yet another practice that Blaine Ray is all about with his dancing monkeys and girls without noses).

6. There were some experienced C.I. teachers there and I was super-stoked (sorry I can’t remember names). These folks asked good questions, and they often said “well Chris does ____ but I do _____ instead.” Which teaches us that while there is a basic C.I. recipe– use a story, limit and recycle vocab, have people read the story, add images and short films for more vocab recycling– there are many cooks with a panoply of flavours. Also, the experienced people generated great lunchtime discussions over craft organic artisanal salads and quinoa vegan quiche 😉.

So, thanks COFLT and Tina for a great opportunity for all those language teachers. Their Oct conference will rock– stay tuned.

What does good language teaching look like? The Ten Principles for ALL language teachers

Today’s question is “What does good language teaching– regardless of method– look like?”

Here are criteria. Comments welcome!

1) The class delivers a LOT of aural and written comprehensible input, supported where necessary with translation, images, acting, gestures and whatever makes the input comprehensible. Input is:

always comprehensible
quality, and not generated by (error-making) learners
compelling (this will vary with class, age, culture etc)
delivered via progress along frequency lists (more-frequently used vocab is taught before less frequently used)
not impoverished: it does not overfocus on one grammatical/vocabulary rule or grouping, and it does not leave out any elements of the language’s grammar
repeated frequently without being boring

2) Both input and class are personalised. The teacher will make an ongoing effort to get students to understand and respond to vocabulary in ways which reflect students’ interests, identities (real and/or imagined) and views.

3) Grammar— the rules and conventions of language as traditionally understood by teachers and texts–

is briefly mentioned only to clarify meaning
does not form the goal, organisational system or focus of instruction
is not practiced through drills, worksheets, songs, etc, because research shows these ineffective

4) Instruction primarily focuses on immersing learners in comprehending compelling meaning in the target language. This means that portfolio-work-revision, correction, grammar concept explanations and mind-mapping, feedback, focus on teacher-or-text-driven ideas about “cultural relevance,” etc are avoided.

5) Output has the following characteristics:

it is always unrehearsed and unforced
it has no goal other than immediately authentic conversation (no role plays, etc; scripted activities such as A.I.M. or T.P.R.S.-style stories provide input for other learners)
the learner, and not the teacher, chooses the level of output they are comfortable with, from yes/no answers to essays

6) The classroom is safe and welcoming. The classroom should not make anyone feel uncomfortable or self-conscious. The minimum behaviour standards are that students

listen and read with the intent to understand, and avoid focus on distractions
do not distract anyone in class
signal comprehension or a lack thereof

7) Instruction recognises the unchangeability of (and tremendous variation between students’ progress along) internal linguistic syllabi. Instruction therefore delivers an always-rich, non-impoverished diet of comprehensible language, so that

neural architecture constantly builds
learners consistently have exposure to whatever they need
learners can acquire new items or rules when they are ready, because “everything is present in the mix” (Susan Gross).

8) Instruction and assessment avoid

explicit goals
“I can” or any other kind of language-narrowing statements
textbook-style, discrete-item sequencing, presentation and assessment of grammar and vocabulary

9) Evaluation only involves meaningful, multi-dimensional language tasks (reading, writing, listening and speaking) which are in-context authentic and holistic. Evaluation therefore avoids legacy practices such as grammar-item tests, vocabulary quizzes, “show me you can do this real-world dialogue”-style talking activities, etc.

10) Level-to-level attrition rates, marks variability and failure rates are all low, and special-needs students succeed in the class. In other words, people who start taking the language keep on taking it, the difference between higher and lower marks is minimal, and scores are high.

(11) The teacher modifies practice if something better comes along, or current practice does not work for students.

OK. Ça va? ¿Sirve? Geht’s? If these statements describe us, our classes and our students, we are doing everything right.

How Badly Did I Fail Teaching Languages? (1)

I have been reflecting on my teaching and I thought I would share my many screw-ups, and offer some better alternatives (which might be useful for teachers who use textbook programs and are getting frustrated). So here we go– today’s question–

Q: How– and how badly– did I screw up teaching languages?

A: Pretty badly– and here is how

1. I used grammar worksheets and explicit grammar practice to “teach grammar.” The programs I used– first ¡Díme! and then Juntos— had a lot of these. Fill in the blanks with the correct verbform, pronoun, or word, etc. The research about this is clear: a grammar item is acquired when the learner has heard loads of comprehensible input containing the item/rule in question, AND when their brain is “ready” to pick it up. If a learner hasn’t acquired it, they aren’t ready for it. If they have, there’s little point in practicing. Truscott writes that “no meaningful support has been provided for the […] position that grammar should be taught” (and practiced) and VanPatten says that “tenses are not acquired as “units” and the brain doesn’t store grammar as a textbook-stated rule.”

“Conscious awareness” of grammar rules (as Krashen points out) only helps us if ALL of the following conditions are met:

1. we know the rule

2. we know how to apply the rule

3. we have time to consciously reflect on and apply the rule

So, if students have worksheets or whatever where they are “practising the passé composé” or whatever, they’ll do well. They’ll beaver away, slowly, filling in the blanks. Of course, in real life, they won’t have time to go “hmm, is that a DR AND MRS VANDERTRAMP verb? Oh, it is, so, let’s see, how do we conjugate that?” Or, as Yogi Berra said, “you can’t think and hit at the same time.” Worksheets cannot help those who haven’t acquired a grammar rule; they are unnecessary for those who have. And they’re boring.

Doing it better: I would have kids read a ton of stuff which has the grammar item, etc, they are learning. If I had worksheets, a better way to use them would be to give the kids the worksheets with the blanks filled in and have them translate: this is quality (if boring) input.

2. I used to do projects in the target language. One typical one: the ____ report. Research ___, write up what you learned about ____ on a poster and add some pictures and lines connecting different elements. Oh, and do it in Spanish. Then read it aloud to the class. Variations: use the Interwebz and add things that talk, move, have colours, etc. The only problems were…

the kids had to look up a ton of vocab (read: Google translate)
almost none of this vocab got repeated for the rest of the year (read: little acquisition)
most of the writing had to be “edited” (read: totally re-written) by me before the final product was assembled.
most of the audience focused on pictures and missed most of the target language during the presentation because the presenters are the only ones who know the vocab, and the audience wanted to understand, and pictures were easier to understand.
if done in poster form, nobody except me, the teacher, ever read the Spanish, and a week after it was done, not even the kids who wrote the poster typically knew what it meant because all they did was copy it down
most of the Spanish on these was low-frequency vocab. How often is somebody going to need to say “principal exports of ___ are petroleum and fried dog” or “The Cathedral of ____ was built in ____”?

Now, kids did pick up a bit of vocab– and culture knowledge– but at the cost of good input.

Doing it better: I now do culture, etc, projects in English. I can get higher-order thinking, more learning-via-sharing, and less energy wasted on poor target language use. Plus, the kids can easily understand each others’ work.

3. I “used games” to “make grammar and learning fun.” From class soccer leagues to Hangman to cross-4, games got the kids focused and they found them fun. Too bad, however, that

most of their output was flawed and/or English, and that therefore
they got little accurate input, and
the language they were exposed to was fragmented (ie generally not sentences which were part of bigger meaningful “whole” passages or conversations)
they got low-frequency vocab. E.g. the class soccer/hockey/baseball game. Lots of fun, shouting, etc…but words like “scores” and “goal” and “foul” are not much-used.

4. I didn’t know what I was doing with assessment.

a) I screwed up listening assessment. In every languages program I have ever seen, the listening test at the end of each unit goes something like this: it plays a native speaker saying something, or a conversation. Then, there are multiple-guess questions. Why was this a problem for my kids?

the kids had to “hold” quite a lot of vocab in their heads while listening to (say) 60 seconds of language. This is very hard to do.
The pattern was clear: if the speaker(s) said it, the kids picked that as the answer. If the statement was more complex– eg John was not tired– and the question was How did John feel? a) tired b) awake c) energetic, the kids would pick a, because thinking about “not” and the meaning of the word tired is cognitive overload for a lot of them.

Doing it better: Give them WAY more time, restrict vocab to only what they know, and provide aural input much more slowly. I would also now suggest using aural input as listen, copy and translate.

b) I screwed up writing assessment. Yes, I had a marking rubric (thanks, Julia Macrae), which worked fine for paragraphs. However, what do you do with single-sentence questions? For example, a question would be ¿Te gustan los perros? (Do you like dogs?). If a kid wrote me gusta los perros, or yo gusto las gatos (both of which demonstrate understanding of meaning, but which have basic grammar errors), how do you mark it? Half a mark off for the mistake? 1/4? How do you do holistic assessment for a sentence? Impossible.

Doing it better: Now, I make them write only paragraphs and stories and assess holistically. I check for understanding when I am asking stories, or while we are reading.

5. I used to expect oral output from Day 1. I used to do a lot of “communicative pair” or “information gap” activities. The problems here were many:

the kids would always make output mistakes– e.g. they would have a list of things or activities, and they would have to ask their partner about them. So, dogs. A kid would say ¿Te gusta el perros? and get the answer No yo gusto perros— which was meaningful, but very low-quality input for their partners. If this is where their language modeling came from, I realised eventually that there would be huge problems. They would not acquire articles, verb endings etc properly.
I felt like a cop, cruising around the class to ensure Spanish compliance. As one person I talked to said, “speaking ____ with other people who are also learning it feels fake.” Kids simply felt funny using the language.
the logical thing to do is to get the info as easily and quickly as possible, i.e. L1, whose use was a constant problem.
the activities in books were dull: ask your partner if s/he a) went to the beach b) played soccer, c) had a BBQ last summer. I dunno about you but I and the kids don’t find that compelling.

Doing it better: I don’t do any forced oral activities and end-of-course assessment with beginners. I do one totally random three-minute oral interview with 2nd and up level kids at the end of level 2. The kids do have to chorally answer story questions, and I will ask superstars personalised questions in the PQA (personalised questions and answers) process (basically, asking the superstars the questions I ask the actors). This has allowed me to deliver much more– and better– input, partly because I am not spending 6-8 blocks/year assessing output, and because the output they do– acting in stories, and superstar PQA– is super high quality (and so is good input) for other learners.

Now, when I have kids who are reluctant to talk, I ask them yes/no or one-word PQA questions. If we are doing a story and I say a la chica, le gustaban los gatos, I’ll first circle that, and then I’ll ask the actress ¿te gustan los gatos? and a few other questions involving gustan and los gatos, los perros, los dinosaurios, etc. Then, I ask my superstar or a native speaker ¿te gustan los gatos? and they can answer with a complete sentence. Then, I go to the slower processors (or shyer kids) and ask the same question. They can say sí/no and that’s fine, or they can say a complete sentence. The point now is to deliver input, not to force output, and to use output to signal understanding.

I also no longer do communicative pair activities. Kids now pick up Q&A (first and second person) forms (and everything else) through PQA and stories.

6. I used to do kid-created target-language movie projects. Typically, I said “make a short film of ___,” ___ being either some thematic vocab (e.g. the food or shopping unit) or this plus some specific grammar requirements (e.g. use the imparfait). Now, these are fun. My daughters also did them, and when they did, I’ve never at my house seen five teenagers spend so much intense time rehearsing, giggling, planning, etc. However…

the target language output was bad. They’re learners.
most of the time spent making a film was in English.
most of the energy, mental and otherwise, spent in making the film was fixed on visuals, acting, bloopers, editing, etc
when they watched each others’ films in class, mostly they could not hear or understand the Spanish…because most of the Spanish had been special-occasion looked-up just for the film, and because the sound was bad
because the kids KNEW that the story must be primarily visually told, and they would film/edit for visual comprehension, viewers didn’t really need to pay attention to target language.
even the understood good target language was often not repeated much throughout the year (low frequency).

In retrospect, movie projects did get the kids talking, and they were fun. But they didn’t deliver the sine qua non of good languages teaching: delivering compelling comprehensible input.

Doing it better: Thanks to Adriana Ramírez, I now do this. Provide the kids a script of 100% comprehensible vocab– including dialogue, with errors edited out– and have them film it. They will have a blast filming (picking costumes, editing, hanging out with their buddies, adding music etc). When you show it in class, they will be intrigued to see their friends acting, and they will not even notice that they are hearing and understanding the target language.

7. I used to give grammar tests. Read the sentence and fill in the blanks with the right ____. Conjugate the verb. Show me where the pronoun goes. The research is clear: grammar instruction works wonders if you want your students to become manipulators of grammar. However, the part of the brain that stores “metalinguistic awareness” stuff like grammar rules is at best tangentially connected to the subconscious part that actually processes language. The researchers all say the same thing: the brain does not acquire grammar by practicing grammar, and what we teachers call “grammar rules” is not how the brain “does” grammar. So, making kids study for tests that ask them to consciously manipulate words and apply grammar rules took away from real, deep processing that happens when they hear or read stories or other meaningful language.

Doing it better: assess whole-language use (reading, listening to and writing real meaningful stuff) and just, well, don’t give grammar tests. If you really want to ensure that the kids learn to conjugate, use pronouns, etc, make them do a lot of reading.

8. I used to do the portfolio. Kids take evidence of what they do– writing, reports, videos or oral presentations, tests and quizzes, etc, and stick them in a folder called a “portfolio.” Modern versions include online collections (e.g. you video your restaurant unit dialogue and put it on Youtube). The rationale for portfolios is a) kids can “reflect on their learning, document areas of growth and areas that need work” or some such edubabble, and b) kids can go and revise stuff and c) they can see what they did and learn from their mistakes.

First, (b) I agree with– you learned more, go fix it, good. But, second, we run into a problem with A and C, because, basically, most adolescents simply cannot reflect on something as (1) complex and (2) innate as grammar etc. Most of them can’t do it in English with essays/paragraphs etc, so how can we expect them to do it in a second language? As an English teacher who teaches lit and composition to English speakers in English, I know that kids cannot meaningfully self-edit. They also mostly cannot peer edit. Yes, you can give them checklists…and they will look for– and sometimes even find– things on the checklists…and miss everything else. And this is in English, their first language. I used to provide Spanish grammatical feedback, the kids would dutifully re-copy their paragraphs and “improve them” and then they would make exactly the same mistakes on tests.

You can talk about ____ till you are blue in the face, but most kids just can’t do it. They also don’t care– I mean, what student in their right mind would care how many of the 19 irregular passé composé verbs they don’t know or whatever? That’s boring. Also, I would give kids their writing back, correct the hell out of it, and they would look for how much red ink was on it, and what Number they got on it. This is because they quite correctly understood that Numberz are what Matterz to Teacherz and Parentz.

Portfolios however look cool– especially if the student is a girl; girls in my experience are more into neatness and colouring and nice pictures than boys– and Thingz That Look Cool (extra Pointz if it’s online! E-learningz! Cross-platform sharingz!) get attention, Adminz and Headz love them, etc etc. The only problem is, they don’t provide the acquisitional effects we expect. The only thing a portfolio can do is show growth. Kids will have 4-sentence paragraphs at start and 20 at end of a class. Great, a teacher’s markbook should reflect that, throw the quiz in the kid’s binder, why waste time on packages and prettiness and empty self-analyses?

So…how has eliminating the screw-ups helped my kids?

My epiphany came thanks to Michelle Metcalfe’s demo workshop, and my results now blow the old results out of the water. I have abandoned grammar practice and testing, communicative gap activities, oral output and most oral assessment, games, movie and culture projects in Spanish, and portfolios.

My Level 1 kids now write 600-1,000 word stories, in good Spanish, in multiple verb tenses, in an hour at the end of the course. They understand everything they hear. They feel great when they head somewhere Spanish-speaking. I have no management issues. I have every kid who attends and pays attention passing.

Your mileage, as they say, may vary. Mainly I am happy that I can experience more success with second languages and I hope I can inspire others to get there also (though not necessarily by doing what I do). And I mean honestly people, I am neither smart nor talented so if I can do OK with T.P.R.S., anyone can do well.

What grades should kids get? Notes on evaluation for the mathematically-challenged.

Here is a part of a post from Ben’s. A teacher– let’s call him Mr John Speaking– who uses T.P.R.S. in his language class writes:

“I was told by a Defartment Chair a few weeks ago that my grades were too high across the board (all 90s/100s) and that I needed more of a range for each assessment. Two weeks later I had not fixed this “problem” and this same Defartment Chair pulled me out of class and proceeded to tell me, referencing gradebook printouts for all my classes, that these high grades “tell me there is not enough rigor in your class, or that you’re not really grading these assessments.” After this accusation, this Defartment Chair told me I was “brought on board [for a maternity leave replacement] in the hopes of being able to keep me, but that based on what he’d seen the past few weeks, I’m honestly not tenure track material.”

Obviously, Mr John Speaking’s Defartment Chair is an idiot, but, as idiots do, he does us a favour: he brings up things worth thinking about.

There are two issues here:

a) Should– or do— student scores follow any predictable distribution? I.e., should there be– or are there–a set percentage of kids in a class who get As, Bs, Cs, Ds and Fs?

b) How do you know when scores are “too low” or “too high”?

Today’s question: what grades should students get?

First, a simple, math idiot’s detour into grading systems and stats. The math idiot is me. Hate stats? Bad at math? Read on! If I can get it, anyone can get it!

It is important to note that there are basically two kinds of grading systems. We have criterion-referenced grading and curved (norm-referenced) grading.

First, we have criterion-referenced grading. This is, we have a standard– to get an A, a student does X. To get a B, a student does Y, etc. For example, we want to see what our Samoyed Dogs’ fetching skills are and assign them fetching marks. Here is our Stick Fetching Rubric:

A: the dog runs directly and quickly to the thrown stick, picks it up, brings it back to its owner, and drops it at owner’s feet.

B: the dog dawdles on its way to the stick, plays with it, dawdles on the way back, and doesn’t drop it until asked.

C: the dog takes seemingly forever to find the stick, bring it back, and refuses to drop it.

So we take our pack of five Samoyed Dogs, and we test them on their retrieval skills. Max, who is a total idiot, can’t find the stick forever, then visits everyone else in the park, then poos, then brings the stick an hour later but won’t drop it because, hell, wrestling with owner is more fun. Samba dutifully retrieves and drops. Rorie is a total diva and prances around the park before bringing the stick back. Arabella is like her mother, Rorie, but won’t drop the stick. Sky, who is so old he can remember when dinosaurs walked the Earth, goes straight there, gets the stick, and slowly trudges back. So we have one A, one B, one C, one C- (Max– we mercy passed him) and one A- (Sky, cos he’s good and focused, but slow).

Here are our Samoyeds:

Now note–

1. Under this scheme, we could theoretically get five As (if all the Dogs were like Samba), or five Fs (if everybody was as dumb and lovable as Max). We could actually get pretty much any set of grades at all.

2. The Samoyed is a notoriously hard-to-train Dog. These results are from untrained Samoyeds. But suppose we trained them? We used food, praise, hand signals etc etc to get them to fetch better and we did lots of practice. Now, Sky is faster, Rorie and Arabella don’t prance around the park, and even silly Max can find the stick and bring it. In other words, all the scores went up, and because there is an upper limit– what Samba does– and nobody is as bad as Max was at fetching, the scores are now clustered closer together.

The new scores, post-training, are:

Sky and Samba: A

Rorie, Max and Arabella: B

Variation, in other words, has been reduced.

3. Suppose we wanted– for whatever reason– to lower their scores. So, we play fetch, but we coat the sticks in a nasty mix of chocolate and chili powder, so that whenever the Dogs get near them, they get itchy noses, and very sick if they eat them. The Dogs stop wanting to fetch our sticks. Some of them will dutifully do it (e.g. Samba), but they aren’t idiots, and so most of them will decide to forget or ignore their training.

4. Also note who we don’t have in our Dog Pool: Labrador Retrievers (the genius of the fetching world), and three-legged Samoyeds. There’s no Labs because they are three orders of magnitude better than Samoyeds at fetch, and we don’t have three-legged Samoyeds because, well, they can’t run.

In other words, we could reasonably get any mix of scores, and we could improve the scores, or we could– theoretically– lower them. Also, we don’t have any Einstein-level retrievers or, uhh, “challenged” retrievers– there are no “outliers.”

Now, let’s look at “bell curve” (a.k.a. norm-referenced) grading. In this case, we decide– in advance— how many of each score we want to assign. We don’t want any random number of As or Fs or whatever– we want one A, one F, etc. We want the scores to fit into a bell curve, which looks like this:

We are saying “we want a certain # of As, Bs, Cs, Ds and Fs.” Now, we have a problem. In our above stick fetching example, we got an A, an A-, a B, a C and a C-. We have no Ds or Fs, because all of the Dogs could perform. None of them were totally useless. (After doing some training, we would get two As (Samba, Sky) and three Bs (Rorie, Max and Arabella).) But if we have decided to bell curve, or norm reference, our scores, we must “force” them to fit this distribution.

So Samba gets an A, Sky gets a B, Rorie gets a C, Arabella gets a D, and Max fails.
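Here is a minimal sketch of that forcing, in Python. The numeric fetch scores and the grade cut-offs are made up for illustration (they are not from any real rubric); they are chosen only so that the criterion marks come out as the two As and three Bs we got after training. The curved version assumes exactly one Dog per grade.

```python
# Hypothetical post-training fetch scores out of 100 (invented numbers:
# two land in the A band, three in the B band).
scores = {"Samba": 95, "Sky": 91, "Rorie": 80, "Arabella": 78, "Max": 76}


def criterion_grade(score):
    """Grade against a fixed standard (hypothetical cut-offs)."""
    if score >= 86:
        return "A"
    if score >= 73:
        return "B"
    if score >= 60:
        return "C"
    if score >= 50:
        return "D"
    return "F"


def curved_grades(scores, ladder="ABCDF"):
    """Rank everyone, then hand out one grade per rung, best to worst.

    Assumes there are exactly as many Dogs as rungs on the ladder.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: ladder[i] for i, name in enumerate(ranked)}


criterion = {name: criterion_grade(s) for name, s in scores.items()}
curved = curved_grades(scores)

for name in scores:
    print(f"{name}: criterion {criterion[name]}, curved {curved[name]}")
```

Max is only four points behind Rorie here, and both clear the same standard, but the curve still hands one a C and the other an F.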

Now, why would anyone do this? The answer is simple: norm referencing is only a way to sort students into ranks where the only thing that matters is where each person ranks in regard to others. We are not interested in being able to say “in reference to criteria ____, Max ranks at C.” All we want to do here is to say where everyone is on the marks ladder compared to everyone else.

Universities, law schools, etc sometimes do this, because they have to sort students into ranks for admissions purposes, get into the next level qualifiers, etc etc. For example, law firm Homo Hic Ebrius Est goes to U.B.C. and has 100 students from which to hire their summer slav– err, articling students. If they can see bell-curved scores, they can immediately decide to not interview the bottom ___ % of the group, etc. Which U.B.C. engineers get into second year Engineering? Why, the top 40% of first-year Engineering students, of course!

Now I am pretty sure you can see the problem with norm referencing: when we norm reference (bell curve), we don’t necessarily say anything about what students actually know/can do. In the engineering example, every student could theoretically fail…but the people with the highest marks (say between 40 and 45 per cent) would still be the top ones and get moved on. In the law example, probably 95% of the students are doing very well, yet a lot of them won’t be considered for hire. Often, bell-curves generate absurd results. For example, with the law students, you could have an overall mark of 75% (which is pretty good) but be ranked at the bottom of the class.
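To make that concrete, here is a rough Python sketch of a “top 40% move on” rule. The student names and marks are invented, and `top_percent` is a hypothetical helper, not anybody’s actual admissions procedure.

```python
import math


def top_percent(marks, percent):
    """Return the names in the top `percent` of the group, by rank alone."""
    keep = math.ceil(len(marks) * percent / 100)
    ranked = sorted(marks, key=marks.get, reverse=True)
    return ranked[:keep]


# Hypothetical cohort where every single student is failing.
weak_cohort = {"Ana": 45, "Ben": 44, "Cy": 41, "Dee": 38, "Eli": 30}
# Hypothetical cohort where everyone is doing well.
strong_cohort = {"Fay": 96, "Gus": 93, "Hal": 90, "Ivy": 84, "Jo": 75}

promoted_weak = top_percent(weak_cohort, 40)
promoted_strong = top_percent(strong_cohort, 40)

print("Promoted from the failing cohort:", promoted_weak)
print("Promoted from the strong cohort:", promoted_strong)
```

The rule never looks at the marks themselves, only at the order. Ana and Ben move on with 45 and 44 per cent, while Jo, at 75 per cent, is at the bottom of her group and gets cut.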

So where does the idea for norm referencing (“bell curving”) student scores come from? Simple: the idea that scores should distribute along bell-curve lines comes from a set of wrong assumptions about learning and about “nature.” In Nature, lots of numbers are distributed along bell-curve lines. For example, take the height of, say, adult men living in Vancouver. There will be a massive cluster who are within two inches of 5’11” (from 5’9″ to 6’1″). There will be a smaller # who are 5’6″ to 5’8″ (and also who are 6’1.5″ to 6’3″). There will be an even smaller number who are shorter than 5’6″ and taller than 6’3″. Get it? If you graphed their heights, you’d get a bell curve like this:

If you graphed adult women, you’d also get a bell curve, but it would be “lower” as women (as dating websites tell us) are generally shorter than men.

Now– pay attention, this is where we gotta really focus– there are THREE THINGS WE HAVE TO REMEMBER ABOUT BELL CURVES

a) Bell curve distributions only happen when we have an absolutely massive set of numbers. If you looked at five men, they might all be the same height, short, tall, mixed, whatever (i.e. you could get any curve at all). But when you up your sampling to a thousand, a bell curve emerges.

b) Bell curve distributions only happen when the sample is completely random. In other words, if you sampled only elderly Chinese-born Chinese men (who are generally shorter than their Caucasian counterparts), the curve would look flatter and the left end would be higher. If you didn’t include elderly Chinese men, the curve would look “pointier” and the left end would be smaller. A bell curve emerges when we include all adult men in Vancouver. If you “edit out” anyone, or any group, from the sample, the distribution skews.

c) Bell curves raise one student’s mark at the expense of another’s. When we trained our Samoyed Dogs, then marked them on the Stick Fetching Rubric, we got two As and three Bs. When we convert this into a curve, however, what happens is, each point on the curve can only have one Dog on it. Or, to put it another way, each Dog has a different mark, no matter how well they actually do. So, our two As and three Bs become an A, a B, a C, a D and an F. If Rorie gets a B, that automatically (for math-geek reasons) means that Max will get a different mark, even if they are actually equally skilled.

As you can see in (c), bell curves are absolutely the wrong thing to do with student marks.
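Point (a) is easy to check with a quick simulation. This is only a sketch: it assumes the heights really are normally distributed, and the mean (71 inches) and standard deviation (2.5 inches) are numbers I picked for illustration, not measurements of actual Vancouver men.

```python
import random

MEAN_IN, SD_IN = 71.0, 2.5  # hypothetical: mean 5'11", SD 2.5 inches


def band_shares(heights):
    """Share of heights below, within, and above one SD of the mean."""
    n = len(heights)
    low = sum(h < MEAN_IN - SD_IN for h in heights) / n
    high = sum(h > MEAN_IN + SD_IN for h in heights) / n
    return low, 1 - low - high, high


rng = random.Random(0)  # fixed seed so the run is repeatable
five = [rng.gauss(MEAN_IN, SD_IN) for _ in range(5)]
many = [rng.gauss(MEAN_IN, SD_IN) for _ in range(100_000)]

for label, sample in (("5 men", five), ("100,000 men", many)):
    low, middle, high = band_shares(sample)
    print(f"{label}: short {low:.2f}, middle {middle:.2f}, tall {high:.2f}")
```

With five men, each share can only be a multiple of 0.2, so the “shape” is whatever those five happen to be. With 100,000, the middle share settles near the roughly 68% that a normal distribution puts within one standard deviation of the mean.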

And now we can address the issues that Mr John Speaking’s Defartment Head brings up. Mr Defartment Head seems to think that there are too many high marks, and not enough variation within the marks.

First, there is no way one class– even of 35 kids– has enough members to form an adequate sample size for a bell-curve distribution. If Mr Defartment Head thinks, “by golly, if that damned Mr John Speaking were teaching rigorously, we’d have only a few As, a few Ds, and far more Bs and Cs,” he’s got it dead wrong: there aren’t enough kids to make that distribution possible. Now, it could happen, but it certainly doesn’t have to happen.

Second, Mr John Speaking does not have a statistically random selection of kids in his class. First, he probably doesn’t have any kids with special challenges (e.g. severe autism, super-low I.Q., deaf, etc etc). BOOM!– there goes the left side of the bell curve and up go the scores. He probably also doesn’t have Baby Einstein or Baby Curie in his class– those kids are in the gifted program, or they’ve dropped out and started hi-techs in Silicon Valley. BOOM!– there goes the right side of your curve. He’ll still have a distribution, and it could be vaguely bell-like, but it sure won’t be a classic bell curve.

Or he could have something totally different. Let’s say in 4th block there are zero shop classes, and zero Advanced Placement calculus classes. All of the kids who take A.P. calculus and shop– and who also take Spanish– therefore get put in Mr Speaking’s 4th block Spanish class. So we now have fifteen totally non-academic kids, and fifteen college-bound egg-heads. Mr Speaking, if he used poor methods, could get a double peaked curve: a bunch of scores clustering in the C range, and another bunch in the A, with fewer Bs and Ds.

Third, instruction can– and does– make a massive difference in scores. Remember what happened when we trained our Samoyeds to give them mad fetching skillz, yo? Every Dog got better. If Mr Speaking gave the kids a text, said “here, learn it yourself,” then put his feet up and did Sudoku on his phone or read the newspaper for a year (I have a T.O.C. who comes in and literally does this), his kids would basically suck at the language (our curve just sank down). On the other hand, if he used excellent methods, his kids’ scores would rise (curve goes up). Or, he is awesome, but gets sick, and misses half the year, and his substitute is useless, so his kids’ scores come out average. Or, he sucks, gets sick, and for half the year his kids have Blaine Ray teaching them Spanish, so, again, his kids’ scores are average: Blaine giveth, and Speaking taketh away.

“Fine,” says the learned Defartment Chair, “Mr John Speaking is a great teacher, and obviously his students’ scores are high as a result of his great teaching, but there should still be a greater range of scores in his class.”

To this, we say a few things

a) How do we know what the “right” variability of scores is? The answer: there is no way of knowing without doing various kinds of statistical comparisons. This is because it’s possible that Mr Speaking has a bunch of geniuses in his class. Or, wait, maybe they just love him (or Spanish) and so all work their butts off. No, no, maybe they are all exactly the same in IQ? No, that’s not it. Perhaps the weak ones get extra tutoring to make up for their weakness. Unless you are prepared to do– and have the data for– something called regression analysis, you are not even going to have the faintest idea about what the scores “should” be.

b) score variability has been reduced with effective teaching. There are zillions of real-world examples of where appropriate, specific instruction reduces the variation in performance. Any kid speaks their native language quite well. Sure, some kids have more vocab than others, but no two Bengali (or English-speaking) ten year olds are significantly different in their basic speaking skills. 95% of drivers are never going to have an accident worse than a minor parking-lot fender-bender. U.S. studies show that an overwhelming majority of long-gun firearm owners store and handle guns properly (the rate is a bit lower for handgun owners). Teach them right, and– if they are paying attention– they will learn.

Think about this. The top possible score is 100%, and good teaching by definition raises marks. This means that all marks should rise, and because there is a top end, there will be less variation.

Most importantly, good teaching works for all students. In the case of a comprehensible input class, all of the teaching is working through what Chomsky called the “universal grammar” mechanism. It is also restricted in vocab, less (or not) restricted in grammar, and the teacher keeps everything comprehensible and focuses on input. This is how everyone learns languages– by getting comprehensible input– so it ought to work well (tho not to exactly the same extent) on all learners.

Because there is an upper end of scores (100%), because we have no outliers, and because good teaching by definition reaches everyone, we will have reduced variation in scores in a comprehensible input class.
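The ceiling part of this argument can be shown with a few lines of Python. The marks below are invented, and giving every kid the same 20-point gain is a simplifying assumption (real gains would differ from kid to kid); the sketch isolates what the 100% cap does on its own.

```python
import statistics

CEILING = 100  # nobody can score above 100%


def after_teaching(marks, gain):
    """Raise every mark by `gain` points, capped at the ceiling."""
    return [min(CEILING, m + gain) for m in marks]


before = [45, 60, 70, 85, 95]           # hypothetical marks, weak teaching
after = after_teaching(before, gain=20)  # hypothetical uniform improvement

spread_before = statistics.pstdev(before)
spread_after = statistics.pstdev(after)

print("before:", before, "mean", statistics.mean(before), "spread", round(spread_before, 1))
print("after: ", after, "mean", statistics.mean(after), "spread", round(spread_after, 1))
```

The mean goes from 71 to 87, and the spread (population standard deviation) drops from about 17.7 to about 13.3, because the two strongest kids hit the ceiling while everyone below them keeps climbing.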

So, Mr Speaking’s response to his Defartment Head should be “low variation in scores is an indication of the quality of my work. If my work were done poorly, I would have greater variation, as well as lower marks.” High marks plus low variation = good teaching. How could it be otherwise?

In a grammar class, or a “communicative” class, you would expect much more variation in scores. This is because the teaching– which focuses on grammar and or output, and downplays input– does not follow language acquisition brain rules. How does this translate into greater score variation?

a) Some kids won’t get enough input– or the input won’t be comprehensible enough– and so they will pick up less. Now you have more lower scores.

b) Some kids will be OK with that. Some kids won’t, and they’ll do extra work to catch up. Result: variation in acquisition. Now, there will be a few high scores and more low ones.

c) Some kids will hate speaking and so will do poorly on the speaking assessments, which will increase variation.

d) Many kids don’t learn well from grammar teaching, so in a grammar-focused class, you’d expect one or two As, and a lot of lower marks.

e) if the teacher is into things like “self-reflection on one’s language skills and areas for growth” or such edubabble and the kids are supposed to go back and rework/redo assignments, things could go either way. If, for example, they re-do a dialogue from the start of the course at the end, they might– if the vocab has been recycled all year– do better. If, however, it’s the check your grammar stuff, you’d again expect variation: only a very few kids can do that, even if their language skills have grown during the year.

And, of course, there is the “grammar bandwidth” problem: any effort to focus on a specific aspect of grammar means that other areas suffer, because our conscious minds have limited capacity. A District colleague told me that, for Level 5 (grade 12) French, the kids self-edit portfolio work. They have an editing checklist– subject-verb agreement, adjective agreement, etc– and they are supposed to go and revise their work.

The problems with this, of course, are two: in their mad hunt for s-v errors, the kids will miss out on other stuff, and we know that little to no conscious learning makes it into long-term memory.

Some real-life examples of how good instruction narrows variation in scores:

At Half-Baked School, in the Scurvy School District (names have been changed to protect the guilty), TPRS teacher Alicia Rodriguez has Beginning Spanish. So does her Defartment Chair, Michelle Double-Barreled. When, at the end of the semester, they have to decide on awards– who is the best Beginning Spanish student?– Alicia has 16 kids getting an A, another 12 getting a B, and two getting a C+. None fail. Michelle Double-Barreled has one kid getting an A, a bunch of Bs and Cs, a couple of Ds, and a few failures.

What this means is, 16 of Alicia’s kids can

a) write 100 excellent words in Spanish in 5 min, on topics ranging from “describe yourself” to “describe [a picture].”

b) Write a 600-1000 word story in 45 min.

Both will have totally comprehensible, minor-errors-only Spanish.

Michelle Double-Barreled, on the other hand, has one A. Her “A” kid can

a) do grammar stuff

b) write a 100-word paragraph on one of the topics from the text (e.g. shopping, eating in restaurant, sports s/he plays, family).

This will be not-bad Spanish.

Now, who’s doing a better job? Alicia has more kids doing more and better work. Michelle has a classic bell-curve distribution. According to Mr John Speaking’s Defartment Chair, Mrs Double-Barreled has a “normal” range of scores. Yet Alicia is clearly getting her kids to kick major butt. Hmm…

The point is, with appropriate and effective instruction– good Dog training, or good Spanish teaching– we are going to get a cluster of generally higher scores. Poor or no teaching might produce something like a bell curve.

So…what does T.P.R.S. and other comprehensible input teaching do for student outcomes?

In my class, T.P.R.S. did the following

a) all scores rose.

b) the difference between top and bottom scores (variation) decreased.

c) I.E.P. kids all passed.

d) First-year kids in second-year classes did about 85% as well as second year kids, despite having missed a year of class.

e) In terms of what the kids could actually do, it was light-years ahead of the communicative grammar grind. Kids at end of 2nd year were telling and writing 400-600 word stories in 3-5 verb tenses, in fluent and comprehensible (though not perfect) Spanish. Oral output was greater in quality and quantity too.

f) Nobody failed.

My colleague Leanda Monro (3rd year French via T.P.R.S.) explains what T.P.R.S. did in her classes:

“[I saw a] huge change in overall motivation. I attribute this to a variety of well-grounded theories including “emotion precedes cognition” (John Dewey), Krashen’s affective filter, and the possible power of the 9th type of intelligence, drama and creativity (Fels, Gardner). There is a general feeling of excitement, curiosity, eagerness to speak French, incorporation of new vocabulary, spontaneous speech.

All but one student has an A or a B. The one student in the C range has significant learning challenges, and despite excellent attendance in all courses is failing both math and English. No one is failing.

[There was] far less variation. Overall, far greater success for all students. My contribution to the “Your scores are too high” comment is this: As educators we need to pose an important question: Are we trying to identify talent, or are we trying to nurture and foster talent? T.P.R.S. works to nurture and foster.”

And here are Steve Bruno’s comments on the effect of T.P.R.S. on his kids’ scores:

“I now get more As and Bs [than before]. A few C+s and very few Cs. Let’s put it this way, in the past I’ve had to send between 20 and 25 interims/I reports (total 7 classes); this year, so far, I’ve sent just THREE! Of these, two had poor attendance; the other one is an L.A.C. student who is taking a language for the first time (Gr. 9).

Marks are also closer together. Anyone who has been teaching C.I. will understand why this is the case: Students feel more confident, less stressed and simply love T.P.R.S. They don’t like doing drills, or memorizing grammar rules, etc.

Here’s another example [of how comprehensible input has changed student behaviour]. Last year, I had a few students who on the day of an announced test (usually, with one week of warning) would suddenly become ill, and/or skip. Some of my L.A.C. students would have to write the test in a separate room. Others would show all sorts of anxiety as I handed out the tests. Many of these students would end up either failing the test or doing very poorly.

This year, I have the same students in my class, and the day of an unannounced test, they don’t have to go to another room, nobody complains and they just get down to it and, yes, they do quite well, thank you very much!”

OK, people…you want to report on how things are going with T.P.R.S.? Post some comments, or email.
