Assessment for learning

What should assessment and evaluation look like in the second-language classroom?

Numberz. Kids, parents, Adminz and teachers all want ’em. “What’s my mark?” asks Baninder. “How can Suzy bring her mark up?” asks Mrs Smith. “How do we get marked?” ask the keeners on Day 1.

Well, here we go, here are some ideas about formative assessment (seeing how people are doing during the learning process in order to guide instruction) and summative assessment (a.k.a. evaluation), where we assign a Number to a kid’s performance.


There are a few general principles:

A) We should never use norm-referenced (a.k.a. “curved”) grading, for reasons discussed here.

B) We should be doing criterion-referenced grading– i.e. there should be a rubric, or what have you, which clearly defines what needs to be done to get what mark. There are a bazillion rubrics for evaluating second-language writing, speaking etc. out there, from ACTFL guidelines to various State standards to things in texts– I won’t get into details, except to say that any evaluative tool should attempt to assess language use holistically, and should not include things like “students will use _____ verbs and _____ grammar structures.”

C) We should not mix up evaluation (a.k.a. summative assessment = numbers) and formative assessment (feedback). We need to see where learners are, and tailor teaching to what they can and can’t do. This is assessment, and we do not “mark” it, as per Rick Wormeli’s (and others’) ideas about “assessment for learning” (start here if you haven’t heard of this, then google away).

D) All evaluation and assessment practices should be explained to students. My kids have criteria in their course outlines, and we “mark” a couple of sample stories a month or so into the course. We do not do this to show kids “how to improve their work”– that can’t work for 95% of kids, because it’s conscious learning– but rather so they can feel how assessment and eval work, and feel included.

ASSESSMENT (formative)

Assessment: seeing how people are doing along the learning road in order to steer the class car.

In a comprehensible input classroom, assessment should primarily answer one question: “do the students understand what they are hearing/reading?”

During story asking, a teacher checks choral responses to do this. We can also ask individual kids flat out– “Johnny, what did I just say/ask?”– or we can do P.Q.A. (personalised questions and answers) where we ask students in class the same question we ask the actor. If our story has “the boy owned a horse,” we ask the actor “do you own a horse?” and he has to say “yes, I own a horse.” We might ask a few more questions– “Do you own a dinosaur?” and get an answer like “no, I do not own a dinosaur”– and then we ask our keener kids in class “do YOU, Mandeep, own a crocodile?”

If, as Blaine Ray says, we get strong responses from class, actors or individuals, they are understanding. If we get slow, wrong, weak, or no answers, we have to go back and clarify, because either

1.  they aren’t listening = no input = no acquisition, OR

2. they don’t understand = no comprehensible input = no acquisition

Ben Slavic has advocated using what he calls Jen’s Great Rubric (jGR), which basically evaluates how “tuned in” kids are. The rationale here can feel ambiguous. On one hand, it’s the old “if it’s not for marks, kids won’t do the work” thing: instituted by the teacher cos the work is so boring/hard that no sane kid would want to/be able to do it, so marks = carrot and stick. (But then maybe kids need that if The System prizes Numberz and Markzzz above all else.) On the other hand, if Johnny is failing because he is on his phone, zoned out, or otherwise disengaged, jGR is a great tool for the teacher to say to Johnny’s Mom “look– here is how he acts in class, i.e. he is not focused, and that is why his writing, speaking etc are weak.” Jury is out on this one, but lotsa folks like it.

In terms of writing assessment, as Leanda Monro, Adriana Ramírez and a zillion others have pointed out, explicit feedback (in terms of grammar) does very little. Leanda told me last year that the best thing she could do with her French kids’ writing was to ask for more detail. I have found the same: I can blather/write at length about verb tenses, adjective agreement, etc, but the kids simply don’t learn from this (Krashen and many others have repeatedly shown that we cannot transfer conscious knowledge into acquisition). What does work is writing something like ¿Cuántos hermanos tenía la chica? (How many brothers did the girl have?)

I have also found that kids make consistent writing errors– e.g. this year it took them a while to acquire quiero tener (“I want to have”)– and so after each story, the top five errors get circled more in the next story.

For speaking: good input = good output. However, Leanda and a few other French (and Chinese) teachers I’ve met have said that a bit of pronunciation work is necessary. This is because, for English speakers, the sound patterns of these languages are easy enough to screw up that, even if the output is rock-solid, seemingly minor pronunciation errors can totally throw it off. Chinese, with its subtle tones, and French, with its various “ay” sounds– é, è, ê etc– are easier for English speakers to botch than, say, Spanish.

Another thing we should not be doing is administering assessment without changes in instruction. The old pattern– present, practice, produce, quiz on Tuesday, test on Friday– is useless. Following a text or test series or a set of DVDs, dutifully collecting quiz samples, and expecting the kids to look their quizzes over and say “oh my, I clearly need to bone up on pronoun placement and the vocabulary for discussing French art” is a great strategy…for the kids who are getting 95% already.

So what should assessment look like?  It should

  • be comprehension-focused
  • be ongoing: during storyasking and reading, we check for comprehension
  • actually cause us to change what we are doing.  If kids don’t understand something, or make repeated errors, they need more input around that thing

EVALUATION (summative assessment)

One problem– err, I mean, opportunity– we have is that students are never at a fixed point in their acquisition. If they are getting a ton of good comprehensible input, they are acquiring (albeit not all at the same rate, or in the same way: Max may be picking up a few nouns from the most recent story, while Arabella’s brain is soaking up pronouns, or whatever). Students also “acquire” something, forget it, re-learn it, etc, in an ongoing, up-and-down process…so a “snapshot” of their skills is really not very useful or accurate.

For this reason, in my humble opinion, a student’s mark should always be based on their most recent output or skills. We should not be setting up “units” and assigning a mark per “unit.”

Why? Well, maybe Rorie finishes a “unit” on shopping for clothes, and she gets 60%, so goes back and re-reads dialogues or a story, or studies the grammar. And gets better as a result. Maybe also the teacher uses the shopping vocab for the rest of the year. But how does the teacher now assess Rorie? Say the teacher assesses via units (10% of the year per unit, over 6 units = 60% of year, plus final projects or exam(s) worth 40% of year, marks for everything evenly divided between speaking, listening, reading and writing), and by end of year Rorie rocks at shopping for clothes, do they discard her crappy shopping unit mark and give her only the final exam mark? If so, cool, but why then bother with unit marks in the first place?
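To make the arithmetic concrete, here is Rorie’s example as a minimal sketch in Python (all the numbers are invented):

```python
# A toy comparison (all marks invented): Rorie bombs the early "units"
# but is solid by the end of the course. Unit-based marking punishes her
# early weakness forever; most-recent marking reports what she can do now.

unit_marks = [60, 65, 70, 80, 90, 95]   # hypothetical marks, one per "unit"
final_exam = 93                          # hypothetical end-of-course performance

# Scheme 1: six units at 10% of the year each, plus a final worth 40%
unit_scheme = 0.10 * sum(unit_marks) + 0.40 * final_exam

# Scheme 2: the mark is the most recent demonstration of skill
most_recent = final_exam

print(f"unit-based mark:  {unit_scheme:.0f}%")   # 83% -- dragged down by old marks
print(f"most-recent mark: {most_recent}%")       # 93% -- what she can do *now*
```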

If the answer to “why bother with unit marks?” is “accountability,” you have a problem: marks are being used as carrot/stick (read: work is boring and/or not worth doing). I have argued that topical (sometimes called “thematic”) units are a bad idea– they tie grammar rules to vocab sets, they are boring, they are artificial, they overuse low-frequency vocabulary, they can present grammar that students are not ready to acquire– and they present assessment problems too.

Of course, parents, kids, Adminz and Headz will want a rough picture of how kids are doing, so it might not be all bad to have some kind of “rough progress” report. At my school, we are piloting a program where the kids get an interim report that offers feedback– neither numbers, nor just “good, OK, bad”– which teachers can customise. Mine gets the kids to evaluate themselves (to what extent do you listen for comprehension, ask for help, co-create stories, etc), and if I agree with their evaluations, then that’s what goes home.

My evaluation system this year was super-simple. After a story was asked, and its extended version read, and we did Movietalk around its structures, the kids had to do two things:

A) a speedwrite (5 mins) where they had to describe either themselves or a picture. Their course goal was 100 good words in 5 min. Their “mark” was 1/2 grammar (on a rubric out of 3) and 1/2 wordcount (out of 100). For the first 6 speedwrites, they got a bonus (40, then 35, then 30 etc), and after that no bonus.

(Note: the grammar rubric is out of 3 but is weighted the same as wordcount. A kid who gets 100 words and a 2/3 for grammar gets 83%: (100% + 67%) / 2.)
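Here is that arithmetic as a quick sketch (Python; capping the wordcount score at the 100-word goal is my assumption– the post doesn’t say what happens if words plus bonus exceed it):

```python
def speedwrite_mark(words, grammar, bonus=0, word_goal=100):
    """Speedwrite percentage: wordcount (plus any bonus, capped at the goal)
    averaged 50/50 with the /3 grammar rubric."""
    word_pct = min(words + bonus, word_goal) / word_goal * 100
    grammar_pct = grammar / 3 * 100
    return (word_pct + grammar_pct) / 2

# The example from the note: 100 good words, 2/3 for grammar, no bonus
print(round(speedwrite_mark(100, 2)))           # 83
# A typical first speedwrite: 25 words + 40-word bonus, 1/3 for grammar
print(round(speedwrite_mark(25, 1, bonus=40)))  # 49
```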

For their first speedwrite, they typically wrote 25 words, which with the 40-word bonus meant an average mark of 65% for words; the grammar mark (for the first) was 1/3, but it very rapidly climbed to about 2.2-2.5/3.

B) Relaxed write. For this, they had to re-tell (in writing) the most recent story, but they had to change details, include dialogue, etc. I marked these using grammar (/3) and wordcount (starting at 200 and going up by 50 each time) with no bonus. Their wordcount marks also went steadily up, and their grammar got better after the first 2 stories.

So, they had an “ongoing” mark which they could always improve on. I told them that “this is a rough guide to how well you are doing. You can improve, or you can stop paying attention (or miss a bunch of class), and your mark can drop.”

I entered marks into the spreadsheet every time we did a post-story writing assessment, and I’d post a printout, and I made them keep their relaxed writes and freewrites. They all got better with time and it was cool for them to “see” progress:  grammar marks were low for first 2 stories, then went up, and wordcounts steadily climbed.

For finals– with beginners– it was simple. They had two 5-min speedwrites (wordcount /100, plus a /3 grammar mark) and one 45-min story (wordcount /800, plus a /3 grammar mark). These were combined. They had one listening assessment– dictation, where they listened, wrote and translated– and their reading assessment was to go back to stories we’d done and answer questions. Final mark: 100% based on the final exam = 1/3 writing, 1/3 reading and 1/3 listening. Also, any kid who wanted to re-do their exam could do that, no problem.

This system was almost as good as it could be. The kids knew what they had to do, the work was easy, there were no surprises, and even the weakest ones were able to do well (writing functional 300-400 word stories in 3 verb tenses including dialogue), while at the top end Shayla, Manpreet, Khubaib and Jaskarn pumped out amazingly good 600-800 word stories. (Interestingly, I had equal numbers of strong (and weak) students of both genders).

The only things I am going to change next year are:

  • I am going to use a more complex rubric for marking final writing. This is mainly because the one I used this year does not adequately distinguish complexity from simplicity. Some kids write a sentence like Juan quería las chicas guapas (“John liked pretty girls”), while others write Juan quería las chicas guapas que tenían perros azules (“John liked pretty girls who had blue dogs”). In both cases, good Spanish, but the second kid is clearly a notch up.
  • I am going to give them one text they have not yet seen (for reading) and get them to answer comprehension questions on that.

With my 2nd years, I’ll do a speaking assessment (3-min interview) and I’ll also do a couple of culture projects, plus Adriana’s movie idea.

So…what should evaluation look like? It should be

— holistic
— based on doing what the kids have done and reading what they have read during the course (no “gotcha” surprises).
— focused on interaction with meaningful whole language (no grammar testing)
— a picture of the kids at their best: at the end of the course, when they have had a TON of good comprehensible input

Progress report: Ramírez Book at 16 weeks

I’ve been using my colleague Adriana Ramírez’ Learning Spanish With Comprehensible Input Through Storytelling. We are now 16 weeks into the semester, and finals will be in 2 days. We just finished asking and reading the 8th story– a story which I asked in present tense, but whose reading is in full mixed past tenses (totally unsheltered grammar).

Here are the stats from the most recent exam:

A) speedwrite: wordcount down slightly to around 75, and grammar down slightly to around 2.2/3.

B) relaxed write (retell the most recent story, with variation): wordcount average around 450, and grammar around 2.5/3.

DISCUSSION

First, I was anticipating a total mess with the verbs, as all stories till now had been in present tense. However, it was not nearly as bad as I had expected. The slower processors had problems oscillating between tenses, but 2/3 of the kids did fine. The dialogues in the stories were fine too– most dialogue is in present tense anyway, so no big changes.

This year, because of the strike, our semester was two weeks shorter. I went reasonably quickly, but I think with two more weeks I could have gotten through at least one more story (and more reading) = more exposure to past tenses and other grammar stuff.

Second, wordcount and grammar for the speedwrite (they had 5 mins to describe a picture of a girl waking up to an alarm clock) went down. This is because a picture is somewhat more ambiguous than a story, and I felt like, for a lot of kids, the new vocab (past-tense verbs) was freshest in their minds, so they just kind of threw it in there. I am not super-worried about this, as with time– and a lot of pop-ups– things will clarify.

Third, wordcount stopped going up for relaxed writes. This was, I think, mostly to do with processing the new grammar (past tense). There were a lot of crossed-out verbs etc on the faster processors’ papers– their conscious minds were kicking in– while with the slower ones there were more mistakes.

The book has worked pretty well. The stories are generally good (though I can’t make some of them work, e.g. the Hawaiian Genie…which, ironically, Adriana tells me is her kids’ favorite– go figure), and the vocab is nicely organised to recycle. Adriana tells me the next step is to publish a second edition with past-tense (and other verb tense) versions, but that’s a ways off.

Overall, the book is pretty good. It seems better organised than the Blaine Ray books (Look, I Can Talk has great ideas, but I have not been able to make them work for me). Gaab’s Cuéntame series is also good, but to me the exercises seem like overkill (however, this is what a lot of beginning TPRS teachers need– structure).

Honestly, the only thing I think Adriana’s book needs is slightly more variety (and dialogue) in the readings. To me, the basic story and the extended reading are too similar…but this is also the book’s strength: it recycles vocab. I would have included more dialogue (in written-out form), but Adriana ran into limits like space and printing costs. These are minor quibbles– it’s a great program, and she is working on her Level 2 book. If ppl wanna order it, you can contact Adriana via Twitter, where she is @veganadri, or get it off Amazon.

Next semester I have a Spanish 1 and 2 split and so I am going to do Adriana’s stories but in full mix (totally unsheltered grammar) from Day 1. I am doing this because

A) I want to see what works best: full unsheltered grammar, or sheltered grammar

B) my 2nd years have already had past tense (and subjunctive) exposure

C) If I have a focused curriculum that restricts vocab, and I do lots of pop-ups, and we do a lot of reading, and I make a real effort to do a TON of PQA and actor questions, I think last year’s problem– verb tense muddles– should be smaller.

This is the great pleasure of T.P.R.S.: I can never step twice into the same story.

What grades should kids get? Notes on evaluation for the mathematically-challenged.

Here is part of a post from Ben’s. A teacher– let’s call him Mr John Speaking– who uses T.P.R.S. in his language class writes:

“I was told by a Defartment Chair a few weeks ago that my grades were too high across the board (all 90s/100s) and that I needed more of a range for each assessment. Two weeks later I had not fixed this “problem” and this same Defartment Chair pulled me out of class and proceeded to tell me, referencing gradebook printouts for all my classes, that these high grades “tell me there is not enough rigor in your class, or that you’re not really grading these assessments.” After this accusation, this Defartment Chair told me I was “brought on board [for a maternity leave replacement] in the hopes of being able to keep me, but that based on what he’d seen the past few weeks, I’m honestly not tenure track material.”

Obviously, Mr John Speaking’s Defartment Chair is an idiot, but, as idiots do, he does us a favour:  he brings up things worth thinking about.

There are two issues here:

a) Should– or do– student scores follow any predictable distribution? I.e., should there be– or are there– a set percentage of kids in a class who get As, Bs, Cs, Ds and Fs?

b) How do you know when scores are “too low” or “too high”?

Today’s question: what grades should students get?

First, a simple, math idiot’s detour into grading systems and stats.  The math idiot is me.  Hate stats?  Bad at math? Read on!  If I can get it, anyone can get it!

It is important to note that there are basically two kinds of grading systems: criterion-referenced grading and curved (norm-referenced) grading.

First, we have criterion-referenced grading. That is, we have a standard: to get an A, a student does X; to get a B, a student does Y; etc. For example, we want to see what our Samoyed Dogs’ fetching skills are and assign them fetching marks. Here is our Stick Fetching Rubric:

A:  the dog runs directly and quickly to the thrown stick, picks it up, brings it back to its owner, and drops it at owner’s feet.

B: the dog dawdles on its way to the stick, plays with it, dawdles on the way back, and doesn’t drop it until asked.

C: the dog takes seemingly forever to find the stick, bring it back, and refuses to drop it.

So we take our pack of five Samoyed Dogs, and we test them on their retrieval skills. Max, who is a total idiot, can’t find the stick forever, then visits everyone else in the park, then poos, then brings the stick back an hour later but won’t drop it because, hell, wrestling with owner is more fun. Samba dutifully retrieves and drops. Rorie is a total diva and prances around the park before bringing the stick back. Arabella is like her mother, Rorie, but won’t drop the stick. Sky, who is so old he can remember when dinosaurs walked the Earth, goes straight there, gets the stick, and slowly trudges back. So we have one A (Samba), one B (Rorie), one C (Arabella), one C- (Max– we mercy-passed him) and one A- (Sky, cos he’s good and focused, but slow).


Now note–

1. Under this scheme, we could theoretically get five As (if all the Dogs were like Samba), or five Fs (if everybody was as dumb and lovable as Max).  We could actually get pretty much any set of grades at all.

2.  The Samoyed is a notoriously hard-to-train Dog.  These results are from untrained Samoyeds.  But suppose we trained them?  We used food, praise, hand signals etc etc to get them to fetch better and we did lots of practice.  Now, Sky is faster, Rorie and Arabella don’t prance around the park, and even silly Max can find the stick and bring it.  In other words, all the scores went up, and because there is an upper limit– what Samba does– and nobody is as bad as Max was at fetching, the scores are now clustered closer together.

The new scores, post-training, are:

Sky and Samba: A

Rorie, Max and Arabella: B

Variation, in other words, has been reduced.

3.  Suppose we wanted– for whatever reason– to lower their scores.  So, we play fetch, but we coat the sticks in a nasty mix of chocolate and chili powder, so that whenever the Dogs get near them, they get itchy noses, and very sick if they eat them.  The Dogs stop wanting to fetch our sticks.  Some of them will dutifully do it (e.g. Samba), but they aren’t idiots, and so most of them will decide to forget or ignore their training.

4. Also note who we don’t have in our Dog Pool: Labrador Retrievers (the geniuses of the fetching world) and three-legged Samoyeds. There are no Labs because they are three orders of magnitude better than Samoyeds at fetch, and we don’t have three-legged Samoyeds because, well, they can’t run.

In other words, we could reasonably get any mix of scores; we could improve the scores; or we could– theoretically– lower them. Also, we don’t have any Einstein-level retrievers or, uhh, “challenged” retrievers– there are no “outliers.”

Now, let’s look at “bell curve” (a.k.a. norm-referenced) grading. In this case, we decide– in advance– how many of each score we want to assign. We don’t want any random number of As or Fs or whatever– we want one A, one F, etc. We want the scores to fit into a bell curve.


We are saying “we want a certain # of As, Bs, Cs, Ds and Fs.” Now, we have a problem. In our stick-fetching example above, we got an A, an A-, a B, a C and a C-. We have no Ds or Fs, because all of the Dogs could perform. None of them were totally useless. (After doing some training, we would get two As (Samba, Sky) and three Bs (Rorie, Max and Arabella).) But if we have decided to bell-curve, or norm-reference, our scores, we must “force” them to fit this distribution.

So Samba gets an A, Sky gets a B, Rorie gets a C, Arabella gets a D, and Max fails.

Now, why would anyone do this? The answer is simple: norm referencing is just a way to sort students into ranks, where the only thing that matters is where each person stands relative to the others. We are not interested in being able to say “in reference to criteria ____, Max ranks at C.” All we want to do here is say where everyone is on the marks ladder compared to everyone else.

Universities, law schools, etc sometimes do this, because they have to sort students into ranks for admissions, hiring, next-level qualification, etc etc. For example, the law firm Homo Hic Ebrius Est goes to U.B.C. and has 100 students from which to hire their summer slav– err, articling students. If they can see bell-curved scores, they can immediately decide not to interview the bottom ___ % of the group, etc. Which U.B.C. engineers get into second-year Engineering? Why, the top 40% of first-year Engineering students, of course!

Now I am pretty sure you can see the problem with norm referencing: when we norm-reference (bell-curve), we don’t necessarily say anything about what students actually know or can do. In the engineering example, every student could theoretically fail…but the people with the highest marks (say between 40 and 45 per cent) would still be the top ones and get moved on. In the law example, probably 95% of the students are doing very well, yet a lot of them won’t be considered for hire. Often, bell curves generate absurd results. For example, with the law students, you could have an overall mark of 75% (which is pretty good) but be ranked at the bottom of the class.

So where does the idea for norm referencing (“bell curving”) student scores come from? Simple: the idea that scores should distribute along bell-curve lines comes from a set of wrong assumptions about learning and about “nature.” In Nature, lots of numbers are distributed along bell-curve lines. For example, take the height of adult men living in Vancouver. There will be a massive cluster who are within two inches of 5’11” (from 5’9″ to 6’1″). There will be a smaller # who are 5’6″ to 5’8″ (and also who are 6’1.5″ to 6’3″). There will be an even smaller number who are shorter than 5’6″ or taller than 6’3″. Get it? If you graphed their heights, you’d get a bell curve.


If you graphed adult women, you’d also get a bell curve, but it would be “lower” as women (as dating websites tell us) are generally shorter than men.

Now– pay attention, this is where we gotta really focus– there are THREE THINGS WE HAVE TO REMEMBER ABOUT BELL CURVES:

a) Bell curve distributions only happen when we have an absolutely massive set of numbers. If you looked at five men, they might all be the same height, short, tall, mixed, whatever (i.e. you could get any curve at all). But when you up your sample to a thousand, a bell curve emerges.

b) Bell curve distributions only happen when the sample is completely random. In other words, if you sampled only elderly Chinese-born men (who are generally shorter than their Caucasian counterparts), the curve would look flatter and the left end would be higher. If you didn’t include elderly Chinese-born men, the curve would look “pointier” and the left end would be smaller. A bell curve emerges when we include all adult men in Vancouver. If you “edit out” anyone, or any group, from the sample, the distribution skews.

c) Bell curves raise one student’s mark at the expense of another’s. When we trained our Samoyed Dogs, then marked them on the Stick Fetching Rubric, we got two As and three Bs. When we convert this into a curve, however, each point on the curve can only have one Dog on it. Or, to put it another way, each Dog gets a different mark, no matter how well they actually do. So, our two As and three Bs become an A, a B, a C, a D and an F. If Rorie gets a B, that automatically (for math-geek reasons) means that Max will get a different mark, even if they are actually equally skilled.

As you can see in (c), bell curves are absolutely the wrong thing to do with student marks.
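Here is point (c) as a runnable sketch (Python; the numeric fetching scores are invented, since the text above only assigns letter grades):

```python
# The same five (hypothetical) post-training scores, graded two ways.
# Criterion grading maps each score to a grade independently; curving hands
# out exactly one each of A/B/C/D/F by rank, however close the scores are.

scores = {"Samba": 95, "Sky": 91, "Rorie": 86, "Arabella": 85, "Max": 84}

def criterion_grade(score):
    for cutoff, grade in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if score >= cutoff:
            return grade
    return "F"

print({dog: criterion_grade(s) for dog, s in scores.items()})
# {'Samba': 'A', 'Sky': 'A', 'Rorie': 'B', 'Arabella': 'B', 'Max': 'B'}

# Norm-referenced ("curved"): one grade per rank slot, highest score first.
quota = ["A", "B", "C", "D", "F"]
ranked = sorted(scores, key=scores.get, reverse=True)
print({dog: quota[rank] for rank, dog in enumerate(ranked)})
# {'Samba': 'A', 'Sky': 'B', 'Rorie': 'C', 'Arabella': 'D', 'Max': 'F'}
```

One point apart (85 vs 84), and Arabella gets a D while Max fails.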

And now we can address the issues that Mr John Speaking’s Defartment Head brings up.  Mr Defartment Head seems to think that there are too many high marks, and not enough variation within the marks.

First, there is no way one class– even of 35 kids– has enough members to form an adequate sample size for a bell-curve distribution. If Mr Defartment Head thinks, “by golly, if that damned Mr John Speaking were teaching rigorously, we’d have only a few As, a few Ds, and far more Bs and Cs,” he’s got it dead wrong: there aren’t enough kids to make that distribution anything like inevitable. Now, it could happen, but it certainly doesn’t have to.

Second, Mr John Speaking does not have a statistically random selection of kids in his class. First, he probably doesn’t have any kids with special challenges (e.g. severe autism, super-low I.Q., deafness, etc etc). BOOM!– there goes the left side of the bell curve, and up go the scores. He probably also doesn’t have Baby Einstein or Baby Curie in his class– those kids are in the gifted program, or they’ve dropped out and started hi-techs in Silicon Valley. BOOM!– there goes the right side of your curve. He’ll still have a distribution, and it could be vaguely bell-like, but it sure won’t be a classic bell curve.

Or he could have something totally different. Let’s say in 4th block there are zero shop classes and zero Advanced Placement calculus classes. All of the kids who take A.P. calculus or shop– and who also take Spanish– therefore get put in Mr Speaking’s 4th-block Spanish class. So we now have fifteen totally non-academic kids and fifteen college-bound egg-heads. Mr Speaking, if he used poor methods, could get a double-peaked curve: a bunch of scores clustering in the C range, and another bunch in the A range, with fewer Bs and Ds.

Third, instruction can– and does– make a massive difference in scores. Remember what happened when we trained our Samoyeds to give them mad fetching skillz, yo? Every Dog got better. If Mr Speaking gave the kids a text, said “here, learn it yourself,” then put his feet up and did Sudoku on his phone or read the newspaper for a year (I have a T.O.C. who comes in and literally does this), his kids would basically suck at the language (our curve just sank down).  On the other hand, if he used excellent methods, his kids’ scores would rise (curve goes up).  Or, he is awesome, but gets sick, and misses half the year, and his substitute is useless, so his kids’ scores come out average.  Or, he sucks, gets sick, and for half the year his kids have Blaine Ray teaching them Spanish, so, again, his kids’ scores are average:  Blaine giveth, and Speaking taketh away.

“Fine,” says the learned Defartment Chair, “Mr John Speaking is a great teacher, and obviously his students’ scores are high as a result of his great teaching, but there should still be a greater range of scores in his class.”

To this, we say a few things:

a) How do we know what the “right” variability of scores is? The answer: there is no way of knowing without doing various kinds of statistical comparisons. This is because it’s possible that Mr Speaking has a bunch of geniuses in his class. Or, wait, maybe they just love him (or Spanish) and so all work their butts off. No, no, maybe they are all exactly the same in I.Q.? No, that’s not it. Perhaps the weak ones get extra tutoring to make up for their weakness. Unless you are prepared to do– and have the data for– least-squares regression analysis, you are not going to have the faintest idea what the scores “should” be.

b) Score variability is reduced by effective teaching. There are zillions of real-world examples where appropriate, specific instruction reduces the variation in performance. Any kid speaks their native language quite well. Sure, some kids have more vocab than others, but no two Bengali-speaking (or English-speaking) ten-year-olds are significantly different in their basic speaking skills. 95% of drivers will never have an accident worse than a minor parking-lot fender-bender. U.S. studies show that an overwhelming majority of long-gun owners store and handle guns properly (the rate is a bit lower for handgun owners). Teach them right, and– if they are paying attention– they will learn.

Think about this.  The top possible score is 100%, and good teaching by definition raises marks.  This means that all marks should rise, and because there is a top end, there will be less variation.
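A toy numerical version of that claim (Python; the marks and the flat 25-point “teaching effect” are invented):

```python
# Ceiling effect: raise every (hypothetical) mark by 25 points, cap at 100.
# The mean goes up and the spread (standard deviation) goes down.
import statistics

before = [45, 55, 65, 75, 85]
after = [min(mark + 25, 100) for mark in before]

print(before, "->", after)  # [45, 55, 65, 75, 85] -> [70, 80, 90, 100, 100]
print(statistics.mean(before), "->", statistics.mean(after))  # 65 -> 88
print(round(statistics.stdev(before), 1), "->",
      round(statistics.stdev(after), 1))                      # 15.8 -> 13.0
```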

Most importantly, good teaching works for all students. In the case of a comprehensible input class, all of the teaching works through what Chomsky called the “universal grammar” mechanism. It is also restricted in vocab, less (or not) restricted in grammar, and the teacher keeps everything comprehensible and focuses on input. This is how everyone learns languages– by getting comprehensible input– so it ought to work well (tho not to exactly the same extent) on all learners.

Because there is an upper end of scores (100%), because we have no outliers, and because good teaching by definition reaches everyone, we will have reduced variation in scores in a comprehensible input class.

So, Mr Speaking’s response to his Defartment Head should be “low variation in scores is an indication of the quality of my work. If my work were done poorly, I would have greater variation, as well as lower marks.” High marks plus low variation = good teaching. How could it be otherwise?

In a grammar class, or a “communicative” class, you would expect much more variation in scores. This is because the teaching– which focuses on grammar and/or output, and downplays input– does not follow the brain’s language-acquisition rules. How does this translate into greater score variation?

a) Some kids won’t get enough input– or the input won’t be comprehensible enough– and so they will pick up less. Now you have more low scores.

b) Some kids will be OK with that.  Some kids won’t, and they’ll do extra work to catch up.  Result: variation in acquisition.  Now, there will be a few high scores and more low ones.

c) Some kids will hate speaking, and so will do poorly on the speaking assessments, which will increase variation.

d) Many kids don’t learn well from grammar teaching, so in a grammar-focused class, you’d expect one or two As, and a lot of lower marks.

e) If the teacher is into things like “self-reflection on one’s language skills and areas for growth” or suchlike edubabble, and the kids are supposed to go back and rework/redo assignments, things could go either way. If, for example, they re-do a dialogue from the start of the course at the end, they might– if the vocab has been recycled all year– do better. If, however, it’s the “check your grammar” stuff, you’d again expect variation: only a very few kids can do that, even if their language skills have grown during the year.

And, of course, there is the “grammar bandwidth” problem: any effort to focus on a specific aspect of grammar means that other areas suffer, because our conscious minds have limited capacity. A District colleague told me that, for Level 5 (grade 12) French, the kids self-edit portfolio work. They have an editing checklist– subject-verb agreement, adjective agreement, etc– and they are supposed to go and revise their work.

The problems with this, of course, are two: in their mad hunt for subject-verb errors, the kids will miss other stuff, and we know that little to no conscious learning makes it into long-term memory.

Some real-life examples of how good instruction narrows variation in scores:

At Half-Baked School, in the Scurvy School District (names have been changed to protect the guilty), TPRS teacher Alicia Rodriguez has Beginning Spanish. So does her Defartment Chair, Michelle Double-Barreled. When, at the end of the semester, they have to decide on awards– who is the best Beginning Spanish student?– Alicia has 16 kids getting an A, another 12 getting a B, and two getting a C+. None fail. Michelle Double-Barreled has one kid getting an A, a bunch of Bs and Cs, a couple of Ds, and a few failures.

What this means is, 16 of Alicia’s kids can

a) write 100 excellent words in Spanish in 5 min, on topics ranging from “describe yourself” to “describe [a picture].”

b) Write a 600-1000 word story in 45 min.

Both will have totally comprehensible, minor-errors-only Spanish.

Michelle Double-Barreled, on the other hand, has one A. Her “A” kid can

a) do grammar stuff

b) write a 100-word paragraph on one of the topics from the text (e.g. shopping, eating in restaurant, sports s/he plays, family).

This will be not-bad Spanish.

Now, who’s doing a better job?  Alicia has more kids doing more and better work.  Michelle has a classic bell-curve distribution.  According to Mr John Speaking’s Defartment Chair, Mrs Double-Barreled has a “normal” range of scores.  Yet Alicia is clearly getting her kids to kick major butt. Hmm…

The point is, with appropriate and effective instruction– good Dog training, or good Spanish teaching– we are going to get a cluster of generally higher scores.  Poor or no teaching might produce something like a bell curve.

So…what do T.P.R.S. and other comprehensible input methods do for student outcomes?

In my class, T.P.R.S. did the following:

a) all scores rose.

b) the difference between top and bottom scores (variation) decreased.

c) I.E.P. kids all passed.

d) First-year kids in second-year classes did about 85% as well as second-year kids, despite having missed a year of class.

e) In terms of what the kids could actually do, it was light-years ahead of the communicative grammar grind. Kids at the end of 2nd year were telling and writing 400-600 word stories in 3-5 verb tenses, in fluent and comprehensible (though not perfect) Spanish. Oral output was greater in quality and quantity too.

f) Nobody failed.

My colleague Leanda Monro (3rd year French via T.P.R.S.) explains what T.P.R.S. did in her classes:

“[I saw a] huge change in overall motivation. I attribute this to a variety of well-grounded theories, including “emotion precedes cognition” (John Dewey), Krashen’s affective filter, and the possible power of the 9th type of intelligence, drama and creativity (Fels, Gardner). There is a general feeling of excitement, curiosity, eagerness to speak French, incorporation of new vocabulary, spontaneous speech.

All but one student has an A or a B. The one student in the C range has significant learning challenges, and despite excellent attendance in all courses is failing both math and English. No one is failing.

[There was] far less variation. Overall, far greater success for all students. My contribution to the “Your scores are too high” comment is this: As educators we need to pose an important question:  Are we trying to identify talent, or are we trying to nurture and foster talent?  T.P.R.S. works to nurture and foster.”

And here are Steve Bruno’s comments on the effect of T.P.R.S. on his kids’ scores:

“I now get more As and Bs [than before]. A few C+s and very few Cs. Let’s put it this way: in the past, I’ve had to send between 20 and 25 interims/I-reports (total 7 classes); this year, so far, I’ve sent just THREE! Of these, two had poor attendance; the other one is an L.A.C. student who is taking a language for the first time (Gr. 9).

Marks are also closer together. Anyone who has been teaching with C.I. will understand why this is the case: students feel more confident, less stressed, and simply love T.P.R.S. They don’t like doing drills, or memorizing grammar rules, etc.

Here’s another example [of how comprehensible input has changed student behaviour]. Last year, I had a few students who, on the day of an announced test (usually with one week of warning), would suddenly become ill and/or skip. Some of my L.A.C. students would have to write the test in a separate room. Others would show all sorts of anxiety as I handed out the tests. Many of these students would end up either failing the test or doing very poorly.

This year, I have the same students in my class, and the day of an unannounced test, they don’t have to go to another room, nobody complains and they just get down to it and, yes, they do quite well, thank you very much!

OK, people…you want to report on how things are going with T.P.R.S.? Post some comments, or email.

What is T.P.R.S.’ Sequence of Instruction?

Now that I have been using Adriana Ramírez’ Learning Spanish With Comprehensible Input Through Storytelling for 10 weeks, I thought I’d show how I use the text. At any point, if there is extra time, or we are bored, we take out our novel– Berto y sus Buenas Ideas, or whatever– and we read, guided and questioned by me, for 5-15 min.

Adriana’s teacher book has the historia básica– the story version we ask– and the preguntas personalizadas, along with a short list of the grammar “points” introduced in each story.

A) Photocopy the historia básica and the preguntas personalizadas and give the kids each a copy.  I give my kids the historia básica in photocopy form because I want them to re-read a simple version of the story.  The historia extendida and the comprehension questions are in the student book.

B) Establish meaning: have kids write down the Spanish words and English meanings in the student books.

C) Ask the story, sticking fairly close to the historia básica. Add 1-2 parallel characters. Have 1-2 actors for the main story, and have the parallel characters sit at their desks (with one prop each) to identify them. The beginning is always establishing lots of details about the characters.

D) Personalised questions and answers (PQA): ask the faster processors in class (just regular kids sitting there) the questions you ask the actors. Do this AFTER each actor has said his/her answer. E.g. If you narrate “the boy wants to speak Spanish,” ask the actor “do you want to speak Spanish?” Then ask the kids “do YOU want to speak ____?” For this I use whatever I ask actors plus the preguntas personalizadas in the teacher’s book (the kids also have copies of these).

E) When done, ask a thousand comp questions. Does the boy want to own a Ferrari? Does the girl want 10 blue cats or 20? I read sentences from the historia básica aloud and ask questions, and I also throw a TON of PQA into this. I will generally do the comp questions around the historia básica that I’ve copied and given them– I have found that another, very simple, re-reading of more or less exactly what was asked helps a lot.

F) Spend one block (75 min) reading the historia extendida aloud, asking zillions of questions, doing PQA, etc. This takes a while, as the historia extendida typically has a bunch of new vocab (typically 15 or so words not in the asked/básica version of the story).

G) Do ping-pong reading of the historia extendida for about 15 min. Then give them 20 min to write the answers to the comprehension questions in the student book. I collect these and mark 3 questions/student for comprehension.

H) At this point, Adriana gives them one period to practise and perform the story– changing only names and places– but I have ditched this, because the kids give me crappy output, and retells do not seem to boost acquisition. Adriana is convinced it works– it definitely works for her and her kids– but I have not figured this out yet. I’ll keep ppl posted, as hopefully Adriana can walk me through this for the 37th time (I am not a smurt guyy).

This is where I do MovieTalk and PictureTalk (Ben Slavic’s “Look and Discuss”). I will picturetalk 1-3 images that support the vocab from our story, and I’ll movietalk one video that does the same.

I) For homework, they have to either draw a 12-panel comic of the story, or copy and translate the story (the historia extendida). This is “deep reading” that really focuses them in on the story.

J) I sometimes “re-ask” the basic story super-quickly at some point (much less circling).

K) Test. First, speedwrite: they must write as many words as they can in 5 min. The topic will be either 1. describe yourself or 2. describe a picture I put on the overhead (this picture will be of a person who has possessions or characteristics of a character in the story).

Then we have a 5-min brain break.

Second, relaxed write. They have 35 min to re-write the story. They need 2 characters minimum, 4 dialogues central to the story, and they have to “twist” the story after our 3rd story. For the first two, they can just re-write the story. After that, they have to substantially change the story details.

L) I then give them the vocab etc (see A) for our next story.

Test and introducing new vocab takes 1 block.

NOTES:

1. If the kids like whatever we are doing or reading, and/or PQA takes off, I’ll spend as long as I can on this. If they are in the target language, and they understand, and there are zillions of reps, they are learning. Remember what Papa Blaine said: “My goal is to never finish a story.”

2. Another AWESOME thing to throw in is fake texts– easy to generate and personalise/customise for each story– kids like the visuals, and you get loads more reps on the dialogue (dialogue is the hardest thing to get reps on). Just google “fake text generator” or try this one for iPhone texts.

3. Each class begins with me circling the date, day, month, time and weather for about 1 min. This means that by the end of the five-month semester, kids will know all the weather expressions, #s 1-30, days of the week, etc.

4. It’s crucially important to remember that you must do what works for you and your kids. Adriana, Natalia, I, and everyone I know who uses this book (and T.P.R.S. in general) all use it differently. T.P.R.S. itself is now different from what Blaine Ray created– he himself continues to modify the method– so do your thing. As I told Adriana, her excellent book is a platform from which Spanish teaching launches. Adriana does retells; I don’t; both of us do assessment slightly differently; etc.

Ok there you have it, what I do.

How well is Adriana Ramírez’ book working so far?

This year I decided to go in for a more classical, purely story-based T.P.R.S. than what I began with– what Ben Slavic described as “the freewheelin’ c.i.” I am using my colleague Adriana Ramírez’ Teaching Spanish Through Comprehensible Input Storytelling text. This is a set of 16 stories. You get a vocab list, a basic story, an extended reading, story comprehension questions and personalised questions. The thing was loosely designed to “piggyback” on Avancemos, the Spanish text our District adopted, but it stands alone too.

Today’s question: how well is Adriana’s book working?

1) Great.

2) I am almost done my 4th story– “Cambio de Pelo”– and these are my results:

a) for speedwrites (“write as many words as you can in 5 min”) I am alternating topics. For even-numbered stories, the speedwrite assignment is “describe yourself.” For odd-numbered stories, the assignment is “describe a picture on the overhead” (the picture will have something to do with the just-asked story).

Word count averages for speedwrites are as follows:

— story 1: 25 words + 45-word bonus = 70% average

— story 2: 43 words + 40-word bonus = 83% average

— story 3: 50 words + 35-word bonus = 85% average

In terms of grammar, every kid– except those who miss 2-3 classes– is getting at least 2/3, and over 1/2 are getting 3/3. Out of 30 kids, only 3 have “bombed” in terms of grammar, and in each case their subsequent mark went way up. I.e. a kid who misses a bunch of classes, does the test, and bombs will do much better later on (on the test after the next story), because the stories recycle all the grammar and vocab.

Word count averages for “relaxed writes” (“rewrite the story, or modify it, or make up your own, and include 2 main characters and at least 2 dialogues”):

— story 1: ~80 words (they totally sucked– average grammar mark 1/3)

— story 2: ~130 words (much better– average grammar mark 2/3)

— story 3: ~180 words (better again– class evenly split between 2/3 and 3/3 for grammar mark)

Oral output:

The system for “teaching” kids to talk in T.P.R.S.– a.k.a. P.Q.A. (personalised questions and answers)– is super-simple: you basically ask members of the class the questions you ask your actors. So, in the first story, you ask your actor “what is your name?” and s/he says “My name is ____.” Because s/he doesn’t know any Spanish, you write it on the board, and they can just read off the board. You then ask them “is your name ____?” and they say “No, my name is _____.” You then ask your parallel character(s) the same question(s). Then– after the audience has heard it a bunch of times from the actors– you ask the members of the class, starting with the keeners, the same question. Initially, the keeners will be able to spit it out right away in sentence form, while other kids will just say “John.”

After 5 weeks x 5 classes/week = 25 classes, 4/5 of the kids can now unhesitatingly and fluently answer these questions:

— what is your name? how old are you? where do you live? are you a [boy, girl, cat…]? Are you [tall, short, crazy…]?

— do you like _____? [about 15 verbs and 15 nouns to choose from]

— what’s the weather, day, date?

— what are you like? (i.e. describe yourself)

— do you prefer ___ or ___?

— do you have ____?

The other 1/5 of class (the slower-acquirers) ALL understand the questions, and all can say something— even if it’s just one word– that makes sense. E.g. “What’s the weather like?” — “Cold.”

3) Why is it working, and what would I change?

First, it’s working cos it restricts (shelters) vocab– I have done a rough count, and the kids get about 3 new words/day on average. Second, the extended reading closely mirrors the story asked. Third, the comp questions force re-reading, and fourth, I am liking Adriana’s comic idea.

Update on the comic: for the comic, after we have done the extended reading (teacher-guided, and ping-pong), the kids have to create a 12-panel comic that illustrates the story. It has to look awesome– clip art etc fine– with colour; each panel must have at least one sentence; and the comic must include all dialogue. This time, I also added a translation option: copy the story– by hand– then translate underneath in a different colour, then leave a blank line (to keep it neat) and indent all dialogue. I am gonna see how the translation works, but the comic rationale is, it’s deep reading: kids have to re-read, select, and illustrate (read: concise focus). Adriana says it works best for the laggard boys, and I have to agree.

My changes: First, my kids are 90% Indian, so English is often their 2nd language, and almost none of them hear English at home. Our kids read, and are literate, but lack some of the linguistic mental infrastructure that Adriana’s (rich, white and Asian, educated) kids have. So they need MUCH more reading practice than Adriana’s, and I make them read BOTH the basic script– the story I ask, which I photocopy and hand out– AND the extended one in Adriana’s book. Second, I am varying the speedwrites (5 mins) as noted above. Third, my kids don’t always get the comprehension questions, so I have to go through them. E.g. on the last story, one question was ¿Dónde vive el chico? (Where does the boy live?) and the kids all answered with “Vivo en Colombia” (I live in Colombia). Fourth, the retells don’t work. I am getting junky output from the kids, so I am putting the kibosh on retells for a while until I figure out a better way to do this.

Anyway, overall, the program is working well, and I both recommend it and am gonna stick with it. If ppl want to try it, email Adriana (ramirez_a (attt) surreyschools (dottt) ca) or hit her up on Twitter: @veganadri.

How do I mark writing? Blaine Ray’s ideas, slightly modified

Stephen B.– who after twenty years of traditional grammar teaching jumped headfirst into C.I. (how bad-assed is THAT?)– writes:

“Anyways, I had a question about grading speed/timed writings. I know Blaine says one point per word, and he talked about scaling and not marking for accuracy. However, what does one do in this situation: for example, if after a 5-minute speedwrite one student writes 85 words but is all over the place and makes several grammatical and spelling errors, and another student only writes 60 words but it is almost perfect, how can I give the former student a higher mark?”

Here is how I do it (with many suggestions from Adriana Ramírez, whose Teaching Spanish Through Comprehensible Input text I am using this year). First, classic TPRS in a story cycle, with movietalk and picturetalk:

A) story cycle: establish meaning, ask story (with a few parallel characters), review, retell, read a couple of versions of story

B) have the kids create a comic of the story. The story must be “complete,” but obviously not everything can be put into the comic. It must be coloured, look awesome, etc (clip art fine). Each panel must

  • have at least one Spanish sentence
  • have perfect alignment between Spanish and pictures
  • have “thought bubbles” in the first person where there is no dialogue
  • have dialogue where appropriate

This will make the kids read, choose sentences, and clarify meaning via illustrating.

C) movietalk and picturetalk to support story structures (e.g. if you taught “wants,” movietalk and picturetalk a video where a person wants something)

Then, for assessment, I am doing the following:

1) When the “story cycle” (A-C above) is done, kids will do a five-minute speedwrite and a forty-minute relaxed write.

Their first speedwrite topic will be “describe yourself.”

The speedwrite is evaluated in 2 ways:

First, wordcount. Kids count the # of words in their composition (tell them no lists; or, if they want a list, they must describe all the things in the list). End-of-year goal: 100 good words in 5 min. For their first speedwrite, they get a 40-word bonus. So if they write 30 words, their wordcount score is 70/100.

Second, they get a grammar mark out of 3, thus:
1– it’s full of mistakes and largely incomprehensible
2– it’s mostly comprehensible but has some “whaaat?” moments and “feels” junky
3– it’s fully comprehensible, has no “whaaat?” moments, and “feels” fluid and solid (but not necessarily perfect)

Multiply their grammar mark by 33.3 and they have a grammar mark /100.

Now, average the two marks and they have a speedwrite percentage.

For the 40-min relaxed write, I tell them “either retell the story, or write your own; you must have 3 dialogues, and put changes into your version of the story.” The goal for the year: write an 800-word story in 40 min. For their first story, I’ll expect 70-150 words. I will assign a wordcount mark out of, say, 200 and give them a 50-word bonus. I will also give them a grammar mark /3 as above. Every time they write a story, the number of words expected goes up and the bonus goes down.

We average their grammar mark and wordcount: if Johnny gets 2/3 for grammar (66.6%) and writes 90 words (90 + 50 bonus = 140/200 = 70%), his score is (66.6% + 70%) / 2 = about 68%.
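Here is the whole scheme as a sketch (Python), with the declining bonuses and the rising benchmark from the changes listed just below; capping the wordcount score at the benchmark is my assumption:

```python
def write_mark(words, rubric, benchmark, bonus):
    """Average of the wordcount percent (words + bonus, capped at the
    benchmark) and the /3 grammar rubric converted to a percent."""
    word_pct = min(words + bonus, benchmark) / benchmark * 100
    return (word_pct + rubric / 3 * 100) / 2

# Johnny's relaxed write from the example above:
print(round(write_mark(90, 2, benchmark=200, bonus=50)))  # 68

# Per-story schedule: speedwrite bonus starts at 40 and drops 5 per story;
# relaxed-write bonus starts at 50 and drops 5, while its benchmark starts
# at 200 and rises 75 per story.
for story in range(1, 6):
    print(f"story {story}: speedwrite bonus {max(45 - 5 * story, 0)}, "
          f"relaxed bonus {max(55 - 5 * story, 0)}, "
          f"relaxed benchmark {125 + 75 * story}")
```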

After we do the second story of the year (and until the end of the course), we repeat the procedure, with a few changes:

A) the speedwrite bonus drops by 5 words each time
B) the relaxed write bonus drops by 5 words each time, and the “benchmark” goes up by 75 words. By the end of the year, kids should be able to write 800 words in 40 min.
C) we use another topic for the second speedwrite: describe a picture that you project onto the screen. This picture should support what was in your story. So, if the story had a girl who wants an elephant, your picture could be a boy who has an elephant.

D) for the third speedwrite, use topic #1 (describe yourself). For the fourth, use a picture. Keep alternating. I use fully unsheltered grammar from Day 1 (all verb tenses, subjunctive, etc), so the picture-describing tests evaluate how well they can use the present tense.

The writing will improve during the year. As I write this, having done only two stories, wordcounts are WAY up and grammar is also improving.

A few notes:

— you MUST carefully restrict vocab. This has been my single-greatest problem with TPRS: adding vocab at random. If you don’t restrict vocab, you get fewer reps on each item…and worse/less acquisition.

— initially, the kids will generate pretty crappy stories. Later, word count goes up and grammar will get better. Some kids will re-write the story; most will start to improvise.

— Their “mark” at any given time is simply their most recent speedwrite and relaxed-write mark, combined. I also do exit quizzes for listening and reading (1 each/week), so I have a pretty good overall picture of how everyone is doing.

— PQA is super-important. Adriana’s book has a list of “personalised questions.” The Blaine Ray books do too. If you are doing your own stories, you make them up. Personalised questions are super-easy: you basically ask the class the questions you ask the actor.

So if you narrate “the boy liked running,” you ask your actor “do you like running?” and he says “yes, I like running.” You then ask “do you like vomiting?” (something contrastive) and he says “no, I do not like vomiting.” Then, starting with your superstar, you ask the class members “do you like vomiting or running?” etc. Simple.

This is important because the kids need to hear the present-tense forms.

— Adriana’s advice was to make sure all the kids do the comic. This is because the comic writing is “deep reading”: it makes the kids re-read, choose, copy and write, etc. For the non-artists, translation also works: copy the story, translate underneath it (in a different-coloured pen), and leave a blank line to keep it clear. Here is a pretty good example of “Los Gatos Azules” turned into a comic (one of Adriana’s kids did this one):

[photos: a student’s comic version of “Los Gatos Azules”]

Anyway, this is how I have organised the “units” of TPRS and how I assess. Comments, as always, welcome!