assessment

How do exit quizzes work?

Richmond, B.C. powerhouse teacher Sonya O’Neill writes: “Exit quizzes… could you post an example? I’m a bit confused about how to do these well. Do you do translations only? If so, are you starting in Spanish always with all levels? Do you ever use comprehension questions (of the story you just asked) at this time? Are these your main listening assessments?”

OK, today’s question: how can we do exit quizzes?

My system is simple.

  • Based on what we did in class, I read five sentences aloud.  These sentences contain the vocab from the story we are working on, or are sentences directly from that story.  If they are not from the story, they have to be stand-alone meaningful.
  • I tell the kids, “write down what you hear in Spanish, then translate into English.”
  • They write Spanish then translate into English.
  • The kids trade papers and we mark (the Spanish writing doesn’t matter much– it’s comprehension we are after).
  • The kids return the marked papers to each other.
  • I get a show of hands: Put your hand up if you got either 4/5 or 5/5.

If 80% of the class got 4 or 5 out of 5, I am happy.  If not, I delivered bad or too little input, or they weren’t listening, and so we need to do more work around those sentences.  Sometimes I collect the marks, sometimes not.
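For the spreadsheet-inclined, that 80% check is one line of arithmetic. Here is a minimal sketch in Python, assuming you have the quiz marks as a list of scores out of five (the numbers are invented for illustration):

```python
# Minimal sketch: the 80% rule for an exit quiz.
# `scores` is a hypothetical list of marks out of 5, one per student.
scores = [5, 4, 3, 5, 4, 5, 2, 4, 5, 4]

got_4_or_5 = sum(1 for s in scores if s >= 4)
rate = got_4_or_5 / len(scores)

if rate >= 0.80:
    print(f"{rate:.0%} of the class got 4/5 or 5/5 -- the input landed; move on.")
else:
    print(f"Only {rate:.0%} got 4/5 or 5/5 -- reteach/recycle those sentences.")
```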

Do I do “comprehension questions”?  By this, Sonya (I think) means, Do I ask the kids comprehension questions based on the story we have read/asked without them looking at/hearing the story at the time of the quiz?  I.e., do they have to remember and then answer?

Never.  Why?  The problems with comprehension questions are as follows:

a) especially with beginners, the “mental load” involved in comp questions is super high, because kids have to do three things:

  • decode the meaning of the question
  • remember content
  • write answers.

We know output (writing or speaking) does not aid acquisition, so there’s no point in requiring it.  We also know that all we need for acquisition is comprehensible input, so, again, responses don’t help.  This is a lot of mental work, and Bill VanPatten reminds us that what we might call “mental bandwidth overload” is an inevitable and insurmountable fact.  Basically, the less they have to “do” with a chunk of language, the more processing power they have for each chunk.

b) We also know that comprehension always and massively outpaces production.  Our kids– and we teachers– always recognise more words in any language than we can produce.  If we ask for output, we may be forcing kids to “do” something they haven’t acquired yet.

Say I tell my kids five sentences in Spanish, one sentence at a time.  Max (average), Samba (fast processor) and Rorie (insanely fast) all understand the new structures fui and trabajé that were in our story.  But Max hasn’t acquired them (i.e. he can’t say or write them) yet, while Samba and Rorie have.  If we know that acquisition goes at different speeds for different students, does asking for output not penalise Max for something he cannot control?

c) In my view– and I thank James Hosler for this insight– assessment should basically just be another excuse to deliver input to the kids.  I don’t want to play “gotcha” and I want people to succeed, so I’ll focus listening around what they can understand and easily do.

By the way, I think we can totally use questions for exit quizzes, provided we do not ask for output answers.  Just have one sentence be the question and the next the answer.  You say 1. ¿Trabajaste anoche? and the kids write it down in Spanish, and then translate it: “Did you work last night?”  Your next sentence is 2. “No, no trabajé anoche.  Fui al cine” and the kids write that down, and then they write “No, I didn’t work last night.  I went to the movies.”  Then you say three more Spanish sentences, which they all copy and translate.

Once the quizzes are done and marked, you can also use these for PQA if you have a few minutes at the end of class.  Ask the fast processors some of the questions and have the class listen to what they say.


Results: Beginner Speedwrites Week 8 (Spring 2015)

Today was our fourth story test.  The class had a speedwrite assignment: in five minutes, describe this picture in as many words as possible. Note: we are unsheltered (i.e. we use all necessary grammar and do not restrict ourselves to any one verb tense, mood, etc).  Also, we are a split class: beginners, level 2s and three native speakers.

This picture works well: we just did the “Cambio de Pelo” story and the kids know words for hair, eyes, colours, dog, cat, guitar.

 

So here is what the beginners did.

First, Marya.  Note the spelling mistake– “guitare”– this kid had French last year and it shows.  Also note tense and person confusion.  I have started doing ¿Qué hiciste anoche? PQA at the start of every class, and the beginners are mixing these up.  My theory is that if it’s not also in structured writing (i.e. story form), they mix it up more.

Manvir has some problems with verbs. She is missing hay and es, which may have to do with not enough present-tense PQA and/or reading.  There are also adjective-agreement errors.  The thing is entirely comprehensible, but the errors at times make you go huh?

Minali’s was interesting. It hit me that the beginners have problems with the definite article!  I had assumed this goes without saying… but with most of our kids being L1 Punjabi or Hindi (which do not have articles), perhaps I can’t assume that English crossover grammar will kick in.  Again, hay, es and está are missing here.  Also note jugar guitar, a classic French/English cognate mistake, one that comes from the conscious mind.

Manisha’s is basically perfect, but she did not write in the 3rd person.

Roshini’s is also basically perfect, with a very high wordcount.  But she did not write in the present– I am wondering whether these guys can actually consciously think about verb tenses. Also note the classic on-the-way error: Saturn gustaba instead of A Saturn le gustaba.  She hasn’t added the a and le because these are of low importance: the beginner language brain is focusing on gustaba, which has all the essential info. It is also interesting how she used the plural adjective form azules for pelo.

Wordcounts are lower than last time. This is (I think) because when we do story-related PQA, all of the answers are in first person, so it’s easy to describe yourself.  We simply do a whole lot less talk in 3rd person present.

What did I learn? 

  • Do MUCH more present-tense PQA (or ask actors about each other)
  • Do pop-ups for everything, including articles!
  • Put more present-tense commentary in written versions of unsheltered stories, OR do way more Picturetalk (look and discuss) in the present tense.
  • The kids don’t make mistakes unless I don’t provide enough input.

What grades should kids get? Notes on evaluation for the mathematically-challenged.

Here is part of a post from Ben’s.  A teacher– let’s call him Mr John Speaking– who uses T.P.R.S. in his language class writes:

“I was told by a Defartment Chair a few weeks ago that my grades were too high across the board (all 90s/100s) and that I needed more of a range for each assessment. Two weeks later I had not fixed this “problem” and this same Defartment Chair pulled me out of class and proceeded to tell me, referencing gradebook printouts for all my classes, that these high grades “tell me there is not enough rigor in your class, or that you’re not really grading these assessments.” After this accusation, this Defartment Chair told me I was “brought on board [for a maternity leave replacement] in the hopes of being able to keep me, but that based on what he’d seen the past few weeks, I’m honestly not tenure track material.”

Obviously, Mr John Speaking’s Defartment Chair is an idiot, but, as idiots do, he does us a favour:  he brings up things worth thinking about.

There are two issues here:

a) Should– or do– student scores follow any predictable distribution?  I.e., should there be– or are there– a set percentage of kids in a class who get As, Bs, Cs, Ds and Fs?

b) How do you know when scores are “too low” or “too high”?

Today’s question: what grades should students get?

First, a simple math idiot’s detour into grading systems and stats.  The math idiot is me.  Hate stats?  Bad at math? Read on!  If I can get it, anyone can!

It is important to note that there are basically two kinds of grading systems. We have criterion-referenced grading and curved (norm-referenced) grading.

First, we have criterion-referenced grading.  That is, we have a standard: to get an A, a student does X; to get a B, a student does Y; etc.  For example, we want to see what our Samoyed Dogs’ fetching skills are and assign them fetching marks. Here is our Stick Fetching Rubric:

A:  the dog runs directly and quickly to the thrown stick, picks it up, brings it back to its owner, and drops it at owner’s feet.

B: the dog dawdles on its way to the stick, plays with it, dawdles on the way back, and doesn’t drop it until asked.

C: the dog takes seemingly forever to find the stick and bring it back, and then refuses to drop it.

So we take our pack of five Samoyed Dogs, and we test them on their retrieval skills.  Max, who is a total idiot, takes forever to find the stick, then visits everyone else in the park, then poos, then brings the stick an hour later but won’t drop it because, hell, wrestling with owner is more fun.  Samba dutifully retrieves and drops.  Rorie is a total diva and prances around the park before bringing the stick back.  Arabella is like her mother, Rorie, but won’t drop the stick.  Sky, who is so old he can remember when dinosaurs walked the Earth, goes straight there, gets the stick, and slowly trudges back.  So we have one A (Samba), one B (Rorie), one C (Arabella), one C- (Max– we mercy passed him) and one A- (Sky, cos he’s good and focused, but slow).

Here are our Samoyeds:

[photo: the Samoyeds]

Now note–

1. Under this scheme, we could theoretically get five As (if all the Dogs were like Samba), or five Fs (if everybody was as dumb and lovable as Max).  We could actually get pretty much any set of grades at all.

2.  The Samoyed is a notoriously hard-to-train Dog.  These results are from untrained Samoyeds.  But suppose we trained them?  We used food, praise, hand signals etc etc to get them to fetch better and we did lots of practice.  Now, Sky is faster, Rorie and Arabella don’t prance around the park, and even silly Max can find the stick and bring it.  In other words, all the scores went up, and because there is an upper limit– what Samba does– and nobody is as bad as Max was at fetching, the scores are now clustered closer together.

The new scores, post-training, are:

Sky and Samba: A

Rorie, Max and Arabella: B

Variation, in other words, has been reduced.

3.  Suppose we wanted– for whatever reason– to lower their scores.  So, we play fetch, but we coat the sticks in a nasty mix of chocolate and chili powder, so that whenever the Dogs get near them, they get itchy noses, and very sick if they eat them.  The Dogs stop wanting to fetch our sticks.  Some of them will dutifully do it (e.g. Samba), but they aren’t idiots, and so most of them will decide to forget or ignore their training.

4.  Also note who we don’t have in our Dog Pool:  Labrador Retrievers (the geniuses of the fetching world), and three-legged Samoyeds.  There are no Labs because they are three orders of magnitude better than Samoyeds at fetch, and there are no three-legged Samoyeds because, well, they can’t run.

In other words, we could reasonably get any mix of scores, and we could improve the scores, or we could– theoretically– lower them.  Also, we don’t have any Einstein-level retrievers or, uhh, “challenged” retrievers– there are no “outliers.”

Now, let’s look at “bell curve” (a.k.a. norm-referenced) grading.  In this case, we decide– in advance— how many of each score we want to assign.  We don’t want any random number of As or Fs or whatever– we want one A, one F, etc.  We want the scores to fit into a bell curve, which looks like this:

[figure: a bell curve]

We are saying “we want a certain # of As, Bs, Cs, Ds and Fs.”  Now, we have a problem.  In our stick-fetching example above, we got an A, an A-, a B, a C and a C-.  We have no Ds or Fs, because all of the Dogs could perform.  None of them were totally useless.  (After doing some training, we would get two As– Samba and Sky– and three Bs– Rorie, Max and Arabella.)  But if we have decided to bell curve, or norm reference, our scores, we must “force” them to fit this distribution.

So Samba gets an A, Sky gets a B, Rorie gets a C, Arabella gets a D, and Max fails.

Now, why would anyone do this?  The answer is simple: norm referencing is just a way to sort students into ranks, where the only thing that matters is where each person stands relative to the others.  We are not interested in being able to say “in reference to criteria ____, Max ranks at C.”  All we want to do here is say where everyone is on the marks ladder compared to everyone else.
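To make the difference concrete, here is a small sketch, using the trained Samoyeds from above, of what forcing a preset distribution does (the code is illustrative only, not a real gradebook):

```python
# Criterion-referenced: each dog is graded against the rubric independently.
# After training, the rubric gives two As and three Bs.
criterion = {"Samba": "A", "Sky": "A", "Rorie": "B", "Max": "B", "Arabella": "B"}

# Norm-referenced: we decide the distribution in advance (one of each grade)
# and force the ranked dogs into it, regardless of what they can actually do.
ranked = ["Samba", "Sky", "Rorie", "Arabella", "Max"]  # best to worst
curved = dict(zip(ranked, ["A", "B", "C", "D", "F"]))

print(criterion)  # {'Samba': 'A', 'Sky': 'A', 'Rorie': 'B', 'Max': 'B', 'Arabella': 'B'}
print(curved)     # {'Samba': 'A', 'Sky': 'B', 'Rorie': 'C', 'Arabella': 'D', 'Max': 'F'}
# Max "fails" under the curve even though he meets the B criteria.
```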

Universities, law schools, etc sometimes do this, because they have to sort students into ranks for admissions purposes, to decide who qualifies for the next level, etc etc.  For example, law firm Homo Hic Ebrius Est goes to U.B.C. and has 100 students from which to hire their summer slav– err, articling students.  If they can see bell-curved scores, they can immediately decide to not interview the bottom ___ % of the group, etc.  Which U.B.C. engineers get into second-year Engineering?  Why, the top 40% of first-year Engineering students, of course!

Now I am pretty sure you can see the problem with norm referencing:  when we norm reference (bell curve), we don’t necessarily say anything about what students actually know or can do.  In the engineering example, every student could theoretically fail… but the people with the highest marks (say between 40 and 45 per cent) would still be the top ones and get moved on.  In the law example, probably 95% of the students are doing very well, yet a lot of them won’t be considered for hire. Bell curves often generate absurd results.  For example, with the law students, you could have an overall mark of 75% (which is pretty good) but be ranked at the bottom of the class.

So where does the idea of norm referencing (“bell curving”) student scores come from?  Simple: the idea that scores should distribute along bell-curve lines comes from a set of wrong assumptions about learning and about “nature.”  In nature, lots of numbers are distributed along bell-curve lines.  For example, take the height of adult men living in Vancouver.  There will be a massive cluster within two inches of 5’11” (from 5’9″ to 6’1″).  There will be a smaller number who are 5’6″ to 5’8″ (and also who are 6’1.5″ to 6’3″).  There will be an even smaller number who are shorter than 5’6″ or taller than 6’3″.  Get it?  If you graphed their heights, you’d get a bell curve like this:

[figure: bell curve of men’s heights]

If you graphed adult women, you’d also get a bell curve, but it would sit “lower,” as women (as dating websites tell us) are generally shorter than men.

Now– pay attention, this is where we gotta really focus– there are THREE THINGS WE HAVE TO REMEMBER ABOUT BELL CURVES:

a)  Bell curve distributions only happen when we have an absolutely massive set of numbers.  If you looked at five men, they might all be the same height, short, tall, mixed, whatever (i.e. you could get any curve at all). But when you up your sampling to a thousand, a bell curve emerges.  (There’s a small simulation sketch after point (c).)

b) Bell curve distributions only happen when the sample is completely random.  In other words, if you sampled only elderly Chinese-born Chinese men (who are generally shorter than their Caucasian counterparts), the curve would look flatter and the left end would be higher.  If you didn’t include elderly Chinese men, the curve would look “pointier” and the left end would be smaller. A bell curve emerges when we include all adult men in Vancouver.  If you “edit out” anyone, or any group, from the sample, the distribution skews.

c)  Bell curves raise one student’s mark at the expense of another’s.  When we trained our Samoyed Dogs, then marked them on the Stick Fetching Rubric, we got two As and three Bs.  When we convert this into a curve, however, what happens is that each point on the curve can only have one Dog on it.  Or, to put it another way, each Dog gets a different mark, no matter how well they actually do.  So, our two As and three Bs become an A, a B, a C, a D and an F.  If Rorie gets a B, that automatically (for math-geek reasons) means that Max will get a different mark, even if they are actually equally skilled.

As you can see in (c), bell curves are absolutely the wrong thing to do with student marks.
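Points (a) and (b) are easy to see for yourself. Here is a tiny simulation sketch (standard library only; the mean of 71 inches and the spread are invented for illustration):

```python
# Sketch: a bell curve only emerges from a big, random sample.
import random
from collections import Counter

random.seed(1)

def height_counts(n):
    """Simulate n men's heights in inches (mean 71, i.e. 5'11")."""
    return Counter(round(random.gauss(71, 2.5)) for _ in range(n))

print(height_counts(5))        # five men: any shape at all -- no bell
print(height_counts(100_000))  # counts peak at 71 and taper off symmetrically
```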

And now we can address the issues that Mr John Speaking’s Defartment Head brings up.  Mr Defartment Head seems to think that there are too many high marks, and not enough variation within the marks.

First, there is no way one class– even of 35 kids– has enough members to form an adequate sample size for a bell-curve distribution.  If Mr Defartment Head thinks, “by golly, if that damned Mr John Speaking were teaching rigorously, we’d have only a few As, a few Ds, and far more Bs and Cs,” he’s got it dead wrong: there aren’t enough kids to make that distribution likely.  It could happen, but it certainly doesn’t have to.

Second, Mr John Speaking does not have a statistically random selection of kids in his class.  First, he probably doesn’t have any kids with special challenges (e.g. severe autism, super-low I.Q., deaf, etc etc).  BOOM!– there goes the left side of the bell curve and up go the scores.  He probably also doesn’t have Baby Einstein or Baby Curie in his class– those kids are in the gifted program, or they’ve dropped out and started hi-techs in Silicon Valley.  BOOM!– there goes the right side of your curve.  He’ll still have a distribution, and it could be vaguely bell-like, but it sure won’t be a classic bell curve.

Or he could have something totally different.  Let’s say in 4th block there are zero shop classes, and zero Advanced Placement calculus classes.  All of the kids who take A.P. calculus and shop– and who also take Spanish– therefore get put in Mr Speaking’s 4th block Spanish class.  So we now have fifteen totally non-academic kids, and fifteen college-bound egg-heads.  Mr Speaking, if he used poor methods, could get a double-peaked curve:  a bunch of scores clustering in the C range, and another bunch in the A range, with fewer Bs and Ds.

Third, instruction can– and does– make a massive difference in scores. Remember what happened when we trained our Samoyeds to give them mad fetching skillz, yo? Every Dog got better. If Mr Speaking gave the kids a text, said “here, learn it yourself,” then put his feet up and did Sudoku on his phone or read the newspaper for a year (I have a T.O.C. who comes in and literally does this), his kids would basically suck at the language (our curve just sank down).  On the other hand, if he used excellent methods, his kids’ scores would rise (curve goes up).  Or, he is awesome, but gets sick, and misses half the year, and his substitute is useless, so his kids’ scores come out average.  Or, he sucks, gets sick, and for half the year his kids have Blaine Ray teaching them Spanish, so, again, his kids’ scores are average:  Blaine giveth, and Speaking taketh away.

“Fine,” says the learned Defartment Chair, “Mr John Speaking is a great teacher, and obviously his students’ scores are high as a result of his great teaching, but there should still be a greater range of scores in his class.”

To this, we say a few things:

a)  How do we know what the “right” variability of scores is?  The answer:  there is no way of knowing without doing various kinds of statistical comparisons.  This is because it’s possible that Mr Speaking has a bunch of geniuses in his class.  Or, wait, maybe they just love him (or Spanish) and so all work their butts off.  No, no, maybe they are all exactly the same in IQ?  No, that’s not it.  Perhaps the weak ones get extra tutoring to make up for their weakness. Unless you are prepared to do– and have the data for– least-squares regression analysis, you are not even going to have the faintest idea about what the scores “should” be.

b)  Score variability is reduced by effective teaching.  There are zillions of real-world examples of how appropriate, specific instruction reduces the variation in performance. Any kid speaks their native language quite well.  Sure, some kids have more vocab than others, but no two Bengali-speaking (or English-speaking) ten-year-olds are significantly different in their basic speaking skills.  95% of drivers are never going to have an accident worse than a minor parking-lot fender-bender.  U.S. studies show that an overwhelming majority of long-gun firearm owners store and handle guns properly (the rate is a bit lower for handgun owners). Teach them right, and– if they are paying attention– they will learn.

Think about this.  The top possible score is 100%, and good teaching by definition raises marks.  This means that all marks should rise, and because there is a top end, there will be less variation.

Most importantly, good teaching works for all students. In the case of a comprehensible input class, all of the teaching works through what Chomsky called the “universal grammar” mechanism.  It is also restricted in vocab, less (or not) restricted in grammar, and the teacher keeps everything comprehensible and focuses on input.  This is how everyone learns languages– by getting comprehensible input– so it ought to work well (tho not to exactly the same extent) on all learners.

Because there is an upper end of scores (100%), because we have no outliers, and because good teaching by definition reaches everyone, we will have reduced variation in scores in a comprehensible input class.

So, Mr Speaking’s response to his Defartment Head should be “low variation in scores is an indication of the quality of my work. If my work were done poorly, I would have greater variation, as well as lower marks.” High marks plus low variation = good teaching. How could it be otherwise?

In a grammar class, or a “communicative” class, you would expect much more variation in scores.  This is because the teaching– which focuses on grammar and/or output, and downplays input– does not follow language acquisition brain rules.  How does this translate into greater score variation?

a) Some kids won’t get enough input– or the input won’t be comprehensible enough– and so they will pick up less.  Now you have more lower scores.

b) Some kids will be OK with that.  Some kids won’t, and they’ll do extra work to catch up.  Result: variation in acquisition.  Now, there will be a few high scores and more low ones.

c) Some kids will hate speaking and so will do poorly on the speaking assessments, which will increase variation.

d) Many kids don’t learn well from grammar teaching, so in a grammar-focused class, you’d expect one or two As, and a lot of lower marks.

e) If the teacher is into things like “self-reflection on one’s language skills and areas for growth” or suchlike edubabble, and the kids are supposed to go back and rework/redo assignments, things could go either way.  If, for example, they re-do a dialogue from the start of the course at the end, they might– if the vocab has been recycled all year– do better.  If, however, it’s the check-your-grammar stuff, you’d again expect variation: only a very few kids can do that, even if their language skills have grown during the year.

And, of course, there is the “grammar bandwidth” problem: any effort to focus on a specific aspect of grammar means that other areas suffer, because our conscious minds have limited capacity. A District colleague told me that, for Level 5 (grade 12) French, the kids self-edit portfolio work. They have an editing checklist– subject-verb agreement, adjective agreement, etc– and they are supposed to go and revise their work.

The problems with this, of course, are two: in their mad hunt for s-v errors, the kids will miss out on other stuff, and we know that little to no conscious learning makes it into long-term memory.

Some real-life examples of how good instruction narrows variation in scores:

At Half-Baked School, in the Scurvy School District (names have been changed to protect the guilty), TPRS teacher Alicia Rodriguez has Beginning Spanish.  So does her Defartment Chair, Michelle Double-Barreled.  When, at the end of the semester, they have to decide on awards– who is the best Beginning Spanish student?– Alicia has 16 kids getting an A, another 12 getting a B, and two getting a C+.  None fail.  Michelle Double-Barreled has one kid getting an A, a bunch of Bs and Cs, a couple of Ds, and a few failures.

What this means is, 16 of Alicia’s kids can

a) write 100 excellent words in Spanish in 5 min, on topics ranging from “describe yourself” to “describe [a picture].”

b) write a 600-1000 word story in 45 min.

Both will have totally comprehensible, minor-errors-only Spanish.

Michelle Double-Barreled, on the other hand, has one A.  Her “A” kid can

a) do grammar stuff

b) write a 100-word paragraph on one of the topics from the text (e.g. shopping, eating in restaurant, sports s/he plays, family).

This will be not-bad Spanish.

Now, who’s doing a better job?  Alicia has more kids doing more and better work.  Michelle has a classic bell-curve distribution.  According to Mr John Speaking’s Defartment Chair, Mrs Double-Barreled has a “normal” range of scores.  Yet Alicia is clearly getting her kids to kick major butt. Hmm…

The point is, with appropriate and effective instruction– good Dog training, or good Spanish teaching– we are going to get a cluster of generally higher scores.  Poor or no teaching might produce something like a bell curve.

So… what do T.P.R.S. and other comprehensible input methods do for student outcomes?

In my class, T.P.R.S. did the following:

a) all scores rose.

b) the difference between top and bottom scores (variation) decreased.

c) I.E.P. kids all passed.

d)  First-year kids in second-year classes did about 85% as well as second-year kids, despite having missed a year of class.

e) In terms of what the kids could actually do, it was light-years ahead of the communicative grammar grind.  Kids at end of 2nd year were telling and writing 400-600 word stories in 3-5 verb tenses, in fluent and comprehensible (though not perfect) Spanish.  Oral output was greater in quality and quantity too.

f) Nobody failed.

My colleague Leanda Monro (3rd year French via T.P.R.S.) explains what T.P.R.S. did in her classes:

“[I saw a] huge change in overall motivation. I attribute this to a variety of well-grounded theories, including “emotion precedes cognition” (John Dewey), Krashen’s affective filter, and the possible power of the 9th type of intelligence, drama and creativity (Fels, Gardner).  There is a general feeling of excitement, curiosity, eagerness to speak French, incorporation of new vocabulary, spontaneous speech.

All but one student has an A or a B. The one student in the C range has significant learning challenges, and despite excellent attendance in all courses is failing both math and English. No one is failing.

[There was] far less variation. Overall, far greater success for all students. My contribution to the “Your scores are too high” comment is this: As educators we need to pose an important question:  Are we trying to identify talent, or are we trying to nurture and foster talent?  T.P.R.S. works to nurture and foster.”

And here are Steve Bruno’s comments on the effect of T.P.R.S. on his kids’ scores:

“I now get more As and Bs [than before]. A few C+s and very few Cs. Let’s put it this way: in the past I’ve had to send between 20 and 25 interims/I-reports (total 7 classes); this year, so far, I’ve sent just THREE! Of these, two had poor attendance; the other one is an L.A.C. student who is taking a language for the first time (Gr. 9).

Marks are also closer together.  Anyone who has been teaching C.I. will understand why this is the case:  Students feel more confident, less stressed and simply love T.P.R.S. They don’t like doing drills, or memorizing grammar rules, etc.

Here’s another example [of how comprehensible input has changed student behaviour].  Last year, I had a few students who, on the day of an announced test (usually with one week of warning), would suddenly become ill and/or skip. Some of my L.A.C. students would have to write the test in a separate room. Others would show all sorts of anxiety as I handed out the tests. Many of these students would end up either failing the test or doing very poorly.

This year, I have the same students in my class, and on the day of an unannounced test, they don’t have to go to another room, nobody complains, and they just get down to it and, yes, they do quite well, thank you very much!”

OK, people…you want to report on how things are going with T.P.R.S.? Post some comments, or email.

How well is Adriana Ramírez’ book working so far?

This year I decided to go in for a more classical, purely story-based T.P.R.S. than what I began with– what Ben Slavic described as “the freewheelin’ c.i.” I am using my colleague Adriana Ramírez’ Teaching Spanish Through Comprehensible Input Storytelling text. This is a set of 16 stories. You get a vocab list, a basic story, an extended reading, story comprehension questions and personalised questions. The thing was loosely designed to “piggyback” on Avancemos, the Spanish text our District adopted, but it stands alone too.

Today’s question: how well is Adriana’s book working?

1) Great.

2) I am almost done my 4th story– “Cambio de Pelo”– and these are my results:

a) For speedwrites (“write as many words as you can in 5 min”) I am alternating topics. For even-numbered stories, the speedwrite assignment is “describe yourself.” For the odd-numbered stories, the assignment is “describe a picture on the overhead” (the picture will have something to do with the just-asked story).

Word count averages for speedwrites are as follows (see the sketch after the list):

— story 1: 25 words + 45-word bonus = 70% average

— story 2: 43 words + 40-word bonus = 83% average

— story 3: 50 words + 35-word bonus = 85% average
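For the curious, the arithmetic behind those averages is just average word count plus a per-story bonus. A throwaway sketch (the cap at 100 is my assumption, not stated above):

```python
# Speedwrite averages: percentage = average word count + per-story bonus.
results = {1: (25, 45), 2: (43, 40), 3: (50, 35)}  # story: (avg words, bonus)

for story, (words, bonus) in results.items():
    print(f"story {story}: {words} + {bonus} = {min(words + bonus, 100)}% average")
```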

In terms of grammar, every kid– except those who miss 2-3 classes– is getting at least 2/3, and over 1/2 are getting 3/3. Out of 30 kids, only 3 have “bombed” in terms of grammar, and in each case their subsequent mark went way up. I.e. a kid who misses a bunch of classes, does the test, and bombs will do much better later on (on the test after the next story), because the stories recycle all the grammar and vocab.

Word count averages for “relaxed writes” (“rewrite the story, or modify it, or make up your own, and include 2 main characters and at least 2 dialogues”):

— story 1: ~80 words (they totally sucked– average grammar mark 1/3)

— story 2: ~130 words (much better– average grammar mark 2/3)

— story 3: ~180 words (better again– class evenly split between 2/3 and 3/3 for grammar mark)

Oral output:

The system for “teaching” kids to talk in T.P.R.S.– a.k.a. P.Q.A. (personalised questions and answers)– is super simple: you basically ask members of the class the questions you ask your actors. So, in the first story, you ask your actor “What is your name?” and s/he says “My name is ____.” Because s/he doesn’t know any Spanish, you write it on the board and they can just read it off the board. You then ask them “Is your name ____?” and they say “No, my name is _____.” You then ask your parallel character(s) the same question(s). Then– after the audience has heard it a bunch of times from the actors– you ask the members of the class, starting with the keeners, the same question. Initially, the keeners will be able to spit it out right away in sentence form, while other kids will just say “John.”

After 5 weeks x 5 classes/week = 25 classes, 4/5 of the kids can now unhesitatingly and fluently answer these questions:

— what is your name? how old are you? where do you live? are you a [boy, girl, cat…]? Are you [tall, short, crazy…]?

— do you like _____? [about 15 verbs and 15 nouns to choose from]

— what’s the weather, day, date?

— what are you like? (i.e. describe yourself)

— do you prefer ___ or ___?

— do you have ____?

The other 1/5 of class (the slower-acquirers) ALL understand the questions, and all can say something— even if it’s just one word– that makes sense. E.g. “What’s the weather like?” — “Cold.”

3) Why is it working, and what would I change?

First, it’s working cos it shelters (restricts) vocab, and because the extended reading closely mirrors the story asked. Second, the pace of new vocab is manageable: by my rough count, the kids get about 3 new words/day on average. Third, the comp questions force re-reading, and fourth, I am liking Adriana’s comic idea.

Update on the comic: for the comic, after we have done the extended reading (teacher-guided, and ping-pong), the kids have to create a 12-panel comic that illustrates the story. It has to look awesome– clip art etc. is fine– with colour, each panel must have at least one sentence, and the comic must include all dialogue. This time, I also added a translation option: copy the story– by hand– then translate underneath in a different colour, then leave a blank line (to keep it neat) and indent all dialogue. I am gonna see how the translation works, but the comic rationale is that it’s deep reading: kids have to re-read, select, and illustrate (read: concise focus). Adriana says it works best for the laggard boys, and I have to agree.

My changes: First, my kids are 90% Indian, so English is often their 2nd language, and almost none of them hear English at home. Our kids read, and are literate, but lack some of the linguistic mental infrastructure that Adriana’s (rich, white and Asian, educated) kids have. So, they need MUCH more reading practice than Adriana’s, and I make them read BOTH the basic script– the story I ask, which I photocopy and hand out– AND the extended one in Adriana’s book. Second, I am varying the speedwrites (5 mins) as noted above. Third, my kids don’t always get the comprehension questions, so I have to go through them. E.g. on the last story, one question was ¿Dónde vive el chico? (Where does the boy live?) and the kids all answered with “Vivo en Colombia” (I live in Colombia). Fourth, the retells don’t work. I am getting junky output from the kids, so I am putting the kibosh on retells for a while until I figure out a better way to do them.

Anyway, overall, the program is working well, and I both recommend it and am gonna stick with it. If people want to try it, email Adriana (ramirez_a (attt) surreyschools (dottt) ca) or hit her up on Twitter: @veganadri.

Meaning First!

When I think back to high-school math, what I remember is, algebra was kind of neat, but pointless to me…until grade 11 physics, at which point algebra stopped being idiotic questions about how old Johnny’s younger brother will be in five years when he is twice as old as Suzy, and started being about how fast– and in what direction– the car is going after a collision. In other words, physics with Gary Laidlaw was meaningful. Despite being a math moron, I was able to get a B in Gr11 physics and to get a 4 on the A.P. physics exam in Gr12. This came back to me three days ago when I totally blew it with my Spanish beginners.

Backstory: this year I am doing TPRS totally differently: it will be 100% story-based and my “rough guide” will be Adriana Ramírez’ Teaching Spanish Through Comprehensible Input text. I’ll keep people posted with regular stats about how the kids (true beginners) are doing.

So I started asking “Los Gatos Azules” on Day 1, and 4 periods later I’d asked it. (Before starting, I gave the kids a vocab list of all the Spanish words and we translated it into English.) I gave them a quiz on Day 5. The quiz was: I gave them the vocab sheet with only Spanish on it, and they had to write out the English. There were words like “talks” and “goes to” and “cats.” 33 words total.

The results were abysmal. Class average of 37%. This despite zillions of reps of all the structures. Interestingly, my native speaker kids– and the ones with one Hispanic parent– also did relatively poorly. Also, my actor– a kid we nicknamed “El Chapo” after the legendary Mexican gangster– who is amazingly quick on his feet and picks Spanish up super-fast, bombed.

So, the next day, I gave them another test. In this one, I used many of the same words, but I said them aloud in sentences. I got the kids to write down the Spanish (at which they sucked, expectedly, having done basically zero reading) and the sentence meanings in English. The results were WAY better: class average of 75%. The marks were basically: did they understand the meaning?

So…what did I learn?

A) People– especially beginners– remember in context. If they learn through stories, they’ll best remember through stories. Teaching one way and asking them to remember in another doesn’t work.

B) Language is learned in chunks. A sentence such as el chico quiere tener 10 gatos azules (“the boy wants to have 10 blue cats”) is easier to remember than “the boy” and “cats” and “blue,” etc.

C) Meaning must come first. If it’s not meaning-based– i.e. people are dealing with what feel like random chunks of stuff– it’s much harder to remember. If it’s part of a story, or anchored with a picture or video or actor, everything hangs together. We MUST teach with meaning first or we’re wasting our time.

D) Beginners know a LOT less than we think they do. The second test results should have been around 95%. When I asked questions about the story today, I was surprised to note that I got weak responses when I asked “What does el chico no quiere tener un perro mean?” As Ben Slavic puts it, we must ask zillions of comprehension questions to make sure students actually understand. I’m also going for a traditional TPRS no-no: asking individuals what “____” means, because I’m not always sure meaning is clear. I think kids sometimes “chant along” with their peers, so I’ll ask more one-on-one questions.

E) Another thing I’ll do from now on is the exit quiz. I’ll read 5 sentences aloud and have the kids write down the meanings. In a few weeks, they’ll start also writing down the Spanish. Slightly more regular assessment will help me: if many kids get # ____ wrong, I can go back and circle that more the next day.
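That “which sentence did they bomb?” check is easy to keep track of. A minimal sketch of tallying the most-missed exit-quiz sentence (the names and data are made up for illustration):

```python
# Sketch: find the exit-quiz sentence the most kids got wrong.
from collections import Counter

# Hypothetical record: student -> list of sentence numbers they missed.
missed = {
    "Jas": [2, 4],
    "Harp": [4],
    "Marya": [],
    "Minali": [4, 5],
}

tally = Counter(n for misses in missed.values() for n in misses)
print(tally.most_common(1))  # [(4, 3)] -> circle sentence 4's structure tomorrow
```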

Evaluation is Over-rated

Yesterday the B.C.A.T.M.L. conference brochure came, as did the C.A.S.L.T. newsletter, and the usual fare was offered:  lots of “how to use iPads” workshops, lots of “how to get the kids to speak” workshops, and, of course, lots of workshops (and webinars) on the D.E.L.F.

The Diplôme d’études en langue française (OK, I probably missed some French finery in there) comes out of the Common European Framework of Reference bla bla, which is basically this: the E.U., before it began bailing out corrupt banks and kow-towing to Vladimir Putin, set up criteria for language proficiency.  This is a set of six categories– A1 (beginner), A2, B1, B2, C1, and C2 (native-speaker mastery).  The idea was that for business, government employment, work etc purposes, a company or government could assess candidates/students to see where they fit on the scale in terms of proficiency in Language ____ when making employment or placement decisions.  That’s all good, and the C.E.F.R. has come to Canada and the U.S., the exams that assess people– the D.E.L.F., and the D.E.L.E. (Spanish)– have been adopted in lots of places, and now the big push is “learn to assess in terms of the D.E.L.E./D.E.L.F. exam.”

What this means in practice is basically re-doing what texts do (poorly): “planning out” language teaching by going from allegedly “simple” stuff– hellos, goodbyes, the present tense– to supposedly “complex” stuff such as the imparfait, discussing hopes and dreams, etc.  The usual problems remain, though: what teachers see as “advanced” (e.g. the subjunctive) is actually used quite early on by native speakers; other supposedly “important” vocab (e.g. clothing) is not very frequently used, etc.

Outside of providing Numberz at the end of Semesterz, I think this C.E.F.R.-based organisation of curriculum is more or less a waste of time.  Here is why.

First, in my view, there should basically be zero evaluation (giving a student a number) until literally the last day of the course.  Why?

Well… what if you taught ___ and Johnny isn’t ready to acquire it?  What if Johnny acquires it after you tested him on it, and now he knows it, but that first test mark drags him down?  Johnny gets 70% on his passé composé (or whatever) test.  What good does a number do him?  In most subjects, evidence suggests that feedback improves learning much more than assigning numbers does.  But even that does not apply to languages, where, as Lightbown and Spada (2013) put it, “comprehensible input remains the foundation of second language acquisition,” and the research clearly shows very few gains resulting from conscious feedback to learners.

A test is also a waste of time.  That’s an hour or whatever where kids could be getting comprehensible input, which is what drives language acquisition.

Second, during-the-year tests do not provide useful feedback for the teacher.

Your kids averaged, say, 70% on the passé composé test they just took.  What does this tell you?  Or, more specifically, how does this info help you plan your next unit of teaching?  What if Arabella got 90%, but Sky only got 70% and Max got 50%?  Can you “tailor” your instruction to them?  What if you have 30 kids, and they are all in different places?  What if Samba got 30%? How are you going to teach both Samba and Arabella?  What if Samba isn’t ready for the passé composé and Arabella is bored and wants to move on?

Answer:  with “communicative” or grammar grind or audiolingual teaching, you aren’t going to help them, and nobody else is either.  What you have is kids with a wide range of either abilities, or willingness to listen in class, or both, and you do not have time to teach or plan individually, no matter what your Adminz or Defartment Headz say.  It’s simply not going to happen.  You have thirty kids in your class– you simply do not have time to provide Samba with ____ and Max with ___.

Third, what does Johnny see when he gets his test back?  I’ll tell you what Johnny sees:  a number, and a bunch of red.  And this helps him acquire French how?

Now, at the end of the year, at an upper level (say Gr 12), giving the D.E.L.F. or D.E.L.E. exam is great; most people eventually want to– or must, by law– get a Number.  However, one fact remains, no matter what the end-of-year test is: the more interesting comprehensible input students get, the better they will do (unless the exam is of the fill-in-the-blanks-with-the-right-verb-form kind of idiocy).

So what should T.P.R.S. teachers do “along the way”– assessment– to productively guide their instruction?  Remember, people learn by getting quality, attention-worthy comprehensible input (and some people like a bit of grammar explained).

a) check choral responses:  if they are weak or non-existent, your kids either misunderstood the question, or don’t know the vocab, or both.  Go back, explain, try again.  If they are actively listening– not on phones or chatting, following with their eyes, etc– their failure to understand is your fault, not theirs.

b) Monitor retells.  Beginners should be able to re-tell a story (in skeletal form) without too many mistakes.  If they can’t do that (after, say, 20 classes, from memory), you are going too fast and not getting enough repetitions.

c)  Monitor correct use of recent structures.  If you taught “wants to own,” and circled the crap out of it, and they are writing “wants I own” or “I want I own,” there wasn’t enough repetition.

One answer, I would say, is read your speedwrites post-story, find the most-made mistake, and throw that into your next story.  If they don’t know “wants to own,” have a parallel character in the next story who wants to own a dinosaur.

d)  Most importantly, provide rich and diverse input at all times.  As Susan Gross and Stephen Krashen have noted, providing “all the grammar, all the time”– i.e. not delivering simplified, one-dimensional input in order to beat a grammar item into kids’ heads– is the best strategy, provided all input is interesting and comprehensible.  If Samba didn’t get the passé composé on her test last week, if she keeps hearing/reading it, she’ll eventually get it.  If Arabella got 90% on her passé composé test and you’re worried she’s gonna get bored, making the next story interesting will keep her tuned in, while Samba both finds the next story interesting and gets more exposure to the passé composé.

The bottom line for the comprehensible input teacher is, make sure they are listening/reading, make sure they understand– as Ben Slavic says, we ask more y/n questions than we ever thought possible–, deliver lots of interesting, quality comprehensible input,  and if they aren’t understanding, go back and clarify.

This process– assessing as you go– will deliver results.  Self-monitoring, grammar lectures, conjugation exercises:  these are for teacher egos, not kid acquisition.  Deliver good C.I., and the D.E.L.F. scores will come.

How should I grade translations?

This question is from Sarah-Beth who teaches French to grade 7s and 8s.

Q: How do I grade translations?

First, principles.  Whatever you want students to read and decode should be

1) at LEAST 90% vocabulary they have either acquired or frequently read.  Research shows that people can read independently only when they recognise 90% of what they read.  The other 10% is “noise” and/or vocab that will slowly be acquired.  Remember– our passive (recognition) vocab is always way larger than our active (production) vocab– so, if you have been doing lots of reading, your students should be able to understand a fair bit that they can’t (yet) say.

2) in some kind of meaningful form– e.g. a story, a clear and obvious character describing him/herself, etc– not isolated sentences.

3) Latin teacher James Hosler has said that “for me, assessment is just another excuse for providing comprehensible input.”  I couldn’t agree more. 

I would suggest you give them a 150-200 word story.  This story should include vocab from the entire course, not just your most recent story.  Have them copy it (this is free reps!).  They write the translation underneath in a different-coloured pen.  Underneath that, they leave a blank line (this is to keep it legible).  Or, you could hand out a triple-spaced copy of the story, and they write the translation underneath.

Count the words in the original.  For every meaning-based mistake students make, take off one mark.  For every verb-tense (relatively trivial) mistake, also take off one mark.

So if the original is 200 words, and Johnny makes 3 meaning-based errors, and 4 verb-tense errors, his mark is 193/200 = 96.5%
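In code form, that marking scheme is a one-liner. A minimal sketch of the arithmetic just described:

```python
# Sketch of the translation-marking arithmetic: one mark off per error,
# out of the original story's word count.
def translation_mark(total_words, meaning_errors, tense_errors):
    score = total_words - meaning_errors - tense_errors
    return score, score / total_words

score, pct = translation_mark(200, meaning_errors=3, tense_errors=4)
print(f"{score}/200 = {pct:.1%}")  # 193/200 = 96.5%
```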

Another idea (from Ben Slavic) is dictée-translation.  For this, dictate a short, ten-sentence story, or put a picture on your O/H and describe it.  The kids listen (NO ENGLISH!) and write.  When done, project the story/description onto the O/H.  Have the kids fix their mistakes (this is good C.I.!).  Then, have them translate (in a different-coloured pen) under what they have written in the TL.  You assess (a) their corrections and (b) their translation.

Most teachers find that translation results are amazing– kids really do “get” what we repeat in stories, PQA, etc, because we teach for mastery (acquisition) and don’t go on until the kids get what we are describing– so the translation marks should be pretty high.  If colleagues object– “What? They’re all getting As on comprehension? They CAN’T be THAT good!”– you’ll be OK… because in TPRS, we teach for mastery, not “presentation,” and we EXPECT our kids who attend and focus to understand everything.

I also think that multiple-choice questions to determine how much students understand are fine… but it will be a lot of work to make them, as the “three plausible distractors” rule is tough, and it’s surprisingly hard to come up with questions.  You could use something from a standardised program (e.g. Avancemos)… but then you have the problem of super content-specific questions and grammar, which probably won’t line up with what you’ve done in stories and readings.