Great Minds Think Different

Yes, they do.

Monday, January 14, 2008

Flight Attendant English

I've been listening to the peculiar ergolect (I made that term up: Greek ergon "work" + English morphologized lect "language") of announcement-making flight attendants for a while. Apart from the fact that it's only ever used to say a small set of things, I've picked out a few linguistic oddities that make it so distinctive.
  • Unnecessary auxiliary verbs. The verb "do" (appropriately conjugated) is inserted before almost every main verb denoting an action, even when nothing in the sentence (a question or a negation, say) requires it. Example: "We do ask that you remain seated until..." The "do" in that sentence is completely unnecessary, although it is grammatical.
  • Necessary auxiliary verbs are stressed. If you listen carefully, the verbs "do", "be" and "have" (and their conjugated forms), when used as auxiliary verbs, are given more phonetic stress than the main verbs that follow. "We do ask that you remain seated until the captain has turned off the 'fasten seat belt' sign."
  • Common phrases are used redundantly. Example: "At this time, we do ask that you discontinue the use of portable electronic devices at this time." This may not be as straightforward a redundancy as it seems. It could be that the first "at this time" is attached to "ask", and the second is attached to "discontinue" (denoting that both the asking and the discontinuing are happening, or should be happening, at this time). I'm pretty sure just one "at this time" would work (getting rid of the first one sounds better).
  • Wave-like intonation. In many cases (not all), the speaker's intonation rhythmically rises and falls over the course of a sentence, which stands out because most sentences spoken by flight attendants in announcements are declarative, and declarative sentences in English are not marked by intonation (as opposed to interrogative sentences, for example, which have rising intonation). Deviation from this intonation pattern is actually marked in Flight Attendant English; the main use of this that I've observed is in emphasizing important information. For example, the phrase "keeping in mind that the nearest exit may be behind you" is almost always spoken in a completely flat intonation.
I haven't noticed peculiarities like this in French-language announcements (apart from the fact that sometimes the French is just plain bad). From the little bits of Dutch and Spanish that I understand, I haven't noticed any in those languages either. And I have no good explanation as to why any of these things in English might be the case. Just thought I'd point them out.

Saturday, January 12, 2008

Le sigh

Well, I'm back in the States. It was a good trip to Belgium, though, in that I got to do all of the things that I traditionally do when I go back:
  • Eat Belgian food:
    • Gaufres liégeoises (known to the uninitiated as "Belgian waffles")
    • Frietjes (Belgian fries with mayonnaise)
    • Tarte au sucre ("sugar pie"; basically custard and congealed sugar in a pie crust)
    • Chocolate from one of the world's finest chocolatiers (Irsi)
  • Drink Belgian drink: Hoegaarden and several varieties of specialty beers.
  • Hear native speakers speak French. Also, be forced to speak French for extended conversations.
  • See a movie on one of the biggest (maybe the biggest) movie screens in Europe, at Kinépolis. (I actually saw two this trip.)
I also added a PowerBook 520 to my collection of vintage Macs. I bargained for it with a coworker of my mom's. He had some data on it that he wanted to keep, so I had to go through a long stupid procedure to get the data off of it and onto a CD. I had to connect it via an AppleTalk network to a PowerMac 7200/90, and then the PowerMac via Ethernet to my laptop. Transferring 150 megs took for ass ever, but I suppose it would have been considered fast back in the day when 150 megs was an astronomical amount of data. Plus I had to restart the transfer several times because the connection between the PowerMac and my laptop kept dropping out for no good reason. That PowerMac is actually the only remaining way I have to get data between my Old World and New World Macs – it's the only machine I have that has both serial and RJ45 ports. If it ever gives out, I'm a bit fucked.

So, I was a little down on the way back here, but I saw something on the plane that cheered me up a lot: Hairspray. It's definitely the best musical film I've ever seen, and now I must see it live. I must I must.

Wednesday, January 09, 2008

Pedagogy, pt. I

Lately I’ve been working on developing a web app that manages assignments in programming classes. I know this isn’t an original project (there are at least two other such things that have been written at CMU alone), but this one was born of the complete terribleness of one of the aforementioned projects. Anyway, the focus of this post isn’t the development part, or the fact that Ruby on Rails is basically a programmer’s paradise (this is a true fact, by the way).

Working on this project got me thinking about the way grading is done in CS classes. I’ve been on course staff for four different CS courses at CMU now, and been in many more, and it seems like all of them take different approaches to grading. I’m of many minds about which approach is the best. That train of thought got me thinking more generally about the problems of grade inflation and the fact that people like to blame anyone but themselves for their own inadequacy. So this might be a long post.

If a student entering a college programming course has never taken a programming course before, the nature of programming assignments, and what it means in terms of how they’re graded, will come as a shock. A programming assignment is a complex thing, but the end result (the program’s behavior) is completely unambiguous. There are no subtleties; no wiggle room. This means that students put a lot of effort into crafting a substantial, complex, difficult piece of work, which is naturally reducible to a cold hard number. It can be a frustrating phenomenon to face.

There are four major approaches to grading programs for correctness that I’ve seen:
  1. Use automated tests exclusively to determine correctness. Give out all these tests, so students can, in effect, grade themselves.
  2. Use automated tests exclusively to determine correctness. Give out a subset of these tests (typically the less comprehensive ones).
  3. Use automated tests exclusively to determine correctness. Don’t give out any of them.
  4. Use a mixture of mostly manual testing and some automated testing, or exclusively manual testing. Don’t give out the automated tests, if any.
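To make the differences concrete, here's a toy Python model of what a student can actually see before handing in under each approach. All the names, test names and point values are made up for illustration; Approach 4's manual testing isn't modeled at all, only the fact that nothing automated gets released.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    ALL_TESTS_RELEASED = 1  # Approach 1
    SUBSET_RELEASED = 2     # Approach 2
    NO_TESTS_RELEASED = 3   # Approach 3
    MOSTLY_MANUAL = 4       # Approach 4: automated tests, if any, stay private


@dataclass
class CaseResult:
    name: str
    points: int
    passed: bool
    public: bool  # hypothetical flag: released to students under Approach 2?


def visible_to_student(results, approach):
    """Return the results a student can see before the deadline."""
    if approach is Approach.ALL_TESTS_RELEASED:
        return list(results)
    if approach is Approach.SUBSET_RELEASED:
        return [r for r in results if r.public]
    return []  # Approaches 3 and 4 release nothing


def earned(results):
    return sum(r.points for r in results if r.passed)


def possible(results):
    return sum(r.points for r in results)


if __name__ == "__main__":
    results = [
        CaseResult("basic_insert", 10, True, public=True),
        CaseResult("basic_lookup", 10, True, public=True),
        CaseResult("stress_10k_ops", 30, False, public=False),
    ]
    seen = visible_to_student(results, Approach.SUBSET_RELEASED)
    print(f"student sees {earned(seen)}/{possible(seen)}")        # student sees 20/20
    print(f"actual score {earned(results)}/{possible(results)}")  # actual score 20/50
```

Under Approach 2 in this example, the student sees a perfect 20/20 while the real score is 20/50, because the unreleased stress test carries most of the points.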


Within the approaches that involve giving out automated tests, there’s also the distinction of whether the source code to the tests is given out, or just the results of running them on a server.

The discussions of these approaches assume that the assigned program has been fully (and unambiguously) specified in prose, and any automated tests used to determine correctness accurately test the functionality set out in the spec.

Students obviously favor the first approach, because that way they know exactly what their correctness score is all along. (Typically, the correctness of a program accounts for the vast majority of the overall score, as opposed to other things like coding style.) As a pedagogue, however, I’ve turned against Approach 1. As a teaching assistant, it’s all right: I have to do very little grading work, and students don’t have a leg to stand on if they disagree with the grade they get. (“Pedagogue” and “teaching assistant” are different concepts here, unfortunately.)

The reason I don’t like it as a pedagogue is that it discourages (a) good programming practice and (b) learning. Students end up programming by trial and error. They’ll write some code and run the tests. Some tests fail. At this point, the more competent students debug their code: they look carefully at the reasons for the failures, instrument their code, use a debugger, and possibly even write their own tests to probe the bug further. However, staffing CS courses has taught me, mostly, that such competent students are few and far between. The less competent students, i.e. the majority, who have no idea how to debug systematically, will put print statements everywhere, thus confirming that something is indeed wrong, and then run for help. (These students tend not to do well when the bug is the kind that disappears as soon as you add a print statement.) The trouble is that they’re encouraged to think of the course-provided tests as an oracle rather than as a safety net. People tend not to think through and plan their code before they start writing it, since they know that ultimately their own confidence in its correctness doesn’t matter. The absolute worst is when the tests report a score that students judge “good enough”, and they turn in code that they know is flawed, assured that it won’t hurt them.
It’s an insane, absurd luxury; I really hope people harbor no illusions that they can continue doing stuff like that after they graduate, when the code they write will matter for years.

To some extent, I’ve been guilty of this myself, despite considering myself a good student and a good programmer. I’ve fallen into the trial-and-error coding trap: a test fails, I glance at the output, make a change that I think will solve the problem, rerun the test, it fails, rinse and repeat. But I debugged systematically when I needed to, and to this day I have never turned in code with known bugs. That doesn’t change the fact that this approach to grading lulled me into bad habits.

To be fair, Approach 1 does have the advantage that since students have access to testing code, they can study it, modify it to suit their needs at the time (or even make it better), and possibly glean general testing techniques from it. Unfortunately, most students don’t do that kind of thing; they just run the tests and look at what they spit out, so this advantage isn’t worth much.

Approach 2 is a compromise between 1 and 3. Students don’t know their overall scores, since some of the tests that count toward the score are ones whose results they never see. This mostly solves the “not perfect but good enough” problem, at least among students who aren’t pathologically lazy. Since the most rigorous tests aren’t given to the students, they’re encouraged to stress-test their code themselves, or risk getting burned when they pass all the tests they have and fail most of the ones they don’t (which are typically worth more points). For a TA, this scheme reintroduces the problem of whining students: they got a poor score on the tests they didn’t have, and for some reason they feel this is unfair. Their complaint is that their grade is being based on an unknown grading instrument. To me, students who use this excuse are miserable, pathetic excuses for programmers, but I think I’m starting to see the psychological basis for it. I think they feel that because programs are completely concrete, deterministic and specified, the methods for determining their correctness must have the same properties. They can’t deal with the presence of an unknown in the evaluation of a concrete, unambiguous piece of work. I may be talking out of my ass, since I don’t identify with this thinking at all, but that’s my theory.

Obviously, students who don’t like Approach 2 absolutely can’t stand Approach 3. I can’t say I like Approach 3 much either, as a TA or as a pedagogue. It exacerbates the whining-students problem and provides no pedagogical or logistical advantages over Approach 2. (I don’t think releasing a few basic tests to students is pedagogically harmful, as long as they account for a small proportion of the assignment’s score.) I think there’s general agreement on this: I’ve only been in one CS class that took this approach. Well, there was another class that had one programming assignment and eleven discrete math assignments, but the programming assignment was “write a quine”, for which a fully comprehensive test is make quine && ./quine | cmp - quine.c, so it doesn’t really count.
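That quine check (run the program, compare what it prints with its own source, byte for byte) can be sketched in Python too, assuming the quine can be launched as a command and its source sits in a file. is_quine is a hypothetical helper for illustration, not anything that class actually used.

```python
import subprocess
from pathlib import Path


def is_quine(command: list[str], source_path: str) -> bool:
    """Run the program with no input; it's a quine if stdout equals its source exactly."""
    # check=True raises CalledProcessError if the program exits with a nonzero status.
    output = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,
        capture_output=True,
        check=True,
    ).stdout
    return output == Path(source_path).read_bytes()
```

For a compiled C quine this would be called as is_quine(["./quine"], "quine.c"), which is the same comparison the cmp one-liner does.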

Approach 4 is a different kind of compromise between Approaches 1 and 3. Possibly recognizing the pedagogical disadvantages of Approach 1 and students’ hatred of Approach 3, Approach 4 replaces the blind ruthlessness of automated testing with the compassion and sympathy of a human grader. Students don’t really have a problem with this approach, since it generally gets them good grades, mainly because manual testing of programs is nowhere near as rigorous as automated testing (and it’s a hell of a tedious job too, let me tell you), and because a human can be lenient toward code that is “almost right”. My thoughts are that code that is “almost right” is still broken and deserves an appropriate score, but I didn’t make the rules in the course I staffed that took this approach.

One of the courses I’ve staffed has adopted a far more elaborate approach to grading. The basic approach is 2: correctness is determined solely by automated testing, and a subset of the tests is released (only the results; no source code is ever released). There are several twists. First of all, students are required to write their own tests, and these tests are handed in along with the actual program, for credit. Then three things happen: the staff’s tests are run against the student’s code, the student’s tests are run against the staff’s reference implementation, and the student’s tests are run against the student’s own code. While the student’s tests run, their code coverage is measured (this course was done in Java, in case you were wondering). Depending on the comprehensiveness of the student’s tests, as measured by code coverage, some of the results of the staff tests on the student’s code are released to the student (never all of them, though). The full results of the student’s tests (on both the student’s code and the reference code) are available to the student all the time. This is actually very useful feedback: if the student’s tests fail on the staff’s code, it’s likely that the tests are wrong, and students can fix the tests and be more confident in using them to test their code. (Also, it gives the staff a good laugh when a student’s tests fail on the student’s own code.) So students are forced to write their own tests, and they are graded on the quality of these tests. High-quality tests are rewarded with peeks at “the real answer”. Of course, this still isn’t how the real world operates, but at least students have an incentive to write good tests.
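Here's a rough Python sketch of that feedback step. The post doesn't say how coverage maps to the number of staff results released, so the linear rule, the 80% cap (MAX_RELEASE_FRACTION) and the choice of which tests to reveal are all invented for illustration, as are the function names; the real course was in Java.

```python
import math

# Hypothetical cap so that some staff results always stay hidden ("never all of them").
MAX_RELEASE_FRACTION = 0.8


def released_staff_results(staff_on_student, coverage):
    """Reveal a slice of staff-test results proportional to coverage (0.0 to 1.0)."""
    names = sorted(staff_on_student)  # arbitrary but deterministic choice of what to reveal
    fraction = max(0.0, min(coverage, 1.0)) * MAX_RELEASE_FRACTION
    # fraction * n <= 0.8 * n < n, so for a non-empty suite at least one result stays hidden.
    k = math.floor(fraction * len(names))
    return {name: staff_on_student[name] for name in names[:k]}


def student_feedback(staff_on_student, student_on_reference, student_on_student, coverage):
    """Assemble what the student sees after a submission (all dicts map test name -> passed)."""
    return {
        "staff_results": released_staff_results(staff_on_student, coverage),
        # The student's own tests are always shown in full, against both implementations.
        "tests_vs_reference": dict(student_on_reference),
        "tests_vs_own_code": dict(student_on_student),
        # A student test that fails on the reference implementation is probably wrong itself.
        "suspect_tests": sorted(n for n, ok in student_on_reference.items() if not ok),
    }
```

With full coverage on a five-test staff suite, a student would see four of the five staff results under this made-up rule, and any of their own tests that fail against the reference implementation would be flagged as suspect.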

Man, in the midst of writing this I’ve realized that those stories I’ve read about a shortage of good software developers in the workforce might actually be true. I never believed them — even though I see pretty good evidence for them all the time as a TA. The fact that college CS courses don’t do much to prepare people for real-world software development looks like it might be a major reason.

Students don’t buy the “this isn’t what the real world is like” argument, though. They think that because this technically isn’t what we call “the real world”, if we go out of our way to create a real-world-like process of judging their work, we’re being unfair. I strongly think the opposite is true: if we don’t try to prepare them for the real world, we’re not doing our jobs right. In the real world, nobody is going to walk up to you and hand you a fully comprehensive test suite for the code you’re supposed to be writing. If you don’t know how to test and debug code, you won’t get far. Once you have a job, your employer will probably assume you can already test and debug competently; if you’re picking it up as you go along, you won’t do very well. College CS courses are the perfect opportunity to drill students in good development habits and techniques (not limited to testing and debugging; this includes source control, documentation, coding style, etc.). Courses that don’t take that opportunity are that much less useful to students.

So you know how at the beginning I implied that I’d write about grade inflation here? As it turns out, I haven’t gotten my thoughts on that topic completely organized, and I really would rather write about it coherently. Also, that topic and this one don’t mesh as neatly as I’d thought. So a post on grade inflation is forthcoming.