Opinion-Policy Nexus

What is a political theorist doing teaching a seminar in social science statistics? A reasonable question to ask my colleagues, but they gave me the wheel, so I drove off!

Later I'll post some reflections on my experiences this term. For now, I want to weigh in briefly with some very preliminary thoughts on software and programming for teaching statistics at the graduate level, specifically in an MA programme that doesn't expect much by way of mathematical background from its students.

In stats-heavy graduate departments R seems to be all the rage. In undergraduate methods sequences elsewhere (including here at Laurier) SPSS is still hanging on. I opted for Stata this term, mostly out of familiarity and lingering brand loyalty. If they ever let me at this seminar again, I may well go the R route.

This semester has reassured me that Stata remains a very solid statistical analysis package: it isn't outrageously expensive, it has good quality control, and its developers encourage a stable and diverse community of users, all of which are vital to keeping a piece of software alive. Furthermore, the programmers have managed to balance ease of use (for casual and beginning users) with flexibility and power (for more experienced users with more complicated tasks).

All that said, I was deeply disappointed with the "student" version of Stata, which really is far more limited than I'd hoped. Not that they trick you: the limits are stated right up front. But reading them online is a whole lot different from running up against them full steam in the middle of a class demonstration, when you're chugging along fine until you realize your students cannot even load the data set (the one you thought you'd pared down enough to fit in that modest version of Stata!).

R, in contrast, is not a software package, but a programming environment. At the heart of that environment is an interpreted language (which means you can enter instructions off a command line and get a result, rather than compiling a program and then running the resulting binary file).

R was meant to be a dialect of the programming language S and an open source alternative to S+, a commercial implementation of S. R is not built in quite the same way as S+, however. R's designers started with a language called Scheme, which is a dialect of the venerable (and beautiful) language LISP.

My sense is that more than a few people truly despise programming in R. They insist that the language is hopelessly clumsy and desperately flawed, but they keep working in the R environment because enough of their colleagues (or clients, or coworkers) use it. These critics will often grudgingly concede that, quite apart from the demands of their profession or client base, R is worth the trouble in spite of the language.

These critics certainly make a good case. That said, I suspect many of them cut their programming teeth on languages like C++, and that their complaints, though presented as practical failings of R, in fact reflect deeper philosophical and aesthetic differences. (... but LISP is elegant!)

I remain largely agnostic on these aesthetic questions. A language simply is what it is, and if it -- and as importantly, the community of users -- doesn't let you do what you want, the way you want, then you find another language.

If you've ever programmed before, then R doesn't seem so daunting, and increasingly there are good graphical user interfaces to make the process of working with R more intuitive for non-programmers. Still, fundamentally the philosophy of R is "build it yourself" ... or, more often, "hack together a script to do something based on code someone else has built themselves."

This latter tendency is true of Stata also, of course, but when you use someone else's package in Stata, you can be reasonably confident that it's been checked and re-checked before being released as part of the official Stata environment. That is less often the case with R (although things are steadily improving).

Indeed, not too long ago there were some significant quality-control issues with R packages, and that history leaves a lingering worry in the back of your mind as to whether the code you've invoked with a command ("lm", say, for "linear model") is actually doing what it claims to do.

Advocates of R rejoin that this is not a bug but a feature: that lingering worry ought to inspire you to learn enough to check the code yourself!

They have a point.

Comments

So which dog I back in this fight is obvious and doesn't need a lot of defense, so let me take another tack and stump for the underest of underdogs: I once spent some time working out how to code up a few estimators using, of all things, Excel's function commands. There are some memory limits, but there's basically very little you can't do, so long as you're willing to hand-code from a general template as you go (a fine habit regardless!). This was on the logic that the modal student is very likely to have Excel in front of them forever, and pretty likely to forget whichever data software you might teach them before too long. I didn't get farther than OLS and a couple of GLMs, since any course you'd design around it would have to be pretty heavy in matrix algebra and (though less so) calculus in order for students to get what's going on in the function cells. Also, who the hell knows how you'd get it to make anything like a decent figure at the end of it. But the intuition, the intuition was sound, my theory friend. And your professional-leaning grad students might be just the audience for some badass Excel-fu.
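The matrix-algebra route described in this comment can be sketched outside a spreadsheet. In Excel, OLS via the normal equations, β = (X′X)⁻¹X′y, can be assembled from the worksheet functions TRANSPOSE, MMULT and MINVERSE, e.g. `=MMULT(MINVERSE(MMULT(TRANSPOSE(X),X)),MMULT(TRANSPOSE(X),y))`, where X and y stand in for the design-matrix and outcome cell ranges. The Python below mirrors those three primitives with hypothetical helper functions of the same names; it is a teaching sketch on invented data (no standard errors, no numerical safeguards beyond pivoting), not the commenter's actual spreadsheet.

```python
# OLS via the normal equations, beta = (X'X)^{-1} X'y, built from three
# primitives that mirror Excel's TRANSPOSE, MMULT and MINVERSE.
# Helper names are hypothetical; matrices are lists of rows.


def transpose(a):
    return [list(col) for col in zip(*a)]


def mmult(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def minverse(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row of a with the matching row of the identity matrix.
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity?)")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Return the coefficient column vector (X'X)^{-1} X'y."""
    Xt = transpose(X)
    return mmult(minverse(mmult(Xt, X)), mmult(Xt, y))


# Toy design matrix: a column of ones (intercept) plus two regressors.
X = [[1, 1, 2], [1, 2, 1], [1, 3, 4], [1, 4, 3], [1, 5, 6]]
# y is built exactly as 1 + 2*x1 + 0.5*x2, so OLS should recover (1, 2, 0.5).
y = [[1 + 2 * row[1] + 0.5 * row[2]] for row in X]
beta = ols(X, y)
print([round(b[0], 6) for b in beta])
```

Because the outcome here has no noise, the estimates should match the generating coefficients up to floating-point error, which makes it an easy sanity check before moving on to real data.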
