by Matt Spittal

The first statistical package I learnt was SAS. I started using it in 1997 when I was studying for my honours degree. It ran on some sort of ancient mainframe computer and I had to sit in front of a green screen with a dirty keyboard and type my commands. There was no mouse, no syntax highlighting, no web browser and no fancy user interface. I certainly couldn’t put my favourite picture on the desktop or switch between programs with Alt-Tab. But it was a glorious way to fumble my way through my data and I loved every second of it. I used to spend hours in the lab typing away, and, like a rat in an operant chamber, I would occasionally be rewarded with a program that ran without errors.
I can’t remember why, but at some point further down the track the university stopped supporting SAS (or at least stopped supporting the 300-year-old mainframe it ran on) and, being young and impressionable, I decided to follow everyone else and learn SPSS. It had everything that SAS did not: it ran on a garden-variety PC, you could use a mouse with it, and it seemed to come with a modern computer with a clean keyboard and a fancy desktop picture. It even gave me the same results as SAS. But there were two big drawbacks. The first was that it was really hard to get access to computers that ran SPSS (and the department certainly wasn’t going to install it on our own computers). The second, and more important, was that I stopped enjoying the simple pleasure of running analyses. On reflection, this was probably because I never learnt the syntax and instead relied on the point-and-click features. This was never satisfactory: I had frequent problems replicating my results and often got hopelessly confused trying to remember how I had originally generated variables.
I had a chat to one of the statistical consultants at my university about this and she suggested I have a look at R. It was in its infancy at this stage, but the word on the street was that it was a pretty good program. The price was also right. It was open source and therefore free (although, as it turned out, only in monetary terms), and this appealed to the price-sensitive student in me. So I installed it on my shiny PC with its clean keyboard and its pretty pictures and prepared myself for analytical nirvana. Once I launched R, however, it didn’t take me long to figure out that I was out of my depth. There were no menus, no commands and no obvious way of interacting with it. The mouse didn’t do much and the desktop picture seemed pretty pointless by then too. I did a bit of token research on the web, but it was pretty clear that this was not going to be the program for me.
More recently I’ve been using Stata at work. It’s a great program and reasonably easy to learn. In many ways it reminds me of using SAS back on the old mainframe computer. Perhaps that’s just because the results window has a largely green screen, but I think it’s deeper than that. Sure, there’s the point-and-click stuff if you want to use it, but at its heart Stata is a programming environment. It’s an interesting language to code in and there is almost always an elegant solution to a problem. (In fact, finding the most elegant solution has become a bit of an obsession for me; sometimes I completely rewrite my do-files, not because they contain errors but because I think tinkering will improve the readability, the speed, or whatever.) But using Stata has also led me back to flirting with R. Largely this is because I think the plots that R produces are stunning; nothing else comes close. So I started learning R a year or two ago, and in the abstract, with clear examples from well-written books, it seemed pretty straightforward. But every time I started to apply what I had learnt to a real problem it became too hard and time-consuming and I would give up. Inevitably, I would pick it up again several months later, make some headway, and give up again.
I’ve probably been through this process about six times now, and I’ve finally started to glean enough knowledge to be able to use R for my BCA classes. Last semester I decided to use only R and managed to do that for three of the four assignments. I’m going to try it again this semester with CDA and see how I go. I’ve also recently decided to use R exclusively for one of my projects at work. This has been a real challenge, since it involved a considerable number of recodes, a task that I find particularly easy in Stata but particularly hard in R. Of course, becoming slightly more knowledgeable in R has raised another interesting issue for me. Is it better to be fluent in several statistical languages or more proficient in just one? There are some practical implications stemming from this too: when starting a new project, how do you decide which program you are going to use? I suppose it depends partly on the preferences of the people you work with, but there are other considerations too. What do you think?
Good topic to start with. I’ve found that data management and programming form a large part of a statistician’s job, but in my experience what you use depends on who you end up working for, and, for better or worse, that’s what you’re stuck with for the duration of that job.
I thought about your question of ‘Is it better to be fluent in several statistical languages or more proficient in just one?’ and I’m for the argument that having several languages under your belt helps you more than being a guru in one.
I have only really started to appreciate the differences between stats programs over the last year or so, but I have believed for a while now that it’s better to learn different languages. This idea goes way back to my first day at a new school, when in English class I was asked to circle the past participle in a few sentences on the board. I never formally learnt grammar (apologies for any grammatical errors here), so I had no idea what to do. It was made a lot worse by the fact that I was at a German school with German students who were dumbfounded that I had no idea. The point is that I believe learning a new program makes you much more aware of how your ‘native’ program ticks. You’re forced to think about your ‘native’ program in a new way.
As for my ‘upbringing’ as a statistician, SAS is my ‘native’ language, but I have also used Minitab, Fortran and SPSS in my work and studies. It was only when I started using the other programs that I started to enjoy SAS and actually pay attention to efficient programming techniques, rather than just getting the coding over with.
Also, I believe knowing more gives you more flexibility as a statistician. It lets you ‘cherry-pick’ your program of choice when other factors, such as cost constraints or external peer review, come into play.
Thanks for your comments, Lauren. It's great to get your perspective on this. You mentioned something that really resonates with me: learning a new language really does help you appreciate how your preferred statistical language works. For me it goes the other way too; sometimes I wish that my native language, Stata, worked the same way as R. For instance, in Stata, if you want to log-transform a variable you must first create the transformed variable in one step and then run the procedure (e.g. linear regression) in a second step. In R, you can just type -lm(log(y)~x)- and it does the transformation on the fly.
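To make the contrast concrete, here is a minimal R sketch on simulated data (the data frame `d`, the variable names and the true coefficients 0.5 and 0.2 are made up purely for illustration). It fits the model once with the log transformation inside the formula and once by first creating a logged variable, which mirrors the two-step Stata pattern (something like `generate logy = log(y)` followed by `regress logy x`). Both fits should give the same coefficients.

```r
# Hypothetical simulated data, for illustration only.
set.seed(1)
d <- data.frame(x = runif(50, 0, 10))
# exp() keeps y strictly positive, so log(y) is always defined.
d$y <- exp(0.5 + 0.2 * d$x + rnorm(50, sd = 0.1))

# One step: log(y) is evaluated on the fly inside the model formula.
fit_formula <- lm(log(y) ~ x, data = d)

# Two steps, mirroring the Stata workflow: create the variable, then fit.
d$log_y <- log(d$y)
fit_two_step <- lm(log_y ~ x, data = d)

# The two approaches estimate the same coefficients.
print(all.equal(coef(fit_formula), coef(fit_two_step)))  # TRUE
```

The same formula mechanism handles other transformations, such as `sqrt(y)`, and arithmetic wrapped in `I()`, such as `I(x^2)`.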