Sunday, August 2, 2009

Other blogs and useful websites

by Matt Spittal
This blog has been set up as a forum for BCA students and has a focus on statistics. Naturally this isn’t the only website devoted to statistics, and I wanted to share with you some other blogs that offer a tremendous amount of good stuff. Some of them are directly related to statistical analysis and programming; others offer either a broader view of the world or make a welcome distraction from long nights of studying.
The pick of the bunch is Andrew Gelman’s blog (http://www.stat.columbia.edu/~cook/movabletype/mlm/). Along with John Carlin and others, Andrew has written the book Bayesian Data Analysis, which is the textbook for BAYs. His blog is updated constantly with interesting posts on topics ranging from Bayesian inference and modelling techniques to US politics, the use of graphs and sport. I recommend having a look at it at least once every couple of days.
A second website that I haven’t spent nearly enough time looking at is Gapminder (http://www.gapminder.org/). The heading at the top of the page says it all: “Unveiling the beauty of statistics for a fact-based world.” There’s a bunch of movies with fascinating graphs posted fairly frequently, and a lot of them seem to be in response to current issues. My goal for the next few weeks is to spend more time on this website.
A related website is TED (http://www.ted.com/). Its focus is much broader than Gapminder, with videos from people such as writer Alain de Botton and British Prime Minister Gordon Brown. TED describes itself as offering riveting talks from remarkable people and this is a good one to look at when you need a break from study.
There are a number of websites that have been put together to help people learn statistical software. One of the most comprehensive is the Statistical Computing website at UCLA (http://www.ats.ucla.edu/stat/). SAS, SPSS and Stata are covered in a lot of detail, but other packages are mentioned too. If you are a Stata user, you can’t go past Stata’s support pages, especially the FAQs (http://www.stata.com/support/faqs/). These cover both computing and statistical issues. New R users with a background in SAS, SPSS or Stata will find Quick-R (http://www.statmethods.net/index.html) helpful. Another blog that has a bunch of useful tips, tricks and tools is dataninja (http://dataninja.wordpress.com/). Finally, although LaTeX is not a statistical package, there are some good websites out there to help people learn it. (LaTeX is a program for writing and typesetting documents – it excels at typesetting equations and is far easier to use than Equation Editor. See http://www.latex-project.org/ to learn more.) I have found Andrew Roberts’ website to be one of the most helpful introductions to LaTeX (see http://www.andy-roberts.net/misc/latex/).
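To give a flavour of what typing an equation in LaTeX looks like, here is a minimal sketch of a complete document that typesets the formula for a sample mean. The formula and the article document class are illustrative choices only, not something taken from the websites above.

```latex
\documentclass{article}
\begin{document}
% Illustrative example: the sample mean of n observations
The sample mean is
\[
  \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i .
\]
\end{document}
```

Running this file through LaTeX should produce a one-page document with the equation centred on its own line.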
One last website that I want to mention is http://www.citeulike.org/. The principle underlying the website is that people upload their citation libraries to the web and share them with others. I guess it is a kind of social networking, or more correctly, knowledge networking. It doesn’t seem to be very useful if you are looking for articles in a certain area, since it is not a comprehensive database of the published literature, but it’s quite good for dipping into every now and then and discovering something you weren’t looking for. The tags seem to be the most useful way of navigating around.
So that’s my quick list of blogs and websites that I find interesting or useful. What are some other goodies out there? Feel free to leave a comment and share your thoughts with others.

Thursday, July 23, 2009

Would you like to suggest a topic?

Do you have something interesting to say about biostatistics?

The BCA student blog was created to encourage discussion amongst BCA students on issues related to the study of biostatistics, or biostatistics in the workplace.

If you'd like to write a post or suggest a topic, write your ideas below and we'll see what we can do! Alternatively, send an email to bca@ctc.usyd.edu.au

Happy reading!

Wednesday, July 8, 2009

SAS, SPSS, Stata and now R...

by Matt Spittal

The first statistical package I learnt was SAS. I started using it in 1997 when I was studying for my honours degree. It ran on some sort of ancient mainframe computer and I had to sit in front of a green screen with a dirty keyboard and type my commands. There was no mouse, no syntax highlighting, no web-browser and no fancy user-interface. I certainly couldn’t put my favourite picture on the desktop or navigate between programs with the Alt-Tab keys. But it was a glorious way to fumble my way through my data and I loved every second of it. I used to spend hours in the lab typing away, and like a rat in an operant chamber, I would occasionally be rewarded with a program that ran without errors.

I can’t remember why, but at some point further down the track the university stopped supporting SAS (or at least stopped supporting the 300-year-old mainframe it ran on) and, being young and impressionable, I decided to follow everyone else and learn SPSS. It had everything that SAS did not – it ran on a garden-variety PC, you could use a mouse with it and it seemed to come with a modern computer with a clean keyboard and fancy desktop picture. It even gave me the same results as SAS. But there were two big drawbacks. The first was that it was really hard to get access to computers that ran SPSS (and the department certainly wasn’t going to install it on our own computers). The second, and more important, was that I stopped enjoying the simple pleasure of running an analysis. On reflection, it was probably because I never learnt the syntax, and instead focused on using the point-and-click features. This was never satisfactory. I had frequent problems replicating my results and often got hopelessly confused trying to remember how I originally generated variables.

I had a chat to one of the statistical consultants at my university about this and she suggested I have a look at R. It was in its infancy at this stage, but the word on the street was that it was a pretty good program. The price was also right. It was open source and therefore free (although only in monetary terms as it turned out) and this appealed to the price-sensitive student in me. So I installed it on my shiny PC with its clean keyboard and its pretty pictures and prepared myself for analytical nirvana. Once I launched R, however, it didn’t take me long to figure out that I was out of my depth. There were no menus, no commands and no obvious way of interacting with it. The mouse didn’t do much and the desktop picture seemed pretty pointless at this point too. I did a bit of token research on the web, but it was pretty clear that this was not going to be the program for me.

More recently I’ve been using Stata at work. It’s a great program and reasonably easy to learn. In many ways it reminds me of using SAS back on the old mainframe computer. Perhaps that’s just because the results window has a largely green screen, but I think it’s deeper than that. Sure there’s the point-and-click stuff if you want to use that, but at its heart Stata is a programming environment. It’s an interesting language to code in and there is almost always an elegant solution to a problem. (In fact, finding the most elegant solution has become a bit of an obsession for me – sometimes I completely rewrite my do files, not because they contain errors but because I think tinkering will improve the readability, the speed, or whatever.) But using Stata has also led me back to flirting with R again. Largely this is because I think the plots that R produces are stunning; nothing else comes close. So I started learning R a year or two ago, and in the abstract, with clear examples from well-written books, it seemed pretty straightforward. But every time I started to apply what I had learnt to a real problem it became too hard and time-consuming and I would give up. Inevitably, I would try picking it up again several months later, make some headway, but give up again.

I’ve probably been through this process about six times now, and I’ve finally started to glean enough knowledge to be able to use R for my BCA classes. Last semester I decided to use only R and managed to do that for three of the four assignments. I’m going to try it again this semester with CDA and see how I go. I’ve recently decided to use R solely for one of my projects at work too. This has been a real challenge since it involved a considerable number of recodes, a task that I find particularly easy in Stata, but particularly hard in R. Of course, becoming slightly more knowledgeable in R has now raised another interesting issue for me. Is it better to be fluent in several statistical languages or more proficient in just one? There are some practical implications stemming from this too: when starting a new project, how do you decide which computer program you are going to use? I suppose it depends partly on the preferences of the people you work with, but there are other considerations too. What do you think?