The course is given by Jeff Leek, an Assistant Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health. Jeff's introductory video is shown below.
The course runs for eight weeks and is delivered as a set of video lectures. Topics covered include:
- The structure of a data analysis (steps in the process, knowing when to quit, etc.)
- Types of data (census, designed studies, randomized trials)
- Types of data analysis questions (exploratory, inferential, predictive, etc.)
- How to write up a data analysis (compositional style, reproducibility, etc.)
- Obtaining data from the Web (through downloads mostly)
- Loading data into R from different file types
- Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
- Exploratory statistical models (clustering)
- Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
- Basic model checking (primarily visually)
- The prediction process
- Study design for prediction
- Cross-validation
- A couple of simple prediction models
- Basics of simulation for evaluating models
- Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)
A 10-question quiz must be completed by the end of each week. Each quiz has a soft and a hard deadline: if you miss the soft deadline you can still submit answers before the hard deadline, but a penalty is applied to your score. You can attempt each quiz four times.
Two peer assignments must be completed: one in week 3 (due at the end of week 4) and the other in week 6 (due at the end of week 7). The assignments are graded by your student peers, and you must grade at least four peer assignments yourself to avoid a 20% penalty. Your grade is based on the median of the grades you receive from your peers.
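To make the peer-grading rule concrete, here is a minimal sketch of how I understand the arithmetic. The function name and the example grades are hypothetical, and I'm assuming the 20% penalty is a proportional reduction of the median grade; the course material doesn't spell out exactly how it is applied.

```python
from statistics import median


def peer_assignment_grade(peer_grades, assignments_graded,
                          min_to_grade=4, penalty=0.20):
    """Estimate a peer-assignment grade.

    peer_grades: grades received from peers (hypothetical values).
    assignments_graded: how many peer assignments you graded yourself.
    """
    if not peer_grades:
        raise ValueError("at least one peer grade is required")

    # The grade is based on the median of the grades received from peers.
    base = median(peer_grades)

    # ASSUMPTION: grading fewer than four assignments reduces the grade
    # by 20% of its value. The exact mechanics are not stated in the course.
    if assignments_graded < min_to_grade:
        return base * (1 - penalty)
    return base


if __name__ == "__main__":
    grades = [70, 80, 90, 100]  # made-up grades, for illustration only
    print(peer_assignment_grade(grades, assignments_graded=4))
    print(peer_assignment_grade(grades, assignments_graded=3))
```

Using the median rather than the mean means a single unusually harsh or generous peer has little effect on the result.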
An interesting aspect of the course is the forum, to which students can post questions. Prof. Leek obviously can't answer every question, as the course has 100,000 students. Instead, students vote on questions and he responds to the top few. Students can also help each other out by responding to questions.
The course requires a working knowledge of R. I've been using R increasingly as part of my day-to-day work, so I'm comfortable with this. Some (optional) background lectures on R are provided in the course material, along with links to other resources.
Successful completion of the course confers no official qualification or accreditation. I've enrolled purely for my own edification: to learn about MOOCs like Coursera and to sharpen my data analysis skills.
I'll post follow-ups as the course progresses.