Course description

Statistics is the science of extracting knowledge from data. Before the advent of computers, datasets were restricted to what could be measured and painstakingly recorded by hand (think of Mendel's genetics experiments). Now, the pipeline for turning observable phenomena into data has been dramatically broadened by the ubiquity of sensors and input devices, and by the internet, which transfers the collected data across the world. The Data Sciences encompass all facets of working with and extracting knowledge from data, such as data extraction, processing, curation, maintenance, representation, visualization, exploration, hypothesis testing, parameter estimation, and prediction.

In this course, we will focus on data extraction, processing, curation, maintenance, and visualization. Because these tasks rarely stand alone, and you are presumably acquiring data to make sense of it, we will revisit some basic statistical tools that you've learned in 141A and other statistics classes. Because statistical tools, data types, and computational frameworks are constantly shifting, we will emphasize the ability to parse and understand the foundations of data science through the enumeration of principles, standards, and fundamentals.

The Course: Fall 2018

Book

Other Optional Reading

Final Project

The final project is 40% to 50% of the grade for the course, and it should demonstrate a large proportion of what you have learned from the course. If you do not display proficiency in the key technologies that we have covered, then you will not receive a good grade. This includes

Some of the general rules and things to think about are...
  1. You can work individually or in groups of up to 4 people. Larger groups will be graded more harshly than smaller groups.
  2. As a group, you should begin with a certain curiosity. For example, in my lecture 'What happened in Ohio?' I looked at the presidential election in OH; we then processed the data, visualized it, and asked specific questions.
  3. There is an in-class group presentation in the last two weeks of the course, and a final written portion due TBD (near or during finals week).
  4. Your presentation and final project will be factored into your grade separately. Because many of you will be presenting weeks before the deadline for the written portion, you are encouraged to continue working on your project after you give the presentation.
  5. As a group, you will submit code and Jupyter notebooks in addition to a website for the project. You will receive a link on Slack to form a group repository toward the beginning of the course.

Syllabus: TBD