What is a Versioning System?

Git is open source software (code that anyone is free to use and develop) that provides version control. Imagine that you are working with a team of people on the same file; say you all have access to the same Dropbox directory. You could all change the same file, but then whenever one of you syncs changes, they overwrite everyone else's changes. You could set times to edit, such as Don edits from 10am-12pm, Peggy edits from 12pm-2pm, and Joan edits from 2pm-4pm. Or you could keep versions of files by changing the file name, so that when I edit lucky_strikes_v3.py, I save the result as lucky_strikes_v4.py. These all seem cumbersome, and versioning systems provide a better way. Git (and other versioning systems) provide the following features:

I will present the functionality of git via the command line. In 141C, you will learn much more about using Linux, but I encourage you to start using it now, either through cloud computing or by installing Linux on your own machine. While you can start a git repository on any Linux machine, we will focus on using GitHub as a central repository. So while I will make you learn what is going on behind the scenes, you can use the GitHub client most of the time. Some resources about using git with the command line are here:

Checkpoint 1: Complete the GitHub git tutorial, and download the git client for Mac or Windows, or install git on Linux (google "git install" for your Linux distribution).

Setting up your website repo

First you need an account on GitHub, so create one. You should look into getting the free student developer pack. Now you should set up your personal site, which will serve as your 'data science portfolio'. A browser like Firefox is just a program that can interpret certain file formats, like html. When you point your browser to a folder, like https://github.com/, the web server typically looks for a file called index.html there and sends it back, and your browser interprets and displays it. Every other file (like the one that you are reading now) needs to be pointed to by name in your browser. We'll start by initializing the repository with the GitHub automatic page generator:
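The rule that a folder address falls back to index.html can be sketched in a few lines of Python. This is only an illustration of the convention, not how GitHub actually serves pages; the function name resolve_request and the two file names are hypothetical.

```python
import tempfile
from pathlib import Path


def resolve_request(site_root: Path, url_path: str) -> Path:
    """Map a URL path to a file, falling back to index.html for folders."""
    target = site_root / url_path.lstrip("/")
    if target.is_dir():
        # A folder was requested, so serve its index.html by convention.
        target = target / "index.html"
    return target


# Build a tiny throwaway site to try the rule on (made-up content).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "index.html").write_text("<h1>Home</h1>")
    (root / "notes.html").write_text("<h1>Notes</h1>")

    home = resolve_request(root, "/").name
    notes = resolve_request(root, "/notes.html").name

print(home)
print(notes)
```

The folder request ends up at index.html, while any other page has to be requested by its own file name.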

Checkpoint: Go to GitHub pages and follow the instructions to initialize the special website repository. Then use the automatic page generator here and choose your favorite template. You should end with a boilerplate index.html that you can get to via [username].github.io.


There are lots of html and css tutorials online, for example Codecademy. HTML is short for hypertext markup language, which means that it is not a sequential programming language, like Python, but rather a markup language. This means that it is used for modifying text with hidden tags. Other examples of markup languages are TeX and XML. Your browser reads the html file, tags and all, and turns it into text with different fonts, colors, structure, etc.

We are going to learn a minimal amount of html: enough that you can make and edit your website, but not so much that you will be a web developer. This is usually sufficient, since people typically make their site by taking public sites and modifying them to their taste. We will also consider html files to be a source of data, so it is good to understand the basics now. The first thing you should do is right click on this page and click view source. You should see a bunch of tags that look like <TAG NAME>STUFF BETWEEN TAGS</TAG NAME>, which operate on the text between the tags. For example, you should see that a paragraph is surrounded by <p></p>.
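Since we will treat html files as a source of data, here is a minimal sketch of reading the tags with Python's built-in html.parser module. The sample page and the class name TagCollector are made up for illustration; a real page will have many more tags.

```python
from html.parser import HTMLParser

# Hypothetical sample page, made up for illustration.
SAMPLE_HTML = """<html>
<head><title>My portfolio</title></head>
<body>
<h1>Projects</h1>
<p>This is a <strong>bold</strong> claim.</p>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Records each opening tag and the text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.paragraph_text = []
        self._p_depth = 0  # greater than 0 while we are inside a <p>

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if tag == "p":
            self._p_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._p_depth > 0:
            self._p_depth -= 1

    def handle_data(self, data):
        # Keep only the text that sits between <p> and </p>.
        if self._p_depth > 0:
            self.paragraph_text.append(data)


collector = TagCollector()
collector.feed(SAMPLE_HTML)
collector.close()

tags_seen = collector.open_tags
paragraph = "".join(collector.paragraph_text)
print(tags_seen)
print(paragraph)
```

The tags themselves never show up in the paragraph text; they only tell the reader of the file (your browser, or here Python) how the text between them is structured.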

There are several such tags: bold text <strong>, create hyperlinks <a href="site.html">, create headers <h1>, <h2>, create unordered lists <ul>, tables <table>, line break <br>, display image <img> (see the tables in this course website). The html file has two main sections, the head and the body. The head specifies metadata and scripts, and the body has the main content. If you rely on just these tags to make a website then you typically get something that looks like it's from the 1990s. Instead, people use CSS to define how different tags are displayed, in separate stylesheet files, and then use those styled tags in the html. For example, if you look at the head of this html file, then you see the line

<link rel="stylesheet" type="text/css" href="../stylesheets/stylesheet.css" media="screen">
and then if you go to the css file and search for page-header, you find lines like this:
.page-header {
  padding: 2rem 6rem; }
This indicates that the page header has a certain amount of padding around the text: 2rem above and below, and 6rem to the left and right. There are tons of such properties that you can modify. For the purposes of this course, you should just learn the properties that you need when you need them (google works well).
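The href in the link line is a relative pathname, so where it points depends on where the html file lives. As a rough sketch, the following Python pulls the stylesheet href out of a page head and resolves it against the page address. The page URL and the class name StylesheetFinder are hypothetical; only the link line is taken from the text above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page address, assumed only for this example.
PAGE_URL = "https://example.github.io/course/lectures/lecture1.html"

# The link line quoted above, wrapped in a minimal head.
PAGE_HEAD = """<head>
<link rel="stylesheet" type="text/css" href="../stylesheets/stylesheet.css" media="screen">
</head>"""


class StylesheetFinder(HTMLParser):
    """Collects the href of every <link rel="stylesheet"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "link" and attr.get("rel") == "stylesheet" and "href" in attr:
            self.hrefs.append(attr["href"])


finder = StylesheetFinder()
finder.feed(PAGE_HEAD)
finder.close()

# "../" means "go up one folder" relative to the folder holding the page.
resolved = [urljoin(PAGE_URL, href) for href in finder.hrefs]
print(finder.hrefs)
print(resolved)
```

If you move the html file to a different folder, the same relative href resolves to a different place, which is why a downloaded copy of a page can lose its styling.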

Checkpoint: Download this html file and modify things. If it doesn't look the same, then it's probably because you don't have the css files. You should download those too (you should be able to figure out how to), and then change the link tags so that each relative pathname points to the right css file. Modify properties of the tags in the css until you feel like you understand the basics of what is going on.