Useful resources for learning data science fundamentals

This post is a collection of resources that I found particularly useful when I was learning the fundamentals of data science.

This six-part tutorial can be worked through in a day, and hits the sweet spot for beginners by giving you enough information to understand what you are doing without overwhelming you with details.

If you already know another programming language, this article gives some helpful context for key areas in which R is different.

An excellent and thorough introduction to R by Roger Peng of Johns Hopkins. Peng has a deep understanding of the language, uses good coding practices, and provides a good balance of theory and practice. His lecture videos are packed with information and I highly recommend them.

This one-page tutorial teaches the fundamentals of the ggplot2 package in a thoughtful order and includes a ton of useful example graphics.

Although this course is geared towards novice programmers and thus glosses over details that I would have found helpful, it is still a useful first course in Python.

A bundle of written materials, video lectures, and programming assignments from an introductory two-day Python class at Google. It was a good follow-up to the Codecademy course, providing less breadth than Codecademy but more depth on the most important Python topics.

A concise, well-written introduction to SQL that can easily be worked through in a day. The majority of the book focuses on retrieving, sorting, filtering, summarizing, and joining data, which are the most important SQL operations for data scientists.

If you know some SQL and just need a place to practice your queries, this is a lightweight web application that allows you to run queries on a toy database and reset it at any time.

Taught by Trevor Hastie and Rob Tibshirani of Stanford using their new “Introduction to Statistical Learning” textbook. It covers a wide gamut of supervised learning methods and a few unsupervised learning methods.
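
To make the SQL operations mentioned above (retrieving, sorting, filtering, summarizing, and joining) a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The customers and orders tables and all of their rows are hypothetical, invented only for illustration; they do not come from any of the resources listed here.

```python
import sqlite3

# Hypothetical toy data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana', 'Lisbon'), (2, 'Ben', 'Boston'), (3, 'Cy', 'Boston');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 2, 35.0), (3, 2, 15.0), (4, 3, 60.0);
""")

# Retrieving (SELECT), filtering (WHERE), and sorting (ORDER BY).
boston = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name DESC",
    ("Boston",),
).fetchall()

# Joining (JOIN) and summarizing (GROUP BY with SUM): total spent per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()

conn.close()

print(boston)
print(totals)
```

Because the database lives in memory, rerunning the script starts from a clean slate each time, which makes it a convenient way to experiment with queries.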

This is a curated list of links to news articles and research papers about how machine learning has been used to solve interesting, real-world problems.

The first three chapters provided a thoughtful introduction to Git. Lots of examples, most of which are applicable to Git Bash.

I’m taking this course now, and it covers a ton of data science topics in both R and Python. I have just begun taking the first few courses.

