Machine Learning and Statistics: Recommended Texts


For learning the latest machine learning and statistics techniques, I’ve found certain guides to be much more useful than others.

The two main languages that I currently rely on are R and Python. I’ve found that R comes out on top when it comes to statistical analysis. It is an environment designed specifically for that purpose.

However, Python trumps R as a more general-purpose programming language. In particular, it is very useful for implementing data science algorithms in web apps or in conjunction with other programming languages.

In spite of their differences, Python and R are two cornerstone languages in data science.

Here are the texts that I would recommend for learning about the use of these languages in data science:

Discovering Statistics Using R by Andy Field, Jeremy Miles, and Zoë Field

The Discovering Statistics Using R text has become one of my standard references for all things R.

Not only does the book go through how to implement commands in the R environment, it also covers the building blocks behind statistics and regression analysis.

In this text, you will learn about both OLS and logistic regression, as well as non-parametric tests, mixed designs, handling of categorical data, and factor analysis.

Introductory Time Series with R by Paul Cowpertwait and Andrew Metcalfe

I find this title excellent at covering both the basics of time series analysis, such as decomposition, forecasting, and stationarity, and more advanced topics such as spectral analysis, state space models, and system identification.

The authors also provide quite comprehensive exercises at the end of each section. These help the reader check their understanding of the fundamentals of time series analysis as well as how to implement the models in R.
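To give a flavour of two of the basics mentioned above, here is a minimal sketch of trend estimation and differencing. It is my own illustration rather than an example from the book, and it is written in Python with NumPy even though the book works in R. The function name `moving_average_trend` and the synthetic weekly series are assumptions made purely for the example.

```python
import numpy as np

def moving_average_trend(series, window):
    """Estimate the trend with a simple centered moving average (odd window)."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a simple centered average")
    kernel = np.ones(window) / window
    # mode="valid" drops the edges where the window does not fully fit.
    return np.convolve(series, kernel, mode="valid")

# Synthetic daily series (illustration only): linear trend plus a weekly cycle.
t = np.arange(56)
series = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 7)

# Averaging over one full cycle (7 days) cancels the seasonal component.
trend = moving_average_trend(series, window=7)

# Differencing the estimated trend removes the linear growth,
# a common first step towards a stationary series.
trend_diff = np.diff(trend)
```

Because the window spans exactly one seasonal cycle, the weekly component averages out and only the linear trend remains. Real data would of course include noise, and a dedicated decomposition routine would be the better tool in practice.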

Python for Probability, Statistics and Machine Learning by José Unpingco

While I’m only recommending one text on Python here, it has to be this one. The text is great at covering the core packages used for statistical work in Python. These include numpy, matplotlib, scipy, and others used for implementing statistical models. You will also learn about the IDEs (Integrated Development Environments) in which to use Python.

The book also goes into detail on the theory behind the frequently (and not so frequently) used models in the areas of statistics, probability, and machine learning. As well as learning Python, one learns a lot about data science itself through this guide.
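As a small taste of what working with these packages looks like, here is a minimal sketch of an ordinary least squares fit using numpy. This is my own illustration and not taken from the book. The helper name `fit_ols` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    # Design matrix with a column of ones for the intercept.
    X = np.column_stack([np.ones_like(x, dtype=float), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    intercept, slope = coef
    return float(intercept), float(slope)

# Synthetic data (illustration only): a known line plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.1, size=x.size)

intercept, slope = fit_ols(x, y)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

The estimates should land close to the true values of 2.0 and 0.5 used to generate the data. For real analysis, a statistics package that also reports standard errors and diagnostics would be a better choice than this bare fit.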

Web Application Development with R Using Shiny by Chris Beeley

For all the strengths of R, one of the challenges is being able to transfer the input and output to a more intuitive interface. This is where Shiny comes in.

Shiny lets the user build web applications around R code, which makes input and output far more intuitive. For example, a user could create a web page where a slider manipulates certain variables while a dynamic graph of the output updates automatically.

I’ve personally found Beeley’s text to be excellent for learning how Shiny interacts with R. The author goes through how Shiny interacts with web languages such as HTML and JavaScript. He also explains how to structure the server and UI code to create reactive applications.

If you regularly use R and are looking to create interactive applications with Shiny, I highly recommend Beeley’s guide as a standard resource.

Disclaimer: These texts are ones that I have found personally useful and would recommend for learning data science. I have no business relationship with the above authors, and am not receiving any form of compensation for promoting these texts.