Handbook of Regression Modeling in People Analytics: With Examples in R and Python

Keith McNulty

Introduction

As a fresh-faced undergraduate in mathematics in the 1990s, I took an introductory course in statistics in my first term. I would never take another. I struggled with the subject, scored my lowest grade in it and swore I would never go anywhere near it again.

How wrong I was. Today I live and breathe statistics. How did that happen?

Firstly, statistics is about solving real-world problems, and amazingly there was not a single mention of a relatable problem from real life in that course I took all those years ago, just abstract mathematics. Nowadays, I know from my work and my personal learning activities that the mathematics has no meaning without a motivating problem to apply it to, and you’ll see example problems all through this book.

Secondly, statistics is all about data, and working with real data has encouraged me to reengage with statistics and come at it from a different angle—bottom-up you could say. Suddenly all those concepts that were put up on whiteboards using abstract formulas now had real meaning and consequence to the data I was working with. For me, real data helps statistical theory come to life, and this book is supported by numerous data sets designed for the reader to engage with.

But one more step solidified my newfound love of statistics, and that was when I put regression modeling into practice. Faced with data sets that I initially believed were just far too messy and random to be able to produce genuine insights, I progressively became more and more fascinated by how regression can cut through the messiness, compartmentalize the randomness and lead you straight to inferences that are often surprising both in their clarity and in their conclusions.

Hence my motivation for writing this book, which is to give others, whether working in people analytics or otherwise, a starting point for learning regression methods in a practical way, with the hope that they will see immediate applications to their work and take advantage of a much-underused toolkit that provides strong support for evidence-based practice.

I am a mathematician who is now a practitioner of analytics. For this reason you should see that this book is neither afraid of nor obsessed with the mathematics of the methodologies covered. It is my general observation that many students and practitioners make the mistake of trying to run multivariate models without even a basic understanding of the underlying mathematics of those models, and I find it very difficult to see how they can be credible in responding to a wide range of questions or critique about their work without such an understanding. That said, it is also not necessary for students and practitioners to understand the deepest levels of theory in order to be fluent in running and interpreting multivariate models. In this book I have tried to limit the mathematical exposition to a level that allows confident and fluent execution and interpretation.

I subscribe strongly to the principles of open-source sharing of knowledge. If you want to reference the material in this book or use the exercises or data sets in training sessions or classes, you are free to do so and you do not need to request my permission. I only ask that you make reference to this book as the source.

I expect this book to improve over time. If you found this book or any part of it helpful in solving a problem, I’d love to hear about it. If you have comments that would improve, or that call into question, any aspect of the contents of this book, I encourage you to leave an issue on its GitHub repository. This is the most reliable way for me to see your comment. I promise to consider all comments and input, but I do have to make a personal judgment about whether they are helpful to the aims and purpose of this book. If I do make changes or additions based on your input I will make a point to acknowledge your contribution.

I would like to thank the following individuals who have reviewed or contributed to this book at some point during its development: Liz Romero, Alex LoPilato, Kevin Jaggs, Seth Saavedra, Akshay Kotha. My sincere thanks to Alexis Fink for drawing on her years of people analytics experience to set the context for this book in her foreword. My thanks to the people analytics community for their constant encouragement and support in sharing theory, content and method, and to the R community for all the work they do in giving us amazing and constantly improving statistical tools to work with. Finally, I would like to thank my family for their patience and understanding on the evenings and weekends I dedicated to the writing of this book, and for tolerating far too much dinner conversation on the topic of statistics.

Keith McNulty
December 2020