What I am currently (re)learning

November 28, 2018

Statistical Learning

When I did my undergrad at Clemson, I majored in Management, which was a Bachelor of Science program. As such, it was heavy on science and math courses. I did really well in the math and statistics classes that I took. (I think I always scored high A’s, and I even tutored for the Advanced Multi-variable Calculus classes after I had taken them.) However, even though I was good at it, I didn’t really like higher-level math all that much. I attribute this to a few reasons, which were also major stumbling blocks for others in my classes who didn’t do so well in these courses:

Concepts are expressed in notation that is almost intentionally designed to obscure. (I found that when I trusted my intuition for how a problem should be solved, it was always more accurate than memorizing a formula with Greek letters.)
There did not seem to be a logical order to the courses and concepts being taught. Concepts within math courses were sometimes introduced in a very disjointed fashion, with no explanation of what you were going to learn until you got to the material, and no way to know why anything was important or had real-world applications. I could foresee very little practical use for knowing about and using things like t-statistics.

Looking back on it, I think a lot of the problem was that many courses and books on math are very poorly designed. They should engage the learners and explain concepts fully and use the simplest terms possible. Ideally, they should also inform students of the reasoning behind the development of concepts that are being introduced… (Why is this an answer to a problem that needed to be solved? What’s the history behind this concept? Which mathematician solved this and why?)

Another problem was professors’ reliance on using class time to troubleshoot idiosyncrasies of already archaic Excel add-ins to perform math functions, instead of teaching the why and how of those functions.

Now, I feel like I’ve come full circle. I’m working as a data scientist, and data science as a practice expects one to have a solid foundation in statistics. I am reading books in my spare time (outside of work) to refresh my memory and learn more. Right now, I am reading An Introduction to Statistical Learning, which has become something of a staple in the machine learning space (at least, so I gather from the reviews and recommendations for it everywhere online). It is used as a textbook for master’s-level courses on statistical learning and machine learning.

I’m still working through it, but so far, almost everything is based on theoretical concepts that I learned and understood in undergrad, but which I then more or less let slip from my memory because:

I discovered after graduation that they simply aren’t widely used in the business world
Hardly any of the teachers in the business-focused statistics classes would or could explain why the higher-level statistical concepts are actually important to know and understand. (To their credit, statistics as a discipline in the business world was much smaller then. The proliferation of data and the need for models seem to be changing this.)

So I am (re)learning statistics to firm up my foundation. I’m glad that I’m great at manipulating data in R and have picked up a lot from engineering and software development along the way. It’s a lot easier to understand some of the concepts being discussed now. While this book seems to have a better format and design than some others that I have read, it still suffers from:

An overuse of notation at the expense of, and before, explaining the concept at hand
Introductions of statistical terms or other jargon without first providing a definition of the term

I recently found this online:

I laughed so hard when I saw this that I snorted! Unfortunately, this isn’t just the way that computer programmers would write arithmetic textbooks. It’s actually the way that statistics textbooks are often written in real life.

Algorithms

On a brighter note, I recently finished reading Algorithms to Live By. It’s a great book, and I can’t recommend it highly enough. It combines technical concepts and definitions about algorithms with their historical development and practical uses, in a style that makes for satisfying and easy reading. I learned more about the historical development of computer science and common algorithms from this book than I did in all of my computer science courses.