Part 1: Decision Trees

Photo by veeterzy (Vanja Terzic) on Unsplash

Decision trees and random forests are two commonly used algorithms in predictive modeling. In this article, I’m going to discuss the process behind decision trees. I’m planning to follow this up with a second part that discusses random forests, and then compare the two.

First off: decision trees. A decision tree gets its name from the shape of the plot it produces. The image below shows a decision tree that identifies which factors affected survival in the Titanic disaster.
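As a rough illustration of how a tree like this can be fitted in R, here is a minimal sketch. It assumes the rpart package and the built-in Titanic table from the datasets package, which may not be the data behind the plot in this article; the names counts, passengers, and fit are placeholders chosen for the example.

```r
library(rpart)

# Assumption: the built-in `Titanic` contingency table stands in for the
# article's data. As a data frame it has one row per combination of
# Class, Sex, Age and Survived, plus a Freq column of passenger counts.
counts <- as.data.frame(Titanic)

# Expand the counts so that each passenger gets one row.
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]

# Fit a classification tree that predicts Survived from the other columns.
fit <- rpart(Survived ~ Class + Sex + Age,
             data = passengers, method = "class")

print(fit)               # text listing of the splits
plot(fit, margin = 0.1)  # draw the tree shape
text(fit, use.n = TRUE)  # label each node with its class counts
```

Each split in the printed output corresponds to one branch point in the plotted tree, which is where the tree shape comes from.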

Sometimes you need to know when to break the rules

Image by author

When creating graphs with a numeric y-axis, we are often given the rule to “start with zero”. But as with all rules, we need to know when to keep that rule and when to break it. Given that, I’d like to talk about plotting percentages.

I recently took a class on data visualization and would often hear classmates mention this very rule (maybe not in those exact words) like it’s gospel. When I responded, I would typically say something like, “but percentages are weird.”

My background is in chemistry, so one of the first things I think of…

A beginner’s guide

From @thebossbaby on

No, not that kind of rattle, although it may seem like you need one sometimes.

The “rattle” package provides a graphical interface to R functionality that goes beyond what RStudio offers. Here’s what you get when you run rattle() from within R.

Rattle screenshot by author

Rattle is useful for many things. Want to run a quick-and-dirty model? It’s great. Want to see what your data look like? It’s good for that, too. But probably its most useful feature for novices trying to learn R is the “Log” tab. Every time you tell…

Create a central location for all your user-defined functions.

Image by author

I don’t know about you, but I find the method that R uses to sort data frames incredibly non-intuitive. Every time I want to sort a data frame, I have to puzzle through how to do it, usually by looking up online how somebody else did it, then copying and modifying their code. Finally, I decided to create an easy-to-use function that could sort data frames for me. Here’s the latest version of the function that I came up with:

sortby <- function(df, col, desc = FALSE) {    #### df is the dataframe to be sorted…
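The function above is cut off in this excerpt, so what follows is not its body. It is a minimal sketch of one way to sort a data frame by a single column with base R's order(). The helper name sort_df_by and the example data frame scores are hypothetical, and the parameters only mirror the visible signature.

```r
# Hypothetical helper, not the author's sortby(): sorts a data frame
# by one column using base R's order().
sort_df_by <- function(df, col, desc = FALSE) {
  # order() returns the row positions that put df[[col]] in sorted order
  idx <- order(df[[col]], decreasing = desc)
  # keep every column; drop = FALSE keeps a data frame even with one column
  df[idx, , drop = FALSE]
}

# Made-up data for illustration only
scores <- data.frame(name = c("b", "a", "c"), score = c(2, 3, 1))

sort_df_by(scores, "score")               # ascending by score
sort_df_by(scores, "score", desc = TRUE)  # descending by score
```

Passing the column name as a string keeps the call short, at the cost of handling only one sort column at a time.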

Dick Brown

Ex-chemist, Future data analyst
