How do Decision Trees and Random Forests Work?

Part 1: Decision Trees

Decision trees and random forests are two commonly used algorithms in predictive modeling. In this article, I’m going to discuss the process behind decision trees. I’m planning to follow this up with a second part that discusses random forests, and then compare the two.

First off: decision trees. A decision tree is named for the shape of the plot it produces. The image below shows a decision tree for determining which factors affected survival in the Titanic disaster.
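For readers who want to grow a similar tree themselves, here is a minimal sketch in R. It assumes the rpart package and R's built-in Titanic table, which may not be the data or the tool behind the tree pictured in the article, so the splits it finds could differ.

```r
library(rpart)  # assumption: rpart is installed (it ships with most R distributions)

# Titanic is a built-in contingency table; expand it to one row per person.
counts <- as.data.frame(Titanic)  # columns: Class, Sex, Age, Survived, Freq
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]

# Fit a classification tree that predicts survival from the other three factors.
fit <- rpart(Survived ~ Class + Sex + Age, data = passengers, method = "class")

print(fit)               # text listing of the splits
plot(fit, margin = 0.1)  # draw the tree-shaped plot
text(fit, use.n = TRUE)  # label each node with its split and class counts
```

Expanding the table to one row per person keeps the model call simple. Passing the counts through the weights argument of rpart() would be another way to do it.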

Problems With Graphing Percentages

Sometimes you need to know when to break the rules

When creating graphs with a numeric y-axis, we are often given the rule to “start with zero”. But as with all rules, we need to know when to keep that rule and when to break it. Given that, I’d like to talk about plotting percentages.

In a data visualization class I took recently, classmates would often cite this very rule (maybe not in those exact words) as if it were gospel. My typical response was something like, “But percentages are weird.”

My background is in chemistry, so one of the first things I think of…

A beginner’s guide

No, not that kind of rattle, although it may seem like you need one sometimes.

The “rattle” package provides a graphical interface to R functionality, with more features than the one RStudio provides. Here’s what you get when you run rattle() from within R.

Rattle is useful for many things. Want to run a quick-and-dirty model? It’s great. Want to see what your data look like? It’s good for that, too. But probably the most useful feature for novices trying to learn R is the “Log” tab. Every time you tell…

Creating a Custom R Package

Create a central location for all your user-defined functions.

I don’t know about you, but I find the method that R uses to sort data frames incredibly non-intuitive. Every time I want to sort a data frame, I have to puzzle through how to do it, usually by looking up online how somebody else did it, then copying and modifying their code. Finally, I decided to create an easy-to-use function that could sort data frames for me. Here’s the latest version of the function that I came up with:

`sortby <- function(df, col, desc = FALSE) {    #### df is the dataframe to be sorted…`
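The function above is cut off in this excerpt, so the following is only a rough sketch of how a helper with the same arguments might work, not the author's code. The name sort_df_sketch is hypothetical, and the sketch assumes col is a column name passed as a string.

```r
# Hypothetical helper, NOT the author's sortby(): sort a data frame by one column.
sort_df_sketch <- function(df, col, desc = FALSE) {
  # order() returns the row order that sorts the chosen column
  idx <- order(df[[col]], decreasing = desc)
  # drop = FALSE keeps the result a data frame even if df has a single column
  df[idx, , drop = FALSE]
}

# Usage: sort the built-in mtcars data by miles per gallon, highest first.
cars_by_mpg <- sort_df_sketch(mtcars, "mpg", desc = TRUE)
head(cars_by_mpg[, c("mpg", "cyl")], 3)
```

Wrapping order() like this hides the df[order(df$col), ] idiom that makes sorting data frames feel unintuitive in the first place.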

Dick Brown

Ex-chemist, Future data analyst
