I had to profile some functions, so I needed to delve into the basics of R’s profiling options.

There is some more heavyweight stuff available in R, including summaryRprof(), the proftools package, and the profr package. Hadley Wickham wrote the useful *lineprof*, which can be installed via

devtools::install_github("hadley/lineprof")

Another useful package to go ahead and install is the microbenchmark package.

With these tools, we can do some simple profiling. Let’s profile three different implementations of the same computation in R and look at how much their performance differs. First, let’s create some sample data to work with.
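The original sample data is not shown here, so the following is only a stand-in sketch. The size (100 values), the distribution, and the seed are all assumptions picked for illustration; only the name `data` comes from the benchmark call further down.

```r
# Assumed sample data: a short numeric vector, small enough that a single
# call takes well under a millisecond.
set.seed(42)        # fixed seed so the sample is reproducible (arbitrary choice)
data <- rnorm(100)  # assumption: 100 standard-normal draws

str(data)           # quick look at what we created
```

Note that `data` is also the name of a function in the utils package; R resolves function calls separately from variables, so the shadowing is harmless here, but a more specific name would be safer in real code.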

Now, let’s define three different functions: one using R vectorization, one using a hybrid approach, and one implementing the same behavior with an explicit for loop.
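The original definitions are not reproduced here, so the sketch below is a hypothetical trio built around an assumed task, the sum of squares of a numeric vector. Only the names `f`, `g`, and `h` come from the benchmark call; the bodies are illustrations of the three styles, not the functions that were actually timed.

```r
# Hypothetical task (an assumption): sum of squares of a numeric vector.

# f: fully vectorized, both the squaring and the summing run in compiled code.
f <- function(x) sum(x^2)

# g: hybrid, vectorized squaring followed by an explicit accumulation loop.
g <- function(x) {
  sq <- x^2
  total <- 0
  for (v in sq) total <- total + v
  total
}

# h: explicit for loop over indices, one element at a time.
h <- function(x) {
  total <- 0
  for (i in seq_along(x)) {
    total <- total + x[i] * x[i]
  }
  total
}

# Check that the three agree. all.equal() is used rather than identical()
# because sum() may accumulate at a different precision than the R-level loop.
x <- c(1.5, -2.25, 3, 0.125)
stopifnot(isTRUE(all.equal(f(x), g(x))),
          isTRUE(all.equal(f(x), h(x))))
```

With these definitions, `f(c(1, 2, 3))`, `g(c(1, 2, 3))`, and `h(c(1, 2, 3))` all return 14.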

All three functions return the same result. But what about their performance? On my machine, I got

> microbenchmark(f(data), g(data), h(data))
Unit: nanoseconds
    expr   min      lq     mean  median      uq    max neval
 f(data)   743   859.0  1327.09  1002.5  1339.5  12740   100
 g(data)  1264  1411.0  2063.42  1722.5  2097.0  12287   100
 h(data) 83717 85269.5 90690.37 87546.0 94976.5 136717   100

which shows how costly it is to skip vectorization, even in a simple R scenario: by median time, `h()` is roughly 87 times slower than `f()`. Vectorize! Always vectorize! `microbenchmark()` is simple to use and gives good enough results for a first-order analysis. There is much more you can do with `lineprof()` and the other tools, and I’ll return to those. In the meanwhile, here are a couple of links: Hadley’s page on profiling, and a nice, unrelated page on debugging from Duncan Murdoch.
