The fuss about R



My R journey started a few years back. Before R, I didn’t really enjoy creating software. I had to write a lot of C++ while studying engineering at university, besides learning many other languages, but I never grew fond of it. I’d never build an application unless an assignment or, later, a work-related project required it.


This changed when I was introduced to R. There was just something about it. Maybe it was that the vectorized, mathematical way of thinking I’ve always been fond of finally found a means of expression. Or maybe it was the amazing number of high-level libraries, or ‘packages’, which made it seem that the solution to any statistical problem was just a Google search away.


Fast forward a few years to now. For the past few months I’ve been building a production Hadoop cluster, which has for the time being taken me away from creating data models in R. I’d push myself once in a while to read a few random blogs to stay up to date, but I didn’t actually open RStudio and write any code.


Today I rekindled my love for R after stumbling upon a blog post by Hadley Wickham, so I thought I’d write something to get started again. In no particular order, here goes.


     Do not underestimate Base R

Yes, there is surely a package for almost everything one could think of, and the R community has solutions for a large number of everyday data problems. But to truly understand the language, it is important to have an advanced grasp of Base R, which means R without the fancy high-level libraries.
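As a small illustration of how far Base R alone can go, here is a sketch that uses no packages at all. The data frame and its column names are made up for the example; the functions (`tapply()`, `vapply()`, logical indexing) are all part of a standard R installation.

```r
# Hypothetical data: daily temperatures in Celsius for two cities.
temps <- data.frame(
  city    = c("A", "A", "A", "B", "B", "B"),
  celsius = c(20, 22, 19, 30, 31, 29)
)

# Vectorized arithmetic: the whole column is converted at once, no loop needed.
temps$fahrenheit <- temps$celsius * 9 / 5 + 32

# Logical indexing: keep only the rows warmer than 20 degrees Celsius.
warm <- temps[temps$celsius > 20, ]

# tapply() splits one vector by a grouping vector and applies a function.
mean_by_city <- tapply(temps$celsius, temps$city, mean)

# vapply() is a type-checked sapply(): each result must be a single number.
col_ranges <- vapply(
  temps[c("celsius", "fahrenheit")],
  function(x) max(x) - min(x),
  numeric(1)
)
```

Packages such as dplyr express the same steps with a different vocabulary, but knowing the Base R versions makes it much easier to read other people's code and to debug when a package behaves unexpectedly.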


     Aim for expertise

R, counting its thousands of packages, is a massive environment to truly master. Mastering R would probably mean being reasonably competent in Base R and the ‘good’ packages, along with having at least one respectable package on CRAN that is used by members of the community. The landscape is huge, so it is better to pick an area and focus on it.


    Get good at Shiny

It is obvious that data models, outside academia, need to be competent not only in performance but also in presentation. Although I’ve recently discovered how cool jQuery can be for dynamic data visualizations, being an R enthusiast I’d like to build a few solid apps using Shiny.


Practice with real data sets

Nothing beats regular coding, especially working with real, dirty data sets: connecting to databases and pulling data of varying sizes and formats, exploring new models and techniques, and testing different types of algorithms on structured and unstructured data sets.
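To give a flavour of what "dirty" usually means in practice, here is a minimal Base R sketch. The data frame is invented for the example, and real data will have its own quirks, but stray whitespace, inconsistent case, text markers for missing values and duplicated rows come up again and again.

```r
# Hypothetical messy export: stray whitespace, mixed case, "N/A" markers,
# numbers stored as text, an empty date, and a duplicated row.
raw <- data.frame(
  name   = c(" alice", "BOB ", "Carol", "BOB "),
  age    = c("34", "N/A", "29", "N/A"),
  joined = c("2015-03-01", "2016-07-15", "", "2016-07-15"),
  stringsAsFactors = FALSE
)

clean <- raw

# Normalize names: strip surrounding whitespace and lower-case everything.
clean$name <- tolower(trimws(clean$name))

# Turn the text marker into a real NA before converting to integer.
clean$age[clean$age == "N/A"] <- NA
clean$age <- as.integer(clean$age)

# Treat empty strings as missing, then parse the rest as dates.
clean$joined[clean$joined == ""] <- NA
clean$joined <- as.Date(clean$joined, format = "%Y-%m-%d")

# Drop exact duplicate rows, then keep only fully populated records.
clean    <- unique(clean)
complete <- clean[complete.cases(clean), ]
```

Whether to drop incomplete rows, as the last line does, or to impute the missing values depends entirely on the model being built; the sketch only shows the mechanics.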


  Advanced R by Hadley Wickham

This book is not for newbies, but it works really well for the reasonably experienced user. It teaches R internals and deepens understanding beyond that of a mere user.


  Read CRAN documentation

A key to creating great R packages is writing new C functions and then calling them from R. The Rcpp package makes this easier by letting you write the compiled code in C++, but understanding the underlying C source code would surely put someone a notch ahead of other users.