5 reasons you should learn the R programming language now

Linh Le

If you’ve used JavaScript or other scripting languages, you’re going to love using R. This free, open source language is mature and full-featured, with 25 years of history, so if you haven’t given it a try yet, now is a good time to start. Here are 5 compelling reasons why you should jump in and start using R today!

1. Benefit from full documentation and support

Online resources for R, including message boards, are well supported and well documented. Several prominent R developers actively take part in online discussions of the tools and packages they designed. This matters because much of R’s functionality comes from community packages, ranging from packages that parse JSON and XML files to packages that fit random-effects regression models.

The functionality of R can sometimes seem endless thanks to the diligent work of the R developer community. The excellent documentation and active participation of R developers make the language more accessible to people who are just starting out.

2. Make yourself more appealing to employers

Employers see R as a useful and valuable skill, especially in industries that rely on data analysis. Popular proprietary statistical packages can be very expensive, particularly at the enterprise level. Many employers understand that if they hire people who can use R, they can save thousands by not having to purchase those packages. So, set yourself apart and learn R. It might just get you hired.

3. Acquire, clean, and analyze your data in one place

By using R, you can do your data acquisition, cleaning, and analysis all in one place. Let’s have a look at a simple example of what this means.

Listing 1 shows a script that loads the necessary libraries. Lines 4 through 14 perform a GET request to an NFL statistics site and then parse the JSON response. After a few more formatting niceties, the data is loaded into a data frame and is ready for analysis (lines 15-16).
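The final step that Listing 1 performs, turning a parsed JSON response into a data frame, can be sketched in base R. This is a minimal sketch, not the original listing: it assumes the response has already been parsed into a list of records, one named list per player (the shape jsonlite::fromJSON() returns when called with simplifyVector = FALSE), and the player names and field names are hypothetical.

```r
# Hypothetical parsed JSON response: one named list per player.
# Values arrive as strings, as they often do from a web API.
records <- list(
  list(player = "Player A", yards = "1200", touchdowns = "10"),
  list(player = "Player B", yards = "950",  touchdowns = "7"),
  list(player = "Player C", yards = "1100", touchdowns = "12")
)

# Convert each record to a one-row data frame, then stack the rows.
stats <- do.call(rbind, lapply(records, as.data.frame, stringsAsFactors = FALSE))

# A small formatting nicety: plain sequential row names.
rownames(stats) <- NULL

str(stats)
```

Every column is still a character vector at this point; converting the numbers happens in the analysis step that follows.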

Listing 1. A simple example

The following script takes it a step further. This code creates a new database in a running local MongoDB instance and inserts the data frame containing the desired NFL statistics. The script then outputs some information verifying that the data was properly stored in the database. Any scripting language can perform these tasks, but look at how R keeps it clean and easy.

When you move into the data analysis phase, you’ll have many of the necessary tools at your fingertips.

The script in Listing 2 shows some data analysis of the football data.

Listing 2. Football data analysis

Here you reload the mongolite library to pull in the data just added to the database. The data came in as strings, but as you can see in the script, R’s apply function can quickly make large-scale changes to data frames, such as converting character values to numeric ones. The rest of the script performs a basic principal component analysis to see how these statistics can be generalized or combined into indices, as shown in Figure 1.

Figure 1. Sophisticated data analysis

The plot shows that this group of variables doesn’t seem to have an underlying factor, or factors. This is certainly not groundbreaking, but it is sophisticated data analysis right in the same place where the data was acquired and cleaned. By learning R, you get the best of both the data analysis and general scripting worlds.
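The convert-then-analyze sequence described for Listing 2 can be sketched in base R. This is an illustration under stated assumptions, not the original script: it skips the mongolite query and starts from a randomly generated data frame with hypothetical column names, stored as strings to mimic what came back from the database. apply() converts the columns to numeric and prcomp() runs the principal component analysis. The scree plot is one common way to look for a dominant underlying component; it is not necessarily the plot shown in Figure 1.

```r
set.seed(42)  # reproducible stand-in data

# Hypothetical statistics stored as strings, as if read back from MongoDB.
n <- 50
raw <- data.frame(
  yards_per_game = as.character(round(rnorm(n, mean = 70, sd = 15), 1)),
  touchdowns     = as.character(rpois(n, 6)),
  first_downs    = as.character(rpois(n, 40)),
  fumbles        = as.character(rpois(n, 2)),
  stringsAsFactors = FALSE
)

# apply() walks over the columns (MARGIN = 2) and converts each to numeric.
# The result is a numeric matrix that keeps the original column names.
num <- apply(raw, 2, as.numeric)

# Principal component analysis on standardized variables.
pca <- prcomp(num, scale. = TRUE)

# Proportion of variance explained by each component.
summary(pca)$importance["Proportion of Variance", ]

# Scree plot of the component variances.
screeplot(pca, type = "lines")
```

apply() returns a matrix rather than a data frame, which prcomp() accepts directly; if you want to keep a data frame instead, `raw[] <- lapply(raw, as.numeric)` is the usual alternative.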

4. Take advantage of R’s excellent data visualization potential

R’s data visualization tools are excellent. The script in Listing 3 performs further analysis of the football data and plots the results.

Listing 3. Further football data analysis

First, a linear model is stored in the lmod object. This model measures the relationship between yards per game and number of fumbles for the running backs in the dataset. To get the diagnostic plots for this linear model, just call the plot function on the lmod object. This is a huge time saver when you only want a quick look at whether a linear model is appropriate for the data you’re working with.
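The lm-then-plot pattern can be sketched with base R alone. The running-back data below is randomly generated and the column names are hypothetical; the point is only the two calls: lm() to fit the model and plot() on the fitted object to draw the diagnostics.

```r
set.seed(1)  # reproducible stand-in data

# Hypothetical running-back data: fumbles and yards per game.
rb <- data.frame(
  fumbles        = rpois(40, 2),
  yards_per_game = rnorm(40, mean = 70, sd = 15)
)

# Fit the linear model and keep it in lmod.
lmod <- lm(yards_per_game ~ fumbles, data = rb)

# plot() on an lm object draws the standard diagnostic plots;
# a 2 x 2 layout puts the four default panels on one page.
op <- par(mfrow = c(2, 2))
plot(lmod)
par(op)  # restore the previous layout

summary(lmod)
```

By default, plot() on an lm object draws four panels: residuals versus fitted values, a normal Q-Q plot, a scale-location plot, and residuals versus leverage.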

The plots produced by this function are shown in Figure 2.

Figure 2. Diagnostic plots

The next call is to ggplot. The ggplot2 package creates publication-quality data visualizations with relative ease. The plot created by this function call is shown in Figure 3.

Figure 3. A ggplot2 result

This plot does a good job of displaying the complex relationship between yards per game and touchdowns, as well as first downs. This may not be a great discovery, but the quality of the graphs produced by ggplot is excellent.
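A ggplot2 call of the kind described above might look like the following sketch. It is not the original Listing 3 code: it assumes the ggplot2 package is installed, uses randomly generated stand-in data, and the column names (yards_per_game, touchdowns, first_downs) are hypothetical. Mapping first downs to colour is one way to show both relationships in a single scatter plot.

```r
library(ggplot2)

set.seed(7)  # reproducible stand-in data

# Hypothetical per-player statistics.
players <- data.frame(
  yards_per_game = runif(30, min = 20, max = 110),
  touchdowns     = rpois(30, 6),
  first_downs    = rpois(30, 40)
)

# Base layer: data plus aesthetic mappings (x, y, and colour).
# geom_point() adds the scatter; labs() sets readable axis and legend titles.
p <- ggplot(players, aes(x = yards_per_game, y = touchdowns, colour = first_downs)) +
  geom_point(size = 2) +
  labs(x = "Yards per game", y = "Touchdowns", colour = "First downs")

print(p)
```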

5. Get up to speed quickly

R is easy to learn. There is plenty of online community help available, and several companies offer high-quality online courses. One reason R is so intuitive is that it wasn’t created for computer scientists. It was designed by statisticians for statistical computing, so the way the language is organized might be easier for some non-programmers to grasp.

The script in Listing 4 shows the language’s simplicity, with the results shown in Figure 4.

The script in Listing 4 creates three random samples and then creates a bar graph of the data. The function that draws the samples is simply called sample, and the probability of each outcome is set with its prob argument. data.frame creates the table-like structure that holds the results.
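A minimal base R sketch of the sampling step follows. The colour labels, probabilities, and sample sizes are hypothetical rather than taken from Listing 4; the sketch only shows sample() with its prob argument and data.frame() holding the results.

```r
set.seed(123)  # make the random draws reproducible

colours <- c("red", "green", "blue")

# Three samples of 100 draws with replacement; prob sets each colour's chance.
# The probabilities below are hypothetical.
draws <- data.frame(
  sample_id = rep(c("Sample 1", "Sample 2", "Sample 3"), each = 100),
  colour    = c(
    sample(colours, 100, replace = TRUE, prob = c(0.5, 0.3, 0.2)),
    sample(colours, 100, replace = TRUE, prob = c(0.2, 0.5, 0.3)),
    sample(colours, 100, replace = TRUE, prob = c(0.3, 0.2, 0.5))
  )
)

# Count how often each colour appeared in each sample.
table(draws$sample_id, draws$colour)
```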

Listing 4. Simplicity of R

The graph works by setting a base and then adding elements on top. These elements have intuitive names, such as geom_bar for bar graphs, and the resulting graph looks pretty good for the amount of effort required to create it.
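The base-plus-layers idea can be sketched as follows, assuming the ggplot2 package is installed. The data are hypothetical stand-ins for three samples; ggplot() creates the base, and each `+` adds a layer or setting on top of it.

```r
library(ggplot2)

set.seed(123)  # reproducible stand-in data

# Hypothetical draws: 100 colour labels for each of three samples.
colours <- c("red", "green", "blue")
draws <- data.frame(
  sample_id = rep(c("Sample 1", "Sample 2", "Sample 3"), each = 100),
  colour    = sample(colours, 300, replace = TRUE)
)

# The base: data plus aesthetic mappings. Layers are added with `+`.
p <- ggplot(draws, aes(x = colour, fill = colour)) +
  geom_bar() +                   # bar height = count of each colour
  scale_fill_identity() +        # use the colour names themselves as fills
  facet_wrap(~ sample_id) +      # one panel per sample
  labs(x = "Colour", y = "Count")

print(p)
```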

Figure 4. Three random samples

Learning any language is difficult, but R is easier than most.

Summary

From helpful documentation to powerful data visualization tools, these are just some of the great reasons why you should use R. And expect more improvements and tools from the active R community in the future. Are you ready? Dive in and discover how great the R programming language is!

Source: https://developer.ibm.com