R is a popular programming language for statistics, used mainly in science and mathematics for statistical computing. It is an interesting language with distinct characteristics, and once you get used to it, working with it is quite fun.
What is the R programming language?
R is not a general-purpose programming language like Java or Python. Instead, it is especially useful in one particular area: statistical computing. Despite strong competition, R has been among the 20 most popular programming languages for years.
What makes R special is not just the language itself, but everything that comes with it. Programming in R usually takes place in an interactive environment comprising a read-eval-print loop (REPL) and integrated help. The open-source language is supported by a sophisticated ecosystem; the community maintains the package repository, the Comprehensive R Archive Network (CRAN). Data sets and scientific white papers on new approaches are also continually updated.
Together, these features make R the ideal programming environment for statistics and data science. In particular, the interactive nature of the environment invites you to investigate and allows you to learn the language and the underlying mathematics in a fun way.
R is a statistics and data programming language
R is a statistical programming language with built-in support for concepts such as the normal distribution, statistical tests, models and regression. There are a number of comparable scientific languages: besides the commercial product Matlab, the newer Julia language deserves particular mention. In recent years, Python has also become a significant competitor.
Unlike Python, R has native support for statistical programming. The difference lies in how the language treats values. Data usually consists of many values, so R typically calculates with multiple values at once. In almost every other language the simplest value is a single number; R is a special case.
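To make these concepts concrete, here is a minimal sketch that uses only functions shipped with a standard R installation. The variable names and the simulated data are assumptions chosen for illustration.

```r
# simulated data; names and parameters are illustrative assumptions
set.seed(1)
x <- rnorm(100, mean = 0, sd = 1)     # draws from a normal distribution
y <- 2 * x + rnorm(100, sd = 0.5)     # linear relationship plus noise

# statistical test: does the mean of x differ from zero?
test <- t.test(x, mu = 0)

# linear regression of y on x
model <- lm(y ~ x)
slope <- coef(model)[["x"]]           # estimated slope, close to 2
```

None of this requires an extra package, which is what native support means in practice.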
Mathematical operations can be performed in any programming language. Let’s illustrate R’s approach to data processing with a simple example. Here we add two numbers:
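A minimal sketch of such a scalar addition, with the operands 10 and 5 assumed here so that they line up with the vector example that follows:

```r
# returns 15
10 + 5
```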
So far nothing unusual. However, the same add operation can be applied to a list of numbers in R. We combine two numbers into such a list, which R calls a vector, and add a constant value:
# returns 15, 25
c(10, 20) + 5
A surprising result for experienced programmers. Even a modern, dynamic language like Python doesn’t allow this:
# throws an error
[10, 20] + 5
With R you can even add two lists. The lists are not concatenated; instead, the mathematical operation is performed element by element:
# returns 42, 69
c(40, 60) + c(2, 9)
In older languages such as Java or C++, you need a loop to process multiple elements of a list. These languages strictly separate individual values (scalars) from composite data structures (vectors). In R, the vector is the base unit, and a scalar is simply the special case of a vector with a single element.
In statistics, mathematical precision is relaxed, because we expect uncertainty and imperfect data when measuring reality. Something can always go wrong. Fortunately, R is fault-tolerant to a certain extent: the language can handle missing values without causing a running script to crash.
Let us illustrate the robustness of the language with an example. In many programming languages, dividing a number by zero causes an error. In R, however, dividing a nonzero number by zero yields the value Inf, which can easily be filtered from the data in a subsequent cleaning step:
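The claim that a scalar is just a vector of length one can be checked directly. This short sketch uses only base R functions; the variable name is arbitrary.

```r
# a single number is already a vector of length one
x <- 42
length(x)             # 1
is.vector(x)          # TRUE
identical(x, c(42))   # TRUE: wrapping a single value in c() changes nothing
x[1]                  # 42: indexing works on a "scalar" too
```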
# list of divisors, containing zero
divisors = c(2, 4, 0, 10)
# returns `c(50, 25, Inf, 10)`
quotients = 100 / divisors
# filter out Inf; returns `c(50, 25, 10)`
cleaned_quotients = quotients[quotients != Inf]
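Missing values follow the same pattern. The sketch below, with made-up measurements, shows how NA propagates through arithmetic without stopping the script and how it can be skipped or filtered afterwards.

```r
# measurements with one missing value (NA); the data is made up
measurements <- c(12.5, NA, 9.8, 11.1)

# arithmetic propagates NA instead of failing
doubled <- measurements * 2

# many summary functions can skip missing values on request
average <- mean(measurements, na.rm = TRUE)

# is.finite() is FALSE for NA, NaN, Inf and -Inf alike
usable <- measurements[is.finite(measurements)]
```

Filtering with is.finite() is a slightly broader cleaning step than comparing against Inf, since it also removes NA, NaN and -Inf.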
R supports OOP and functional programming
Programming with R is extremely flexible, and the language cannot be clearly placed in the hierarchy of programming paradigms. It has an object system, but you will not find the usual class definitions. In everyday use, functional and imperative approaches dominate; the functional features in particular, which are well suited to data processing, are very pronounced.
Similar to JavaScript, the object system shines with its flexibility. Generic functions, which can be applied to objects of different types, resemble those in Python. For example, R has the function length(), similar to Python's len().
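As a rough sketch of what generic functions look like in practice: length() accepts many kinds of objects, and R's S3 mechanism lets you define your own generics without a class definition. The generic `describe` and its methods are hypothetical names invented for this example.

```r
# length() works on different kinds of objects
length(c(1, 2, 3))           # 3
length(list('a', 1, TRUE))   # 3
length('hello')              # 1: a string is a character vector of length one
nchar('hello')               # 5: the number of characters in the string

# a hypothetical S3 generic with one method per type
describe <- function(x) UseMethod('describe')
describe.numeric <- function(x) paste('numeric vector of length', length(x))
describe.character <- function(x) paste('character vector of length', length(x))
describe.default <- function(x) 'something else'

describe(c(1, 2, 3))   # "numeric vector of length 3"
describe('Jack')       # "character vector of length 1"
describe(TRUE)         # "something else"
```

Note that length('hello') is 1, not 5: unlike Python's len(), it counts the elements of the vector rather than the characters of the string.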
How does R programming work?
Programming in R is about data, because data is what statistics is based on. To develop a solution to a problem, you need a data set. Unfortunately, this data set often does not exist at development time. We therefore often start a programming project in R by simulating data. We write the code, test the functionality and later exchange the test data for real data.
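This workflow can be sketched as follows. The helper `summarise_heights`, the distribution parameters and the commented-out file path are all assumptions made up for illustration.

```r
# analysis code, written once; `summarise_heights` is a hypothetical helper
summarise_heights <- function(heights) {
  c(mean = mean(heights), sd = sd(heights))
}

# simulated stand-in data: 50 body heights in cm (parameters are assumptions)
set.seed(42)
test_heights <- rnorm(50, mean = 170, sd = 10)
result <- summarise_heights(test_heights)

# later, real data replaces the simulation without touching the analysis code,
# for example (hypothetical file and column name):
# real_heights <- read.csv('path/to/heights.csv')$height
```

Fixing the seed with set.seed() makes the simulated data reproducible, which helps when testing the analysis code.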
How is R code executed?
Like Ruby or Python, R is a dynamic, interpreted scripting language. Unlike the C language, R does not separate source code and executable code. Development is mainly interactive: the interpreter is fed the source code line by line and executes it in real time. Variables are created automatically as needed, and names are bound at runtime.
The effect of this interactive and dynamic programming is comparable to being inside the running program. Objects already created can be examined and modified, so new ideas can be tested immediately. The help command provides access to syntax and function documentation:
# view help for `for` syntax
help('for')
# view help for `c()` function
help(c)
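Examining and modifying live objects might look like this in a session. The object `scores` and its values are made up for the example; the functions are all part of base R.

```r
# create an object, then inspect it from within the running session
scores <- c(a = 90, b = 72)
class(scores)      # "numeric"
names(scores)      # "a" "b"
str(scores)        # prints a compact summary of the structure

# modify the object in place and check again
scores[['b']] <- 75
scores[['b']]      # 75
exists('scores')   # TRUE
```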
Script files can be loaded dynamically from the interpreter. The source command works like the shell command of the same name: when called, the contents of an R source file are read and evaluated in the current session:
source('path/to/file.r')
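For a self-contained way to try this out, the following sketch writes a small script to a temporary file and then sources it. The script contents and the names `greeting` and `double_it` are invented for the example.

```r
# write a small script to a temporary file (its contents are made up)
script_path <- tempfile(fileext = '.R')
writeLines(c(
  'greeting <- "Hello from the script"',
  'double_it <- function(x) x * 2'
), script_path)

# read and evaluate the file; its definitions land in the current session
source(script_path)

greeting        # "Hello from the script"
double_it(21)   # 42
```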
What is the syntax of the R programming language?
The language uses the braces known from C and Java to delimit the bodies of functions and control statements. Unlike in Python, code indentation does not affect functionality. Comments start with a hash, as in Ruby and Python, and no semicolon is required at the end of a statement.
With some experience, R code is easy to recognize thanks to certain peculiarities of the language. In addition to the equals sign, two arrow operators are used for assignment; they allow the direction of the assignment to be reversed:
# equivalent assignments
age <- 42
'Jack' -> name
person = c(age, name)
Another typical feature of R code is a pseudo-object notation of the form object.method():
# test if argument is a number
is.numeric(42)
The function is.numeric() looks like a method numeric() belonging to an object named is. However, this is not the case. In R, the dot is a regular character in names; the function could just as well have been called is_numeric instead of is.numeric.
The c() concatenation function is used to create the vectors that are omnipresent in R programming:
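A quick way to convince yourself that the dot carries no special meaning here: the names `is_numeric` and `my.value` below are made up for this sketch.

```r
# a dot is just another character in a name
is_numeric <- is.numeric   # hypothetical alias with an underscore
my.value <- 42             # hypothetical variable with a dot in its name

is_numeric(my.value)   # TRUE
is.numeric('42')       # FALSE: '42' is a character string
```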
people.ages <- c(42, 51, 69)
If the function is applied to vectors, they are combined into a single flat vector:
# yields `c(1, 2, 3, 4)`
c(c(1, 2), c(3, 4))
Unlike most programming languages, R starts indexing the elements of a vector at 1. This requires some adaptation, but helps avoid the dreaded “off-by-one errors”. The highest index of a vector equals its length:
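One detail worth knowing about c(): a vector holds values of a single type, so mixed inputs are converted to a common type. The values in this sketch are arbitrary.

```r
# mixing a number and a string yields a character vector
mixed <- c(42, 'Jack')
mixed               # "42" "Jack"
class(mixed)        # "character"

# logical values become numbers when combined with numbers
c(1, TRUE)          # 1 1

# integers become doubles when combined with doubles
class(c(1L, 2.5))   # "numeric"
```

This is why a vector built from an age and a name ends up storing the age as text.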
# create a vector of names
people <- c('Jack', 'Jim', 'John')
# access the first name
people[1] == 'Jack'
# access the last name
people[length(people)] == 'John'
As in Python, R supports slicing: a slice selects part of a vector by index. This is based on sequences, which are supported natively. Below we create a sequence of numbers and select part of it:
# create vector of numbers between 42 and 69
nums = seq(42, 69)
# equivalent assignment using sequence notation
nums = 42:69
# using a sequence, slice elements 3 through 7
sliced = nums[3:7]
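A few more indexing variations on the same vector, as a sketch using only base R. Negative indices exclude elements rather than counting from the end as they do in Python.

```r
nums <- 42:69
length(nums)               # 28
nums[3:7]                  # 44 45 46 47 48
nums[-(1:25)]              # 67 68 69: negative indices drop elements
head(nums, 3)              # 42 43 44
tail(nums, 2)              # 68 69
nums[c(1, length(nums))]   # 42 69: first and last element
```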
How do control structures work in R programming?
The basic operations in R are defined for vectors. Loops are therefore rarely necessary, because an operation is applied directly to the entire vector, transforming each element. Let’s square the first ten positive numbers without a loop:
# the numbers 1 through 10
nums <- seq(10)
# `^` is the idiomatic power operator in R
squares <- nums ^ 2
# TRUE
squares[3] == 9
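For comparison, here is a sketch of the same computation in three styles: vectorized, functional with sapply(), and with an explicit loop. The variable names are assumptions for illustration.

```r
nums <- seq(10)

# vectorized: one operation on the whole vector
squares_vectorized <- nums ^ 2

# functional: apply an anonymous function to each element
squares_mapped <- sapply(nums, function(n) n ^ 2)

# imperative: preallocate the result and fill it in a loop
squares_loop <- numeric(length(nums))
for (i in seq_along(nums)) {
  squares_loop[i] <- nums[i] ^ 2
}

all(squares_vectorized == squares_mapped)   # TRUE
all(squares_vectorized == squares_loop)     # TRUE
```

All three produce the same values; the vectorized form is the shortest and usually the one to prefer.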
When you use a for loop in R, it is important to note that it does not work as in C, Java or JavaScript. Instead of going through a loop variable, we iterate directly over the elements, as in Python:
people = c('Jim', 'Jack', 'John')
for (person in people) {
  print(paste('Here comes', person, sep = ' '))
}
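When the position of each element is needed as well, a common idiom is to iterate over the indices with seq_along(). The vectors in this sketch are made up.

```r
people <- c('Jim', 'Jack', 'John')

# iterate over indices 1, 2, 3 and look up each element
for (i in seq_along(people)) {
  print(paste(i, people[i]))
}

# seq_along() also behaves well for empty vectors:
# 1:length(x) would yield c(1, 0) here and run the body twice
nobody <- character(0)
for (i in seq_along(nobody)) {
  print('never reached')
}
```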
Of course, R offers if-else as a basic control structure. In many cases, however, it can be replaced by filtering functions or by logical indexing of vectors. We create a vector of ages and split it into those aged 18 or over and those under 18, without needing a for loop or an if-else:
# create 20 random integer ages between 1 and 99
ages = as.integer(runif(20, 1, 100))
# filter adults (18 and over)
adults = ages[ages >= 18]
# filter children (under 18)
children = ages[ages < 18]
# make sure everyone is accounted for
length(adults) + length(children) == length(ages)
For the sake of completeness, here is the equivalent approach using control structures:
# create 20 random integer ages between 1 and 99
ages = as.integer(runif(20, 1, 100))
# start with empty vectors
adults = c()
children = c()
# populate vectors
for (age in ages) {
  if (age >= 18) {
    adults = c(adults, age)
  } else {
    children = c(children, age)
  }
}
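The filtering functions mentioned earlier offer a third route. This sketch uses fixed example ages and assumes that 18 counts as adult; ifelse(), Filter() and split() are all part of base R.

```r
ages <- c(5, 18, 42, 17, 69)   # fixed example values

# vectorized if-else: one label per element
labels <- ifelse(ages >= 18, 'adult', 'child')
# "child" "adult" "adult" "child" "adult"

# functional filtering with a predicate function
adults <- Filter(function(age) age >= 18, ages)   # 18 42 69

# split the vector into groups by label in one step
groups <- split(ages, labels)
groups$adult   # 18 42 69
groups$child   # 5 17
```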
What is needed to start programming in R?
To start programming in R, you only need a local R installation. Installers for all major operating systems are available for download. A standard R installation includes a graphical interpreter with REPL, built-in help, and an editor. For productive coding, you can also use one of the established code editors. RStudio offers an interesting alternative to the standard R environment.
For which projects is R suitable?
R is frequently used in science and research, for example in bioinformatics and machine learning. However, the language is suitable for any project that uses statistical or mathematical models. For text processing, R is less capable than Python.
The usual calculations and visualizations in spreadsheets can be replaced with R code. This results in a clear separation of concerns because data and code are not combined in cells. This allows you to write the code once and apply it to the entire data set. There is also no risk of overwriting a cell’s formula during manual changes.
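As a small sketch of this idea, a data frame plays the role of the spreadsheet table, and a single line of code replaces a formula copied down a column. The table contents and column names are made up.

```r
# a small table of orders; the data is invented for illustration
orders <- data.frame(
  item = c('pen', 'notebook', 'stapler'),
  price = c(1.50, 4.00, 12.00),
  quantity = c(10, 3, 1)
)

# one expression computes the new column for every row
orders$total <- orders$price * orders$quantity   # 15 12 12

# summary over the whole data set
grand_total <- sum(orders$total)                 # 39
```

The data lives in the data frame and the logic lives in the code, so editing a value cannot accidentally overwrite a formula.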
For scientific publications, R is widely considered the gold standard. Separation of code and data enables scientific reproducibility. The mature ecosystem of tools and packages enables the creation of efficient publishing workflows. Analyses and visualizations are automatically generated from code and data and then integrated into high-quality LaTeX or RMarkdown documents.