R squared (R²) is a statistical goodness-of-fit measure used mainly to assess the quality of linear regressions. In R, it can be obtained with a simple function call.
Why is R² in R important?
R squared is a statistical measure of how well a linear regression model fits the data. It takes values between 0 and 1 and is a central measure of the quality of regression models.
R-squared indicates how closely the observed data cluster around the fitted regression line. More precisely, it is the proportion of the variance in the dependent variable that the model explains. The higher the R-squared value, the better the model explains the data. A low R-squared value points to an inadequate model fit.
R-squared in R and linear regression
R squared in R is often used in the context of linear regression. Since R is a programming language widely used in statistics, it is not surprising that built-in R functions can help with the calculation. The starting point is a fitted regression model:
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
In the code example above, two R vectors named x and y are created first. They contain the data used for the linear regression. The dependent variable is y and the independent variable is x. The regression model is then fitted with the R function “lm()” and stored in the variable model.
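To see what “lm()” has actually estimated, the fitted model can be inspected before moving on to R-squared. The following is a minimal sketch that repeats the example data so it runs on its own; “coef()” and “fitted()” are standard R functions, and for this data the fit should come out at an intercept of about 2.2 and a slope of about 0.6.

```r
# Same example data as above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit the simple linear regression y = intercept + slope * x
model <- lm(y ~ x)

# Estimated intercept and slope of the regression line
coef(model)

# Values predicted by the regression line for each x
fitted(model)
```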
Calculate R-squared in R
The value of R² in R can be obtained with a single function call. You do not need in-depth mathematical knowledge for this, only the name of the right function. With a basic grasp of programming, it is straightforward.
The function that can be used to calculate the statistical measure is called “summary()”. As the name suggests, it summarizes the regression analysis, including the R-squared value. The following code example, which builds on the already calculated linear regression, illustrates the use of the “summary()” function:
# Retrieve the R-squared value
summary(model)$r.squared
With this code you can extract the R-squared value from the linear regression model stored in model. The R-squared value indicates how much of the variance in the dependent variable y the model explains on the basis of the independent variable x.
In the code example above, the “summary()” function is applied to an already fitted regression model. The R operator “$” is then used to pick out only the R-squared value from the values that the function call returns. In our example, the value is 0.6.
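The value returned by “summary()” can be cross-checked by hand. The sketch below assumes the usual definition of R-squared for a least-squares model with an intercept, namely 1 minus the residual sum of squares divided by the total sum of squares. The variable names ss_res, ss_tot and r2_manual are chosen here for illustration only.

```r
# Same example data and model as above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Residual sum of squares: variation the model leaves unexplained
ss_res <- sum(residuals(model)^2)

# Total sum of squares: variation of y around its mean
ss_tot <- sum((y - mean(y))^2)

# R-squared as the explained share of the total variation
r2_manual <- 1 - ss_res / ss_tot
r2_manual

# Compare with the value reported by summary(), allowing for rounding error
all.equal(r2_manual, summary(model)$r.squared)
```

With a single predictor, the same number is also the squared correlation between x and y, so “cor(x, y)^2” gives a second check.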
R-squared in R: interpretation of the value
Once the R-squared value has been determined, the next question is how to interpret it. To do this, you need to find the range into which the value falls. As stated previously, R² lies between 0 and 1. The following ranges serve as rough guidelines:
- 0 (no fit): an R-squared value of 0 means that the model explains none of the variance in the data. In this case, the fit shows no linear relationship between the variables being investigated.
- 1 (perfect fit): an R-squared value of 1 indicates that all observations lie exactly on the regression line. This is extremely rare with real data and can sometimes indicate overfitting.
- 0.7 to 0.9 (good fit): an R-squared value in this range indicates that the data are most likely described sufficiently well by the model.
- 0.5 to 0.7 (acceptable fit): an R-squared value in the range of 0.5 to 0.7 is acceptable, but indicates that there is still room for improvement. The corresponding model can therefore be further improved.
- Less than 0.5 (poor fit): an R-squared value less than 0.5 indicates that the fitted model does not describe the underlying data accurately enough. In this case, the model must be adjusted in order to obtain meaningful results.
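The ranges above can be turned into a small lookup, sketched below. The function interpret_r_squared is a hypothetical helper written for this illustration, not part of R. The list does not name the range between 0.9 and 1, so the label used for it here is an assumption.

```r
# Hypothetical helper: maps an R-squared value to the labels from the list above
interpret_r_squared <- function(r2) {
  stopifnot(is.numeric(r2), length(r2) == 1, r2 >= 0, r2 <= 1)
  if (r2 < 0.5) {
    "poor fit"
  } else if (r2 < 0.7) {
    "acceptable fit"
  } else if (r2 <= 0.9) {
    "good fit"
  } else {
    # Assumption: the list does not label this range explicitly
    "very high fit (check for overfitting)"
  }
}

# Apply it to the example model from this article
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
interpret_r_squared(summary(model)$r.squared)
```

For the example model, whose R-squared is 0.6, the helper returns "acceptable fit". The thresholds are rules of thumb, so treat the label as a starting point rather than a verdict.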
Note
A high R-squared value alone is not enough to judge the quality of your model. Other factors, such as model validation, residual analysis, and how well the model suits the specific data, must also be considered when assessing a regression model. The “summary()” function introduced above provides some additional indicators that you can consult during evaluation.
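As a sketch of what those additional indicators look like, the object returned by “summary()” for an “lm” model also contains, among other things, the adjusted R-squared, the residual standard error and the coefficient table. The variable name fit_summary is chosen here for illustration.

```r
# Same example data and model as above
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)

# Store the summary once and pick out individual components with $
fit_summary <- summary(model)

# Adjusted R-squared: penalizes R-squared for the number of predictors
fit_summary$adj.r.squared

# Residual standard error: typical size of the residuals
fit_summary$sigma

# Coefficient table: estimates, standard errors, t values and p values
fit_summary$coefficients

# Raw residuals, the starting point for a residual analysis
residuals(model)
```

With only five observations, the adjusted R-squared comes out noticeably lower than the plain value of 0.6, which illustrates why a single number should not be read in isolation.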