The pandas method DataFrame.describe() creates a statistical summary of the numeric columns of a DataFrame. This summary contains key statistics such as the mean, the standard deviation, the minimum, the maximum and selected quantiles of the data.
Syntax of pandas describe()
The basic syntax of pandas describe() for a DataFrame is quite simple:
```python
DataFrame.describe(percentiles=None, include=None, exclude=None)
```
Relevant parameters for pandas DataFrame.describe()
A few parameters let you customize the output of describe():
| Parameter | Description | Default value |
|---|---|---|
| percentiles | List of quantiles to include in the summary, given as values between 0 and 1 | None (the quantiles [.25, .5, .75] are used) |
| include | Determines the data types to include in the summary; possible values include numpy.number, object, 'all' or None | None |
| exclude | Determines the data types to exclude from the summary; accepts values similar to include | None |
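As a rough illustration of include and exclude, the sketch below builds a small DataFrame with one text column and two numeric columns and shows how the two parameters change which columns are summarized. The column names are borrowed from the examples in this article; the variable names numeric_summary, all_summary and text_summary are purely illustrative.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame: one text column, two numeric columns
df = pd.DataFrame({
    'Produit': ['A', 'B', 'C', 'D', 'E'],
    'Quantité': [10, 20, 15, 5, 30],
    'Prix': [100, 150, 200, 80, 120],
})

# Default behaviour: only the numeric columns are summarized
numeric_summary = df.describe()

# include='all' summarizes every column; statistics that do not apply
# to a column (for example the mean of a text column) are filled with NaN
all_summary = df.describe(include='all')

# exclude=[np.number] drops the numeric columns, leaving the text column,
# which is described by count, unique, top and freq
text_summary = df.describe(exclude=[np.number])

print(all_summary)
print(text_summary)
```

With include='all', the result mixes the rows used for text columns (unique, top, freq) with the rows used for numeric columns (mean, std, quantiles), so expect NaN wherever a statistic is not defined for a column.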
Definition
Statistical quantiles are values that divide an ordered data set into sections of equal size; each quantile indicates the percentage of data points that fall below that threshold. They help describe how the data is distributed and include, for example, the median (50th percentile) and the 25th and 75th percentiles.
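To make the definition more concrete, here is a minimal sketch that computes the quartiles of a short series with Series.quantile() and then checks what share of the data points lie at or below the median. The data and variable names are illustrative only.

```python
import pandas as pd

# Illustrative data, reusing the quantities from this article's examples
quantities = pd.Series([10, 20, 15, 5, 30])

# The quartiles split the sorted values (5, 10, 15, 20, 30) into four sections
quartiles = quantities.quantile([0.25, 0.5, 0.75])
print(quartiles)

# Share of data points that lie at or below the median
median = quantities.quantile(0.5)
share_at_or_below = (quantities <= median).mean()
print(median, share_at_or_below)
```

With only five values the share comes out at 0.6 rather than exactly 0.5, because the median itself is one of the data points; on larger data sets the share approaches the nominal percentage.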
Examples of use of pandas describe()
DataFrame.describe() is mainly used to get a quick overview of the main statistics of a data set.
Example 1: Statistical summary of numeric data
In the following example, we consider the DataFrame df, which contains sales data for several products.
```python
import pandas as pd
import numpy as np

# Example DataFrame with sales data
données = {
    'Produit': ['A', 'B', 'C', 'D', 'E'],
    'Quantité': [10, 20, 15, 5, 30],
    'Prix': [100, 150, 200, 80, 120],
    'Revenu': [1000, 3000, 3000, 400, 3600]
}
df = pd.DataFrame(données)
print(df)
```
We can now use pandas describe() to obtain a statistical summary of the numeric columns:
```python
summary = df.describe()
print(summary)
```
Calling DataFrame.describe() produces the following output:
```
        Quantité        Prix       Revenu
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
25%    10.000000  100.000000  1000.000000
50%    15.000000  120.000000  3000.000000
75%    20.000000  150.000000  3000.000000
max    30.000000  200.000000  3600.000000
```
The statistics returned by describe() have the following meaning:
- count: number of non-null (non-missing) values in each column
- mean: average of the values (also available via DataFrame.mean())
- std: standard deviation of the values
- min, 25%, 50%, 75%, max: minimum, 25th percentile, median (50th percentile), 75th percentile and maximum of the values
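The meaning of each row can be checked by recomputing it with the corresponding Series method. The following sketch does this for a single illustrative column that contains one missing value, which also shows that count ignores missing entries. The series and the name manual are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Illustrative column with one missing value (NaN)
prices = pd.Series([100, 150, np.nan, 80, 120], name='Prix')

summary = prices.describe()

# Recompute each statistic with the corresponding Series method
manual = pd.Series({
    'count': prices.count(),        # counts non-missing values only
    'mean': prices.mean(),          # NaN is skipped
    'std': prices.std(),            # sample standard deviation (ddof=1)
    'min': prices.min(),
    '25%': prices.quantile(0.25),
    '50%': prices.median(),
    '75%': prices.quantile(0.75),
    'max': prices.max(),
})

print(summary)
print(np.allclose(summary.values, manual.values))
```

Note that std is the sample standard deviation (dividing by n - 1), so it can differ slightly from a population standard deviation computed elsewhere.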
Example 2: Custom quantiles
You can customize pandas DataFrame.describe() with the parameters described above to report specific quantiles:
```python
# Statistical summary with custom quantiles
custom_summary = df.describe(percentiles=[0.1, 0.5, 0.9])
print(custom_summary)
```
The function call, using the selected quantiles (10%, 50% (median) and 90%), returns the following output:
```
        Quantité        Prix       Revenu
count   5.000000    5.000000     5.000000
mean   16.000000  130.000000  2200.000000
std     9.617692   46.904158  1407.124728
min     5.000000   80.000000   400.000000
10%     7.000000   88.000000   640.000000
50%    15.000000  120.000000  3000.000000
90%    26.000000  180.000000  3360.000000
max    30.000000  200.000000  3600.000000
```
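Values such as 7.0 for the 10% quantile of Quantité do not appear in the data itself; they result from linear interpolation between neighbouring sorted values, which is the default behaviour of the pandas quantile calculation. The sketch below reproduces this by hand. The helper linear_quantile is a hypothetical function written for this illustration, not part of pandas.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Quantile via linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

quantities = [10, 20, 15, 5, 30]

# 10% quantile: position 0.4 between 5 and 10, i.e. 5 + 0.4 * (10 - 5)
print(linear_quantile(quantities, 0.1))

# Cross-check against pandas
print(pd.Series(quantities).quantile(0.1))
```

The same reasoning gives the 90% value: position 3.6 falls between 20 and 30, which yields 20 + 0.6 * (30 - 20) = 26.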

