The pandas method DataFrame.groupby() groups data according to given criteria so that various aggregations and transformations can be carried out on those groups.
Syntax of pandas DataFrame.groupby()
The pandas groupby() method accepts several parameters. The basic syntax, showing the most commonly used ones, is as follows:
DataFrame.groupby(by=None, level=None, as_index=True, sort=True, group_keys=True, dropna=True)
Relevant parameters
| Parameter | Description | Default value |
|---|---|---|
| by | Key or Python list of keys to group by; do not specify together with level | None |
| level | For a MultiIndex, one or more index levels to group by | None |
| as_index | If True, the group keys become the index of the result | True |
| group_keys | If True, the group keys are added to the index of the result when calling apply() | True |
| sort | If True, the groups are sorted in ascending order of their keys | True |
| dropna | If True, rows whose group key is NaN are excluded | True |
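To make the effect of these parameters concrete, here is a minimal sketch. The DataFrame and its column names (team, points) are hypothetical and chosen only for illustration; the dropna parameter assumes pandas 1.1 or later.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; one group key is missing (NaN)
df = pd.DataFrame({
    "team": ["b", "a", np.nan, "a", "b"],
    "points": [1, 2, 3, 4, 5],
})

# Defaults: keys become the index, groups are sorted, NaN keys are dropped
default = df.groupby("team")["points"].sum()

# as_index=False keeps the group key as a regular column
flat = df.groupby("team", as_index=False)["points"].sum()

# sort=False keeps the groups in order of first appearance
unsorted = df.groupby("team", sort=False)["points"].sum()

# dropna=False keeps rows whose key is NaN as a group of their own
with_nan = df.groupby("team", dropna=False)["points"].sum()

# level groups by an index level instead of a column
indexed = df.dropna().set_index("team")
by_level = indexed.groupby(level="team")["points"].sum()

for name, result in [("default", default), ("as_index=False", flat),
                     ("sort=False", unsorted), ("dropna=False", with_nan),
                     ("level='team'", by_level)]:
    print(f"--- {name} ---")
    print(result)
```

Printing the five results side by side shows that only the shape and ordering of the output change; the sums per group stay the same.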
Using pandas DataFrame.groupby()
The pandas groupby() method is particularly useful for analyzing and aggregating large amounts of data in order to detect patterns or anomalies.
Grouping and aggregating
Consider a data set of product sales that contains the date of sale, the product sold and the quantity sold:
import pandas as pd
# Sample data set with product sales
données = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Produit': ['A', 'B', 'A', 'B', 'A'],
    'Quantité': [10, 20, 15, 25, 10]
}
df = pd.DataFrame(données)
print(df)
The resulting DataFrame looks like this:
         Date  Produit  Quantité
0  2021-01-01        A        10
1  2021-01-01        B        20
2  2021-01-02        A        15
3  2021-01-02        B        25
4  2021-01-03        A        10
The next step is to group the data by product using pandas groupby(). The total quantity sold for each product is then calculated with the sum() method:
# Group by product and compute the total quantity sold
somme = df.groupby('Produit')['Quantité'].sum()
print(somme)
The result indicates how many units of each product have been sold in total:
Produit
A 35
B 45
Name: Quantité, dtype: int64
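As noted in the parameter table, by also accepts a list of keys. The following sketch reuses the sales data from above to group by date and product together, and by date alone; the variable names par_date_produit and par_date are illustrative choices, not part of the original example.

```python
import pandas as pd

# Same sales data as in the example above
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Produit': ['A', 'B', 'A', 'B', 'A'],
    'Quantité': [10, 20, 15, 25, 10],
})

# A list of keys creates one group per (Date, Produit) combination;
# the result is indexed by a two-level MultiIndex
par_date_produit = df.groupby(['Date', 'Produit'])['Quantité'].sum()
print(par_date_produit)

# Grouping by 'Date' alone gives the total quantity sold per day
par_date = df.groupby('Date')['Quantité'].sum()
print(par_date)
```

Because each date and product pair occurs only once in this small data set, the first result simply repeats the original quantities, while the second adds up both products for each day.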
Multiple aggregations
The following example uses a similar but extended data set that also contains revenue:
import pandas as pd
# Create a DataFrame with product sales and revenue
données = {
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Produit': ['A', 'B', 'A', 'B', 'A'],
    'Quantité': [10, 20, 15, 25, 10],
    'Revenu': [100, 200, 150, 250, 100]
}
df = pd.DataFrame(données)
print(df)
This produces the following DataFrame:
         Date  Produit  Quantité  Revenu
0  2021-01-01        A        10     100
1  2021-01-01        B        20     200
2  2021-01-02        A        15     150
3  2021-01-02        B        25     250
4  2021-01-03        A        10     100
The data is again grouped by product using pandas DataFrame.groupby(). The agg() method is then used to compute the total quantity sold, the total revenue and the average revenue per product:
# Group by product and apply several aggregations
groupes = df.groupby('Produit').agg({
    'Quantité': 'sum',
    'Revenu': ['sum', 'mean']
})
print(groupes)
The result looks like the following:
         Quantité  Revenu
              sum     sum        mean
Produit
A              35     350  116.666667
B              45     450  225.000000
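The two-level column header in this result can be awkward to work with. One possible alternative, sketched here, is pandas named aggregation, where each output column is given as name=(column, function). The output column names total_quantity, total_revenue and mean_revenue are assumptions made for this illustration.

```python
import pandas as pd

# Same sales and revenue data as in the example above
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-03'],
    'Produit': ['A', 'B', 'A', 'B', 'A'],
    'Quantité': [10, 20, 15, 25, 10],
    'Revenu': [100, 200, 150, 250, 100],
})

# Named aggregation: each keyword becomes a flat output column,
# defined as (source column, aggregation function)
summary = df.groupby('Produit').agg(
    total_quantity=('Quantité', 'sum'),
    total_revenue=('Revenu', 'sum'),
    mean_revenue=('Revenu', 'mean'),
)

# Round only for display; the stored values keep full precision
print(summary.round(2))
```

The values are the same as with the dictionary form of agg(); only the column labels differ, which makes later selections such as summary['mean_revenue'] simpler.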
With the pandas groupby() method, data sets can be explored efficiently by applying complex aggregations to analyze trends and performance.

