The pandas method DataFrame.merge() is used to merge two DataFrames based on common keys. This lets you efficiently combine data from different sources in order to carry out more complete analyses.
Syntax of the pandas merge() function
The pandas merge() function accepts a whole series of parameters that influence how the DataFrames are combined. Its general syntax is as follows:
```python
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
```
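The same operation is also available as a method on the left DataFrame: left.merge(right, ...) takes the same parameters except left, because the calling DataFrame plays that role. A minimal sketch with two small, hypothetical DataFrames named left and right:

```python
import pandas as pd

# Two small example DataFrames (hypothetical data for illustration)
left = pd.DataFrame({'key': ['A', 'B'], 'x': [1, 2]})
right = pd.DataFrame({'key': ['B', 'C'], 'y': [3, 4]})

# Function form: both DataFrames are passed as arguments
via_function = pd.merge(left, right, how='inner', on='key')

# Method form: the calling DataFrame is used as `left`
via_method = left.merge(right, how='inner', on='key')

print(via_function.equals(via_method))  # True
```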
Note
The pandas merge() function is similar to the JOIN operation in relational databases. If you already know a database language such as SQL, you will probably find it easier to understand how merge() works. Note, however, that the behavior differs in one respect: if both key columns contain null values (NaN or None), pandas matches those rows with each other, whereas in SQL NULL never matches NULL.
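The following sketch illustrates this difference with hypothetical data: two rows whose keys are both missing are joined together. Dropping null keys before merging, here with DataFrame.dropna, is one way to get SQL-like behavior.

```python
import pandas as pd

# Both key columns contain a missing value (None); hypothetical data
left = pd.DataFrame({'key': ['A', None], 'x': [1, 2]})
right = pd.DataFrame({'key': ['A', None], 'y': [3, 4]})

# pandas matches the two null keys with each other
matched = pd.merge(left, right, on='key')
print(len(matched))  # 2: one row for 'A', one for the null key

# To mimic SQL, where NULL never equals NULL, drop null keys first
sql_like = pd.merge(
    left.dropna(subset=['key']),
    right.dropna(subset=['key']),
    on='key',
)
print(len(sql_like))  # 1
```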
Relevant parameters
The parameters accepted by merge() let you specify not only which DataFrames to combine, but also the type of join and other details.
| Parameter | Description | Default value |
|---|---|---|
| left | First DataFrame to merge | |
| right | Second DataFrame to merge | |
| how | Type of join to perform (inner, outer, left or right) | inner |
| on | Column or index level to use as the key; must be present in both DataFrames | None |
| left_on | Column or index level of the left DataFrame to use as the key | None |
| right_on | Column or index level of the right DataFrame to use as the key | None |
| left_index | If True, the index of the left DataFrame is used as the key | False |
| right_index | If True, the index of the right DataFrame is used as the key | False |
| sort | If True, the join keys of the resulting DataFrame are sorted lexicographically | False |
| suffixes | Suffixes used to make overlapping column names unique | ("_x", "_y") |
| copy | If False, copying is avoided where possible | True |
| indicator | If True, adds a _merge column indicating where each row came from (both, left_only, right_only) | False |
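To show two of these parameters in action, here is a minimal sketch with hypothetical data in which both DataFrames contain a non-key column named Valeur. The suffix strings "_left" and "_right" are arbitrary choices made for this example.

```python
import pandas as pd

# Both DataFrames share a non-key column called 'Valeur' (hypothetical data)
left = pd.DataFrame({'Clé': ['A', 'B'], 'Valeur': [1, 2]})
right = pd.DataFrame({'Clé': ['B', 'C'], 'Valeur': [3, 4]})

# suffixes: renames the overlapping 'Valeur' columns
# indicator: adds a '_merge' column recording where each row came from
result = pd.merge(
    left, right, how='outer', on='Clé',
    suffixes=('_left', '_right'), indicator=True,
)
print(result)
```

Without suffixes, the overlapping columns would be named Valeur_x and Valeur_y, the default from the table.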
Using pandas merge() in practice
A few examples help illustrate how the merge() function works.
INNER JOIN
An INNER JOIN combines two DataFrames and returns only the rows whose keys appear in both DataFrames. All other rows are excluded from the result. To illustrate this, two DataFrames with sample data are created first:
```python
import pandas as pd

# Sample DataFrames
df1 = pd.DataFrame({
    'Clé': ['A', 'B', 'C'],
    'Valeur1': [1, 2, 3]
})
df2 = pd.DataFrame({
    'Clé': ['B', 'C', 'D'],
    'Valeur2': [4, 5, 6]
})
print(df1)
print(df2)
```
The two DataFrames look like this:
```
  Clé  Valeur1
0   A        1
1   B        2
2   C        3
  Clé  Valeur2
0   B        4
1   C        5
2   D        6
```
We can now perform an INNER JOIN using the merge() function:
```python
# Inner join (INNER JOIN)
result = pd.merge(df1, df2, how='inner', on='Clé')
print(result)
```
The output shows that only the rows with keys B and C are included in the final DataFrame, because these keys are present in both original DataFrames.
```
  Clé  Valeur1  Valeur2
0   B        2        4
1   C        3        5
```
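Because how='inner' is the default and, when on is omitted, pandas joins on the columns that both DataFrames share (here only Clé), the call above can be shortened. A quick check, with the sample data re-created so that the snippet runs on its own:

```python
import pandas as pd

df1 = pd.DataFrame({'Clé': ['A', 'B', 'C'], 'Valeur1': [1, 2, 3]})
df2 = pd.DataFrame({'Clé': ['B', 'C', 'D'], 'Valeur2': [4, 5, 6]})

explicit = pd.merge(df1, df2, how='inner', on='Clé')
# Defaults: how='inner'; the key is every column present in both DataFrames
implicit = pd.merge(df1, df2)

print(explicit.equals(implicit))  # True
```

Spelling out on is still the safer habit: an unrelated column that happens to share a name would otherwise silently become part of the key.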
OUTER JOIN
An OUTER JOIN merges two DataFrames while keeping all rows from both. Where a key has no match in one of the DataFrames, the missing values are filled with NaN.
```python
# Outer join (OUTER JOIN)
résultat = pd.merge(df1, df2, how='outer', on='Clé')
print(résultat)
```
As expected, the merged DataFrame contains all rows from both DataFrames. For key A, which exists only in df1, and key D, which exists only in df2, the missing values are filled with NaN.
```
  Clé  Valeur1  Valeur2
0   A      1.0      NaN
1   B      2.0      4.0
2   C      3.0      5.0
3   D      NaN      6.0
```
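The switch from 1 to 1.0 in this output is expected: NaN is a floating-point value, so pandas converts integer columns that receive missing values to a float dtype. A short check, using the pandas.api.types helpers and re-created sample data:

```python
import pandas as pd

df1 = pd.DataFrame({'Clé': ['A', 'B', 'C'], 'Valeur1': [1, 2, 3]})
df2 = pd.DataFrame({'Clé': ['B', 'C', 'D'], 'Valeur2': [4, 5, 6]})

inner = pd.merge(df1, df2, how='inner', on='Clé')
outer = pd.merge(df1, df2, how='outer', on='Clé')

# No missing values in the inner join: the column stays an integer dtype
print(pd.api.types.is_integer_dtype(inner['Valeur1']))  # True
# NaN appears in the outer join: the column becomes a float dtype
print(pd.api.types.is_float_dtype(outer['Valeur1']))    # True
```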
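Note
The other join types work in much the same way: with how='left', all rows of the left DataFrame are kept, and with how='right', all rows of the right DataFrame are kept.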
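A minimal sketch of both variants on the sample data from the previous examples (re-created here so the snippet is self-contained):

```python
import pandas as pd

df1 = pd.DataFrame({'Clé': ['A', 'B', 'C'], 'Valeur1': [1, 2, 3]})
df2 = pd.DataFrame({'Clé': ['B', 'C', 'D'], 'Valeur2': [4, 5, 6]})

# LEFT JOIN: every key of df1 is kept (A, B, C); A gets NaN for Valeur2
left_join = pd.merge(df1, df2, how='left', on='Clé')
# RIGHT JOIN: every key of df2 is kept (B, C, D); D gets NaN for Valeur1
right_join = pd.merge(df1, df2, how='right', on='Clé')

print(left_join)
print(right_join)
```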
Using left_on and right_on
Sometimes the key columns of the two DataFrames have different names. In this case, you can use the parameters left_on and right_on to indicate which columns should be used. First, two new DataFrames are created:
```python
import pandas as pd

# Creating the sample DataFrames
df3 = pd.DataFrame({
    'Clé': ['A', 'B', 'C'],
    'Valeur1': [1, 2, 3]
})
df4 = pd.DataFrame({
    'Clé2': ['B', 'C', 'D'],
    'Valeur2': [4, 5, 6]
})
print(df3)
print(df4)
```
The two DataFrames look like this:
```
  Clé  Valeur1
0   A        1
1   B        2
2   C        3
  Clé2  Valeur2
0    B        4
1    C        5
2    D        6
```
To perform the JOIN with differently named keys, the parameters left_on and right_on are now specified:
```python
# Join with different key column names
result = pd.merge(df3, df4, how='inner', left_on='Clé', right_on='Clé2')
print(result)
```
By explicitly setting left_on='Clé' and right_on='Clé2', the corresponding key columns are used for the join. Both key columns are kept in the result:
```
  Clé  Valeur1 Clé2  Valeur2
0   B        2    B        4
1   C        3    C        5
```
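Since both key columns hold the same values for every matched row of an inner join, one of them is often dropped after the merge. A minimal sketch using DataFrame.drop:

```python
import pandas as pd

df3 = pd.DataFrame({'Clé': ['A', 'B', 'C'], 'Valeur1': [1, 2, 3]})
df4 = pd.DataFrame({'Clé2': ['B', 'C', 'D'], 'Valeur2': [4, 5, 6]})

result = pd.merge(df3, df4, how='inner', left_on='Clé', right_on='Clé2')

# In an inner join the two key columns match row for row,
# so 'Clé2' carries no extra information
assert (result['Clé'] == result['Clé2']).all()
result = result.drop(columns='Clé2')
print(result)
```

This shortcut is only safe for inner joins: with how='outer', unmatched rows have NaN in one of the two key columns, so dropping either column loses keys.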
Using indexes as keys
You can also use the DataFrame indexes as join keys by setting the parameters left_index and right_index to True. First, two new DataFrames with custom indexes are created:
```python
import pandas as pd

# DataFrames whose indexes will serve as keys
df5 = pd.DataFrame({
    'Valeur1': [1, 2, 3]
}, index=['A', 'B', 'C'])
df6 = pd.DataFrame({
    'Valeur2': [4, 5, 6]
}, index=['B', 'C', 'D'])
print(df5)
print(df6)
```
The DataFrames created by the code above look like this:
```
   Valeur1
A        1
B        2
C        3
   Valeur2
B        4
C        5
D        6
```
A JOIN can now be performed on the basis of the indexes:
```python
# Join on the indexes
result = pd.merge(df5, df6, how='inner', left_index=True, right_index=True)
print(result)
```
As expected, the result is a join based on the DataFrame indexes:
```
   Valeur1  Valeur2
B        2        4
C        3        5
```
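For index-to-index merges, pandas also offers the DataFrame.join method, which joins on the index by default (its own default is how='left', so how='inner' is passed explicitly here). A sketch comparing it with the merge() call above:

```python
import pandas as pd

df5 = pd.DataFrame({'Valeur1': [1, 2, 3]}, index=['A', 'B', 'C'])
df6 = pd.DataFrame({'Valeur2': [4, 5, 6]}, index=['B', 'C', 'D'])

via_merge = pd.merge(df5, df6, how='inner', left_index=True, right_index=True)
# join() uses the indexes as keys without extra parameters
via_join = df5.join(df6, how='inner')

print(via_join)
print(via_join.equals(via_merge))
```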
The merge() function is an essential tool for combining DataFrames according to different join rules. Modeled on SQL joins, it offers a great deal of flexibility when working with datasets in Python.

