AMZ DIGICOM

Digital Communication

Pandas DataFrame.dropna(): an introduction to the function

The Python Pandas function DataFrame.dropna() removes from a DataFrame all rows or columns that contain missing values (NA). It therefore plays an important role in data preparation and cleaning.

Syntax of Pandas dropna()

The dropna() function accepts up to six parameters, all of them optional. The basic syntax is:

DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)

Relevant parameters

The behavior of DataFrame.dropna() is controlled by the parameters passed to it. The most important parameters are summarized in the following table:

Parameter | Description | Default value
axis | Determines whether rows (0 or 'index') or columns (1 or 'columns') are dropped | 0
how | Specifies whether a row or column is dropped when at least one value is missing ('any') or only when all values are missing ('all') | 'any'
thresh | Minimum number of non-NA values a row or column must contain to be kept; cannot be combined with how | None
subset | Labels along the other axis to check for missing values (for example, column names when dropping rows); if None, all are considered | None
inplace | Determines whether the operation modifies the original DataFrame instead of returning a new one | False
ignore_index | If True, the remaining axis is labeled from 0 to n-1 | False
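
The effect of how and inplace is easiest to see on a tiny example. The following is a minimal sketch, assuming a hypothetical DataFrame named sample (not part of the original examples) in which one row is partly empty and one is entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is partly empty, row 2 is entirely empty
sample = pd.DataFrame({
    "x": [1.0, np.nan, np.nan],
    "y": [4.0, 5.0, np.nan],
})

# how='any' (the default) drops every row that has at least one NA
any_dropped = sample.dropna(how="any")
print(any_dropped.index.tolist())   # [0]

# how='all' drops only the rows in which every value is NA
all_dropped = sample.dropna(how="all")
print(all_dropped.index.tolist())   # [0, 1]

# inplace=True modifies the DataFrame itself and returns None
result = sample.dropna(inplace=True)
print(result)                       # None
print(sample.index.tolist())        # [0]
```

Because inplace=True returns None, assigning its result back to the same variable would discard the data; the default, which returns a new DataFrame, is usually the safer choice.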

Using Pandas DataFrame.dropna()

Pandas dropna() is useful for cleaning data before an analysis by deleting rows or columns with missing values. This helps avoid biases in statistical analyses. The function also makes it easier to create charts and reports, because missing values can in some cases lead to misleading representations.
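
Before dropping anything, it can help to check how much data is actually missing. The following is a minimal sketch, assuming a hypothetical DataFrame named readings with made-up values; isna() and sum() are standard Pandas methods:

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with gaps (illustrative values only)
readings = pd.DataFrame({
    "temperature": [20.5, np.nan, 22.0, 21.5],
    "humidity": [0.40, 0.45, np.nan, 0.50],
})

# Count missing values per column before cleaning
missing_before = readings.isna().sum()
print(missing_before)

# Drop incomplete rows, then confirm that nothing is missing any more
clean = readings.dropna()
missing_after = clean.isna().sum()
print(missing_after)
print(f"{len(readings) - len(clean)} of {len(readings)} rows were dropped")
```

Comparing the row counts before and after is worthwhile, since dropping a large share of the rows can itself distort the analysis.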

Deleting rows with missing values

In the following code example, we consider a DataFrame that contains NaN values:

import pandas as pd
import numpy as np
# Create a DataFrame with example data
données = {
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]
}
df = pd.DataFrame(données)
print(df)

The DataFrame looks like this:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
2  NaN  NaN  11
3  4.0  8.0  12

In the next step, we apply the Pandas function dropna():

# Drop all rows that contain at least one NaN value
df_cleaned = df.dropna()
print(df_cleaned)

The execution of the code gives the following result:

     A    B   C
0  1.0  5.0   9
3  4.0  8.0  12

Only the rows with index 0 and 3 are still present, because all the other rows contained NaN values.
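
Note that the remaining rows keep their original labels 0 and 3. The following is a minimal sketch of renumbering them with ignore_index, assuming pandas 2.0 or later; the reset_index(drop=True) fallback is an addition here for older versions that do not accept the parameter:

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [5, np.nan, np.nan, 8],
    "C": [9, 10, 11, 12],
})

try:
    # pandas >= 2.0: relabel the remaining rows 0..n-1 in one step
    df_renumbered = df.dropna(ignore_index=True)
except TypeError:
    # Older pandas versions do not accept ignore_index
    df_renumbered = df.dropna().reset_index(drop=True)

print(df_renumbered.index.tolist())  # [0, 1]
```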

Deleting columns with missing values

Deleting columns with missing values works in the same way. To do this, simply set the axis parameter to 1:

# Drop all columns that contain at least one NaN value
df_cleaned_columns = df.dropna(axis=1)
print(df_cleaned_columns)

In the result, only column 'C' remains, because it is the only one that contains no NaN values:

    C
0   9
1  10
2  11
3  12

Using thresh

To drop only the rows that have fewer than two non-NaN values, use the thresh parameter:

# Drop all rows that contain fewer than two non-NaN values
df_thresh = df.dropna(thresh=2)
print(df_thresh)

After running the code, the row with index 1 is kept, because it contains two non-NaN values. Only the row with index 2, which has a single non-NaN value, is removed:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12
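
The thresh parameter also works along columns, and the threshold can be derived from the size of the data. The following is a minimal sketch, assuming a cutoff of 75% filled values chosen purely for illustration; min_count is a hypothetical helper name:

```python
import math

import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [5, np.nan, np.nan, 8],
    "C": [9, 10, 11, 12],
})

# Keep only the columns in which at least 75% of the values are present
min_count = math.ceil(len(df) * 0.75)   # 4 rows * 0.75 = 3 values required
df_cols = df.dropna(axis=1, thresh=min_count)
print(df_cols.columns.tolist())         # ['A', 'C']
```

Column 'B' has only two non-NaN values and is dropped, while 'A' (three values) and 'C' (four values) meet the threshold.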

Using subset

The subset parameter specifies the columns in which missing values are looked for. Only rows that have missing values in the specified columns are deleted.

# Drop all rows that contain a NaN in column 'A'
df_subset = df.dropna(subset=['A'])
print(df_subset)

Only the row with index 2 was deleted, because it contained a NaN value in column 'A'. The other rows are kept, even if they contain NaN in other columns:

     A    B   C
0  1.0  5.0   9
1  2.0  NaN  10
3  4.0  8.0  12
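
The subset parameter accepts several columns and can be combined with how. The following is a minimal sketch on the same example data; the variable names df_either and df_both are chosen here for illustration:

```python
import numpy as np
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [5, np.nan, np.nan, 8],
    "C": [9, 10, 11, 12],
})

# Default how='any': drop rows with a NaN in column 'A' or in column 'B'
df_either = df.dropna(subset=["A", "B"])
print(df_either.index.tolist())  # [0, 3]

# how='all': drop rows only when both 'A' and 'B' are NaN
df_both = df.dropna(subset=["A", "B"], how="all")
print(df_both.index.tolist())    # [0, 1, 3]
```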
