Indexing in pandas gives efficient, direct access to specific data within a DataFrame. With a pandas DataFrame index you can select specific rows and columns, which makes data analysis considerably easier.
What happens during indexing?
Indexing a pandas DataFrame makes it easier to select specific elements of the DataFrame. Rows and columns can be selected either by their positions or by their labels. Indexes help you find and process data faster by providing a kind of "address system" for the data structure.
Syntax of pandas DataFrame.index
You can view the index values of a pandas DataFrame through its index property, written DataFrame.index.
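As a minimal sketch of the index property, the snippet below builds a small DataFrame (the same illustrative data used in the sections that follow) and inspects its index. It assumes no index was set explicitly, so pandas falls back to its default integer index.

```python
import pandas as pd

# Illustrative data; no index is given, so pandas creates a default RangeIndex.
df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

index = df.index
print(index)        # RangeIndex(start=0, stop=3, step=1)
print(list(index))  # [0, 1, 2]
```

The values 0, 1 and 2 are the row labels that the label-based indexers work with later on.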
DataFrame indexing syntax
There are several ways to index pandas DataFrames. The indexing syntax depends on the operation you want to perform.
Indexing with labels (column names)
Pandas DataFrames can be indexed by column name. To demonstrate this, we first create an example DataFrame:
import pandas as pd

# Create an example DataFrame
données = {
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(données)
print(df)
The DataFrame looks like this:
       Nom  Âge        Ville
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
To access all the values of a specific column, use its name with the [] operator. Pass the column name to the indexing operator as a Python string:
# Access the 'Âge' column
print(df['Âge'])
The result contains the age values:
0    25
1    30
2    35
Name: Âge, dtype: int64
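If you are interested in several columns rather than one, pass a list of their names to the indexing operator, for example df[['Nom', 'Ville']]. Note the double brackets: the inner pair is the Python list, the outer pair is the indexing operator.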
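Selecting several columns can be sketched as follows. The snippet rebuilds the example DataFrame so that it runs on its own; the variable name subset is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

# A list of column names returns a DataFrame with just those columns,
# in the order given in the list.
subset = df[['Nom', 'Ville']]
print(subset)
```

A single string such as df['Nom'] returns a Series, whereas a list, even a one-element list, returns a DataFrame.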
Indexing with loc[] (row labels)
To access a specific row of your DataFrame, use the pandas indexer loc[]. It takes the row's label; with the default index used here, the labels are the integers 0, 1 and 2, so they coincide with the row numbers. Using the same DataFrame as before, df.loc[0] extracts the first row, which contains the values for Alice.
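A minimal sketch of this step, assuming the default integer index so that the label 0 designates the first row (the variable name first_row is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

# loc[] looks the row up by its label; here the label happens to be 0.
first_row = df.loc[0]
print(first_row)
```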
As expected, the values for Alice appear in the result:
Nom         Alice
Âge            25
Ville    New York
Name: 0, dtype: object
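Row labels do not have to be integers. The sketch below is an assumption that goes beyond the original example: it uses set_index to make the 'Nom' column the index, so that loc[] can be called with names. The variable name by_name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

# Use the names as row labels instead of the default 0, 1, 2.
by_name = df.set_index('Nom')

print(by_name.loc['Bob'])              # the whole row labeled 'Bob'
print(by_name.loc['Alice', 'Ville'])   # New York
print(by_name.loc[['Alice', 'Charlie'], 'Âge'])  # several rows, one column
```

set_index returns a new DataFrame here, so df itself keeps its integer index.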
Indexing with iloc[] (row and column numbers)
Another way to access specific elements of your DataFrame is to use row and column numbers. To work with the integer positions of a pandas DataFrame, use the iloc[] indexer. Positions start at 0.
# Access the first row
print(df.iloc[0])
# Access the value in the first row and second column
print(df.iloc[0, 1])
The output of the two iloc[] calls contains the expected values, first the complete row and then the single value:
Nom         Alice
Âge            25
Ville    New York
Name: 0, dtype: object
25
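iloc[] also accepts slices and negative positions, which the example above does not show. A short sketch under the same example data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

# Rows 0-1 and columns 0-1; as with Python lists, the end position is excluded.
first_two = df.iloc[0:2, 0:2]
print(first_two)

# Negative positions count from the end, so -1 is the last row.
last_row = df.iloc[-1]
print(last_row['Nom'])  # Charlie
```

Unlike iloc[], a label slice with loc[] includes its end label, so df.loc[0:1] also returns two rows.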
Indexing individual values
If you only need a single value from your DataFrame, the at indexer is an efficient way to extract it. Specify the row and the column by their labels. To look up Bob's place of residence, for example, we need the 'Ville' (city) column and the row labeled 1, which is the second row:
# at is an indexer, so it takes square brackets rather than parentheses
print(df.at[1, 'Ville'])
As requested, the output is Bob's city of residence, i.e. "Los Angeles".
You can also use the iat indexer, which works in the same way as at but expects integer positions instead of labels. The same result as in the previous example is obtained with iat:
print(df.iat[1, 2])
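The two indexers can be compared side by side in a small sketch. It works on a copy (the name updated is illustrative) so that the assignment at the end does not alter the example DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

by_label = df.at[1, 'Ville']   # row label 1, column label 'Ville'
by_position = df.iat[1, 2]     # second row, third column
print(by_label == by_position)  # True

# Both indexers can also be used to set a single value.
updated = df.copy()
updated.at[1, 'Ville'] = 'San Francisco'
print(updated.at[1, 'Ville'])  # San Francisco
```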
Boolean indexing
You can select subsets of a DataFrame based on a condition. This is called Boolean indexing. The condition must evaluate to True or False for each row and is placed directly in the indexing operator. To extract only the rows in which the person is over 30 years old, we can proceed as follows:
# Select the rows where the age is greater than 30
print(df[df['Âge'] > 30])
The condition is satisfied only by Charlie, who is 35. The output is therefore:
       Nom  Âge    Ville
2  Charlie   35  Chicago
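It can help to look at the condition on its own. In the sketch below the intermediate result is stored in a variable (mask, an illustrative name) before it is used for the selection:

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

# The comparison is applied to every row and yields a Series of booleans.
mask = df['Âge'] > 30
print(mask.tolist())  # [False, False, True]

# Passing the mask to the indexing operator keeps only the rows marked True.
older = df[mask]
print(older)
```

Bob is excluded because 30 is not strictly greater than 30.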
Indexing is a fundamental pandas tool that gives efficient access to data and lets you extract the subsets relevant to your analysis.
Note
In Boolean indexing, you can use any comparison operator that evaluates to True or False. To find out more about the different Python operators, consult our guide article on the subject.
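As a brief sketch of other comparison operators, the snippet below uses >= and != on the example data. The last selection, which combines two conditions, goes beyond what the text describes: pandas uses & (and) and | (or) for this, with each condition in parentheses. All variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Nom': ['Alice', 'Bob', 'Charlie'],
    'Âge': [25, 30, 35],
    'Ville': ['New York', 'Los Angeles', 'Chicago'],
})

at_least_30 = df[df['Âge'] >= 30]           # Bob and Charlie
not_chicago = df[df['Ville'] != 'Chicago']  # Alice and Bob

# Combine conditions with & or |; the parentheses are required.
both = df[(df['Âge'] >= 30) & (df['Ville'] != 'Chicago')]
print(both['Nom'].tolist())  # ['Bob']
```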