Python Pandas read_csv() is one of the most commonly used functions for loading data from CSV files into a DataFrame. CSV (comma-separated values) files are a widely used format for storing tabular data and are supported by many programs.
Syntax of Python Pandas read_csv()
The function pandas.read_csv() creates a pandas DataFrame from a CSV file. It accepts a wide variety of parameters that control its behavior. For the sake of clarity, we will cover only the most important and most frequently used arguments here. For a more complete list, see the dedicated Pandas documentation.
The basic syntax of the function is simple and looks like this:
import pandas as pd
df = pd.read_csv(filepath_or_buffer, sep=',', header="infer", names=None, index_col=None, usecols=None, dtype=None, ...)
Relevant parameters
Below you will find an overview of the most important parameters:
| Parameter | Description | Default value |
|---|---|---|
| filepath_or_buffer | A Python string (path to the file) or a file-like object; a URL is also accepted. | |
| sep | The separator between values. | , |
| header | Indicates which line is used as the header. | infer (first line) |
| names | If header=None is set, you can use names to pass a Python list of column names. | |
| index_col | Determines which column is used as the index. | None |
| usecols | Selects the columns you want to load into the DataFrame. | None |
| dtype | Specifies the data type of the columns. | None |
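To see how several of these parameters work together, here is a minimal sketch. It reads from an in-memory string through io.StringIO instead of a real file, and the semicolon-separated sample data and the column names (id, name, age, city, salary) are assumptions made up for illustration.

```python
import io
import pandas as pd

# Hypothetical sample data: semicolon-separated, with a header line.
csv_text = """id;name;age;city;salary
1;Max Mortier;35;Paris;50000
2;Anna Frelon;29;Lyon;62000
3;Pierre Corbet;41;Marseille;58000
"""

df = pd.read_csv(
    io.StringIO(csv_text),             # a file-like object in place of a path
    sep=";",                           # values are separated by semicolons
    header=0,                          # the first line holds the column names
    usecols=["id", "name", "salary"],  # load only these columns
    index_col="id",                    # use the id column as the index
    dtype={"salary": "float64"},       # force salary to floating point
)

print(df)
print(df.dtypes)
```

The age and city columns are never loaded, which can save memory when only part of a file is needed.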
Step by step guide to access CSV files
Using pandas.read_csv(), you can load data from CSV files into Python in just a few steps.
In the following examples, we will work with a CSV file structured as follows:
1,Max Mortier,35,Paris,50000
2,Anna Frelon,29,Lyon,62000
3,Pierre Corbet,41,Marseille,58000
4,Lisa Beaufort,33,Toulouse,49000
5,Tom Verron,28,Bordeaux,52000
Step 1: Import the Pandas library
First, import the Pandas library into your Python script.
import pandas as pd
Step 2: Load the CSV file
You can now load your CSV file with the Python Pandas function read_csv(). To do this, just pass the file path to the function. In the following code example, we assume a file named data.csv, which is saved in the same directory as the script:
df = pd.read_csv('data.csv')
The code stores the contents of the file in a DataFrame object named df, which you can now continue to work with. Pandas automatically interprets the first line as the column headers unless told otherwise.
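The effect of this default header inference can be checked with a short sketch. It assumes three header-less rows in the same shape as the sample file, held in memory with io.StringIO rather than read from data.csv.

```python
import io
import pandas as pd

# Three header-less rows in the same shape as the sample file.
csv_text = """2,Anna Frelon,29,Lyon,62000
3,Pierre Corbet,41,Marseille,58000
4,Lisa Beaufort,33,Toulouse,49000
"""

# Default (header="infer"): the first line is consumed as column names.
df_inferred = pd.read_csv(io.StringIO(csv_text))
print(list(df_inferred.columns))
print(len(df_inferred))  # one row fewer than the file has lines

# header=None keeps every line as data and numbers the columns 0 to 4.
df_raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_raw.columns))
print(len(df_raw))
```

If a file has no header line, passing header=None avoids silently losing the first record.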
Step 3: Show the CSV file
To check that the file has loaded correctly, it is useful to display the first rows of the DataFrame. To do this, you can use the function DataFrame.head(). By default, it displays the first five rows of the DataFrame, which gives you a quick overview of the data structure:
print(df.head())
The output is then as follows:
0 1 Max Mortier 35 Paris 50000
1 2 Anna Frelon 29 Lyon 62000
2 3 Pierre Corbet 41 Marseille 58000
3 4 Lisa Beaufort 33 Toulouse 49000
4 5 Tom Verron 28 Bordeaux 52000
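head() also accepts the number of rows to return. The following sketch illustrates this with a hypothetical ten-row dataset that has a single column named value; both the data and the column name are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical data: a header line followed by the numbers 0 to 9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"
df = pd.read_csv(io.StringIO(csv_text))

first_five = df.head()    # default: the first five rows
first_three = df.head(3)  # explicit row count

print(first_five)
print(first_three)
print(df.shape)           # (rows, columns) of the full DataFrame
```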
Step 4: Modify the name of the columns (optional)
If your CSV file does not have a header line, you can manually define column names:
df = pd.read_csv('data.csv', header=None, names=['Colonne1', 'Colonne2', 'Colonne3', 'Colonne4', 'Colonne5'])
In this example, the columns have been manually named Colonne1, Colonne2, Colonne3, Colonne4 and Colonne5. The code returns the following result:
Colonne1 Colonne2 Colonne3 Colonne4 Colonne5
0 1 Max Mortier 35 Paris 50000
1 2 Anna Frelon 29 Lyon 62000
2 3 Pierre Corbet 41 Marseille 58000
3 4 Lisa Beaufort 33 Toulouse 49000
4 5 Tom Verron 28 Bordeaux 52000
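In practice, descriptive names are usually easier to work with than numbered ones. The sketch below assumes the names id, name, age, city and salary, which are hypothetical since the sample file does not define any, and combines names with index_col. It reads three rows from memory instead of from data.csv.

```python
import io
import pandas as pd

# Three header-less rows in the same shape as the sample file.
csv_text = """2,Anna Frelon,29,Lyon,62000
3,Pierre Corbet,41,Marseille,58000
4,Lisa Beaufort,33,Toulouse,49000
"""

# Hypothetical descriptive names; the sample file does not define any.
column_names = ["id", "name", "age", "city", "salary"]

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,         # the data has no header line
    names=column_names,  # so the names are supplied explicitly
    index_col="id",      # use the id column as the row index
)
print(df)

# Columns can also be renamed after loading.
df = df.rename(columns={"salary": "annual_salary"})
print(list(df.columns))
```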
Note
The CSV file used in this example contains little data and is therefore rather small. If, however, you have a very large CSV file, you should read it in chunks to avoid memory problems. To do this, you can use the chunksize parameter of pandas.read_csv(), which specifies how many rows are read per iteration. You can then use a Python for loop to iterate over the resulting chunks.
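A minimal sketch of chunked reading might look like this. It uses a small in-memory dataset with one hypothetical column named value as a stand-in for a large file, and a chunk size of 4 chosen only to make the behavior visible.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large file: a header plus the numbers 0 to 9.
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

chunk_sizes = []
total = 0

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))   # rows in this chunk
    total += chunk["value"].sum()    # aggregate incrementally

print(chunk_sizes)
print(total)
```

Each chunk is an ordinary DataFrame, so results can be accumulated piece by piece without holding the full file in memory.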

