AMZ DIGICOM

Digital Communication

Introduction to the pandas read_csv() function

The pandas read_csv() function is one of the most widely used ways to load data from CSV files into a DataFrame. CSV (comma-separated values) files are a common format for storing tabular data and are supported by many programs.

Syntax of Python Pandas read_csv()

The pandas.read_csv() function creates a pandas DataFrame from a CSV file. It accepts a wide variety of parameters that control its behavior. For the sake of clarity, we cover only the most important and most frequently used arguments here. For a complete list, see the dedicated pandas documentation.

The basic syntax of the function is simple and looks like this:

import pandas as pd
df = pd.read_csv(filepath_or_buffer, sep=',', header="infer", names=None, index_col=None, usecols=None, dtype=None, ...)

Relevant parameters

Below you will find an overview of the most important parameters:

Parameter | Description | Default value
filepath_or_buffer | A Python string (file path or URL), a path object, or a file-like object. | (required)
sep | The separator between values. | ,
header | Indicates which line is used as the header. | infer (first line)
names | If header=None is set, you can use names to pass a Python list of column names. | (not set)
index_col | Determines which column is used as the index. | None
usecols | Selects the columns to load into the DataFrame. | None
dtype | Specifies the data type of the columns. | None
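
To make the parameters above more concrete, here is a minimal sketch that combines several of them. It assumes a small semicolon-separated dataset held in memory (io.StringIO stands in for a real file), and the column names id, name and salary are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV; a semicolon is used as the delimiter to show `sep`.
raw = io.StringIO(
    "id;name;age;city;salary\n"
    "2;Anna Frelon;29;Lyon;62000\n"
    "3;Pierre Corbet;41;Marseille;58000\n"
    "4;Lisa Beaufort;33;Toulouse;49000\n"
)

df = pd.read_csv(
    raw,                               # filepath_or_buffer: here a file-like object
    sep=";",                           # values are separated by semicolons
    header=0,                          # the first line holds the column names
    index_col="id",                    # use the "id" column as the row index
    usecols=["id", "name", "salary"],  # load only these columns
    dtype={"salary": "float64"},       # read the salary column as floats
)

print(df)
print(df.dtypes)
```

Only the name and salary columns end up in the DataFrame, with id as the index; the age and city columns are skipped while reading.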

Step-by-step guide to reading CSV files

Using pandas.read_csv(), you can load data from CSV files into Python in just a few steps.

In the following examples, we will work with a CSV file that has the following structure:

1,Max Mortier,35,Paris,50000
2,Anna Frelon,29,Lyon,62000
3,Pierre Corbet,41,Marseille,58000
4,Lisa Beaufort,33,Toulouse,49000
5,Tom Verron,28,Bordeaux,52000

Step 1: Import the Pandas library

First, import the pandas library into your Python script.

import pandas as pd

Step 2: Load the CSV file

You can now load your CSV file with the pandas read_csv() function. To do this, simply pass the file path to the function. In the following code example, we assume a file named data.csv, which is stored in the same directory as the script:

df = pd.read_csv('data.csv')

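The code stores the file contents in a DataFrame object named df, which you can now continue to work with. Unless told otherwise, pandas automatically interprets the first line as column headers. For a file without a header row, such as the sample shown above, pass header=None so that the first record is not used as column names.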
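
The effect of header inference can be checked with a short sketch. It assumes two records in the format of the sample file, held in an in-memory string (io.StringIO stands in for data.csv); the variable names are illustrative.

```python
import io

import pandas as pd

# Two records in the format of the sample file, which has no header row.
sample = (
    "2,Anna Frelon,29,Lyon,62000\n"
    "3,Pierre Corbet,41,Marseille,58000\n"
)

# Default behaviour (header="infer"): the first record becomes the column names.
inferred = pd.read_csv(io.StringIO(sample))
print(inferred.shape)           # (1, 5): only one data row is left

# header=None: every line is treated as data and the columns are numbered 0 to 4.
no_header = pd.read_csv(io.StringIO(sample), header=None)
print(no_header.shape)          # (2, 5)
print(list(no_header.columns))  # [0, 1, 2, 3, 4]
```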

Step 3: Display the CSV data

To make sure the file has been loaded correctly, it is useful to look at the first rows of the DataFrame. To do this, you can use the DataFrame.head() function, for example by calling print(df.head()). By default, it displays the first five rows of the DataFrame, which gives you a quick overview of the data structure.

The output is then as follows:

0 1 Max Mortier 35 Paris 50000
1 2 Anna Frelon 29 Lyon 62000
2 3 Pierre Corbet 41 Marseille 58000
3 4 Lisa Beaufort 33 Toulouse 49000
4 5 Tom Verron 28 Bordeaux 52000
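
Beyond head(), a few other checks can help confirm that the data was loaded as expected. The following is a minimal sketch, assuming records in the format of the sample file are held in an in-memory string (io.StringIO stands in for data.csv) and read with header=None because they carry no header row.

```python
import io

import pandas as pd

# Records in the format of the sample file; there is no header row.
sample = (
    "2,Anna Frelon,29,Lyon,62000\n"
    "3,Pierre Corbet,41,Marseille,58000\n"
    "4,Lisa Beaufort,33,Toulouse,49000\n"
    "5,Tom Verron,28,Bordeaux,52000\n"
)
df = pd.read_csv(io.StringIO(sample), header=None)

print(df.head(2))  # first two rows instead of the default five
print(df.tail(1))  # last row
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type inferred for each column
```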

Step 4: Change the column names (optional)

If your CSV file does not have a header line, you can manually define column names:

df = pd.read_csv('data.csv', header=None, names=['Colonne1', 'Colonne2', 'Colonne3', 'Colonne4', 'Colonne5'])

In this example, the columns have been manually named Colonne1, Colonne2, Colonne3, Colonne4 and Colonne5. The code returns the following result:

   Colonne1       Colonne2  Colonne3   Colonne4  Colonne5
0         1    Max Mortier        35      Paris     50000
1         2    Anna Frelon        29       Lyon     62000
2         3  Pierre Corbet        41  Marseille     58000
3         4  Lisa Beaufort        33   Toulouse     49000
4         5     Tom Verron        28   Bordeaux     52000
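
Generic names like these can be swapped for descriptive ones. The sketch below is one possible variant: the names id, name, age, city and salary are assumptions about what each column means (the sample file does not define them), and io.StringIO stands in for data.csv. It also shows DataFrame.rename as a way to change a name after loading.

```python
import io

import pandas as pd

# Records in the format of the sample file; there is no header row.
sample = (
    "2,Anna Frelon,29,Lyon,62000\n"
    "3,Pierre Corbet,41,Marseille,58000\n"
    "4,Lisa Beaufort,33,Toulouse,49000\n"
)

# Hypothetical descriptive names; the source file does not define any.
column_names = ["id", "name", "age", "city", "salary"]

# Assign the names while reading and use the "id" column as the index.
df = pd.read_csv(
    io.StringIO(sample),
    header=None,
    names=column_names,
    index_col="id",
)
print(df)

# Columns can also be renamed after loading with DataFrame.rename.
df = df.rename(columns={"salary": "annual_salary"})
print(list(df.columns))
```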

Note

The CSV file used as an example contains little data and is therefore rather small. If you instead have a very large CSV file, you should read it in chunks to avoid memory problems. To do this, use the chunksize parameter of pandas.read_csv(), which specifies how many rows are read per iteration. You can then use a Python for loop to iterate over the resulting chunks.
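
Chunked reading can be sketched as follows. This is a minimal illustration, assuming a tiny in-memory string in place of a genuinely large file and a chunk size of 3 chosen only so that the last chunk is visibly smaller; in practice the chunk size would be much larger.

```python
import io

import pandas as pd

# A small in-memory stand-in for a large file, in the format of the sample data.
sample = (
    "2,Anna Frelon,29,Lyon,62000\n"
    "3,Pierre Corbet,41,Marseille,58000\n"
    "4,Lisa Beaufort,33,Toulouse,49000\n"
    "5,Tom Verron,28,Bordeaux,52000\n"
)

chunk_sizes = []
total_rows = 0

# With chunksize set, read_csv yields DataFrames of at most 3 rows each
# instead of returning a single DataFrame.
for chunk in pd.read_csv(io.StringIO(sample), header=None, chunksize=3):
    chunk_sizes.append(len(chunk))
    total_rows += len(chunk)

print(chunk_sizes)  # [3, 1]
print(total_rows)   # 4
```

Each chunk is an ordinary DataFrame, so any per-chunk processing (filtering, aggregating, writing out) can happen inside the loop without holding the whole file in memory.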
