Python Generators are a special kind of function in the Python language: they produce their values one step at a time, which lets you work in a memory-efficient way.
Python Generators: what are they exactly?
Python Generators are special functions that return a Python Iterator. You define a Python Generator in the same way as a normal function. The difference is in the details: instead of a return statement, a Python Generator uses a yield statement to hand values back. Like any Iterator, the object returned by a Generator function implements the __next__() method, so you can advance it with the built-in next() function.
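The relationship between a Generator function and the Iterator it returns can be checked directly. The following is a minimal sketch; the function name `letters` is a hypothetical example and does not come from the article:

```python
from collections.abc import Iterator

def letters():
    # Hypothetical example generator: yields two fixed values
    yield "a"
    yield "b"

gen = letters()                      # calling the function returns a generator object
is_iterator = isinstance(gen, Iterator)
is_own_iterator = iter(gen) is gen   # a generator is its own iterator

first = next(gen)                    # built-in next() calls gen.__next__()
second = gen.__next__()              # the same operation, spelled out

print(type(gen).__name__, is_iterator, is_own_iterator, first, second)
```

Calling `next(gen)` and calling `gen.__next__()` do the same thing; the built-in form is the one normally used.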
Python Generators are among the more advanced concepts of programming with Python.
The “yield” keyword
If you already have some experience with Python or other programming languages, the return statement should be familiar to you. It passes the value computed by a function back to the caller. As soon as a function reaches its return statement, it terminates and its local state is discarded. The function can be called again if necessary, but it then starts over from the beginning.
With yield, the procedure is different: in Python Generators, this keyword takes the place of the return statement. Calling a Generator function does not run its body straight away; it returns a Generator object. Each time you request a value from that object, for example with next(), the body runs until it reaches a yield and hands the yielded value back to you. At that point the Python Generator is not terminated, only paused, and its state (local variables and current position) is saved. When you request the next value, execution resumes at the saved position.
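The pause-and-resume behaviour of yield can be made visible by recording which parts of the function body have run. This is only an illustrative sketch; the names `three_steps` and `log` are assumptions introduced for the example:

```python
def three_steps(log):
    # Hypothetical generator that records its progress in the list "log"
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("resumed after second yield")

log = []
gen = three_steps(log)        # creates the generator; the body has not run yet
log_before_first_next = list(log)

first = next(gen)             # runs up to the first yield, then pauses
second = next(gen)            # resumes after the first yield, pauses at the second

print(log_before_first_next)  # []
print(first, second)          # 1 2
print(log)                    # ['started', 'resumed after first yield']
```

The last log entry is never written in this run, because the Generator is still paused at its second yield.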
Areas of use for Python Generators
Because Python Generators are based on the principle of lazy evaluation, meaning that values are computed only when they are actually needed, Generator functions are particularly well suited to working with large volumes of data.
A normal function would start by loading the entire contents of your file into a variable, and therefore into memory. If you are working with large amounts of data, the available memory may not be sufficient and the program fails with a MemoryError. Python Generators make it easy to avoid problems like this, because they can read your file line by line. The yield keyword hands you the next value when you need it, then pauses execution of the function until you request another line from the file.
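As a rough sketch of this line-by-line approach, the following Generator reads a text file lazily. The function name `read_lines_lazily` and the throwaway sample file are assumptions made so that the example can run on its own:

```python
import os
import tempfile

def read_lines_lazily(path):
    # Yield one line at a time instead of loading the whole file into memory
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")

# Create a small temporary file purely for demonstration purposes
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    sample_path = tmp.name

# Only one line is held in memory at any moment while counting
line_count = sum(1 for _ in read_lines_lazily(sample_path))
print(line_count)  # 3

os.remove(sample_path)
```

The same pattern applies to files far larger than the available memory, since the number of lines held at once does not grow with the file size.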
Python Generators do more than simplify the processing of large volumes of data: they also let you work with infinite series. Since memory is always limited, an infinite sequence can never be stored in full as a list; a Python Generator sidesteps the problem by producing each element only when it is requested.
Python Generators: reading CSV files
As mentioned above, Python Generators are particularly suitable for working with large volumes of data. The following program reads a CSV file row by row in a memory-efficient way:
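import csv

def csv_lire(nomdufichier):
    # newline='' is the setting the csv module documentation recommends for open()
    with open(nomdufichier, 'r', newline='') as fichier:
        tmp = csv.reader(fichier)
        for ligne in tmp:
            # Hand one parsed row to the caller, then pause until the next request
            yield ligne

for ligne in csv_lire('test.csv'):
    print(ligne)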
In the sample code, we first import the “csv” module to access Python’s CSV processing functions; then comes the definition of a Python Generator named “csv_lire”. Like any function definition, it begins with the keyword “def”. Once the file is open, the for loop iterates through it row by row, and each row is handed back with the yield keyword.
Outside the Generator function, a second for loop requests the rows from the Python Generator one after another and displays them on the console with the print function.
Python Generators: creating infinite data structures
Logically, no infinite data structure can be stored on your computer. Infinite data structures are nevertheless essential to some applications. Generator functions help here too, because they produce the elements one after another without filling up memory. Below is an example of the infinite series of natural numbers in Python code:
def nombre_naturel():
    n = 0
    while True:
        yield n
        n += 1

for nombre in nombre_naturel():
    print(nombre)
A Python Generator named “nombre_naturel” is defined first; it sets the starting value of the variable “n”. A while loop then starts and runs forever. With “yield”, the current value of the variable is handed back and execution of the Generator function is paused. When the next value is requested, execution resumes after the yield, the number is increased by 1, and the Python Generator runs until the interpreter reaches the yield keyword again. The for loop below the Generator function prints the numbers the Python Generator produces. If you don’t interrupt the program manually, it runs forever.
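If only a finite number of values is needed from an infinite Generator, one option is `itertools.islice` from the standard library, which stops requesting values after a given count. The sketch below repeats the Generator definition so that it runs on its own; the variable name `five_first` is an assumption for illustration:

```python
from itertools import islice

def nombre_naturel():
    n = 0
    while True:
        yield n
        n += 1

# islice stops after five values, so the infinite loop is never exhausted
five_first = list(islice(nombre_naturel(), 5))
print(five_first)  # [0, 1, 2, 3, 4]
```

Because the Generator is lazy, only the five requested numbers are ever computed.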
Python Generators: shorthand notation
List comprehensions allow you to create lists in Python with a single line of code. A similar shorthand exists for Python Generators: the generator expression. Let’s take the example of a Python Generator that runs through the digits from 0 to 9 in steps of 1. This Python Generator is similar to the one we used to generate the infinite sequence of natural numbers.
def nombre_naturel():
    n = 0
    while n <= 9:
        yield n
        n += 1
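To write a Python Generator in a single line of code, all you need is a for clause inside parentheses. A direct equivalent of the function above would be (n for n in range(10)); the following example additionally adds 1 to each value, so it produces the numbers 1 to 10: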
increment_generator = (n + 1 for n in range(10))
If you now print this Python Generator object, you get output similar to the following:
<generator object <genexpr> at 0x0000020CC5A2D6C8>
This output only tells you where the Python Generator object you created is located in memory; the address will differ on your machine. To access the values produced by your Python Generator, you can use the “next()” function:
print(next(increment_generator))
print(next(increment_generator))
print(next(increment_generator))
This section of code prints 1, 2 and 3, that is, the digits from 0 to 2, each increased by 1.
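A finite Generator eventually runs out of values. The sketch below, which uses a shorter range than the article's example to keep it brief, shows what next() does in that case; the variable names are assumptions for illustration:

```python
short_generator = (n + 1 for n in range(2))

values = [next(short_generator), next(short_generator)]
print(values)  # [1, 2]

# Once the generator is exhausted, next() raises StopIteration ...
try:
    next(short_generator)
    exhausted = False
except StopIteration:
    exhausted = True

# ... unless a default value is passed as the second argument
fallback = next(short_generator, "no more values")
print(exhausted, fallback)
```

A for loop handles StopIteration automatically, which is why it simply ends when the Generator has no more values.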
Python Generators or List Comprehensions?
The shorthand for Python Generators looks very similar to a list comprehension. The only visible difference lies in the brackets: square brackets create a list comprehension, whereas parentheses create a Python Generator. Behind the scenes, however, the difference is much greater: a list comprehension builds the complete list immediately, whereas a Python Generator produces its values on demand and therefore uses much less memory.
import sys
increment_liste = [n + 1 for n in range(100)]
increment_generator = (n + 1 for n in range(100))
print(sys.getsizeof(increment_liste))
print(sys.getsizeof(increment_generator))
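The above program determines the memory requirements of a list comprehension and of its equivalent Python Generator.
Where the list requires 912 bytes of memory, the Python Generator makes do with 120 bytes. The exact figures depend on the Python version and platform. Note that sys.getsizeof() reports only the size of the list object itself, not of the numbers it refers to, so the real gap is even larger. The size of the list grows with the number of elements, while the size of the Python Generator object stays the same, so the more data you process, the more this difference matters.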
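How the gap widens with the amount of data can be checked with a small experiment. This is a sketch only: the element counts are arbitrary choices, the name `sizes` is an assumption, and the byte values printed will vary between Python versions and platforms:

```python
import sys

sizes = {}
for count in (100, 10_000, 100_000):
    as_list = [n + 1 for n in range(count)]
    as_generator = (n + 1 for n in range(count))
    # Store (size of list object, size of generator object) in bytes
    sizes[count] = (sys.getsizeof(as_list), sys.getsizeof(as_generator))
    print(count, sizes[count])
```

The first number in each pair should grow with the element count, while the second should stay the same.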