(p.379) A Basic Concepts and Examples of Python Programming
A.1 Basic Python
The open source Python language interpreter was used for the programs written in this book. We chose Python because it is an object-oriented, easy-to-use language that is equipped with many powerful features and is very well suited for both teaching and applications. Some specific reasons for choosing Python are as follows:
1. The language and the applications are freeware and open source.
2. The language is multiplatform. The interpreter, the shells and the visual applications run on Microsoft Windows, Macintosh and Linux.
3. It has a clean syntax.
4. There is a large and growing community of developers, who are dedicated to the growth of this language.
5. The language is object-oriented and has high-level data structures, combined with dynamic typing and dynamic binding.
6. It is equipped with a numerical package (numpy) that has a powerful n-dimensional array object, sophisticated functions, tools for integrating C++ and Fortran code, linear algebra, Fourier transform, complex number algebra and other capabilities.
7. The syntax is easy to learn and therefore it is well suited for teaching purposes.
8. The language can be used with a visual application (Visual Python) allowing easy visualization of objects, vectors, graphs and simulations.
Throughout this book, Python programs are presented to solve a variety of problems that are often encountered in soil physics. We name the specific file according to its specific purpose. The programs are organized in projects. The projects and file names begin with the acronym PSP, which stands for Python Soil Physics.
The modules matplotlib, scipy and numpy should also be installed, since they are used in several programs. Some Python distributions install scipy and numpy automatically together with the interpreter; otherwise, they must be installed separately. matplotlib is mainly used for plotting graphs, while scipy and numpy provide a variety of numerical and mathematical routines. Variables are described by long names, facilitating code readability. For symbolic mathematics, the package sympy is available, which supports algebra, differential calculus and matrices in symbolic form.
A.1.1 Programs and Modules Needed for this Book
To run the programs written in this book, the following programs and modules must be downloaded:
1. Python version 2.7
2. Tkinter for Python 2.7
3. Matplotlib for Python 2.7
4. Python imaging library (PIL) for Python 2.7
5. Scipy for Python 2.7
6. Numpy for Python 2.7
7. Visual Python 6.1 for Python 2.7
The programs are written using Python 2.7, but it is possible to incorporate the features of version 3 by writing the following line at the beginning of the program:

from __future__ import print_function, division

For Python 2.x, this statement invokes the Python 3.x print function and division scheme, while Python 3.x accepts it without changing anything, since these features are already its default. The programs written in this book can therefore be run with both versions.
A.1.2 Python Documentation
The Python language and its visual application, Visual Python, are constantly updated with new features. Therefore the descriptions provided below are intended as a general overview of the language and its main features, some of which may change in future versions. The primary reference for Python is the official Python website <http://www.python.org>, whose documentation page links to a variety of tutorials, references, forums and community development resources.
There are a variety of good books on the subject. Learning Python (Lutz, 2009) is a good book for both beginners and expert programmers. Python Essential Reference (Beazley, 2006) is highly recommended. Python in a Nutshell (Martelli, 2006) is also an excellent and comprehensive book. A recommended book on using Python in computational science is Python Scripting for Computational Science (Langtangen, 2009).
A.1.3 Running Programs in Python
As with other languages, programs are written using a text editor, and the file is saved with the extension .py. The program can then be run from an integrated development environment (IDE) or from a command prompt. A useful IDE for Python is Eclipse (<http://www.eclipse.org/>) with its integrated PyDev environment, a free environment for developing Python programs and projects. It is a well-developed IDE for writing, debugging and creating applications in Python. The programs in this book were written and debugged using PyDev in Eclipse.
When a program is executed, Python compiles the source code into byte code, which is then run by the Python virtual machine (PVM). Throughout this book, in accord with the usage in many Python books, the prefix for interactive prompt commands is >>>, while the output has no prefix. Moreover, functions (both internal and user-defined) are indicated in bold, while programs, variables and constructors are written in teletype. Constants are written in capital letters.
For project names, file names and variables, we write the first word with an initial lower case letter and the subsequent words with initial capital letters. For instance, the file PSP_basicProperties.py is a file where a program is written to compute basic soil properties. Analogously, the variable waterDensity has the first word with a lower case initial letter and the second with a capitalized initial. Each file begins with the acronym PSP, which stands for Python Soil Physics, except the file containing the main function, which is called main. To make programs readable, we use variables with self-explanatory names and we also present a list of variables used in the Python programs.
A.1.4 Plotting and Visualization
In this book, we use the library matplotlib (<http://matplotlib.org/>) to plot the results of the computations. matplotlib is a powerful library for plotting scientific data; it also allows the plotting of time-changing variables, which makes it well suited to visualizing dynamic processes, and it can export plots in many different formats. Recent versions of matplotlib may require the installation of its dependencies, which are numpy, dateutil, six and pyparsing; for Windows, they can be downloaded from <http://www.lfd.uci.edu/~gohlke/pythonlibs/>.
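The export capability can be illustrated with a minimal sketch, assuming a recent matplotlib used with the non-interactive Agg backend. The data are synthetic (a sine-shaped daily temperature cycle) and the file names are hypothetical.

```python
import math

import matplotlib
matplotlib.use("Agg")                 # non-interactive backend: draw to files only
import matplotlib.pyplot as plt

# Synthetic data: a smooth daily cycle of air temperature (illustrative only)
hours = list(range(0, 24))
airTemperature = [15.0 + 5.0 * math.sin(2.0 * math.pi * (h - 9) / 24.0) for h in hours]

fig, ax = plt.subplots()
ax.plot(hours, airTemperature, "o-")
ax.set_xlabel("Hour")
ax.set_ylabel("Air temperature [°C]")

# savefig() selects the output format from the file extension
for fileName in ("airTemperature.png", "airTemperature.pdf", "airTemperature.svg"):
    fig.savefig(fileName)
plt.close(fig)
```

The same figure is written three times; only the extension changes, so raster (PNG) and vector (PDF, SVG) versions come from one plotting script.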
For visualization of numerical solutions in three dimensions or for mesh generation algorithms, the program Visual Python, version 6 (<http://www.vpython.org/>), is used. Further details are provided in Chapter 10. A stable version of the Python Imaging Library (PIL), version 1.1.7, for Python 2.7 on Macintosh can be downloaded from <http://www.astro.washington.edu/users/rowen/python/>, where the user should select and install the PIL installer for this configuration.
A.1.5 Extensions for Other Languages
One of the advantages of Python is that it is flexible and easy to program. However, for certain types of calculation, Python (like any other interpreted language) can be slow; in particular, explicit loops over large arrays are difficult to execute efficiently. Such calculations may be implemented in a compiled language such as C or Fortran. In this book, for the three-dimensional solution of water flow, we present two versions of the program. One version is written entirely in Python, while in the second, some parts of the program (arrays and computationally expensive functions) are written in C and used through Cython. The Cython version is much faster than the pure Python version. Cython is an optimizing compiler for both the Python programming language and the extended Cython programming language, and it makes writing C extensions for Python as easy as writing Python itself. The Cython language supports calls to C functions and declaration of C types on variables and class attributes, which allows the compiler to generate very efficient C code from Cython code. Cython can be downloaded at <http://cython.org/>.
A.2 Basic Concepts of Computer Programming
In this section, we illustrate some basic concepts of computer programming with a simple example of a program written to solve a problem. A computer program consists of a series of instructions written to perform a specific task or solve a problem. For instance, a person can easily see whether the black point labelled a in Fig. A.1 is inside circle A or not, but we need to devise a way to tell the computer whether the point is inside or outside the circle. Visual applications often deal with similar problems, such as determining whether the user clicked inside or outside a specific icon or symbol on the screen to activate a function such as opening a file. A first way to check whether the point a is inside circle A is the following: if the point is inside circle A, a straight line starting from the point and passing through the centre of the circle crosses the perimeter of circle A only once. If the point is outside the circle (in this case, let us consider circle B), a straight line starting from the point and passing through the centre of circle B crosses the perimeter of circle B twice.
Let us draw a line that, starting from the point, passes through the centre of the circle. If the line crosses the circle only once, the point is inside the circle; if it crosses the circle twice, the point is outside. From a geometrical standpoint, if the point has coordinates $x_1$ and $y_1$ and the centre of the circle has coordinates $X_1$ and $Y_1$, the line that starts at $(x_1, y_1)$ and passes through $(X_1, Y_1)$ has the equation

$$y - y_1 = m\,(x - x_1)$$

where $m = (Y_1 - y_1)/(X_1 - x_1)$ is its slope, while the circle of radius $r$ and centre $(X_1, Y_1)$ has the equation

$$(x - X_1)^2 + (y - Y_1)^2 = r^2$$

To determine the intersection points between the line and the circle, the system formed by these two equations must be solved. Since a line through the centre always meets the circle at two points, we then count how many of these solutions lie on the half-line that starts at $(x_1, y_1)$ and passes through the centre: one solution means that the point is inside the circle, two mean that it is outside.
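The crossing-count idea can be sketched in Python as follows. Instead of the slope form, this sketch writes the half-line in parametric form, $(x_1 + t\,dx,\ y_1 + t\,dy)$ with $t \ge 0$, which also handles vertical lines; substituting it into the circle equation gives a quadratic equation in $t$. The function names are hypothetical, and (in exact arithmetic) a point lying on the perimeter is counted as inside.

```python
from math import sqrt


def countRayCrossings(x1, y1, X1, Y1, r):
    """Count crossings of the circle (centre X1, Y1; radius r) by the
    half-line that starts at (x1, y1) and passes through the centre."""
    dx, dy = X1 - x1, Y1 - y1              # direction of the half-line
    a = dx * dx + dy * dy
    if a == 0:
        return 1                           # the point is the centre itself
    # Points of the half-line: (x1 + t*dx, y1 + t*dy) with t >= 0.
    # Substituting into (x - X1)^2 + (y - Y1)^2 = r^2 gives a*t^2 + b*t + c = 0
    fx, fy = x1 - X1, y1 - Y1
    b = 2.0 * (fx * dx + fy * dy)
    c = fx * fx + fy * fy - r * r
    # the discriminant is never negative, because the line passes through the centre
    root = sqrt(max(0.0, b * b - 4.0 * a * c))
    solutions = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    # keep only the intersections that lie ahead of the starting point
    return sum(1 for t in solutions if t > 0)


def isInsideByRay(x1, y1, X1, Y1, r):
    """Inside if the half-line crosses the circle once, outside if twice."""
    return countRayCrossings(x1, y1, X1, Y1, r) == 1
```

For instance, isInsideByRay(0.5, 0.0, 0.0, 0.0, 1.0) returns True and isInsideByRay(3.0, 0.0, 0.0, 0.0, 1.0) returns False. The amount of algebra needed here is one reason the distance test is usually preferred.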
An alternative and simpler approach is to measure the distance between the point and the centre of the circle. If this distance is less than or equal to the radius of the circle, the point is considered inside the circle; if the distance is larger than the radius, the point is outside. To perform this simple task, we write a program in which the user is asked to click with the mouse on the screen. The computer reads the position on the screen where the user clicked, determines with the latter method whether the position is inside or outside the circle and then displays an output message.
A.2.1 Flow Diagrams
A flow diagram shows the sequence of operations in a computer program. Figure A.2 depicts a flow diagram for the problem just discussed. To write this program, we assume that (a) we have a function that draws a circle on the screen given the (X,Y) coordinates of the circle centre and its radius, (b) we have a function that checks, using the distance formula, whether the point where the user clicks is inside or outside the circle and (c) we have a function able to write the results on the screen. We also need to import a graphical user interface (GUI) toolkit to draw a canvas for the circle; one GUI toolkit available in Python is tkinter. The module math is also needed to perform mathematical operations. Let us call the first function drawCircle with arguments (myCanvas, x, y, r), where myCanvas defines the area on the screen where the circle is drawn, x and y are the coordinates of the centre and r is the radius. We are not going to analyse the method myCanvas.create_oval() at this stage, since it is provided by tkinter, but we know that if we pass the correct numbers, it will draw an oval of the desired size on the screen.
In tkinter, a circle is a special case of an oval. Since an oval is defined by the rectangle in which it is inscribed, a circle is simply an oval inscribed in a square. The program is as follows:
The first line allows the program to be used in both Python versions 2.x and 3.x, as described above. Then, the square root function (sqrt) is imported from the module math. The following lines import the Tkinter module: the if statement selects the correct import for Python 2.x or 3.x, since the module was renamed from Tkinter to tkinter in version 3.x. Then the variables identifying the circle coordinates and the radius are defined.
The Canvas widget, implemented in tkinter, is a rectangular area intended for drawing pictures or other complex layouts. It is defined in the main() function, where the Canvas object called myCanvas is created. Other tkinter elements, such as a Label whose text is assigned with set, are then created. The function drawCircle(myCanvas,x,y,r) draws the circle on the canvas by calling myCanvas.create_oval().
The function ButtonLeftHandler(myEvent) handles the event generated by a click of the left mouse button on the canvas and uses the coordinates of the clicked point to compute its distance from the centre of the circle. The differences between the coordinates of the clicked point and those of the centre (dX and dY) are computed first and are then used in Pythagoras’ theorem (line 16) to compute the distance. Finally, the if statement between lines 17 and 20 checks whether the distance is less than or equal to the radius and prints the answer on the canvas. The output is shown in Fig. A.3.
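A minimal Python 3 sketch consistent with this description follows; it is not the original listing. The names drawCircle and ButtonLeftHandler follow the text, while checkPoint, the canvas size and the circle parameters are assumptions. Unlike the original, which writes the answer on the canvas, this sketch shows it in a Label, and tkinter is imported inside main() so that checkPoint can be used without a display.

```python
from math import sqrt

# Circle parameters in canvas pixel coordinates (illustrative values)
myCenterX, myCenterY, myRadius = 200, 150, 100


def checkPoint(x, y, centerX, centerY, radius):
    """Return 'inside' or 'outside' using the distance from the centre."""
    dX = x - centerX
    dY = y - centerY
    distance = sqrt(dX * dX + dY * dY)     # Pythagoras' theorem
    if distance <= radius:
        return "inside"
    return "outside"


def drawCircle(myCanvas, x, y, r):
    """Draw a circle as an oval inscribed in the square of side 2r."""
    myCanvas.create_oval(x - r, y - r, x + r, y + r, outline="black")


def main():
    # Python 3 module name; Python 2.x uses "import Tkinter as tk"
    import tkinter as tk

    root = tk.Tk()
    root.title("Inside or outside?")
    myCanvas = tk.Canvas(root, width=400, height=300, bg="white")
    myCanvas.pack()
    message = tk.StringVar(value="Click on the canvas")
    tk.Label(root, textvariable=message).pack()
    drawCircle(myCanvas, myCenterX, myCenterY, myRadius)

    def ButtonLeftHandler(myEvent):
        # myEvent.x and myEvent.y are the click coordinates on the canvas
        result = checkPoint(myEvent.x, myEvent.y, myCenterX, myCenterY, myRadius)
        message.set("The point is " + result + " the circle")

    myCanvas.bind("<Button-1>", ButtonLeftHandler)   # left mouse button
    root.mainloop()
```

Calling main() opens the window; each left click updates the message below the canvas.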
When we attempt to perform a sequence of calculations using a computer, we need to write an algorithm. An algorithm is a method that defines, in an unambiguous manner, a finite sequence of steps to be performed in a specified order. The object of the algorithm is to implement a procedure to solve a problem or approximate a solution to a problem. When designing a program, it is useful to keep in mind the following steps:
1. Clearly state the purpose of the program and the necessary calculations.
2. Write a flow diagram of the program.
3. Write a pseudo-code.
4. Write the code.
5. Compile and debug the program.
6. Run the program.
7. Check the results.
When designing a program, we need to think about its components (objects), the relationships among them, the input, the output and so on. Depending on the complexity and size of the program, we may need to create different levels of organization or hierarchies. Designing well-organized flow diagrams is an important part of this process. A pseudo-code is used to describe the algorithm: it specifies the form of the input–output and the calculations. We can consider a pseudo-code as a description of the program written in a natural language (English or Italian, say). For example, we can simply write: define the variables myCenterX, myCenterY and myRadius. Draw a circle of the given size by calling a function. Wait until the user clicks on a chosen point on the screen with the mouse. Calculate the distance of the selected point from the circle centre. If the distance is smaller than or equal to the radius, print inside, else print outside. End.
After writing the pseudo-code, it is necessary to translate it into code written in the selected language (Python, Visual BASIC, Fortran, C, C++, Java, etc.). A program written with an editor is not understood by the computer until it is compiled, that is, until the instructions written in a high-level language are translated into a machine language program. If the compiler finds errors in the program, it generates a series of error messages indicating the error type (usually a numerical code and its description) and its position (line number) in the program. After compilation, the computer generates an executable file, and the program runs when this file is executed. In Python, no separate compilation step is needed: the source code is compiled automatically into byte code when the program is run. It is always important to keep in mind that the absence of compilation errors does not imply that the program performs the computation correctly. Therefore a series of tests needs to be performed to verify the robustness, stability and convergence properties of the algorithms.
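The automatic compilation step can be made visible with the built-in compile() function and the standard dis module, as in the following sketch. The compiled statement and the density values are illustrative; the byte-code instruction names printed by dis.dis() vary between Python versions.

```python
import dis

sourceCode = "porosity = 1 - bulkDensity / particleDensity"

# compile() turns source text into a code object containing byte code
codeObject = compile(sourceCode, "<example>", "exec")

# dis.dis() prints the byte-code instructions run by the Python virtual machine
dis.dis(codeObject)

# exec() runs the code object; variables are read from and stored in the given dict
namespace = {"bulkDensity": 1325.0, "particleDensity": 2650.0}
exec(codeObject, namespace)
print(namespace["porosity"])        # 0.5
```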
A.3 Data Representation: Variables
A program works with data, the material used to perform computations and operations. Organization of data, labelling and storage in a computer are fundamental elements of computer programming. In computer programming, a data type is created and variables are assigned to that type. The variable identifies an area in the computer memory where the data are stored. In most of the available programming languages (C, C++, Visual BASIC, Java, Fortran), a variable has a specific type. For instance, integer means that the variables listed are integers, in contrast to single or double, which are floating-point variables, i.e. numbers that have a fractional part. Different languages use different words to declare variables. For instance in Visual BASIC the declaration of a variable is done via the Dim statement, which is short for Dimension.
In Python, there is no need to declare a variable, since the variable takes the type of the data assigned to it. For instance, the assignment

waterDensity = 1000

types the variable waterDensity as an integer. We can change the type by reassigning the variable:

waterDensity = 'one thousand'

In this case, the variable waterDensity is now a string. In Python, strings can be represented with either single or double quotes; therefore 'one thousand' and "one thousand" are both string assignments. If we want the variable waterDensity to be a number with decimals (a floating-point number), it is sufficient to include a point after the number, implying that the number can have a fractional part:

waterDensity = 1000.

This feature, called dynamic typing, is one of the ways in which Python differs from languages such as C or BASIC and is one of the reasons for Python’s flexibility: types are determined at runtime rather than by a prior declaration in the program. While this feature can make programs shorter, easier to read and more flexible, the programmer must still be aware of what the types are and where they were first assigned. Types also matter within expressions. For instance, 10 / 3 is an operation between two integers, and in Python 2.x the result is 3. However, it is possible to turn it into a floating-point operation by writing 10. / 3, and the result is then approximately 3.3333. In Python 2.x, since the decimal point is missing in an operation between two integers such as 10 / 3, only the integer part of the result is kept. From Python 3.x, such an operation converts the numbers to floating-point and returns approximately 3.3333. To import this feature into Python 2.x, it is necessary to include the line

from __future__ import division

at the beginning of the program. The main built-in types of variables used in Python are listed below.
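The reassignments described above can be checked with the built-in type() function, as in this short sketch (Python 3.x, where a type prints as <class '...'>).

```python
# The type follows the value currently bound to the name waterDensity
waterDensity = 1000
print(type(waterDensity))        # <class 'int'>

waterDensity = "one thousand"
print(type(waterDensity))        # <class 'str'>

waterDensity = 1000.
print(type(waterDensity))        # <class 'float'>

# isinstance() is the usual way to test a type inside a program
if isinstance(waterDensity, float):
    print("waterDensity is a floating-point number:", waterDensity)
```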
A.3.1 Numeric Types
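In Python 2.x, there are four numeric types:
• integers
• long integers
• double-precision floating-point numbers
• double-precision complex numbers
In Python 3.x, integers and long integers are unified into a single int type of unlimited precision.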
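Integers are whole numbers, negative or positive, of type int, for instance 0, 2, 9. In Python 2.x, their upper and lower limits depend on the number of bits used on a given computer to represent a number (the largest value is sys.maxint), and larger values are automatically converted to long integers, whose size is limited only by the available memory; in Python 3.x, int has unlimited precision.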
Double-precision numbers are real numbers of type float, such as 0.1, 14.5 and 3.8E+8, characterized by decimal places. The range and the number of significant digits of a floating-point number depend on the machine on which the program is running; in Python, this information is stored in sys.float_info, an attribute of the sys module that describes the precision and internal representation of floating-point numbers. A floating-point variable is created by simply assigning a number with a decimal point in it.
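The machine-dependent limits can be inspected as in the following sketch. On hardware using IEEE 754 double precision (virtually all current computers), mant_dig is 53 and dig is 15; the last two lines show a practical consequence of finite binary precision.

```python
import sys

info = sys.float_info
print("largest float:  ", info.max)
print("smallest normal:", info.min)
print("machine epsilon:", info.epsilon)
print("decimal digits: ", info.dig)
print("mantissa bits:  ", info.mant_dig)

# 0.1, 0.2 and 0.3 have no exact binary representation
print(0.1 + 0.2 == 0.3)                           # False
print(abs((0.1 + 0.2) - 0.3) < info.epsilon)      # True
```

Comparing floating-point results against a tolerance, rather than with ==, avoids this kind of surprise.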
Double-precision complex numbers are characterized by a real and an imaginary part. In Python, a complex number is written as real + imag j, where j is the imaginary unit $\sqrt{-1}$; imaginary numbers are thus written with the suffix j. The real and imaginary parts of a complex number r are obtained by typing r.real and r.imag. Python supports the common operations for complex numbers (addition, subtraction, multiplication, division and modulus). The following is an example of complex addition and multiplication:
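A short sketch of these operations follows, with arbitrary illustrative values for A and B. The modulus is obtained with the built-in abs(), and cmath, the complex counterpart of math, is used only to show that the square root of -1 is j.

```python
import cmath

A = 2 + 3j          # real part 2, imaginary part 3 (illustrative values)
B = 1 - 1j

print(A.real, A.imag)   # 2.0 3.0
print(A + B)            # (3+2j)
print(A * B)            # (5+1j)
print(A / B)            # (-0.5+2.5j)
print(abs(3 + 4j))      # modulus: 5.0
print(cmath.sqrt(-1))   # 1j, the imaginary unit
```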
As described above, in Python a variable automatically takes the type of the data assigned to it. However, the constructors int(), float() and complex() can be used to convert a value, or the result of an arithmetic operation, to a specific type.
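A few conversions with these constructors are sketched below; the values are arbitrary. Note that int() discards the fractional part (it truncates toward zero) rather than rounding.

```python
nrSamples = int(7.9)             # 7: the fractional part is discarded
negativeValue = int(-7.9)        # -7: truncation is toward zero
ratio = float(7) / 2             # 3.5, also under Python 2.x integer division
fromText = float("1.32")         # strings read from files can be converted
z = complex(2, 3)                # (2+3j)
print(nrSamples, negativeValue, ratio, fromText, z)
```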
A.4 Comment Rules and Indentation
A hash mark (#) is placed in front of a comment; everything that follows it on the same line is ignored by the interpreter.
To comment out many lines without repeating the sign, the selected part of the code can be enclosed between three double quotes at the beginning and three at the end. Strictly speaking, this creates a multi-line string, which has no effect when it is not used; when such a string is the first statement of a module or function, it becomes its docstring.
Indentation rules are an important aspect of Python programming. Python identifies block boundaries by line indentation, that is, the empty spaces to the left of the code. Statements indented to the same level are part of the same block of code, and the end of the block is determined by a less-indented line. This feature makes Python code easily readable, because it uses far fewer brackets and delimiters than languages such as C or C++. Python functions use no explicit begin or end, nor do they use curly braces to mark where the function code starts and stops. The only delimiters are a colon (:) and the indentation of the code itself.
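The following sketch combines the three elements just described: a comment, a triple-quoted docstring and indented blocks. The function waterState is hypothetical and deliberately simplified (pure water at atmospheric pressure, ignoring the coexistence of phases at 0 and 100 °C).

```python
# A hash mark starts a comment: the rest of the line is ignored


def waterState(temperature):
    """Return the state of pure water at atmospheric pressure.

    temperature is given in degrees Celsius. This triple-quoted string is
    the first statement of the function, so it becomes its docstring.
    """
    if temperature <= 0.0:           # the colon opens an indented block
        state = "solid"              # same indentation: same block
    elif temperature < 100.0:
        state = "liquid"
    else:
        state = "gas"
    return state                     # back to the level of the function body


print(waterState(20.0))              # liquid
print(waterState.__doc__.splitlines()[0])
```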
A.5 Arithmetic Expressions
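Arithmetic expressions in Python follow rules similar to those of other common programming languages. The principal operations are addition (+), subtraction (-), multiplication (*), division (/), floor division (//), remainder (%) and exponentiation (**).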
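Functions such as the square root (sqrt), the exponential (exp), the logarithm (log) and the trigonometric functions require that the module math be imported, whereas abs() and complex() are built-in functions that are always available. When the functions of the math module are used in the Python shell or within a program, the module must be imported, for instance with

from math import *

The asterisk indicates that all the functions in the math module are imported. If only one function is needed, it is possible to import only that one, for instance

from math import sqrt

Importing only the functions that are needed is advisable, because it keeps the namespace clean and avoids conflicts between names (the whole module is loaded in either case). As in other languages, the simple arithmetic operations are evaluated in the following order: parentheses first, then exponentiation (**), then multiplication and division (*, /, //, %), which have the same rank and are evaluated from left to right, and finally addition and subtraction (+, -), also evaluated from left to right.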
In other words, parentheses are evaluated first, then exponentiation, and so forth. The following example shows the importance of a correct understanding of these rules. The void ratio $e$ of a soil is computed from its porosity $\phi$ as

$$e = \frac{\phi}{1 - \phi}$$

Computers need to know which operation we want to compute first. If we write poros / 1 - poros, the division poros / 1 is performed first and then poros is subtracted from the result. This is wrong, because the result of this calculation is always zero (poros - poros = 0). It is necessary to add parentheses to tell the computer to first compute (1 - poros) and then divide poros by the result: poros / (1 - poros).
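The effect of the missing parentheses can be checked with a minimal sketch; voidRatio is a hypothetical helper name. The last lines illustrate a few other precedence rules of Python, for instance that ** binds more tightly than a leading minus sign.

```python
def voidRatio(poros):
    """Void ratio e = porosity / (1 - porosity)."""
    return poros / (1 - poros)


poros = 0.5
wrong = poros / 1 - poros        # evaluated as (poros / 1) - poros: always 0
right = voidRatio(poros)
print(wrong, right)              # 0.0 1.0

# Other precedence rules worth remembering
print(2 + 3 * 4)                 # 14: * before +
print(2 ** 3 ** 2)               # 512: ** groups from right to left, 2 ** 9
print(-2 ** 2)                   # -4: ** is applied before the unary minus
print(8 / 4 * 2)                 # 4.0: / and * have the same rank, left to right
```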
It is important to be precise when coding a mathematical equation into a programming language. Here are three rules of thumb:
(a) Understand the rules of the programming language you are using.
(b) Do not be afraid of using parentheses.
(c) Break the equation into separate components, but only if the equation is really long.
Another important arithmetic issue is mixed-mode arithmetic. When a real and an integer number appear in the same expression, the integer is converted into a real before the operation is performed; for instance, 1.5 + 2 gives 3.5.
If an operation is performed between two integers, the result is an integer. In Python 2.x this also holds for division, where any fractional part is discarded (strictly, the result is rounded down, so 7 / 2 gives 3 and -7 / 2 gives -4).
As described already, Python 2.x discards the fractional part of the division between integers, while in Python 3.x the operator / returns the full floating-point result and the operator // performs the integer (floor) division.
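These rules can be verified in the Python 3.x shell with a few expressions, as in the following sketch; the numbers are arbitrary.

```python
# Mixed-mode arithmetic: the integer is converted to a float first
print(3 * 2.5)        # 7.5
print(1 + 2.0)        # 3.0

# Operations between integers give integers (except / in Python 3.x)
print(7 + 2, 7 * 2)   # 9 14
print(7 / 2)          # 3.5: true division always returns a float in Python 3.x
print(7 // 2)         # 3: floor division, what 7 / 2 gives in Python 2.x
print(-7 // 2)        # -4: rounded down, not truncated toward zero
print(7 % 2)          # 1: remainder of the floor division
```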
A.6 Functions
A function provides a convenient way to package a computation that can be called several times within a program. Indeed, functions are the basic structures of Python programming. When a function is properly written, it is not necessary to know how the computation is performed; it is sufficient to know what is computed. Python provides many ready-made functions, either built in (such as abs()) or contained in standard modules (such as the mathematical functions of math described above). Expert programmers may sometimes need to read and understand how these functions work; in many cases, however, it is sufficient to know what computation they perform.
Functions can also be written by the programmer. In Python, a function is written with a statement called def, which generates a new function object and assigns it a name. Within the function, there is a return statement to provide the results of the function call.
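As a minimal sketch of the def and return statements, the following hypothetical function computes soil porosity from bulk density with the relation porosity = 1 - bulk density / particle density. The default particle density of 2650 kg m-3, a typical value for mineral soils, is an assumption of this example.

```python
def computePorosity(bulkDensity, particleDensity=2650.0):
    """Return soil porosity from bulk and particle density [kg m-3].

    particleDensity defaults to 2650 kg m-3, a typical value for mineral soils.
    """
    porosity = 1.0 - bulkDensity / particleDensity
    return porosity


print(computePorosity(1325.0))            # 0.5
print(computePorosity(1200.0, 2600.0))    # explicit particle density
```

The default argument lets the caller omit the particle density in the common case while still allowing a measured value to be passed.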
A.6.1 Open, Read and Analyse Experimental Data
Below is an example of a program that reads data from an ASCII file. An ASCII file (also called text file) is a standard file format that can be freely interchanged and is readable from different operating systems. The following example shows how a function can be used to open and read a data file.
Compute hourly average of air temperature
The first program in which functions are employed is a simple one: experimental data stored in a text file must be read and the hourly average temperature computed. The text file contains air temperature data collected every ten minutes by a datalogger. The program shows how to open and read the file and compute an hourly average of air temperature. Two functions are written to solve this problem. The first (read3VarFile) reads the experimental data. The second (computeMeanT) computes the hourly average temperature. Below is the text of the file PSP_read3VarFile.py, which contains the program written to read the experimental data. The program is named after the fact that it reads a file with three variables.
The first line is a comment containing the file name. In the second line, the standard-library module csv, used to read the files, is imported; it reads comma-separated values (csv) files and, more generally, files with other delimiters. The arguments of the function read3VarFile() are defined within the parentheses and are the file name, the number of rows occupied by the file header, the delimiter, the three variables and the option of printing on screen the data read from the file. The variable myReader holds the reader object created by csv.reader. The for loop reads the file row by row while incrementing the row number, and the if statement determines whether the current row is a header row. The program then reads the actual data, converts them to floating-point numbers and appends them to the arrays of the three variables x1, x2 and x3. The last line prints the results on screen if the option was set to True in the program that calls this function.
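A minimal Python 3 sketch of a function with this behaviour follows; it is not the original listing. The argument names, the use of the first three columns of each row and the skipping of blank lines are assumptions. Because lists are mutable, the lists x1, x2 and x3 supplied by the caller are filled in place.

```python
import csv


def read3VarFile(fileName, nrHeaderRows, delimiter, x1, x2, x3, printOnScreen):
    """Append the first three columns of a delimited text file to x1, x2, x3.

    x1, x2 and x3 are lists supplied by the caller and are filled in place.
    """
    with open(fileName, newline="") as myFile:
        myReader = csv.reader(myFile, delimiter=delimiter)
        for rowNumber, row in enumerate(myReader):
            if rowNumber < nrHeaderRows or not row:
                continue                   # skip header rows and blank lines
            x1.append(float(row[0]))
            x2.append(float(row[1]))
            x3.append(float(row[2]))
    if printOnScreen:
        for values in zip(x1, x2, x3):
            print(values)
```

For instance, read3VarFile("tenMinutesTemp.txt", 1, ",", doy, hourMinute, airT, False) would fill three empty lists, assuming a comma-delimited file with one header row.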
The program PSP_averageTair.py is as follows:
In this program, the file PSP_read3VarFile.py is imported to read the data file (as described above) and the library matplotlib is imported to plot the results. The function computeMean calculates the hourly mean temperature. The if statement is used to detect the end of an hour, based on the data format. Specifically, in the output of the Campbell Scientific datalogger, hours and minutes are stored together as a single number: 10, 20, 30, 40 and 50 correspond to ten, twenty, thirty, forty and fifty minutes after midnight, and the first hour (1 a.m.) is written as 100. To discriminate between these incremental time steps, the number is divided by 100 and only its integer part is taken, using the function int(x). If the integer part is equal to the variable myHour, the variable mySum is incremented by adding the value of airT. When the end of the hour is reached (for instance when HM=100), the condition is no longer met, since the integer part is not equal to myHour, and the program goes to the else branch of the if statement, where the mean temperature is computed. Below is an example of the first ten rows of the file tenMinutesTemp.txt. The first column holds the day of the year (Doy), the second the hour and minutes (HM) as described above, the third the air temperature and the fourth the relative humidity (%).
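The hour-detection logic can be sketched as follows; this is a hypothetical Python 3 version, not the original function. It takes the lists of hour-minute values (HM) and air temperatures, uses int(HM / 100) to identify the hour, assigns the record that triggers the change of hour to the new hour, and also stores the mean of the last hour, which is not followed by a change.

```python
def computeMean(hourMinute, airT):
    """Hourly mean of airT, grouping records with the same int(HM / 100).

    hourMinute holds datalogger times such as 10, 20, ..., 50, 100, 110, ...
    Returns two lists: the hours and the corresponding mean temperatures.
    """
    hours, meanT = [], []
    if not hourMinute:
        return hours, meanT
    myHour = int(hourMinute[0] / 100)
    mySum, nrValues = 0.0, 0
    for HM, T in zip(hourMinute, airT):
        if int(HM / 100) == myHour:
            # still in the same hour: accumulate
            mySum += T
            nrValues += 1
        else:
            # the hour has changed: store the mean of the completed hour
            hours.append(myHour)
            meanT.append(mySum / nrValues)
            # start accumulating the new hour with the current record
            myHour = int(HM / 100)
            mySum, nrValues = float(T), 1
    # the last hour is not followed by a change, so store it here
    hours.append(myHour)
    meanT.append(mySum / nrValues)
    return hours, meanT
```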
The function main is used to define the variables, call the reader and plot the data using the plt.plot command. The results of the computation are plotted in Fig. A.4.
Compute average temperature and cumulative precipitation
A similar example is now presented, in which hourly meteorological data on air temperature and precipitation are stored. The following is an example of the first ten rows of the file weather.txt. The first column holds the date, the second the day of the year (Doy), the third the hour and minutes, the fourth the precipitation [mm], the fifth the air temperature [°C] and the sixth the relative humidity [%].
The program imports a function contained in the file PSP_read6VarFile.py to open and read the file. The function implemented is very similar to read3VarFile presented above, but it reads six variables instead of three.
After reading the input data, the program PSP_weatherData.py computes daily average air temperature and cumulative precipitation and plots the results as shown in Fig. A.5.
The function computeDaily calculates the daily mean temperature and cumulative precipitation. The if statement is used to detect the end of a day, based on the data format described before. In main, the variables are defined and the function read6VarFile is called to open the file and import the data. The function computeDaily is then called and the plot is created with two y axes (left and right).
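A hypothetical Python 3 sketch of the daily aggregation and of a plot with two y axes follows; it is not the original program. The argument order, the grouping by a change in the day of year, and the choice to return both daily precipitation totals and a running total are assumptions. matplotlib is imported inside plotDaily, and Axes.twinx() creates the right-hand axis that shares the x axis.

```python
from itertools import accumulate


def computeDaily(doy, airT, prec):
    """Group hourly records by day of year (doy).

    Returns the days, the daily mean air temperature, the daily precipitation
    total and the precipitation accumulated since the first day.
    """
    days, sumT, nrValues, dailyPrec = [], [], [], []
    for day, T, P in zip(doy, airT, prec):
        if not days or day != days[-1]:
            # first record of a new day: start new sums
            days.append(day)
            sumT.append(0.0)
            nrValues.append(0)
            dailyPrec.append(0.0)
        sumT[-1] += T
        nrValues[-1] += 1
        dailyPrec[-1] += P
    meanT = [s / n for s, n in zip(sumT, nrValues)]
    cumulativePrec = list(accumulate(dailyPrec))
    return days, meanT, dailyPrec, cumulativePrec


def plotDaily(days, meanT, cumulativePrec, fileName="weatherDaily.png"):
    """Plot temperature on the left axis and precipitation on the right axis."""
    import matplotlib.pyplot as plt

    fig, axLeft = plt.subplots()
    axLeft.plot(days, meanT, "r-")
    axLeft.set_xlabel("Day of year")
    axLeft.set_ylabel("Mean air temperature [°C]")
    axRight = axLeft.twinx()            # second y axis sharing the same x axis
    axRight.plot(days, cumulativePrec, "b--")
    axRight.set_ylabel("Cumulative precipitation [mm]")
    fig.savefig(fileName)
    plt.close(fig)
```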
A.6.2 Call by Reference and by Value
Commonly in computer languages, there are two ways of passing a variable to a function: call by value and call by reference. With call by value, the function cannot modify the variable in the calling function; it can only alter its private, temporary copy. Call by reference allows the function to alter the original argument. In Python, it is not straightforward to pass a variable to a function and allow it to be modified inside. Python transfers arguments by assigning the value in the call to the argument name of the function; Langtangen (2009) calls this way of passing variables to functions call by assignment. If an argument x passed to a function is changed inside the function and the change must be seen in the calling function, then x must be a mutable object and it must be modified in place (for instance with x.append(value) or x[0] = value), since assigning a new object to x inside the function only rebinds the local name. Mutable objects in Python include lists, dictionaries, class instances and Numerical Python (numpy) arrays. Immutable objects such as numbers, strings and tuples cannot be changed inside the function.
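The difference between changing an object in place and rebinding a name can be seen in the following sketch; the function names are hypothetical.

```python
def appendReading(readings, value):
    readings.append(value)      # in-place change: visible to the caller


def rebindList(readings):
    readings = [0.0]            # rebinds the local name only: caller unaffected


def increment(counter):
    counter = counter + 1       # ints are immutable: a new object is created
    return counter


myReadings = [21.5]
appendReading(myReadings, 22.0)
rebindList(myReadings)
print(myReadings)               # [21.5, 22.0]

myCounter = 1
increment(myCounter)
print(myCounter)                # 1: unchanged, the new value must be returned
myCounter = increment(myCounter)
print(myCounter)                # 2
```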
A.7 Flow Control
Flow control statements are the core of a programming language: they control the order in which computations are performed.
A.7.1 Loops: While and For
The while statement repeatedly executes a block of code as long as a test expression is true. Its general form is

while test:
    statements
else:
    statements

The expression is evaluated and, if it is true, the block of code is executed and the test is evaluated again; otherwise control exits the loop, and the optional else part is executed (the else part is skipped if the loop is left with a break statement). Python statements can be executed directly from the Python shell: write each instruction on a separate prompt line. To experiment with these short code segments, type them into the shell and see what happens.
The argument end=" " of print is used to place all outputs on the same line, separated by a space. The augmented assignment a += b is short for a = a + b.
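A short sketch of a while loop with these elements follows, together with a second loop showing that the else part is skipped when the loop ends with break; the variable names and values are illustrative.

```python
count = 0
while count < 5:
    print(count, end=" ")       # 0 1 2 3 4 printed on one line
    count += 1                  # same as count = count + 1
else:
    print()                     # runs once the test becomes false

# The else part is skipped when the loop is left with break
values = [3, 8, 12, 5]
index = 0
while index < len(values):
    if values[index] > 10:
        print("first value above 10 at index", index)
        break
    index += 1
else:
    print("no value above 10")
```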
The For statement is used to repeat a block of statements a specific number of times. For loops, a counter variable is used, whose value is incremented or decremented with each repetition of the loop. The following is an example of for statements:
Note that in the first example Python begins counting from zero (the default), whereas to print the numbers from 1 to 10 we need to write range(1, 11), because range() stops before its end value: it counts only up to 10, one less than 11.
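A minimal sketch of the for loops described above, assuming range() is used as the counter; the list of depths in the last loop is hypothetical and shows that a for loop can also iterate directly over the items of a list:

```python
# range(10) yields 0, 1, ..., 9: counting starts at zero by default
for i in range(10):
    print(i, end=" ")
print()

# range(1, 11) yields 1, 2, ..., 10: the end value 11 is excluded
for i in range(1, 11):
    print(i, end=" ")
print()

# Iterating directly over the items of a list (hypothetical depths in m)
depths = [0.05, 0.10, 0.20]
for depth in depths:
    print(depth)
```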
Whether to use a while or a for loop depends on how the data are organized and on the purpose of the loop. The for loop is usually preferred when there is a clear initialization and a known increment, because the control statements are kept together and visible at the top of the loop. For instance, when the number of data points is known, a for loop is convenient. The while loop is used more often when the number of iterations is not known a priori, for instance when it depends on the data being read or on a convergence test.
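As an illustration of a loop whose number of iterations is not known in advance, the following hypothetical sketch halves a value until it falls below a tolerance:

```python
value = 1.0
tolerance = 1e-3
nrIterations = 0

# The loop stops as soon as the test fails; the count is found at run time
while value >= tolerance:
    value /= 2.0
    nrIterations += 1

print(nrIterations, value)   # 10 0.0009765625
```

A for loop would require the number of halvings to be computed beforehand, which is exactly what the while loop avoids.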
numpy can apply mathematical operations directly to whole arrays (vectorized operations), and the same operations can also be written element by element inside Python for loops. Vectorized numpy operations are fast, whereas looping over the elements of a numpy array with a Python for loop is much slower. Here we present an example where we add two arrays x and y, each having 10 million entries. Two functions are defined: func and func2. In func, the addition is performed directly on the numpy arrays (without a for loop), while func2 performs the addition element by element within a for loop. The results of the computation and the time needed for each computation are then printed on screen.
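A minimal sketch of such a comparison follows. The function names func and func2 come from the text; their bodies, the use of time.perf_counter() and the smaller array size (1 million entries, to keep the run short) are assumptions. Timings depend on the machine:

```python
import time

import numpy as np

def func(x, y):
    # Vectorized addition: a single numpy operation on the whole arrays
    return x + y

def func2(x, y):
    # Element-by-element addition inside a Python for loop
    z = np.zeros(len(x))
    for i in range(len(x)):
        z[i] = x[i] + y[i]
    return z

n = 1_000_000
x = np.arange(n, dtype=float)
y = np.ones(n)

start = time.perf_counter()
z1 = func(x, y)
t1 = time.perf_counter() - start

start = time.perf_counter()
z2 = func2(x, y)
t2 = time.perf_counter() - start

print(f"func  (vectorized): {t1:.4f} s")
print(f"func2 (for loop):   {t2:.4f} s")
print("same result:", np.array_equal(z1, z2))
```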
The results of this computation are
The if-else statement expresses a decision: it selects which of two or more alternative blocks of code is executed. The general statement can be written as
The statement elif stands for else if; it allows several conditions to be tested in sequence, and only the block following the first true condition is executed. Remember that Python identifies block boundaries by line indentation: consecutive statements indented by the same amount belong to the same block of code, and a block ends at the first line that is less indented.
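A minimal sketch of an if-elif-else statement, using a hypothetical function describe_sign; the comments point out where each block begins and ends according to indentation:

```python
def describe_sign(x):
    if x > 0:
        result = "positive"
    elif x < 0:        # tested only if the first condition is false
        result = "negative"
    else:              # runs when no previous condition is true
        result = "zero"
    return result      # less indented: outside the if-elif-else block

for value in [2.5, -1.0, 0.0]:
    print(value, describe_sign(value))
```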
The example below shows how to use if statements to convert Celsius temperatures into Fahrenheit and vice versa. The user is asked to select among four options: print the options, convert from Celsius to Fahrenheit, convert from Fahrenheit to Celsius or quit the program. An if statement followed by elif statements selects the action: quit the program, compute one of the conversions or print the options again, so that the user can quit the program after the computation has been performed. Note that the input() function returns a string, so its result is wrapped in float() to convert the input temperature from a string to a floating-point number. The program PSP_celsiusFahrenheit.py is as follows:
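The following is a minimal sketch of such a menu-driven converter, not the book's listing. The function names, the option letters (p, c, f, q) and the prompts are hypothetical:

```python
def celsius_to_fahrenheit(t_c):
    return t_c * 9.0 / 5.0 + 32.0

def fahrenheit_to_celsius(t_f):
    return (t_f - 32.0) * 5.0 / 9.0

def print_options():
    print("p: print options")
    print("c: convert from Celsius to Fahrenheit")
    print("f: convert from Fahrenheit to Celsius")
    print("q: quit")

def main():
    print_options()
    while True:
        choice = input("Option (p, c, f, q): ").strip().lower()
        if choice == "q":
            break
        elif choice == "c":
            # input() returns a string: float() converts it to a number
            t_c = float(input("Temperature in Celsius: "))
            print("Temperature in Fahrenheit:", celsius_to_fahrenheit(t_c))
        elif choice == "f":
            t_f = float(input("Temperature in Fahrenheit: "))
            print("Temperature in Celsius:", fahrenheit_to_celsius(t_f))
        else:
            print_options()   # "p" or any unrecognized option
```

Calling main() starts the interactive menu. Keeping the conversions in separate functions allows them to be checked without typing input.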
A.8 File Input and Output
File I/O in Python depends on the type of file the program is dealing with. The main distinction is between text files and binary files. Files opened in text mode are read and written as strings (str): the bytes on disk are decoded and encoded automatically. Files opened in binary mode are not converted into any format; their content is read and written as raw bytes (bytes objects). To open a file in binary mode, add a lowercase b to the mode string passed to the built-in open() function (for example, "rb" or "wb"). The choice depends on the type of file: if the programmer is dealing with a large dataset already in binary format, or with images in binary format, the best choice may be to treat the files as binary files.
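A minimal sketch of the difference between the two modes; the file name psp_example.txt in the temporary directory is hypothetical. The same file returns a str in text mode and a bytes object in binary mode:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "psp_example.txt")

# Text mode ("w"): we write a str and Python encodes it to bytes
with open(path, "w", encoding="utf-8") as f:
    f.write("25.3 0.31\n")

# Text mode ("r"): the content is decoded and returned as a str
with open(path, "r", encoding="utf-8") as f:
    text_content = f.read()

# Binary mode ("rb"): the same content is returned as raw bytes
with open(path, "rb") as f:
    binary_content = f.read()

print(type(text_content), repr(text_content))
print(type(binary_content), repr(binary_content))
```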
If the files are data or text files, such as plain text (.txt), CSV, XML or HTML files, it is usually convenient to open them in text mode. The open() function can be used to create a new file or to open an existing one, depending on the processing mode. The following is an example of the open statement used to create a file:
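A minimal sketch of that statement, assuming the file name data.txt and the content described in the next paragraph. Reading the file back with a with block, which closes the file automatically, is shown as an alternative:

```python
# Create (or overwrite) data.txt in text mode and write two numbers
f = open("data.txt", "w")
f.write("1 2")
f.close()

# Reopen the file for reading ("r") and convert the values to numbers
with open("data.txt", "r") as f:
    content = f.read()
numbers = [int(value) for value in content.split()]
print(numbers)   # [1, 2]
```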
This statement creates a new file called data.txt and writes the numbers 1 and 2 separated by a space. The processing mode w stands for write; if data.txt already exists, its previous content is overwritten. The file is then closed. The many other options available for file I/O are described in the Python documentation and in technical books.
In Python, arrays are not a built-in data type like floats or strings. They are provided by the standard module array, which must be imported to create array objects. An array can store characters, integers or floating-point numbers. Arrays are sequence types and behave very much like lists, except that the type of the objects stored in them is constrained. The type is specified when the object is created by means of a type code, which is a single character:
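For example, an integer array holding the values 1 to 5 can be created as follows (a minimal sketch). The try block shows that a value of a different type is rejected:

```python
from array import array

values = array("i", [1, 2, 3, 4, 5])   # type code "i": signed integers
print(values)                          # array('i', [1, 2, 3, 4, 5])

# The element type is fixed at creation: a float is rejected
try:
    values.append(2.5)
except TypeError as error:
    print("TypeError:", error)
```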
In this case, the type code i denotes signed integers, and the members of the array are 1, 2, 3, 4 and 5. To define an array of floating-point numbers, we would write
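A minimal example of a floating-point array, using the type code d (double precision); the values are hypothetical soil temperatures in °C:

```python
from array import array

temperatures = array("d", [20.5, 21.0, 19.8])   # type code "d": double-precision floats
print(temperatures)                             # array('d', [20.5, 21.0, 19.8])
```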
The array can also be indexed and the elements printed by a for loop:
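A minimal sketch of indexing and looping over an array, reusing the integer array 1 to 5; both the loop over indices and the loop over elements are shown:

```python
from array import array

values = array("i", [1, 2, 3, 4, 5])

print(values[0], values[-1])   # indexing starts at 0; -1 is the last element

# Loop over the indices
for i in range(len(values)):
    print(i, values[i])

# Loop directly over the elements
total = 0
for value in values:
    total += value
print(total)   # 15
```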
A.9.1 Arrays in numpy
Another option for creating, manipulating and performing computations with arrays is the module numpy. Numerical Python has had three implementations: Numeric, numarray and numpy. numpy is the most recent one and incorporates the functionality of the first two, which are no longer maintained, so it is the recommended choice. numpy is a Python package for scientific computing that provides N-dimensional array creation and management, linear algebra, random number generation and Fourier transforms, among other features.
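A minimal sketch of a few of these features: array creation, a vectorized operation and the solution of a small linear system with numpy.linalg.solve. The numbers are arbitrary:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])       # array created from a list
z = np.zeros((2, 3))                # 2 x 3 array filled with zeros
print(a * 2.0)                      # element-wise operation: [2. 4. 6.]
print(z.shape)                      # (2, 3)

# Linear algebra: solve the system m x = b
m = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(m, b)
print(x)
```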
A useful application of numpy is array management. The program presented here opens, scans and reads a data file. Whereas the previous examples opened and read a data file with a given number of variables, this program reads a larger data file without prescribing a priori the number of variables it contains. To perform this task, a general reader was created using the array functions implemented in numpy.
Below is an example of experimental data from a soil profile, collected on an hourly basis (weather_soil.text). The data were collected with a Campbell Scientific datalogger (Campbell Scientific, 2006), and the columns represent year (year), day of the year (doy), hour and minute (HM), precipitation (P) [mm], air temperature (Tair) [°C], relative humidity (RH) [%], soil water content (Wc1, Wc2) [m3 m−3], soil matric potential (Mp1, Mp2) [J kg−1] and soil temperature (Ts1, Ts2) [°C] at … cm and … cm below the soil surface.
When data are collected over long periods of time and many variables are recorded, the output files are large, and the data must be scanned, read and processed efficiently. Moreover, we may need to perform various operations on the data, for instance computing the daily average temperature or the daily average soil water content. The program presented here is used by many programs throughout the book: it is an efficient, general file reader for data organized in rows and columns.
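As an illustration of such processing, the following minimal sketch computes daily averages from hourly values. It assumes the day-of-year and temperature columns have already been loaded into numpy arrays. The function name daily_average is hypothetical, and the data are made-up test values:

```python
import numpy as np

def daily_average(doy, values):
    """Return the unique days and the mean of values for each day."""
    days = np.unique(doy)
    means = np.array([values[doy == day].mean() for day in days])
    return days, means

# Made-up hourly values for two days (illustration only)
doy = np.array([1, 1, 1, 2, 2, 2])
tair = np.array([10.0, 14.0, 12.0, 15.0, 19.0, 17.0])

days, means = daily_average(doy, tair)
print(days, means)
```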
The program PSP_readDataFile.py implements two functions: scanDataFile and readDataFile. The first scans the file using the function reader of the standard Python module csv.
The function scanDataFile takes the file name and the delimiter as input and returns the number of rows, the number of columns and a Boolean flag indicating whether every row contains the same number of values. Each row is checked against the number of columns; if any row has a different number of values (for instance, because of missing data or errors in the file), the flag is returned as False.
In readDataFile, a matrix A with the dimensions found by scanDataFile is created and filled with zeros by using numpy (np.zeros). The data are then loaded into the matrix A with a for loop.
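The two functions can be sketched as follows. This is a minimal reconstruction under stated assumptions, not the book's listing: it assumes purely numeric values, no header line and the reader function of the csv module. The signature readDataFile(fileName, delimiter) and its return values are hypothetical:

```python
import csv

import numpy as np

def scanDataFile(fileName, delimiter):
    """Return (nrRows, nrCols, isFileOk) for a delimited text file."""
    nrRows = 0
    nrCols = 0
    isFileOk = True
    with open(fileName, "r", newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if not row:
                continue                  # skip empty lines
            if nrRows == 0:
                nrCols = len(row)         # the first row sets the number of columns
            elif len(row) != nrCols:
                isFileOk = False          # missing or extra values in this row
            nrRows += 1
    return nrRows, nrCols, isFileOk

def readDataFile(fileName, delimiter):
    """Load a numeric file into a 2D numpy array (hypothetical signature)."""
    nrRows, nrCols, isFileOk = scanDataFile(fileName, delimiter)
    if not isFileOk:
        return None, False
    A = np.zeros((nrRows, nrCols))        # matrix filled with zeros
    with open(fileName, "r", newline="") as f:
        i = 0
        for row in csv.reader(f, delimiter=delimiter):
            if not row:
                continue
            A[i, :] = [float(value) for value in row]
            i += 1
    return A, True
```

Scanning the file first means the size of A is known before any data are stored, so the array is allocated only once.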
A convenient way to reuse these functions in different programs is to save them in a file called PSP_readDataFile.py. This file is imported as a module into the programs that need it, and its functions can then be called.
A.10 Reading Date Time
A modification of the program PSP_readDataFile.py was written to read dates by adding the function readGenericDataFile, which reads the values as strings and stores them in lists. Since dates are strings and the numeric numpy arrays used by readDataFile cannot store them, the strings must be converted into numbers. This can be done by converting each string into a datetime object (standard Python module datetime) and then applying the function date2num, which is provided by the module matplotlib.dates. The following is an example of soil temperature data at four different depths, collected on an hourly basis and comma-separated:
The file PSP_readDataFile.py now contains the following function, which stores the data in a list, A = [], instead of an array created with np.zeros:
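A minimal sketch of a list-based reader follows; it is not the book's listing. The signature readGenericDataFile(fileName, nrHeaderLines, delimiter) and the returned pair (header, A) are hypothetical. Every value is kept as a string, so date columns survive unchanged:

```python
import csv

def readGenericDataFile(fileName, nrHeaderLines, delimiter):
    """Read a delimited file into lists; all values remain strings."""
    header = []
    A = []   # a list, so rows may contain date strings as well as numbers
    with open(fileName, "r", newline="") as f:
        reader = csv.reader(f, delimiter=delimiter)
        for i, row in enumerate(reader):
            if i < nrHeaderLines:
                header.append(row)
            elif row:            # skip empty lines
                A.append(row)
    return header, A
```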
This function is called by the program PSP_readSoilTempData.py, which reads, scans and plots the soil temperature data. The dates are collected by defining an empty list date = [] and appending the values within the for loop over nrValues. The dates are then converted into numbers with the function call d = date2num(date). The variable d now contains numbers (days, including fractions of a day) that matplotlib can plot with the function plt.plot_date(). The program is as follows (the output is shown in Fig. 4.1 in Chapter 4):
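A minimal sketch of the conversion and plotting steps follows. It assumes a date format of "%Y-%m-%d %H:%M" and uses made-up rows in place of the file. Because plt.plot_date() is discouraged in recent matplotlib versions, this sketch plots with ax.plot() and formats the axis with ax.xaxis_date() instead:

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")                  # draw without opening a window
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

# Made-up rows, as a list-based reader might return them
rows = [["2010-06-01 00:00", "18.2"],
        ["2010-06-01 01:00", "17.9"],
        ["2010-06-01 02:00", "17.5"]]

date = []
temperature = []
for row in rows:
    # Parse each date string into a datetime object (format is an assumption)
    date.append(datetime.strptime(row[0], "%Y-%m-%d %H:%M"))
    temperature.append(float(row[1]))

d = date2num(date)                     # days as floating-point numbers

fig, ax = plt.subplots()
ax.plot(d, temperature, "o-")
ax.xaxis_date()                        # label the numeric x values as dates
ax.set_ylabel("Soil temperature [°C]")
fig.savefig("soil_temperature.png")
```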
A.11 Object-Oriented Programming in Python
Python has powerful object-oriented programming (OOP) structures. Classes are defined using the following statements:
A class has attributes (the variables that store data) and methods (the functions that the class provides). Here the class soil_data has an attribute named texture, which is a string, and three attributes named sand, silt and clay, which are integers. The statement def is used to define a method (a method is similar to a function, but it is written within a class). To access the data contained in the class, an object (an instance of the class) is needed. An object is created with the statement read_data_object = soil_data(). With this object, it is possible to access the data contained in the class soil_data. For instance, by typing in the Python console the statement
the value of sand (30) is retrieved through the object and displayed on the screen. It is also possible to execute the method contained in the class:
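A minimal sketch of such a class follows. Only the class name, the attribute names and sand = 30 come from the text. The texture string, the values of silt and clay, and the method total are hypothetical:

```python
class soil_data:
    # Class attributes (only sand = 30 is given in the text)
    texture = "clay loam"
    sand = 30
    silt = 35
    clay = 35

    def total(self):
        # Hypothetical method: sum of the particle-size fractions [%]
        return self.sand + self.silt + self.clay

read_data_object = soil_data()     # create an object (instance) of the class
print(read_data_object.sand)       # 30
print(read_data_object.total())    # 100
```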
A.12 Output and Visualization
The output of a Python program can be displayed graphically by importing one of the many available packages, such as matplotlib for plotting graphs, or Tkinter, Python's standard toolkit for graphical user interfaces, whose canvas supports drawing. However, such packages are used mainly for 2D graphics.
Visualizing model output, particularly in 3D, usually requires graphics packages that are difficult to program, complicated to use or not readily available. In this book, we use a very powerful graphics module, Visual Python, which is an easy-to-use 3D graphics module for Python.
As shown throughout this book, the programmer can create 3D objects (such as spheres and curves) and position them in 3D space. Visual Python automatically redraws the scene whenever the program updates the position of objects or curves, so the display follows the computation. This is very advantageous for visualizing physical processes, since the programmer does not have to spend time on display management and can focus on the computational aspects of the program. The package is freeware and can be downloaded at <http://www.vpython.org>. To run the visual programs written in this book, the user must install Visual Python, version 6.
A.13 Exercises
A.1 Compute by hand the complex multiplication . Perform the same operation using Python.
A.2 Read the code of the program PSP_averageTair.py, understand the purpose of each program line and identify the flow of instructions as described above. Run the program and test the results.