Python for science
Why use python?
You may ask a question, why use python. There are tools in which scientists and normal people can do their work, right?
It's pretty easy to list some tools for typical tasks. There are lots of available tools for:
- numerical computation (matlab, octave)
- symbolical computation (mathematica, maxima)
- data analysis (R)
- plotting
- big/small data
There are also whole big environments such as (...), or simple Excel/gnumeric sheets.
But are they free? Are they open source, well documented, easy to learn and develop? Is it possible to modify them or look at their source? Is transparent what they acutally do? (Scientist should be sceptical to black boxes).
For python tools there are only affirmative answers.
How scientists can use python for their work?
It turns out that, we can get all this functionality for free.
We just have to download some free and open source packages available in PYPI (python package index). There are more benefits than that. We can look directly at code if we want to know how it works and read excellent python documentation, which should teach us how to use those packages and give us some scientific background.
Installation is usually pretty easy:
(in your python virtual environment):
$ pip install ipython
Packages every scientist should know:
- NumPy - numerical computation
- SciPy - lots of very useful scientific modules
- Matplotlib - excellent plotting library, which gives you total control for plot, 2D, 3D plotting and animation, publication ready
- IPython - interactive computing in python
- SymPy - CAS, algebra, calculus
- pandas - data analysis, similiar to R
Each of this packages is excellent and gives you lot's of capabilities. But combinging them doesn't give you just a sum, but much more.
NumPy
NumPy is a core for lots of other python scientific libraries. It supplies efficient data structures such as arrays, matrices, fixed ints and doubles. Efficient representation of data and vectorized function make computing in python much faster.
- User Guide
- with this guide you'll learn what is numpy and how to use it
- Data types
- Creating arrays
- Quick start tentative numpy tutorial
- this tutorial proves to be very useful, everything in one place
- examples of common operations
Data Types
Why do we need any special data types?
It's mostly for eficciency.
Which types are supported?
various ints and doubles
Some examples:
>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)
Array creation (1)
- from python objects
- using special numpy function
from numpy import *
a = array( [2,3,4] )
a.dtype
b = array([1.2, 3.5, 5.1])
b.dtype
Array creation (2)
>>> b = array( [ (1.5,2,3), (4,5,6) ] )
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])
Type of array can be specified in constructor:
>>> c = array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])
Special functions for array creation
>>> zeros( (3,4) )
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
>>> ones( (2,3,4), dtype=int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)
>>> empty( (2,3) )
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])
Indexing
>>> a = arange(10) ** 3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
Linear algebra
- for your convinience there are easy to use matrices in numpy
- but algebraic computation can be done on ordinary arrays as well - reference
Example code with matrices
>>> A = mat([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
>>> A = mat('3 1 4; 1 5 9; 2 6 5') # equivalent
>>> print A
[[3 1 4]
[1 5 9]
[2 6 5]]
>>> A
matrix([[3, 1, 4],
[1, 5, 9],
[2, 6, 5]])
>>> A.T # transpose
matrix([[3, 1, 2],
[1, 5, 6],
[4, 9, 5]])
>>> A.H # Hermitian transpose (same for real matrix)
matrix([[3, 1, 2],
[1, 5, 6],
[4, 9, 5]])
>>> A.I # matrix inverse
matrix([[ 0.32222222, -0.21111111, 0.12222222],
[-0.14444444, -0.07777778, 0.25555556],
[ 0.04444444, 0.17777778, -0.15555556]])
SciPy
Scipy is next amazing library for python. Here are some useful links if you want to start you scientific journey with SciPy
It provides packages for:
- fourier transform,
- integration, interpolation, optimization
- linear algebra routines,
- signal processing,
- statistics
and much more.
Matplotlib
Plots give us great insight for what we are currently computing. The main python plotting library is matplotlib.
Matplotlib api can be similar to this in octave, but also more object oriented.
Code generating exponential decay:
import numpy as np
import matplotlib.pyplot as plt
font = {'family' : 'serif',
'color' : 'darkred',
'weight' : 'normal',
'size' : 16,
}
x = np.linspace(0.0, 5.0, 100)
y = np.cos(2 * np.pi * x) * np.exp(-x)
plt.plot(x, y, 'k')
plt.title('Damped exponential decay', fontdict=font)
plt.text(2, 0.65, r'$\cos(2 \pi t) \exp(-t)$', fontdict=font)
plt.xlabel('time (s)', fontdict=font)
plt.ylabel('voltage (mV)', fontdict=font)
# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()
Code generating histogram
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)
num_bins = 50
# the histogram of the data
n, bins, patches = plt.hist(x, num_bins, normed=1, facecolor='green', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r-')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')
# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()
Both examples are from matplotlib gallery which is great source of inspiration for generating your own plots. I often look there if I want to learn something new.
Matplotlib has also a very good documentation and many online resources. After learning some basic I highly recommend this tutorial on advanced matplotlib. It teaches how to make animations, bind event handlers and even more.
pandas
Wait, there is more!
Next amazing tool I want to show you is pandas. I used pandas several times for handling datasets. It has a very nice api and good learning resources. Pandas make interacting with tabular data fun. There are methods for I/O, pivoting, analysis, crazy indexing and manipulation and more.
It's useful for big (for me it's more than 1GB, too much for a spreadsheet) and for small datasets, both dumps from db and small experiments measurements.
For begginners there is pandas in 10 minutes which introduces most of basic concepts.
For those who want to learn more, there is a very good book about data analysis in python:
SymPy
If you are like me, you probably don't have enough spare money to buy yourself a license for mathematica software.
What are the alternatives?
- maxima
I tried maxima, but comparing to mathematica it was much worse. There was some documentation and most of tutorials were pretty helpful, but I didn't like it very much.
And one day I found SymPy. Library for symbolical computation for python.
- calculus
- amazing documentation with executable examples
- SymPy interactive shell on their website
Library works as expected and offeres really nice api. It's free, open source and very modern and flexible.
Notebooks in IPython
IPython is very intersting platform. It makes interactive computing easy and sane.
One of my favourite parts of ipython is ipython notebook.
To launch ipython notebook type:
$ ipython notebook
and voila! If you have all needed libraries, you'll be able to start playing in notebook on port 8888.
Notebooks are a browser client for ipython.
IPython notebook is a bit similiar to Mathematica notebook. You have cells with code which you can change and rerun. In notebook you can embed many kinds of media such us images, latex formulas, markdown, youtube movies and more.
One of the biggest advantages of notebooks that they're easily serializable to plain json. Thanks to that, we can have them under version control which is great for development, sharing and collaboration.
And now some practical examples.
Modelling in python
Example of modelling python apocalipse from scipy cookbook
More learning resources:
Blogs about scientific computing and modelling in python:
Happy scientific hacking!