Python for science

Why use python?

You may ask a question, why use python. There are tools in which scientists and normal people can do their work, right?

It's pretty easy to list some tools for typical tasks. There are lots of available tools for:

  • numerical computation (matlab, octave)
  • symbolical computation (mathematica, maxima)
  • data analysis (R)
  • plotting
  • big/small data

There are also whole big environments such as (...), or simple Excel/gnumeric sheets.

But are they free? Are they open source, well documented, easy to learn and develop? Is it possible to modify them or look at their source? Is transparent what they acutally do? (Scientist should be sceptical to black boxes).

For python tools there are only affirmative answers.

How scientists can use python for their work?

It turns out that, we can get all this functionality for free.

We just have to download some free and open source packages available in PYPI (python package index). There are more benefits than that. We can look directly at code if we want to know how it works and read excellent python documentation, which should teach us how to use those packages and give us some scientific background.

Installation is usually pretty easy:

(in your python virtual environment):

$ pip install ipython

Packages every scientist should know:

  • NumPy - numerical computation
  • SciPy - lots of very useful scientific modules
  • Matplotlib - excellent plotting library, which gives you total control for plot, 2D, 3D plotting and animation, publication ready
  • IPython - interactive computing in python
  • SymPy - CAS, algebra, calculus
  • pandas - data analysis, similiar to R

Each of this packages is excellent and gives you lot's of capabilities. But combinging them doesn't give you just a sum, but much more.

NumPy

NumPy is a core for lots of other python scientific libraries. It supplies efficient data structures such as arrays, matrices, fixed ints and doubles. Efficient representation of data and vectorized function make computing in python much faster.

Data Types

Why do we need any special data types?

It's mostly for eficciency.

Which types are supported?

various ints and doubles

Some examples:

>>> import numpy as np
>>> x = np.float32(1.0)
>>> x
1.0
>>> y = np.int_([1,2,4])
>>> y
array([1, 2, 4])
>>> z = np.arange(3, dtype=np.uint8)
>>> z
array([0, 1, 2], dtype=uint8)

source

Array creation (1)

  • from python objects
  • using special numpy function
from numpy  import *
a = array( [2,3,4] )
a.dtype
b = array([1.2, 3.5, 5.1])
b.dtype

Array creation (2)

>>> b = array( [ (1.5,2,3), (4,5,6) ] )
>>> b
array([[ 1.5,  2. ,  3. ],
       [ 4. ,  5. ,  6. ]])

Type of array can be specified in constructor:

>>> c = array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j,  2.+0.j],
       [ 3.+0.j,  4.+0.j]])

Special functions for array creation

>>> zeros( (3,4) )
array([[0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.],
       [0.,  0.,  0.,  0.]])
>>> ones( (2,3,4), dtype=int16 )                # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> empty( (2,3) )
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])

Indexing

>>> a = arange(10) ** 3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000  # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
>>> a[ : :-1]                                 # reversed a
array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])

Linear algebra

  • for your convinience there are easy to use matrices in numpy
  • but algebraic computation can be done on ordinary arrays as well - reference

Example code with matrices

>>> A = mat([[3, 1, 4], [1, 5, 9], [2, 6, 5]])
>>> A = mat('3 1 4; 1 5 9; 2 6 5') # equivalent
>>> print A
[[3 1 4]
 [1 5 9]
 [2 6 5]]

>>> A
matrix([[3, 1, 4],
        [1, 5, 9],
        [2, 6, 5]])

>>> A.T # transpose
matrix([[3, 1, 2],
        [1, 5, 6],
        [4, 9, 5]])

>>> A.H # Hermitian transpose (same for real matrix)
matrix([[3, 1, 2],
        [1, 5, 6],
        [4, 9, 5]])

>>> A.I # matrix inverse
matrix([[ 0.32222222, -0.21111111,  0.12222222],
        [-0.14444444, -0.07777778,  0.25555556],
        [ 0.04444444,  0.17777778, -0.15555556]])

SciPy

Scipy is next amazing library for python. Here are some useful links if you want to start you scientific journey with SciPy

It provides packages for:

  • fourier transform,
  • integration, interpolation, optimization
  • linear algebra routines,
  • signal processing,
  • statistics

and much more.

Matplotlib

Plots give us great insight for what we are currently computing. The main python plotting library is matplotlib.

Matplotlib api can be similar to this in octave, but also more object oriented.

plot1

Code generating exponential decay:

import numpy as np
import matplotlib.pyplot as plt


font = {'family' : 'serif',
        'color'  : 'darkred',
        'weight' : 'normal',
        'size'   : 16,
        }

x = np.linspace(0.0, 5.0, 100)
y = np.cos(2 * np.pi * x) * np.exp(-x)

plt.plot(x, y, 'k')
plt.title('Damped exponential decay', fontdict=font)
plt.text(2, 0.65, r'$\cos(2 \pi t) \exp(-t)$', fontdict=font)
plt.xlabel('time (s)', fontdict=font)
plt.ylabel('voltage (mV)', fontdict=font)

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()

plot2

Code generating histogram

import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt


# example data
mu = 100 # mean of distribution
sigma = 15 # standard deviation of distribution
x = mu + sigma * np.random.randn(10000)

num_bins = 50
# the histogram of the data
n, bins, patches = plt.hist(x, num_bins, normed=1, facecolor='green', alpha=0.5)
# add a 'best fit' line
y = mlab.normpdf(bins, mu, sigma)
plt.plot(bins, y, 'r-')
plt.xlabel('Smarts')
plt.ylabel('Probability')
plt.title(r'Histogram of IQ: $\mu=100$, $\sigma=15$')

# Tweak spacing to prevent clipping of ylabel
plt.subplots_adjust(left=0.15)
plt.show()

Both examples are from matplotlib gallery which is great source of inspiration for generating your own plots. I often look there if I want to learn something new.

Matplotlib has also a very good documentation and many online resources. After learning some basic I highly recommend this tutorial on advanced matplotlib. It teaches how to make animations, bind event handlers and even more.

pandas

Wait, there is more!

Next amazing tool I want to show you is pandas. I used pandas several times for handling datasets. It has a very nice api and good learning resources. Pandas make interacting with tabular data fun. There are methods for I/O, pivoting, analysis, crazy indexing and manipulation and more.

It's useful for big (for me it's more than 1GB, too much for a spreadsheet) and for small datasets, both dumps from db and small experiments measurements.

For begginners there is pandas in 10 minutes which introduces most of basic concepts.

For those who want to learn more, there is a very good book about data analysis in python:

logo

SymPy

If you are like me, you probably don't have enough spare money to buy yourself a license for mathematica software.

What are the alternatives?

  • maxima

I tried maxima, but comparing to mathematica it was much worse. There was some documentation and most of tutorials were pretty helpful, but I didn't like it very much.

And one day I found SymPy. Library for symbolical computation for python.

  • calculus
  • amazing documentation with executable examples
  • SymPy interactive shell on their website

Library works as expected and offeres really nice api. It's free, open source and very modern and flexible.

Notebooks in IPython

IPython is very intersting platform. It makes interactive computing easy and sane.

One of my favourite parts of ipython is ipython notebook.

To launch ipython notebook type:

$ ipython notebook

and voila! If you have all needed libraries, you'll be able to start playing in notebook on port 8888.

Notebooks are a browser client for ipython.

IPython notebook is a bit similiar to Mathematica notebook. You have cells with code which you can change and rerun. In notebook you can embed many kinds of media such us images, latex formulas, markdown, youtube movies and more.

One of the biggest advantages of notebooks that they're easily serializable to plain json. Thanks to that, we can have them under version control which is great for development, sharing and collaboration.

And now some practical examples.

Modelling in python

Example of modelling python apocalipse from scipy cookbook

source

More learning resources:

Blogs about scientific computing and modelling in python:

Happy scientific hacking!

python logo

Posted: | Source
Comments powered by Disqus
Share