Beyond NumPy
===============================================================================

.. contents:: **Contents**
   :local:


Back to Python
--------------

You've almost reached the end of the book and, hopefully, you've learned that
NumPy is a very versatile and powerful library. However, remember that Python
itself is also quite a powerful language. In fact, in some specific cases, it
can be more powerful than NumPy. Let's consider, for example, an interesting
exercise proposed by Tucker Balch in his Coursera course `Computational
Investing <https://www.coursera.org/learn/computational-investing>`_. The
exercise reads:
*Write the most succinct code possible to compute all "legal" allocations to 4
stocks such that the allocations are in 1.0 chunks, and the allocations sum
to 10.0.*
`Yaser Martinez <http://yasermartinez.com/blog/index.html>`_ collected the
different answers from the community and the proposed solutions yield
surprising results. But let's start with the most obvious Python solution:

.. code:: python

   def solution_1():
       # Brute force
       # 14641 (=11*11*11*11) iterations & tests
       Z = []
       for i in range(11):
           for j in range(11):
               for k in range(11):
                   for l in range(11):
                       if i + j + k + l == 10:
                           Z.append((i, j, k, l))
       return Z

This is the slowest solution because it requires four nested loops and, more
importantly, it tests all 14,641 combinations of four integers between 0 and
10 in order to retain only those whose sum is 10. We can of course get rid of
the four explicit loops using itertools, but the code remains slow:

.. code:: python

   import itertools as it

   def solution_2():
       # Itertools
       # 14641 (=11*11*11*11) iterations & tests
       return [(i, j, k, l)
               for i, j, k, l in it.product(range(11), repeat=4)
               if i + j + k + l == 10]

One of the best solutions, proposed by Nick Popplas, takes advantage of the
fact that we can write intelligently nested loops that build each tuple
directly, without any test, as shown below:

.. code:: python

   def solution_3():
       return [(a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b)]

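
We can convince ourselves that the nested ranges are correct: every tuple sums
to 10 by construction, and the number of tuples matches the stars-and-bars
count C(10+3, 3) = 286. A quick self-contained check in plain Python (nothing
assumed beyond the function above):

.. code:: python

   from math import comb

   def solution_3():
       return [(a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b)]

   solutions = solution_3()
   # Every allocation sums to 10 and each chunk lies in 0..10
   assert all(sum(t) == 10 for t in solutions)
   assert all(0 <= x <= 10 for t in solutions for x in t)
   # Stars and bars: compositions of 10 into 4 non-negative parts
   assert len(solutions) == comb(13, 3) == 286
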
The best NumPy solution by Yaser Martinez uses a different strategy with a
restricted set of tests:

.. code:: python

   import numpy as np

   def solution_4():
       X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
       X4 = 10 - X123.sum(axis=0)
       return np.vstack((X123, X4)).T[X4 > -1]

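
As a sanity check, we can verify that the NumPy version produces exactly the
same 286 allocations as the pure Python one (a small sketch, assuming NumPy is
installed):

.. code:: python

   import numpy as np

   def solution_3():
       return [(a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b)]

   def solution_4():
       X123 = np.indices((11, 11, 11)).reshape(3, 11*11*11)
       X4 = 10 - X123.sum(axis=0)
       return np.vstack((X123, X4)).T[X4 > -1]

   # Same set of tuples, regardless of ordering
   assert set(solution_3()) == set(map(tuple, solution_4()))
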
If we benchmark these methods, we get:

.. code:: pycon

   >>> timeit("solution_1()", globals())
   100 loops, best of 3: 1.9 msec per loop
   >>> timeit("solution_2()", globals())
   100 loops, best of 3: 1.67 msec per loop
   >>> timeit("solution_3()", globals())
   1000 loops, best of 3: 60.4 usec per loop
   >>> timeit("solution_4()", globals())
   1000 loops, best of 3: 54.4 usec per loop

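
Note that ``timeit`` as called above is presumably a small convenience wrapper
(the standard library function has a different signature). With the standard
library alone, an equivalent measurement might look like:

.. code:: python

   import itertools as it
   import timeit

   def solution_2():
       return [(i, j, k, l)
               for i, j, k, l in it.product(range(11), repeat=4)
               if i + j + k + l == 10]

   # Best of 3 repetitions of 100 runs each, like `python -m timeit`
   best = min(timeit.repeat("solution_2()", globals=globals(),
                            number=100, repeat=3)) / 100
   print("%.1f usec per loop" % (best * 1e6))
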
The NumPy solution is the fastest but the pure Python solution is comparable.
But let me introduce a small modification to the Python solution:

.. code:: python

   def solution_3_bis():
       return ((a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b))

If we benchmark it, we get:

.. code:: pycon

   >>> timeit("solution_3_bis()", globals())
   10000 loops, best of 3: 0.643 usec per loop

You read that right: we have gained a factor of 100 just by replacing square
brackets with parentheses. How is that possible? The explanation can be found
by looking at the type of the returned object:

.. code:: pycon

   >>> print(type(solution_3()))
   <class 'list'>
   >>> print(type(solution_3_bis()))
   <class 'generator'>

`solution_3_bis()` returns a generator that can be used to generate the full
list or to iterate over all the elements one by one. The huge speedup comes
from not instantiating the full list; it is thus worth asking whether you need
an actual instance of your result or whether a simple generator might do the
job.
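
To see this laziness in action: creating the generator does no work at all,
and tuples are produced one at a time, only as you iterate:

.. code:: python

   def solution_3_bis():
       return ((a, b, c, (10 - a - b - c))
               for a in range(11)
               for b in range(11 - a)
               for c in range(11 - a - b))

   gen = solution_3_bis()
   print(next(gen))       # → (0, 0, 0, 10)
   # The remaining tuples can still be materialized if needed
   print(len(list(gen)))  # → 285
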

NumPy & co
----------

Beyond NumPy, several other Python packages are worth a look because they
address similar yet different classes of problems using different technologies
(compilation, virtual machines, just-in-time compilation, GPUs, compression,
etc.). Depending on your specific problem and your hardware, one package may
be better than another. Let's illustrate their usage with a very simple
example where we want to compute an expression based on two float vectors:

.. code:: python

   import numpy as np

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = 2*a + 3*b


NumExpr
+++++++

The `numexpr <https://github.com/pydata/numexpr/wiki/Numexpr-Users-Guide>`_
package supplies routines for the fast evaluation of array expressions
element-wise by using a vector-based virtual machine. It's comparable to SciPy's
weave package, but doesn't require a separate compile step of C or C++ code.

.. code:: python

   import numpy as np
   import numexpr as ne

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = ne.evaluate("2*a + 3*b")


Cython
++++++

`Cython <http://cython.org>`_ is an optimising static compiler for both the
Python programming language and the extended Cython programming language (based
on Pyrex). It makes writing C extensions for Python as easy as Python itself.

.. code:: python

   # Cython code: compiled, e.g. from a .pyx file
   import numpy as np
   cimport numpy as np

   def evaluate(np.ndarray a, np.ndarray b):
       cdef int i
       cdef np.ndarray c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Numba
+++++

`Numba <http://numba.pydata.org>`_ gives you the power to speed up your
applications with high performance functions written directly in Python. With a
few annotations, array-oriented and math-heavy Python code can be just-in-time
compiled to native machine instructions, similar in performance to C, C++ and
Fortran, without having to switch languages or Python interpreters.

.. code:: python

   import numpy as np
   from numba import jit

   @jit
   def evaluate(a, b):
       c = np.zeros_like(a)
       for i in range(a.size):
           c[i] = 2*a[i] + 3*b[i]
       return c

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = evaluate(a, b)


Theano
++++++

`Theano <http://www.deeplearning.net/software/theano/>`_ is a Python library
that allows you to define, optimize, and evaluate mathematical expressions
involving multi-dimensional arrays efficiently. Theano features tight
integration with NumPy, transparent use of a GPU, efficient symbolic
differentiation, speed and stability optimizations, dynamic C code generation
and extensive unit-testing and self-verification.

.. code:: python

   import numpy as np
   import theano.tensor as T
   from theano import function

   x = T.fvector('x')
   y = T.fvector('y')
   z = 2*x + 3*y
   f = function([x, y], z)

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = f(a, b)


PyCUDA
++++++

`PyCUDA <http://mathema.tician.de/software/pycuda>`_ lets you access Nvidia's
CUDA parallel computation API from Python.

.. code:: python

   import numpy as np
   import pycuda.autoinit
   import pycuda.driver as drv
   from pycuda.compiler import SourceModule

   mod = SourceModule("""
       __global__ void evaluate(float *c, float *a, float *b)
       {
           const int i = threadIdx.x;
           c[i] = 2*a[i] + 3*b[i];
       }
   """)
   evaluate = mod.get_function("evaluate")

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.zeros_like(a)
   # One thread per element (1000 <= 1024, the per-block thread limit)
   evaluate(drv.Out(c), drv.In(a), drv.In(b), block=(1000, 1, 1), grid=(1, 1))


PyOpenCL
++++++++

`PyOpenCL <http://mathema.tician.de/software/pyopencl>`_ lets you access GPUs
and other massively parallel compute devices from Python.

.. code:: python

   import numpy as np
   import pyopencl as cl

   a = np.random.uniform(0, 1, 1000).astype(np.float32)
   b = np.random.uniform(0, 1, 1000).astype(np.float32)
   c = np.empty_like(a)

   ctx = cl.create_some_context()
   queue = cl.CommandQueue(ctx)

   mf = cl.mem_flags
   gpu_a = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
   gpu_b = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)

   evaluate = cl.Program(ctx, """
       __kernel void evaluate(__global const float *gpu_a,
                              __global const float *gpu_b,
                              __global float *gpu_c)
       {
           int gid = get_global_id(0);
           gpu_c[gid] = 2*gpu_a[gid] + 3*gpu_b[gid];
       }
   """).build()

   gpu_c = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)
   evaluate.evaluate(queue, a.shape, None, gpu_a, gpu_b, gpu_c)
   cl.enqueue_copy(queue, c, gpu_c)


Scipy & co
----------

While there are several additional packages for NumPy, there are countless
additional packages for SciPy. In fact, every domain of science probably has
its own package, and most of the examples we've been studying so far could
have been solved in two or three calls to a method in the relevant package.
But of course, that was not the goal, and programming things yourself is
generally a good exercise if you have some spare time. The biggest difficulty
at this point is finding the relevant packages. Here is a very short list of
packages that are well maintained, well tested and may simplify your
scientific life (depending on your domain). There are of course many more,
and depending on your specific needs, chances are you do not have to program
everything yourself. For an extensive list, have a look at the `Awesome Python
list <https://awesome-python.com>`_.

scikit-learn
++++++++++++

`scikit-learn <http://scikit-learn.org/stable/>`_ is a free software machine
learning library for the Python programming language. It features various
classification, regression and clustering algorithms including support vector
machines, random forests, gradient boosting, k-means and DBSCAN, and is
designed to inter-operate with the Python numerical and scientific libraries
NumPy and SciPy.

scikit-image
++++++++++++

`scikit-image <http://scikit-image.org>`_ is a Python package dedicated to
image processing, natively using NumPy arrays as image objects. It provides a
versatile collection of image processing routines and integrates closely with
other scientific Python modules such as NumPy and SciPy.

SymPy
+++++

`SymPy <http://www.sympy.org/en/index.html>`_ is a Python library for symbolic
mathematics. It aims to become a full-featured computer algebra system (CAS)
while keeping the code as simple as possible in order to be comprehensible and
easily extensible. SymPy is written entirely in Python.

Astropy
+++++++

The `Astropy <http://www.astropy.org>`_ project is a community effort to
develop a single core package for astronomy in Python and foster
interoperability between Python astronomy packages.

Cartopy
+++++++

`Cartopy <http://scitools.org.uk/cartopy/>`_ is a Python package designed to
make drawing maps for data analysis and visualization as easy as
possible. Cartopy makes use of the powerful PROJ.4, NumPy and shapely libraries
and has a simple and intuitive drawing interface to matplotlib for creating
publication quality maps.

Brian
+++++

`Brian <http://www.briansimulator.org>`_ is a free, open source simulator for
spiking neural networks. It is written in the Python programming language and
is available on almost all platforms. We believe that a simulator should not
only save the time of processors, but also the time of scientists. Brian is
therefore designed to be easy to learn and use, highly flexible and easily
extensible.

Glumpy
++++++

`Glumpy <http://glumpy.github.io>`_ is an OpenGL-based interactive
visualization library in Python. Its goal is to make it easy to create fast,
scalable, beautiful, interactive and dynamic visualizations.

Conclusion
----------

NumPy is a very versatile library, but that does not mean you have to use it
in every situation. In this chapter, we've seen some alternatives (including
Python itself) that are worth a look. As always, the choice belongs to you:
you have to consider what the best solution is in terms of development time,
computation time and maintenance effort. On the one hand, if you design your
own solution, you'll have to test it and maintain it, but in exchange you'll
be free to design it the way you want. On the other hand, if you decide to
rely on a third-party package, you'll save development time and benefit from
community support, even though you might have to adapt the package to your
specific needs.