I recently came accross an interesting blog post by Bjorn Madsen, Is Numpy really faster.
It is a short post displaying one example where a python snippet using Numpy is much slower than the version written in pure Python due to some overheads associated to Numpy.
Building upon that example, I spent some time writing a function using Numpy that performs similar operations even faster than the pure Python version shown by Bjorn. There is a catch however, I replaced a for loop in the benchmark code by a vector operation, which is the scenario where Numpy really shines.
Let us take look!
import numpy as np
v1 = [1, 2, 3]
v2 = [2.4, 3, -1]
def f1(v1, v2): # <--- Using numpy.cross
return list(np.cross(v1, v2))
def f2(v1, v2): # <---- Using python
a1, a2, a3 = v1
b1, b2, b3 = v2
return [a2 * b3 - a3 * b2, -(a1 * b3 - a3 * b1), a1 * b2 - a2 * b1]
def x1():
for i in range(100000):
v3 = f1(v1, v2) # repeated calls for profiling usage of numpy
def x2():
for i in range(100000):
v4 = f2(v1, v2) # # repeated calls for profiling usage of python.
def x3():
v22 = np.tile(np.array(v2, dtype=np.float64), (100000, 1))
v11 = np.tile(np.array(v1, dtype=np.float64), (100000, 1))
np.cross(v11, v22)
x3()
is a new function implementing a vector operation in Numpy. Then, I wrote all these functions in a jupyter notebook and used the %prun
magic to analyse the results. Here they are.
Original function using Numpy:
>>> %prun x1()
8200004 function calls (7900004 primitive calls) in 5.442 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100000 1.739 0.000 5.042 0.000 numeric.py:1485(cross)
600000 0.818 0.000 1.262 0.000 numeric.py:1341(normalize_axis_tuple)
300000 0.757 0.000 2.354 0.000 numeric.py:1404(moveaxis)
300000 0.384 0.000 0.384 0.000 {built-in method numpy.array}
900000 0.252 0.000 0.252 0.000 {built-in method numpy.core._multiarray_umath.normalize_axis_index}
400000/100000 0.211 0.000 5.117 0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
100000 0.206 0.000 5.395 0.000 <ipython-input-78-bfb5de128972>:7(f1)
600000 0.157 0.000 0.296 0.000 numeric.py:1391(<listcomp>)
300000 0.146 0.000 2.670 0.000 <__array_function__ internals>:2(moveaxis)
1900000 0.141 0.000 0.141 0.000 {built-in method builtins.len}
300000 0.107 0.000 0.107 0.000 {built-in method builtins.sorted}
300000 0.096 0.000 0.096 0.000 {method 'transpose' of 'numpy.ndarray' objects}
200000 0.058 0.000 0.389 0.000 _asarray.py:23(asarray)
100000 0.057 0.000 5.189 0.000 <__array_function__ internals>:2(cross)
600000 0.053 0.000 0.053 0.000 {built-in method _operator.index}
300000 0.048 0.000 0.048 0.000 numeric.py:1467(<listcomp>)
1 0.047 0.047 5.442 5.442 <ipython-input-78-bfb5de128972>:17(x1)
300000 0.046 0.000 0.046 0.000 {method 'insert' of 'list' objects}
100000 0.036 0.000 0.036 0.000 {built-in method numpy.empty}
300000 0.034 0.000 0.034 0.000 numeric.py:1400(_moveaxis_dispatcher)
100000 0.034 0.000 0.034 0.000 {built-in method numpy.promote_types}
100000 0.014 0.000 0.014 0.000 numeric.py:1481(_cross_dispatcher)
1 0.000 0.000 5.442 5.442 {built-in method builtins.exec}
1 0.000 0.000 5.442 5.442 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
Original function written in pure Python:
>>> %prun x2()
100004 function calls in 0.066 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
100000 0.045 0.000 0.045 0.000 <ipython-input-78-bfb5de128972>:11(f2)
1 0.021 0.021 0.066 0.066 <ipython-input-78-bfb5de128972>:22(x2)
1 0.000 0.000 0.066 0.066 {built-in method builtins.exec}
1 0.000 0.000 0.066 0.066 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
New function replaced a for loop with Numpy vectorization:
>>> %prun x3()
117 function calls (114 primitive calls) in 0.005 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.004 0.004 0.004 0.004 numeric.py:1485(cross)
2 0.001 0.001 0.001 0.001 {method 'repeat' of 'numpy.ndarray' objects}
7 0.000 0.000 0.000 0.000 {built-in method numpy.array}
2 0.000 0.000 0.001 0.001 shape_base.py:1171(tile)
3 0.000 0.000 0.000 0.000 numeric.py:1404(moveaxis)
6 0.000 0.000 0.000 0.000 numeric.py:1341(normalize_axis_tuple)
1 0.000 0.000 0.005 0.005 {built-in method builtins.exec}
6/3 0.000 0.000 0.005 0.002 {built-in method numpy.core._multiarray_umath.implement_array_function}
1 0.000 0.000 0.005 0.005 <ipython-input-78-bfb5de128972>:26(x3)
4 0.000 0.000 0.000 0.000 {method 'reshape' of 'numpy.ndarray' objects}
3 0.000 0.000 0.000 0.000 {method 'transpose' of 'numpy.ndarray' objects}
6 0.000 0.000 0.000 0.000 numeric.py:1391(<listcomp>)
3 0.000 0.000 0.000 0.000 {built-in method builtins.sorted}
9 0.000 0.000 0.000 0.000 {built-in method numpy.core._multiarray_umath.normalize_axis_index}
2 0.000 0.000 0.001 0.001 <__array_function__ internals>:2(tile)
21 0.000 0.000 0.000 0.000 {built-in method builtins.len}
3 0.000 0.000 0.000 0.000 <__array_function__ internals>:2(moveaxis)
4 0.000 0.000 0.000 0.000 shape_base.py:1243(<genexpr>)
1 0.000 0.000 0.005 0.005 <string>:1(<module>)
6 0.000 0.000 0.000 0.000 {built-in method _operator.index}
6 0.000 0.000 0.000 0.000 shape_base.py:1253(<genexpr>)
3 0.000 0.000 0.000 0.000 {method 'insert' of 'list' objects}
1 0.000 0.000 0.000 0.000 {built-in method numpy.empty}
3 0.000 0.000 0.000 0.000 numeric.py:1400(_moveaxis_dispatcher)
2 0.000 0.000 0.000 0.000 _asarray.py:23(asarray)
2 0.000 0.000 0.000 0.000 {built-in method builtins.all}
1 0.000 0.000 0.000 0.000 {built-in method numpy.promote_types}
1 0.000 0.000 0.000 0.000 numeric.py:1481(_cross_dispatcher)
1 0.000 0.000 0.004 0.004 <__array_function__ internals>:2(cross)
2 0.000 0.000 0.000 0.000 shape_base.py:1167(_tile_dispatcher)
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
3 0.000 0.000 0.000 0.000 numeric.py:1467(<listcomp>)
In the end, the Numpy vectorization can be faster overall, but always remember to use Numpy wisely!
I hope you enjoyed this short experience and please visit the original post by Bjorn Madsen!