Get started with the NumbaPro Quick Start

NumbaPro is an enhanced version of Numba which adds premium features and functionality that allow developers to rapidly create optimized code that integrates well with NumPy.

With NumbaPro, Python developers can define NumPy ufuncs and generalized ufuncs (gufuncs) in Python, which are compiled to machine code dynamically and loaded on the fly. Additionally, NumbaPro offers developers the ability to target multicore and GPU architectures with Python code for both ufuncs and general-purpose code.
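To see what a gufunc signature means in practice, here is a plain-NumPy sketch of the semantics of a `'(n)->()'` signature (it does not require NumbaPro; the decorator form that would compile this is only alluded to, so no specific API call is shown):

```python
import numpy as np

# Plain-NumPy sketch of generalized-ufunc (gufunc) semantics.
# With a gufunc signature of '(n)->()', the core function consumes a
# 1-D sub-array and produces a scalar; the remaining (loop) dimensions
# are broadcast, so a 2-D input yields one result per row.
def row_sum(matrix):
    return np.array([row.sum() for row in matrix], dtype=matrix.dtype)

m = np.arange(6, dtype=np.float32).reshape(2, 3)
sums = row_sum(m)  # one scalar per row: 3.0 and 12.0
```

A compiled gufunc applies the same broadcasting rules, but the inner loop runs as machine code rather than interpreted Python.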

For targeting the GPU, NumbaPro can either do the work automatically, doing its best to optimize the code for the GPU architecture, or it can hand control to the developer: a CUDA-based API is provided for writing CUDA code directly in Python for ultimate control of the hardware (with thread and block identities).

Getting Started

Let’s start with a simple function to add together all the pairwise values in two NumPy arrays. Asking NumbaPro to compile this Python function to vectorized machine code for execution on the CPU is as simple as adding a single line of code (invoked via a decorator on the function):

from numbapro import vectorize

@vectorize(['float32(float32, float32)'], target='cpu')
def sum(a, b):
    return a + b
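Once compiled, the result behaves like any NumPy ufunc and is called with whole arrays. The snippet below is a plain-NumPy sketch of that call semantics, so it runs without NumbaPro; the compiled `sum` is invoked in exactly the same way:

```python
import numpy as np

# A @vectorize-compiled function is called like a NumPy ufunc:
# pass whole arrays in, get an array of pairwise results back.
a = np.arange(4, dtype=np.float32)
b = np.arange(4, dtype=np.float32)

result = a + b  # stand-in for sum(a, b); identical elementwise semantics
```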

Similarly, one can instead target the GPU for execution of the same Python function by modifying a single line in the above example:

@vectorize(['float32(float32, float32)'], target='gpu')

Targeting the GPU for execution introduces the potential for numerous GPU-specific optimizations. As a starting point for more complex scenarios, one can also target the GPU with NumbaPro via its CUDA Just-In-Time (JIT) compiler:

from numbapro import cuda

@cuda.jit('void(float32[:], float32[:], float32[:])')
def sum(a, b, result):
    i = cuda.grid(1)   # equivalent to threadIdx.x + blockIdx.x * blockDim.x
    result[i] = a[i] + b[i]

# Invoke like:  sum[grid_dim, block_dim](big_input_1, big_input_2, result_array)
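The kernel body runs once per CUDA thread, and `cuda.grid(1)` maps each thread to a global array index. The pure-Python sketch below simulates that launch and indexing scheme on the CPU so the mapping can be inspected; the `simulate_launch` helper is hypothetical and exists only for illustration:

```python
import numpy as np

# Pure-Python simulation of the launch sum[grid_dim, block_dim](a, b, result).
# Each (block, thread) pair computes i = threadIdx.x + blockIdx.x * blockDim.x,
# which is what cuda.grid(1) returns for a 1-D launch.
def simulate_launch(kernel, grid_dim, block_dim, *args):
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = thread_idx + block_idx * block_dim
            kernel(i, *args)

def sum_kernel(i, a, b, result):
    if i < result.size:          # guard: threads may fall past the array end
        result[i] = a[i] + b[i]

a = np.arange(8, dtype=np.float32)
b = np.arange(8, dtype=np.float32)
result = np.zeros_like(a)
simulate_launch(sum_kernel, 2, 4, a, b, result)  # 2 blocks of 4 threads
```

Note the bounds guard: on a real GPU the grid is usually rounded up to a whole number of blocks, so surplus threads must skip the write.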


Here’s a list of highlighted features:

  • Portable data-parallel programming through ufuncs and gufuncs for single core CPU, multicore CPU and GPU
  • Bindings to CUDA libraries: cuRAND, cuBLAS, cuFFT
  • Python CUDA programming for maximum control of hardware resources

Learn by Examples

The developer team maintains a public GitHub repository of examples. Many examples are designed to demonstrate the potential performance gains from using GPUs.

Requirements

  • Python 2.6, 2.7, 3.3, 3.4
  • LLVM 3.3

For CUDA GPU features:

  • Latest NVIDIA CUDA driver
  • CUDA Toolkit 5.5 or above
  • At least one CUDA GPU with compute capability 2.0 or above

Python modules:

  • llvmpy 0.12.7 or above
  • numba 0.14.0 or above