NumPy Funadmentals¶

Stats 507, Fall 2021

James Henderson, PhD
September 14, 2021

Overview¶

About NumPy
`ndarray`'s and `dtype`'s
Vectorization
Indexing/Slicing
Broadcasting
Random number generation
Takeways

About¶

NumPy is short for "Numerical Python"
Most scientific modules use NumPy arrays for data exchange.
NumPy provides vectorized mathematical functions ...
... and a C API useful for connecting Python to C, C++, and Fortran.

Canonical Import¶

import numpy as np
Numpy version 1.21

In [ ]:

import numpy as np
print(np.__version__)
x = range(10)
xbar = np.mean(x) # implicit conversion to ndarray 
xbar

NumPy's helpful scalars¶

Missing values / not a number np.nan
Infinity np.Inf or -np.Inf
np.pi

Numpy's ndarray object¶

The ndarray object is a flexible N-dimensional array.
ndarray's are atomic or homogenous, containing data of a single type.
In addition to its primary data, an ndarray has attributes / meta-data:
- ndim - the dimensionality,
- shape - a tuple giving the size of each dimension,
- dtype - the data type of the array's values.

In [ ]:

x = [0, 1, 2]; y = [3, 4, 5]
x_a = np.array(x)
y_a = np.array([x, y])
[(x_a.ndim, y_a.ndim), x_a.shape, y_a.shape, y_a]

Constructors for class ndarray¶

np.array() converts sequence objects to arrays.
np.asarray() is similar, but creates an alias when passed an ndarray.
There are others:
- np.ones(), np.zeros(), np.empty(), *_like(),
- np.arange(), np.identity().

In [ ]:

z_x = np.asarray(x)    # x is a list
z_y = np.asarray(y_a)  # y_a is an ndarray
print((z_x is x, z_y is y_a))
z_y.shape = (3, 2)
y_a

More Constructors¶

np.arange() - like built-in range() but returns an array; [start, stop).
np.linspace() - num evenly spaced values in [start, stop].
Many more, skim and learn as needed.

In [ ]:

print(np.linspace(0, 2 * np.pi, 5) / np.pi)
np.linspace([0, 0], [3, 6], 3)

NumPy data types¶

The dtype attribute tells NumPy what type to interpret the primary data (values) of the array as.
Primary data is contiguous in memory, so must be homogeneous with a single dtype.
Common dtype's are:
- int8, int16, int32 and int64, uint8-uint64,
- float16-float128, complex64-complex256,
- bool, object, string_, unicode_.

In [ ]:

(z_x.dtype, z_y.dtype)

Casting NumPy data types¶

Convert an ndarray to another type using the .astype() method.
This is known as casting between types.
Binary operators (among others) operating on arrays of different but compatible types will implictly cast to the more complex type.

In [ ]:

print((z_x.dtype, z_y.dtype))
z_f = z_x.astype(np.float64)
[(z_y[:, 1] + z_f).dtype, (z_f + z_y[:, 1]).dtype]

Casting NumPy data types¶

Casting using .astype() always creates a copy.
Types have shorthand strings used with np.dtype().

In [ ]:

z_f2 = z_f.astype('d') # 'd' is shorthand for np.float64
print(z_f2 is z_f)
if z_f.dtype == np.dtype('float64'):
    pass
else:
    z_f = z_f.astype(np.float64) 
if z_f.dtype != np.float64:
    z_f = z_f.astype(np.float64)

Vectorization¶

A function or operator written to operate on an entire sequence (or vector) at once is said to be vectorized.
Generally this refers to creating functions that encapsulate associated loops.
In interpreted languages, these loops are usually written in a lower-level, compiled, language (often C, C++, or Fortran) for efficiency.
This process and concept is referred to as vectorization.

Vectorization in NumPy¶

Vectorization is integral to the appeal, popularity, and efficiency of NumPy.
For PS1, you've probably already used vectorized np.mean(), np.std().
Binary operators are vectorized for ndarray objects.

In [ ]:

x = np.arange(9).reshape(3, 3)
y = np.array([-1, 0, 1])
x[:, 0] * y, x[1, :] > y, x * y

Indexing / Slicing¶

In some respects, slicing an ndarray is similar to slicing a list.
Higher dimensional indices can be omitted.
A slice of an ndarray is a view of the original array referencing the original data.

In [ ]:

z = np.ones((4, 3))
z1 = z[:, 1].copy()
z2 = z[:, 2]
z1[:] = 0 
z2[:] = 7 # 7 is broadcast to the entire slice
z

Boolean indexing¶

An ndarray can be indexed using the bool type, often created from the array itself.
Note that and and or are not vectorized, use np.logical_and() or np.logical_or() instead.

In [ ]:

print(z > 1)
z[z > 1]
col_sums = np.sum(z, axis=0)
col_sums = np.sum(z, axis=0)
z[:, np.logical_and(col_sums > 4, col_sums < 30)]

Broadcasting in NumPy¶

Broadcasting refers to rules for applying element-wise functions to arrays with disimilar dimensions.
Broadcasting makes array operations more efficient by saving on memory allocation and indexing.
NumPy uses a fairly strict form of broadcasting that allows scalar by array operations and mismatches in the number of dimensions.

Broadcasting in NumPy¶

Read the rules here.
After pre-pending with 1's to make .ndims agree, dimensions must match or be 1.

In [ ]:

np.array([-1, 2]) * np.ones((2, 2, 2))

Exercise¶

What are the shape and sum of z below?

In [ ]:

z = np.ones((2, 3, 2))
x = np.array([i % 3 - 1 for i in range(6)])
x = x.reshape(2, 3)

try:
    z = x * z
except:
    x = x.reshape(3, 2)
    try: 
        z = x * z
    except:
        pass
    
x = x.reshape(1, 1, 3, 2)
try:
    z = x * z
except:
    z[:] = 0
    
[z.shape, np.sum(z), x.shape]

Random Numbers¶

NumPy's random API provides a random number generator and routines to sample from a large number of distributions.
np.random.choice() can be used to sample a sequence object.

In [ ]:

print(np.random.uniform(0, 1, 3))
print(np.random.normal(63.5, 5.55, 2))
np.random.choice(range(3), 4)

Reproducible Results¶

To make results that rely on pseudo-random number generation exactly reproducible, set a seed for the random number generator.
The way this is done was recently updated in NumPy v1.21.
Create a Generator instance using np.random.default_rng().

In [ ]:

rng = np.random.default_rng(seed=42)
rng.uniform()

Shuffle vs Permutation¶

The .shuffle() method permutes an array in-place.
The .permutation() method creates a copy.
Both shuffle data, ignoring shape.
The .permuted() method shuffles along an axis, use out to reassign in-place.

In [ ]:

rng = np.random.default_rng(91421)
a = np.arange(4)
rng.shuffle(a)
b = rng.permutation(a)
[a, b, b is a]
b.shape = (2, 2)
c = b
_ = rng.permuted(b, axis=0, out=b)
[c is b, c, b]

Takeaways¶

NumPy is the backbone of scientific Python.
NumPy offers vectorized and optimized implementations of many mathematical functions, a flexible array class with expressive slicing, helfpul scalars, and much more.
NumPy uses a strict form of broadcasting.
NumPy's random API can be used to generate pseudo-random numbers from almost any distribution.
NumPy is a big topic, learn a little at a time.