ak.layout.NumpyArray
A NumpyArray describes multidimensional data with the same set of parameters as a NumPy np.ndarray.
ptr
: The data themselves, a raw buffer.
shape
: Non-negative integers, at least one. Each represents the number of components in a dimension; a shape of length N represents an N dimensional tensor. The number of items in the array is the product of all values in shape (may be zero).
strides
: The same number of integers as shape. The strides describe how many items in ptr to skip per element in a dimension. (Strides can be negative or zero.)
offset
: The number of items in ptr to skip before the first element of the array.
itemsize
: The number of bytes per item (e.g. 1 for characters, 4 for int32, 8 for double types).
format
: A string representing the NumPy dtype constructed by pybind11. Note that on Windows and 32-bit systems, "q"/"Q" mean signed/unsigned 64-bit and "l"/"L" mean signed/unsigned 32-bit; on all other systems, "l"/"L" mean signed/unsigned 64-bit and "i"/"I" mean signed/unsigned 32-bit.
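To make the roles of shape, strides, and offset concrete: the position of a logical element in ptr is the offset plus, for each dimension, the index times the stride. The following is a minimal sketch of that rule. The function name item_position is hypothetical, and positions are counted in items rather than bytes.

```python
def item_position(index, shape, strides, offset):
    """Return the position in ptr of the element at a multidimensional index.

    item_position is a hypothetical helper for illustration; strides and
    offset are counted in items, not bytes.
    """
    assert len(index) == len(shape) == len(strides)
    position = offset
    for i, length, stride in zip(index, shape, strides):
        assert 0 <= i < length      # each index must lie within its dimension
        position += i * stride      # strides may be negative or zero
    return position


# A 3x2 array stored row by row, starting 2 items into the buffer.
shape, strides, offset = [3, 2], [2, 1], 2
print(item_position((0, 0), shape, strides, offset))   # 2
print(item_position((2, 1), shape, strides, offset))   # 2 + 2*2 + 1*1 = 7
```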
If the shape is one-dimensional, a NumpyArray corresponds to an Apache Arrow Primitive array.
Below is a simplified implementation of a NumpyArray class in pure Python that exhaustively checks validity in its constructor (see ak.is_valid) and can generate random valid arrays. The random_number() function returns a random float and the random_length(minlen) function returns a random int that is at least minlen. The RawArray class represents simple, one-dimensional data.
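The helper functions and the Content base class are not defined in this section. A minimal sketch of what they might look like follows, so that the class below can be run. The function bodies, the random distributions, the optional maxlen argument (treated here as an inclusive upper bound), and the empty Content class are all assumptions made for illustration.

```python
import random


class Content:
    """Hypothetical stand-in for the common base class of all layout nodes."""


def random_number():
    """Return a random float (assumed here to be rounded to one decimal)."""
    return round(random.gauss(5, 3), 1)


def random_length(minlen=0, maxlen=None):
    """Return a random int that is at least minlen.

    Assumptions: maxlen, when given, is an inclusive upper bound; otherwise
    the excess over minlen is drawn from an exponential distribution.
    """
    if maxlen is None:
        return minlen + int(random.expovariate(0.1))
    return random.randint(minlen, maxlen)
```

The default arguments are chosen so that the three call styles used in the class below, random_length(minlen), random_length(0, 2), and random_length(), all work with one definition.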
For a real NumpyArray, strides and itemsize are measured in bytes; in the simplified code below, they are measured in number of elements, as though we were dealing with 1-byte data.
class NumpyArray(Content):
    def __init__(self, ptr, shape, strides, offset):
        assert isinstance(ptr, list)
        assert isinstance(shape, list)
        assert isinstance(strides, list)
        for x in ptr:
            assert isinstance(x, (bool, int, float))
        assert len(shape) > 0
        assert len(strides) == len(shape)
        for x in shape:
            assert isinstance(x, int)
            assert x >= 0
        for x in strides:
            assert isinstance(x, int)    # strides can be negative or zero
        assert isinstance(offset, int)
        if all(x != 0 for x in shape):
            assert 0 <= offset < len(ptr)
            # position of the last element, which must lie inside ptr
            last = offset
            for sh, st in zip(shape, strides):
                last += (sh - 1) * st
            assert last < len(ptr)
        self.ptr = ptr
        self.shape = shape
        self.strides = strides
        self.offset = offset

    @staticmethod
    def random(minlen, choices):
        shape = [random_length(minlen)]
        for i in range(random_length(0, 2)):
            shape.append(random_length(1, 3))
        # build strides from the innermost dimension outward, with random gaps
        strides = [1]
        for x in shape[:0:-1]:
            skip = random_length(0, 2)
            strides.insert(0, x * strides[0] + skip)
        offset = random_length()
        ptr = [random_number() for i in range(shape[0] * strides[0] + offset)]
        return NumpyArray(ptr, shape, strides, offset)

    def __len__(self):
        return self.shape[0]

    def __getitem__(self, where):
        if isinstance(where, int):
            assert 0 <= where < len(self)
            offset = self.offset + self.strides[0] * where
            if len(self.shape) == 1:
                return self.ptr[offset]
            else:
                return NumpyArray(self.ptr, self.shape[1:], self.strides[1:], offset)
        elif isinstance(where, slice) and where.step is None:
            # assumes explicit, in-range start and stop
            offset = self.offset + self.strides[0] * where.start
            shape = [where.stop - where.start] + self.shape[1:]
            return NumpyArray(self.ptr, shape, self.strides, offset)
        elif isinstance(where, str):
            raise ValueError("field " + repr(where) + " not found")
        else:
            raise AssertionError(where)

    def __repr__(self):
        return ("NumpyArray(" + repr(self.ptr) + ", " + repr(self.shape) + ", " +
                repr(self.strides) + ", " + repr(self.offset) + ")")

    def xml(self, indent="", pre="", post=""):
        out = indent + pre + "<NumpyArray>\n"
        out += indent + " <ptr>" + " ".join(str(x) for x in self.ptr) + "</ptr>\n"
        out += indent + " <shape>" + " ".join(str(x) for x in self.shape) + "</shape>\n"
        out += indent + " <strides>" + " ".join(str(x) for x in self.strides)
        out += "</strides>\n"
        out += indent + " <offset>" + str(self.offset) + "</offset>\n"
        out += indent + "</NumpyArray>" + post
        return out
Here is an example:
NumpyArray([2.4, 9.6, -0.2, 7.1, 10.2, 3.3, 7.9, 4.5, 2.1, 5.4, 8.4, 2.3, 12.0, 5.6, 6.2,
11.4, 4.4, 3.0, 4.7, 7.8, 2.4, 2.2, 0.8, 10.6, 8.2, 5.4, 6.7, 4.5, 5.1, 11.2,
11.4, 9.2, 6.6, 2.1, -2.4, 6.8, 8.8, 8.2, 5.4, 2.9, 8.2, 7.0, 2.2, 4.8, 5.3,
6.4, 4.1, 5.1, 8.6, 9.4, 5.1, 6.0],
[17, 2],
[2, 1],
18)
<NumpyArray>
<ptr>2.4 9.6 -0.2 7.1 10.2 3.3 7.9 4.5 2.1 5.4 8.4 2.3 12.0 5.6 6.2 11.4 4.4 3.0 4.7 7.8
2.4 2.2 0.8 10.6 8.2 5.4 6.7 4.5 5.1 11.2 11.4 9.2 6.6 2.1 -2.4 6.8 8.8 8.2 5.4 2.9
8.2 7.0 2.2 4.8 5.3 6.4 4.1 5.1 8.6 9.4 5.1 6.0</ptr>
<shape>17 2</shape>
<strides>2 1</strides>
<offset>18</offset>
</NumpyArray>
which represents the following logical data.
[[4.7, 7.8],
[2.4, 2.2],
[0.8, 10.6],
[8.2, 5.4],
[6.7, 4.5],
[5.1, 11.2],
[11.4, 9.2],
[6.6, 2.1],
[-2.4, 6.8],
[8.8, 8.2],
[5.4, 2.9],
[8.2, 7.0],
[2.2, 4.8],
[5.3, 6.4],
[4.1, 5.1],
[8.6, 9.4],
[5.1, 6.0]]
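The logical data can be recovered from ptr, shape, strides, and offset by walking the dimensions from the outside in. The sketch below shows one way to do that; to_list is a hypothetical helper, strides and offset are counted in items, and the small input reuses the first eight items of the buffer above with a made-up shape and offset to keep the output short.

```python
def to_list(ptr, shape, strides, offset):
    """Expand a strided view of ptr into nested Python lists.

    to_list is a hypothetical helper; strides and offset are in items.
    """
    if len(shape) == 1:
        return [ptr[offset + i * strides[0]] for i in range(shape[0])]
    # Each step along the first dimension shifts the offset by strides[0].
    return [to_list(ptr, shape[1:], strides[1:], offset + i * strides[0])
            for i in range(shape[0])]


ptr = [2.4, 9.6, -0.2, 7.1, 10.2, 3.3, 7.9, 4.5]
print(to_list(ptr, [3, 2], [2, 1], 2))
# [[-0.2, 7.1], [10.2, 3.3], [7.9, 4.5]]
```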
NumpyArray supports the buffer protocol, so it can be directly cast as a NumPy array.
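The buffer protocol carries the same parameters listed at the top of this section (format, itemsize, shape, strides). As an illustration that does not depend on Awkward Array or NumPy, the standard library's memoryview exposes them for any buffer. Note that this sketch uses an array.array as the data source, not a NumpyArray.

```python
import array

# A one-dimensional buffer of six doubles, exposed through the buffer protocol.
data = array.array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
flat = memoryview(data)
print(flat.format, flat.itemsize, flat.shape, flat.strides)
# d 8 (6,) (8,)

# Reinterpret the same memory as a 2x3 array without copying.
grid = flat.cast("B").cast("d", shape=[2, 3])
print(grid.ndim, grid.shape, grid.strides)
# 2 (2, 3) (24, 8)
print(grid.tolist())
# [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
```

Here the strides are in bytes, as they are for a real NumpyArray. With NumPy installed, passing an object that supports the buffer protocol to np.asarray is the usual way to view it as an array.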
In addition to the properties and methods described in ak.layout.Content, a NumpyArray has the following.
ak.layout.NumpyArray.__init__
- ak.layout.NumpyArray.__init__(array, identities=None, parameters=None)
ak.layout.NumpyArray.shape
- ak.layout.NumpyArray.shape
ak.layout.NumpyArray.strides
- ak.layout.NumpyArray.strides
ak.layout.NumpyArray.itemsize
- ak.layout.NumpyArray.itemsize
ak.layout.NumpyArray.format
- ak.layout.NumpyArray.format
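Because the meaning of some format characters depends on the platform, it can help to check their sizes on the machine at hand. The sketch below assumes that the characters follow the conventions of Python's struct module in native mode, which is how buffer-protocol format strings are normally written. It prints the size in bytes of the C types behind "i", "l", and "q".

```python
import struct

# Sizes in bytes of the native C types behind each format character.
sizes = {code: struct.calcsize(code) for code in ("i", "l", "q")}
for code, size in sizes.items():
    print(code, size)
```

On a platform where "l" reports 4, 64-bit integers appear as "q"; where "l" reports 8, 32-bit integers appear as "i".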
ak.layout.NumpyArray.ndim
- ak.layout.NumpyArray.ndim
Returns len(shape).
ak.layout.NumpyArray.isscalar
- ak.layout.NumpyArray.isscalar
Should always return False: in C++, a NumpyArray with len(shape) == 0 is converted into a scalar number or boolean before it appears in Python.
ak.layout.NumpyArray.isempty
- ak.layout.NumpyArray.isempty
Returns True if any shape element is 0; False otherwise.
ak.layout.NumpyArray.iscontiguous
- ak.layout.NumpyArray.iscontiguous
Contiguous arrays have no gaps between elements and are sequenced in increasing order in memory. This is the same as NumPy’s notion of “C contiguous”.
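A NumpyArray is contiguous if the following check on its shape, strides, and itemsize returns True:
def iscontiguous(shape, strides, itemsize):
    # walk from the innermost dimension to the outermost (index 0 included)
    x = itemsize
    for i in range(len(shape) - 1, -1, -1):
        if x != strides[i]:
            return False
        x *= shape[i]
    return True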
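Put another way, under the "C contiguous" notion described above, the strides of a contiguous array are fully determined by its shape and itemsize. The sketch below computes them; contiguous_strides is a hypothetical helper, and strides are in bytes as for a real NumpyArray.

```python
def contiguous_strides(shape, itemsize):
    """Return the strides (in bytes) of a C-contiguous array with this shape.

    contiguous_strides is a hypothetical helper for illustration.
    """
    strides = []
    step = itemsize
    for length in reversed(shape):
        strides.insert(0, step)   # the innermost dimension steps by one item
        step *= length            # each outer dimension spans a whole inner block
    return strides


print(contiguous_strides([17, 2], 8))      # [16, 8]
print(contiguous_strides([3, 4, 5], 4))    # [80, 20, 4]
```

An array whose actual strides differ from this list, for example [32, 8] for a shape of [17, 2] with 8-byte items, has gaps or a different ordering and is not contiguous.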
ak.layout.NumpyArray.toRegularArray
- ak.layout.NumpyArray.toRegularArray()
Returns a contiguous version of this array with any multidimensional shape replaced by nested ak.layout.RegularArray nodes.
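As a rough picture of the resulting structure, the sketch below prints how a shape might nest. It assumes that each dimension after the first becomes one RegularArray of that fixed size, wrapped around a flat, one-dimensional NumpyArray. The function regular_structure is hypothetical and only builds a descriptive string; it does not call the library.

```python
def regular_structure(shape):
    """Describe how a multidimensional shape nests as RegularArray nodes.

    Returns a string for illustration only; regular_structure is a
    hypothetical helper, not part of the library.
    """
    total = 1
    for length in shape:
        total *= length
    description = "NumpyArray(shape=[%d])" % total
    # Wrap from the innermost dimension outward; the first dimension is the
    # length of the outermost node, so it needs no wrapper of its own.
    for size in reversed(shape[1:]):
        description = "RegularArray(%s, size=%d)" % (description, size)
    return description


print(regular_structure([17, 2]))
# RegularArray(NumpyArray(shape=[34]), size=2)
print(regular_structure([2, 3, 5]))
# RegularArray(RegularArray(NumpyArray(shape=[30]), size=5), size=3)
```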
ak.layout.NumpyArray.contiguous
- ak.layout.NumpyArray.contiguous()
Returns a contiguous version of this array (possibly the original array, unchanged).
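In terms of the simplified pure-Python model used earlier in this section, making an array contiguous amounts to gathering the items in row-major order and recomputing the strides. The sketch below illustrates that idea. contiguous_copy is a hypothetical helper, strides and offset are counted in items, and unlike the real method it always copies, even when the input is already contiguous.

```python
from itertools import product


def contiguous_copy(ptr, shape, strides, offset):
    """Return (ptr, shape, strides, offset) for a gap-free, row-major copy.

    contiguous_copy is a hypothetical helper; strides and offset are counted
    in items, as in the simplified NumpyArray class.
    """
    # Visit every index in row-major (C) order and gather the items.
    new_ptr = [ptr[offset + sum(i * st for i, st in zip(index, strides))]
               for index in product(*(range(n) for n in shape))]
    # Rebuild strides so that the last dimension steps by one item.
    new_strides = [1] * len(shape)
    for k in range(len(shape) - 2, -1, -1):
        new_strides[k] = new_strides[k + 1] * shape[k + 1]
    return new_ptr, list(shape), new_strides, 0


# Every other column of a 2x4 buffer: strides [4, 2] leave gaps.
print(contiguous_copy(list(range(8)), [2, 2], [4, 2], 0))
# ([0, 2, 4, 6], [2, 2], [2, 1], 0)
```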
ak.layout.NumpyArray.simplify
- ak.layout.NumpyArray.simplify()
Pass-through; returns the original array.