Your Python interface need not match what it’s wrapping¶
As a trivial example lets take a C function that multiplies two matrices are wrap it in Cython (practically you’d just use Numpy of course…):
/* calculate A@B, where A in l by m, B is m by n,
and C already allocated space for the l by n output.
Returns
0 on success,
1 if A contains a NaN,
2 if B contains a NaN,
3 if the matrices are too big
*/
int mat_mul(double *A, int l, int m, double *B, int n, double *C);
As an aside, this interface gets even more complicated if it needs to handle strides or similar.
Fairly frequently people will see this and try to replicate it in the
Python interface they expose via Cython. What is a suitable Python
version of a double* (to an array)? The really wrong answer involves ctypes, or
hiding it in a PyCapsule. A slightly better answer is that you could
use 1D contiguous typed memoryview to give:
def py_mat_mul(double[::1] A, int l, int m, double[::1] B, int n, double[::1] C):
return mat_mul(&A[0], l, m, &B[0], n, &C[0])
This is not a very useful wrapping. The first thing to note is that l, m and n
are pretty redundant here - we know the sizes from the memoryviews. Importantly it
also provides an opportunity for the user to feed it wrong information. The second thing
is that the function is really operating on 2D arrays, not 1D arrays. The 1D
representation is just how C treats it (the contiguousness is still important though):
def py_mat_mul(double[:,::1] A, double[:, ::1] B, double[:, ::1] C):
cdef int l = A.shape[0]
cdef int m = A.shape[1]
cdef int n = B.shape[1]
if B.shape[0] != m:
raise ValueError("B has the wrong first dimension")
if C.shape[0] != l and C.shape[1] != n:
raise ValueError("C has the wrong dimensions")
return mat_mul(&A[0,0], l, m, &B[0,0], n, &C[0])
The next problem is that C is really an output - we shouldn’t be expecting our Python
users to pass it (we might choose to let them as an optimization of course, similar to
how Numpy uses out parameters, but I don’t do this example). Creating the array
ourself also eliminates a source of user error:
def py_mat_mul(double[:, ::1] A, double[:, ::1] B):
cdef int l = A.shape[0]
cdef int m = A.shape[1]
cdef int n = B.shape[1]
if B.shape[0] != m:
raise ValueError("B has the wrong first dimension")
C = np.empty((l, n), dtype=np.float64)
cdef double[:,::1] C_view = C
c_res = mat_mul(&A[0,0], l, m, &B[0,0], n, &C[0])
return c_res, C
I’ve assumed here that our users will probably want a Numpy array back. This is a good assumption, but make it match your use-case.
The final issue - a C error code likely doesn’t mean much a Python user (and a tuple of an error code and a matrix is probably error-prone). Raise a Python exception instead:
def py_mat_mul(double[:, ::1] A, double[:, ::1] B):
cdef int l = A.shape[0]
cdef int m = A.shape[1]
cdef int n = B.shape[1]
if B.shape[0] != m:
raise ValueError("B has the wrong first dimension")
C = np.empty((l, n), dtype=np.float64)
cdef double[:,::1] C_view = C
c_res = mat_mul(&A[0,0], l, m, &B[0,0], n, &C[0])
if c_res == 1:
raise RuntimeError("A contained a NaN")
elif c_res == 2:
raise RuntimeError("B contained a NaN")
elif c_res == 3:
raise RuntimeError("matricies were too big")
elif c_res != 0:
raise RuntimeError("Unknown error in matrix multiplication")
return C
This at least presents a moderately intuitive Python interface.
double** matrices¶
Some C interfaces take the unhelpful decision to store their matrices as indirect pointers. This is rarely a good design decision (it makes much more sense to do what Numpy does and store an underlying 1D matrix and then calculate a 1D index from a 2D index).
This section therefore has two pieces of advice:
Don’t write this yourself¶
If you’ve decided that memoryviews are too high level for what you want
and that you want to “unleash the raw power of C” and use pointers
directly in Cython, don’t
use the double** representation yourself. (You’re also probably
wrong, and memoryviews are probably fine, but that’s a side-issue).
How to handle this interface in external code¶
Although Cython does support indirect memoryviews, very few external libraries can provide matrices that match them (possibly PIL can, I think?). Therefore the underlying advice still applies - provide your data as a big underlying 1D matrix. Then use Cython to create the indirection array of pointers that C needs:
from libc.stdlib cimport malloc, free
cdef extern from "something.h":
double calculate_from_matrix(double **mat, int m, int n)
def py_calculate_from_matrix(double[:,::1] mat):
cdef double** ptrs = <double**>malloc(sizeof(double*)*mat.shape[0])
try:
# Fill in your pointer array
for i in range(mat.shape[0]):
ptrs[i] = &mat[i,0]
return calculate_from_matrix(ptrs, mat.shape[0], mat.shape[1])
finally:
free(ptrs)
This is much more satisfactory that the alternative - having to copy all your data, including a separate allocation and deallocation for each slice of the array.
There’s a couple of small enhancements to consider here: first mat need
not actually be contiguous. Instead only the second dimension must
be contiguous. There’s no direct way of expressing a memoryview like this
but you could always except a generic 2D memoryview and validate it yourself.
Second, you could avoid the malloc and free with a Numpy
array of dtype=np.uintp (an unsigned int big enough to hold a pointer).
This is likely slightly slower and involves a bit of casting, but it
removes the need for manual memory management.
vector<double>/vector<vector<double>> matrices¶
Some C++ code uses std::vector as a representation of a matrix/array type.
This is a bit of a pain from Cython - you basically have to copy the data
into the vector (since there’s no way of making a vector “point” at some
data).
Nested vectors have many of the same disadvantages as double** matrices.
The only thing that can’t go wrong is memory management.
Essentially if you have code like this there isn’t a good solution except copying.
Most proper C++ matrix algebra libraries (e.g. Eigen) provide a way of wrapping existing data, so you should use that where possible.