A simple library for reading and writing NumPy arrays in C code. It is independent of Python at both compile time and runtime.
This tiny C library can read NumPy array files (`.npy`) into memory, keep them in a C structure, and write them back to disk. No matrix operations are available, just reading and writing. There are not even functions to set and get elements of the array.
The idea is that you use cblas or something similar for the matrix operations, so I have no intention of adding such features.
I wrote this to be able to convert neural network weights saved by Keras into a format that can be opened by a neural network implemented in C.
Credit should also go to Just Jordi Castells, and his blogpost, which inspired me to write this.
This software is licensed under the BSD 3-Clause license.
Added support for memory mapping (`mmap()`) of arrays instead of reading them into memory.
So far, this feature only maps a `.npy` file read-only, in shared and protected memory.
This makes it useful for retrieving data from large pre-calculated arrays. It has several advantages:

- The file is mapped into memory by the OS, so the memory footprint of the running process becomes much smaller.
- Several processes reading the same file can share the mapped memory.
- It is much faster, as the file is not read and copied into the process's memory up front; the OS pages the data in on demand.
The API for mapping is similar to loading a file. There is just one more constructor function:

```c
npy_array_t * npy_array_mmap( const char *filename );
```
It cannot be simpler than that. If you are sure you only need read-only access to the data, you can
use this function as a drop-in replacement for `npy_array_load()`. When you are
done with the array, release its resources by calling `npy_array_free( array );`.
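For readers curious about what a read-only, shared mapping involves, here is a minimal sketch of the underlying POSIX calls (`open()`, `fstat()`, `mmap()` with `PROT_READ` and `MAP_SHARED`, `munmap()`). It is not the library's implementation, and the helper names `map_file_readonly()` and `unmap_file()` are hypothetical; `npy_array_mmap()` does the mapping, plus parsing of the `.npy` header, for you.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of npy_array): map a whole file
 * read-only (PROT_READ) and shared (MAP_SHARED).
 * Returns NULL on failure; on success stores the length in *len_out. */
const void *map_file_readonly( const char *path, size_t *len_out )
{
    int fd = open( path, O_RDONLY );
    if( fd < 0 )
        return NULL;

    struct stat st;
    if( fstat( fd, &st ) != 0 || st.st_size <= 0 ) {
        close( fd );
        return NULL;
    }

    void *addr = mmap( NULL, (size_t) st.st_size, PROT_READ, MAP_SHARED, fd, 0 );
    close( fd );   /* the mapping remains valid after the descriptor is closed */
    if( addr == MAP_FAILED )
        return NULL;

    *len_out = (size_t) st.st_size;
    return addr;
}

/* Hypothetical helper: release a mapping created by map_file_readonly(). */
void unmap_file( const void *addr, size_t len )
{
    munmap( (void *) addr, len );
}
```

Because the mapping is read-only and `MAP_SHARED`, the kernel can back several processes' mappings of the same file with the same physical pages.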
There is also a new member in the `npy_array_t` structure: `void *map_addr;`. Do not use it.
Consider it private and do not alter it, as it is used for unmapping when cleaning up.
There are currently no plans to support writing to `mmap()`'ed arrays. If you need such a feature, please make a pull request and I will probably merge it.
There is also no plan to support memory mapping for `.npz` files.
(Also: `mmap()` is part of the POSIX standard, not ANSI C. If ANSI compatibility
is important to you, consider compiling without this feature.)
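One way to keep `mmap()`-dependent code out of a build that must stay within ANSI C is a compile-time check. The sketch below assumes that POSIX systems announce memory-mapped file support through `_POSIX_MAPPED_FILES` in `<unistd.h>`; the function name `posix_mmap_available()` is hypothetical, and `INSTALL.md` remains the reference for the library's own build options.

```c
#if defined(__unix__) || defined(__APPLE__)
#include <unistd.h>   /* POSIX systems advertise optional features here */
#endif

/* Hypothetical helper: returns 1 if the POSIX headers advertise
 * memory-mapped files (_POSIX_MAPPED_FILES), 0 otherwise, for example
 * on a toolchain that only provides the ANSI C library. */
int posix_mmap_available( void )
{
#if defined(_POSIX_MAPPED_FILES) && (_POSIX_MAPPED_FILES > 0)
    return 1;
#else
    return 0;
#endif
}
```

The same `#if` condition can wrap any code that calls `mmap()`, so that it simply disappears on platforms without it.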
Archive (`.npz`) files are now handled by libzip. This redesign
adds a dependency on libzip, of course, but it simplifies the code a lot. It also makes it
possible to read and save compressed NumPy archives. A new public function has therefore been added:
```c
int
npy_array_list_save_compressed( const char *filename,
                                npy_array_list_t *array_list,
                                zip_int32_t comp,
                                zip_uint32_t comp_flags);
```
This function saves a `.npz` file using the compression method `comp` and the flags `comp_flags`,
which are the same parameters as in libzip's `zip_set_file_compression()` (e.g. `ZIP_CM_DEFLATE` or `ZIP_CM_STORE` for `comp`).
I made some huge changes to this library in mid-February 2020. The main
data structure was renamed from `cmatrix_t` to `npy_array_t` to better illustrate that
this is a NumPy n-dimensional array made available in C. The structure's members
are unchanged in both names and types.
The API calls have been changed to reflect the new data structure name; all functions are renamed:
Old name | New name |
---|---|
c_npy_matrix_read_file | npy_array_load |
c_npy_matrix_dump | npy_array_dump |
c_npy_matrix_write_file | npy_array_save |
c_npy_matrix_free | npy_array_free |
The new names are shorter and more descriptive.
The next big change is that loading `.npz` files no longer returns an array of pointers to
`npy_array_t`. It now returns a special linked-list structure of NumPy arrays, `npy_array_list_t`.
The API calls for `.npz` files have been changed accordingly:
Old name | New name |
---|---|
c_npy_matrix_array_read | npy_array_list_load |
c_npy_matrix_array_write | npy_array_list_save |
c_npy_matrix_array_length | npy_array_list_length |
c_npy_matrix_array_free | npy_array_list_free |
The structure is pretty self-explanatory:

```c
#define NPY_ARRAY_MAX_DIMENSIONS 8

typedef struct _npy_array_t {
    char    *data;
    size_t   shape[ NPY_ARRAY_MAX_DIMENSIONS ];
    int32_t  ndim;
    char     endianness;
    char     typechar;
    size_t   elem_size;
    bool     fortran_order;
} npy_array_t;
```
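Since the library deliberately has no get/set functions, element access comes down to arithmetic on `shape`, `ndim`, `elem_size` and `fortran_order`. The helpers below, `array_num_elements()` and `array_byte_offset()`, are hypothetical (not part of the library) and use a local copy of the structure shown above so the sketch compiles on its own. They assume a contiguous buffer that is in C order when `fortran_order` is false, as in NumPy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPY_ARRAY_MAX_DIMENSIONS 8

/* Local copy of the structure shown above, so this sketch compiles on its own. */
typedef struct _npy_array_t {
    char    *data;
    size_t   shape[ NPY_ARRAY_MAX_DIMENSIONS ];
    int32_t  ndim;
    char     endianness;
    char     typechar;
    size_t   elem_size;
    bool     fortran_order;
} npy_array_t;

/* Hypothetical helper: total number of elements (product of the extents). */
size_t array_num_elements( const npy_array_t *a )
{
    size_t n = 1;
    for( int32_t i = 0; i < a->ndim; i++ )
        n *= a->shape[i];
    return n;
}

/* Hypothetical helper: byte offset of the element at idx[0..ndim-1].
 * C order: the last index varies fastest; Fortran order: the first does. */
size_t array_byte_offset( const npy_array_t *a, const size_t *idx )
{
    size_t offset = 0, stride = 1;
    if( a->fortran_order ) {
        for( int32_t i = 0; i < a->ndim; i++ ) {
            offset += idx[i] * stride;
            stride *= a->shape[i];
        }
    } else {
        for( int32_t i = a->ndim - 1; i >= 0; i-- ) {
            offset += idx[i] * stride;
            stride *= a->shape[i];
        }
    }
    return offset * a->elem_size;
}
```

To read a single value you would then copy the bytes out, e.g. `memcpy( &v, a->data + array_byte_offset( a, idx ), sizeof v );` for a `float v`, assuming the array holds 4-byte floats and its `endianness` matches the host.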
And the linked-list structure for `.npz` files:

```c
typedef struct _npy_array_list_t {
    npy_array_t              *array;
    char                     *filename;
    struct _npy_array_list_t *next;
} npy_array_list_t;
```
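Since `npy_array_list_t` is a plain singly linked list, finding an array by name is a simple walk along `next`. The sketch below uses a hypothetical helper, `find_array_by_name()` (not part of the library), and local copies of both structures so it compiles on its own. Whether the stored `filename` includes a `.npy` suffix is not stated here, so match names accordingly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NPY_ARRAY_MAX_DIMENSIONS 8

/* Local copies of the structures shown above. */
typedef struct _npy_array_t {
    char    *data;
    size_t   shape[ NPY_ARRAY_MAX_DIMENSIONS ];
    int32_t  ndim;
    char     endianness;
    char     typechar;
    size_t   elem_size;
    bool     fortran_order;
} npy_array_t;

typedef struct _npy_array_list_t {
    npy_array_t              *array;
    char                     *filename;
    struct _npy_array_list_t *next;
} npy_array_list_t;

/* Hypothetical helper: return the first array whose filename equals
 * `name`, or NULL if no node matches (or the list is empty). */
npy_array_t *find_array_by_name( const npy_array_list_t *list, const char *name )
{
    for( ; list != NULL; list = list->next )
        if( list->filename != NULL && strcmp( list->filename, name ) == 0 )
            return list->array;
    return NULL;
}
```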
The API is really simple. There are only 13 public functions:

```c
/* Functions for loading, mapping, copying, dumping, saving and freeing .npy arrays */
npy_array_t*      npy_array_load        ( const char *filename );
npy_array_t*      npy_array_mmap        ( const char *filename );
npy_array_t*      npy_array_deepcopy    ( const npy_array_t *m );
npy_array_t*      npy_array_copy        ( const npy_array_t *m );
void              npy_array_dump        ( const npy_array_t *m );
void              npy_array_save        ( const char *filename, const npy_array_t *m );
void              npy_array_free        ( npy_array_t *m );

/* These are the six functions for loading and saving .npz files and lists of NumPy arrays */
npy_array_list_t* npy_array_list_load   ( const char *filename );
int               npy_array_list_save   ( const char *filename, npy_array_list_t *array_list );
size_t            npy_array_list_length ( npy_array_list_t *array_list );
void              npy_array_list_free   ( npy_array_list_t *array_list );
npy_array_list_t* npy_array_list_prepend( npy_array_list_t *list, npy_array_t *array, const char *filename, ... );
npy_array_list_t* npy_array_list_append ( npy_array_list_t *list, npy_array_t *array, const char *filename, ... );
```
Here is a really simple example. You can compile it with:

```sh
gcc -std=gnu99 -Wall -Wextra -O3 -c example.c
gcc -o example example.o npy_array.o
```

You can then run `example` with a NumPy (`.npy`) file as argument.
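```c
#include "npy_array.h"

int main(int argc, char *argv[])
{
    if( argc != 2 ) return -1;

    /* Load the .npy file given on the command line */
    npy_array_t *m = npy_array_load( argv[1] );
    if( m == NULL ) return -1;   /* assumed to be NULL if loading fails */

    npy_array_dump( m );                      /* print the array */
    npy_array_save( "tester_save.npy", m );   /* write it back out */
    npy_array_free( m );
    return 0;
}
```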
Here is an example of saving multiple arrays into a single `.npz` file. You can compile it with:

```sh
gcc -O3 -Wall -Wextra -pedantic -std=c11 -c example_list.c
gcc -o example_list example_list.o npy_array.o npy_array_list.o -lzip
```

You can then run `example_list` with an output filename (a compressed NumPy `.npz` archive) as argument.
```c
#include "npy_array_list.h"

int main(int argc, char *argv[])
{
    if( argc != 2 ) return -1;

    double data[] = {0,1,2,3,4,5};
    npy_array_list_t* list = NULL;

    // the first npy_array_t* holds a reference to the data array
    list = npy_array_list_append( list,
            NPY_ARRAY_BUILDER_COPY(data, SHAPE(3,2), NPY_DTYPE_FLOAT64), "matrix" );

    // the second npy_array_t* holds a copy of the data array (hence DEEPCOPY)
    list = npy_array_list_append( list,
            NPY_ARRAY_BUILDER_DEEPCOPY(data, SHAPE(2,1,2), NPY_DTYPE_FLOAT64), "tensor" );

    npy_array_list_save_compressed( argv[1], list, ZIP_CM_DEFAULT, 0 );
    npy_array_list_free( list );
    return 0;
}
```
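The comments above distinguish an array that references `data` (COPY) from one that holds its own copy of it (DEEPCOPY). The sketch below illustrates that distinction with two hypothetical helpers, `shallow_copy()` and `deep_copy()`, operating on a local copy of the `npy_array_t` structure; it is not how `NPY_ARRAY_BUILDER_COPY`, `NPY_ARRAY_BUILDER_DEEPCOPY`, `npy_array_copy()` or `npy_array_deepcopy()` are actually implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NPY_ARRAY_MAX_DIMENSIONS 8

/* Local copy of the structure shown earlier, so this sketch compiles on its own. */
typedef struct _npy_array_t {
    char    *data;
    size_t   shape[ NPY_ARRAY_MAX_DIMENSIONS ];
    int32_t  ndim;
    char     endianness;
    char     typechar;
    size_t   elem_size;
    bool     fortran_order;
} npy_array_t;

/* Size of the data buffer in bytes: elem_size times the number of elements. */
static size_t data_bytes( const npy_array_t *a )
{
    size_t n = a->elem_size;
    for( int32_t i = 0; i < a->ndim; i++ )
        n *= a->shape[i];
    return n;
}

/* Hypothetical: new header that shares (references) the original data buffer. */
npy_array_t *shallow_copy( const npy_array_t *a )
{
    npy_array_t *c = malloc( sizeof *c );
    if( c != NULL )
        *c = *a;   /* copies shape, type info and the data pointer itself */
    return c;
}

/* Hypothetical: new header plus a private copy of the data buffer. */
npy_array_t *deep_copy( const npy_array_t *a )
{
    npy_array_t *c = shallow_copy( a );
    if( c == NULL )
        return NULL;
    size_t bytes = data_bytes( a );
    c->data = malloc( bytes > 0 ? bytes : 1 );
    if( c->data == NULL ) {
        free( c );
        return NULL;
    }
    if( bytes > 0 )
        memcpy( c->data, a->data, bytes );
    return c;
}
```

In practice this means the buffer behind a shallow copy must outlive it, while a deep copy stays valid after the original buffer is modified or freed.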
You may have a pointer to an N-dimensional array that you want to store in NumPy format, so that you can load it in Python/Jupyter and plot it with matplotlib or whatever you find convenient in Python.
The `npy_array_t` data structure is open, and for convenience you can create a new one
by stack allocation. Here is some example code showing how to save a `.npy` file:
```c
#include <npy_array.h>
#include <stdlib.h>

int main( void )
{
    /* set some sizes */
    int n_rows = 4;
    int n_cols = 3;

    /* Allocate the raw data; this could come from BLAS or another library */
    float *arraydata = malloc( n_rows * n_cols * sizeof( float ));
    if( arraydata == NULL ) return 1;

    /* fill in some data */
    for( int i = 0; i < n_rows * n_cols; i++ )
        arraydata[i] = (float) i;

    /* wrap the raw data in a stack-allocated npy_array_t and save it */
    npy_array_save( "my_4_by_3_array.npy",
            NPY_ARRAY_BUILDER( arraydata, SHAPE( n_rows, n_cols ), NPY_DTYPE_FLOAT32 ) );

    free( arraydata );
    return 0;
}
```
Compile:

```sh
gcc -std=c99 -Wall -Wextra -O3 how_to_save.c -o how_to_save `pkg-config --cflags --libs npy_array`
```
When this is executed, you can verify that the `.npy` file was saved and that
it can be read in Python/NumPy:

```python
>>> import numpy as np
>>> a = np.load("my_4_by_3_array.npy")
>>> a
array([[ 0.,  1.,  2.],
       [ 3.,  4.,  5.],
       [ 6.,  7.,  8.],
       [ 9., 10., 11.]], dtype=float32)
```
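To see what such a saved file begins with, here is a sketch that builds a `.npy` format version 1.0 header for a C-ordered, little-endian `float32` 2-D array like the one above, following NumPy's published format description: the magic string `\x93NUMPY`, a two-byte version, a little-endian `uint16` header length, and a Python-literal dict padded with spaces and terminated by a newline. `build_npy_header()` is a hypothetical helper, not the library's code, and it assumes the 64-byte alignment used by current NumPy versions.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of npy_array): writes a .npy format
 * version 1.0 header for a C-ordered, little-endian float32 2-D array
 * into buf. Returns the header length in bytes, or 0 on error. */
size_t build_npy_header( char *buf, size_t bufsize, size_t rows, size_t cols )
{
    char dict[128];
    int n = snprintf( dict, sizeof dict,
                      "{'descr': '<f4', 'fortran_order': False, 'shape': (%zu, %zu), }",
                      rows, cols );
    if( n < 0 || (size_t) n >= sizeof dict )
        return 0;

    const size_t preamble = 10;                  /* magic(6) + version(2) + length(2) */
    size_t unpadded = preamble + (size_t) n + 1; /* +1 for the closing '\n' */
    size_t total = (unpadded + 63) / 64 * 64;    /* pad to a multiple of 64 bytes */
    size_t header_len = total - preamble;        /* value of the length field */
    if( total > bufsize || header_len > UINT16_MAX )
        return 0;

    memcpy( buf, "\x93NUMPY", 6 );               /* magic string */
    buf[6] = 1;                                  /* major version */
    buf[7] = 0;                                  /* minor version */
    buf[8] = (char) (header_len & 0xFF);         /* little-endian uint16 */
    buf[9] = (char) (header_len >> 8);
    memcpy( buf + preamble, dict, (size_t) n );
    memset( buf + preamble + n, ' ', header_len - (size_t) n - 1 );
    buf[total - 1] = '\n';
    return total;
}
```

For a 4 by 3 array the helper produces a 128-byte header; the 48 bytes of `float32` data would follow it directly.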
A simple `configure` script is now provided (NOT generated by autoconf/automake). To build and install from scratch:

```sh
./configure --prefix=/usr/local/
make
sudo make install
```
Please see the `INSTALL.md` file for further compilation options.
This was written in a hurry one afternoon and then modified over time.
Not much testing has been performed, so read the code to see what it does.
All errors are written to STDERR. Reading and writing of both `.npy` and `.npz`
files seems to work OK, though of course there are some obvious bugs.

To do:

- Bugfixes
- Documentation
- Cleanup
- Refactorisation