Serialization and deserialization of numpy data

Keywords: Python encoding

NumPy arrays can be saved either as binary files or as text files:

  • Save as a binary file (.npy/.npz)
    • numpy.save
    • numpy.savez
    • numpy.savez_compressed
  • Save to text file
    • numpy.savetxt
    • numpy.loadtxt

When a large number of numeric files must be read repeatedly (for example, deep learning training data), consider converting the data to NumPy's binary format once and then loading it directly with NumPy. This is much faster than reading the data in its original, unconverted form.
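The one-off conversion described above can be sketched as follows. This is a minimal illustration, not a benchmark: the file names `data.txt` and `data.npy` and the small array are made up for the example, and a temporary directory stands in for a real data folder.

```python
import os
import tempfile

import numpy as np

# Hypothetical data standing in for an existing numeric text file
data = np.arange(12, dtype=np.float64).reshape(4, 3)

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "data.txt")
    npy_path = os.path.join(tmp, "data.npy")

    np.savetxt(txt_path, data)      # the original text file
    parsed = np.loadtxt(txt_path)   # slow path: parse the text once
    np.save(npy_path, parsed)       # one-off conversion to .npy
    reloaded = np.load(npy_path)    # fast path for every later read

    same = np.array_equal(parsed, reloaded)

print(same)
```

After the conversion, only the `np.load` call is needed on later runs, so the text is never parsed again.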

Save as a binary file (.npy/.npz)

numpy.save

# Save an array to a binary file in .npy format
numpy.save(file,                # File name / file path ('.npy' is appended if it is missing)
           arr,                 # Array to save
           allow_pickle=True,   # Boolean; allow object arrays to be saved with Python pickle (optional, the default is usually fine)
           fix_imports=True)    # Pickle object arrays so that Python 2 can read data saved from Python 3 (optional, the default is usually fine)

Example:

>>> import numpy as np 
#Generate data 
>>> x=np.arange(10) 
>>> x 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
 
#Save the data 
>>> np.save('save_x',x) 
 
#Read saved data 
>>> np.load('save_x.npy') 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
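The allow_pickle parameter only matters for object arrays. The sketch below assumes a NumPy release in which np.load defaults to allow_pickle=False (1.16.3 and later); the file name `objects.npy` is made up for the example.

```python
import os
import tempfile

import numpy as np

# An object array holding arbitrary Python objects
obj_arr = np.empty(2, dtype=object)
obj_arr[0] = {"a": 1}
obj_arr[1] = [1, 2, 3]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "objects.npy")
    np.save(path, obj_arr)  # allowed, because allow_pickle=True is the default for np.save

    try:
        np.load(path)       # np.load refuses pickled data by default
        refused = False
    except ValueError:
        refused = True

    loaded = np.load(path, allow_pickle=True)  # explicit opt-in

print(refused, loaded[0], loaded[1])
```

Unpickling can execute arbitrary code, so allow_pickle=True should only be used on files from a trusted source.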

numpy.savez

This also saves arrays to a binary file, but its strength is that it can store multiple arrays in a single file in .npz format. Each array is written as a .npy file, just as np.save would write it, and these files are then packed (without compression) into one archive. If you unzip the .npz file, you will find the individual .npy files inside.

numpy.savez(file,       # File name / file path
            *args,      # Arrays to save; more than one may be given. Arrays passed without a key are named 'arr_0', 'arr_1', and so on.
            **kwds)     # Arrays to save under the given keyword names (optional)

Example:

>>> import numpy as np 
#Generate data 
>>> x=np.arange(10) 
>>> x 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
>>> y=np.sin(x) 
>>> y 
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 , 
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849]) 
        
#Save the data 
>>> np.savez('save_xy',x,y) 
 
#Read saved data 
>>> npzfile=np.load('save_xy.npz') 
>>> npzfile  #An NpzFile object; its repr does not show the array contents 
<numpy.lib.npyio.NpzFile object at 0x7f63ce4c8860> 
 
#Access by default key of the array 
>>> npzfile['arr_0'] 
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
>>> npzfile['arr_1'] 
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 , 
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849]) 
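That an .npz file is an uncompressed archive of .npy files can be checked with the standard zipfile module. This is a small sketch that writes into a temporary directory rather than the working directory:

```python
import os
import tempfile
import zipfile

import numpy as np

x = np.arange(10)
y = np.sin(x)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "save_xy.npz")
    np.savez(path, x, y)

    with zipfile.ZipFile(path) as archive:
        # One .npy member per saved array
        members = sorted(archive.namelist())
        # Compression method used for each member
        stored = all(info.compress_type == zipfile.ZIP_STORED
                     for info in archive.infolist())

print(members)  # ['arr_0.npy', 'arr_1.npy']
print(stored)   # True: the members are stored without compression
```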

Better still, instead of relying on NumPy's default keys, you can give each array a meaningful key of your own, so there is no guessing about which array is which when the data is loaded.

    #Save the data 
    >>> np.savez('newsave_xy',x=x,y=y) 
     
    #Read saved data 
    >>> npzfile=np.load('newsave_xy.npz') 
     
    #Access by the key set at save time 
    >>> npzfile['x'] 
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) 
    >>> npzfile['y'] 
    array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 , 
           -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849]) 

Training, validation, and test sets and their labels are often saved this way. Storing them together greatly reduces the number of files, and there is only one file name to keep track of instead of several scattered through the code.
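A dataset bundle of this kind might look like the sketch below. The key names (`X_train`, `y_train`, and so on), the split sizes, and the random data are all hypothetical, chosen only for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical features and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataset.npz")

    # One file holds every split and its labels
    np.savez(path,
             X_train=X[:70], y_train=y[:70],
             X_val=X[70:85], y_val=y[70:85],
             X_test=X[85:], y_test=y[85:])

    # NpzFile works as a context manager and closes the file on exit
    with np.load(path) as data:
        keys = sorted(data.files)
        X_train = data["X_train"]
        y_test = data["y_test"]

print(keys)
print(X_train.shape, y_test.shape)
```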

numpy.savez_compressed

This adds compression on top of numpy.savez. As noted earlier, numpy.savez packs the .npy files without compressing them; numpy.savez_compressed compresses them as they are packed. The arrays themselves are unchanged, but the resulting .npz file is smaller than the one numpy.savez produces.
Note: The function takes the same parameters as numpy.savez and is used in the same way.
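The size difference is easy to observe with highly compressible data. The sketch below uses an array of zeros as a deliberately favorable case; real data will usually compress less well. The file names are made up for the example.

```python
import os
import tempfile

import numpy as np

# An all-zero array compresses extremely well
data = np.zeros((500, 500))

with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.npz")
    packed = os.path.join(tmp, "packed.npz")

    np.savez(plain, data=data)
    np.savez_compressed(packed, data=data)

    plain_size = os.path.getsize(plain)
    packed_size = os.path.getsize(packed)

    # Loading is identical for both files
    with np.load(packed) as f:
        restored = f["data"]

print(plain_size, packed_size)
print(np.array_equal(restored, data))
```

The trade-off is extra CPU time when saving and loading, so compression pays off mainly when disk space or transfer size matters.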

Save to text file

numpy.savetxt

numpy.savetxt(fname,            # File name / file path; if the name ends in .gz, the file is saved gzip-compressed, which np.loadtxt can read
              X,                # 1D or 2D array to save
              fmt='%.18e',      # Format of the stored values
              delimiter=' ',    # Separator between columns
              newline='\n',     # Separator between rows
              header='',        # String written at the beginning of the file
              footer='',        # String written at the end of the file
              comments='# ',    # String prepended to the header and footer lines to mark them as comments; the default is '# '
              encoding=None)    # Encoding of the output file; the default is usually fine

Example:

>>> import numpy as np 
#Generate data 
>>> x = y = z = np.ones((2,3)) 
>>> x 
array([[1., 1., 1.], 
       [1., 1., 1.]]) 
        
#Save data 
>>> np.savetxt('test.out', x) 
>>> np.savetxt('test1.out', x,fmt='%1.4e') 
>>> np.savetxt('test2.out', x, delimiter=',') 
>>> np.savetxt('test3.out', x,newline='a') 
>>> np.savetxt('test4.out', x,delimiter=',',newline='a') 
>>> np.savetxt('test5.out', x,delimiter=',',header='abc') 
>>> np.savetxt('test6.out', x,delimiter=',',footer='abc') 
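To see what these options actually write, the file can be read back as plain text. The sketch below uses fmt='%d' (an addition not in the calls above) to keep the output short, and writes into a temporary directory.

```python
import os
import tempfile

import numpy as np

x = np.ones((2, 3))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test5.out")
    np.savetxt(path, x, fmt="%d", delimiter=",", header="abc")

    # Read the raw text to inspect the layout
    with open(path) as fh:
        text = fh.read()

    # The header line starts with '#', so np.loadtxt skips it
    restored = np.loadtxt(path, delimiter=",")

print(text)
# # abc
# 1,1,1
# 1,1,1
```

The header is written as "# abc" because the comments string is prepended to it, which is what lets np.loadtxt ignore it on the way back in.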

numpy.loadtxt

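numpy.loadtxt(fname,            # File name / file path; files ending in .gz or .bz2 are decompressed before loading
              dtype=float,      # Data type of the resulting array
              comments='#',     # Character(s) marking the start of a comment; used to recognize header and footer lines
              delimiter=None,   # String separating the values; the default is any whitespace
              converters=None,  # Dictionary mapping a column number to a function that converts that column's strings to values
              skiprows=0,       # Number of lines to skip at the start of the file
              usecols=None,     # Which columns to read, with 0 being the first
              unpack=False,     # If True, transpose the result so columns can be unpacked as x, y, z = loadtxt(...)
              ndmin=0,          # Minimum number of dimensions of the returned array
              encoding='bytes') # Encoding used to decode the input file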

Example:

>>> np.loadtxt('test.out') 
>>> np.loadtxt('test2.out', delimiter=',') 
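The skiprows, usecols, and unpack parameters are easiest to understand together. The sketch below reads from an in-memory string instead of a file; the column names and values are made up for the example.

```python
from io import StringIO

import numpy as np

# Hypothetical comma-separated text with a plain (uncommented) header line
text = "x,y,z\n1,2,3\n4,5,6\n"

x, z = np.loadtxt(StringIO(text),
                  delimiter=",",
                  skiprows=1,       # skip the header line
                  usecols=(0, 2),   # keep only the first and third columns
                  unpack=True)      # return one array per column

print(x)  # [1. 4.]
print(z)  # [3. 6.]
```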

Reference:
https://www.cnblogs.com/wushaogui/p/9142019.html

Posted by sitorush on Fri, 17 May 2019 12:41:23 -0700