===== mrcz-0.5.7/AUTHORS.txt =====

MRCZ was initially written by Robert A. McLeod

Ricardo Righetto contributed the ability to index individual slices/frames in uncompressed files and bug fixes.

===== mrcz-0.5.7/LICENSE.txt =====

=====python-mrcz is released under the BSD-3-clause License=====

Copyright (c) 2016-2017, Authors (see AUTHORS.txt)
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

===== mrcz-0.5.7/MANIFEST.in =====

include LICENSE.txt
include README.rst
include AUTHORS.txt
include RELEASE_NOTES.txt
include optional-requirements.txt

===== mrcz-0.5.7/PKG-INFO =====

Metadata-Version: 2.1
Name: mrcz
Version: 0.5.7
Summary: MRCZ meta-compressed image file-format library
Home-page: http://github.com/em-MRCZ/python-mrcz
Author: Robert A. McLeod, Ricardo Righetto
Author-email: robbmcleod@gmail.com
License: https://opensource.org/licenses/BSD-3-Clause
Platform: any
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Archiving :: Compression
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Unix
License-File: LICENSE.txt
License-File: AUTHORS.txt
Requires-Dist: numpy

MRCZ is a highly optimized compressed version of the popular electron microscopy MRC image format. It uses the Blosc meta-compressor library as a backend. It can use a number of high-performance lossless compression codecs such as 'lz4' and 'zstd', it can apply bit-shuffling filters, and it performs compression in a blocked and multi-threaded way to take advantage of modern multi-core CPUs.

===== mrcz-0.5.7/README.rst =====

==============================================
Python MRCZ meta-compressed file-format module
==============================================

``mrcz`` is a package designed to supplement the venerable MRC image file format with a highly efficient compressed variant, using the ``blosc`` meta-compressor library to shrink files on disk and greatly accelerate file input/output for the era of "Big Data" in electron and optical microscopy.

Python versions 2.7 and 3.5-3.8 are supported (matching the classifiers above). ``mrcz`` is currently considered to be in a `beta` state of development.

``mrcz`` is released under the BSD 3-clause license.

Installation
------------

A scientific Python distribution (such as Anaconda, WinPython, or Canopy) is advised. After installing your Python environment, from a command prompt type::

    pip install mrcz

``mrcz`` has the following dependencies:

* ``numpy``
* ``blosc`` (optional, but highly recommended)
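A minimal usage sketch follows (the filename and array contents are hypothetical, and ``blosc`` must be installed for the ``compressor`` argument to have an effect)::

    import numpy as np
    from mrcz import readMRC, writeMRC

    image = np.random.uniform(size=[8, 256, 256]).astype('float32')
    # Write a compressed MRCZ file; `compressor` may be None, 'lz4', or 'zstd'
    writeMRC(image, 'example.mrcz', compressor='zstd')
    # Read it back; returns the image stack and a dict of meta-data
    image, meta = readMRC('example.mrcz')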
Haenel, "Bloscpack: a compressed lightweight serialization format for numerical data", arXiv:1404.6383

===== mrcz-0.5.7/mrcz/ReliablePy.py =====

# -*- coding: utf-8 -*-
"""
Python Utilities for Relion

Created on Tue Dec  1 14:26:13 2015
@author: Robert A. McLeod
@email: robbmcleod@gmail.com OR robert.mcleod@unibas.ch

This is primarily a general parser for Relion star files. It creates a
two-level dictionary, with the "data_*" level at the top and the "_rln*"
level at the second. Use the star.keys() function to see what values the
dictionary has, i.e. rln.star.keys() and then rln.star['data_whatever'].keys()

Example usage:

    rln = ReliablePy()
    # Wildcards can be loaded
    rln.load( 'PostProcess*.star' )

    # Plot the Fourier Shell Correlation
    plt.figure()
    plt.plot( rln.star['data_fsc']['Resolution'],
              rln.star['data_fsc']['FourierShellCorrelationUnmaskedMaps'], '.-' )
    plt.xlabel( "Resolution" )
    plt.ylabel( "FSC" )

Note that all Relion strings are byte-strings (char1) rather than UTF-encoded.
"""

from __future__ import division, print_function, absolute_import

from . import ioDM, ioMRC
import numpy as np
import os, os.path
import glob
import time
from collections import OrderedDict
# The following are not requirements of python-mrcz, only ReliablePy:
import matplotlib.pyplot as plt
import scipy.ndimage
import pandas

# Static variable decorator
def static_var(varname, value):
    def decorate(func):
        setattr(func, varname, value)
        return func
    return decorate

def apodization( name = 'butter.32', shape= [2048,2048], radius=None ):
    """
    apodization( name = 'butter.32', shape = [2048,2048], radius=None )

    Provides a 2-D filter or apodization window for Fourier filtering or
    image clamping. Radius = None defaults to shape/2.

    Valid names are:
        'hann' - von Hann cosine window on radius
        'hann_square' - as above but on X-Y
        'hamming' - good for apodization, nonsense as a filter
        'butter.X' - Butterworth multi-order filter where X is the order of the Lorentzian
        'butter_square.X' - Butterworth in X-Y
        'gauss_trunc' - truncated Gaussian, higher performance (smaller PSF) than the Hann filter
        'gauss' - regular Gaussian

    NOTE: There are windows in scipy.signal for 1D-filtering...
WARNING: doesn't work properly for odd image dimensions """ # Make meshes shape = np.asarray( shape ) if radius is None: radius = shape/2.0 else: radius = np.asarray( radius, dtype='float' ) # DEBUG: Doesn't work right for odd numbers [xmesh,ymesh] = np.meshgrid( np.arange(-shape[1]/2,shape[1]/2), np.arange(-shape[0]/2,shape[0]/2) ) r2mesh = xmesh*xmesh/( np.double(radius[0])**2 ) + ymesh*ymesh/( np.double(radius[1])**2 ) try: [name, order] = name.lower().split('.') order = np.double(order) except ValueError: order = 1 if name == 'butter': window = np.sqrt( 1.0 / (1.0 + r2mesh**order ) ) elif name == 'butter_square': window = np.sqrt( 1.0 / (1.0 + (xmesh/radius[1])**order))*np.sqrt(1.0 / (1.0 + (ymesh/radius[0])**order) ) elif name == 'hann': cropwin = ((xmesh/radius[1])**2.0 + (ymesh/radius[0])**2.0) <= 1.0 window = cropwin.astype('float') * 0.5 * ( 1.0 + np.cos( 1.0*np.pi*np.sqrt( (xmesh/radius[1])**2.0 + (ymesh/radius[0])**2.0 ) ) ) elif name == 'hann_square': window = ( (0.5 + 0.5*np.cos( np.pi*( xmesh/radius[1]) ) ) * (0.5 + 0.5*np.cos( np.pi*( ymesh/radius[0] ) ) ) ) elif name == 'hamming': cropwin = ((xmesh/radius[1])**2.0 + (ymesh/radius[0])**2.0) <= 1.0 window = cropwin.astype('float') * ( 0.54 + 0.46*np.cos( 1.0*np.pi*np.sqrt( (xmesh/radius[1])**2.0 + (ymesh/radius[0])**2.0 ) ) ) elif name == 'hamming_square': window = ( (0.54 + 0.46*np.cos( np.pi*( xmesh/radius[1]) ) ) * (0.54 + 0.46*np.cos( np.pi*( ymesh/radius[0] ) ) ) ) elif name == 'gauss' or name == 'gaussian': window = np.exp( -(xmesh/radius[1])**2.0 - (ymesh/radius[0])**2.0 ) elif name == 'gauss_trunc': cropwin = ((0.5*xmesh/radius[1])**2.0 + (0.5*ymesh/radius[0])**2.0) <= 1.0 window = cropwin.astype('float') * np.exp( -(xmesh/radius[1])**2.0 - (ymesh/radius[0])**2.0 ) elif name == 'lanczos': print( "TODO: Implement Lanczos window" ) return else: print( "Error: unknown filter name passed into apodization" ) return return window def pyFFTWPlanner( realMage, fouMage=None, wisdomFile = None, effort = 'FFTW_MEASURE', n_threads = None, doForward = True, doReverse = True ): """ Appends an FFTW plan for the given realMage to a text file stored in the same directory as RAMutil, which can then be loaded in the future with pyFFTWLoadWisdom. NOTE: realMage should be typecast to 'complex64' normally. NOTE: planning pickle files are hardware dependant, so don't copy them from one machine to another. wisdomFile allows you to specify a .pkl file with the wisdom tuple written to it. The wisdomFile is never updated, whereas the default wisdom _is_ updated with each call. For multiprocessing, it's important to let FFTW generate its plan from an ideal processor state. TODO: implement real, half-space fourier transforms rfft2 and irfft2 as built """ import pyfftw import pickle import os.path from multiprocessing import cpu_count utilpath = os.path.dirname(os.path.realpath(__file__)) # First import whatever we already have if wisdomFile is None: wisdomFile = os.path.join( utilpath, "pyFFTW_wisdom.pkl" ) if os.path.isfile(wisdomFile): try: fh = open( wisdomFile, 'rb') except: print( "Util: pyFFTW wisdom plan file: " + str(wisdomFile) + " invalid/unreadable" ) try: pyfftw.import_wisdom( pickle.load( fh ) ) except: # THis is not normally a problem, it might be empty? 
print( "Util: pickle failed to import FFTW wisdom" ) pass try: fh.close() except: pass else: # Touch the file os.umask(0000) # Everyone should be able to delete scratch files with open( wisdomFile, 'wb') as fh: pass # I think the fouMage array has to be smaller to do the real -> complex FFT? if fouMage is None: if realMage.dtype.name == 'float32': print( "pyFFTW is recommended to work on purely complex data" ) fouShape = realMage.shape fouShape.shape[-1] = realMage.shape[-1]//2 + 1 fouDtype = 'complex64' fouMage = np.empty( fouShape, dtype=fouDtype ) elif realMage.dtype.name == 'float64': print( "pyFFTW is recommended to work on purely complex data" ) fouShape = realMage.shape fouShape.shape[-1] = realMage.shape[-1]//2 + 1 fouDtype = 'complex128' fouMage = np.empty( fouShape, dtype=fouDtype ) else: # Assume dtype is complexXX fouDtype = realMage.dtype.name fouMage = np.zeros( realMage.shape, dtype=fouDtype ) if n_threads is None: n_threads = cpu_count() print( "FFTW using " + str(n_threads) + " threads" ) if bool(doForward): #print( "Planning forward pyFFTW for shape: " + str( realMage.shape ) ) FFT2 = pyfftw.builders.fft2( realMage, planner_effort=effort, threads=n_threads, auto_align_input=True ) else: FFT2 = None if bool(doReverse): #print( "Planning reverse pyFFTW for shape: " + str( realMage.shape ) ) IFFT2 = pyfftw.builders.ifft2( fouMage, planner_effort=effort, threads=n_threads, auto_align_input=True ) else: IFFT2 = None # Setup so that we can call .execute on each one without re-copying arrays # if FFT2 is not None and IFFT2 is not None: # FFT2.update_arrays( FFT2.get_input_array(), IFFT2.get_input_array() ) # IFFT2.update_arrays( IFFT2.get_input_array(), FFT2.get_input_array() ) # Something is different in the builders compared to FFTW directly. # Can also repeat this for pyfftw.builders.rfft2 and .irfft2 if desired, but # generally it seems slower. # Opening a file for writing is supposed to truncate it # if bool(savePlan): #if wisdomFile is None: # with open( utilpath + "/pyFFTW_wisdom.pkl", 'wb') as fh: with open( wisdomFile, 'wb' ) as fh: pickle.dump( pyfftw.export_wisdom(), fh ) return FFT2, IFFT2 # TODO: put IceFilter in a ReliablePy utility function file @static_var( "bpFilter", -1 ) @static_var( "mageShape", np.array([0,0]) ) @static_var( "ps", -42 ) @static_var( "FFT2", -42 ) @static_var( "IFFT2", -42 ) def IceFilter( mage, pixelSize=1.0, filtRad = 8.0 ): """ IceFilter applies a band-pass filter to mage that passes the first 3 water ice rings, and then returns the result. pixelSize is in ANGSTROMS because this is bio. Program uses this to calculate the width of the band-pass filter. 
# TODO: put IceFilter in a ReliablePy utility function file
@static_var( "bpFilter", -1 )
@static_var( "mageShape", np.array([0,0]) )
@static_var( "ps", -42 )
@static_var( "FFT2", -42 )
@static_var( "IFFT2", -42 )
def IceFilter( mage, pixelSize=1.0, filtRad = 8.0 ):
    """
    IceFilter applies a band-pass filter to mage that passes the first 3 water
    ice rings, and then returns the result.

    pixelSize is in ANGSTROMS because this is bio. The function uses this to
    calculate the width of the band-pass filter.

    filtRad is the radius (in pixels) of the Gaussian filter applied after the
    Fourier filtration, which suppresses the periodic artifacts caused by
    multiple defocus zeros falling within the pass-band.
    """
    # First water ring is at 3.897 Angstroms
    # Second is at 3.669 Angstroms
    # Third is at 3.441 Angstroms
    # And of course there is strain, so go from about 4 to 3.3 Angstroms in the mesh

    # Test for existence of pyfftw
    try:
        import pyfftw
        pyfftwFound = True
    except:
        pyfftwFound = False

    # Check to see if we have to update our static variables
    if ( (IceFilter.mageShape != mage.shape).any() ) or (np.size(IceFilter.bpFilter) == 1) or (IceFilter.ps != pixelSize):
        # Make a new IceFilter.bpFilter
        IceFilter.mageShape = np.array( mage.shape )
        IceFilter.ps = pixelSize

        bpMin = pixelSize / 4.0 # pixels to the 4.0 Angstrom spacing
        bpMax = pixelSize / 3.3 # pixels to the 3.3 Angstrom spacing
        # So pixel frequency is -0.5 to +0.5 with shape steps
        # And we want a bandpass from 1.0/bpMin to 1.0/bpMax, which is different on each axis for rectangular images
        pixFreqX = 1.0 / mage.shape[1]
        pixFreqY = 1.0 / mage.shape[0]
        bpRangeX = np.round( np.array( [ bpMin/pixFreqX, bpMax/pixFreqX ] ) )
        bpRangeY = np.round( np.array( [ bpMin/pixFreqY, bpMax/pixFreqY ] ) )
        IceFilter.bpFilter = np.fft.fftshift(
            (1.0 - apodization( name='butter.64', shape=mage.shape, radius=[ bpRangeY[0],bpRangeX[0] ] )) *
            apodization( name='butter.64', shape=mage.shape, radius=[ bpRangeY[1],bpRangeX[1] ] ) )
        IceFilter.bpFilter = IceFilter.bpFilter.astype( 'float32' )

        if pyfftwFound:
            [IceFilter.FFT2, IceFilter.IFFT2] = pyFFTWPlanner( mage.astype('complex64') )
        pass

    # Apply band-pass filter
    if pyfftwFound:
        IceFilter.FFT2.update_arrays( mage.astype('complex64'), IceFilter.FFT2.get_output_array() )
        IceFilter.FFT2.execute()
        IceFilter.IFFT2.update_arrays( IceFilter.FFT2.get_output_array() * IceFilter.bpFilter, IceFilter.IFFT2.get_output_array() )
        IceFilter.IFFT2.execute()
        bpMage = IceFilter.IFFT2.get_output_array() / mage.size
    else:
        FFTmage = np.fft.fft2( mage )
        bpMage = np.fft.ifft2( FFTmage * IceFilter.bpFilter )

    from scipy.ndimage import gaussian_filter
    bpGaussMage = gaussian_filter( np.abs(bpMage), filtRad )

    # So if I don't want to build a mask here, and if I'm just doing band-pass
    # intensity scoring I don't need it, I don't need to make a thresholded mask
    # Should we normalize the bpGaussMage by the mean and std of the mage?
    return bpGaussMage

class ReliablePy(object):
    def __init__( self, *inputs ):
        self.verbose = 1
        self.inputs = list( inputs )

        # _data.star file dicts
        self.star = OrderedDict()

        self.par = []
        self.pcol = OrderedDict()

        self.box = [] # Each box file loaded is indexed by its load order / dict could also be done if it's more convenient.

        # Particle/class data
        self.mrc = []
        self.mrc_header = []

        if inputs:
            self.load( *inputs )
        pass

    def load( self, *input_names ):
        # See if it's a single string or a list/tuple
        if not isinstance( input_names, str ):
            new_files = []
            for item in input_names:
                new_files.extend( glob.glob( item ) )
        else:
            new_files = list( input_names )

        for filename in new_files:
            [fileFront, fileExt] = os.path.splitext( filename )
            if fileExt == '.mrc' or fileExt == '.mrcs':
                self.inputs.append(filename)
                self.__loadMRC( filename )
            elif fileExt == '.star':
                self.inputs.append(filename)
                self.__loadStar( filename )
            elif fileExt == '.par':
                self.inputs.append(filename)
                self.__loadPar( filename )
            elif fileExt == '.box':
                self.inputs.append(filename)
                self.__loadBox( filename )
            else:
                print( "Unknown file extension passed in: " + filename )

    def plotFSC( self ):
        # Do error checking? Or no?
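        """
        Plots the Fourier Shell Correlation curves found in a loaded
        PostProcess*.star file, together with the FSC = 0.143 resolution criterion.
        """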
plt.rc('lines', linewidth=2.0, markersize=12.0 ) plt.figure() plt.plot( self.star['data_fsc']['Resolution'], 0.143*np.ones_like(self.star['data_fsc']['Resolution']), '-', color='firebrick', label="Resolution criteria" ) try: plt.plot( self.star['data_fsc']['Resolution'], self.star['data_fsc']['FourierShellCorrelationUnmaskedMaps'], 'k.-', label="Unmasked FSC" ) except: pass try: plt.plot( self.star['data_fsc']['Resolution'], self.star['data_fsc']['FourierShellCorrelationMaskedMaps'], '.-', color='royalblue', label="Masked FSC" ) except: pass try: plt.plot( self.star['data_fsc']['Resolution'], self.star['data_fsc']['FourierShellCorrelationCorrected'], '.-', color='forestgreen', label="Corrected FSC" ) except: pass try: plt.plot( self.star['data_fsc']['Resolution'], self.star['data_fsc']['CorrectedFourierShellCorrelationPhaseRandomizedMaskedMaps'], '.-', color='goldenrod', label="Random-phase corrected FSC" ) except: pass plt.xlabel( "Resolution ($\AA^{-1}$)" ) plt.ylabel( "Fourier Shell Correlation" ) plt.legend( loc='upper right', fontsize=16 ) plt.xlim( np.min(self.star['data_fsc']['Resolution']), np.max(self.star['data_fsc']['Resolution']) ) print( "Final resolution (unmasked): %.2f A"%self.star['data_general']['FinalResolution'] ) print( "B-factor applied: %.1f"%self.star['data_general']['BfactorUsedForSharpening'] ) def plotSSNR( self ): """ Pulls the SSNR from each class in a _model.star file and plots them, for assessing which class is the 'best' class """ N_particles = np.sum( self.star[b'data_model_groups'][b'GroupNrParticles'] ) N_classes = self.star[b'data_model_general'][b'NrClasses'] plt.figure() for K in np.arange( N_classes ): Resolution = self.star[b'data_model_class_%d'%(K+1)][b'Resolution'] SSNR = self.star[b'data_model_class_%d'%(K+1)][b'SsnrMap'] plt.semilogy( Resolution, SSNR+1.0, label="Class %d: %d" %(K+1,N_particles*self.star[b'data_model_classes'][b'ClassDistribution'][K]) ) plt.legend( loc = 'best' ) plt.xlabel( "Resolution ($\AA^{-1}$)" ) plt.ylabel( "Spectral Signal-to-Noise Ratio" ) # Let's also display the class distributions in the legend def pruneParticlesNearImageEdge( self, box = None, shapeImage = [3838,3710] ): """ Removes any particles near image edge. Relion's default behavoir is to replicate pad these, which often leads to it crashing. box is the bounding box size for the particle, in pixels. If a _model.star file is loaded it is automatically detected. Otherwise it must be provided. Image size is not stored anywhere obvious in Relion, so it must be passed in in terms of it's shape in [y,x] """ if box == None: try: box = self.star[b'data_model_general'][b'OriginalImageSize'] except: print( "No box shape found in metadata, load a *_model.star file or provide box dimension" ) return partCount = len( self.star[b'data_'][b'CoordinateX'] ) # Hmm... removing a row is a little painful because I index by keys in columnar format. 
box2 = box/2 CoordX = self.star[b'data_'][b'CoordinateX'] CoordY = self.star[b'data_'][b'CoordinateY'] keepElements = ~((CoordX < box2)|(CoordY < box2)|(CoordX > shapeImage[1]-box2)|(CoordY > shapeImage[0]-box2)) for key, store in self.star[b'data_'].items(): self.star[b'data_'][key] = store[keepElements] print( "Deleted %d"%(partCount-len(self.star[b'data_'][b'CoordinateX']) ) + " particles too close to image edge" ) pass def permissiveMask( self, volumeThres, gaussSigma = 5.0, gaussRethres = 0.07, smoothSigma=1.5 ): """ Given a (tight) volumeThres(hold) measured in Chimera or IMS, this function generates a Gaussian dilated mask that is then smoothed. Everything is done with Gaussian operations so the Fourier space representation of the mask should be relatively smooth as well, and hence ring less. Excepts self.mrc to be loaded. Populates self.mask. """ thres = self.mrc > volumeThres; thres = thres.astype('float32') gaussThres = scipy.ndimage.gaussian_filter( thres, gaussSigma ) rethres = gaussThres > gaussRethres; rethres = rethres.astype('float32') self.mask = scipy.ndimage.gaussian_filter( rethres, smoothSigma ) print( "permissive mask complete, use ioMRC.writeMRC(self.mrc, 'maskname.mrc') to save" ) pass def box2star( self, directory = "." ): """ Converts all EMAN .box files in a directory to the associated .star files. Relion cannot successfully rescale particles if they come in .box format. Also does box pruning if they are too close to an edge. """ boxList = glob.glob( os.path.join( directory, "*.box") ) starHeader = """ data_ loop_ _rlnCoordinateX #1 _rlnCoordinateY #2 """ shapeImage = [3838,3710] for boxFile in boxList: print( "Loading %s" % boxFile ) boxData = np.loadtxt(boxFile) xCoord = boxData[:,0] yCoord = boxData[:,1] boxX = boxData[:,2]/2 boxY = boxData[:,3]/2 keepElements = ~((xCoord < boxX)|(yCoord < boxY)|(xCoord > shapeImage[1]-boxX)|(yCoord> shapeImage[0]-boxY)) xCoord = xCoord[keepElements] yCoord = yCoord[keepElements] boxX = boxX[keepElements] boxY = boxY[keepElements] starFilename = os.path.splitext( boxFile )[0] + ".star" with open( starFilename, 'wb' ) as sh: sh.writelines( starHeader ) for J in np.arange(0,len(xCoord)): sh.write( "%.1f %.1f\n" % (xCoord[J]+boxX[J], yCoord[J]+boxY[J] ) ) sh.write( "\n" ) sh.close() def regroupKmeans( self, partPerGroup = 100, miniBatch=True ): """ Does a 3-D k-means clustering on DefocusU, DefocusV, and GroupScaleCorrection partPerGroup is a suggestion, that is the number of groups is the # of particles / partPerGroup, so outlier groups will tend to have far fewer particle counts that those in the bulk of the data. 
miniBatch=True is faster for very large sets (>100,000 particles), but somewhat less accurate miniBatch=False is faster for smaller sets, and better overall """ # K-means clustering import sklearn import sklearn.cluster # We need to make an array for all particles that has the GroupScaleCorrection P = len( self.star[b'data_'][b'DefocusU'] ) n_clusters = np.int( P / partPerGroup ) DefocusU = self.star[b'data_'][b'DefocusU'] DefocusV = self.star[b'data_'][b'DefocusV'] DefocusMean = 0.5* (DefocusU + DefocusV) if b'data_model_groups' in self.star: SCALE_CORR_PRESENT = True part_GroupScaleCorrection = np.zeros_like( self.star[b'data_'][b'DefocusU'] ) # Build a GroupScaleCorrection vector for J, groupNr in enumerate( self.star[b'data_'][b'GroupNumber'] ): part_GroupScaleCorrection[J] = self.star[b'data_model_groups'][b'GroupScaleCorrection'][ np.argwhere(self.star[b'data_model_groups'][b'GroupNumber'] == groupNr)[0] ] else: print( "No _model.star loaded, not using scale correction" ) SCALE_CORR_PRESENT = False ################## # K-means clustering: ################## print( "Running K-means clustering analysis for " + str(P) + " particles into " + str(n_clusters) + " clusters" ) t0 = time.time() if bool(miniBatch): print( "TODO: determine number of jobs for K-means" ) k_means = sklearn.cluster.MiniBatchKMeans( n_clusters=n_clusters, init_size=3*n_clusters+1 ) else: k_means = sklearn.cluster.KMeans( n_clusters=n_clusters, n_jobs=12 ) #Kmeans_in = np.vstack( [DefocusMean, part_GroupScaleCorrection]).transpose() if SCALE_CORR_PRESENT: Kmeans_in = np.vstack( [DefocusU,DefocusV, part_GroupScaleCorrection]).transpose() else: Kmeans_in = np.vstack( [DefocusU,DefocusV]).transpose() Kmeans_in = sklearn.preprocessing.robust_scale( Kmeans_in ) k_predict = k_means.fit_predict( Kmeans_in ) t1 = time.time() print( "Cluster analysis finished in (s): " + str(t1-t0) ) if self.verbose >= 2: plt.figure() plt.scatter( DefocusMean, part_GroupScaleCorrection, c=k_predict) plt.xlabel( "Defocus ($\AA$)" ) plt.ylabel( "Group scale correction (a.u.)" ) plt.title("K-means on Defocus") ################## # Save the results in a new particles .star file: ################## # Replace, add one to group number because Relion starts counting from 1 particleKey = b"data_" # Add the GroupName field to the star file self.star[particleKey][b'GroupName'] = [""] * len( self.star[particleKey][b'GroupNumber'] ) for J, groupName in enumerate( k_predict ): self.star[particleKey][b'GroupName'][J] = b'G' + str(groupName + 1) # Build a new group number count groupCount = np.zeros_like( self.star[particleKey][b'GroupNumber'] ) for J in np.arange(0,len(groupCount)): groupCount[J] = np.sum( self.star[particleKey][b'GroupNumber'] == J ) self.star[particleKey][b'GroupNumber'] = groupCount # Recalculate number of particles in each group (ACTUALLY THIS SEEMS NOT NECESSARY) #GroupNr = np.zeros( np.max( k_predict )+1 ) #for J in xrange( np.min( k_predict), np.max( k_predict ) ): # GroupNr[J] = np.sum( k_predict == J ) # pass # #for J in xrange(0, len(rln.star[particleKey]['GroupNumber']) ): # rln.star[particleKey]['GroupNumber'][J] = GroupNr[ k_predict[J] ] def saveDataStar( self, outputName, particleKey = b"data_" ): """ Outputs a relion ..._data.star file that has been pruned, regrouped, etc. to outputName """ if outputName == None: # Need to store input star names, and figure out which was the last loaded particles.star file. 
# [outFront, outExt] = os.path.splitext() raise IOError( "Default filenames for saveDataStar not implemented yet" ) # TODO: more general star file output # Let's just hack this fh = open( outputName, 'wb' ) fh.write( b"\ndata_\n\nloop_\n") # Since we made self.star an OrderedDict we don't need to keep track of index ordering headerKeys = self.star[particleKey].keys() for J, key in enumerate(headerKeys): # print( "Column: " + "_rln" + lookupDict[J+1] + " #" + str(J+1) ) fh.write( b"_rln" + key + " #" + str(J) + "\n") # lCnt = len( headerKeys ) P = len( self.star[particleKey][ self.star[particleKey].keys()[0] ] ) for I in np.arange(0,P): fh.write( b" ") for J, key in enumerate(headerKeys): fh.write( str( self.star[particleKey][key][I] ) ) fh.write( b" " ) fh.write( b"\n" ) fh.close() def saveDataAsPar( self, outputPrefix, N_classes = 1, mag = None, pixelsize=None, particleKey = "data_" ): """ Saves a Relion .star file as a Frealign .par meta-data file. Also goes through all the particles in the Relion .star and generates an appropriate meta-MRC particle file for Frealign. Usage: saveDataAsPar( self, outputPrefix, N_classes = 1, mag = None, pixelsize=None, particleKey = "data_" ) outputPrefix will be appended with "_1_rX.par", where X is the class number. N_classes will generate N classes with random occupancy, or 100.0 % occupancy for one class. mag wil change the Relion magnification to the given integer. pixelsize is also optional. Relion tends to have round-off error in the pixelsize. Use 'relion_stack_create --i particles.star --o forFrealign' to generate the associated mrc file. Also no comment lines are written to the .par file. """ partCount = len( self.star[b'data_'][b'MicrographName'] ) # Need AnglePsi, AngleTilt, and AngleRot if not b'AnglePsi' in self.star[b'data_']: self.star[b'data_'][b'AnglePsi'] = np.zeros( partCount, dtype='float32' ) if not b'AngleTilt' in self.star[b'data_']: self.star['data_'][b'AngleTilt'] = np.zeros( partCount, dtype='float32' ) if not b'AngleRot' in self.star[b'data_']: self.star[b'data_'][b'AngleRot'] = np.zeros( partCount, dtype='float32' ) if not b'OriginY' in self.star[b'data_']: self.star[b'data_'][b'OriginY'] = np.zeros( partCount, dtype='float32' ) if not b'OriginX' in self.star[b'data_']: self.star[b'data_'][b'OriginX'] = np.zeros( partCount, dtype='float32' ) if not b'Magnification' in self.star[b'data_']: self.star[b'data_'][b'Magnification'] = np.zeros( partCount, dtype='float32' ) if not b'GroupNumber' in self.star[b'data_']: self.star[b'data_'][b'GroupNumber'] = np.zeros( partCount, dtype='uint16' ) if not b'DefocusU' in self.star[b'data_']: self.star[b'data_'][b'DefocusU'] = np.zeros( partCount, dtype='float32' ) if not b'DefocusV' in self.star[b'data_']: self.star[b'data_'][b'DefocusV'] = np.zeros( partCount, dtype='float32' ) if not b'DefocusAngle' in self.star[b'data_']: self.star[b'data_'][b'DefocusAngle'] = np.zeros( partCount, dtype='float32' ) # Frealign expects shifts in Angstroms. 
Pixelsize is sort of sloppily # kept track of in Relion with Magnification and DetectorPixelSize (which # defaults to 14.0) if pixelsize == None: # Detector pixel size in um, we need pixelsize in Angstrom pixelsize = self.star[b'data_'][b'DetectorPixelSize'][0]*1E4 / self.star[b'data_'][b'Magnification'][0] print( "Found pixelsize of %0.f" % pixelsize ) if mag == None: print( "Using Relion magnification of %.f and DSTEP=%.1f" % ( self.star[b'data_'][b'Magnification'][0], self.star[b'data_'][b'DetectorPixelSize'][0]) ) print( "For a K2 (DSTEP=5.0) the appropriate magnification would be %0.f" % 50000/pixelsize ) else: self.star[b'data_'][b'Magnification'] = mag * np.ones_like( self.star[b'data_'][b'Magnification'] ) logP = int(-500) sigma = 1.0 score = 20.0 change = 0.0 for K in np.arange( 1, N_classes+1 ): outputName = outputPrefix + "_1_r%d.par" % K if N_classes > 1: # Add random occupancy occupancy = np.random.uniform( low=0.0, high=100.0, size=len(self.star[b'data_'][b'DefocusU']) ) else: occupancy = 100.0* np.ones_like( self.star[b'data_'][b'DefocusU'] ) with open( outputName, 'w' ) as fh: # Frealign is very picky about the number of digits, see card10.f, line 163 #READ(LINE,*,ERR=99,IOSTAT=CNT)ILIST(NANG), # + PSI(NANG),THETA(NANG),PHI(NANG),SHX(NANG), # + SHY(NANG),ABSMAGP,FILM(NANG),DFMID1(NANG), # + DFMID2(NANG),ANGAST(NANG),OCC(NANG), # + LGP,SIG(NANG),PRESA(NANG) #7011 FORMAT(I7,3F8.2,2F10.2,I8,I6,2F9.1,2F8.2,I10,F11.4, # + F8.2) for J in np.arange(partCount): fh.write( "%7d"%(J+1) + " %8.2f"%self.star[b'data_'][b'AnglePsi'][J] + " %8.2f"%self.star[b'data_'][b'AngleTilt'][J] + " %8.2f"%self.star[b'data_'][b'AngleRot'][J] + " %8.2f"%(self.star[b'data_'][b'OriginX'][J] * pixelsize) + " %8.2f"%(self.star[b'data_'][b'OriginY'][J] * pixelsize) + " %8.0f"%self.star[b'data_'][b'Magnification'][J] + " %6d"%self.star[b'data_'][b'GroupNumber'][J] + " %9.1f"%self.star[b'data_'][b'DefocusU'][J] + " %9.1f"%self.star[b'data_'][b'DefocusV'][J] + " %8.2f"%self.star[b'data_'][b'DefocusAngle'][J] + " %8.2f"%occupancy[J] + " %10d"%logP + " %11.4f"%sigma + " %8.2f"%score + " %8.2f"%change + "\n") pass # Ok and now we need to make a giant particles file? #mrcName, _= os.path.splitext( outputName ) #mrcName = mrcName + ".mrc" #imageNames = np.zeros_like( self.star[b'data_'][b'ImageName'] ) #for J, name in enumerate( self.star[b'data_'][b'ImageName'] ): # imageNames[J] = name.split('@')[1] #uniqueNames = np.unique( imageNames ) # Ordering is preserved, thankfully! # It would be much better if we could write to a memory-mapped file rather than building the entire array in memory # However this is a little buggy in numpy. # https://docs.python.org/2/library/mmap.html instead? #particleList = [] #for uniqueName in uniqueNames: # particleList.extend( ioMRC.readMRC(uniqueName)[0] ) #print( "DONE building particle list!" ) #print( len(particleList) ) #particleArray = np.array( particleList ) # del particleList # We do have the shape parameter that we can pass in to pre-pad the array with all zeros. #ioMRC.writeMRC( particleArray, mrcName, shape=None ) # TODO: no pixelsize pass def saveCtfImagesStar( self, outputName, zorroList = "*.dm4.log", physicalPixelSize=5.0, amplitudeContrast=0.08 ): """ Given a glob pattern, generate a list of zorro logs, or alternatively one can pass in a list. For each zorro log, load it, extract the pertinant info (defocus, etc.). This is a file ready for particle extraction, with imbedded Ctf information. 
""" import zorro zorroList = glob.glob( zorroList ) headerDict = { b'MicrographName':1, b'CtfImage':2, b'DefocusU':3, b'DefocusV':4, b'DefocusAngle':5, b'Voltage':6, b'SphericalAberration':7, b'AmplitudeContrast':8, b'Magnification':9, b'DetectorPixelSize':10, b'CtfFigureOfMerit': 11 } lookupDict = dict( zip( headerDict.values(), headerDict.keys() ) ) data = OrderedDict() for header in headerDict: data[header] = [None]*len(zorroList) zorroReg = zorro.ImageRegistrator() for J, zorroLog in enumerate(zorroList): zorroReg.loadConfig( zorroLog, loadData=False ) data[b'MicrographName'][J] = zorroReg.files['sum'] data[b'CtfImage'][J] = os.path.splitext( zorroReg.files['sum'] )[0] + ".ctf:mrc" # CTF4Results = [Micrograph number, DF1, DF2, Azimuth, Additional Phase shift, CC, max spacing fit-to] data[b'DefocusU'][J] = zorroReg.CTF4Results[1] data[b'DefocusV'][J] = zorroReg.CTF4Results[2] data[b'DefocusAngle'][J] = zorroReg.CTF4Results[3] data[b'CtfFigureOfMerit'][J] = zorroReg.CTF4Results[5] data[b'Voltage'][J] = zorroReg.voltage data[b'SphericalAberration'][J] = zorroReg.C3 data[b'AmplitudeContrast'][J] = amplitudeContrast data[b'DetectorPixelSize'][J] = physicalPixelSize data[b'Magnification'][J] = physicalPixelSize / (zorroReg.pixelsize * 1E-3) with open( outputName, 'wb' ) as fh: fh.write( b"\ndata_\n\nloop_\n") for J in np.sort(lookupDict.keys()): # print( "Column: " + "_rln" + lookupDict[J+1] + " #" + str(J+1) ) fh.write( b"_rln" + lookupDict[J] + b" #" + str(J) + b"\n") lCnt = len( lookupDict ) for I in np.arange(0,len(zorroList)): fh.write( b" ") for J in np.arange(0,lCnt): fh.write( str( data[lookupDict[J+1]][I] ) ) fh.write( b" " ) fh.write( b"\n" ) def gctfHistogramFilter( self, defocusThreshold = 40000, astigThreshold = 800, fomThreshold = 0.0, resThreshold = 6.0, starName = "micrographs_all_gctf.star", outName = "micrographs_pruned_gctf.star" ): """ gctfHistogramFilter( self, defocusThreshold = 40000, astigThreshold = 800, fomThreshold = 0.0, resThreshold = 6.0, starName = "micrographs_all_gctf.star", outName = "micrographs_pruned_gctf.star" ) Calculates histograms of defocus, astigmatism, figure-of-merit (Pearson correlation coefficient), and resolution limit, and applies the thresholds as specified in the keyword arguments. Plots are generated showing the threshold level. The output star file `outName` rejects all micrographs that fail any of the thresholds. 
""" self.load( starName ) defocusU = self.star['data_']['DefocusU'] defocusV = self.star['data_']['DefocusV'] finalResolution = self.star['data_']['FinalResolution'] ctfFoM = self.star['data_']['CtfFigureOfMerit'] defocusMean = 0.5 * defocusU + 0.5 * defocusV astig = np.abs( defocusU - defocusV ) [hDefocus, cDefocus] = np.histogram( defocusMean, bins=np.arange(np.min(defocusMean),np.max(defocusMean),500.0) ) hDefocus = hDefocus.astype('float32') cDefocus = cDefocus[:-1] +500.0/2 [hAstig, cAstig] = np.histogram( astig, bins=np.arange(0, np.max(astig), 500.0) ) hAstig = hAstig.astype('float32') cAstig = cAstig[:-1] +500.0/2 [hFoM, cFoM] = np.histogram( ctfFoM, bins=np.arange(0.0,np.max(ctfFoM),0.002) ) hFoM = hFoM.astype('float32') cFoM = cFoM[:-1] +0.002/2.0 [hRes, cRes] = np.histogram( finalResolution, bins=np.arange(np.min(finalResolution),np.max(finalResolution),0.20) ) hRes = hRes.astype('float32') cRes = cRes[:-1] +0.20/2.0 plt.figure() plt.fill_between( cDefocus, hDefocus, np.zeros(len(hDefocus)), facecolor='steelblue', alpha=0.5 ) plt.plot( [defocusThreshold, defocusThreshold], [0, np.max(hDefocus)], "--", color='firebrick' ) plt.xlabel( "Defocus, $C_1 (\AA)$" ) plt.ylabel( "Histogram counts" ) plt.figure() plt.fill_between( cAstig, hAstig, np.zeros(len(hAstig)), facecolor='forestgreen', alpha=0.5 ) plt.plot( [astigThreshold, astigThreshold], [0, np.max(hAstig)], "--", color='firebrick' ) plt.xlabel( "Astigmatism, $A_1 (\AA)$" ) plt.ylabel( "Histogram counts" ) plt.figure() plt.fill_between( cFoM, hFoM, np.zeros(len(hFoM)), facecolor='darkorange', alpha=0.5 ) plt.plot( [fomThreshold, fomThreshold], [0, np.max(hFoM)], "--", color='firebrick' ) plt.xlabel( "Figure of Merit, $R^2$" ) plt.ylabel( "Histogram counts" ) plt.figure() plt.fill_between( cRes, hRes, np.zeros(len(hRes)), facecolor='purple', alpha=0.5 ) plt.plot( [resThreshold, resThreshold], [0, np.max(hRes)], "--", color='firebrick' ) plt.xlabel( "Fitted Resolution, $r (\AA)$" ) plt.ylabel( "Histogram counts" ) #keepIndices = np.ones( len(defocusU), dtype='bool' ) keepIndices = ( ( defocusMean < defocusThreshold) & (astig < astigThreshold) & (ctfFoM > fomThreshold ) & (finalResolution < resThreshold) ) print( "KEEPING %d of %d micrographs" %(np.sum(keepIndices), defocusU.size) ) for key in self.star['data_']: self.star['data_'][key] = self.star['data_'][key][keepIndices] self.saveDataStar( outName ) def __loadPar( self, parname ): """ Frealign files normally have 16 columns, with any number of comment lines that start with 'C' """ # Ergh, cannot have trailing comments with np.loadtxt? self.parCol = [b"N", b"PSI", b"THETA", b"PHI", b"SHX", b"SHY", b"MAG", b"FILM", b"DF1", b"DF2", \ b"ANGAST", b"OCC", b"LogP", b"SIGMA", b"SCORE", b"CHANGE" ] self.par = pandas.read_table( parname, engine='c', sep=' ', header=None, names =self.parCol, quotechar='C' ) #self.par.append( np.loadtxt( parname, comments=b'C' ) ) # TODO: split into a dictionary? # TODO: read comments as well # TODO: use pandas instead? 
#self.parCol = {b"N":0, b"PSI":1, b"THETA":2, b"PHI":3, b"SHX":4, b"SHY":5, b"MAG":6, b"FILM":7, b"DF1":8, b"DF2":9, # b"ANGAST":10, b"OCC":11, b"LogP":12, b"SIGMA":13, b"SCORE":14, b"CHANGE":15 } #self.parComments = np.loadtxt( parname, comments=b' ' ) def __loadStar( self, starname ): with open( starname, 'rb' ) as starFile: starLines = starFile.readlines() # Remove any lines that are blank blankLines = [I for I, line in enumerate(starLines) if ( line == "\n" or line == " \n") ] for blank in sorted( blankLines, reverse=True ): del starLines[blank] # Top-level keys all start with data_ headerTags = []; headerIndices = [] for J, line in enumerate(starLines): if line.startswith( b"data_" ): # New headerTag headerTags.append( line.strip() ) headerIndices.append( J ) # for end-of-file headerIndices.append(-1) # Build dict keys for K, tag in enumerate( headerTags ): self.star[tag] = OrderedDict() # Read in _rln lines and assign them as dict keys lastHeaderIndex = 0 foundLoop = False if headerIndices[K+1] == -1: #-1 is not end of the array for indexing slicedLines = starLines[headerIndices[K]:] else: slicedLines = starLines[headerIndices[K]:headerIndices[K+1]] for J, line in enumerate( slicedLines ): if line.startswith( b"loop_" ): foundLoop = True elif line.startswith( b"_rln" ): lastHeaderIndex = J # Find all the keys that start with _rln, they are sub-dict keys newKey = line.split()[0][4:] try: newValue = line.split()[1] # If newValue starts with a #, strip it newValue = newValue.lstrip( b'#' ) except: # Some really old Relion star files don't have the column numbers, so assume it's ordered newValue = J # Try to make newValue an int or float, or leave it as a string if that fails try: self.star[tag][newKey] = np.int( newValue ) except: try: self.star[tag][newKey] = np.float( newValue ) except: # leave as a string self.star[tag][newKey] = newValue # Now run again starting at lastHeaderIndex if foundLoop: # Need to check to make sure it's not an empty dict if self.star[tag] == OrderedDict(): continue # Sometimes we have an empty line on the end. for J in np.arange(len(slicedLines)-1,0,-1): if bool( slicedLines[J].strip() ): break slicedLines = slicedLines[:J] endIndex = len(slicedLines) # Reverse sub-dictionary so we can determine by which column goes to which key lookup = dict( zip( self.star[tag].values(), self.star[tag].keys() ) ) print( "DEBUG: lookup = %s" % lookup ) # Pre-allocate, we can determine types later. 
                itemCount = endIndex - lastHeaderIndex - 1
                testSplit = slicedLines[lastHeaderIndex+1].split()
                for K, test in enumerate( testSplit ):
                    self.star[tag][lookup[K+1]] = [None] * itemCount

                # Loop through and parse items
                for J, line in enumerate( slicedLines[lastHeaderIndex+1:endIndex] ):
                    for K, item in enumerate( line.split() ):
                        self.star[tag][lookup[K+1]][J] = item
                pass

                # Try to convert to int, then float, otherwise leave as a string
                for key in self.star[tag].keys():
                    try:
                        self.star[tag][key] = np.asarray( self.star[tag][key], dtype='int' )
                    except:
                        try:
                            self.star[tag][key] = np.asarray( self.star[tag][key], dtype='float' )
                        except: # leave as a string
                            self.star[tag][key] = np.asarray( self.star[tag][key] )
        pass

    def __loadMRC( self, mrcname ):
        mrcimage, mrcheader = ioMRC.readMRC( mrcname, pixelunits=u'nm' )
        self.mrc.append( mrcimage )
        self.mrc_header.append( mrcheader )

    def __loadBox( self, boxname ):
        self.box.append( np.loadtxt( boxname ) )

# End of relion class

===== mrcz-0.5.7/mrcz/__init__.py =====

from mrcz.ioMRC import (readMRC, writeMRC, asyncReadMRC, asyncWriteMRC,
                        _setAsyncWorkers, _asyncExecutor, setDefaultThreads)
from mrcz.ioDM import readDM4, asyncReadDM4
from mrcz.__version__ import __version__
from mrcz.test_mrcz import test

===== mrcz-0.5.7/mrcz/__version__.py =====

__version__ = "0.5.7"

===== mrcz-0.5.7/mrcz/ioDM.py =====

# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import, unicode_literals

import sys
import numpy as np

# Executor for reading DM4 files in a background thread
try:
    from concurrent.futures import ThreadPoolExecutor
except ImportError as e:
    if sys.version_info < (3,0):
        raise ImportError('Get the backport for `concurrent.futures` for Py2.7 as `pip install futures`')
    raise e

from mrcz.__version__ import __version__

_asyncExecutor = ThreadPoolExecutor(max_workers = 1)

class DMImage(object):
    """
    This structure-like class is for storing 2D image or 3D image stack
    information, since Digital Micrograph likes to have more than one in a
    file. Stores Calibrations, Data, and ImageTags.

    The NumPy array reference is stored in `imageData` and the meta-data is
    stored in a dictionary `imageInfo`.
    """

    def __init__( self ):
        self.imageInfo = {} # Calibrations, tags, all in a dictionary
        self.shape = [] # necessary for unraveling
        self.imageData = np.zeros(1)

class readDM4(object):
    """
    A fast DM4 file reader to strip the data out of the large movie-mode files
    generated by K2 detectors along with important tags. Due to the emphasis
    on speed it's not a general file parser with dynamic allocation, so if
    Gatan changes the format a lot it will break the script.

    Parameters
    ----------
    filename: str
        the path to the DM4 file to read.
    verbose: bool
        whether to output debugging info or not.

    Example
    -------
    dm4struct = readDM4( filename, verbose=False)

    dm4struct holds the (multiple) images in the DM4. Typically the zeroth
    image is the thumbnail. To convert to MRCZ::

        imageData = dm4struct.im[1].imageData

    is a NumPy array containing image intensities::

        imageInfo = dm4struct.im[1].imageInfo

    is a dict containing meta-data.
Note ---- Chris Boothroyd provides some information on the structure of .DM4 files here: http://www.er-c.org/cbb/info/dmformat/ The directory structure of the file is tracked by passing the path of the parent to each tag parser. Gatan likes to have empty tag directories, so they are auto-numbered when present. One can examine file structure by setting ``verbose=True``, if you need to add some more tags to the dictionary generated by the parseTag function, for example. """ def __init__(self, filename, verbose=False): self.f = open(filename, 'rb') if verbose: print("Opening DM4 file: %s" % filename) # Instatiate class fields self._unnamedIndices = np.zeros( 20, dtype='int' ) # maximum number of nested tags self.im = [] # list of DMImage class objects self.verbose = verbose # version = struct.unpack( '>I', f.read(4) ) version = np.fromfile( self.f, dtype='>i4', count=1 )[0] if( version != 4): print( "Warning: Only DM4 is supported, will probably crash spectacularly" ) np.fromfile( self.f, dtype='>i8', count=1 )[0] # rootlen byteord = np.fromfile( self.f, dtype='>i4', count=1 )[0] # 1 = little endian, generally it always is if byteord != 1: print( "Error: only little endian data ordering supported at present" ) return np.fromfile( self.f, dtype='i1', count=1 )[0] # 1 = sorted np.fromfile( self.f, dtype='i1', count=1 )[0] # 1 = open structure ntags_root = np.fromfile( self.f, dtype='>i8', count=1 )[0] # number of tags if self.verbose: print( "DM version %d, byte order %d" %(version, byteord) ) # Now the tags start for J in np.arange( 0, ntags_root ): tag_type = np.fromfile( self.f, dtype='i1', count=1 )[0] # 20 = tag directory, 21 = tag base, 0 == EOF #print( "DEBUG: root tag: %d" % tag_type ) if( tag_type == 20 ): loc_tagstart = self.f.tell() tag_namelen = np.fromfile( self.f, dtype='>i2', count=1 )[0] if tag_namelen > 0: tag_name = self.f.read( tag_namelen ) else: tag_name = b"" tag_fieldlen = np.fromfile( self.f, dtype='>i8', count=1 )[0] #print( "rootTag: tag_name: %s has length %d and field length %d" %(tag_name, tag_namelen, tag_fieldlen )) if tag_name == b"ImageList": self.f.seek( loc_tagstart ) # So each image is an unnamed tag directory, composed of /ImageData and /ImageTags and /UniqueID self.parseTagDir( b"/" ) # We currently do not care about any tag but ImageList break else: # Go to next tag/tagdir self.f.seek( self.f.tell() + tag_fieldlen ) elif( tag_type == 21 ): # throw away tags in root because we don't care for them self.discardTag() elif( tag_type == 0 ): break # EOF pass self.f.close() # Unravel images for mage in self.im: # FYI: Gatan stores image stacks [z,y,x] which is opposite to how they store the dimensions (but is Numpy and C convention) mage.shape = np.flipud( np.array( mage.shape ) ) # This y-flip [:,::-1,...] puts us on the same origin-standard for data as MRC # Gatan uses a different standard for the origin location inside DM4 files. mage.imageData = np.reshape( mage.imageData, mage.shape, order='C' )[:,::-1,...] # Image data is now compatible with MRCs saved in GMS, IMOD, EMAN2, etc. 
# Clean up anything else def parseTag( self, parent ): """ Parse a tag at the given file handle location """ tag_namelen = np.fromfile( self.f, dtype='>i2', count=1 )[0] if tag_namelen > 0: # tag_name = np.fromfile( self.f, dtype='S%d'% tag_namelen, count=1 )[0] tag_name = self.f.read( tag_namelen ) else: tag_name = b"" tag_fieldlen = np.fromfile( self.f, dtype='>i8', count=1 )[0] # print( "parseTag: tag_name: %s has length %d and field length %d" %(tag_name, tag_namelen, tag_fieldlen )) loc_tag = self.f.tell() # Save location so we can seek to end of tag later self.f.read(4) # Throw away %%%% seperator tag_ninfo = np.fromfile( self.f, dtype='>i8', count=1 )[0] if self.verbose: print( "Found tag: %s%s with %d elements" % (parent, tag_name, tag_ninfo) ) split_tag_name = (parent + tag_name).split( b'/' ) # Skip ImageList imageIndex = np.int64( split_tag_name[2] ) # Check to see if self.im[imageIndex] exists try: self.im[imageIndex] except IndexError: self.im.append( DMImage() ) if split_tag_name[3] == b'ImageData': dimCount = 0 if split_tag_name[4] == b'Calibrations': # No calibrations saved at present # /ImageList/1/ImageData/Calibrations/Brightness/Origin # /ImageList/1/ImageData/Calibrations/Brightness/Scale # /ImageList/1/ImageData/Calibrations/Brightness/Units # /ImageList/1/ImageData/Calibrations/Dimension/[0-2]/Origin # /ImageList/1/ImageData/Calibrations/Dimension/[0-2]/Scale # /ImageList/1/ImageData/Calibrations/Dimension/[0-2]/Units if split_tag_name[5] == b'Brightness': if split_tag_name[6] == b'Origin': self.im[imageIndex].imageInfo['IntensityOrigin'] = self.retrieveTagData( tag_ninfo ) elif split_tag_name[6] == b'Scale': self.im[imageIndex].imageInfo['IntensityScale'] = self.retrieveTagData( tag_ninfo ) elif split_tag_name[6] == b'Units': # unicode_array = self.retrieveTagData( f, tag_ninfo ) # self.im[imageIndex].imageInfo['IntensityUnits'] = "".join([chr(item) for item in unicode_array]) self.im[imageIndex].imageInfo['IntensityUnits'] = self.retrieveTagData( tag_ninfo ).astype(np.uint8).tostring().decode('utf-8') elif split_tag_name[5] == b'Dimension': # 0 = x, 1 = y, 2 = z for split_tag_name[6] #print( split_tag_name[6] ) #print( np.frombuffer( split_tag_name[6], dtype='>i8' ) ) #print( np.frombuffer( split_tag_name[6], dtype=' 0 ): return np.fromfile(self.f, dtype=array_dtype, count=array_ncount) except IndexError: pass pass elif tag_infos[K] == 9: # string tag_char = self.f.read(1) if self.verbose: print( "Found char: " + tag_char ) pass elif tag_infos[K] == 18: # string # str_len = if self.verbose: print( "FIXME Found string" ) pass else: # singleton tag_dtype = self.getTagDType( tag_infos[K] ) if tag_dtype != '': tag_data = np.fromfile( self.f, dtype = tag_dtype, count=1 )[0] # if self.verbose: print( "Singleton: " + tag_dtype + ": " + str(tag_data) ): if self.verbose: print( "Singleton: " + tag_dtype + ": " + str(tag_data) ) return tag_data pass def discardTag(self): """ Quickly parse to the end of tag that we don't care about its information """ tag_namelen = np.fromfile( self.f, dtype='>i2', count=1 )[0] self.f.seek( self.f.tell() + tag_namelen ) tag_fieldlen = np.fromfile( self.f, dtype='>i8', count=1 )[0] self.f.seek( self.f.tell() + tag_fieldlen ) return def parseTagDir(self, parent): """ Parse a tag directory at the given file handle location """ try: tag_namelen = np.fromfile( self.f, dtype='>i2', count=1 )[0] if tag_namelen > 0: tag_name = self.f.read( tag_namelen ) # tag_name = np.fromfile( self.f, dtype='S%d'%tag_namelen, count=1 )[0] else: tag_name = b"" 
            tag_fieldlen = np.fromfile( self.f, dtype='>i8', count=1 )[0]
            # print( "parseTagDir: tag_name: %s has length %d and field length %d" %(tag_name, tag_namelen, tag_fieldlen ))
        except IndexError:
            if self.verbose:
                print( "Caught IndexError, trying to recover position in file" )
            # f.seek( f.tell() )
            return self.f

        # Handle empty tag_name by giving it an auto-generated number
        if not bool(tag_name):
            if( self.verbose ):
                print( "Found empty tag" )
            tag_depth = int( parent.count( b'/') - 1 )
            # This is kind of mental gymnastics to maintain portability between Python 2 and 3
            tag_name = bytearray( str(self._unnamedIndices[ tag_depth ]).encode('ascii') )
            # print( "Gymnastics " + str(self._unnamedIndices[ tag_depth ]) + " -> " + str(tag_name) )
            self._unnamedIndices[ tag_depth ] += 1
            # Reset all higher indices
            self._unnamedIndices[ tag_depth+1:] = 0

        loc_tagdir = self.f.tell()
        self.f.read( 2 ) # Throw away sorted and closed
        ntags_dir = np.fromfile( self.f, dtype='>i8', count=1 )[0]
        if self.verbose:
            print( "Found tag dir " + str(parent) + str(tag_name) + " with " + str(ntags_dir) + " tags" )

        # So typically ImageList has two empty name tag directories, of which one is various stuff and the other is the data
        for I in np.arange(0, ntags_dir ):
            try:
                subtag_type = np.fromfile( self.f, dtype='i1', count=1 )[0] # 20 = tag directory, 21 = tag base, 0 == EOF
            except IndexError:
                if self.verbose:
                    print( "Caught IndexError, trying to recover" )
                break

            if( subtag_type == 20 ):
                self.parseTagDir( bytes(parent) + bytes(tag_name) + b"/" )
            elif( subtag_type == 21 ):
                self.parseTag( bytes(parent) + bytes(tag_name) + b"/" )
            elif( subtag_type == 0 ):
                break # EOF

        # Go to next tag in root directory
        self.f.seek( loc_tagdir + tag_fieldlen )
        return

def asyncReadDM4(*args, **kwargs):
    '''
    Calls `readDM4` in a separate thread and executes it in the background.

    Parameters
    ----------
    Valid arguments are as for `readDM4()`.

    Returns
    -------
    future
        A ``concurrent.futures.Future()`` object. Calling ``future.result()``
        will halt until the read is finished and returns the `readDM4` object
        as per a normal, synchronous call.

    Example
    -------
    worker = asyncReadDM4( 'someones_file.dm4' )
    # Do some work
    dm4struct = worker.result()
    '''
    return _asyncExecutor.submit(readDM4, *args, **kwargs)

===== mrcz-0.5.7/mrcz/ioMRC.py =====

from __future__ import division, print_function, absolute_import, unicode_literals
'''
Conventional MRC2014 and on-the-fly compressed MRCZ file interface

CCPEM MRC2014 specification:
    http://www.ccpem.ac.uk/mrc_format/mrc2014.php

IMOD specification:
    http://bio3d.colorado.edu/imod/doc/mrc_format.txt

Testing:
    http://emportal.nysbc.org/mrc2014/

Tested output on: Gatan GMS, IMOD, Chimera, Relion, MotionCorr, UnBlur
'''

import os, os.path, sys
import numpy as np
import threading
import struct
from enum import Enum
try:
    from concurrent.futures import ThreadPoolExecutor
except ImportError as e:
    if sys.version_info < (3,0):
        raise ImportError('Get the backport for `concurrent.futures` for Py2.7 as `pip install futures`')
    raise e

from mrcz.__version__ import __version__
from distutils.version import StrictVersion

import logging
logger = logging.getLogger('MRCZ')

try:
    import blosc
    BLOSC_PRESENT = True
    # For async operations we want to release the GIL in blosc operations and
    # file IO operations.
blosc.set_releasegil(True) DEFAULT_N_THREADS = blosc.detect_number_of_cores() except ImportError: # Can be ImportError or ModuleNotFoundError depending on the Python version, # but ModuleNotFoundError is a child of ImportError and is still caught. BLOSC_PRESENT = False logger.info('`blosc` meta-compression library not found, file compression disabled.') DEFAULT_N_THREADS = 1 try: import rapidjson as json except ImportError: import json logger.info('`python-rapidjson` not found, using builtin `json` instead.') def _defaultMetaSerialize(value): """ Is called by `json.dumps()` whenever it encounters an object it does not know how to serialize. Currently handles: 1. Any object with a `serialize()` method, which is assumed to be a helper method. 1. `numpy` scalars as well as `ndarray` 2. Python `Enum` objects which are serialized as strings in the form ``'Enum.{object.__class__.__name__}.{object.name}'``. E.g. ``'Enum.Axis.X'``. """ if hasattr(value, 'serialize'): return value.serialize() elif hasattr(value, '__array_interface__'): # Checking for '__array_interface__' also sanitizes numpy scalars # like np.float32 or np.int32 return value.tolist() elif isinstance(value, Enum): return 'Enum.{}.{}'.format(value.__class__.__name__, value.name) else: raise TypeError('Unhandled type for JSON serialization: {}'.format(type(value))) # Do we also want to convert long lists to np.ndarrays? # Buffer for file I/O # Quite arbitrary, in bytes (hand-optimized) BUFFERSIZE = 2**20 BLOSC_BLOCK = 2**16 DEFAULT_HEADER_LEN = 1024 # ENUM dicts for our various Python to MRC constant conversions COMPRESSOR_ENUM = {0:None, 1:'blosclz', 2:'lz4', 3:'lz4hc', 4:'snappy', 5:'zlib', 6:'zstd'} REVERSE_COMPRESSOR_ENUM = {None:0, 'blosclz':1, 'lz4':2, 'lz4hc':2, 'snappy':4, 'zlib':5, 'zstd':6} MRC_COMP_RATIO = 1000 CCPEM_ENUM = {0: 'i1', 1:'i2', 2:'f4', 4:'c8', 6:'u2', 7:'i4', 8:'u4', 101:'u1'} EMAN2_ENUM = {1: 'i1', 2:'u1', 3:'i2', 4:'u2', 5:'i4', 6:'u4', 7:'f4'} REVERSE_CCPEM_ENUM = {'int8':0, 'i1':0, 'uint4':101, 'int16':1, 'i2':1, 'uint16':6, 'u2':6, 'int32':7, 'i4':7, 'uint32': 8, 'u4':8, 'float64':2, 'f8':2, 'float32':2, 'f4':2, 'complex128':4, 'c16':4, 'complex64':4, 'c8':4} WARNED_ABOUT_CASTING_F64 = False WARNED_ABOUT_CASTING_C128 = False # Executor for writing compressed blocks to disk _asyncWriter = ThreadPoolExecutor(max_workers = 1) # Executor for calls to asyncReadMRC and asyncWriteMRC _asyncExecutor = ThreadPoolExecutor(max_workers = 1) def _setAsyncWorkers(N_workers): ''' **This function is protected as there appears to be little value in using more than one worker. It may be subject to removal in the future.** Sets the maximum number of background workers that can be used for reading or writing with the functions. Defaults to 1. Generally when writing to hard drives use 1 worker. For random-access drives there may be advantages to using multiple workers. 
    Some test results, 30 files, each 10 x 2048 x 2048 x float-32, on a CPU
    with 4 physical cores:

        HD,  1 worker:  42.0 s
        HD,  2 workers: 50.0 s
        HD,  4 workers: 50.4 s
        SSD, 1 worker:  12.6 s
        SSD, 2 workers: 11.6 s
        SSD, 4 workers:  8.9 s
        SSD, 8 workers: 16.9 s

    Parameters
    ----------
    N_workers: int
        The number of threads for asynchronous reading and writing to disk
    '''
    if N_workers <= 0:
        raise ValueError('N_workers must be greater than 0')
    if _asyncExecutor._max_workers == N_workers:
        return
    _asyncExecutor._max_workers = N_workers
    _asyncExecutor._adjust_thread_count()

def setDefaultThreads(n_threads):
    """
    Set the default number of threads, if the argument is not provided in
    calls to `readMRC` and `writeMRC`. The optimal thread count is generally
    the number of physical cores, but blosc defaults to the number of
    virtual cores. Therefore on machines with hyperthreading it can be more
    efficient to manually set this value.
    """
    global DEFAULT_N_THREADS
    DEFAULT_N_THREADS = int(n_threads)

def defaultHeader():
    '''
    Creates a meta-data header dictionary with the relevant fields.

    Returns
    -------
    header
        a default MRC header dictionary with all fields with default values.
    '''
    header = {}
    header['fileConvention'] = 'ccpem'
    header['endian'] = 'le'
    header['MRCtype'] = 0
    header['dimensions'] = np.array( [0,0,0], dtype=int )
    header['dtype'] = 'u1'
    header['compressor'] = None
    header['packedBytes'] = 0
    header['clevel'] = 1
    header['maxImage'] = 1.0
    header['minImage'] = 0.0
    header['meanImage'] = 0.0
    header['pixelsize'] = 0.1
    header['pixelunits'] = u'nm' # Can be '\\AA' for Angstroms
    header['voltage'] = 300.0 # kV
    header['C3'] = 2.7 # mm
    header['gain'] = 1.0 # counts/electron
    if BLOSC_PRESENT:
        header['n_threads'] = DEFAULT_N_THREADS
    return header

def _getMRCZVersion(label):
    """
    Checks to see if the first label holds the MRCZ version information, in
    which case it returns a version object. Generally used to recover nicely
    in case of backward compatibility problems.

    Parameters
    ----------
    label: Union[str, bytes]

    Returns
    -------
    version: Optional[distutils.version.StrictVersion]
        Returns ``None`` if `label` cannot be parsed.
    """
    if isinstance(label, bytes):
        label = label.decode()
    label = label.rstrip(' \t\r\n\0')
    if not label.startswith('MRCZ'):
        return None
    label = label[4:]
    try:
        version = StrictVersion(label)
        return version
    except ValueError:
        return None

def readMRC(MRCfilename, idx=None, endian='le', pixelunits=u'\\AA',
            fileConvention='ccpem', useMemmap=False, n_threads=None,
            slices=None):
    '''
    Imports an MRC/Z file as a NumPy array and a meta-data dict.

    Parameters
    ----------
    MRCfilename: str
        the name of the MRC/Z file to read.
    idx: Tuple[int]
        Index tuple ``(first, last)`` of the first and last (both inclusive)
        indices of images to be read from the stack. Index of first image is 0.
        Negative indices can be used to count backwards. A singleton integer
        can be provided to read only one image. If omitted, will read whole
        file. Compression is currently not supported with this option.
    pixelunits: str
        can be ``'\\AA'`` (Angstroms), ``'nm'``, ``'\mum'``, or ``'pm'``.
        Internally pixel sizes are always encoded in Angstroms in the MRC file.
    fileConvention: str
        can be ``'ccpem'`` (equivalent to IMOD) or ``'eman2'``, which is only
        partially supported at present.
endian: str can be big-endian as ``'be'`` or little-endian as ``'le'``. Defaults to `'le'` as the vast majority of modern computers are little-endian. n_threads: int is the number of threads to use for decompression, defaults to use all virtual cores. useMemmap: bool = True returns a ``numpy.memmap`` instead of a ``numpy.ndarray``. Not recommended as it will not work with compression. slices: Optional[int] = None Reflects the number of slices per frame. For example, in time-series with multi-channel STEM, would be ``4`` for a 4-quadrant detector. Data is always written contiguously in MRC, but will be returned as a list of ``[slices, *shape]``-shaped arrays. The default option ``None`` will check for a ``'slices'`` field in the meta-data and use that, otherwise it defaults to ``0`` which is one 3D array. Returns ------- image: Union[list[numpy.ndarray], numpy.ndarray] If ``slices == 0`` then a monolithic array is returned, else a ``list`` of ``[slices, *shape]``-shaped arrays. meta: dict the stored meta-data in a dictionary. Note that arrays are generally returned as lists due to the JSON serialization. Example ------- [image, meta] = readMRC(MRCfilename, idx=None, pixelunits=u'\\AA', useMemmap=False, n_threads=None) ''' with open(MRCfilename, 'rb', buffering=BUFFERSIZE) as f: # Read in header as a dict header, slices = readMRCHeader(MRCfilename, slices, endian=endian, fileConvention = fileConvention, pixelunits=pixelunits) # Support for compressed data in MRCZ if ( (header['compressor'] in REVERSE_COMPRESSOR_ENUM) and (REVERSE_COMPRESSOR_ENUM[header['compressor']] > 0) and idx == None ): return __MRCZImport(f, header, slices, endian=endian, fileConvention=fileConvention, n_threads=n_threads) # Else load as uncompressed MRC file if idx != None: # If specific images were requested: # TO DO: add support to read all images within a range at once if header['compressor'] != None: raise RuntimeError('Reading from arbitrary positions not supported for compressed files. Compressor = %s'%header['compressor']) if np.isscalar( idx ): indices = np.array([idx, idx], dtype='int') else: indices = np.array(idx, dtype='int') # Convert to old way: idx = indices[0] n = indices[1] - indices[0] + 1 if idx < 0: # Convert negative index to equivalent positive index: idx = header['dimensions'][0] + idx # Just check if the desired image is within the stack range: if idx < 0 or idx >= header['dimensions'][0]: raise ValueError('Error: image or slice index out of range. idx = %d, z_dimension = %d'%(idx, header['dimensions'][0])) elif idx + n > header['dimensions'][0]: raise ValueError('Error: image or slice index out of range. idx + n = %d, z_dimension = %d'%(idx + n, header['dimensions'][0])) elif n < 1: raise ValueError('Error: n must be >= 1. n = %d'%n) else: # We adjust the dimensions of the returned image in the header: header['dimensions'][0] = n # This offset will be applied to f.seek(): offset = idx * np.prod(header['dimensions'][1:])*np.dtype(header['dtype']).itemsize else: offset = 0 f.seek(DEFAULT_HEADER_LEN + header['extendedBytes'] + offset) if bool(useMemmap): image = np.memmap(f, dtype=header['dtype'], mode='c', shape=tuple(dim for dim in header['dimensions'])) else: # Load entire file into memory dims = header['dimensions'] if slices > 0: # List of NumPy 2D-arrays frame_size = slices * np.prod(dims[1:]) n_frames = dims[0] // slices dtype = header['dtype'] # np.fromfile advances the file pointer `f` for us. 
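            # For example (sketch): a file with dimensions == (8, 256, 256)
            # read with slices == 4 yields n_frames == 2 and returns
            #     [ndarray of shape (4, 256, 256), ndarray of shape (4, 256, 256)]
            # whereas slices == 1 squeezes each frame down to shape (256, 256).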
image = [] for I in range(n_frames): buffer = np.fromfile(f, dtype=dtype, count=frame_size) buffer = buffer.reshape((slices, dims[1], dims[2])).squeeze() image.append(buffer) else: # monolithic NumPy ndarray image = np.fromfile(f, dtype=header['dtype'], count=np.prod(dims)) if header['MRCtype'] == 101: # Seems the 4-bit is interlaced ... interlaced_image = image image = np.empty(np.prod(dims), dtype=header['dtype']) image[0::2] = np.left_shift(interlaced_image,4) / 15 image[1::2] = np.right_shift(interlaced_image,4) image = np.squeeze(image.reshape(dims)) return image, header def __MRCZImport(f, header, slices, endian='le', fileConvention='ccpem', returnHeader=False, n_threads=None): ''' Equivalent to MRCImport, but for compressed data using the blosc library. The following compressors are recommended: [``'zlib'``, ``'zstd'``, ``'lz4'``] Memory mapping is not possible in this case at present. Possibly support can be added for memory mapping with `c-blosc2`. ''' if not BLOSC_PRESENT: raise ImportError( '`blosc` is not installed, cannot decompress file.' ) if n_threads == None: blosc.nthreads = DEFAULT_N_THREADS else: blosc.nthreads = n_threads dims = header['dimensions'] dtype = header['dtype'] if slices > 0: image = [] n_frames = dims[0] // slices else: image = np.empty(dims, dtype=dtype) n_frames = dims[0] if slices > 1: target_shape = (slices, dims[1], dims[2]) else: target_shape = (dims[1], dims[2]) # target_shape = (dims[1], dims[2]) blosc_chunk_pos = DEFAULT_HEADER_LEN + header['extendedBytes'] # NOTE: each channel of each frame is separately compressed by blosc, # so that if slices is not what was originally input, each slice can # be decompressed individually. if slices == 1: # List of 2D frames for J in range(n_frames): f.seek(blosc_chunk_pos) ((nbytes, blockSize, ctbytes), (ver_info)) = readBloscHeader(f) f.seek(blosc_chunk_pos) image.append(np.frombuffer( blosc.decompress(f.read(ctbytes)), dtype=dtype).reshape(target_shape)) blosc_chunk_pos += (ctbytes) elif slices > 1: # List of 3D frames for J in range(n_frames): frame = np.empty(target_shape, dtype=dtype) for I in range(slices): f.seek(blosc_chunk_pos) ((nbytes, blockSize, ctbytes), (ver_info)) = readBloscHeader(f) f.seek(blosc_chunk_pos) frame[I,:,:] = np.frombuffer( blosc.decompress(f.read(ctbytes)), dtype=dtype).reshape(target_shape[1:]) blosc_chunk_pos += (ctbytes) image.append(frame) else: # Monolithic frame for J in range(n_frames): f.seek(blosc_chunk_pos) ((nbytes, blockSize, ctbytes), (ver_info)) = readBloscHeader(f) f.seek(blosc_chunk_pos) image[J,:,:] = np.frombuffer( blosc.decompress(f.read(ctbytes)), dtype=dtype).reshape(target_shape) blosc_chunk_pos += (ctbytes) if header['MRCtype'] == 101: # Seems the 4-bit is interlaced if slices > 0: raise NotImplementedError('MRC type 101 (uint4) not supported with return as `list`') interlaced_image = image image = np.empty(np.prod(header['dimensions']), dtype=dtype) # Bit-shift and Bit-and to seperate decimated pixels image[0::2] = np.left_shift(interlaced_image, 4) / 15 image[1::2] = np.right_shift(interlaced_image, 4) if not slices > 0: image = np.squeeze(image) return image, header def readBloscHeader(filehandle): ''' Reads in the 16 byte header file from a blosc chunk. 
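    A sketch of how a caller can use the returned ``ctbytes`` field to walk a
    file chunk-by-chunk, as `__MRCZImport` does (``pos`` is hypothetical)::

        pos = DEFAULT_HEADER_LEN + header['extendedBytes']
        f.seek(pos)
        (nbytes, blocksize, ctbytes), _ = readBloscHeader(f)
        f.seek(pos)                 # rewind, blosc wants the whole chunk
        frame = blosc.decompress(f.read(ctbytes))
        pos += ctbytes              # the next chunk starts immediately after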
Blosc header format for each chunk is as follows:: |-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-| ^ ^ ^ ^ | nbytes | blocksize | ctbytes | | | | | | | | +--typesize | | +------flags | +----------versionlz +--------------version ''' [version, versionlz, flags, typesize] = np.fromfile(filehandle, dtype='uint8', count=4) [nbytes, blocksize, ctbytes] = np.fromfile(filehandle, dtype='uint32', count=3) return ([nbytes, blocksize, ctbytes], [version, versionlz, flags, typesize]) def readMRCHeader(MRCfilename, slices=None, endian='le', fileConvention = 'ccpem', pixelunits=u'\\AA'): ''' Reads in the first 1024 bytes from an MRC file and parses it into a Python dictionary, yielding header information. This function is not intended to be called by the user under typical usage. Parameters ---------- As per `readMRC` Returns ------- header: dict All found meta-data in the header and extended header packaged into a dictionary. ''' if endian == 'le': endchar = '<' else: endchar = '>' dtype_i4 = np.dtype(endchar + 'i4') dtype_f4 = np.dtype(endchar + 'f4') header = {} with open(MRCfilename, 'rb') as f: # Grab version information early f.seek(224) mrcz_version = _getMRCZVersion(f.read(80)) # diagStr = '' # Get dimensions, in format [nz, ny, nx] (stored as [nx,ny,nz] in the file) f.seek(0) header['dimensions'] = np.flipud(np.fromfile(f, dtype=dtype_i4, count=3)) header['MRCtype'] = int(np.fromfile(f, dtype=dtype_i4, count=1)[0]) # Hack to fix lack of standard endian indication in the file header if header['MRCtype'] > 16000000: # Endianess found to be backward header['MRCtype'] = int(np.asarray(header['MRCtype']).byteswap()[0]) header['dimensions'] = header['dimensions'].byteswap() if endchar == '<': endchar = '>' else: endchar = '<' dtype_i4 = np.dtype(endchar + 'i4') dtype_f4 = np.dtype(endchar + 'f4') # Extract compressor from dtype > MRC_COMP_RATIO header['compressor'] = COMPRESSOR_ENUM[np.floor_divide(header['MRCtype'], MRC_COMP_RATIO)] header['MRCtype'] = np.mod(header['MRCtype'], MRC_COMP_RATIO) logger.debug('compressor: %s, MRCtype: %s' % (str(header['compressor']),str(header['MRCtype']))) fileConvention = fileConvention.lower() if fileConvention == 'eman2': try: header['dtype'] = EMAN2_ENUM[header['MRCtype']] except: raise ValueError('Error: unrecognized EMAN2-MRC data type = ' + str(header['MRCtype'])) elif fileConvention == 'ccpem': # Default is CCPEM try: header['dtype'] = CCPEM_ENUM[header['MRCtype']] except: raise ValueError('Error: unrecognized CCPEM-MRC data type = ' + str(header['MRCtype'])) else: raise ValueError('Error: unrecognized MRC file convention: {}'.format(fileConvention)) # Apply endian-ness to NumPy dtype header['dtype'] = endchar + header['dtype'] # slices is z-axis per frame for list-of-arrays representation if slices is None: # We had a bug in version <= 0.4.1 where we wrote the dimensions # into both (Nx, Ny, Nz) AND (Mx, My, Mz), therefore the slicing # is essentially unknown (and wrong). So we have this version # check where we force slices to be 1 (i.e. we assume it is a # stack of 2D images). 
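        # (For example, a 0.4.x file holding a (6, N, N) stack is returned as
        # six 2D frames, i.e. slices == 1, rather than trusting the corrupted
        # (Mx, My, Mz) words in its header.)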
if mrcz_version is not None and mrcz_version < StrictVersion('0.5.0'): logger.warning('MRCZ version < 0.5.0 for file {}, assuming slices == 1.'.format(MRCfilename)) slices = 1 else: f.seek(36) slices = int(np.fromfile(f, dtype=dtype_i4, count=1)) # Read in pixelsize f.seek(40) cellsize = np.fromfile(f, dtype=dtype_f4, count=3) header['pixelsize'] = np.flipud( cellsize ) / header['dimensions'] # MRC is Angstroms by convention header['pixelunits'] = pixelunits # '\AA' will eventually be deprecated, please cease using it. if header['pixelunits'] == u'\\AA' or header['pixelunits'] == u'\AA': pass elif header['pixelunits'] == u'\mum': header['pixelsize'] *= 1E-5 elif header['pixelunits'] == u'pm': header['pixelsize'] *= 100.0 else: # Default to nm header['pixelsize'] *= 0.1 # Read in [X,Y,Z] array ordering # Currently I don't use this # f.seek(64) # axesTranpose = np.fromfile( f, dtype=endchar + 'i4', count=3 ) - 1 # Read in statistics f.seek(76) header['minImage'], header['maxImage'], header['meanImage'] = np.fromfile(f, dtype=dtype_f4, count=3) # Size of meta-data f.seek(92) header['extendedBytes'] = int(np.fromfile(f, dtype=dtype_i4, count=1)) if header['extendedBytes'] > 0: f.seek(104) header['metaId'] = f.read(4) # Read in kV, C3, and gain f.seek(132) microscope_state = np.fromfile(f, dtype=dtype_f4, count=3) header['voltage'] = float(microscope_state[0]) header['C3'] = float(microscope_state[1]) header['gain'] = float(microscope_state[2]) # Read in size of packed data f.seek(144) # Have to convert to Python int to avoid index warning. header['packedBytes'] = struct.unpack('q', f.read(8)) # Now read in JSON meta-data if present if 'metaId' in header and header['metaId'] == b'json': f.seek(DEFAULT_HEADER_LEN) meta = json.loads(f.read(header['extendedBytes'] ).decode('utf-8')) for key, value in meta.items(): if key not in header: header[key] = value return header, slices def writeMRC(input_image, MRCfilename, meta=None, endian='le', dtype=None, pixelsize=[0.1,0.1,0.1], pixelunits=u'\\AA', shape=None, voltage=0.0, C3=0.0, gain=1.0, compressor=None, clevel=1, n_threads=None, quickStats=True, idx=None): ''' Write a conventional MRC file, or a compressed MRCZ file to disk. If compressor is ``None``, then backwards compatibility with other MRC libraries should be preserved. Other libraries will not, however, recognize the JSON extended meta-data. Parameters ---------- input_image: Union[numpy.ndarray, list[numpy.ndarray]] The image data to write, should be a 1-3 dimension ``numpy.ndarray`` or a list of 2-dimensional ``numpy.ndarray``s. meta: dict will be serialized by JSON and written into the extended header. Note that ``rapidjson`` (the default) or ``json`` (the fallback) cannot serialize all Python objects, so sanitizing ``meta`` to remove non-standard library data structures is advisable, including ``numpy.ndarray`` values. dtype: Union[numpy.dtype, str] will cast the data before writing it. pixelsize: Tuple[x,y,z] is [z,y,x] pixel size (singleton values are ok for square/cubic pixels) pixelunits: str = u'\\AA' one of - ``'\\AA'`` for Angstroms - ``'pm'`` for picometers - ``'\mum'`` for micrometers - ``'nm'`` for nanometers. MRC standard is always Angstroms, so pixelsize is converted internally from nm to Angstroms as needed. shape: Optional[Tuple[int]] is only used if you want to later append to the file, such as merging together Relion particles for Frealign. Not recommended and only present for legacy reasons. 
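        For example (hypothetical values), ``shape=(100, 256, 256)`` would
        declare a 100-frame stack up-front so that frames can be written into
        it later via the `idx` argument.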
voltage: float = 300.0 accelerating potential in keV C3: float = 2.7 spherical aberration in mm gain: float = 1.0 detector gain in units (counts/primary electron) compressor: str = None is a choice of ``None, 'lz4', 'zlib', 'zstd'``, plus ``'blosclz'``, ``'lz4hc'`` - ``'lz4'`` is generally the fastest. - ``'zstd'`` generally gives the best compression performance, and is still almost as fast as 'lz4' with ``clevel == 1``. clevel: int = 1 the compression level, 1 is fastest, 9 is slowest. The compression ratio will rise slowly with clevel (but not as fast as the write time slows down). n_threads: int = None number of threads to use for blosc compression. Defaults to number of virtual cores if ``== None``. quickStats: bool = True estimates the image mean, min, max from the first frame only, which saves computational time for image stacks. Generally strongly advised to be ``True``. idx can be used to write an image or set of images starting at a specific position in the MRC file (which may already exist). Index of first image is 0. A negative index can be used to count backwards. If omitted, will write whole stack to file. If writing to an existing file, compression or extended MRC2014 headers are currently not supported with this option. Returns ------- ``None`` Warning ------- MRC definitions are not consistent. Generally we support the CCPEM2014 schema as much as possible. ''' if not BLOSC_PRESENT and compressor is not None: raise ImportError('`blosc` is not installed, cannot use file compression.') # For dask, we don't want to import dask, but we can still work-around how to # check its type without isinstance() image_type = type(input_image) if image_type.__module__ == 'dask.array.core' and image_type.__name__ == 'Array': # Ideally it would be faster to iterate over the chunks and pass each one # to blosc but that likely requires c-blosc2 input_image = input_image.__array__() dims = input_image.shape slices = 0 global WARNED_ABOUT_CASTING_F64, WARNED_ABOUT_CASTING_C128 if isinstance(input_image, (tuple,list)): shape = input_image[0].shape ndim = input_image[0].ndim if ndim == 3: slices = shape[0] shape = shape[1:] elif ndim == 2: slices = 1 else: raise ValueError('For a sequence of arrays, only 2D or 3D arrays are handled.') dims = np.array([len(input_image)*slices, shape[0], shape[1]]) # Verify that each image in the list is the same 2D shape and dtype first_shape = input_image[0].shape first_dtype = input_image[0].dtype # Cast float64 -> float32, and complex128 -> complex64 for J, z_slice in enumerate(input_image): assert(np.all(z_slice.shape == first_shape)) if z_slice.dtype == np.float64 or z_slice.dtype == float: if not WARNED_ABOUT_CASTING_F64: logger.warn('Casting {} to `numpy.float32`, further warnings will be suppressed.'.format(MRCfilename)) WARNED_ABOUT_CASTING_F64 = True input_image[J] = z_slice.astype(np.float32) elif z_slice.dtype == np.complex128: if not WARNED_ABOUT_CASTING_C128: logger.warn('Casting {} to `numpy.complex64`, further warnings will be suppressed.'.format(MRCfilename)) WARNED_ABOUT_CASTING_C128 = True input_image[J] = z_slice.astype(np.complex64) else: assert(z_slice.dtype == input_image[0].dtype) else: # Array-'like' object dims = input_image.shape if input_image.ndim == 2: # If it's a 2D image we force it to 3D - this makes life easier later: input_image = input_image.reshape((1, input_image.shape[0], input_image.shape[1])) # Cast float64 -> float32, and complex128 -> complex64 if input_image.dtype == np.float64 or input_image.dtype == float: if not 
WARNED_ABOUT_CASTING_F64: logger.warn('Casting {} to `numpy.float64`'.format(MRCfilename)) WARNED_ABOUT_CASTING_F64 = True input_image = input_image.astype(np.float32) elif input_image.dtype == np.complex128: if not WARNED_ABOUT_CASTING_C128: logger.warn('Casting {} to `numpy.complex64`'.format(MRCfilename)) WARNED_ABOUT_CASTING_C128 = True input_image = input_image.astype(np.complex64) # We will need this regardless if writing to an existing file or not: if endian == 'le': endchar = '<' else: endchar = '>' # We now check if we have to create a new header (i.e. new file) or not. If # the file exists, but idx is 'None', it will be replaced by a new file # with new header anyway: if os.path.isfile(MRCfilename): if idx == None: idxnewfile = True else: idxnewfile = False else: idxnewfile = True if idxnewfile: if dtype == 'uint4' and compressor != None: raise TypeError('uint4 packing is not compatible with compression, use int8 datatype.') header = {'meta': meta} if dtype == None: if slices > 0: header['dtype'] = endchar + input_image[0].dtype.descr[0][1].strip('<>|') else: header['dtype'] = endchar + input_image.dtype.descr[0][1].strip('<>|') else: header['dtype'] = dtype # Now we need to filter dtype to make sure it's actually acceptable to MRC if not header['dtype'].strip('<>|') in REVERSE_CCPEM_ENUM: raise TypeError('ioMRC.MRCExport: Unsupported dtype cast for MRC %s' % header['dtype']) header['dimensions'] = dims header['pixelsize'] = pixelsize header['pixelunits'] = pixelunits header['shape'] = shape # This overhead calculation is annoying but many 3rd party tools that use # MRC require these statistical parameters. if bool(quickStats): if slices > 0: first_image = input_image[0] else: first_image = input_image[0,:,:] imMin = first_image.real.min(); imMax = first_image.real.max() header['maxImage'] = imMax header['minImage'] = imMin header['meanImage'] = 0.5*(imMax + imMin) else: if slices > 0: header['maxImage'] = np.max( [z_slice.real.max() for z_slice in input_image] ) header['minImage'] = np.min( [z_slice.real.min() for z_slice in input_image] ) header['meanImage'] = np.mean( [z_slice.real.mean() for z_slice in input_image] ) else: header['maxImage'] = input_image.real.max() header['minImage'] = input_image.real.min() header['meanImage'] = input_image.real.mean() header['voltage'] = voltage if not bool( header['voltage'] ): header['voltage'] = 0.0 header['C3'] = C3 if not bool( header['C3'] ): header['C3'] = 0.0 header['gain'] = gain if not bool( header['gain'] ): header['gain'] = 1.0 header['compressor'] = compressor header['clevel'] = clevel if n_threads == None and BLOSC_PRESENT: n_threads = DEFAULT_N_THREADS header['n_threads'] = n_threads if dtype == 'uint4': if slices > 0: raise NotImplementedError('Saving of lists of arrays not supported for `dtype=uint4`') # Decimate to packed 4-bit input_image = input_image.astype('uint8') input_image = input_image[:,:,::2] + np.left_shift(input_image[:,:,1::2],4) else: # We are going to append to an already existing file: # So we try to figure out its header with 'CCPEM' or 'eman2' file conventions: try: header, slices = readMRCHeader(MRCfilename, slices=None, endian=endian, fileConvention='CCPEM', pixelunits=pixelunits) except ValueError: try: header, slices = readMRCHeader(MRCfilename, slices=None, endian=endian, fileConvention='eman2', pixelunits=pixelunits) except ValueError: # If neither 'CCPEM' nor 'eman2' formats satisfy: raise ValueError('Error: unrecognized MRC type for file: %s ' % MRCfilename) # If the file already exists, its X,Y 
dimensions must be consistent with the current image to be written: if np.any( header['dimensions'][1:] != input_image.shape[1:]): raise ValueError('Error: x,y dimensions of image do not match that of MRC file: %s ' % MRCfilename) # TO DO: check also consistency of dtype? if 'meta' not in header.keys(): header['meta'] = meta # Now that we have a proper header, we go into the details of writing to a specific position: if idx != None: if header['compressor'] != None: raise RuntimeError('Writing at arbitrary positions not supported for compressed files. Compressor = %s' % header['compressor']) idx = int(idx) # Force 2D to 3D dimensions: if len( header['dimensions'] ) == 2: header['dimensions'] = np.array([1, header['dimensions'][0], header['dimensions'][1]]) # Convert negative index to equivalent positive index: if idx < 0: idx = header['dimensions'][0] + idx # Just check if the desired image is within the stack range: # In principle we could write to a position beyond the limits of the file (missing slots would be filled with zeros), but let's avoid that the user writes a big file with zeros by mistake. So only positions within or immediately consecutive to the stack are allowed: if idx < 0 or idx > header['dimensions'][0]: raise ValueError( 'Error: image or slice index out of range. idx = %d, z_dimension = %d' % (idx, header['dimensions'][0]) ) # The new Z dimension may be larger than that of the existing file, or even of the new file, if an index larger than the current stack is specified: newZ = idx + input_image.shape[0] if newZ > header['dimensions'][0]: header['dimensions'] = np.array([idx + input_image.shape[0], header['dimensions'][1], header['dimensions'][2]]) # This offset will be applied to f.seek(): offset = idx * np.prod(header['dimensions'][1:]) * np.dtype(header['dtype']).itemsize else: offset = 0 __MRCExport(input_image, header, MRCfilename, slices, endchar=endchar, offset=offset, idxnewfile=idxnewfile) def __MRCExport(input_image, header, MRCfilename, slices, endchar='<', offset=0, idxnewfile=True): ''' MRCExport private interface with a dictionary rather than a mess of function arguments. 
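    A sketch of the compressed write path (names as in the function body,
    assumes `blosc` is installed): each 2D slice is compressed as an
    independent blosc chunk, so a reader can later decompress any channel on
    its own::

        blosc.set_nthreads(header['n_threads'])
        compressedData = blosc.compress_ptr(
            frame.__array_interface__['data'][0], frame.size,
            frame.dtype.itemsize, clevel=header['clevel'],
            shuffle=blosc.BITSHUFFLE, cname=header['compressor'])
        f.write(compressedData)
        header['packedBytes'] += len(compressedData)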
''' if idxnewfile: # If forcing a new file we truncate it even if it already exists: fmode = 'wb' else: # Otherwise we'll just update its header and append images as required: fmode = 'rb+' with open(MRCfilename, fmode, buffering=BUFFERSIZE) as f: extendedBytes = writeMRCHeader(f, header, slices, endchar=endchar) f.seek(DEFAULT_HEADER_LEN + extendedBytes + offset) dtype = header['dtype'] if ('compressor' in header) \ and (header['compressor'] in REVERSE_COMPRESSOR_ENUM) \ and (REVERSE_COMPRESSOR_ENUM[header['compressor']]) > 0: # compressed MRCZ logger.debug('Compressing %s with compressor %s%d' % (MRCfilename, header['compressor'], header['clevel'])) applyCast = False if slices > 0: chunkSize = input_image[0].size typeSize = input_image[0].dtype.itemsize if dtype != 'uint4' and input_image[0].dtype != dtype: applyCast = True else: chunkSize = input_image[0,:,:].size typeSize = input_image.dtype.itemsize if dtype != 'uint4' and input_image.dtype != dtype: applyCast = True blosc.set_nthreads(header['n_threads']) # for small image dimensions we need to scale blocksize appropriately # so we use the available cores block_size = np.minimum(BLOSC_BLOCK, chunkSize//header['n_threads']) blosc.set_blocksize(block_size) header['packedBytes'] = 0 clevel = header['clevel'] cname = header['compressor'] # For 3D frames in lists, we need to further sub-divide each frame # into slices so that each channel is compressed seperately by # blosc. if slices > 1: deep_image = input_image # grab a reference input_image = [] for frame in deep_image: for I in range(slices): input_image.append(frame[I,:,:]) for J, frame in enumerate(input_image): if applyCast: frame = frame.astype(dtype) if frame.flags['C_CONTIGUOUS'] and frame.flags['ALIGNED']: # Use pointer compressedData = blosc.compress_ptr(frame.__array_interface__['data'][0], frame.size, typeSize, clevel=header['clevel'], shuffle=blosc.BITSHUFFLE, cname=header['compressor']) else: # Use tobytes, which is slower in benchmarking compressedData = blosc.compress(frame.tobytes(), typeSize, clevel=clevel, shuffle=blosc.BITSHUFFLE, cname=cname) f.write(compressedData) header['packedBytes'] += len(compressedData) # Rewind and write out the total compressed size f.seek(144) np.int64(header['packedBytes']).astype(endchar + 'i8').tofile(f) else: # vanilla MRC if slices > 0: if dtype != 'uint4' and dtype != input_image[0].dtype: for z_slice in input_image: z_slice.astype(dtype).tofile(f) else: for z_slice in input_image: z_slice.tofile(f) else: if dtype != 'uint4' and dtype != input_image.dtype: input_image = input_image.astype(dtype) input_image.tofile(f) return def writeMRCHeader(f, header, slices, endchar='<'): ''' Parameters ---------- Writes a header to the file-like object ``f``, requires a dict called ``header`` to parse the appropriate fields. Returns ------- ``None`` Note ---- Use `defaultHeader()` to retrieve an example with all potential fields. 
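    # Example sketch ('header_only.mrc' is a hypothetical file): build a
    # header with `defaultHeader()` and serialize it to an open handle:
    #     hdr = defaultHeader()
    #     hdr['dimensions'] = np.array([10, 256, 256])
    #     hdr['dtype'] = 'f4'
    #     hdr['meta'] = None          # no JSON extended header
    #     with open('header_only.mrc', 'wb') as fh:
    #         extendedBytes = writeMRCHeader(fh, hdr, slices=1)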
''' dtype_f4 = endchar + 'f4' dtype_i4 = endchar + 'i4' f.seek(0) # Write dimensions if len(header['dimensions']) == 2: # force to 3-D dimensions = np.array([1, header['dimensions'][0], header['dimensions'][1]]) else: dimensions = np.array(header['dimensions']) # Flip to Fortran order dimensions = np.flipud(dimensions) dimensions.astype(dtype_i4).tofile(f) # 64-bit floats are automatically down-cast dtype = header['dtype'].lower().strip( '<>|' ) try: MRCmode = np.int32(REVERSE_CCPEM_ENUM[dtype]).astype(endchar + 'i4') except: raise ValueError('Warning: Unknown dtype for MRC encountered = ' + str(dtype)) # Add 1000 * COMPRESSOR_ENUM to the dtype for compressed data if ('compressor' in header and header['compressor'] in REVERSE_COMPRESSOR_ENUM and REVERSE_COMPRESSOR_ENUM[header['compressor']] > 0): header['compressor'] = header['compressor'].lower() MRCmode += MRC_COMP_RATIO * REVERSE_COMPRESSOR_ENUM[header['compressor']] # How many bytes in an MRCZ file, so that the file can be appended-to. try: f.seek(144) np.int32( header['packedBytes'] ).astype(endchar + 'i8').tofile(f) except: # This is written afterward so we don't try to keep the entire compressed file in RAM pass f.seek(12) MRCmode.tofile(f) # Print NXSTART, NYSTART, NZSTART np.array([0, 0, 0], dtype=dtype_i4).tofile(f) # Print MX, MY, MZ, the sampling. We only allow for slicing along the z-axis, # e.g. for multi-channel STEM. f.seek(36) np.int32(slices).astype(dtype_i4).tofile(f) # Print cellsize = pixelsize * dimensions # '\AA' will eventually be deprecated (probably in Python 3.7/8), please cease using it. if header['pixelunits'] == '\\AA' or header['pixelunits'] == '\AA': AApixelsize = np.array(header['pixelsize']) elif header['pixelunits'] == '\mum': AApixelsize = np.array(header['pixelsize'])*10000.0 elif header['pixelunits'] == 'pm': AApixelsize = np.array(header['pixelsize'])/100.0 else: # Default is nm AApixelsize = np.array(header['pixelsize'])*10.0 # The above AApixelsize insures we cannot have an array of len=1 here if isinstance( AApixelsize, np.ndarray ) and AApixelsize.size == 1: cellsize = np.array([AApixelsize,AApixelsize,AApixelsize]) * dimensions elif not isinstance( AApixelsize, (list,tuple,np.ndarray) ): cellsize = np.array([AApixelsize,AApixelsize,AApixelsize]) * dimensions elif len(AApixelsize) == 2: # Default to z-axis pixelsize of 10.0 Angstroms cellsize = np.flipud(np.array( [10.0, AApixelsize[0], AApixelsize[1]])) * dimensions else: cellsize = np.flipud(np.array( AApixelsize )) * dimensions f.seek(40) np.array(cellsize, dtype=dtype_f4).tofile(f) # Print default cell angles np.array([90.0,90.0,90.0], dtype=dtype_f4).tofile(f) # Print axis associations (we use C ordering internally in all Python code) np.array([1,2,3], dtype=dtype_i4).tofile(f) # Print statistics (if available) f.seek(76) if 'minImage' in header: np.float32(header['minImage']).astype(dtype_f4).tofile(f) else: np.float32(0.0).astype(dtype_f4).tofile(f) if 'maxImage' in header: np.float32(header['maxImage']).astype(dtype_f4).tofile(f) else: np.float32(1.0).astype(dtype_f4).tofile(f) if 'meanImage' in header: np.float32(header['meanImage']).astype(dtype_f4).tofile(f) else: np.float32(0.0).astype(dtype_f4).tofile(f) # We'll put the compressor info and number of compressed bytes in 132-204 # and new metadata # RESERVED: 132: 136: 140 : 144 for voltage, C3, and gain f.seek(132) if 'voltage' in header: np.float32(header['voltage']).astype(dtype_f4).tofile(f) if 'C3' in header: np.float32(header['C3']).astype(dtype_f4).tofile(f) if 'gain' in header: 
np.float32(header['gain']).astype(dtype_f4).tofile(f) # Magic MAP_ indicator that tells us this is in-fact an MRC file f.seek(208) f.write(b'MAP ') # Write a machine stamp, '17,17' for big-endian or '68,65' for little # Note that the MRC format doesn't indicate the endianness of the endian # identifier... f.seek(212) if endchar == '<': f.write(struct.pack(b'BB', 68, 65)) else: f.write(struct.pack(b'BB', 17, 17)) # Write b'MRCZ' into labels f.seek(220) f.write(struct.pack(b'i', 1)) # We have one label f.write(b'MRCZ' + __version__.encode('ascii')) # Extended header, if meta is not None if isinstance(header['meta'], dict): jsonMeta = json.dumps(header['meta'], default=_defaultMetaSerialize).encode('utf-8') jsonLen = len(jsonMeta) # Length of extended header f.seek(92) f.write(struct.pack(endchar+'i', jsonLen)) # 4-byte char ID string of extended metadata type f.seek(104) f.write(b'json') # Go to the extended header f.seek(DEFAULT_HEADER_LEN) f.write( jsonMeta ) return jsonLen # No extended header return 0 def asyncReadMRC(*args, **kwargs): ''' Calls `readMRC` in a separate thread and executes it in the background. Parameters ---------- Valid arguments are as for `readMRC()`. Returns ------- future A ``concurrent.futures.Future()`` object. Calling ``future.result()`` will halt until the read is finished and returns the image and meta-data as per a normal call to `readMRC`. Example ------- worker = asyncReadMRC( 'someones_file.mrc' ) # Do some work mrcImage, mrcMeta = worker.result() ''' return _asyncExecutor.submit(readMRC, *args, **kwargs) def asyncWriteMRC(*args, **kwargs): ''' Calls `writeMRC` in a seperate thread and executes it in the background. Parameters ---------- Valid arguments are as for `writeMRC()`. Returns ------- future A ``concurrent.futures.Future`` object. If needed, you can call ``future.result()`` to wait for the write to finish, or check with ``future.done()``. Most of the time you can ignore the return and let the system write unmonitored. An exception would be if you need to pass in the output to a subprocess. Example ------- worker = asyncWriteMRC( npImageData, 'my_mrcfile.mrc' ) # Do some work if not worker.done(): time.sleep(0.001) # File is written to disk ''' return _asyncExecutor.submit(writeMRC, *args, **kwargs)././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1726234766.0 mrcz-0.5.7/mrcz/test_mrcz.py0000666000000000000000000006467514671040216013012 0ustar00# -*- coding: utf-8 -*- ''' Created on Fri Sep 30 09:44:09 2016 @author: Robert A. McLeod ''' import mrcz import numpy as np import numpy.testing as npt import os, os.path, sys import subprocess as sub import tempfile import unittest from logging import Logger from enum import Enum log = Logger(__name__) def which(program): # Tries to locate a program import os if os.name == 'nt': program_ext = os.path.splitext(program)[1] if program_ext == '': prog_exe = which(program + '.exe') if prog_exe != None: return prog_exe return which(program + '.com') def is_exe(fpath): return os.path.isfile(fpath) and os.access(fpath, os.X_OK) fpath, fname = os.path.split(program) if fpath: if is_exe(program): return program else: for path in os.environ['PATH'].split(os.pathsep): path = path.strip('"') exe_file = os.path.join(path, program) if is_exe(exe_file): return exe_file return None float_dtype = 'float32' fftw_dtype = 'complex64' tmpDir = tempfile.gettempdir() #============================================================================== # ioMRC Test # # Internal python-only test. 
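# The round-trip pattern under test, in sketch form (write, re-read, compare):
#     mrcz.writeMRC(testMage, mrcName, compressor='zstd', clevel=1)
#     rereadMage, rereadHeader = mrcz.readMRC(mrcName)
#     npt.assert_array_almost_equal(testMage, rereadMage)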
Build a random image and save and re-load it. #============================================================================== class PythonMrczTests(unittest.TestCase): def setUp(self): pass def compReadWrite(self, testMage, casttype=None, compressor=None, clevel = 1): # This is the main functions which reads and writes from disk. mrcName = os.path.join(tmpDir, 'testMage.mrc') pixelsize = np.array([1.2, 2.6, 3.4]) mrcz.writeMRC(testMage, mrcName, dtype=casttype, pixelsize=pixelsize, pixelunits=u'\AA', voltage=300.0, C3=2.7, gain=1.05, compressor=compressor, clevel=clevel) rereadMage, rereadHeader = mrcz.readMRC(mrcName, pixelunits=u'\AA') # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_almost_equal(testMage, rereadMage) npt.assert_array_almost_equal(rereadHeader['pixelsize'], pixelsize) assert(rereadHeader['pixelunits'] == u'\AA') npt.assert_almost_equal(rereadHeader['voltage'], 300.0) npt.assert_almost_equal(rereadHeader['C3'], 2.7) npt.assert_almost_equal(rereadHeader['gain'], 1.05) def test_MRC_uncompressed(self): log.info('Testing uncompressed MRC, float-32') testMage0 = np.random.normal(size=[2,128,96]).astype(float_dtype) self.compReadWrite(testMage0, compressor=None) log.info('Testing uncompressed MRC, uint-4') testMage1 = np.random.randint(10, size=[2,128,96], dtype='int8') self.compReadWrite(testMage1, casttype='uint4', compressor=None) log.info('Testing uncompressed MRC, int-8') testMage2 = np.random.randint(10, size=[2,128,96], dtype='int8') self.compReadWrite(testMage2, compressor=None) log.info('Testing uncompressed MRC, int-16') testMage3 = np.random.randint(10, size=[2,128,96], dtype='int16') self.compReadWrite(testMage3, compressor=None) log.info('Testing uncompressed MRC, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.compReadWrite(testMage4, compressor=None) log.info('Testing uncompressed MRC, complex-64') testMage5 = np.random.uniform(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.uniform(10, size=[2,128,96] ).astype('float32') self.compReadWrite(testMage5, compressor=None) def test_MRCZ_zstd1(self): log.info('Testing zstd_1 MRC, float-32') testMage0 = np.random.normal(size=[2,128,96]).astype(float_dtype) self.compReadWrite(testMage0, compressor='zstd', clevel=1) log.info('Testing zstd_1 MRC, int-8') testMage2 = np.random.randint(10, size=[2,128,96], dtype='int8') self.compReadWrite(testMage2, compressor='zstd', clevel=1) log.info('Testing zstd_1 MRC, int-16') testMage3 = np.random.randint(10, size=[2,128,96], dtype='int16') self.compReadWrite(testMage3, compressor='zstd', clevel=1) log.info('Testing zstd_1 MRC, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.compReadWrite(testMage4, compressor='zstd', clevel=1) log.info('Testing zstd_1 MRC, complex-64') testMage5 = np.random.normal(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.normal(10, size=[2,128,96] ).astype('float32') self.compReadWrite(testMage5, compressor='zstd', clevel=1) def test_MRCZ_lz9(self): log.info('Testing lz4_9 MRC, float-32') testMage0 = np.random.normal(size=[2,128,96]).astype(float_dtype) self.compReadWrite(testMage0, compressor='lz4', clevel=9) log.info('Testing lz4_9 MRC, int-8') testMage2 = np.random.randint(10, size=[2,128,96], dtype='int8') self.compReadWrite(testMage2, compressor='lz4', clevel=9) log.info('Testing lz4_9 MRC, int-16') testMage3 = 
np.random.randint(10, size=[2,128,96], dtype='int16') self.compReadWrite(testMage3, compressor='lz4', clevel=9) log.info('Testing lz4_9 MRC, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.compReadWrite(testMage4, compressor='lz4', clevel=9) log.info('Testing lz4_9 MRC, complex-64') testMage5 = np.random.normal(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.normal(10, size=[2,128,96] ).astype('float32') self.compReadWrite(testMage5, compressor='lz4', clevel=9) def test_JSON(self): testMage = np.random.uniform(high=10, size=[3,128,64]).astype('int8') meta = {'foo': 5, 'bar': 42} mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [1.2, 5.6, 3.4] mrcz.writeMRC(testMage, mrcName, meta=meta, pixelsize=pixelsize, pixelunits=u'\AA', voltage=300.0, C3=2.7, gain=1.05, compressor='zstd', clevel=1, n_threads=1) rereadMage, rereadHeader = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(np.all(testMage.shape == rereadMage.shape)) assert(testMage.dtype == rereadMage.dtype) for key in meta: assert(meta[key] == rereadHeader[key]) npt.assert_array_almost_equal(testMage, rereadMage) npt.assert_almost_equal(rereadHeader['voltage'], 300.0) npt.assert_array_almost_equal(rereadHeader['pixelsize'], pixelsize) assert(rereadHeader['pixelunits'] == u'\AA') npt.assert_almost_equal(rereadHeader['C3'], 2.7) npt.assert_almost_equal(rereadHeader['gain'], 1.05) def test_async(self): testMage = np.random.uniform(high=10, size=[3,128,64]).astype('int8') meta = {'foo': 5, 'bar': 42} mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [1.2, 5.6, 3.4] worker = mrcz.asyncWriteMRC(testMage, mrcName, meta=meta, pixelsize=pixelsize, pixelunits=u'\AA', voltage=300.0, C3=2.7, gain=1.05, compressor='zstd', clevel=1, n_threads=1) worker.result() # Wait for write to finish worker = mrcz.asyncReadMRC(mrcName, pixelunits=u'\AA') rereadMage, rereadHeader = worker.result() try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(np.all(testMage.shape == rereadMage.shape)) assert(testMage.dtype == rereadMage.dtype) for key in meta: assert(meta[key] == rereadHeader[key]) npt.assert_array_almost_equal(testMage, rereadMage) npt.assert_almost_equal(rereadHeader['voltage'], 300.0) npt.assert_array_almost_equal(rereadHeader['pixelsize'], pixelsize) assert(rereadHeader['pixelunits'] == u'\AA') npt.assert_almost_equal(rereadHeader['C3'], 2.7) npt.assert_almost_equal(rereadHeader['gain'], 1.05) def test_list_2d(self): testMage = [np.random.uniform(high=10, size=[32,16]).astype('int8')] * 3 mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [5.6, 3.4] mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor=None) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == len(testMage)) for testFrame, rereadFrame in zip(testMage, rereadMage): assert(testFrame.dtype == rereadFrame.dtype) npt.assert_array_almost_equal(testFrame, rereadFrame) def test_list_2d_compressed(self): testMage = [np.random.uniform(high=10, size=[32,16]).astype('int8')] * 3 mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [5.6, 3.4] mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor='zstd', clevel=1, n_threads=1) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: 
os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == len(testMage)) for testFrame, rereadFrame in zip(testMage, rereadMage): assert(testFrame.dtype == rereadFrame.dtype) npt.assert_array_almost_equal(testFrame, rereadFrame) def test_list_3d(self): testMage = [np.random.uniform(high=10, size=[3,32,32]).astype('int8')] * 3 mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [5.6, 3.4] mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor=None) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == len(testMage)) for testFrame, rereadFrame in zip(testMage, rereadMage): assert(testFrame.dtype == rereadFrame.dtype) npt.assert_array_almost_equal(testFrame, rereadFrame) def test_list_3d_compressed(self): testMage = [np.random.uniform(high=10, size=[3,32,32]).astype('int8')] * 3 mrcName = os.path.join(tmpDir, 'testMage.mrcz') pixelsize = [5.6, 3.4] mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor='zstd', clevel=1, n_threads=1) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == len(testMage)) for testFrame, rereadFrame in zip(testMage, rereadMage): assert(testFrame.dtype == rereadFrame.dtype) npt.assert_array_almost_equal(testFrame, rereadFrame) def test_list_change_output_shape(self): testMage = np.random.uniform(high=10, size=[6,32,32]).astype('int8') pixelsize = [5.6, 3.4] mrcName = os.path.join(tmpDir, 'testMage.mrcz') for slices in (1, 2): mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor=None) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA', slices=slices) try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == testMage.shape[0] // slices) def test_list_change_output_shape_compressed(self): testMage = np.random.uniform(high=10, size=[6,32,32]).astype('int8') pixelsize = [5.6, 3.4] mrcName = os.path.join(tmpDir, 'testMage.mrcz') for slices in (1, 2): mrcz.writeMRC(testMage, mrcName, pixelsize=pixelsize, compressor='zstd', clevel=1, n_threads=1) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA', slices=slices) try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(isinstance(rereadMage, list)) assert(len(rereadMage) == testMage.shape[0] // slices) def test_strided_array(self): log.info('Testing strided array MRC') testMage0 = np.random.randint(32, size=[2,128,96]).astype(np.int8) testMage0 = testMage0[:,::2,::2] self.compReadWrite(testMage0, compressor='zstd', clevel=1) def test_cast_array_from_f64(self): log.info('Testing float-64 casting') f64_mage = np.random.normal(size=[2,128,96]).astype(np.float64) f32_mage = f64_mage.astype(np.float32) mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(f64_mage, mrcName, compressor='zstd', clevel=1) rereadMage, rereadHeader = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_almost_equal(f32_mage, rereadMage) def 
test_cast_list_from_f64(self): log.info('Testing float-64 list casting') f64_mage = [np.random.normal(size=[128,96]).astype(np.float64) for I in range(2)] f32_mage = [frame.astype(np.float32) for frame in f64_mage] mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(f64_mage, mrcName, compressor='zstd', clevel=1) rereadMage, rereadHeader = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_almost_equal(f32_mage[0], rereadMage[0]) npt.assert_array_almost_equal(f32_mage[1], rereadMage[1]) def test_cast_array_from_c128(self): log.info('Testing complex-128 casting') c128_mage = np.random.normal(size=[2,128,96]).astype(np.float64) + \ 1j * np.random.normal(size=[2,128,96]).astype(np.float64) c64_mage = c128_mage.astype(np.complex64) mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(c128_mage, mrcName, compressor='zstd', clevel=1) rereadMage, rereadHeader = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_almost_equal(c64_mage, rereadMage) def test_cast_list_from_c128(self): log.info('Testing complex-128 casting') c128_mage = [np.random.normal(size=[128,96]).astype(np.float64) + \ 1j * np.random.normal(size=[128,96]).astype(np.float64) for I in range(2)] c64_mage = [frame.astype(np.complex64) for frame in c128_mage] mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(c128_mage, mrcName, compressor='zstd', clevel=1) rereadMage, rereadHeader = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_almost_equal(c64_mage[0], rereadMage[0]) npt.assert_array_almost_equal(c64_mage[1], rereadMage[1]) def test_numpy_metadata(self): log.info('Testing NumPy types in meta-data') meta = { 'zoo': np.float64(1.0), 'foo': [ np.ones(16), np.ones(16), np.ones(16)], 'bar': { # Note: no support in JSON for complex numbers. 
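            # (A complex array would need manual splitting before writing,
            # e.g. {'re': z.real.tolist(), 'im': z.imag.tolist()}, which is a
            # user-side convention, not something mrcz does for you.)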
'moo': np.uint64(42), 'boo': np.full(8, 3, dtype=np.int32) } } mage = np.zeros([32, 32], dtype=np.float32) mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(mage, mrcName, meta=meta) re_mage, re_meta = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) npt.assert_array_equal(re_meta['foo'][0], meta['foo'][0]) npt.assert_array_equal(re_meta['bar']['boo'], meta['bar']['boo']) def test_enum_metadata(self): log.info('Testing Enum types in meta-data') class Axis(Enum): X = 0 Y = 1 meta = { 'axes': [Axis.Y, Axis.X] } mage = np.zeros([4, 4], dtype=np.float32) mrcName = os.path.join(tmpDir, 'testMage.mrc') mrcz.writeMRC(mage, mrcName, meta=meta) re_mage, re_meta = mrcz.readMRC(mrcName) # `tempfile.TemporaryDirectory` would be better but Python 2.7 doesn't support it try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert('Enum.Axis.X' in re_meta['axes']) assert('Enum.Axis.Y' in re_meta['axes']) def test_MRC_append(self): log.info('Testing appending to existing MRC stack, float-32') f32_stack = [np.random.normal(size=[128,96]).astype(np.float32) for I in range(2)] mrcName = os.path.join(tmpDir, 'testStack.mrcs') pixelsize = [5.6, 3.4] for j,I in enumerate(f32_stack): mrcz.writeMRC(I, mrcName, pixelsize=pixelsize, compressor=None, idx = j) rereadMage, _ = mrcz.readMRC(mrcName, pixelunits=u'\AA') try: os.remove(mrcName) except IOError: log.info('Warning: file {} left on disk'.format(mrcName)) assert(rereadMage.shape[0] == len(f32_stack)) for testFrame, rereadFrame in zip(f32_stack, rereadMage): assert(testFrame.dtype == rereadFrame.dtype) npt.assert_array_almost_equal(testFrame, rereadFrame) cmrczProg = which('mrcz') if cmrczProg is None: log.debug('NOTE: mrcz not found in system path, not testing python-mrcz to c-mrcz cross-compatibility') else: class PythonToCMrczTests(unittest.TestCase): #============================================================================== # python-mrcz to c-mrcz tests # # mrcz executable must be found within the system path. # # Cross-compatibility tests between c-mrcz and python-mrcz. Build a random # image, load and re-save it with c-mrcz, and then reload in Python. 
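    # The c-mrcz invocation exercised below has the form (flags as passed in
    # crossReadWrite):
    #     mrcz -i testIn.mrcz -o testOut.mrcz -c zstd -B 64 -l 1
    # i.e. input file, output file, compressor, blocksize, compression level.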
#============================================================================== def setUp(self): pass def crossReadWrite(self, testMage, casttype=None, compressor=None, clevel = 1): mrcInput = os.path.join(tmpDir, 'testIn.mrcz') mrcOutput = os.path.join(tmpDir, 'testOut.mrcz') compressor = None blocksize = 64 clevel = 1 pixelsize = [1.2, 2.6, 3.4] mrcz.writeMRC(testMage, mrcInput, pixelsize=pixelsize, pixelunits=u'\AA', voltage=300.0, C3=2.7, gain=1.05, compressor=compressor) sub.call(cmrczProg + ' -i %s -o %s -c %s -B %d -l %d' %(mrcInput, mrcOutput, compressor, blocksize, clevel), shell=True) rereadMage, rereadHeader = mrcz.readMRC(mrcOutput, pixelunits=u'\AA') os.remove(mrcOutput) os.remove(mrcInput) assert(np.all(testMage.shape == rereadMage.shape)) assert(testMage.dtype == rereadMage.dtype) npt.assert_array_almost_equal(testMage, rereadMage) npt.assert_array_equal(rereadHeader['voltage'], 300.0) npt.assert_array_almost_equal(rereadHeader['pixelsize'], pixelsize) npt.assert_array_equal(rereadHeader['pixelunits'], u'\AA') npt.assert_array_equal(rereadHeader['C3'], 2.7) npt.assert_array_equal(rereadHeader['gain'], 1.05) def test_crossMRC_uncompressed(self): log.info('Testing cross-compatibility c-mrcz and python-mrcz, uncompressed, int-8') testMage0 = np.random.randint(10, size=[2,128,64], dtype='int8') self.crossReadWrite(testMage0, compressor=None, clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, uncompressed, int-16') testMage1 = np.random.randint(10, size=[2,128,64], dtype='int16') self.crossReadWrite(testMage1, compressor=None, clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, uncompressed, float32') testMage1 = np.random.normal(size=[2,128,64]).astype('float32') self.crossReadWrite(testMage1, compressor=None, clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, uncompressed, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.crossReadWrite(testMage4, compressor=None, clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, uncompressed, complex-64') testMage5 = np.random.normal(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.normal(10, size=[2,128,96] ).astype('float32') self.crossReadWrite(testMage5, compressor=None, clevel=1) def test_crossMRC_zstd1(self): log.info('Testing cross-compatibility c-mrcz and python-mrcz, zstd_1, int-8') testMage0 = np.random.randint(10, size=[2,128,64], dtype='int8') self.crossReadWrite(testMage0, compressor='zstd', clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, zstd_1, int-16') testMage1 = np.random.randint(10, size=[2,128,64], dtype='int16') self.crossReadWrite(testMage1, compressor='zstd', clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, zstd_1, float32') testMage1 = np.random.normal(size=[2,128,64]).astype('float32') self.crossReadWrite(testMage1, compressor='zstd1', clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, zstd_1, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.crossReadWrite(testMage4, compressor='zstd', clevel=1) log.info('Testing cross-compatibility c-mrcz and python-mrcz, zstd_1, complex-64') testMage5 = np.random.normal(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.normal(10, size=[2,128,96] ).astype('float32') self.crossReadWrite(testMage5, compressor='zstd', clevel=1) def test_crossMRC_lz4_9(self): log.info('Testing cross-compatibility c-mrcz and python-mrcz, lz4_9, int-8') testMage0 = 
np.random.randint(10, size=[2,128,64], dtype='int8') self.crossReadWrite(testMage0, compressor='lz4', clevel=9) log.info('Testing cross-compatibility c-mrcz and python-mrcz, lz4_9, int-16') testMage1 = np.random.randint(10, size=[2,128,64], dtype='int16') self.crossReadWrite(testMage1, compressor='lz4', clevel=9) log.info('Testing cross-compatibility c-mrcz and python-mrcz, lz4_9, float32') testMage1 = np.random.normal(size=[2,128,64]).astype('float32') self.crossReadWrite(testMage1, compressor='lz4', clevel=9) log.info('Testing cross-compatibility c-mrcz and python-mrcz, lz4_9, uint-16') testMage4 = np.random.randint(10, size=[2,128,96], dtype='uint16') self.crossReadWrite(testMage4, compressor='lz4', clevel=9) log.info('Testing cross-compatibility c-mrcz and python-mrcz, lz4_9, complex-64') testMage5 = np.random.normal(10, size=[2,128,96]).astype('float32') + \ 1j * np.random.normal(10, size=[2,128,96] ).astype('float32') self.crossReadWrite(testMage5, compressor='lz4', clevel=9) pass def test(verbosity=2): ''' test(verbosity=2) Run ``unittest`` suite for ``mrcz`` package. ''' from mrcz import __version__ log.info('MRCZ TESTING FOR VERSION %s ' % __version__) theSuite = unittest.TestSuite() theSuite.addTest(unittest.makeSuite(PythonMrczTests)) if cmrczProg is not None: theSuite.addTest(unittest.makeSuite(PythonToCMrczTests)) test_result = unittest.TextTestRunner(verbosity=verbosity).run(theSuite) return test_result if __name__ == '__main__': # Should generally call 'python -m unittest -v mrcz.test' for continuous integration test() ././@PaxHeader0000000000000000000000000000003400000000000010212 xustar0028 mtime=1726235029.7398372 mrcz-0.5.7/mrcz.egg-info/0000777000000000000000000000000014671040626012103 5ustar00././@PaxHeader0000000000000000000000000000002600000000000010213 xustar0022 mtime=1726235029.0 mrcz-0.5.7/mrcz.egg-info/PKG-INFO0000666000000000000000000000303314671040625013176 0ustar00Metadata-Version: 2.1 Name: mrcz Version: 0.5.7 Summary: MRCZ meta-compressed image file-format library Home-page: http://github.com/em-MRCZ/python-mrcz Author: Robert A. McLeod, Ricardo Righetto Author-email: robbmcleod@gmail.com License: https://opensource.org/licenses/BSD-3-Clause Platform: any Classifier: Development Status :: 4 - Beta Classifier: Intended Audience :: Developers Classifier: Intended Audience :: Information Technology Classifier: Intended Audience :: Science/Research Classifier: License :: OSI Approved :: BSD License Classifier: Programming Language :: Python Classifier: Programming Language :: Python :: 2.7 Classifier: Programming Language :: Python :: 3.5 Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Classifier: Programming Language :: Python :: 3.8 Classifier: Topic :: Software Development :: Libraries :: Python Modules Classifier: Topic :: System :: Archiving :: Compression Classifier: Operating System :: Microsoft :: Windows Classifier: Operating System :: Unix License-File: LICENSE.txt License-File: AUTHORS.txt Requires-Dist: numpy MRCZ is a highly optimized compressed version of the popular electron microscopy MRC image format. It uses the Blosc meta-compressor library as a backend. It can use a number of high-performance loseless compression codecs such as 'lz4' and 'zstd', it can apply bit-shuffling filters, and operates compression in a blocked and multi-threaded way to take advantage of modern multi-core CPUs. 
mrcz-0.5.7/mrcz.egg-info/SOURCES.txt

AUTHORS.txt
LICENSE.txt
MANIFEST.in
README.rst
optional-requirements.txt
setup.py
mrcz/ReliablePy.py
mrcz/__init__.py
mrcz/__version__.py
mrcz/ioDM.py
mrcz/ioMRC.py
mrcz/test_mrcz.py
mrcz.egg-info/PKG-INFO
mrcz.egg-info/SOURCES.txt
mrcz.egg-info/dependency_links.txt
mrcz.egg-info/requires.txt
mrcz.egg-info/top_level.txt

mrcz-0.5.7/mrcz.egg-info/requires.txt

numpy

mrcz-0.5.7/mrcz.egg-info/top_level.txt

mrcz

mrcz-0.5.7/optional-requirements.txt

blosc>=1.4

mrcz-0.5.7/setup.cfg

[egg_info]
tag_build =
tag_date = 0

mrcz-0.5.7/setup.py

# -*- coding: utf-8 -*-
########################################################################
#
# mrcz compressed MRC-file format package
# License: BSD-3-clause
# Created: 02 November 2016
# Author:  See AUTHORS.txt
#
########################################################################

from __future__ import print_function
import sys
from setuptools import setup

########### Check installed versions ##########
def exit_with_error(message):
    print('ERROR: %s' % message)
    sys.exit(1)

# Setup requirements
setup_requires = []
install_requires = ['numpy']

# Check for Python
if sys.version_info[0] == 2:
    if sys.version_info[1] < 7:
        exit_with_error("You need Python 2.7 or greater to install mrcz")
    else:
        # For concurrent.futures we need the backport in Py2.7
        install_requires.append('futures')
elif sys.version_info[0] == 3:
    if sys.version_info[1] < 5:
        exit_with_error("You need Python 3.5 or greater to install mrcz")
else:
    exit_with_error("You need Python 2.7/3.5 or greater to install mrcz")
########### End of checks ##########

#### MRCZ version ####
major_ver = 0
minor_ver = 5
nano_ver = 7
branch = ''

VERSION = "%d.%d.%d%s" % (major_ver, minor_ver, nano_ver, branch)

# Create the version.py file
open('mrcz/__version__.py', 'w').write('__version__ = "%s"\n' % VERSION)

# Global variables
classifiers = """\
Development Status :: 4 - Beta
Intended Audience :: Developers
Intended Audience :: Information Technology
Intended Audience :: Science/Research
License :: OSI Approved :: BSD License
Programming Language :: Python
Programming Language :: Python :: 2.7
Programming Language :: Python :: 3.5
Programming Language :: Python :: 3.6
Programming Language :: Python :: 3.7
Programming Language :: Python :: 3.8
Topic :: Software Development :: Libraries :: Python Modules
Topic :: System :: Archiving :: Compression
Operating System :: Microsoft :: Windows
Operating System :: Unix
"""

setup(name = "mrcz",
      version = VERSION,
      description = 'MRCZ meta-compressed image file-format library',
      long_description = """\
MRCZ is a highly optimized compressed version of the popular electron
microscopy MRC image format. It uses the Blosc meta-compressor library as a
backend. It can use a number of high-performance lossless compression codecs
such as 'lz4' and 'zstd', it can apply bit-shuffling filters, and it performs
compression in a blocked, multi-threaded fashion to take advantage of modern
multi-core CPUs.
""",
      classifiers = [c for c in classifiers.split("\n") if c],
      author = 'Robert A. McLeod, Ricardo Righetto',
      author_email = 'robbmcleod@gmail.com',
      url = 'http://github.com/em-MRCZ/python-mrcz',
      license = 'https://opensource.org/licenses/BSD-3-Clause',
      platforms = ['any'],
      setup_requires = setup_requires,
      install_requires = install_requires,
      packages = ['mrcz'],
      )
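The long_description above summarizes what the library does; as a concrete illustration, here is a minimal round-trip sketch built from the ``writeMRC``/``readMRC`` calls exercised in ``mrcz/test_mrcz.py``. The file name ``example.mrcz`` and the array contents are arbitrary placeholders::

    import numpy as np
    import mrcz

    # Write a small float32 stack with zstd compression and acquisition
    # metadata, mirroring the keyword arguments used by the test suite.
    image = np.random.normal(size=[2, 128, 64]).astype('float32')
    mrcz.writeMRC(image, 'example.mrcz', compressor='zstd',
                  pixelsize=[1.2, 2.6, 3.4], pixelunits=u'\\AA',
                  voltage=300.0, C3=2.7, gain=1.05)

    # readMRC returns the image array plus a metadata dict, whose keys
    # ('voltage', 'pixelsize', ...) are asserted in the tests above.
    rereadImage, header = mrcz.readMRC('example.mrcz')
    assert header['voltage'] == 300.0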