Filetypes and metadata

Last updated on 2025-10-29 | Edit this page

Estimated time: 55 minutes

Overview

Questions

  • Which file formats should be used for microscopy images?

Objectives

  • Explain the pros and cons of some popular image file formats
  • Explain the difference between lossless and lossy compression
  • Inspect image metadata with the BioIO package
  • Inspect and set an image’s scale in Napari

Image file formats


Images can be saved in a wide variety of file formats. You may be familiar with many of these like JPEG (files ending with .jpg or .jpeg extension), PNG (.png extension) or TIFF (.tiff or .tif extension). Microscopy images have an especially wide range of options, with hundreds of different formats in use, often specific to particular microscope manufacturers. With this being said, how do we choose which format(s) to use in our research? It’s worth bearing in mind that the best format to use will vary depending on your research question and experimental workflow - so you may need to use different formats on different research projects.

Metadata


First, let’s look closer at what gets stored inside an image file.

There are two main things that get stored inside image files: pixel values and metadata. We’ve looked at pixel values in previous episodes (What is an image?, Image display and Multi-dimensional images episodes) - this is the raw image data as an array of numbers with specific dimensions and data type. The metadata, on the other hand, is a wide variety of additional information about the image and how it was acquired.

For example, let’s take a look at the metadata in the ‘Plate1-Blue-A-12-Scene-3-P3-F2-03.czi’ file we downloaded as part of the setup instructions. To browse the metadata we will use the BioIO package, along with its corresponding napari plugin: napari-bioio-reader. BioIO allows a wide variety of file formats to be opened in python and Napari that aren’t supported by default. BioIO was already installed in the setup instructions, so you should be able to start using it straight away.

Let’s open the ‘Plate1-Blue-A-12-Scene-3-P3-F2-03.czi’ file by removing any existing image layers, then dragging and dropping it onto the canvas. If a pop-up menu appears, select ‘BioIO Reader’. Note this can take a while to open, so give it some time! Alternatively, you can select in Napari’s top menu-bar:
File > Open File(s)...

A screenshot of yeast sample data shown in  Napari

This image is part of a published dataset on Image Data Resource (accession number idr0011), and comes from the OME sample data. It is a 3D fluorescence microscopy image of yeast with three channels (we’ll explore what these channels represent in the Exploring metadata exercise).

In this case, Napari hasn’t automatically recognised the three channels - so it is loaded as one combined image layer. Recall that we looked at how Napari handles channels in the multi-dimensional images episode.

To split the image into its individual channels, and to browse its metadata, let’s explore with BioIO in napari’s console.

Open the console and run:

PYTHON

from bioio import BioImage

# get the file path from the image layer
image_path = viewer.layers[0].source.path
image = BioImage(image_path)

Now we can explore some of the image’s properties:

PYTHON

print(image.dims) # print size of each axis (in pixels)

OUTPUT

<Dimensions [T: 1, C: 3, Z: 21, Y: 512, X: 672]>

BioIO always loads images as (t, c, z, y, x) - time, then channels, then the 3 spatial image dimensions. You’ll notice this image has a ‘time’ axis with length 1 i.e. this dataset only represents a single timepoint.

We can browse a small summary of common metadata with:

PYTHON

print(image.standard_metadata)

This includes the image dimensions, the date/time the image was acquired, as well as information about the pixel size (pixel_size_x, pixel_size_y, pixel_size_z). The pixel size is essential for making accurate quantitative measurements from our images, and will be discussed in the next section of this episode.

Callout

Exploring metadata in the console

You can use python’s rich library to add colour-coding to make the output easier to read:

PYTHON

import rich
rich.print(image.standard_metadata)
Screenshot of metadata printed to Napari's  console

If you’d prefer not to have colour-coding, but keep the clearer formatting of one item per line - you can run this instead:

PYTHON

import pprint
pprint.pp(image.standard_metadata)

We can browse the full metadata with:

PYTHON

metadata = image.ome_metadata
print(metadata)

OUTPUT

OME(
   experimenters=[{'id': 'Experimenter:0', 'user_name': 'Zeiss'}],
   instruments=[<1 field_type>],
   images=[<1 field_type>],
)

In this case, we can see it is split into three categories: experimenters, images and instruments. You can look inside each of these using .category e.g.:

PYTHON

print(metadata.instruments)

The metadata is nested, so we can keep going deeper e.g.

PYTHON

print(metadata.instruments[0].objectives)

This metadata is a vital record of exactly how the image was acquired and what it represents. As we’ve mentioned in previous episodes - it’s important to maintain this metadata as a record for the future. It’s also essential to allow us to make quantitative measurements from our images, by understanding factors like the pixel size. Converting between file formats can result in loss of metadata, so it’s always worthwhile keeping a copy of your image safely in its original raw file format. You should take extra care to ensure that additional processing steps don’t overwrite this original image.

Challenge

Exploring metadata

Explore the metadata in the console to answer the following questions:

  • What type of microscope was used to take this image?
  • What detector was used to take this image?
  • What does each channel show? For example, which fluorophore is used? What is its excitation and emission wavelength? (hint: look inside the image’s pixel metadata: metadata.images[0].pixels)

Microscope model

Under metadata.instruments[0].microscope, we can see that an ‘Axio Imager Z2’ microscope was used.

Detector

Under metadata.instruments[0].detectors, we can see that this used an HDCamC10600-30B (ORCA-R2) detector.

Channels

Under metadata.images[0].pixels.channels, we can see one entry per channel - Channel:0:0, Channel:0:1 and Channel:0:2.

In the first (metadata.images[0].pixels.channels[0]), we can see that its name is TagYFP, a fluorophore with emission_wavelength of 524nm and excitation_wavelength of 508nm.

In the first (metadata.images[0].pixels.channels[1]), we can see that its name is mRFP1.2, a fluorophore with emission_wavelength of 612nm and excitation_wavelength of 590nm.

In the first (metadata.images[0].pixels.channels[2]), we see that its name is TL DIC and no emission_wavelength or excitation_wavelength is listed. Its illumination_type is listed as ‘transmitted’, so this is simply an image of the yeast cells with no fluorophores used.

Callout

BioIO image / metadata support

BioIO supports various file formats via its different readers. During the setup instructions, we installed the bioio-czi reader to allow us to open the czi image for this lesson.

If you want to open more types of file, you will have to install the relevant reader following the intructions in the BioIO docs. Note the bioio-bioformats reader should allow a wide range of file formats to be read.

If you have difficulty opening a specific file format with BioIO, it’s worth trying to open it in Fiji also. Fiji has very well established integration with Bio-Formats, and so tends to support a wider range of formats. Note that you can always save your image (or part of your image) as another format like .tiff via Fiji to open in Napari later (making sure you still retain a copy of the original file and its metadata!)

Pixel size


One of the most important pieces of metadata is the pixel size. In our .czi image, this is stored as:

PYTHON

print(image.physical_pixel_sizes)

OUTPUT

PhysicalPixelSizes(Z=0.35, Y=0.20476190476190476, X=0.20476190476190476)

The pixel size states how large a pixel is in physical units i.e. ‘real world’ units of measurement like micrometre, or millimetre. In this case the unit is ‘μm’ (micrometre). This means that each pixel has a size of 0.20μm (x axis), 0.20μm (y axis) and 0.35μm (z axis). As this image is 3D, you will sometimes hear the pixels referred to as ‘voxels’, which is just a term for a 3D pixel.

The pixel size is important to ensure that any measurements made on the image are correct. For example, how long is a particular cell? Or how wide is each nucleus? These answers can only be correct if the pixel size is properly set. It’s also useful when we want to overlay different images on top of each other (potentially acquired with different kinds of microscope) - setting the pixel size appropriately will ensure their overall size matches correctly in the Napari viewer.

Let’s open the image in napari again, split by individual channels and with the pixel size set correctly. First, remove the Plate1-Blue-A-12-Scene-3-P3-F2-03.czi layer, then run:

PYTHON

# Get the numpy array and remove the un-needed time axis
image_np = image.data[0,]

# Display it, giving the correct 'channel_axis' to split on + pixel size
viewer.add_image(image_np, rgb=False, channel_axis=0, scale=[0.35, 0.2047619, 0.2047619])

Each image layer in Napari has a .scale which is equivalent to the pixel size. We can check this is set correctly with:

PYTHON

# Get the first image layer
image_layer = viewer.layers[0]

# Print its scale
print(image_layer.scale)

OUTPUT

# [z y x]
[0.35 0.2047619 0.2047619]
Callout

Automatically setting scale

Depending on the BioIO reader you use, you may find that .scale is set automatically from the image metadata when you open it in napari (via drag and drop or File > Open File(s)...).

It’s always worth checking the .scale property is set correctly before making measurements from your images. You can set the .scale of an already open image layer with:

PYTHON

image_layer.scale = (0.35, 0.2047619, 0.2047619)
Challenge

Pixel size / scale

Copy and paste the following into Napari’s console to get image_layer_1, image_layer_2 and image_layer_3 of the yeast image:

PYTHON

image_layer_1 = viewer.layers[0]
image_layer_2 = viewer.layers[1]
image_layer_3 = viewer.layers[2]
  1. Check the .scale of each layer - are they the same?
  2. Set the scale of layer 3 to (0.35, 0.4, 0.4) - what changes in the viewer?
  3. Set the scale of layer 3 to (0.35, 0.4, 0.2047619) - what changes in the viewer?
  4. Set the scale of all layers so they are half as wide and half as tall as their original size in the viewer

1

All layers have the same scale

PYTHON

print(image_layer_1.scale)
print(image_layer_2.scale)
print(image_layer_3.scale)

OUTPUT

[0.35 0.2047619 0.2047619]
[0.35 0.2047619 0.2047619]
[0.35 0.2047619 0.2047619]

2

Yeast image shown in Napari with layer 3  twice as big in y and x

PYTHON

image_layer_3.scale = (0.35, 0.4, 0.4)

You should see that layer 3 becomes about twice as wide and twice as tall as the other layers. This is because we set the pixel size in y and x (which used to be 0.2047619μm) to about twice its original value (now 0.4μm).

3

Yeast image shown in Napari with layer 3  twice as big in y

PYTHON

image_layer_3.scale = (0.35, 0.4, 0.2047619)

You should see that layer 3 appears squashed - with the same width as other layers, but about twice the height. This is because we set the pixel size in y (which used to be 0.2047619μm) to about twice its original value (now 0.4μm). Bear in mind that setting the pixel sizes inappropriately can lead to stretched or squashed images like this!

4

Yeast image shown in Napari with all layers  half size in y/x

We set the pixel size in y/x to half its original value of 0.2047619μm:

PYTHON

image_layer_1.scale = (0.35, 0.10238095, 0.10238095)
image_layer_2.scale = (0.35, 0.10238095, 0.10238095)
image_layer_3.scale = (0.35, 0.10238095, 0.10238095)

Choosing a file format


Now that we’ve seen an example of browsing metadata in Napari, let’s look more closely into how we can decide on a file format. There are many factors to consider, including:

Dimension support

Some file formats will only support certain dimensions e.g. 2D, 3D, a specific number of channels… For example, .png and .jpg only support 2D images (either grayscale or RGB), while .tiff can support images with many more dimensions (including any number of channels, time, 2D and 3D etc.).

Metadata support

As mentioned above, image metadata is a very important record of how an image was acquired and what it represents. Different file formats have different standards for how to store metadata, and what kind of metadata they accept. This means that converting between file formats often results in loss of some metadata.

Compatibility with software

Some image analysis software will only support certain image file formats. If a format isn’t supported, you will likely need to make a copy of your data in a new format.

Proprietary vs open formats

Many light microscopes will save data automatically into their own proprietary file formats (owned by the microscope company). For example, Zeiss microscopes often save files to a format with a .czi extension, while Leica microscopes often use a format with a .lif extension. These formats will retain all the metadata used during acquisition, but are often difficult to open in software that wasn’t created by the same company.

Bio-Formats is an open-source project that helps to solve this problem - allowing over 100 file formats to be opened in many pieces of open source software. BioIO (that we used earlier in this episode) integrates with Bio-Formats to allow many different file formats to be opened in Napari. Bio-Formats is really essential to allow us to work with these multitude of formats! Even so, it won’t support absolutely everything, so you will likely need to convert your data to another file format sometimes. If so, it’s good practice to use an open file format whose specification is freely available, and can be opened in many different pieces of software e.g. OME-TIFF.

Compression

Different file formats use different types of ‘compression’. Compression is a way to reduce image file sizes, by changing the way that the image pixel values are stored. There are many different compression algorithms that compress files in different ways.

Many compression algorithms rely on finding areas of an image with similar pixel values that can be stored in a more efficient way. For example, imagine a row of 30 pixels from an 8-bit grayscale image:

Diagram of a line of 30 pixels - 10 with pixel  value 50, then 10 with pixel value 100, then 10 with pixel value 150

Normally, this would be stored as 30 individual pixel values, but we can reduce this greatly by recognising that many of the pixel values are the same. We could store the exact same data with only 6 values: 10 50 10 100 10 150, showing that there are 10 values with pixel value 50, then 10 with value 100, then 10 with value 150. This is the general idea behind ‘run-length encoding’ and many compression algorithms use similar principles to reduce file sizes.

There are two main types of compression:

  • Lossless compression algorithms are reversible - when the file is opened again, it can be reversed perfectly to give the exact same pixel values.

  • Lossy compression algorithms reduce the size by irreversibly altering the data. When the file is opened again, the pixel values will be different to their original values. Some image quality may be lost via this process, but it can achieve much smaller file sizes.

For microscopy data, you should therefore use a file format with no compression, or lossless compression. Lossy compression should be avoided as it degrades the pixel values and may alter the results of any analysis that you perform!

Challenge

Compression

Let’s remove all layers from the Napari viewer, and open the ‘00001_01.ome.tiff’ dataset. You should have already downloaded this to your working directory as part of the setup instructions.

Run the code below in Napari’s console. This will save a specific timepoint of this image (time = 30) as four different file formats (.tiff, .png + low and high quality .jpg). Note: these files will be written to the same folder as the ‘00001_01.ome.tiff’ image!

PYTHON

import imageio.v3 as iio
from pathlib import Path

# Get the 00001_01.ome layer, and get timepoint = 30
layer = viewer.layers["00001_01.ome"]
image = layer.data[30, :, :]

# Save as different file formats in same folder as 00001_01.ome
folder_path = Path(layer.source.path).parent
iio.imwrite( folder_path / "test-tiff.tiff", image)
iio.imwrite( folder_path / "test-png.png", image)
iio.imwrite( folder_path / "test-jpg-high-quality.jpg", image, quality=75)
iio.imwrite( folder_path / "test-jpg-low-quality.jpg", image, quality=30)
  • Go to the folder where ‘00001_01.ome.tiff’ was saved and look at the file sizes of the newly written images (test-tiff, test-png, test-jpg-high-quality and test-jpg-low-quality). Which is biggest? Which is smallest?

  • Open all four images in Napari. Zoom in very close to a bright nucleus, and try showing / hiding different layers with the A screenshot of Napari's eye button icon. How do they differ? How does each compare to timepoint 30 of the original ‘00001_01.ome’ image?

  • Which file formats use lossy compression?

File sizes

‘test_tiff’ is largest, followed by ‘test-png’, then ‘test-jpg-high-quality’, then ‘test-jpg-low-quality’.

Differences to original

By showing/hiding different layers, you should see that ‘test-tiff’ and ‘test-png’ look identical to the original image (when its slider is set to timepoint 30). In contrast, both jpg files show blocky artefacts around the nuclei - the pixel values have clearly been altered. This effect is worse in the low quality jpeg than the high quality one.

Lossy compression

Both jpg files use lossy compression, as they alter the original pixel values. The low quality jpeg compresses the file more than the high quality jpeg, resulting in smaller file sizes, but also worse alteration of the pixel values. The rest of the file formats (.tiff / .png) use no compression, or lossless compression - so their pixel values are identical to the original values.

Handling of large image data

If your images are very large, you may need to use a pyramidal file format that is specialised for handling them. Pyramidal file formats store images at multiple resolutions (and usually in small chunks) so that they can be browsed smoothly without having to load all of the full-resolution data. This is similar to how google maps allows browsing of its vast quantities of map data. Specialised software like QuPath, Fiji’s BigDataViewer and OMERO’s viewer can provide smooth browsing of these kinds of images.

See below for an example image pyramid (using napari’s Cells (3D + 2Ch) sample image) with three different resolution levels stored. Each level is about twice as small the last in x and y:

Diagram of an image pyramid with three  resolution levels

Common file formats


As we’ve seen so far, there are many different factors to consider when choosing file formats. As different file formats have various pros and cons, it is very likely that you will use different formats for different purposes. For example, having one file format for your raw acquisition data, another for some intermediate analysis steps, and another for making diagrams and figures for display. Pete Bankhead’s bioimage book has a great chapter on Files & file formats that explores this in detail.

To finish this episode, let’s look at some common file formats you are likely to encounter (this table is from Pete Bankhead’s bioimage book which is released under a CC-BY 4.0 license):

Format Extensions Main use Compression Comment
TIFF .tif, .tiff Analysis, display (print) None, lossless, lossy Very general image format
OME-TIFF .ome.tif, .ome.tiff Analysis, display (print) None, lossless, lossy TIFF, with standardized metadata for microscopy
Zarr .zarr Analysis None, lossless, lossy Emerging format, great for big datasets – but limited support currently
PNG .png Display (web, print) Lossless Small(ish) file sizes without compression artefacts
JPEG .jpg, .jpeg Display (web) Lossy (usually) Small file sizes, but visible artefacts

Note that there are many, many proprietary microscopy file formats in addition to these! You can get a sense of how many by browsing Bio-Formats list of supported formats.

You’ll also notice that many file formats support different types of compression e.g. none, lossless or lossy (as well as different compression settings like the ‘quality’ we saw on jpeg images earlier). You’ll have to make sure you are using the right kind of compression when you save images into these formats.

To summarise some general recommendations:

  • During acquisition, it’s usually a good idea to use whatever the standard proprietary format is for that microscope. This will ensure you retain as much metadata as possible, and have maximum compatibility with that company’s acquisition and analysis software.

  • During analysis, sometimes you can directly use the format you acquired your raw data in. If it’s not supported, or you need to save new images of e.g.  sub-regions of an image, then it’s a good idea to switch to one of the formats in the table above specialised for analysis (TIFF and OME-TIFF are popular choices).

  • Finally, for display, you will need to use common file formats that can be opened in any imaging software (not just scientific) like png, jpeg or tiff. Note that jpeg usually uses lossy compression, so it’s only advisable if you need very small file sizes (for example, for displaying many images on a website).

Key Points
  • Image files contain pixel values and metadata.
  • Metadata is an important record of how an image was acquired and what it represents.
  • BioIO allows many more image file formats to be opened in Napari, along with providing access to some metadata.
  • Pixel size states how large a pixel is in physical units (e.g. micrometre).
  • Compression can be lossless or lossy - lossless is best for microscopy images.
  • There are many, many different microscopy file formats. The best format to use depends on your use-case e.g. acquisition, analysis or display.