Microscopy Image Analysis

Pico-Algae Detection and Counting

A microscopy detection pipeline that counts pico-algae cells from paired brightfield and fluorescence image channels.

This repository implements an end-to-end deep learning workflow for detecting and counting pico-algae in microscopy imagery. The core model is a custom 6-channel Faster R-CNN that fuses paired `og` and `red` microscope images. Around that model, the repository provides training, hyperparameter tuning, post-processing sweeps, and batch visualization of predicted bounding boxes.

Overview

The project is built around dense small-object detection, where manual counting is slow and subject to variability. It targets paired microscopy captures and predicts cell-level boxes for four foreground classes: `EUK`, `FE`, `FC`, and `colony`.

The repository includes data preparation scripts, manifest generation, image resizing to a fixed 2048x1500 format, annotation sanity checks, training and inference entrypoints, and saved run artifacts under `runs/` and `reports/`.

The modeling work goes beyond a default detector setup. The Faster R-CNN backbone is patched from 3 input channels to 6 so the network can ingest both image channels together, and the repo includes separate sweeps for model settings and post-processing thresholds.

The saved outputs show that the project is an analysis workflow as well as a training prototype: example detections, debug overlays, EDA plots, and tuning CSVs are all present in the repository.

Problem

Manual pico-algae counting in microscopy images is slow and error-prone.

Cells are small, dense, and sometimes overlapping, which makes classical counting approaches brittle.

The workflow needs to use paired `og` and `red` images rather than a single RGB frame.

Counting quality depends not only on training but also on score thresholds and NMS settings in crowded scenes.

What I Built

Built the training and batch inference workflow around a Faster R-CNN detector in PyTorch.

Adapted a ResNet50-FPN Faster R-CNN backbone from 3-channel input to a 6-channel fusion model for paired microscopy images.

Implemented dataset indexing, pair discovery, preprocessing, annotation conversion, and visualization utilities.

Added evaluation code focused on count-based metrics and ran separate tuning sweeps for model settings and post-processing.

Approach

Preprocessed raw microscopy pairs into a consistent 2048x1500 WEBP dataset and rescaled bounding boxes into absolute-pixel labels.
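The label conversion step can be sketched as follows. This is a minimal illustration that assumes the raw annotations are YOLO-style normalized `(cx, cy, w, h)` tuples, which the text does not state; `yolo_to_xyxy` is a hypothetical helper name, not a function from the repository.

```python
from typing import Tuple

TARGET_W, TARGET_H = 2048, 1500  # fixed output size named in the text


def yolo_to_xyxy(
    cx: float, cy: float, w: float, h: float,
    width: int = TARGET_W, height: int = TARGET_H,
) -> Tuple[float, float, float, float]:
    """Convert a normalized center-format box to absolute-pixel corners.

    Assumes the inputs are fractions of the image width and height
    (an assumption about the raw label format, not a documented fact).
    """
    x_min = (cx - w / 2) * width
    y_min = (cy - h / 2) * height
    x_max = (cx + w / 2) * width
    y_max = (cy + h / 2) * height
    # Clamp to the image so boxes that touch the border stay valid.
    return (
        max(0.0, x_min),
        max(0.0, y_min),
        min(float(width), x_max),
        min(float(height), y_max),
    )
```

Because normalized coordinates are independent of the source resolution, the same function would apply regardless of the original capture size.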

Validated dataset integrity with manifest generation and sanity checks for missing pairs, missing labels, and image-size mismatches.
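A pair and label check of this kind might look like the sketch below. The file naming scheme (`<id>_og.webp`, `<id>_red.webp`, `<id>.txt`) and the function name `check_pairs` are assumptions made for illustration; the repository's actual layout may differ. The image-size check is omitted to keep the example free of image libraries.

```python
from pathlib import PurePath
from typing import Dict, Iterable, List, Set


def check_pairs(
    image_names: Iterable[str], label_names: Iterable[str]
) -> Dict[str, List[str]]:
    """Report sample ids with a missing channel or a missing label file.

    Assumed (hypothetical) naming: `<id>_og.webp`, `<id>_red.webp`, `<id>.txt`.
    """
    og: Set[str] = set()
    red: Set[str] = set()
    for name in image_names:
        stem = PurePath(name).stem
        if stem.endswith("_og"):
            og.add(stem[: -len("_og")])
        elif stem.endswith("_red"):
            red.add(stem[: -len("_red")])
    labels = {PurePath(name).stem for name in label_names}
    paired = og & red  # only complete pairs are expected to carry labels
    return {
        "missing_red": sorted(og - red),
        "missing_og": sorted(red - og),
        "missing_label": sorted(paired - labels),
    }
```

Working on plain file names rather than the filesystem keeps the check easy to test; in practice the names could come from `Path.iterdir()` on the image and label directories.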

Trained a 6-channel Faster R-CNN (`ResNet50-FPN`) initialized from COCO weights with an expanded first convolution layer.
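One common way to expand a pretrained first convolution from 3 to 6 input channels is to tile the pretrained kernels and rescale them. The sketch below shows that technique in plain PyTorch; whether the repository initializes the extra channels this way is an assumption, and `expand_first_conv` is a hypothetical name.

```python
import torch
from torch import nn


def expand_first_conv(conv: nn.Conv2d, in_channels: int = 6) -> nn.Conv2d:
    """Return a new conv accepting `in_channels` inputs, seeded from `conv`.

    Pretrained kernels are tiled across the new input channels and scaled
    so that feeding the same image twice reproduces the original output.
    This initialization scheme is an assumption, not the documented method.
    """
    if conv.groups != 1:
        raise ValueError("this sketch only handles groups == 1")
    new_conv = nn.Conv2d(
        in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        reps = -(-in_channels // conv.in_channels)  # ceiling division
        tiled = conv.weight.repeat(1, reps, 1, 1)[:, :in_channels]
        # Rescale so activation magnitudes stay close to the pretrained ones.
        new_conv.weight.copy_(tiled * (conv.in_channels / in_channels))
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```

In torchvision's Faster R-CNN, the first backbone convolution is exposed as `model.backbone.body.conv1`, and the model's input transform stores 3-element `image_mean` and `image_std` lists that would also need six entries. How the repository handles these details is not stated here.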

Evaluated the detector using count-based metrics such as count MAE, RMSE, and bias, with optional class filtering.
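The count metrics named above have standard definitions, sketched here over per-image predicted and true counts. The sign convention for bias (predicted minus true, so a positive value means over-counting) is an assumption, as is the function name `count_metrics`.

```python
import math
from typing import Dict, Sequence


def count_metrics(
    pred_counts: Sequence[int], true_counts: Sequence[int]
) -> Dict[str, float]:
    """Compute count MAE, RMSE, and bias over a set of images."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    # Signed per-image error; positive means the detector over-counted.
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "bias": sum(errors) / n,
    }
```

Class filtering would amount to counting only the detections and labels of the selected classes before calling a function like this.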

Tuned anchor settings, backbone depth, learning rates, detection caps, score thresholds, and NMS thresholds through saved sweep configs and CSV outputs.
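The post-processing knobs (score threshold, NMS threshold, detection cap) interact as in the following dependency-free sketch. It uses greedy, class-agnostic NMS for brevity; the repository's detector may apply NMS per class, and the names `iou` and `postprocess` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def postprocess(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_thresh: float = 0.5,
    nms_thresh: float = 0.5,
    max_dets: Optional[int] = None,
) -> List[int]:
    """Return indices of kept detections; the predicted count is len(result)."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
            if max_dets is not None and len(keep) >= max_dets:
                break
    return keep
```

In crowded scenes a low NMS threshold merges neighbouring cells and under-counts, while a high one keeps duplicate boxes and over-counts, which is why these settings are worth sweeping separately from training.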

Results

The repository contains a complete, reproducible workflow from dataset preparation to batch prediction visualizations.

The processed training index contains 249 paired image samples with 16,181 labeled boxes.

A saved 5-fold training sweep reached a best mean count MAE of 3.94 before post-processing optimization.

A saved 5-fold post-processing sweep reduced the best mean count MAE to 2.42 with a standard deviation of 0.50.
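A cross-validated sweep is typically summarized by the mean and standard deviation of the per-fold MAE for each configuration, as sketched below. The configuration keys, the use of the sample standard deviation, and the name `summarize_sweep` are assumptions; the numbers in the usage example are placeholders, not results from this project.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Config = Tuple[float, float]  # hypothetical (score_thresh, nms_thresh) pair


def summarize_sweep(
    fold_mae: Dict[Config, List[float]]
) -> List[Tuple[Config, float, float]]:
    """Rank configurations by mean per-fold count MAE (lowest first).

    Each row is (config, mean MAE, sample standard deviation across folds).
    """
    rows = [(cfg, mean(vals), stdev(vals)) for cfg, vals in fold_mae.items()]
    return sorted(rows, key=lambda row: row[1])


# Placeholder values for illustration only.
toy = {(0.3, 0.5): [2.0, 2.0, 5.0], (0.5, 0.3): [1.0, 2.0, 3.0]}
best_cfg, best_mean, best_std = summarize_sweep(toy)[0]
```

Reporting the spread alongside the mean helps show whether the best configuration is consistently better across folds or only better on average.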

Example outputs, EDA plots, debug overlays, and batch prediction renders are all included as inspectable artifacts.

Processed Samples

249

From `data/processed/dataset_2048x1500_webp/index.csv`.

Labeled Boxes

16,181

Sum of `n_boxes` in the processed dataset index.

Best Count MAE

2.42

Best `mean_count_mae` in `runs/tuning/post_best_mae/tuning_post_results.csv`.

Foreground Classes

4

`EUK`, `FE`, `FC`, `colony`.

Visuals

Visual assets are listed below by their repository paths.

Artifact Reference

docs/pipeline_diagram.png

Pipeline diagram for the pico-algae workflow.

examples/detection_results/Example_detection_2.png

Additional pico-algae detection output.

examples/detection_results/Example_detection_3.png

Additional pico-algae detection output.

runs/predict_run01_6ch/Image_11554_pred.png

Predicted detections on a microscopy sample.

Charts & Figures

Saved figures and chart artifacts referenced by the project.

Artifact Reference

reports/eda/objects_per_image.png

Object count distribution per image.

reports/eda/bbox_width_all.png

Bounding-box width distribution.

reports/eda/bbox_height_all.png

Bounding-box height distribution.

Chart Data

Annotation Distribution By Class

Computed from `data/processed/manifest.csv` using the raw class ids defined in the dataset code.

EUK: 979
FE: 9,634
FC: 5,458
Colony: 110
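As a quick consistency check, the per-class counts above sum to the 16,181 labeled boxes reported for the processed index. The short script below reproduces that arithmetic and prints each class's share; the counts are copied from the chart data, and the variable names are illustrative.

```python
# Per-class annotation counts copied from the chart data above.
class_counts = {"EUK": 979, "FE": 9634, "FC": 5458, "colony": 110}

total = sum(class_counts.values())  # 16181, matching the labeled-box total
shares = {name: n / total for name, n in class_counts.items()}

for name, n in class_counts.items():
    print(f"{name:>6}: {n:>5} ({shares[name]:.1%})")
```

The distribution is strongly imbalanced: `FE` and `FC` dominate, while `colony` accounts for well under 1% of boxes.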

Best Mean Count MAE By Tuning Stage

Compares the best saved result from model-configuration tuning against the best saved post-processing sweep.

Train Tuning: 3.9364
Post Tuning: 2.4239