Microscopy Image Analysis
A microscopy detection pipeline that counts pico-algae cells from paired brightfield and fluorescence image channels.
This repository implements an end-to-end deep learning workflow for detecting and counting pico-algae in microscopy imagery. The core model is a custom 6-channel Faster R-CNN that fuses paired `og` and `red` microscope images. Around it, the repository provides training, hyperparameter tuning, post-processing sweeps, and batch visualization of predicted bounding boxes.
The project is built around dense small-object detection, where manual counting is slow and subject to variability. It targets paired microscopy captures and predicts cell-level boxes for four foreground classes: `EUK`, `FE`, `FC`, and `colony`.
The repository includes data preparation scripts, manifest generation, image resizing to a fixed 2048x1500 format, annotation sanity checks, training and inference entrypoints, and saved run artifacts under `runs/` and `reports/`.
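Resizing to a fixed 2048x1500 frame means the annotation boxes have to be scaled by the same factors as the pixels. A minimal sketch of that step, assuming boxes are absolute-pixel `(x1, y1, x2, y2)` tuples and that the resize is a plain stretch with no cropping or padding. The name `rescale_box` and the 4096x3000 source size are hypothetical and not taken from the repository:

```python
TARGET_W, TARGET_H = 2048, 1500  # fixed output size named in the text


def _clamp(value, low, high):
    """Keep a coordinate inside [low, high]."""
    return max(low, min(value, high))


def rescale_box(box, src_size, dst_size=(TARGET_W, TARGET_H)):
    """Scale an (x1, y1, x2, y2) pixel box from src_size to dst_size.

    Assumes a plain stretch resize, so x and y get independent factors.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    sx, sy = dst_w / src_w, dst_h / src_h
    x1, y1, x2, y2 = box
    return (
        _clamp(x1 * sx, 0.0, dst_w),
        _clamp(y1 * sy, 0.0, dst_h),
        _clamp(x2 * sx, 0.0, dst_w),
        _clamp(y2 * sy, 0.0, dst_h),
    )


# Hypothetical 4096x3000 capture: both axes shrink by a factor of 0.5.
print(rescale_box((100, 200, 140, 260), src_size=(4096, 3000)))
# (50.0, 100.0, 70.0, 130.0)
```

If the source labels were stored in normalized coordinates instead, the same idea applies with the target width and height as the multipliers.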
The modeling work goes beyond a default detector setup. The Faster R-CNN backbone is patched from 3 input channels to 6 so the network can ingest both image channels together, and the repo includes separate sweeps for model settings and post-processing thresholds.
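The source does not say how the extra input channels are initialized. One common approach is to reuse the pretrained 3-channel kernels for the new channels and rescale them so the layer's output magnitude stays comparable. The sketch below shows only that weight arithmetic on nested lists, without any framework. `expand_input_channels` is a hypothetical helper, and the copy-and-rescale scheme is an assumption rather than the repository's confirmed method:

```python
def expand_input_channels(weight, new_in=6):
    """Expand conv weights shaped [out_ch][in_ch][kh][kw] to new_in input channels.

    Pretrained kernels are reused cyclically (channel c copies channel
    c % old_in) and scaled by old_in / new_in, so feeding the same RGB
    image twice reproduces the original layer's response.
    """
    expanded = []
    for out_filter in weight:
        old_in = len(out_filter)
        scale = old_in / new_in
        new_filter = []
        for c in range(new_in):
            source_kernel = out_filter[c % old_in]
            new_filter.append([[v * scale for v in row] for row in source_kernel])
        expanded.append(new_filter)
    return expanded


# Toy layer: 1 output filter, 3 input channels, 1x1 kernels.
rgb_weight = [[[[1.0]], [[2.0]], [[3.0]]]]
six_channel = expand_input_channels(rgb_weight)
print([kernel[0][0] for kernel in six_channel[0]])
# [0.5, 1.0, 1.5, 0.5, 1.0, 1.5]
```

In PyTorch the same arithmetic would be applied to the `weight` tensor of the backbone's first convolution. With torchvision's Faster R-CNN, the normalization mean and std in the model transform would most likely also need six entries, though how this repository handles that is not stated here.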
Saved outputs show that the project is an analysis workflow as well as a training prototype: the repository includes example detections, debug overlays, EDA plots, and tuning CSVs.
Manual pico-algae counting in microscopy images is slow and error-prone.
Cells are small, dense, and sometimes overlapping, which makes classical counting methods brittle.
The workflow needs to use paired `og` and `red` images rather than a single RGB frame.
Counting quality depends not only on training but also on score thresholds and NMS settings in crowded scenes.
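To see why score and NMS thresholds matter for counts in crowded scenes, here is a minimal pure-Python sketch of score filtering followed by greedy non-maximum suppression. The function names, default thresholds, and toy boxes are illustrative assumptions, not values from the repository:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, score_thresh=0.5, nms_thresh=0.5):
    """Return indices of kept detections, highest score first."""
    # 1) Drop low-confidence detections, then sort the rest by score.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thresh),
        key=lambda i: scores[i],
        reverse=True,
    )
    # 2) Greedy NMS: keep a box only if it does not overlap a kept box too much.
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes (IoU about 0.68), one isolated box, one weak box.
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (100, 100, 120, 120), (200, 200, 220, 220)]
scores = [0.9, 0.8, 0.7, 0.2]

print(len(postprocess(boxes, scores, 0.5, 0.5)))  # 2: the overlapping pair is merged
print(len(postprocess(boxes, scores, 0.5, 0.7)))  # 3: a looser NMS keeps both neighbours
print(len(postprocess(boxes, scores, 0.1, 0.5)))  # 3: a lower score cut admits the weak box
```

The same detections give counts of 2 or 3 depending only on the thresholds, which is why post-processing is worth tuning separately from training.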
Built the training and batch inference workflow around a Faster R-CNN detector in PyTorch.
Adapted a ResNet50-FPN Faster R-CNN backbone from 3-channel input to a 6-channel fusion model for paired microscopy images.
Implemented dataset indexing, pair discovery, preprocessing, annotation conversion, and visualization utilities.
Added evaluation code focused on count-based metrics and ran separate tuning sweeps for model settings and post-processing.
Preprocessed raw microscopy pairs into a consistent 2048x1500 WebP dataset and rescaled bounding boxes into absolute-pixel labels.
Validated dataset integrity with manifest generation and sanity checks for missing pairs, missing labels, and image-size mismatches.
Trained a 6-channel Faster R-CNN (`ResNet50-FPN`) initialized from COCO weights with an expanded first convolution layer.
Evaluated the detector using count-based metrics such as count MAE, RMSE, and bias, with optional class filtering.
Tuned anchor settings, backbone depth, learning rates, detection caps, score thresholds, and NMS thresholds through saved sweep configs and CSV outputs.
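The count-based metrics mentioned above can be sketched as follows. This assumes the error for each image is predicted count minus true count, so a negative bias means undercounting. The helper names and toy labels are hypothetical, and the repository's own evaluation code may differ:

```python
import math


def count_labels(labels, classes=None):
    """Count detections in one image, optionally restricted to some classes."""
    if classes is None:
        return len(labels)
    return sum(1 for label in labels if label in classes)


def count_metrics(pred_counts, true_counts):
    """Compute count MAE, RMSE, and bias over a set of images."""
    if len(pred_counts) != len(true_counts) or not pred_counts:
        raise ValueError("need two non-empty count lists of equal length")
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "bias": sum(errors) / n,
    }


# Toy per-image class labels (made up for illustration).
pred = [["EUK", "EUK", "FC"], ["FE", "colony"], ["EUK"]]
true = [["EUK", "FC", "FC"], ["FE", "FE", "colony", "EUK"], ["EUK"]]

all_classes = count_metrics(
    [count_labels(p) for p in pred], [count_labels(t) for t in true]
)
euk_only = count_metrics(
    [count_labels(p, {"EUK"}) for p in pred],
    [count_labels(t, {"EUK"}) for t in true],
)
print(all_classes)
print(euk_only)
```

In the EUK-only case the per-image errors are +1, -1, and 0, so the bias is zero while the MAE is not. Reporting both helps separate systematic over- or undercounting from errors that cancel out.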
The repository contains a complete, reproducible workflow from dataset preparation to batch prediction visualizations.
The processed training index contains 249 paired image samples with 16,181 labeled boxes.
A saved 5-fold training sweep reached a best mean count MAE of 3.94 before post-processing optimization.
A saved 5-fold post-processing sweep reduced the best mean count MAE to 2.42 with a standard deviation of 0.50.
Example outputs, EDA plots, debug overlays, and batch prediction renders are all included as inspectable artifacts.
Processed Samples
249
From `data/processed/dataset_2048x1500_webp/index.csv`.
Labeled Boxes
16,181
Sum of `n_boxes` in the processed dataset index.
Best Count MAE
2.42
Best `mean_count_mae` in `runs/tuning/post_best_mae/tuning_post_results.csv`.
Foreground Classes
4
`EUK`, `FE`, `FC`, `colony`.
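The headline numbers above are simple aggregates over CSV files. A hedged sketch of how they could be recomputed with the standard library: the `n_boxes` and `mean_count_mae` column names come from the text, while the other columns (`sample_id`, `score_thresh`, `nms_thresh`) and all row values are made up so the example runs without the repository's files:

```python
import csv
import io


def summarize_index(index_file):
    """Return (number of samples, total labeled boxes) from an index CSV."""
    n_samples = 0
    total_boxes = 0
    for row in csv.DictReader(index_file):
        n_samples += 1
        total_boxes += int(row["n_boxes"])
    return n_samples, total_boxes


def best_config(results_file, metric="mean_count_mae"):
    """Return the sweep row with the lowest value of the given metric."""
    rows = list(csv.DictReader(results_file))
    return min(rows, key=lambda row: float(row[metric]))


# Made-up stand-ins for index.csv and tuning_post_results.csv.
toy_index = io.StringIO("sample_id,n_boxes\na,10\nb,25\nc,0\n")
toy_results = io.StringIO(
    "score_thresh,nms_thresh,mean_count_mae\n"
    "0.3,0.5,4.0\n"
    "0.5,0.5,3.5\n"
    "0.7,0.5,5.2\n"
)

print(summarize_index(toy_index))  # (3, 35)
print(best_config(toy_results)["score_thresh"])  # 0.5
```

To run this against the real data, the `io.StringIO` objects would be replaced with files opened from the paths listed above.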
Project assets are shown here when they are site-ready. Repository artifact paths are preserved so you can promote selected files into `/public` later without changing the content model.
Artifact Reference
docs/pipeline_diagram.png
Pipeline diagram for the pico-algae workflow.
Artifact Reference
examples/detection_results/Example_detection_2.png
Additional pico-algae detection output.
Artifact Reference
examples/detection_results/Example_detection_3.png
Additional pico-algae detection output.
Artifact Reference
runs/predict_run01_6ch/Image_11554_pred.png
Predicted detections on a microscopy sample.
Saved figures and chart artifacts referenced by the project.
Artifact Reference
reports/eda/objects_per_image.png
Object count distribution per image.
Artifact Reference
reports/eda/bbox_width_all.png
Bounding-box width distribution.
Artifact Reference
reports/eda/bbox_height_all.png
Bounding-box height distribution.
Computed from `data/processed/manifest.csv` using the raw class ids defined in the dataset code.
Compares the best saved result from model-configuration tuning against the best saved post-processing sweep.