Automated Processing of Imaging Data through Multi-tiered Classification of Biological Structures Illustrated Using Caenorhabditis elegans
Mei Zhan1,2, Matthew M. Crane1¤, Eugeni V. Entchev3, Antonio Caballero3, Diana Andrea Fernandes de Abreu3, QueeLim Ch’ng3, Hang Lu1,2,4*
1 Interdisciplinary Program in Bioengineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
2 Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
3 MRC Centre for Developmental Neurobiology, King’s College London, London, United Kingdom
4 School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia, United States of America
¤ Current Address: School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom
* firstname.lastname@example.org
PLOS Computational Biology | DOI:10.1371/journal.pcbi.1004194 April 24, 2015 1 / 21
Citation: Zhan M, Crane MM, Entchev EV, Caballero A, Fernandes de Abreu DA, Ch’ng Q, et al. (2015) Automated Processing of Imaging Data through Multi-tiered Classification of Biological Structures Illustrated Using Caenorhabditis elegans. PLoS Comput Biol 11(4): e1004194. doi:10.1371/journal.pcbi.1004194
Editor: Christopher V. Rao, University of Illinois at Urbana-Champaign, UNITED STATES
Received: September 25, 2014
Accepted: February 13, 2015
Published: April 24, 2015
Copyright: © 2015 Zhan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: The data are available at http://hdl.handle.net/1853/53154. The code package is available at: https://github.com/meizhan/
Funding: This work is partially supported by funding from the US National Institutes of Health (R01AG035317, R01GM088333, and R21EB012803 to HL), the US National Science Foundation (CBET 0954578 to HL, graduate research fellowships 0946809 to MZ and MMC), Wellcome Trust (Project Grant 087146 to QC), and European Research Council (NeuroAge 242666 to QC). The funders had
Quantitative imaging has become a vital technique in biological discovery and clinical diagnostics; a plethora of tools have recently been developed to enable new and accelerated forms of biological investigation. Increasingly, the capacity for high-throughput experimentation provided by new imaging modalities, contrast techniques, microscopy tools, microfluidics and computer-controlled systems shifts the experimental bottleneck from the level of physical manipulation and raw data collection to automated recognition and data processing. Yet, despite their broad importance, image analysis solutions to address these needs have been narrowly tailored. Here, we present a generalizable formulation for autonomous identification of specific biological structures that is applicable to many problems. The process flow architecture we present here utilizes standard image processing techniques and the multi-tiered application of classification models such as support vector machines (SVM). These low-level functions are readily available in a large array of image processing software packages and programming languages. Our framework is thus both easy to implement at the modular level and provides specific high-level architecture to guide the solution of more complicated image-processing problems. We demonstrate the utility of the classification routine by developing two specific classifiers as a toolset for automation and cell identification in the model organism Caenorhabditis elegans. To serve a common need for automated high-resolution imaging and behavior applications in the C. elegans research community, we contribute a ready-to-use classifier for the identification of the head of the animal under bright field imaging. Furthermore, we extend our framework to address the pervasive problem of cell-specific identification under fluorescent imaging, which is critical for biological investigation in multicellular organisms or tissues. Using these examples as a guide, we envision the broad utility of the framework for diverse problems across different length scales and imaging methods.
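As a minimal illustration of the multi-tiered idea named in the abstract, the following Python sketch chains two linear decision functions of the form a trained linear SVM would yield. It is not the published implementation. The feature names (area, elongation, neighbor count), the hand-picked weights, and the 10-unit radius are all hypothetical, and the split into a per-region tier followed by a relational tier is an assumption made for the example.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Sequence


@dataclass(frozen=True)
class Candidate:
    """One segmented region; every field is a hypothetical measurement."""
    x: float
    y: float
    area: float
    elongation: float


@dataclass(frozen=True)
class LinearModel:
    """Linear decision function, sign(w . f + b), as a linear SVM would give."""
    weights: Sequence[float]
    bias: float

    def accepts(self, features: Sequence[float]) -> bool:
        score = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return score > 0.0


def intrinsic_features(c: Candidate) -> List[float]:
    """Tier-1 features: properties of the region taken on its own."""
    return [c.area, c.elongation]


def relational_features(c: Candidate, peers: Sequence[Candidate],
                        radius: float) -> List[float]:
    """Tier-2 feature: number of other tier-1 survivors within `radius`."""
    near = sum(1 for p in peers
               if p is not c and hypot(c.x - p.x, c.y - p.y) <= radius)
    return [float(near)]


def classify(candidates, tier1, tier2, radius=10.0):
    """Apply the cheap per-region tier first, then the relational tier."""
    survivors = [c for c in candidates if tier1.accepts(intrinsic_features(c))]
    return [c for c in survivors
            if tier2.accepts(relational_features(c, survivors, radius))]


# Hand-picked weights standing in for trained models (assumed, not fitted).
TIER1 = LinearModel(weights=(0.1, -1.0), bias=-2.0)   # favors large, round regions
TIER2 = LinearModel(weights=(1.0,), bias=-0.5)        # needs at least one neighbor

CANDIDATES = [
    Candidate(10.0, 10.0, 50.0, 1.2),   # target, next to another target
    Candidate(14.0, 10.0, 48.0, 1.1),   # target, next to another target
    Candidate(80.0, 80.0, 55.0, 1.1),   # passes tier 1 but is isolated
    Candidate(12.0, 12.0, 5.0, 1.0),    # too small for tier 1
    Candidate(11.0, 40.0, 60.0, 5.0),   # too elongated for tier 1
]

found = classify(CANDIDATES, TIER1, TIER2)
print([(c.x, c.y) for c in found])
```

In this toy data, the first tier discards the small and the elongated regions, and the second tier discards the isolated region that the first tier could not distinguish from a target, leaving only the two adjacent round regions. In practice each `LinearModel` would be replaced by a classifier trained on annotated examples.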
New technologies have increased the size and content-richness of biological imaging datasets. As a result, automated image processing is increasingly necessary to extract relevant data in an objective, consistent and time-efficient manner. While image processing tools have been developed for general problems that affect large communities of biologists, the diversity of biological research questions and experimental techniques has left many problems unaddressed. Moreover, there is no clear way for non-computer scientists to immediately apply the large body of computer vision and image processing techniques to their specific problems or to adapt existing tools to their needs. Here, we address this need by demonstrating an adaptable framework for image processing that is capable of accommodating a large range of biological problems with both high accuracy and computational efficiency. Moreover, we demonstrate the use of this framework for disparate problems by solving two specific image processing challenges in the model organism Caenorhabditis elegans. In addition to contributions to the C. elegans community, the solutions developed here provide both useful concepts and adaptable image-processing modules for other biological problems.
This is a PLOS Computational Biology Methods paper.
Diverse imaging techniques exist to provide functional and structural information about biological specimens in clinical and experimental settings. On the clinical side, new and augmented imaging modalities and contrast techniques have increased the types of information that can be garnered from biological samples. Similarly, many tools have recently been developed to enable new and accelerated forms of biological experimentation in both single cells and multicellular model organisms [2–10]. Increasingly, the capacity for high-throughput experimentation provided by new optical tools, microfluidics and computer-controlled systems has eased the experimental bottleneck at the level of physical manipulation and raw data collection. Still, the power of many of these toolsets lies in facilitating the automation of experimental processes. The ability to perform real-time information extraction from images during the course of an experiment is therefore a crucial computational step to harnessing the potential of many of these physical systems (Fig 1A). Even when off-line data analysis is sufficient, the capability of these systems to generate large, high-content datasets places a large burden on the speed of the downstream analysis.