A Theoretical Justification for Single Molecule Peptide Sequencing

PLOS Computational Biology


Jagannath Swaminathan, Alexander A. Boulgakov, Edward M. Marcotte

Jagannath Swaminathan1,2‡, Alexander A. Boulgakov1,2‡, Edward M. Marcotte1,2,3*

1 Center for Systems and Synthetic Biology, University of Texas at Austin, Austin, Texas, United States of America
2 Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, United States of America
3 Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, United States of America

‡ These authors contributed equally to this work.
* marcotte@icmb.utexas.edu


The proteomes of cells, tissues, and organisms reflect active cellular processes and change continuously in response to intracellular and extracellular cues. Deep, quantitative profiling of the proteome, especially if combined with mRNA and metabolite measurements, should provide an unprecedented view of cell state, better revealing the functions and interactions of cell components. Molecular diagnostics and biomarker discovery should benefit particularly from the accurate quantification of proteomes, since complex diseases like cancer change protein abundances and modifications. Currently, shotgun mass spectrometry is the primary technology for high-throughput protein identification and quantification; while powerful, it lacks high sensitivity and coverage. We draw parallels with next-generation DNA sequencing and propose a strategy, termed fluorosequencing, for sequencing peptides in a complex protein sample at the level of single molecules. In the proposed approach, millions of individual fluorescently labeled peptides are visualized in parallel, their changing patterns of fluorescence intensity are monitored as N-terminal amino acids are sequentially removed, and the resulting fluorescence signatures (fluorosequences) are used to uniquely identify individual peptides. We introduce a theoretical foundation for fluorosequencing and, using Monte Carlo computer simulations, we explore its feasibility, anticipate the most likely experimental errors, quantify their potential impact, and discuss the broad potential utility of a high-throughput peptide sequencing technology.
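The core idea of fluorosequencing, reading a peptide's identity from how its fluorescence drops as N-terminal residues are removed one at a time, can be illustrated with a minimal Python sketch. It rests on assumptions that are not specified in the text above: only residues in a chosen set carry a dye, each labeled residue contributes exactly one unit of intensity, each cycle removes exactly one N-terminal residue unless the cycle fails, and the failure probability `p_cycle_fail` is a free, hypothetical parameter. The function names (`ideal_trace`, `group_by_trace`, `simulate_trace`) are likewise illustrative. This is an idealized toy, not the authors' simulation code.

```python
import random
from collections import defaultdict

# Hypothetical model (illustrative assumptions, not from the source):
# - only residues in `labeled` carry a dye, one intensity unit each;
# - each cycle removes one N-terminal residue unless the cycle fails.


def ideal_trace(peptide: str, labeled: set[str], n_cycles: int) -> tuple[int, ...]:
    """Error-free dye count before cycle 1 and after each of n_cycles cycles."""
    remaining = sum(aa in labeled for aa in peptide)
    trace = [remaining]
    for i in range(n_cycles):
        # Once the peptide is fully cleaved, the count stays at its last value.
        if i < len(peptide) and peptide[i] in labeled:
            remaining -= 1
        trace.append(remaining)
    return tuple(trace)


def group_by_trace(peptides, labeled, n_cycles):
    """Map each ideal trace to the peptides that produce it.

    Peptides sharing a key cannot be told apart under this labeling scheme.
    """
    groups = defaultdict(list)
    for p in peptides:
        groups[ideal_trace(p, labeled, n_cycles)].append(p)
    return dict(groups)


def simulate_trace(peptide, labeled, n_cycles, p_cycle_fail, rng):
    """Monte Carlo trace with one hypothetical error mode: a cleavage cycle
    fails with probability p_cycle_fail and removes nothing."""
    pos = 0  # index of the current N-terminal residue
    remaining = sum(aa in labeled for aa in peptide)
    trace = [remaining]
    for _ in range(n_cycles):
        if pos < len(peptide) and rng.random() >= p_cycle_fail:
            if peptide[pos] in labeled:
                remaining -= 1
            pos += 1
        trace.append(remaining)
    return tuple(trace)


if __name__ == "__main__":
    labeled = {"K"}  # hypothetical: label lysines only
    peptides = ["AKGKR", "GKAKR", "KAGKR"]
    print(group_by_trace(peptides, labeled, n_cycles=5))
    # {(2, 2, 1, 1, 0, 0): ['AKGKR', 'GKAKR'], (2, 1, 1, 1, 0, 0): ['KAGKR']}

    rng = random.Random(0)
    target = ideal_trace("KAGKR", labeled, 5)
    n_trials = 10_000
    hits = sum(
        simulate_trace("KAGKR", labeled, 5, p_cycle_fail=0.05, rng=rng) == target
        for _ in range(n_trials)
    )
    print(f"fraction of reads matching the ideal trace: {hits / n_trials:.3f}")
```

In this toy, "AKGKR" and "GKAKR" collapse onto the same trace while "KAGKR" is distinct, which shows why the choice of labeled residues governs how many peptides can be uniquely identified. The Monte Carlo function shows how even a single error mode pushes observed traces away from the ideal ones.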

Author Summary

The development of next-generation DNA and RNA sequencing methods has transformed biology, with current platforms generating >1 billion sequencing reads per run. Unfortunately, no method of similar scale and throughput exists to identify and quantify specific proteins in complex mixtures, representing a critical bottleneck in many biochemical and molecular diagnostic assays. What is urgently needed is a massively parallel method, akin to next-gen DNA sequencing, for identifying and quantifying peptides or proteins in a sample. In principle, single-molecule peptide sequencing could achieve this goal, allowing billions of distinct peptides to be sequenced in parallel, thereby identifying the proteins that compose the sample and digitally quantifying them by directly counting peptides. Here, we discuss theoretical considerations of single molecule peptide sequencing, suggest one possible experimental strategy, and, using computer simulations, characterize the potential utility and unusual properties of this future proteomics technology.

PLOS Computational Biology | DOI:10.1371/journal.pcbi.1004080 February 25, 2015 1 / 17

Citation: Swaminathan J, Boulgakov AA, Marcotte EM (2015) A Theoretical Justification for Single Molecule Peptide Sequencing. PLoS Comput Biol 11(2): e1004080. doi:10.1371/journal.pcbi.1004080

Editor: David B. Searls, Philadelphia, United States of America

Received: July 11, 2014

Accepted: December 10, 2014

Published: February 25, 2015

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability Statement: All relevant data are within the paper and its Supporting Information files.

Funding: This work was supported by grants from NIH, NSF, CPRIT, DARPA, and the Welch Foundation (F1515) to EMM. JS is supported by an International Student Research Fellowship from Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: EMM and JS are inventors on a pending patent application, PCT/US2012/043769. AAB has declared that no competing interests exist.


The basis of “next-gen” DNA sequencing is the parallel sequencing of large numbers of short reads (typically 35–500 nucleotides). Next-generation sequencing platforms from Pacific Biosciences [1] and Helicos [2] monitor the sequencing of single DNA molecules by fluorescence microscopy, and the Helicos platform can yield approximately one billion sequencing reads per run. Unfortunately, no method of similar scale and throughput exists to identify and quantify specific proteins in complex mixtures, representing a critical bottleneck in many biochemical, molecular diagnostic, and biomarker discovery assays. Consider, for example, cancer biomarker discovery: nucleic acid mutations underlie nearly all cancers.

However, these variants are manifested in proteins, which are often found in bodily compartments (saliva, blood, urine) that are accessible without invasive biopsies. The use of protein biomarkers to diagnose, characterize, and monitor most, if not all, cancers [3] would be significantly advanced by an approach that sensitively identifies and quantifies proteins in these compartments. Indeed, the value of diagnostic biomarkers is clearly seen in the detection of thyroglobulin to monitor thyroid cancer, and in the administration of Herceptin specifically for breast cancers that overexpress HER2/neu [4]. Techniques applied to this problem, including mass spectrometry (MS) and antibody arrays, often lack the sensitivity and intrinsic digital quantification needed to be effective [5]. What is urgently needed is a massively parallel method, akin to next-gen