
Reducing Attrition In The Drug Discovery Pipeline

Tue, 01/14/2003 - 8:46am
by Kevin McKernan and Paul McEwan

Introduction

While the Human Genome Project has revealed a tremendous amount of valuable information, it has also created a significant challenge for pharmaceutical companies: finding ways to use this information to create better drugs. Over the last 20 years, pharmaceutical R&D spending has increased 15-fold, while new product approvals have increased only 0.70-fold. Pharma clearly needs more effective tools for analyzing the flood of information generated in the discovery phase of the pipeline. The greatest gain in pipeline productivity will come from reducing attrition, that is, from identifying low-quality targets before they incur the expense of clinical trials. Pharmacogenomic tools promise greater efficiency to the pharmaceutical industry, because whole-genome association studies can rapidly link a disease phenotype to one or more genes. The human haplotype map is expected to reduce the cost of such association studies 10-fold.
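At its simplest, an association study asks whether a SNP's allele frequency differs between patients with the phenotype (cases) and those without it (controls). The sketch below assumes a standard allelic 2x2 Pearson chi-square test on allele counts; the article does not name a statistic, so this choice and the function name are illustrative. In a genome-wide scan the same test is repeated for every SNP, so the resulting p-values must also be corrected for multiple testing.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts."""
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("every row and column of the table needs at least one allele")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under "no association": row share times column share
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, with 100 cases carrying 60 of 200 alternate alleles and 100 controls carrying 40 of 200, `allelic_chi_square(60, 140, 40, 160)` gives a chi-square of about 5.33 and a p-value of about 0.021.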

Pharmaceutical companies can use the tools of pharmacogenomics to better understand gene systems and to correlate genomic and proteomic markers with efficacy and toxicity phenotypes. Ultimately, this information can be used to improve the quality of targets sent into trials and thus to minimize the overall cost of developing a drug.

Resequencing SNP discovery strategy

The SNP Consortium (more information at http://snp.cshl.org/) has identified more than 2.1 million SNPs and placed them in publicly available databases. Although this number is large, the SNPs are distributed randomly throughout the genome and were discovered across a panel of only 25 people. As a result, a 10-fold enrichment in SNP discovery is often required within specific candidate genes to fully understand the role that low-frequency, non-synonymous SNPs may play in clinical trials.
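One way to see why a 25-person panel tends to miss low-frequency SNPs is to compute the chance that a variant with minor allele frequency p appears at least once among the 2n chromosomes of an n-person panel, 1 - (1 - p)^(2n). The sketch below assumes chromosomes are sampled independently (Hardy-Weinberg equilibrium) and counts a single observed copy as a detection; both are simplifying assumptions, and the function names are illustrative. It requires 0 < p < 1.

```python
import math


def detection_probability(maf: float, individuals: int) -> float:
    """Chance that at least one of the 2n sampled chromosomes carries the minor allele."""
    chromosomes = 2 * individuals
    return 1.0 - (1.0 - maf) ** chromosomes


def panel_size_for(maf: float, target: float) -> int:
    """Smallest number of individuals whose detection probability reaches `target`."""
    # Solve 1 - (1 - maf)^(2n) >= target for n, then round up to whole people
    chromosomes = math.log(1.0 - target) / math.log(1.0 - maf)
    return math.ceil(chromosomes / 2)
```

Under these assumptions, a SNP with 1% minor allele frequency is seen in a 25-person panel only about 40% of the time, and roughly 150 people are needed to detect it with 95% probability (`panel_size_for(0.01, 0.95)`).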

A resequencing SNP discovery strategy should not only identify SNPs within the genes of interest but also leverage recent comparative genomics projects to target promoter resequencing intelligently. Successful resequencing SNP discovery incorporates a number of key manufacturing features, including sample throughput, accuracy of polymorphism identification, intelligent promoter prediction, and data management.
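Comparative genomics can guide promoter resequencing by flagging upstream stretches that stay conserved between species. A minimal sketch, assuming human and mouse sequences that are already aligned column-for-column (gaps as `-`) and a sliding-window rule of at least 70% identity over 100 columns, a threshold commonly used in human-mouse comparisons; the function name and defaults are hypothetical, not the authors' method.

```python
from typing import List, Tuple


def conserved_windows(human: str, mouse: str, window: int = 100,
                      min_identity: float = 0.70) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) alignment-column ranges passing the identity rule."""
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    regions: List[Tuple[int, int]] = []
    for start in range(len(human) - window + 1):
        pairs = zip(human[start:start + window], mouse[start:start + window])
        # Count identical, non-gap columns (comparison is case-sensitive)
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        if matches / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps the previous passing window: extend that region
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

Coordinates are alignment columns, so they still need mapping back to genomic positions before primer design.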

High sample throughput

Although SNP discovery projects are generally targeted to specific regions of interest, they still require the sequencing of many thousands of base pairs across a panel of samples. Timely identification of SNP loci therefore depends on fully automated sequencing facilities capable of high sample throughput and rapid data analysis. Such facilities often incorporate robotic liquid handling and high-capacity capillary electrophoresis sequencers.

Quality

Higher sequence quality also shortens project turnaround, because fewer reads are required per sample. Quality parameters include long read lengths and a high number of Phred20 bases per read. Low sequencing background also promotes more accurate base calling, which is required to identify true polymorphisms.
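Phred quality scores are defined as Q = -10 log10(P), where P is the estimated probability that a base call is wrong, so a Phred20 base has at most a 1-in-100 chance of error. The sketch below assumes "Phred20 bases" means bases with Q of 20 or more and shows how per-read quality metrics can be derived from a list of scores; the helper names are illustrative.

```python
import math


def phred_to_error(q: float) -> float:
    """Invert the Phred definition Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    return -10 * math.log10(p)


def phred20_bases(qualities, threshold: int = 20) -> int:
    """Count bases meeting the Phred20 threshold (error probability <= 1%)."""
    return sum(1 for q in qualities if q >= threshold)


def expected_errors(qualities) -> float:
    """Sum of per-base error probabilities: the expected number of miscalls in a read."""
    return sum(phred_to_error(q) for q in qualities)
```

For a read scored `[40, 30, 20, 10]`, three bases pass Phred20 and the expected number of miscalls is 0.1111.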

Quality assay design and validation

While quality sequencing is an obvious asset for SNP discovery, a subtler yet critical component of assay success is the quality of assay design and validation before full-scale sample sequencing. Assay design should include sequence alignment, intelligent PCR amplicon modeling, and production-compliant primer design, all of which benefit from software automation. Software analysis greatly reduces both the subjectivity and the time involved in manual evaluation. Critical features of assay design include design considerations for homopolymer stretches, repeat regions, and GC-rich regions.
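Automated screening for the problem features named above can be as simple as a few sequence checks run on each candidate amplicon or primer. A minimal sketch, assuming hypothetical cut-offs (GC above 65%, a single-base run longer than 5, or a two-base unit repeated four or more times in tandem); production design software would apply its own, more detailed rules.

```python
import re
from itertools import groupby


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq: str) -> int:
    # groupby collapses runs of identical bases; keep the longest run length
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


# A two-base unit XY with X != Y, repeated at least four times in tandem (e.g. CACACACA)
DINUCLEOTIDE_REPEAT = re.compile(r"([ACGT])(?!\1)([ACGT])(?:\1\2){3,}")


def design_flags(seq: str, max_gc: float = 0.65, max_homopolymer: int = 5) -> list:
    """Return the names of any problem features found in `seq` (hypothetical thresholds)."""
    flags = []
    if gc_fraction(seq) > max_gc:
        flags.append("GC-rich")
    if longest_homopolymer(seq) > max_homopolymer:
        flags.append("homopolymer")
    if DINUCLEOTIDE_REPEAT.search(seq.upper()):
        flags.append("dinucleotide repeat")
    return flags
```

For example, `design_flags("GCGCGCGCGCAAAAAAATT")` reports both a homopolymer and a dinucleotide repeat, while `design_flags("GGCCGGCCGGCC")` reports only a GC-rich sequence.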

Data Handling

The scale of sequencing required produces a mass of data whose analysis can become a bottleneck if not adequately addressed. Again, software automation greatly aids the process. Key features include not only base calling and accuracy prediction via Phred, but also automated heterozygote detection, sequence assembly, and assembly annotation.
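In a fluorescent sequencing trace, a heterozygous position typically shows two overlapping peaks of comparable height instead of one. Dedicated tools such as PolyPhred use richer peak-shape evidence; the sketch below is a deliberately simplified secondary-peak ratio rule, assuming peak heights have already been extracted per base position, with hypothetical thresholds (secondary peak at least 30% of the primary, primary at least 100 signal units). Heterozygous calls are reported as IUPAC ambiguity codes.

```python
from typing import Dict

# IUPAC ambiguity codes for two-base combinations
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_position(peaks: Dict[str, float], min_ratio: float = 0.3,
                  min_signal: float = 100.0) -> str:
    """Call one base position from its four channel peak heights (hypothetical rule)."""
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    (base1, height1), (base2, height2) = ranked[0], ranked[1]
    if height1 < min_signal:
        return "N"  # too little signal to make any call
    if height2 / height1 >= min_ratio:
        return IUPAC[frozenset(base1 + base2)]  # two strong peaks: heterozygote
    return base1  # one dominant peak: homozygous call
```

A position with A at 1000 and G at 450 is called `R` (A/G), while A at 1000 with G at 100 stays a homozygous `A`.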

SeeSNP Discovery service

Agencourt recognizes the importance of SNPs within the functional genomics field, as well as the difficulties researchers face in conducting quality SNP discovery. Company scientists have developed an SNP discovery program called SeeSNP Discovery that uses a resequencing strategy for cost-effective identification of novel SNP loci. The service, fueled by integrated bioinformatics and sequencing pipelines, incorporates the features required for quality, timely SNP discovery. The three main phases of the SNP Discovery service are assay design, validation and optimization; sample sequence delineation; and, most importantly, automated sequence assembly and annotation.

Assay design

Assay design begins by aligning the sequences of interest against GenBank sequences to identify contigs with strong homology and to assess whether multiple homologous contigs exist within the human genome. A proprietary software suite developed by Agencourt scientists provides an amplicon tiling model that uses the minimum number of amplicons to span the region(s) of interest. Recent comparative studies indicate that as much as 3% of the non-coding human genome is highly conserved in the mouse and fish genomes. We therefore include in our amplicon tiling models both coding regions, including intron/exon boundaries, and conserved non-coding upstream regulatory regions. Amplification primers are designed to have similar melting temperatures, to avoid primer-dimer interactions, and to carry a universal M13 tail. These properties enable the use of a universal PCR amplification protocol and simplify downstream sequencing.
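Agencourt's tiling software is proprietary, but "the minimum number of amplicons to span a region" is an instance of the classic interval-cover problem, which a greedy algorithm solves optimally: from the current coverage edge, always pick the candidate that reaches furthest. The sketch below is that textbook algorithm as a stand-in, assuming candidate amplicons have already been generated as half-open genomic intervals; it does not enforce any overlap between neighbouring amplicons, which a real design might require.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def min_tiling(candidates: List[Interval], target: Interval) -> List[Interval]:
    """Choose the fewest candidate amplicons that together cover `target`."""
    t_start, t_end = target
    ordered = sorted(candidates)
    chosen: List[Interval] = []
    covered = t_start
    i = 0
    while covered < t_end:
        best: Optional[Interval] = None
        # Among amplicons starting at or before the coverage edge, keep the furthest-reaching
        while i < len(ordered) and ordered[i][0] <= covered:
            if best is None or ordered[i][1] > best[1]:
                best = ordered[i]
            i += 1
        if best is None or best[1] <= covered:
            raise ValueError(f"no candidate amplicon covers position {covered}")
        chosen.append(best)
        covered = best[1]
    return chosen
```

Given candidates `(0, 400)`, `(300, 700)`, `(350, 900)`, `(600, 1000)` and `(850, 1200)`, covering `(0, 1100)` needs only three amplicons: `(0, 400)`, `(350, 900)` and `(850, 1200)`.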

Assay validation and optimization

Attention to quality upstream prevents complex rework loops in a production pipeline. All proposed primers within the amplification models undergo quality control by mass spectrometry. This proactive measure prevents poor performance during assay validation caused by impure or faulty primer stocks. All assay models undergo stringent validation against an ethnically diverse discovery panel before use in full-scale sample analysis. Each amplification pair is first evaluated under universal PCR conditions. Failing assays are passed to an optimization pipeline, which includes gradient PCR testing for products showing multiple bands or no bands under universal conditions, Betaine PCR testing for primers with high GC content, and, as a last resort, oligonucleotide redesign. Proprietary bioinformatics tools have streamlined assay design and validation, reduced assay variability, and dramatically improved amplicon success rates to over 90%.
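The optimization pipeline described above is essentially a routing rule for failing assays. A minimal sketch, assuming a hypothetical record of each assay's gel result and primer GC content, a hypothetical 60% cut-off for "high GC", and that the steps are tried in the order the text lists them (gradient PCR, then Betaine PCR, then redesign).

```python
from dataclasses import dataclass, field


@dataclass
class AssayResult:
    name: str
    bands: int        # bands seen on the gel under the current conditions
    primer_gc: float  # hypothetical: higher GC fraction of the two primers
    tried: set = field(default_factory=set)  # optimization steps already attempted


def next_step(result: AssayResult, high_gc: float = 0.60) -> str:
    """Route an assay to its next action (hypothetical triage rules)."""
    if result.bands == 1:
        return "validated"  # single clean product
    if "gradient" not in result.tried:
        return "gradient"   # multiple bands or no bands: try gradient PCR first
    if result.primer_gc >= high_gc and "betaine" not in result.tried:
        return "betaine"    # GC-rich primers: add Betaine
    return "redesign"       # last resort: new oligonucleotides
```

An assay that still fails after gradient PCR is sent to Betaine testing only if its primers are GC-rich; otherwise it goes straight to redesign.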

Sequence delineation

Following assay validation, genes of interest are screened across customer-provided samples or a multi-ethnic panel. This phase of SNP discovery is fueled by the Agencourt high-performance sequencing pipeline, which has been used to sequence more than 30 million bases to date. Agencourt's sequencing pipeline is fully automated and has the capacity to sequence over 20 million Phred20 bases a day. Patented SPRI nucleic acid isolation technology is incorporated at several steps within the pipeline, including PCR amplicon purification and dye-terminator removal; it supports high sample throughput, increased sequence quality, and exceptionally low sequencing background, which is fundamental to SNP identification. Sequence data are analyzed using both proprietary and industry-standard software packages to identify polymorphism locations. Previously defined SNPs from public databases are included in all data summaries. All aspects of the pipeline are managed by the Galaxy Laboratory Information Management System (LIMS), which continually collates sample genealogy, quality parameters and sequencing results. In tandem, custom bioinformatics software tools assess instrument operation in real time to maintain optimum performance and maximize sequence quality. This proactive approach to quality management provides confidence in the integrity of the sequence data and in the identification of true polymorphisms.
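Including previously defined SNPs in a data summary amounts to joining the discovered sites against a public catalogue. A minimal sketch, assuming sites are keyed by (chromosome, position) and that the caller passes only public SNPs lying inside the resequenced amplicons; the IDs in the example are placeholders, not real database entries. Chromosome names sort as plain strings here.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def summarize(discovered: List[Site], known: Dict[Site, str]) -> List[Tuple[str, int, str, str]]:
    """Label each discovered site as known or novel, and list known sites not observed."""
    rows = []
    for site in set(discovered):
        status = "known" if site in known else "novel"
        rows.append((site[0], site[1], known.get(site, "-"), status))
    for site, snp_id in known.items():
        if site not in discovered:
            # Public SNP inside the resequenced region that this panel did not show
            rows.append((site[0], site[1], snp_id, "known, not observed"))
    return sorted(rows, key=lambda row: (row[0], row[1]))
```

With discovered sites at chr1:120 and chr1:150 and public entries at chr1:120 and chr1:300, the summary lists 120 as known, 150 as novel, and 300 as known but not observed.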

Genotyping service

SNP loci identified using the Agencourt SeeSNP Discovery service can be seamlessly advanced into mass spectrometry-based or 3730XL resequencing-based AgenTYPE™ genotyping services. Many methods can be used for SNP genotyping. Mass spectrometry has proven to be a rapid and easily multiplexed genotyping technology. Resequencing, while at times more expensive than mass spectrometry, is the gold standard of genetic detection and provides results that complement and confirm those from mass spectrometry. The AgenTYPE genotyping service features a multi-stage process based on assay design software tools, the Sequenom MassARRAY system, and the Applied Biosystems 3730XL platform, which together genotype SNPs with high sensitivity and with accuracy greater than 99%. The MassARRAY platform, combined with Agencourt's genomic expertise and fully automated facility, creates an SNP genotyping service that provides high-quality, high-throughput results for large- and small-scale pharmacogenomic studies. A holistic approach to SNP analysis, encompassing both directed SNP identification and genotyping, streamlines the research process, maintains data consistency and quality, reduces inter- and intra-project data variability, and lowers overall SNP discovery and genotyping costs.
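When resequencing is used to confirm mass spectrometry calls, the natural check is genotype concordance across the samples typed on both platforms. A minimal sketch, assuming each platform's calls are a dictionary from a hypothetical sample ID to a two-letter genotype, with `NN` marking a failed call; allele order is ignored, so `AG` and `GA` match.

```python
def concordance(calls_a: dict, calls_b: dict, no_call: str = "NN"):
    """Return (agreement rate, samples compared, discordant sample IDs)."""
    compared = agree = 0
    discordant = []
    for sample in sorted(calls_a.keys() & calls_b.keys()):
        a, b = calls_a[sample], calls_b[sample]
        if a == no_call or b == no_call:
            continue  # a failed call on either platform cannot be compared
        compared += 1
        if sorted(a) == sorted(b):  # treat 'AG' and 'GA' as the same genotype
            agree += 1
        else:
            discordant.append(sample)
    rate = agree / compared if compared else float("nan")
    return rate, compared, discordant
```

Discordant samples are the ones worth re-examining, since either platform may have made the error.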

Summary

Reducing the compound attrition rate is a necessary component of pharmaceutical industry growth. Pharmacogenomics is the technology with the most promise of helping scientists reach this goal. Understanding the variation in a specific target is critical to predicting trial outcome, and can be achieved through extensive SNP discovery focused on coding regions and highly conserved putative regulatory regions.

About the Authors

Kevin McKernan and Paul McEwan both serve as vice presidents and co-chief scientists for Agencourt Bioscience Corporation, which they co-founded in 2000. Agencourt provides genomic services and nucleic acid purification products that help biotech and pharmaceutical companies improve the effectiveness and efficiency of their drug development pipelines. Agencourt is headquartered in Beverly, Massachusetts, and is available on the Web at www.agencourt.com.
