High-throughput protein analysis integrating bioinformatics and experimental assays.
Nucleic Acids Res. 2004 Feb 3;32(2):742-8
The wealth of transcript information that has been made publicly available in recent years requires the development of high-throughput functional genomics and proteomics approaches for its analysis. Such approaches need suitable data integration procedures and a high level of automation in order to gain maximum benefit from the results generated. We have designed an automatic pipeline to analyse annotated open reading frames (ORFs) stemming from full-length cDNAs produced mainly by the German cDNA Consortium. The ORFs are cloned into expression vectors for use in large-scale assays such as the determination of subcellular protein localization or kinase reaction specificity. Additionally, all identified ORFs undergo exhaustive bioinformatic analysis such as similarity searches, protein domain architecture determination and prediction of physicochemical characteristics and secondary structure, using a wide variety of bioinformatic methods in combination with the most up-to-date public databases (e.g. PRINTS, BLOCKS, INTERPRO, PROSITE, SWISSPROT). Data from experimental results and from the bioinformatic analysis are integrated and stored in a relational database (MS SQL-Server), which makes it possible for researchers to find answers to biological questions easily, thereby speeding up the selection of targets for further analysis. The designed pipeline constitutes a new automatic approach to obtaining and administrating relevant biological data from high-throughput investigations of cDNAs in order to systematically identify and characterize novel genes, as well as to comprehensively describe the function of the encoded proteins.
Combined analysis of transcript information using bioinformatics and experimental methods
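
The abstract's point about integrating experimental and bioinformatic results in one relational database can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the table names, column names, the helper find_targets, and the three sample rows are hypothetical, and Python's built-in sqlite3 module stands in for the MS SQL-Server instance the authors describe. It only shows how a single join across the two kinds of data might answer a question such as "which ORFs carry a kinase domain and localize to the nucleus?".

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
# This is NOT the schema of the database described in the abstract,
# and sqlite3 is used here in place of MS SQL-Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orf (orf_id TEXT PRIMARY KEY, length_aa INTEGER);
CREATE TABLE domain_hit (orf_id TEXT, source_db TEXT, domain TEXT);
CREATE TABLE localization (orf_id TEXT, compartment TEXT);
""")

conn.executemany(
    "INSERT INTO orf VALUES (?, ?)",
    [("ORF1", 412), ("ORF2", 230), ("ORF3", 655)],
)
# Bioinformatic results (e.g. domain predictions from public databases).
conn.executemany(
    "INSERT INTO domain_hit VALUES (?, ?, ?)",
    [
        ("ORF1", "PROSITE", "protein kinase"),
        ("ORF2", "INTERPRO", "zinc finger"),
        ("ORF3", "PROSITE", "protein kinase"),
    ],
)
# Experimental results (e.g. subcellular localization assays).
conn.executemany(
    "INSERT INTO localization VALUES (?, ?)",
    [("ORF1", "nucleus"), ("ORF2", "nucleus"), ("ORF3", "cytoplasm")],
)


def find_targets(conn, domain, compartment):
    """Return ORF ids that have the given predicted domain and observed compartment."""
    rows = conn.execute(
        """
        SELECT DISTINCT o.orf_id
        FROM orf AS o
        JOIN domain_hit AS d ON d.orf_id = o.orf_id
        JOIN localization AS l ON l.orf_id = o.orf_id
        WHERE d.domain = ? AND l.compartment = ?
        ORDER BY o.orf_id
        """,
        (domain, compartment),
    ).fetchall()
    return [row[0] for row in rows]


targets = find_targets(conn, "protein kinase", "nucleus")
print(targets)
```

Keeping predictions and assay results in separate tables keyed by the same ORF identifier is one plausible way to let each data source be updated independently while still supporting combined queries for target selection.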