|
PROFESSIONAL EXPERIENCE

Bioinformatics Software Developer, October 2004 – July 2008
NewLink Genetics Inc., 2901 S. Loop Drive - Ames, IA - 50010
Head software developer for VmatchNL, a commercial software program for large-scale genome alignment.
Head software developer for GeneSeqerNL, a commercial spliced alignment software program for gene structure prediction.
Designed bioinformatics case studies to demonstrate the capabilities of our software.
Designed and implemented a business plan for the Bioinformatics Department.
Wrote federal and state grant proposals to supplement our department's funding.
Supervisor: Dr. Volker Brendel
Bioinformatics Intern / Fellowship, May 2002 – Aug 2004
Pioneer Hi-Bred, 7100 NW 62nd Avenue PO Box 1000 - Johnston, IA - 50131
Rewrote the Protein Family Database (Pfam) by rebuilding the hidden Markov models with alternative representations of the protein sequences.
Used data mining techniques to help classify previously unknown maize proteins in Pioneer’s proprietary database.
Supervisor: Lane Arthur
Contact Information: (Mr. Arthur no longer works at the Pioneer Johnston office.)
Software Developer, Jan 1998 – Nov 1999
John Deere, 3500 E. Donald. P.O. Box 270 - Waterloo, IA - 50704
Wrote, tested, and debugged a wide range of code to make it year 2000 compliant.
Supervisor: Denny Mills
Contact Information: (Mr. Mills has retired from John Deere Waterloo Tractor Works.)
Keywords: high-throughput genomic and proteomic data analysis, DNA sequence analysis, tools integration, software development, database development, web development, machine learning, knowledge discovery and data mining, Bayesian learning, gene structure prediction, string indexing/searching/matching, protein sequence/structure/function analysis and prediction
ACADEMIC RESEARCH EXPERIENCE
Ph.D. Dissertation, Jun 2000 – Aug 2008
Iowa State University, Ames, IA
Machine learning approaches for classifying proteins from sequence: focus on predicting function, subcellular localization, and misannotated proteins
(Advisors: Dr. Vasant Honavar and Dr. Drena Dobbs)
EDUCATION
Iowa State University, Ames, IA, Jun 2000 – Aug 2008
Doctor of Philosophy, Computer Science
Overall GPA 3.78 / 4.00

Wartburg College, Waverly, Iowa, Sep 1996 – May 2000
Bachelor of Technology (Honors), Mathematics and Computer Science
Overall GPA 3.83 / 4.00; CS GPA 4.00 / 4.00; Math GPA 3.92 / 4.00
TECHNICAL SKILLS
Programming Languages: Java (8 years), C/C++ (4 years), Perl (3 years), COBOL (3 years), SAS (2 years), Python (1 year), Prolog (1 year), Lisp (1 year), Scheme (1 year), Assembly (1 year), JCL (1 year)
Web Tools: HTML (8 years), XML (6 years), JavaScript (4 years), Perl (3 years), Java Servlets (3 years), PHP (2 years), MySQL (2 years), CakePHP (1 year), DOM (1 year), AJAX (1 year), JSP (1 year)
Database Systems: MySQL, Oracle, DB2 (combined experience of 5+ years)
Operating Systems: Linux/Unix (6 years), Windows NT/2000/XP and MSDOS (10 years)

Bioinformatics tools and databases
Over 8 years of experience with:
Sequence alignment tools (e.g. BLAST, Vmatch, ClustalW)
Gene prediction programs (GenScan, GeneSeqer)
Machine learning tools (Naive Bayes, Decision Trees, Artificial Neural Networks, SVMs)
Motif finding tools (MEME, HMMER)
Visualization tools (Apollo, RasMol, Sting!)
Major public databases (PlantGDB, GenBank, PDB, Swiss-Prot, MIPS, SCOP, Prosite, Ensembl, Entrez, RefSeq, Gene, UniGene, UniProt, Gene Expression Omnibus, Stanford Microarray Database, SAGEmap, Gene Ontology)

GRANTS

1R43HG004021-01 C. Link (PI) NIH SBIR Phase I 9/1/06 – 2/28/07 Title: VmatchNL – a User-friendly Graphical Interface for Large-scale Genome Analysis. Role: Head Bioinformatics developer
1R43HG004180-01 W. Young (PI) NIH SBIR Phase I 9/26/06 – 3/31/07 Title: Exon Boundary Tags (EBTs) for Human Functional Genome Annotation. Role: Head Bioinformatics developer
SUMMARY OF OTHER IMPORTANT PROJECTS
Developed a machine learning approach to automate functional annotation of proteins. These methods discovered potential errors in annotations of protein kinases in MGD.
Developed a two-stage machine learning algorithm (HDTree) that builds a decision tree from the outputs of seven Naïve Bayes based classifiers (based on a range of k-gram amino acid compositions using two different Bayesian models) and a homology-based search tool to predict protein function from sequence.
Developed NB(k), a Naïve Bayes algorithm based on k-gram representations of proteins. NB(k) takes into account the sequential dependence found in overlapping k-grams and therefore performs well on a wide variety of classification tasks (function, subcellular localization, and structural domains).
Developed a Naïve Bayes algorithm and a Support Vector Machine algorithm that use alternative representations of proteins to predict protein function from sequence.
Performed a systematic study of the effects of using random alphabets on common bioinformatics classification problems.
Developed a Decision Tree algorithm that uses alternative representations of proteins to predict protein function based on sequence.
Created a new agent-based implementation of the ID3 Decision Tree algorithm that learns from horizontally and vertically fragmented distributed data.
OTHER SKILL SPECIFIC PROJECTS
Freelance web designer and programmer for the following sites:
NewLink Genetics Website (www.linkp.com)
Indonesian Language 101 Website (www.indonesianlanguage101.com)
Ames High School Wrestling Website (www.ameshighwrestling.com)
International Shopper Website (http://www.internationalshopper.us)
D&orfs Catering Website (http://www.dandorfs.com/catering/index.php)
Ames High School Soccer Website (www.ameshighsoccer.com) - currently disabled
SubLochness – Protein Subcellular Localization Prediction Server (http://ailab.cs.iastate.edu/sublochness) - currently disabled - est. release August 2008
LIST OF PUBLICATIONS
Refereed Journal Papers
Andorf, C., Dobbs, D., and Honavar, V. (2007). Discovering Protein Function Classification Rules from Reduced Alphabet Representations of Protein Sequences. Information Sciences. In press.
Andorf, C., Dobbs, D., and Honavar, V. (2007). Exploring Inconsistencies in Genome Wide Protein Function Annotations: A Machine Learning Approach. BMC Bioinformatics 2007 Aug 3; 8:284.
Invited or Refereed Book Chapters
Honavar, V., Andorf, C., Caragea, D., Dobbs, D., Reinoso-Castillo, J., Silvescu, A., and Wang, X. (2002). Invited Chapter. Algorithmic and Systems Solutions for Computer Assisted Knowledge Acquisition in Bioinformatics and Computational Biology. In: Computational Biology and Genome Informatics. Wu, C., Wang, P., and Wang, J. (Eds.). World Scientific.
Honavar, V., Andorf, C., Caragea, D., Silvescu, A., and Sharma, T. (2001). Invited Chapter. Agent-Based Systems for Data-Driven Knowledge Discovery from Distributed Data Sources: From Specification to Implementation. In: Intelligent Agent Software Engineering. Plekhanova, V. and Wermter, S. (Eds.). Idea Group Publisher.
Recent Refereed Conference Papers
Caragea, D., Pathak, J., Bao, J., Silvescu, A., Andorf, C., Dobbs, D., and Honavar, V. (2005). Information Integration from Semantically Heterogeneous Biological Data Sources. In: Proceedings of the 3rd International Workshop on Biological Data Management (BIDM 2005), DEXA Workshops 2005, Copenhagen, Denmark. Pp. 580-584. IEEE Computer Society.
Caragea, D., Pathak, J., Bao, J., Silvescu, A., Andorf, C., Dobbs, D. and Honavar, V. (2005). Information Integration and Knowledge Acquisition from Semantically Heterogeneous Biological Data Sources. In: Proceedings of the 2nd International Workshop on Data Integration in Life Sciences (DILS 2005), San Diego, CA. Vol. 3615, pp. 175-190. Berlin: Springer-Verlag.
Andorf, C., Silvescu, A., Dobbs, D., and Honavar, V. (2004). Learning Classifiers for Assigning Protein Sequences to Gene Ontology Functional Families. In: Proceedings of the Fifth International Conference on Knowledge Based Computer Systems (KBCS 2004), India.
Andorf, C., Dobbs, D., and Honavar, V. (2002). Discovering Protein Function Classification Rules from Reduced Alphabet Representations of Protein Sequences. In: Proceedings of the Conference on Computational Biology and Genome Informatics. Durham, North Carolina: pp. 1200-1206.
Honavar, V., Andorf, C., Caragea, D., Silvescu, A., Reinoso-Castillo, J., and Dobbs, D. (2001). Ontology-Driven Information Extraction and Knowledge Acquisition from Heterogeneous, Distributed Biological Data Sources. In: Proceedings of the IJCAI-2001 Workshop on Knowledge Discovery from Heterogeneous, Distributed, Autonomous, Dynamic Data and Knowledge Sources. pp 331-337.
|