The Natural Environment Research Council
Plymouth Marine Laboratory

Microbial Metagenomics

A NERC-funded research project investigating the metagenome of marine microbial communities.

Project leader: Dr. Ian R Joint, Plymouth Marine Laboratory, UK. Email: IRJ@pml.ac.uk


Bioinformatics strategy

The microbial metagenomics project will produce a large number of metagenomic sequences, i.e. sequences of DNA extracted from the cells in samples of seawater. To make the most of these sequences, each must be accompanied by metadata describing the environment from which it came. We are therefore developing an integrated bioinformatics strategy in which unique barcodes are used to track samples through the sequencing and annotation process, and to relate annotated sequences back to the relevant metadata.

[Figure: Informatics flowchart]

Barcoding: The barcoding database is a central feature of this project. All physical samples collected are barcoded, as are the clone libraries and the sequences derived from them. All sequences submitted to EMBL will carry information from the barcoding system in the “notes” feature, allowing each sequence to be traced back to its original sample and location. Our barcoding system, Handlebar, includes an online catalogue where users can request barcodes to apply to samples and upload data describing what each barcode has been applied to. A technical paper describing this system is currently (June 2006) in preparation.
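The page does not describe Handlebar's data model, so the following is only a minimal sketch of how a barcode can lead from a submitted sequence back to its sample metadata. It assumes that each barcode record may name the barcode of the item it was derived from (sequence, then clone library, then sample) and that environmental metadata are stored on the sample record. The names `BarcodeRecord`, `BarcodeRegistry`, `lineage` and `sample_metadata` are hypothetical and are not Handlebar's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BarcodeRecord:
    """One entry in a hypothetical barcode registry (not Handlebar's schema)."""
    barcode: str
    kind: str                                   # e.g. "sample", "clone_library", "sequence"
    parent: Optional[str] = None                # barcode of the item this was derived from
    metadata: Optional[Dict[str, str]] = None   # environmental metadata, held on samples


class BarcodeRegistry:
    """In-memory stand-in for a barcode database, for illustration only."""

    def __init__(self) -> None:
        self._records: Dict[str, BarcodeRecord] = {}

    def register(self, record: BarcodeRecord) -> None:
        # Parents must be registered first, so the parent links can never form a cycle.
        if record.barcode in self._records:
            raise ValueError(f"barcode already registered: {record.barcode}")
        if record.parent is not None and record.parent not in self._records:
            raise KeyError(f"unknown parent barcode: {record.parent}")
        self._records[record.barcode] = record

    def lineage(self, barcode: str) -> List[BarcodeRecord]:
        """Return the chain of records from `barcode` back to its original sample."""
        chain: List[BarcodeRecord] = []
        current: Optional[str] = barcode
        while current is not None:
            record = self._records[current]  # raises KeyError for an unknown barcode
            chain.append(record)
            current = record.parent
        return chain

    def sample_metadata(self, barcode: str) -> Dict[str, str]:
        """Metadata of the root (original sample) of a barcode's lineage."""
        root = self.lineage(barcode)[-1]
        return root.metadata or {}
```

For example, with an invented sample `S0001`, clone library `C0042` and sequence `Q0100`, `registry.lineage("Q0100")` walks back through `C0042` to `S0001`, and `registry.sample_metadata("Q0100")` returns the metadata recorded against the sample.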

Sequencing: Sequencing will be carried out by the Edinburgh Sequencing Facility (ESF), which will also perform some post-processing of the sequences, including fosmid assembly and quality control. Every 96-well sequencing plate sent to the ESF will carry a barcode that has been registered in the barcoding database, and each returned sequence will be given a unique ID derived from the plate barcode and the well number of the sample.
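The exact ID format used with the ESF is not given here, so the sketch below assumes a hypothetical scheme: the plate barcode, an underscore, and a zero-padded well position on a standard 8-row (A to H) by 12-column plate, e.g. `PL000123_B07`. The functions `normalise_well`, `sequence_id` and `parse_sequence_id`, and the example barcode, are illustrative only.

```python
import re
from typing import Tuple

N_COLUMNS = 12  # a 96-well plate has rows A-H and columns 1-12

_WELL_RE = re.compile(r"^([A-H])(\d{1,2})$")


def normalise_well(well: str) -> str:
    """Validate a well position such as 'b7' or 'H12' and return it as 'B07'."""
    match = _WELL_RE.match(well.strip().upper())
    if not match:
        raise ValueError(f"not a 96-well position: {well!r}")
    row, col = match.group(1), int(match.group(2))
    if not 1 <= col <= N_COLUMNS:
        raise ValueError(f"column out of range: {well!r}")
    return f"{row}{col:02d}"


def sequence_id(plate_barcode: str, well: str) -> str:
    """Build a read ID from a registered plate barcode and a well (hypothetical format)."""
    # Forbidding '_' in the barcode keeps the ID unambiguous to split again later.
    if not plate_barcode or "_" in plate_barcode:
        raise ValueError("plate barcode must be non-empty and contain no '_'")
    return f"{plate_barcode}_{normalise_well(well)}"


def parse_sequence_id(seq_id: str) -> Tuple[str, str]:
    """Split an ID back into (plate barcode, well) so it can be looked up."""
    plate, sep, well = seq_id.rpartition("_")
    if not sep or not plate:
        raise ValueError(f"not a plate/well sequence ID: {seq_id!r}")
    return plate, normalise_well(well)
```

Zero-padding the column means IDs from the same plate sort in row-major order (A01, A02, ..., A12, B01, ...), and because the ID is reversible, any returned sequence can be mapped straight back to its plate barcode in the barcoding database.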

YAMAP (Yet Another Microbial/Metagenomic Annotation Pipeline/Program): YAMAP provides a simple Perl/Tk graphical interface to a variety of analysis tools useful for primary annotation. YAMAP is available as a Bio-Linux (Debian) package and will be used for first-pass annotation. Work is in progress to convert the tools used by YAMAP into web services that Taverna users can employ in their workflows. YAMAP will be linked to a caching system under development in Newcastle, which will collect the sequences generated in the project from users' Bio-Linux systems. BLAST databases of all sequences produced by the consortium will then be returned to each Bio-Linux system so that new sequences can be searched against them.

metaMicrobase: metaMicrobase is a Grid-based system being developed specifically for the metagenomics project. It will automatically analyse data as they arrive using predefined tasks, and will notify users when results of interest to them become available. The aim is to help users cope with wading through the results of analysing large sequence datasets by allowing them to personalise their own sets of analytical tasks. The system builds on technology developed for the Microbase project. The first prototype test system is expected to be in place by autumn 2006, and tests of the sequence collector are currently under way.
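The page does not specify how metaMicrobase dispatches tasks or decides what counts as "of interest", so this is only an in-process sketch of the pattern it describes (predefined tasks run as data arrive, users subscribe with their own filters), not a Grid implementation and not metaMicrobase's interface. `AnalysisHub`, `Subscription`, `on_new_sequence` and the example `gc_content` task are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A task takes (sequence id, residues) and returns a dictionary of results.
Task = Callable[[str, str], Dict[str, float]]
# An interest filter decides whether a task result matters to a given user.
Interest = Callable[[Dict[str, float]], bool]


@dataclass
class Subscription:
    user: str
    task_name: str
    interest: Interest


@dataclass
class AnalysisHub:
    tasks: Dict[str, Task] = field(default_factory=dict)
    subscriptions: List[Subscription] = field(default_factory=list)
    # Notifications queued per user as (sequence id, task name, result).
    inbox: Dict[str, List[Tuple[str, str, Dict[str, float]]]] = field(default_factory=dict)

    def add_task(self, name: str, task: Task) -> None:
        self.tasks[name] = task

    def subscribe(self, user: str, task_name: str, interest: Interest) -> None:
        if task_name not in self.tasks:
            raise KeyError(f"unknown task: {task_name}")
        self.subscriptions.append(Subscription(user, task_name, interest))

    def on_new_sequence(self, seq_id: str, residues: str) -> None:
        """Run every predefined task once, then notify users whose filter matches."""
        results = {name: task(seq_id, residues) for name, task in self.tasks.items()}
        for sub in self.subscriptions:
            result = results[sub.task_name]
            if sub.interest(result):
                self.inbox.setdefault(sub.user, []).append((seq_id, sub.task_name, result))


def gc_content(seq_id: str, residues: str) -> Dict[str, float]:
    """Example task: fraction of G and C bases in the sequence."""
    seq = residues.upper()
    gc = sum(1 for base in seq if base in "GC")
    return {"gc": gc / len(seq) if seq else 0.0}
```

Each task runs once per incoming sequence and its result is shared by all subscribers, so adding users adds only cheap filter checks rather than repeated analyses; each user sees only the results that pass their own filter.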

EnvBase: All award holders funded under the NERC PG&P programme are required to complete an entry in EnvBase, the data catalogue maintained by the NERC Environmental Bioinformatics Centre (NEBC).

Microarray informatics: The microarrays in this project are being developed and used in partnership with the Liverpool Microarray Facility. All group members have access to microarray software on Bio-Linux, including maxd (recommended for producing MIAME-compliant annotations), GeneSpring, and Bioconductor.

Other software: Other sequence analysis systems, such as Megx and SEED, are under consideration. These systems are not yet deployed; more information will follow.

Copyright © 2006: The Natural Environment Research Council