"Bioinformatics of African Pathogens and Disease Vectors" Software demonstration - ILRI, Nairobi, May 2007 VectorBase http://www.vectorbase.org ~~~ Short presentation Examples of questions and how to answer them Karyn Megy - kmegy@ebi.ac.uk European Bioinformatic Institute, Hinxton, UK Short VectorBase presentation Short presentation ================== What is VectorBase? ------------------- VectorBase is a Resource Centre for Invertebrate Vectors of Human Pathogens. Reference: 'VectorBase, a home for invertebrate vector of human pathogens' - Lawson D. et al. - Nucleic Acid Research, Database issue, Jan.2007 Who is funding VectorBase? -------------------------- The US National Institute for Allergy and Infectious Diseases (NIAID), via its Bioinformatics Resource Centre (BRC) program. Who is involved? ---------------- VectorBase is a collaborative project between several laboratories in the United-States, in England and in Greece - but we also display data from the scientific community (United-States and England so far, and soon from Kenya). Examples of questions and how to answer them ============================================ Given examples -------------- Files containing sequences and examples of keywords or questions to uses in these exercises can be found at: http://www.ebi.ac.uk/~kmegy/Nairobi07/ But if you have your own sequences or ideas, then go for it! Q-1: Get information about a gene USE: the Search engine --------------------------------- 1. On the main page, in the search engine, type the name of the gene you are interested in. Click on 'GO'. 2. This page shows the result of the query: either no query (may be try again adding a '*' to your query), either a single result (in which case you are actually directly send to 3.) or several results. Select the gene you are interested in by clicking on its name. 3. 
This page shows the report for this gene: - You can get its ID and description, genomic location and prediction method, - Check for its structure (length, transcript, exon number), - Look if it has any paralogs and homologs ... you can even jump to another genome! - The DAS section if about any external information that would be added to this gene (often expression data), - The transcript section gives you more information about the transcript (structure and cross-references of all sorts), - From the Transcript Report, you can go to the Exon Report or the Protein Report. You will see details about the transcript, exons and protein (length, sequence, evidences, etc.). 4. On the left hand side menu, you can choose to: - Align the genome section containing this gene to other species (An.gambiae and Ae. aegypti and Dr.melanogaster only, at the moment), - See the gene tree for this gene, - Upload it in various formats. Click around to see what you can learn about your gene! Q-2: Compare sequences USE: BLAST ---------------------- 1. From the main page, go to the BLAST page. 2. Copy/paste your sequence(s) in the box or upload a file containing these sequence(s). !! Limit of 100Kb and 10 sequences !! 3. Select the organisms, the database type (transcript, protein or genome) and the blast type (blastn, blastx, tblastn). Set additional blast parameters if required. 4. Submit your job. The next page gives you a summary of your parameters. 5. The result page displays interactive or raw results. Try both and see what is the difference! What is your sequence(s) similar to: which organism(s), which gene(s), at which location(s)? Q-3: Align two sequences USE: ClustalW ------------------------ 1. From the main page, go to the ClustalW page. 2. Copy/paste the sequences in the box or upload a file containing these sequences. !! Limit of 100Kb and 10 sequences !! 3. Select the sequence type (dna or protein). Set additional parameters if required. 4. Submit your job. 
   The next page gives you a summary of your parameters.
   The result page displays raw results only.


Q-4: Get a whole set of genes          USE: BioMart
-----------------------------
1. From the main page, go to the BioMart page ("Data mining" section).

2. Select the dataset (VectorBase only at the moment) and the database
   (An. gambiae and Ae. aegypti only at the moment).
   Remark: a summary of the selected items appears in the left section of the
   screen.

3. Left section of the screen: select "Attributes".
   Right section: select "Attributes" to choose your attributes.
     Use section "Features" for gene and transcript attributes and
     section "Structure" for exon and UTR attributes.
   Then, right section: select "Sequence" to choose the sequence header and
   get the sequence itself.
   !! Sequence not available yet - but should be ready soon !!

4. Left section of the screen: select "Filters".
   Right section: select "Region", "Genes" or "Transcripts" depending on the
   kind of filters you want. Go and have a look at what is available!

5. Click on "count" if you want to know the number of features selected
   (useful to check that you will actually get what you thought you asked
   for ... and to check that the export went fine).
   Click on "Result" to see the results - and select the export type if you
   want to export them (usually text file).

More about BioMart at:
http://www.biomart.org/install.html


Q-5: Browse the genome          USE: the Genome Browser
----------------------
1. From the main page, go to the organism you want to look at.
   In the "Sequence data" section, click on "Genomic data".

2. Select your chromosome, or directly type in the region you are interested
   in.

3. If you clicked on a chromosome, you reach this page: an overview of the
   chromosome (gene density, SNPs and band graphs, length, gene and SNP
   number).
   Choose the region you are interested in by typing its coordinates in the
   boxes, or click on a region of the chromosome.

4. You reach a page with different tracks:

   - Chromosome
     The red box shows which region of the chromosome you are looking at.
     Click in this panel to view another region of this chromosome.

   - Overview
     Shows an overview of the features in this region and its immediate
     surroundings.
     Click in this panel to zoom in, zoom out or centre on another region.

   - Detailed view
     The genome is in blue; features on the forward strand are shown above
     it, and features on the reverse strand below it.
     It is the most useful track, with many features. The ones you might want
     to look at are:
       - Genes (different colors for different types),
       - BLAST vs. UniProt, markers, SNPs, ESTs and arrays (see "Features" menu),
       - Matches between Anopheles/Aedes/Drosophila (see "Compara" menu),
       - Repeats (see "Repeats" menu),
       - Decorations such as start/stop codons (see "Decoration" menu).
     You can add/remove some features by clicking on the menus ("Features",
     "Comparative", "DAS sources" etc.) and un/selecting the items.
     You can expand/collapse the features by clicking on the [+] or [-] next
     to their name.

   - Base view
     Shows this region of the genome at the base level and amino acid level
     (3 reading frames - both strands).
     Switch the track ON/OFF by clicking on the [+] or [-] next to its name.

   In the Detailed and Base Views, you can zoom in/out or move left/right by
   using the buttons at the top of the tracks.

Now try! Move along the chromosome, switch ON/OFF some of the tracks,
add/remove some features in the Detailed View.
But remember:
   - The larger the region, the longer it takes to load
   - The more tracks, the longer it takes to load

More about browsing the genome at:
http://www.ensembl.org/info/helpdesk/tutorials/index.html


Q-6: Visualize your own data along the chromosome          USE: DAS
-------------------------------------------------
This is a bit tricky to do, but the person in charge of DAS will be glad to
help.

Have a look at:
http://www.ensembl.org/info/helpdesk/tutorials/index.html
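
As a rough illustration of what sits behind a DAS track: a DAS source is
queried over HTTP with a "features" request for one region, and answers with
an XML list of features that the browser then draws. The Python sketch below
builds such a request and reads a response. The server address, the source
name "my_data" and the sample features are made up for illustration, and the
URL and XML layout are assumed to follow the DAS 1.5 conventions; the real
set-up should be checked with the person in charge of DAS.

```python
import xml.etree.ElementTree as ET

# Hypothetical DAS server and source name, for illustration only.
DAS_SERVER = "http://das.example.org/das"
DAS_SOURCE = "my_data"

# Made-up response in the DASGFF layout assumed here (DAS 1.5 style).
SAMPLE_RESPONSE = """<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/my_data/features">
    <SEGMENT id="2L" start="1" stop="100000">
      <FEATURE id="probe_001" label="probe_001">
        <TYPE id="expression">expression</TYPE>
        <METHOD id="microarray">microarray</METHOD>
        <START>1500</START>
        <END>1560</END>
        <ORIENTATION>+</ORIENTATION>
      </FEATURE>
      <FEATURE id="probe_002" label="probe_002">
        <TYPE id="expression">expression</TYPE>
        <METHOD id="microarray">microarray</METHOD>
        <START>42000</START>
        <END>42060</END>
        <ORIENTATION>-</ORIENTATION>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""


def das_features_url(server, source, segment, start, stop):
    """Build a 'features' request URL for one region of one chromosome."""
    return f"{server.rstrip('/')}/{source}/features?segment={segment}:{start},{stop}"


def parse_features(xml_text):
    """Return one dictionary per FEATURE element found in the response."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
                "strand": feature.findtext("ORIENTATION", default="0").strip(),
            })
    return features


if __name__ == "__main__":
    # No network access here: the URL is only printed, and the sample
    # response above stands in for what a server might send back.
    print(das_features_url(DAS_SERVER, DAS_SOURCE, "2L", 1, 100000))
    for feat in parse_features(SAMPLE_RESPONSE):
        print(feat["segment"], feat["start"], feat["end"], feat["strand"], feat["id"])
```

Each feature carries the coordinates the browser needs to place it along the
chromosome, so your own data mainly has to be expressed as (chromosome, start,
end, strand) records in the same coordinate system as the displayed assembly.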