All right, welcome to the third week of plant bioinformatics. This week we'll be exploring coexpression tools. Coexpression tools allow us to identify genes that are similarly expressed. This can provide insight into one's own expression profiling experiment, in that we can identify groups of genes that are similarly expressed. We can also identify genes that are similarly expressed in a more general context using gene expression databases, without actually doing a single expression profiling experiment, and this is what I would consider to be coexpression analysis.
We can also ask the question: Do promoters of similarly expressed genes contain common cis-elements? We'll be exploring this in module four. We can also ask: Is there enrichment of a particular functional category or of a given pathway, as defined by Gene Ontology or AraCyc, et cetera? We can then look at individual clusters with the AgriGO tool or with AraCyc. We can also explore genes in a given cluster and see if they're part of a given pathway, and we can map expression information onto pathways using some visual analytic tools that we'll be exploring in module five.
All right. So, how can we do a coexpression analysis on publicly available gene expression data? Well, there are several tools out there that we'll be exploring in the lab. One of the first of these was Expression Angler. In the case of Expression Angler, we simply enter a gene identifier and select a dataset in which we would like to angle, that is, to look for coexpressed genes. Basically, we use a similarity metric known as the Pearson correlation coefficient to ask whether the expression pattern of our gene of interest is similar to that of other genes in the expression dataset across the samples. We do that for each gene sequentially and compute an R value. If R is equal to one, we have perfect coexpression. If R is equal to zero, there's no correlation in the expression patterns. If R is equal to minus one, we have the opposite response. So R can range from minus one to plus one. There are other tools out there that use other metrics, like mutual rank, and you can see the lab manual for a discussion of that.
Now, in the case of Expression Angler, we can identify genes with functions in novel contexts, or we can use it to identify genes that are involved in a particular biological process. Here's an example from the literature, where RGL2 was shown to be involved in floral development. Its involvement in seed biology was long known: it's involved in gibberellic acid (GA) signaling. Its involvement in floral development was shown by Elliot Meyerowitz's group in 2004, and they did lots of genetic analyses to show that GA and RGL2 are involved in floral development. Now, if we take RGL2 and query Expression Angler with it, we see that many floral homeotic genes and floral developmental genes are returned across a large dataset of fairly disparate samples: for instance, SEPALLATA2, SEPALLATA3, AGAMOUS, PISTILLATA, APETALA3, and APETALA1. These genes are all involved in floral development. So, simply by querying with RGL2, we immediately come up with the hypothesis that RGL2 is also involved in floral development.
If we look at other genes in this list, we see that some don't have any particular annotation associated with them, like this one here, which is just annotated as "expressed protein". We can hypothesize that the role of this expressed protein is also in floral development, following the guilt-by-association paradigm: because several genes in the list are involved in floral development, the one annotated as expressed protein might be too. Coexpression screens really can identify novel genes, and this is just a small selection of papers where coexpression analysis has been used, instead of a mutant screen, as the basis for identifying genes involved in a particular biological process.
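To make the angling step concrete, here is a minimal Python sketch of ranking genes by their Pearson R value against a query gene. It is only an illustration of the idea described above, not Expression Angler's actual code: the function names `pearson_r` and `angle`, the gene names, and the expression values are all made up for this example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


def angle(query, expression):
    """Rank every other gene by its R value against the query gene."""
    bait = expression[query]
    scores = [
        (gene, pearson_r(bait, profile))
        for gene, profile in expression.items()
        if gene != query
    ]
    # Highest R first, so the best coexpressed genes are at the top.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical genes with made-up expression values across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],  # same shape as geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # opposite response to geneA
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],   # uncorrelated with geneA
}

ranked = angle("geneA", expression)
for gene, r in ranked:
    print(f"{gene}\t{r:+.2f}")
```

In this toy dataset the query returns geneB first (R of one), then geneD (R of zero), then geneC (R of minus one), which matches the three cases described above. A real compendium has thousands of genes and many more samples, so in practice you would use a vectorized library rather than plain lists.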
So, in the case of my own lab, we were interested in seed biology, where no master regulator of dormancy or germination had been identified. We asked the question: Can we use coexpression analysis to identify crucial hubs in networks? A postdoc of mine, George Bassel, created a seed microarray database of 175 samples. Together with Anthony Bonner in Computer Science, we calculated all-by-all gene coexpression scores across samples that are dormant or can germinate, to come up with a database of about 4.5 million interactions. Then we visualized and analyzed the network to identify hubs within it.
So, here's SeedNet. The nodes in this diagram represent the genes, and the edges represent significant coexpression scores. The genes are coloured according to whether they're increased in expression in dormant samples, in red, or in germinating samples, in blue. There are three main zones that we see: a dormant zone, a germinating zone, and a transition zone between the two parts of the network. What we did is zoom in on part of the network and test genes that are highly connected by coexpression scores, that is, hubs within the coexpression network, for their ability to affect germination. All of these genes labeled in blue font, in fact, showed a phenotype in a germination assay. The really nice thing about this method, compared with just looking at expression values and differentially expressed genes, is that we had a hit rate of around 50 percent when examining hub genes within this region of the network. Of course, uncharacterized hub genes in this region represent high-confidence candidates awaiting further examination.
So, there are lots of different coexpression networks that have been generated now, and these are mostly condition-dependent coexpression networks.
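As a rough illustration of the hub idea behind a network like SeedNet, the sketch below computes all-by-all Pearson scores, keeps pairs above a cutoff as edges, and ranks genes by how many edges they have. Several things here are assumptions made for the example and are not taken from the SeedNet work: the cutoff of 0.9, the use of positive correlations only, simple degree as the measure of "hubness", and all gene names and expression values.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def coexpression_edges(expression, threshold):
    """All-by-all comparison; keep gene pairs whose R meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson_r(expression[g1], expression[g2])
        if r >= threshold:  # assumption: only positive coexpression counts
            edges.append((g1, g2, r))
    return edges


def rank_hubs(expression, edges):
    """Count edges per gene and list the most connected genes first."""
    degree = {gene: 0 for gene in expression}
    for g1, g2, _ in edges:
        degree[g1] += 1
        degree[g2] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical genes with made-up expression values across six samples.
expression = {
    "hub":  [2, 4, 6, 8, 10, 12],
    "n1":   [4, 2, 6, 8, 8, 14],
    "n2":   [3, 5, 4, 6, 11, 13],
    "n3":   [3, 2, 7, 7, 12, 11],
    "lone": [12, 10, 8, 6, 4, 2],
}

edges = coexpression_edges(expression, threshold=0.9)  # cutoff is an assumption
hubs = rank_hubs(expression, edges)
for gene, degree in hubs:
    print(f"{gene}\t{degree}")
```

With these toy values, `hub` is connected to `n1`, `n2`, and `n3`, while those three fall just below the cutoff with each other and `lone` has no edges, so `hub` comes out on top. In a real analysis the edge cutoff would come from a significance test, and dedicated packages offer more refined measures of connectivity than a simple edge count.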
Examples of such networks include FlowerNet, which appeared in 2015 and was used to identify regulators of floral development; a cadmium stress network from rice; a biotic stress network that was identified by weighted gene coexpression network analysis; and other pan- and core-network analyses that identify certain genes.
So, the tools that we'll be exploring today start with Expression Angler, which allows condition-dependent searches in several different Arabidopsis compendia. You can also design your own bait to pull out genes that exhibit a specific expression pattern in the absence of a query gene. Then there's ATTED-II, which also allows condition-dependent searches in Arabidopsis and eight other plant species. AtCAST is kind of a cool tool that allows you to identify which gene expression dataset is most similar to your own. So, if you've got a mutant and you don't quite know what's being affected, you can query the AtCAST database to find samples that show similar profiles across all of the genes. AraNet is partly a coexpression network but also has other kinds of edges between 22,000 Arabidopsis genes; it's supported by 19 sources and has been projected to 27 plant species.
The last tool, which we're actually not exploring in the lab, is WGCNA, which I mentioned on the previous slide. It's a nice R package that we can use to generate coexpression networks. Thanks for listening, and I hope you enjoy the lab.