[MUSIC] Now, the problem is that the same technology stack, and to some extent even the same approach, is difficult to apply in the case of the Large Synoptic Survey Telescope. The reason this telescope produces so much more data is not just that it has much higher resolution and can see a much deeper field in the sky, but also that it returns to the same point in the sky frequently, every three days. And so this allows you to look at things that change over time: asteroids, comets, supernovae, and so forth. Okay, and by comparing these images as a time series, there are all sorts of new questions you can ask. So because of the science they're going to do, because of the sheer scale, and because of the complexity of how the data is acquired, the previous solution won't work, and this has motivated a whole new area of research into data management and data analysis techniques to support this project.

In the life sciences, high-throughput sequencers are capable of producing terabytes per day when running continuously, and major labs that do this work, such as the McDonnell Genome Institute, have 25 to 100 of these machines running all the time. So these machines are spitting out an enormous amount of data for a variety of samples. It may be individual organisms, or it could even be samples from the environment where there's no one particular organism in there but an entire population. So for a variety of uses, these machines are able to produce this data.

In oceanography, the Regional Scale Nodes of the NSF Ocean Observatories Initiative is a project led here at the University of Washington. The Ocean Observatories Initiative is a multi-institutional partnership, and the Regional Scale Nodes component is run at the University of Washington. This project involves 1,000 kilometers of fiber-optic cable on the seafloor connecting thousands of chemical, physical, and biological sensors, including live video from the seafloor to monitor volcanic activity. Okay, so again, the data sets and data infrastructure required for this effort are significant, and it has motivated a lot of new research.

In the information space, there's a lot of science to be done directly on the Web itself. And just to give a sense of the scale of the Web: a single computer can read 30 to 35 megabytes per second from one disk, and so it would take about four months just to read the entire Web.

So summing it up a little bit: eScience is about the analysis of data, the automated or semi-automated extraction of knowledge from massive volumes of data, and so your main instruments for finding answers are the algorithms and the technology, as opposed to direct inspection. There's just too much of it to look at yourself. But it's not just a matter of volume, as we'll talk about in the next segment. This is another link back to what's going on in business. There's this concept of Big Data, and there are the three V's of Big Data that we'll talk about a little bit more next time, but let me just mention them here. The three V's are volume, variety, and velocity, which you'll read about, and I'll tell you where those terms came from in the next segment. Volume refers simply to the number of rows or the number of bytes, the sheer scale.
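Just to make that volume point concrete, here is a minimal back-of-the-envelope sketch in Python. The snapshot size below is an assumed figure, roughly what the four-month estimate above implies at a 30 to 35 megabyte-per-second read rate; it's an illustration, not a measurement of the Web.

```python
# Back-of-the-envelope: how long would one machine take to read a Web-scale corpus
# from a single disk? The corpus size is an assumed figure, roughly what
# "about four months at 30-35 MB/s" works out to.

DISK_READ_MB_PER_S = 32.5        # single-disk sequential read, midpoint of 30-35 MB/s
WEB_SNAPSHOT_TB = 350            # assumed size of a Web snapshot, in terabytes

total_mb = WEB_SNAPSHOT_TB * 1_000_000          # TB -> MB, decimal units
seconds = total_mb / DISK_READ_MB_PER_S
months = seconds / (30 * 24 * 3600)

print(f"{seconds:,.0f} seconds, roughly {months:.1f} months of continuous reading")
```

The exact corpus size doesn't matter much; the point is simply that at single-disk speeds, even just reading data at this scale takes months.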
Variety is perhaps the number of columns or dimensions, but in science, and in the life sciences in particular, you'll have experiments that involve accessing multiple public databases as well as multiple sensors, your own data that you've collected, and that of your colleagues, and the integration task of putting all this data together is a significant bottleneck, even when the actual scale of the data is not necessarily all that bad. It's the complexity of the data.

And then there's velocity. With the Large Synoptic Survey Telescope, although the scale itself is enormous, the fact that 40 terabytes are being collected every two days means that the infrastructure needs to keep up with that pace, and just transferring that data from the telescope facility to the data analysis facility is an engineering challenge; there's a rough sketch of what that rate works out to at the end of this segment. And you'll also see other V's here, things like veracity: can we actually trust this data? So a bit more on that next time.

So to summarize here: science is in the midst of a generational shift from a data-poor enterprise, where there's never enough data, to a data-rich enterprise, where there's so much of it you don't know what to do with it. And as a result, data analysis has replaced data acquisition as the new bottleneck to discovery. So it's not the cost of going out and getting the data, it's the cost of actually analyzing the data you might already have.

So this is fine, but what does it have to do with business, which is probably where a lot of you are coming from and where your interests lie? Well, what we see is that business is beginning to look a lot like what's always been happening in science. Businesses are acquiring data aggressively and keeping it around indefinitely in case it becomes useful. They're beginning to hire people with training and skill sets that look a lot like what's been important in science for a long time, especially mathematical depth. And they're beginning to make decisions with this data that are very empirical, always wanting to back up every decision with a clear case based on data. And so for these reasons, I think you can take the lessons learned in science and apply them in business, and actually vice versa as well, because one thing where science is lagging behind business is in the adoption of technology. Proportionally, a lot less has been spent on IT infrastructure in science than in business, and so it's a great time for this cross-pollination of ideas between the two fields.

Okay, and so going back to the first slide that I gave: eScience and Data Science have essentially everything in common, so we might use examples interchangeably between the two. [MUSIC]
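Here is the rough sketch mentioned in the velocity discussion: a minimal back-of-the-envelope calculation, in Python, of the sustained transfer rate implied by 40 terabytes every two days. The 40-terabyte and two-day figures are the ones quoted above; the unit conversions are the only additions.

```python
# Back-of-the-envelope: sustained bandwidth needed to move 40 TB every two days
# from the telescope facility to the analysis facility.

DATA_TB = 40          # data collected per window, from the lecture
WINDOW_DAYS = 2       # collection window, from the lecture

seconds = WINDOW_DAYS * 24 * 3600
mb_per_s = DATA_TB * 1_000_000 / seconds     # TB -> MB, decimal units
gbit_per_s = mb_per_s * 8 / 1000             # MB/s -> Gbit/s

print(f"~{mb_per_s:.0f} MB/s sustained, about {gbit_per_s:.1f} Gbit/s around the clock")
```

That works out to a couple of hundred megabytes per second, sustained continuously, which is why the transfer alone is an engineering challenge.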