We have just covered the basic principles of geospatial data science, including the major knowledge domains it draws on. Geospatial data science sits at the intersection of three domains: geospatial sciences and technologies; cyberinfrastructure and computational science; and mathematical and statistical sciences. We used a divide-and-conquer example to illustrate the importance of geospatial sciences and technologies and the principles for solving geospatial problems. That leads us to this topic, scientific applications and drivers, which should be quite exciting to learn.
So let me ask you a question. Looking at these two remote sensing images, is this wetland or forest? You might have a hard time telling me what they are. Simply put, these are two-dimensional imagery representations, and telling the difference between wetland and forest from a two-dimensional representation alone is a difficult task. This is a lidar remote sensing scan of the same area. You can tell that it captures the vertical structure of the same piece of land, and on the right-hand side you clearly see a gap that could be caused, for instance, by a river channel underneath trees and forest.
We're talking about a county in southern Illinois. The reason I picked this county is that it has diverse land use types, including wetland, forest, and even some urban use of the land, as well as agricultural land. This makes it very challenging to classify land use types with traditional remote sensing such as Landsat. Now consider the advantage of having lidar. I mentioned, for instance, the 3DEP program, the 3D Elevation Program, which produces lidar data across the United States. That gives us the lidar remote sensing data.
We can imagine overlaying that with traditional remote sensing for this particular county. That would give us both two-dimensional land use information and three-dimensional information, essentially the vegetation and forest structures. Intuitively, that would help us better differentiate land use types: having both the two-dimensional imagery and the three-dimensional structure is pretty powerful for determining land use types properly. Now, combining such data is nontrivial. For human eyes, we can imagine looking at a piece of land and telling the vertical structure as well as what's beneath it to differentiate land types. For computers to learn how to figure out the different land use types, this involves, in this case, a deep learning model that represents the three-dimensional information. Oftentimes, in computer vision for instance, you would have a deep learning model constructed to represent two-dimensional information. In this case, the three-dimensional information from lidar is combined with the traditional two-dimensional information in the deep learning model, which gives us better performance than traditional land use and land cover classification.
The picture on the slide shows some comparisons. For instance, in the bottom one you see some river bodies. The red color represents developed areas such as urban land use, which in the conventional benchmark is clearly a misclassification. Using the deep learning model that combines the three-dimensional lidar information with the two-dimensional remote sensing, you see more correctly classified cases, essentially consistent with the water bodies we're looking at. If you look at the other examples in this picture, you see consistent improvement in land cover classification.
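To make the idea of combining two-dimensional imagery with three-dimensional lidar information a bit more concrete, here is a minimal sketch in Python. It shows only one simple way to prepare a combined input, by stacking a lidar-derived height layer onto the image bands; it is not the deep learning model used in the study, whose architecture is not described here. The function name, the tiny 2x2 arrays, and the assumption that a canopy height grid has already been derived from the lidar point cloud are all hypothetical.

```python
import numpy as np

def stack_imagery_and_lidar(imagery, canopy_height):
    """Stack 2D image bands with a lidar-derived height layer.

    imagery: array of shape (bands, rows, cols), e.g. spectral reflectance.
    canopy_height: array of shape (rows, cols), vegetation height in meters
        (assumed to be already derived from the lidar point cloud).
    Returns an array of shape (bands + 1, rows, cols).
    """
    if imagery.shape[1:] != canopy_height.shape:
        raise ValueError("imagery and lidar grids must cover the same cells")
    # Scale height to [0, 1] so it is comparable to reflectance values.
    max_h = canopy_height.max()
    height_scaled = canopy_height / max_h if max_h > 0 else canopy_height
    return np.concatenate([imagery, height_scaled[np.newaxis, ...]], axis=0)

# Hypothetical 2x2 scene with two spectral bands. Three of the pixels look
# alike in the imagery, but the height layer separates tall from low cover.
imagery = np.array([[[0.30, 0.31], [0.05, 0.30]],
                    [[0.60, 0.61], [0.02, 0.59]]])
canopy_height = np.array([[20.0, 0.5], [0.0, 18.0]])

features = stack_imagery_and_lidar(imagery, canopy_height)
print(features.shape)  # (3, 2, 2)
```

The stacked array could then be fed to whatever classifier you choose. The point is simply that pixels with nearly identical spectral values can still differ in the height channel.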
Now, this example shows the power of combining different types of geospatial data. Because this is a big data scenario that includes both lidar and traditional remote sensing, we need to tackle computation and data challenges. Even for just a small county, the work involves data preprocessing, feature extraction, and the actual classification process, which has two parts: training, and the eventual application of the trained model. This particular study used Virtual ROGER as the computing resource, the geospatial supercomputer we covered when learning about advanced cyberinfrastructure in this introductory part of the course. Once the methodology was fully developed, this part took about one week of ROGER use, which tells you how much computing power you actually need to conduct a study like this. Now consider a much broader scope of application, for instance land cover classification for the entire US. For CyberGIS-based problem solving with geospatial big data, you would need much more computing power. Even the most advanced high-performance computers built by the science and technology communities today would likely still fall short of solving this land cover classification problem for the entire country, let alone the entire globe. So CyberGIS and geospatial data science are really a frontier driving innovation in computing and cyberinfrastructure, because our demand far exceeds what the current state of the art in computing and cyberinfrastructure can satisfy.
Another example is related to land cover classification, but more from an emergency management point of view. As you may know if you're familiar with GIS, GIS has a killer application, which is emergency management, because emergency management always starts with maps, and digital maps are now the norm for emergency managers.
In this case, the research is about mapping flood inundation at the continental scale, and again multiple datasets are involved. The top one is river gauge measurements, about 2.7 million of those measurements across the United States. The one on the left, the bottom picture, is the digital elevation model I was talking about, produced by the US Geological Survey. If you combine these two datasets, you can derive, with some hydrological modeling, what is referred to as the Height Above Nearest Drainage (HAND) representation. When you are away from a river, for instance, depending on the elevation of where you are and how far you are from the river channel, we can figure out the flood level. Because we have the river water measurement and the elevation of where you are, we have a straightforward way of determining the water level at your location. This is what is referred to as the Height Above Nearest Drainage methodology, and we want to compute it across the country using the two datasets on the left side of the slide. This turns out to be a rather computationally intensive methodology, but it represents a very exciting frontier of hydrology and GIS, referred to as continental hydrology in the paper listed at the bottom of the slide. Again, the study used ROGER, as we will in our class.
Throughout the course, we will use CyberGIS capabilities and work with you to solve some interesting problems using some cool geospatial big datasets. This depends on Virtual ROGER, advanced cyberinfrastructure, and CyberGIS tools such as the one in the screenshot on this slide, called CyberGIS-Jupyter. Those of you who are familiar with data science might be familiar with Jupyter Notebook. It is an online environment, and you can also work on your personal computer by downloading Jupyter notebooks.
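Returning for a moment to the Height Above Nearest Drainage (HAND) idea described above, here is a minimal sketch in Python of how a water level and a HAND grid could be compared. It assumes the HAND value of each cell has already been computed from the elevation model, and it applies one water level (stage) to the whole grid. That is a simplification for illustration: in a real workflow each cell would be tied to its own nearest river reach and that reach's water level. The function name and the small arrays are hypothetical.

```python
import numpy as np

def inundation_depth(hand, stage):
    """Estimate water depth per cell from HAND and a river stage.

    hand: 2D array, height of each cell above its nearest drainage cell (m).
    stage: water level above the channel bed at that drainage (m).
    A cell is treated as flooded when the stage exceeds its HAND value.
    """
    # Depth is the stage minus HAND, floored at zero for dry cells.
    return np.clip(stage - hand, 0.0, None)

# Hypothetical HAND grid: cells near the channel have small values.
hand = np.array([[0.0, 1.5, 4.0],
                 [0.5, 2.0, 6.0]])

depth = inundation_depth(hand, stage=2.0)
flooded = depth > 0  # boolean inundation extent

print(depth)
print(flooded)
```

With a stage of 2 meters, only the cells whose HAND value is below 2 meters receive a positive depth. A cell sitting exactly at 2 meters is treated as dry in this sketch.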
Jupyter notebooks served through JupyterHub, however, offer a server environment in which a large number of users can access notebooks simultaneously. CyberGIS-Jupyter uses CyberGIS as the back end and Jupyter Notebook as the front end to provide this capability as a CyberGIS gateway application. In this case, we are presenting you with a complete scientific workflow for the flood mapping process. Later in our class we will share this workflow with you, and you will be able to run it yourself, step by step, and understand what each step does in mapping flood inundation, not at the national scale, because that would consume a lot of computing power, but at a smaller scale. You will understand what each of the boxes, functions, and parts of this workflow does. This is amazing, because this methodology was published recently and took a couple of dozen scientific researchers to make possible. Now you have the power to learn this methodology directly by going through each of the steps using CyberGIS-Jupyter. As we learned, there are three key modalities of CyberGIS software, and CyberGIS-Jupyter belongs to one of them, the CyberGIS gateway modality. This means the capability is for those who are not necessarily CyberGIS experts but can benefit from the power of CyberGIS and the power behind it offered by cyberinfrastructure.
There is also a paper that you will be asked to read toward the later part of the course, because by then you will be equipped with the knowledge to understand its technical aspects. This paper illustrates the capabilities of CyberGIS-Jupyter, including its architecture, how it works, what's under the hood, and what it has achieved. Even the notebook itself, the flood inundation mapping notebook, has been published online as part of the scientific literature at the University of Illinois.
We have an online environment to publish this as a dataset, which you can cite in the scientific literature, and you can go there to download this notebook. If you're using regular Jupyter Notebook, you can actually open it. Of course, you need the CyberGIS infrastructure to run the notebook, but you will be able to look at it if you have a Jupyter environment of your own. This shows that such digital artifacts, essentially made possible by the advances of CyberGIS, can be more broadly accessible as part of scientific literature and scientific knowledge, beyond the scientific papers we have published.
This capability is also accessible from another online scientific resource called HydroShare. HydroShare is a community cyberinfrastructure resource for hydrologists to share their data and models. HydroShare is able to access CyberGIS-Jupyter from within its online environment, so we are becoming a resource there that HydroShare users can take advantage of directly, because flood mapping is also a hydrological application. You can also do geovisualization in CyberGIS-Jupyter as part of your CyberGIS-Jupyter exercises and notebooks, and you'll learn this throughout the course. In your exercises and labs, you will conduct geovisualization work, which should be pretty cool, because in the past you would usually do this through desktop GIS. Now it's completely online. You can put this together and then even share the geovisualization with your peers and with a broader audience, because it's online. I will not go into details, but I want to give you a sense of what CyberGIS-Jupyter entails in terms of its architecture. As I mentioned, it's accessed online, meaning you can do your work through a browser.
CyberGIS-Jupyter requires security control because it's an online environment, and it also has cloud container capabilities. Notebooks from different users are containerized and then sent to cyberinfrastructure for computation and data management. So essentially, this flows from left to right. Again, this is presented in the CyberGIS-Jupyter paper I was talking about, but I just want to give you this overview of the architecture to demystify how it works. You will have a lab to get familiar with CyberGIS-Jupyter, and then you will do more work with it during the course.
Another major frontier of CyberGIS and geospatial data science is collaborative problem solving and decision support, simply because conventional GIS is largely for single-user applications. As I was illustrating for CyberGIS and geospatial big data, oftentimes we're solving big, complex problems that require teams and groups to interact with each other. This example tells that story: figuring out the optimal infrastructure for distributing and managing biomass and bioenergy resources. For instance, California consumes tons of bioenergy resources, but the Midwest is predominantly where bioenergy sources and crops are produced. How do you coordinate across the country to get the energy ready for the coastal areas while storing and processing the biomass in different parts of the Midwest? As you can see, this is very much a multi-stakeholder problem, and it requires sophisticated, computationally intensive models and tons of geospatial data to figure out optimal solutions. CyberGIS represents the most viable pathway to such problem solving and decision making, and that is a frontier of CyberGIS.
It is also somewhat reflected in CyberGIS-Jupyter, as you can work on your Jupyter notebooks in CyberGIS as a team, for instance, sharing notebooks with others and validating your work across team members.
So, in summary, for our introduction to this course, I have covered a number of examples and applications. As I described, they are computation- and data-intensive and require collaboration; examples on the right-hand side of this slide include flood, health, and so on. Then there is the science behind geospatial big data that we need to continue to advance to enable such applications and sciences. That science in turn informs the innovation of CyberGIS as science and technology, which is built on top of advanced computing and cyberinfrastructure. So we see the importance of critical spatial thinking from the application side, driving this integration from top to bottom, and we clearly see the continuing technology transformation from the digital side empowering this integration from bottom to top. In this course, we're going to cover these four parts, focusing on the middle two pieces, CyberGIS and geospatial data science, and highlight the importance of integration and synthesis across these knowledge domains. You can imagine this is super exciting, because so many problems we face in today's world require such integration and synthesis, and geospatial data science and CyberGIS are rapidly changing and advancing. So this is the perfect time for us to get on top of the changes and contribute to the future advances of these domains. That concludes my introduction to the course, and I very much look forward to working with you throughout the course.
I also hope to equip you with the new frontiers and foundations of CyberGIS and geospatial data science, and I hope that will help you achieve your learning objectives and also make your work more productive. Contact information for both myself and Dr. Anand Padmanabhan is on this slide; the emails listed are the best way to get hold of us. We are always happy to respond, and we very much encourage you to reach out if you have any questions along the way in this course.