[MUSIC] Hello again, and welcome back. In this lesson, we're going to talk about how GIS data is structured and what that means for how you collect and process data. To understand data types, we're going to take this scene I drew and look at how it might translate into geospatial data. This is just a basic sketch and I'm no artist, but pretend for a moment that this is a real-world scene of a river in blue, with vegetation on both sides in green and patches of dirt with no grass or vegetation in tan. So now the central question is: how do we conceptually translate this location into geospatial data? Before we dive in, I want you to think about data collection for a moment. In some way, data collection is always an approximation of the real world. We can't capture everything about a location in our data, so we discard the information that's not important to our particular application of that data and collect the things that are meaningful to us, whether it's how something looks or what it does. For example, think about the contacts manager application in your phone, or your Rolodex for some of you. The information you store about a person isn't a complete description of them. It's things like a phone number, an email address, maybe their physical address, maybe some other notes. As a profile of a person, it's an approximation. It's not sufficient to recreate a person by any means, but for certain uses it's exactly what we need. GIS data is the same. We simplify the world into data structures we can use in our work. There are many types of data, but we'll talk about four of them right now: rasters, and then the vector data types of points, lines, and polygons. Remember that points are dimensionless, just a location in space, and their size is purely based on how we choose to symbolize or display them on our map. 
Lines are one-dimensional: each segment has length but no width, though once you chain multiple segments together they can trace paths across two-dimensional space, like roads. Polygons cover an area and are typically used to group locations that we're classifying as being of the same type. Overall, vector data is great for discrete observations. The main alternative to the vector data types is to use a raster instead. Remember that these are grids, like a chessboard, where each cell or pixel can have a value. If we want to cover more area, we can't just add more features like we can with vector data. We need to add more rows or columns to the raster and indicate values, even if it's a null or unknown value, for everything in those rows and columns. Rasters are the best bet for use cases where your data needs to vary continuously across the landscape. So, thinking again about the scene we were just looking at: if we take it to be what reality looks like, how do we approximate it as GIS data? The first choice is what type of GIS features to use. Probably the simplest way that we can turn this scene into GIS data is just marking out points for features of interest. I can start by simply creating points where the dirt patches are located. This gives me a simple representation of the scene. We don't have the grass locations or the river. We could potentially turn those into points too, but that doesn't make as much sense as turning these dirt patches into points. Again, a reminder that our data is an approximation. What we create here isn't going to give us the exact dirt patches, but it shows us the locations of the patches, and we can use data attributes, which we'll cover more in an upcoming lecture, to help us characterize them in other ways that we can measure. For the river, I am more likely to want to turn that into a line instead of points. That makes a lot more sense for something like a river or a road. This line is still an approximation of the river's location. 
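To make the point and line representations a bit more concrete, here is a minimal Python sketch of how those features might be held in memory. The class names, coordinates, and attribute keys (`cover`, `approx_area_m2`, `avg_width_m`) are all made up for illustration; real GIS software uses its own formats and libraries.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class PointFeature:
    """A dimensionless location plus free-form attributes (hypothetical structure)."""
    x: float
    y: float
    attributes: dict = field(default_factory=dict)


@dataclass
class LineFeature:
    """An ordered list of (x, y) vertices plus attributes (hypothetical structure)."""
    vertices: list
    attributes: dict = field(default_factory=dict)

    def length(self) -> float:
        # Sum the straight-line length of each consecutive segment.
        return sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.vertices, self.vertices[1:])
        )


# Dirt patches stored as points: we keep a location, not the patch outline.
# Anything else we care about (here an invented area estimate) lives in attributes.
dirt_patches = [
    PointFeature(2.0, 5.0, {"cover": "dirt", "approx_area_m2": 12.0}),
    PointFeature(7.5, 1.5, {"cover": "dirt", "approx_area_m2": 30.0}),
]

# The river stored as a line: the geometry has length but no width,
# so a width would have to be recorded as an attribute (invented value).
river = LineFeature(
    [(0.0, 3.0), (3.0, 3.0), (3.0, 7.0)],
    {"name": "river", "avg_width_m": 4.0},
)

print(len(dirt_patches), "dirt patch points")
print("river length:", river.length())
```

Notice that the geometry only answers "where"; everything else we want to know about a feature has to be carried as an attribute.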
We don't get width information for this location. But again, we can add attributes that give us specific information about the river that we can observe in other ways. The location itself is best represented by a line. And what if I want to characterize the land cover in general? In this case, maybe I would want to use polygons. I can draw polygons representing the boundaries of these dirt patches, and at that point I have area information for them and can see how they relate to everything else a little better. I can also draw the boundaries of the vegetation areas, which happen to surround the dirt patches, so there will be holes in the vegetation polygons for them. And then, if I want to, I can draw out the river as well. When I'm done, I can just draw letters in to visualize the attributes that I might put on these polygons. In a real-world use case, I would most likely use polygons or points for the dirt patches, polygons for the vegetation, depending upon the type of vegetation, and then a line for the river. What if I want to use a raster format instead? In fact, some land cover datasets do use rasters. To start with, let's draw out a bare-bones grid so we can visualize our raster. These lines don't actually exist in our dataset, but for us to imagine a raster, it helps to see them. Also, raster cells, the squares here, are usually of equal width and height, so ignore the non-uniformity in my drawing here. Remember that a key component of a raster dataset is that every location within a cell has the same value, because each cell codes for just one value. This is easy for the first few cells, which we can say consist almost entirely of grass, and I'll draw that in here. But what about the cells that have mixed portions of dirt and grass and river? These mixed pixels can be assigned values by a couple of different rules. We can assign either by the majority value or by whatever value is at the center of the pixel. 
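Here is a small Python sketch of those two rules applied to a single mixed cell. The 5x5 grid of samples inside the cell is invented for illustration, and the function names are hypothetical; it is only meant to show that the two rules can give different answers for the same cell.

```python
from collections import Counter

# Hypothetical 5x5 sampling of what is really inside ONE mixed raster cell.
# G = grass, D = dirt, W = water.
cell_samples = [
    "GGGGG",
    "GGGGG",
    "GGWGG",
    "DDWWG",
    "DDDWW",
]


def assign_by_majority(samples):
    """Return the cover type that occupies the most samples in the cell."""
    counts = Counter(ch for row in samples for ch in row)
    # Ties are not handled specially here; this example has a clear winner.
    return counts.most_common(1)[0][0]


def assign_by_center(samples):
    """Return the cover type found at the middle sample of the cell."""
    mid_row = len(samples) // 2
    mid_col = len(samples[mid_row]) // 2
    return samples[mid_row][mid_col]


print("majority rule:", assign_by_majority(cell_samples))
print("center rule:  ", assign_by_center(cell_samples))
```

In this made-up cell, grass covers 15 of the 25 samples, so the majority rule records grass, while the center sample happens to fall on the river, so the center rule records water. Either way, the cell ends up with just one value.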
In this case, I'm going to approximate assigning values by majority and write in G for grass, D for dirt, and W for water, based on what I think the value would be. When we're done, what we would have is cells with single colors representing their uniform values. Our raster would most likely be stored with integer values like one, two, three coding for these different land cover types, but here I've represented them with letters instead. So again, let's think about the inaccuracies we introduced in translating our real-world scene into a GIS dataset. If the values of these raster cells are uniform, then we lose a ton of precision about the world when we assign these mixed pixels to a single value. This is inherent in collecting data. The best way to minimize it is by choosing the data format that makes the most sense for the place you are working with and the analysis you intend to do. We'll go through these concerns a little more in another course in the specialization, but it's something to keep in mind for now. That's it for this lesson. I hope you have a better understanding of the different GIS data types and when you would choose to use them. See you in the next lecture.
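As a rough illustration of the integer coding and the precision loss described in this lesson, here is a minimal Python sketch. The codes 1, 2, 3, the 2x3 grid, and the cover fractions in each cell are all assumptions made up for the example, not values from the scene in the video.

```python
# Integer codes for the land cover types (an arbitrary, assumed legend).
GRASS, DIRT, WATER = 1, 2, 3
LEGEND = {GRASS: "grass", DIRT: "dirt", WATER: "water"}

# Hypothetical "true" fraction of each cover type inside each cell of a 2x3 raster.
true_fractions = [
    [{GRASS: 1.0}, {GRASS: 0.6, WATER: 0.4}, {GRASS: 0.7, DIRT: 0.3}],
    [{GRASS: 0.55, DIRT: 0.45}, {WATER: 0.8, GRASS: 0.2}, {GRASS: 1.0}],
]


def majority_code(fractions):
    """Pick the code with the largest share of the cell."""
    return max(fractions, key=fractions.get)


# The stored raster: one integer per cell, assigned by majority.
raster = [[majority_code(cell) for cell in row] for row in true_fractions]


def raster_area(grid, code):
    """Area in cell units according to the raster (count of cells with this code)."""
    return sum(row.count(code) for row in grid)


def true_area(cells, code):
    """Area in cell units according to the underlying fractions."""
    return sum(cell.get(code, 0.0) for row in cells for cell in row)


for row in raster:
    print(row)

for code, name in LEGEND.items():
    print(f"{name}: raster says {raster_area(raster, code)} cells, "
          f"underlying fractions say {true_area(true_fractions, code):.2f}")
```

With these invented numbers, dirt never wins a majority in any cell, so it disappears from the raster entirely even though it covers about three quarters of a cell in total. That is the kind of loss a mixed pixel introduces, and a smaller cell size or a vector representation would reduce it.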