Hello again and welcome back. In this lecture, we'll continue where we left off in the last lecture by discussing types of uncertainty that affect our analysis, and how we can consider them when we design our spatial studies. We'll cover uncertainty from our measuring devices, uncertainty from how we represent and store our data, and then uncertainty from how we analyze our data.

The first type of uncertainty, uncertainty of measurement, is probably the easiest to understand intuitively. This is the error that is introduced because of the limits of our sensing devices, or because of the conditions we were collecting our data in. For example, some GPS receivers are only accurate to within about eight meters or so. Within that radius, we can't say for certain that the coordinates we have are the exact coordinates; the true location could be anywhere inside it. But we know we're close. For a collection of measurements that are farther apart than the margin of error, this is usually fine. But when we try to analyze distances between points that are all within the margin of error of the sensing device, we need to understand the limitations that this imposes on our analysis; there's a small sketch of that check a little further down.

We can also put error bounds on our measurements and say things like Mount Everest is 8,850 meters tall, plus or minus about five meters. We don't actually know the exact height; it could be 8,855 meters, or 8,845, or anywhere in between. Again, it may not matter, because our analysis might not rely on that level of precision, but there are many situations where you might need to take it into account. Additionally, it's important to remember that the earth is a dynamic, changing system, and this can affect measurements we've previously taken and how we combine them with measurements we will take in the future.

Back in the office, if we're creating data through the process of digitizing, as we did in this course, we need to understand how we can introduce error into our data in the process. We discussed this a bit when we learned about digitizing, but know that sometimes you may not align correctly with the original data that you're digitizing. You'll be off by a little bit to one side, or you may end a feature too soon, or a little too late, and so you insert tiny bits of error into your data. This is why it's important to know what scale data was digitized at, so you know the limits of the analysis; at a more refined scale, those errors start to creep in. This uncertainty also extends to different lineages of data. For a while in California there were multiple competing data sets for river information. If I wanted to analyze data built on one of those against data built on the other, I would need to take some sort of corrective measure that accounts for the fact that these data were generated from different sources.

The next major source of uncertainty is uncertainty in how we represent our data. The classic example of this is a mixed raster pixel, where the underlying features don't align perfectly with the boundaries of the raster cell. This is similar to the regionalization problems we discussed in the previous lecture, but slightly different. Our choice of data type and parameters necessarily generalizes our data: where the real world is highly detailed, a raster cell contains just one value.
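To make the margin-of-error idea from the GPS example concrete, here is a minimal sketch in plain Python. The coordinates and the eight-meter receiver accuracy are made-up illustration values, not taken from any particular device; the point is simply the comparison between a measured distance and the combined measurement uncertainty.

```python
import math

# Hypothetical GPS fixes (easting, northing in meters) and a nominal
# +/- 8 m horizontal accuracy -- all values are made up for illustration.
GPS_ERROR_M = 8.0
point_a = (612_340.0, 4_268_115.0)
point_b = (612_346.0, 4_268_118.0)

# Straight-line distance between the two recorded positions.
distance = math.dist(point_a, point_b)

# In the worst case, each fix could be off by GPS_ERROR_M in opposite
# directions, so the combined uncertainty is twice the receiver error.
combined_uncertainty = 2 * GPS_ERROR_M

if distance <= combined_uncertainty:
    print(f"Points are {distance:.1f} m apart, but the combined uncertainty "
          f"is {combined_uncertainty:.0f} m -- they may not be distinguishable.")
else:
    print(f"Points are {distance:.1f} m apart, well outside the measurement error.")
```

In this toy case the two fixes are about seven meters apart, which is inside the combined uncertainty, so any analysis treating them as distinct locations is on shaky ground.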
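The next part of the lecture talks about how that single raster value gets chosen, so here is a rough sketch of the same idea in code. The land-cover classes and the 4 x 4 block of fine-resolution values are invented for illustration, and the two rules shown, majority and center, are common textbook options rather than any particular GIS package's resampling tool.

```python
from collections import Counter

# A hypothetical 4 x 4 block of fine-resolution land-cover values that all
# fall inside a single coarse raster cell (the classes are made up).
fine_block = [
    ["forest", "forest",  "forest", "water"],
    ["forest", "forest",  "water",  "water"],
    ["forest", "wetland", "water",  "water"],
    ["forest", "forest",  "forest", "forest"],
]
values = [v for row in fine_block for v in row]

# Rule 1: the class covering the most area in the cell wins.
majority_value = Counter(values).most_common(1)[0][0]

# Rule 2: the class at (or nearest to) the cell center wins.
center_value = fine_block[len(fine_block) // 2][len(fine_block[0]) // 2]

print("majority rule:", majority_value)                 # forest
print("center rule:  ", center_value)                   # water
print("classes lost under the majority rule:",
      sorted(set(values) - {majority_value}))           # water, wetland
```

In this toy cell the two rules don't even agree on the cell's value, and the wetland class disappears entirely under the majority rule, which is exactly the kind of representational choice the lecture is pointing at.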
Our method for choosing the value a raster cell contains, based on the real world, can significantly impact our analysis, because entire classes of information can disappear based on that choice. Do we classify the cell based on which value is most dominant within it? Do we choose whichever value falls at the center of the cell? Or do we use some other criterion?

A similar problem occurs if we aggregate information from point data into polygon data. Some polygons may contain only a handful of points, while others have detailed information from many points. The polygons with few data samples may be biased relative to the others. If we're aggregating to polygons that weren't drawn based on some theme shared with the points, then the polygon boundaries themselves may bias our aggregation. That is, the shapes of the polygon features were chosen based on criteria other than what we're aggregating on. Similarly, some polygons may only have points in one corner, meaning the polygon's value is biased by missing information from the rest of the locations within it.

Imagine we have points representing crime incidents in a city. We could aggregate statistics from these incidents to police districts in the city, but we might see different patterns doing that than if we were to aggregate to electoral districts or utility districts, and so on. Each of these could be informative, but when the boundaries don't mean anything to the original data, and sometimes even when they do, you might get artifacts in your results. You will have to assess whether the trends you see are real or the result of a poorly chosen, poorly aligned aggregation.

A more general case of this is the Modifiable Areal Unit Problem, or MAUP, which says that when we're creating analysis zones or polygons, the number, size, and shape of those zones can dramatically affect the analysis. If you were to double the size of your analysis zones for a public health analysis, for example, you might get very different results than with your original, smaller zones. Choosing between them isn't simple, and there may not be an objective set of criteria for deciding what size and shape your zones should be. There's a small sketch of this effect at the end of the lecture.

One final consideration is something we call the ecological fallacy. The ecological fallacy is a logical fallacy that deals with whether a characteristic of a zone or polygon is actually a characteristic of the locations or individuals within that zone. While we often treat the data as if that's the case, we know from experience that it's often not. If our data gives a median income for a polygon, that doesn't mean everyone within that zone makes that amount of money; there could be people making substantially less or substantially more. Similarly, if we make inferences based on this data, we need to understand that they may not, and likely will not, apply precisely to the individuals in the group. When we're constructing our data, it's important that we understand variability like this that we discard in the pursuit of data that meets our needs.

Okay, that's it for this lecture. We discussed sources of error from our measurement tools as well as from how we represent our data. We also discussed the modifiable areal unit problem and how it can distort our results, and the ecological fallacy, where we incorrectly apply valid results to an invalid situation by applying inferences from a group or area to an individual.
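Here is the promised sketch of the modifiable areal unit problem. The incident coordinates are invented, and square grid cells stand in for real districts; the same points are aggregated to one-unit and two-unit zones, and the zone that looks busiest changes.

```python
from collections import Counter

# Invented incident locations (x, y) in arbitrary map units.
incidents = [
    (0.2, 0.3), (0.5, 0.7), (0.8, 0.2), (0.4, 0.9),   # tight cluster near the origin
    (1.2, 0.5),
    (2.3, 2.4), (2.7, 2.8), (3.1, 2.2), (3.6, 2.9),   # looser cluster farther out
    (2.2, 3.5), (2.9, 3.1), (3.4, 3.7), (3.8, 3.3),
]

def aggregate(points, cell_size):
    """Count points per square analysis zone of the given size."""
    counts = Counter()
    for x, y in points:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

# The same incidents aggregated at two different zone sizes.
for size in (1.0, 2.0):
    zone, count = aggregate(incidents, size).most_common(1)[0]
    print(f"zone size {size}: busiest zone is {zone} "
          f"with {count} of {len(incidents)} incidents")
```

With small zones, the tight cluster near the origin dominates; with larger zones, the spread-out cluster does. Neither answer is wrong, but they tell different stories, which is the heart of the MAUP.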
Up next, we're going to talk all about topology. See you next time.