Hi, everyone. In this lecture,
we'll discuss answers to the same question,
"Why is spatial special?",
from the perspectives of spatial data characteristics and analytics.
Again, I'll give you the answer in four different categories.
The first law of geography,
which concerns spatial autocorrelation.
Coordinate systems, which raise issues in transforming between 3D and 2D space.
Uncertainty, which reflects the fact that spatial data is basically probabilistic.
And the Modifiable Areal Unit Problem,
where different aggregations can produce
different outcomes in spatial analysis.
In 1970, Dr. Tobler stated the first law of geography:
"Everything is related to everything else,
but near things are more related than distant things."
This is the fundamental concept behind
spatial autocorrelation, and it applies to all spatial variables.
The figure illustrates the concept of spatial autocorrelation.
Which one do you think looks more natural? Left or right?
Yes, the image on the right side looks more natural.
In fact, it's a digital elevation model
depicting a small portion of the Earth's surface.
Its spatial autocorrelation value is 0.8.
On the other hand, the left image is
2D random noise, which has zero spatial autocorrelation,
meaning it has no spatial dependence.
Spatial autocorrelation can be defined as a measure of
the degree to which an object is similar to other nearby objects.
Positive spatial autocorrelation, where nearby values tend to be similar,
is therefore a reasonable assumption for spatial phenomena.
In contrast, most statistical analyses
generally assume that observations are independent,
so the strong presence of spatial autocorrelation
makes spatial analysis complicated.
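The lecture does not say which statistic produced the value of 0.8 quoted above. One widely used measure of spatial autocorrelation is global Moran's I, and the minimal Python sketch below computes it on a raster grid. It assumes binary rook (four-neighbour) weights, which is only one of several possible weighting schemes. Values near +1 mean neighbours are similar, values near zero are what spatial randomness produces, and negative values mean neighbours tend to differ.

```python
import random


def morans_i(grid):
    """Global Moran's I for a 2D grid of numbers, using binary rook
    (4-neighbour) contiguity weights: w_ij = 1 for edge-sharing cells, else 0."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(v for row in grid for v in row) / n
    dev = [[v - mean for v in row] for row in grid]  # deviations from the mean

    cross = 0.0  # sum over neighbour pairs of w_ij * (x_i - mean) * (x_j - mean)
    w_total = 0  # W: sum of all weights (each ordered pair counted once)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    w_total += 1

    sq = sum(d * d for row in dev for d in row)  # sum of squared deviations
    return (n / w_total) * (cross / sq)


# Smooth surface (a linear slope): neighbours are similar, so I is strongly positive
slope = [[r + c for c in range(10)] for r in range(10)]
# Independent random noise: no spatial dependence, so I is close to zero
rng = random.Random(42)
noise = [[rng.random() for _ in range(10)] for _ in range(10)]

print(round(morans_i(slope), 3))  # 0.889
print(round(morans_i(noise), 3))  # near 0; exact value depends on the seed
```

A smooth surface like a real elevation model behaves like `slope`, while the random image behaves like `noise`.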
Let's take a look at the figure with two examples.
What would be your estimates for the empty cells A and B?
How about the average of the eight surrounding cells?
If we apply that,
A is estimated as 25 and B as 65.
That appears reasonable given the spatial context.
However, the two examples contain
identical data values, as listed under the figure.
Without considering spatial dependence,
the estimates of A and B would be the same.
The interdependence of spatial data
should always be considered in any statistical analysis of spatial data.
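The gap-filling idea can be sketched in Python. The two 4 x 4 grids below are hypothetical and are not the values from the lecture's figure: they hold exactly the same fifteen observed values, arranged differently, with the missing cell marked `None`. The spatial estimate averages the eight surrounding cells; the non-spatial estimate is the mean of all observed values, which ignores location and is therefore identical for both grids.

```python
def neighbour_mean(grid, r, c):
    """Spatial estimate: mean of the (up to) eight cells surrounding (r, c)."""
    vals = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) == (0, 0):
                continue  # skip the target cell itself
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < len(grid) and 0 <= cc < len(grid[0])
            if inside and grid[rr][cc] is not None:
                vals.append(grid[rr][cc])
    return sum(vals) / len(vals)


def global_mean(grid):
    """Non-spatial estimate: mean of every observed cell, ignoring location."""
    vals = [v for row in grid for v in row if v is not None]
    return sum(vals) / len(vals)


# Hypothetical grids; None marks the cell to estimate (A and B respectively).
GRID_A = [[20, 20, 30, 60],
          [20, None, 30, 70],
          [30, 30, 30, 80],
          [50, 60, 70, 80]]
GRID_B = [[80, 80, 70, 20],
          [70, None, 60, 20],
          [60, 50, 30, 20],
          [30, 30, 30, 30]]

print(neighbour_mean(GRID_A, 1, 1), neighbour_mean(GRID_B, 1, 1))  # 26.25 62.5
print(global_mean(GRID_A) == global_mean(GRID_B))                   # True
```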
The next issue is coordinate systems.
There are mainly two ways of representing locations with coordinates:
geographic or 2D Cartesian.
Geographic coordinates are three-dimensional: longitude lambda (λ), latitude phi (φ),
and ellipsoidal height, written as a small h. On the other hand,
2D Cartesian coordinates are simply x and y.
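To see why geographic coordinates are three-dimensional, the sketch below converts longitude λ, latitude φ, and ellipsoidal height h into Earth-centred, Earth-fixed (ECEF) X, Y, Z in metres, using the standard closed-form conversion. It assumes the WGS 84 ellipsoid; the lecture does not name a datum, and other ellipsoids use different constants.

```python
import math

# WGS 84 ellipsoid constants (an assumption; other datums use other values)
A = 6378137.0              # semi-major axis, metres
F = 1 / 298.257223563      # flattening
E2 = F * (2 - F)           # first eccentricity squared


def geodetic_to_ecef(lon_deg, lat_deg, h):
    """Convert (longitude λ, latitude φ, ellipsoidal height h) to ECEF X, Y, Z."""
    lam, phi = math.radians(lon_deg), math.radians(lat_deg)
    # Prime-vertical radius of curvature at latitude φ
    n = A / math.sqrt(1 - E2 * math.sin(phi) ** 2)
    x = (n + h) * math.cos(phi) * math.cos(lam)
    y = (n + h) * math.cos(phi) * math.sin(lam)
    z = (n * (1 - E2) + h) * math.sin(phi)
    return x, y, z


print(geodetic_to_ecef(0.0, 0.0, 0.0))  # (6378137.0, 0.0, 0.0): equator, prime meridian
```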
When you apply any analytics to your spatial data,
you have to check the coordinate system as the very first step.
Think about the given problem.
If your spatial data has geographic coordinates,
even a simple distance computation is not an easy task at all.
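The difficulty is easy to demonstrate. Treating longitude and latitude degrees as if they were planar x and y gives the same "distance" for a one-degree east-west step everywhere, whereas the real ground distance shrinks toward the poles. The sketch below uses the haversine formula, which assumes a spherical Earth with a mean radius of 6371.0088 km; precise work would instead use an ellipsoidal geodesic method (for example, Vincenty's or Karney's algorithm) on the appropriate datum.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical-Earth assumption


def naive_degree_distance(lon1, lat1, lon2, lat2):
    """Pythagoras on raw degrees: NOT a real distance, shown only for contrast."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points on a sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60° N
print(naive_degree_distance(0, 0, 1, 0), naive_degree_distance(0, 60, 1, 60))   # 1.0 1.0
print(round(haversine_km(0, 0, 1, 0), 1), round(haversine_km(0, 60, 1, 60), 1))  # 111.2 55.6
```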
Moreover, there are thousands and
thousands of different map projections and coordinate systems.
So a clear understanding of coordinate systems is
required for advanced spatial data management and analytics.
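As one concrete example of a map projection, the sketch below implements the forward and inverse spherical Mercator formulas used by Web Mercator (EPSG:3857), which maps longitude and latitude in degrees to x and y in metres on a sphere of radius 6,378,137 m. It is only one of the many projections mentioned above; in practice a library such as PROJ (or its Python binding, pyproj) is normally used rather than hand-written formulas.

```python
import math

R = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857), metres


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Forward spherical Mercator: degrees -> projected metres (valid for |lat| < 90)."""
    x = R * math.radians(lon_deg)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Inverse spherical Mercator: projected metres -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


x, y = lonlat_to_web_mercator(10.0, 45.0)
print(web_mercator_to_lonlat(x, y))  # recovers approximately (10.0, 45.0)
```

Mercator stretches distances by a factor of 1/cos(φ), so lengths and areas measured directly in these projected metres grow increasingly wrong away from the equator.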
Another important aspect of spatial data is uncertainty.
Spatial data are measurements and are intrinsically prone to error.
As a result, any analysis result from spatial data is stochastic,
in other words, probabilistic.
Let's see the example on the slide.
When you see $1,000 in your bank account,
it is literally $1,000, and in this case, deterministic.
On the other hand,
when you have a human trajectory collected by a GPS unit,
each location (x, y) inevitably has uncertainty.
As a result, when you conduct any further analysis with the trajectory,
the result is probabilistic.
So we have to understand the nature of spatial data
and get used to living with uncertainty in spatial analysis.
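A small Monte Carlo simulation shows how positional error turns a single answer into a distribution. The sketch below assumes projected coordinates in metres and independent Gaussian GPS error with a hypothetical standard deviation of 3 m per axis; the straight 100 m track is also hypothetical. Each simulation perturbs every fix and recomputes the trajectory length, so the output is a range of plausible lengths rather than one number.

```python
import math
import random
import statistics


def path_length(points):
    """Total length of a polyline given as a list of (x, y) tuples in metres."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def simulate_lengths(points, sigma_m, n_draws=2000, seed=0):
    """Monte Carlo: add independent Gaussian noise (std sigma_m) to each
    coordinate of each fix and recompute the path length for every draw."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_draws):
        noisy = [(x + rng.gauss(0, sigma_m), y + rng.gauss(0, sigma_m)) for x, y in points]
        lengths.append(path_length(noisy))
    return lengths


# Hypothetical straight track: 11 fixes, 10 m apart, true length 100 m
track = [(10.0 * i, 0.0) for i in range(11)]
lengths = simulate_lengths(track, sigma_m=3.0)
cuts = statistics.quantiles(lengths, n=20)  # 5th, 10th, ..., 95th percentiles
print(f"mean {statistics.mean(lengths):.1f} m, 90% interval {cuts[0]:.1f}-{cuts[-1]:.1f} m")
```

Besides the spread, the mean simulated length comes out above the true 100 m, because random jitter makes the measured path zig-zag; positional error can bias a result as well as blur it.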
Let's take a look at another example.
A hydrological analysis produced the red flooding region for
a precipitation rate of 1 inch/hr for 3 hours.
There are two ways of presenting the analysis result:
deterministic, shown above, or probabilistic, shown below.
The probabilistic statement below is the way of living with uncertainty,
and it is more realistic.
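One simple way to produce a probabilistic statement is to propagate input uncertainty into the result. The sketch below is a deliberately simplified, hypothetical model, not the hydrological analysis from the slide. It assumes the simulated water-surface level is known exactly, that each cell's DEM elevation has independent Gaussian error with a standard deviation of 0.5 m, and that a cell floods when its true ground elevation lies below the water level. The deterministic rule returns yes or no; the probabilistic one returns P(flooded) = Φ((level - elevation) / σ), where Φ is the standard normal CDF.

```python
import math


def flooded_deterministic(elev_m, water_level_m):
    """Deterministic answer: flooded if the measured elevation is below the water."""
    return elev_m < water_level_m


def flood_probability(elev_m, water_level_m, dem_sigma_m):
    """Probability that the true ground lies below the water level, assuming
    true ground = measured elevation + N(0, dem_sigma_m**2) error."""
    z = (water_level_m - elev_m) / dem_sigma_m
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


# Hypothetical cells near a hypothetical 10.0 m water level
for elev in (9.0, 9.8, 10.0, 10.3, 11.0):
    p = flood_probability(elev, 10.0, 0.5)
    print(elev, flooded_deterministic(elev, 10.0), round(p, 2))
```

Cells just above or below the water level switch between "flooded" and "dry" under the deterministic rule, while the probabilistic version reports how confident that call actually is.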
Now, let's move to the next issue.
Let's take a look at the figure. With
the same given dataset, depending on the way of aggregation,
the results become quite different.
We call this the "Modifiable Areal Unit Problem",
which often occurs in cartographic representation.
It can be described as
the same basic data yielding different results when aggregated in different ways.
It can arise in two ways:
the scale effect and the zoning effect, as described on the slide.
One study found that correlations between
Republican voting and the percentage of
older people could vary from -0.97 to +0.99,
depending on how counties in Iowa were aggregated.
So completely opposite results were produced because of this problem.
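The zoning effect can be reproduced in miniature. In the sketch below, the eight basic units, their two variables, and both zonings are hypothetical and are not the Iowa data. The same 2 x 4 grid of units is aggregated once into vertical pairs and once into horizontal pairs, and the Pearson correlation between zone averages is computed for each zoning.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 2 x 4 grid of basic units: variable X[r][c] and variable Y[r][c]
X = [[1, 2, 3, 4], [5, 6, 7, 8]]
Y = [[1, 1, 1, 9], [1, 1, 1, 1]]

# Zoning 1: each zone is one column (two vertically adjacent units)
ZONES_COLUMNS = [[(0, c), (1, c)] for c in range(4)]
# Zoning 2: each zone is two horizontally adjacent units in the same row
ZONES_ROWS = [[(r, 0), (r, 1)] for r in range(2)] + [[(r, 2), (r, 3)] for r in range(2)]


def aggregate(values, zones):
    """Average a unit-level variable within each zone."""
    return [sum(values[r][c] for r, c in zone) / len(zone) for zone in zones]


def zonal_correlation(zones):
    """Correlation between X and Y after aggregating both to the given zones."""
    return pearson(aggregate(X, zones), aggregate(Y, zones))


print(round(zonal_correlation(ZONES_COLUMNS), 2))  # 0.77
print(round(zonal_correlation(ZONES_ROWS), 2))     # -0.26
```

Both zonings use contiguous units and the identical underlying data, yet the sign of the correlation flips, which is the same kind of reversal described in the Iowa example.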
Scale and aggregation-unit problems should definitely
be carefully considered in spatial data management and analysis.