So we keep these really simple. Then we get our data, and in here
we'll sometimes have very raw data,
because we're focusing this database on handling the complex
problems that come up while you're gathering the data.
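Here is a minimal sketch of what "handling the problems while gathering the data" can look like in Python. It assumes a SQLite table named `raw` with `key` and `data` columns and a `fetch` function you supply; those names are made up for illustration and are not the course's actual code. The idea is that every successfully retrieved item is committed right away and items already in the database are skipped, so if the network fails or you stop the program, you can simply run it again and it picks up where it left off.

```python
import sqlite3
from typing import Callable, Iterable


def gather(conn: sqlite3.Connection, keys: Iterable[str],
           fetch: Callable[[str], str]) -> int:
    """Store fetch(key) for every key not already stored; return rows written.

    Hypothetical helper: finished keys are skipped, so the step can be
    interrupted and rerun without redoing work.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS raw (key TEXT PRIMARY KEY, data TEXT)")
    written = 0
    for key in keys:
        # Skip anything retrieved on an earlier run.
        if conn.execute("SELECT 1 FROM raw WHERE key = ?", (key,)).fetchone():
            continue
        try:
            data = fetch(key)
        except OSError as err:
            # Network-style failure (urllib's URLError is an OSError):
            # report it and move on; nothing is stored, so a rerun retries it.
            print("Failed:", key, err)
            continue
        conn.execute("INSERT INTO raw (key, data) VALUES (?, ?)", (key, data))
        conn.commit()  # commit each row so a crash loses at most the current item
        written += 1
    return written


conn = sqlite3.connect(":memory:")
print(gather(conn, ["a", "b"], str.upper))       # 2
print(gather(conn, ["a", "b", "c"], str.upper))  # 1 (only "c" is new)
```

Committing after every row is slower than one big transaction, but it is what makes the gathering step safe to restart.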
So at some point you've got your raw data, and you may have a separate step:
a Python program that goes through and reads all the data in this database,
processes it, and might even write the results into another database.
And frankly, you could have more databases here, and so on.
But there is some process that reads the raw data, and
then it might write another database.
In our earlier examples, some of these will go straight from the raw data to analysis or
visualization.
But in the later ones, what we'll do is produce this pretty data, the clean data.
This is the data that makes sense, right?
And then we're going to write more programs that use it.
So each one of these is a Python program,
and now we're going to run maybe a couple of other Python programs.
One might read from the clean database, do some analysis, and print out some
results; another might read from the clean database and visualize the results.
And so these are separate steps, and
each of these boxes is a separate Python program.
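The pipeline of boxes can be sketched in Python as below. Each function stands in for one of the separate programs; in practice each would open its own database file, but in-memory SQLite databases keep the sketch self-contained. The table names (`raw`, `places`) and the assumption that each raw row holds a JSON document with `lat` and `lng` fields are hypothetical, chosen only for illustration.

```python
import json
import sqlite3


def clean_step(raw: sqlite3.Connection, clean: sqlite3.Connection) -> None:
    """Program 2: read messy rows from the raw DB, write tidy rows to the clean DB."""
    # Rebuild the clean table from scratch so this step can be rerun at any time.
    clean.execute("DROP TABLE IF EXISTS places")
    clean.execute("CREATE TABLE places (name TEXT, lat REAL, lng REAL)")
    for key, data in raw.execute("SELECT key, data FROM raw"):
        try:
            doc = json.loads(data)
            lat, lng = float(doc["lat"]), float(doc["lng"])
        except (ValueError, KeyError, TypeError):
            continue  # skip rows that don't parse; they remain in the raw DB
        clean.execute("INSERT INTO places VALUES (?, ?, ?)", (key.strip(), lat, lng))
    clean.commit()


def analyze_step(clean: sqlite3.Connection) -> list[tuple[str, float, float]]:
    """Program 3: read only the clean DB and produce something to print or plot."""
    return clean.execute("SELECT name, lat, lng FROM places ORDER BY lat DESC").fetchall()


# Stand-in for the output of the gathering program (made-up sample rows).
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE raw (key TEXT, data TEXT)")
raw.executemany("INSERT INTO raw VALUES (?, ?)", [
    (" Ann Arbor ", '{"lat": 42.28, "lng": -83.74}'),
    ("Nowhere", "not json"),
    ("Quito", '{"lat": -0.18, "lng": -78.47}'),
])
clean = sqlite3.connect(":memory:")
clean_step(raw, clean)
for row in analyze_step(clean):
    print(row)
# ('Ann Arbor', 42.28, -83.74)
# ('Quito', -0.18, -78.47)
```

Because the raw data is never modified, you can fix a bug in the cleaning program and rerun it without going back to the network.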
Now, in a way, everything we've done up to this point has been:
write one Python program to produce some result, right?
We write a loop, we read the stuff, we build a list, and
then we print the list out.
But here, because the problem is harder to solve and there's unreliability and
other external factors, we break it into multiple steps
and write a little Python program for each of those steps.
Now, what we're working on is not exactly data mining. It is and it isn't.
I don't call this data mining,
because that would overstate what we're doing.
There are many very complex data mining technologies, and
we're not going to cover them in this course.
There are other places where you can learn about data mining, and
I'd like to think that this course is good preparation for
learning about data mining technology.
So there are open source tools like Hadoop and Spark.
Amazon has a whole cloud data warehouse service called Redshift.
And there are many community-source projects, and the list goes on and on.
So don't assume that this is all there is to data mining.
This is a particular style of data mining that I call "Personal Data Mining," right?
And that's not to say that once you're done with this you're a data mining expert,
because that would be a gross overstatement.
In this chapter, we're really more interested in making
you better Python programmers:
we'll solve some simple, rudimentary data mining problems with Python programs and
then study those programs to learn from them.
So the first thing we're going to do is build on something we
did in the last chapter: talking to Google's Geocoding API.
We'll pull some data into a database and then
visualize something from that database using the Google Maps API.
So you do need to be connected to the Internet when you do this.
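As a rough sketch of the talking-to-Google part, the code below builds a request URL for Google's Geocoding web service and pulls the first result's address and coordinates out of the JSON reply. It assumes you call Google's endpoint directly with your own API key (the `api_key` value is yours to supply); the function names are hypothetical. Only `geocode()` touches the network; the other two functions work offline, which makes them easy to test.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str, api_key: str) -> str:
    """Build a request URL with the address safely percent-encoded."""
    return GEOCODE_URL + "?" + urllib.parse.urlencode({"address": address, "key": api_key})


def parse_location(body: str):
    """Return (formatted_address, lat, lng) from a response body, or None."""
    doc = json.loads(body)
    # Anything other than status "OK" (e.g. "ZERO_RESULTS") means no usable result.
    if doc.get("status") != "OK" or not doc.get("results"):
        return None
    first = doc["results"][0]
    loc = first["geometry"]["location"]
    return first["formatted_address"], loc["lat"], loc["lng"]


def geocode(address: str, api_key: str):
    """Fetch one address over the network; needs an Internet connection and a key."""
    with urllib.request.urlopen(geocode_url(address, api_key), timeout=10) as resp:
        return parse_location(resp.read().decode("utf-8"))
```

In the multi-step design above, you would store the raw response body in the database and call something like `parse_location` later, in the cleaning step, rather than parsing while you gather.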