As we have seen earlier, data visualization is an essential tool for data exploration,
and it can also be an important part of data analysis and presentation.
Sometimes, your analysis ends with data visualization.
Analysis that solely relies on data visualization can be very powerful.
Unlike many statistical models, it is directly built on the raw data
and does not require statistical assumptions.
Therefore, such analysis is considered model-free
and can be extremely useful when you have a large amount of data,
as in big data analytics.
I would like to briefly discuss the principles of graphical design,
which is a very interesting but broad topic.
I mainly draw upon the work of two leading experts on this topic:
Edward Tufte and Stephen Few.
According to Edward Tufte,
"Graphical excellence is that which gives to the viewer the greatest number of ideas
in the shortest time with the least ink in the smallest space."
Stephen Few states that, "Our primary visual design objective will be to present content
to readers in a manner that highlights what's important, arranges it for clarity,
and leads them through it in the sequence that tells the story best."
Their work suggests several common principles for graphical design,
which apply even when we're only designing some basic graphs.
First, erase as much "non-data-ink" as possible,
and ensure that the remaining "non-data-ink" plays a supporting role.
In other words, the focus of data visualization should be the data,
not anything unrelated to the data.
Second, it is valuable to think about how to organize the important data ink.
Last but not least, when creating data visualizations,
we should revise, explore different options, and be willing to iterate.
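The first principle has a quantitative form in Tufte's "The Visual Display of Quantitative Information," where he defines the data-ink ratio as the share of a graphic's ink that actually displays data:

```latex
\[
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \frac{\text{ink that can be erased without loss of data-information}}{\text{total ink used to print the graphic}}
\]
```

For example, in a hypothetical chart where gridlines, borders, and background shading make up 40% of the ink and can all be erased without losing any data, the data-ink ratio is 1 - 0.40 = 0.60. Tufte's advice to "maximize the data-ink ratio, within reason" is another way of saying "erase non-data-ink."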
Let's take a look at some specific data visualization rules.
Few believes that we should maintain visual correspondence to quantity.
The graphs on the left and the right show the same data.
Both represent the total number of Amtrak passengers over a five-year period.
The graph on the left gives the impression of substantial growth in ridership,
whereas the graph on the right shows almost flat ridership.
Note that the y-axis on the first graph does not start at zero.
As a result, the visual impression is quite misleading.
It is interesting to point out here
that the graph on the left is the default graph created by Excel.
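To make the distortion concrete, here is a minimal Python sketch of what Tufte calls the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data. It assumes bars whose heights are measured from wherever the y-axis starts. The function name `lie_factor` and the ridership numbers are hypothetical, chosen only to mimic an automatically truncated axis; they are not actual Amtrak figures.

```python
def lie_factor(values, baseline):
    """Size of the effect shown by bar heights divided by the size of the effect in the data.

    Bar heights are measured from `baseline`, the value at which the y-axis starts.
    A baseline of 0 shows the data faithfully, giving a factor of 1.0.
    """
    first, last = values[0], values[-1]
    if baseline >= first:
        raise ValueError("baseline must lie below the first value")
    if last == first:
        raise ValueError("no change in the data to exaggerate")
    # Relative change in the underlying data.
    data_effect = (last - first) / first
    # Relative change in bar height when bars start at the baseline.
    shown_effect = ((last - baseline) - (first - baseline)) / (first - baseline)
    return shown_effect / data_effect


# Hypothetical annual ridership in millions (illustrative only, not real data).
ridership = [30.0, 30.5, 31.0, 31.2, 31.5]

print(f"axis from 29.5: {lie_factor(ridership, 29.5):.1f}")  # prints 60.0
print(f"axis from 0:    {lie_factor(ridership, 0):.1f}")     # prints 1.0
```

Starting the axis just below the smallest value turns a 5% change in the data into a 300% change in bar height, which is exactly the kind of misleading impression described above.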
We have mentioned that both pie charts
and bar graphs can be used to chart categorical data.
While pie charts are widely used,
Few warns us to avoid them.
His main point is that it is difficult to maintain visual correspondence
between a pie slice and the quantity it represents.
As a result, pie charts force us to interpret and compare the sizes of slices,
which can be difficult for many people.
By contrast, interpreting and comparing the values of bars on a bar graph is relatively easy.
The issue is even worse when there are many categories.
Indeed, some of the worst graphics you can find on the web are creative pie charts.
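As a rough illustration of why bars compare more easily than slices, this stdlib-only Python sketch presents the same hypothetical category values two ways: as a sorted horizontal bar chart in plain text, and as the angle each category would occupy in a pie chart. The function names `text_bar_chart` and `pie_angles` and the sales figures are made up for this example.

```python
def pie_angles(data):
    """Angle in degrees that each category would occupy in a pie chart."""
    total = sum(data.values())
    return {label: 360 * value / total for label, value in data.items()}


def text_bar_chart(data, width=40):
    """Render a horizontal bar chart as text lines, longest bar first.

    Bar length is proportional to the value, scaled so the largest value
    spans `width` characters; all bars share a common zero baseline.
    """
    label_width = max(len(label) for label in data)
    peak = max(data.values())
    lines = []
    # Sorting by value (stable for ties) puts the largest category on top.
    for label, value in sorted(data.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / peak)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


sales = {"North": 23, "South": 25, "East": 27, "West": 25}  # hypothetical values

for line in text_bar_chart(sales):
    print(line)
for label, angle in pie_angles(sales).items():
    print(f"{label}: {angle:.1f} degrees")
```

Against a shared baseline, a 2-unit difference shows up as a few characters of bar length that the eye can compare directly, while in the pie it becomes the difference between slices of 90 and 97.2 degrees spread around a circle.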
For time series data, we should not use points;
we should use bars to emphasize individual values and lines to emphasize the overall trend.
All three graphs here show the same data.
Note that they leave fairly different visual impressions.
Which one do you prefer?
Both Tufte and Few warn us to avoid 3D and other gimmicks when constructing graphs
because they do not help our understanding of the data.
Such gimmicks violate the principle of minimizing "non-data-ink"
because they add more ink without adding more data.
The design principles advocated by Edward Tufte and Stephen Few
are well-documented in their books.
In particular, I recommend "The Visual Display of Quantitative Information"
by Edward Tufte and "Show Me the Numbers" by Stephen Few.
While Excel can be an excellent tool for creating some basic graphs,
there are many specialized software tools for creating visualizations from data.
Data visualization software has grown rapidly in the last few years,
and new features are coming out all the time.
Here I'd like to point out three leading data visualization software packages.
At this moment, the leader in data visualization software is Tableau.
Qlik and Spotfire are also very popular.
I encourage you to explore their tools
by visiting their websites or watching some introductory videos on YouTube.
You will be amazed at what you learn.