Data visualization is an important step in data processing. It helps us observe data more vividly. Matplotlib is an important plotting library for Python, mostly used for two-dimensional plotting. Matplotlib has convenient plotting modules that can produce high-quality and diverse plots. The manifesto of Matplotlib is: simple and common tasks should be simple to perform, and options should be provided for more complex tasks. You'll find that this really is so after learning it. By integrating the relevant capabilities of Matplotlib, pandas provides some plotting capabilities based on Series and DataFrame. For those two types of data, it's often more convenient to plot with "pandas".

Data visualization is an essential part of data exploration, analysis, and mining. Matplotlib is the basis of plotting in Python, and "pandas" also allows convenient plotting based on Series and DataFrame. Let's look at Matplotlib first.

In Matplotlib, we mainly use the "pyplot" module for convenient and quick plotting. The "pyplot" module provides a set of plotting APIs similar to MATLAB's. Many complex structures containing a lot of plotting objects are hidden behind this set of APIs. In actual use, you only need to call some of its functions to plot. Let's first look at how to use the pyplot module to draw some basic plots.

Previously, when we introduced Matplotlib, we found many example plots and their corresponding source code on its official website. You may study them later as well. First, look at this line chart. The line chart can be regarded as a very basic graph. First observe the x-axis and y-axis coordinate values of the key points in this graph, and guess how it is plotted. Look at its code. We first need to import this module. As a common practice, we abbreviate it to "plt". The pyplot module works in a way similar to MATLAB. It uses command-style functions to make various changes to plots, such as creating a figure, creating a plotting area, or drawing a line. Here, as we see, in this
module, the most fundamental plotting function is plot(). The plot() function has two basic arguments, x and y, representing the data of the x-axis and the y-axis, respectively. Here, we only have one list. Does it represent the data of the x-axis or the y-axis? Observe the plot. As we can see on the plot, it must be the data of the y-axis. You might already know what the default x-axis data are: they run from 0 to len(obj)-1, don't they? Let's run this program. The line chart has been successfully plotted.

It's worth mentioning that in Spyder, the plot() function plots and displays the graph directly. Standard plotting needs an additional statement, plt.show(), and this statement is also needed in many other Python environments. We may save this plot or directly copy it. Besides, we may save this plot directly from the program with this statement. Put the save path in the parentheses; the default storage format is PNG. We may also save it in JPG format. This is the plot we saved just now with the savefig() function, which is the same as what we saw in the console window.

Look at another example. We use the "arange()" function in NumPy to generate a group of data, and then plot a line chart from those data and from some expressions built on them: this group, this group, and this group. Does this work? Yes, it does, as the plot() function supports plotting from one group of data as well as from several groups. The plotted result is like this: three curves in the same plot.

Apart from line charts, we may also draw other kinds of plots. Look at this plot. There are a lot of dots. What type of plot is this? It's known as a scatter plot. How can we draw a scatter plot? Quite easy.
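Before we do, here is a minimal recap sketch of the line-chart steps so far. The data values and the save path are made up for illustration, and the Agg backend line is only an assumption so that the sketch runs without a display; in an interactive session you would drop it and call plt.show() instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# A single list is taken as the y-axis data; x defaults to 0 .. len(y)-1.
y = [3, 4, 7, 6, 2, 8, 9]  # made-up values
(line,) = plt.plot(y)
default_x = list(line.get_xdata())

# Save the figure from the program; the format follows the file extension (PNG here).
save_path = os.path.join(tempfile.gettempdir(), "line_chart.png")  # hypothetical path
plt.savefig(save_path)
plt.close()

# Several (x, y) groups can be passed to a single plot() call.
t = np.arange(0.0, 4.0, 0.1)
curves = plt.plot(t, t, t, t + 2, t, t ** 2)  # three curves in the same plot
```

Here plot() returns a list with one line object per curve, so the second call gives back three of them.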
Let's see. Just use the scatter() function instead of plot(). We may even add an argument 'o' to the plot() function. What if we want to plot a bar chart like this? Just use the bar() function instead of plot(). Apart from the above-mentioned plot forms, there are many others, like histograms and pie charts. In actual use, we should choose the appropriate form of plot based on the characteristics of the data we want to describe. For example, line charts are suitable for expressing a dataset that changes continuously and regularly. For comparing several different objects on the same scale, bar charts are appropriate. And pie charts are suitable for expressing proportions. Have a try. Change "plot" into "scatter". A scatter plot is generated in this way. Then, change it into "bar". A bar chart is generated in this way.

Like Excel and MATLAB, Matplotlib has a set of default settings and allows attributes to be customized. You can control almost all default attributes in Matplotlib, like the image size, dots per inch, line width, color, style, subplot, coordinate axis, and grid attributes. Let's look at the two plots below, for example. The first one has a green dashed line and the other has red diamonds. Think about it. What were the default color and style of the earlier plots? Is the color blue? Is the line a solid line? Have a guess at how the plot was changed into this form. Here's a clue. As we just mentioned, adding an 'o' to the arguments of the plot() function draws a scatter plot. In fact, each (x, y) argument pair of the plot() function may be followed by an optional third argument, a format string that indicates the color and line type of the plot. Then, for a green dashed line, what is the simplest form you can think of? 'g--'. Can it be even simpler? Seems impossible. Bingo, it really is like this. Let's look at the next one, the red diamonds. Sure, we can't draw a diamond in the string itself. Its form is like this: "rD".
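The plot types and format strings just described might be sketched as follows. The x and y values are made up, and the Agg backend line is an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 4, 7, 6, 2, 8, 9]  # made-up values

plt.figure()
points = plt.scatter(x, y)       # scatter plot

plt.figure()
(dots,) = plt.plot(x, y, "o")    # similar look: plot() with a marker-only format string

plt.figure()
bars = plt.bar(x, y)             # bar chart, one bar per (x, y) pair

plt.figure()
(green_dashed,) = plt.plot(x, y, "g--")  # color 'g' plus line style '--'
(red_diamonds,) = plt.plot(x, y, "rD")   # color 'r' plus marker 'D', no connecting line
```

When the format string names a marker but no line style, as in "rD", only the markers are drawn.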
Quite vivid, right? All these symbols are borrowed from MATLAB. Here, a lot of markers and symbols are listed, such as 'o', 'D', and '*', along with some color symbols and some line styles. Let's run the program to see the actual effect. We still use the line chart as the baseline and, say, draw a red dotted line. Is it like this? Draw one more line, say with green asterisks. You can try other forms by yourselves.

Where can we find those attributes? Apart from looking them up on the official website, the most direct way is undoubtedly the help() function. For example, let's briefly view the help information of the plot() function. There's a lot of content, isn't there? All the attributes we listed can be found in the help information.

Just now, we used strings directly to express the attributes of plots. Now let's look at how to express them with keyword arguments. I suppose you can guess their meanings. For the function figure(), say, there's "figsize=(8,6)". It must be the size of the figure. "dpi" means the resolution of the figure. Next, as we see, in the plot() function there's an attribute: color. Previously, we used a single character like r or g to represent the color. There are also the line style and the line width. The actual effect is like this. How about the final function?
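Before we turn to that final function, the keyword-argument style just described might look like the sketch below. Only "figsize=(8,6)" comes from the lecture; the dpi value, the data, and the particular color, line style, and line width are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# help(plt.plot) prints the format characters and line properties that plot() accepts.

x = np.arange(0, 5, 0.5)  # made-up data

# figsize is given in inches, dpi in dots per inch (80 is an assumed value).
fig = plt.figure(figsize=(8, 6), dpi=80)

# The same attributes as in a format string, spelled out as keyword arguments.
(line,) = plt.plot(x, x ** 2, color="red", linestyle="--", linewidth=2.5)

# Size in pixels is size in inches times dpi.
width_px, height_px = fig.get_size_inches() * fig.dpi
```

Spelling the attributes out takes more typing than "r--", but it also reaches settings such as the line width that a format string cannot express.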
Can you guess its meaning? legend(). It has an argument, loc, set to "upper left". Yes, the upper left. You might have guessed it: "loc" is the abbreviation of "location", meaning "put the legend in the upper left". "loc" may take different values to represent locations of the legend. The text of the legend is specified by the "label" argument. "loc" also has one very powerful value, "best", which looks for the best location for the legend. In this figure, for example, the best location is the upper left instead of the lower left, where it might overlap the curves. Amazing, isn't it?

Moreover, many other attributes can also be set, such as text, which is very important. What text can be added to the plot? Common ones include titles, x-axis labels, and y-axis labels, which are generated with title(), xlabel(), and ylabel(), respectively. The meaning of a plot is clearer with this text.

As we saw above, in some plots, several curves are put into the same plot. We may wonder whether we can draw in different areas. Yes, we can. In Matplotlib, it's possible to plot in several areas of one figure, i.e., subplots. The subplot() and subplots() functions allow plotting of subplots in different areas. The axes() function also allows subplotting, yet in a different way: the subplot areas it determines may overlap. First, let's look at the way in which areas are totally separated. Look at the layouts of the three plots. The first one contains two rows in one column; the second one, one row in two columns; the third one, two rows in two columns. Let's guess the meanings of the arguments of the subplot() function. You may have guessed it. The first and second arguments represent the number of rows and columns, respectively. The third argument should also be easy to guess. It's the serial number of the area. For example, this one means subplots of one row in two columns. The first one is numbered 1 and the second one is numbered 2, representing the two areas, respectively. Well, if no subplot is set, what is its default form?
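Pausing on that question for a moment, the legend, text, and subplot(rows, columns, number) calls just described might be sketched like this. The curves, label texts, and title are made up, and the Agg backend line is an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 50)  # made-up data

plt.figure()
plt.plot(x, x, label="linear")          # "label" supplies the legend text
plt.plot(x, x ** 2, label="quadratic")
plt.title("Two curves")
plt.xlabel("x")
plt.ylabel("y")
legend = plt.legend(loc="upper left")   # loc="best" would let Matplotlib pick the spot
ax = plt.gca()
legend_texts = [text.get_text() for text in legend.get_texts()]

# subplot(rows, columns, number): areas are numbered starting from 1.
plt.figure()
left = plt.subplot(1, 2, 1)
left.plot(x, x)
right = plt.subplot(1, 2, 2)
right.plot(x, x ** 2)
```

In the second figure, area 1 is the left half and area 2 is the right half.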
Should it be subplot(1,1,1), also written as subplot(111)? Sure, and this statement is often omitted.

Look at an example. In different areas of a figure, plot and compare the curves of the sin and cos functions in the interval [-π, π], like this. We may use two such subplot() calls to define subplot areas of two rows in one column, and then just put data into them. Here, with the linspace() function in the numpy module, we generate a uniformly spaced dataset within the interval. We may observe the y values in the two plots when the x values are the same.

The subplots() function in Matplotlib can achieve a similar effect. Let's look at how the program is written in detail. First, use the subplots() function to specify that the layout has 2 rows in 1 column. The first returned value of the function is the figure object itself. The second returned value is the subplots. Here, there are two subplots. We may name them, say, "ax0" and "ax1", and subsequent plotting may be done directly with the plot() method of the subplot object, which also allows more flexible setting of the subplot title. In addition, if subplot titles are set, we'd better set the distance between the two plots so that no plot or text overlaps. The argument "hspace" indicates the distance in the vertical direction. The horizontal distance is specified by the argument "wspace". The functions plt.subplot() and plt.subplots() are both frequently used. You need to understand and master them; the latter is more flexible.

Next, let's look at the axes() function. This function takes four values: left, bottom, width, and height. "left" indicates the distance from the left boundary, "bottom" indicates the distance from the bottom boundary, and the next two are the width and height of the plot. The ranges of these values are all (0,1). Think about why. You may have guessed it. This distance is not a physical distance but a fraction relative to the whole figure; 0.8, for example, means 80% of the total width or height. For example, let's look at
the second curve. Its subarea range is like this: the distances, width, and height are all between 0 and 1, and the result is like this. Be more careful when determining the subplot area with the axes() function than with the subplot() and subplots() functions. It's necessary to estimate the argument values to determine the display ranges of the different subplots.

Well, just now, we introduced how to set some attributes for plots drawn in Matplotlib. As we see, using these attributes can make a plot clearer and richer, enabling viewers to better understand your intention. So, when drawing plots, we should add all the necessary attributes.

Then, let's look at plotting with "pandas". By integrating the relevant capabilities of Matplotlib, pandas provides some plotting capabilities based on Series and DataFrame. For those two types of data, it is sometimes more convenient to plot with pandas than with Matplotlib. For example, let's plot with the previously acquired data in quotesdf on American Express Company. First, we select data. Here, say, we select the closing prices "close" from the first 10 records and then directly call the plot() method of Series to plot. It's necessary to explain the way of selection here. We used the loc indexer of the DataFrame object, which is different from iloc. "iloc" selects data areas based on the position of data, while "loc" selects data areas based on data labels. For example, [:9, 'close'] here means that the row labels range from 0 to 9, with 9 included, and the column label is "close". To select a continuous range of row labels or column labels, separate them with a colon. To select multiple discontinuous row labels or column labels, separate the label names with commas and put them into square brackets; otherwise, there would be ambiguity, as the separator between row and column is also a comma. The plotted result is like this. In this way, the "index" of the Series is directly used as the x-axis data and the "values", as the y-axis data,
for plotting. If the same plot is drawn with the plot() function in the pyplot module, the "index" needs to be specified explicitly, which is a little more troublesome. Let's go on. What if we want to plot the closing prices "close" and the opening prices "open" as two curves? We can also select the corresponding data first. The acquired result is a DataFrame, and then we may directly call the plot() method of DataFrame to plot. The result is like this. In this way, the "index" of the DataFrame is directly used as the x-axis data and each group of "values" is used as the y-axis data of one curve. Very convenient, right? Think again here. The "index" of the two curves just happens to be data like 0, 1, 2. Think about it: if the "index" of the DataFrame is dates, can we better feel the convenience of this way of plotting?

Of course, when plotting with pandas, the arguments of the plotting function may also be written in a way similar to the pyplot module. Look at this example. For the DataFrame djidf we acquired above, if we want to plot the stock code "code" on the x-axis and the latest trading price "price" on the y-axis, we may assign them to the x and y arguments of the plotting function. The "kind" argument indicates the type of plot: "bar" indicates a bar chart, "scatter" indicates a scatter plot, and "pie", a pie chart, among others. It's worth noticing that to set attributes of the plot, such as the title, we may use the set() method of the plotting object, or directly use functions such as title() in the pyplot module. The two may be flexibly combined.

In this part, we introduced the methods of plotting with the pyplot module in Matplotlib and with pandas. They are the fundamentals of plotting in Python and are frequently used. Sure, there are many other easy-to-use Python plotting libraries, such as the Matplotlib-based plotting library Seaborn, which can more conveniently and rapidly draw attractive graphs with diverse capabilities. You may choose them when there is an actual need, especially
for a lot of plotting.
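As a closing recap, here is a rough sketch of the subplots(), axes(), and pandas plotting steps from this part. The quotesdf and djidf data are not reproduced here, so small made-up DataFrames with the same column names stand in for them. plt.subplots_adjust() is assumed to be the call that takes "hspace" and "wspace", and the Agg backend line is only there so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# subplots(): sin and cos in two rows, one column.
x = np.linspace(-np.pi, np.pi, 300)
fig, (ax0, ax1) = plt.subplots(2, 1)
ax0.plot(x, np.sin(x))
ax0.set_title("sin")
ax1.plot(x, np.cos(x))
ax1.set_title("cos")
plt.subplots_adjust(hspace=0.5)  # assumed call; vertical gap so the titles do not overlap

# axes(): [left, bottom, width, height], each a fraction of the figure.
fig2 = plt.figure()
outer = plt.axes([0.1, 0.1, 0.8, 0.8])
inner = plt.axes([0.3, 0.15, 0.4, 0.3])  # lies inside the outer area, so they overlap
outer.plot(x, np.sin(x))
inner.plot(x, np.cos(x))

# Hypothetical stand-in for quotesdf: 12 rows with the default 0..11 row labels.
quotesdf = pd.DataFrame({
    "open": np.linspace(60.0, 71.0, 12),
    "close": np.linspace(61.0, 72.0, 12),
})

plt.figure()
closes = quotesdf.loc[:9, "close"]   # label-based: rows 0 to 9, with 9 included
ax_close = closes.plot()             # index -> x-axis, values -> y-axis

# A list of column labels in square brackets selects a DataFrame; one curve per column.
ax_both = quotesdf.loc[:9, ["close", "open"]].plot()

# Hypothetical stand-in for djidf.
djidf = pd.DataFrame({"code": ["AXP", "BA", "CAT"], "price": [150.0, 200.0, 250.0]})
ax_bar = djidf.plot(kind="bar", x="code", y="price")
ax_bar.set(title="Latest prices (made-up data)")  # plt.title() would work here as well
```

Note that quotesdf.loc[:9, "close"] returns 10 rows, whereas the position-based quotesdf.iloc[:9] would stop before position 9 and return only 9.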