Some time ago, I discussed various ways to produce a pie chart in gnuplot. I have looked at the search terms that bring people to my blog, and it seems that this topic is one of the more popular, so I thought that it might be worthwhile to explore the issue more.
Since I am a Linux fan, I am quite fine with calling various scripts from gnuplot: pipes are rather convenient, and I can write a small script which does the data processing. But I see the downside of it, too: these solutions (while you can make it work under windows) will require extra steps, if not run under linux. I also understand that you might not want to delve into the nuances of gawk, for instance. So, I was wondering whether we could do everything in gnuplot, without relying on something external. And the answer is, yes, we can! (Otherwise, I wouldn't be writing this post at all...) What we are going to do is probably the dirtiest of hacks, for we will use a function of gnuplot, which was never intended to be used in such a ramshackle way. But we want to produce graphs, and this is not a coding beauty contest, after all!
If you recall, we had a data file with two columns: one designating years, the other one containing the values belonging to those particular years. Something like this, which we will call pie.dat:
1989 0.1
1990 0.2
1991 0.2
1992 0.05
1993 0.15
1994 0.3
You might also recall that the way in which we produced the pie chart was to plot arcs of a circle, the parameters determined by the second column. Now, the problem was (and this is why we had to use an foreign script) that those parameters cannot be set at run-time, so to speak, we had to hard-wire them into the gnuplot script. So, the question is really how we can access individual values of a data file, say, the 5th number in the 2nd column, and then do this repeatedly. The snag is that gnuplot hasn't a dedicated function to perform this task, so we have to look for a function that is dedicated to something else. Since there are not too many gnuplot functions that operate on a file, we can easily find the one that "returns" a value. We will use the fit function, and use the fitting parameter as a means of returning the sought-after value from the file. I know, I know! This is the ultimate abuse of 'fit', and we shouldn't do this at all! But what the heck! You haven't got to tell anyone how you produced that bloody figure!
Now, we have to find a proper fitting function. Remember, we want to pick the value of a particular number in the second column, say. Well, being a physicist, I would say that the Dirac-delta will do the job. However, we have to soften our stance a bit, and use something that is gnuplot-friendlier. For better or worse, I will choose the following function
f(x,a)=(x>a-0.5?(x<a+0.5?b:0):0)
You can recognise our old friend, the ternary operator, making its appearance twice in the definition of f(x,a). So, we first check, if the value 'x' is larger, than a-0.5. If so, we check whether it is smaller, than a+0.5. If this condition is fulfilled, we assign the value 'b', otherwise 0. By plotting it, you can see it for yourself that this function is a rectangle of height 'b' and width 1, centred on 'a', and 0 everywhere else. 'b' is going to be our fit parameter. Now, if you fit this function as
fit [0:7] f(x,2) 'pie.dat' u 0:2 via b
print b
then 0.2 will be printed. But that is the value of the 3rd number in the second column of pie.dat! (The rows are numbered starting with 0, that is why 2 means the 3rd row.) Since the value 'b' can be used in any subsequent expressions, definitions etc. as a number, we have found a way to extract any single number from any data file.
In order to proceed, we have to find a method to step through the rows of a column, one by one. For this purpose, we will use the reread command of gnuplot. You can learn the basic idea by issuing ?reread and ?if. 'reread' just repeatedly reads the file invoked by the last load command, and we can use 'if' to set some criterion as to how many times this repeated loading should take place. This is a primitive 'for' cycle, but it will do. (In gnuplot 4.3, the option 'for' was introduced in the 'plot' command, but that would not do too much good here, for we still have got to figure out the actual numbers.)
With these in mind, we write the following two scripts, the first of which named pie.gnu
reset
os=1.3
FIT_LIMIT=1e-8
L=6.0
f(x,a)=(x>a-0.5?(x<a+0.5?b:0):0)
r(x)=abs(2*x-0.5); g(x)=sin(x*pi); b(x)=cos(x*pi/2.0)
set view 30, 20
set parametric
set isosample 2, 2
unset border
unset tics
unset key
set ticslevel 0
unset colorbox
set urange [0:1]
set vrange [0:1]
set xrange [-2:2]
set yrange [-2:2]
set zrange [0:3]
A=0.0; D=0.0
set multiplot
# First, we draw the 'box' around the plotting volume
set palette model RGB functions 0.9, 0.9,0.95
splot -2+4*u, -2+4*v, 0 w pm3d
set palette model RGB functions 0.8, 0.8, 0.85
splot cos(u*2*pi)*v, sin(u*2*pi)*v, 0 w pm3d
call 'pie_r.gnu'
unset multiplot
while the second one named 'pie_r.gnu'
unset parametric
b=0.3
set yrange [*:*]
fit [0:L] f(x,D) 'pie.dat' u 0:2 via b
B=b
fit [0:L] f(x,D) 'pie.dat' u 0:1 via b
D=D+1.0
set palette model RGB functions r(D/L), g(D/L), b(D/L)
set parametric
set yrange [-2:2]
set urange [A:A+B]
set label 1 "%g", b at os*cos(2*pi*(A+B/2.0)), os*sin(2*pi*(A+B/2.0)), 0.2 cent
splot cos(u*2*pi)*v, sin(u*2*pi)*v, 0.2 w pm3d
A=A+B
if(D<L) reread
Now, let us see what is happening here. In the first script, 'os' will be used to place the labels later on. The gnuplot variable FIT_LIMIT is set to that value, so that the fit values are more accurate. It might be necessary to change it, if your labels are not what you expect. For the arcs, it should not really matter, because the default value is going to be accurate enough for any practical plots. 'L' is the number of data that we have (note that data are numbered starting with 0). Next, we define 4 functions. The first one is our fit function, the other three are used to colour the arcs. You can change these, if you are not satisfied with the colour scheme that you get. Any three functions will do, which are defined on [0:1], and return with a value in [0:1]. You can read about this in the post in which I discussed phonged surfaces, sometime in late May.
The next couple of lines set up the various ranges of our plot. I wrote about this in my first pie chart post, and the lines giving the background of the plot should also be familiar from that post. 'A' and 'D' are our control variables that we manipulate in 'pie_r.gnu'. We then draw the shadow of the pie, and finally, call 'pie_r.gnu'.
Let us take a look at 'pie_r.gnu'. First, we extract the value in the second column, and then in the first one. We will use this latter one to produce the label. Note that we have got to unset the parametric plot, otherwise, the fitting function would not work. Also note that we re-set the yrange. This is necessary, because the actual plot is in the [-2:2] range, while the fit is on values around 1990. Then we increment the value 'D' (this is the ordinal number of the row that we are currently processing), and re-set our palette, using the three functions, r(x), g(x), and b(x) that we defined in 'pie.gnu'. Then, using the extracted values, we set the range of the parametric plot, and define the label, and plot the arc. Finally, we check, if we have called the script enough times. Loading 'pie.gnu' will produce the following graph:
A couple of comments about this script: as I already mentioned, you can change the colour scheme by using various functions. However, if you are really lazy, you can simply generate three random numbers by
r=rand(0); g=rand(0); b=rand(0)
and assign these values to the next palette.
Second, you can easily implement an "explosion", by shifting one or several of the arcs by shift*cos(2*pi*(A+B/2)) and shift*sin(2*pi*(A+B/2)), based on some condition that you set. It should not be hard, either, to plot a real 3D pie char based on the scheme that I outlined about two months ago.
Third, if the sum of your numbers is not normalised to one, you can easily fix it by adding an extra loop to your script: if you have a script called 'pie_r2.dat', containing
unset parametric
b=0.3
set yrange [*:*]
fit [0:L] f(x,D) 'pie.dat' u 0:2 via b
D=D+1.0
G=G+b
if(D<L) reread
and call this in 'pie.gnu' immediately before 'pie_r.gnu', then 'G' will just be the sum of all numbers in column 2, and you can use this to normalise the numbers when you call 'pie_r.gnu'.
And last, the only thing you need in advance is the number of records you want to process, 'L'. This is the only thing you have to set by hand, all the rest is automatic. If you want to learn how to avoid this small "difficulty", you should read the post on the 2nd of August.
I hope that this script can convince people that we haven't got to rely on any outside tool, and can still produce a decent pie chart in gnuplot.I also feel that this approach is, in some sense, much cleaner, than the one that was based on gawk. Some people will probably contest this statement, but doing everything in gnuplot ensures that we retain the platform independence.