Monday, 31 August 2009

Filled curves in gnuplot

Gnuplot has a plotting option called filledcurve. I do not want to dwell on the specifics of this, because I think that most about it can be learned by looking at the demo page. What I would like to discuss instead is how this can be used to turn our graph into a sleek figure. This is not something that you would put in a paper publication, on the other hand, on a poster, a web page or similar, it looks much better, than what you would get using the default. I just show the image, and we will discuss the script later.




reset
f(x) = x+20
g(x) = x*x*x-2.0*x*x+x+10-x*x*x*x*0.01
set sample 101
set parametric
set trange [0:4]
set table 'filled.tab'
plot g(t)+5.0*(0.5-rand(0)), f(t)+5*(0.5-rand(0))
unset table
unset parametric
set iso 2, 100
set xrange [0:100]; set yrange [0:50]
set table 'filledbg.tab'
splot y*0.5
unset table

set key left
unset colorbox
set border lc rgb "white"
set grid front lc rgb "#dddddd"
set xlabel 'Time [a.u.]' tc rgb "white"
set ylabel 'Amplitude [a.u.]' tc rgb "white"
set palette defined (0 0.2 0.2 0.2, 1 0.6 0.6 0.6)
set object 1 rect from screen 0, 0 to screen 1, 1 behind fc rgb "black" fillstyle solid 1.0
p 'filledbg.tab' w image t '', \
'filled.tab' u 0:1:2 w filledcurve above lt 1 lc rgb "#467F1E" t '', \
'' u 0:1:2 w filledcurve below lt 1 lc rgb "#802020" t '', \
'' u 0:1 w l lt 8 lw 1 t 'First', '' u 0:2 w l lt 9 lw 1 t 'Second'


The first part of the code is nothing but generating some dummy data. If you have something to plot, this is unnecessary, and the actual plot takes only 12 lines altogether. However, if you want to have a gradient in the background, you will have to generate 'filledbg.tab'. This is the file that we will plot as an image, and this is what will give the gradient. I would also point out here that in order to generate 'filledbg.tab', we need the range of the actual plot. If you don't know it beforehand, you can use a trick to determine this. I have discussed this at great length in a few of my previous posts, so you could look it up there.

Now, sometimes you do need a background for the plot, and this is not just some silly whim, but you might have to match the background of your poster, or your transparency. In order to achieve this, we will draw a rectangle behind the graph, and fill it with the desired colour. What we should keep in mind, however, is that without taking extra measures, the axis labels will have the default colour of black, and this will not look very good on a black background. Therefore, you should specify the colour in the declaration of the labels.
In the plot, first we draw the gradient. Without defining the grid in the front, the grid would be obscured, therefore, we push it to the forefront. The real curves are drawn four times: first we draw the filled curves, simply using three columns of our data file. (One column can be dropped, in which case an axis or a function could be used to define the boundaries.) If the two curves cross each other, we could have different colours, depending on the relative size of the functions that we plot. This is why we have to plot our curves for the second time: we have different colour for when the first curve is above and below the second one. The last two plot are just there to draw the actual curves in different colours.

Sunday, 30 August 2009

The plot thickens - shadow to a curve

This is Sunday afternoon, so I will discuss an easy plot, namely, how we can give the impression that a 2D plot is in 3D, simply by adding a slight shadow to a curve. And the plot thickens, because the curve itself is thick.:) Here is the figure that we will have



and here is the script that produced the image

reset
h(x) = x*x*x-5*x*x+x+20
a = 0.05
b-0.5
unset colorbox

set iso 2, 2
set table 'thickbg.dat'
sp [0:5] [0:30] y
unset table

set sample 20
set table 'thick.dat'
plot [0:5] h(x)+5*(0.5-rand(0))
unset table

set xlabel '{/Helvetica=14 Time}
set ylabel '{/Helvetica=14 Price}'
set tics font "Helvetica, 14" nomirror
set grid front
unset grid
set palette defined (0 0.95 0.95 0.95, 1 0.95 0.95 0.95)
plot [0:5] [0:30] 'thickbg.dat' w ima t '',\
'thick.dat' u ($1+a):($2+b) w l lc rgb "#eeeeee" lw 19 t '',\
'' u ($1+a):($2+b) w l lc rgb "#dddddd" lw 17 t '',\
'' u ($1+a):($2+b) w l lc rgb "#cccccc" lw 15 t '',\
'' u ($1+a):($2+b) w l lc rgb "#bbbbbb" lw 13 t '',\
'' u ($1+a):($2+b) w l lc rgb "#aaaaaa" lw 11 t '',\
'' u ($1+a):($2+b) w l lc rgb "#999999" lw 9 t '',\
'' u 1:2 w l lc rgb "#ff0000" lw 10 t  ''

The whole (and very simple) idea is that we will add a replica of the curve that we want to plot to the figure, in some shade of gray, and shifted a bit, so as to give the impression that this curve is the shadow of the other one.

First, we define three functions. h(x) will generate some data (if you have a data file to plot, then obviously, you can skip this, and the set table... plot h(x) ... unset table section), and 'a' and 'b' define the shift in the horizontal and vertical directions, respectively. We also add a background to the figure, in gray. In order to do so, we splot 'y' (or anything else, for that matter) to a file, and in our real plot, we plot this as an image, with the defined colour palette.

There is some subtlety to the lines
set grid front
unset grid

At first, this appears silly to set something and then unset it immediately afterwards, but there is a very good reason for that. Odd as it might sound, the placement of the tics is defined by set grid, i.e., if we want visible tics on the background, we have to push them to the front, otherwise, the background of our figure would cover them. We do this by setting grid front. However, we do not actually want any grid lines, so we unset them. The effect on the tics remains, however.
The last step before plotting our curve is to plot the shadow. We do it in 6 steps, and in each step we reduce line width and the whiteness of the colour. If you look at the colour definitions, consecutive lines are in darker shades of gray. The reason for this is that in this way the edge of the shadow will not be sharp. However, if you are satisfied with a sharp shadow, you can skip 5 out of these 6 steps, and draw the shadow only once. In fact, this does not look that bad at all, as you can see in the figure below.




Well, this is all for today.

Friday, 21 August 2009

Changing the aspect ratio of axes

A couple of days ago, I saw a question on the gnuplot discussion board about how to change the ratio of the three axes in a 3D plot independently. In 2D, this is not a problem, for one can use the command
set size xsize, ysize

where xsize and ysize are relative to the canvas size. If, e.g., you type
set size 1, 0.5
then the figure will be just as wide as it was originally, but the height will be reduced by a factor of two. Now, the problem is that in 3D the above-mentioned instruction will lead to a figure whose height (i.e., the z coordinate axis) is reduced by a factor of two, but the width is unchanged. However, the width is determined by x and y, therefore, the x-y axes will be linked in this way. This results in the pitiable situation that we can't have an x axis, which is twice as long as the y axis. There's got to be a way around this problem! Sure enough, there is, and this is what we will set out to solve now. This will require some hand-work, but it is fairly simple otherwise.

The original problem on the discussion board was to set the xrange as [-6:6], the yrange as [-3:3], and the zrange as [-2:2], while keeping the units on each axis equal, i.e., have aspect ratios 6:3:2. What we can do is the following: we can easily change the z:x ratio, simply setting the size. Then, instead of plotting over the whole y range, we will only plot over one part of it, and yes, you've guessed correctly, we will use the ternary operator for that. The only trick is that we specify a yrange which is actually larger than the one that we want to plot. So, here is our script, and we will discuss it below
reset
set iso 100,100
set xrange [-6:6]
set yrange [-3:9] # We should have [-3:3], but do not rush!
set zrange [-2:2]
set ticslevel 0
unset key
unset colorbox
set xtics nomirror
set ytics -3,2,3
set ztics -2,2,2
set xtics -6,2,6
set border 1+16

set xlabel 'x-label'
set zlabel 'z-label' offset graph 0.05,0,-0.1
set ylabel 'y-label' offset graph -0.05,-0.25,0
f(x) = (x<3?x:1/0)
set size 1,0.7
set arrow 1 from -6,-3,-2 to -6,3,-2 nohead
set arrow 2 from 6,-3,-2 to 6,3,-2 nohead
set arrow 3 from -6,3,-2 to 6,3,-2 nohead

set palette rgbformulae 33,13,10
splot sin(x)*cos(f(y)) with pm3d

In the beginning, as usual, we set up the figure. The only thing to watch out is the unusual yrange: we specify more than we actually will use. On the border, we draw only the x, and z axes, but not the x2, y, and y2 axes. We will do that by hand, where we draw 3 headless arrows. Before we plot anything, we define a function which we will use to restrict the plotted yrange to [-3:3]. In the plotting function we pass not y, but f(y), so that nothing will be plotted beyond y=3. Having done the heavy work, we plot the function, and get the following image



Now, a couple of comments are in order here. One is that, since we set the y axis by hand, we have got to set the ylabel by hand, too. This is why we specified the offsets for the labels. (For the zlabel it wouldn't have been necessary, it just made the figure look a bit tidier.) The second remark is that it really depends on the view what looks the same on the 3 axes: that is why there is no generally applicable number can be given in the set size command. You have to try and determine it on a case-by-case basis.

Sunday, 9 August 2009

Plotting the recession - some basic data analysis with gnuplot

Today, we will see how we can do some very rudimentary data analysis, using gnuplot. Of course, if you want to analyse your data, then you can either use a dedicated script, written in a far better fit language, e.g., in Perl or in gawk, or you can do the whole thing in a tabular environment, using Excel or Origin, or Grace, or Labplot, or I do not know what. And I have also read somewhere that gnuplot conforms with the Unix philosophy, "one task - one tool", and we should not do any data processing with it. Now, there are at least two problems that I see with this approach: one is that some data analysis are really simple, and it is not clear, why one should use a separate tool for that. Furthermore, it will always cause some trouble (at least, in my experience) when the data analysis is not done from gnuplot, for one can never be sure how and on which version of the analysed data file the figure was based. And pipes can only be used on Linux/Unix-type systems, thus, the platform independence of gnuplot is lost. But more importantly, and if we are at the "philosophy" anyway, it is completely unclear (at least, to me) what constitutes plotting: just placing the curves, points and dots on an empty canvas, and writing a couple of labels for the axes, or this and the choice of the curves, points and dots. The problem is that one can in most cases decide what to do with erroneous points and the like only when it is already plotted and shown. So, if that is the case, why should we sequester data analysis from the plotting? Are they not the two complementing parts of the very same procedure?

So, these are my reasons for doing data processing in gnuplot. Of course, in most cases, I do that using an external script, and piping it from the gnuplot prompt. But that would deprive windows users from the pleasures of doing something fancier:)

By the way, if you look at some of the demo scripts of gnuplot 4.3, you will realise that the developers of gnuplot have already parted with the idea of "one task - one tool".

The kind of data processing we will do today requires a "recursive" access to data points, in particular, I will demonstrate how we can calculate the derivative and the integral of a series of data points. Calculating the derivative and integral of an analytic function is trivial, even if you do not have the analytical answer: for the derivative, you can simply plot f(x+dx)-f(x), where dx is just some pre-defined number, and for the integral you can apply a recursive scheme, detailed on the gnuplot demo web site. But they work on known functions, and we would like to calculate something based on "measured" data. For this, I will take the weekly oil price from this year, which you can download from the link. Let us suppose that we plot the oil price, but we also want to highlight the time intervals where the price is dropping. This requires us to calculate the derivative at each point (In this particular case, we could get away just by comparing the value and the next, but we have got to get access to the values anyway, and we would get it by fitting a linear function, whose linear coefficient gives the derivative. So, for better of worse, I will just stick with the derivative.)

For a starter, let us see our two scripts! As I have already pointed out, we will have to walk through the values, one by one, so we will need a 'for' loop, which means two scripts: a "master", and one that we call repeatedly. I have discussed this method at great length in my recent posts.
reset
unset key
A=0; sa=1; sb=0; integ=0.0
f(x) = a*x+b
g(x) = (x>0?GPVAL_Y_MIN:GPVAL_Y_MAX)
set xlabel 'Time [weeks]'
set ylabel 'Crude oil price [$]'
p 'oil.dat' u 0:8

xmax = GPVAL_DATA_X_MAX

set print 'oild.dat' append
l 'derint_r.gnu'
set print

plot 'oild.dat' u 1:(g($2)) w filledcurve lc rgb "#eeeeee", 'oil.dat' u 0:8 w l

and our 'for' loop, that we call 'derint_r.gnu'
a=1.0; b=40.0
fit [A:A+1] f(x) 'oil.dat' u 0:8 via a, b
integ = integ+(a*A+b+a*(A+1)+b)/2.0
if(sa*a<0 a="" b="" integ="" print="" sa="a;" sb="b<br">if(sa*a>0) print A, a, b, integ; sa=a; sb=b
A=A+1
if(A<xmax) reread


The first part of the main script should be clear: we set up the style of our figure, and do a dummy plot to learn something about our data file, 'oil.dat'. f(x) is our linear function, while g(x) is just a helper, whose role will become evident a bit later. Having set up the plot, we call 'derint_r.gnu' 'xmax' times. So, what is in 'derint_r.gnu'? We simply fit our linear function over a range that contains only two points, and we step through the whole file by moving those two points by one in each call to the loop. Then we calculate the integral by adding the area of the next trapezium, and then depending on whether the sign of the derivative, 'a', is different to the previous one, we print out the results once (if the signs are identical), or twice (if they are different). This distinction is necessary because we want to plot rectangles where the derivative is negative. After we are done with the data processing, we close our new file by setting the print, and plot the results, to get the following figure



Note that in the plotting, we called the function g(x), which takes on the value of the minimum of the yrange, if the argument is negative (the price is dropping), while returns the maximum of the yrange otherwise. Of course, there are many things that we could do to improve the image, but I wanted only to demonstrate how we can do the data processing part. You can dress up the image as you like, and for that you can also draw on many a post of mine.

Saturday, 8 August 2009

Return of the wall chart - entirely in gnuplot

On the 7th of June, I showed how one can draw a wall chart in gnuplot. However, that method relied on the knowledge of the particular function we wanted to plot, therefore, would not work for data files. We have since learnt a thing or two, and it is high time to return to that type of plot, and see whether we could use something that we have picked up on the way. Yes, you are right, the answer lies in the affirmative, and the situation is even far better: all we have to do is to take the scripts from yesterday, and change two lines. It could not be simpler!

Before we turn to the complete chart, first we should see how we can plot just one single plane, taken the values from the file. For the sake of simplicity, I will use 'ribbon.dat', given in the post yesterday. Also, if you haven't read that post, you should now, for I will simply change two lines, but I will not go through the whole code again.

If you recall, our data file, after some processing in gnuplot, looked like this (well, at least, the first column)
#Surface 0 of 1 surfaces

#IsoCurve 0, 3 points
#x y z type
0  13  0.901743 i
1  13  0.901743 i
2  13  0 i

#IsoCurve 1, 3 points
#x y z type
0  12  0.860648 i
1  12  0.860648 i
2  12  0 i
...

This is what we called 'rib2.dat'. We used the every keyword of splot to single out the first (0) and second (1) line in gnuplot. Now, what would happen, if we plotted the second and third lines? We would get a single plane, standing vertically, and having a "shape" given by the 'z' values, as in this figure:




Just for the record, our plot is produced by these three lines in 'ribbon_r.gnu'
r=rand(0); g=rand(0); b=rand(0)
set palette model RGB defined (0 r/cm g/cm b/cm, 1 r/cm g/cm b/cm)
splot 'rib2.dat' every ::1::2 u (A-1):2:3 w pm3d


i.e., we restrict the 'x' values to 'A-1', and plot the second and third columns of the second and third records in each block. (Remember that the numbering of the records begins with 0!)
The figure is almost OK, with the tiny glitch that the negative values are plotted "downwards", i.e., the reference line is 0. We can fix this with the helper function
zr(x) = (x==0?zmin:x)

which returns the minimum of the zrange, if the 'z' value is smaller, than 0, otherwise it returns 'z'. So, by modifying the plot as
r=rand(0); g=rand(0); b=rand(0)
set palette model RGB defined (0 r/cm g/cm b/cm, 1 r/cm g/cm b/cm)
splot 'rib2.dat' every ::1::2 u (A-1):2:(zr($3)) w pm3d

we get



Remember that, since we use a random number to determine the RGB values, the colour scheme of the plot will be different after each call of the script, and this is why the plot looks a bit different to the previous one. Nevertheless, the point is not this, but that the plots go up from the bottom of the graph to the corresponding 'z' values. If you are satisfied with this, you can stop here, there is nothing more to it. However, we could turn what we have into a "real" wall chart by adding a "roof" to the walls. All it takes is four extra lines in our 'for' loop, which will now look like this (this is wall_r.gnu for reference)

A=A+1
set table 'rib.dat'
plot filename u A:A
unset table

set table 'rib2.dat'
splot 'rib.dat' mat
unset table

r=rand(0); g=rand(0); b=rand(0)
set palette model RGB defined (0 r/cm/cm g/cm/cm b/cm/cm, 1 r/cm/cm g/cm/cm b/cm/cm)
splot 'rib2.dat' every ::1::2 u (A-1):2:(zr($3)):3 w pm3d

set palette model RGB defined (0 r g b, 1 r g b)
splot 'rib2.dat' every ::0::1 u (A-1+B*$1):2:3 w pm3d

set palette model RGB defined (0 r/cm g/cm b/cm, 1 r/cm g/cm b/cm)
splot 'rib2.dat' every ::1::2 u (A-1+B):2:(zr($3)):3 w pm3d

if(A<C) reread


Note that we plot the same file, 'rib2.dat', thrice, each time with a different shade of the same colour: first with the darkest, and this is the plane farthest from the viewer, then the roof of the walls, i.e., the ribbons, with the lightest, and finally, the front plane of the walls with a medium shade. The main script reads as follows

reset
filename="ribbon.dat"
mult=1.2; cm=1.4
A=0; B=0.3;
set xlabel 'x axis [a.u.]'; set ylabel 'y axis [a.u.]';
set ticslevel 0
unset key; unset colorbox
set tics out nomirror
set border 1+2+4+8+16+32+64+256+512 back
mini(x)=(x<0 mult:x="" mult="" x="">0?x*mult:x/mult)
zr(x) = (x==0?zmin:x)

set isosample 100, 3
set xrange [0:1]; set yrange [0:1]
set table 'bg.dat'
splot x
unset table; set xrange [*:*]; set yrange [*:*]

splot filename mat
C=GPVAL_DATA_X_MAX+1
ymax = GPVAL_DATA_Y_MAX+1
zmin = mini(GPVAL_DATA_Z_MIN)
zmax = maxi(GPVAL_DATA_Z_MAX)
set xrange [-0.5:C-0.5]
set yrange [0:ymax]
set zrange [zmin:zmax]
set cbrange [0:1]

set multiplot
set palette model RGB defined (0 0.3 0.3 0.85, 1 0.9 0.9 0.95)
splot 'bg.dat' u ($1*C-0.5):($2*ymax):(zmin):3 w pm3d, \
'' u ($1*C-0.5):(ymax):(1.9999*$2*(zmax-zmin)+zmin):3 w pm3d, \
'' u (-0.5):($1*ymax):(1.9999*$2*(zmax-zmin)+zmin):(0) w pm3d
set cbrange [zmin:zmax]

unset border; unset tics; unset xlabel; unset ylabel; unset zlabel

l 'wall_r.gnu'

unset multiplot

This is identical to 'ribbon.gnu' from yesterday, with the exception of the definition of the helper function 'zr(x)'. The resulting figure looks like this one




There are two more things that we could do with this graph: One is that we can add a grid to the surfaces by adding these three lines to our 'for' loop
set style line 1 lt -1 lw 0.5 ps 0
set pm3d implicit at s
set pm3d depthorder hidden3d 1

(Or, we could add this to our main script, but only after plotting the background, otherwise, that will be gridded, too.) By doing so, we get the following figure



The second is that instead of drawing the back plane of the wall, we could add a front plane, thereby "closing" the volume. We do this by replacing the first plot command in our 'for' loop by
splot 'rib2.dat' every ::0:(ymax-2):1:(ymax-1) u (A-1+B*$1):(0):(zz($2,$3)) w pm3d

and adding the helper function
zz(x,y) = (x==1?zmin:y)

to the main script. What the plotting command does is to take only the last two blocks, and choose only the first and second records. The last two blocks will have 1 and 0 in the second column, so we use that and the helper function to determine which 'z' value we want to pick to draw our rectangle in the front.
Of course, you should plot the front plane after you plotted the ribbon on the top, otherwise, parts of it might be covered. Once you do that, you get the following image



I believe, there is not too much else that we could do here, but we have gone a long way, and produced a decent-looking wall chart! I should also point out that, if you want to use the second plot, i.e., the wall chart without the roofs, you can still add a grid. All you have to do is to add those three lines that we discussed in connection with the grid on the complete wall chart, and you can, naturally, add a grid to the ribbon plot that we discussed yesterday.

Friday, 7 August 2009

Ribbon charts - without the gawk script

Yesterday, we saw how we can produce a ribbon chart from an arbitrary data file, and with the help of an external script. You can still make it work under windows, although you might have to call the script outside gnuplot. Probably, it would be better to do everything in gnuplot, wouldn't it? So, can we do something to alleviate the situation. But, of course, we can, we just have got to use those little grey cells! If you are patient and read further, I will explain how you can draw this graph completely automatically! (Uhm, this is rather similar to that from yesterday... Never mind, the point is that you haven't got to do any work!)



If you recall, the key to plotting the ribbons was to convert the original data file (see the beginning of my previous post) into something like this:
0 0 0.857354
1 0 0.857354

0 1 0.551386
1 1 0.551386

0 2 0.243002
1 2 0.243002

0 3 -0.24
1 3 -0.24

0 4 -0.422674
1 4 -0.422674
...

i.e., we need two rows of three columns, separated by a linefeed. We will need some data processing for this, but we can manage. First, we will duplicate all columns, simply plotting into a file. By issuing
set table 'rib1.dat'
plot 'ribbon.dat' u 1:1
unset table

we get the following file, 'rib1.dat'
#Curve 0 of 1, 14 points
#x y type
0.857354  0.857354  i
0.551386  0.551386  i
0.243002  0.243002  i
-0.24 -0.24  i
-0.422674 -0.422674  i
-0.513352 -0.513352  i
-0.660823 -0.660823  i
-0.0962106 -0.0962106  i
-0.0152513 -0.0152513  i
0.489453  0.489453  i
0.876023  0.876023  i
0.945061  0.945061  i
0.860648  0.860648  i
0.901743  0.901743  i

This is almost what we want, two columns of the same numbers, namely those from the first column of 'ribbon.dat'. The only difficulty is that it contains a third column, which in this case is just a set of i, but if we were to plot this file as a matrix, at x=2 all values would be equal to zero, so it would be a distorted ribbon. How can we get rid of that letter? Well, we will turn the into a form similar to that at the beginning of this post, simply by
set table 'rib2.dat'
splot 'rib.dat' mat
unset table

where now 'rib2.dat' looks like this
#Surface 0 of 1 surfaces

#IsoCurve 0, 3 points
#x y z type
0  13  0.901743 i
1  13  0.901743 i
2  13  0 i

#IsoCurve 1, 3 points
#x y z type
0  12  0.860648 i
1  12  0.860648 i
2  12  0 i
...

which is very similar to the desired format, for now we can simply call an splot, without the matrix specifier. But we still have an erroneous value in every third line. We will remove that using the 'every' specifier of plot. If you haven't done it yet, you should definitely issue the command
?every

You will learn a lot of interesting things. What we want to do here is to plot only the first end second record in each block. In that case, it does not matter what is in the third record, because it will not be processed by gnuplot. So, we take our 'rib2.dat' file, and plot it as
splot 'rib2.dat' every ::0::1 u 1:2:3 w pm3d

which returns the first (0) and second (1) column in each block. Thus, with these three easy steps, we created a ribbon, which happens to represent the first column in our original data file. All that is left is to walk through the columns in 'ribbon.dat', and do the same thing again and again. Note that we specified the columns that we are going to use as 1:2:3. We will need this later, when we shift the next ribbon to a new position.

Then, let us suppose that we are a bit lazy, and we do not fancy following the prescription above. We can then resort to the recipe outlined a couple of days ago in connection with the pie charts and the bar graphs. If you haven't read those posts, you should now, for I am going to use everything mentioned in them.

Since the three steps above are repetitious, we will put them in a 'for' loop. As I discussed it earlier, the way to do this is to write the instructions in a separate file, and 'reread' them as many times as necessary. I.e., we will have a file, 'ribbon_r.gnu', which contains the following lines
A=A+1
set table 'rib.dat'
plot 'ribbon.dat' u 1:1
unset table

set table 'rib2.dat'
splot 'rib.dat' mat
unset table

r=rand(0); g=rand(0); b=rand(0)
set palette model RGB defined (0 r/cm g/cm b/cm, 1 r g b)
splot 'rib2.dat' every ::0::1 u (A-1+B*$1):2:3 w pm3d
if(A<C) reread

So, this is our 'for' loop, and this is where we call it, 'ribbon.gnu'

reset
filename="ribbon.dat"
mult=1.2; cm=2.5
A=0; B=0.3;
set xlabel 'x axis [a.u.]'; set ylabel 'y axis [a.u.]';
set ticslevel 0
unset key; unset colorbox
set tics out nomirror
set border 1+2+4+8+16+32+64+256+512 back
mini(x)=(x<0 mult:x="" mult="" x="">0?x*mult:x/mult)

set isosample 100, 3
set xrange [0:1]; set yrange [0:1]
set table 'bg.dat'
splot x
unset table; set xrange [*:*]; set yrange [*:*]

splot filename mat
C=GPVAL_DATA_X_MAX+1
ymax = GPVAL_DATA_Y_MAX+1
zmin = mini(GPVAL_DATA_Z_MIN)
zmax = maxi(GPVAL_DATA_Z_MAX)
set xrange [-0.5:C-0.5]
set yrange [0:ymax]
set zrange [zmin:zmax]
set cbrange [0:1]

set multiplot
set palette model RGB defined (0 0.3 0.3 0.85, 1 0.9 0.9 0.95)
splot 'bg.dat' u ($1*C-0.5):($2*ymax):(zmin):3 w pm3d, \
'' u ($1*C-0.5):(ymax):(1.9999*$2*(zmax-zmin)+zmin):3 w pm3d, \
'' u (-0.5):($1*ymax):(1.9999*$2*(zmax-zmin)+zmin):(0) w pm3d
set cbrange [zmin:zmax]

unset border; unset tics; unset xlabel; unset ylabel; unset zlabel
l 'ribbon_r.gnu'
unset multiplot

This looks formidable at first, but we can easily decipher all. First, in our 'for' loop, in 'ribbon_r.gnu', we simply wrote down the three steps required to created one ribbon, we re-set the colour palette, and call the loop as many times as many columns there are (C).

In the main script, in 'ribbon.gnu', we first set up the main properties of the figure, and we define the width (B) of the ribbons, the multiplier (mult), which determines by how much the figure is bigger, than the minimum and maximum of the data points in 'ribbon.dat', and we also define 'cm', which will determine how "deep" the colour modulation of the ribbons is. By setting this to a number close to 1, we get a more or less homogeneous colour, while setting it to something large, we get a wildly modulated colour palette for the ribbons. Note the two helper function, 'mini' and 'maxi', which are used to set the zrange, and which make use of the value of 'mult'. The next couple of lines produces the background, which you can skip, if you do not want a very fancy graph. Then we make a dummy plot, just to determine the various ranges. You can read about this in my last but one post. Immediately after the multiplot, we plot 'bg.dat', i.e., the background of the figure. After this we call 'ribbon_r.gnu', and finally, escape from the multiplot.

A couple of notes: in the 'for' loop, we generated the colours by picking three random numbers. This means that your plot will have different colours after successive calls of 'ribbon.gnu'. You can change this, if you want to have some definite colouring scheme.

As I have already pointed out, the colour modulation of the ribbons is given by the variable 'cm', at the beginning of 'ribbon.gnu', while the size of the figure with respect to the range of the values in 'ribbon.dat' is given by 'mult'. If you set it to something smaller, than 1, be prepared for some clipping of the data!

Finally, these two scripts will produce the graphs automatically for you, the only thing you have to set is the name of the file, and, of course, you have to have the data in the format given at the very beginning of my previous post, i.e., the data points must be arranged in a matrix.

Tomorrow (or whenever I have time), I will show what else this script can be used for, with some small modification. So long!

Thursday, 6 August 2009

Making a ribbon chart in gnuplot

While I am not a vary big fan of this kind of chart, I admit that on occasion, it might lend a refreshing new look to a graph. Besides, we can use it for something else, but more on that later. So, we will set to make this graph



from a data file similar to this
0.857354 0.943516 0.546751 -0.588053 -0.548767
0.551386 0.896607 0.163013 -0.166411 -0.442178
0.243002 0.310189 0.465029 0.0790944 0.378762
-0.24 -0.0625021 0.226812 0.48102 0.451281
-0.422674 -0.444781 0.314025 0.626378 0.822247
-0.513352 -0.893642 0.220384 0.800136 1.1908
-0.660823 -0.513169 0.316238 1.08866 1.25814
-0.0962106 -0.130051 0.0965893 0.944557 1.01623
-0.0152513 -0.0945605 0.468029 0.240179 0.556154
0.489453 0.726084 0.398192 -0.162081 -0.167408
0.876023 0.997108 0.139124 -0.281058 -0.581462
0.945061 1.19238 0.359603 -0.706723 -0.687105
0.860648 0.907059 0.217838 -0.540168 -0.592025
0.901743 1.02898 0.242619 -0.337589 -0.584658

which we will call ribbon.dat. There is an easy way to make this graph, and this solution requires an external script. Tomorrow, I will show a solution, if a bit convoluted, entirely in gnuplot, so that even those of you, who do not fancy a gawk one-liner, can make a similar figure.

The basic idea is to take the columns of ribbon.dat one by one, and turn them into a surface plot, whose derivative in the x direction is zero, while it takes the successive values of the column along the y axis. The gawk one-liner that does this transformation for us is given here
#!/bin/bash
gawk -v c=$2 '{printf "0 %d %g\n1 %d %g\n\n", NR-1, $c, NR-1, $c}' $1

I.e, we take the column designated by the value of 'c', and instead of listing the members, we print out six times the length of the column numbers, of the form
0 0 0.857354
1 0 0.857354

0 1 0.551386
1 1 0.551386

0 2 0.243002
1 2 0.243002

0 3 -0.24
1 3 -0.24

0 4 -0.422674
1 4 -0.422674
...

(If you look carefully, you will notice that this is nothing but the first four elements in the first column.) In this list, the first column is the 'x' value, the second column is the 'y' value, and the third is the 'z' value. We repeat all 'z' values twice, once with 0, and once with 1 'x' value. By plotting this, we will produce a ribbon of width 1. All that remains is to call the script multiple times, and change the colouring accordingly. With these in mind, our gnuplot script reads as
reset
unset key; unset colorbox
set tics out nomirror
set border 1+2+4+8+16+32+64+256+512 back
a=0.0
b=0.3
C=5.0

set xrange [-0.5:C-0.5]; set yrange [0:14]; set zrange [-2:2]; set cbrange [-2:2]
C=C+1.0
set xlabel 'x axis [a.u.]'; set ylabel 'y axis [a.u.]';
set ticslevel 0

set table 'bg.dat'
splot x
unset table

set multiplot
set palette model RGB defined (0 0 0 0.8, 1 0.9 0.9 0.9)
splot 'bg.dat' u 1:2:(-2):3 w pm3d, '' u 1:(14):($2/14*4-2):3 w pm3d, '' u (-0.5):2:($1-2):(-0.5) w pm3d

unset border; unset tics; unset xlabel; unset ylabel; unset zlabel
set palette model HSV function 1-a/C, 1-gray, 1-a/C
sp '<./ribbon.sh ribbon.dat 1' u (a+b*$1):2:3 w pm3d
a=a+1
set palette model HSV function 1-a/C, 1-gray, 1-a/C
sp '<./ribbon.sh ribbon.dat 2' u (a+b*$1):2:3 w pm3d
a=a+1
set palette model HSV function 1-a/C, 1-gray, 1-a/C
sp '<./ribbon.sh ribbon.dat 3' u (a+b*$1):2:3 w pm3d
a=a+1
set palette model HSV function 1-a/C, 1-gray, 1-a/C
sp '<./ribbon.sh ribbon.dat 4' u (a+b*$1):2:3 w pm3d
a=a+1
set palette model HSV function 1-a/C, 1-gray, 1-a/C
sp '<./ribbon.sh ribbon.dat 5' u (a+b*$1):2:3 w pm3d
unset multiplot


Here, 'b' is the width of the ribbons, 'C' is the number of ribbons we want to plot, and 'a' is just a variable for convenience. If you do not want to have the fancy background, you can skip lines starting with 'set table' to 'unset border'. Of course, if you do this, then you have got to move the line 'unset border;...' to after the first plot. In the rest of the script, we plot the columns, one by one, and also change the colour palette, so that the ribbons will have different colours. You can skip those lines, too, if you are satisfied with one colour for all the ribbons. I should also point out that the repeated call to the script can be made automatic, if you put those three lines into another gnuplot script, and call that script via reread. I discussed how to this this in my last two posts, so if you are interested in that, you should also read those. In those posts, you will also find hints as to how to avoid having to specify the various ranges by hand, and how to get the required values with the help of gnuplot. It should not be hard to implement those things in this script, so it is easy to make a "self-running" ribbon-plot-maker.

Well, this was about the easy way to make ribbon charts. In my next post, I will show how to do this in gnuplot, without having to rely on an external script. Till then!

Sunday, 2 August 2009

Plotting a matrix with bargraphs

In some cases, it is common practice to plot a matrix with bargraphs, instead of a colour map, or something similar. For instance, in physics, people usually plot the density matrix (if it does not contain too many elements) in this way. I have already discussed how bargraphs can be generated, but that method called an external gawk script, which had the advantage that we hadn't got to know the number of elements in advance. Yesterday, I showed how we can avoid the use of an external script, provided we know how many points we want to plot. But could we make something better, and produce a plot entirely with gnuplot, and without knowing how many, and of what magnitude the elements are? We could, by making use of some gnuplot variables. This is what I am going to discuss today. Since I will build upon the tricks that we exploited yesterday, I would recommend that you read that post, if you haven't already done so.

Let us suppose that we have the following data file, called 'bar_matrix.dat', with 4 by 4 real numbers
1 0.5 2.2 0.5
1.2 0.6 -2.4 0.2
0.1 0.5 -1.8 -1.5
1.2 0.3 0.6 1.9

and we want to generate a plot similar to this:



Following the recipe from yesterday, we have to read out the numbers in the matrix, one by one, and plot the cuboids one by one. In order to give the impression of a 3D object, the three visible sides of the cuboids must be painted with different shades of the same colour, but that should not be a problem, we have already seen how to do that. However, there is one difficulty here: we will have to use multiplot, which means that in order to keep the relative size of the cuboids, we have to fix all three ranges, xrange, yrange, and zrange. In addition, determining the xrange and yrange serves not only some aesthetic purpose, but they will also control our 'for' cycle. So, what can we do to crack this nut?

In order to answer the question, we have got to think about how gnuplot plots: first, if no ranges are specified, data are read in, then based on certain values, the ranges are calculated. So, if we issue
plot 'something.dat' using 1:2

the first, and second columns of 'something.dat' will be read, and based on the minimum and maximum values of the first column, the xrange is calculated, and likewise for the yrange. But then these values are stored as a special variable in gnuplot. So, if you are interested in the minimum and maximum of your xrange, you can do
print GPVAL_DATA_Y_MIN
print GPVAL_DATA_Y_MAX

You can also print out all variables, both user-defined and internal, by typing
show variables all

Now, this means that we will have access to some properties of our data file, provided that the data file has been plotted at least once. This is what we will use in the following script, called 'bar_matrix.gnu'.

reset
filename="bar_matrix.dat"
w=0.4
FIT_LIMIT=1e-8
mult=1.05; cd=1.4

f(x,a)=(abs(x-a)<0.5?d:0)
R(x)=abs(2*x-0.5); G(x)=sin(x*pi); B(x)=cos(x*pi/2.0)

set view 60, 20
set parametric; set isosample 2, 2
unset border; unset tics; unset key; unset colorbox
set ticslevel 0
set urange [0:1]; set vrange [0:1]

splot 'bar_matrix.dat' mat
set zrange [mult*GPVAL_DATA_Z_MIN:mult*GPVAL_DATA_Z_MAX]
Y=GPVAL_DATA_Y_MAX+1
X=GPVAL_DATA_X_MAX+1

set xrange [1-w:X+2*w]

set multiplot

C=0; xx=1; yy=Y
call 'bar_matrix_r.gnu'

set palette model RGB functions 0.9, 0.9,0.95
splot -w+u*(X+3*w), -w+v*(Y+w), 0 w pm3d

C=1; xx=1; yy=Y
call 'bar_matrix_r.gnu'
unset multiplot


In addition to this file, we will need two more. Remember, in order to plot a matrix, we have to use two 'for' cycles, each requiring an external gnuplot script. So, we also have 'bar_matrix_r.gnu'
yy=Y
call 'bar_matrix_r2.gnu'
xx=xx+1
if(xx<X+1) reread

and 'bar_matrix_r2.gnu'
unset parametric
yy=yy-1
d=0.3
set yrange [*:*]
fit [0:Y] f(x,yy) filename u 0:xx via d
set yrange [-w:Y+2*w]
set parametric
r=R(yy/Y); g=G(yy/Y); b=B(yy/Y)
if(C==0 && d<0) \
set palette defined (0 r g b, 1 r g b);\
splot w*u+xx, w*v+yy, d w pm3d;\
r=r/cd; g=g/cd; b=b/cd;\
set palette defined (0 r g b, 1 r g b);\
splot w*u+xx, yy, v*d w pm3d;\
r=r/cd; g=g/cd; b=b/cd;\
set palette defined (0 r g b, 1 r g b);\
splot w+xx, yy+w*u, v*d w pm3d;

if(C==1 && d>0) \
set palette defined (0 r g b, 1 r g b);\
splot w*u+xx, yy+w*v, d w pm3d;\
r=r/cd; g=g/cd; b=b/cd;\
set palette defined (0 r g b, 1 r g b);\
splot w*u+xx, yy, v*d w pm3d;\
r=r/cd; g=g/cd; b=b/cd;\
set palette defined (0 r g b, 1 r g b);\
splot w+xx, yy+w*u, v*d w pm3d;

if(yy>0) reread


Having seen the scripts, let us discuss what the three files do. The beginning of 'bar_matrix.gnu' should be familiar by now, there is nothing new in it. The first interesting thing happens where we do a dummy plot of the matrix, just to extract the number of rows and columns, and the range of the data values. We use GPVAL_DATA_Z_MIN and GPVAL_DATA_Z_MAX to set the common zrange of all consequent figures, and GPVAL_DATA_X_MAX and GPVAL_DATA_Y_MAX to determine the number of iterations (and the xrange, and yrange, of course). Note that we use a multiplier, 'mult', defined at the beginning of the file, so that the top and bottom of the plot will definitely not be clipped. Then we clear our plot by setting the multiplot, and calling 'bar_matrix_r.gnu'. This file controls the 'for' cycles over rows. It does nothing, but re-sets the value of the inner 'for' cycle, increments its own variable, and calls 'bar_matrix_r2.gnu'. In this second 'for' loop is where the actual plotting takes place. In that file, we call our fitting routine, set the palette, and plot the 3 sides of the cuboids. Note that we want to paint the sides in different colours, so after plotting the top, we produce a darker shade by dividing all RGB values by 'cd', and after plotting the front side, we reduce the RGB values once more, to draw the right hand side.

Now, this script contains an 'if' statement, with the variable 'C' and 'd'. The reason for this is that we call 'bar_matrix_r.gnu' twice: first we plot all bars that represent a negative value, then draw the z=0 plane, and finally plot the positive values. This is what happens in the last couple of lines of 'bar_matrix.gnu'. If you are certain that your matrix contains positive values only, you can get rid of the 'if' statement, and also of the first call of 'bar_matrix_r.gnu'.
If you now call 'bar_matrix.gnu' from gnuplot, you will get the figure at the beginning of this plot. You can then add labels and so on, as you wish. I would also like to point out that human intervention is required only at the very beginning of 'bar_matrix.gnu', where we specify the name of the file that we want to plot. Otherwise, everything is automatic, and will just happen by the stroke of the key. You should also keep in mind that we had a dummy plot, so if you want to plot into a file, you have got to set the terminal after this plot, lest you should not have the dummy plot in your file.

Saturday, 1 August 2009

Pie charts - entirely in gnuplot!!!

Some time ago, I discussed various ways to produce a pie chart in gnuplot. I have looked at the search terms that bring people to my blog, and it seems that this topic is one of the more popular, so I thought that it might be worthwhile to explore the issue more.

Since I am a Linux fan, I am quite fine with calling various scripts from gnuplot: pipes are rather convenient, and I can write a small script which does the data processing. But I see the downside of it, too: these solutions (while you can make it work under windows) will require extra steps, if not run under linux. I also understand that you might not want to delve into the nuances of gawk, for instance. So, I was wondering whether we could do everything in gnuplot, without relying on something external. And the answer is, yes, we can! (Otherwise, I wouldn't be writing this post at all...) What we are going to do is probably the dirtiest of hacks, for we will use a function of gnuplot, which was never intended to be used in such a ramshackle way. But we want to produce graphs, and this is not a coding beauty contest, after all!

If you recall, we had a data file with two columns: one designating years, the other one containing the values belonging to those particular years. Something like this, which we will call pie.dat:
1989 0.1
1990 0.2
1991 0.2
1992 0.05
1993 0.15
1994 0.3


You might also recall that the way in which we produced the pie chart was to plot arcs of a circle, the parameters determined by the second column. Now, the problem was (and this is why we had to use an foreign script) that those parameters cannot be set at run-time, so to speak, we had to hard-wire them into the gnuplot script. So, the question is really how we can access individual values of a data file, say, the 5th number in the 2nd column, and then do this repeatedly. The snag is that gnuplot hasn't a dedicated function to perform this task, so we have to look for a function that is dedicated to something else. Since there are not too many gnuplot functions that operate on a file, we can easily find the one that "returns" a value. We will use the fit function, and use the fitting parameter as a means of returning the sought-after value from the file. I know, I know! This is the ultimate abuse of 'fit', and we shouldn't do this at all! But what the heck! You haven't got to tell anyone how you produced that bloody figure!

Now, we have to find a proper fitting function. Remember, we want to pick the value of a particular number in the second column, say. Well, being a physicist, I would say that the Dirac-delta will do the job. However, we have to soften our stance a bit, and use something that is gnuplot-friendlier. For better or worse, I will choose the following function
f(x,a)=(x>a-0.5?(x<a+0.5?b:0):0)


You can recognise our old friend, the ternary operator, making its appearance twice in the definition of f(x,a). So, we first check, if the value 'x' is larger, than a-0.5. If so, we check whether it is smaller, than a+0.5. If this condition is fulfilled, we assign the value 'b', otherwise 0. By plotting it, you can see it for yourself that this function is a rectangle of height 'b' and width 1, centred on 'a', and 0 everywhere else. 'b' is going to be our fit parameter. Now, if you fit this function as

fit [0:7] f(x,2) 'pie.dat' u 0:2 via b
print b

then 0.2 will be printed. But that is the value of the 3rd number in the second column of pie.dat! (The rows are numbered starting with 0, that is why 2 means the 3rd row.) Since the value 'b' can be used in any subsequent expressions, definitions etc. as a number, we have found a way to extract any single number from any data file.

In order to proceed, we have to find a method to step through the rows of a column, one by one. For this purpose, we will use the reread command of gnuplot. You can learn the basic idea by issuing ?reread and ?if. 'reread' just repeatedly reads the file invoked by the last load command, and we can use 'if' to set some criterion as to how many times this repeated loading should take place. This is a primitive 'for' cycle, but it will do. (In gnuplot 4.3, the option 'for' was introduced in the 'plot' command, but that would not do too much good here, for we still have got to figure out the actual numbers.)

With these in mind, we write the following two scripts, the first of which named pie.gnu
reset
os=1.3
FIT_LIMIT=1e-8
L=6.0
f(x,a)=(x>a-0.5?(x<a+0.5?b:0):0)
r(x)=abs(2*x-0.5); g(x)=sin(x*pi); b(x)=cos(x*pi/2.0)

set view 30, 20
set parametric
set isosample 2, 2
unset border
unset tics
unset key
set ticslevel 0
unset colorbox
set urange [0:1]
set vrange [0:1]
set xrange [-2:2]
set yrange [-2:2]
set zrange [0:3]

A=0.0; D=0.0
set multiplot
# First, we draw the 'box' around the plotting volume
set palette model RGB functions 0.9, 0.9,0.95
splot -2+4*u, -2+4*v, 0 w pm3d
set palette model RGB functions 0.8, 0.8, 0.85

splot cos(u*2*pi)*v, sin(u*2*pi)*v, 0 w pm3d

call 'pie_r.gnu'
unset multiplot


while the second one named 'pie_r.gnu'

unset parametric
b=0.3
set yrange [*:*]
fit [0:L] f(x,D) 'pie.dat' u 0:2 via b
B=b
fit [0:L] f(x,D) 'pie.dat' u 0:1 via b
D=D+1.0
set palette model RGB functions r(D/L), g(D/L), b(D/L)
set parametric
set yrange [-2:2]
set urange [A:A+B]
set label 1 "%g", b at os*cos(2*pi*(A+B/2.0)), os*sin(2*pi*(A+B/2.0)), 0.2 cent
splot cos(u*2*pi)*v, sin(u*2*pi)*v, 0.2 w pm3d
A=A+B
if(D<L) reread


Now, let us see what is happening here. In the first script, 'os' will be used to place the labels later on. The gnuplot variable FIT_LIMIT is set to that value, so that the fit values are more accurate. It might be necessary to change it, if your labels are not what you expect. For the arcs, it should not really matter, because the default value is going to be accurate enough for any practical plots. 'L' is the number of data that we have (note that data are numbered starting with 0). Next, we define 4 functions. The first one is our fit function, the other three are used to colour the arcs. You can change these, if you are not satisfied with the colour scheme that you get. Any three functions will do, which are defined on [0:1], and return with a value in [0:1]. You can read about this in the post in which I discussed phonged surfaces, sometime in late May.

The next couple of lines set up the various ranges of our plot. I wrote about this in my first pie chart post, and the lines giving the background of the plot should also be familiar from that post. 'A' and 'D' are our control variables that we manipulate in 'pie_r.gnu'. We then draw the shadow of the pie, and finally, call 'pie_r.gnu'.

Let us take a look at 'pie_r.gnu'. First, we extract the value in the second column, and then in the first one. We will use this latter one to produce the label. Note that we have got to unset the parametric plot, otherwise, the fitting function would not work. Also note that we re-set the yrange. This is necessary, because the actual plot is in the [-2:2] range, while the fit is on values around 1990. Then we increment the value 'D' (this is the ordinal number of the row that we are currently processing), and re-set our palette, using the three functions, r(x), g(x), and b(x) that we defined in 'pie.gnu'. Then, using the extracted values, we set the range of the parametric plot, and define the label, and plot the arc. Finally, we check, if we have called the script enough times. Loading 'pie.gnu' will produce the following graph:




A couple of comments about this script: as I already mentioned, you can change the colour scheme by using various functions. However, if you are really lazy, you can simply generate three random numbers by

r=rand(0); g=rand(0); b=rand(0)

and assign these values to the next palette.

Second, you can easily implement an "explosion", by shifting one or several of the arcs by shift*cos(2*pi*(A+B/2)) and shift*sin(2*pi*(A+B/2)), based on some condition that you set. It should not be hard, either, to plot a real 3D pie char based on the scheme that I outlined about two months ago.

Third, if the sum of your numbers is not normalised to one, you can easily fix it by adding an extra loop to your script: if you have a script called 'pie_r2.dat', containing

unset parametric
b=0.3
set yrange [*:*]
fit [0:L] f(x,D) 'pie.dat' u 0:2 via b
D=D+1.0
G=G+b
if(D<L) reread


and call this in 'pie.gnu' immediately before 'pie_r.gnu', then 'G' will just be the sum of all numbers in column 2, and you can use this to normalise the numbers when you call 'pie_r.gnu'.

And last, the only thing you need in advance is the number of records you want to process, 'L'. This is the only thing you have to set by hand, all the rest is automatic. If you want to learn how to avoid this small "difficulty", you should read the post on the 2nd of August.

I hope that this script can convince people that we haven't got to rely on any outside tool, and can still produce a decent pie chart in gnuplot.I also feel that this approach is, in some sense, much cleaner, than the one that was based on gawk. Some people will probably contest this statement, but doing everything in gnuplot ensures that we retain the platform independence.