Sunday, 28 June 2009

Bubble graphs with a different method

Yesterday, I showed a simple way to producing bubble graphs with gnuplot. That method relied on manipulating the eps file. Today, we will try another option, this time plotting everything on the appropriate terminal. At the end, we will have the following figure:

First, let us have a look at the script, assuming that we have the following data to plot:
```0 -0.0694726
0.20202 0.233484
0.40404 0.424311
0.606061 0.546688
0.808081 0.580043
1.0101 0.862214
1.21212 0.907601
1.41414 0.759692
1.61616 0.884879
1.81818 0.784566
2.0202 0.71774...```

```reset
f(x) = A*exp(-x*x/B/B)
rx=0.107071; ry=0.057876; A = 1; B = 0.2; C=0.5*rx; D=-0.4*ry
g(u,v) = (2*cos(u)*v*rx+C)*(2*cos(u)*v*rx+C)+(3.5*sin(u)*v*ry+D)*(3.5*sin(u)*v*ry+D)
unset key; unset colorbox; set view map
set xrange [-0.15:5.2]; set yrange [-0.7:0.95]
set parametric; set urange [0:2*pi]; set vrange [0:1]
set isosamples 20, 20; set samples 30
set palette model HSV functions 1, 1-f(gray), 1+2*f(gray)
splot cos(u)*rx*v+0.000000,sin(u)*ry*v+0.000000, g(u,v) w pm3d, \
cos(u)*rx*v+0.202020,sin(u)*ry*v+0.233484, g(u,v) w pm3d, \
cos(u)*rx*v+0.404040,sin(u)*ry*v+0.424311, g(u,v) w pm3d, \
cos(u)*rx*v+0.606061,sin(u)*ry*v+0.546688, g(u,v) w pm3d, \
cos(u)*rx*v+0.808081,sin(u)*ry*v+0.580043, g(u,v) w pm3d, \ ...```

First, we define a Gaussian; this will be the colouring function, and then define a couple of variables that go into that function. Having done this, we define the function that determines the argument of f(x). The next couple of lines simply sets the ranges and the parametric plot with the parametric ranges, and finally, the number of samples. The last thing we have to define is the palette function. We choose red (i.e., the hue is equal to 1), and the saturation and value are given by 1-f(x) and 1+2*f(x), where the argument is the gray value. At this point, we are ready to plot the points in question. We will simply draw circles with origin x,y, where the x and y values are taken from the data file that we showed above. Note that in fact we are drawing ellipses with axes rx and ry. The reason for this is that the aspect ratio of the plot is not equal to 1, i.e., were we to draw circles, they would look ellipses on the plot. The value of rx and ry are determined by the plot ranges xrange and yrange. (We will see this in the gawk script below.) When plotting the points, we have to plot one circle for each data point, i.e., we have to call the plot function many times, while C and D give the centre of the white spot. increasing C or D will push the white points to the edge of the circles.

Now, a few words on the various parameters above. The value A determines how bright the bubble will be at its brightest point. 1 corresponds to white, values smaller than 1 give a darker tinge of red. B determines how tight the white spot is. Obviously, rx and ry are the size, so if you want to have smaller circles, you could scale them accordingly, keeping their ratio.

We could easily write a script that takes a data file with two (or more) columns, and turns it into a gnu script along the lines presented above. A possible implementation in gawk is here.
```#!/bin/bash

gawk  '{
if(\$0!~/#/) {
x[i] = \$1
y[i] = \$2
if(i==0) { mx = x[i]; my = y[i] }
if(i>0)   {
if(max < x[i]) max = x[i]
if(mix > x[i]) mix = x[i]
if(may < y[i]) may = y[i]
if(miy > y[i]) miy = y[i]
}
i++
}
}
END { eps = 0.03
lx = mix-eps*(max-mix)
hx = max+eps*(max-mix)
ly = miy-eps*(may-miy)
hy =  may+eps*(may-miy)
print "reset"
print "f(x) = A*exp(-x*x/B/B)"
printf "rx=%f; ry=%f; A = 1; B = 0.2; C=0.5*rx; D=-0.4*ry\n", 0.02*(hx-lx), 0.035*(hy-ly)
print "g(u,v) = (2*cos(u)*v*rx+C)*(2*cos(u)*v*rx+C)+(3.5*sin(u)*v*ry+D)*(3.5*sin(u)*v*ry+D)"
print "unset key; unset colorbox; set view map"
printf "set xrange [%f:%f]; set yrange [%f:%f]\n", lx, hx, ly, hy
print "set parametric; set urange [0:2*pi]; set vrange [0:1]"
print "set isosamples 20, 20; set samples 30"
print "set palette model HSV functions 1, 1-f(gray), 1+2*f(gray)"
printf "splot "
for(k=0;k<i-1;k++) {
printf "cos(u)*rx*v+%f,sin(u)*ry*v+%f, g(u,v) w pm3d, \\\n", x[k], y[k]
}
printf "cos(u)*rx*v+%f,sin(u)*ry*v+%f, g(u,v) w pm3d\n", x[i-1], y[i-1]
}' \$1```

At the beginning, we fill up the x[] and y[] vectors, while, at the same time, determining the minimum and maximum of these two vectors. Then we use these values to determine xrange and yrange (I defined them a little bit bigger than the minimum and maximum of the vectors, so that the circles are confined in the plot.), and the value of rx and ry, as well. At the end, we simply call the plot function with the arguments that we take from the x[] and y[] vectors.

I believe it should be fairly easy to modify the script, should you want to make some changes to it. Finally, a word of caution: since we use some 900 samples for each circle we plot, this is going to be reflected in the file size, if you use a vector output. As a comparison with the method that I discussed yesterday, the size of that postscript file was something around 20 kB, while the size of the file we would produce with the present method is about 600 kB. We have this big difference simply because yesterday we re-defined one of the symbols, while today we plot each point, without any reference to a particular symbol. Therefore, if you want to include the plot in a publication, it is better to use yesterday's method. For bitmap files, png, jpeg and the like, there should be no significant difference in size.

Saturday, 27 June 2009

Making bubble graphs with gnuplot

Today, I will try to explain how one can make a bubble graph with gnuplot. This is a low-level hack, and is probably not for the faint-hearted. On another occasion, I will discuss how this can be done in a more convenient way, at the price of producing larger files. (Well, this is true, only if one makes postscript figures, but for raster, like png, it shouldn't matter.)

One of gnuplot's most appealing features (at least, for me) is that the postscript file it produces is actually human-readable, and quite instructive. This makes it possible to easily change the plot even when it is in fact finished in the sense that gnuplot has produced it.
So, if you feel a bit adventurous, you could try to see what can be achieved by manipulating the postscript file. Plot the data points with point type 7 (as a matter of fact, it doesn't really matter which style you use, but in what follows, I will assume that we have point type 7), and redirect the output to a postscript file, i.e., you would have a gnu script as in

```reset
unset key
set border back
set terminal postscript eps enhanced
set output 'bubble.eps'
plot 'bubble.dat' u 1:2 w p pt 7 ps 2```

Simple enough, isn't it? (Do not forget to reset the terminal, once you are done with the plot. Otherwise, all your consequent graphs will be written to 'bubble.eps'.)

Now, that we have the postscript file, open it in any text editor, and search for the line
`/CircleF { stroke [] 0 setdash hpt 0 360 arc }`

This should be somewhere around line 161. This is just the definition of point type 7, and this definition is used in the actual plotting of your data, somewhere at the end of the postscript file. We will replace the line above by
```/CircleF { stroke [] 0 setdash 0 0 0 arc
/ColorSpace /DeviceRGB
/Coords [currentpoint exch hpt 1 mul sub exch hpt 0.2 mul add 0 currentpoint 150]
/Function << /FunctionType 2
/Function { 1 }
/Domain [0 1]
/C1 [ 0.8 0 0 ]
/C0 [ 1 1 1 ]
/N 1>> >> shfill} def```

While I am not going to write a tutorial on the postscript language, we could certainly see what this does. What we have to keep in mind is that postscript works on stack variables, i.e., we can access various quantities in a top-to-bottom direction, with the last operand being at the top.

First, we take the current x and y coordinates (this is the first currentpoint instruction), and reverse the order of x and y (exch), subtract the value hpt from x (hpt 1 mul sub), reverse the order again, so we are back to the original x-y order, add 0.2*hpt to y (hpt 0.2 mul add). With this set of operations we defined the centre of the colour of our bubble, which is denoted by the next 0. (So, the centre of colour will be to the left and to the top of the geometric centre. North-west, that is.)

Then we define the origin as currentpoint, and the radius as 150. The next couple of lines define the function type and the domain of this function. /C1 and /C0 determine the colour of the object when the function in question takes on the value of 1 (this will be a bit dark shade of red) and 0 (which will be glowing white). The last line just fills the circle. Now, that we know how this works, we can tweak our figure, if needed. As I said, the size of the bubbles will be determined by the last number on the line beginning with /Coords, and we can move the centre of the colour by changing 0.2 to something else (this would be the y direction), or 1 (x direction). In this way, we can where the light appears to come from. Finally, the colour can be changed by replacing the lines in /C1 and /C0. All in all, we have produced the following figure. While this might appear a bit grainy, this is only an artefact of the postscript viewer. If you print it out, it's going to be all right.

As I promised at the beginning, I will show an easier route to achieving the same result. However, we can already see that since the manipulation of the postscript file doesn't require any prior knowledge about anything, this process can easily be scripted, so that you haven't got to replace the lines by hand every time you want to make bubbles.

Friday, 26 June 2009

Broken axis revisited

In one of my first posts, I described a method by which one can draw broken axes in gnuplot. Now, that method suffers from many problems, most notably that it relied on gnuplot 4.3 (still in development), and one had to do almost everything by hand. Today we will look at a different method, which eliminates these problems, and which requires very little human intervention. So, we would like to have something like this graph:

(This is just a sin function, broken at 4.5, and for the right hand side, displaced by 3). The recipe that we are going to follow takes only a couple of lines. But, let us see the code!
```reset
A=4.5
B=3.0
C=0
D=10
E=1
eps=0.05*E
eps2=0.005*(D-C)
unset key
f(x) = (x<A?1:1/0)
g(x) = (x>A?1:1/0)
h(x) = (x<A?x:x+B)
set xtics 0, 2, A
set xtics add (gprintf("%.0f", 6+B) 6)
set xtics add (gprintf("%.0f", 8+B) 8)
set xtics add (gprintf("%.0f", 10+B) 10)

set xlabel 'Time [s]'
set ylabel 'Position [m]'
set yrange [-E:E]
set arrow 1 from A-eps2, -E to A+eps2, -E nohead lc rgb "#ffffff" front
set arrow 2 from A-eps2, E to A+eps2, E nohead lc rgb "#ffffff" front
set arrow 3 from A-eps-eps2, -E-eps to A+eps-eps2, -E+eps nohead front
set arrow 4 from A-eps+eps2, -E-eps to A+eps+eps2, -E+eps nohead front
set arrow 5 from A-eps-eps2, E-eps to A+eps-eps2, E+eps nohead front
set arrow 6 from A-eps+eps2, E-eps to A+eps+eps2, E+eps nohead front
plot [C:D] f(x)*sin(x) w l lt 1, g(x)*sin(h(x)) w l lt 1```

The first several lines are just definitions: 'A' is the position at which we want to break the line, 'B' is the value by which the right hand side of the graph will be shifted, 'C', 'D' and 'E' are just definitions for the x and y ranges, while 'eps' and 'eps2' will be needed for the drawing of the slanted tics representing the discontinuity in the axes.

The first really important definition is that of f(x), g(x) and h(x), which are helper functions (green lines). In all three cases, we use the ternary operator that I discussed in my previous post. Basically, f(x) returns 1, if x is smaller than 'A', g(x) does just the opposite, while h(x) produces a shift of its argument, if x is larger than 'A'.

Then we set the labels on the x axis (blue line in the code). But watch out: we do it only up to 'A', because the axis is going to be broken at 'A'. In order to make up for the missing tic marks, we add them in the next three lines (red). We can save some work, and can eliminate the possibility of messing something up, if we ask gnuplot to compute the values for us; this is done by the string conversion function gprintf, and adding the converted string.
`set xtics add (gprintf("%.0f", 10+B) 10)`

The last bit is drawing the break points on the axis. In order to do so, we draw 6 arrows with no heads. Note that the first two arrows are white, and they serve as a beauty plaster: we put them on top of the axes, to give the impression that the axes are broken. Also note that since arrows are drawn earlier than axes and borders, we have got to explicitly instruct gnuplot to bring the arrows to the front of the figure. The last four line segments are slightly slanted, to give a more appealing look to the graph. Once we are done with the set up of the figure, we can plot the function. The only thing we have to keep in mind is that the right hand side should be shifted by 'B', which is done by calling h(x) in the argument of sin(x).

The procedure for the y axis is very similar, so I just dodge the discussion of that. I should point out that you can easily change the angle of the last 4 arrows by multiplying the shift of the y coordinates of the end points by something, and it is just as easy to move them farther from each other, if you replace the definition of 'eps2'. Modifying 'eps' will change their length, but not their angle.

Till next time!

Saturday, 20 June 2009

Plotting an inequality in 2D

In one of the comments, someone asked how an inequality can be plotted in gnuplot. Well, there are two solutions (at least) for this problem. One of them is discussed in the following page, if you need only the curve separating the region of the x-y plane that fulfils the inequality from that that doesn't. But the question was how to shade these regions. We will look at the inequality
`x3 - 2xy + y3>0`

The way to do this is to define a function that has a constant value, if the inequality holds true, and has an indefinite value, if this is not the case. The value of 1/0 is quietly ignored in gnuplot, thus we can define our function as follows
`f(x,y)=(x*x*x - 2*x*y + y*y*y>0)?1:1/0`

where we used the ternary operator, which always has the form
`A?B:C`

i.e., returns 'B', if 'A' is true, and 'C' otherwise. (Incidentally, the value 'B' can be a ternary operator in itself, and in this way, multiple conditions can be imposed.) In the function above, f(x,y) = 1, if x3 - 2xy + y3>0, and f(x,y) = 1/0 (undefined), if x3 - 2xy + y3<0. Once we defined our function, we can plot it in the usual way. The complete script and the figure is given below.
```reset
f(x,y)=(x*x*x - 2*x*y + y*y*y>0)?1:1/0
unset colorbox
set isosample 300, 300
set xlabel 'x'
set ylabel 'y'
set sample 300
set pm3d map
splot [-2:2] [-2:2] f(x,y) t 'x^3 - 2xy + y^3>0'```

Stacking shiny histograms

As I promised last week, we will re-visit the problem of the shiny histograms, and we will stack them. After that discussion, it will be plain sailing, because we have everything at our disposal, we have simply got to re-arrange the columns a bit. Obviously, there are two ways of stacking histograms, and I will give a script for both cases. Thus, we will achieve something like either of these plots. (The data file is the same as that in the post on 14th June.)

The difference of these two plots is just that the second one is normalised, so all values are divided by the sum in a particular group.
Now, as for the script generating the first graph, that reads as this:
```reset
f(x, a, b, c) = a*exp(-(x-b)*(x-b)/c/c)
A = 0.6; B = 0.5; C = 0.2; D=0.5
unset key; unset colorbox; unset xtics; set pm3d map
set yrange [0:6.6]
set parametric; set urange [0:1]; set vrange [0:1]
set xrange [0:5.5]
set multiplot
set isosamples 100, 100
set palette model HSV functions 2.0/3.0, 0.4-0.4*gray, (4.0+2.0*gray)/6.0
splot 5.5*u, 6.6*v, 2*u+v w pm3d
set isosamples 2, 2
set palette model HSV function 1.0, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray, A, B, C))/3.0
splot D*u+0.5, 0.1.0*v, u w pm3d,\
D*u+1.5, 1.0*v, u w pm3d,\
D*u+2.5, 2.0*v, u w pm3d,\
D*u+3.5, 1.2*v, u w pm3d,\
D*u+4.5, 1.6*v, u w pm3d
unset ytics; unset ylabel
set palette model HSV function 0.75, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray,A, B, C))/3.0
splot D*u+0.5,1.0+2.0*v,u w pm3d,\
D*u+1.5, 1.0+1.1*v, u w pm3d,\
D*u+2.5, 2.0+1.2*v, u w pm3d,\
D*u+3.5, 1.2+2.4*v, u w pm3d,\
D*u+4.5, 1.6+2.1*v, u w pm3d
set label 1 "First" at 0.5, -1.32 rotate by 45
set label 2 "Second" at 1.5, -1.32 rotate by 45
set label 3 "Third" at 2.5, -1.32 rotate by 45
set label 4 "Fourth" at 3.5, -1.32 rotate by 45
set label 5 "Fifth" at 4.5, -1.32 rotate by 45
set palette model HSV function 0.5, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray,A, B, C))/3.0
splot D*u+0.5, 3.0+1.0*v, u w pm3d,\
D*u+1.5, 2.1+3.2*v, u w pm3d,\
D*u+2.5, 3.2+3.4*v, u w pm3d,\
D*u+3.5, 3.6+1.1*v, u w pm3d,\
D*u+4.5, 3.7+1.5*v, u w pm3d
unset multiplot```

Most of the script has been analysed in the previous post, so I will point out the differences only. If you recall, when we plotted the bargraphs side by side, we had to shift the 'x' position, so that they didn't overlap. Now, since we want to stack them, we have to shift the 'y' position, and in particular, we have to shift it by a value that renders the bars just on top of each other. So, if you look at the three lines highlighted in blue, you will notice that the multiplier of the dummy variable 'v' is always added to the shift of 'v' in the consecutive plot, i.e., in the first plot, the multiplier of 'v' is 1 and the shift is 0, in the second plot, the multiplier of 'v' is 1.1 and the shift is 1, and in the third plot, the multiplier of 'v' is 3.2, and the shift is 2.1, which happens to be the sum of 1 and 1.1. These were the appropriate shift for the second column in the plot. In the rest of the script, we just have to follow the same recipe, and indeed, this is what happens. (I won't demonstrate this, though:-)

Obviously, one would like to do this automatically, and this can be done with the following gawk script:
```#!/bin/bash

gawk  'BEGIN {i=0; max=0}
{
if(\$0!~/#/) {
label[i] = \$1
sum=0
for(j=2;j<=NF;j++) {
v[i,j-1] = \$j
sum+=\$j
}
if(max<sum) max=sum
i++
}
}
END {
print "reset"
print "f(x, a, b, c) = a*exp(-(x-b)*(x-b)/c/c)"
print "A = 0.6; B = 0.5; C = 0.2; D=0.5"
print "unset key; unset colorbox; unset xtics; set pm3d map"
printf "set yrange [0:%f]\n", max
printf "set parametric; set urange [0:1]; set vrange [0:1]\n"
printf "set xrange [0:%f]\n", i+0.5
print "set multiplot"
print "set isosamples 100, 100"
print "set palette model HSV functions 2.0/3.0, 0.4-0.4*gray, (4.0+2.0*gray)/6.0"
printf "splot %f*u, %f*v, 2*u+v w pm3d\n", i+0.5, max
print "set isosamples 2, 2"
for(l=1;l<j-1; l++) {
if(l==j-2) {
for(k=0;k<i;k++) {
printf "set label %d \"%s\" at %f, -%f rotate by 45\n", k+1, label[k], k+0.5, max/5.0
}
}
if(l==2) print "unset ytics; unset ylabel"
printf "set palette model HSV function %f, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray, A, B, C))/3.0\n", 1.0-(l-1)/(j-1)
printf "splot "
for(k=0;k<i-1;k++) {
printf "D*u+%f,%f+%f*v,u w pm3d,\\\n", k+0.5, base[k], v[k,l]
base[k]+=v[k,l]
}
printf "D*u+%f,%f+%f*v,u w pm3d\n", i-0.5, base[i-1], v[i-1,l]
base[i-1]+=v[i-1,l]
}
print "unset multiplot"

}' \$1```

It is very easy to turn this script into one that puts out normalised stacks. All we have to do is to divide the values by the appropriate sum, and re-set the 'y' range to [0:1.1] or something similar, shift the 'y' position of the labels (remember, it was defined as a function of 'max' above, but max is just 1.1 in the present case), and there is nothing else. This difference is highlighted in blue.

```#!/bin/bash

gawk  'BEGIN {i=0; max=0}
{
if(\$0!~/#/) {
label[i] = \$1
sum=0
for(j=2;j<=NF;j++) {
v[i,j-1] = \$j
sum+=\$j
}
if(sum!=0)
for(j=2;j<=NF;j++) {
v[i,j-1] /= sum
}
i++
}
}
END {
print "reset"
print "f(x, a, b, c) = a*exp(-(x-b)*(x-b)/c/c)"
print "A = 0.6; B = 0.5; C = 0.2; D=0.5"
print "unset key; unset colorbox; unset xtics; set pm3d map"
print "set yrange [0:1.1]"
print "set parametric; set urange [0:1]; set vrange [0:1]"
printf "set xrange [0:%f]\n", i+0.5
print "set multiplot"
print "set isosamples 100, 100"
print "set palette model HSV functions 2.0/3.0, 0.4-0.4*gray, (4.0+2.0*gray)/6.0"
printf "splot %f*u, 1.1*v, 2*u+v w pm3d\n", i+0.5
print "set isosamples 2, 2"
for(l=1;l<j-1; l++) {
if(l==j-2) {
for(k=0;k<i;k++) {
printf "set label %d \"%s\" at %f, -0.2 rotate by 45\n", k+1, label[k], k+0.5
}
}
if(l==2) print "unset ytics; unset ylabel"
printf "set palette model HSV function %f, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray, A, B, C))/3.0\n", 1.0-(l-1)/(j-1)
printf "splot "
for(k=0;k<i-1;k++) {
printf "D*u+%f,%f+%f*v,u w pm3d,\\\n", k+0.5, base[k], v[k,l]
base[k]+=v[k,l]
}
printf "D*u+%f,%f+%f*v,u w pm3d\n", i-0.5, base[i-1], v[i-1,l]
base[i-1]+=v[i-1,l]
}
print "unset multiplot"

}' \$1```

Sunday, 14 June 2009

False 3D bargraphs in gnuplot

Yesterday, we saw how cuboids representing bargraphs can easily be drawn in gnuplot. That implementation was completely 3D, so to speak. Today we will try something simpler. The idea is that in order to suggest that something is in 3D, it is not necessary to draw its 2D projection accurately, it is enough, if we can give the impression that the object is a three-dimensional one. Perhaps, the simplest way to do this is to colour the surface properly: by making some parts shiny, we can pretend that light is reflected from a curved surface. This is the trick that we are going to use today. First, we will take an easy example, and then later we can expand it. We will draw the following graph in a couple of lines:

As usual, we give the gnu script first, and then dissect it, step by step.
```reset

f(x, a, b, c) = a*exp(-(x-b)*(x-b)/c/c)
A = 0.6
B = 0.5
C = 0.2

unset key
unset colorbox
unset xtics
set ytics 0,1,4
set parametric
set urange [0:0.5]
set vrange [0:1]

set pm3d map
set xrange [0:10]
set yrange [0:4]

set multiplot
set isosamples 100, 100
set palette model HSV function 2.0/3.0, 0.4-0.4*gray, (4.0+2.0*gray)/6.0
splot 20*u, 4*v, 2*u+v w pm3d

set isosamples 100, 2
set palette model HSV function 0.95, 1-f(gray, A, B, C), (1+2*f(gray, A, B, C))/3
splot u+1,v,u w pm3d,\
u+2,2*v,u w pm3d, \
u+3,3*v,u w pm3d, \
u+4,2*v,u w pm3d, \
u+5,3*v,u w pm3d, \
u+6,3.5*v,u w pm3d, \
u+7,1.4*v,u w pm3d, \
u+8,3*v,u w pm3d, \
u+9,1*v,u w pm3d

unset multiplot
```

First, we define the function according to which we are going to colour the bars. This is done by defining f(x,a,b,c) and the parameters A, B, and C. The the following couple of lines define the graph properties. The first relevant line is in blue, where we draw the background of our graph. We define the palette for pm3d using the HSV colour space. As in RGB, the colour is given as a triplet, the first member of which is the hue (in this case it will be something close to blue), the second is the saturation, which we defined to be whiter, if the value of gray is higher, and finally, the value. We then plot '20*u, 4*v, 2*u+v' on our map. By plotting the value 2*u+v, we will get a gradient that runs from bottom to top, left to right. You can modify this, if you want to change the direction in which the background becomes whiter.

Having drawn the background, we can plot the bars. This is done in the next step, where we always plot something like 'u+shift,value*v,u', where shift is the position of the bar, and value is its height. Note that before calling splot, we re-defined our palette, using the function f(x,a,b,c) from the beginning of the script. 'a' will set how white the colour becomes, 'b' is the position of the highest saturation value (or the centre of the pattern), and 'c' determines how tight the colour gradient is.

Now, let us see how this procedure can be automated. As previously, we will use a gawk script, which processes a data file with N+1 columns, the first setting the label, and the last N containing N values (i.e., instead of on set of bars, we will have N sets bars). Here is the script,

```#!/bin/bash

gawk  'BEGIN {i=0; max=0}
{
if(\$0!~/#/) {
label[i] = \$1
for(j=2;j<=NF;j++) {
v[i,j-1] = \$j
if(max<v[i,j-1]) max=v[i,j-1]
}
i++
}
}
END {
print "reset"
print "f(x, a, b, c) = a*exp(-(x-b)*(x-b)/c/c)"
print "A = 0.6; B = 0.5; C = 0.2"
print "unset key; unset colorbox; unset xtics; set pm3d map"
printf "set yrange [0:%f]\n", max
printf "set parametric; set urange [0:%f]; set vrange [0:1]\n", 1.0/(j-1)
printf "set xrange [0:%d]\n", i+1
print "set multiplot"
print "set isosamples 100, 100"
print "set palette model HSV functions 2.0/3.0, 0.4-0.4*gray, (4.0+2.0*gray)/6.0"
printf "splot %d*u, %f*v, 2*u+v w pm3d\n", (i+1)*(j-1), max
print "set isosamples 100, 2"
for(l=1;l<j-1; l++) {
if(l==j-2) {
for(k=0;k<i;k++) {
printf "set label %d \"%s\" at %f, -%f rotate by 45\n", k+1, label[k], k+0.5, max/5.0
}
}
if(l>1) print "unset ytics; unset ylabel"
printf "set palette model HSV function %f, 1.0-f(gray, A, B, C), (1.0+2.0*f(gray, A, B, C))/3.0\n", 1.0/l
printf "splot "
for(k=0;k<i;k++) {
printf "u+%f,%f*v,u w pm3d,\\\n", k+l/(j-1), v[k,l]
}
printf "u+%f,%f*v,u w pm3d\n", i+l/j, v[i,l]
}
print "unset multiplot"

}' \$1```

and here is the data file that I used to produce the figure below:
```First 1.0 2.0 1.0
Second 1.0 1.1 3.2
Third 2.0 1.2 3.4
Fourth 1.2 2.4 1.1
Fifth 1.6 2.1 1.5```

I believe, it should be fairly easy to hack the script to suit your needs. The points of interest are the colour of the background and the bars, and the position of the labels. Small tweaks in the script will give you just anything you would want. Next time I will come back to this script, and we will see how this can be modified to produce stacked bargraphs. So long!

Saturday, 13 June 2009

3D bargraphs in gnuplot, one more time

Some time ago, on 25th May, we discussed how we can draw bargraphs based on cylinders. What was good about them is that we gave the graph a 3D look by adding phong to the surface. The downside is that cylinders are not the conventional shapes to draw something like this. (As a matter of fact, there is one exception: when one wants to draw a bargraph in 2D, but give it a phong on the 2D surface, which gives the impression that what we are looking at is the projection of a cylinder. I will come to this point on a later date.) Instead, people plot a rectangular cuboid. Today we will try our hands at that. At the end of the day, we will have the following graph:

First, let us see how we create the figure above. The gnu script for drawing one cuboid is as follows:
```set multiplot
set palette defined (0 1 0.5 0.5, 1 1 0.5 0.5)
splot u, v, B w pm3d

set palette defined (0 1 0.3 0.3, 1 1 0.3 0.3)
splot A+u, 0, v w pm3d

set palette defined (0 0.8 0 0, 1 0.8 0 0)
splot A+A, u, B*v w pm3d

unset multiplot```

First, we define our palette. This should have only one colour per face, because we do not want to add any gradient, phong etc. (If you insist, you can add that very easily.) Then we plot the top of the cuboid, at a z-level of B, and shifted horizontally by A. In the next step, we re-define our palette, which will now have a slightly darker shade of red, and plot the front surface. In the final step, we move to the third visible face, make it darker, and plot it. In this way, we have drawn one cuboid. The rest is nothing but the repetition of these three steps, and the general set-up of the graph. Obviously, by changing the order of the palette definitions, one can change the direction of the lighting.

Instead of copying the complete gnu script here, I will just give the gawk script, which processes a file of 2 columns, the first being the label of the bars, and the second containing the actual values. As usual, you can call the script from gnuplot without writing the file to disc as
`load "< gawk-script.sh my_data_file.dat"`

So, this is what we have:
```#!/bin/bash

gawk  'BEGIN {i=0; max=0}
{
if(\$0!~/#/) {
label[i] = \$1
v[i] = \$2
if(max<v[i]) max=v[i];
i++
}
}
END {
print "reset"
print "set view 60, 20; set parametric; set isosample 2, 2"
print "unset key; unset colorbox"
print "set ticslevel 0"
print "set urange [0:0.5]; set vrange [0:1]"
printf "set xrange [0:%d]; set yrange [-%f:%f]; set zrange [0:%f]\n", i, i/2, i/2, max
print "set multiplot"
print "set border 1+2+4+8+16+32+64+256+512; unset ytics; unset xtics"
print "set palette model RGB functions 0.9, 0.9,0.95"
printf "splot %d*u, -%f+%d*v, 0 w pm3d\n", 2*i, i/2, i
print "unset border; unset xtics; unset ytics; unset ztics"
print "set palette model RGB functions 1, 254.0/255.0, 189.0/255.0"
printf "splot 0, -%f+%d*v, %f*u w pm3d, %d*u, %f, %f*v w pm3d\n", i/2, i, 2*max, 2*i, i/2, max

print "set palette defined (0 1 0.5 0.5, 1 1 0.5 0.5)"
printf "splot "
for(j=0;j<i-1;j++) {
printf "%d+u, v, %f w pm3d,\\\n", j, v[j]
}
if(i>1) printf "%d+u, v, %f w pm3d\n", i-1, v[i-1]

print "set palette defined (0 1 0.3 0.3, 1 1 0.3 0.3)"
printf "splot "
for(j=0;j<i-1;j++) {
printf "%d+0.5, 2*u, v*%f w pm3d,\\\n", j, v[j]
}
if(i>1) printf "%d+0.5, 2*u, v*%f w pm3d\n", i-1, v[i-1]

print "set palette defined (0 0.8 0 0, 1 0.8 0 0)"
for(j=0;j<i;j++) {
printf "set label %d \"%s\" at %f, %f rotate by 40\n", j+1, label[j], j+0.3, -i/2+1
}
printf "splot "
for(j=0;j<i-1;j++) {
printf "%d+u, 0, v*%f w pm3d,\\\n", j, v[j]
}

if(i>1) printf "%d+u, 0, v*%f w pm3d\n", i-1, v[i-1]

print "unset multiplot"

}' \$1```

The generalisation to more than 1 data column should be trivial: you have only got to choose your colours, and re-size the bars in the x-y directions accordingly.

Sunday, 7 June 2009

Wall charts with gnuplot

Today we will discuss how to make a wall chart with gnuplot. There is a simpler version of this, the so-called fence plot, that you can find on gnuplot's surface demo page, and we will build on this. By the end of our adventure, we will have produced this graph:

So, let us see what is to be done. First, I will just give the gnu script, and then discuss segments of it.
```reset

A = 4
B = 3
C = 1
unset key
unset colorbox
f(x,y) = abs(sin(x*x+y*y)/(x*x+y*y)) # We plot this function
set isosample 2, 30
set parametric
set urange [-B:B]
set vrange [0:1]
set xrange [-A:A]
set yrange [-B:B]
set zrange [0:C]
set ticslevel 0 # We need this, in order to shift the 0 of the z axis

dx = 0.3 # Thickness of the walls

set multiplot
set border 1+2+4+8+16+32+64+256+512
# Plot the "background" first
set palette model RGB functions 0.9, 0.9,0.95
splot -A+2*A*v, B*u,0 w pm3d

unset border; unset xtics; unset ytics; unset ztics
set palette model RGB functions 1, 254.0/255.0, 189.0/255.0
splot -A, B*u, C*v w pm3d, A*u, B, C*v w pm3d

x0 = -A+1
y0 = 0.0
set palette model RGB functions 0, gray/2+0.5, 0.0
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions gray/2+0.5, 0.0, 0.0
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions 0, 0, gray/2+0.5
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions gray/2+0.5, gray/2+0.5, 0
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions gray/2+0.5, 0, gray/2+0.5
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions 0, gray/2+0.5, gray/2+0.5
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2

x0 = x0+1
y0 = y0+0.5
set palette model RGB functions gray/3+0.3, 0, 0
splot x0-dx, u, f(u,y0) w l lt -1 lw 0.2, \
x0-dx*v, u, f(u,y0) w pm3d,\
x0, u, f(u,y0)*v with pm3d, \
x0, u, f(u,y0) w l lt -1 lw 0.2
unset multiplot```

In the first part, we set up the plotting ranges and the function that we want to plot. Note that in order to shift the 0 of the z axis, we have explicitly got to call 'set ticslevel 0'. We can also notice that in the graph above, the top surface is invariant under translations along the x axis, therefore, we can reduce the sampling frequency in that direction. This shouldn't matter, if you make a bitmap file, but it is going to save you a lot of space, if you produce a vector output, e.g., eps or svg.

Having finished the set-up, we plot the background, i.e., we will just plot three planes, the x-y (this one in gray), the y-z, and the x-z. These latter ones are in yellow. We set the colours in the colour scheme of the appropriate plots, by calling 'set palette model RGB functions 0.9, 0.9,0.95', which produces a shade of gray with a bluish tinge. Also note that we unset the border, and the tics immediately after the first plot. The reason for this is that in this way, we can avoid having gnuplot plot the axes and the tics more than once. By the way, before the first plot, we set the border as 'set border 1+2+4+8+16+32+64+256+512', which draws the frame along all, but the three front edges. You can learn how to set the border by reading the output of the command
`?border`

After having plotted the 'background' of the graph, we plot the first wall. In order to give it some 3D lookout, we have got to plot actually 4 different things: the vertical wall facing us, the top of the body, and the two relevant edges. The first two plots are surfaces, while the latter two are lines, so we call appropriate modifiers to splot, namely, 'with pm3d' and 'with lines'. We have also got to specify the colour of the walls, which we did by issuing 'set palette model RGB functions 0, gray/2+0.5, 0.0' before the first plot. The rest of the script is nothing but the repetition of these seven lines. Letting x0 = x0 + 1, we move the walls along the x axis, while letting y0 = y0 + 0.5, we plot a new function each time. Since we want each wall to have a different colour, we also need to change the palette in each step.

There are two remarks that we can make. The first is that the order of plotting the wall, the top and the two edges depends on the view, so don't be surprised, if you rotate the figure and all of a sudden, it looks messy! The second comment is that you can easily 'close' the graphs, by adding one parametric plot to each of the walls, which would be something like this:
`x0-dx*v, -B, f(x0,-B)*(B+u)/(2*B) w pm3d`

This is it for today!

Thursday, 4 June 2009

A comment on phonged surfaces in Gnuplot - how can the colour space be expanded?

A couple of days ago (on the 24th of May, to be more specific), we discussed a way to add phong to a plot. However, that method worked only for a single colour (red in that particular case), but it would not do any good for a palette. Palette is the name of the colour space that is used in pm3d to colour surfaces. At the end of the post, I also showed a phonged surface with a general surface, and I must say that that wasn't very appealing. The problem is, if you take a look at this graph

you will immediately notice that apart from the phong, not too much can be seen, for the simple reason that the surface is not coloured, so it is very hard to get a feeling of the heights and depths. Using pm3d, we can colour the surface, but we can't add phong. So, the question is, how do we combine the figure above with this one:

While it appears to be trivial, it turns out to be rather hard. But will manage, don't worry, and at the end of the day, we will have figures that look like these:

Understanding the colouring scheme

To begin, we have got to understand how the colouring in pm3d is done. When you plot without specifying the colour scheme, you are given the default, which looks like this (look at the colour box at the bottom, don't worry about the curves)

If you dig a bit deeper, e.g., by issuing the help command
`?rgbformulae`

you will be presented with cryptic messages, like
```Some nice schemes in RGB color space
3,11,6   ... green-red-violet
23,28,3  ... ocean (green-blue-white); try also all other permutations
21,22,23 ... hot (black-red-yellow-white)
30,31,32 ... color printable on gray (black-blue-violet-yellow-white)
33,13,10 ... rainbow (blue-green-yellow-red)
34,35,36 ... AFM hot (black-red-yellow-white)```

So, what does this all mean?

When invoking pm3d, first, the colour range (cbrange) is calculated (or specified), and then, using the colour range, all values of the function z(x,y) are mapped to the [0:1] interval. Using this value, which is called gray, the three components of the colour palette are calculated through three functions, red(gray), green(gray) and blue(gray). When you invoke the command
`set palette rgbformulae 7, 5, 15`

(This is the default, so you actually haven't got to invoke this command...), you assign function 7 to red(gray), function 5 to green(gray) and function 15 to blue(gray). But then, how on Earth does one figure out which function to use? Well, the easy solution is to issue
`show palette rgbformulae`

upon which, this list is shown:
```* there are 37 available rgb color mapping formulae:
0: 0               1: 0.5             2: 1
3: x               4: x^2             5: x^3
6: x^4             7: sqrt(x)         8: sqrt(sqrt(x))
9: sin(90x)       10: cos(90x)       11: |x-0.5|
12: (2x-1)^2       13: sin(180x)      14: |cos(180x)|
15: sin(360x)      16: cos(360x)      17: |sin(360x)|
18: |cos(360x)|    19: |sin(720x)|    20: |cos(720x)|
21: 3x             22: 3x-1           23: 3x-2
24: |3x-1|         25: |3x-2|         26: (3x-1)/2
27: (3x-2)/2       28: |(3x-1)/2|     29: |(3x-2)/2|
30: x/0.32-0.78125 31: 2*x-0.84       32: 4x;1;-2x+1.84;x/0.08-11.5
33: |2*x - 0.5|    34: 2*x            35: 2*x - 0.5
36: 2*x - 1
* negative numbers mean inverted=negative colour component
* thus the ranges in `set pm3d rgbformulae' are -36..36```

So, with these in mind, the default pm3d colouring scheme is sqrt(gray) for red, x^3 for green and sin(360x) for blue. These three functions are shown in the previous figure.

So far, so good, but how do we turn all this into a phong on a general surface? If you recall from the post on phonged surfaces, what we did was to make the colour whiter when the value of the Gaussian function was high, and redder, when it was small. This worked very well for a single colour, because in that case, we simply fixed the red function at 1, and changed the value of the green and blue components according to the value of the Gaussian. But if we want to retain the pm3d colouring and apply phong at the same time, we have a bit of a problem here, because all of a sudden, the colour of a point on the surface will depend on two quantities: the value if z(x,y), and the value of f(x,y,z), which is our Gaussian function. However, the colour functions take only one argument, and nothing more. How will we parametrize a 2D space with only one variable? Well, we will discretise it in a smart way. Let us suppose, that we want to have only 3 white levels, and let us suppose, for a moment, that we are not restricted to the [0:1] interval for gray, but it can run between 0 and 4. Then, using only the red function for now, we will take the following scheme:
```red(x) = sqrt(x)*3/3 + 0, if x is in [0:1]
red(x) = sqrt(x-1)*2/3 + 1/3, if x is in [1:2]
red(x) = sqrt(x-2)/3 + 2/3, if x is in [2:3]
red(x) = sqrt(x-3)*0 + 3/3, if x is in [3:4]```

So, what happens here is that the red component has the same functional behaviour (namely, sqrt(x)) in all four intervals, but the base value is shifted upwards (to the white), while the amplitude is decreased in such a way that red(x) is always in the [0:1] interval. Obviously, we can play the same trick with the green and blue components. What we should notice is that in each interval, the argument of the original function is the fractional part of x, and that the coefficient of sqrt(x) plus the base line adds up to one. Therefore, we can conveniently write
`red(x) = RED(frac(x))*(1-floor(x)/N) + floor(x)/N`

where frac(x) is just the fractional part of x,
`frac(x) = x - floor(x)`

and RED(x) is the original colouring function (sqrt(x) in the example above). N was the number of white levels (3 above) that we wanted to have. Now we have almost everything at our disposal in order to add phong to an arbitrary surface. The last thing to do is to make sure that the argument of red(x) etc. falls into the [0:1] interval. We achieve this buy expanding the colour range with respect to the zrange (which is the interval of possible z values). With these in mind, the gnu file would read as follows:

```reset

unset colorbox
unset key
unset border
unset tics
f(x,y) = sin(x*x+y*y)/(x*x+y*y)

A = 100.0
zi = -0.4
za = 1.0
ci = zi
ca = zi + (za-zi)*A

set isosample 150, 150
set cbrange [ci:ca]
set xrange [-3:3]
set yrange [-3:3]
set zrange [zi:za]

set table 'phong2.dat'
splot f(x,y)
unset table

# These are the functions of the original colour palette
r(x) = sqrt(x)
g(x) = x**3.0
b(x) = q(x<0.5?sin(x*2.0*pi):0.0)

# The Gaussian phong
h(x,a,b) = exp(-(x-a)*(x-a)/b/b)

# These are helper functions
R(x) = A*x - A*floor(x) # Fractional part of x
L(x) = floor(x)         # This is just a short-hand for the floor function
F(x) = L(A*x)/(ca-ci)   # The floor function divided by the colour range

# This is where we want to place the white spot
x0 = -0.3
y0 = -1.0
z0 = f(x0, y0)

# And these are the new colour scheme
red(gray) = r(R(gray))*(1.0-F(gray))+F(gray)
green(gray) = green(R(gray))*(1.0-F(gray))+F(gray)
blue(gray) = blue(R(gray))*(1.0-F(gray))+F(gray)

set palette model RGB functions red(gray), green(grey), blue(gray)
splot 'phong2.dat' u 1:2:3:((A-1)*(za-zi)*h(\$3,z0,0.08)*h(\$1,x0,0.3)*h(\$2,y0,0.3)+\$3)) w pm3d ```

That is, first we decide that we want to have 100 white levels, this is A = 100. Then we fix the zrange according to the function that we want to plot, and after that, we specify the colour range. Here we shouldn't forget that have got to stretch the colour range to 100 times the zrange. Next, we identify the functions that we will need. For a start, we begin with the default pm3d scheme, whose functions we have already discussed above. Then a couple of helper functions follows, with the definition of the new colour scheme. Immediately before plotting the surface, we set the palette, and then we call the file that we produced a couple of lines earlier. We plot the surface using the x, y, z values, and for the colouring, we stretch the range of gray as we discussed above: the value of z(x,y) (this is the 3rd column, \$3) will determine the base colour, and the value of the Gaussian will determine how much white tinge this base colour will have.

Well, there is not too much more to it. Once you decided your colour scheme and the number of white levels, the rest should be plain sailing.