Friday, 26 February 2010

Phong on histograms with a one-liner (almost:-)

We have seen in the last couple of posts that with the new concept of functions, quite a few interesting effects can be achieved. Today I would like to show a trick that solves a problem that I discussed some time ago, when we made shiny histograms using a for loop in gnuplot. We will do the same thing here, but in two lines only. It is quick, and the results are just as good as in that case.

So, here is my data file
1
3
4
2
3
5
2
which I will just name as 'bar.dat', and here is our script
reset
unset key
set style fill solid 1.0
set yrange [0:6]

colour = "#080000"
f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
w(n) = 0.8*cos(n/230.0*pi/2.0)

plot for [n=1:230:2] 'bar.dat' u 0:(f($1,n)):(w(n)) with boxes lc rgbcolor colour
Simple enough, let us see what it does! The first four lines are just the usual settings, although, the yrange is really irrelevant. I set it only for aesthetic reasons (otherwise, gnuplot would set the yrange automatically to [1:5] for the data file above, and we wouldn't see one of the columns). Then we define a variable called 'colour'm which we will eventually overwrite in our function definition of f(x,n). f(x,n) returns x, thus, in this regard it would be absolutely useless, but when doing so, it actually prints a string to 'colour'. The next function is w(n), which will determine in what fashion our colour will converge to white.
Finally, we plot the data file some 115 times, each time with a smaller, and shinier box. At the end, we get something like this

We can very easily change the direction of the light. All we have to do is define a new function that shifts the bars as we progress with our for loop. So, the new script could be something like this
reset
unset key
set style fill solid 1.0
set yrange [0:6]

colour = "#080000"
f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
w(n) = 0.8*cos(n/230.0*pi/2.0)
shift(x,n) = x-0.8*n/850.0

plot for [n=1:230:2] 'bar.dat' u (shift($0,n)):(f($1,n)):(w(n)) with boxes lc rgbcolor colour
with a result as in this graph
It should be really easy to modify the script to accommodate more data sets. Well, this is for now. I don't actually know what I will write about next time, but I am sure that there will be something!
Cheers,
Zoltán

Thursday, 25 February 2010

Parametric plot from a file II.

As I promised yesterday, we will take a closer look at the pie chart, once more, and see how we can utilise what we have learnt recently. I should point out here, that this is not the only way of plotting a pie from a file. If you feel like building your gnuplot from source, you can check out either the CVS tree, or the patch tracker, where you can find a patch that makes it possible to plot slices. You can see a demo here. But we will try a different route here.

First, here is our data file (it could be anything, really)
1 1 Dolphins
2 1 Whales
2 0 Sharks
3 0 Penguins
4 1 Kiwis
5 0 Tux

and here is our script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; 
eval(PLOT)

There is really nothing that we haven't discussed before: we set a couple of things at the beginning, but most importantly, the macro, and sum, ssum, and n. Then we define a string, PLOT, and two functions. One is to sum the values in our file (we need this, so that we can scale the full range of angles to two pi), and another one, that writes our PLOT command for later use. Note that the first plot in PLOT is actually empty, we plot 1/0. This seems a bit silly, doesn't it? Well, it does, but there is a good reason: successive plots must be separated by a comma, and if we have an empty plot at the very beginning, then we can put the commas before the plots, not after, and in this way, we needn't keep track of which plot we are actually processing. Remember, yesterday we used a separate counter, and an if statement, to determine, whether we need the comma, or not. This we can avoid here.
Next we call the two dummy plots, and finally, we evaluate our PLOT string. Oh, no! At the very end, we marvel in awe at the figure that we produced.

So far, so good, but what if we wanted to add labels, e.g., the value of the slice? That is really easy. All we have to do is to define a function that produces the label. Here is our updated script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"
LABEL = ""

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, sprintf("%2.f", x)), \
sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; set border 0; unset tics; unset colorbox;
set size 0.6,1
eval(LABEL)
eval(PLOT)
We have an addition string, LABEL, which we initialise with the value "". Then we define a function that prints "set label ..." with the proper positions, and finally, we insert this function in the definition of f(x). Of course, once we called f(x) in the plot, we have to evaluate the string LABEL. So, this is what we get

This script can trivially be modified to print strings that are stored in our file. Watch just the following two lines
...
lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3)), \
sum = sum+x, n = n + 1, x)
...
and we are done. Here is the new pie

Now, you might wonder why we had three columns in our data file, if we didn't want to use it. Well, perhaps, we wanted, just haven't got time till now. So, what could we do with those ones and zeros in the second column? We will make the pie explode! It is really simple, we have to modify two lines in our last script

...
g(x,y,n,dx,dy) = \
sprintf(", \
%.2f+u*cos((%.2f+%.2f*v)/ssum), \
%.2f+u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", 0.2*dx, x, y, 0.2*dy, x, y, n)

lab(alpha, x, r) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, (1.25+0.2*r)*cos(alpha), (1.25+0.2*r)*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n, \
$2*cos(2*pi*sum/ssum+pi*x/ssum), \
$2*sin(2*pi*sum/ssum+pi*x/ssum)), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3), $2), \
sum = sum+x, n = n + 1, x)
...
And here is the pie, when exploded

I should mention here that if you are not happy with the colours, it is really easy to help it: all we have to do is to modify the colour palette, using whatever colour combinations. We have covered a lot of material today. Till next time,
Gnuplotter

Wednesday, 24 February 2010

Plotting in 6 dimensions - parametric plot from a file

Today I would like to touch on a vast subject, so prepare for a long post. However, I hope that the post will be worthwhile, for I want to discuss something that cannot be done in any other way. In due course, we will see how we can use gnuplot to create parametric plots from a file. What I mean by that is the following: if you want to plot, say, 10 similar objects, whose size is determined by the first column in a file. Of course, there are cases, when one can manipulate the size, e.g., if there is a pre-defined symbol, we can use one of the columns in a file to determine the size of the symbol. As an example, we can do this
plot 'foo' u 1:2:3 with point pt 6 ps var
which will draw circles whose radius is given by the third column in 'foo'. This is all well, but it works for a limited number of cases only, namely, when there is a symbol to start out with. But what happens, if we want to draw an object that is not a symbol, e.g., arcs of a circle, whose angle is given by one of the columns in a file, or cylinders, whose height is a variable, read from a file. As you can guess from these two suggestions, what we will do is to draw a pie, and a bar chart. I understand that we have done this a couple of times before, but this time, we will stay entirely in the realm of gnuplot, and the scripts are really short. We just have to figure out what to write in the scripts. But beyond this, I will also show how we can plot in 6 dimensions. We will plot ellipses on a plane (first 2 columns), whose two axes are given by the 3rd and 4th axis, the orientation by the 5th, and the colour by the 6th. If you are really pressed for it, you can add three more dimensions: if you draw ellipsoids in 3D, take all three axes from a file, and also the orientation, that would make 9 dimensions altogether. Quite a lot!

So, let us get down to business! The first thing that I would like to discuss is the evaluate command. This is a really nifty way of shortening repetitive commands. Let us suppose that we want to place 10 arrows on our graph, and only the first coordinate of the arrows changes, otherwise everything is the same. Setting one arrow would read as follows
set arrow from 0, 0 to 1, 1
Of course, there are quite a few settings that we could specify, but this was supposed to be a minimal example. Then, the next arrow should be
set arrow from 1, 0 to 2, 1
and so on. What if we do not want to write this line a thousand times, and we do not want to search for the coordinate that we are to change, the first one, in this case? We could try the following
a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
This function takes 'x', and returns a string with all the settings and coordinates. So, we are almost done. The only thing we should do is to make gnuplot understand that what we want it to treat a(x) as a command, not as a string. Enter the eval command: it takes whatever string is presented to it, and turns it into a command. Thus, the following script creates 5 arrows, all parallel to each other, and consecutively shifted to the rigth
a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
eval a(0)
eval a(1)
eval a(2)
eval a(3)
eval a(4)
I believe, this is a much simpler and cleaner procedure, than this
set arrow from 0, 0 to 1, 1
set arrow from 1, 0 to 2, 1
set arrow from 2, 0 to 3, 1
set arrow from 3, 0 to 4, 1
set arrow from 4, 0 to 5, 1
I should mention here that if chunks of a command are the same, another method of abbreviating them is to use macros. Those are disabled by default, so first we have to set it. Then it works as follows
set macro
ST = "using 1:2 with lines lt 3 lw 3"
plot 'foo' @ST, 'bar' @ST 
i.e., the term @ST is expanded using the definition above, therefore, this plot is equivalent to this one
plot 'foo' using 1:2 with lines lt 3 lw 3, 'bar' using 1:2 with lines lt 3 lw 3
but the previous one is much more readable. I would also say that using capitals for the macros is probably not a bad idea, because then they cannot be mistaken for standard gnuplot commands. This much in the way of macros!

So, we have the evaluate command, and we have a new concept for functions. Then let us take a closer look at the following code
a(x) = sprintf("set arrow from %d, 0 to %d, 1;\n", x, x+1)
ARROW = ""
f(x) = (ARROW = ARROW.a(x), x)
plot 'foo' using 1:(f($1))
and let us suppose that our file 'foo' contains the following 5 lines
1
3
5
7
9
After plotting 'foo', the string 'ARROW' will be the following
set arrow from 1, 0 to 2, 1;
set arrow from 3, 0 to 4, 1;
set arrow from 5, 0 to 6, 1;
set arrow from 7, 0 to 8, 1;
set arrow from 9, 0 to 10, 1;
I.e., we have a string, which contains instructions for setting 5 arrow. If, at this point, we simply evaluate this string, all 5 arrows will be set. Therefore, we have found a way of using a file to set the coordinates of an arrow. (N.B., if it was for the arrows only, we wouldn't have had to do anything, since there is a plotting style, 'with vector', as we discussed some weeks ago.)

We will use this trick to create a parametric plot, taking parameter values from a file, first plotting the ellipses! Again, we have got to create some dummy data, and since we now need 6 columns, we will use the errorbars

reset
f(x) = rand(0)
set sample 50
set table 'ellipse.dat'
plot [0:10] '+' using (20*f($1)):(20*f($1)):(f($1)):(f($1)):(3.14*f($1)):(f($1)) w xyerror
unset table
which will produce 6 columns and 50 lines. Having produced some data, let us see what we can do with it. Here is our script:

PRINT(x, y, a, b, alpha, colour) = \
sprintf("%f+v*(%f*cos(u)*cos(%f)-%f*sin(u)*sin(%f)),
%f+v*(%f*cos(u)*sin(%f)+%f*sin(u)*cos(%f)),
%f with pm3d", x, a, alpha, b, alpha, y, b, alpha, b, alpha, colour)
PLOT = "splot "
num = -1
count(x) = (num = num+1, 1)
g(x) = (PLOT = PLOT.PRINT($1, $2, $3, $4, $5, $6), \
($0 < num ? PLOT=PLOT.sprintf(",\n") : 1/0))
plot 'ellipse.dat' u 1:(count($1))
plot 'ellipse.dat' using 1:(g($1))

unset key
set parametric
set urange [0:2*pi]
set vrange [0:1]
set pm3d map
set size 0.5, 1
eval(PLOT)

First, we have the definition of a print function that looks rather ugly, but is quite simple. We want to plot
a*v*cos(u), b*v*sin(u), colour
where a, and b are the axes of the ellipse, and colour is going to specify, well, its colour. However, we want to translate the ellipse to its proper position, and we also want to rotate it by an amount given by the 5th column, so we have to apply a two-dimensional rotation on the object. Therefore, we would end up with a function similar to this
x+v*(a*cos(u)*cos(alpha)-b*sin(u)*sin(alpha)), y + v*(a*cos(u)*sin(alpha)+b*sin(u)*cos(alpha)), colour
Now you know why that print function looked so complicated! After this, we define a string, PLOT, that we will expand as we read the file. But before that, we have to count the lines in the file. The reason for that is that successive plots must be separated by a comma, but there shouldn't be a comma after the last plot. So, we just have to know where to stop placing commas in our string. Then we define the function that does nothing useful, but concatenates the PLOT string as it reads the file. Here we use the number of lines that we determine in a dummy plot. At this point we are done with the functions, all we have to do is plotting.
First we count, then plot g(x). At this point, we have the string that we need. We only have to set up our plot. Remember, we have a parametric plot, where the range of one of the variables is in [0:2*pi], while the other one is in [0:1]. Easy. Then we just have to evaluate our plot string, and we are done. Look what we have made here: a six-dimensional plot!


I think that this script is much less complicated, than many that we have discussed in the past. Short and clear, thanks to the eval command, and the new concept of functions. Besides, we pulled off a trick that was impossible by other means. I started out saying that we will create bars and pie. I believe, having seen the trick, it should be quite simple now, but in case you insist on seeing it, I will discuss it in my next post.

Wednesday, 10 February 2010

The map, the inline function, and the macro

Some time ago, on the 26th of July, to be more accurate, I showed how somewhat decent-looking maps can be created with gnuplot. With the wisdom of hindsight, that was a rather ugly hack, I must say. Even worse, it seems that it is not quite fail-safe. At least, I have obtained reports complaining about it. Could we, then, do better? Could we, perhaps, throw out that disgusting gawk script, with all the hassle that comes with it? Could we, possibly, manage the whole affair in gnuplot? Sure, we could. And here is how, just keep reading!

On the 17th of January, we saw that in the new version of gnuplot, functions take on a funny property, namely, they can contain algebraic statements not related to the return value. We also saw that this feature can be used to perform searches of some sort: as we "plot" a file, and step through the numbers in the file, we can assign values to variables, provided that some conditions are fulfilled. It is easy to see that in this way, we can determine the minimum or the maximum of a data file, e.g. But we can do much more than that.

We should also recall from that old post on the map what the contour file looks like. In case you have forgotten, here is a small section of it
# Contour 0, label:        2
-0.391812  3.63636  2
...
-0.959596  3.50978  2

-0.959596  3.50978  2
...
-0.391812  3.63636  2


# Contour 1, label:      1.5
-1.20098  4.51515  1.5
-1.16162  4.54423  1.5
-1.15982  4.54545  1.5
...

What we have to realise is the following: first, contours lines belonging to the same level are not necessarily contiguous (this is quite obvious, for there is no reason why they should be), and if there is a discontinuity, it manifests itself in a single blank line in the contour file, and second, contour lines belonging to different levels are separated by two blank lines. So, in the data file above, there is a blank between the lines -0.959596 3.50978 2, and
-0.959596 3.50978 2, and there are two blanks between -0.391812 3.63636 2, and # Contour 1, label: 1.5. By the way, the third column is the value of that particular contour line.

This observation has at least one important consequence: we can decide which contour line we want to plot, simply by using the index keyword. You might recall, that indexing the data file pulls out one data block, which is defined by a chunk of data flanked by two blank lines.

Now, what about the labels, and the white space that they need? Well, the white space is quite easy: what we will plot is not the contour line, but a function, which returns an undefined value at the place of the white space, e.g., this one (whatever eps and xtoy mean)
f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
Normally we would plot the contour lines as
plot 'contour.dat' using 1:2 with lines
but instead of this, now we will use this
plot 'contour.dat' using 1:(f($1,$2)) with lines
This will leave out those points which are too close to (x0, y0). And the labels? Well, that is not difficult either. Take this function
lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")
and this plot
plot 'contour.dat' using 1:2:(lab($1,$2)) with labels
This will put the labels at (x0, y0), and even better, we haven't got to set the labels by hand, they are taken from the data file.

So, we have seen how we can plot the contour, leave out some white space, and then put a label at that position. The only remaining question is how we determine where the label should be. And this is where we come back to our inline functions. For the sake of example, let us take this function and the accompanying plot

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
plot 'contour.dat' using 1:(g($1,$2))
What on Earth does this plot do? The plot itself does absolutely nothing: it is always 1/0. However, while we are doing this, we set the value of x0, and y0, if the two arguments are not too close to the edge of the plot. This latter condition is needed, otherwise labels could fall on the border, which doesn't look particularly nice.

By now, we have all the bits and pieces, we have only got to put them together. Let us get down to business, then!

I will split the script into two: the first produces the dummy data, while the second does the actual plotting. So, first, the data production.
reset
filename = "cont.dat"
xi = -5; xa = 0; yi = 2; ya = 5;
xl = xi + 0.1*(xa - xi); xh = xa - 0.1*(xa-xi);
yl = yi + 0.1*(ya - yi); yh = ya - 0.1*(ya-yi);
xtoy = (xa-xi) / (ya-yi)
set xrange [xi:xa]
set yrange [yi:ya]
set isosample 100, 100
set table 'test.dat'
splot sin(1.3*x)*cos(.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
unset table
set cont base
set cntrparam level incremental -3, 0.5, 3
unset surf
set table filename
splot sin(1.3*x)*cos(0.9*y)+cos(.8*x)*sin(1.9*y)+cos(y*.2*x)
unset table

What we should pay attention to here is the definition of a handful of variables at the very beginning. Some are already obvious, like xi, xa and the like, and some will become clear in the second part. Now, the plotting takes place here

reset
unset key
set macro
set xrange [xi:xa]
set yrange [yi:ya]

set tics out nomirror
set palette rgbformulae 33,13,10
eps = 0.05

g(x,y)=(((x > xl && x < xh && y > yl && y < yh) ? (x0 = x, y0 = y) : 1), 1/0)
f(x,y) = ((x-x0)*(x-x0)+(y-y0)*(y-y0)*xtoy*xtoy > eps ? y : 1/0)
lab(x,y) = ( (x == x0 && y == y0) ? stringcolumn(3) : "")

ZERO = "x0 = xi - (xa-xi), y0 = yi - (ya-yi), b = b+1"
SEARCH = "filename index b using 1:(g($1,$2))"
PLOT = "filename index b using 1:(f($1,$2)) with lines lt -1 lw 1"
LABEL = "filename index b using 1:2:(lab($1,$2)) with labels"

b = 0
plot 'test.dat' with image, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL, @ZERO, \
@SEARCH, @PLOT, @LABEL

A bit convoluted, isn't it? OK, we will walk through the script, line by line.

First is the range setting, and then ticks go to the outside, just for aesthetic reasons. We also define eps here, which determines how much "white space" we have for the labels. Then we define the three functions that we discussed above. We have already seen eps, and the meaning of it, but what about xtoy? Despite its name, this is not something to play with, rather the ratio of x to y, or more precisely, the ratio of the xrange to the yrange. This is needed, if the two ranges are of different order of magnitude, e.g., if xrange is something like [0:1], while yrange is [0:1000]. But this ratio is automatically calculated at the beginning, you haven't got to worry about it.

After this, we define 4 macros. These are abbreviations for longer chunks of code, and make life really easier. The idea is that when confronted with a macro, gnuplot expands it as a string, and then acts accordingly. In my opinion, if written properly, macros can make the script rather readable.

The first of the macros, ZERO, is needed, because in our SEARCH macro, which is nothing but a call to the function g(x,y), if the condition is not satisfied for a particular data block, then x0, y0 wouldn't be updated, therefore, the label would end up at the wrong position. At the same time, ZERO also increments the value of b, which determines which data block we are actually plotting. b is used in the indexing in the macros SEARCH, PLOT, and LABEL. We have already mentioned SEARCH, PLOT plots the contour with the white space at the position given by x0, and y0 (this is calculated in the SEARCH macro), and finally, LABEL places the value of the contour line at that position.

At this point, we have defined everything, all that is left is plotting. We do it 13 times, because our zrange, or the contour lines were given between -3, and 3, with steps of 0.5. In this particular case, there are only 10 contour lines, and gnuplot will complain that the last 3 data blocks are empty, but this is not an error, only a warning. Shouldn't we look at the figure, perhaps? But of course! Here it is:



The only thing that I should like to point out is that the white space is made for a particular contour line, but there is no guarantee that, if the contour lines are too close to each other, the label does not cover a neighbouring contour line. If that happens, I would simply suggest to increase the contour spacing by incrementing the parameter in the set cntrparam line.

I hope that this method proves better, than the other one, and that it will be easier to use. In the next post, I will re-visit the inline functions, and show a nifty trick with them. Cheers,
Gnuplotter