Tuesday, 21 September 2010

Projecting contours

Karl asked a question some time ago, in which he wanted to know how one can produce this graph. As I pointed out in my reply, it is rather easy, if we can rotate the data file by 90 degrees. I will only post a skeleton here, you can dress up the graph at your will.

For a start, here is our data file, which we will call 'out.dat'
-0.299  -0.265  -0.215  -0.151  -0.078  0.000   0.078   0.151   0.215   0.265   0.299
-0.513  -0.455  -0.368  -0.259  -0.134  0.000   0.134   0.259   0.368   0.455   0.513
-0.694  -0.616  -0.499  -0.351  -0.181  0.000   0.181   0.351   0.499   0.616   0.694
-0.833  -0.738  -0.596  -0.411  -0.191  0.037   0.243   0.430   0.600   0.739   0.833
-0.919  -0.812  -0.624  -0.271  0.287   0.736   0.767   0.658   0.697   0.819   0.920
-0.949  -0.832  -0.582  0.048   1.186   2.000   1.680   1.007   0.781   0.851   0.949
-0.919  -0.812  -0.624  -0.271  0.287   0.736   0.767   0.658   0.697   0.819   0.920
-0.833  -0.738  -0.596  -0.411  -0.191  0.037   0.243   0.430   0.600   0.739   0.833
-0.694  -0.616  -0.499  -0.351  -0.181  0.000   0.181   0.351   0.499   0.616   0.694
-0.513  -0.455  -0.368  -0.259  -0.134  0.000   0.134   0.259   0.368   0.455   0.513
-0.299  -0.265  -0.215  -0.151  -0.078  0.000   0.078   0.151   0.215   0.265   0.299
and its "rotated" pair, 'out2.dat'
-0.299  -0.513  -0.694  -0.833  -0.919  -0.949  -0.919  -0.833  -0.694  -0.513  -0.299
-0.265  -0.455  -0.616  -0.738  -0.812  -0.832  -0.812  -0.738  -0.616  -0.455  -0.265
-0.215  -0.368  -0.499  -0.596  -0.624  -0.582  -0.624  -0.596  -0.499  -0.368  -0.215
-0.151  -0.259  -0.351  -0.411  -0.271  0.048   -0.271  -0.411  -0.351  -0.259  -0.151
-0.078  -0.134  -0.181  -0.191  0.287   1.186   0.287   -0.191  -0.181  -0.134  -0.078
0.000   0.000   0.000   0.037   0.736   2.000   0.736   0.037   0.000   0.000   0.000
0.078   0.134   0.181   0.243   0.767   1.680   0.767   0.243   0.181   0.134   0.078
0.151   0.259   0.351   0.430   0.658   1.007   0.658   0.430   0.351   0.259   0.151
0.215   0.368   0.499   0.600   0.697   0.781   0.697   0.600   0.499   0.368   0.215
0.265   0.455   0.616   0.739   0.819   0.851   0.819   0.739   0.616   0.455   0.265
0.299   0.513   0.694   0.833   0.920   0.949   0.920   0.833   0.694   0.513   0.299

These were produced in octave by the function
f(x,y) = sin(y/4)*cos(x/4)+exp(-x*x - y*y/3)
Instead of actually rotating the date file, I simply interchanged the variables, and printed out the file for a second time, for I was a bit lazy...

Anyway, this is what we have to do:
reset
unset key
set contour base
set pm3d at ss

set xrange [-2:10]
set yrange [0:12]

splot for [i=1:10:2] 'out.dat' u (-2):0:i w l lt i, \
for [i=1:10:2] 'out2.dat' u 0:(12):i w l lt i,\
'out.dat' matrix w pm3d

This is really simple: We plot the contours by plotting 'out.dat', and 'out2.dat' column by column, and keeping the first and second coordinates constant. In this way, we "project" those columns onto the y-z and x-z planes. In order to make the contours more visible, we have to specify an xrange and yrange which is a bit bigger, than our actual data set. At the end, we plot the data file as surface. If we set the contour beforehand, we will see the contours on the bottom.
And here is the figure that we have just produced. Don't be fooled by the fact that there are only three lines on the x-z plane: since our function was symmetric in y with respect to, some contour lines will overlap. And again, this graph should still be properly annotated.

Thursday, 26 August 2010

A small (or big) diversion

In the past year, I have been trying to argue on these pages that gnuplot has some advantages over many other plotting utilities. Ultimate control over graph properties, the simplicity of plotting, the ease of scripting. These are the strengths of gnuplot that spring to mind first. At least, to my mind. At the same time, I also have to admit that there are weaknesses. And these weaknesses all come down to the same deficiency: the lack of a certain modularity, both at the user level, and in the code. This makes expanding gnuplot extremely difficult, at times, impossible. At the user level, we have to use what we have, and there are not too many options when it comes to even such simply tasks as calculating the average of a data set. If something is not implemented in the code, the user is not "supposed" to use it. Now, in gnuplot 4.4, some of these problems can be overcome with some witty scripting, and mainly, abuse of procedures. If you want to see some nasty hacks, just skim through these pages. And these issues all become even more problematic, when it comes to trying to fix the problem at the developer's level. The code, as it is written now, does not support straightforward expansion, even implementing as simple things as, again, calculating the average of a data set are somewhat tricky. At yet another level, even if the code is fixed, the original gnuplot code is not published under GPL, therefore, changing it does not mean that the battle is won: one can't just take the code, modify it, and put it up on a web page.

These were the problems that I have realised in the past couple of years, and this is why I decided to re-think certain things, and start the development of a plotting utility. I wanted to keep what was good in gnuplot, but I wanted to right the wrongs. I was seriously pondering where and how to set out, when someone pointed out to me that I am not the first person, who faces the same dilemma, and that there is another project already underway. In fact, the other project was quite advanced, when I caught glimpse of it. And I have to say that what I saw was rather impressive. It impressed me, because its development is done along the lines that I mentioned above, it is thought-over well, and it is already quite mature. It can easily compete with gnuplot, for most things are already implemented, and it has the modularity that I missed so much. And it is under GPL, so you can do whatever you want. Well, almost.

With these remarks, I would like to call your attention to pyxplot. If you haven't seen it yet, please, visit the web site, and give it a try! Their main site is here, and you can find a number of very pleasing plots here. Most of your gnuplot scripts will work "out of the box", and those that won't, can be tweaked very easily. At the end of the pdf manual, you can find a discussion on what the differences between gnuplot and pyxplot are, and how you can make your scripts work. At the same time, enjoy the convenience of easy unit manipulations, data analysis, things like Fourier transforms and filtering, numerical integration and differentiation, the option for defining not only simple functions but procedures for your common tasks.

As a concluding remark, I would also like to announce a parallel blog of mine, pyxplot-tricks, where I will discuss how and what can be achieved in pyxplot. I will still try to keep gnuplot-tricks active, and I will certainly answer questions posted here.
Cheers,
Zoltán

Wednesday, 14 July 2010

Fence plots with a some-liner

About this time last year, I showed how one can produce fence plots in gnuplot, even if the data is from a file, not from a function. (The function plot is somewhat trivial, you can find it amongst the demos.) I used some heavy data processing then, though everything was handled in gnuplot. With the arrival of version 4.4, all that machinery can be made much simpler. If you continue on reading, you fill find a quite straightforward and flexible method.

For a start, we will need some data. Instead of generating it in gnuplot, I will just post my data file, which reads as follows

1 2 3 1 2 5
2 2 4 2 3 1
3 4 3 6 1 1
4 2 3 1 1 4
5 3 2 5 4 3
6 2 3 6 5 3
We have six columns here, but only five are the data: the first columns is for indexing, or whatever you like. This does not change the idea. In what follows, I will call this file '3fill.dat'

Now, our first script looks like this

reset
unset key
unset colorbox
set ytics offset 0,-1
set ticslevel 0
min = 0
col = 5

DATA = ""
DATA2 = ""
PALETTE = "set palette defined ("

pr(x, y) = sprintf("%f %f\n", x, y)
zero_line(x, y) = DATA.sprintf("\n").DATA2.sprintf("\n%f %f\n", x, y)
zero_pal(x) = sprintf("%d %.3f %.3f %.3f", x, rand(0), rand(0), rand(0))

f(x, y) = ($0 == 0 ? (DATA = zero_line($1, x), DATA2 = pr($1, min), PALETTE = PALETTE.zero_pal(y).", ") : \
        (DATA = DATA.pr($1, x), DATA2 = DATA2.pr($1, min)), x)

plot for [i=2:col+1] '3fill.dat' u 1:(f(column(i), i))

DATA = DATA.sprintf("\n").DATA2

set print '3fill.tab'
print DATA
set print

eval(PALETTE.zero_pal(col+2).")")

splot for [i=0:col-1] '3fill.tab' every :::(2*i)::(2*i+1) u 1:(i):2:(i+2) w pm3d
and the figure produced is here


Once you absorb it, the script is really simple. The first line where something actually happens is where we define DATA, DATA2, and PALETTE. What we will do is to read in the data from the file, and then add the numbers to a string. Once we have read all data, we print the string to a file, and then use that file as our new data file. The reason for this is that we have, in some sense, for to duplicate our data: in each column, we have one set of data, and what this set determines is a curve. In order to produce a fence, however, we need a surface. The simples way of getting this surface is to print the data twice. Of course, we have to modify the data a bit, but this is the only trick here.

We define three functions, pr, which is just a short-hand for formatted printing of two numbers, zero_line, which is again, a printing routine, when the record number is zero, i.e., when we are processing the first data point in each column, and zero_pal, which generates a new palette colour, as we enter a new column. If you are satisfied with some readily available palette, you can skip this function, and any calls to it.

Next, we define a function, f(x,y), which makes use of the three above-mentioned functions, and amounts to the data-duplication process. In order to reduce the complexity of the problem, we will print each column, and its duplicate in the same file. This, however, means that we have got to differentiate between various data sets. We do this by inserting and extra blank line each time we are faced with a new column. This is why we have to distinguish the $0 == 0 case in f(x,y). Also, the palette has to be re-defined only if there is a new data set.

If you watch carefully here, there are two strings, DATA, and DATA2. DATA2 is to hold the duplicate, while DATA is an ever-expanding string with the original, and the duplicate data. In the duplicate, we don't actually hold any duplicate, we simply print the indices (these are in the first column), and a constant number, min, which we defined at the very beginning. The value of this determines how tall (or how deep) the fences will be.

In order to generate the duplicated data set, first we call a dummy plot with f(x,y), add the last duplicate, DATA2 to DATA (this is required, because we concatenate DATA and DATA2 only, when $0 == 0, i.e., the last data set is not added to DATA automatically. Having defined DATA, we print the everything to a file. Note that we could process a number of columns easily, thanks to the for loop in the dummy plot.

At this point, we have everything in the new data file, we have only got to plot it. Recall, that all columns are in one file now, and we have to separate them in the new plot. This is why we use the 'every' keyword when stepping through the data sets.

We can easily add a grid to the figure, if we call another plot at the very end of our script: just add

for [i=0:col-1] '3fill.tab' every :::(2*i)::(2*i+1) u 1:(i):2 w l lt -1

after the last line, and get the following figure

If the order of the plots is exchanged, the "wires" will be covered by the panes of the fence, so it will not give this impression of semi-transparency. This is it for today. In my next post, I will elaborate on for what else we can use the trick above.
Cheers,
Zoltán

Sunday, 13 June 2010

Broken axis, once more

I have discussed this subject at least on two occasions, and in fact, most of the present post was already described in one of my very first posts. However, I thought that it might be worthwhile to dust it off, especially, that this is a scientifically relevant feature, which is still missing in gnuplot. But with a bit of work, we can rectify the problem. In short, there are situations, when we just have to break the axis, simply because there is a large gap between two relevant ranges of a plot. Take this example, for instance
plot [0:40] 20.0*atan(x-20.0) + 32 + sin(x)
which would result in the following graph:


The problem is obvious: the function has some interesting modulation close to 0, and close to 60, but it is rather dull between these two extrema. The solution is to cut out the segment between 10, and 60, say.

What we will use is a very handy function in gnuplot 4.4, which lets the user set the position of the graph exactly. In a multiplot, we would usually set the position as
set multiplot 
set size 1, 0.3
set origin 0, 0.5
plot 'foo' using 1:2 

set size 1, 0.2
set origin 0, 0
plot 'bar' using 3:4
which produces two graphs of size 1, 0.3, and 1, 0.2, respectively, and places them in such a way that their bottom left corner is at (0, 0.5), and (0, 0). But when we say "the bottom left corner", we actually mean the whole figure, tic marks, axis labels, everything. This means at least two things. One is that if we want to break the axis using a multiplot, the ranges will not necessarily be proportional on the figure, simply because the size referred to the size of the whole graph, and not only to the plotting area. Second, if the size of the tic labels is different in the two graphs, they will no longer be aligned properly. This would happen, e.g., if we were to plot over the ranges [0:9], and [1000:10009]: the first range requires labels of width 1, while the second labels of width 5, therefore, the second graph would be narrower, and its left vertical axis shifted to the right, at least, as far as the plotting area is concerned.

In gnuplot 4.4, however, one can set the positions of the plots, and not the whole graph. This is achieved by issuing a command similar to this
set lmargin at screen 0.1
which aligns the left vertical axis of the graph with the point that is at screen position 0.1 along the horizontal direction. There are three more margins, rmargin, tmargin, and bmargin, setting the right hand side, the top, and the bottom of the graph. Specifying the plot's corners explicitly removes the above-mentioned problem with the alignments.

Having said this, our script could read as follows
reset
unset key
bm = 0.15
lm = 0.12
rm = 0.95
gap = 0.03
size = 0.75
y1 = 0.0; y2 = 11.5; y3 = 58.5; y4 = 64.0

set multiplot
set xlabel 'Time [ns]'
set border 1+2+8
set xtics nomirror
set ytics nomirror
set lmargin at screen lm
set rmargin at screen rm
set bmargin at screen bm
set tmargin at screen bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) )

set yrange [y1:y2]
plot [0:40] 20.0*atan(x-20.0) + 32 + sin(x)

unset xtics
unset xlabel
set border 2+4+8
set bmargin at screen bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) ) + gap
set tmargin at screen bm + size + gap
set yrange [y3:y4]

set label 'Power [mW]' at screen 0.03, bm + 0.5 * (size + gap) offset 0,-strlen("Power [mW]")/4.0 rotate by 90

set arrow from screen lm - gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1)+abs(y4-y3) ) ) - gap / 4.0 to screen \
lm + gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) ) + gap / 4.0 nohead

set arrow from screen lm - gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1)+abs(y4-y3) ) ) - gap / 4.0  + gap to screen \
lm + gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) ) + gap / 4.0 + gap nohead

set arrow from screen rm - gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1)+abs(y4-y3) ) ) - gap / 4.0 to screen \
rm + gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) ) + gap / 4.0 nohead

set arrow from screen rm - gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1)+abs(y4-y3) ) ) - gap / 4.0  + gap to screen \
rm + gap / 4.0, bm + size * (abs(y2-y1) / (abs(y2-y1) + abs(y4-y3) ) ) + gap / 4.0 + gap nohead

plot [0:40] 20.0*atan(x-20.0) + 32 + sin(x)

unset multiplot

The first couple of lines specify how big a figure we want to have: bm, lm, and rm are the bottom, left, and right margins, respectively. We also define the size of the gap, which we will have between the two plots. y1 through y4 are the definitions of our plot ranges. In other words, the interval between y2, and y3 will be cut out of our figure.

In the multiplot environment, we set the axes (for the bottom figure on the bottom, left, and right, while for the top figure on the top, left, and right), the axis labels (the vertical label we have to set by hand, for otherwise it would be centred on the vertical axis of the bottom or top figure, but not on the whole), and set the positions of the figures. Note that the definition used for tmargin and bmargin makes sure that the two plotted intervals are proportional. Before plotting the second curve, we also set four small headless arrows, which are meant to represent the break in the axes. It can be left out, if not desired, or they can be replaced by two dashed vertical lines.


This method can also be used, if one wants to plot a single curve or data set, but with logarithmic axis on one interval, and linear on the other.

Saturday, 12 June 2010

Ministry of Sillier Walks

In my last post, I have shown how we can define or read an array of number from a file. Having constructed the array, the question naturally arises: can we use it for something else, to do something that we could not do otherwise. The answer is yes, although in the present post, there will not be anything that I haven't discussed here or there. We have only got to put the pieces together, and with a little bit of work, we can produce three-dimensional column stacked histograms at ease. I will use the same datafile, but I repeat it here:

"" France Germany Japan Nauru
Defense 9163 4857 2648 9437
Agriculture 3547 5378 1831 1948
Education 7722 7445 731 9822
Industry 4837 147 3449 6111
"Silly walks" 3441 7297 308 7386

Then, let us see, how we are going to make that histogram! For the sake of example, we will draw four cylinders, which will be striped according to the values that the columns in the data file take. We could do two things here. Provided that we have already read the values into an array, we could draw 4 times 5 cylinders of various size and colour. Alternatively, we could draw 4 cylinders, and colour them with stripes that we take from the palette. In this case, we have to find some way of defining the proper palette. Both methods have advantages and disadvantages. The advantage of the first one is that it is faster, because the cylinders are monocolour, which means that we need only two isosamples. On the other hand, we have to have "nested" for loops (which we would just simply write out). The advantage of the second method is that we can get away with one for loop, but we need many isosamples, because the cylinders contain many colour, though the transition between the colours is required to be abrupt. But since we do not know where the boundary between two colours is, we would need many isosamples. Consequently, it will be slow.

When defining our array, we will employ the trick from the last post, and we will do something very similar, when we determine our palette. In addition, we will also have to calculate the sum in the columns, for the cylinders' height should be proportional to that. If, on the other hand, we opt for re-scaling the cylinders to the same height, we, again, need the sum. With these preliminary remarks, the first version of our script could look like this

reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'   
cylinder = 'cylinder.dat'

set ticslevel 0   
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4 
row = 5 
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
                SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f", c-0.1, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7), x
ff(x, c) = (array(x, c), x)

plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0)))

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)
         
set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))) with pm3d

The first couple of lines set up the graph, and they are trivial, as is the plot to the 'cylinder.dat'. The definition of g(x,a) should be familiar from the last post, and h(x,a) is nothing but the Heaviside function. We have to deal with a matrix, therefore, the string ARRAY begins with "b(x,y) = 0", which will, in due course, become the definition of a two-variable function. SUM, that will bring s(x,a) to life, also appears to define a two-variable function, but this is only apparent: s(x,a) is exactly as much of a two-variable function as is g(x,a). A mere convenience, nothing more. We also begin the definition of a palette, but we stop short of its completion: that will be done during the first plot.

In the array(x,c) function, we read in the numbers from the file, and at the same time, we also calculate the sums in the columns. This function is really similar to that from the last post. We also define a function for filling up the palette. For now, this function writes random colours into the palette at every integer.

Having defined the functions, we "plot" our data file, so as to read in the numbers. We plot the second column of the same file again, in order to prepare our palette. Note that this latter plot is really nothing but a strange way of creating a for loop: we do something as many times as there are elements in a column. In both plots, by applying the every keyword, we skip the first line, which is the column header.

At this point we are almost done: the only remaining thing is the evaluation of our new definitions, and the actual plot. The setting of the xrange and yrange is necessary only in order to ensure that the cylinders are cylinders, not elliptical.

Our first script results in the graph below:
 

This is OK for a start, but could we improve it a bit? With some work, we could. For one thing, we could add the sum at the top of each cylinder. This can easily be done by augmenting the last plot with the line
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels

We can also give a title to each cylinder by reading out the values in the first row. This is done by adding
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels centre

Finally, we can easily add a legend to the figure. All we have to do is to read out the first column of our data file, and draw 5 cylinders with the appropriate colour. We can achieve this by invoking
file using (0):(0):(2e4-$0*5e3):1 w labels right, \
for [i=1:row] cylinder using ($1+5):($2-5.0):($3*2e3+i*5e3):(i-1) with pm3d
in the last plot. With these modifications, the complete script would look like this
reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'   
cylinder = 'cylinder.dat'

set ticslevel 0   
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4 
row = 5 
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
                SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f", c-0.1, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7), x
ff(x, c) = (array(x, c), x)

plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0)))

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)
         
set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))) with pm3d, \
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels, \              
file using (0):(0):(2e4-$0*5e3):1 w labels right, \
for [i=1:row] cylinder using ($1+5):($2-5.0):($3*2e3+i*5e3):(i-1) with pm3d, \  
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels centre  
and would the following figure:

Well, this is sort of OK, but what if we still do not like it? There are two things that we could easily implement, and would change the character of our graph completely. One is that we can give the cylinders a true 3D lookout, by adding phongs to them. The other one is that we could remove the legends, and add it to one of the cylinders.

So, let us see what we could do in the way of phonging. This is really simple: if we think about it, the phong is nothing but a white spot on our graph, where the saturation of the colours increases towards the centre of the spot. Therefore, all we have to do is to insert white into the palette, but to do it in a way that white saturates all little cylinders. We could, then, modify our palette function as

pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f, %.1f 1 1 1", \
      c-0.01, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7, c+0.99), x)
and add a function that changes the saturation as

colour(x,y) = 0.25*exp(-(x-0.7)**2/0.2-(y+0.7)**2/0.2)
If we look at the palette function, the random numbers will be between 0-0.7, and the function colour(x,y) will add 0.25 to those numbers at the centre of the spot, which, in this particular case, will be at 45 degrees with respect to the x axis. When plotting, we have to add this function to our cylinders.

As for the labels, we might want to place them at the centre of each coloured cylinder in the last cylinder, representing Nauru. That is, we use the first column, and add the labels from that at half of the height of the cylinders whose size is read from the fifth column. We could use this function
lab(x) = (sm = sm + x, sm-0.5*x)
The value of sm is updated when a new value is read from the fifth column, and the return value of the function is just the cumulative sum minus half of the last value. Note that since we use sm, which was also utilised in array(x,c), we will have to re-set its value to zero before we use it.

With these modifications, the complete script reads as
reset

unset key
unset colorbox
unset xtics; unset xlabel
unset ytics; unset ylabel
unset ztics
file = 'marimekko.dat'
cylinder = 'cylinder.dat'

set ticslevel 0
set border 1+2+4+8
set parametric; set urange [0:2*pi]; set vrange [0:1]; set iso 2, 200
set table cylinder
splot cos(u), sin(u), v, cos(u)*v, sin(u)*v, 1
unset table
unset parametric

col = 4
row = 5
sm = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x,a) = (x <= a ? 0.0 : 1.0)

ARRAY = "b(x, y) = 0"
SUM = "s(x,a) = 0"
PALETTE = "set palette defined (-1 0.7 0 0, -0.5 1 1 1"

array(x, c) = (sm = h($0,0.5)*sm + x, ARRAY = ARRAY.sprintf(" + g(x,%d)*h(y,%.3f)", c, sm), \
  SUM = SUM.sprintf(" + %f*g(x,%d)", x, c), x )
pal(x, c) = (PALETTE = PALETTE.sprintf(", %.1f %.3f %.3f %.3f, %.1f 1 1 1", \
      c-0.01, rand(0)*0.7, rand(0)*0.7, rand(0)*0.7, c+0.99), x)
ff(x, c) = (array(x, c), x)

colour(x,y) = 0.25*exp(-(x-0.7)**2/0.2-(y+0.7)**2/0.2)
lab(x) = (sm = sm + x, sm-0.5*x)
 
plot for [i=2:col+1] file every ::1 using 0:( ff(column(i), i) )
plot file every ::1 using 0:(pal(1, column(0))) 

eval(ARRAY)
eval(PALETTE.")")
eval(SUM)

set xrange [4:5+3*col]
set yrange [-10:3*col-9]

splot for [i=2:col+1] cylinder using ($1+3*i):2:($3*s(i,i)):(b(i,$3*s(i,i))+colour($1,$2)) with pm3d, \
for [i=2:col+1] file every ::0::0 using (3*i):(0):(s(i,i)+5e3):(sprintf("%d", s(i,i))) w labels, \
sm = 0, file using (3*col+4):(1):(lab($5)):1 w labels left, \
for [i=2:col+1] file every ::0::0 using (3*i):(-2):(0):(stringcolumn(i)) w labels right rotate by 30

I think I cannot add more to this figure, we have explored and exhausted all possibilities here. Next time I will come back to an older topic from this blog, and show how that can be done in an elegant way, applying the new functionalities of gnuplot 4.4.
Cheers,
Zoltán

Sunday, 9 May 2010

Ministry of Silly Walks

In a comment last week, someone asked whether it was possible to draw a Marimekko plot, i.e., a histogram in which both directions contain relevant information. In other words, the question is whether we could draw a square, and populate it with rectangles in such a way that the area of the rectangles is read from a file. I thought that it should be possible, but on the way, I also found out a couple of interesting things. If you keep reading, you will see it for yourself, how we can define a vector, whose value is taken from a data file, and how we can manipulate the elements of that vector. In some sense, this is similar to the trick that we made use of, when we generated a parametric plot from a file. But we can do things in a slightly better fashion.

First, we will need a data file, and for the sake of conformity with the question of the commenter, I will just use this
"" France Germany Japan Nauru
Defense 9163 4857 2648 9437
Agriculture 3547 5378 1831 1948
Education 7722 7445 731 9822
Industry 4837 147 3449 6111
"Silly walks" 3441 7297 308 7386

(We can already deduce that with the sole exception of Japan, countries spend a large chunk of their GDP on silly walk.)

Now, our first attempt could be this:

reset

file = 'marimekko.dat'

set style data histograms
set style histogram columnstacked
set style fill solid border -1
set boxwidth 1.0

set xrange [-1:5]
set yrange [0:5e4]

plot newhistogram at 0, file u 2 title col, \
newhistogram at 1, file u 3 title col, \
newhistogram at 2, file u 4 title col, \
newhistogram at 3, file  u 5 title col, \
and it should be quite obvious that this is not what we want:

It just falls short of our expectations in every respect: There is a gap between the columns, the colours are not consistent, and the width of the columns is equal. The only reasonable thing that happened here is that we can actually set the position of the columns. This will become important later on.

Let us try to improve on the figure, step by step. First, we will place the histograms in a multiplot, for that will make life a lot easier: this is our only way of manipulating the column width during the plot. In this spirit, our second script will be this:

reset

file = 'marimekko.dat'

set style data histograms
set style histogram columnstacked
set style fill solid border -1
set boxwidth 1.0

set xrange [-1:5]
set yrange [0:5e4]
set multiplot

plot newhistogram at 0, file u (f($2)) title col
plot newhistogram at 1, file u (f($3)) title col
plot newhistogram at 2, file u (f($4)) title col
set boxwidth 0.3 
plot newhistogram at 2.65, file  u (f($5)) title col
unset multiplot

This is somewhat better, for the colours are now consistent, and we also see that the last column has a different width. We also see how the positioning works: the right hand side of Japan's column is at 2.5, and since the width of Nauru's column is 0.3, its centre has got to be shifted by 0.15 with respect to 2.5. That adds up to 2.65. However, if we watch closely, we will also notice that the ytics and labels are drawn four times; after all, we have four plots. What, if we unset the ytics after the first plot? Well, we would end up with this

Rather upsetting! The problem is that once the tics are unset, the size of the figure changes, so we can no longer count on the plots' proper alignment. However, there is an easy remedy for this: all we have to do is not to unset the ytics, but to set them invisible. That is, we can do
plot newhistogram at 0, file u (f($2)) title col
set ytics ("      ", 30000)
plot newhistogram at 1, file u (f($3)) title col
plot newhistogram at 2, file u (f($4)) title col
set boxwidth 0.3 
plot newhistogram at 2.65, file  u (f($5)) title col
where we have 6 white spaces in the quote. You might wonder why on Earth 6. Well, the answer is that the label "30000" is actually " 30000", which takes up 6 characters' space. With this trick, we get

We have already achieved quite a lot, and slowly, but surely, we are getting to our goal. Just do not despair!

The next thing that we would need is proper scaling of the columns: we want all of them to be between 0 and 100 (%), i.e., we would have to sum all columns first, and then divide the values by the sum. And that is the snag: we have four columns, and we have to do the summing for each column independently, and before the final plots. Otherwise, our multiplot will be messed up. And this is where the array comes in handy: if we just had an array, and could retrieve values from it, we would be saved. And of course, we can do this. Let us take a small detour!

If we think about it, the array (5, 4, 6, 7, 8) is nothing but a finite series: its first element is 5, second element is 4, and so on. But we could also look at the series as a function: a mapping from the set of natural numbers to, well, to anything. In the example above, to natural numbers. It doesn't matter. My point is that an array is a function, a function for which h(0) = 5, h(1) = 4, h(2) = 6, h(3) = 7, and h(4) = 8. As long as this is true, we do not care what h(1.1) is. We need the function's values only at integer numbers. Then the only question is how we could define this function "on the fly". Being a physicist, and a lazy man, I would propose the following:
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
h(x) = 5 * g(x,0) + 4 * g(x,1) + 6 * g(x,2) + 7 * g(x,3) + 8 * g(x,4)
g(x,a) is (apart from some numerical factors) nothing but a very primitive representation of a Dirac-delta, centred on 'a'. You can convince yourself that h(x) defined in this way fulfils the requirements above.

After this digression, let us see what we can do with this, and issue the following commands!
ARRAY = "h(x) = 0"
array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )
ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), total = total + x, x)
plot 'marimekko.dat' using 0:(ff($2, 2))
At this point, the variable ARRAY should look something like this
ARRAY = "h(x) = 0 + 91.630000*g(x,2) + 35.470000*g(x,2) + 77.220000*g(x,2) + 48.370000*g(x,2) + 34.410000*g(x,2)"
and if we evaluate it, the function value h(2) returns the sum of the numbers in the second column. (Apart from a factor of 100, of course.) Note that in order to take out the first line, which is the header, we have to use the condition
($0 > 0 ? ARRAY = array(x, counter-1) : 1)
which updates ARRAY only if we are processing the second record, at least. Also note that in order to get the sum of all columns, all we have to do is call this plot as many times as many columns there are. In the light of this, our next script could be this
reset

file = 'marimekko.dat'
col = 4

g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
ARRAY = "h(x) = 0"
array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter) )
 
ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter) : 1), x)
plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i)) 

set xrange [-1:3]
set yrange [0:110]

eval(ARRAY);

set style data histograms
set style histogram columnstacked
set style fill solid border -1
set boxwidth 1.0

set multiplot
plot newhistogram at 0, file u ($2/h(2)) title col
set ytics ("    " 20)
plot newhistogram at 1, file u ($3/h(3)) title col
plot newhistogram at 2, file u ($4/h(4)) title col
set boxwidth 0.3
plot newhistogram at 2.65, file u ($5/h(5)) title col
unset multiplot
and this results in this figure

So, we are almost there: the columns are rescaled, and placed neatly next to each other. The only missing ingredient is the setting of the widths. But that is really easy: we only have to determine what the grand total is, and then scale the columns accordingly. Our script can, then, be modified as follows
reset
file = 'marimekko.dat'
col = 4

total = 0.0
g(x,a) = (abs(x-a) < 0.1 ? 1 : 0)
ARRAY = "h(x) = 0"
array(x, counter) = ( ARRAY.sprintf(" + %f*g(x,%d)", x/100.0, counter+1) )

ff(x, counter) = (($0 > 0 ? ARRAY = array(x, counter-1) : 1), total = total + x, x)
plot for [i=2:col+1] 'marimekko.dat' using 0:(ff(column(i), i)) 

set xrange [-0.3:1]
set yrange [0:110] 
eval(ARRAY);

set style data histograms
set style histogram columnstacked
set style fill solid border -1   

total = total / 100.0
position = 0.0

set multiplot
set boxwidth h(2)/total
plot newhistogram at position, file u ($2/h(2)) title col

set ytics ("    " 20)
set boxwidth h(3)/total; position = position + (h(2)+h(3))/total/2.0
plot newhistogram at position, file u ($3/h(3)) title col

set boxwidth h(4)/total; position = position + (h(3)+h(4))/total/2.0
plot newhistogram at position, file u ($4/h(4)) title col

set boxwidth h(5)/total; position = position + (h(4)+h(5))/total/2.0
plot newhistogram at position, file u ($5/h(5)) title col
unset multiplot
and this is what we wanted!

Adding labels to the rectangles is relatively easy: we could do the following

plot file using (position):(l($5)):5 with labels tc rgb "#ffffff"
where l(x) is a function that keeps track of the previous values of the column, and adds them as new values are processed. The definition of this function should be trivial.

The last thing that I would add here is that by using macros, we can tidy up the script: we no longer would need all those long and repetitive lines. In fact, we could also add another instruction to our 'ff' function, which would generate the plot command. The advantage of that is that in this way, we do not have to repeat the plot commands four times: we simply put that in our for loop, and then evaluate the resulting string. I discussed this trick in my last post, so, if you are interested in the details, you can look it up there.

Monday, 26 April 2010

Bending the arrows - "delaying" the plot

The other day, I would have needed a couple of curved arrows on my plot, so I started to work out a method to get what I wanted. This, however, turned out to be rather interesting, so I thought that I would share the details with you.

First, we should just define what I mean by a curved arrow. Perhaps, the easiest way to define it is to show a plot, similar to this


In gnuplot, when one wants an arrow, one can invoke the following command:
set arrow from 0,0 to 1,1
or something similar. This will produce a straight arrow from (0,0) to (1,1). But what if we wanted to have an arrow, which is not straight. Well, in this case, we set a very short arrow, and draw a curve separately. The key to this is to set the arrow in such a way that it is tangential to the curve at the end point. It is easy to see that the following script would just do that

reset
unset key
eps = 0.001

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1
cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)
f(x) = 0.5+(x-1)*(x-1.2)*(x-1.4)

x1 = 0.5
x3 = 1.95
new_x = x1 + (1.0-eps)*(x3-x1)
set arrow from new_x, f(new_x) to x3,f(x3) as 1
plot [0:3] sin(x) with point ps 1 pt 6, f(x)*cut(x,x1,new_x) with line lt -1

First, we define an arrow style that we will use later. The arrow will be 0.03 screen sizes big, and the two angles determining the shape of the head are 15, and 45 degrees, respectively. Finally, we stipulate that the arrow be black, i.e. linetype -1. Then we define a window function, cut, which depends on the two end points, x1, and x3 (the reason for 3 will become clear soon), and the curve, f(x). In our plot, beyond what we actually want to plot, we will also plot f(x), but only between x1, and new_x, where new_x is a bit off with respect to the second end point. The degree of "bitness" is given by eps, which was defined at the beginning. However, before we actually plot anything, we have got to set the arrow, between new_x, f(new_x), and x3, f(x3). This construction ensures that the arrow is tangential to the curve.
At this point, we are ready to plot, which we actually execute in the next, and last line.

What we have created is great, but there are problems: first, we have to define our function, f(x), beforehand, we have to set the arrows by hand, and we also have to add the appropriate lines to our plot command. Quite tedious. There has got to be a better way!

For the say of example, let us suppose that we want a curved arrow that, say, connects (0,0) and (1,1) via a parabola that passes through the point (0.5, 0.25). If we are really pressed for it, we could do the following: First, we have to figure out the parameters of our parabola. In this case, it is quite easy, for it is nothing but x*x. Then we would draw a parabola between (0,0), and (0.99, 0.9801), and then draw an arrow from (0.99, 0.9801) to (1,1).

First, let us see, how we figure out the parameters of our parabola! We have two end points, and a "control" point, i.e., we have to solve the following set of equations
y1 = a*x1*x1 + b*x1 + c
y2 = a*x2*x2 + b*x2 + c
y3 = a*x3*x3 + b*x3 + c
for the unknown a, b, and c. You can convince yourself that the following will do

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)
a = A(x1,y1,x2,y2,x3,y3)
b = B(x1,y1,x2,y2,x3,y3)
c = C(x1,y1,x2,y2,x3,y3)

We have done most of the hard work, the only thing that remains is how we "automate" this whole machinery, i.e., what do we do, if we have several arrows that we want to set. Again, as so many times in the past, we will utilise this new notion of function definition: the fact that a function is not only a x -> f(x) mapping, but this mapping, and a set of possibly unrelated instructions. What we will do is to define a "function" that sets our arrows, and, as the supplementary instruction, augments the plot command accordingly. First, let us take the following function definition


arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))
and try to understand what it does! For a start, it takes 6 arguments, which are nothing but the coordinates of the end points, and the control point. Then, it defines new_x, which we have already seen in the first example. In the next step, based on the 6 input arguments, calculates the three parameters of our parabola, and in the next line, adds the plot of this parabola to a string called PLOT. When adding to PLOT, we simply use the sprintf function. In the last line, we concatenate a string called ARROW, and another one, produced by another sprintf. It is easy to see that this sprintf returns the definition of an arrow between new_x, f(new_x), and x3, f(x3). We should also note that this line is the last line, which consequently means that whatever happens here is returned.

At this point we are really done, we only have to "populate" our plot. The full script takes on the form
reset
unset key
eps = 0.01

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

ARROW = ""
PLOT = "p [0:3] sin(x) w p ps 1 pt 6"
arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

ARROW = arrow(0,0,1,1.5,pi/2,1.03)
ARROW = arrow(0,0,1,0.3,pi/2,0.97)
eval(ARROW)
eval(PLOT)
which would result in the graph shown here:


Now it is clear what was PLOT: it is nothing, but the actual plot that we want to have. This is the string to which we concatenate our parabolae, one by one, every time we define a new arrow. After we defined all our arrows, we have two strings, ARROW, and PLOT. As such, they are no good, they will become instructions only when we evaluate them. That is what we do in the last two lines.

I would like to point out that my main reason for posting this was not that it can be used for creating curved arrows, but that this method is quite general. First, we can add to the plot, if that is needed, without having to keep track of all the tiny details. Second, the set command can be "fooled" by using the sprintf function. With the help of the string augmentation and the eval command, we can actually use parameters in our set instruction very efficiently.

Well, this is for today. I am waiting for suggestions as to what we should discuss next time. Cheers,
Zoltán

Tuesday, 16 March 2010

Bubble plots

Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or roll down on this page), the idea was to trick gnuplot into plotting our data file twice, but in a way that each point was plotted twice in succession. Now, what if we plotted more times? There was really nothing special about the number 2, so there is no reason why we could not do this. But if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into a bubble graph, like this


So, let us see how the machinery works!

reset
plot 'new_bubble1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_bubble2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_bubble3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

rem(x,n) = x - n*(x/n)
size(x,n) = 3*(1-0.8*rem(x,n)/n)
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))
blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)
green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

posx(X,x,n) = X + 0.03*rem(x,n)/n
posy(Y,x,n) = Y + 0.03*rem(x,n)/n

unset key
set border back
level = 40
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
Again, the first three plots are there for determining the sample size, and nothing more. We, thus, start out with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning values between 0 and 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours are successively whiter as the remainder increases. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessary a tragedy, see below.)

Then we decide to have 40 colour levels (we could have anything up to 255, although it might be a bit time consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, move the circles a bit, and set the colours to whiter shades. That is all.

Now, what happens, if we take too big a value for the shift? This, actually, might lead to interesting effects, as shown in this graph, where droplets represent the data points.




After having seen the simplest implementation, we should ask whether it is possible to add some decorations. E.g., whether it is possible to add a thin black edge to the symbols. It is relatively simple, as the following script shows. We only have to re-define some of our functions as follows
size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))
blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))
green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)
posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)
All these functions do is to check whether we are plotting the first round, and if so, set the colour to black. There is a small difference in the shifts, for we do not move the circles, if they are in the first or the second round. The reason is obvious, as is the result

OK, so we can plot bubbles, with or without black circumference, but we would also like to add a legend. Well, that is simple, in fact, nothing could be simpler. Just add the following the following three lines to our code

set label 1 'Red bubbles' at 9,6 left
set label 2 'Blue bubbles' at 9,5 left
set label 3 'Green bubbles' at 9,4 left
and the following six
for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
and we are done! All we do here is to plot our data files in a silly way: we plot a single point at (8.5,6), (8.5,5), and (8.5,4). The plotting of the data file does not happen in this sense, we use it for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!

Defining new symbols

Some time ago, I showed a method with which we could add a "frame" to a symbol. If you recall, what we did was to plot everything twice, and in order to duplicate our data set, we used a simple gawk script. Now, there is another way of doing this, one which does not rely on the gawk script, in fact, on any external script. I will discuss this method today. The gist of the trick is discussed in the old post, therefore, you are encouraged to cast, at least, a cursory glance at that, if you haven't yet done it.

As I have already pointed out, we had to duplicate our data set. To be more accurate, we haven't got to duplicate anything, we have simply got to plot the data twice. Now, the difficulty is that is we do this in a primitive way, issuing the plot command twice, and taking the same data set, the points might overlap, and leads to some undesired results. So, the task is to plot the data set twice, but to plot each plot twice, and not the data set as a whole. For this, we will use the for loop introduced in gnuplot 4.4, and the 'every' keyword. To cut a long story short, I give my script here, and discuss it afterwards.
reset 
plot 'new_symbol1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_symbol2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_symbol3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

parity(n) = (n/2.0 == int(n/2.0) ? 0 : 1)
size(n) = 2 - parity(n)*0.4
colour(n,r,g,b) = sprintf("#%02X%02X%02X", parity(n)*r, parity(n)*g, parity(n)*b)

unset key
set border back
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0) ,\
for [n=0:2*blue_n+1] 'new_symbol2.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 9 ps size(n) lc rgb colour(n,100,100,255) ,\
for [n=0:2*green_n+1] 'new_symbol3.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 5 ps size(n) lc rgb colour(n,0,150,0)
Then, let us see what we have here! The first 6 lines are only to retrieve the number of data points in our data sets. If you know this from somewhere else, you can skip these, with the caveat that 'red_n', 'blue_n', and 'green_n' should still be defined somewhere.

Next we define three functions, the first of which determines the parity of an integer, returning 1, if the number is odd, and 0, if it is even. The second function returns a number, depending on the parity of its argument. Surprising as it is, this function will determine the size if the symbol, when we plot. Finally, the third function returns a string, which is equal to the colour given by the triplet (r,g,b), if the first argument, 'n', is odd, and black, if the first argument is even. At this point, it should be clear that we could have defined a function that returns a different colour for even numbers.

We are done with everything, but the plotting, so let us do that! As you see, for each data set, we step through the numbers, but not once, but twice: first plotting in black, and second, plotting with some decent colour. At the same time, we change the symbol size, so that the black symbols are always a bit bigger, than the red, blue, or green. Once all three plots have been called, the following graph will appear:

We can see that the symbols overlap each others, as they should. Now, what about the keys, should we need them? Well, that requires some handwork, but it is not hard, actually. The following self-explanatory script should do
set label 1 'Red symbols' at 1.3, 8 left
plot for [n=0:2*red_n+1] 'new_symbol1.dat' using 1:2 \
every ::(n/2)::(n/2) with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=0, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0), \
n=1, '-' using 1:2 with p pt 7 ps size(n) lc rgb colour(n,255,0,0)
1 8
e
1 8
e

and this produces the following figure

Friday, 26 February 2010

Phong on histograms with a one-liner (almost:-)

We have seen in the last couple of posts that with the new concept of functions, quite a few interesting effects can be achieved. Today I would like to show a trick that solves a problem that I discussed some time ago, when we made shiny histograms using a for loop in gnuplot. We will do the same thing here, but in two lines only. It is quick, and the results are just as good as in that case.

So, here is my data file
1
3
4
2
3
5
2
which I will just name as 'bar.dat', and here is our script
reset
unset key
set style fill solid 1.0
set yrange [0:6]

colour = "#080000"
f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
w(n) = 0.8*cos(n/230.0*pi/2.0)

plot for [n=1:230:2] 'bar.dat' u 0:(f($1,n)):(w(n)) with boxes lc rgbcolor colour
Simple enough, let us see what it does! The first four lines are just the usual settings, although, the yrange is really irrelevant. I set it only for aesthetic reasons (otherwise, gnuplot would set the yrange automatically to [1:5] for the data file above, and we wouldn't see one of the columns). Then we define a variable called 'colour'm which we will eventually overwrite in our function definition of f(x,n). f(x,n) returns x, thus, in this regard it would be absolutely useless, but when doing so, it actually prints a string to 'colour'. The next function is w(n), which will determine in what fashion our colour will converge to white.
Finally, we plot the data file some 115 times, each time with a smaller, and shinier box. At the end, we get something like this

We can very easily change the direction of the light. All we have to do is define a new function that shifts the bars as we progress with our for loop. So, the new script could be something like this
reset
unset key
set style fill solid 1.0
set yrange [0:6]

colour = "#080000"
f(x,n) = (colour = sprintf("#%02X%02X%02X", 128+n/2, n, n), x)
w(n) = 0.8*cos(n/230.0*pi/2.0)
shift(x,n) = x-0.8*n/850.0

plot for [n=1:230:2] 'bar.dat' u (shift($0,n)):(f($1,n)):(w(n)) with boxes lc rgbcolor colour
with a result as in this graph
It should be really easy to modify the script to accommodate more data sets. Well, this is for now. I don't actually know what I will write about next time, but I am sure that there will be something!
Cheers,
Zoltán

Thursday, 25 February 2010

Parametric plot from a file II.

As I promised yesterday, we will take a closer look at the pie chart, once more, and see how we can utilise what we have learnt recently. I should point out here, that this is not the only way of plotting a pie from a file. If you feel like building your gnuplot from source, you can check out either the CVS tree, or the patch tracker, where you can find a patch that makes it possible to plot slices. You can see a demo here. But we will try a different route here.

First, here is our data file (it could be anything, really)
1 1 Dolphins
2 1 Whales
2 0 Sharks
3 0 Penguins
4 1 Kiwis
5 0 Tux

and here is our script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; 
eval(PLOT)

There is really nothing that we haven't discussed before: we set a couple of things at the beginning, but most importantly, the macro, and sum, ssum, and n. Then we define a string, PLOT, and two functions. One is to sum the values in our file (we need this, so that we can scale the full range of angles to two pi), and another one, that writes our PLOT command for later use. Note that the first plot in PLOT is actually empty, we plot 1/0. This seems a bit silly, doesn't it? Well, it does, but there is a good reason: successive plots must be separated by a comma, and if we have an empty plot at the very beginning, then we can put the commas before the plots, not after, and in this way, we needn't keep track of which plot we are actually processing. Remember, yesterday we used a separate counter, and an if statement, to determine, whether we need the comma, or not. This we can avoid here.
Next we call the two dummy plots, and finally, we evaluate our PLOT string. Oh, no! At the very end, we marvel in awe at the figure that we produced.

So far, so good, but what if we wanted to add labels, e.g., the value of the slice? That is really easy. All we have to do is to define a function that produces the label. Here is our updated script
reset
unset key; set border 0; unset tics; unset colorbox; set size 0.6,1
set urange [0:1]
set vrange [0:2*pi]
set macro
sum = 0.0
ssum = 0.0
n = 0
PLOT = "splot 0, 0, 1/0 with pm3d"
LABEL = ""

count(x) = (ssum = ssum + $1, 1)
g(x,y,n) = \
sprintf(", \
u*cos((%.2f+%.2f*v)/ssum), \
u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", x, y, x, y, n)

lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, sprintf("%2.f", x)), \
sum = sum+x, n = n + 1, x)

plot 'new_pie.dat' u 1:(count($1)), '' u 1:(f($1))

PL = "with pm3d"
set parametric; set pm3d map; set border 0; unset tics; unset colorbox;
set size 0.6,1
eval(LABEL)
eval(PLOT)
We have an addition string, LABEL, which we initialise with the value "". Then we define a function that prints "set label ..." with the proper positions, and finally, we insert this function in the definition of f(x). Of course, once we called f(x) in the plot, we have to evaluate the string LABEL. So, this is what we get

This script can trivially be modified to print strings that are stored in our file. Watch just the following two lines
...
lab(alpha, x) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, 1.2*cos(alpha), 1.2*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3)), \
sum = sum+x, n = n + 1, x)
...
and we are done. Here is the new pie

Now, you might wonder why we had three columns in our data file, if we didn't want to use it. Well, perhaps, we wanted, just haven't got time till now. So, what could we do with those ones and zeros in the second column? We will make the pie explode! It is really simple, we have to modify two lines in our last script

...
g(x,y,n,dx,dy) = \
sprintf(", \
%.2f+u*cos((%.2f+%.2f*v)/ssum), \
%.2f+u*sin((%.2f+%.2f*v)/ssum), \
%d @PL", 0.2*dx, x, y, 0.2*dy, x, y, n)

lab(alpha, x, r) = sprintf("set label \"%s\" at %.2f, %.2f centre; ", \
x, (1.25+0.2*r)*cos(alpha), (1.25+0.2*r)*sin(alpha))

f(x) = (PLOT = PLOT.g(2*pi*sum, x, n, \
$2*cos(2*pi*sum/ssum+pi*x/ssum), \
$2*sin(2*pi*sum/ssum+pi*x/ssum)), \
LABEL = LABEL.lab(2*pi*sum/ssum+pi*x/ssum, stringcolumn($3), $2), \
sum = sum+x, n = n + 1, x)
...
And here is the pie, when exploded

I should mention here that if you are not happy with the colours, it is really easy to help it: all we have to do is to modify the colour palette, using whatever colour combinations. We have covered a lot of material today. Till next time,
Gnuplotter

Wednesday, 24 February 2010

Plotting in 6 dimensions - parametric plot from a file

Today I would like to touch on a vast subject, so prepare for a long post. However, I hope that the post will be worthwhile, for I want to discuss something that cannot be done in any other way. In due course, we will see how we can use gnuplot to create parametric plots from a file. What I mean by that is the following: if you want to plot, say, 10 similar objects, whose size is determined by the first column in a file. Of course, there are cases, when one can manipulate the size, e.g., if there is a pre-defined symbol, we can use one of the columns in a file to determine the size of the symbol. As an example, we can do this
plot 'foo' u 1:2:3 with point pt 6 ps var
which will draw circles whose radius is given by the third column in 'foo'. This is all well, but it works for a limited number of cases only, namely, when there is a symbol to start out with. But what happens, if we want to draw an object that is not a symbol, e.g., arcs of a circle, whose angle is given by one of the columns in a file, or cylinders, whose height is a variable, read from a file. As you can guess from these two suggestions, what we will do is to draw a pie, and a bar chart. I understand that we have done this a couple of times before, but this time, we will stay entirely in the realm of gnuplot, and the scripts are really short. We just have to figure out what to write in the scripts. But beyond this, I will also show how we can plot in 6 dimensions. We will plot ellipses on a plane (first 2 columns), whose two axes are given by the 3rd and 4th axis, the orientation by the 5th, and the colour by the 6th. If you are really pressed for it, you can add three more dimensions: if you draw ellipsoids in 3D, take all three axes from a file, and also the orientation, that would make 9 dimensions altogether. Quite a lot!

So, let us get down to business! The first thing that I would like to discuss is the evaluate command. This is a really nifty way of shortening repetitive commands. Let us suppose that we want to place 10 arrows on our graph, and only the first coordinate of the arrows changes, otherwise everything is the same. Setting one arrow would read as follows
set arrow from 0, 0 to 1, 1
Of course, there are quite a few settings that we could specify, but this was supposed to be a minimal example. Then, the next arrow should be
set arrow from 1, 0 to 2, 1
and so on. What if we do not want to write this line a thousand times, and we do not want to search for the coordinate that we are to change, the first one, in this case? We could try the following
a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
This function takes 'x', and returns a string with all the settings and coordinates. So, we are almost done. The only thing we should do is to make gnuplot understand that what we want it to treat a(x) as a command, not as a string. Enter the eval command: it takes whatever string is presented to it, and turns it into a command. Thus, the following script creates 5 arrows, all parallel to each other, and consecutively shifted to the rigth
a(x) = sprintf("set arrow from %d, 0 to %d, 1", x, x+1)
eval a(0)
eval a(1)
eval a(2)
eval a(3)
eval a(4)
I believe, this is a much simpler and cleaner procedure, than this
set arrow from 0, 0 to 1, 1
set arrow from 1, 0 to 2, 1
set arrow from 2, 0 to 3, 1
set arrow from 3, 0 to 4, 1
set arrow from 4, 0 to 5, 1
I should mention here that if chunks of a command are the same, another method of abbreviating them is to use macros. Those are disabled by default, so first we have to set it. Then it works as follows
set macro
ST = "using 1:2 with lines lt 3 lw 3"
plot 'foo' @ST, 'bar' @ST 
i.e., the term @ST is expanded using the definition above, therefore, this plot is equivalent to this one
plot 'foo' using 1:2 with lines lt 3 lw 3, 'bar' using 1:2 with lines lt 3 lw 3
but the previous one is much more readable. I would also say that using capitals for the macros is probably not a bad idea, because then they cannot be mistaken for standard gnuplot commands. This much in the way of macros!

So, we have the evaluate command, and we have a new concept for functions. Then let us take a closer look at the following code
a(x) = sprintf("set arrow from %d, 0 to %d, 1;\n", x, x+1)
ARROW = ""
f(x) = (ARROW = ARROW.a(x), x)
plot 'foo' using 1:(f($1))
and let us suppose that our file 'foo' contains the following 5 lines
1
3
5
7
9
After plotting 'foo', the string 'ARROW' will be the following
set arrow from 1, 0 to 2, 1;
set arrow from 3, 0 to 4, 1;
set arrow from 5, 0 to 6, 1;
set arrow from 7, 0 to 8, 1;
set arrow from 9, 0 to 10, 1;
I.e., we have a string, which contains instructions for setting 5 arrow. If, at this point, we simply evaluate this string, all 5 arrows will be set. Therefore, we have found a way of using a file to set the coordinates of an arrow. (N.B., if it was for the arrows only, we wouldn't have had to do anything, since there is a plotting style, 'with vector', as we discussed some weeks ago.)

We will use this trick to create a parametric plot, taking parameter values from a file, first plotting the ellipses! Again, we have got to create some dummy data, and since we now need 6 columns, we will use the errorbars

reset
f(x) = rand(0)
set sample 50
set table 'ellipse.dat'
plot [0:10] '+' using (20*f($1)):(20*f($1)):(f($1)):(f($1)):(3.14*f($1)):(f($1)) w xyerror
unset table
which will produce 6 columns and 50 lines. Having produced some data, let us see what we can do with it. Here is our script:

PRINT(x, y, a, b, alpha, colour) = \
sprintf("%f+v*(%f*cos(u)*cos(%f)-%f*sin(u)*sin(%f)),
%f+v*(%f*cos(u)*sin(%f)+%f*sin(u)*cos(%f)),
%f with pm3d", x, a, alpha, b, alpha, y, b, alpha, b, alpha, colour)
PLOT = "splot "
num = -1
count(x) = (num = num+1, 1)
g(x) = (PLOT = PLOT.PRINT($1, $2, $3, $4, $5, $6), \
($0 < num ? PLOT=PLOT.sprintf(",\n") : 1/0))
plot 'ellipse.dat' u 1:(count($1))
plot 'ellipse.dat' using 1:(g($1))

unset key
set parametric
set urange [0:2*pi]
set vrange [0:1]
set pm3d map
set size 0.5, 1
eval(PLOT)

First, we have the definition of a print function that looks rather ugly, but is quite simple. We want to plot
a*v*cos(u), b*v*sin(u), colour
where a, and b are the axes of the ellipse, and colour is going to specify, well, its colour. However, we want to translate the ellipse to its proper position, and we also want to rotate it by an amount given by the 5th column, so we have to apply a two-dimensional rotation on the object. Therefore, we would end up with a function similar to this
x+v*(a*cos(u)*cos(alpha)-b*sin(u)*sin(alpha)), y + v*(a*cos(u)*sin(alpha)+b*sin(u)*cos(alpha)), colour
Now you know why that print function looked so complicated! After this, we define a string, PLOT, that we will expand as we read the file. But before that, we have to count the lines in the file. The reason for that is that successive plots must be separated by a comma, but there shouldn't be a comma after the last plot. So, we just have to know where to stop placing commas in our string. Then we define the function that does nothing useful, but concatenates the PLOT string as it reads the file. Here we use the number of lines that we determine in a dummy plot. At this point we are done with the functions, all we have to do is plotting.
First we count, then plot g(x). At this point, we have the string that we need. We only have to set up our plot. Remember, we have a parametric plot, where the range of one of the variables is in [0:2*pi], while the other one is in [0:1]. Easy. Then we just have to evaluate our plot string, and we are done. Look what we have made here: a six-dimensional plot!


I think that this script is much less complicated, than many that we have discussed in the past. Short and clear, thanks to the eval command, and the new concept of functions. Besides, we pulled off a trick that was impossible by other means. I started out saying that we will create bars and pie. I believe, having seen the trick, it should be quite simple now, but in case you insist on seeing it, I will discuss it in my next post.