Thursday, 3 December 2009

Defining some new plot styles

I have long wanted to write a blog post on this subject, but somehow, I never got the time to do it. But the time has come, and I will do it now. One of the plot styles that I actually like in origin (no matter what I think of the software in overall) is the one where the circumference of a circle is drawn in a colour different to that of the body of the circle, as in this graph


The only glitch is that this is not supported in gnuplot. Well, this is not completely true, because one can use a new prologue for the postscript output, but that works for postscript only, nothing else. What should we do then? There are two options: one has already been discussed, in connection with the bubble charts, sometime in early September. Nevertheless, that method is feasible only when the points do not overlap. What we did there was to plot the data twice (or however many times). But the problem is that those are actually two different plots, so if the adjacent circles overlap, we will not have what we want. Just try it, if you are still not convinced! The second option is that we keep reading. So, the question is then, how could we trick gnuplot into thinking that we have only one plot, not two. Well, the short answer is that we plot only one plot, not two, and we then make sure that points in the plot are coloured properly. I think it is time to show my script, that should make everything clear.

reset
set table 'two.dat'
plot [0:10] sin(x)+0.1*rand(0), cos(x)+0.1*rand(0)
unset table

f(x) = cos(x*pi)
set cbrange [-1:2]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000", 2 "#0000ff")

plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) index 0 \
with points pt 7 ps var lt palette title 'red', \
'' using 1:2:(2+f($0)/5.0):(f($0+1)*2) index 1 \
with points pt 7 ps var lt palette title 'blue'

As always, we generate some data. Note, that we plot sin(x) and cos(x), which will be in the same data file, but in two separate data blocks. This will become important later. What we need to know about data blocks is that in the data file they are separated by pairs of blank lines, and that we can ask gnuplot to plot only certain data blocks. We indicate our choice with the use of the index keyword. If you want to know more about this, issue the
?index
command, and look at the data file 'two.dat'!

OK, so we have some data, if a bit obscure at the moment. Next, we define a funny function. If you look closely, you will realise that f(x) takes the value of -1 for odd, and 1 for even numbers. In principle, any function should do here, but one has got to be a bit careful: those functions that take 1 or -1 at isolated points might not work. If you do not want to dive into the details of this, take my word for it that f(x) defined above will just be perfect for our purposes.

The next step is the definition of a colour palette with the colour range. (We take off the colour box, too, and set the border to back, but these are irrelevant nuances.) We use the palette for colouring our graph: in the first curve, every other point will be black and red, while in the second curve, every other point will be black and blue. Thus, I have already divulged my trick: we have to duplicate our data set in a way that every line is copied at its position, so, in 'two.dat' we will have something like this
...
 9.29293 -0.93128  i
 9.29293 -0.93128  i
 9.39394 -0.978266  i
 9.39394 -0.978266  i
 9.49495 -0.923256  i
 9.49495 -0.923256  i
...

For this we use an external gawk script, but it is really just one line, it could not be any simpler. Once we duplicated our data, we plot it, and colour every second point as black, and every second point as red. Also note that we use the index keyword to choose the data block that we need. If you have only one data block, you can skip this. But not only do we colour the points differently, but we also resize them: after all, if all had the same size, we would not see the ones that are plotted first. For all this machinery, we use the fact that when a 2D plot is given as 4 columns, the size of the points can be determined by the third column, while the fourth column can be assigned to take care of the colour. In this case, the colour is taken from the palette that we defined above. The general form of such a plot is
plot 'foo' using 1:2:3:4 with points pt 7 ps var lt palette
where 'var' signifies the variable point size (ps), while palette is the colour. Note that we use our snappy f(x) function to choose the size and the colour of every second point, simply by counting the ordinal number of the particular data record, $0: For even numbers, we set the size 2+1/5, and the colour to black (colour value of -1), while for odd numbers, the size is 2-1/5, and the colour is red (colour value of 0). If you have a single data, the whole script would be simply
f(x) = cos(x*pi)
set cbrange [-1:1]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000")

plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) \
with points pt 7 ps var lt palette title 'red'
Well, this is it for today. Next time I will try to show some of the new stuff in gnuplot 4.4. Cheers,
Zoltán

Tuesday, 1 December 2009

New version of gnuplot is released!

I have been waiting for this for a long time, but at long last, it has happened! The new version of gnuplot has been released with the designation 4.4. You can download the binary or the source code from the sourceforge repository. In the future, I will discuss the new things, and show what can be done with them.
Cheers,
Zoltán

Restricting fit parameters

Chris asked an interesting question today, namely, how one can restrict the fit range in gnuplot. What he meant by that was not the range of the data points (that is really easy, the syntax is the same as for plot), but the range of fit parameters. In some cases, it is a quite reasonable requirement, because we might know from somewhere that certain parameter values just do not make any sense. As it turns out, it is rather easy to achieve this in gnuplot. All we have to do is to come up with a function that restricts its values in the desired range.

After this interlude, let us see an example! We will create some data with the following gnuplot script:
reset
a=1.0; b=1.0; c=1.0
f(x) = a*exp(-(x-b)*(x-b)/c/c)
set table 'restrict.dat'
plot [-2:4] f(x)+0.1*(rand(0)-0.5)
unset table

We take a Gaussian, with some noise added to it. Naturally, we would like to fit a Gaussian to this data, and in particular, f(x). But what, if our model is such that 'a' must be in the range [1.1:2], 'b' must be in the range [0.1:0.9], and 'c' must be in the range [0.5:1.5]? We just use in our fit, instead of f(x), another function, g(x), say, of the form
g(x) = A(a)*exp(-(x-B(b))*(x-B(b))/C(c)/C(c))
where A(a), B(b), and C(c) take care of our restrictions. These functions are somewhat arbitrary, but for better or worse, I will take the following three arcus tangents
# Restrict a to the range of [1.1:2]
A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]
B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]
C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5
which would look like this

The point here is that as x runs from negative infinity to positive infinity, A(x) runs between 1.1, and 2.0, and likewise for B(x), and C(x). Then the fit goes on as it would normally. Our script is, thus, the following in its full glory:

# Restrict a to the range of [1.1:2]
A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]
B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]
C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5

a=0.0
b=0.5
c=0.9
fit f(x) 'restrict.dat' via a, b, c

g(x) = A(aa)*exp(-(x-B(bb))*(x-B(bb))/C(cc)/C(cc))
aa=1.5
bb=0.5
cc=0.9
fit g(x) 'restrict.dat' via aa, bb, cc

plot f(x), g(x), 'restrict.dat' w p pt 6

and it produces the following graph:

At this point, we should not forget, that what we are interested in is not the value of 'aa', 'bb', or 'cc', but the value of 'a', 'b', and 'c'. This means that what we have to take is
A(aa), B(bb), and C(cc). If you print the values of 'a', 'b', and 'c' from the fit to f(x), the value of 'aa', 'bb', and 'cc', and the value of A(aa), B(bb), and C(cc), we get the following results
gnuplot> pr a, b, c
0.984221984191135 0.996600824504231 1.00765240463672

gnuplot> pr aa, bb, cc
-3442408.91578921 1443864.45093385 -0.201236322474146

gnuplot> pr A(aa), B(bb), C(cc)
1.10000008322047 0.899999823634477 0.936788734177637

Obviously, the second print does not make too much sense, we have to compare the last one to the first one. We can here see that we got values in the ranges [1.1:2], [0.1:0.9], and [0.5:1.5], as we wanted to.

Sunday, 29 November 2009

Broken histograms

Sometimes, a histogram is just a bit awkward, for the simple reason that one or two values are extremely high compared to the rest of the graph. In the case of a standard graph, we would use a broken axis to bring all points to the same order of magnitude. We can play the same trick with histograms, in fact, it is, in some sense, even simpler, than the broken axes. All we have to do is to plot a thick line at the proper position in the proper colour. This is the graph that we are going to make today


Our data file, brokenhist.dat, is as follows
"Jan" 1 2
"Feb" 44 4
"Mar" 3 1
"Apr" 2 25
"May" 4 5
"June" 2 1
and here is our script:
reset
blue = "#babaff"
set xrange [-0.5:5.5]
set yrange [0:11]
set isosample 2, 100
set table 'brokenhist_b.dat'
splot 1-exp(-y/2.0)
unset table

unset colorbox
set border 1
set xtics rotate by 45 nomirror offset 0, -1
unset ytics
f(x) = (x < 6 ? x : (x < 30 ? x-17 : x-35) )
g(x) = (x < 6 ? 1/0 : 6)
set boxwidth 0.85
set style fill solid 0.8 border -1
set style data histograms
set palette defined (0 "#ffff   ff", 1 "#babaff")
plot 'brokenhist_b.dat' w ima,\
'brokenhist.dat' u (f($2)) t 'Red bars',\
'' u (f($3)) lc rgb "#00bb00" t 'Green bars', \
'' u 0:(f($2)):2 w labels center offset 0,0.5 t '',\
'' u ($0+0.25):(f($3)):3 w labels center offset 0,0.5 t '',\
'' u 0:(-1):xticlabel(1) w l t '', \
'' u ($0+0.12):(g($3)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t '', \
'' u ($0-0.12):(g($2)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t ''
OK, so let us look at the code! The first couple of lines are required only, if you want to have some posh background. Likewise, you can drop the 'unset colorbox' line, when you have a white background. We set only the bottom axis, which means that we have to unset the ytics and set to xtics to nomirror. Then we have two helper functions. The definitions of these depend on where you want to have the break point in the histogram. In this particular case, I took 6, but it is arbitrary.

In the next step, we set the properties of the histogram, like the width of the columns, the fill style, and the data style. We also define a palette, but this is needed for the background only. For white background, you can skip this step. You can also skip the first plot, because that is nothing but our fancy background.

The actual plotting begins after this. We plot the two sets of columns, and also plot the data file with labels. The labels are placed at the top of each column (this is why we could do away with the yaxis.) We also 'plot' the axis labels, and finally, plot the two break points. Note that the plotting of the break points is automatic, once we have the definitions of the two helper functions. If you want to have a steeper cut, you could

'' u ($0+0.12):(g($3)+0.12):(0.25):(0.5) w vectors lt -1 lc rgb blue lw 5 nohead t ''
e.g., which stretches the vectors in the vertical direction. Otherwise, we have finished the plot, there is nothing else to do.
I should point out here that, in order to have a seamless cut, we have to use a colour for the vectors that is identical to the background at that particular point. This implies that we could not have a gradient at y=6. The background colour is virtually constant at y=6 (c.f. the definition of 'brokenhist_b.dat'. While it would not be impossible to implement a cut over a gradient, I believe, it is probably not worth the trouble it involves.

Sunday, 22 November 2009

Update

I have had some time, so I moved all recent posts to their permanent place on
my web page
. I have "sexed up" the homepage a bit, so, hopefully, browsing will be a tad easier. Let me know, if there are any problems! (I know that there is a small glitch with the cascaded style sheets on IE6. IE8 should work without problems. Firefox 3.5 is also OK.) There is a zipped version of the complete site, if you want to read it off-line.
Comments should still be posted here, please!



Cheers,
Zoltán

Some basic statistics with gnuplot


In my previous post, I mentioned a patch that you can compile into gnuplot, and that should make plots with some statistical properties a bit easier. Now, the problem with that patch is that, if you don't want to, or can't take the trouble of compiling gnuplot for yourself, it is no use. However, for most things contained in the patch, there is a work-around that should function properly in gnuplot 4.2. I will discuss those now.

The first thing I did with the statistical patch was to plot the mean, minimum and maximum of a data set. This can easily be done in the following way.
reset
# Produce some dummy data
set sample 200
set table 'stats2.dat'
plot [0:10] 0.5+rand(0)
unset table

set yrange [0:2]
unset key

# Retrieve statistical properties
plot 'stats2.dat' u 1:2
min_y = GPVAL_DATA_Y_MIN
max_y = GPVAL_DATA_Y_MAX

f(x) = mean_y
fit f(x) 'stats2.dat' u 1:2 via mean_y

# Plotting the minimum and maximum ranges with a shaded background
set label 1 gprintf("Minimum = %g", min_y) at 2, min_y-0.2
set label 2 gprintf("Maximum = %g", max_y) at 2, max_y+0.2
set label 3 gprintf("Mean = %g", mean_y) at 2, max_y+0.35
plot min_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
max_y with filledcurves y1=mean_y lt 1 lc rgb "#bbddbb", \
'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

At the beginning of our script, we just produce some dummy data, and call a dummy plot. This plot does nothing but fills in the values of the minimum and maximum of the data set. Then we fit a constant function. You can convince yourself that this returns the average of the data set.

In the plotting section, we produce three labels that tell us something about the data set, and plot the data range with shaded region. Easy enough, and in just a couple of lines, we created this figure


Now, what should we do, if we were to calculate the standard deviation. Well, we know how to calculate the average, so we will use that. Here is the script:

reset
set sample 200
set table 'stats2.dat'
plot [0:10] 0.5+rand(0)
unset table

set yrange [0:2]
unset key
f(x) = mean_y
fit f(x) 'stats2.dat' u 1:2 via mean_y

stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

# Plotting the range of standard deviation with a shaded background
set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.2
set label 2 gprintf("Standard deviation = %g", stddev_y) at 2, min_y-0.35
plot mean_y-stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
mean_y+stddev_y with filledcurves y1=mean_y lt 1 lc rgb "#bbbbdd", \
mean_y w l lt 3, 'stats2.dat' u 1:2 w p pt 7 lt 1 ps 1

What we utilise here is the fact that the fit function also sets a couple of variables. One of them is the sum of the residuals, which is called FIT_WSSR, while another is the number of degrees of freedom, FIT_NDF. However, we know that the number of degrees of freedoms is one less, than the number of data points, for we fit a function with a single parameter. Therefore, if we take the square root of the sum of residuals divided by the number of degrees of freedom plus one, we get the standard deviation. The rest of the plot is trivial, and this script results in the following graph:

Incidentally, this can also be used for removing points that are very far from the mean. The following script takes out those data that are more than one standard deviation away from the mean.
reset
set sample 200
set table 'stats2.dat'
plot [0:10] 0.5+rand(0)
unset table

set yrange [0:2]
unset key
f(x) = mean_y
fit f(x) 'stats2.dat' u 1:2 via mean_y

stddev_y = sqrt(FIT_WSSR / (FIT_NDF + 1 ))

# Removing points based on the standard deviation
set label 1 gprintf("Mean = %g", mean_y) at 2, min_y-0.15
set label 2 gprintf("Sigma = %g", stddev_y) at 2, min_y-0.3
plot mean_y w l lt 3, mean_y+stddev_y w l lt 3, mean_y-stddev_y w l lt 3, \
'stats2.dat' u 1:(abs($2-mean_y) < stddev_y ? $2 : 1/0) w p pt 7 lt 1 ps 1
with the corresponding figure

Only the last line is relevant: we use the ternary operator to decide whether we want to keep the point: if the deviation from the mean is less, than the standard deviation, we hold on to our data, otherwise, we replace it by 1/0, which is undefined, and gnuplot quietly ignores it. If you want to learn more about the working of the ternary operator, check out my post on the plotting of an inequality.

We have, thus, already found a solution for two of the problems addressed in the patch. What about the third one, adding arrows to the plot at the position of the minimum or maximum, say. We can do that, too. Here is the script:

reset
set sample 50
set table 'stats1.dat'
plot [0:10] 0.5+rand(0)
unset table

set yrange [0:2]
unset key
plot 'stats1.dat' u 1:2
min_y = GPVAL_DATA_Y_MIN
max_y = GPVAL_DATA_Y_MAX

plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1
min_pos_x = GPVAL_DATA_Y_MIN
plot 'stats1.dat' u ($2 == max_y ? $2 : 1/0):1
max_pos_x = GPVAL_DATA_Y_MAX

# Automatically adding an arrow at a position that depends on the min/max
set arrow 1 from min_pos_x, min_y-0.2 to min_pos_x, min_y-0.02 lw 0.5
set arrow 2 from max_pos_x, max_y+0.2 to max_pos_x, max_y+0.02 lw 0.5
set label 1 'Minimum' at min_pos_x, min_y-0.3 centre
set label 2 'Maximum' at max_pos_x, max_y+0.3 centre
plot 'stats1.dat' u 1:2 w p pt 6

First, we retrieve the values of the minimum and the maximum by using a dummy plot. Having done that, we retrieve the positions of the minimum and maximum, by calling a dummy plot on the columns
plot 'stats1.dat' u ($2 == min_y ? $2 : 1/0):1

What this line does is substitute min_y, when the second column (whose minimum we extracted before) is equal to the minimum, and an undefined value, 1/0, otherwise. The minimum of this plot is nothing, but the x position of the first minimum. Likewise, had we assigned
min_pos_x = GPVAL_DATA_Y_MAX

that would have given the position of the last minimum of the data file. Obviously, these distinctions make sense only, if there are more than one minimum or maximum. Knowing the x and y positions of the minimum and maximum, we can easily set the arrows. We, thus, have the following figure


Adding labels showing the value should not be a problem now.

Well, this is for today. Till next time!

Monday, 9 November 2009

Patching gnuplot

One of the major advantages of open-source code is that if you would like to add some new features, you can easily do that. This applies to gnuplot, too, in fact, doing that does not require anything special. I have been quite inactive on this blog recently, and the reason is that Philipp Janert and I have been working on a patch to gnuplot.

The steps of patching gnuplot are described on gnuplot's main web page. There are a number of patches uploaded to gnuplot's patch tracker, on which quite a few new features, still in the development phase, are published. It is really worthwhile to try them out, first, to provide feedback as to what is useful and what is not, and second, to help the developers to find bugs and other glitches, like what the syntax of a command should be and so on.

Our patch is related to an old debate as to what gnuplot really is. At many a place, you will find the statement that "gnuplot is a plotting utility, not a statistical analysis package". I have nothing against this statement, however, when saying so, we have to tell what we mean by plotting. So, is plotting just placing a thousand dots at positions that represent our data? Or do we want more? E.g., throwing out data points that are unreasonably far from the mean. Or showing the mean, and the standard deviation? Or calling the reader's attentional to some special points, like the minimum or the maximum in a data set? And many similar things. I believe, plotting requires much more, than just showing the measurement data: a plot makes sense only, if we can point out what is to be pointed out. By the way, fitting falls into this category, and fitting has been an integral part of gnuplot for ages. The point being that the original statement (gnuplot is a plotting utility, not a statistical analysis package) has been wrong for a long time.

The patch that I mentioned above was announced yesterday on the gnuplot development mailing list and you can find the patch for the source and the
documentation on patch tracker. I have put a couple of examples on my gnuplot web site under patch. You can also find the full documentation.

I would like to ask you, if you feel crafty and you can, download the patch, and try it, and let us know whether you find it useful, what else, do you think, we could do with it and so on. It would really help the development. Once the patch makes it to the main code, I will discuss various option on these pages.

Just to wet your appetite, here is a figure that you could very easily make with the new patch. (You can find the code on my web site.)



Many cheers,
Zoltán

Friday, 16 October 2009

A long shadow is cast on the plot...

A couple of days ago, Cedric asked the question how we could add a shadow to a pm3d surface plot. In some sense, the problem turned out to be easier, than I had expected, but I am not sure that I did what he actually meant. Anyway, we could use it as our working hypothesis, and refine it, if necessary.

The idea of putting "phong" on the surface was discussed ages ago, and I won't re-open that question here. For the shadow, we will just replot our surface (defined as z(x,y)) in a little bit strange way: instead of letting the x and y run through their corresponding ranges, we will restrict y to be equal to the maximum of the yrange. By doing so, we get this
Here is our (short) script:
reset
unset colorbox; unset key
set iso 2, 50
set parametric; set urange [0:1]; set vrange [0:1.2]
set table 'tb.tab'
splot u, v, 1
unset table
unset parametric
set iso 100, 100
set xrange [-3:3]
set yrange [-3:3]
set zrange [0:1.2]
set table 't.tab'
splot exp(-x*x-y*y)+0.1*rand(0)
unset table
set ticslevel 0
set pm3d
set cbrange [0:4]
f(x,y,z,a,b,s) = z*(exp(-(x-a)*(x-a)/s-(y-b)*(y-b)/s)/3.0+0.66)
set palette defined (0 "#ff2222", 1 "#ffeeee", 2 "#aaaaaa", 3 "#2222ff", 4 "#8888ff")
splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d, \
'' u (-3):($1*6.0-3.0):2:($2+3.0) w pm3d, \
'' u ($1*6.0-3.0):($2*6.0-3.0):(0):(2.1) w pm3d, \
't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d,\
'' u 1:2:3:(f($1,$2,$3,-0.5,-0.5,.8)) w pm3d
First, we create the data that will be our background, then some dummy data, which in this particular case will be a noise Gaussian function in 2D. Then we move the surface to the bottom of the zrange, by setting the ticslevel to 0. The next step is the definition of the cbrange. We need to "overdefine" this, i.e., our cbrange is much larger, than the actual data range. The reason for this is that in this way, we can use the same colour palette, and we do not have resort to multiplot. The basic problem with multiplot is that we since we want to plot onto the same graph, we would have to re-set the border, tics, labels, and so on. By using the same plot, and same palette, we can avoid all this hassle. The price we pay is that our palette will be a bit more complicated, but we can live with that. Now, we have to define our palette, which will change between red and almost white for [0:1], and between blue, and almost white for [3:4]. We also defined a value at 2, which we will use for colouring the shadow, but the two main ranges are [0:1], and [3:4]. Note that we define disjoint ranges, thereby not walking into a trouble with the end points.
Finally, we plot our background, and function. Pay attention to plots like
splot 'tb.tab' u ($1*6.0-3.0):(3):2:($2+3.0) w pm3d
This will plot a plane between x [-3:3], and z [0:1] at y=3, with a colour given by the value of ($2+3.0). This is nothing but pushing all z values in the plot into the [3:4] colour range.

At the very end, we plot our function's shadow, namely,
't.tab' u ($1+1.0):(3):($3*0.7):(2.1) w pm3d
where the x values are shifted by 1.0 (so as to give the impression that the surface is lit from the (-1:-1) direction), the y values are all restricted to 3, which is the maximum of the yrange, and the z values are multiplied by 0.7, again for the same reason. Colouring is done by using one single value, 2.1. The very last step is to plot the surface itself, using the colouring given by f(...). If you want to have another direction for the lighting, or have a tighter focus, you should change the parameters in this function.

Tuesday, 13 October 2009

The shiny histograms, again

Do you remember those shiny histograms that we discussed some long, long time ago, at the beginning of the summer? And do you also remember how much of a hassle it was to create them, since we relied on an external gawk script, and we had to build our rectangles, one by one? And have you ever wondered what all that fuss was about and whether there was an easier solution? A one-liner, perhaps? If the answer to these questions lies in the affirmative, go no further! We will discuss a method of making those histograms, without an external script, only with legal gnuplot commands, and in 5 lines. I understand that 5 lines is just 4 lines longer, than one would expect from a one-liner, but on the other hand, three out of those 5 lines are equivalent, so I don't feel so bad about this any more:)
OK, so here is our figure

here is our data file, which we will call 'hist.dat'
1 2 3
2 2 2
4 10 3
5 1 4
5 6 2
and here are our scripts, 'hist.gnu',
reset
unset key; set xtics nomirror; set ytics nomirror; set border front;
div=1.1; bw = 0.9; h=1.0; BW=0.9; wd=10; LIMIT=255-wd; white = 0
red = "#080000"; green = "#000800"; blue = "#000008"
set auto x
set yrange [0:11]
set style data histogram
set style histogram cluster gap 1
set style fill solid
set boxwidth bw
set multiplot
plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
unset border; set xtics format " "; set ytics format " "; set ylabel " "
call 'hist_r.gnu'
unset multiplot
and 'hist_r.gnu'
bw=BW*cos(white/LIMIT*pi/2.0); set boxwidth bw; white=white+wd
red = sprintf("#%02X%02X%02X", 128+white/2, white, white);
green = sprintf("#%02X%02X%02X", white, 128+white/2, white);
blue = sprintf("#%02X%02X%02X", white, white, 128+white/2);
rep
if(white<LIMIT) reread

Then let us see what is happening here! At the beginning, we define various variables, most notably, white, red, green, and blue. The rest up to the first plot command is nothing but setting up the figure: we define the range, tell gnuplot to treat our data as histogram, set the width of the bars, and finally, set multiplot.
There is nothing exciting in the first plot, except, that we specify the colour of the bars as

plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
The strings red, green, and blue were defined at the beginning of our first script, thus, we learn here that gnuplot will accept any defined (and valid) string as the specifier of the colour. After our first plot, we unset the border, re-set the format of the xtic and ytic to empty, and do likewise with the ylabel. Should there be an xlabel, we should have to do the same there. Having done this, we call our second script, which we will dissect now. This is really nothing but a 'for' loop, that we have discussed a couple of times before. In fact, quite a few times. The first two commands re-set the widths of the bars in the next plot. Note that I set the width in such a way that it would draw the outline of a circle as we step through the values of white. This is what we increment next, mind you!

The next three lines are basically identical: we re-define the colours, using a sprintf command in each step. If you recall how the RGB colours are defined, we have to create a string that looks like
#00FF00
say. This is what our sprintf command will do, returning a string of this form that depends on the value of 'white'. There are some small nuances in the corresponding colour channel of red, green, and blue, respectively, namely, that the base colour for red was
#080000
and we want to linearly interpolate between this colour, and white,
#FFFFFF
so, we have to apply the relevant linear function, but there is nothing beyond this. Obviously, if you are unhappy with the colour scheme that I have (I know full well that these are not the best colours...), this is the place where you would have to tamper with the script. When we are done with re-defining the widths, and the colours, we simply replot our histogram, and do that, as long as the value of white is smaller, than the limit that we set at the beginning. In this particular case, 245. In most cases, we do not need this many plots, by the way! For a raster plot, 10-12 steps, for a vector format, something like 15-16 steps should be more than enough.

At the end, we shouldn't forget about unsetting the multiplot. The only difficulty that I see with this figure is that it is not so straightforward to use a key. However, it is not terribly hard to come up with a solution for this problem: all we have to do is to put three vertical labels on the top of the first, second, and third column, indicating what they represent.

Monday, 12 October 2009

Putting figures into perspective

As I promised yesterday, this time, we will try to skew our figure, as if it was in 3D, and we were looking at it from an angle. I must admit that this is something that one would not put in a publication, unless it is a poster or presentation, perhaps. In those cases, however, it might lend a refreshing new, errr.., perspective to our data. Before diving into the script, here is the figure, so you can decide whether you want to read on:)


Then, here is our script that was responsible for the figure
reset
xmin = 0; xmax = 10; zmin = -0.4; zmax = 0.75
set view 60, 30
unset key; unset colorbox
set border 1+16+128+1024
unset ytics; set xtics out nomirror; set ticslevel 0
set yrange [0:-0.1]
set zrange [zmin:zmax]
set grid front; unset grid
set xtics xmin,2,xmax-1
set ztics zmin,0.2,zmax
f(x) = exp(-x/4.0)*sin(x)
c(x) = exp(-(x-xmax/3.0)*(x-xmax/3.0)/1.0)
set xlabel 'Time [a.u.]'
set label 2 'Amplitude [a.u.]' at graph -0.35, 0.3 rotate by 90
set parametric
set iso 3, 3
set urange [xmin:xmax]
set vrange [zmin:zmax]
set table 'perspective1.dat'
splot u, 0, v
unset table
set urange [xmin+0.2:xmax-0.2]
set table 'perspective2.dat'
splot u, 0, f(u)
unset table
unset parametric
set size 1.4, 1.4
set palette defined (0 "#7b6cff", 1 "#eeeeff")
splot 'perspective1.dat' u 1:2:3:(c($1)) w pm3d, \
'perspective2.dat' u ($1+0.05):2:($3-0.02) w l lw 6 lc rgb "#888888", \
'' w l lw 6 lt 1


The trick that we use here is to plot in 3D, and take off all those elements of the figure that we do not actually need. In the beginning, we define a couple of variables, in order to make our life a bit easier. Then we unset the colorbox and the key, and set only those borders that we want to see. If you are interested in how I came up with those numbers in the definition of the border, just issue the command
?border

in the gnuplot prompt. After this, we keep unsetting things, and define our ranges. The next noteworthy command is
set grid front; unset grid

At first, this seems silly, but the reason is real: in the figure we have a background, which would obscure our tic marks. The way out of this problem is to push the tic marks to the front, which is achieved by setting the grid to the front. Since we do not actually need the grid, we unset it, but the setting, its front position, is still there. The tic marks inherit the position of the grid, and thus, they will be in the forefront.

The function definitions are f(x), the function that we want to plot, and c(x), which will determine the colouring of our background. Modify them accordingly.

We set the zlabel by hand, because it might not be possible to turn it by 90 degrees otherwise. (Not all terminals would support it.) Then we plot the background, and the function, both to a file, so that it will be easier to colour them in the next step, where we plot the real thing. Note that in the plot of the background, we use four columns, while there are only 3 in the data file. The fourth one determines the colour as the position on the graph. The colour is given by the function c(x), and the palette, which we defined one line earlier. You should change these two things, if you are not satisfied with the background you get. Finally, we plot the function twice. Once in gray, a bit shifted to the right and down, and for the second time, in red. In this way, we add a shadow to our curve. If you want to improve the shadow, you should look at my post from the 30th August

I should also add that we discussed a vertical skew. In case you want to skew the figure horizontally, all you have got to do is to plot on the x-y plain instead of x-z.

Sunday, 11 October 2009

Adding a variable grid to a plot

Useful as it is, sometimes the standard grid is not the best way to guide the eye when reading off values in a graph. Today I would like to discuss a couple of other options, none of them standard in gnuplot. Fortunately, we haven't got to leave the realm of gnuplot, everything can be done without any external scripts. To make things even more appealing, the solution requires only a couple of lines of code.

Our first attempt will be to "project" the values of a curve to the two axes, as in this figure


The data file, helpergrid.dat, contains only a couple of lines,
1 1
4 2
9 3
16 4
25 5
36 6

and the script reads as follows
reset
unset key
set xrange [0:40]
set yrange [0:7]
set xlabel 'Effort [a.u.]'
set ylabel 'Output [a.u.]'
p 'helpergrid.dat' u 1:2:(0):1 w xerror ps 0 lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:2 w i lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:2 w lp pt 7 ps 1.5

Drawing the vertical lines is easy, for there is a plotting style in gnuplot, impulse, which does just that. The horizontal ones are not that trivial, but take only one line of code: the idea is that we trick gnuplot into drawing the horizontal lines by something else. In this case, this something else is the error bar, in particular, the xerror bar. This is what happens in the first plotting line, after we set up the figure. The first plot calls our data file, and draws the errors that span from x=0 to x=x_data. We also specify the line type, 0, which will be dashed, the line width, and the line colour. The second plot draws the vertical lines, plotting our data with impulses. Again, we specify the line type, width, and colour, so that it conforms with the previous one. This is necessary, because, when left alone, gnuplot assigns a new line type to each new plot. Finally, we plot the data. This was simple.

I should add here that we can draw a "full" grid based on the data values. If we replace the plotting command by
p 'helpergrid.dat' u (40):2:(0):(40) w xerror ps 0 lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:(7) w i lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:2 w lp pt 7 ps 1.5

then the horizontal lines will be drawn between 0 and 40, while the vertical ones between 0 and 7. Now, it might happen that you don't know the xrange and yrange, in which case you can use the variables GPVAL_X_MIN, GPVAL_X_MAX, GPVAL_Y_MIN, GPVAL_Y_MAX to specify the plot. Of course, we have to call a dummy plot beforehand. We have discussed this a couple of times. If in doubt, look up the post on how to plot a martix with bars.

Well, we have cleared the first barrier, but what if we wanted to place the numbers where the data points are? This is quite simple: we only have to plot the file twice more, this time with xticlabel and yticlabel. So, here is our script
reset
unset key
set xrange [0:40]; set yrange [0:7]
set xlabel 'Effort [a.u.]'
set ylabel 'Output [a.u.]'
p 'helpergrid.dat' u 1:2:(0):1 w xerror ps 0 lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:2 w i lt 0 lw 0.5 lc rgb "#ff0000", \
'' u 1:2 w lp pt 7 ps 1.5, \
'' u 1:(0):xticlabel(1) w p ps 0, '' u (0):2:yticlabel(2) w p ps 0

and here is our figure



Note that in this case, if you want to produce a full grid (wall-to-wall, floor-to-ceiling), we can just set the grid, and forget about the error bars. The script for that case would be
reset
unset key
set xrange [0:40]; set yrange [0:7]
set xlabel 'Effort [a.u.]'
set ylabel 'Output [a.u.]'
set grid lt 0 lw 0.5 lc rgb "#ff0000"
p 'helpergrid.dat' u 1:2 w lp lt 3 pt 7 ps 1.5, \
'' u 1:(0):xticlabel(1) w p ps 0, '' u (0):2:yticlabel(2) w p ps 0

and this would result in this figure.




I think what we covered today was fairly simple, but still useful. Next time I will show how figures can be skewed, yet another way of making the reader perceive a 2D plot in 3D surroundings. Till then!

Saturday, 3 October 2009

More on histograms

Someone asked me whether it would be possible to make a histogram that looks like those that MS Office can create: filled with gradient, casting a shadow, and labelled according to their value. Frankly, I just couldn't admit defeat, and I had to figure out a way. But then, it turned out to be rather simple, so I thought that I would share it here. We are going to make this figure




from this data, called 'msbar.dat'
Max 1.0 1.0 1.2
Min 0.9 0.2 0.95
Avg 0.95 0.5 1.1


I have already discussed a way to beefing up the histograms (It was titled Shiny histograms, or something similar.), but I must say, that method is a bit convoluted. The reason for this is that we used a parametric plot. We can avoid that by plotting to a file first, and then plotting the file as many times as needed. We can't save the trouble of having to read our data file, but this is rather simple. We can do that either by employing a very primitive external script, or writing the script in gnuplot, as we discussed it in, say, the post on the recession graph. Given that we haven't got to process any data, this latter method is probably less desirous. Gnuplot is not for printing lines and the like, after all.

After the introduction, let us see the script!
reset
# First, the gradient for the bars
set xrange [0:1]; set yrange [0:1]; set isosample 2, 200
set table 'msbar_bar.dat'
splot y
unset table

# Then we make the shadow
set isosample 200, 50
set table 'msbar_bar_sh.dat'
splot (1.0-exp((x-1.0)*20.)) #*(1.0-exp((y-1.0)*20.0))
unset table

reset
unset key; unset colorbox
xw = 0.2; set boxwidth xw; sh = 0.03
gf(x) = x*x*x*x #change this, if a tighter gradient is needed
set cbrange [0:7]
set xrange [-0.5:2.5]
set yrange [0:1.4]
set grid ytics lw 0.5 lc rgb "#868686"
set xtics nomirror
set ylabel 'Performance [a.u.]'
set palette defined (0 '#e0e8f5', 1 '#31bd71', 2 '#e0e8f5', 3 '#d99795', 4 '#e0e8f5', 5 '#9ab5e4', 6 "#ffffff", 7 "#a2a2a2")
plot 'msbar.dat' u 0:(0):xticlabel(1) w l, \
'' u ($0-xw):($2+0.1):(stringcolumn(2)) w labels, \
'' u ($0):($3+0.1):(stringcolumn(3)) w labels, \
'' u ($0+xw):($4+0.1):(stringcolumn(4)) w labels, \
'msbar_bar_sh.dat' u ($1*xw-1.5*xw+sh):($2*1.0-sh):($3+6.0) w ima, \
'' u ($1*xw-.5*xw+sh):($2*1.0-sh):($3+6.0) w ima, \
'' u ($1*xw+.5*xw+sh):($2*1.2-sh):($3+6.0) w ima, \
'' u ($1*xw-1.5*xw+sh+1):($2*0.9-sh):($3+6.0) w ima, \
'' u ($1*xw-.5*xw+sh+1):($2*0.2-sh):($3+6.0) w ima, \
'' u ($1*xw+.5*xw+sh+1):($2*0.95-sh):($3+6.0) w ima, \
'' u ($1*xw-1.5*xw+sh+2):($2*0.95-sh):($3+6.0) w ima, \
'' u ($1*xw-.5*xw+sh+2):($2*0.5-sh):($3+6.0) w ima, \
'' u ($1*xw+.5*xw+sh+2):($2*1.1-sh):($3+6.0) w ima, \
'msbar_bar.dat' u ($1*xw-1.5*xw):($2*1.0):(gf($3)) w ima, \
'' u ($1*xw-.5*xw):($2*1.0):(gf($3)+2.0) w ima, \
'' u ($1*xw+.5*xw):($2*1.2):(gf($3)+4.0) w ima, \
'' u ($1*xw-1.5*xw+1):($2*0.9):(gf($3)) w ima, \
'' u ($1*xw-.5*xw+1):($2*0.2):(gf($3)+2.0) w ima, \
'' u ($1*xw+.5*xw+1):($2*0.95):(gf($3)+4.0) w ima, \
'' u ($1*xw-1.5*xw+2):($2*0.95):(gf($3)) w ima, \
'' u ($1*xw-.5*xw+2):($2*0.5):(gf($3)+2.0) w ima, \
'' u ($1*xw+.5*xw+2):($2*1.1):(gf($3)+4.0) w ima, \
'msbar.dat' u ($0-xw):2 w boxes lt -1 lw 0.5 lc rgb "#4f81bd", \
'' u ($0):3 w boxes lt -1 lw 0.5 lc rgb "#4f81bd", \
'' u ($0+xw):4 w boxes lt -1 lw 0.5 lc rgb "#4f81bd"


If you look at the graph, we have three different gradients, and the shadow. That makes four colour schemes altogether. Since we would like to save the difficulties associated with multiplot, we have to cram all those colour schemes into one colour range. More on this a bit later.

So, first we plot the gradient that will fill the bars, and then the "surface" that will represent our shadow. Then we set up our figure: we take off the keys, and unset the colour box. We also specify the width of our bars 'xw', and set the box width accordingly. This latter step is needed, because we want to have a border to the bars. The shadow shift, 'sh' is also defined here. In the next line, we set our colour range, in this particular case, between [0:7]. As we have pointed out, we need 4 colour ranges, and they should not overlap, therefore, we need 3 gaps between these ranges. For the sake of simplicity, we define the gap to be 1, and all the ranges to be one. This is why we end up with a total colour range of [0:7]. If you have more, or less bars to plot, you can re-define this range accordingly. The next couple of steps are trivial, up to the definition of the colour schemes. We want to have only simply gradients, thus, it is enough to define the colours at the end points. If you are unhappy with the colouring of your bars, you should change the colours here.

Having set up the figure, we can plot the data. First, we plot the labels for the xtics, and the values of the bars. Next comes the shadow. It might be a bit of an overkill for this figure, so you can skip all lines up to 'msbar_bar.dat'. Note that we simply go through all the data points in our file, and shift the shadow in each step. The height of the shadow is given by the product of the second column (which takes values between 0 and 1) of 'msbar_bar_sh.dat' and the particular data point in 'msbar.dat'. We also add a small downwards and rightwards shift to the shadows, lest they should be covered by the bars. The most important point, however, is that we plot the shadow as
'' u ($1*xw-.5*xw+sh):($2*1.0-sh):($3+6.0) w ima

i.e., the third column is shifted by 6. We do this, so that the shadow is pushed into the [6:7] range, where the shadow's colour scheme is defined. Plotting the bars happens in a similar fashion, the only difference is that we do not add 'sh' to the columns, and we push the values into the range of the appropriate colour range. At the very end, we plot the data file once more, this time with boxes, so that the bars can have a border to them.

If you want to use this plot many times, it might be worthwhile to implement it in a script in the language of your choice. The only thing required is printing lines and numbers. In pseudocode, it would look something like this:
for i
  for j
     d = read(datafile(i,j))
     print "'msbar_bar_sh.dat' u ($1*xw-(%d-1.5)*xw+sh):($2*%d-sh):($3+6.0) w ima, \", i, d
     print "'msbar_bar.dat' u ($1*xw-(%d-1.5)*xw):($2*%d):($3+%d) w ima, \", i, d, i
  end
end


Now, suppose that you want to add a legend to the figure. The usual way, setting the key, will not work here, for obvious reasons. However, we can easily add that to the figure. We just have got to find some space for it. Since there is no space left on the figure, we have to make some: we will use multiplot, and specify the size of the main figure to be less, than 1. Therefore, adding the following lines to our gnu script should produce the legend.
set multiplot
set size 0.8, 1
... (The main plot, identical to our original plot)
set origin 0.75, 0
unset border; unset xtics; unset ytics; unset ylabel; unset xlabel
set label 1 at first -0.15, 1.35 "Flatland"
set label 2 at first -0.15, 1.15 "Curvedland"
set label 3 at first -0.15, 0.95 "Land"
plot 'msbar_bar.dat' u ($1*xw-0.4):($2/10.0+1.3):(gf($3)) w ima, \
'msbar_bar.dat' u ($1*xw-0.4):($2/10.0+1.1):(gf($3)+2) w ima, \
'msbar_bar.dat' u ($1*xw-0.4):($2/10.0+0.9):(gf($3)+4) w ima

unset multiplot

The script above results in the figure below:

Friday, 2 October 2009

The turning of the histogram

I have been off for some time now, and I thought that it would be high time to post something on gnuplot. I have chosen an easy subject, but one that could turn out to be useful. At least, I have seen people searching for a solution to this problem with the histograms. If you needed it in the past, you have probably realised that gnuplot can make only vertical histograms. But sometimes, this is not the best way to represent data, and a horizontal version would be much better. For one thing, the horizontal one might take up much less space. If you want to learn how to produce the figure below, keep reading!



Making the graphs will require some work, but everything is quite straightforward. What we should note, however, is that we will still produce an upright image that we have to rotate later. If you set the terminal to postscript, you will probably use the figure in LaTeX, where you simply have to tell the compiler to rotate the figure by 90 degrees. If you set the terminal to a raster format, or print the screen, you can rotate the image in many applications, but if you want to adhere to the command line, you can, at least, under Linux, use the convert command as
convert -rotate 90 figure_in.png figure_out.png

The figure above was produced based on the following data file
1989 0.1
1990 0.2
1991 0.2
1992 0.05
1993 0.15
1994 0.3
1995 0.1

(We have seen these data many times, you should know it by heart by now!) After this interlude, let us see the code! I will discuss it afterwards.
reset
set key at graph 0.24, 0.85 horizontal samplen 0.1
set style data histogram
set style histogram cluster gap 1
set style fill solid border -1
set boxwidth 0.8
set xtic rotate by 90 scale 0
unset ytics
set y2tics rotate by 90
set yrange [0:0.35]; set xrange [-0.5:6.5]
set y2label 'Output' offset -2.5
set xlabel ' '
set size 0.6, 1
set label 1 'Year' at graph 0.5, -0.1 centre rotate by 180
set label 2 'Nowhere' at graph 0.09, 0.85 left rotate by 90
set label 3 'Everywhere' at graph 0.2, 0.85 left rotate by 90
p 'pie.dat' u 2 title ' ', '' u ($2/2.0+rand(0)/10.0) title ' ', '' u 0:(0):xticlabel(1) w l title ''

The first line after the reset is required, because we have to make our key ourselves. We simply specify the coordinates and that that we want to have a horizontal key (i.e., one in which the keys are placed to the right of the previous one), with a length of 0.1. If you want smaller or larger space between the keys, you can modify this, by adding the spacing flag to the set command. You can see the exact syntax by issuing ?key in the gnuplot prompt.
In the next four lines, we set up our histogram. The content of these lines might depend on what exactly we intend to plot. For more on this, check out the gnuplot demo page! We then rotate the xtics by 90 degrees. We do this, because the whole image will be rotated, and by rotating the xtics, we make sure that those will be horizontal at the end. For the very same reason, we also unset the ytics, and set the y2tics. We also set the y2label and xlabel. This latter one is empty, but we still need the space for it, so we set it to ' '. Beware the white space!

The next important step is the setting of the aspect ratio of our figure, which will be 0.6:1. Having done that, we introduce 3 labels: one for the xlabel, and two for the key. The placement of these labels is somewhat arbitrary, and depends on the particular terminal that you use. But only these three numbers and the coordinates of the key need any tweaking, really. At the very end, we plot our data. I plotted the second column twice, with an added random numbers, so that we can have two columns at each data point. Note that we plot the first column, too, but pass it to the xlabel command. By doing that, we can automate the labelling of the xtics, using the first column in our data file. Also note the specification of the titles: in the first two cases, the single quotes include a white space, while in the third one, there is nothing.

Tuesday, 15 September 2009

The impossible gnuplot graphs

This post is going to be a short one. I would only like to announce that most of the material presented here is collected in a more orderly fashion on my web page .



Comments should still be posted here.
Cheers!

Sunday, 13 September 2009

The slice of the pie II.

Yesterday, I tried my hand at an exploded 3D pie chart. While we can use that method in particular cases, it is not "fool-proof": we might have to set the parameters by hand, should something cover something that it wasn't supposed to cover. Obviously, this situation is not ideal, even a bit disturbing, so I began to think what we could do differently.

The problem yesterday was that we used multiplot, and multiplot does not respect the depth ordering, i.e., we cannot let gnuplot determine the distance of the elements from the viewer, for it would miserably fail, simply because those elements are not parts of the same plot. Therefore, this whole affair could be much easier, if we were able to manage to put everything, or at least, the crucial pieces in one plot. Well, I have done the hard work, you have just got to read on to see the magic trick. I will start out with a particular example, but then I will show how this can be automated, so you just have to give the file name, and the rest will be done by the script. But, just to wet your appetite, here is the figure



Now, let us get down to business. We have six data points, which, for the sake of simplicity, are called
A=0.05*2*pi; B=0.3*2*pi; C=0.4*2*pi; D=0.1*2*pi; E=0.1*2*pi; FF=0.05*2*pi;

The last one is not really relevant, for the sum of these numbers is 2 pi(e), and FF is not used anywhere at all. If you recall, the reason for having to use multiplot was that when we put the pieces together, we changed the palette after each step, so the next piece, by default, had to be in a separate plot, and then we couldn't use depth ordering any more. Since we need different colours for the slices, the only way out of the loophole that I mentioned above is that we plot everything into a file, and then plot the file. When doing so, we need to specify the colour by a separate function, but that is really easy. After this introduction, let us see the script, which I will discuss further.
reset
b=0.5; a=0.5; r=1.0; s=0.1; m=1.5; eps=1e-4; N=6
A=0.05*2*pi; B=0.3*2*pi; C=0.4*2*pi; D=0.1*2*pi; E=0.1*2*pi; FF=0.05*2*pi

f(x,n) = (x>n?0.0:1.0)
F(x) = 1+f(x,A-eps)+f(x,A+B)+f(x,A+B+C)+f(x,A+B+C+D)+f(x,A+B+C+D+E)
at(y,x) = (x==0.0?0:(atan2(y,x)>0.0?F(atan2(y,x)):F(atan2(y,x)+2*pi)))

c(u,v,q)=cos(u)*r*v+q*cos(A/2); s(u,v,q)=sin(u)*r*v+q*sin(A/2)
C(u, q) = cos(u)*r+q*cos(A/2); S(u,q) = sin(u)*r+q*sin(A/2)
z = s+a; Z(v) = s+a*v

rg(x) = abs(cos(100*x/7))
gg(x) = abs(cos(100*x/11-pi/2))
bg(x) = abs(cos(100*x/13+pi/2))

set view 30, 20; set parametric
unset border; unset tics; unset key; unset colorbox; set ticslevel 0
set pm3d depthorder; set pal maxcolor N+1
set vr [0:1]; set xr [-1.5:1.5]; set yr [-1.5:1.5]; set zr [0:2]; set cbr [0:2*pi]

set multiplot
set iso 2, 2
set table 'pieslice2.dat'
set ur [A+eps:2*pi]
splot C(u,-eps), S(u,-eps), Z(v)
set ur [0+eps:A-eps]
splot C(u,b), S(u,b), Z(v)
set ur [0:1]
splot u*r, 0, Z(v), u*r*cos(A), u*r*sin(A), Z(v), \
u*r+b*cos(A/2), b*sin(A/2), Z(v), u*r*cos(A)+b*cos(A/2), u*r*sin(A)+b*sin(A/2), Z(v)
unset table

set iso 2, 80
set table 'pieslice3.dat'
set ur [A+eps:2*pi]
splot c(u,v,0-eps), s(u,v,0-eps), z
set iso 2, 2
set ur [0:A]
splot c(u,v,b), s(u,v,b), z
unset table
unset multiplot

set multiplot
set pal functions rg(gray)/m, gg(gray)/m, bg(gray)/m
sp 'pieslice2.dat' u 1:2:3:(at($2,$1)) w pm3d

set pal functions rg(gray), gg(gray), bg(gray)
splot 'pieslice3.dat' u 1:2:3:(at($2,$1)) w pm3d
unset multiplot


OK, so let us see what is happening here! In the first line, we define a couple of things that will determine the look-out of our figure. 'b' will be the shift of the cut-out, 'a' is the pie's height, 'r' is its radius, 'm' will determine the shade of the sides, with respect to the top, and 'eps' is just a small number whose role will become clear in a second. And finally, 'N' is the number of our data points. Then come the data points. After this comes the heavy part. Literally.

f(x,n) is a Heaviside function, which we use in the colour scheme. Remember that we want to colour the pie according to a function that depends on the azimuth angle. We will have a number of colours, and change the pm3d colour whenever the angle crosses the next value in our data set. If you look closely, our F(x) increases by one at each such point. (This is why we needed the Heaviside function.) That is, we could use F(angle) to colour our plot! However, there is a small glitch: since we are going to plot into a file first, we lose the information on the angle, and we will have to undo the polar-Cartesian transformation. This is why we need some distorted version of the atan2 functions. With the definition given here, we can make sure that it returns values in [0:2 PI], and there isn't a phase jump at (-1, 0).

Having defined these 3 functions, we define our shapes. These are identical (with the exception of the trivial third/second variable, 'q') to those shown yesterday, so I will not discuss them here. The next three lines will determine the colour scheme. If you are unhappy with the pie that you get, you should change these lines here.

The next 4 lines define some properties of the image. There are only two things to watch out for: one is that we fix the number of colours in our palette, namely, 7. The second is that we fix the range of the palette: this is done by the
set cbr [0:2*pi]

where cbr stands for cbrange.

By now, we have set everything, so we are ready to plot, first to files. We will have two files, pieslice2.dat and pieslice3.dat. The first one will hold the side of the pies, while the second is the top. The only reason for putting them in two separate files is that we want darker colours for the sides. These plots are trivial, and should speak for themselves. The only exception is the shift by 'eps'. We did this, so that the atan2 function is everywhere defined. Otherwise, we would have points at (x,y) = (0,0), where the atan2 function would return an undefined value, and that would just give us a hole in the plot. Also note that we have set the number of isosamples. We have only 2, where high resolution is not required, and set it to larger numbers only where needed. This improves on speed, but more importantly, reduces the size of the file considerably, if redirected to some vector graphics format, e.g., eps, or pdf. Believe me, this really makes a lot of difference!

At this point, we have all data in two files, so we can simply plot them. We plot them separately, because we want to colour them differently, so we have to change the palette. But these steps are really straighforward. The pie is there, if you want to put a background, labels, etc., you can do it here.

At the beginning, I mentioned that this whole business can be done automatically. If you look at the script above, you will notice that the variables and all related functions are defined at the beginning: N was the number of samples, then we had the data points, and finally, F(x) depends on the data. So, if we have a script that prints all these into a file, we can load that file there, and the rest is unchanged. The improved version would look like this:

reset
b=0.5; a=0.5; r=1.0; s=0.1; m=1.5; eps=1e-4

f(x,n) = (x>n?0.0:1.0)
load 'pie_l.gnu'
at(y,x) = (x==0.0?0:(atan2(y,x)>0.0?F(atan2(y,x)):F(atan2(y,x)+2*pi)))

c(u,v,q)=cos(u)*r*v+q*cos(A/2); s(u,v,q)=sin(u)*r*v+q*sin(A/2)
C(u, q) = cos(u)*r+q*cos(A/2); S(u,q) = sin(u)*r+q*sin(A/2)
z = s+a; Z(v) = s+a*v

rg(x) = abs(cos(100*x/7))
gg(x) = abs(cos(100*x/11-pi/2))
bg(x) = abs(cos(100*x/13+pi/2))

set view 30, 20; set parametric
unset border; unset tics; unset key; unset colorbox; set ticslevel 0
set pm3d depthorder; set pal maxcolor N+1
set vr [0:1]; set xr [-1.5:1.5]; set yr [-1.5:1.5]; set zr [0:2]; set cbr [0:2*pi]

set multiplot
set iso 2, 2
set table 'pieslice2.dat'
set ur [A+eps:2*pi]
splot C(u,-eps), S(u,-eps), Z(v)
set ur [0+eps:A-eps]
splot C(u,b), S(u,b), Z(v)
set ur [0:1]
splot u*r, 0, Z(v), u*r*cos(A), u*r*sin(A), Z(v), \
u*r+b*cos(A/2), b*sin(A/2), Z(v), u*r*cos(A)+b*cos(A/2), u*r*sin(A)+b*sin(A/2), Z(v)
unset table

set iso 2, 80
set table 'pieslice3.dat'
set ur [A+eps:2*pi]
splot c(u,v,0-eps), s(u,v,0-eps), z
set iso 2, 2
set ur [0:A]
splot c(u,v,b), s(u,v,b), z
unset table
unset multiplot

set multiplot
set pal functions rg(gray)/m, gg(gray)/m, bg(gray)/m
sp 'pieslice2.dat' u 1:2:3:(at($2,$1)) w pm3d

set pal functions rg(gray), gg(gray), bg(gray)
splot 'pieslice3.dat' u 1:2:3:(at($2,$1)) w pm3d
unset multiplot

where 'pie_l.gnu' contains the following two lines

N=6
F(x) = 1+f(x,0.314)+f(x,2.199)+f(x,4.712)+f(x,5.34)+f(x,5.969)

and nothing else. At this point, we can either let a script write these numbers to 'pie_l.gnu', or just print it to the standard output, and re-direct that output to gnuplot as
load '< somescript somedata.dat'


The following gawk script should do the job, provided that you want to process the first column in your data file. The script also normalises, so arbitrary numbers can be used.
#!/bin/bash

gawk 'BEGIN {sum=0.0; i=0}
    {       a[i] = sum+$1
            sum += $1
            i++
    }
    END {   printf "N=%d\n", i
            printf "F(x) = 1"
            for(j=0;j<i-1;j++) printf "+f(x,%f)", 6.28318530717959*a[j]/sum
            printf "\n"
    }' $1


This was a long post, but I hope that it is more or less clear what we are doing, and after all, it is not hard at all.
Cheers!

Saturday, 12 September 2009

The slice of the pie

I have already shown various ways of making a pie chart in gnuplot. We will brush off one of the methods, and see how we can move one of the slices a bit. At the end of the day, we will have the following graph



For the sake of simplicity, I will demonstrate the method for four data points, which we will call
A=0.2*2*pi; B=0.3*2*pi; C=0.4*2*pi; D=0.1*2*pi;

In this particular case, they add up to 2 pi(e);) Once we see what and how to do, we can drop this condition, and use any data. But first, let us see the script
reset
b=0.5; a=0.5; r=1.0; s=0.1; m=1.5
c(u,v)=cos(u)*r*v+BB*cos(A/2); s(u,v)=sin(u)*r*v+BB*sin(A/2)
C(u) = cos(u)*r+BB*cos(A/2); S(u) = sin(u)*r+BB*sin(A/2)
z = s+a; Z(v) = s+a*v
A=0.2*2*pi; B=0.3*2*pi; C=0.4*2*pi; D=0.1*2*pi;
set view 30, 20; set parametric
unset border; unset tics; unset key; unset colorbox
set ticslevel 0
set vr [0:1]; set xr [-1.8:1.8]; set yr [-1.8:1.8]; set zr [0:2]
set multiplot
set ur [0:1]; set pal mo RGB func 0.8, 0.8, 0.8
splot 3.6*u-1.8, 3.6*v-1.8, 0 w pm3d

BB=b
set ur [0:r]; set pal mo RGB func 1/m, 0, 0
splot BB*cos(A/2)+cos(A)*u, BB*sin(A/2)+sin(A)*u, Z(v) w pm3d
splot BB*cos(A/2)+u, BB*sin(A/2), Z(v) w pm3d

set ur [0:A]; set pal mo RGB func 1/m, 0, 0
splot C(u), S(u), Z(v) w pm3d
set ur [A:A+B]; set pal mo RGB func 0, 1/m, 0; BB=0.0; rep
set ur [A+B:A+B+C]; set pal mo RGB func 0, 0, 1/m; rep
set ur [A+B+C:A+B+C+D]; set pal mo RGB func 1/m, 1/m, 0; rep
set ur [0:r]; set pal mo RGB func 0, 1/m, 0
splot cos(A)*u, sin(A)*u, v*a+s w pm3d

BB=b
set ur [0:A]; set pal mo RGB func 1, 0, 0
splot c(u,v), s(u,v), z w pm3d
set ur [A:A+B]; set pal mo RGB func 0, 1, 0; BB=0.0; rep
set ur [A+B:A+B+C]; set pal mo RGB func 0, 0, 1; rep
set ur [A+B+C:A+B+C+D]; set pal mo RGB func 1, 1, 0; rep
unset multiplot


At the beginning, we define several plot-related constants, such as, 'b', which gives the displacement of the red slice, a, which is the height of the pie, r, which is the radius or the arcs, s, which determines by how much the pie is lifted from ground, and m, which we will use in the colouring: the sides of the slices will be m times darker, than the top, so as to give the impression that the top is lit, while the sides are in the shadow.

Then we define 6 functions, which are just helpers, so that the rest of the code will look tidier. These are basically the parametrisation of the sides and the tops. Having done this, we define A, B, C, and D, our data points, then we set a couple of things on the figure, and in the multiplot, plot the "ground".

Once we have the ground on which to build, we plot the sides of the pie. Note that since we have two cuts, we would, in principle, need 4 new surfaces. However, one can never be seen, so we can just drop that altogether. Then we plot the cylinder, each segment in a different colour. Since those plots are actually the same, except for the range that we re-set every time, we can simply replot everything. This makes the code a bit cleaner, I believe. Also note that the colours are divided by m, which we defined at the beginning. Finally, we put on the cap, in a brighter colour. Adding the labels, if needed, should be quite straightforward. You can look up the details in my other pie chart posts.

The only trick that we used here was that we shifted the cut-out slice by such an amount that only one surface was partially covered by the rest of the pie. In general, it is quite hard to determine which surface will cover which, if we let the cut-out closer, so I cheated here a bit.

Thursday, 3 September 2009

Bubble graphs in gnuplot, once more

We have already discussed two ways of making a bubble plot. Now we will see a third one, which is probably the best of these, for it doesn't have the overhead of a) hacking the postscript file, b) plotting something with pm3d. This is so simple! Here is my graph




and here is the code
reset
set sample 30
set table 'bubble.dat'
p [0:5] sin(x*3.0)*exp(-x)+rand(0)/10.0
unset table

xs=-0.007
ys=0.002
unset key
set object 1 rect from screen 0,0 to screen 1,1 behind fillcolor rgb "#ddddff"
set border back
set xlabel 'Time [s]'
set ylabel 'Position [m]'
p [-0.1:5] 'bubble.dat' u 1:2 w p ps 3 pt 7 lc rgb "#ff0000", \
'' u ($1+xs):($2+ys) w p ps 2.6 pt 7 lc rgb "#ff2222", \
'' u ($1+xs):($2+ys) w p ps 2.2 pt 7 lc rgb "#ff4444", \
'' u ($1+xs):($2+ys) w p ps 1.8 pt 7 lc rgb "#ff6666", \
'' u ($1+xs):($2+ys) w p ps 1.4 pt 7 lc rgb "#ff8888", \
'' u ($1+xs):($2+ys) w p ps 1.0 pt 7 lc rgb "#ffaaaa", \
'' u ($1+xs):($2+ys) w p ps 0.6 pt 7 lc rgb "#ffcccc", \
'' u ($1+xs):($2+ys) w p ps 0.2 pt 7 lc rgb "#ffeeee"


The red parts are really irrelevant: the first one just produces some dummy data, we have seen this so many times that it is becoming boring. Then we define two variables (blue) 'xs' and 'ys', and draw a rectangle, just to have some background. In the last 8 lines we plot our data 8 times, each time in a different colour (the colour successively becomes whiter) and in different size (the points successively become smaller). At the same time, the centre of the circles is shifted by 'xs' and 'ys', so that we give the impression that the spheres are lit from the top left corner. By modifying these values, you can move your virtual light source.

The are only three caveats here: one is that since we apply a shift to the data points, we have got to make sure that our xrange and yrange supports the shifter data. This is why, while we have data in [0:5], our xrange is actually [-0.1:5]. The yrange does not require special attention in this case.

The second caveat is that it is perfect for a raster image, it will look quite nasty on a vector image. If you plan on printing the graph through postscript or pdf, you will have to define more levels for the transition from red to white. But the idea is the same, you will simply have more plots.

And third, when you define your 'xs' and 'ys', you have to make sure that the successive plots fall on the first one, otherwise your bubbles will have some funny shape.

Monday, 31 August 2009

Filled curves in gnuplot

Gnuplot has a plotting option called filledcurve. I do not want to dwell on the specifics of this, because I think that most about it can be learned by looking at the demo page. What I would like to discuss instead is how this can be used to turn our graph into a sleek figure. This is not something that you would put in a paper publication, on the other hand, on a poster, a web page or similar, it looks much better, than what you would get using the default. I just show the image, and we will discuss the script later.




reset
f(x) = x+20
g(x) = x*x*x-2.0*x*x+x+10-x*x*x*x*0.01
set sample 101
set parametric
set trange [0:4]
set table 'filled.tab'
plot g(t)+5.0*(0.5-rand(0)), f(t)+5*(0.5-rand(0))
unset table
unset parametric
set iso 2, 100
set xrange [0:100]; set yrange [0:50]
set table 'filledbg.tab'
splot y*0.5
unset table

set key left
unset colorbox
set border lc rgb "white"
set grid front lc rgb "#dddddd"
set xlabel 'Time [a.u.]' tc rgb "white"
set ylabel 'Amplitude [a.u.]' tc rgb "white"
set palette defined (0 0.2 0.2 0.2, 1 0.6 0.6 0.6)
set object 1 rect from screen 0, 0 to screen 1, 1 behind fc rgb "black" fillstyle solid 1.0
p 'filledbg.tab' w image t '', \
'filled.tab' u 0:1:2 w filledcurve above lt 1 lc rgb "#467F1E" t '', \
'' u 0:1:2 w filledcurve below lt 1 lc rgb "#802020" t '', \
'' u 0:1 w l lt 8 lw 1 t 'First', '' u 0:2 w l lt 9 lw 1 t 'Second'


The first part of the code is nothing but generating some dummy data. If you have something to plot, this is unnecessary, and the actual plot takes only 12 lines altogether. However, if you want to have a gradient in the background, you will have to generate 'filledbg.tab'. This is the file that we will plot as an image, and this is what will give the gradient. I would also point out here that in order to generate 'filledbg.tab', we need the range of the actual plot. If you don't know it beforehand, you can use a trick to determine this. I have discussed this at great length in a few of my previous posts, so you could look it up there.

Now, sometimes you do need a background for the plot, and this is not just some silly whim, but you might have to match the background of your poster, or your transparency. In order to achieve this, we will draw a rectangle behind the graph, and fill it with the desired colour. What we should keep in mind, however, is that without taking extra measures, the axis labels will have the default colour of black, and this will not look very good on a black background. Therefore, you should specify the colour in the declaration of the labels.
In the plot, first we draw the gradient. Without defining the grid in the front, the grid would be obscured, therefore, we push it to the forefront. The real curves are drawn four times: first we draw the filled curves, simply using three columns of our data file. (One column can be dropped, in which case an axis or a function could be used to define the boundaries.) If the two curves cross each other, we could have different colours, depending on the relative size of the functions that we plot. This is why we have to plot our curves for the second time: we have different colour for when the first curve is above and below the second one. The last two plot are just there to draw the actual curves in different colours.