Thursday, 3 December 2009

Defining some new plot styles

I have long wanted to write a blog post on this subject, but somehow, I never got the time to do it. But the time has come, and I will do it now. One of the plot styles that I actually like in origin (no matter what I think of the software in overall) is the one where the circumference of a circle is drawn in a colour different to that of the body of the circle, as in this graph

The only glitch is that this is not supported in gnuplot. Well, this is not completely true, because one can use a new prologue for the postscript output, but that works for postscript only, nothing else. What should we do then? There are two options: one has already been discussed, in connection with the bubble charts, sometime in early September. Nevertheless, that method is feasible only when the points do not overlap. What we did there was to plot the data twice (or however many times). But the problem is that those are actually two different plots, so if the adjacent circles overlap, we will not have what we want. Just try it, if you are still not convinced! The second option is that we keep reading. So, the question is then, how could we trick gnuplot into thinking that we have only one plot, not two. Well, the short answer is that we plot only one plot, not two, and we then make sure that points in the plot are coloured properly. I think it is time to show my script, that should make everything clear.

```reset
set table 'two.dat'
plot [0:10] sin(x)+0.1*rand(0), cos(x)+0.1*rand(0)
unset table

f(x) = cos(x*pi)
set cbrange [-1:2]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000", 2 "#0000ff")

plot "< gawk '{print \$0; print \$0}' two.dat" using 1:2:(2+f(\$0)/5.0):(f(\$0+1)) index 0 \
with points pt 7 ps var lt palette title 'red', \
'' using 1:2:(2+f(\$0)/5.0):(f(\$0+1)*2) index 1 \
with points pt 7 ps var lt palette title 'blue'
```

As always, we generate some data. Note, that we plot sin(x) and cos(x), which will be in the same data file, but in two separate data blocks. This will become important later. What we need to know about data blocks is that in the data file they are separated by pairs of blank lines, and that we can ask gnuplot to plot only certain data blocks. We indicate our choice with the use of the index keyword. If you want to know more about this, issue the
```?index
```
command, and look at the data file 'two.dat'!

OK, so we have some data, if a bit obscure at the moment. Next, we define a funny function. If you look closely, you will realise that f(x) takes the value of -1 for odd, and 1 for even numbers. In principle, any function should do here, but one has got to be a bit careful: those functions that take 1 or -1 at isolated points might not work. If you do not want to dive into the details of this, take my word for it that f(x) defined above will just be perfect for our purposes.

The next step is the definition of a colour palette with the colour range. (We take off the colour box, too, and set the border to back, but these are irrelevant nuances.) We use the palette for colouring our graph: in the first curve, every other point will be black and red, while in the second curve, every other point will be black and blue. Thus, I have already divulged my trick: we have to duplicate our data set in a way that every line is copied at its position, so, in 'two.dat' we will have something like this
```...
9.29293 -0.93128  i
9.29293 -0.93128  i
9.39394 -0.978266  i
9.39394 -0.978266  i
9.49495 -0.923256  i
9.49495 -0.923256  i
...
```

For this we use an external gawk script, but it is really just one line, it could not be any simpler. Once we duplicated our data, we plot it, and colour every second point as black, and every second point as red. Also note that we use the index keyword to choose the data block that we need. If you have only one data block, you can skip this. But not only do we colour the points differently, but we also resize them: after all, if all had the same size, we would not see the ones that are plotted first. For all this machinery, we use the fact that when a 2D plot is given as 4 columns, the size of the points can be determined by the third column, while the fourth column can be assigned to take care of the colour. In this case, the colour is taken from the palette that we defined above. The general form of such a plot is
```plot 'foo' using 1:2:3:4 with points pt 7 ps var lt palette
```
where 'var' signifies the variable point size (ps), while palette is the colour. Note that we use our snappy f(x) function to choose the size and the colour of every second point, simply by counting the ordinal number of the particular data record, \$0: For even numbers, we set the size 2+1/5, and the colour to black (colour value of -1), while for odd numbers, the size is 2-1/5, and the colour is red (colour value of 0). If you have a single data, the whole script would be simply
```f(x) = cos(x*pi)
set cbrange [-1:1]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000")

plot "< gawk '{print \$0; print \$0}' two.dat" using 1:2:(2+f(\$0)/5.0):(f(\$0+1)) \
with points pt 7 ps var lt palette title 'red'
```
Well, this is it for today. Next time I will try to show some of the new stuff in gnuplot 4.4. Cheers,
Zoltán

Tuesday, 1 December 2009

New version of gnuplot is released!

I have been waiting for this for a long time, but at long last, it has happened! The new version of gnuplot has been released with the designation 4.4. You can download the binary or the source code from the sourceforge repository. In the future, I will discuss the new things, and show what can be done with them.
Cheers,
Zoltán

Restricting fit parameters

Chris asked an interesting question today, namely, how one can restrict the fit range in gnuplot. What he meant by that was not the range of the data points (that is really easy, the syntax is the same as for plot), but the range of fit parameters. In some cases, it is a quite reasonable requirement, because we might know from somewhere that certain parameter values just do not make any sense. As it turns out, it is rather easy to achieve this in gnuplot. All we have to do is to come up with a function that restricts its values in the desired range.

After this interlude, let us see an example! We will create some data with the following gnuplot script:
```reset
a=1.0; b=1.0; c=1.0
f(x) = a*exp(-(x-b)*(x-b)/c/c)
set table 'restrict.dat'
plot [-2:4] f(x)+0.1*(rand(0)-0.5)
unset table
```

We take a Gaussian, with some noise added to it. Naturally, we would like to fit a Gaussian to this data, and in particular, f(x). But what, if our model is such that 'a' must be in the range [1.1:2], 'b' must be in the range [0.1:0.9], and 'c' must be in the range [0.5:1.5]? We just use in our fit, instead of f(x), another function, g(x), say, of the form
```g(x) = A(a)*exp(-(x-B(b))*(x-B(b))/C(c)/C(c))
```
where A(a), B(b), and C(c) take care of our restrictions. These functions are somewhat arbitrary, but for better or worse, I will take the following three arcus tangents
```# Restrict a to the range of [1.1:2]
A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]
B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]
C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5
```
which would look like this

The point here is that as x runs from negative infinity to positive infinity, A(x) runs between 1.1, and 2.0, and likewise for B(x), and C(x). Then the fit goes on as it would normally. Our script is, thus, the following in its full glory:

```# Restrict a to the range of [1.1:2]
A(x) = (2-1.1)/pi*(atan(x)+pi/2)+1.1

# Restrict b to the range of [0.1:0.9]
B(x) = (0.9-0.1)/pi*(atan(x)+pi/2)+0.1

# Restrict c to the range of [0.5:1.5]
C(x) = (1.5-0.5)/pi*(atan(x)+pi/2)+0.5

a=0.0
b=0.5
c=0.9
fit f(x) 'restrict.dat' via a, b, c

g(x) = A(aa)*exp(-(x-B(bb))*(x-B(bb))/C(cc)/C(cc))
aa=1.5
bb=0.5
cc=0.9
fit g(x) 'restrict.dat' via aa, bb, cc

plot f(x), g(x), 'restrict.dat' w p pt 6
```

and it produces the following graph:

At this point, we should not forget, that what we are interested in is not the value of 'aa', 'bb', or 'cc', but the value of 'a', 'b', and 'c'. This means that what we have to take is
A(aa), B(bb), and C(cc). If you print the values of 'a', 'b', and 'c' from the fit to f(x), the value of 'aa', 'bb', and 'cc', and the value of A(aa), B(bb), and C(cc), we get the following results
```gnuplot> pr a, b, c
0.984221984191135 0.996600824504231 1.00765240463672

gnuplot> pr aa, bb, cc
-3442408.91578921 1443864.45093385 -0.201236322474146

gnuplot> pr A(aa), B(bb), C(cc)
1.10000008322047 0.899999823634477 0.936788734177637
```

Obviously, the second print does not make too much sense, we have to compare the last one to the first one. We can here see that we got values in the ranges [1.1:2], [0.1:0.9], and [0.5:1.5], as we wanted to.