Thursday, 3 December 2009

Defining some new plot styles

I have long wanted to write a blog post on this subject, but somehow, I never got the time to do it. But the time has come, and I will do it now. One of the plot styles that I actually like in origin (no matter what I think of the software in overall) is the one where the circumference of a circle is drawn in a colour different to that of the body of the circle, as in this graph


The only glitch is that this is not supported in gnuplot. Well, this is not completely true, because one can use a new prologue for the postscript output, but that works for postscript only, nothing else. What should we do then? There are two options: one has already been discussed, in connection with the bubble charts, sometime in early September. Nevertheless, that method is feasible only when the points do not overlap. What we did there was to plot the data twice (or however many times). But the problem is that those are actually two different plots, so if the adjacent circles overlap, we will not have what we want. Just try it, if you are still not convinced! The second option is that we keep reading. So, the question is then, how could we trick gnuplot into thinking that we have only one plot, not two. Well, the short answer is that we plot only one plot, not two, and we then make sure that points in the plot are coloured properly. I think it is time to show my script, that should make everything clear.

reset
set table 'two.dat'
plot [0:10] sin(x)+0.1*rand(0), cos(x)+0.1*rand(0)
unset table

f(x) = cos(x*pi)
set cbrange [-1:2]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000", 2 "#0000ff")

plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) index 0 \
with points pt 7 ps var lt palette title 'red', \
'' using 1:2:(2+f($0)/5.0):(f($0+1)*2) index 1 \
with points pt 7 ps var lt palette title 'blue'

As always, we generate some data. Note, that we plot sin(x) and cos(x), which will be in the same data file, but in two separate data blocks. This will become important later. What we need to know about data blocks is that in the data file they are separated by pairs of blank lines, and that we can ask gnuplot to plot only certain data blocks. We indicate our choice with the use of the index keyword. If you want to know more about this, issue the
?index
command, and look at the data file 'two.dat'!

OK, so we have some data, if a bit obscure at the moment. Next, we define a funny function. If you look closely, you will realise that f(x) takes the value of -1 for odd, and 1 for even numbers. In principle, any function should do here, but one has got to be a bit careful: those functions that take 1 or -1 at isolated points might not work. If you do not want to dive into the details of this, take my word for it that f(x) defined above will just be perfect for our purposes.

The next step is the definition of a colour palette with the colour range. (We take off the colour box, too, and set the border to back, but these are irrelevant nuances.) We use the palette for colouring our graph: in the first curve, every other point will be black and red, while in the second curve, every other point will be black and blue. Thus, I have already divulged my trick: we have to duplicate our data set in a way that every line is copied at its position, so, in 'two.dat' we will have something like this
...
 9.29293 -0.93128  i
 9.29293 -0.93128  i
 9.39394 -0.978266  i
 9.39394 -0.978266  i
 9.49495 -0.923256  i
 9.49495 -0.923256  i
...

For this we use an external gawk script, but it is really just one line, it could not be any simpler. Once we duplicated our data, we plot it, and colour every second point as black, and every second point as red. Also note that we use the index keyword to choose the data block that we need. If you have only one data block, you can skip this. But not only do we colour the points differently, but we also resize them: after all, if all had the same size, we would not see the ones that are plotted first. For all this machinery, we use the fact that when a 2D plot is given as 4 columns, the size of the points can be determined by the third column, while the fourth column can be assigned to take care of the colour. In this case, the colour is taken from the palette that we defined above. The general form of such a plot is
plot 'foo' using 1:2:3:4 with points pt 7 ps var lt palette
where 'var' signifies the variable point size (ps), while palette is the colour. Note that we use our snappy f(x) function to choose the size and the colour of every second point, simply by counting the ordinal number of the particular data record, $0: For even numbers, we set the size 2+1/5, and the colour to black (colour value of -1), while for odd numbers, the size is 2-1/5, and the colour is red (colour value of 0). If you have a single data, the whole script would be simply
f(x) = cos(x*pi)
set cbrange [-1:1]; unset colorbox; set border back

set palette defined (-1 "#000000", 1 "#ff0000")

plot "< gawk '{print $0; print $0}' two.dat" using 1:2:(2+f($0)/5.0):(f($0+1)) \
with points pt 7 ps var lt palette title 'red'
Well, this is it for today. Next time I will try to show some of the new stuff in gnuplot 4.4. Cheers,
Zoltán

3 comments:

  1. I think this can be done in a less complicated way, but it might have some side-effect I'm not aware of. Simply graph the data twice in one plot with a slightly large point / line size (two sizes larger seems to do the trick). If you're using a legend you can turn off the outline plots so it doesn't get cluttered.

    For example:

    set style line 1 lw 2
    set style line 2 lw 4

    plot "file.dat" index 0 with lines notitle ls 2, \
    "file.dat" index 0 with lines title "Dataset 1" ls 1, \
    "file.dat" index 1 with lines notitle ls 2, \
    "file.dat" index 1 with lines title "Dataset 2" ls 1

    etc.

    ReplyDelete
  2. I am afraid, I have to disagree here: if the points overlap, your scheme won't work at all. This was the whole point. That is why we have to plot each point twice, and not each data set twice. You can take the script from above, and set the sample frequency to 200, and you'll see what I meant.
    Cheers,
    Zoltán

    ReplyDelete
  3. Very, very nice post.
    What I got is what I haven't gotten for long time.
    You saved me.
    (I'm going to understand all now.)
    (I don't agree to Skunkiller, either. The procedure doesn't work for what is discussed here.)

    ReplyDelete