Tuesday, 16 March 2010

Bubble plots

Yesterday, I discussed a method for adding an edge to an arbitrary symbol. If you recall (or scroll down on this page), the idea was to trick gnuplot into plotting our data file twice, in such a way that each point was plotted twice in succession. Now, what if we plotted each point more than twice? There is nothing special about the number 2, so nothing stops us. And if we can, then we should, and see what comes out of it. With very small modifications, our script from yesterday can be turned into one that draws a bubble graph.

So, let us see how the machinery works!

```
reset
# Dummy plots: column 0 is the row index, so GPVAL_DATA_X_MAX becomes (number of points - 1)
plot 'new_bubble1.dat' u 0:2
red_n = GPVAL_DATA_X_MAX

plot 'new_bubble2.dat' u 0:2
blue_n = GPVAL_DATA_X_MAX

plot 'new_bubble3.dat' u 0:2
green_n = GPVAL_DATA_X_MAX

rem(x,n) = x - n*(x/n)
size(x,n) = 3*(1-0.8*rem(x,n)/n)
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))
blue(x,n) = sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255)
green(x,n) = sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n))

posx(X,x,n) = X + 0.03*rem(x,n)/n
posy(Y,x,n) = Y + 0.03*rem(x,n)/n

unset key
set border back
level = 40
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
     for [n=0:level*(blue_n+1)-1] 'new_bubble2.dat' using (posx($1,n,level)):(posy($2,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
     for [n=0:level*(green_n+1)-1] 'new_bubble3.dat' using (posx($1,n,level)):(posy($2,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
```
Again, the first three plots are there for determining the sample size, and nothing more: after each of them, GPVAL_DATA_X_MAX holds the index of the last point in the file. After that, we start out with a number of function definitions. The first one is a remainder function, the second one uses the remainder to return the size of the bubble, the third one is a simple helper function, returning values between 0 and 240, and red, blue, and green determine the colour of our bubbles. If you look carefully, you will notice that these colours become successively whiter as the remainder increases, while the circles become smaller. Finally, again by making use of our remainder function, we define two position shifts: in order to give the impression that the bubbles are lit from the top right corner, we have to shift successive circles in that direction. The value of this shift is important in the sense that, if chosen too high, the circles belonging to the same data point will no longer cover each other. (This is not necessarily a tragedy, see below.)
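To see what these helper functions return, one can print a few values in an interactive gnuplot session. This is only a quick check, not part of the script; it repeats the definitions from above (red only) and assumes level = 40, as used later.

```gnuplot
# Helper definitions copied from the script above
rem(x,n)  = x - n*(x/n)                 # relies on integer division, so x and n must be integers
size(x,n) = 3*(1-0.8*rem(x,n)/n)
c(x,n)    = floor(240.0*rem(x,n)/n)
red(x,n)  = sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n))

level = 40
# Each line prints: layer number, point size, colour string
print rem(0,level),  size(0,level),  " ", red(0,level)    # layer 0: largest circle, pure red (#FF0000)
print rem(20,level), size(20,level), " ", red(20,level)   # layer 20: size about 1.8, colour #FF7878
print rem(39,level), size(39,level), " ", red(39,level)   # layer 39: size about 0.66, colour #FFEAEA
print rem(40,level)                                       # back to 0: the next data point starts here
```

The sizes shrink from 3 towards 0.6 while the colour fades from pure red to almost white, which is what makes the stack of circles look like a lit sphere.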

Then we decide to have 40 colour levels (we could have anything up to 255, although that would be a bit time-consuming and unnecessary), and call our plots. The structure is the same as it was yesterday: we use a for loop for each data set, move the circles a bit, and set the colours to whiter shades. That is all.
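The single loop index n does two jobs at once, which is easy to miss. The following lines, again meant only as an interactive check, show how n splits into a point number (the row handed to `every`) and a layer number (the argument of the size, colour, and shift functions).

```gnuplot
level = 40
rem(x,n) = x - n*(x/n)

# n/level is an integer division: it selects the data point.
# rem(n,level) selects which of the 40 circles of that point is drawn.
print "n = 0:  point ", 0/level,  ", layer ", rem(0,level)
print "n = 39: point ", 39/level, ", layer ", rem(39,level)
print "n = 40: point ", 40/level, ", layer ", rem(40,level)
print "n = 85: point ", 85/level, ", layer ", rem(85,level)
```

So n = 0 to 39 draws all layers of the first point, n = 40 starts the second point, and n = 85 is layer 5 of the third point. Because all layers of a point are drawn before moving on, a bubble drawn later can properly cover an earlier one.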

Now, what happens if we take too big a value for the shift? This might actually lead to interesting effects: the circles no longer cover each other, and droplets, rather than bubbles, represent the data points.
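If you want to experiment with this, it helps to pull the shift out into a variable. The name `shift` and the value 0.3 below are my own choices for illustration (the script above hard-codes 0.03), so treat this as a sketch and adjust the number to the range of your data.

```gnuplot
rem(x,n) = x - n*(x/n)

# 'shift' is a hypothetical variable introduced for this sketch;
# 0.3 is an arbitrary value, ten times the 0.03 used in the script above.
shift = 0.3
posx(X,x,n) = X + shift*rem(x,n)/n
posy(Y,x,n) = Y + shift*rem(x,n)/n

# The last layer of a point at x = 1 now lands almost a full 'shift' to the right
print posx(1.0, 39, 40)
```

With these definitions in place, the plot command itself stays exactly as it was.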

After having seen the simplest implementation, we should ask whether it is possible to add some decorations. For example, can we add a thin black edge to the symbols? It is relatively simple: we only have to re-define some of our functions as follows
```
size(x,n) = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
c(x,n) = floor(240.0*rem(x,n)/n)
red(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", 255, c(x,n), c(x,n)))
blue(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), c(x,n), 255))
green(x,n) = (rem(x,n) == 0 ? "#000000" : sprintf("#%02X%02X%02X", c(x,n), 255, c(x,n)))

posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)
posy(Y,x,n) = (rem(x,n) < 2 ? Y : Y + 0.03*rem(x,n)/n)
```
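All these functions do is check whether we are plotting the first round (remainder 0) and, if so, set the colour to black and the size to a slightly larger 3.3. There is a small difference in the shifts, too: we do not move the circles if they are in the first or the second round. The reason is that the black circle and the first coloured circle on top of it must stay concentric, otherwise the edge would not have the same thickness all around.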
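A quick way to convince yourself that the first two rounds behave as described is to print their size and position. This is only a check under the same assumption as before (level = 40), with a data point sitting at x = 1.

```gnuplot
rem(x,n)    = x - n*(x/n)
size(x,n)   = (rem(x,n) == 0 ? 3.3 : 3*(1-0.8*rem(x,n)/n))
posx(X,x,n) = (rem(x,n) < 2 ? X : X + 0.03*rem(x,n)/n)

level = 40
print size(0,level), posx(1.0,0,level)   # round 0: black disc of size 3.3, not shifted
print size(1,level), posx(1.0,1,level)   # round 1: first coloured disc, size about 2.94, not shifted
print size(2,level), posx(1.0,2,level)   # round 2: the shift starts here (0.03*2/40 = 0.0015)
```

The black disc is only slightly larger than the first coloured one, so all that remains visible of it is a thin ring.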

OK, so we can plot bubbles, with or without a black circumference, but we would also like to add a legend. Well, that is simple; in fact, nothing could be simpler. Just add the following three lines before the plot command

```
set label 1 'Red bubbles' at 9,6 left
set label 2 'Blue bubbles' at 9,5 left
set label 3 'Green bubbles' at 9,4 left
```
and append the following six lines to the end of the plot command (after adding `, \` to its last line)
```
for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
for [n=0:level-1] 'new_bubble2.dat' using (posx(8.5,n,level)):(posy(5,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb blue(n,level) , \
for [n=0:level-1] 'new_bubble3.dat' using (posx(8.5,n,level)):(posy(4,n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb green(n,level)
```
and we are done! All we do here is plot our data files in a silly way: for each file we plot a single point, at (8.5,6), (8.5,5), and (8.5,4), respectively. The contents of the data files are not used here at all; we read them for convenience's sake only. (This trick can also be used for the post from yesterday.) There, you have it!
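For clarity, here is how the pieces fit together for the red data set alone. This is a sketch that assumes the helper functions, `level`, and `red_n` are already defined as in the script above; the blue and green entries follow the same pattern.

```gnuplot
# Assumes rem, size, red, posx, posy, level and red_n are defined as in the script above.
unset key
set label 1 'Red bubbles' at 9,6 left

# First entry: the data bubbles. Second entry: one legend bubble at (8.5,6).
# In the legend loop n runs only up to level-1, so n/level is always 0 and
# 'every' picks just the first row; its values are ignored, since 'using'
# contains constants only.
plot for [n=0:level*(red_n+1)-1] 'new_bubble1.dat' using (posx($1,n,level)):(posy($2,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level) , \
     for [n=0:level-1] 'new_bubble1.dat' using (posx(8.5,n,level)):(posy(6,n,level)) \
     every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level)
```

Since everything goes into one plot command, no replot is needed.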

1. IT IS A VERY NICE SUGGESTION, THANK YOU LOTS!

2. cheers.

You're missing the "replot" command in the last block of commands, in order to plot the legend bubbles.

Also, it's a bit clumsy to have to provide axis coordinates for the position of the legend (rather than graph coordinates), since if your data is changed and you re-run the script it may be off-graph.

Using GPVAL_X_MAX and other variables would allow you to calculate the position in graph coordinates:

```
graphx(x)=GPVAL_X_MIN+x*(GPVAL_X_MAX-GPVAL_X_MIN)
graphy(y)=GPVAL_Y_MIN+y*(GPVAL_Y_MAX-GPVAL_Y_MIN)
set label 1 'Red bubbles' at graphx(0.9),graphy(0.5) right
replot for [n=0:level-1] 'new_bubble1.dat' using (posx(graphx(0.9)+0.2,n,level)):(posy(graphy(0.5),n,level)) \
every ::(n/level)::(n/level) with p pt 7 ps size(n,level) lc rgb red(n,level)
```

3. 4. Hello Joce,

I don't think that a replot is missing there: what I had in mind was to place the 3 'set label' commands before the plot, and add the three legends to the first and only plot command.
But I like your idea about the automatic placement of the legend. Thanks for that!
Cheers,
Zoltán

5. I really like your posts. The ideas are brilliant and I find the posts much more enlightening than the gnuplot manual, although the manual is a must. However, you could also plot bubbles as images. I do it the following way
http://fityk-tutorials.blogspot.com/2010/11/gnuplot-cygwin-and-gnuplot-custom.html

but probably you know it already.

6. Hi Kostya,

Many thanks for the kind words! As a matter of fact, I have seen this option before. I think, it was introduced in version 4.3, but I have never thought of using this in real life. However, I have to admit that your graphs look quite cool. Thanks for sharing it!
Cheers,
Zoltán
