## Sunday, 28 June 2009

### Bubble graphs with a different method

Yesterday, I showed a simple way to produce bubble graphs with gnuplot. That method relied on manipulating the eps file. Today, we will try another option, this time plotting everything on the appropriate terminal. At the end, we will have the following figure:

First, let us have a look at the script, assuming that we have the following data to plot:
```
0 -0.0694726
0.20202 0.233484
0.40404 0.424311
0.606061 0.546688
0.808081 0.580043
1.0101 0.862214
1.21212 0.907601
1.41414 0.759692
1.61616 0.884879
1.81818 0.784566
2.0202 0.71774
...
```

```gnuplot
reset
f(x) = A*exp(-x*x/B/B)
rx=0.107071; ry=0.057876; A = 1; B = 0.2; C=0.5*rx; D=-0.4*ry
g(u,v) = (2*cos(u)*v*rx+C)*(2*cos(u)*v*rx+C)+(3.5*sin(u)*v*ry+D)*(3.5*sin(u)*v*ry+D)
unset key; unset colorbox; set view map
set xrange [-0.15:5.2]; set yrange [-0.7:0.95]
set parametric; set urange [0:2*pi]; set vrange [0:1]
set isosamples 20, 20; set samples 30
set palette model HSV functions 1, 1-f(gray), 1+2*f(gray)
splot cos(u)*rx*v+0.000000,sin(u)*ry*v+-0.069473, g(u,v) w pm3d, \
cos(u)*rx*v+0.202020,sin(u)*ry*v+0.233484, g(u,v) w pm3d, \
cos(u)*rx*v+0.404040,sin(u)*ry*v+0.424311, g(u,v) w pm3d, \
cos(u)*rx*v+0.606061,sin(u)*ry*v+0.546688, g(u,v) w pm3d, \
cos(u)*rx*v+0.808081,sin(u)*ry*v+0.580043, g(u,v) w pm3d, \
...
```

First, we define a Gaussian, f(x), which will be the colouring function, and then a couple of variables that go into it. Having done this, we define g(u,v), which gives the z value of each surface. With pm3d, gnuplot maps this z value onto the gray scale of the palette, and that gray value is the argument of f. The next few lines set the ranges, switch on parametric mode with its u and v ranges, and finally set the number of samples. The last thing we have to define is the palette. We choose red (i.e., the hue is equal to 1). The saturation and value are given by 1-f(gray) and 1+2*f(gray), where gray is the gray value.

At this point, we are ready to plot the points. We simply draw circles centred at (x, y), where the x and y values are taken from the data file shown above. Note that we are in fact drawing ellipses with semi-axes rx and ry. The reason is that the aspect ratio of the plot is not equal to 1: if we drew circles in data coordinates, they would look like ellipses on the plot. The values of rx and ry are therefore determined by the plot ranges, xrange and yrange. (We will see this in the gawk script below.) We have to plot one circle for each data point, so the splot command lists one parametric surface per point. Finally, C and D give the centre of the white spot. Increasing the magnitude of C or D pushes the white spot towards the edge of the circles.
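To see where the white spot ends up, it helps to write the surface out explicitly. The short derivation below uses only the definitions in the script. The shorthand ξ and η for the scaled position inside a bubble is my own notation, not something gnuplot uses. Here x₀ and y₀ are the bubble's centre.

$$
\begin{aligned}
x &= x_0 + r_x\,\xi, \qquad y = y_0 + r_y\,\eta, \qquad \xi = v\cos u,\quad \eta = v\sin u,\quad \xi^2+\eta^2 \le 1,\\
g(u,v) &= (2 r_x\,\xi + C)^2 + (3.5\, r_y\,\eta + D)^2,\\
g = 0 &\iff \xi^\ast = -\frac{C}{2 r_x}, \quad \eta^\ast = -\frac{D}{3.5\, r_y}.
\end{aligned}
$$

With C = 0.5 rx and D = -0.4 ry, this gives ξ* = -0.25 and η* = 0.4/3.5 ≈ 0.114. The minimum of g therefore sits at (x₀ - 0.25 rx, y₀ + 0.114 ry), slightly to the left of and above the centre of each bubble. Assuming the colour range is autoscaled (the script does not set it), the smallest values of g land near the bottom of the gray scale. There f(gray) is close to A, so the saturation 1-f drops and the colour fades towards white. The spot stays inside the bubble only while ξ*² + η*² ≤ 1, which is why making |C| or |D| larger eventually pushes it to the rim.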

Now, a few words on the various parameters above. The value of A determines how white the bubble becomes at its brightest point: 1 corresponds to white, while values smaller than 1 leave a reddish tinge in the highlight. B determines how tight the white spot is. rx and ry set the size of the bubbles, so if you want smaller circles, scale both by the same factor to keep their ratio.
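To get a feeling for A and B, here is a minimal sketch that evaluates the saturation and value channels of the palette for a given gray level. It reimplements the two palette formulas in plain awk. It assumes that gnuplot clips each channel to [0:1], which is my understanding of how palette functions are handled. The helper name `palette_sv` is hypothetical.

```shell
#!/bin/sh
# Sketch: saturation (S) and value (V) of
#   set palette model HSV functions 1, 1-f(gray), 1+2*f(gray)
# with f(x) = A*exp(-x*x/B/B). Assumption: each channel is clipped to [0,1].
palette_sv() {
  # $1 = A, $2 = B, $3 = gray level in [0,1]; prints "S V"
  awk -v A="$1" -v B="$2" -v g="$3" 'BEGIN {
    f = A * exp(-g * g / B / B)    # the Gaussian colouring function
    s = 1 - f                      # saturation channel
    v = 1 + 2 * f                  # value channel
    if (s < 0) s = 0
    if (s > 1) s = 1
    if (v < 0) v = 0
    if (v > 1) v = 1
    printf "%.3f %.3f\n", s, v
  }'
}

# A small table for the defaults used in the script (A = 1, B = 0.2)
for gray in 0 0.1 0.2 0.3 0.4 0.5; do
  printf "gray=%s  S V = %s\n" "$gray" "$(palette_sv 1 0.2 "$gray")"
done
```

Under the clipping assumption, V is always 1 because 1+2*f is never below 1. The highlight therefore comes entirely from the saturation dropping to 1-A at gray = 0. With A = 0.5, for example, the saturation only falls to 0.5, which is the reddish tinge mentioned above. A smaller B makes the saturation climb back to 1 at smaller gray values, i.e., a tighter spot.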

We could easily write a script that takes a data file with two (or more) columns and turns it into a gnuplot script along the lines presented above. Here is a possible implementation in gawk:
```bash
#!/bin/bash
# Usage: ./script data.dat > bubbles.gnu   (first two columns: x y)
gawk 'BEGIN { i = 0 }    # an uninitialised awk variable used as a subscript is "", not "0"
{
  if ($0 !~ /#/ && NF >= 2) {    # skip comment lines and blank lines
    x[i] = $1
    y[i] = $2
    if (i == 0) { max = mix = x[i]; may = miy = y[i] }    # initialise the extrema
    if (i > 0) {
      if (max < x[i]) max = x[i]
      if (mix > x[i]) mix = x[i]
      if (may < y[i]) may = y[i]
      if (miy > y[i]) miy = y[i]
    }
    i++
  }
} END {
  # pad the data ranges by 3% on each side so that the bubbles fit
  eps = 0.03
  lx = mix-eps*(max-mix)
  hx = max+eps*(max-mix)
  ly = miy-eps*(may-miy)
  hy = may+eps*(may-miy)
  print "reset"
  print "f(x) = A*exp(-x*x/B/B)"
  printf "rx=%f; ry=%f; A = 1; B = 0.2; C=0.5*rx; D=-0.4*ry\n", 0.02*(hx-lx), 0.035*(hy-ly)
  print "g(u,v) = (2*cos(u)*v*rx+C)*(2*cos(u)*v*rx+C)+(3.5*sin(u)*v*ry+D)*(3.5*sin(u)*v*ry+D)"
  print "unset key; unset colorbox; set view map"
  printf "set xrange [%f:%f]; set yrange [%f:%f]\n", lx, hx, ly, hy
  print "set parametric; set urange [0:2*pi]; set vrange [0:1]"
  print "set isosamples 20, 20; set samples 30"
  print "set palette model HSV functions 1, 1-f(gray), 1+2*f(gray)"
  # one parametric surface per data point; all but the last end with ", \"
  printf "splot "
  for (k = 0; k < i-1; k++) {
    printf "cos(u)*rx*v+%f,sin(u)*ry*v+%f, g(u,v) w pm3d, \\\n", x[k], y[k]
  }
  printf "cos(u)*rx*v+%f,sin(u)*ry*v+%f, g(u,v) w pm3d\n", x[i-1], y[i-1]
}' "$1"
```

At the beginning, we fill the x[] and y[] arrays while, at the same time, determining the minimum and maximum of each. We then use these values to determine xrange and yrange, as well as rx and ry. I made the ranges a little larger than the span of the data, so that the circles stay within the plot. At the end, we print a single splot command with one term per point, taking the centres from the x[] and y[] arrays.
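The geometry step can be checked on its own. The following is a minimal sketch that repeats the padding and radius formulas of the script for given data extrema, so you can see what ranges and semi-axes a data set will produce before plotting it. The function name `bubble_geometry` is hypothetical. The 3% padding and the 0.02 and 0.035 factors are simply the values used in the script above.

```shell
#!/bin/sh
# Sketch of the geometry step: pad the data ranges by eps = 3% on each side,
# then derive the ellipse semi-axes rx and ry from the padded ranges.
# Input: xmin xmax ymin ymax. Output: lx hx ly hy rx ry
bubble_geometry() {
  awk -v mix="$1" -v max="$2" -v miy="$3" -v may="$4" 'BEGIN {
    eps = 0.03
    lx = mix - eps * (max - mix); hx = max + eps * (max - mix)
    ly = miy - eps * (may - miy); hy = may + eps * (may - miy)
    # same proportions as the script: rx = 2% of the x range, ry = 3.5% of the y range
    printf "%f %f %f %f %f %f\n", lx, hx, ly, hy, 0.02 * (hx - lx), 0.035 * (hy - ly)
  }'
}

# Example: data spanning x in [0:5] and y in [-0.6:0.9]
bubble_geometry 0 5 -0.6 0.9
```

Because rx and ry are tied to the padded ranges rather than to absolute numbers, the bubbles keep the same apparent size and shape whatever the units of the data.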

I believe it should be fairly easy to modify the script, should you want to make changes to it. Finally, a word of caution: since we use some 900 samples for each circle we plot, this is reflected in the file size if you use vector output. For comparison, the PostScript file produced with yesterday's method was about 20 kB, while the file produced with the present method is about 600 kB. The difference is this large because yesterday we redefined one of the symbols, while today we plot each point as a separate surface, without reference to any particular symbol. Therefore, if you want to include the plot in a publication, it is better to use yesterday's method. For bitmap output (png, jpeg and the like), there should be no significant difference in size.
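If you go the bitmap route, the generated script only needs a terminal and an output file in front of it. This is a minimal sketch, assuming your gnuplot build provides the png terminal. It also relies on my understanding that `reset`, which the generated script starts with, leaves the terminal and output settings alone. The helper name `with_png_output`, the script name `bubble.sh`, and the file names are hypothetical.

```shell
#!/bin/sh
# Sketch: prepend bitmap-terminal settings to a gnuplot script read from stdin.
# Assumption: the png terminal is available in your gnuplot build.
with_png_output() {
  # $1 = name of the png file to write
  printf "set terminal png\nset output '%s'\n" "$1"
  cat    # pass the generated script through unchanged
}

# Hypothetical usage, with the gawk script above saved as bubble.sh:
#   ./bubble.sh data.dat | with_png_output bubbles.png | gnuplot
```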