Tuesday 13 October 2009

The shiny histograms, again

Do you remember those shiny histograms that we discussed some long, long time ago, at the beginning of the summer? And do you also remember how much of a hassle it was to create them, since we relied on an external gawk script, and we had to build our rectangles, one by one? And have you ever wondered what all that fuss was about and whether there was an easier solution? A one-liner, perhaps? If the answer to these questions lies in the affirmative, go no further! We will discuss a method of making those histograms, without an external script, only with legal gnuplot commands, and in 5 lines. I understand that 5 lines is just 4 lines longer, than one would expect from a one-liner, but on the other hand, three out of those 5 lines are equivalent, so I don't feel so bad about this any more:)
OK, so here is our figure

here is our data file, which we will call 'hist.dat'
1 2 3
2 2 2
4 10 3
5 1 4
5 6 2
and here are our scripts, 'hist.gnu',
reset
unset key; set xtics nomirror; set ytics nomirror; set border front;
div=1.1; bw = 0.9; h=1.0; BW=0.9; wd=10; LIMIT=255-wd; white = 0
red = "#080000"; green = "#000800"; blue = "#000008"
set auto x
set yrange [0:11]
set style data histogram
set style histogram cluster gap 1
set style fill solid
set boxwidth bw
set multiplot
plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
unset border; set xtics format " "; set ytics format " "; set ylabel " "
call 'hist_r.gnu'
unset multiplot
and 'hist_r.gnu'
bw=BW*cos(white/LIMIT*pi/2.0); set boxwidth bw; white=white+wd
red = sprintf("#%02X%02X%02X", 128+white/2, white, white);
green = sprintf("#%02X%02X%02X", white, 128+white/2, white);
blue = sprintf("#%02X%02X%02X", white, white, 128+white/2);
rep
if(white<LIMIT) reread

Then let us see what is happening here! At the beginning, we define various variables, most notably, white, red, green, and blue. The rest up to the first plot command is nothing but setting up the figure: we define the range, tell gnuplot to treat our data as histogram, set the width of the bars, and finally, set multiplot.
There is nothing exciting in the first plot, except, that we specify the colour of the bars as

plot 'hist.dat' u 1 lc rgb red, '' u 2 lc rgb green, '' u 3 lc rgb blue
The strings red, green, and blue were defined at the beginning of our first script, thus, we learn here that gnuplot will accept any defined (and valid) string as the specifier of the colour. After our first plot, we unset the border, re-set the format of the xtic and ytic to empty, and do likewise with the ylabel. Should there be an xlabel, we should have to do the same there. Having done this, we call our second script, which we will dissect now. This is really nothing but a 'for' loop, that we have discussed a couple of times before. In fact, quite a few times. The first two commands re-set the widths of the bars in the next plot. Note that I set the width in such a way that it would draw the outline of a circle as we step through the values of white. This is what we increment next, mind you!

The next three lines are basically identical: we re-define the colours, using a sprintf command in each step. If you recall how the RGB colours are defined, we have to create a string that looks like
#00FF00
say. This is what our sprintf command will do, returning a string of this form that depends on the value of 'white'. There are some small nuances in the corresponding colour channel of red, green, and blue, respectively, namely, that the base colour for red was
#080000
and we want to linearly interpolate between this colour, and white,
#FFFFFF
so, we have to apply the relevant linear function, but there is nothing beyond this. Obviously, if you are unhappy with the colour scheme that I have (I know full well that these are not the best colours...), this is the place where you would have to tamper with the script. When we are done with re-defining the widths, and the colours, we simply replot our histogram, and do that, as long as the value of white is smaller, than the limit that we set at the beginning. In this particular case, 245. In most cases, we do not need this many plots, by the way! For a raster plot, 10-12 steps, for a vector format, something like 15-16 steps should be more than enough.

At the end, we shouldn't forget about unsetting the multiplot. The only difficulty that I see with this figure is that it is not so straightforward to use a key. However, it is not terribly hard to come up with a solution for this problem: all we have to do is to put three vertical labels on the top of the first, second, and third column, indicating what they represent.

3 comments:

  1. Hi,

    Just to let you know how amazing are your plots! Keep the nice work!
    Best,
    jef

    ReplyDelete
  2. Thanks for useful information for gnuplot. I have a question to change the color of histogram. Do you know how to change the color of histogram?

    ReplyDelete
  3. Good Article about The shiny histograms, again

    Post by
    Term Papers

    ReplyDelete