## Sunday, 29 November 2009

### Broken histograms

Sometimes, a histogram is just a bit awkward, for the simple reason that one or two values are extremely high compared to the rest of the graph. In the case of a standard graph, we would use a broken axis to bring all points to the same order of magnitude. We can play the same trick with histograms, in fact, it is, in some sense, even simpler, than the broken axes. All we have to do is to plot a thick line at the proper position in the proper colour. This is the graph that we are going to make today

Our data file, brokenhist.dat, is as follows
```"Jan" 1 2
"Feb" 44 4
"Mar" 3 1
"Apr" 2 25
"May" 4 5
"June" 2 1
```
and here is our script:
```reset
blue = "#babaff"
set xrange [-0.5:5.5]
set yrange [0:11]
set isosample 2, 100
set table 'brokenhist_b.dat'
splot 1-exp(-y/2.0)
unset table

unset colorbox
set border 1
set xtics rotate by 45 nomirror offset 0, -1
unset ytics
f(x) = (x < 6 ? x : (x < 30 ? x-17 : x-35) )
g(x) = (x < 6 ? 1/0 : 6)
set boxwidth 0.85
set style fill solid 0.8 border -1
set style data histograms
set palette defined (0 "#ffff   ff", 1 "#babaff")
plot 'brokenhist_b.dat' w ima,\
'brokenhist.dat' u (f(\$2)) t 'Red bars',\
'' u (f(\$3)) lc rgb "#00bb00" t 'Green bars', \
'' u 0:(f(\$2)):2 w labels center offset 0,0.5 t '',\
'' u (\$0+0.25):(f(\$3)):3 w labels center offset 0,0.5 t '',\
'' u 0:(-1):xticlabel(1) w l t '', \
'' u (\$0+0.12):(g(\$3)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t '', \
'' u (\$0-0.12):(g(\$2)+0.12):(0.25):(0.25) w vectors lt -1 lc rgb blue lw 5 nohead t ''
```
OK, so let us look at the code! The first couple of lines are required only, if you want to have some posh background. Likewise, you can drop the 'unset colorbox' line, when you have a white background. We set only the bottom axis, which means that we have to unset the ytics and set to xtics to nomirror. Then we have two helper functions. The definitions of these depend on where you want to have the break point in the histogram. In this particular case, I took 6, but it is arbitrary.

In the next step, we set the properties of the histogram, like the width of the columns, the fill style, and the data style. We also define a palette, but this is needed for the background only. For white background, you can skip this step. You can also skip the first plot, because that is nothing but our fancy background.

The actual plotting begins after this. We plot the two sets of columns, and also plot the data file with labels. The labels are placed at the top of each column (this is why we could do away with the yaxis.) We also 'plot' the axis labels, and finally, plot the two break points. Note that the plotting of the break points is automatic, once we have the definitions of the two helper functions. If you want to have a steeper cut, you could

```'' u (\$0+0.12):(g(\$3)+0.12):(0.25):(0.5) w vectors lt -1 lc rgb blue lw 5 nohead t ''
```
e.g., which stretches the vectors in the vertical direction. Otherwise, we have finished the plot, there is nothing else to do.
I should point out here that, in order to have a seamless cut, we have to use a colour for the vectors that is identical to the background at that particular point. This implies that we could not have a gradient at y=6. The background colour is virtually constant at y=6 (c.f. the definition of 'brokenhist_b.dat'. While it would not be impossible to implement a cut over a gradient, I believe, it is probably not worth the trouble it involves.