So, these are my reasons for doing data processing in gnuplot. Of course, in most cases, I do that with an external script, piping its output in from the gnuplot prompt. But that would deprive Windows users of the pleasure of doing something fancier :)
By the way, if you look at some of the demo scripts of gnuplot 4.3, you will realise that the developers of gnuplot have already parted with the idea of "one task - one tool".
The kind of data processing we will do today requires "recursive" access to the data points: I will demonstrate how we can calculate the derivative and the integral of a series of data points. Doing the same for an analytic function is trivial, even if you do not have the analytical answer: for the derivative, you can simply plot (f(x+dx)-f(x))/dx, where dx is some small, pre-defined number, and for the integral you can apply a recursive scheme, detailed on the gnuplot demo web site. But these work on known functions, and we would like to calculate something based on "measured" data. For this, I will take the weekly oil price from this year, which you can download from the link. Let us suppose that we plot the oil price, but we also want to highlight the time intervals where the price is dropping. This requires us to calculate the derivative at each point. (In this particular case, we could get away with just comparing each value with the next one, but we have to get access to the values anyway, and we get that by fitting a linear function, whose linear coefficient gives the derivative. So, for better or worse, I will just stick with the derivative.)
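The difference-quotient idea for a known function can be sketched as follows. The choice of sin(x), the step dx = 1e-4 and the plotting range are only assumptions for the sake of illustration; any function and any reasonably small step should do.

```gnuplot
# Forward-difference derivative of a known function.
# f(x), dx and the x range are illustrative choices, not part of the oil example.
f(x)  = sin(x)
dx    = 1.0e-4                        # small, pre-defined step
df(x) = (f(x + dx) - f(x)) / dx       # difference quotient

# Compare the numerical derivative with the exact one, cos(x)
plot [0:2*pi] df(x) title 'numerical', cos(x) title 'exact'
```

The two curves should lie on top of each other; a larger dx makes the difference visible.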
For starters, let us see our two scripts! As I have already pointed out, we will have to walk through the values one by one, so we will need a 'for' loop, which means two scripts: a "master", and one that we call repeatedly. I have discussed this method at great length in my recent posts.
    reset
    unset key
    A=0; sa=1; sb=0; integ=0.0
    f(x) = a*x+b
    g(x) = (x>0?GPVAL_Y_MIN:GPVAL_Y_MAX)
    set xlabel 'Time [weeks]'
    set ylabel 'Crude oil price [$]'
    # dummy plot: sets GPVAL_Y_MIN, GPVAL_Y_MAX and GPVAL_DATA_X_MAX
    p 'oil.dat' u 0:8
    xmax = GPVAL_DATA_X_MAX
    set print 'oild.dat' append
    l 'derint_r.gnu'
    set print
    plot 'oild.dat' u 1:(g($2)) w filledcurve lc rgb "#eeeeee", 'oil.dat' u 0:8 w l
and our 'for' loop, which we call 'derint_r.gnu':
    a=1.0; b=40.0
    fit [A:A+1] f(x) 'oil.dat' u 0:8 via a, b
    integ = integ+(a*A+b+a*(A+1)+b)/2.0
    # sign change: print the point twice, with the previous and with the new slope
    if(sa*a<0) print A, sa, sb, integ; print A, a, b, integ; sa=a; sb=b
    if(sa*a>0) print A, a, b, integ; sa=a; sb=b
    A=A+1
    if(A<xmax) reread
The first part of the main script should be clear: we set up the style of our figure, and do a dummy plot to learn something about our data file, 'oil.dat'. f(x) is our linear function, while g(x) is just a helper, whose role will become evident a bit later. Having set up the plot, we call 'derint_r.gnu' 'xmax' times. So, what is in 'derint_r.gnu'? We simply fit our linear function over a range that contains only two points, and we step through the whole file by moving those two points by one in each call to the loop. Then we update the integral by adding the area of the next trapezium. Finally, depending on whether the sign of the derivative, 'a', differs from the previous one, we print out the results once (if the signs are identical) or twice (if they are different). This distinction is necessary because we want to plot rectangles where the derivative is negative, and a rectangle needs two points at the same position to get a vertical edge. After we are done with the data processing, we close our new file with an empty 'set print', and plot the results to get the following figure.
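To see what a single pass of the loop computes, here is a small worked example that can be typed at the gnuplot prompt. The numbers are made up and are not taken from 'oil.dat'; the point is only that a straight line through two neighbouring points has the slope y1-y0, and that the trapezium expression of the loop reduces to the mean of the two values.

```gnuplot
# Hypothetical values for two neighbouring data points (not from oil.dat)
A  = 3          # index of the left point, i.e. column 0 in the scripts
y0 = 41.0       # value at x = A
y1 = 38.5       # value at x = A+1

# The straight line through exactly two points, which is what the
# two-point fit should converge to
a = (y1 - y0) / 1.0     # slope; the points are one index apart
b = y0 - a*A            # intercept

# Same trapezium expression as in 'derint_r.gnu'
area = (a*A + b + a*(A+1) + b) / 2.0

print a, b, area        # area is simply the mean of y0 and y1
```

Since the slope is negative here, this pair of points would belong to one of the highlighted intervals.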
Note that in the plotting, we called the function g(x), which returns the maximum of the yrange if its argument is negative (the price is dropping), and the minimum of the yrange if it is positive. In this way, the filled curve reaches the top of the plot exactly over the intervals where the price falls. Of course, there are many things that we could do to improve the image, but I wanted only to demonstrate how we can do the data processing part. You can dress up the image as you like, and for that you can also draw on many a post of mine.
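If you want to convince yourself of what the helper g(x) does without plotting anything, you can try it with fixed numbers. In this sketch, ymin and ymax are hypothetical stand-ins for GPVAL_Y_MIN and GPVAL_Y_MAX, which gnuplot would normally set after the dummy plot.

```gnuplot
# ymin and ymax are assumed values standing in for GPVAL_Y_MIN and GPVAL_Y_MAX
ymin = 30.0
ymax = 50.0
g(x) = (x > 0 ? ymin : ymax)    # same form as g(x) in the main script

print g(-2.5)   # negative slope (falling price): top of the range
print g(1.2)    # positive slope (rising price): bottom of the range
```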