Monday, 26 April 2010

Bending the arrows - "delaying" the plot

The other day, I would have needed a couple of curved arrows on my plot, so I started to work out a method to get what I wanted. This, however, turned out to be rather interesting, so I thought that I would share the details with you.

First, we should just define what I mean by a curved arrow. Perhaps, the easiest way to define it is to show a plot, similar to this


In gnuplot, when one wants an arrow, one can invoke the following command:
set arrow from 0,0 to 1,1
or something similar. This will produce a straight arrow from (0,0) to (1,1). But what if we wanted to have an arrow, which is not straight. Well, in this case, we set a very short arrow, and draw a curve separately. The key to this is to set the arrow in such a way that it is tangential to the curve at the end point. It is easy to see that the following script would just do that

reset
unset key
eps = 0.001

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1
cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)
f(x) = 0.5+(x-1)*(x-1.2)*(x-1.4)

x1 = 0.5
x3 = 1.95
new_x = x1 + (1.0-eps)*(x3-x1)
set arrow from new_x, f(new_x) to x3,f(x3) as 1
plot [0:3] sin(x) with point ps 1 pt 6, f(x)*cut(x,x1,new_x) with line lt -1

First, we define an arrow style that we will use later. The arrow will be 0.03 screen sizes big, and the two angles determining the shape of the head are 15, and 45 degrees, respectively. Finally, we stipulate that the arrow be black, i.e. linetype -1. Then we define a window function, cut, which depends on the two end points, x1, and x3 (the reason for 3 will become clear soon), and the curve, f(x). In our plot, beyond what we actually want to plot, we will also plot f(x), but only between x1, and new_x, where new_x is a bit off with respect to the second end point. The degree of "bitness" is given by eps, which was defined at the beginning. However, before we actually plot anything, we have got to set the arrow, between new_x, f(new_x), and x3, f(x3). This construction ensures that the arrow is tangential to the curve.
At this point, we are ready to plot, which we actually execute in the next, and last line.

What we have created is great, but there are problems: first, we have to define our function, f(x), beforehand, we have to set the arrows by hand, and we also have to add the appropriate lines to our plot command. Quite tedious. There has got to be a better way!

For the say of example, let us suppose that we want a curved arrow that, say, connects (0,0) and (1,1) via a parabola that passes through the point (0.5, 0.25). If we are really pressed for it, we could do the following: First, we have to figure out the parameters of our parabola. In this case, it is quite easy, for it is nothing but x*x. Then we would draw a parabola between (0,0), and (0.99, 0.9801), and then draw an arrow from (0.99, 0.9801) to (1,1).

First, let us see, how we figure out the parameters of our parabola! We have two end points, and a "control" point, i.e., we have to solve the following set of equations
y1 = a*x1*x1 + b*x1 + c
y2 = a*x2*x2 + b*x2 + c
y3 = a*x3*x3 + b*x3 + c
for the unknown a, b, and c. You can convince yourself that the following will do

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)
a = A(x1,y1,x2,y2,x3,y3)
b = B(x1,y1,x2,y2,x3,y3)
c = C(x1,y1,x2,y2,x3,y3)

We have done most of the hard work, the only thing that remains is how we "automate" this whole machinery, i.e., what do we do, if we have several arrows that we want to set. Again, as so many times in the past, we will utilise this new notion of function definition: the fact that a function is not only a x -> f(x) mapping, but this mapping, and a set of possibly unrelated instructions. What we will do is to define a "function" that sets our arrows, and, as the supplementary instruction, augments the plot command accordingly. First, let us take the following function definition


arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))
and try to understand what it does! For a start, it takes 6 arguments, which are nothing but the coordinates of the end points, and the control point. Then, it defines new_x, which we have already seen in the first example. In the next step, based on the 6 input arguments, calculates the three parameters of our parabola, and in the next line, adds the plot of this parabola to a string called PLOT. When adding to PLOT, we simply use the sprintf function. In the last line, we concatenate a string called ARROW, and another one, produced by another sprintf. It is easy to see that this sprintf returns the definition of an arrow between new_x, f(new_x), and x3, f(x3). We should also note that this line is the last line, which consequently means that whatever happens here is returned.

At this point we are really done, we only have to "populate" our plot. The full script takes on the form
reset
unset key
eps = 0.01

set style arrow 1 head filled size screen 0.03, 15, 45 lt -1

cut(x,x1,x3) = ((x >= x1 && x <= x1 + (1.0-eps)*(x3-x1)) ? 1.0 : 1/0)

denom(x1, x2, x3) = x1*x1*(x2-x3) + x1*(x3*x3-x2*x2) + x2*x3*(x2-x3)
A(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*y1 + (x3-x1)*y2 + (x1-x2)*y3 ) / denom(x1,x2,x3)
B(x1,y1,x2,y2,x3,y3) = ( (x3*x3-x2*x2)*y1 + (x1*x1-x3*x3)*y2 + (x2*x2-x1*x1)*y3 ) / denom(x1,x2,x3)
C(x1,y1,x2,y2,x3,y3) = ( (x2-x3)*x2*x3*y1 + (x3-x1)*x1*x3*y2 + (x1-x2)*x1*x2*y3 ) / denom(x1,x2,x3)

ARROW = ""
PLOT = "p [0:3] sin(x) w p ps 1 pt 6"
arrow(x1,y1,x2,y2,x3,y3) = (new_x = x1 + (1.0-eps)*(x3-x1), \
        a = A(x1,y1,x2,y2,x3,y3), b = B(x1,y1,x2,y2,x3,y3), c = C(x1,y1,x2,y2,x3,y3), \
        PLOT = PLOT.sprintf(", cut(x,%f,%f)*(%f*x*x+%f*x+%f) with lines lt -1", x1, x3, a, b, c), \
        ARROW.sprintf("set arrow from %f, %f to %f,%f as 1; ", new_x, a*new_x*new_x + b*new_x + c, x3, y3))

ARROW = arrow(0,0,1,1.5,pi/2,1.03)
ARROW = arrow(0,0,1,0.3,pi/2,0.97)
eval(ARROW)
eval(PLOT)
which would result in the graph shown here:


Now it is clear what was PLOT: it is nothing, but the actual plot that we want to have. This is the string to which we concatenate our parabolae, one by one, every time we define a new arrow. After we defined all our arrows, we have two strings, ARROW, and PLOT. As such, they are no good, they will become instructions only when we evaluate them. That is what we do in the last two lines.

I would like to point out that my main reason for posting this was not that it can be used for creating curved arrows, but that this method is quite general. First, we can add to the plot, if that is needed, without having to keep track of all the tiny details. Second, the set command can be "fooled" by using the sprintf function. With the help of the string augmentation and the eval command, we can actually use parameters in our set instruction very efficiently.

Well, this is for today. I am waiting for suggestions as to what we should discuss next time. Cheers,
Zoltán

8 comments:

  1. As for suggestions, I actually have two:

    First, the mosaic plot (also called "Marimekko" plot).
    Let's suppose that we have our data organized like this:

    "" France Germany Japan Nauru
    Defense 9163 4857 2648 9437
    Agriculture 3547 5378 1831 1948
    Education 7722 7445 731 9822
    Industry 4837 147 3449 6111
    "Silly walks" 3441 7297 308 7386

    and we want to make a plot like this:
    http://j.imagehost.org/0655/spending.png

    This form of plotting can be thought as a stacked histogram taken one step further; we might even think of it as a histogram stacked in both dimensions.

    One can read from the charts how much do each major category (country in the example) contribute to the total, as well as how much the secondary categories (ministries here) contribute to the sub-totals. It also has the nice property that the area of every rectangle is proportional to the plotted value.

    The problem with this kind of plot is that one can't use gnuplot's otherwise excellent stacked histogram mechanism to generate it, because the column widths themselves bear information here, and they have to be calculated from sums of datafile columns.

    I'm sure that there is a way to do it in pure gnuplot, given the recent additions to its facilities, but so far I couldn't figure it out. The linked image was done by way of a perl script that took the data file in the matrix form above and calculated from it the coordinates of the individual rectangles.

    You could look into it if you find this problem interesting.

    The other suggestion, or more like question is a little more involved and a little more physics-specific.
    Given a data file containing magnetic field measurements (data in six columns: x, y, z, B_x, B_y, B_z; the points can be presumed to form a regularly spaced 3D grid, or 2D grids in one or more planes), how can we generate a plot that shows the field lines? Creating a plot that shows a vector corresponding to the measured B field in every x,y,z point is easy. But such a plot is quite unusable in practice. Plotting unit vectors to show the direction of the field only, and using a color palette to show field strength is slightly better, but it's still not as intuitive as one would hope.
    The best would be a plot that shows the field lines as they twist, curl and connect to the magnetic poles, but I don't even know where to begin with that.

    ReplyDelete
  2. Greetings Péter,

    Thanks for the comments. The first one was interesting, so I have worked it out. As for the second one, I would have to think about it. As far as I understand, what you want to have is a continuous line that follows the field, and then the difficulty is that as we move along the line, we might end up not at the next grid point, (next in the sense that it is neighbouring along grid lines), but somewhere else. This is an exciting problem, and if it is of any consolation, I don't know either where to start:)
    Cheers,
    Zoltán

    ReplyDelete
  3. Hi, Friends.

    I have two queries-

    Please give me information related- how to skip odd number of point in gnuplot? And how to increase the space between (say) dashed line in gnuplot? Give the command for these.

    Thanks.

    ReplyDelete
  4. For skipping points, you can use the every keyword. So, e.g.,

    plot 'foo' every 2

    will select every other point.

    For linetypes, type

    test

    on the command line, and see what happens.
    Cheers,
    Zoltán

    ReplyDelete
  5. Hi,

    It is a really nice and useful code,thanks!
    I'd like to ask a question: I want to make a pdf from a code really similar to this. I'm using latex to do so, but it gives an error message. What should I include?

    Thanks in advance,
    Ági

    ReplyDelete
  6. Hi Ági,

    Well, I think, it is a bit hard to tell anything specific without knowing what the error message is:)

    Cheers,
    Zoltán

    ReplyDelete
  7. Hi, I need small help. I have been trying to create the graph from the data which will have growing columns but first two column remain static and rest of the columns can be growing. I am not sure how to create the graph via gnuplot. I would like to have the line graph with legends as "ServerX" for each line. The second should be y-axis. Any help should be appreciated.

    Server1 35000 105 280 420
    Server2 49000 0
    Server3 49000 1372 9016
    Server4 49000 0
    Server5 49000 0
    Server6 35000 0
    Server7 35000 1120
    Server8 35000 0
    Server9 35000 10640 11655 3955 7175 7910 7945
    Server10 35000 945
    Server11 35000 1050 1085 420 455 490 525 560
    Server12 35000 21385
    Server13 35000 1435 3780
    Server14 35000 0

    ReplyDelete