Fixing KD Tessellation Aspect Ratios

THE PROBLEM

The current KD tessellation implementation recursively splits the data (or rather a sample) in alternating dimensions using the median as the split point. This can lead to tessellations made up of long needle like rectangles, which are not suitable for the calculations our system is designed for.

SOLUTION ONE

Allow consectutive splits in the same dimension, as opposed to strictly alternating the split dimension.

It would be possible when splitting to either :

  • Choose the dimension that offers a split with the best aspect ratio.
  • Keep alternating unless this would cause a ratio bound to be exceeded and the split in the other dimension has a better aspect ratio.

This will solve the problems for some sets of data, but does not guarantee the eliminatation of bad aspect ratios. This is because a 50/50 split of the data might cause a bad aspect ratio in both dimensions. A simple of example of this is a tile with all the data concentrated in one of the corners.

SOLUTION 2

Allow the split point to vary from the median in the data.

Ideally, we would like to be able to provide limits as follows:

  • $lp <= pop(x) <= hp$ (probably with $lp = median - epsilon$, and $hp = median + epsilon$) where $lp$ is the low bound on population percentage, $hp$ the high bound, and $pop(x)$ the percent of the tile population lower than the split point $x$.

  • $lr <= ratio(x) <= hr$, (probably with $lr = 1/hr$) where $lr$ is the low bound on ratio, $hr$ the high bound, and $ratio(x)$ the aspect ratio at the split point $x$.

Of course, meeting both constraints at once is not possible in general. We can always meet one constraint, sometimes both, but never both all the time. Note that the QuadGrid and the KdTessellation rigidly bound one constraint while ignoring the other:

  • If we allow no variation in the population constraint, and do not bound aspect ratio we have a KD tree.
  • If we allow no variation in aspect ratio and do not bound population then we have a binary tree, which forms the basis for the QuadGrid.

What we need is a tessellation technique that trades off population split with aspect ratio. Suppose we have a penalty formula which increases in value when the ratio and population percentage of a split deviate from their respective ideal values. A simple algorithm for performing the tessellation would be as follows:

if the number of (sample size adjusted) points in tile is less than (max
points per tile)
  do not split this tile further
else
  compute the minimum value of the penalty formula for both x dim and y
dim.
  split on the dimension with the lowest penalty.

Hybrid versions are also possible. The following algorithm tries to keep the KD algorithm intact unless predetermined ratio bounds cannot be adhered to.

if we can do so without breaking ratio bound
   follow existing implementation (ie alternate 50/50 splits),
else if we can do so without breaking ratio bound
   split 50/50 on the other dimension,
else
   use penalty formula to determine the best split point as above.

Similarly, it would be possible to have a tessellation which maintained the geometric properties of a binary (quad) tree if the population split percentage was within a predetermined range, but used a different aspect ratio when this was not the case.

POSSIBLE PENALTY FUNCTIONS

  • $a(e ^{abs(ln (ratio(x)))} )^2 + b (pop(x) - 0.5)^2$ where $a$ and $b$ weight the relative importance of each feature. This penalizes whenever the ratio deviates from one, and when the percentile of the population split deviates from 0.5 (50%). I think it would be nice to subtract 1 from the first term before squaring so that when the ratio is zero the penalty is also zero.

    We would probably need to set the value of b fairly high to that of a, since reasonable ratios eg 2:1 or 3:1, would dominate the population term. For example, with a ratio of 2:1, the ratio penalty is 4a, while the penalty of a 90/10 split would be only 0.16b. It may be a problem that the ratio penalty is unbounded, while the population penalty can be at most 0.25b.

  • $a ( PR - pop(x)) ^ 2 + b (pop(x) - 0.5)^2$, where PR is the percentile of the data point that lies at the point with ideal (or as close to ideal as possible) aspect ratio. I suspect this may take a little longer to compute (we have to find the value of PR), but it does mean that both terms in the penalty function are bounded. However, it also means that the comprimise between population and ratio is brokered entirely in terms of the population percentage, which could lead to a favoritism towards population percentage and bad aspect ratios.

Alistair Dundas, 26th September 2002.


Send suggestions, questions, and feedback to WEINBERG at ASTRO dot UMASS dot EDU.
Documentation generated at Fri Mar 26 00:35:11 2010 by doxygen