The motto of SMO might be: learn a little, find regions of doubt, try there.
The motto of RRP is "look before you leap".
RRP is a landscape analysis technique:
- for y = f(x),
- if x is cheap to explore but y is expensive to evaluate,
- look for patterns in the independent X variables
- before exploring the Ys.
- E.g. in the following picture, if y is some vertical goal function, then the other two dimensions are the x choices.
For multi-goal reasoning, it is useful to consider things as two spaces, the Xs and the Ys (so f is the magic that bridges the Xs to the Ys).
In Y space, many algorithms work in generations: find the best so far (pruning the rest), then design the next generation of solutions from the Pareto frontier seen so far.
- Definition: the Pareto frontier is the set of solutions that represents the best trade-offs among all the objective functions (a toy sketch of "dominates" and "frontier" follows).
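To make that definition concrete, here is a minimal sketch (illustrative only, not the course's code) of binary domination over goal vectors, where each goal is marked for minimization or maximization. The names `goals`, `dominates`, and `frontier` are assumptions of this sketch.

```lua
-- Illustrative sketch: Pareto dominance over goal vectors.
-- Assumed convention: weight +1 = maximize this goal, -1 = minimize it.
local goals = {Lbs = -1, Acc = 1, Mpg = 1}

-- "a" dominates "b" if it is no worse on every goal and strictly better on at least one.
local function dominates(a, b)
  local better = false
  for g, w in pairs(goals) do
    if w * a[g] < w * b[g] then return false end
    if w * a[g] > w * b[g] then better = true end end
  return better end

-- The Pareto frontier is everything not dominated by anything else.
local function frontier(rows)
  local out = {}
  for _, a in pairs(rows) do
    local dead = false
    for _, b in pairs(rows) do
      if dominates(b, a) then dead = true; break end end
    if not dead then out[#out + 1] = a end end
  return out end

-- Tiny usage example with made-up cars; prints the two non-dominated ones.
local cars = {{Lbs=2000, Acc=16, Mpg=35}, {Lbs=4000, Acc=12, Mpg=15}, {Lbs=2500, Acc=17, Mpg=30}}
for _, c in pairs(frontier(cars)) do print(c.Lbs, c.Acc, c.Mpg) end
```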
To know the landscape is to know how to optimize; i.e. how to avoid getting stuck, over-adapted (hence over-specialized) on some local peak when, as Stewart Brand so aptly puts it...
- "Whole mountain ranges of opportunity could be glimpsed in the distance, but getting to them involved venturing 'downhill' into regions of lower fitness".
Studying such landscapes made Brand distrust claims of "optimality", since what you call the "optimum" may actually be just a stepping stone to a better place.
Brand's favorite landscape comes from a 1932 genetics paper1 that discusses how different breeding strategies respond well (or badly) to environmental pressures. In the following, the horizontal and vertical axes represent combinations of genetic choices (e.g. "length of hair", "body weight", "how fast can you run?") and the contours might be the "probability of winning a fight".
- Each frame represents a different evolutionary scenario and its impact on the population in question.
- Frame C differs from the other five frames in that it represents a changing environment, which creates a dynamic landscape; the arrow shows the population tracking that moving landscape.
Once you get this diagram, you'll never un-see it.
- The world, as you know it, will no longer be merely three-dimensional.
- Instead, as you walk down the street, you will realize you are walking a landscape: maximizing shade, minimizing travel time, minimizing distance to the coffee shop and the dry-cleaners, etc.
FYI, while it is not commonly stated, in my view landscape analytics generalizes a range of algorithms from different fields:
- When used on the 𝑋 shape alone, landscape analytics might be called “clustering”.
- And once we find which parts of X map to which parts of Y, we can do prediction (regression and classification, to predict numbers and symbols).
- And the delta between different parts of the landscape (with different Y scores) might be called “planning” or “optimization”.
- When used on the 𝑋, 𝑌 shape, if we only sample a few times at each part of the landscape, then that limited sampling might be called “semi-supervised learning”.
- Similarly, in a joint analysis of the 𝑋, 𝑌 shape, if we bias our “leaps” towards regions that (in the past) had good 𝑌 scores, landscape analytics might be called “reinforcement learning”.
- And if we use landscape analytics to jointly explore Y goals such as performance scores, then this could be called “hyperparameter optimization”.
In a recent survey of landscape analytics methods, Malan 2 reports no fewer than 33 “families” of landscape analytics methods.
- Most of these methods tend to enumerate the whole landscape before exploring it;
- i.e. they require many evaluations of the Y values.
- So, instead, I do as much landscape analysis as possible in X space, then only ask for Y values at a few spots.
Given some distance measure, let's peek at the landscape (a toy sketch of this recipe appears after the list):
- Using just the independent $x$ values, find two distant points (sorting the pair by their distance to heaven).
- Draw a line between them.
- Map every other point to its closest point on that line.
- Cut the line in the middle.
- Recurse on each half.
- Stop when leaves hold about $\sqrt{N}$ rows.
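Before the course's full implementation (below), here is a self-contained toy sketch of that recipe over plain 2-D numeric points. Everything in it (`dist`, `farApart`, `rrp`, the random data) is an assumption of this sketch, not the course code; the real code uses the DATA and ROW classes shown later.

```lua
-- Toy sketch of recursive random projections over 2-D points {x1, x2}.
local function dist(p, q) return math.sqrt((p[1]-q[1])^2 + (p[2]-q[2])^2) end

-- Crude O(N^2) far-point search (the course code uses the cheaper Fastmap heuristic instead).
local function farApart(points)
  local a, b, most = points[1], points[2], -1
  for _, p in pairs(points) do
    for _, q in pairs(points) do
      local d = dist(p, q)
      if d > most then a, b, most = p, q, d end end end
  return a, b, most end

-- Project everyone onto the a-b line, cut in the middle, recurse; stop near sqrt(N).
local function rrp(points, stop, depth)
  depth = depth or 0
  stop  = stop or math.sqrt(#points)
  print(string.rep("|.. ", depth) .. #points .. " points")
  if #points <= stop then return end
  local a, b, C = farApart(points)
  table.sort(points, function(p, q)   -- sort by position along the a-b line
    return (dist(p,a)^2 + C^2 - dist(p,b)^2) < (dist(q,a)^2 + C^2 - dist(q,b)^2) end)
  local lefts, rights = {}, {}
  for i, p in pairs(points) do
    table.insert(i <= #points // 2 and lefts or rights, p) end
  rrp(lefts, stop, depth + 1); rrp(rights, stop, depth + 1) end

-- Usage: 64 random points.
math.randomseed(1)
local pts = {}
for i = 1, 64 do pts[i] = {math.random(), math.random()} end
rrp(pts)
```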
Apply this to auto93.csv. Note that, just by recursively exploring the X values, the leaf clusters end up with very different goal scores (compare, e.g., the Lbs-, Acc+, and Mpg+ columns of the first and last leaves):
|..
|.. |..
|.. |.. |..
|.. |.. |.. |.. {3.96, 90.87, 0, 80.71, 3, 2045.71, 16.6, 35.42}
|.. |.. |.. |.. {3.88, 89.04, 0, 75.2, 3, 2087.4, 17.03, 28.8}
|.. |.. |..
|.. |.. |.. |.. {4.0, 125.92, 0, 81.68, 1, 2504.2, 16.52, 30.8}
|.. |.. |.. |.. {4.48, 129.68, 0, 78.28, 3, 2562.76, 14.96, 26.4}
|.. |..
|.. |.. |..
|.. |.. |.. |.. {4.2, 109.28, 0, 76.16, 2, 2462.92, 15.68, 26.8}
|.. |.. |.. |.. {4.0, 104.4, 0, 72.04, 2, 2309.24, 16.74, 26.0}
|.. |.. |..
|.. |.. |.. |.. {4.24, 110.24, 0, 77.76, 2, 2423.72, 17.77, 32.8}
|.. |.. |.. |.. {4.0, 122.96, 0, 78.52, 1, 2426.52, 16.04, 29.2}
|..
|.. |..
|.. |.. |..
|.. |.. |.. |.. {4.17, 128.6, 0, 73.38, 1, 2393.5, 16.91, 24.58}
|.. |.. |.. |.. {6.0, 232.0, 0, 74.96, 1, 3359.96, 17.18, 20.0}
|.. |.. |..
|.. |.. |.. |.. {6.0, 215.76, 0, 79.2, 1, 3194.44, 16.33, 22.0}
|.. |.. |.. |.. {7.6, 300.08, 0, 78.16, 1, 3727.64, 15.53, 20.4}
|.. |..
|.. |.. |..
|.. |.. |.. |.. {6.64, 255.2, 0, 71.12, 1, 3317.08, 14.82, 18.4}
|.. |.. |.. |.. {8.0, 311.96, 0, 74.44, 1, 4011.28, 13.55, 15.2}
|.. |.. |..
|.. |.. |.. |.. {8.0, 364.96, 0, 73.8, 1, 4353.44, 12.53, 12.8}
|.. |.. |.. |.. {8.0, 397.16, 0, 70.84, 1, 4286.92, 11.0, 12.4}
{5.45, 193.43, 0, 76.01, 1, 2970.42, 15.57, 23.84}
{Clndrs, Volume, HpX, Model, origin, Lbs-, Acc+, Mpg+}
(In the above, we only score the Y values after clustering. The last two lines show the same stats for all the rows, plus the column names; names ending in "-" are goals to minimize and names ending in "+" are goals to maximize.)
- Find distant points without looking at everything:
  - the Fastmap heuristic for finding far points 3 ($O(2N)$, not $O(N^2)$):
    - pick any point at random;
    - find a point far from it (call it a);
    - then find a point far from a (call it b).
- When exploring for far points, don't use all the data (just use, say, 256 points picked at random).
- Aside: for prudence, do not take the most distant points (those can be confused by outliers).
  - Instead, only go, say, 95% of the way out.
- When recursing, reuse the far point found in the parent split.
  - So the above tree only needed 16 evals.
  - (With no reuse, we would have needed 30; see the arithmetic after this list.)
- When recursing, ignore the branch furthest from heaven.
  - So in 5 evals, we can find the top-most cluster seen above:
    {3.96, 90.87, 0, 80.71, 3, 2045.71, 16.6, 35.42}
  - And now we are competitive with SMO.
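To see where those eval counts come from (a back-of-the-envelope check: the tree above has 16 leaves, hence 15 splits, and an "eval" means scoring one row's Y values via its distance to heaven):

- With reuse, the root split scores 2 poles and each of the other 14 splits scores just 1 new pole: $2 + 14\times 1 = 16$ evals.
- With no reuse, every one of the 15 splits scores 2 poles: $2 \times 15 = 30$ evals.
- Chasing only the best branch visits one path of 4 splits: $2 + 3\times 1 = 5$ evals.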
- The above distance calculation assumes all numerics. But what about mixtures of nums and syms?
- Use Aha's distance measure 4 over the x-attributes (note: slow, and not good for large dimensional spaces; we will fix that below). Consistent with the ROW:dist code below, that distance is $\left(\frac{1}{n}\sum_{i=1}^{n} \Delta(x_i,y_i)^p\right)^{1/p}$.
  - euclidean: p = 2
  - But what is Δ?
- Δ for symbols:
  - return 0 if x == y else 1
- Δ for numbers:
  - |x - y|
  - to make numbers fair with symbols, normalize x and y to 0..1 using (x - min)/(max - min)
- But what about missing values?
  - Assume the worst case:
    - if both are unknown, assume Δ = 1
    - if one symbol is missing, assume Δ = 1
    - if one number is missing:
      - let x, y be the unknown and the known value
      - y = normalize(y)
      - x = 0 if y > 0.5 else 1
      - Δ = |x - y|
function SYM:dist(x,y) -- gap between two symbols
  return (x=="?" and y=="?" and 1)      -- both missing: assume the worst
         or (x==y and 0 or 1) end       -- else 0 if same, 1 if different

function NUM:norm(x) -- map x to 0..1, min..max (missing values pass through)
  return x=="?" and x or (x - self.lo) / (self.hi - self.lo + 1E-30) end

function NUM:dist(x,y) -- gap between two numbers, assuming the worst for "?"
  if x=="?" and y=="?" then return 1 end
  x,y = self:norm(x), self:norm(y)
  if x=="?" then x = y<.5 and 1 or 0 end -- a missing value is placed far from the known one
  if y=="?" then y = x<.5 and 1 or 0 end
  return math.abs(x-y) end
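For example (a small sketch, not from the course code: it builds a stub NUM directly via `setmetatable`, assuming NUM is a plain table of methods and that `lo` and `hi` would normally be learned from the data):

```lua
-- Hypothetical stub for illustration only: a NUM column whose seen values span 0..10.
local num = setmetatable({lo = 0, hi = 10}, {__index = NUM})
print(num:dist(2, 8))      -- ≈ 0.6 : |0.2 - 0.8| after normalization
print(num:dist("?", 8))    -- ≈ 0.8 : y normalizes to 0.8 > 0.5, so the missing x is assumed to be 0
print(num:dist("?", "?"))  --   1   : both missing, assume the worst
```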
ROW:dist adds up all these distances across all the X columns:
function ROW:dist(other,data,     d,n,p) -- Minkowski distance over the X columns
  d, n, p = 0, 0, the.p
  for _, col in pairs(data.cols.x) do
    n = n + 1
    d = d + col:dist(self.cells[col.at], other.cells[col.at]) ^ p end
  return (d/n)^(1/p) end
Finding two distant points, sorting them by their distance to heaven, reusing far points from parents:
function ROW:neighbors(data, rows) -- sort other rows by their distance to this row
  return l.keysort(rows or data.rows,
           function(row) return self:dist(row,data) end) end

function DATA:farapart(rows, sortp,a,     b,far,evals) -- two distant rows, Fastmap style
  far   = (#rows * the.Far) // 1  -- e.g. the.Far=0.95 means "only go 95% of the way out"
  evals = a and 1 or 2            -- if "a" is reused from the parent, only one new row gets scored
  a = a or l.any(rows):neighbors(self, rows)[far] -- a = far from some random row
  b = a:neighbors(self, rows)[far]                -- b = far from a
  if sortp and b:d2h(self) < a:d2h(self) then a,b = b,a end -- ensure "a" is closer to heaven
  return a, b, a:dist(b,self),evals end
Given two distant points, project everyone else onto a line drawn between them
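The projection used in the code below is just the cosine rule: if a row is at distance $a$ from one pole, distance $b$ from the other pole, and the two poles are $C$ apart, then the row's position $x$ along the line joining the poles is

$$x = \frac{a^2 + C^2 - b^2}{2C}$$

(this is what `project` computes, with $a$, $b$ being `d(r,a)` and `d(r,b)`).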
function DATA:half(rows,sortp,before,     evals)
  local some,a,b,d,C,project,as,bs
  some        = l.many(rows, math.min(the.Half,#rows)) -- don't search all the rows for poles
  a,b,C,evals = self:farapart(some, sortp, before)     -- two distant poles, C apart
  function d(row1,row2) return row1:dist(row2,self) end
  function project(r)   return (d(r,a)^2 + C^2 - d(r,b)^2)/(2*C) end -- cosine rule
  as,bs = {},{}
  for n,row in pairs(l.keysort(rows,project)) do        -- sort rows along the a-b line...
    table.insert(n <= (#rows)//2 and as or bs, row) end  -- ...then cut in the middle
  return as, bs, a, b, C, d(a, bs[1]), evals end
So recursive random projections just means calling half, then calling it again on each half.
function DATA:tree(sortp,     _tree, evals,evals1)
  evals = 0
  function _tree(data,above,     lefts,rights,node)
    node = NODE.new(data)
    if   #data.rows > 2*(#self.rows)^.5                -- stop when leaves get small
    then lefts, rights, node.left, node.right, node.C, node.cut, evals1 =
              self:half(data.rows, sortp, above)       -- reuse "above" as one pole
         evals = evals + evals1
         node.lefts  = _tree(self:clone(lefts),  node.left)
         node.rights = _tree(self:clone(rights), node.right) end
    return node end
  return _tree(self),evals end
And instead of clustering over all the data, we could just recurse down the best half.
function DATA:branch(stop,     rest, _branch,evals)
  evals, rest = 1, {}
  stop = stop or (2*(#self.rows)^.5)
  function _branch(data, above,     left, lefts, rights)
    if   #data.rows > stop
    then lefts, rights, left = self:half(data.rows, true, above)
         evals = evals + 1
         for _, row1 in pairs(rights) do rest[1+#rest] = row1 end -- remember the pruned rows
         return _branch(data:clone(lefts), left)                  -- only chase the better half
    else return self:clone(data.rows), self:clone(rest), evals end end
  return _branch(self) end
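For example (a hedged sketch: how the DATA object is first built from auto93.csv is not shown in this section, so assume `d` is a DATA already holding those rows):

```lua
-- best: a DATA holding the rows of the best leaf; rest: everything pruned along the way.
local best, rest, evals = d:branch()
print(#best.rows, #rest.rows, evals)  -- for ~400 rows and the default stop = 2*sqrt(N), evals is 5
```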
Footnotes
1. The Roles of Mutation, Inbreeding, Crossbreeding and Selection in Evolution. Wright, S. Proceedings of the XI International Congress of Genetics, 8:209-222, 1932. ↩
2. Katherine M. Malan. 2021. A Survey of Advances in Landscape Analysis for Optimisation. Algorithms 14, 2 (2021), 40. https://doi.org/10.3390/a14020040 ↩
3. Christos Faloutsos and King-Ip Lin. 1995. FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. SIGMOD Rec. 24, 2 (May 1995), 163–174. https://doi.org/10.1145/568271.223812 ↩
4. Section 2.4 of Aha, D.W., Kibler, D. & Albert, M.K. Instance-based learning algorithms. Mach Learn 6, 37–66 (1991). https://doi.org/10.1007/BF00153759 ↩