© 2024, Tim Menzies
Previously it was argued that "explanation is everything". In support of this:
- First, I define an explanation algorithm;
- Then, I show how many other tasks can be accomplished with the same architecture as explanation generation.
Given data divided recursively into a tree with leaf clusters, where the labels are somehow known:
- E.g. all those labels are actually available;
- E.g. those labels were generated via RRL (so the clusters are numbered left to right, best to worst: 1, 2, 3, 4, ...);
- E.g. for a small random sample per leaf, you actually look up the labels.

Suppose also that any two clusters can be compared:
- E.g. via the differences between the cluster centroids (ignoring any deltas that are not statistically significant);
- E.g. via the edge distance in the cluster tree (so siblings have a distance of 2).
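The two cluster-comparison measures above can be sketched as follows. This is a minimal illustration: the "significance" check here is a crude effect-size cutoff (an assumption, not the course's real statistical machinery), and leaves are identified by their root-to-leaf paths.

```python
import statistics

def centroid(rows):
    # Column-wise median of a cluster's rows.
    return [statistics.median(col) for col in zip(*rows)]

def delta(c1, c2, rows1, rows2, small=0.35):
    # Contrast two centroids, zeroing out any per-column delta that is
    # "too small to matter" (here: under `small` pooled standard deviations,
    # a crude stand-in for a real statistical significance test).
    out = []
    for i, (a, b) in enumerate(zip(c1, c2)):
        sd = statistics.pstdev([r[i] for r in rows1 + rows2]) + 1e-9
        out.append(b - a if abs(b - a) / sd >= small else 0)
    return out

def edge_distance(path1, path2):
    # Paths are root-to-leaf tuples (0=left, 1=right).
    # Edge distance in the cluster tree: siblings are 2 apart.
    i = 0
    while i < min(len(path1), len(path2)) and path1[i] == path2[i]:
        i += 1
    return (len(path1) - i) + (len(path2) - i)
```

For example, two leaves that share all but their last step (siblings) get `edge_distance` 2, and `delta` reports only the centroid differences big enough to survive the cutoff.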
Then:
| cite | Task | Implementation |
|---|---|---|
| | Localization | Take the current example, walk it down the cluster tree to find its relevant leaf |
| [1] | Anomaly detection | Anomalous if you are far away from everything else in your relevant leaf |
| | Certification | Warn if any new example is anomalous. For efficient certification, use compression (see below). |
| | Streaming | (a) Collect the anomalies at their nearest cluster; (b) if more than … |
| | Classification | Localization + report mode class label in relevant leaf |
| | Regression | Localization + report median class label in relevant leaf |
| | Pollution marking | (a) Measure the variance or the inaccuracies of predictions at each leaf; (b) starting at the leaves and working up the tree, delete any node that is too variable or inaccurate |
| | Compression | Report the whole contrast tree as just the two distant points (at each level) and the median point (where you cut the data, left from right) |
| [2] [3] | Optimization | See planning. Note that this kind of planning does not need … |
| [2] [3] [4] | Hyperparameter optimization | See optimization |
| | Configuration | See optimization, but seek good configs with very few samples |
| | Envy | (a) Localization to find …; (b) find other leaves … |
| | Fear | (a) Localization to find …; (b) find other leaves … |
| | Contrast | Find the delta between two leaf clusters |
| | Explain | Report contrasts |
| | Planning | Explanation of contrasts to the thing you envy |
| | Maximal planning | Planning to the thing you envy the most |
| | Minimal planning | Planning that maximizes improvement while minimizing the change from … |
| | Monitoring | Explanation of contrasts to the thing you fear |
| | Maximal monitoring | Monitoring to the thing you fear the most |
| | Minimal monitoring | Monitoring that maximizes loss while minimizing the change from … |
| | Data synthesis | Generate examples by interpolating between items in each cluster |
| | Privatization | Use data synthesis, favoring regions close to other examples (so you can't distinguish new from old) |
| [1] | Compressed privatization | Only share compressed data (so the data not in the compression is 100% private) |
| [1] | Sharing | (a) Pass around the compressed private data to each stakeholder; (b) each stakeholder only adds in their local data that is anomalous (i.e. does not add in things that are already there) |
| [1] | Transfer learning | After sharing, data from N sources will be all mixed up in the contrast tree (in different leaves). Your local data can then transfer knowledge from other sites by localizing into that space. |
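Several of the rows above share one skeleton: walk a new example down the tree to its relevant leaf, then summarize (or perturb) that leaf. A minimal self-contained sketch follows; the pivot-based splitting rule, the anomaly threshold `k`, and all helper names are illustrative assumptions, not the course implementation. Labels are stored in each row's last column.

```python
import random, statistics

def dist(u, v):
    # Euclidean distance between two feature vectors.
    return sum((p - q) ** 2 for p, q in zip(u, v)) ** 0.5

def x(row): return row[:-1]   # independent columns
def y(row): return row[-1]    # label = last column

def grow(rows, stop=4):
    # Recursively bisect the data around two distant pivot rows;
    # leaves keep their rows, internal nodes keep the pivots.
    if len(rows) <= stop:
        return {"rows": rows}
    a = max(rows, key=lambda r: dist(x(r), x(rows[0])))
    b = max(rows, key=lambda r: dist(x(r), x(a)))
    lefts  = [r for r in rows if dist(x(r), x(a)) <= dist(x(r), x(b))]
    rights = [r for r in rows if dist(x(r), x(a)) >  dist(x(r), x(b))]
    if not lefts or not rights:               # degenerate split: stop here
        return {"rows": rows}
    return {"a": a, "b": b, "left": grow(lefts, stop), "right": grow(rights, stop)}

def localize(node, q):
    # Localization: walk example q down the tree to its relevant leaf.
    while "rows" not in node:
        node = node["left"] if dist(q, x(node["a"])) <= dist(q, x(node["b"])) else node["right"]
    return node["rows"]

def anomalous(q, leaf, k=3):
    # Anomaly detection: q is anomalous if its nearest neighbor in the
    # relevant leaf is much further away than that leaf's typical spacing
    # (assumes the leaf holds at least two rows).
    near = min(dist(q, x(r)) for r in leaf)
    gaps = [min(dist(x(r), x(s)) for s in leaf if s is not r) for r in leaf]
    return near > k * statistics.median(gaps)

def classify(tree, q):
    # Classification: localization + mode label of the relevant leaf.
    return statistics.mode([y(r) for r in localize(tree, q)])

def regress(tree, q):
    # Regression: localization + median label of the relevant leaf.
    return statistics.median([y(r) for r in localize(tree, q)])

def synthesize(leaf):
    # Data synthesis: interpolate a new example between two leaf members.
    r, s = random.sample(leaf, 2)
    w = random.random()
    return [p + w * (q - p) for p, q in zip(r, s)]
```

Note how cheap the per-task additions are: once `grow` and `localize` exist, each extra task is a few lines over the relevant leaf.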
Can the above be adapted to SMO? Perhaps:
- If we used the `best` and `rest` models to select groups of $C_i$ (good examples) and $C_j$ (bad examples);
- How much of the previous section could be repeated?
What about other tasks? Is there a list somewhere of all the tasks people want to achieve?
Well, yes: see Buse and Zimmermann [5].
Footnotes

1. F. Peters, T. Menzies and L. Layman, "LACE2: Better Privacy-Preserving Data Sharing for Cross Project Defect Prediction," 2015 IEEE/ACM 37th International Conference on Software Engineering (ICSE), Florence, Italy, 2015, pp. 801-811, doi: 10.1109/ICSE.2015.92.
2. V. Nair, T. Menzies and J. Chen, "An (Accidental) Exploration of Alternatives to Evolutionary Algorithms for SBSE," Search Based Software Engineering: 8th International Symposium (SSBSE 2016), Raleigh, NC, USA, October 8-10, 2016, Springer International Publishing.
3. J. Chen, V. Nair, R. Krishna and T. Menzies, "'Sampling' as a Baseline Optimizer for Search-Based Software Engineering," IEEE Transactions on Software Engineering, vol. 45, no. 6, pp. 597-614, June 2019, doi: 10.1109/TSE.2018.2790925.
4. A. Lustosa and T. Menzies, "Learning from Very Little Data: On the Value of Landscape Analysis for Predicting Software Project Health," ACM Transactions on Software Engineering and Methodology, November 2023, doi: 10.1145/3630252.
5. R. P. L. Buse and T. Zimmermann, "Information Needs for Software Development Analytics," Proceedings of the 34th International Conference on Software Engineering (ICSE '12), IEEE Press, 2012, pp. 987-996.