Semantic Meta-Mining
Part 3 of the Tutorial on Semantic Data Mining
Melanie Hilario, Alexandros Kalousis
University of Geneva
Semantic Data Mining Tutorial (ECML/PKDD’11) 1 Athens, 9 September 2011
Introduction: What is semantic meta-mining
The meta-mining framework
[Figure: the example workflow Proc3 and its fold-level sub-workflow Proc3.i. Legend: (sub-)workflows; DM operators (nodes); inputs/outputs (edges).]
Input data: Iris
Task: Feature selection + classification
Algorithms: InfoGain-based FS + DT
Evaluation strategy: 10-fold cross-validation
Outputs: Learned DT and estimated accuracy
Proc3: RM-X-Validation splits Iris into Iris-Trn i and Iris-Tst i, runs Proc3.i on each fold, and averages Accuracy i into AverageAccuracy. D-TrainFinalModel applies WeightByInfoGain, SelectByWeights and Weka-J48 to the full Iris data (FWeights, Iris') to produce the final J48 model.
Proc3.i (fold i): WeightByInfoGain computes FWeights i from Iris-Trn i; SelectByWeights reduces Iris-Trn i to Iris-Trn' i and Iris-Tst i to Iris-Tst' i; Weka-J48 learns J48Model i; RM-ApplyModel produces Predictions i; RM-Performance produces Accuracy i.
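The fold-level part of this workflow can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the tutorial's implementation: the functions `weight_by_info_gain` and `select_by_weights` are hypothetical stand-ins for the operators of the same name, features are assumed to be discrete, and the rows are made-up toy data rather than Iris. It illustrates one point of the figure: the feature weights are computed on the training fold and then reused unchanged to reduce the test fold.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Information gain of one discrete feature column w.r.t. the labels."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def weight_by_info_gain(rows, labels):
    """Return one weight per feature column (its information gain)."""
    return [info_gain([r[j] for r in rows], labels) for j in range(len(rows[0]))]


def select_by_weights(rows, weights, k):
    """Keep the k highest-weighted columns, preserving column order."""
    ranked = sorted(range(len(weights)), key=lambda j: -weights[j])
    keep = sorted(ranked[:k])
    return [[r[j] for j in keep] for r in rows], keep


# Toy training fold (hypothetical, discretised values; not the real Iris data).
train_rows = [["short", "wide"], ["short", "narrow"],
              ["long", "wide"], ["long", "narrow"]]
train_labels = ["setosa", "setosa", "virginica", "virginica"]
test_rows = [["long", "wide"]]

# Weights are learned on the training fold only.
weights = weight_by_info_gain(train_rows, train_labels)
train_sel, kept = select_by_weights(train_rows, weights, k=1)

# The test fold is reduced with the columns chosen on the training fold.
test_sel = [[r[j] for j in kept] for r in test_rows]
```

In this toy fold the first column separates the two classes perfectly and the second carries no information, so only the first column is kept for both the training and the test rows.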
[Figure: architecture diagram. Surviving labels: data, service call, data flow, software, DB, DMEX.]
An ontology for semantic meta-mining
[Figure: the meta-miner's knowledge resources, held in an RDF triple store.]
DMOP (TBox): formal conceptual framework; accepted knowledge of DM tasks, algorithms, operators.
DM-KB (ABox): knowledge base.
DMEX-DBs: experiment databases; specific DM applications, workflows, results.
The figure distinguishes the meta-miner's prior DM knowledge from the meta-miner's training data.
[Figure: DMOP description of ClassificationModellingAlgorithm, organised under Representation Bias and Preference Bias.]
Concepts: ClassificationModellingAlgorithm, CategoricalLabelledDataSet, ClassificationModel, ModelStructure, ModelParameter, DecisionBoundary, ModelComplexityMeasure, AlgorithmAssumption, AlgorithmParameter, RegularizationParameter, OptimizationProblem, OptimizationStrategy, InductionCostFunction, LossFunction, Constraint, ModelComplContStrat.
Properties: specifiesInputType, specifiesOutputType, assumes, hasModelStructure, hasModelParameter, hasDecisionBoundary, hasComplexityMetric, hasHyperparameter, hasOptimizationProblem, hasOptimizationStrategy, hasObjectiveFct, hasOptimGoal {Minimize, Maximize}, hasConstraint, hasLossComponent, hasComplexityComp., hasRegularizationPar, controlsModComplexity (many other properties).
[Figure: fragment of the AlgorithmAssumption hierarchy. Legend: class, individual, subclass of, instance of.]
Classes: AlgorithmAssumption, AssumptionOnProbabilityDistr, AssumptionOnTargets, AssumptionOnCategTarget, AssumptionOnRealTarget, AssumptionOnFeatures, AssumptionOnInstances, MultinomialAssumption, GaussianAssumption, UniformAssumption.
Other labels: LogisticPosteriorAssumption, MultinomialClassPriorAssumption, UniformClassPriorAssumption, NormalClassCondPrAssumption, MultinomialClassCondPrAssumption, CommonCovarianceAssumption, ClassSpecificCovarianceAssumption, FeatureIndependenceAssumption, ConditionalFeatIndepAssumption, AntiMonotonicityOfSupport, LinearSeparabilityAssumption, IIDAssumption.
[Figure: taxonomy of optimization strategies. Legend: subclass of, instance of.]
Top-level classes: OptimizationStrategy, ContinuousOptStrategy, DiscreteOptStrategy, SearchStrategy, RelaxationStrategy.
Distinctions shown: Path-based, Blind, Informed, IterImprove., Deterministic, Stochastic, Greedy, Heuristic BF.
Strategies named: BreadthFirst, DepthFirst, UniformCost, A*, BestFirst, GreedyBF, Beam S., Branch&Bound, Hill Climbing (Deterministic HC, Stochastic HC), Local Beam S. (Deterministic LBS, Stochastic LBS), Random Walk, Genetic Search.
[Figure: DMOP description of FeatureSelectionAlgorithm and FeatureWeightingAlgorithm.]
Concepts: FeatureSelectionAlgorithm, FeatureWeightingAlgorithm, DiscreteOptimizationStrategy, SearchStrategy, RelaxationStrategy, DecisionStrategy, DecisionRule, StatisticalTest.
Properties: hasOptimizationStrategy, hasDecisionStrategy, hasFeatureEvaluator, hasSearchDirection, hasUncertaintyLevel, hasSearchGuidance, hasChoicePolicy, hasCoverage, hasEvaluationTarget, hasEvaluationContext, hasEvaluationFunction, interactsWithLearnerAs.
Value sets: {Global, Local}; {InfoGain, Chi2, CFS-Merit, Consistency ...}; {Forward, Backward ...}; {Filter, Wrapper, Embedded}; {Deterministic, Stochastic}; {Blind, Informed}; {Irrevocable, Tentative}; {SingleFeature, FeatureSubset}; {Univariate, Multivariate}.
CFS−SearchStopRule
[Figure: the Proc3 workflow again (RM-X-Validation over Iris with the fold-level sub-workflow Proc3.i: WeightByInfoGain, SelectByWeights, Weka-J48, RM-ApplyModel, RM-Performance; D-TrainFinalModel produces the final J48 model; outputs AverageAccuracy), annotated with the DMOP assertions below.]
Proc3: DM-Process
  hasInput(Proc3, Iris)
  executes(Proc3, FSC-Infogain-J48-Xval-Wf)
  hasOutput(Proc3, J48Model3-Final)
  hasOutput(Proc3, AvgAccuracy)
  hasFirstSubprocess(Proc3, Opex3-Xval)
  hasSubProcess(Proc3, Opex3-Xval)
  hasSubProcess(Proc3, Opex3-TrainFinalModel)
Opex3-Xval: DM-Operation
  hasFirstSubprocess(Opex3-Xval, Proc3.i)
  executes(Opex3-Xval, RM-X-Validation)
  hasParameterSetting(Opex3-Xval, OpSet3)
  hasOutput(Opex3-Xval, AvgPerfMeasure3)
  isFollowedDirectlyBy(Opex3-TrainFinalModel)
  isFollowedBy(Opex3-TrainFinalModel)
  isSubprocessOf(Opex3-Xval, Proc3)
  hasSubProcess(Opex3-Xval, Proc3.i)
Proc3.i: DM-Process
  hasInput(Proc3.i, Iris-Trn3.i)
  hasInput(Proc3.i, Iris-Tst3.i)
  hasOutput(Proc3.i, PerfMeasure-3.1.fold-i)
  hasFirstSubprocess(Proc3.i, Opex3.i.1-WeightByInfogain)
  isSubprocessOf(Proc3.i, Opex3-Xval)
  hasSubProcess(Proc3.i, Opex3.i.1-WeightByInfogain)
  hasSubProcess(Proc3.i, Opex3.i.2-SelectByWeights)
  hasSubProcess(Proc3.i, Opex3.i.3-J48)
  hasSubProcess(Proc3.i, Opex3.i.4-SelectByWeights)
  hasSubProcess(Proc3.i, Opex3.i.5-ApplyModel)
  hasSubProcess(Proc3.i, Opex3.i.6-Performance)
  ...
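One way to read these assertions is as a nested process hierarchy. The following is a minimal sketch, assuming the hasSubProcess assertions are held as plain (parent, child) pairs in Python; the pair list and the helper `descendants` are illustrative and not part of DMOP or of any triple-store API. Identifier spellings are normalised to the Opex3... form, and the hierarchy is assumed to be acyclic.

```python
# hasSubProcess assertions from the slide, as (parent, child) pairs.
HAS_SUBPROCESS = [
    ("Proc3", "Opex3-Xval"),
    ("Proc3", "Opex3-TrainFinalModel"),
    ("Opex3-Xval", "Proc3.i"),
    ("Proc3.i", "Opex3.i.1-WeightByInfogain"),
    ("Proc3.i", "Opex3.i.2-SelectByWeights"),
    ("Proc3.i", "Opex3.i.3-J48"),
    ("Proc3.i", "Opex3.i.4-SelectByWeights"),
    ("Proc3.i", "Opex3.i.5-ApplyModel"),
    ("Proc3.i", "Opex3.i.6-Performance"),
]


def descendants(node, edges):
    """All direct and indirect subprocesses of node, in depth-first order.

    Assumes the hasSubProcess relation contains no cycles.
    """
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)

    found = []
    stack = list(reversed(children.get(node, [])))
    while stack:
        current = stack.pop()
        found.append(current)
        # Push children in reverse so they are visited in listed order.
        stack.extend(reversed(children.get(current, [])))
    return found


all_steps = descendants("Proc3", HAS_SUBPROCESS)
fold_steps = descendants("Proc3.i", HAS_SUBPROCESS)
```

Here `fold_steps` holds the six operator executions of one cross-validation fold, while `all_steps` additionally contains the cross-validation operation, the fold process itself, and the final model training step.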
Collaborative Ontology Development Platform
OWL
Recap
From meta-learning to semantic meta-mining
Semantic meta-mining
[Figure: a DM workflow drawn as a graph. Legend: basic nodes, composite nodes, input/output edges, sub input/output edges.]
Nodes: Retrieve, X-Validation (composite), Split, Weight by Information Gain, Select by Weights, Naive Bayes, Apply Model, Performance, Join, End.
Edge labels: example set, training set, test set, weights, model, labelled data, performance, result.
[Figure: parse tree of the workflow, before and after augmentation with ontology concepts.]
(a) Parse tree: Retrieve, X-Validation, Weight by Information Gain, Select by Weights, Naive Bayes, Apply Model, Performance, End.
(b) Augmented parse tree: Retrieve, X-Validation, DataProcessingAlgorithm > FeatureWeightingAlgorithm > UnivariateFeatureWeightingAlgorithm > Weight by Information Gain, DecisionRule > Select by Weights, SupervisedModellingAlgorithm > ClassificationModellingAlgorithm > GenerativeAlgorithm > BayesianAlgorithm > NaiveBayesAlgorithm > NaiveBayesNormal > Naive Bayes, Apply Model, Performance, End.
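The step from panel (a) to panel (b) can be sketched as wrapping each operator node in the chain of ontology concepts above it. This is a minimal sketch, not the authors' procedure: the concept chains are read off panel (b), the tree is represented as hypothetical (label, children) pairs, and nesting the five inner operators under X-Validation is an assumption based on the workflow graph, where X-Validation is a composite node.

```python
# Concept chains read off panel (b), most general concept first.
CONCEPT_CHAIN = {
    "Weight by Information Gain": [
        "DataProcessingAlgorithm",
        "FeatureWeightingAlgorithm",
        "UnivariateFeatureWeightingAlgorithm",
    ],
    "Select by Weights": ["DecisionRule"],
    "Naive Bayes": [
        "SupervisedModellingAlgorithm",
        "ClassificationModellingAlgorithm",
        "GenerativeAlgorithm",
        "BayesianAlgorithm",
        "NaiveBayesAlgorithm",
        "NaiveBayesNormal",
    ],
}

# Panel (a) as (label, children) pairs; the nesting is an assumption.
PARSE_FOREST = [
    ("Retrieve", []),
    ("X-Validation", [
        ("Weight by Information Gain", []),
        ("Select by Weights", []),
        ("Naive Bayes", []),
        ("Apply Model", []),
        ("Performance", []),
    ]),
    ("End", []),
]


def augment(node, chains):
    """Wrap a node (and, recursively, its children) in its concept chain."""
    label, children = node
    augmented = (label, [augment(child, chains) for child in children])
    # Wrap from the most specific concept outwards, so that the most
    # general concept ends up as the outermost ancestor.
    for concept in reversed(chains.get(label, [])):
        augmented = (concept, [augmented])
    return augmented


def preorder(node):
    """Labels of a tree in depth-first, parent-before-children order."""
    label, children = node
    labels = [label]
    for child in children:
        labels.extend(preorder(child))
    return labels


augmented_forest = [augment(tree, CONCEPT_CHAIN) for tree in PARSE_FOREST]
flat = [label for tree in augmented_forest for label in preorder(tree)]
```

Reading `flat` from left to right gives the same label sequence as panel (b). Operators without an entry in `CONCEPT_CHAIN`, such as Retrieve or Apply Model, are left unchanged.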
Semantic meta-mining for DM workflow planning