Capturing the Laws of (Data) Nature
Hannes Mühleisen, Martin Kersten & Stefan Manegold CIDR 2015
Statistical Model Fitting & DB?
I am storing some data.
User gave me a model, let’s see. I need some of the observations to fit the model.
This other guy is reading some of my data.
Cool, the model seems to fit the data well! Let’s get some more data to validate the fit…
This other guy is reading some more of my data.
Amazing, model fit is validated.
I am storing some data.
Beer!
understanding of the world
these models
[Diagram: Configuration, Measurement | Model! | Grouped by-source operation | Convergence Hints | Measurement, Configuration, Fitted parameters]
[Plot: Intensity (Jy) against Frequency (GHz), 0.12 to 0.20 GHz]
source=17562, alpha=-0.692, p=0.812
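The per-source fit shown above can be sketched in a few lines. This is only an illustration, not the method used in the talk: it assumes the model is the power law I ≈ p · ν^α and fits it by ordinary least squares in log-log space, which is a stand-in for whatever fitting routine the system actually uses. The function names, the second source and the noise-free observations are made up; only the parameters for source 17562 come from the slide.

```python
import math
from collections import defaultdict

def fit_power_law(points):
    """Fit I = p * nu**alpha by least squares on log(I) = log(p) + alpha*log(nu).

    points is a list of (nu, intensity) pairs with positive values.
    Returns (p, alpha).
    """
    xs = [math.log(nu) for nu, _ in points]
    ys = [math.log(intensity) for _, intensity in points]
    n = len(points)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                        # slope in log-log space
    p = math.exp(mean_y - alpha * mean_x)    # intercept, mapped back from logs
    return p, alpha

def fit_by_source(rows):
    """Group (source, nu, intensity) rows by source and fit each group."""
    groups = defaultdict(list)
    for source, nu, intensity in rows:
        groups[source].append((nu, intensity))
    return {source: fit_power_law(pts) for source, pts in groups.items()}

# Synthetic, noise-free observations (assumption: generated from the slide's
# parameters for source 17562; source 1 is entirely hypothetical).
FREQS = [0.12, 0.14, 0.16, 0.18, 0.20]
rows = [(17562, nu, 0.812 * nu ** -0.692) for nu in FREQS]
rows += [(1, nu, 2.0 * nu ** -0.5) for nu in FREQS]

params = fit_by_source(rows)
for source, (p, alpha) in sorted(params.items()):
    print(f"source={source}, alpha={alpha:.3f}, p={p:.3f}")
```

On real, noisy measurements a log-log fit weights errors differently from a direct nonlinear fit, so the recovered parameters would differ somewhat between the two.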
Model to function conversion (automatic)
Move to DB (automatic)
Approximate Answer with zero IO*
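The idea of answering from the model instead of the data can be illustrated as follows. This is a sketch under assumptions: the fitted parameters are kept in a small lookup structure (here a plain dictionary standing in for a parameter table), the model is I ≈ p · ν^α, and the name `approx_intensity` is hypothetical. The parameter values are the ones shown for source 17562.

```python
# Hypothetical in-memory stand-in for a stored table of fitted parameters:
# source id -> (p, alpha). Values for source 17562 are taken from the slide.
FITTED = {17562: (0.812, -0.692)}

def approx_intensity(source, nu, fitted=FITTED):
    """Estimate intensity from the stored model alone.

    No observation rows are read; only the two fitted parameters for the
    requested source are looked up and the power law is evaluated.
    """
    p, alpha = fitted[source]
    return p * nu ** alpha

estimate = approx_intensity(17562, 0.14)
print(f"I({0.14} GHz) is approximately {estimate:.2f} Jy")
```

The answer is only as good as the fit, so a real system would presumably report it together with an error estimate.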
specified in the query?
all combinations of values are allowed in the model.
        Flux         Flux Residuals   Ratio
ORIG    11,665,408   11,665,408       0%
GZIP     4,331,782    3,748,872       86%
BZIP2    3,341,574    2,752,044       82%
XZ       2,887,584    2,727,144       94%
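The comparison in the table can be imitated on toy data. The sketch below is not the experiment from the talk: it assumes fixed-point (integer) flux values so that residuals can be added back exactly, uses synthetic data generated from a power law plus bounded noise, and uses Python's `zlib`, `bz2` and `lzma` modules as stand-ins for GZIP, BZIP2 and XZ. All names and constants except the model parameters are made up, and the printed sizes will not match the table.

```python
import bz2
import lzma
import random
import zlib
from array import array

SCALE = 1_000_000          # assumption: flux stored as integer micro-units
P, ALPHA = 0.812, -0.692   # model parameters, taken from the slide

def predict_fixed(nu):
    """Model prediction p * nu**alpha, rounded to the fixed-point grid."""
    return round(P * nu ** ALPHA * SCALE)

def encode(freqs, values):
    """Replace each value by its integer residual against the model."""
    return array("q", (v - predict_fixed(nu) for nu, v in zip(freqs, values)))

def decode(freqs, residuals):
    """Exact inverse of encode: prediction plus residual."""
    return [r + predict_fixed(nu) for nu, r in zip(freqs, residuals)]

# Synthetic data: model output plus small bounded integer noise.
rng = random.Random(0)
freqs = [0.12 + 0.08 * rng.random() for _ in range(10_000)]
values = [predict_fixed(nu) + rng.randint(-500, 500) for nu in freqs]

raw_bytes = array("q", values).tobytes()
residuals = encode(freqs, values)
res_bytes = residuals.tobytes()

for name, compress in (("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress)):
    print(name, len(compress(raw_bytes)), len(compress(res_bytes)))

restored = decode(freqs, residuals)
```

Keeping the residuals makes the scheme lossless, since `decode` restores every value exactly. Dropping them and keeping only the model parameters gives the lossy variant.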
Drop residuals = lossy compression =
model?
management system.
the user and store the model for later use.
[Diagram: workflow in five steps, (1) to (5)]
Proposed model: I ≈ p · ν^α ?
Data table columns: S, ν, I
Fit quality: R² = 0.92 !
Fitted-parameter table columns: S, p, α
Query: S = 42, ν = 0.14, I = ?
Answer: I = 3.0 ± 0.05 !
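The goodness-of-fit value reported in the workflow can be computed as a coefficient of determination. The slides do not say how the system computes it, so the sketch below uses the common definition R² = 1 - SS_res / SS_tot on the original (not log-transformed) scale. The observations are invented for illustration and the predictions come from the power law with the parameters shown for source 17562.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 means the predictions match the observations exactly; 0.0 means
    they do no better than always predicting the mean of the observations.
    """
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Invented observations (assumption) and model predictions I = p * nu**alpha.
freqs = [0.12, 0.14, 0.16, 0.18, 0.20]
observed = [3.6, 3.1, 2.9, 2.7, 2.4]
predicted = [0.812 * nu ** -0.692 for nu in freqs]

print(f"R^2 = {r_squared(observed, predicted):.2f}")
```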
Questions?
http://hannes.muehleisen.org @hfmuehleisen
George E. P. Box