- Good data is definitely more important than good models
- If the quality of the data weren’t important, organizations wouldn’t spend so much time cleaning and preprocessing it!
- Even for scientific purposes, good data (reflected in the design of experiments) is very important
How do you define good?
– good data: data relevant to the project/task at hand
– good model: a model relevant to the project/task
– good model: a model that generalizes to external data sets
Is there a universal good model?
– No, otherwise there wouldn’t be the overfitting problem!
– An algorithm can be universal, but a model cannot
– A model built on a specific data set in a specific organization could be ineffective on another data set from the same organization
– Models have to be updated on a somewhat regular basis
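The point about overfitting and generalization can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synthetic relation y = 2x + 1, the noise level, the sample sizes, and the helper names (make_data, fit_line, fit_memorizer, mse). It compares a simple least-squares line with a model that just memorizes its training set (1-nearest-neighbour), scoring each on its own data and on a separate "external" sample.

```python
import random


def make_data(n, rng, noise=1.0):
    """Draw n points from the assumed relation y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with one feature; returns a prediction function."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """1-nearest-neighbour: predicts the y of the closest training x."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(30, rng)   # the data the models are built on
test_x, test_y = make_data(200, rng)    # stands in for an external data set

for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    print(
        name,
        "| training MSE:", round(mse(model, train_x, train_y), 3),
        "| external MSE:", round(mse(model, test_x, test_y), 3),
    )
```

The memorizer reproduces its training data exactly, so its training error is zero, yet it still makes errors on the external sample; on noisy data like this it will typically do worse there than the plain line. The same algorithm applied to a different data set yields a different model, which is why the model, not the algorithm, is what fails to be universal.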
Are there any models that are definitely not so good?
– “All models are wrong but some are useful” (George E. P. Box)
– It depends on what you want: predictive power or explanatory power
– If a model is bad at both, it is a bad model