TL;DR How might a data analysis system detect data drift: shifts in data distribution? Depending on context, shifting data distributions may imply new opportunities for fast action, or new risks to be mitigated. In this post, first principles build to an automated drift detection method, tailored for categorical data.
What’s the meaning of probability? There are two primary views. The “frequentist” view defines probability as event frequency over repeated trials – if a coin lands heads in 50 out of 100 flips, there’s 50% probability of heads. On the other hand, the Bayesian view defines probability as “strength of belief” in possible outcomes of one trial – how a single geopolitical event unfolds, for example.
What’s your favorite restaurant? It’s probably your favorite because of current experience – great food, atmosphere, value, and so on. But how do you anticipate future experience? How do you assess where a restaurant is trending?
A restaurant’s trend hints at future direction: what might it be like in 6 months or 1 year?
Picture this. Your mission is to secure great deals on vehicles sold at wholesale auto auction. An intriguing prospect comes on your radar. How do you make the purchasing decision? How do you quickly weigh 25 pieces of available information? I’m eager to help. I introduce and verify a machine learning model trained by 70,000 historical transactions.
Key Takeaways
Around 2012, Carvana – a car retailer – posted wholesale auto auction transaction data, prompting analysts to predict whether a purchased vehicle would turn out to be a lemon. Historical data of this sort may help auction buyers avoid future lemons.
Preliminary insights follow from visual exploration of the data.
.nobullet li { list-style-type: none; } Key Takeaways
A business case study may be played out using a replicable controlled experiment1.
Consider a hypothetical business case with these characteristics:
The objective is prediction of a numerical outcome. Many indicators are available to try and predict the outcome.
Key Takeaways
A linear model – a “line of best fit” – estimates the true relationship between predictors and an outcome.
Linear models come in classical statistics or machine learning varieties.
To learn about linear models’ relative performance/behavior, we use controlled experiments, powered by computer simulation.