#### Downloads

- Lecture Notes (PDF)
- Course/grade/intelligence plate model example (Netica file)
- Output from sneezing diagnosis class exercise (Netica file)
- Burgular hidden markov model example (Netica file)
- Same burgular HMM in Excel (XLS file)

Here's a quick one to make the files available online from today's AI lecture at RMIT University. Much thanks to Lawrence Cavedon for making it happen.

####
Downloads

- Lecture Notes (PDF)
- Course/grade/intelligence plate model example (Netica file)
- Output from sneezing diagnosis class exercise (Netica file)
- Burgular hidden markov model example (Netica file)
- Same burgular HMM in Excel (XLS file)

It's always good to give a little something back, so each year I do some guest lecturing on data warehousing to RMIT's CS Masters students.

We usually pull a data warehouse box out of our compute cloud for the session so I can walk through the whole end-to-end stack from the hardware through to the dashboards. The session is quite useful and always well received by students.

This year the delightful Jenny Zhang and I showed the students MDX Runner, an abstraction used at VizDynamics on a daily basis to access our data warehouses. As powerful as MDX is, it has a steep learning curve and the result sets it returns can be bewildering to access programmatically. MDX Runner eases this pain by abstracting out the task of building and consuming MDX queries.

Given that it has usefulness far beyond what we do at VizDynamics, I have made arrangements for MDX Runner to be open-sourced. If you are running analysis services or any other MDX-compatible data warehousing environment, take a look at **mdxrunner.org** - you will certainly find it useful.

Do reach out with updates if you test it against any of the other BI platforms. Hopefully over time we can start building out a nice generalised interface into Oracle, Teradata, SAP HANA and SSAS.

In this post I share a formal framework for reasoning about advertising traffic flows, is how black box optimisers work and needs to be covered before we get into any models. If you are a marketer, then the advertising stuff will be old-hat and if you are a data scientist then the axioms will seem almost obvious.

What is useful is combining this advertising + science view and the interesting conclusions about traffic valuation one can draw from it. The framework is generalised and can be applied to a single placement or to an entire channel.

I should preface by saying my view on creative is that it is more important than the quality of one's analysis and media buying prowess. All the data crunching in the world is not worth a pinch if the proposition is wrong or the execution is poor.

On the other hand, an amazing ad for a great product delivered to just the right people at the perfect moment will set the world on fire.

The digital advertising optimisation problem is well known: analyse the performance data collected to date and find the advertising mix that allocates the budget in such a way that maximises the expected revenue.

This can be divided into three sub-problems: assigning conversion probabilities to each of the advertising opportunities; estimating the financial value of advertising opportunities; and finding the Optimal Media Plan.

The most difficult of these is the assessment of conversion probabilities. Considering only the performance of a single placement or search phrase tends to discard large volumes of otherwise useful data (for example, the performance of closely related keywords or placements). What is required is a technique that makes full use of all the data in calculating these probabilities without double-counting any information.

In most digital advertising marketplaces, forces are such that traffic with high conversion probability will cost more than traffic with a lower conversion probability (see Figure 1). This is because advertisers are willing to pay a premium for better quality traffic flows while simultaneously avoiding traffic with low conversion probability.

Digital advertising also possesses the property that the incremental cost of traffic increases as an advertiser purchases more traffic from a publisher (see Figure 2). For example, an advertiser might increase the spend on a particular placement by 40%, but it is unlikely that any new deal would generate an additional 40% increase in traffic or sales.

To counter this effect, sophisticated marketers grow their advertising portfolios by expanding
into new sites and opportunities (by adding more placements), rather than by
paying more for the advertising they already have. This horizontal expansion creates an optimisation problem: given a monthly budget of $x, what allocation of advertising will
generate the most sales? This configuration then is the *Optimal Media Plan*.

To solve the Optimal Media Plan problem, one needs to know three things for every
advertising opportunity: the *cost* of each prospective placement; the expected *volume*
of clicks; and the *propensity* of the placement to convert clicks into sales (see Figure 3).
This Holy Triumvirate of Digital Advertising (cost, volume and propensity) is constrained along a
response surface that ensures that low cost, high propensity and high volume placements occur infrequently
and without longevity.

For the remainder of this post (and well into the future), propensity will be considered exclusively in terms of ConversionProbability. This post will provide a general framework for this media plan optimisation problem and explore how ConversionProbability relates to search and display advertising.

Oh dear, the game is up. Our big secret is out. We should have a parade.

This year is looking like when computer scientists come out and confess that the world is undergoing a huge technology driven revolution based on simple probabilities. Or perhaps it's just that people have started to notice the rather obvious impact it is making on their lives (the hype around the recent DARPA Robotics Challenge and Christine Lagarde's entertaining lecture last month are both marvelous example of that).

This change is to computer science what quantum mechanics was to physics: a grand shift in thinking from an absolute and observable world to an uncertain and far less observable one. We are leaving the digital age and entering the probabilistic one. The main architects of this change are some very smart people and my favorite super heroes - Daphne Koller, Sebastian Thrun, Richard Neapolitan, Andrew Ng and Ron Howard (no not the Happy Days Ron – this one).

Behind this shift are a clique of innovators and ‘thought leaders’ with an amazing vision of the future. Their vision is grand and they are slowly creating the global cultural change they need to execute it. In their vision, freeways are close to 100% occupied, all cars travel at maximum speed and the population growth declines to a sustainable level.

This upcoming convergence of population to sustainable levels will not come from job-stealing or killer robots, but from increased efficiency and the better lives we will all live, id est, the kind of productivity increase that is inversely proportional to population growth.

And then world is saved... by computer scientists.

Classical computer science is based on very precise, finite and discrete things, like counting pebbles, rocks and shells in an exact manner. This classical science consists of many useful pieces such as the von-neumann architecture, relational databases, sets, graph theory, combinatorics, determinism, greek logic, sort + merge, and so many other well defined and expressible-in-binary things.

What is now taking hold is a whole different class of computer-ey science, grounded in probabilistic reasoning and with some other thing called information theory thrown in on the sidelines. This kind of science allows us to deal in the greyness of the world. Thus we can, say, assign happiness values to whether we think those previously mentioned objects are in fact more pebbly, rocky or shelly given what we know about the time of day and its effect on the lighting of pebble-ish, rock-ish and shell-ish looking things. Those happiness values are expressed as probabilities.

The convenience of this probability-based framework is its compact representation of what we know, as well as its ability to quantify what we do not(ish).

Its subjective approach is very unlike the objectivism of classical stats. In classical stats, we are trying to uncover a pre-existing independent, unbiased assessment. In the Bayesian or probabilistic world bias is welcomed as it represents our existing knowledge, which we then update with real data. Whoa.

This paradigm shift is far more general than just the building of robots - it's changing the world.

A testament to the power of this approach is that the market leaders in many tech verticals already have this math at their heart. Google Search is a perfect example - only half of their rankings are PageRank based. The rest is a big probability model that converts your search query into a machine-readable version of your innermost thoughts and desires (to the untrained eye it looks a lot like magic).

If you don’t believe me, consider for a moment, how does Facebook choose what to display in your own feed? How do laptops and phones interpret gestures? How do handwriting, speech and facial recognition systems work? Error Correction? Chatbots? Emotion recognition? Game AI? PhotoSynth? Data Compression?

It’s mostly all the same math. There are other ways, which are useful for some sub-problems, but they can all ultimately be decomposed or factored into some sort of Bayesian or Markovian graphical probability model.

Try it yourself: Pickup your iPhone right now and ask the delightful Siri if she is probabilistic, then assign a happiness value in your mind as to whether she is. There, you are now a Bayesian.

Notwithstanding small pockets of knowledge, we don’t properly teach this material in Australia, partly because it is so difficult to learn.

We are not alone here. Japan was recently struck down by this same affliction when their robots could not help to resolve their Fukushima disaster. Their classically trained robots cannot cope with changes to their environment that probabilities so neatly quantify.

To give you an idea of how profound this thesis is, or how far and wide it will eventually travel, it is currently taught by the top American universities across many faculties. The only other mathematical discipline that has found its way into every aspect of science, business and humanities is the Greek logic, and that is thousands of years old.

The Probabilistic Calculus subsumes Greek Logic, Predicate Logic, Markov Chains, Kalman Filters, Linear Models, possibly even Neural Networks; that is, because they can all be expressed as graphical probability models. Thus logic is no longer king. Probabilities, expected utility and value of information are the new general purpose ‘Bayesian’ way to reason about anything, and can be applied in a boardroom setting as effectively as in the lab.

One could build a probability model to reason about things like love, however it's ill advised. For example, a well-trained model would be quite adept at answering questions like “what is the probability of my enduring happiness given a prospective partner with particular traits and habits.”

The ethical dilemma here is that a robot built on the Bayesian Thesis is not thinking as we know it – it's just a systematic application of an ingenious mathematical trick to create the appearance of thought. Thus for some things, it simply is not appropriate to pretend to think deeply about a topic; one must actually do it.

These probabilistic apps of the future (some of which already exist) will drive bandwidth-hogging monsters (quite possibly literally) that could make full use of low latency fibre connections.

These apps construct real-time models of their world based on vast repositories of constantly updated knowledge stored ‘in the cloud’. The mechanics of this requires the ability to transmit and consume live video feeds, whilst simultaneously firing off thousands of queries against grand mid- and big-data repositories.

For example, an app might want to assign probabilities to what that shiny thing is over there, or if its just sensor noise, or if you should buy it, or if you should avoid crashing into it, or if popular sentiment towards it is negative; and, oh dear, we might want to do that thousands of times per second by querying Flickr, Facebook and Google and and and. All at once. Whilst dancing. And wearing a Gucci augmented reality headset, connected to my Hermes product aware wallet.

This repetitive probability calculation is exactly what robots do, but in a disconnected way. Imagine what is possible once they are all connected to the cloud. And then to each other. Then consider how much bandwidth it will require!

But, more seriously, the downside of this is that our currently sufficient 4G LTE network will very quickly be overwhelmed by these magical new apps in a similar way to how the iPhone crushed our 3G data networks.

Given that i-Devices and robots like to move around, I don't know whether FTTH would be worth the expense, but near-FTTH with a very high performance wireless local loop certainly would help here. At some point we will be buying Hugo Boss branded Occulus Rift VR headsets, and they need to plug into something a little more substantive than what we have today.

In my previous post I said I would be covering advertising things. So here it is if you haven't already worked it out: this same probability guff also works with digital advertising, and astonishingly well.

There I said it, the secret is out. Time for a parade.

...some useful bits coming in the next post.

Fab, I’m blogging.

###
A Chump’s Game

A good friend of mine, whilst working at a New York hedge fund once said to me, “online advertising is a chump’s game”.

At the time he was exploring the idea of constructing financial instruments around the trade of user attention. His comment was coming from just how unpredictable, heterogeneous and generally intractable the quantitative side of advertising can be. Soon after, he quickly recoiled from the world of digital advertising and re-ascended back into the transparent market efficiency of haute finance; a world of looking for the next big “arb”.

###
What I Do

I am a data scientist and I work on this problem every day.

Over the last 15 or so years I have come to find that digital advertising is, in fact, completely the opposite of a chump's game – yes, media marketplaces are extraordinarily opaque and highly disconnected – but with that comes fantastically gross pricing inefficiencies exploitable in a form of advertising arbitrage.

The Wall Street guys saw this, but never quite cracked how to exploit it.

###
What you will find here

If you have spent more than a little time with me, then in between mountain biking and heli-boarding at my two favorite places in the world, you will have probably heard about or seen a probability model or two.

In the coming months I will be banging on about some of this, and in particular sharing a few easy tricks on how advertisers can use data to gain a bit of an advantage. With the right approach, it’s rather simple.

The concepts I will present here are already built into the black-box ad platforms we use daily, the foundations of which are two closely related assumptions:

I am hoping along the way that a few interesting people might also compare their own experiences and provide further feedback and refinement. If its well received then we might scale up the complexity.

So, if digital advertising is your game then stay tuned, this will be a bit of fun!

At the time he was exploring the idea of constructing financial instruments around the trade of user attention. His comment was coming from just how unpredictable, heterogeneous and generally intractable the quantitative side of advertising can be. Soon after, he quickly recoiled from the world of digital advertising and re-ascended back into the transparent market efficiency of haute finance; a world of looking for the next big “arb”.

Over the last 15 or so years I have come to find that digital advertising is, in fact, completely the opposite of a chump's game – yes, media marketplaces are extraordinarily opaque and highly disconnected – but with that comes fantastically gross pricing inefficiencies exploitable in a form of advertising arbitrage.

The Wall Street guys saw this, but never quite cracked how to exploit it.

In the coming months I will be banging on about some of this, and in particular sharing a few easy tricks on how advertisers can use data to gain a bit of an advantage. With the right approach, it’s rather simple.

The concepts I will present here are already built into the black-box ad platforms we use daily, the foundations of which are two closely related assumptions:

- Any flow of advertising traffic has people behind it whom possess a fixed amount of buying power and a more elastic willingness to exercise it.
- As individuals we are independent thinkers, but as a swarm, we behave in remarkably predictable ways.

The format is a mix of theory, worked examples and how-to, combined with a touch of spreadsheet engineering. A dear friend of mine – whom has written more than a few articles for the Economist - will be helping me edit things to keep the technical guff to a minimum.Achtung!This site makes use of in-browser 3D. If your computer is struggling, then you probably need a little upgrade, a GPU or a browser change. Modern data science needs compute power, and alot of it.

I am hoping along the way that a few interesting people might also compare their own experiences and provide further feedback and refinement. If its well received then we might scale up the complexity.

So, if digital advertising is your game then stay tuned, this will be a bit of fun!

Subscribe to:
Posts (Atom)