Monday, May 11, 2015

The $10,000 Analytics Engagement

Many practices tend to overlook the small business market when exploring analytics opportunities, for reasons ranging from price sensitivity on the consumer side to an impractical cost to deliver on the provider side. This post proposes that although not all small business models are viable prospects, some may offer ripe opportunities in spite of their size, given the specifics of their analytics challenges and the limited investment required to address them. Can a $10,000 engagement be viable, be delivered with out-of-the-box tools like Alteryx or ubiquitous tools like MS Excel, and still be a lucrative niche offering? I have yet to prove it, but on the surface, I believe it can. At the highest level, the key variables that determine the viability of a small analytics engagement are identical to those for larger clients: the maturity level of the client, the analytical problem type, and the quality and availability of data.

Maturity Level 
The opportunities to apply analytics to business problems exist at all stages of a business's life cycle. However, those at the beginning, such as market identification and customer base acquisition, are daunting in that they require a significant amount of third-party data as well as time at the onset, which may be financially prohibitive for a new business owner. For this reason, I propose that the target audience should be the more mature business owner with a core customer base, whose priority is growth and who is looking for a more "intelligent spend": targeting the spend, whether marketing or client support, in such a way that the return is quantifiable and adjustable.

Problem Type 
The ideal business model that comes to mind is that of an agency offering financial services or insurance products. These come to mind because the relationships between the proprietor and the customer tend to be more interactive and frequent, and those two variables provide tremendous data collection opportunities. Good data drives better analytics and thus improves the likelihood of a tangible, quantifiable result for your small business client. Although new customer acquisition is always a priority, so is understanding who has growth potential, who is an attrition risk, and where the owner's time and resources should be targeted to maximize the value of her customer base. From an analytics toolbox perspective, this problem type is the most common in the real world in that it lends itself to clustering (unsupervised segmentation) methods, since the goal of the business is a segmentation model that allows the alignment of resource spend (marketing, value-added services, etc.) with the clients' value to the firm (current, potential, and lifetime). In other words, aligning a segmentation model with a cost-to-serve model. Segmentation methods abound, but your client may not be that interested in the science behind some of the more complex methods available and more than likely needs you to apply an easily explainable (logical to the layperson) methodology. If the client already has an idea of the number of segments from heuristics, K-Centroids, Hierarchical Clustering, or Principal Components Analysis (to reduce the attribute set first) may be viable approaches. Of course, your analyst may have a differing opinion on the method to use, but the key to remember is that your client needs to feel empowered, not overwhelmed, so the ability to explain your reasoning and process should be one of your core focuses.
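
To make this concrete, here is a minimal sketch of the kind of segmentation workflow described above, using k-means (scikit-learn's KMeans standing in for Alteryx's K-Centroids); the attributes, values, and choice of three segments are hypothetical illustrations:

```python
# Minimal sketch: k-means segmentation of a small book of business.
# The attributes, values, and k = 3 are hypothetical illustrations.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

clients = pd.DataFrame({
    "assets_under_management": [250_000, 40_000, 900_000, 120_000, 60_000, 1_500_000],
    "meetings_per_year":       [4, 1, 8, 2, 1, 12],
    "policy_changes":          [2, 0, 5, 1, 1, 7],
})

# Scale the attributes so no single one dominates the distance calculation.
X = StandardScaler().fit_transform(clients)

# Suppose the owner's heuristics suggest three segments.
clients["segment"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Profile each segment in plain terms the owner can act on.
print(clients.groupby("segment").mean())
```

The segment profile at the end is the explainable part: average assets, meeting frequency, and churn-like activity per group, which maps directly onto a cost-to-serve conversation.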

Data Quality and Availability 
The client business model that I have proposed is ideal because of the 1:1 interaction between the proprietor and her clients. This allows for the capture of specific and robust historical data about their accounts, including policy changes, portfolio changes, assets under management, and meetings. Additionally, a major plus of this business model is that unquantified (yet valuable) data attributes are also easier to capture, such as 'high' vs. 'low' maintenance or family demographics that could indicate future product or service needs, a key component of your final model. In fact, this is where the majority of an analyst's time will be spent: quantifying the "unquantifiable", turning 'gut feelings' into attributes, and working with the client to think more broadly about their experience with a customer, and the customer's experience with them, in ways they probably have not had time to in the past. The benefits are real, however, if this exercise allows a business owner to integrate a cost-to-serve variable into their pricing proposals or their policy/account reviews. The guilt of spending too much time here or not enough there is eliminated, reflected instead in core KPIs such as margins, attrition rates, and referral rates.
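
As an illustration of quantifying the "unquantifiable", here is a minimal sketch of encoding gut-feel labels as model-ready features; the labels and mappings are hypothetical examples:

```python
# Minimal sketch: turning gut-feel labels into model-ready features.
# The labels and mappings below are hypothetical examples.
import pandas as pd

clients = pd.DataFrame({
    "client":      ["A", "B", "C"],
    "maintenance": ["high", "low", "medium"],                    # owner's gut-feel rating
    "life_stage":  ["young family", "retiree", "pre-retiree"],   # demographic signal
})

# Ordinal encoding preserves the order the owner has in mind.
clients["maintenance_score"] = clients["maintenance"].map({"low": 1, "medium": 2, "high": 3})

# One-hot encoding for attributes with no natural order.
clients = pd.get_dummies(clients, columns=["life_stage"])
print(clients)
```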

Conclusion 
What I have described thus far is a very common and often unaddressed challenge faced by small business owners. However, I propose that this approach can be executed with minimal investment via the following steps:

1) Planning discussions with client: 4 – 12 hrs 
2) Data acquisition and transformation: 8 – 24 hrs 
3) Identify and quantify the “unquantifiable” attributes: 4 – 8 hrs 
4) Method selection and model training: 4 – 8 hrs 
5) Model testing and review: 8 – 16 hrs
6) Productionize model: 16 – 24 hrs
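
As a rough sanity check on the estimate quoted below, here is a minimal Monte Carlo sketch over these step ranges. Treating each step's hours as uniformly distributed is my own assumption; the post does not specify a distribution.

```python
# Minimal sketch: Monte Carlo estimate of total engagement hours.
# Assumes each step's hours are uniform over its range (an assumption;
# the distribution is not specified in the post).
import numpy as np

rng = np.random.default_rng(42)
step_ranges = [          # (low, high) hours, from the list above
    (4, 12),   # 1) planning discussions
    (8, 24),   # 2) data acquisition and transformation
    (4, 8),    # 3) quantify the "unquantifiable"
    (4, 8),    # 4) method selection and model training
    (8, 16),   # 5) model testing and review
    (16, 24),  # 6) productionize model
]

totals = sum(rng.uniform(lo, hi, 100_000) for lo, hi in step_ranges)
p90 = np.percentile(totals, 90)  # hours we can quote with ~90% confidence
print(f"90th percentile: {p90:.0f} hrs -> ~${p90 * 120:,.0f} at $120/hr")
```

Under these assumptions the 90th percentile lands in the high 70s of hours, close to the ~80-hour figure below.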

This engagement, when modeled (as in the sketch above), provides an estimate of about 80 hours with 90% confidence, which equates to ~$10,000 at $120/hr. Quarterly or semi-annual model reviews and fine-tuning could net additional revenues. I welcome your thoughts and feedback.

Thursday, February 5, 2015

The 12 Days of Christmas - A look at pre-Christmas retail sales patterns

During a recent tenure at a national sporting goods retailer, I was bewildered by the increased marketing spend in the final days leading up to Christmas after witnessing limited spend between Black Friday and then.  Based upon my experience (robust sample, I know, but hold on), buying decisions for the most part had been made significantly earlier than the week of Christmas, and last-minute shopping would not be enough to save the season for this retail category.  This prompted me to examine our actual seasonal sales (Black Friday through Christmas) and understand how those sales were realized.  The result was so compelling that our BI team published it for our business partners/customers.

The 12 Days of Christmas
As you all know, the sales season spanning from Black Friday to Christmas Eve (referred to as the Christmas season for the remainder of this post) is the most important time of year for retailers.  While everyone understands the impact of Black Friday, it's interesting to note that on average, 53% of the season's sales happen in the last 12 days leading up to Christmas (51% if 2008 is removed).



Week after week we look at daily comp sales and have a pretty good idea of what sales might be in the coming days and weeks. See a typical 15-day period below:

[Chart: daily comp sales over a typical 15-day period]

However, in the days leading up to Christmas, this method of gauging performance becomes problematic, showing large swings that may or may not reflect the underlying sales trend.



See a 15-day period including Christmas below; Christmas days are circled in red:

[Chart: a 15-day period including Christmas, with Christmas circled in red]


So the question that we wanted to address is:

Given the volatility in year over year comparisons during this period, is there a way to stabilize our observations and develop a reliable prediction methodology for this period of time?

Upon analyzing 6 years of data, a surprisingly simple pattern began to unfold.  During the 12 days preceding Christmas, we identified 3 buckets of time (numbers of days) with similar sales proportions, i.e., percentages of the season's total sales volume.  Specifically, we found that on average ~17.5% of the season's sales occur in the 3-day period prior to Christmas, ~18.8% occur during the 4-day period preceding it, and ~16.7% occur in the 5-day period before that; hence the name, the 3-4-5 Rule.  It's a rule that helps us to better understand some of the 'when' and 'how much' questions that become critical to answer during holiday sales.
See the illustration below. On average, each bucket accounts for roughly 17.5% (+/- 1%) of the holiday season's total sales. Although the study is based upon 6 years of data, the illustration uses only 2:

- 3 days of sales leading up to Christmas (22nd to 24th)
- 4 days of sales preceding the 3-day bucket (18th to 21st)
- 5 days of sales preceding the 4-day bucket (13th to 17th)

[Chart: daily sales shares by bucket for two sample years]
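
Here is a minimal sketch of how these bucket shares can be computed from a season's daily sales; the sales figures below are randomly generated stand-ins, not real data:

```python
# Minimal sketch: computing the 3-4-5 bucket shares from daily sales.
# daily_sales is a hypothetical series for one season
# (Black Friday 2014 through December 24); values are random stand-ins.
import numpy as np
import pandas as pd

dates = pd.date_range("2014-11-28", "2014-12-24")
daily_sales = pd.Series(np.random.default_rng(1).uniform(50, 150, len(dates)), index=dates)

season_total = daily_sales.sum()
buckets = {
    "5-day (Dec 13-17)": daily_sales["2014-12-13":"2014-12-17"].sum(),
    "4-day (Dec 18-21)": daily_sales["2014-12-18":"2014-12-21"].sum(),
    "3-day (Dec 22-24)": daily_sales["2014-12-22":"2014-12-24"].sum(),
}
for name, total in buckets.items():
    print(f"{name}: {total / season_total:.1%} of season sales")
```
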
So what does this all mean?  Well, it can give us a good picture of where our sales will end up as soon as December 12th.  By then, we know that roughly 50% of seasonal sales revenue has been accounted for, with expectations of 16% – 18% in the next 5-day window, 16% – 18% in the following 4-day window, and 16% – 18% in the final 3-day window prior to Christmas.
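
In code, the December 12 projection is essentially a one-liner; the dollar figure below is hypothetical:

```python
# Minimal sketch: projecting the season total on December 12, assuming
# sales to date represent ~50% of the season (per the 3-4-5 Rule).
# The booked figure is hypothetical.
sales_through_dec_12 = 4_200_000                 # dollars booked Black Friday - Dec 12
projected_season_total = sales_through_dec_12 / 0.50

# Expected dollars in each remaining bucket, using the average shares above.
for name, share in [("5-day", 0.167), ("4-day", 0.188), ("3-day", 0.175)]:
    print(f"{name} bucket: ~${projected_season_total * share:,.0f}")
```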

This approach allowed us in BI to predict that Total Company Sales vs. LY for Period 11 would finish around -1.38% (actual Period results were -1.2%) despite total company sales comping at -12.9% on December 20th.  It also showed that the increased spend in the final days leading up to Christmas was not moving our top line, as the realized pattern had been consistent historically regardless of spend in those final days.

Tuesday, February 3, 2015

An alternative approach to justifying EIM

Introduction
After reading 'How to Measure Anything' by Douglas Hubbard, I realized that my message when trying to initiate an EIM program was being drowned out by the sticker shock.  It isn't a cheap proposition, and most business users can't see past the most obvious EIM benefit of faster, more accurate reporting.  I came to the realization that EIM shouldn't be presented as an initiative that IT needed to justify, but as a business initiative, one for which the business would be held accountable.  If data is an asset and EIM increases the value of that asset, then how would I make a quantitative argument that it was up to the business to generate a return on this increase in asset value?  The below was my attempt to create this argument at a previous employer, an asset management firm.  It was directed towards our IT leadership and, upon agreement, was shared with strong quantitative leaders in the business.

An Exercise in quantifying the true value of EIM
The purpose of the EIM effort, as a necessary step to integrate disparate data sources and to enhance the validity of corporate data, has been discussed ad nauseam.  The benefits, however, have been hard to articulate and even more so to quantify.  The purpose of this document is to introduce a method to quantify the benefits of this resource-heavy effort by removing the emphasis from quantifying actual deliverables and placing it on decision makers, who impact the bottom line by making better-informed and more profitable decisions thanks to an increase in the Value of Information (VoI).  EIM, in a nutshell, is an effort to increase VoI by doing 3 things:

1. Improve data accuracy/consistency
2. Increase the breadth of data we have around any subject area that matters (data completeness)
3. Make it accessible to a broad number of constituents

As these 3 data attributes improve, so does the value of information as an asset, since it can now be used to reduce the uncertainty around managerial decision making.  Note that increasing the value of information does not in and of itself contribute to an organization's bottom line; it is the improved decision making, based on the reduced uncertainty of outcomes, that does.  Before tackling this train of thought any further, let's define uncertainty and how it is impacted by having better information to work with.

Uncertainty: any event/decision that can result in more than one outcome.  E.g., whether a campaign leads with thought leadership vs. product, or whether launching product A vs. product B is a better bet based upon current advisor behavioral trends.

For any decision or any event there are multiple possible outcomes.  Given any situation that calls for a decision to be made, the decision maker must have some idea as to the consequences of taking (or not taking) an action.  And despite not doing complex statistical analysis, they have, without necessarily realizing it, determined certain probabilities for the possible outcomes and chosen the course with the highest probability of the desired result.  We will call this intuition.  The chart below shows how increasing the Value of Information can reduce a firm's reliance on intuition and the inherent risks associated with it.

[Chart: reliance on intuition vs. the Value of Information]

For the purpose of this paper, the possible outcomes from decision making are denoted O1, O2, and O3; we limit the number of possible outcomes for any decision or event to 3 for the sake of simplicity.  Now suppose that a decision maker, based upon the information reviewed and his/her experience in the role or the industry, has assigned the most likely outcome of a pending decision (O1) a probability of 70%, and the remaining outcomes, O2 and O3, probabilities of 25% and 5% respectively, for a total of 100%.  This total by definition confirms that all possible outcomes have been considered.

In reality, there is always an outcome X (an unforeseen outcome), some unsuspected surprise that by definition has a relatively small probability assigned to it; otherwise a prudent decision maker would refrain from making the decision.

Outcomes and their likelihood of occurring restated
Likelihood of O1 = 70%
Likelihood of O2 = 25%
Likelihood of O3 = 5%

With more valuable information made possible by improving data quality, accessibility, etc., revisiting these possible outcomes with new insight should result in 1 of 3 possibilities for each outcome’s estimate.

  1. The likelihood of the outcome actually occurring will increase
  2. The likelihood of the outcome actually occurring will decrease
  3. The likelihood of the outcome turns out to be 0, and it is thus eliminated as a possibility

Presuming that the decision maker's years on the job are of value, which we do, they have predicted correctly that O1 is the most likely outcome.  Now, post-EIM, suppose they have determined that the likelihood of O1 is 95%.  This near-certain determination allows the decision maker to do a number of things differently, such as:

  1. Realign more resources to this outcome to further capitalize on its expected benefit.
  2. Have increased confidence in their ability to achieve their objectives.
  3. Have increased capacity to positively impact the organization, as the time involved in gathering the information necessary to make a high-confidence decision has decreased.

This example has so far assumed that all 3 possible outcomes are positive ones, defined as outcomes that will make Company X money.  What if one of the outcomes were negative, in that if it came to fruition, Company X would suffer some sort of loss?

This brings the concept of risk into the equation, which we will define as:

Risk: An event/decision where one or more of the possible outcomes is negative/catastrophic (costs money).

Let's take an alternate view and examine O3, with its likelihood of 5%.  Suppose in addition that it is a risk that will cost the firm $100,000 should the event, however unlikely, occur.  With a 5% likelihood, quantifying this risk puts the expected loss from this outcome at $5,000 (5% x $100,000).  Assuming that the benefits from the other outcomes significantly outweigh this, it will not affect the decision maker's decision to act.  Now examine this outcome again in a post-EIM world.  If improved information delivered insights showing that the likelihood of O3 was really 20% and not 5%, the expected loss from this decision would be $20,000 (20% x $100,000), which may have a decisive impact on the original chain of thought.
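
The arithmetic, restated as a minimal sketch:

```python
# Minimal sketch: expected loss on outcome O3, before and after EIM,
# using the figures from the example above.
loss_if_o3 = 100_000

pre_eim_probability = 0.05
post_eim_probability = 0.20   # revised with better information

print(f"Pre-EIM expected loss:  ${pre_eim_probability * loss_if_o3:,.0f}")   # $5,000
print(f"Post-EIM expected loss: ${post_eim_probability * loss_if_o3:,.0f}")  # $20,000
```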

And finally, a third scenario to consider is one where the outcomes are not the result of a managerial decision but of an event that will occur regardless of what anyone at Company X does, for example something triggered by regulatory or economic forces.  If the negatively impacting outcome from the event was initially thought to have a 5% probability, but with better data we were able to determine that the probability was closer to 20%, this insight would trigger Company X to take mitigating actions to dampen the impact of what could be catastrophic, but was once considered a non-issue.

Example of the impact of reduced uncertainty
Imagine that we are trying to maximize a prospective advisor's first total purchase amount.  We've seen that if an advisor initiates 3 Web visits and 2 IA (ironman advisor) calls in the 90 days prior to their first purchase, accompanied by some count of outbound calls and email sends, the result is an initial purchase amount of up to $45,000.  The problem lies in not being able to track the email sends or outbound calls to determine the optimal mix.

Restating the problem in a pre-EIM environment: 3 Web visits + 2 IA calls + X outbound calls + Y email sends → an initial purchase of up to $45,000, with X and Y unknown.  After EIM, however, we have a more complete view of the advisor's experience with Company X and have observed multiple cases of the following scenarios:

[Table: observed activity mixes and their resulting initial purchase amounts]
If we have determined via analysis that an initial purchase amount of at least $40,000 creates a relationship that maximizes the lifetime value of an advisor, we can now give our wholesalers some guidance on how to maximize their success: by initiating between 2 and 4 outbound calls and between 2 and 4 emails, in addition to the advisor-initiated web visits and IA calls, they are on their way to a potentially viable and profitable relationship.  Note that we did not need precision or an exact answer to benefit from the new insight.  This is a major adjustment in thinking that many firms tackling efforts of EIM's scope and expense need to come to grips with.  The exact count, though critical in Finance, is seldom needed in business operations.  Improving insight into a situation has much greater impact when that insight gives the decision maker a more accurate picture of the possible outcomes and their likelihoods of success.
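
Here is a minimal sketch of how the (omitted) scenario table might be queried for that guidance; the rows below are hypothetical stand-ins for the observed data:

```python
# Minimal sketch: querying post-EIM scenario data for activity mixes
# associated with initial purchases of at least $40,000.
# The rows are hypothetical stand-ins for the observed-scenario table.
import pandas as pd

scenarios = pd.DataFrame({
    "web_visits":     [3, 3, 3, 3, 3],
    "ia_calls":       [2, 2, 2, 2, 2],
    "outbound_calls": [1, 2, 3, 4, 6],
    "email_sends":    [1, 2, 3, 4, 6],
    "first_purchase": [22_000, 41_000, 44_000, 43_000, 30_000],
})

winners = scenarios[scenarios["first_purchase"] >= 40_000]
print(winners[["outbound_calls", "email_sends"]].agg(["min", "max"]))
```

With this stand-in data, the min/max output recovers the "between 2 and 4 outbound calls and between 2 and 4 emails" guidance described above.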

Quantifying these benefits
What this all leads up to is this: the real benefit of EIM, with its focus on reducing uncertainty in the decision-making process, is that it allows decision makers to:


  1. Allocate the proper resources to outcomes that appear to be more profitable uses of Company X's resources (greater efficiency in the use of scarce resources)
  2. Avoid making decisions with unreasonable levels of risk
  3. React proactively to risks in the environment so as to limit losses
  4. Identify new opportunities that were not visible previously

Technically and briefly: we take the expected return of decisions/events prior to EIM and subtract it from the expected return of decisions/events post-EIM, taking into consideration the more precise probabilities explained above.  The positive difference is the benefit of EIM.  But how can we practically prove this out?

We can start with the question, “Who benefits?”, which by itself is hard to quantify.  But when examined more closely, the questions really are (at the least):

1. What areas of the business will this open up for us to make more/better revenue-enhancing decisions?
2. What is the set of decisions that have been forgone to this point but will now be able to be made?
3. What is the set of decisions that have been made, but will now be able to be made with a greater level of confidence?

To get to these answers, we need to address decision makers specifically with an investigative line of questioning similar to the following:

- What will you be able to do that you cannot do right now?
- How will that change your behavior (decision making) as far as which courses of action become available to you and which course you will take?
- Quantify these courses of action (a range of financial possibilities is perfectly acceptable).
- Will this be uncharted territory (new dollars) or a net improvement on an existing process?

We should begin with those individuals who have voiced a solid understanding of what they expect from this effort, for example VP 1 and VP 2, using the above questions as a starting point to quantify how changes to the way they make decisions and do business will impact our revenue.