Modern Data Platform vs. Traditional Data Platform: Where to invest your time and money?

Affine’s Analytics Engineering Practices releases Episode 3 of the Modern Data Platform series, clearing up the confusion between traditional and modern data platforms.

We explored data-driven strategy and the characteristics of modern enterprises in previous episodes of the Modern Data Platform series. Let’s take a closer look at some contemporary data-driven challenges and see how traditional and modern approaches compare.

What are the bottlenecks of a Traditional Data Platform?

Traditional data platforms fall behind the cloud on several fronts. Organizations traditionally relied on in-house infrastructure to drive innovation and manage their workflows. With a traditional data platform, the business managed everything itself: in-house IT professionals handled administration, and the organization bore responsibility for downtime and repairs. In short, managing a traditional data platform was an expensive affair.

Businesses had to take on responsibilities across planning, people, hardware, software, and the environment. Scaling was possible, but it came at the price of challenges and delays that hindered the enterprise’s overall data management.

Fig. 1: Illustrating the disadvantages of a traditional data platform

Why should you consider the cloud as a substitute for a Traditional Data Platform?

Traditional methods still work effectively for many businesses, albeit with a certain level of complication. However, their effectiveness gradually decreases as the business landscape changes with technology disruption. The modern approach (a cloud data platform) moves at a different pace and offers benefits that can accommodate many of these inconsistencies at once. The significant differences between traditional and modern data platforms are highlighted in the table below:

How can Cloud Solutions boost your Modern Data Platform (MDP) Journey?

The cloud data platform makes data easier to use and more secure for every business. It provides components and services such as databases, software capabilities, and applications, engineered to leverage the power of cloud resources to solve business problems. What benefits can businesses achieve using a Modern Data Platform? There are many: quicker time to develop and lower storage cost are a couple of crucial factors. Let’s take a look at the other factors that come into consideration.


Compute

Cloud compute services allow users to rent virtual machines to run their applications. IaaS provides infrastructure and hardware running in the cloud; PaaS provides application platforms and databases in the cloud. Containers are isolated environments for running software, and serverless functions are compute services that run code in response to events.


Database

A cloud database is a database service built on and accessed through the cloud platform. It hosts databases without the need to buy dedicated hardware. It can be managed by the user or operated as a service by a provider, and it can support both relational and NoSQL databases. The database can be accessed from anywhere through a web interface or an API.


Storage

Cloud storage allows users to store and access data over a network, typically the internet. The cloud data platform offers flexible, pay-as-you-go pricing and scales up or down in near real time. It supports backup, disaster recovery, collaboration and file sharing, archiving, primary data storage, and near-line storage.


Networking and Security

Clients seek services that isolate resources, protect internet-facing workloads, and encrypt data on the network. Other networking aspects of a cloud platform include load balancing, content delivery networks, and domain name services.

Comparison of Cloud Service Providers

The Modern Data Platform is a future-proof architecture for business analytics. It is a functional architecture with all the components to support modern data warehousing, machine learning, AI development, and real-time data ingestion and processing. To leverage it, businesses need cloud service providers that offer everything from complete application development platforms to servers, storage, and virtual desktops. Based on the requirements, these providers are shortlisted to maximize the benefit to the enterprise. The cloud data platform helps enterprises extract high-value results to serve their customers in a data-driven future.

Affine’s Analytics Engineering Practices can help with this seamless and effective transformation. Are you ready to embark on your journey to true data-centricity? We are here to help. Set up a call now!

The Modern Data Platform has proved to be an asset to enterprises. So, what kind of data strategy and roadmap is needed to execute cloud-based solutions?

Well, that is the story for the next episode of the Modern Data Platform Series.

Till then, keep reading:

Episode 1: What Is Modern Data Platform? Why Do Enterprises Need It? – Affine

Episode 2: What are legacy systems? How Can Modern Data Platforms Bring Revolutionary Change? – Affine

About The Author(s):

The Modern Data Platform blog series is a part of research efforts conducted by the AE-Practices, which exists solely for hyper-innovation in the Analytics Engineering space. 

Our AE Practices is a dedicated in-house team that caters to all R&D needs. The team continuously researches tools and technologies that are new, emerging, and futuristic, and that address business problems. We build solutions that are both cost- and performance-effective. Affine has always been deeply invested in research to create innovative solutions. We believe in delivering excellence and innovation, driven by a dedicated team of researchers with industry experience and technology expertise. For more details, contact us today!

What are legacy systems? How Can Modern Data Platforms Bring Revolutionary Change?

Affine’s Analytics Engineering Practices is kicking off a new series on “All that you need to know about Modern Data Platform.” Read the second part of the series here. You can also read part one of the series here.

What are Legacy Systems? Does your business have one?

 Legacy systems comprise ETL systems, data warehouses, and other traditional software/hardware data architectures. Organizations retain legacy systems if it is expensive to transition to modern data platforms because of data migration or if the legacy system is critical to the business.

Why do organizations need to adopt a Modern Data Platform?

Investments in modern data platforms can transform business practices in the long term. The following are four fundamental reasons why an organization should adopt a powerful modern data platform.

Modern Data Platforms – 4 Reasons Why They Are Revolutionary

1. Enhancing data discovery efforts

A robust modern data platform can synthesize different data types. It can parse through structured or unstructured data in the cloud or organize it according to user requirements.

Legacy systems are less effective in handling advanced data discovery and processing. They store data in isolated silos that neither the system nor a user can reconcile. This flaw makes legacy systems less efficient since the user will have to manually manage or organize the data with other tools and then process it. Only after they complete these steps can they obtain insights from the data.

On the other hand, a modern data platform will jump right to the last step to generate insights for business.

2. Promoting Data Democratization throughout the organization

The idea behind data democratization is to give business users smoother access to staged data, so businesses can leverage data to transform workflows by unleashing the value of information locked up in the data store. A modern data platform facilitates effective data democratization, making it easier for users to obtain the relevant data points and insights independently and quickly.

A transparent process and platform for accessing data enable domain experts such as data scientists to skip logistical hoops and effortlessly home in on the data points they need. The same cannot be said for legacy systems, which often have redundant interim steps, such as report request processes.

3. Prioritizing data safety and privacy

Modern data platforms are equipped with multiple layers of security to prevent data breaches. Most organizations follow regulations such as CCPA, HIPAA, FCRA, FERPA, GLBA, ECPA, COPPA, and VPPA. These data protection laws provide governing frameworks for data usage, storage, and deletion, which are easier to implement on modern data platforms to ensure data security and privacy.

Most businesses that use legacy systems find it difficult to implement the regulations required to meet security and privacy standards.

4. Ensuring self-service of data

A streamlined modern data platform smoothly enables self-service of data to your internal customers. It is well equipped to identify various data points and efficiently cater to internal customers’ requirements. Thus, a modern data platform reduces the complexity and allows users to access the data swiftly when they need it most. Legacy systems lack self-service, and all data requirements must be routed through IT and data teams.

How do you overcome the limitations of legacy systems, and where should you invest?

Legacy systems are in rapid decline. Organizations are choosing to store a significant chunk of their data on modern data platforms to optimize their data processes and decision-making. Modern data architectures need to keep up with the rapidly growing data-driven needs of businesses. As a result, modern data platforms have emerged as the most efficient solutions and promise to take the business world by storm.

However, before making a significant financial investment in the space of Modern Data Platforms, organizations must find a suitable, competent technology partner to accompany them on this journey. An expert like Affine’s Analytics Engineering Practices can make this transition seamless and effective. Are you ready to begin your journey to true data centricity? We are here to help. Schedule a call today!

This blog is the second episode of “All You Need to Know About Modern Data Platforms.” In the next episode, we’ll compare traditional vs. cloud-hosted data platforms to determine which would be better for your business.

What Is Modern Data Platform? Why Do Enterprises Need It?

Affine’s Analytics Engineering Practices is kicking off a new series on “All You Need to Know About Modern Data Platforms.” Read the first episode of the series here.

What is a Modern Data Platform?

Legacy systems are outdated. Your business needs a modern data platform!

But first, what is a modern data platform? It is a future-proof modern data architecture focused on delivering high-quality business analytics for your ever-growing business undergoing transformation. It combines modern data warehousing, AI and ML, and real-time data ingesting and processing. Modern data platforms for enterprises are agile with workloads and yield rapid value from your data.

In short, modern data platforms deliver what organizations need to become data-driven and deliver value to their ever-changing customers. They help organizations become modern and future-proof. Modern data platforms process massive volumes of data and make data-led insights available at the click of a button. They have a scalable storage system that ingests unprecedented volumes of data and meets the demands of the business in delivering customer insights and making efficient decisions. Statistics-driven analysis, efficient data processing, reliable prediction, and low-latency information delivery are other benchmarks for an efficient modern data platform.

Why do enterprises need modern data platforms?

Your customer is changing. They demand always-on, always-connected service and settle for nothing less than speed, agility, and reliability. They want hyper-personalized experiences without compromising on their current SLAs.

Your business is changing as a result of the data-driven environment transforming businesses across industries. Organizations must embrace an insight-driven approach to meet changing customer demands. You need a solution that gives you easy access to insights, especially one that delivers swift ROI from technological and marketing interventions by breaking down the data silos in your business.

Modern Data Platform accelerates the journey to data-centricity with the following features:

How to build your own or buy a Modern Data Platform?

It depends on your requirement and budget.

Needless to say, building an in-house modern data platform can be an expensive affair. To accelerate the data-centricity journey, businesses rely heavily on purchased platforms. For example, the modern data platforms of the big three cloud providers equip their customers with a wide range of data analytics tools, empowering them to analyze vast volumes of customer, business, and transactional data quickly, securely, and at low cost. Companies must soon assess their data analytics capabilities and chart a course for transformation into a data-driven enterprise. Given the rapidly changing nature of technology and the marketplace, greater responsiveness to customers and market opportunities, and greater agility, are crucial.

Do you have a strategy for the Modern Data Platform?

Adopting a modern data analytics platform will change everything about how your organization makes everyday business decisions.

Businesses need a cultural change to make the most of the modern data platform opportunity. They must reform and reinvent strategies and redesign operating models. Organizational structures need to be amended, roles need to be redefined, and resources need to be upskilled. Realigning existing data models for a better result and strategizing performance tracking to understand the core capabilities are also critical. Most importantly, it would be necessary to upskill your technical and business teams to make the most of the modern data platform and leverage it in everyday decision-making. 

Investing in a modern data platform for business is no mean task. You need a technology partner who understands the space holistically and can deliver quick time to value. A partner like Affine’s Analytics Engineering Practices. Are you ready to begin your journey to true data centricity? We are here to help. Schedule a call today!

This is the first in a series of blogs that depicts “All That You Need to Know About Modern Data Platform”. The next blog will outline Modern Data Platforms vs. Legacy Systems.

Statistical Model Lifecycle Management

Organizations have realized quantum jumps in business outcomes through the institutionalization of data-driven decision making. Predictive Analytics, powered by the robustness of statistical techniques, is one of the key tools leveraged by data scientists to gain insight into probabilistic future trends. Various mathematical models form the DNA of Predictive Analytics.

A typical model development process includes identifying factors/drivers, data hunting, cleaning and transformation, development, validation (business and statistical), and finally productionisation. In the production phase, as actual data flows into the model environment, the true accuracy of the model is measured. Quite often, there are gaps (errors) between predicted and actual numbers. Business teams have their own heuristic definitions and benchmarks for this gap, and any deviation leads to a forage for additional features/variables and data sources, finally resulting in rebuilding the model.

Needless to say, this leads to delays in business decisions and has several cost implications.

Can this gap (error) be better defined, tracked and analyzed before declaring model failure? How can stakeholders assess the Lifecycle of any model with minimal analytics expertise?

At Affine, we have developed a robust and scalable framework that can address the above questions. In the next section, we will highlight the analytical approach and present a business case where it was implemented in practice.


The solution was developed based on the concepts of Statistical Quality Control, especially the Western Electric rules. These are decision rules for detecting “out-of-control” or non-random conditions using the principle of process control charts. The distribution of observations relative to the control chart indicates whether the process in question should be investigated for anomalies.

X is the mean error of the analytical model based on historical (model training) data; outlier analysis needs to be performed first to remove any exceptional behavior. The zones are defined as:

  • Zone A = between Mean ± (2 × Std. Deviation) and Mean ± (3 × Std. Deviation)
  • Zone B = between Mean ± Std. Deviation and Mean ± (2 × Std. Deviation)
  • Zone C = between Mean and Mean ± Std. Deviation

Alternatively, Zones A, B, and C can be customized based on the tolerance of the Std. Deviation criterion and business needs.

  1. Any single data point falls outside the 3σ limit from the centerline (i.e., any point that falls outside Zone A, beyond either the upper or lower control limit)
  2. Two out of three consecutive points fall beyond the 2σ limit (in Zone A or beyond), on the same side of the centerline
  3. Four out of five consecutive points fall beyond the 1σ limit (in Zone B or beyond), on the same side of the centerline
  4. Eight consecutive points fall on the same side of the centerline (in Zone C or beyond)

If any of the rules are satisfied, it indicates that the existing model needs to be re-calibrated.
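As a sketch, the four rules can be automated over standardized errors, i.e., each error minus the training-period mean, divided by the training-period standard deviation. This is an illustrative implementation under those assumptions, not Affine's production framework:

```python
def western_electric_flags(z):
    """Apply the four Western Electric rules to standardized errors
    z = (error - mean) / std_dev, ordered oldest to newest."""
    def same_side(vals):
        # all points strictly above or strictly below the centerline
        return all(v > 0 for v in vals) or all(v < 0 for v in vals)

    flags = {}
    # Rule 1: the latest point falls outside the 3-sigma limit
    flags["rule1"] = abs(z[-1]) > 3

    # Rule 2: two of the last three points beyond 2-sigma, same side
    last3 = z[-3:]
    hits = [v for v in last3 if abs(v) > 2]
    flags["rule2"] = len(last3) == 3 and len(hits) >= 2 and same_side(hits)

    # Rule 3: four of the last five points beyond 1-sigma, same side
    last5 = z[-5:]
    hits = [v for v in last5 if abs(v) > 1]
    flags["rule3"] = len(last5) == 5 and len(hits) >= 4 and same_side(hits)

    # Rule 4: eight consecutive points on the same side of the centerline
    last8 = z[-8:]
    flags["rule4"] = len(last8) == 8 and same_side(last8)

    return flags


def needs_recalibration(z):
    """True if any Western Electric rule fires on the recent errors."""
    return any(western_electric_flags(z).values())
```

If any flag is raised for the latest production errors, the model becomes a candidate for re-calibration rather than being declared an outright failure.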

Business Case

A large beverage company wanted to forecast industry-level demand for a specific product segment across multiple sales geographies. Affine evaluated multiple analytical techniques and identified a champion model based on accuracy, robustness, and scalability. Since the final model was to be owned by the client’s internal teams, Affine enabled assessment of a model’s lifecycle stage through an automated process. A visualization tool was developed with an alert system to help users proactively identify any red flags, and a detailed escalation mechanism was outlined to address queries related to model performance or accuracy.

Fig. 1: The most recent data available is till Jun-16. An amber alert indicates that an anomaly has been identified, but it is most likely an exception.

Following are possible scenarios based on actual data for Jul-16.

Case 1

The process is in control, and no change to the model is required.

Case 2:

A red alert is generated, which indicates the model is not able to capture some macro-level shift in industry behavior.


Key Impact and Takeaways

  1. Quantify and develop benchmarks for error limits.
  2. A continuous monitoring system to check if predictive model accuracies are within the desired limit.
  3. Prevent undesirable escalations, thus rationalizing operational costs.
  4. Enabled through a visualization platform; hence, it does not require strong analytics expertise from end users.

Leveraging Advanced Analytics for Competitive Advantage Across FMCG Value Chain


According to the World Bank, the FMCG (Fast Moving Consumer Goods) market in India is expected to grow at a CAGR of 20.6% to reach US$ 103.7 billion by 2020, up from US$ 49 billion in 2016. Some of the key changes fueling this growth are:

  • Industry Expansion – ITC Ltd has forayed into the frozen market with plans to launch frozen vegetables and fruits, aiming for US$ 15.5 billion in revenues by 2030. Similarly, Patanjali Ayurveda is targeting 10x growth by 2020, riding on ‘ethnic’ recipes and winning consumer share of wallet.
  • Rural and semi-urban segments are growing at a rapid pace with FMCG accounting for 50% of total rural spending. There is an increasing demand for branded products in rural India. Rural FMCG market in India is expected to grow at a CAGR of 14.6%, and reach US$ 220 billion by 2025 from US$ 29.4 billion in 2016.
  • The logistics sector will see operational efficiencies with GST reforms. Historically, firms installed hubs and transit points in multiple states to avoid state value added tax (VAT), since hub-to-hub transfer is treated as a stock transfer and does not attract VAT. Firms can now focus on centralized hub operations, thus gaining efficiencies.
  • The rising share of the organized market in FMCG sector, coupled with the slow adoption of GST by wholesalers has led many FMCGs to explore alternative distribution channels such as direct distribution and cash-and-carry. Dabur, Marico, Britannia, and Godrej have already started making structural shifts in this direction.
  • Many leading FMCGs have started selling their brands through online grocery portals such as Grofers, Big Basket, and AaramShop. The trend is expected to increase with a strive towards cashless economy, and evolving payment mechanisms.
  • Traditional advertising mediums have seen a dip with the advent of YouTube, Netflix, and Hotstar.

Digital medium is being used more and more for branding and customer connect.

On top of this, barriers to entry in the FMCG sector are eroding, owing to a wider consciousness of consumer needs, availability of finance, and product innovation. This has raised the level of competition in the industry and generated a need to rethink the consumer offer, route to market, digital consumer engagement, and premiumization.

In this context, a few buzzwords are circulating in FMCG corridors: Analytics, Big Data, Cloud, Predictive, Artificial Intelligence (AI), etc. They are being discussed in the light of preparing for the future through improved processes, innovations, and transformations. Some disruptive use cases are:

  • With a growing focus on direct distribution, AI becomes all the more important in helping sales personnel offer the right trade promotions on the go.
  • With the rising organized sector in the urban segment, machine learning can improve the effectiveness of go-to-market strategy by allowing customized shelf planning for various kinds of retailers.
  • AI can help recognize customer perceptions based on market research interviews, make predictions about their likes/dislikes, and design new targeted product offerings. For instance, a leading FMCG brand uses AI to recognize micro facial expressions in focus group research for a fragrance to predict whether the customer liked the product or not. On the same lines, Knorr is using AI to recommend recipes to consumers based on their favorite ingredients. Consumers can share this information via SMS with Knorr.
  • AI enabled vending machines can help personalize consumer experience. Coca Cola has come with an AI powered app for vending machines. The app will personalize the ordering experience to the user, and allow ordering of multiple drinks ahead of time. It will also customize in-app offers to keep people coming back to the vending machine.
  • With increasing adoption of the digital medium, Internet of Things (IoT)-enabled smart manufacturing is transforming the manufacturing function. An IoT framework allows data to be sensed from machine logs, controllers, sensors, equipment, etc. in real time. This data can be used to boost product quality compliance monitoring and predictive maintenance and scheduling.

Let’s go through some of the analytics use cases across FMCG functions that promise immediate value.

Function Wise Analytics Use Cases in FMCG

Go To Marketing

Go to Marketing plays a very important function in FMCG value chain by enabling products to reach the market. FMCG distribution models range from direct store delivery to retailer warehousing to third party distributor networks. Further complications arise due to the structure of the Indian market – core market vs. organized retail. Analytics can help optimize the GTM processes in multiple ways:

  • Network planning can help in minimizing logistics cost by optimizing fleet routes, number of retail outlets touched, order of contact and product mix on trucks in sync with each retailer’s demand.
  • Inventory orders can be optimized to reduce inventory pile-ups for slow moving products, and stock outs for faster moving products. SKU level demand forecasting followed by safety stock scenario simulations can help in capturing the impact of demand variability and lead time variability on stock outs.
  • Assortments intelligence promises a win-win situation for both FMCGs and retailers. Retailers increase margins by localizing assortments to local demand while FMCGs ensure a fluid movement of right products in right markets.
  • Smart Visi cooler allocations can help increase brand visibility and performance. Visi coolers come in different shapes and sizes. Traditionally, sales personnel decide which kind of Visi cooler to give to which retailer based on gut-based judgment. Machine learning can learn from retailer demand, performance, and visibility data to make an optimal recommendation, thus improving brand visibility and performance.
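To illustrate the safety stock simulations mentioned above, here is a minimal sketch using the standard formula that combines demand variability and lead-time variability; the function name and example inputs are illustrative, not from the original text:

```python
import math
from statistics import NormalDist


def safety_stock(service_level, avg_demand, sd_demand, avg_lead_time, sd_lead_time):
    """Safety stock for one SKU under the standard formula that combines
    per-period demand variability and lead-time variability.
    service_level: desired cycle service level, e.g. 0.95."""
    z = NormalDist().inv_cdf(service_level)  # z-score, e.g. ~1.645 for 95%
    return z * math.sqrt(
        avg_lead_time * sd_demand**2 + (avg_demand * sd_lead_time) ** 2
    )
```

Simulating this across service levels and SKU-level forecast error estimates gives a feel for how demand and lead-time variability drive stock-out risk.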

Supply Chain & Operations

Analytics has percolated deep into the supply value chain. IoT is popularly identified as the technology framework that will drive major disruptions, with pooled data sources such as telemetry (fleet, machines, mobile), inventory, and other supply chain process data. A couple of key applications are:

  • Vendor selection using risk scores based on contract, responsiveness, pricing, quantity and quality KPIs. Traditionally, this was done based on the qualitative assessment. But now, vendor risk modeling can be done to predict vendor risk scores, and high performance-low risk vendors can be selected from the contenders.
  • ‘Smart’ warehousing with IoT sensing frameworks. Traditionally, warehouses have functioned only as facilities to store inventory, but IoT and AI have transformed them into ‘smart’ efficiency-booster hubs. For instance, AI can automatically place incoming batches on the right shelves so that picking them for distribution consumes fewer resources and hence costs less.


Sales

Since the FMCG sales structure is very personnel-oriented, use cases such as incentive compensation, sales force sizing, territory alignment, and trade promotion decisions continue to be very relevant.

  • Sales force sizing can be improved through a data-driven segmentation of retailer base followed by algorithmic estimation of sales effort required in a territory.
  • Trade promotion recommendations can be automated and personalized for a retailer based on historical performance and context. This will allow sales personnel to meet the retailer, key in some KPIs and recommend a personalized trade promotion in real-time.


Marketing

Analytics has always been a cornerstone of marketing decisions. It can help improve the accuracy and speed of these decisions.

  • Market mix modeling can be improved by simulating omnichannel spend attribution scenarios, thus optimizing overall marketing budget allocation and ROI.
  • Brand performance monitoring can be made intuitive by using rich web based visualizations that promise multi-platform consumption and quick decision making.
  • Sentiment analysis can help monitor the voice of customers on social media. Web based tools can provide a real-time platform to answer business queries such as what is important to customers, concerns/highlights, response to new launches & promotions etc.


Manufacturing

With the advent of IoT, AI, and Big Data systems, use cases such as predictive maintenance have become more feasible. Traditionally, a manufacturer would need to wait for a failure scenario to occur a few times, learn from it, and then predict its re-occurrence. Companies are now focusing on sensing failures before they happen so that the threat of new failures can be minimized. This can result in immense cost savings through continued operations and quality control. The benefits extend further to:

  • Improved production forecasting systems using POS data (enabled by retail data sharing), sophisticated machine learning algorithms and external data sources such as weather, macroeconomics that can be either scraped or bought from third party data vendors.
  • Product design improvements using attribute value modeling. The idea is to algorithmically learn which product attributes consumers value most, and use the insights to design better products in the future.

Promotions and Revenue Management

Consumer promotions are central for gaining short term sales lifts and influencing consumer perceptions. Analytics can help in designing and monitoring these promotions. Also, regional and national events can be monitored to calculate the promotional lifts, which could be used for designing better future promotional strategies.

  • Automated consumer promotion recommendations based on product price elasticity, consumer feedback from market research data, cannibalization scenarios and response to historical promotions.
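As a sketch of the price-elasticity input to such recommendations, a constant-elasticity estimate is the slope of an ordinary least squares fit of log quantity on log price. The function and data below are illustrative, not from the original text:

```python
import math


def price_elasticity(prices, quantities):
    """OLS slope of log(quantity) on log(price): the constant-elasticity
    estimate of demand. A value of -1.5 means a 1% price increase is
    associated with roughly a 1.5% drop in quantity sold."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var
```

In practice the regression would also control for promotions, seasonality, and cannibalization, but the slope above is the core elasticity signal a promotion recommender would consume.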

A key step in adopting and institutionalizing analytics use cases is to assess where you are in the analytics maturity spectrum.

Analytics Maturity Assessment

A thorough analytics maturity assessment can help companies understand their analytics positioning in the industry and gain competitive advantage by enhancing analytical capabilities. Here are a few high-level parameters for assessing analytics maturity:

Now that we understand where we are in the journey, let’s look at how we get there.

Levers of Change

FMCGs need to adopt a multi-dimensional approach to adapting to changing trends. The dimensions could be:

  1. Thought Leadership: Companies need to invest considerable effort in developing a research and innovation ecosystem. We are talking about leapfrogging the traditional process-improvement focus and getting on the innovation bandwagon. This requires hiring futurists, building think tanks inside the company, and creating an “Edison mindset” (a progressive trial-error-learning mindset).
  2. Technology: Traditionally, companies have preferred the second-mover route when it comes to adopting newer technology, the rationale being risk avoidance and surety. With analytics enablement technologies such as Big Data and cloud, this rationale falls to pieces, primarily because you are not the second mover but probably a double- or triple-digit mover, owing to widespread adoption across industries. Analytics enablement technologies have become a necessity for organizations.
  3. Learning from Others: Human beings are unique in having the ability to learn from the experience of others. This ability helps them not only correct their errors but also find new possibilities. Similarly, can FMCG learn from modern fast fashion retailers and revolutionize the speed to market? Can it learn from telecom and hyper-personalized offerings? Can it learn from banking and touch the consumer in multiple ways?

Harmonizing the above levers with relevant FMCG contextualization can lead to the desired transformation.

Measuring Impact: Top 4 Attribution and Incrementality Strategies

I believe you have gone through part 1 and understood what Attribution and Incrementality mean and why it is important to measure these metrics. Below, we will discuss some methods that are commonly used across the industry to achieve our goals.

Before we dive into the methods, let us understand the term Randomized Controlled Trial (RCT). By the way, in common jargon, these are popularly known as A/B tests.

What are Randomized Controlled Trials (RCT)?

Simply put, it is an experiment that measures our hypothesis. Suppose we believe (hypothesis) that the new email creative (experiment) will perform (measure) better than the old email creative. Now, we will randomly split our audience into 2 groups. One of them, the control group, keeps receiving the old emails, and the other, the test group, keeps receiving the new email creative.

Now, how do you quantify your measure? How do you know your experiment is performing better? Choose any metric that you think should determine the success of the new email creatives; say, Click-through Rate (CTR). Thus, if the test group has a better CTR than the control group, you can say that the new email creative is performing better than the old one.
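As a sketch (with invented click and impression counts), the CTR comparison can be paired with a two-proportion z-test to check that the difference between test and control is not just noise:

```python
# Hedged sketch: two-proportion z-test on CTR. All counts below are made up.
import math

def ctr_z_test(clicks_test, n_test, clicks_ctrl, n_ctrl):
    """Return (ctr_test, ctr_ctrl, z). A large positive z suggests the
    new creative outperforms the old one."""
    p_t = clicks_test / n_test
    p_c = clicks_ctrl / n_ctrl
    p_pool = (clicks_test + clicks_ctrl) / (n_test + n_ctrl)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
    return p_t, p_c, (p_t - p_c) / se

ctr_t, ctr_c, z = ctr_z_test(580, 10_000, 500, 10_000)
# z above ~1.96 would indicate significance at the usual 5% level
```

A z-score beyond the critical value lets you call the test group's CTR win statistically significant rather than random variation.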

Some popular methods to run experiments:

Method 1:

User-Level Analysis

Illustration for Incrementality

One of the simple ways to quantify incrementality would be to run an experiment as done in the diagram. Divide your sample into two random groups. Expose the groups to a different treatment; for example, one group receives a particular email/ad, and the other does not.

The difference in the groups reflects the true measurement of the treatment. This helps us quantify the impact of sending an email or showing a particular ad.
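A minimal calculation of that group difference, with made-up conversion counts, quantifies the incrementality of the email/ad:

```python
# Illustrative sketch (invented numbers): incrementality as the difference in
# conversion rate between the treated (email/ad) group and the held-out group.
def incrementality(conv_treated, n_treated, conv_control, n_control):
    rate_t = conv_treated / n_treated
    rate_c = conv_control / n_control
    absolute_lift = rate_t - rate_c
    relative_lift = absolute_lift / rate_c  # share of conversions caused by the treatment
    return absolute_lift, relative_lift

abs_lift, rel_lift = incrementality(660, 10_000, 600, 10_000)
```

Here the treated group converts 0.6 percentage points more, i.e. roughly a 10% relative lift attributable to the treatment.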

Method 2:

Pre/Post Analysis

This is an experiment that can be used to measure the effect of a certain event or action by taking a measurement before (pre) and after (post) the start of the experiment.

You can introduce a new email campaign during the test period and measure the impact over time of any metric of your interest by comparing it against the time when the new email campaign was not introduced.

Thus, by analyzing the difference in the impact of the metric you can estimate the effect of your experiment.

Things to keep in mind while performing pre/post analysis:

  • Keep the control period long enough to get significant data points
  • Keep in mind that there might be spillover in results during the test phase, so we should ensure that the impact of this spillover is not missed
  • Ensure that you keep enough time for the disruption period. It refers to the transient time just after you have launched the experiment
  • It is ideal to avoid peak seasons or other high volatility periods in the business for these experiments to yield conclusive results
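The checklist above can be encoded in a small pre/post sketch (invented daily CTR values; the disruption window right after launch is dropped from the comparison):

```python
# Hedged pre/post sketch on a daily metric series. Values are made up.
import statistics

def pre_post_effect(series, launch_idx, disruption_days=3):
    """Compare the mean metric before launch to the mean after the
    disruption window, returning (absolute effect, relative effect)."""
    pre = series[:launch_idx]
    post = series[launch_idx + disruption_days:]
    pre_mean = statistics.mean(pre)
    post_mean = statistics.mean(post)
    return post_mean - pre_mean, (post_mean - pre_mean) / pre_mean

daily_ctr = [0.050, 0.052, 0.049, 0.051,   # pre (control) period
             0.070, 0.080,                 # disruption period: ignored
             0.060, 0.061, 0.059, 0.060]   # post (test) period
effect, pct = pre_post_effect(daily_ctr, launch_idx=4, disruption_days=2)
```

Dropping the disruption days keeps the transient spike just after launch from inflating the estimated effect.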

Method 3:

Natural Experiment

It is similar to an A/B test, in that you can observe the effect of a treatment (event, feature) on different samples, but you do not have the ability to define or control the samples. So it is like a Randomized Controlled Trial where you cannot control the environment of the experiment.

Suppose you want to understand the impact of a certain advertisement. If you do what we explained in Method 1 and create 2 groups, a control group that is not shown the particular advertisement and a test group that has been shown the ad, and then try to measure the impact of the advertisement, you might make a basic mistake: the groups may not be homogeneous to start with. The behavior of the groups can differ from the start itself, so you would expect to see very different results and thus cannot be sure of the effectiveness of the ad.

To tackle this, we need to decrease the bias by applying resampling and reweighting techniques; one way is to create Synthetic Control Groups (SCGs).

Find below an example with an illustration for the scenario:

We will create SCGs within the unexposed groups. We will try to understand which households (HHs) missed an ad but, based on their viewing habits, are similar to those households which have seen one.
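A hypothetical matching sketch for building the SCG: each exposed household is paired with the unexposed household closest to it in viewing-habit feature space. The features and values below are invented for illustration:

```python
# Hedged sketch: nearest-neighbour matching of unexposed households to exposed
# ones on viewing-habit features (e.g. weekly hours per genre). Invented data.
import numpy as np

def synthetic_control(exposed, unexposed):
    """For each exposed household, pick the index of the unexposed household
    with the smallest Euclidean distance in feature space."""
    matches = []
    for e in exposed:
        dists = np.linalg.norm(unexposed - e, axis=1)
        matches.append(int(np.argmin(dists)))
    return matches

exposed_hh   = np.array([[5.0, 1.0], [0.5, 4.0]])              # saw the ad
unexposed_hh = np.array([[4.8, 1.2], [0.4, 4.1], [9.0, 9.0]])  # missed the ad
print(synthetic_control(exposed_hh, unexposed_hh))  # → [0, 1]
```

The matched unexposed households then serve as the synthetic control against which the exposed group's outcomes are compared.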



Another sub-method, out of scope for this blog, is to attach a weight to every household based on its demographic attributes (gender, age, income, etc.) using iterative proportional fitting; the comparison then happens on the weighted results.

Method 4:

Geo Measurement

Geo measurement is a method that utilizes the ability to spend and/or market in one geographic area (hence “geo”) versus another. A typical experiment consists of advertising in one geo (the “on” geo) and holding out another geo (the “off” geo), then measuring the difference between them, i.e. the incrementality caused by the treatment. One also needs to account for pre-test differences between the on and off geographies, either by normalizing these before evaluation or by adjusting for them in a post-hoc analysis.
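A toy readout of such a test, with invented sales numbers, might look like this; the pre-test ratio is used to normalize the off geo before computing the lift:

```python
# Hedged sketch: geo-test incrementality with a pre-period adjustment.
# All sales figures are invented.
def geo_incrementality(on_pre, on_post, off_pre, off_post):
    scale = on_pre / off_pre           # normalize for pre-test level differences
    counterfactual = off_post * scale  # expected "on" geo sales without the ads
    return on_post - counterfactual

lift = geo_incrementality(on_pre=1000, on_post=1300, off_pre=800, off_post=880)
```

Without the scaling step, the pre-existing gap between the two geos (1000 vs. 800) would be wrongly counted as part of the advertising effect.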

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Hope this helps as a good starting point in understanding what attribution and incrementality are and how they are utilized in the industry.

DECIPHERING: How do Consumers Make Purchase Decisions?


Suppose you are looking for a product on a particular website. As soon as you commence the journey of making the first search for a product, fidgeting over whether to buy it or not, and finally purchasing it, you are targeted or tempted by various marketing strategies, through various channels, to buy the product.

You may start seeing ads for that particular product on social media websites and on the side of various web pages, receive promotional emails, etc. This entire experience across the different channels you interact with is referred to as touchpoints.

Customer Touchpoint Journey

So, to sum up, whenever you provide an interest/signal to a platform that you are going to purchase a certain product, you may interact with these touchpoints mentioned above.

The job of a company’s marketing team is to utilize the marketing budget in a way that earns the maximum return on marketing spend, i.e. to ensure that you buy their product.

So to achieve this, the marketing team uses a technique called Attribution.

What is Attribution?

Attribution, also known as Multi-Touch Attribution, is the identification of the set of user actions/events/touchpoints that drive a certain outcome or result, and the assignment of value to each of those events/touchpoints.

Why is Attribution Important?

The main aim of marketing attribution is to quantify the influence of various touchpoints on the desired outcome and optimize their contribution to the user journey to maximize the return on marketing spend.

How does Attribution Work?

Assume you had shown an interest in buying sneakers on Amazon. You receive an email tempting you to make the purchase, and finally, after some deliberation, you click on it and make the purchase. In this simple scenario, the marketing team will attribute your purchase to the email, i.e. they will believe that the email channel is what caused the purchase, assuming a causal relationship between the targeted email and the purchase decision.

Suppose this occurrence is replicated across tens of thousands of users. The marketing team concludes that email has the best conversion compared to other channels and starts allocating more budget to it. They spend money on aesthetic email creatives, hire better designers, and send more emails, as they feel email is the primary driver.

But, after a month, you notice that the conversion is reducing. People are not interacting with the email. Alas! The marketing team has wasted the budget on a channel that they thought was causing the purchases.

Where did the Marketing Team go Wrong?

Attribution models are not causal, meaning they may give the credit for a transaction to a channel that did not necessarily cause it. So it was not only the emails that were causing the transactions; there might have been other important touchpoints that were actually driving the purchases.

Understanding Causal Inference

The main goal of the marketing team is to use the attribution model to infer causality, and as we have discussed, they are not necessarily doing so. We need Causal Inference to truly understand the process of cause and effect of our marketing channels. Causal Inference deals with the counterfactual; it is imaginative and retrospective. Causal inference will instead help us understand what would have happened in the absence of a marketing channel.

Ta-Da!! Enter Incrementality. (Incrementality has been waiting the entire time to make its entrance in this blog.)

What is Incrementality?

Incrementality is the process of identifying an interaction that caused a customer to do a certain transaction.

In fact, it finds the interaction in whose absence the transaction would not have occurred. Therefore, incrementality is the art of finding causal relationships in the data.

It is tricky to quantify the inherent relationships among touchpoints, so I have dedicated part 2 to discuss various strategies that are used to measure incrementality and how a marketing team can better distribute its budget across marketing channels.

Hotel Recommendation Systems: What is it and how to effectively build one?

What is a Hotel Recommendation System?

A hotel recommendation system aims at suggesting properties/hotels to a user such that they would prefer the recommended property over others.

Why is a Hotel Recommendation System required?

In today’s data-driven world, it would be nearly impossible to follow the traditional heuristic approach to recommend to millions of users an item that they would actually like and prefer.

Hence, a Recommendation System solves our problem: it incorporates the user’s input, historical interactions, and sometimes even the user’s demographics to build an intelligent model that provides recommendations.


In this blog, we will cover all the steps that are required to build a Hotel Recommendation System for the problem statement mentioned below. We will do an end-to-end implementation from data understanding, data pre-processing, and the algorithms used along with their PySpark codes.

Problem Statement: Build a recommendation system providing hotel recommendations to users for a particular location they have searched for on

What type of data are we looking for?

Building a recommendation system requires two sources of data, explicit and implicit signals.

Explicit data is the user’s direct input, like filters (4 star rated hotel or preference of pool in a hotel) that a user applies while searching for a hotel. Information such as age, gender, and demographics also comes under explicit signals.

Implicit data can be obtained by users’ past interactions, for example, the average star rating preferred by the user, the number of times a particular hotel type (romantic property) is booked by the user, etc.

What data are we going to work with?

We are going to work with the following:

  1. Explicit signals where a user provides preferences for what type of amenities they are looking for in a property
  2. Historical property bookings of the user
  3. Users’ current search results from where we may or may not get information regarding the hotel that a user is presently interested in

Additionally, we have the property information table (hotel_info table), which looks like the following:

hotel_info table

Note: We can create multiple property types (other than the above 4, Wi-Fi, couple, etc.) ingeniously covering the maximum number of properties in at least one of the property types. However, for simplicity, we will continue with these 4 property types.

Data Understanding and Preparation:

Consider that the searches data is in the following format:

user_search table

Understanding user_search table:

Information about a user (user ID), the location they are searching in (Location ID), their check-in and check-out dates, the preferences applied while making the search (Amenity Filters), the property specifically looked at while searching (Property ID), and whether they were about to book that property (Abandoned cart = ‘yes’ means that they are yet to make the booking and only the payment is left) can all be extracted from the table.

Clearly, we do not have all the information for every search made by users; hence, we are going to split the users into 3 categories: explicit users (users whose Amenity Filters column is not null), abandoned users (users whose Abandoned cart column is ‘yes’), and finally, historical users (users for whom we have historical booking information).

Preparing the data:

For splitting the users into the 3 categories (explicit, abandoned, historical), we give preference in the following order: abandoned users > explicit users > historical users. This preferential order is because of the following reasons:

The abandoned cart gives us information about the product the user was just about to purchase. We can exploit this information to give recommendations similar to the product in the cart, since the abandoned product represents what the user prefers. Hence, abandoned users get the highest priority.

An explicit signal is an input directly given by the user. The user directly tells his preference through the Amenities column. Hence, explicit users come next in the order.

Splitting the users can be done following the steps below:

Firstly, create a new column as user_type, under which each user will be designated with one of the types, namely, abandoned, explicit, or historical

Creating a user_type column can be done using the following logic:

df_user_searches ='xyz…….')

df_abandon = (df_user_searches
    .withColumn('abandon_flag', F.when(col('Abandon_cart').like('yes') & col('Property_ID').isNotNull(), lit(1)).otherwise(lit(None)))
    .filter('abandon_flag = 1')
    .withColumn('user_type', lit('abandoned_users'))
    .drop('abandon_flag'))

df_explicit = (df_user_searches
    .join('user_ID'), 'user_ID', 'left_anti')
    .withColumn('expli_flag', F.when(
        col('Amenity_Filters').like('%Wifi Availibility%')
        | col('Amenity_Filters').like('%Nature Friendly%')
        | col('Amenity_Filters').like('%Budget Friendly%')
        | col('Amenity_Filters').like('%Couple Friendly%'), lit(1)).otherwise(lit(None)))
    .filter('expli_flag = 1')
    .withColumn('user_type', lit('explicit_users'))
    .drop('expli_flag'))

df_historical = (df_user_searches
    .join(df_abandon.unionAll(df_explicit).select('user_ID').distinct(), 'user_ID', 'left_anti')
    .withColumn('user_type', lit('historical_user')))

df_final = df_explicit.unionAll(df_abandon).unionAll(df_historical)

Now, the user_search table has the user_type as well. Additionally,

For explicit users, user_feature columns will look like this:

explicit_users_info table

For abandoned users, after joining the property ID provided by the user with that in the hotel_info table, the output will resemble the following:

abandoned_users_info table

For historical users, we aggregate over each user and calculate the total number of times the user has booked each particular property type; the data will look like the following:

historical_users_info table

For U4 in the historical_users_info table, we have information telling us that the user prefers an average star rating of 4, has booked a WiFi property 5 times, and so on, which eventually tells us the attribute preferences of the user.
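The aggregation behind historical_users_info can be illustrated in plain Python (hypothetical bookings; in the actual pipeline this would be a Spark groupBy over the bookings table):

```python
# Toy illustration of the historical aggregation: count bookings per property
# type and average the star ratings per user. All records are invented.
from collections import defaultdict

bookings = [  # (user_ID, property_type, star_rating)
    ("U4", "wifi", 4), ("U4", "wifi", 4), ("U4", "couple", 5),
    ("U7", "budget", 3),
]

counts = defaultdict(lambda: defaultdict(int))
ratings = defaultdict(list)
for user, ptype, stars in bookings:
    counts[user][ptype] += 1
    ratings[user].append(stars)

avg_star = {u: sum(r) / len(r) for u, r in ratings.items()}
```

The per-type counts become the flag-style user features (wifi_flag, couple_flag, …) and avg_star becomes avg_star_rating.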

Building the Recommendation System:

Data at hand:

We have users split and user’s preferences as user_features

We have the hotel attributes from the hotel_type table, assume that it contains the following values:

hotel_type table

We will use content-based filtering to build our recommendation model. For each of the splits, we will use the algorithm that gives us the best result. To gain a better understanding of recommendation systems and content-based filtering, one can refer here.

Note: We have to give recommendations based on the location searched by the user. Hence, we will perform a left join on the key Location ID to get all the properties that are there in the location.

Building the system:

For Explicit users, we will proceed in the following way:

We have user attributes like wifi_flag, budget_flag, etc. Join this with the hotel_type table on the location ID key to get all the properties and their attributes

Performing Pearson correlation gives us a score in [-1, 1] between the user and hotel features, eventually helping us provide recommendations in that location.

Code for explicit users:

explicit_users_info = explicit_users_info.drop('Property_ID')

expli_dataset = explicit_users_info.join(hotel_type, ['location_ID'], 'left').drop('star_rating')

header_user_expli = ['wifi_flag', 'couple_flag', 'budget_flag', 'nature_flag']

header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly']

assembler_features = VectorAssembler(inputCols=header_user_expli, outputCol="user_features")

assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol="hotel_features")

tmp = [assembler_features, assembler_features_2]

pipeline = Pipeline(stages=tmp)

baseData =

df_final = baseData

def pearson(a, b):
    if (np.linalg.norm(a) * np.linalg.norm(b)) != 0:
        a_avg, b_avg = np.average(a), np.average(b)
        a_stdev, b_stdev = np.std(a), np.std(b)
        n = len(a)
        denominator = a_stdev * b_stdev * n
        numerator = np.sum(np.multiply(a - a_avg, b - b_avg))
        p_coef = numerator / denominator
        return p_coef.tolist()

pearson_sim_udf = udf(pearson, FloatType())

pearson_final = df_final.withColumn('pear_correlation_res', pearson_sim_udf('user_features', 'hotel_features'))
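As a quick sanity check outside Spark, the same Pearson arithmetic can be compared against numpy's built-in np.corrcoef on made-up vectors:

```python
# Sanity-check sketch: the hand-rolled Pearson logic (mirroring the UDF above,
# minus the Spark wrapper) should agree with numpy's np.corrcoef.
import numpy as np

def pearson(a, b):
    if (np.linalg.norm(a) * np.linalg.norm(b)) != 0:
        a_avg, b_avg = np.average(a), np.average(b)
        a_stdev, b_stdev = np.std(a), np.std(b)
        n = len(a)
        return float(np.sum(np.multiply(a - a_avg, b - b_avg)) / (a_stdev * b_stdev * n))

u = np.array([1.0, 1.0, 0.0, 0.0])  # invented user feature vector
h = np.array([1.0, 1.0, 1.0, 0.0])  # invented hotel feature vector
assert abs(pearson(u, h) - np.corrcoef(u, h)[0, 1]) < 1e-9
```

Using population standard deviations with the factor n in the denominator makes the hand-rolled version exactly equivalent to the textbook correlation coefficient.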


Our output will look like the following:

explicit users

For abandoned and historical users, we will proceed as follows:

Using the data created above, i.e., abandoned_users_info and historical_users_info tables, we obtain user preferences in the form of WiFi_Availibility or wifi_flag, star_rating or avg_star_rating, and so on

Join it with the hotel_type table on the location ID key to get all the hotels and their attributes

Perform Cosine Similarity to find the best hotel to recommend to the user in that particular location

Code for abandoned users:

abandoned_users_info = abandoned_users_info.drop('Property_ID')

abandoned_dataset = abandoned_users_info.join(hotel_type, ['location_ID'], 'left')

header_user_aban = ['a_Wifi_Availibility', 'a_Couple_Friendly', 'a_Budget_Friendly', 'a_Nature_Friendly', 'a_Star_Rating']

header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly', 'Star_Rating']

assembler_features = VectorAssembler(inputCols=header_user_aban, outputCol="user_features")

assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol="hotel_features")

tmp = [assembler_features, assembler_features_2]

pipeline = Pipeline(stages=tmp)

baseData =

df_final = baseData

def cos_sim(value, vec):
    if (np.linalg.norm(value) * np.linalg.norm(vec)) != 0:
        dot_value =, vec) / (np.linalg.norm(value) * np.linalg.norm(vec))
        return dot_value.tolist()

cos_sim_udf = udf(cos_sim, FloatType())

abandon_final = df_final.withColumn('cosine_dis', cos_sim_udf('user_features', 'hotel_features'))


Code for historical users:

historical_dataset = historical_users_info.join(hotel_type, ['location_ID'], 'left')

header_user_hist = ['wifi_flag', 'couple_flag', 'budget_flag', 'nature_flag', 'avg_star_rating']

header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly', 'Star_Rating']

assembler_features = VectorAssembler(inputCols=header_user_hist, outputCol="user_features")

assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol="hotel_features")

tmp = [assembler_features, assembler_features_2]

pipeline = Pipeline(stages=tmp)

baseData =

df_final = baseData

def cos_sim(value, vec):
    if (np.linalg.norm(value) * np.linalg.norm(vec)) != 0:
        dot_value =, vec) / (np.linalg.norm(value) * np.linalg.norm(vec))
        return dot_value.tolist()

cos_sim_udf = udf(cos_sim, FloatType())

historical_final = df_final.withColumn('cosine_dis', cos_sim_udf('user_features', 'hotel_features'))


Our output will look like the following:

historical users

abandoned users

Giving Recommendations:

Giving 3 recommendations per user, our final output will look like the following:


One can notice that we are not using hotel X as the first recommendation for the abandoned user U1; we avoid it because its hotel features were created from the same property ID, so it would always sit at rank 1.

Unlike cosine similarity, where 0s are considered a negative preference, Pearson correlation does not penalize the user if no input is given; hence we use the latter for explicit users.


In the end, the objective is to fully understand the problem statement, work around the data available, and provide recommendations with a nascent system.

Marketing Mix Modelling: What drives your ROI?

There was a time when we considered traditional marketing practices, and the successes or failures they yielded, an art form. With mysterious, untraceable results, marketing efforts lacked transparency and were widely regarded as being born out of the creative talents of star marketing professionals. But the dynamics switched, and the regime of analytics came into power. It has evolved over time, and numerous methodologies have been developed; the market mix model is one of the popular ones.

The key purpose of a Marketing Mix Model is to understand how various marketing activities contribute together to driving the sales of a given product. Through MMM, the effectiveness of each marketing input/channel can be assessed in terms of Return on Investment (ROI). In other words, a marketing input/channel with a higher ROI is more effective than one with a lower ROI. Such understanding facilitates effective marketing decisions regarding spend allocation across channels.

Marketing Mix Modelling is a statistical technique of determining the effectiveness of marketing campaigns by breaking down aggregate data and differentiating between contributions from marketing tactics and promotional activities, and other uncontrollable drivers of success. It is used as a decision-making tool by brands to estimate the effectiveness of various marketing initiatives in increasing Return on Investment (ROI).

Whenever we change our methodologies, it is human nature to have various questions. Let’s deep dive into the MMM technique and address these questions in detail.

Question 1: How is the data collected? How much minimum data is required?

An MMM requires a brand’s product data to collectively capture the impact of key drivers such as marketing spends, price, discounts, social media presence/sentiment of the product, event information, etc. In any analytical method, the more the data, the better the implementation of the modelling technique and the more robust the results. Hence, these methods are highly driven by the quantum of data available to develop the model.

Question 2: What level of data granularity is required/best for MMM?

A best practice for any analytical methodology, and for generating valuable insights, is to have data as granular as possible. For example, Point-of-Sale data at the Customer-Transaction-Item level will yield recommendations for a highly focused marketing strategy at a similar granularity. However, if needed, the data can always be rolled up to any aggregated level suitable for the business requirement.

Question 3: Which sales drivers are included in the marketing mix model?

In order to develop a robust and stable Market Mix Model, various sales drivers such as Price, Distribution, Seasonality, Macroeconomic variables, Brand Affinity etc. play a pivotal role in understanding the consumer behaviour towards product. Even more important are the features that capture the impact of marketing efforts for the product. Such features provide an insight into how consumers react to the respective marketing efforts or the impact of these efforts on the product.

Question 4: How do you ensure the accuracy of the data inputs?

Ensuring the accuracy of data inputs is very subjective with respect to the business. On many occasions, direct imputation is not very helpful and would skew the results. Further sanity checks and statistical tests, such as examining the distribution of each feature set, can be applied.

MMM Components –

In Market Mix Modelling sales are divided into 2 components:

Base Sales:

Base Sales is what marketers get if they do not do any advertisement. It is sales due to brand equity built over the years. Base Sales are usually fixed unless there is some change in economic or environmental factors.

Base Drivers:
  1. Price: The price of a product is a significant base driver of sales, as price determines both the consumer segment that a product is targeted toward and the promotions implemented to market the product to the chosen audience.
  2. Distribution: The number of store locations, their respective inventories, and the shelf life of that stock are all considered base drivers of sales. Store locations and inventory are static and can be noticed by customers without any marketing intervention.
  3. Seasonality: Seasonality refers to variations that occur in a periodic manner. Seasonal opportunities are enormous, and often they are the most commercially critical times of the year. For example, a major share of electronics sales happens around the holiday season.
  4. Macro-Economic Variables: Macro-economic factors greatly influence businesses and hence their marketing strategies. Understanding macro factors like GDP, unemployment rate, purchasing power, growth rate, inflation, and consumer sentiment is critical, as these factors are not under the control of businesses yet substantially impact them.

Incremental Sales:

This is the sales generated by marketing activities like TV advertisements, print advertisements, digital spends, promotions, etc. Total incremental sales are split into the sales from each input to calculate its contribution to total sales.

Incremental Drivers:
  1. Media Ads: Promotional media ads form the core of MMM; they penetrate the market deeply, position the product against competitors, and create awareness about its key features and other aspects. Numerous media channels are available, such as TV, print ads, digital ads, social media, direct-mail marketing campaigns, in-store marketing, etc.
  2. Product Launches: Marketers invest carefully to position the new product into the market and plan marketing strategies to support the new launch.
  3. Events & Conferences: Brands need to look for opportunities to build relationships with prospective customers and promote their product through periodic events and conferences.
  4. Behavioural Metrics: Variables like touch points, online behaviour metrics and repurchase rate provide deeper insights into customers for businesses.
  5. Social Metrics: Brand reach or recognition on social platforms like Twitter, Facebook, YouTube, blogs, and forums can be measured through indicative metrics like followers, page views, comments, views, subscriptions, and other social media data. Other social media data like the types of conversations and trends happening in your industry can be gathered through social listening.

Ad-stock Theory –

Ad-stock, or goodwill, is the cumulative value of a brand’s advertising at a given point in time. For example, if a company advertises its product over 10 weeks, then for any given week t the ad-stocked value would be that week’s spend X plus a fraction of the previous week’s ad-stock: adstock_t = X_t + λ × adstock_(t-1).

Ad-stock theory states that advertising is not immediate and has diminishing returns, meaning that its influential power decreases over time, even if more money is allocated to it. Therefore, regression analysis over time helps marketers understand the potential timeline of advertising effectiveness and how to optimize the marketing mix to compensate for these factors.

  1. Diminishing Returns: The underlying principle for TV advertisement is that exposure to TV ads creates awareness in customers’ minds up to a certain extent. Beyond that, the impact of exposure to ads starts diminishing. Each incremental amount of GRP (“Gross Rating Point”, which measures the impact of an advertisement) has a lower effect on sales or awareness, so the incremental sales generated from incremental GRP start to diminish and eventually saturate. This effect can be seen in the above graph, where the relationship between TV GRP and sales is non-linear. This type of relationship is captured by taking the exponential or log of GRP.
  2. Carry-over Effect or Decay Effect: The impact of past advertisement on present sales is known as the carry-over effect. A small component termed lambda is multiplied with the past month’s GRP value. It is also known as the decay effect, as the impact of previous months’ advertising decays over time.
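The two effects can be sketched together in a few lines of Python (the λ value and the GRP series below are illustrative, not from the text):

```python
# Hedged sketch: carryover via geometric ad-stock, diminishing returns via a
# log saturation. Decay value and spend series are invented.
import math

def adstock(spend, lam=0.5):
    """adstock_t = spend_t + lam * adstock_(t-1)"""
    out, carry = [], 0.0
    for x in spend:
        carry = x + lam * carry
        out.append(carry)
    return out

def saturate(adstocked):
    # log1p captures the diminishing return of each extra GRP
    return [math.log1p(a) for a in adstocked]

weekly_grp = [100, 0, 0, 50]
print(adstock(weekly_grp))  # → [100.0, 50.0, 25.0, 62.5]
```

The ad-stocked (and optionally saturated) series, rather than raw spend, is what enters the regression as the media variable.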

Implementation details:

The most common marketing mix modelling regression techniques used are:

  1. Linear regression
  2. Multiplicative regression

1. Linear Regression Model:

Linear regression can be applied when the dependent variable (DV) is continuous and its relationship with the independent variables (IDVs) is assumed to be linear. The relationship can be defined using the equation:

Sales = β0 + β1X1 + β2X2 + … + βnXn + ε

Here ‘Sales’ is the dependent variable to be estimated, the Xi are the independent variables, and ε is the error term; the βi are the regression coefficients. The difference between the observed Sales and the predicted sales is known as the prediction error. Regression analysis is mainly used for causal analysis, forecasting the impact of a change, forecasting trends, etc. However, this method does not perform well on large amounts of data, as it is sensitive to outliers, multicollinearity, and cross-correlation.
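A toy numpy sketch of fitting such an additive model (all data below is invented and noiseless, so the coefficients are recovered exactly):

```python
# Hedged sketch: ordinary least squares for a tiny additive MMM with two
# media drivers plus an intercept (base sales). Toy data, no noise.
import numpy as np

tv      = np.array([10.0, 20.0, 30.0, 40.0])
digital = np.array([ 5.0,  5.0, 10.0, 10.0])
sales   = 100 + 2.0 * tv + 4.0 * digital   # known ground-truth coefficients

X = np.column_stack([np.ones_like(tv), tv, digital])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
# beta ≈ [100, 2, 4]: base sales plus the per-unit contribution of each channel
```

In a real MMM the drivers would first be ad-stocked and transformed, and the fit evaluated with the statistics discussed later in this section.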

2. Multiplicative Regression Models-

Additive models imply a constant absolute effect of each additional unit of the explanatory variables. They are suitable only if businesses operate in more stable environments and are not affected by interaction among explanatory variables. But in scenarios such as when pricing is zero, the sales (DV) will become infinite.

To overcome the limitations inherent in linear models, multiplicative models are often preferred. These models offer a more realistic representation of reality than additive linear models do. In these models, IDVs are multiplied together instead of added.

There are two kinds of multiplicative models:

Semi-Logarithmic Models-

In Log-Linear models, the exponents of the independent variables are multiplied.

Logarithmic transformation of the target variable linearizes the model form, which in turn can be estimated as an additive model. The dependent variable being log-transformed is the only difference between the additive model and the semi-logarithmic model.

Some of the benefits of Log-Linear models are:

  1. The coefficients β can be interpreted as % change in business outcome (sales) to unit change in the independent variables.
  2. Each independent variable in the model works on top of what has been already achieved by other drivers.
Logarithmic Models-

In Log-Log models, independent variables are also subjected to logarithmic transformation in addition to the target variable.

In the case of non-linear regression models, the above-defined elasticity formula needs to be tweaked according to the model equation. Refer to the table below.

Statistical significance –

Once the model has been generated, it should be checked for validity and prediction quality. Based on the nature of the problem, various model statistics are used for evaluation. The following are the most common statistical measures in marketing mix modelling:

  1. R-squared: R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination. R-squared is always between 0% and 100%: 0% indicates that the model explains none of the variability of the response data around its mean, while 100% indicates that the model explains all of it. The general formula is R² = 1 − SSE/SST, where SSE is the sum of squared errors and SST is the total sum of squares.
  2. Adjusted R Squared: The adjusted R-squared is a refined version of R-squared that has been penalized for the number of predictors in the model. It increases only if the new predictor improves the model. The adjusted R-squared can be used to compare the explanatory power of regression models that contain different numbers of predictors.
  3. Coefficient: Regression coefficients are estimates of the unknown population parameters and describe the relationship between a predictor variable and the response. In linear regression, coefficients are the values that multiply the predictor values. The sign of each coefficient indicates the direction of the relationship between a predictor variable and the response variable. A positive sign indicates that as the predictor variable increases, the response variable also increases. A negative sign indicates that as the predictor variable increases, the response variable decreases
  4. Variance Inflation Factor (VIF): A variance inflation factor detects multicollinearity in regression analysis. Multicollinearity exists when there is correlation between predictors (i.e., independent variables) in a model. The VIF estimates how much the variance of a regression coefficient is inflated due to multicollinearity. To calculate it, every variable in the model is regressed against all the other available variables. VIF is usually calculated as VIFi = 1 / (1 − Ri²), where Ri² is the R-squared value obtained by regressing the i-th predictor variable against all other variables.
  5. Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a set of predictions. It is the average of the absolute differences between prediction and actual observation, where all individual differences have equal weight: MAE = (1/n) Σ |yt − ŷt|, where yt is the actual value at time ‘t’ and ŷt is the predicted value at time ‘t’.
  6. Mean Absolute Percentage Error (MAPE): MAPE is the average absolute percent error for each observation, i.e., predicted values minus actuals divided by actuals: MAPE = (100/n) Σ |(yt − ŷt) / yt|, where yt is the actual value at time ‘t’ and ŷt is the predicted value at time ‘t’.
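The measures above can all be computed directly. A compact numpy sketch, using hypothetical actuals and predictions (and an assumed auxiliary R² for the VIF, rather than a full auxiliary regression):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 52, 2                          # observations and predictors (hypothetical)
y = rng.uniform(100, 200, size=n)     # actual values
y_hat = y + rng.normal(0, 5, size=n)  # predictions with some error

sse = np.sum((y - y_hat) ** 2)     # sum of squared errors
sst = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalized for k predictors

mae = np.mean(np.abs(y - y_hat))                # mean absolute error
mape = np.mean(np.abs((y - y_hat) / y)) * 100   # mean absolute percentage error

# VIF for one predictor: regress it on the others, then apply 1 / (1 - Ri^2)
r_i2 = 0.8            # assumed R-squared from that auxiliary regression
vif = 1 / (1 - r_i2)  # = 5, a commonly cited multicollinearity warning level

print(round(r2, 3), round(adj_r2, 3), round(mae, 2), round(mape, 2), vif)
```

Note that the adjusted R² is never larger than R², reflecting the penalty for adding predictors that do not improve the model.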

MMM Output –

Marketing Mix Model outputs provide the contribution of each marketing vehicle/channel, which, along with marketing spends, yields marketing ROIs. The output also captures time decay and diminishing returns on different media vehicles, as well as the effects of the non-marketing factors discussed above and interactions such as the halo effect and cannibalization. The model output provides all the necessary components and parameters required to arrive at the best media mix under any condition.

Expected Benefit & Limitation –

Benefits of Marketing Mix Modelling –

  • Enables marketers to prove the ROI of their efforts across marketing channels
  • Returns insights that allow for effective budget allocation
  • Facilitates superior sales trend forecasting

Limitations of Marketing Mix Modelling –

  • Lacks the convenience of real-time modern data analytics
  • Critics argue that modern attribution methods are more effective, as they consider one-to-one, individual-level data
  • Marketing Mix Modelling does not analyze customer experience (CX)

Application/Scope for Optimization, Extension of MMM Model

1. Scope for Optimization

Marketing optimization is the process of improving marketing efforts to maximize desired business outcomes. Since MMM response curves are mostly non-linear, non-linear constrained optimization algorithms are used. Some of the use cases for marketing mix optimization are:

To improve the current sales level by x%, what level of spend is required in the different marketing channels? E.g., to increase sales by 10%, how much should be invested in TV ads, discounts, or sales promotions?

What happens to the outcome metric (sales, revenue, etc.) if the current level of spend is increased by x%? E.g., on spending an additional $20M on TV, how much more in sales can be obtained? How should this additional spend be distributed?
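These what-if questions reduce to constrained non-linear optimization. A minimal scipy sketch, in which the diminishing-returns response curves and the budget figure are assumptions rather than fitted model outputs: allocate a fixed total budget across two channels to maximize predicted sales.

```python
import numpy as np
from scipy.optimize import minimize

# Assumed diminishing-returns response curves for two channels (spend in $M)
def predicted_sales(spend):
    tv, digital = spend
    return 50 * np.log1p(tv) + 30 * np.log1p(digital)

budget = 20.0  # total budget in $M (hypothetical)

# Maximizing sales = minimizing negative sales, with spend summing to the budget
result = minimize(
    lambda s: -predicted_sales(s),
    x0=[10.0, 10.0],
    bounds=[(0, budget), (0, budget)],
    constraints=[{"type": "eq", "fun": lambda s: s[0] + s[1] - budget}],
)
print(result.x)  # optimal split of the $20M between TV and digital
```

With these particular curves the optimizer shifts more budget to TV, since its marginal return stays higher for longer; swapping in the response parameters from a fitted MMM turns this into the "where to spend the next dollar" answer described above.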

2. Halo and Cannibalization Impact

Halo effect is a term for a consumer’s favouritism towards a product from a brand because of positive experiences they have had with other products from the same brand. Halo effect can be the measure of a brand’s strength and brand loyalty. For example, consumers favour Apple iPad tablets based on the positive experience they had with Apple iPhones.

Cannibalization effect refers to the negative impact on a product from a brand because of the performance of other products from the same brand. This mostly occurs in cases when brands have multiple products in similar categories. For example, a consumer’s favouritism towards iPads can cannibalize MacBook sales.

In Marketing Mix Models, base or incremental variables of other products of the same brand are tested to understand their halo or cannibalization impact on the business outcome of the product under consideration.
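One common way to run this test can be sketched as follows (the data and variable names are made up for illustration): include a sibling product's marketing variable in the product's own model and inspect the sign of its coefficient, where a positive sign suggests a halo effect and a negative sign suggests cannibalization.

```python
import numpy as np

rng = np.random.default_rng(3)
own_media = rng.uniform(10, 100, size=52)      # this product's media spend
sibling_media = rng.uniform(10, 100, size=52)  # sibling product's media spend

# Hypothetical truth: the sibling's media helps this product (halo, +0.5)
sales = 100 + 2.0 * own_media + 0.5 * sibling_media + rng.normal(0, 5, size=52)

X = np.column_stack([np.ones(52), own_media, sibling_media])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)

# beta[2] > 0 suggests a halo effect; beta[2] < 0 would suggest cannibalization
print(beta[2])
```

In practice the coefficient's statistical significance would also be checked before concluding either effect exists.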


Marketing mix modelling techniques can minimize much of the risk associated with new product launches or expansions. Developing a comprehensive marketing mix model can be the key to sustainable long-term growth for a company. It will become a key driver for business strategy and can improve the profitability of a company’s marketing initiatives. While some companies develop models through their in-house marketing and analytics departments, many choose to collaborate with an external company to develop the most efficient model for their business.

Developers of marketing mix models need to have a complete understanding of the marketing environment they operate within and of the latest advanced market research techniques. Only through this will they be able to fully comprehend the complexities of the numerous marketing variables that need to be accounted for and calculated in a marketing mix model. While numerical and statistical expertise is undoubtedly crucial, an insightful understanding of market research and market environments is just as important to develop a holistic and accurate marketing mix model. With these techniques, you can get started on developing a watertight marketing mix model that can maximise performance and sales of a new product.


The Emergence of Augmented Analytics & How it is Helping to Create a Data-Driven Organization?

In the last few years, data flowing from various sources has changed the way organizations work and solve their business problems. Identifying potential data points, collecting data, and storing it in a highly secured place has become the need of the hour for many big companies across industries. In this regard, big data analytics practices are gaining popularity, followed by rapid adoption of AI/ML technologies (RL, Deep RL, NLP, etc.) in the workflow. Essentially, these technological advancements are helping organizations capture, store, and analyze data to convert it into valuable insights and solve business problems.

On the other hand, you need to understand your current data landscape, ensure security and compliance, and select the right tools for your data analytics prerequisites. But the challenge is: how do you identify the solution to your changing data needs? And what does all this have to do with augmented analytics? In this blog, we will discuss the terminology of augmented analytics, what it offers to businesses, the global market projection, and many other touchpoints that answer these questions and help you navigate towards creating a data-driven organization.

Let’s start with a big picture!

The blend of different AI capabilities such as ML, NLP, and computer vision (CV) with other advanced technologies like AR/VR is boosting augmented analytics practices, especially in extracting valuable data insights. The graph of the pervasiveness of AI vs. instant/near real-time results shown in the above image illustrates this point. Thus, augmented analytics brings all the necessary ingredients to help organizations conduct more efficient and effective data analytics across the workflow, create a hassle-free road map to becoming a data-driven organization, and form citizen data scientists who can solve new business problems with ease.

Terminology of Augmented Analytics

Gartner: Augmented analytics uses machine learning to automate data preparation, insight discovery, data science and machine learning model development, and insight sharing for a broad range of business users, operational workers, and citizen data scientists.

In other words, it is a paradigm shift that brings all the necessary components and features to be a key driver of modern analytics platforms, automating and integrating processes such as data preparation, creating models around data clusters, developing insights, and data cleansing to assist business operations.

What does it offer?

  • Improved relevance and business insights: Helps identify false or less relevant insights, minimizes the risk of missing imperative insights in the data, points users to actionable insights, and empowers decision-making.
  • Faster & near-perfect insights: Greatly reduces the time spent on data discovery and exploration, provides near-perfect data insights to business users, and helps them augment the data analysis with AI/ML algorithms.
  • Insights available everywhere & anywhere: The flexibility and compatibility of augmented analytics expand data reach across the workflow to citizen data scientists and operational teams, who can leverage the insights with less effort.
  • Less dependency on scarce skills: You no longer need to rely heavily on data scientists. With the help of advanced AI/ML algorithms, augmented analytics fills the skill gap, helping organizations do more with technology and less human intervention in the data analytics and management process.

The augmented analytics market is broadly classified by deployment, function, component, industry vertical, and organization size. The deployment category is further divided into cloud and on-premises. In terms of process and function, the market is segmented into operations, IT, finance, sales & marketing, and others.

Traditional BI Vs. Augmented Analytics

Traditional BI

In the traditional Business Intelligence process, databases were analyzed to generate basic reports. The analysis was executed by a dedicated team of data analysts, and access to the reports produced by these professionals was limited to certain teams. In a way, regular business users were unable to use this facility due to complexity and security constraints. Hence, they were unable to make data-driven decisions.

In later years, the level of complexity was reduced with the help of technological advancement. However, the manual collection of data from data sources remained the same: data analysts clean up the data, select the data sources they want to analyze, transfer the data to the platform for analysis, generate reports/insights, and share them across the workflow through emails, messages, or within the platform, as shown in the above image.

Augmented Analytics

In augmented analytics, AI reduces the manual work of data collection and enhances data transfer and reception across the different sources. Once the data is made available from the respective sources, AI/ML-powered smart systems help users select suitable datasets based on the relationships identified while bringing the data in for analysis. During the analysis process, AI systems allow the user to influence the process and also suggest analysis combinations that would take far longer to produce manually. Once the insights are generated, business users can leverage them across the workflow through in-app messaging, mobile apps, chatbots, AI assistants, and more.

Hence, throughout the augmented analytics practice, AI empowers the data analytics process by simplifying insight discovery and providing noteworthy trends and details without a specific user query.

With Augmented Analytics in place businesses can:

  • Perform hassle-free data analysis to meet the business objectives
  • Improve the ability to identify the root cause of data analysis challenges and problems
  • Unearth hidden growth opportunities without investing additional efforts
  • Democratize enterprise-wide insights from a BI perspective to enhance business performance
  • Turn actionable data insights into business outcomes

Summing Up

The world is turning into a data world, and data is now growing beyond big data. Countless devices are connected to each other, producing new data sets with every passing day and minute. These data sets are processed and stored in increasingly complex forms to create insightful information; hence, businesses need to invest in and start using robust analytical systems and AI assistance to make their data analytics efforts worthwhile. At the same time, the need to democratize analytics and boost productivity means businesses must innovate and move beyond their legacy approaches. Augmented analytics provides one such opportunity to advance existing and new business objectives and stay ahead in the race. Invest wisely and make the best use of augmented analytics to create a data-driven organization and ensure success.

Copyright © 2024 Affine All Rights Reserved

Manas Agrawal

CEO & Co-Founder
