The transformer revolution in video recognition. Are you ready for it?

Imagine how many lives could be saved if caretakers or medical professionals could be alerted when an unmonitored patient showed the first signs of sickness. Imagine how much more secure our public spaces could be if police or security personnel could be alerted upon suspicious behavior. Imagine how many tournaments could be won if activity recognition could inform teams and coaches of flaws in athletes’ form and functioning.

Human Activity Recognition (HAR) can help tackle all of these scenarios. HAR has long been one of the most complex challenges in computer vision, with a wide variety of potential applications such as sports, post-injury rehabilitation, analytics, security surveillance, and traffic monitoring. The complexity of HAR arises from the fact that an action spans both spatial and temporal dimensions. In standard computer vision tasks, where a model is trained to classify or detect objects in a picture, only the spatial dimension is involved. In HAR, however, learning from multiple frames over time helps classify the action better. Hence, the model must be able to capture both the spatial and the temporal components.

Architectures used for video activity recognition include 2D convolutions, 3D CNN volume filters that capture spatio-temporal information (Tran et al., 2015), 3D convolutions factorized into separate spatial and temporal convolutions (Tran et al., 2018), CNNs that fuse information across frames (Karpathy et al., 2014), LSTMs for spatio-temporal information, and combinations or enhancements of these techniques.

TimeSformer – A revolution by Facebook!

Transformers have been making waves in Natural Language Processing for the last few years. They use self-attention, typically within an encoder-decoder architecture, to make accurate predictions by capturing context. In computer vision, one of the most prominent applications of transformers is ViT (Vision Transformer), developed by Google (Dosovitskiy et al., 2020). In ViT, an image is divided into patches of size 16×16 (see Figure 1), and each patch is flattened into a 1D vector. These vectors are then linearly embedded and passed through a transformer encoder, where the self-attention for each patch is computed with respect to all the other patches.


Figure 1. The Vision Transformer treats an input image as a sequence of patches, akin to a series of word embeddings generated by an NLP Transformer. (Source: Google AI Blog: Transformers for Image Recognition at Scale (googleblog.com))
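The patching step can be sketched in a few lines of NumPy. This is a minimal illustration, not ViT’s actual implementation: the projection matrix and position embeddings are random stand-ins for learned weights, the embedding size of 768 is an assumption (it matches ViT-Base), and the extra classification token that ViT prepends to the sequence is omitted.

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split a (C, H, W) image into flattened, non-overlapping patch vectors.

    Returns an array of shape (num_patches, C * patch * patch), patches in row-major order.
    """
    c, h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be multiples of the patch size"
    gh, gw = h // patch, w // patch
    # (C, gh, p, gw, p) -> (gh, gw, C, p, p) -> (gh * gw, C * p * p)
    x = img.reshape(c, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, c * patch * patch)

rng = np.random.default_rng(0)
img = rng.random((3, 224, 224))                 # one RGB frame, channels first
patches = image_to_patches(img)                 # 14 x 14 = 196 patches of 3 * 16 * 16 = 768 values

d_model = 768                                   # assumed embedding size (as in ViT-Base)
w_embed = rng.normal(0.0, 0.02, (patches.shape[1], d_model))   # stand-in for the learned projection
pos_embed = rng.normal(0.0, 0.02, (patches.shape[0], d_model))  # stand-in for learned position embeddings
tokens = patches @ w_embed + pos_embed          # the sequence passed to the encoder
print(patches.shape, tokens.shape)              # (196, 768) (196, 768)
```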

Facebook recently developed TimeSformer, which it describes as the first video architecture based purely on transformers. In TimeSformer, as in other HAR methods, the input is a block of consecutive frames from the video clip, for example, 16 frames of size 3×224×224. To compute the self-attention for a patch in a frame, two sets of other patches are used:

 a) other patches of the same frame (spatial attention).

 b) patches of the adjacent frames (temporal attention).

There are several ways to combine these patches. We used only “divided space-time attention” (Figure 2), which uses all the patches of the current frame and the patches at the same spatial position in the other frames of the block. In “divided attention”, temporal attention and spatial attention are applied separately within each transformer block, and among the attention schemes compared by Bertasius et al. (2021) it gave the best video classification accuracy.

Figure 2. Divided space-time attention in a block of frames (Link: TimeSformer: A new architecture for video understanding (facebook.com))
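The idea behind divided space-time attention can be sketched in NumPy for a single block. This is a simplified illustration, not the TimeSformer code: it assumes one attention head with random stand-in projection weights and keeps the residual connections, but omits layer normalization, the MLP, and the classification token. With T = 8 frames of N = 196 patches, joint space-time attention would compare each token with all 8 × 196 = 1,568 tokens, whereas the divided scheme compares it with 8 tokens (time) and then 196 tokens (space).

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention along the second-to-last axis.

    x: (..., L, D) -> (..., L, D); leading axes are independent batches."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def divided_space_time_block(x, params):
    """x: (T, N, D) tokens for T frames with N patches each."""
    # Temporal attention: each patch attends to the patch at the same position in every frame
    xt = np.swapaxes(x, 0, 1)                        # (N, T, D)
    xt = xt + attention(xt, *params["time"])         # residual connection
    x = np.swapaxes(xt, 0, 1)                        # back to (T, N, D)
    # Spatial attention: each patch attends to all N patches of its own frame
    return x + attention(x, *params["space"])

T, N, D = 8, 196, 64                                 # 8 frames, 14 x 14 patches, small embedding
rng = np.random.default_rng(0)
params = {name: [rng.normal(0.0, D ** -0.5, (D, D)) for _ in range(3)]  # stand-ins for Wq, Wk, Wv
          for name in ("time", "space")}
x = rng.normal(size=(T, N, D))
y = divided_space_time_block(x, params)
print(y.shape)                                       # (8, 196, 64)
```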

Note that TimeSformer does not use any convolutions, which helps reduce its computational cost. Convolution is a local operator: the kernel computes each output from neighboring pixels. Vision transformers (Dosovitskiy et al., 2020), on the other hand, operate on sequences of data, and self-attention on its own ignores the order of its inputs (permuting the input patches simply permutes the outputs). So, for the transformer’s input, the non-sequential spatial data is converted into a sequence, and learnable positional embeddings are added to each patch (analogous to the position embeddings used in NLP tasks) so that the model can learn the structure of the image.
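To make the ordering point concrete, the sketch below uses single-head attention with random stand-in weights (all names are illustrative). It checks that permuting the input patches only permutes the outputs, and that this stops holding once a fixed positional embedding is added to each slot.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x (L, D)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # row-wise softmax
    return weights @ v

rng = np.random.default_rng(1)
L, D = 6, 8                                              # 6 patches, 8-dimensional embeddings
x = rng.normal(size=(L, D))
wq, wk, wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))
perm = np.arange(L)[::-1]                                # reverse the patch order

# Without positions: shuffling the patches just shuffles the outputs the same way
out = self_attention(x, wq, wk, wv)
equivariant = np.allclose(self_attention(x[perm], wq, wk, wv), out[perm])

# With a positional embedding tied to each slot, the order now changes the result
pos = rng.normal(size=(L, D))                            # stand-in for learned positional embeddings
out_pos = self_attention(x + pos, wq, wk, wv)
still_equivariant = np.allclose(self_attention(x[perm] + pos, wq, wk, wv), out_pos[perm])

print(equivariant, still_equivariant)                    # True False
```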

According to Facebook, TimeSformer is roughly three times faster to train than 3D CNNs and requires less than one-tenth of the compute for inference. It is, however, a larger model: it has 121,266,442 trainable parameters, compared with 40,416,074 in the 2DCNN model and 78,042,250 in the 3DCNN model used in our comparison.

TimeSformer also offers customizability and scalability over convolutional models: the user can choose the spatial resolution and the number of frames used as input. The original study used frames as large as 556×556 and clips as long as 96 frames, and the computational cost still did not grow exponentially.

But challenges abound…

The following are some challenges in tracking motion:

  1. The challenge of angles: A video can be shot from multiple angles, and the same motion can look different from different angles.
  2. The challenge of camera movement: Depending on whether the camera follows the moving object, the object may appear static or moving. Shaky cameras add further complexity.
  3. The challenge of occlusion: During motion, the object can be temporarily hidden by another object in the foreground.
  4. The challenge of delineation: It is not always easy to tell where one action ends and the next begins.
  5. The challenge of multiple actions: Different objects in a video can be performing different actions, adding complexity to recognition.
  6. The challenge of change in relative size: Depending on whether an object is moving towards or away from the camera, its relative size can change continuously, adding further complexity to recognition.

Dataset

To evaluate video recognition capability, we used a modified version of the UCF11 dataset.

We removed the Swing class because it contains many mislabeled videos. The goal is to recognize 10 activities: basketball, biking, diving, golf swing, horse riding, soccer juggling, tennis swing, trampoline jumping, volleyball spiking, and walking. Each class has 120-200 videos, ranging from 1 to 21 seconds in length. The dataset is available from CRCV | Center for Research in Computer Vision at the University of Central Florida (ucf.edu).

How we trained the model

We ran several experiments to find the best-performing model. Our input block contained 8 frames of size 3×224×224. The base learning rate was 0.005, reduced by a factor of 0.1 at the 11th and 14th steps. For augmentation, we used color jitter (within 40), random horizontal flips, and random crops (from 256×320 to 224×224). We trained the model for 15 epochs on an NVIDIA Tesla M60 GPU on Amazon AWS, with a batch size of 2 (due to memory limitations).
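The learning-rate schedule described above is a step decay. A minimal sketch, assuming the reduced rate applies from the 11th and 14th step (epoch) onward; the function name is hypothetical. In PyTorch, `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[11, 14], gamma=0.1)` implements the same kind of schedule, though whether its step counting matches the original training code is not verified here.

```python
def learning_rate(step, base_lr=0.005, milestones=(11, 14), gamma=0.1):
    """Step decay: multiply base_lr by gamma once for every milestone already reached."""
    reductions = sum(step >= m for m in milestones)
    return base_lr * gamma ** reductions

for step in (1, 10, 11, 13, 14, 15):
    print(step, learning_rate(step))
```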

Metrics are everything

In the original code, TimeSformer samples one input block per clip for training and validation. For test videos, it takes 3 different crops and averages the predictions (we call this samplewise accuracy). With this metric, several of the models we trained achieved over 95% validation accuracy. In our humble opinion, however, this is not satisfactory, because it does not examine all the different spatio-temporal possibilities in the video. To address this, we consider two additional metrics.

  • Blockwise accuracy – a video clip is treated as a sequence of consecutive, non-overlapping input blocks. The model makes a prediction for every block, and the accuracy over all blocks is measured. This is better suited to real-time scenarios.
  • Clipwise accuracy – the predictions for all the blocks of a video are collected, their mode (the most frequent prediction) is taken as the prediction for the clip, and the accuracy over clips is measured. This also helps gauge real-time accuracy over a longer timeframe.
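Both metrics can be computed from per-block predictions. A minimal sketch with hypothetical function names and toy predictions (not our results); it assumes a trailing partial block shorter than the block length is dropped, and that a tie for the mode goes to the label seen first, which is how `Counter.most_common` orders equal counts. The toy example also shows how clipwise accuracy can exceed blockwise accuracy.

```python
from collections import Counter

def block_starts(num_frames, block_len=8):
    """Start frames of consecutive, non-overlapping blocks (a short trailing block is dropped)."""
    return list(range(0, num_frames - block_len + 1, block_len))

def blockwise_accuracy(block_preds, labels):
    """Fraction of all blocks, across all clips, whose prediction matches the clip label."""
    correct = sum(p == y for preds, y in zip(block_preds, labels) for p in preds)
    return correct / sum(len(preds) for preds in block_preds)

def clipwise_accuracy(block_preds, labels):
    """Fraction of clips whose mode (most frequent block prediction) matches the label."""
    clip_preds = [Counter(preds).most_common(1)[0][0] for preds in block_preds]
    return sum(p == y for p, y in zip(clip_preds, labels)) / len(labels)

# Toy example: two clips with three 8-frame blocks each
block_preds = [["golf_swing", "golf_swing", "tennis_swing"],
               ["walking", "biking", "biking"]]
labels = ["golf_swing", "biking"]
print(block_starts(24))                             # [0, 8, 16]
print(blockwise_accuracy(block_preds, labels))      # 4 of 6 blocks correct
print(clipwise_accuracy(block_preds, labels))       # 1.0: both modes are correct
```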

The final outcome

Our best model had the following performance metric values:

  • Samplewise accuracy – 97.3%
  • Blockwise accuracy – 86.8%
  • Clipwise accuracy – 92.2%

The confusion matrix for the clipwise accuracy is given in Figure 3(a). For comparison, the confusion matrices for the 2DCNN and 3DCNN models are shown in Figures 3(b) and 3(c), respectively.

These metrics are quite impressive and far better than the results we obtained with 2D convolution (VGG16) and 3D convolution (C3D), which reached 81.3% and 74.6%, respectively. This points to the potential of TimeSformer.

Figure 3(a). Clipwise accuracy confusion matrix for TimeSformer

Concluding Remarks

In this work, we explored the effectiveness of TimeSformer for the Human Activity Recognition task on the modified UCF11 dataset. This convolution-free model outperformed the 2DCNN and 3DCNN models and performed extremely well on some hard-to-classify classes such as ‘basketball’ and ‘walking’. Future work includes trying more augmentation techniques to fine-tune this model and applying vision transformers to other video tasks such as video captioning and action localization.

When done right, TimeSformer can change the game for Human Activity Recognition and bring its use cases in healthcare, sports, safety, and security to life.

References

[1] Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3d convolutional networks. In Proceedings of the IEEE international conference on computer vision (pp. 4489-4497).

[2] Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 6450-6459).

[3] Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014). Large-scale video classification with convolutional neural networks. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition (pp. 1725-1732).

[4] Bertasius, G., Wang, H., & Torresani, L. (2021). Is Space-Time Attention All You Need for Video Understanding? arXiv preprint arXiv:2102.05095.

[5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., & Houlsby, N. (2020). An image is worth 16×16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.

DECIPHERING: How do Consumers Make Purchase Decisions?

Background

Suppose you are looking for a product on a particular website. From the moment you begin that journey (making the first search, deliberating over whether to buy it, and finally purchasing it), you are targeted by various marketing strategies across various channels that tempt you to buy the product.

You may start seeing ads for that product on social media, on the sides of various web pages, in promotional emails, and so on. Each of these channels that you interact with along the way is referred to as a touchpoint.

Customer Touchpoint Journey | Source

To sum up, whenever you signal to a platform that you intend to purchase a certain product, you may interact with the touchpoints mentioned above.

The job of a company’s marketing team is to use the marketing budget in a way that yields the maximum return on marketing spend, i.e., to ensure that you buy their product.

So to achieve this, the marketing team uses a technique called Attribution.

What is Attribution?

Attribution, also known as Multi-Touch Attribution, is the process of identifying the set of user actions/events/touchpoints that drive a certain outcome or result and assigning a value to each of those events/touchpoints.
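Two of the simplest rule-based attribution models, last-touch and linear, can be sketched as follows. The journeys and channel names are made up for illustration; these are common textbook rules, not a description of any particular team’s model.

```python
from collections import defaultdict

def last_touch(journeys):
    """Give the full credit for each conversion to the final touchpoint before it."""
    credit = defaultdict(float)
    for path, converted in journeys:
        if converted and path:
            credit[path[-1]] += 1.0
    return dict(credit)

def linear(journeys):
    """Split the credit for each conversion equally across every touchpoint on its path."""
    credit = defaultdict(float)
    for path, converted in journeys:
        if converted and path:
            for channel in path:
                credit[channel] += 1.0 / len(path)
    return dict(credit)

# Hypothetical user journeys: (ordered touchpoints, did the user convert?)
journeys = [
    (["social", "display", "email"], True),
    (["search", "email"], True),
    (["social"], False),
]
print(last_touch(journeys))   # {'email': 2.0}
print(linear(journeys))       # email gets 1/3 + 1/2; social, display, search share the rest
```

Under last-touch, email receives all the credit; linear spreads it out. Neither rule, however, says what would have happened without the email.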

Why is Attribution Important?

The main aim of marketing attribution is to quantify the influence of various touchpoints on the desired outcome and optimize their contribution to the user journey to maximize the return on marketing spend.

How does Attribution Work?

Assume you have shown an interest in buying sneakers on Amazon. You receive an email tempting you to make the purchase, and finally, after some deliberation, you click on it and make the purchase. In a simple scenario, the marketing team will attribute your purchase to this email, i.e., they will conclude that the email channel caused the purchase and that there is a causal relationship between the targeted email and the purchase decision.

Suppose this happens across tens of thousands of users. The marketing team concludes that email has the best conversion rate of all the channels and starts allocating more budget to it: they spend money on attractive email creatives, hire better designers, and send more emails, believing that email is the primary driver.

But after a month, the team notices that conversions are dropping and people are no longer interacting with the emails. Alas! The marketing team has spent its budget on a channel that it thought was causing the purchases.

Where did the Marketing Team go Wrong?

Attribution models are not causal: they may give credit for a transaction to a channel that did not necessarily cause it. So it was not only the emails that were driving the transactions; one or more other touchpoints may have been the real drivers of the purchases.

Understanding Causal Inference

The marketing team’s main goal is to use the attribution model to infer causality, and, as we have discussed, attribution models do not necessarily do that. To truly understand the cause and effect of our marketing channels, we need Causal Inference. Causal inference deals with the counterfactual; it is imaginative and retrospective, and it helps us understand what would have happened in the absence of a marketing channel.

Ta-da! Enter Incrementality (which has been waiting this entire time to make its entrance in the blog).

What is Incrementality?

Incrementality is the process of identifying the interaction that caused a customer to complete a certain transaction.

In other words, it finds the interaction without which the transaction would not have occurred. Incrementality is therefore the art of finding causal relationships in the data.

Quantifying the relationships among touchpoints is tricky, so I have dedicated part 2 to discussing the various strategies used to measure incrementality and how a marketing team can better distribute its budget across marketing channels.

In-Store Traffic Analytics: Retail Sensing with Intelligent Object Detection

1. What is Store Traffic Analytics?

In-store traffic analytics allows data-driven retailers to collect meaningful insights into customer behavior.

The retail industry receives millions of visitors every year. Along with fulfilling its primary objective, a store can also extract valuable insights from this constant stream of traffic.

Footfall data, or the count of people in a store, creates an alternative source of value for retailers. One can collect traffic data and analyze key metrics to understand what drives product sales, customer behavior, preferences, and related information.

2. How does it help a store reach its potential?

Customer Purchase Experience

Store Traffic Analytics provides in-depth insights into customers’ shopping and purchasing habits, their in-store journey, and more, by capturing key data points such as footfall at different times and traffic intensity across departments (which reveals the preferred product categories). Retailers can use these analytics to strategize and target their customers in ways that enhance the customer experience and drive sales.

Customer Dwell Time Analysis

Dwell time is the length of time a person spends looking at a display or staying in a specific area. It shows what in a store holds customers’ attention and helps optimize store layouts and product placements for higher sales.
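Given the output of a person tracker (see section 4), dwell time reduces to counting frames. A minimal sketch, assuming each observation is a (frame index, track ID, in-zone flag) tuple and a hypothetical frame rate of 25 fps; the function name is illustrative.

```python
from collections import defaultdict

def dwell_times(observations, fps=25.0):
    """Seconds each tracked person spent inside a zone.

    observations: iterable of (frame_idx, track_id, in_zone) tuples, one per person per frame."""
    frames_in_zone = defaultdict(int)
    for _frame_idx, track_id, in_zone in observations:
        if in_zone:
            frames_in_zone[track_id] += 1
    return {track_id: n / fps for track_id, n in frames_in_zone.items()}

# Hypothetical tracker output: person 1 stays in the zone for 50 frames, person 2 for 25 of them
obs = [(f, 1, True) for f in range(50)] + [(f, 2, f < 25) for f in range(50)]
print(dwell_times(obs))   # {1: 2.0, 2: 1.0}
```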

Demographics Analysis

Demographic analysis segments store visitors by age and gender, helping optimize product listings. For instance, if a footwear store’s footfall analysis shows that most customers are young men aged 18-25, the store manager can list products that appeal to this demographic group, which can improve conversion rates.

Human Resource Scheduling

With the help of store traffic data, workforce productivity can also be enhanced by effective management of staff schedules according to peak shopping times to meet demands and provide a better customer experience, directly impacting operational costs.

3. Customer Footfall Data

The first step for Store Traffic Analytics is to have a mechanism to capture customer footfall data. Methods to count people entering the store (People Counting) have been evolving rapidly. Some of them are as follows –

  • Manual tracking
  • Mechanical counters
  • Pressure mats
  • Infrared beams
  • Thermal counters
  • Wi-Fi counting
  • Video counters.

This article takes a closer look at the components of an AI-based object detection and tracking framework for video counters, built with Python, Deep Learning, and OpenCV on CCTV footage of a store.

4. People Counting (Video Counters)

The following are the key components of a framework for counting people in CCTV footage:

  1. Object Detection – Detecting objects (persons) in every frame, or once every fixed number of frames, of the video.
  2. Object Tracking – Assigning unique IDs to the detected persons and tracking their movement through the video stream.
  3. Identifying the entry/exit area – Defining, based on the angle of the CCTV footage, the area used to track people entering and exiting the store.

Since object detection algorithms are computationally expensive, we can use a hybrid approach in which objects are detected once every N frames rather than in every frame. In between, the detected objects are tracked as they move across the video frames. Tracking continues until the Nth frame, at which point the object detector is re-run, and the whole process repeats. This approach makes it possible to use highly accurate object detection algorithms without a heavy computational burden.
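The detect-every-N-frames loop can be sketched as follows. `detect` and `track` are hypothetical callables standing in for a real detector (such as YOLO) and a tracker, and N = 30 is an arbitrary choice to tune against the available compute.

```python
def run_detect_and_track(frames, detect, track, n=30):
    """Run the expensive detector on every n-th frame and the cheap tracker in between.

    detect(frame) -> list of boxes; track(frame, boxes) -> updated list of boxes."""
    boxes, per_frame = [], []
    for i, frame in enumerate(frames):
        if i % n == 0:
            boxes = detect(frame)          # fresh, accurate detections
        else:
            boxes = track(frame, boxes)    # propagate the previous boxes cheaply
        per_frame.append(boxes)
    return per_frame

# Hypothetical stand-ins that just record which frames reached the detector
detected_frames = []

def fake_detect(frame):
    detected_frames.append(frame)
    return [("person", frame)]

def fake_track(frame, boxes):
    return boxes

results = run_detect_and_track(range(100), fake_detect, fake_track, n=30)
print(detected_frames)   # [0, 30, 60, 90]
```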

4.1 Object Detection

Object detection is a computer vision technique that allows us to determine where the object is in an image/frame. Some object detection algorithms include Faster R-CNN, Single Shot Detectors (SSD), You Only Look Once (YOLO), etc.

Illustration of YOLO Object Detector Pipeline (Source)

Because YOLO is fast while remaining accurate, it is well suited to video/real-time object detection. The YOLOv3 model is pre-trained on the COCO dataset to detect 80 different classes, including people, cars, etc. Using transfer learning (where knowledge gained from solving one problem helps solve similar problems), the pre-trained model weights released by the Darknet team can be used for people counting, to detect persons in the frames of the video stream.

The following steps detect objects in each frame using the YOLO model and OpenCV –

1. Load the pre-trained YOLOv3 model using OpenCV’s DNN module.

2. Determine the output layer names of the YOLO model and construct a blob from the frame.

3. For each object detected in ‘layerOutputs’, keep the objects labeled ‘person’ to identify all the persons present in the video frame.
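The three steps above can be sketched with OpenCV’s DNN module as follows. The file names `yolov3.cfg` and `yolov3.weights` are placeholders for the Darknet files, and the 416×416 input size and 0.5 confidence threshold are common choices rather than values from this article. OpenCV is imported inside the functions that need it so that the parsing helper can be tried on its own.

```python
import numpy as np

PERSON_CLASS_ID = 0   # "person" is the first of the 80 COCO classes

def load_yolo(cfg_path="yolov3.cfg", weights_path="yolov3.weights"):
    """Step 1: load the Darknet YOLOv3 model with OpenCV's DNN module (paths are placeholders)."""
    import cv2  # imported here so person_boxes() can be used without OpenCV installed
    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    layer_names = net.getLayerNames()
    # Step 2a: names of the YOLO output layers. getUnconnectedOutLayers() returns
    # 1-based indices; flatten() copes with both the older (N, 1) and the flat return shape.
    out_idx = np.array(net.getUnconnectedOutLayers()).flatten()
    return net, [layer_names[i - 1] for i in out_idx]

def forward(net, output_layers, frame, size=416):
    """Step 2b: build a blob from a BGR frame and run it through the network."""
    import cv2
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (size, size), swapRB=True, crop=False)
    net.setInput(blob)
    return net.forward(output_layers)

def person_boxes(layer_outputs, frame_w, frame_h, min_conf=0.5):
    """Step 3: keep detections whose best class is 'person'.

    Each row is [cx, cy, w, h, objectness, 80 class scores] with box values
    relative to the frame size; returned boxes are [x, y, w, h] in pixels."""
    boxes, confidences = [], []
    for output in layer_outputs:
        for det in output:
            scores = det[5:]
            class_id = int(np.argmax(scores))
            conf = float(scores[class_id])
            if class_id == PERSON_CLASS_ID and conf > min_conf:
                cx, cy, w, h = det[0:4] * np.array([frame_w, frame_h, frame_w, frame_h])
                boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
                confidences.append(conf)
    return boxes, confidences
```

Usage, assuming the model files are available locally: `net, ln = load_yolo()`, then per frame `layerOutputs = forward(net, ln, frame)` and `boxes, confs = person_boxes(layerOutputs, frame.shape[1], frame.shape[0])`. Overlapping duplicate boxes can then be removed with `cv2.dnn.NMSBoxes(boxes, confs, 0.5, 0.3)`.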

4.2 Object Tracking

The persons found by the object detection algorithm are then tracked by an object tracking algorithm, which takes the input coordinates (x, y) of each person in the frame and assigns a unique ID to that person. The tracker follows each person through the video stream (across frames) by predicting the object’s new location in the next frame from factors such as gradient and speed. A few object tracking algorithms are Centroid Tracking, Kalman Filter tracking, etc.
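A centroid tracker, the simpler of the two algorithms named above, can be sketched as follows. This greedy version is an illustration under stated assumptions: boxes are (x, y, w, h) in pixels, each detection is matched to the nearest existing ID within a hypothetical `max_distance`, and IDs not seen in the current frame are dropped immediately (practical trackers usually keep them for a few frames).

```python
import math

class CentroidTracker:
    """Greedy nearest-centroid tracker that keeps one ID per person across frames."""

    def __init__(self, max_distance=50.0):
        self.next_id = 0
        self.objects = {}                    # ID -> (cx, cy) from the previous frame
        self.max_distance = max_distance     # pixels; farther detections get a new ID

    def update(self, boxes):
        """boxes: list of (x, y, w, h) person detections in the current frame."""
        unmatched = set(self.objects)
        current = {}
        for x, y, w, h in boxes:
            c = (x + w / 2.0, y + h / 2.0)
            best_id, best_dist = None, self.max_distance
            for obj_id in unmatched:
                d = math.dist(c, self.objects[obj_id])
                if d < best_dist:
                    best_id, best_dist = obj_id, d
            if best_id is None:              # nobody close enough: a new person
                best_id, self.next_id = self.next_id, self.next_id + 1
            else:
                unmatched.discard(best_id)
            current[best_id] = c
        self.objects = current               # IDs missing from this frame are dropped
        return current

tracker = CentroidTracker()
print(tracker.update([(0, 0, 10, 10), (100, 100, 10, 10)]))   # {0: (5.0, 5.0), 1: (105.0, 105.0)}
print(tracker.update([(103, 102, 10, 10), (2, 1, 10, 10)]))   # {1: (108.0, 107.0), 0: (7.0, 6.0)}
```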

Since a person’s position in the next frame is largely determined by their velocity and position in the current frame, the Kalman Filter tracking algorithm is used to track both previously seen and newly detected persons. The Kalman Filter models each person’s position and velocity and predicts their likely next position, representing the uncertainty with Gaussians. When it receives a new reading, it uses probabilities to match the measurement to its prediction and updates its estimate. Accordingly, each detection is assigned either to an existing ID or to a new unique ID. This blog explains the maths behind the Kalman Filter.
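A constant-velocity Kalman filter for one person can be sketched in NumPy as follows. The state is [x, y, vx, vy] and only (x, y) is measured; the noise levels and initial uncertainty are assumptions chosen for illustration, and the step that matches detections to predictions (for example, by nearest distance) is left out.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter for one tracked person; state = [x, y, vx, vy], measurement = [x, y]."""

    def __init__(self, x, y, dt=1.0, process_var=0.01, meas_var=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)   # velocity unknown at first
        self.P = np.eye(4) * 1000.0                            # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],                      # x <- x + vx * dt
                           [0, 1, 0, dt],                      # y <- y + vy * dt
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                       # only the position is observed
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var                       # process (motion) noise
        self.R = np.eye(2) * meas_var                          # detector (measurement) noise

    def predict(self):
        """Project the Gaussian estimate one frame ahead; returns the predicted (x, y)."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, zx, zy):
        """Blend a new detection into the estimate, weighted by the Kalman gain."""
        innovation = np.array([zx, zy], dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.state[:2]

# A person moving 1 px right and 2 px down per frame
kf = ConstantVelocityKalman(0.0, 0.0)
for t in range(1, 20):
    kf.predict()
    kf.update(t, 2 * t)
next_xy = kf.predict()
print(next_xy)   # close to [20, 40]
```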

4.3 Identifying the entry/exit area

To keep track of people entering/exiting a particular area of the store, the entry/exit area is specified in the video stream based on the CCTV camera angle, so that data on the customer journey and store traffic is collected accurately.

In the image below (the checkout counter), the yellow boundary specifies the entry/exit area, and the status of store traffic is updated accordingly.

A reference frame from the video (Source of the original video)
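Counting entries and exits then amounts to checking, for every tracked ID, whether its centroid moved from outside to inside the marked area or the other way round. A minimal sketch, assuming the area is a polygon in pixel coordinates; the class and function names are illustrative, and a person first seen inside the area is not counted as an entry.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: a point is inside if a ray to its right crosses the boundary an odd number of times."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):                          # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

class ZoneCounter:
    """Counts entries and exits when a tracked centroid crosses into or out of a polygon."""

    def __init__(self, polygon):
        self.polygon = polygon
        self.was_inside = {}        # track ID -> inside on the previous update?
        self.entries = 0
        self.exits = 0

    def update(self, track_id, cx, cy):
        inside = point_in_polygon(cx, cy, self.polygon)
        before = self.was_inside.get(track_id)          # None for a newly seen ID
        if before is False and inside:
            self.entries += 1
        elif before is True and not inside:
            self.exits += 1
        self.was_inside[track_id] = inside

# Hypothetical square entry/exit area and one person walking in and out again
zone = ZoneCounter([(0, 0), (10, 0), (10, 10), (0, 10)])
for cx, cy in [(15, 5), (5, 5), (6, 5), (15, 5)]:
    zone.update(track_id=1, cx=cx, cy=cy)
print(zone.entries, zone.exits)   # 1 1
```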

5. Key Challenges –

Some key challenges observed when capturing footfall data with an object detection and tracking framework are listed below –

  • Speed for real-time detection – Object detection and prediction must be fast enough to accurately capture the traffic in a store with frequently visiting customers.
  • The angle of CCTV Cameras – The camera must be positioned at an angle that accurately captures the footfall traffic.
  • Video Clarity – For object detection and tracking algorithms to accurately capture the people in a video, the quality of the video plays an important role. It should not be too blurry and should have proper lighting.

6. Conclusion

The need for Store Traffic Analytics has become apparent with the growing complexity of the industry. Retail businesses face fierce competition that pressures them to guarantee that the right products and services are available to their customers.

Collecting and analyzing data that accurately reveals customer behavior has therefore become a crucial part of the process.


Hotel Recommendation Systems: What are they and how to effectively build one?

What is a Hotel Recommendation System?

A hotel recommendation system aims at suggesting properties/hotels to a user such that they would prefer the recommended property over others.

Why is a Hotel Recommendation System required?

In today’s data-driven world, it would be nearly impossible to use the traditional heuristic approach to recommend, to millions of users, items that they would actually like and prefer.

A recommendation system solves this problem by incorporating the user’s input, historical interactions, and sometimes even the user’s demographics to build an intelligent model that provides recommendations.

Objective:

In this blog, we will cover all the steps required to build a Hotel Recommendation System for the problem statement below. We will do an end-to-end implementation, covering data understanding, data pre-processing, and the algorithms used, along with their PySpark code.

Problem Statement: Build a recommendation system providing hotel recommendations to users for a particular location they have searched for on xyz.com

What type of data are we looking for?

Building a recommendation system requires two kinds of data: explicit and implicit signals.

Explicit data is the user’s direct input, such as the filters (a 4-star rating or a preference for a pool) that a user applies while searching for a hotel. Information such as age, gender, and other demographics also counts as an explicit signal.

Implicit data can be obtained from users’ past interactions, for example, the average star rating the user prefers, the number of times the user has booked a particular hotel type (e.g., a romantic property), etc.

What data are we going to work with?

We are going to work with the following:

  1. Explicit signals where a user provides preferences for what type of amenities they are looking for in a property
  2. Historical property bookings of the user
  3. Users’ current search results, which may or may not tell us which hotel a user is currently interested in

Additionally, we have the property information table (hotel_info table), which looks like the following:

hotel_info table

Note: We could create more property types beyond the 4 above (Wi-Fi, couple, etc.), chosen so that as many properties as possible fall into at least one property type. For simplicity, however, we will continue with these 4 property types.

Data Understanding and Preparation:

Suppose the search data is in the following format:

user_search table

Understanding user_search table:

From this table, we can extract information about the user (User ID), the location they are searching in (Location ID), their check-in and check-out dates, the preferences applied while searching (Amenity Filters), the property they specifically looked at (Property ID), and whether they are about to book that property (Abandoned cart = ‘yes’ means that they have not yet made the booking and only the payment is left).

Clearly, we do not have all the information for every search a user makes, so we split the users into 3 categories: explicit users (users whose Amenity Filters column is not null), abandoned users (users whose Abandoned cart column is ‘yes’), and historical users (users for whom we have historical booking information).

Preparing the data:

To split the users into the 3 categories (explicit, abandoned, historical), we apply the following order of precedence: abandoned users > explicit users > historical users, for the following reasons:

The abandoned cart tells us which product the user was just about to purchase. Since the abandoned product reflects what the user prefers, we can exploit it to recommend similar products. Hence, abandoned users get the highest priority.

An explicit signal is input given directly by the user, who states their preferences through the Amenities column. Hence, explicit users come next in the order.

Splitting the users can be done following the steps below:

First, create a new column, user_type, that designates each user as one of the three types: abandoned, explicit, or historical.

Creating a user_type column can be done using the following logic:

from functools import reduce
from operator import or_
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, lit

spark = SparkSession.builder.getOrCreate()
df_user_searches = spark.read.parquet('xyz…….')

# Abandoned users: the cart was abandoned on a specific property
df_abandon = df_user_searches.filter((col('Abandon_cart') == 'yes') & col('Property_ID').isNotNull()).withColumn('user_type', lit('abandoned_users'))

# Explicit users: not abandoned, and at least one known amenity filter applied
amenities = ['Wifi Availibility', 'Nature Friendly', 'Budget Friendly', 'Couple Friendly']
has_amenity_filter = reduce(or_, [col('Amenity_Filters').like(f'%{a}%') for a in amenities])
df_explicit = df_user_searches.join(df_abandon.select('user_ID'), 'user_ID', 'left_anti').filter(has_amenity_filter).withColumn('user_type', lit('explicit_users'))

# Historical users: everyone not already labeled
labeled_ids = df_abandon.select('user_ID').union(df_explicit.select('user_ID')).distinct()
df_historical = df_user_searches.join(labeled_ids, 'user_ID', 'left_anti').withColumn('user_type', lit('historical_user'))

# unionByName matches columns by name, so column order after the joins does not matter
df_final = df_explicit.unionByName(df_abandon).unionByName(df_historical)

Now the user_search table also has the user_type column. Next, we prepare the user features for each type.

For explicit users, the user feature columns look like this:

explicit_users_info table

For abandoned users, after joining the Property ID from the user’s abandoned cart with the hotel_info table, the output looks as follows:

abandoned_users_info table

For historical users, aggregate over each user’s bookings to count how many times the user has booked each property type (and average the star rating); the data will look like the following:

historical_users_info table

For U4 in the historical_users_info table, the data tells us that the user prefers an average star rating of 4, has booked WiFi properties 5 times, and so on. Together, these values describe the user’s attribute preferences.
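In the pipeline, this aggregation would be a Spark `groupBy('user_ID')` with `F.sum` over the property-type flags and `F.avg` over the star rating. The plain-Python sketch below shows the same logic on a few hypothetical bookings; the input layout (one dict per booking, already joined with hotel_info) is an assumption.

```python
# Hotel attribute columns and the user-feature names used in historical_users_info
HOTEL_TO_USER = {
    "Wifi_Availibility": "wifi_flag",
    "Couple_Friendly": "couple_flag",
    "Budget_Friendly": "budget_flag",
    "Nature_Friendly": "nature_flag",
}

def historical_profile(bookings):
    """bookings: one dict per past booking with user_ID, the booked hotel's 0/1 attributes and Star_Rating.
    Returns, per user, how often each property type was booked and the average star rating."""
    totals = {}
    for b in bookings:
        t = totals.setdefault(b["user_ID"], {**{u: 0 for u in HOTEL_TO_USER.values()}, "stars": 0.0, "n": 0})
        for hotel_col, user_col in HOTEL_TO_USER.items():
            t[user_col] += b[hotel_col]          # one more booking of this property type
        t["stars"] += b["Star_Rating"]
        t["n"] += 1
    return {
        user: {**{u: t[u] for u in HOTEL_TO_USER.values()}, "avg_star_rating": t["stars"] / t["n"]}
        for user, t in totals.items()
    }

# Hypothetical booking history (already joined with hotel_info)
bookings = [
    {"user_ID": "U4", "Wifi_Availibility": 1, "Couple_Friendly": 0, "Budget_Friendly": 1, "Nature_Friendly": 0, "Star_Rating": 4},
    {"user_ID": "U4", "Wifi_Availibility": 1, "Couple_Friendly": 1, "Budget_Friendly": 0, "Nature_Friendly": 0, "Star_Rating": 5},
]
print(historical_profile(bookings))
# {'U4': {'wifi_flag': 2, 'couple_flag': 1, 'budget_flag': 1, 'nature_flag': 0, 'avg_star_rating': 4.5}}
```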

Building the Recommendation System:

Data at hand:

We have split the users into types and have each user’s preferences as user features.

We have the hotel attributes in the hotel_type table; assume it contains the following values:

hotel_type table

We will use content-based filtering to build our recommendation model. For each user type, we use the algorithm that gives the best results for that type. To gain a better understanding of recommendation systems and content-based filtering, refer here.

Note: We have to give recommendations based on the location searched by the user. Hence, we will perform a left join on the key Location ID to get all the properties that are there in the location.

Building the system:

For Explicit users, we will proceed in the following way:

We have user attributes such as wifi_flag, budget_flag, etc. Join these with the hotel_type table on the Location ID key to get all the properties in the location and their attributes.

Computing the Pearson correlation between the user’s and each hotel’s feature vectors gives a score in [-1, 1], which we use to rank the hotels and provide recommendations in that location.
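A small NumPy check on hypothetical vectors shows what the score means for the four binary flags, and why a zero-variance guard is needed: a user who selects all four filters (or none) has a constant feature vector, for which the Pearson correlation is undefined.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation with population standard deviations, as in the formula used below."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    a_std, b_std = a.std(), b.std()
    if a_std == 0 or b_std == 0:
        return None                                  # undefined for a constant vector
    return float(np.sum((a - a.mean()) * (b - b.mean())) / (a_std * b_std * len(a)))

# Hypothetical flags in the order [wifi, couple, budget, nature]
user = [1, 0, 1, 0]
hotel_a = [1, 0, 1, 0]     # same profile as the user
hotel_b = [0, 1, 0, 1]     # opposite profile
print(pearson(user, hotel_a), pearson(user, hotel_b))   # close to 1.0 and -1.0
print(pearson([1, 1, 1, 1], hotel_a))                   # None: the user selected every filter
```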

Code for explicit users:

import numpy as np
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import Window, functions as F
from pyspark.sql.functions import udf, desc
from pyspark.sql.types import FloatType

explicit_users_info = explicit_users_info.drop('Property_ID')

# Attach every hotel in the searched location to each explicit user
expli_dataset = explicit_users_info.join(hotel_type, ['location_ID'], 'left').drop('star_rating')

header_user_expli = ['wifi_flag', 'couple_flag', 'budget_flag', 'nature_flag']
header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly']

# Pack user flags and hotel attributes into two vector columns (same feature order)
assembler_features = VectorAssembler(inputCols=header_user_expli, outputCol='user_features')
assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol='hotel_features')
pipeline = Pipeline(stages=[assembler_features, assembler_features_2])
df_final = pipeline.fit(expli_dataset).transform(expli_dataset)

def pearson(a, b):
    """Pearson correlation of two ML vectors; None if either has zero variance."""
    a, b = a.toArray(), b.toArray()            # works for both dense and sparse vectors
    a_stdev, b_stdev = np.std(a), np.std(b)
    if a_stdev * b_stdev == 0:
        return None
    numerator = np.sum((a - np.average(a)) * (b - np.average(b)))
    return float(numerator / (a_stdev * b_stdev * len(a)))

pearson_sim_udf = udf(pearson, FloatType())

pearson_final = df_final.withColumn('pear_correlation_res', pearson_sim_udf('user_features', 'hotel_features'))

# Rank hotels per user, most correlated first
pearson_final.withColumn('recommendation_rank', F.row_number().over(Window.partitionBy('User_ID').orderBy(desc('pear_correlation_res')))).show()

Our output will look like the following:

explicit users

For abandoned and historical users, we will proceed as follows:

Using the data created above, i.e., the abandoned_users_info and historical_users_info tables, we obtain user preferences in the form of WiFi_Availibility or wifi_flag, star_rating or avg_star_rating, and so on.

Join these with the hotel_type table on the Location ID key to get all the hotels in the location and their attributes.

Compute the cosine similarity between the user’s and each hotel’s feature vectors to find the best hotels to recommend to the user in that location.
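The ranking step can be sketched in NumPy for a single user. The feature order [wifi, couple, budget, nature, star_rating] and all values are hypothetical. The example also shows a property worth keeping in mind: because star_rating is on a larger scale than the 0/1 flags, it dominates the cosine score, so hotel H2 scores above 0.9 despite sharing none of the user’s amenities.

```python
import numpy as np

def cosine(u, h):
    """Cosine similarity of two feature vectors; None if either is all zeros."""
    u, h = np.asarray(u, dtype=float), np.asarray(h, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(h)
    return float(u @ h / denom) if denom else None

# Hypothetical features in the order [wifi, couple, budget, nature, star_rating]
user = [1, 0, 1, 0, 4]                   # e.g. taken from the hotel in the abandoned cart
hotels = {"H1": [1, 0, 1, 0, 4], "H2": [0, 1, 0, 1, 5], "H3": [1, 0, 0, 0, 3]}

scores = {name: cosine(user, feats) for name, feats in hotels.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)                           # ['H1', 'H3', 'H2']
print(round(scores["H2"], 2))            # 0.91, despite sharing none of the user's amenities
```

One option is to rescale the star rating (for example, dividing it by 5) before computing the similarity; this is a suggestion, not part of the pipeline described here.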

Code for abandoned users:

import numpy as np
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import Window, functions as F
from pyspark.sql.functions import udf, desc
from pyspark.sql.types import FloatType

# Prefix the abandoned hotel's attributes so they don't clash with hotel_type columns after the join
abandoned_users_info = (abandoned_users_info.drop('Property_ID')
                        .withColumnRenamed('Wifi_Availibility', 'a_Wifi_Availibility')
                        .withColumnRenamed('Nature_Friendly', 'a_Nature_Friendly')
                        .withColumnRenamed('Budget_Friendly', 'a_Budget_Friendly')
                        .withColumnRenamed('Couple_Friendly', 'a_Couple_Friendly')
                        .withColumnRenamed('Star_Rating', 'a_Star_Rating'))

abandoned_dataset = abandoned_users_info.join(hotel_type, ['location_ID'], 'left')

header_user_aban = ['a_Wifi_Availibility', 'a_Couple_Friendly', 'a_Budget_Friendly', 'a_Nature_Friendly', 'a_Star_Rating']
header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly', 'Star_Rating']

assembler_features = VectorAssembler(inputCols=header_user_aban, outputCol='user_features')
assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol='hotel_features')
pipeline = Pipeline(stages=[assembler_features, assembler_features_2])
df_final = pipeline.fit(abandoned_dataset).transform(abandoned_dataset)

def cos_sim(value, vec):
    """Cosine similarity of two ML vectors; None if either is all zeros."""
    value, vec = value.toArray(), vec.toArray()
    denom = np.linalg.norm(value) * np.linalg.norm(vec)
    if denom == 0:
        return None
    return float(np.dot(value, vec) / denom)

cos_sim_udf = udf(cos_sim, FloatType())

# 'cosine_dis' holds a similarity (higher = better match), not a distance
abandon_final = df_final.withColumn('cosine_dis', cos_sim_udf('user_features', 'hotel_features'))

abandon_final.withColumn('recommendation_rank', F.row_number().over(Window.partitionBy('User_ID').orderBy(desc('cosine_dis')))).show()

Code for historical users:

# Pair every historical user with every hotel in the same location
historical_dataset = historical_users_info.join(hotel_type, ['location_ID'], 'left')

header_user_hist = ['wifi_flag', 'couple_flag', 'budget_flag', 'nature_flag', 'avg_star_rating']
header_hotel_features = ['Wifi_Availibility', 'Couple_Friendly', 'Budget_Friendly', 'Nature_Friendly', 'Star_Rating']

assembler_features = VectorAssembler(inputCols=header_user_hist, outputCol="user_features")
assembler_features_2 = VectorAssembler(inputCols=header_hotel_features, outputCol="hotel_features")

tmp = [assembler_features, assembler_features_2]
pipeline = Pipeline(stages=tmp)
baseData = pipeline.fit(historical_dataset).transform(historical_dataset)
df_final = baseData

def cos_sim(value, vec):
    # Convert Spark ML vectors (dense or sparse) to NumPy arrays
    value, vec = value.toArray(), vec.toArray()
    norm_product = np.linalg.norm(value) * np.linalg.norm(vec)
    # Cosine similarity is undefined for a zero vector; return None (null) instead
    if norm_product == 0:
        return None
    return float(np.dot(value, vec) / norm_product)

cos_sim_udf = udf(cos_sim, FloatType())

historical_final = df_final.withColumn('cosine_dis', cos_sim_udf('user_features', 'hotel_features'))

historical_final.withColumn('recommendation_rank', F.row_number().over(Window.partitionBy('User_ID').orderBy(desc('cosine_dis')))).show()

Our output will look like the following:

Figure: ranked recommendations for historical users

Figure: ranked recommendations for abandoned users

Giving Recommendations:

With 3 recommendations per user, our final output will look like the following:

Note:

  • Notice that we do not give hotel X as the first recommendation to abandoned user U1. U1's preference features were created from that same property ID, so X would always be ranked 1; we therefore skip it.

  • Unlike cosine similarity, where 0s are treated as a negative preference, Pearson correlation does not penalize the user when no input is given; hence, we use the latter for explicit users.
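The final "top 3 per user" step can be sketched in plain Python, independent of Spark: sort each user's hotels by similarity, optionally skip the property the user's features were built from (the hotel X case in the first note), and keep the first three. The function name `top_n_recommendations`, the nested-dict input, and the scores below are hypothetical illustrations. In the Spark code, a comparable step would be excluding the source property and then filtering the ranked DataFrame with something like `.filter(F.col('recommendation_rank') <= 3)`.

```python
def top_n_recommendations(scores, n=3, exclude=None):
    """Return up to n hotel IDs per user, highest similarity first.

    scores:  {user_id: {hotel_id: similarity}}  (hypothetical structure)
    exclude: optional {user_id: hotel_id} to skip, e.g. the property an
             abandoned user's features were built from
    """
    exclude = exclude or {}
    recs = {}
    for user, hotel_scores in scores.items():
        candidates = [
            hotel for hotel, score in hotel_scores.items()
            if score is not None and hotel != exclude.get(user)  # drop nulls and the source property
        ]
        candidates.sort(key=lambda hotel: hotel_scores[hotel], reverse=True)
        recs[user] = candidates[:n]
    return recs

# Hypothetical similarity scores for abandoned user U1
scores = {"U1": {"X": 1.0, "P": 0.87, "Q": 0.95, "R": 0.41, "S": 0.66}}
print(top_n_recommendations(scores, exclude={"U1": "X"}))  # {'U1': ['Q', 'P', 'S']}
```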

Conclusion:

In the end, the objective is to fully understand the problem statement, work with the data available, and provide recommendations with a nascent system.

Accelerate Your eCommerce Sales with Big Data and AI for 2021

The holiday season is the most exciting time of the year for businesses, and it has always driven some of the year's highest sales. In 2019, online holiday sales in the US alone reached $135.35 billion, and the average order value hit $152.95. After an unprecedented 2020, retailers are making bold moves to turn the tide in the new year.

A successful holiday strategy in 2021 requires much more than an online presence. To compete in one of the strangest seasons after the strangest year yet, brands are trying to create more meaningful connections with consumers, offer hyper-personalized online experiences, and ensure that holiday shoppers experience nothing short of pure convenience and peace of mind.

In 2020, retailers faced some novel challenges and many unknowns. To begin with, here are a few key things that could not be ignored:

  • Customer behaviors changed significantly during the pandemic, and expectations have only grown since
  • Gen Z and Millennial shoppers, who have the most purchasing power, became focused on sustainability and peace of mind
  • The ecommerce industry saw five years of digital transformation in two months, courtesy of the pandemic. Immersive, cutting-edge technologies like voice-aided shopping, AI-assisted browsing, and machine learning were no longer seen as optional; they became must-haves for delivering a superior customer experience

Here are ten ways big data and AI are helping businesses accelerate ecommerce sales:

1. Hyper-Personalized product recommendations through Machine Learning

Providing people with exactly what they want is the best way to attract new customers and retain existing ones, so intelligent systems that surface the products or services people are inclined to buy are a natural fit. Data and machine learning play a big role here, helping businesses put the right offers in front of the right customers at the right time. Research suggests that serving relevant product recommendations can have a sizable impact on sales: according to one study, 45% of customers say they are likely to shop on a site that anticipates their choices, while 56% are more likely to return to such a site. Smart AI systems allow a deep dive into buyer preferences and sentiment, helping retailers and e-commerce companies offer customers exactly what they might be looking for.

2. Enabling intelligent search leveraging NLP

The whole point of effective search is to understand user intent correctly and deliver exactly what the customer wants. More and more companies are using modern, customer-centric search powered by AI, which aims to interpret queries the way a human would. It uses advanced image and video recognition and natural language processing tools to continually improve and contextualize results, which ultimately helps companies convert more leads.

3. One-to-one marketing using advanced analytics

With one-to-one marketing, retailers take a more targeted approach to delivering a personalized experience than product recommendations or intelligent search alone provide. Data like page views and clickstream behavior forms the foundation of one-to-one marketing. As this data is harvested and processed, commonalities emerge that correspond to broad customer segments. As the data is refined further, a clearer picture emerges of each individual's preferences and 360° profile, which informs real-time action by the retailer.

4. Optimized pricing using big data

There are numerous variables that influence a consumer's decision to purchase something: product seasonality, availability, size, color, and so on. But many studies zero in on price as the number one factor in determining whether a customer will buy a product.

Pricing has traditionally been handled by analysts diving deep into reams of data. Today, big data and machine learning methods help retailers accelerate that analysis and set optimized prices, often several times a day. This keeps the price low enough not to turn off potential buyers or cannibalize other products, but high enough to ensure a healthy profit.

5. Product demand forecasting and inventory planning

In the initial months of the pandemic, many retailers had their inventory of crucial items like face coverings and hand sanitizers exhausted prematurely. In certain product categories, the supply chains could not recover soon enough, and some have not even recovered yet. Nobody could foretell the onslaught of the coronavirus and its impending shadow on retailers, but the disastrous episode that followed sheds urgent light on the need for better inventory optimization and planning in the consumer goods supply chain.

Retailers and distributors who adopted machine learning-based approaches to supply chain planning early fared better than competitors who continued to depend solely on analysts. With a working model in place, the data led to smarter decisions. Adding external data sources such as social media data (Twitter, Facebook), macroeconomic indicators, and market performance data (stocks, earnings, etc.) to the forecasting model, alongside historical inventory data and its seasonal changes, helps determine product demand patterns more accurately.

6. Blending digital and in-store experiences through omnichannel ecommerce offerings

The pandemic has pushed many people who would normally shop in person to shop online instead. Retailers are considering multiple options for getting goods into customers' hands, including contactless transactions and curbside pickup. These omnichannel fulfillment patterns existed before the coronavirus struck, but they have greatly accelerated under COVID-19. AI is helping retailers speed up innovations such as e-commerce offerings, blended digital and in-store experiences, curbside pickup and faster delivery options, and contactless delivery and payments.

7. Strengthening cybersecurity and fighting fraud using AI

Fraud is always a threat around the holidays, and with the COVID-19 pandemic and the subsequent shift of everything online, fraud levels have jumped by 60% this season. An increase in card-not-present transactions encourages fraudsters to abuse compromised cards. Card skimming, lost and stolen cards, phishing scams, account takeovers, and application fraud open other loopholes for exploitation. In all, fraudsters are projected to take about 5.5% more from innocent customers this year. In response, card issuers and merchants armed with machine learning and AI are analyzing huge volumes of transactions, identifying attempted fraud, and automating the response to it.

8. AI-powered chatbots for customer service

Chatbots that automatically respond to repetitive and predictable customer requests are one of the fastest-growing applications of big data and AI. Thanks to advances in NLP and natural language generation, chatbots can now understand increasingly complex and nuanced written and spoken queries. These smart assistants are already saving companies millions of dollars per year by supporting human customer service reps: resolving issues with purchases, facilitating returns, helping customers find stores, answering repetitive questions about hours of operation, and so on.

9. AI guides for enabling painless gift shopping

As this is the busiest time of the year, when customers throng websites and stores for gift shopping, gaps in customer service can seriously confuse and discourage an already indecisive shopper. Interactive AI-powered gift finders engage shoppers in a conversation, asking a few questions about the gift recipient's personality and immediately suggesting gift ideas, so that even the most unsettled shopper can find the perfect gift with little wavering. This helps customers overcome choice paralysis and indecision and helps companies boost conversions, benefiting both sides of the transaction.

10. AR systems for augmented shopping experience

AR is taking the eCommerce shopping and customer experience to the next level. From visual merchandising to hyper-personalization, augmented reality offers several advantages. In its 2019 predictions, Gartner said that by 2020 up to 100 million consumers would use augmented reality when shopping, and the prediction came true. The lockdowns and isolation necessitated by COVID-19 rapidly increased the demand for AR systems.

Based on the “try-before-you-buy” approach, augmented shopping appeals to customers by allowing them to interact with their choice of products online before they proceed to buy any. For instance, AR is helping buyers visualize what their new furniture will look and feel like by moving their smartphone cameras around the room in real-time and getting a feel of the size of the item and texture of the material for an intimate understanding before purchase. In another instance, AR is helping women shop for makeup by providing them with a glimpse of the various looks on their own face at the click of a button.

To survive the competitive eCommerce landscape and meet this year's holiday revenue goals, merchants and retailers are challenging the status quo and adopting AI-powered technology to meet customer expectations. AI is truly the future of retail, and not leveraging artificial intelligence, machine learning, and related technology means losing out.

Recommendation Systems for Marketing Analytics

The way I see it, recommendation systems do what traditional shopkeepers have always done.

Remember going shopping with your mother as a child, always to the same shop? The shopkeeper gave the best product recommendations, and we kept buying from that shop because we knew the shopkeeper knew us best.

What the shopkeeper did was understand our tastes, our priorities, and the price range we were comfortable with, and then present the products that best matched our requirements. This is exactly what businesses are doing now.

They want to know each customer personally through their browsing behaviour and then recommend products the customer might like; the only difference is that they want to do it at a large scale.

For example, Amazon and Netflix learn your behaviour from what you browse, add to your basket, and order, and from the movies you watch and like, and then recommend the products you are most likely to like.

In a nutshell, they combine business understanding with some mathematics to learn which products a customer likes.

In short, a recommendation system for marketing analytics is a subclass of information filtering system that looks for similarities between users and items in different combinations.

Below are some of the most widely used types of recommendation systems:

  1. Collaborative Recommendation system
  2. Content-based Recommendation system
  3. Demographic based Recommendation system
  4. Utility based Recommendation system
  5. Knowledge based Recommendation system
  6. Hybrid Recommendation system

Let us look at the ones most widely used in industry:

  • Content Based Recommendation System

Content-based recommendation requires knowing the content of both the user and the item. Usually we construct a user profile and an item profile over a shared attribute space. Product attributes such as those captured in images (size, dimensions, colour, etc.) and the product's text description are the typical inputs to content-based recommendation.

In other words, based on the content I watch on Netflix, an algorithm can find the most similar movies and then recommend them.

For example, when you open Amazon and search for a product, similar products pop up below it. This is the item-item similarity Amazon has computed across its catalogue, and it gives a simple yet effective picture of how products relate to each other.

Bread and butter could be considered similar products in the true sense, because they go together, even though their attributes differ. In the movie industry, features like genres and reviews can tell us which movies are similar, and that is the kind of similarity we compute for movies.
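The item-item idea can be illustrated with a minimal pure-Python sketch: each movie is described by a vector of attributes (here hypothetical one-hot genre flags), and the movies most similar to a given one are those whose vectors have the highest cosine similarity. The titles, genres, and the helper names `cosine` and `similar_items` are illustrative assumptions, not Amazon's or Netflix's actual method.

```python
from math import sqrt

# Hypothetical one-hot genre vectors: [action, comedy, drama, sci-fi]
movies = {
    "Movie A": [1, 0, 0, 1],
    "Movie B": [1, 0, 0, 1],
    "Movie C": [1, 1, 0, 0],
    "Movie D": [0, 0, 1, 0],
}

def cosine(u, v):
    """Cosine similarity of two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_items(title, catalogue, k=2):
    """Rank every other item by the similarity of its attributes to `title`."""
    target = catalogue[title]
    scored = [(other, cosine(target, vec)) for other, vec in catalogue.items() if other != title]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

print(similar_items("Movie A", movies))  # [('Movie B', 1.0), ('Movie C', 0.5)]
```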

  • Collaborative Recommendation System:

Collaborative algorithms use “User Behaviour” to recommend items. They exploit the behaviour of other users and items in terms of transaction history, ratings, selections, purchase information, etc. In this case, the features of the items are not known.

When you ignore product features in computing the similarity score and instead look only at how users interact with products, the approach is called collaborative.

From the interactions between users and products, we work out which products are similar and then choose a recommendation strategy to target the audience.

Two users who watched the same movie on Netflix can be considered similar; when the first user watches another movie, the second user gets that movie as a recommendation, based on their shared tastes.
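The Netflix example above is user-user collaborative filtering, and a minimal pure-Python sketch of it follows. Item features are never used: two users are compared only through the items they interacted with, and each item the target user has not seen is scored by the similarity of the users who did interact with it. The users, movies, and function names are hypothetical. Note that "carol", who shares no history with anyone, receives no recommendations, which previews the cold start problem discussed next.

```python
from math import sqrt

# Hypothetical binary watch history: 1 = watched
watched = {
    "alice": {"Movie A": 1, "Movie B": 1, "Movie C": 1},
    "bob":   {"Movie A": 1, "Movie B": 1},
    "carol": {"Movie D": 1},
}

def user_similarity(u, v):
    """Cosine similarity between two users' interaction vectors."""
    common = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in common)
    norm = sqrt(sum(x * x for x in u.values()) * sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend_for(user, history, k=1):
    """Score unseen items by summing the similarity of users who interacted with them."""
    scores = {}
    for other, items in history.items():
        if other == user:
            continue
        sim = user_similarity(history[user], items)
        for item, value in items.items():
            if item not in history[user]:
                scores[item] = scores.get(item, 0.0) + sim * value
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, score in ranked if score > 0][:k]

print(recommend_for("bob", watched))    # ['Movie C']
print(recommend_for("carol", watched))  # []
```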

  • Hybrid Recommendation System:

Combining two or more of these systems in a way that suits the industry is known as a hybrid recommendation system. It combines the strengths of multiple recommendation systems and mitigates weaknesses that exist when only one recommendation system is used.

When we use only collaborative filtering, we face what is called the “cold start” problem. Because we rely on users' interactions with products, when a user visits the website for the first time, we have no interactions for that customer and hence no recommendations to make.

To eliminate this problem, we use hybrid recommendation systems, which combine content-based and collaborative systems to get around the cold start problem. Think of it this way: item-item, user-user, and user-item interactions are all combined to give users the best recommendations and give the business more value.
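One simple way to combine the two, sketched below, is a weighted hybrid: blend a collaborative score and a content-based score per item, giving the collaborative part more weight as the user accumulates interactions. A brand-new user (0 interactions) is then served purely content-based scores, which softens the cold start problem. This weighting scheme, the `full_weight_at` parameter, and all scores are illustrative assumptions; it is not how LightFM combines signals (LightFM works through feature embeddings).

```python
def hybrid_scores(collab, content, n_interactions, full_weight_at=5):
    """Blend collaborative and content-based scores for one user.

    collab, content: {item_id: score}
    The collaborative weight alpha grows linearly with the number of
    interactions and reaches 1.0 at `full_weight_at` interactions.
    """
    alpha = min(n_interactions / full_weight_at, 1.0)
    items = set(collab) | set(content)
    return {
        item: alpha * collab.get(item, 0.0) + (1 - alpha) * content.get(item, 0.0)
        for item in items
    }

# Hypothetical scores for two products
collab = {"Product P": 0.9, "Product Q": 0.1}
content = {"Product P": 0.2, "Product Q": 0.8}

new_user = hybrid_scores(collab, content, n_interactions=0)
regular_user = hybrid_scores(collab, content, n_interactions=10)
print(max(new_user, key=new_user.get))          # Product Q (content-based only)
print(max(regular_user, key=regular_user.get))  # Product P (collaborative only)
```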

From here, we will focus on hybrid recommendation systems and introduce a powerful Python library called LightFM, which makes this implementation very easy.

LightFM:

The official documentation and source code can be found on GitHub at lyst/lightfm (github.com).

LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.

User and item latent representations are expressed in terms of their features' representations.

It also makes it possible to incorporate both item and user metadata into the traditional matrix factorization algorithms. When multiplied together, these representations produce scores for every item for a given user; items scored highly are more likely to be interesting to the user.
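To make this concrete, here is a minimal pure-Python sketch of the scoring rule as described above: a user (or item) is represented by the weighted sum of its features' embeddings plus a summed bias, and the score for a user-item pair is the dot product of the two representations plus both biases. This is not LightFM's own code; the feature names, embeddings, and biases are hypothetical, and in LightFM they would be learned during training. The `user:42` and `item:mug` entries play the role of per-user and per-item indicator features.

```python
# Hypothetical 2-dimensional feature embeddings and biases (learned in a real model)
feature_embeddings = {
    "user:42": [0.1, 0.3],
    "country:UK": [0.2, -0.1],
    "item:mug": [0.4, 0.0],
    "category:kitchen": [0.1, 0.5],
}
feature_biases = {"user:42": 0.0, "country:UK": 0.1, "item:mug": -0.2, "category:kitchen": 0.3}

def represent(weights):
    """Latent vector and bias of a user or item: weighted sums over its features."""
    dim = len(next(iter(feature_embeddings.values())))
    vector = [0.0] * dim
    bias = 0.0
    for feature, w in weights.items():
        for k, value in enumerate(feature_embeddings[feature]):
            vector[k] += w * value
        bias += w * feature_biases[feature]
    return vector, bias

def score(user_weights, item_weights):
    """Dot product of the two representations plus both biases; higher means more relevant."""
    q_u, b_u = represent(user_weights)
    p_i, b_i = represent(item_weights)
    return sum(a * b for a, b in zip(q_u, p_i)) + b_u + b_i

user = {"user:42": 1.0, "country:UK": 1.0}          # one row of user_features
item = {"item:mug": 1.0, "category:kitchen": 1.0}   # one row of item_features
print(score(user, item))  # approximately 0.45
```

For ranking, only the order of these scores matters, so items are simply sorted by score for each user.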

Interactions: the matrix containing user-item interactions.

User_features: each row contains that user's weights over features.

Item_features: each row contains that item's weights over features.

Note: these matrices should be sparse (a sparse matrix is one in which most elements are zero).

Predictions

fit_partial: fit the model. Unlike fit, repeated calls to this method resume training from the current model state.

It is mainly useful for updating the model as new interactions are appended to the training matrix.

predict: compute the recommendation score for user-item pairs.

The scores are sorted in descending order, and the top n items are recommended.

Model evaluation

AUC Score: in the binary case (clicked/not clicked), the AUC score has a nice interpretation: it expresses the probability that a randomly chosen positive item (an item the user clicked) will be ranked higher than a randomly chosen negative item (an item the user did not click). Thus, an AUC of 1.0 means that the ranking is perfect: no negative item is ranked higher than any positive item.

Precision@K: Precision@K measures the proportion of positive items among the K highest-ranked items. As such, it focuses on ranking quality at the top of the list: it does not matter how good or bad the rest of the ranking is, as long as the first K items are mostly positive.

For example, if only one of your top 5 items is correct, your precision@5 is 0.2.

Note: if the first K recommended items are no longer available (say, they are out of stock) and you need to move further down the ranking, a high AUC score gives you confidence that your ranking is of high quality throughout.
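Both metrics can be computed by hand for a single user's ranking, as in the minimal pure-Python sketch below (LightFM ships its own versions as `precision_at_k` and `auc_score` in `lightfm.evaluation`, which work on a fitted model and interaction matrices). The ranking and the clicked set are hypothetical; the first result reproduces the precision@5 = 0.2 example above.

```python
def precision_at_k(ranked_items, relevant, k):
    """Fraction of the top-k ranked items that are positive (relevant)."""
    return sum(1 for item in ranked_items[:k] if item in relevant) / k

def auc(ranked_items, relevant):
    """Probability that a random positive item is ranked above a random negative one."""
    positives = [pos for pos, item in enumerate(ranked_items) if item in relevant]
    negatives = [pos for pos, item in enumerate(ranked_items) if item not in relevant]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative item")
    # A lower position index means a higher rank
    correctly_ordered = sum(1 for p in positives for n in negatives if p < n)
    return correctly_ordered / (len(positives) * len(negatives))

ranking = ["a", "b", "c", "d", "e", "f"]  # hypothetical ranking, best first
clicked = {"c", "f"}                      # items the user actually clicked

print(precision_at_k(ranking, clicked, 5))  # 0.2
print(auc(ranking, clicked))                # 0.25
```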

Enough theory; let us move to the code and see how the LightFM implementation works:

I have taken the dataset from Kaggle; you can download it below:

E-Commerce Data: actual transactions from a UK retailer (www.kaggle.com)

I hope you liked the coding part and are ready to implement it yourself. One possible enhancement, if you have product and user features, is to use them as well.

These features can also be passed as inputs to the LightFM model, so that the embeddings it learns are based on all those attributes. In general, the more relevant data you feed LightFM, the more the model has to learn from, which can improve its accuracy.

That’s all from my end for now. Keep Learning!! Keep Rocking!!

Python Newbie – Doctest

Any software development process consists of five stages:

  1. Requirement Analysis
  2. Design
  3. Development
  4. Testing
  5. Maintenance

Though every stage above is important in the software development life cycle (SDLC), this post focuses on testing and shows how we can use doctest, a Python module, to perform it.

Importance of testing

We all make mistakes and if left unchecked, some of these mistakes can lead to failures or bugs that can be very expensive to recover from. Testing our code helps to catch these mistakes or avoid getting them into production in the first place.

Testing therefore is very important in software development.

Used effectively, tests help identify bugs, ensure the quality of the product, and verify that the software does what it is meant to do.

Python module: doctest

Doctest helps you test your code by running examples embedded in the documentation and verifying that they produce the expected results. It works by parsing the help text to find examples, running them, then comparing the output text against the expected value.

To make things easier, let us start with a simple example.

Python inline function

In the snippet above, I have written a basic inline function that adds a number to itself.

For this function, I manually run a couple of test cases as a basic sanity check.
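Since the original snippet is shown only as an image, here is a hypothetical reconstruction of what it describes: an inline (lambda) function that adds a number to itself, followed by the kind of manual checks one would type into the interpreter. The name `add_to_itself` is an assumption.

```python
# Hypothetical reconstruction: an inline (lambda) function that adds a number to itself
add_to_itself = lambda number: number + number

# Manual sanity checks, comparing each printed result by eye
print(add_to_itself(2))    # 4
print(add_to_itself(-3))   # -6
print(add_to_itself(1.5))  # 3.0
```

Checking these outputs by eye every time the code changes is exactly the chore doctest automates.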

Now consider a scenario in which Python reads the output above and performs the sanity check for us at run time. This is exactly the idea behind doctest.

Now, let’s see how we can implement one.

Let’s take a simple example that calculates which day of the week it will be a given number of days after the current weekday. We write a docstring for our function that explains what it does, what inputs it takes, and so on. In this docstring, I have added a couple of test cases that the doctest module reads and runs when testing is carried out.
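The original script is an image, so the following is a hypothetical reconstruction of the idea: a function (named `day_after` here, an assumption) whose docstring contains `>>>` examples with their expected output. Running the file directly calls `doctest.testmod`, which finds those examples, executes them, and compares the actual output with the expected text.

```python
import doctest

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def day_after(current_day, days_ahead):
    """Return the weekday that falls days_ahead days after current_day.

    current_day is a weekday name and days_ahead an integer.

    >>> day_after("Monday", 3)
    'Thursday'
    >>> day_after("Saturday", 2)
    'Monday'
    >>> day_after("Wednesday", 14)
    'Wednesday'
    """
    index = DAYS.index(current_day)
    # Wrap around the week with modulo 7
    return DAYS[(index + days_ahead) % 7]

if __name__ == "__main__":
    # Runs every >>> example in this module's docstrings; verbose=True reports each one
    doctest.testmod(verbose=True)
```

Saved as, say, `weekday.py` and run with `python weekday.py`, it reports each example as it runs; without `verbose=True`, doctest stays silent unless something fails.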

Implementation of doctest

When we run the above script from the command line, we get the following output:

Doctest Output

We can see from the above snippet that all test cases in the docstring passed, as the actual outcomes matched the expected outcomes.

But what happens if any test fails, or the script does not behave as expected?

To test this, we add a failing test case. We know our script only takes integers as input, so what happens if we give it a string? Let us check.

Test case with strings as input

I have used the same script, but changed the test case so that it passes strings as input.

So, what is the outcome for the above test case?

Failed test case output

Voilà! doctest clearly identifies the failed test case and reports the reason for its failure.
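If rejecting strings is the intended behaviour, doctest can also check for that explicitly: an example whose expected output is a traceback passes when the call raises the matching exception. The sketch below is a hypothetical variant of the weekday function (the `day_after` name and the error message are assumptions) that raises a `TypeError` for non-integer input and documents it. doctest ignores the indented stack lines (written as `...`) and compares only the final exception line.

```python
import doctest

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def day_after(current_day, days_ahead):
    """Return the weekday that falls days_ahead days after current_day.

    >>> day_after("Monday", 3)
    'Thursday'

    Strings are rejected, and the docstring records that as expected:

    >>> day_after("Monday", "3")
    Traceback (most recent call last):
        ...
    TypeError: days_ahead must be an int, got str
    """
    if not isinstance(days_ahead, int):
        raise TypeError(f"days_ahead must be an int, got {type(days_ahead).__name__}")
    return DAYS[(DAYS.index(current_day) + days_ahead) % 7]

if __name__ == "__main__":
    doctest.testmod()  # silent when every example passes
```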

Conclusion

That is all you need to get started with doctest. In this post, we have seen the power of doctest and how much easier it makes automated testing for most scripts. Many developers find doctest easier than unittest because, in its simplest form, there is no API to learn before using it. However, as the examples become more complex, the lack of fixture management can make writing doctest tests more cumbersome than using unittest. Still, thanks to its ease of use, doctest is worth adding to our code.

In upcoming blogs, we will discuss more handy Python modules that can ease our tasks, and we will also dive into other concepts like Machine Learning and Deep Learning. Until then, keep learning!

Recommendation Systems for Beginners

Why do we need a recommendation system?

Let us take the simplest and most relatable example: the e-commerce giant Amazon. When we shop on Amazon, it offers bundles and products that are usually bought together with the product we are about to buy. For example, if you are about to buy a smartphone, it recommends that you buy a back cover as well.

For a second let us think and try to figure out what Amazon is trying to do in the figure below:

What does a recommendation system do?

A recommendation system recommends you products or items that can be of your interest or liking. Let’s take another example:

It’s quite easy to notice that they are trying to sell the accessories generally needed with a camera (a memory card and a case). The main question is how they do this for the millions of items listed on their website. This is where a recommendation system comes in handy.

When we first set up our Netflix account, it asks about our preferences: which movies or TV shows we are most likely to watch, or which genre is our favorite. As the first layer of recommendation, Netflix recommends movies or shows similar to the input we provided. Once we become frequent users, it has gathered enough data to make more accurate recommendations based on our preferences for genres, cast, star rating, and so on.

The ultimate aim is to recommend an item that the user will watch, or buy in the case of Amazon. This in turn keeps the user engaged with the platform and helps maintain customer lifetime value (CLTV).

Objective of this blog

By the end of this blog, you will have a basic understanding of how to approach building a recommendation system. To make things more lucid, let us take an example and build a hotel recommendation system. In the process, we will cover data understanding and the algorithms that can be used to build a nascent recommendation engine. We will draw analogies with everyday products like Amazon and Netflix for a clearer understanding.

Understanding the data required for building a recommendation system

To build a recommendation system, we must be clear about the problem statement and the end objective in order to provide accurate recommendations. For example, consider the following scenarios:

  1. Providing a user with a hotel recommendation based on his/her current search and historical behavior (giving a recommendation knowing that a user is looking for a hotel in Las Vegas and prefers hotels with casinos).
  2. Providing a hotel recommendation based on the user’s historical behavior, targeting those users who are not actively engaged (searching) but can be incentivized towards making a booking by targeting through a relevant recommendation (a general recommendation can be based on metrics such as a user’s historical star rating preference or historical budget preference).

These are two different objectives, and hence the approaches to achieving them differ.

One must be aware of what type of data is available and also needs to know how to leverage that data to proceed towards building a recommendation engine.

There are two types of data which are of importance in our use case:

Explicit Data:

Explicit signals, or explicit input, are where a user directly gives feedback on a particular item or product. This can be a star value, say in the range of 1 to 5, or just a binary 1 (like) or 0 (dislike). For example, when we rate an item on Amazon or a movie on IMDb, we are giving explicit signals: direct feedback on an item. Keep in mind that not every individual rates the same way; for an item X, user A and user B can give different ratings. User A may be generous and give item X 5 stars, whereas user B is a critic who gives item X 3.5 stars and reserves 5 stars for exceptional items.
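A common way to account for generous and critical raters (a standard technique, not one the original post names) is mean-centering: subtract each user's average rating, so a score expresses how much the user liked an item relative to their own habits. The ratings below are hypothetical, and `mean_center` is an illustrative helper.

```python
# Hypothetical ratings on a 1-5 star scale
ratings = {
    "User A": {"Item X": 5.0, "Item Y": 5.0, "Item Z": 4.0},  # generous rater
    "User B": {"Item X": 3.5, "Item Y": 2.5, "Item Z": 1.5},  # critical rater
}

def mean_center(user_ratings):
    """Subtract each user's average rating so scores express relative preference."""
    centered = {}
    for user, items in user_ratings.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: rating - mean for item, rating in items.items()}
    return centered

centered = mean_center(ratings)
print(round(centered["User A"]["Item X"], 2))  # 0.33: 5 stars is only slightly above A's average
print(round(centered["User B"]["Item X"], 2))  # 1.0: 3.5 stars is well above B's average
```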

In our hotel recommendation use case, the filters a user applies while searching for a hotel, such as swimming pool or WiFi, are explicit signals: the user is explicitly saying that they are interested in properties with WiFi and a swimming pool.

Additionally, explicit data is sparse in most cases, since a user cannot realistically rate every item. I have not seen every movie on Netflix, so I can only give feedback on the movies I have seen. These ratings reflect how much a user likes or approves of an item.

Implicit Data:

Implicit signals are obtained by capturing a user’s interactions with items. They can be continuous values, like the number of times a user has clicked on an item or watched an action movie, or binary values, like clicked or not clicked. For example, while scrolling through amazon.com, the time spent viewing an item or the number of times you clicked on it can act as implicit feedback.

Hotel recommendations with implicit signals work similarly. Suppose we have user A's historical hotel bookings, and in 4 of the user's 5 bookings the property was near the beach. This acts as an implicit signal that user A prefers hotels near the beach.
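Turning booking history into preference features can be sketched as follows: for each attribute, compute the share of bookings that had it and set a flag when the share exceeds a threshold, and average the star ratings. This is similar in spirit to the wifi_flag and avg_star_rating columns used for historical users, but the bookings, the attribute names, and the 0.5 threshold are hypothetical choices for illustration.

```python
# Hypothetical booking history for user A: one dict of property attributes per booking
bookings = [
    {"near_beach": 1, "wifi": 1, "star_rating": 4},
    {"near_beach": 1, "wifi": 0, "star_rating": 3},
    {"near_beach": 1, "wifi": 1, "star_rating": 5},
    {"near_beach": 0, "wifi": 0, "star_rating": 4},
    {"near_beach": 1, "wifi": 0, "star_rating": 4},
]

def implicit_preferences(history, flags, threshold=0.5):
    """Derive preference flags plus an average star rating from repeated behaviour.

    A flag is 1 when the share of bookings with that attribute exceeds
    `threshold` (an arbitrary illustrative cut-off).
    """
    n = len(history)
    prefs = {}
    for flag in flags:
        share = sum(booking[flag] for booking in history) / n
        prefs[flag] = 1 if share > threshold else 0
    prefs["avg_star_rating"] = sum(booking["star_rating"] for booking in history) / n
    return prefs

print(implicit_preferences(bookings, ["near_beach", "wifi"]))
# {'near_beach': 1, 'wifi': 0, 'avg_star_rating': 4.0}
```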

Types of Recommendation Systems

Let us take a specific example given below to further explain the recommendation models:

While building a hotel recommendation system, we have users' explicit and implicit signals. However, we do not have every signal for every user: for a set of users E we have explicit signals, and for a set of users I we have implicit signals.

Further, let us assume that a hotel property has the following attributes:

WiFi Availability, Couple Friendly, Budget Friendly and Nature Friendly (closer to nature)

For simplicity, let us assume that these are flags, such that if a property A has WiFi in it, the WiFi availability column will be 1. Hence our hotels data will look something like the following:

Let us name this table Hotel_Type for further use.

Content Based Filtering:

This technique is used when the user provides explicit signals, or when we have user and item attributes along with the user's interactions with those items. The objective is to show items or products similar to the ones a person has already purchased or shown a liking for, or, in another case, to show a user a product when they explicitly say they are looking for something in particular. In our example, suppose you are booking a hotel on xyz.com and you apply a filter for couple-friendly properties. You are explicitly saying that you want a couple-friendly property, so xyz.com shows you couple-friendly properties. Similarly, when we have explicit signals from a user, we find the items that best match those signals and recommend accordingly.

Model Algorithms:

Cosine Similarity: It is a measure of similarity between two non-zero vectors. The values range from -1 to 1; when all vector entries are non-negative, as with our binary flags, they range from 0 to 1. Here the vectors can represent either users or items.
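For reference, the standard definition of cosine similarity between two non-zero vectors u and v (for example, two rows of the Hotel_Type table) is:

```latex
\cos(\mathbf{u}, \mathbf{v})
  = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
  = \frac{\sum_{i} u_i v_i}{\sqrt{\sum_{i} u_i^{2}} \, \sqrt{\sum_{i} v_i^{2}}}
```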

Let us take an example. Assume that user A has specifically shown interest in property X from the Hotel_Type table (the user has previously booked property X or has searched for it multiple times recently), and we now have to recommend properties similar to property X. To do so, we compute the similarity between property X and each of the other properties in the table.

We can see that property Q has the highest similarity with property X, followed by property P. So, knowing that user A prefers property X, we will recommend property Q to him.
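A minimal Python sketch of this computation follows. The flag vectors are hypothetical, in the column order WiFi, Couple Friendly, Budget Friendly, Nature Friendly, and were picked so that the ranking matches the description above; they are not the article's actual table values.

```python
import math

# Hypothetical Hotel_Type rows: (WiFi, Couple Friendly, Budget Friendly, Nature Friendly).
HOTEL_TYPE = {
    "X": [1, 1, 0, 1],
    "P": [1, 0, 0, 1],
    "Q": [1, 1, 1, 1],
    "S": [0, 0, 1, 0],
}


def cosine_similarity(u, v):
    """Dot product divided by the product of the norms; assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(table, target):
    """Other properties ranked by cosine similarity to `target`, highest first."""
    scores = {
        name: cosine_similarity(table[target], flags)
        for name, flags in table.items()
        if name != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, score in most_similar(HOTEL_TYPE, "X"):
    print(name, round(score, 3))
# Q 0.866
# P 0.816
# S 0.0
```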

Pearson Correlation: It is a measure of linear correlation between two variables. The values range from -1 to 1.
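For two variables x and y observed over n paired values, the Pearson correlation coefficient is defined as below; it equals the cosine similarity of the two vectors after each has been centered on its own mean.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
```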

Let us take an example where we get explicit input from the user: the user is shown the 4 categories (WiFi, Budget, Couple, Nature) and can select as many as he wants, or even none. Consider the case where user B has selected at least one of the 4 options, and assume user B's input looks like the following:

One could argue that we can use cosine similarity here by simply filling in the null values with 0. However, this is not advisable: cosine similarity would then treat the 0s as a negative preference, whereas for this explicit signal we cannot say for sure that user B is not looking for a couple-friendly or budget-friendly property just because he has not given an input in that field.

Hence, to avoid this, we use Pearson correlation, and the output of the similarity measurement looks like the following:

We can see that property Z is highly correlated to user B’s explicit signal and hence, we will provide Z as a recommendation for user B.
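One way to sketch this in Python is shown below, under stated assumptions: user B's input is encoded as hypothetical numeric preference scores per category, blank fields are kept as None rather than filled with 0, and the correlation is computed only over the fields user B actually filled in. The property flags are also hypothetical. Because Pearson correlation needs variation on both sides, a property whose answered fields all carry the same flag (or an input with fewer than two answered fields) yields no score and is skipped.

```python
import math

# Hypothetical encoding of user B's explicit input, column order:
# WiFi, Couple Friendly, Budget Friendly, Nature Friendly.
# Numbers are assumed preference scores; None marks a field user B left blank.
USER_B = [5, None, 2, 4]

# Hypothetical Hotel_Type rows in the same column order (1 = has the attribute).
HOTEL_TYPE = {
    "X": [1, 1, 0, 0],
    "Y": [0, 1, 1, 0],
    "Z": [1, 0, 0, 1],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2:
        return None
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return None
    return cov / (spread_x * spread_y)


def rank_by_pearson(user, table):
    """Correlate the user's input with each property over the answered fields only."""
    answered = [i for i, value in enumerate(user) if value is not None]
    user_values = [user[i] for i in answered]
    scores = {}
    for name, flags in table.items():
        r = pearson(user_values, [flags[i] for i in answered])
        if r is not None:  # skip properties where the correlation is undefined
            scores[name] = r
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, r in rank_by_pearson(USER_B, HOTEL_TYPE):
    print(name, round(r, 3))
# Z 0.945
# X 0.756
# Y -0.945
```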

So, for the set of users E (who explicitly provide their preferences) we will use Pearson correlation, and for the set of users I (whose behavior implicitly tells us that they are looking for a property with a certain set of attributes) we will use cosine similarity.

Note: A user's explicit signal is always preferred over an implicit signal. For example, suppose that in the past I have only booked hotels in urban areas, but now I want to book a hotel near the beach (nature friendly), and I specify this in my explicit search. If you derive an implicit signal from my past bookings, you will conclude that I do not prefer hotels near the beach and recommend hotels in the city instead. In conclusion, Pearson correlation and cosine similarity are among the most widely used similarity techniques, but we always need to choose the similarity measure that fits our use case.

Collaborative Filtering:

This modeling technique leverages user-item interactions: we match or group similar users and make recommendations based on the preferences of similar users. Let us consider a user-item interaction matrix (rating matrix) containing the rating each user has given to each hotel:

Rating Matrix

Comparing user A and user E, we can see that they have similar tastes: both rated Hotel Y as 4. Seeing this, let us assume that user A would rate Hotel X as 5 and Hotel R as 3. Hence, by noticing the similarity between user A and user E, we can recommend Hotel X to user A (expecting that he will like Hotel X and rate it 5).

So, whenever we have data on users' interactions with items, where the user has given some feedback on the item (for example, the rating matrix), we can use collaborative filtering. The feedback can be explicit, such as a star rating given by the user, or implicit, such as a flag indicating whether the user has booked a property, or another form of user-item interaction.

Model Algorithms:

Memory-based and model-based approaches are the two types of techniques used to implement collaborative filtering. The key difference between them is that the memory-based approach does not use parametric machine learning models.

Memory-Based Approach: It can be divided into two subdivisions: user-item filtering and item-item filtering. In the user-item approach, we identify groups of similar users and use their interactions to predict the interactions of a particular user in that group. For example, to predict the rating user C would give to hotel X, we take a weighted average of hotel X's ratings by the other users, where each weight is the similarity between user C and that user. Adjusted cosine similarity, which subtracts each user's average rating before comparing, can also be used to remove differences in how individuals rate; this brings harsh critics and the general public onto the same scale.
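The user-item prediction described above can be sketched in Python as follows. The ratings are hypothetical, and the similarity between two users is plain cosine similarity over the hotels both have rated (an assumption; it is one of several reasonable choices). The prediction is the similarity-weighted average of the other users' ratings for the target hotel.

```python
import math

# Hypothetical ratings (1-5 stars); a missing key means the user has not rated that hotel.
RATINGS = {
    "A": {"Y": 4, "R": 2},
    "B": {"X": 2, "Y": 1, "R": 5},
    "C": {"Y": 4, "R": 3},
    "E": {"X": 5, "Y": 4, "R": 3},
}


def user_similarity(r_u, r_v):
    """Cosine similarity over the hotels both users have rated (0.0 if none)."""
    common = r_u.keys() & r_v.keys()
    if not common:
        return 0.0
    dot = sum(r_u[h] * r_v[h] for h in common)
    norm_u = math.sqrt(sum(r_u[h] ** 2 for h in common))
    norm_v = math.sqrt(sum(r_v[h] ** 2 for h in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, user, hotel):
    """pred(u, i) = sum_v sim(u, v) * r(v, i) / sum_v |sim(u, v)| over users v who rated i."""
    num = den = 0.0
    for other, r_other in ratings.items():
        if other == user or hotel not in r_other:
            continue
        sim = user_similarity(ratings[user], r_other)
        num += sim * r_other[hotel]
        den += abs(sim)
    return num / den if den else None


print(round(predict_rating(RATINGS, "C", "X"), 2))  # 3.72
```

User E, whose ratings match user C's exactly, gets weight 1.0 and pulls the prediction toward 5, while the less similar user B pulls it toward 2.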

Item-item filtering is similar to user-item filtering, but here we take an item, look at the users who liked it, and find other items that those users (or similar users) also liked. In other words, it takes items, finds similar items, and outputs those items as recommendations.
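A minimal item-item sketch is shown below. It uses the adjusted cosine similarity mentioned for the memory-based approach: each user's mean rating is subtracted before two items are compared over the users who rated both. The ratings are hypothetical.

```python
import math

# Hypothetical ratings (1-5 stars); a missing key means the user has not rated that hotel.
RATINGS = {
    "A": {"X": 5, "Y": 4},
    "B": {"X": 4, "Y": 5, "R": 1},
    "C": {"Y": 2, "R": 5},
    "D": {"X": 5, "R": 1},
}


def adjusted_cosine(ratings, item_i, item_j):
    """Item-item similarity after subtracting each user's mean rating."""
    means = {user: sum(r.values()) / len(r) for user, r in ratings.items()}
    co_raters = [user for user, r in ratings.items() if item_i in r and item_j in r]
    a = [ratings[u][item_i] - means[u] for u in co_raters]
    b = [ratings[u][item_j] - means[u] for u in co_raters]
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0  # no co-raters, or no variation to compare
    return sum(x * y for x, y in zip(a, b)) / norm


def similar_items(ratings, item):
    """Other items ranked by adjusted cosine similarity to `item`, highest first."""
    all_items = {h for r in ratings.values() for h in r}
    scores = {other: adjusted_cosine(ratings, item, other) for other in all_items if other != item}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for name, score in similar_items(RATINGS, "X"):
    print(name, round(score, 3))
# Y 0.594
# R -0.857
```

With plain cosine on raw ratings, X and R would look very similar (all ratings are positive); mean-centering exposes that users who liked X tended to dislike R.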

Model-Based Approach: In this technique, we use machine learning models to predict the rating a user would give to an item, and then provide recommendations based on those predictions.

Several ML models are used, to name a few: matrix factorization, SVD (singular value decomposition), ALS (alternating least squares), and SVD++. Some systems also use neural networks, decision trees, and latent factor models to enhance the results. We will delve into matrix factorization below.

Matrix Factorization:

In matrix factorization, the goal is to complete the matrix and fill in the null values in the rating matrix.

The users' preferences are captured by a small number of hidden (latent) features of the users and items. Here there are two hidden features, giving a user matrix (4×2) and an item matrix (2×4). When we multiply the user and item matrices back together, we get back the ratings matrix with the null values replaced by predicted values. We can then recommend to each user the item with the highest predicted rating, excluding the items the user has already interacted with.
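The following NumPy sketch factorizes a hypothetical 4×4 rating matrix into a 4×2 user matrix and a 2×4 item matrix using plain gradient descent on the observed ratings with a small L2 penalty. The ratings, learning rate, regularization strength, and step count are illustrative assumptions, not tuned values.

```python
import numpy as np

# Hypothetical 4x4 rating matrix (4 users x 4 hotels); np.nan marks a missing rating.
R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])


def factorize(R, k=2, steps=10_000, lr=0.01, reg=0.02, seed=0):
    """Gradient-descent matrix factorization fitted on the observed entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))  # user matrix (4 x 2 here)
    V = rng.normal(scale=0.1, size=(k, n_items))  # item matrix (2 x 4 here)
    observed = ~np.isnan(R)
    R0 = np.where(observed, R, 0.0)
    for _ in range(steps):
        E = np.where(observed, R0 - U @ V, 0.0)  # residuals; missing cells contribute nothing
        grad_U = E @ V.T - reg * U
        grad_V = U.T @ E - reg * V
        U += lr * grad_U
        V += lr * grad_V
    return U, V


def recommend(R, predicted, user):
    """Index of the highest-predicted hotel that the user has not rated yet."""
    unrated = np.flatnonzero(np.isnan(R[user]))
    return int(unrated[np.argmax(predicted[user, unrated])])


U, V = factorize(R)
predicted = U @ V  # the completed rating matrix
print(np.round(predicted, 2))
print(recommend(R, predicted, 0))
```

As the note below says, no feature vectors are supplied: U and V are learned purely from the observed ratings.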

Note: Here we do not provide any feature vectors for the users or the items; the factorization learns these vectors on its own and, finally, predicts the missing values.

If we have user demographics, user features and preference information, and item features, we can use SVD++, where we can pass user and item feature vectors as well to get the best recommendation results.

Hybrid Models:

A hybrid model combines multiple models/algorithms to build a recommendation system. To improve the effectiveness of the recommendations, we can combine collaborative filtering and content-based filtering, giving appropriate weights to the individual models, and use this hybrid system to produce the final recommendations.

For example, we can combine the results of the following by giving weights:

  1. Using Matrix factorization (collaborative filtering) on ratings matrix to match similar users and predict a rating for the user-item pair.
  2. Using Pearson correlation (content-based filtering) to find similarity between users who provide explicit filters and the hotels with feature vectors.

The combined results can be used to provide recommendations to users.
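A weighted combination of the two models can be sketched as below. The scores for hotels X, Y, and Z are hypothetical outputs of the two models for one user, and the 0.6/0.4 weights are arbitrary. Because a predicted rating (1 to 5) and a correlation (-1 to 1) live on different scales, each is first mapped onto 0 to 1 before blending.

```python
def hybrid_scores(cf_ratings, cb_correlations, w_cf=0.6, w_cb=0.4):
    """Weighted blend of a collaborative-filtering score and a content-based score per hotel."""
    scores = {}
    for hotel in cf_ratings.keys() & cb_correlations.keys():
        cf = (cf_ratings[hotel] - 1) / 4       # map a predicted 1-5 rating onto 0-1
        cb = (cb_correlations[hotel] + 1) / 2  # map a -1..1 correlation onto 0-1
        scores[hotel] = w_cf * cf + w_cb * cb
    return scores


# Hypothetical outputs of the two models for one user.
predicted_ratings = {"X": 4.5, "Y": 3.0, "Z": 2.0}  # e.g. from matrix factorization
pearson_scores = {"X": 0.2, "Y": -0.5, "Z": 0.9}    # e.g. from Pearson correlation

scores = hybrid_scores(predicted_ratings, pearson_scores)
for hotel, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(hotel, round(score, 3))
# X 0.765
# Z 0.53
# Y 0.4
```

In practice, the weights would be chosen by evaluating the blended ranking on held-out user data rather than fixed by hand.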

Conclusion:

Building a recommendation system depends heavily on the end objective and the data we have at hand. Understanding the algorithms and knowing which one to use plays a vital role in building a suitable recommendation system. Additionally, a sound recommendation system often uses multiple algorithms and combines their results to provide the final recommendations.


Manas Agrawal

CEO & Co-Founder
