The Emergence of Augmented Analytics and How It Helps Create a Data-Driven Organization

In the last few years, data flowing from various sources has changed the way organizations work and solve their business problems. Identifying potential data points, collecting data, and storing it securely has become a pressing need for many big companies across industries. Against this backdrop, big data analytics practices are gaining popularity, followed by rapid adoption of AI/ML technologies (RL, deep RL, NLP, etc.) in the workflow. Essentially, these technological advancements help organizations capture, store, and analyze data, converting it into valuable insights that solve business problems.

On the other hand, you need to understand the current scenario of dealing with data, ensure security and compliance, and select the right tools that meet your data analytics prerequisites. But the challenge is: how do you identify the solution to your changing data needs? And what does all this have to do with augmented analytics? In this blog, we will discuss the terminology of augmented analytics, what it offers to businesses, the global market projection, and many other touchpoints that hold the answers to these questions and help you navigate toward creating a data-driven organization.

Let’s start with a big picture!

The blend of different AI capabilities such as ML, NLP, and computer vision (CV) with other advanced technologies like AR/VR is boosting augmented analytics practices, especially in extracting valuable data insights. Augmented analytics thus brings all the necessary ingredients to help organizations conduct more efficient and effective data analytics across the workflow, create a hassle-free road map to becoming a data-driven organization, and develop citizen data scientists who can solve new business problems with ease.

Terminology of Augmented Analytics

Gartner: "Augmented analytics uses machine learning to automate data preparation, insight discovery, data science and machine learning model development, and insight sharing for a broad range of business users, operational workers, and citizen data scientists."

In other words, it is a paradigm shift: augmented analytics supplies the components and features that make it a key driver of modern analytics platforms, automating and integrating processes such as data preparation, data cleansing, building models around data clusters, and surfacing insights to assist business operations.

What does it offer?

  • Improved relevance and business insights: Helps identify false or less relevant insights, minimizes the risk of missing imperative insights in the data, steers users toward actionable insights, and empowers decision-making.
  • Faster, near-perfect insights: Greatly reduces the time spent on data discovery and exploration, provides near-perfect insights to business users, and helps them augment their analysis with AI/ML algorithms.
  • Insights available anywhere and everywhere: The flexibility and compatibility of augmented analytics expand data reach across the workflow, beyond citizen data scientists and operational teams, so more people can leverage insights with less effort.
  • Less dependency on scarce skills: You no longer need to rely as heavily on data scientists. With advanced AI/ML algorithms, augmented analytics fills skill gaps, helping organizations accomplish more through technology and less through manual intervention in the data analytics and management process.

The augmented analytics market is broadly classified by deployment, function, component, industry vertical, and organization size. The deployment category is further divided into cloud and on-premises. In terms of process and function, the market is segmented into operations, IT, finance, sales & marketing, and others.

Traditional BI Vs. Augmented Analytics

Traditional BI

In the traditional Business Intelligence process, databases were analyzed to generate basic reports. The analysis was executed by a dedicated team of data analysts, and access to the reports produced by these professionals was limited to certain teams. Regular business users were unable to use these reports due to complexity and security constraints; hence, they were unable to make data-driven decisions.

Over time, that complexity was reduced with the help of technological advancement. However, manual data collection from data sources remained: data analysts clean up the data, select the data sources they want to analyze, transfer the data to a platform for analysis, generate reports and insights, and share them across the workflow through emails, messages, or within the platform itself.

Augmented Analytics

In augmented analytics, AI reduces the manual work of data collection and improves data transfer and reception across different sources. Once data is available from the respective sources, AI/ML-powered smart systems help users select suitable datasets based on the relationships identified while ingesting the data for analysis. During analysis, the AI system still allows the user to influence the process, and it also suggests analysis combinations that would take a human considerable time to produce. Once insights are generated, business users can consume them across the workflow through in-app messaging, mobile apps, chatbots, AI assistants, and more.

Hence, throughout the augmented analytics practice, AI empowers the data analytics process by simplifying insight discovery and surfacing noteworthy trends and details without a specific user query.

With Augmented Analytics in place businesses can:

  • Perform hassle-free data analysis to meet the business objectives
  • Improve the ability to identify the root cause of data analysis challenges and problems
  • Unearth hidden growth opportunities without investing additional efforts
  • Democratize enterprise-wide insights from a BI perspective to enhance business performance
  • Turn actionable data insights into business outcomes

Summing Up

The world is turning into a data world, and data is now growing beyond big data. Countless connected devices produce new data sets with every passing minute. These data sets are processed and stored in increasingly complex forms to create insightful information; hence, businesses need to invest in robust analytical systems and AI assistance to make their data analytics efforts worthwhile. Moreover, to democratize analytics and boost productivity, businesses need to innovate and move past their legacy approaches. Augmented analytics provides one such opportunity to advance existing and new business objectives and stay ahead in the race. Invest wisely and make the best use of augmented analytics to create a data-driven organization and ensure success.

Moving Beyond Remote

3 steps to ensure a safe work transition

With the rollout of vaccines and dipping fresh-infection rates, many organizations around the world are flagging off work from the workplace, albeit cautiously. Given the unpredictability around the virus and its unsurprisingly emerging new strains, it is still tricky terrain, and firms are testing hybrid models, including keeping offices open on alternate days, rotating employees on a weekly basis, and introducing shifts. Multiple challenges plague the gradual return to working from the office. For one, the experience will be completely altered in the wake of COVID-19 restrictions and the binding social distancing requirements that follow. Companies are mandated to keep attendance low and strictly need-based, have employees wear masks at all times, redesign spaces to ensure physical distancing, and restrict movement in congested areas (for instance, elevator banks and pantries). As a result, even after the reopening, attitudes toward offices will probably continue to evolve.

The experience itself is having employees torn between choosing to continue to work from home and returning to the workplace after being homebound for months now. Many are still not ready to give up the satisfaction and productivity they discovered while working from their homes close to their families and loved ones, with minimal hours lost to commute, superfluous water-cooler conversations, avoidable meetings, and multitudinous social engagements one can never really bypass in an office space. On the other hand, there are those who are feeling the loss of the physical interaction and are of the view that corporate cultures and communities and those planned and unplanned moments of in-person collaboration are essential to one’s growth, mentorship, talent development, and overall mental and social wellbeing.

Taking into account both the enjoyment and the fatigue of the long-necessitated virtual working, many organizations are finally opening their doors to an array of mixed emotions and varying levels of happiness, unhappiness, productivity, and participation. Mounting economic turmoil and the inefficiencies associated with remote work have necessitated the move to ‘business as usual’, though it still looks far from anything of the kind, with a string of preventative measures tailing along, including social distancing between desks, regular sanitization and, in some cases, only allowing those with personal vehicles to come to the office.

Navigating this gradual change now will be one of the biggest business challenges of our time. The goal is to keep operations going while minimizing the risk to employees, and the primary responsibility of it all rests with the Management teams who need to tread this course from crisis to recovery with much forethought and careful heeding of expert advice.

Three primary steps when kickstarting ‘work from work’ yet again

Clearly, there is no one solution that can adequately cater to varying needs of different organizations. Leading firms will need to challenge the old, deep-rooted assumptions about how work should be done and what should be the role of the office in fulfilling that. Answers will shift too, from business to business, depending on the kind of talent they work with, what roles are most important to them, what degree of collaboration is inevitable for their excellence, and where their offices are located, among a bevy of other factors.

Even within an organization, the answer could significantly fluctuate across geographies, businesses, and functions, so the exercise of ascertaining what exactly will be needed to make this ‘work from work’ successful again must be a collective initiative across real estate, human resources, technology, and the business. Leaders must be ready to make hard choices and spearhead the effort across individual functions and businesses. Lasting change will also require cutting-edge change-management skills and constant fine-tuning based on how well the effort is fructifying with time.

After a serious assessment of how on-ground operations are being reshaped by the slackening of restrictions, in what is being seen as a serious and relieving nod to economic activity, our team of experts at Affine suggests the following steps that businesses can mull over in the wake of reopening their doors.

1. Prioritize safety of the workforce

The health and safety of employees should precede any other obligation on the part of the management as operations start swinging back toward the old normal. Teams will need to strictly adhere to federal, state, and local orders as employees start streaming back into the office for mission-critical work. The rules will of course vary with the location of offices, factories, and distribution centers, and hence planning will need to cover a range of scenarios. Companies might need to reset their protocols for deep cleaning and sanitization. In many cases, the workspace layout might need changing, such as moving workstations to comply with social distancing norms, rejigging employee schedules to limit the number of people present at a time, establishing guidelines for the use of face masks and gloves, mandating regular temperature checks, and revising leave policy in the wake of infection. Businesses might also leverage technology to facilitate contact tracing and communicate with employees who have been exposed to the virus and need to self-quarantine, in line with the protection of employees’ privacy and personal data.

2. Determine who needs to be onsite and who can stay back home

Transitioning from remote work to full-time working from office requires the management to make important decisions about which employees really need to be present in-house or on the factory floor and which ones can still manage work while being away sheltered at home. This assessment is important to keep the onsite headcount low and the risk to employee health minimal. For instance, certain roles, such as sales or relationship management that might have earlier required face-to-face interaction could do perfectly fine with a little bit of tweaking given the evolving health guidelines and customer preferences, as well as the advisability of travel for non-essential purposes. Other roles might undeniably depend on onsite tools or technology and might require employees to trudge back to office sooner rather than later.

3. Practice empathy and effective communication with employees

These are unprecedented times. More than any tool, technology, or management decision, what can really help employees tide over the crisis is generous empathy and understanding on the part of the management. Not all employees will have the same needs and response to the pandemic.

So, the questions you really need to ask yourself are: How are your employees hanging in there? Have you devised a clear plan for their wellbeing? Are you providing them with enough resources to take care of their physical, mental, emotional, and financial health? What’s your strategy for their safety and security? Are you laterally focused on them as much as you are on supporting your customer’s needs?

In the midst of the pandemic-triggered chaos, employees are working with a novel sense of anxiety, fear, and loneliness amid isolation. As the leader, it is incumbent on you to stay prudent and make this work: for your employees, for your customers, and for you and your business at large. A patchwork approach won’t do. You’ll need a solid and coherent coping strategy to empower your workforce and keep them healthy, happy, productive, and motivated when returning to the office, because their sliding optimism directly translates into a diminishing ability of your business to deliver relevant experiences that meet customers’ expectations. Your brand image hinges on how well your employees are holding up right now. And what a prominent Gallup poll shows is nothing short of alarming: only 54 percent of employees strongly agree that they feel well-prepared to do their work under the shadow of COVID-19.

Adopting vital, actionable organizational practices to foster trust, compassion, stability, and hope among your employees is paramount to business success. Cultivating empathy as an intrinsic part of your organization’s culture and showing flexibility to match specific employee needs is critical for the same reasons.

Offer a greater leeway, keep work hours per day to a necessary minimum to spare them time for childcare or self-care, and offer leaves of absence as required. Showing added support, solidarity, and appreciation will make your employees feel that they are heard and cared for. It will give them a sense of security, otherwise stretched thin in this troubling time, that will re-establish their rapport with you and your organization.

It can actually enable your people to excel at their jobs with enthusiasm and commitment equivalent, if not equal, to the pre-pandemic normal, and eventually boost productivity in the long run. These are testing times, but a board armed with meticulous planning and strategic decisions can aid a smooth transition from remote work back to offices in a healthy and effective manner. It is on the leadership, at the end of the day, to make possible a well-planned return to offices with heightened safety, collaboration, productivity, and talent growth.

Accelerate Your eCommerce Sales with Big Data and AI for 2021

The holiday season is the most exciting time of the year for businesses. It has always driven some of the highest sales of the year. In 2019, online holiday sales in the US alone touched $135.35 billion, and the average order value hit $152.95. After an unprecedented 2020, retailers are performing many bold maneuvers to turn the tide around in the new year.

A successful holiday strategy in 2021 requires much more than just an online presence. To compete during one of the strangest seasons following the strangest year yet, brands are trying to create more meaningful connections with consumers, offering hyper-personalized online experiences, and ensuring that holiday shoppers experience nothing short of pure convenience and peace of mind.

In 2020, retailers faced some novel challenges and many unknowns. To begin with, here are a few key things that could not be ignored:

  • Customer behaviors significantly changed during the pandemic, and expectations have only burgeoned since
  • Gen Z and Millennial shoppers, who have the maximum purchasing power, became focused on sustainability and peace of mind
  • The ecommerce industry saw five years of digital transformation in two months, courtesy of the pandemic. Immersive cutting-edge technology like voice-aided shopping, AI-assisted browsing, and machine learning was no longer seen as optional; these became must-haves for facilitating a superior customer experience

Here are ten ways big data and AI are helping businesses accelerate ecommerce sales

1. Hyper-Personalized product recommendations through Machine Learning

Providing people with exactly what they want is the best way to attract new customers and to retain existing ones. So, having intelligent systems to surface products or services that people would be inclined to buy only seems natural. To enable this, data and machine learning are playing big roles. They are helping businesses put the right offers in front of the right customers at the right time. Research has proven that serving relevant product recommendations can have a sizable impact on sales. As per a study, 45% of customers reveal they are likely to shop on a site that preempts their choices, while 56% are more likely to return to such a site. Smart AI systems are allowing deep dive into buyer preferences and sentiments and helping retailers and e-commerce companies provide their customers with exactly what they might be looking for.
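As a hedged illustration of the underlying idea, here is a minimal item-to-item recommender using cosine similarity over a toy purchase matrix; the matrix, user indices, and function names are invented for this sketch and do not come from any specific retailer's system.

```python
import numpy as np

# Toy user-item purchase matrix (rows: users, columns: products); 1 = bought.
R = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between item (column) vectors."""
    norms = np.linalg.norm(R, axis=0)
    return (R.T @ R) / np.outer(norms, norms)

def recommend(R, sim, user, k=1):
    """Score unseen items by their similarity to the user's past purchases."""
    scores = R[user] @ sim
    scores[R[user] > 0] = -np.inf  # never re-recommend items already bought
    return np.argsort(scores)[::-1][:k]

sim = item_similarity(R)
print(recommend(R, sim, user=0))  # the unseen item closest to past purchases
```

Real systems layer far more signal (browsing context, recency, embeddings) on top of this, but the core "similar users bought similar items" logic is the same.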

2. Enabling intelligent search leveraging NLP

The whole point of effective search is to understand user intent correctly and deliver exactly what the customer wants. More and more companies are using modern, customer-centric search powered by AI, which enables it to think like humans. It deploys advanced image and video recognition and natural language processing tools to constantly improve and contextualize results for customers, which eventually helps companies close leads more effectively.

3. One-to-one marketing using advanced analytics

With one-to-one marketing, retailers take a more targeted approach to delivering a personalized experience than they would with personalized product recommendations or intelligent search engines alone. Data like page views and clickstream behavior forms the foundation of one-to-one marketing. As this data is harvested and processed, commonalities emerge that correspond with broad customer segments. As the data is further refined, a clearer picture emerges of an individual’s preferences and 360° profile, which informs real-time action on the retailer’s end.

4. Optimized pricing using big data

There are numerous variables that impact a consumer’s decision to purchase something: product seasonality, availability, size, color, and so on. But many studies zero in on price as the number one factor determining whether the customer will buy the product.

Pricing is a domain that has traditionally been handled by an analyst after diving deep into reams of data. But big data and machine learning-based methods today are helping retailers accelerate this analysis and compute an optimized price, often several times in a single day. This keeps the price low enough not to turn off potential buyers or cannibalize other products, but high enough to ensure a healthy profit.
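A minimal sketch of this idea, assuming a simple linear demand curve fitted from historical price and sales pairs; real pricing engines use far richer demand models, and all numbers here are illustrative.

```python
import numpy as np

def optimal_price(prices, units_sold):
    """Fit demand ~ a - b*price by least squares, then maximize revenue.

    Revenue R(p) = p * (a - b*p) peaks at p* = a / (2*b).
    """
    A = np.column_stack([np.ones_like(prices), prices])
    (a, slope), *_ = np.linalg.lstsq(A, units_sold, rcond=None)
    b = -slope
    return a / (2 * b)

# Historical observations following demand = 100 - 2 * price (illustrative)
prices = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
units = 100 - 2 * prices
print(optimal_price(prices, units))  # the revenue-maximizing price, here 25
```

An actual engine would also respect margin floors, competitor prices, and inventory constraints before publishing a price.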

5. Product demand forecasting and inventory planning

In the initial months of the pandemic, many retailers had their inventory of crucial items like face coverings and hand sanitizers exhausted prematurely. In certain product categories, the supply chains could not recover soon enough, and some have not even recovered yet. Nobody could foretell the onslaught of the coronavirus and its impending shadow on retailers, but the disastrous episode that followed sheds urgent light on the need for better inventory optimization and planning in the consumer goods supply chain.

Retailers and distributors who leveraged machine learning-based approaches for supply chain planning early on fared better than their contemporaries who continued to depend solely on analysts. With a working model in place, the data led to smarter decisions. Incorporating external data such as social media signals (Twitter, Facebook), macroeconomic indicators, and market performance data (stocks, earnings, etc.) into the forecasting model, in addition to past inventory data and its seasonality, helps correctly determine the product demand pattern.
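To make this concrete, here is a hedged sketch: a synthetic demand series driven by seasonality plus two invented external signals (a social-buzz index and a macro indicator), fitted with ordinary least squares. Every variable name is illustrative; production forecasters would use richer models.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Synthetic weekly demand: seasonality + external signals + noise
season = np.sin(2 * np.pi * np.arange(n) / 52)  # yearly cycle
buzz = rng.normal(size=n)                       # social-media buzz index
macro = rng.normal(size=n)                      # macroeconomic indicator
demand = 100 + 30 * season + 8 * buzz + 5 * macro + rng.normal(0, 1, size=n)

# Regress demand on [intercept, season, buzz, macro] with least squares
X = np.column_stack([np.ones(n), season, buzz, macro])
coef, *_ = np.linalg.lstsq(X, demand, rcond=None)
print(coef)  # recovers roughly [100, 30, 8, 5]
```

The point is the feature set: once external signals enter the design matrix, the fitted coefficients quantify how much each driver moves demand, which is what planners act on.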

6. Blending digital and in-store experiences through omnichannel ecommerce offerings

The pandemic has pushed many people who would normally shop in person to shop online instead. Retailers are considering multiple options for getting goods in the hands of their customers, including contactless transactions and curbside pickups. Not that these omnichannel fulfillment patterns were not already in place before the coronavirus struck, but they have greatly accelerated under COVID-19. AI is helping retailers expedite such innovations as e-commerce offerings, blending of digital and in-store experiences, curbside pickup and quicker delivery options, and contactless delivery and payments.

7. Strengthening cybersecurity and fighting fraud using AI

Fraud is always a threat around the holidays. And given the COVID-19 pandemic and the subsequent shift to everything online, fraud levels have jumped by 60% this season. An increase in card-not-present transactions incites fraudsters to abuse cards that have been compromised. Card skimming, lost and stolen cards, phishing scams, account takeovers, and application fraud present other loopholes for nefarious exploits. In a nutshell, fraudsters are projected to extract about 5.5% more from innocent customers this year. Here, card issuers and merchants alike, armed with machine learning and AI, are analyzing huge volumes of transactions, identifying instances of attempted fraud, and automating the response.
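As a toy illustration of anomaly-based fraud flagging (not any card network's actual method; the amounts and threshold are invented), a robust z-score based on the median absolute deviation can flag transactions that deviate sharply from a customer's normal spend:

```python
import numpy as np

def flag_suspicious(amounts, thresh=3.5):
    """Flag amounts whose robust z-score exceeds `thresh`.

    The median absolute deviation (MAD) is used instead of mean/std
    because an ordinary z-score is skewed by the very outliers we
    are trying to catch.
    """
    med = np.median(amounts)
    mad = np.median(np.abs(amounts - med))
    robust_z = 0.6745 * (amounts - med) / mad
    return np.abs(robust_z) > thresh

amounts = np.array([20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 5000.0])
print(flag_suspicious(amounts))  # only the 5000.0 transaction is flagged
```

Production systems combine hundreds of such features (merchant category, geolocation, velocity) inside supervised models, but the outlier intuition carries over.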

8. AI-powered chatbots for customer service

Chatbots that can automatically respond to repetitive and predictable customer requests are one of the fastest-growing applications of big data and AI. Thanks to advances in natural language processing and natural language generation, chatbots can now correctly understand complex written and spoken queries of the most nuanced order. These smart assistants are already saving companies millions of dollars per year by supplementing human customer service reps in resolving issues with purchases, facilitating returns, helping find stores, answering repetitive queries concerning hours of operation, and more.

9. AI guides for enabling painless gift shopping

As this is the busiest time of the year, when customers throng websites and stores for gift-shopping, gaps in customer service can seriously confuse and dissuade the already indecisive shopper. In such a scenario, tools like interactive AI-powered gift finders engage shoppers in a conversation, asking a few questions about the gift recipient’s personality and immediately providing gifting ideas, helping even the most unsettled gift shopper find the perfect gift with little wavering. This helps customers overcome choice paralysis and helps companies boost conversions, benefiting both sides of the transaction table.

10. AR systems for augmented shopping experience

AR is taking eCommerce shopping and the customer experience to the next level. From visual merchandising to hyper-personalization, augmented reality offers several advantages. Gartner had indicated in a 2019 predictions report that by 2020 up to 100 million consumers were expected to use augmented reality in their shopping experiences, and the prophecy came true. The lockdown and isolation necessitated by COVID-19 rapidly increased the demand for AR systems.

Based on the “try-before-you-buy” approach, augmented shopping appeals to customers by allowing them to interact with their choice of products online before they proceed to buy any. For instance, AR is helping buyers visualize what their new furniture will look and feel like by moving their smartphone cameras around the room in real-time and getting a feel of the size of the item and texture of the material for an intimate understanding before purchase. In another instance, AR is helping women shop for makeup by providing them with a glimpse of the various looks on their own face at the click of a button.

To survive the competitive landscape of eCommerce and meet holiday revenue goals this year, merchants and retailers are challenging the status quo and adopting AI-powered technology to meet customer expectations. AI is truly the future of retail, and not leveraging the power of artificial intelligence, machine learning, and related tech means losing out.

ProGAN, StyleGAN, StyleGAN2: Exploring NVIDIA’s breakthroughs

This article explores NVIDIA’s approach to generating high-quality images using GANs and the progress made in each successor network.

Photo by Nana Dua on Unsplash

Back in 2014, Ian Goodfellow and his colleagues presented the now-famous GANs (Generative Adversarial Networks), aimed at generating true-to-life images that are nearly impossible to identify as the output of a network.

Researchers found many use cases where GANs could entirely change the future of the ML industry, but there were some shortcomings to be addressed. ProGAN and its successors improve upon these weak areas and provide mind-blowing results.

This post starts with GAN basics and their pros and cons, then dives into the architectural changes incorporated into ProGAN, StyleGAN, and StyleGAN2 in detail. It is assumed that you are familiar with CNNs and the overall basics of deep neural nets.

Let’s start!

Quick Recap of GANs

GANs are generative models that aim to synthesize new data resembling the training data, such that it becomes hard to distinguish the real samples from the fakes. The architecture comprises two networks, a Generator and a Discriminator, that compete against each other to generate new data instances.

Generator: This network takes a random vector as input and generates an image as output. This output is termed a “fake” image, since the network learns the real image data distribution and attempts to generate a similar-looking image.

Architecture: The network comprises several transposed convolution layers aimed at up-scaling the 1-D vector input into an image. In the image below, we see a 100-d input latent vector transformed into a 28×28×1 image by successive transposed convolution operations.

Generator (Source)
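To make the up-scaling concrete, the output size of each transposed convolution can be computed with the standard formula; the kernel/stride/padding choices below are illustrative of a typical DCGAN-style chain, not the exact layers of any particular paper.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution layer."""
    return (size - 1) * stride - 2 * padding + kernel

# Treat the 100-d latent vector as a 1x1 spatial map with 100 channels,
# then grow it to 28x28 through successive transposed convolutions.
size = 1
chain = [size]
for kernel, stride, padding in [(7, 1, 0), (4, 2, 1), (4, 2, 1)]:
    size = conv_transpose_out(size, kernel, stride, padding)
    chain.append(size)

print(chain)  # [1, 7, 14, 28]: the map grows until it reaches image size
```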

Discriminator: This network accepts the generator output plus real images (from the training set) and classifies them as real or fake. In the image below, we see the generator output fed into the discriminator and then classified accordingly by a classifier network.

Discriminator (Source)
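As a minimal sketch of the discriminator's role, reduced here to a single logistic unit over a flattened image (real discriminators use convolutional stacks, and the weights below are untrained placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)

def discriminator(image, w, b):
    """Map a 28x28 image to a probability of being 'real' via a sigmoid."""
    logit = image.ravel() @ w + b
    return 1.0 / (1.0 + np.exp(-logit))

w = rng.normal(0, 0.01, size=28 * 28)  # placeholder weights (untrained)
b = 0.0
fake = rng.random((28, 28))            # stand-in for a generator output
p_real = discriminator(fake, w, b)
print(p_real)  # a probability in (0, 1)
```

Training would adjust `w` and `b` so that this probability is high for real images and low for fakes.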

Both networks are in a continuous feedback loop: the generator learns to create better “fakes”, and the discriminator learns to accurately classify “fakes” as fake. There are predefined metrics to check generator performance, but generally the quality of the fakes tells the true story.

Overall GAN architecture and training summary:

GAN-architecture (Source)

Note: In the rest of the article, the Generator and Discriminator networks will be referred to as the G network and D network.

Here is the step-by-step process to understand how a GAN model works:

  1. Create a huge corpus (>30k images) of training data containing clean, object-centric images and no junk data. Once the data is created, we perform some intermediate data-prep steps (as specified in the official StyleGAN repository) and start training.
  2. The G network takes a random vector and generates images, most of which will look like absolutely nothing at the start.
  3. The D network takes two inputs (fakes produced by G in step 2, plus real images from the training data) and classifies them as “real” or “fake”. Initially the classifier will easily detect the fakes, but as training progresses, the G network learns to fool it.
  4. After the loss is calculated, the D network weights are updated to make the classifier stricter. This makes spotting fakes easier for the D network.
  5. Thereafter the G network updates its parameters, aiming to improve image quality to match the training distribution with each iterative round of feedback from the D network.
  6. Important: the two networks train in isolation; when the D network parameters are updated, G remains untouched, and vice versa.

This iterative training of the G and D networks continues until G produces good-quality images and fools D confidently. Both networks then reach a state known as “Nash equilibrium”.
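The alternating updates in the steps above can be sketched on a toy one-dimensional problem. The generator here is just an affine map g(z) = a·z + b and the discriminator a single logistic unit with hand-derived gradients, so this reflects only the back-and-forth training pattern, not a real GAN architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Real data: samples from N(4, 1). The generator must learn to mimic it.
def sample_real(n):
    return rng.normal(4.0, 1.0, size=n)

a, b = 1.0, 0.0   # generator params: g(z) = a*z + b
w, c = 0.1, 0.0   # discriminator params: D(x) = sigmoid(w*x + c)
lr, n = 0.05, 64

for step in range(500):
    # --- D update: push D(real) toward 1 and D(fake) toward 0 ---
    real = sample_real(n)
    z = rng.standard_normal(n)
    fake = a * z + b
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) + np.mean(-d_fake * fake))
    c += lr * (np.mean(1 - d_real) + np.mean(-d_fake))
    # --- G update (D frozen): push D(fake) toward 1 ---
    z = rng.standard_normal(n)
    fake = a * z + b
    d_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(b)  # drifts toward the real mean (4) as training proceeds
```

Note how each half-step updates only one network's parameters while the other's are frozen, mirroring step 6 above.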

Limitations of GANs:

  1. Mode collapse: the point at which the generator produces the same set of fakes over a period is termed mode collapse.
  2. Low-res generator output: GANs work best within low-res image boundaries (less than 100×100 pixels of output), since the generator fails to produce the finer details required for high-res images. High-res outputs can thus easily be classified as “fake”, and the discriminator network overpowers the generator.
  3. High volume of training data: generating fine results from the generator requires a lot of training data, because the less data there is, the more distinguishable the output fakes will be.

Let us start with the basics of the ProGAN architecture in the next section and what makes it stand out.

ProGAN:

Paper — “Progressive Growing of GANs for Improved Quality, Stability, and Variation” by Tero Karras, et al. from NVIDIA

Implementation: Progressive_growing_of_gans

Vanilla GANs and most earlier works in this field faced the problem of low-resolution result images (“fakes”). The architecture could perfectly generate 64- or 128-pixel square images, but higher-resolution images (above 512×512) were difficult for these models to handle.

ProGAN (Progressively Growing GAN) is an extension of the GAN that allows generation of large, high-quality images, such as realistic faces at 1024×1024 pixels, through efficient training of the generator model.

1.1 Understanding the concept:

Progressive growing refers to changing the way the generator and discriminator model architectures train and evolve.

The generator network starts with few convolution layers, outputting low-res (4×4) images, and then adds layers (up to outputting high-res 1024×1024 images) once the previous, smaller model converges. The D network follows the same approach: it starts as a smaller network that takes the low-res images and outputs a probability, and then expands its network to take in the high-res images from the generator and classify them as “real” or “fake”.

Both networks expand simultaneously: if G outputs a 4×4-pixel image, then the D network needs an architecture that accepts this low-res image, as shown below –

ProGAN training visualization (Source)

This incremental expansion of both the G and D networks allows the models to learn high-level structure first and later focus on the fine features of high-res (1024×1024) images. It also promotes model stability and lowers the probability of “mode collapse”.
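The growth schedule can be sketched as a simple doubling loop (the actual fade-in of new layers is elided here):

```python
# Sketch of ProGAN's progressive schedule: training starts at 4x4 and the
# resolution doubles each phase until 1024x1024, with both G and D growing
# together. Resolutions follow the paper; the "grow" step is a placeholder.

def progressive_schedule(start_res=4, final_res=1024):
    schedule = []
    res = start_res
    while res <= final_res:
        schedule.append(res)          # train G/D at this resolution until stable
        res *= 2                      # then fade in the next (double) resolution
    return schedule

resolutions = progressive_schedule()   # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
```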

We now have an overview of how ProGAN achieves the generation of high-res images; for more detail on how the incremental transition between layers happens, refer to these two excellent blogs —

a) Introduction-to-progressive-growing-generative-adversarial-networks

b) ProGAN: How NVIDIA Generated Images of Unprecedented Quality

Sample 1024×1024 results by ProGAN. (Source)

ProGANs were the first iteration of GAN models aimed at generating such high-res output, and they gained much recognition. But the recent StyleGAN/StyleGAN2 models have raised the bar much higher, so we will mostly focus on these two models in depth.

Let us jump to StyleGAN:

Paper: A Style-Based Generator Architecture for Generative Adversarial Networks

Implementation: https://github.com/NVlabs/stylegan

ProGAN expanded the vanilla GAN's capacity to generate high-res 1024-pixel square images, but it still lacked control over the styling of the output images. Although its inherent progressive growing nature can be exploited to extract features at multiple scales in a meaningful way and drastically improve results, the output still lacked fineness.

Facial features include high-level attributes such as face shape or body pose, and finer attributes such as wrinkles and the color scheme of face and hair. All of these features need to be learned appropriately by the model.

StyleGAN mainly improves the architecture of the G network to achieve better results, keeping the D network and loss functions untouched. Let us jump straight into the architectural additions –

Generator architecture increments. (Source)

  1. Mapping Network:

Instead of directly injecting the input random vector into the G network, a standalone mapping network (f) is added that takes the same randomly sampled vector from the latent space (z) as input and generates a style vector (w). This new network comprises 8 fully connected (FC) layers and outputs a 512-dimensional latent vector, the same length as the 512-d input vector. Thus we have w = f(z), where both z and w are 512-d vectors. But a question remains.

What was the necessity to transform z into w?

“Feature entanglement” is the reason we need this transformation. In a dataset of humans, beards and short hair are associated with males, which means these features are interlinked; we need to remove that link (so that, say, men with longer hair can also be generated) to get more diverse output and control over what the GAN produces.

The necessity is to disentangle the features in the input random vector so as to allow finer control over feature selection while generating fakes. The mapping network helps us achieve this, mainly by not following the training data distribution and by reducing the correlation between features.

With the addition of this new mapping network, the G network in StyleGAN is renamed the “synthesis network”.
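As a rough sketch of the mapping network's shape (8 fully connected layers, 512-d in and out, per the paper), with randomly initialized weights standing in for the learned ones and a leaky-ReLU activation assumed:

```python
import numpy as np

# Minimal sketch of the StyleGAN mapping network f: z -> w. Only the shapes
# are from the paper; the weights here are random, not learned, and the
# input normalization / activation choices are illustrative.

rng = np.random.default_rng(0)
DIM, N_LAYERS = 512, 8
weights = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM) for _ in range(N_LAYERS)]
biases = [np.zeros(DIM) for _ in range(N_LAYERS)]

def mapping_network(z):
    x = z / np.linalg.norm(z) * np.sqrt(DIM)     # pixel-norm style input normalization
    for W, b in zip(weights, biases):
        a = x @ W + b
        x = np.maximum(0.2 * a, a)               # leaky ReLU (slope 0.2)
    return x                                     # style vector w, also 512-d

z = rng.standard_normal(DIM)
w = mapping_network(z)                           # w = f(z), shape (512,)
```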

You might be wondering how this intermediate style vector is added into the G network's layers. AdaIN is the answer.

2. AdaIN (Adaptive Instance Normalization):

To inject the styles into the network layers, a separately learned affine operation A transforms the latent vector w in each layer. This operation A generates a separate style y = [ys, yb] (both scalars per feature map) from w, which is applied to each feature map when performing AdaIN.

In the AdaIN operation, each feature map is first normalized, and then the scale (ys) and bias (yb) are applied to place the respective style information into the feature maps.

AdaIN in G Network (Source)

Using normalization, we can inject style information into the G network in a much better way than just using an input latent vector.

The generator now has a sort of “description” of what kind of image it needs to construct (due to the mapping network), and it can also refer to this description whenever it wants (thanks to AdaIN).
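A minimal sketch of the AdaIN computation on a single feature map; the ys/yb values are made up for illustration, and a flat list stands in for the 4-D tensors real implementations normalize channel-wise:

```python
import math

# Hedged sketch of AdaIN on one feature map: normalize to zero mean / unit
# variance, then apply the style's scale (ys) and bias (yb).

def adain(feature_map, ys, yb, eps=1e-8):
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((v - mean) ** 2 for v in feature_map) / n
    normalized = [(v - mean) / math.sqrt(var + eps) for v in feature_map]
    return [ys * v + yb for v in normalized]     # style scale + bias

# After AdaIN the map's mean becomes yb and its std becomes ys.
styled = adain([1.0, 2.0, 3.0, 4.0], ys=2.0, yb=5.0)
```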

3. Constant Input:

“A constant input vector?”, you might be wondering why.

The answer lies in the AdaIN concept. Consider a vanilla GAN: of course, we require a different input random vector each time we want to generate a new fake with different styles. This means all the variation comes from the input vector, injected only once at the start.

But StyleGAN has AdaIN and the mapping network, which incorporate different styles/variations at every layer. So why would we need a different input latent vector each time? Why can't we work with a constant input?

The G network therefore no longer takes a point from the latent space as input; it relies on a learned constant 4×4×512 tensor to start the image synthesis process.

4. Adding Noise:

Need more fine-tuned output that looks more realistic? Small feature variations are introduced by random noise added inside the network, which makes the fakes look truer.

Gaussian noise (represented by B) is added to each of the activation maps before the AdaIN operations. A different sample of noise is generated for each block and is scaled by learned per-layer scaling factors.

5. Mixing regularization:

Using the intermediate vector at each level of the synthesis network might cause the network to learn correlations between levels. To remove this correlation, the model randomly selects two input vectors (z1 and z2) and generates their intermediate vectors (w1 and w2). It then trains some of the levels with the first and switches (at a random split point) to the other for the remaining levels. This random switching ensures that the network does not learn strong correlations between levels and produces varied results.
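The style-mixing trick can be sketched as follows; the mapping network is elided (w1/w2 are just labels), and 18 synthesis layers are assumed, which is the count for 1024×1024 output:

```python
import random

# Sketch of mixing regularization: two latents are mapped to w1 and w2, and
# a random crossover point decides which style vector each synthesis layer
# sees. Everything except the split logic is elided.

def mix_styles(n_layers, w1, w2, rng):
    split = rng.randrange(1, n_layers)           # random split point
    return [w1 if i < split else w2 for i in range(n_layers)]

rng = random.Random(42)
per_layer_styles = mix_styles(n_layers=18, w1="w1", w2="w2", rng=rng)
# Early (coarse) layers see w1, later (fine) layers see w2.
```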

Training configurations: below are the different configurations for training StyleGAN that we discussed above. By default, Config-F is used for training.


Training StyleGAN model

2.1. Let us have a look at training StyleGAN on a custom dataset:

Pre-requisites — TensorFlow 1.10 or newer with GPU support, Keras version <= 2.3.1. Other requirements are nominal and can be checked in the official repository.

Below are the steps to be followed –

1. StyleGAN has officially been trained on the FFHQ, LSUN, and CelebA-HQ datasets, which each contain roughly 60k images or more. Going by those counts, our custom dataset should have around 30k images to begin with.

2. Images must be square (128, 256, 512, or 1024 pixels), with the size chosen depending on the GPU or compute available for training the model.

3. We will be using the official repository for the training steps, so let us clone the repository and start with the next steps.

4. Data prep — Upload the image data folder to the cloned repository folder. We then need to convert the images to TFRecords, since the training and evaluation scripts operate only on TFRecord data, not on actual images. By default, the scripts expect to find the datasets at datasets/<NAME>/<NAME>-<RESOLUTION>.tfrecords

5. But why multi-resolution data? The answer lies in the progressive growing nature of the G and D networks, which train the model progressively with increasing image resolution. Below is the script for generating TFRecords for a custom dataset –

Source for the custom source code format. Carbon.sh

6. Configuring the train.py file: we need to configure the training file with the folder name of our custom-data TFRecords inside the datasets folder. There are also some other key changes (shown in the image below) related to kimg values and setting the GPUs available.

Train script from StyleGAN repo with additional parameter change comments

7. Start training — Run the training script with python train.py. The model trains for several days depending on the training parameters and images provided.

8. During training, the model saves intermediate fake results under results/<ID>-<DESCRIPTION>. Here we can also find the .pkl model files that will be used later for inference. Below is a snapshot of my training progress.

Snapshot from self-trained model results

2.2. Inference using the trained model:

1. The authors provide two scripts — pretrained_example.py and generate_figures.py — to generate new fakes using our trained model. Upload your trained model to Google Drive and get the corresponding model file link.

2. pretrained_example.py — Using this script we can generate fakes from different seed values. The changes required to the file are shown below –

3. generate_figures.py — This script generates all the sample figures shown in the StyleGAN paper. Change the model URL in the file, and if you trained on images of a different resolution, make the changes shown below. Suppose, for example, you trained the model on 512×512 images.

2.3. Important mention:

The stylegan-encoder repository lets you apply style mixing to real-world test images rather than seeds. Use its Jupyter notebook to implement some style mixing and play with latent directions.

2.4 Further reading:

Check out the two-part lecture series on StyleGAN by ML Explained — A.I. Socratic Circles — AISC for further insights:

StyleGAN 2:

Paper: Analyzing and Improving the Image Quality of StyleGAN

Implementation: https://github.com/NVlabs/stylegan2

3.1. Introduction

StyleGAN surpassed the expectations of many researchers by creating astonishing high-quality images, but analysis of the results revealed some issues. Let us dive into the pain points first and then look at the changes made in StyleGAN2.

Issues with StyleGAN-

1. Blob (droplet)-like artifacts: the resulting images contain unwanted noise blobs that appear in different locations. On investigation it was found that they originate within the synthesis network, starting from the 64×64 feature maps and finally propagating into the output images.


This problem occurs due to the normalization layer (AdaIN). “When a feature map with a small spike-type distribution comes in, even if the original value is small, the value will be increased by normalization and will have a large influence.” The authors confirmed this by removing the normalization part and analyzing the results.

2. Phase artifacts: the progressive nature of the GAN is the flag bearer for this issue. The multiple intermediate outputs generated during progressive growing cause high-frequency feature maps to appear in the middle layers, compromising shift invariance.


3.2. StyleGAN2 — Discussing major model improvements

StyleGAN2 brings several architectural changes to rectify the issues faced earlier. Below are the different configurations available –


1 Weight Demodulation:

StyleGAN2, like StyleGAN, uses a normalization technique to infiltrate styles from the w vector (via the learned transform A) into the source image, but now the droplet artifacts are taken care of. Weight demodulation was introduced for this purpose. Let us investigate the changes made –


The first image (a) above shows the synthesis network from StyleGAN, with its two main inputs, the affine transformation (A) and the input noise (B), applied at each layer. The next image (b) expands the AdaIN operation into its respective normalization and modulation modules. Each style (A) has also been separated into its own style block.

Let us discuss the changes in the next iteration (image c) –


  • First, the constant input (c) is fed directly as the model input, rather than a modified input with noise and bias added.
  • Second, the noise and bias are removed from the style block and moved outside it.
  • Finally, only the standard deviation per feature map is modified, rather than both the mean and the standard deviation.

Next, we add the demodulation module (image d) to remove the droplet artifacts.


As seen in image above, we transform each style block by two operations-

1. Combine the modulation and convolution operations (Mod) by directly scaling the convolution weights, rather than applying modulation first and convolution afterwards.

w'_ijk = s_i · w_ijk (Source)

Here w is the original weight, w' the modulated weight, and s_i the scaling value corresponding to input feature map i.

2. Next is the demodulation step (Demod): here we scale the weights of each output feature map j by the (statistically assumed) standard deviation of its output activations from the step above, before performing the convolution.

w''_ijk = w'_ijk / √( Σ_i,k (w'_ijk)² + ε ) (Source)

A small ε is added to avoid numerical issues. The entire style block is now integrated into a single convolution layer whose weights are updated as described above. These changes improve training time, generate finer results, and, most importantly, remove the blob-like artifacts.
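Following the two weight equations above, a small NumPy sketch of modulation plus demodulation (the tensor shapes and style values are illustrative, not from the official implementation):

```python
import numpy as np

# Sketch of StyleGAN2 weight modulation + demodulation:
#   Mod:   w'_ijk  = s_i * w_ijk
#   Demod: w''_ijk = w'_ijk / sqrt(sum over i,k of (w'_ijk)^2 + eps),
# where the sum is taken per output feature map j.

def modulate_demodulate(weights, styles, eps=1e-8):
    # weights: (out_channels j, in_channels i, kernel k) conv weights
    # styles:  (in_channels,) per-input-feature-map scales s_i from the style
    w_mod = weights * styles[None, :, None]               # Mod: scale by s_i
    norm = np.sqrt((w_mod ** 2).sum(axis=(1, 2)) + eps)   # per output map j
    return w_mod / norm[:, None, None]                    # Demod

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 4, 9))        # 8 output maps, 4 input maps, 3x3 flattened
s = rng.standard_normal(4) ** 2 + 0.5     # positive per-input scales
w_dd = modulate_demodulate(w, s)
```

After demodulation each output feature map's weight vector has (near-)unit norm, which is what keeps activation magnitudes under control without an explicit normalization layer.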

2 Lazy Regularization:

StyleGAN's cost function includes computing both the main loss function and a regularization term for every mini-batch. This has a heavy memory and computation cost, which can be reduced by computing the regularization term only once every 16 mini-batches. This strategy caused no drastic change in model efficiency and was therefore implemented in StyleGAN2.
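A toy sketch of this lazy schedule: the regularizer runs once every 16 mini-batches and is scaled up by the interval so its overall contribution stays roughly the same (loss values here are placeholders):

```python
# Sketch of lazy regularization: the expensive regularization term is
# evaluated only every reg_interval mini-batches (16 in the paper), scaled
# up by the interval to compensate for the skipped evaluations.

def training_losses(n_minibatches, reg_interval=16):
    reg_evaluations = 0
    for step in range(n_minibatches):
        main_loss = 1.0                      # placeholder for the real GAN loss
        if step % reg_interval == 0:
            reg = 0.1 * reg_interval         # scaled regularization term
            reg_evaluations += 1
            main_loss += reg
    return reg_evaluations

evals = training_losses(64)   # 64 mini-batches -> regularizer computed 4 times
```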

3 Path Length Regularization:

It is a type of regularization that encourages good conditioning in the mapping from latent codes to images. The idea is that a fixed-size step in the latent space W should result in a non-zero, fixed-magnitude change in the image. For a detailed explanation, please refer to –

Papers with Code – Path Length Regularization Explained

Path Length Regularization is a type of regularization for generative adversarial networks that encourages good…

paperswithcode.com

4 Removing Progressive growing:

The progressive nature of StyleGAN contributes to the phase artifacts, wherein output images have a strong location preference for facial features. StyleGAN2 tries to retain the benefits of progressive growing (training stability for high-res images) while implementing a new network design based on skip connections and residual blocks, much like ResNet.

This new network does not expand to increase image resolution and yet produces comparable results. It is similar to MSG-GAN, which also uses multiple skip connections. “Multi-Scale Gradients for Generative Adversarial Networks” by Animesh Karnewar and Oliver Wang showcases an interesting way to utilize multi-scale generation within a single end-to-end architecture.

Below is the actual architecture of MSG-GAN, with the residual connections between the G and D networks.


StyleGAN2 makes use of the different-resolution feature maps generated within the architecture and uses skip connections to connect low-res feature maps to the final generated image. Bilinear up/down-sampling is used within the G and D networks.
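A toy sketch of how such a skip-connection generator could compose its output: each resolution block contributes an RGB image that is added to the upsampled running result. Nearest-neighbour upsampling on 1-D lists stands in for bilinear upsampling on real feature maps; this is purely illustrative:

```python
# Sketch of skip-connection output composition: low-res RGB contributions
# are upsampled and summed with each higher-res contribution in turn.

def upsample2x(img):
    return [v for v in img for _ in (0, 1)]        # nearest-neighbour, doubles length

def compose_output(rgb_contributions):
    img = None
    for rgb in rgb_contributions:                  # lowest resolution first
        img = rgb if img is None else [a + b for a, b in zip(upsample2x(img), rgb)]
    return img

# Contributions of lengths 4, 8, 16 (standing in for 4x4, 8x8, 16x16 blocks).
out = compose_output([[1.0] * 4, [0.5] * 8, [0.25] * 16])
```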


To find the optimal network, several G/D network combinations were tried; below are the results on the FFHQ and LSUN datasets.


Result analysis –

  1. PPL values improve drastically in all combinations where the G network has skip connections.
  2. A residual D network combined with a skip-connection G output gives the best FID and PPL values and is the combination mostly used. This pairing of networks is configuration E of StyleGAN2.

5 Large Networks:

With all the above model configurations explained, we now look at the influence of the high-res layers on the resulting image. Configuration E yields the best results on both metrics, as seen in the last section. The image below displays the contribution of the different resolution layers during training towards the final output images.

The vertical axis shows the contribution of the different resolution layers, and the horizontal axis depicts training progress. In the best StyleGAN2 configuration (Config-E), the major contribution comes from the 512-resolution layers and less from the 1024 layers; the 1024-res layers mostly add finer details.


In the general training flow, low-res layers dominate the output initially, but eventually the final high-res layers govern the output. In Config-E the 512-res layers seem to contribute the most, and this shows in the output. To get finer results from training, we need to increase the capacity of the 1024-res layers so that they contribute more to the output.

Config-F is such a larger network: it increases the number of feature maps in the high-res layers and thereby improves the quality of the resulting images.

Training StyleGAN2

3.3. Let us have a look at training StyleGAN2 on a custom dataset:

Pre-requisites — TensorFlow 1.14 or 1.15 with GPU support, Keras version <= 2.3.1. Other requirements are nominal and can be checked in the official repository.

Below are the steps to be followed –

1. Clone the repository — stylegan2. Read the instructions in the README file to verify the initial setup of the GPU and TensorFlow version.

2. Prepare the dataset (use only square images with power-of-two sides) as we did for StyleGAN training and place it in the cloned folder. Now let us generate the multi-resolution TFRecords for our images. Below is the command –

3. Running the training script: we do not need to change the training file; instead, we can specify our parameters directly in the command.

4. As with StyleGAN, our results are stored in the ./results/../ directory, where we can find the model files (.pkl) and the intermediate fakes. Using the network-final.pkl file we will try generating some fakes with random seeds as input.

3.4. Inference with random seeds:

  1. Upload your trained model to Drive and get the download link to it.
  2. We will be using run_generator.py file for generating fakes and style-mixing results.

In the first command we provide seeds from 6600–6625, which generates 25 fake samples from our model, one for each seed value. We can change this range to get the desired number of fakes.

Similarly, for style mixing there are row-seed and col-seed inputs, which generate the images we want to style-mix between. Change the seeds and we will get different images each time.

3.5 Results:

This sums up the StyleGAN2 discussion, covering the important architectural changes and the training procedure. Below are the generated results, which are free of the issues faced by the StyleGAN model.


Check out this great StyleGAN2 explanation by Henry AI Labs YouTube channel –

Thanks for going through the article. With my first article, I have attempted to cover an important topic in the vision area. If you find any error in the details, please feel free to highlight it in the comments.

References –

Custom code snippets: Create beautiful code snippets in different programming languages at https://carbon.now.sh/. All above snippets were created from the same website.

ProGAN

How to Implement Progressive Growing GAN Models in Keras – Machine Learning Mastery

The progressive growing generative adversarial network is an approach for training a deep convolutional neural network…

machinelearningmastery.com

A Gentle Introduction to the Progressive Growing GAN – Machine Learning Mastery

Progressive Growing GAN is an extension to the GAN training process that allows for the stable training of generator…

machinelearningmastery.com

StyleGAN 

A Gentle Introduction to StyleGAN the Style Generative Adversarial Network – Machine Learning…

Generative Adversarial Networks, or GANs for short, are effective at generating large high-quality images. Most…

machinelearningmastery.com

StyleGAN — Style Generative Adversarial Networks — GeeksforGeeks

Generative Adversarial Networks (GAN) was proposed by Ian Goodfellow in 2014. Since its inception, there are a lot of…

www.geeksforgeeks.org

StyleGANs: Use machine learning to generate and customize realistic images

Switch up your style and let your imagination run free by unleashing the power of Generative Adversarial Networks

heartbeat.fritz.ai

Explained: A Style-Based Generator Architecture for GANs – Generating and Tuning Realistic…

NVIDIA’s novel architecture for Generative Adversarial Networks

towardsdatascience.com

StyleGAN2

StyleGAN2

This article explores changes made in StyleGAN2 such as weight demodulation, path length regularization and removing…

towardsdatascience.com

GAN — StyleGAN & StyleGAN2

Do you know your style? Most GAN models don’t. In the vanilla GAN, we generate an image from a latent factor z.

medium.com

From GAN basic to StyleGAN2

This post describes GAN basic, StyleGAN, and StyleGAN2 proposed in “Analyzing and Improving the Image Quality of…

medium.com

Chatbot in Python-Part 1

According to Gartner, “by 2022, 70% of white-collar workers will interact with conversational platforms daily.”

According to an estimate, more than 67% of consumers worldwide used a chatbot for customer support in the past year, and around 85% of all customer interactions will be handled without a human agent by 2020.

These are not mere estimates but the reality. In today’s world most of us in our business or daily activities are dependent on virtual assistants.

Consider a scenario where a customer wants to fix an issue and lands on a chatbot where, just by describing the issue, they can find a suitable fix. Even if a suitable fix is not available, they can create a ticket for it and track its progress. In this way, the ITOA task becomes seamless and hassle-free.

Chatbots have not only made this possible but have opened a new gateway to dealing with time-consuming tasks with great ease and customer satisfaction.

In this post (Part 1), we are going to see how we can develop chatbots.

Many frameworks can be used for chatbot development; in this article we focus on developing a chatbot in Python using the Microsoft Bot Framework.

Before diving into development, let us first set up all the requirements for developing a chatbot using the Bot SDK.

  1. Python 3.6 or 3.7
  2. Bot Framework emulator
  3. git (for version control)

Setting up an environment for development

Let us start by setting up a virtual environment in Python. For this, we need to install virtualenv using the pip command:

Install virtualenv

Once virtualenv is installed successfully, create a virtual environment by using the following command:

Activate the virtual environment

For windows

Activate the virtual environment in windows

Installing the required dependencies

  1. To develop a bot locally in Python, some packages such as botbuilder-core, asyncio, and cookiecutter need to be installed.

Installing required packages

2. The Microsoft Bot Framework provides predefined templates to get started quickly. In this step, we use cookiecutter to install the echo bot template, one of the basic bot service templates.

Downloading the pre-defined bot service template

3. We now navigate to the WeChat folder where we saved our bot and install all the dependencies mentioned in requirements.txt that are required to run our bot.

4. After installing all the dependencies, we finally run our predefined bot template by running the app.py file present in the WeChat folder.

Running echo bot template

5. We open the Bot Framework Emulator and connect to localhost on port 3978.

Bot echoing back the response given by the user

This finishes our steps of installation and running our bot template.

But wouldn't it be great if, as soon as a new conversation starts or a user is added to the conversation, the bot sent a message on its own letting the user know about its features and capabilities? After all, the primary objective of a bot is to engage your user in a meaningful conversation.

So, now we will understand how we can add a welcome message to our existing bot template.

But wait: before that, it is important to learn how a bot uses activity objects to communicate with its users. So let us first look at the activities that are exchanged when we run a simple echo bot.

Bot working process

Two activity types take place as soon as a user joins the bot and sends input to it (refer to the image above):

  1. ConversationUpdate — the bot sends an update when a new user or a bot joins the conversation.
  2. Message — the message activity carries conversation information between the end user and the bot. In our echo bot's case, it is simple text, which the channel renders. Alternatively, a message activity can be text to be spoken, a suggested action, or a card.

From the above diagram, we notice that in a conversation people may speak one at a time, taking “turns”. So how do we implement this in a bot?

In the Bot Framework, a turn consists of the user's incoming activity to the bot and the activity the bot sends back to the user as an immediate response. But the question is how the bot handles the incoming activity and decides which response to send. This is where the bot activity stack comes in, where we see how a bot handles the arrival of an incoming message activity.

When the bot receives an activity, it passes it on to its activity handlers, under which lies one base handler called the turn handler. All activities get routed through there. The turn handler sees an incoming message activity and sends it to the OnMessageActivityAsync activity handler.

When we create the bot, the logic for handling and responding to messages goes in the OnMessageActivityAsync handler, and the logic for handling members being added to the conversation goes in the OnMembersAddedAsync handler, which is called whenever a member is added to the conversation. I have discussed these two handlers only briefly; to know more about them, please follow this link.

So, to send a welcome message to every new user added to the conversation, we need to put our logic in OnMembersAddedAsync. For that, we create a property welcome_user_state.did_welcome_user to track whether the user has already been greeted with a welcome message. Whenever new user input is received, we check this property: if it is not set to true, we send an initial welcome message to the user. If it is set to true, then based on the content of the user's input the bot will do one of the following:

  • Echo back a greeting received from the user.
  • Display a hero card providing additional information about bots.
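This turn logic can be sketched framework-agnostically as follows; the function and state names here are illustrative stand-ins, not Bot Framework APIs (the real bot would run this inside its activity handlers):

```python
# Sketch of the welcome logic: a did_welcome_user flag in conversation state
# decides whether a turn produces the initial welcome or handles the input.

def handle_turn(state, user_text):
    if not state.get("did_welcome_user", False):
        state["did_welcome_user"] = True
        return "Welcome! I can echo greetings or tell you more about bots."
    text = user_text.strip().lower()
    if text in ("hello", "hi"):
        return f"Echo: {user_text}"              # echo back a greeting
    return "[hero card with additional information about bots]"

state = {}                                        # per-conversation state
first = handle_turn(state, "hi")                  # welcome, regardless of input
second = handle_turn(state, "hi")                 # greeting -> echoed back
third = handle_turn(state, "tell me more")        # anything else -> info card
```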

Instead of sending a normal text message to the user, we are going to send an Adaptive Card as the welcome message.

In this blog post I am not discussing Adaptive Cards in depth, but I suggest reading this article to know more about them.

So, for now, we need to create an adaptive card and send it as an activity to the user: once a new user has been welcomed, the user's input is evaluated on each message turn, and the bot provides a response based on the context of that input.

Code snippet to send a welcome message to the new user

Now we run the above code to see whether the user is greeted with a welcome message when a new user is added to a conversation.

Bot Emulator Snippet

Conclusion:

In this article, we have seen how we can use Python with the Microsoft Bot Framework to create an efficient chatbot, and how to send a welcome message to a new user.

In the upcoming articles, we will learn how to implement the concepts of Dialogs, intents, and entities using LUIS and Microsoft Cognitive Services.

To check out the above code, follow the link to my Github repo.

Demystifying the struggles of adopting AI in the Manufacturing Sector

“For half of the businesses in the Manufacturing sector, AI adoption is still an unexplored area with a hand full of complex workflows and a mind full of uncertain queries.”

A recent study conducted by the MAFI Foundation revealed that only 5% of manufacturing firms have succeeded in identifying AI opportunities and have a roadmap ready to capitalize on them. Within this context, it pays to identify the reasons for the lower AI adoption rate in the manufacturing sector and how advances in AI practice are helping change this narrative so that firms embrace technological change.

Top 3 reasons for the lower AI adoption rate in the Manufacturing sector

  1. Lack of identified organizational imperatives: It is an accepted truism that people are at the center of executing any strategic vision, but such a truism holds only when a vision exists in the first place. In the current scenario, over half of the firms in the manufacturing sector have indicated that they do not even have a plan underway to integrate AI into their value-creation paradigm. Leaders in the manufacturing sector can use this opportunity to step up and create a new vision to take their business to the next level. Solution approach: The effort might require businesses to invest in capacity building, training, confronting the organization's culture, striking out to find new partnerships, and creating plans for their data assets. The result, however, will be a nimble, data-driven organization with an upgraded arsenal of analytical tools that can succeed even under the most challenging conditions.
  2. Underlying mismatch of expectations in the AI adoption process: The second piece of the puzzle centers on the mismatch in expectations on AI within the manufacturing sector. Expectations of how AI can be developed and implemented within manufacturing companies currently vary widely, from excessive optimism to complete pessimism. Meanwhile, the domain of AI itself continues to evolve rapidly, with new infrastructures and services coming to life thanks to competition among a dazzling array of technology players across the world. Solution approach: There is a growing need for analytics translation across organizations, where the expertise to understand and communicate advanced analytical insights to a variety of stakeholders becomes a key factor in successful AI adoption. Such translators may emerge within or outside the organization and bridge the gap between mathematics, cutting-edge computing, and the business's balance sheet. Their enduring value comes from shaping a data-driven culture that eventually enables new paradigms of decision making for firms.
  3. Ability to create and sustain the value proposition: The third reason revolves around context. Any tool or decision can only be useful and applied correctly when it suits the context. The success of AI adoption within different manufacturing firms depends on the ability to create and sustain value right from the beginning; it requires the right data to be matched with the right problem before the right solution makes an appearance. Solution approach: Given the differential levels of technical debt that accumulate within firms over the years, integrating efforts with in-house systems and seamlessly interlinking data flows enables AI adoption to be fluid and highly contextual. It also requires an understanding of the business that goes beyond traditional management consulting or generic IT-based solution offerings.

For any growing business in the manufacturing sector that wants to accelerate AI deployment, the right partnerships, built on a shared contextual understanding, will go a long way in mitigating adoption risks. Identifying the pain points and solving technological problems with an AI mindset is therefore paramount and can do wonders for scaling the manufacturing business.

Gradient Boosting Trees for Classification: A Beginner’s Guide

Introduction

Improving accuracy in machine learning takes more than just fitting models and making predictions. Nowadays, most winning models in industry and in competitions use Ensemble Techniques to perform better. One such technique is Gradient Boosting.

This article will mainly focus on understanding how Gradient Boosting Trees work for classification problems. We will also discuss some important parameters, as well as the advantages and disadvantages associated with this method. Before that, let us get a brief overview of Ensemble methods.

What are Ensemble Techniques?

Bias and Variance — While building any model, our objective is to optimize both variance and bias, but in real-world scenarios one usually comes at the cost of the other. It is important to understand the trade-off and figure out what suits our use case.

Ensembles are built on the idea that a collection of weak predictors, when combined, gives a final prediction that performs much better than any individual one. Ensembles can be of two types —

i) Bagging — Bootstrap Aggregation or Bagging is a ML algorithm in which a number of independent predictors are built by taking samples with replacement. The individual outcomes are then combined by average (Regression) or majority voting (Classification) to derive the final prediction. A widely used algorithm in this space is Random Forest.

ii) Boosting — Boosting is a ML algorithm in which the weak learners are converted into strong learners. Weak learners are classifiers which always perform slightly better than chance irrespective of the distribution over the training data. In Boosting, the predictions are sequential wherein each subsequent predictor learns from the errors of the previous predictors. Gradient Boosting Trees (GBT) is a commonly used method in this category.

Fig 1. Bagging (independent predictors) vs. Boosting (sequential predictors)

Performance comparison of these two methods in reducing bias and variance — Bagging averages many uncorrelated trees in the final model, which mainly helps in reducing variance. Boosting builds trees sequentially, with its focus on bridging the gap between the actual and predicted values by reducing residuals, so it primarily reduces bias; taken in small steps (a low learning rate), it can also keep variance under control.

Fig 2. Gradient Boosting Algorithm

What is Gradient Boosting?

It is a technique for producing an additive predictive model by combining various weak predictors, typically Decision Trees.

Gradient Boosting Trees can be used for both regression and classification. Here, we will use a binary outcome model to understand the working of GBT.

Classification using Gradient Boosting Trees

Suppose we want to predict whether a person has a Heart Disease based on Chest Pain, Good Blood Circulation and Blocked Arteries. Here is our sample dataset —

Fig 3: Sample Dataset

I. Initial Prediction — We start with a leaf which represents an initial prediction for every individual. For classification, this is equal to the log(odds) of the dependent variable. Since there are 4 people with heart disease and 2 without, the log(odds) is:

log(odds) = log(4/2) ≈ 0.7

Next, we convert this to a probability using the Logistic Function:

p = e^log(odds) / (1 + e^log(odds)) = e^0.7 / (1 + e^0.7) ≈ 0.7

If we consider the probability threshold as 0.5, this means that our initial prediction is that all the individuals have Heart Disease, which is not the actual case.

II. Calculate Residuals — We will now calculate the residuals for each observation by using the following formula,

Residual = Actual value — Predicted value

where Actual value = 1 if the person has Heart Disease and 0 if not, and Predicted value = 0.7

The final table after calculating the residuals is the following —

Fig 4. Sample dataset with calculated residuals
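Steps I and II can be reproduced in a few lines of Python. The labels below are illustrative, chosen only to match the counts in the text (four people with heart disease, two without):

```python
import math

y = [1, 1, 1, 1, 0, 0]  # 1 = heart disease, 0 = no heart disease

# Step I: initial prediction = log(odds) of the positive class
log_odds = math.log(sum(y) / (len(y) - sum(y)))        # log(4/2)

# Convert to a probability with the logistic function (~0.67, rounded to 0.7 in the text)
p0 = math.exp(log_odds) / (1 + math.exp(log_odds))

# Step II: residual = actual value - predicted value
residuals = [yi - p0 for yi in y]
print(round(log_odds, 2), round(p0, 2))
print([round(r, 2) for r in residuals])
```

Note how every positive sample gets the same positive residual and every negative sample the same negative one, since the initial prediction is a single leaf.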

III. Predict residuals — Our next step involves building a Decision Tree to predict the residuals using Chest Pain, Good Blood Circulation and Blocked Arteries.

Here is a sample tree constructed —

Fig 5. Decision Tree — While constructing the tree, its maximum depth has been limited to 2

Since the number of residuals is more than the number of leaves, we can see that some residuals have ended up in the same leaf.

How do we calculate the predicted residuals in each leaf?

The initial prediction was in terms of log(odds), while the leaves are derived from probabilities. Hence, we need a transformation to express the predicted residuals in terms of log(odds). The most common transformation is the following —

Predicted residual = Σ residualᵢ / Σ [pᵢ × (1 − pᵢ)]

where pᵢ is the previously predicted probability for each observation in the leaf.

Applying this formula to the first leaf, we get a predicted residual of 1.43.

Similarly, we calculate the predicted residuals for the other leaves and the decision tree will finally look like —

Fig 6. Modified Decision Tree
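The leaf-output transformation can be sketched as a small helper. The residual values below are illustrative, consistent with the example in which every observation starts with a predicted probability of about 0.7:

```python
# Leaf output = sum of residuals / sum of p*(1-p) over the leaf's observations,
# where p is the previously predicted probability for each observation.
def leaf_output(residuals, prev_probs):
    return sum(residuals) / sum(p * (1 - p) for p in prev_probs)

# A hypothetical leaf holding two residuals of 0.3, previous probability 0.7 each
out = leaf_output([0.3, 0.3], [0.7, 0.7])
print(round(out, 2))  # 0.6 / 0.42 -> 1.43
```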

IV. Obtain new probability of having a heart disease — Now, let us pass each sample in our dataset through the nodes of the newly formed decision tree. The predicted residual obtained for each observation will be added to the previous prediction to produce an updated prediction of whether the person has a heart disease.

In this context, we introduce a hyperparameter called the Learning Rate. The predicted residual will be multiplied by this learning rate and then added to the previous prediction.

Why do we need this learning rate?

It prevents overfitting. Introducing the learning rate means taking smaller steps towards the final solution, and hence building more Decision Trees. These small incremental steps help us achieve comparable bias with a lower overall variance.

Calculating the new probability

Let us consider the second observation in the dataset. Since Good Blood Circulation = “Yes” followed by Blocked Arteries = “Yes” for this person, it ends up in the first leaf, with a predicted residual of 1.43.

Assuming a learning rate of 0.2, the new log(odds) prediction for this observation will be:

new log(odds) = 0.7 + (0.2 × 1.43) ≈ 0.99

Next, we convert this new log(odds) into a probability value:

p = e^0.99 / (1 + e^0.99) ≈ 0.73

A similar computation is done for the rest of the observations.
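Step IV for the second observation can be checked numerically (0.7 is the rounded initial log(odds) from step I, and 1.43 is the first leaf's predicted residual):

```python
import math

prev_log_odds = 0.7      # initial prediction, rounded
learning_rate = 0.2
leaf_residual = 1.43     # predicted residual of the first leaf

new_log_odds = prev_log_odds + learning_rate * leaf_residual   # ~0.99
new_p = math.exp(new_log_odds) / (1 + math.exp(new_log_odds))  # ~0.73
print(round(new_log_odds, 2), round(new_p, 2))
```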

V. Obtain new residuals — After obtaining the predicted probabilities for all the observations, we will calculate the new residuals by subtracting these new predicted values from the actual values.

Fig 7. Sample dataset with residuals calculated using new predicted probabilities

Now that we have the new residuals, we will use them to build the next decision tree, as described in step III.

VI. Repeat steps III to V until the residuals converge to a value close to 0 or the number of iterations reaches the value supplied as a hyperparameter when running the algorithm.

VII. Final Computation — After we have calculated the output values for all the trees, the final log(odds) prediction that a person has a heart disease will be:

final log(odds) = initial log(odds) + learning rate × (predicted residual₁ + predicted residual₂ + predicted residual₃ + …)

where the subscript of each predicted residual denotes the i-th tree, i = 1, 2, 3, …

Next, we need to convert this log(odds) prediction into a probability by plugging it into the logistic function.

Using the common probability threshold of 0.5 for making classification decisions, if the final predicted probability that the person has a heart disease > 0.5, then the answer will be “Yes”, else “No”.
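The final computation can be sketched end-to-end; the per-tree leaf outputs below are illustrative values, not taken from the worked example:

```python
import math

initial_log_odds = 0.7
learning_rate = 0.2
tree_outputs = [1.43, 0.9, 0.5]   # hypothetical leaf outputs, one per tree

final_log_odds = initial_log_odds + learning_rate * sum(tree_outputs)
p = math.exp(final_log_odds) / (1 + math.exp(final_log_odds))
prediction = "Yes" if p > 0.5 else "No"   # 0.5 classification threshold
print(round(p, 2), prediction)
```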

Hope these steps helped you understand the intuition behind the working of GBT for classification!

Let us now keep a note of a few important parameters, pros, and cons of this method.

Important Parameters

While constructing any model using GBT, the values of the following can be tweaked for improvement in model performance –

  • number of trees (n_estimators; def: 100)
  • learning rate (learning_rate; def: 0.1) — Scales the contribution of each tree, as discussed before. There is a trade-off between the learning rate and the number of trees. Commonly used learning-rate values lie between 0.1 and 0.3
  • maximum depth (max_depth; def: 3) — Maximum depth of each estimator. It limits the number of nodes of the decision trees

Note — The names of these parameters in the Python (scikit-learn) environment, along with their default values, are mentioned within brackets
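A minimal scikit-learn example with these parameters (the dataset is synthetic, purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, just to exercise the parameters
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3, random_state=42)
gbt.fit(X_tr, y_tr)
print("test accuracy:", round(gbt.score(X_te, y_te), 3))
```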

Advantages

  • It is a generalised algorithm that works for any differentiable loss function
  • It often provides predictive scores better than most other algorithms
  • Some implementations (e.g., XGBoost, LightGBM) can handle missing data natively, so imputation is not always required

Disadvantages

  • This method is sensitive to outliers. Outliers will have much larger residuals than non-outliers, so gradient boosting will focus a disproportionate amount of its attention on those points. Using Mean Absolute Error (MAE) instead of Mean Squared Error (MSE) to calculate the error can help reduce the effect of these outliers, since the latter gives more weight to larger differences. The ‘criterion’ parameter lets you choose this function.
  • It is prone to overfitting if the number of trees is too large. The ‘n_estimators’ parameter can help determine a good point to stop before the model starts overfitting
  • Computation can take a long time. Hence, if you are working with a large dataset, consider training the model on a sample of the data (keeping the odds ratio of the target variable the same).

Conclusion

Although GBT is being widely used nowadays, many practitioners still use it as a complex black-box and just run the models using pre-built libraries. The purpose of this article was to break down the supposedly complex process into simpler steps and help you understand the intuition behind the working of GBT. Hope the post served its purpose !


Revealing the Secrets of Creating New Game Characters Without Using Paintbrush

Introduction

In the gaming industry, studios are responsible for the design and development of a game. A game designer is responsible for providing multiple design ideas for various in-game components like characters, maps, scenes, weapons, etc. To develop a single character, a designer has to factor in multiple attributes like face morphology, gender, skin tone, clothing accessories, expressions, etc., leading to a long and tedious development cycle. To minimize this complexity, we aim to identify tools and techniques that combine the automation power of machines to generate designs within certain guard rails defined by the designers. This approach is a path towards machine creativity with human supervision. From a business point of view, the studios get more design options to select from within a short span of time, leading to huge cost savings.

Our solution takes advantage of advanced deep learning models like GANs, which have been proven to work extremely well in generative tasks (generating new data instances) such as image generation, image-to-image translation, voice synthesis, text-to-image translation, etc.

In this whitepaper, we explore the effectiveness of GANs for a specific use case in the gaming industry. The objective of using GAN in this use case is to create new Mortal Kombat (MK) game characters through style transfer on new or synthesized images and conditional GANs to generate only the characters of interest.

GAN & its clever way of training a generative model for image generation!

GAN has two competing neural networks, namely a Generator (G) and a Discriminator (D). In the case of image generation, the goal of the generator is to generate images (the fake images) that are indistinguishable from the training images (the real images). The goal of the discriminator is to classify between the fake and real images. The training process aims to make the generator fool the discriminator, so that the generated fake images become as realistic as the real ones.

Following are the highlights of the GANs solution framework to create new Mortal Kombat Game characters without using a paintbrush:

i) A detailed analysis of the effectiveness of GANs at generating new MK characters, in terms of the image quality produced (subjective evaluation) and FID distances (objective evaluation).

ii) Results of style mixing using MK characters, performed with the trained model.

iii) Experimental evaluation using the Mortal Kombat dataset (a custom dataset of 25,000 images). The training time is captured to understand the computational resources required to achieve the desired performance.

Types of GAN and their role in creating Mortal Kombat realistic characters

GANs are an advanced and rapidly evolving field backed by unsupervised machine learning methodology. To use GANs effectively to create realistic Mortal Kombat characters, it is vital to comprehend their architecture and the different types in use to get near-perfect results. In this section, we discuss the types of GAN, followed by the architectural details of StyleGANs.

Types of GAN

i) GAN (or vanilla GAN) [Goodfellow et al. 2014] — GAN belongs to a class of methods for learning generative models based on game theory. There are two competing neural networks, a Generator (G) and a Discriminator (D). The goal of GAN is to train the generator network to produce a sample distribution that mimics the distribution of the training data. The training signal for G is provided by the discriminator D, which is trained to classify real and fake images. The following is the cost function of GAN.

Cost function:

min_G max_D V(D, G) = E_x∼p_data(x)[log D(x)] + E_z∼p_z(z)[log(1 − D(G(z)))]

The min/max cost function trains D to minimize the probability it assigns to data generated by G (fake data) and maximize the probability it assigns to the training data (real data). Both G and D are trained in alternation using the Stochastic Gradient Descent (SGD) approach.

ii) Progressive Growing GAN (ProGAN) — ProGANs (Karras et al., 2017) are capable of generating high-quality photorealistic images: training starts by generating very small 4*4 images and grows progressively through 8*8, … up to 1024*1024, until the desired output size is reached. The training procedure consists of cycles of fine-tuning and fading-in: periods of fine-tuning the model at the current generator output size, followed by periods of fading in new layers in both G and D.

iii) StyleGANs — StyleGANs (Karras et al., 2019) are based on ProGANs, with minimal changes in the architectural design that equip them to demarcate and control high-level features (like pose and face shape in a human face) and low-level features (like freckles, pigmentation, skin pores, and hair). The synthesis of new images is controlled through the inclusion of high-level and low-level features, executed by a style-based generator. In a style-based generator, the input to each level is modified separately, giving better control over the features expressed at that level.

Various changes are incorporated in the StyleGAN generator architecture to synthesize photorealistic images: bilinear up-sampling, a mapping network, Adaptive Instance Normalization (AdaIN), removal of the latent point input, the addition of noise, and mixing regularization. The intent of using StyleGAN here is to separate style content from image content.

(Note: For generating MK characters, we have used styleGAN architecture, and the architectural details are provided in the next section.)

iv) Conditional GANs

Suppose we need to generate new Mortal Kombat characters based on a description provided by end-users. In vanilla GANs, we don’t have control over the type of data generated. The purpose of conditional GANs [Mirza & Osindero, 2014] is to control the images generated by the generator through conditional information given to it. Providing label information (face mask, eye mask, gender, hat, etc.) restricts the generator to synthesizing the kind of images the end-user wants, i.e. “content creation based on the description”.

The cost function of conditional GAN is given below:

Cost function:

min_G max_D V(D, G) = E_x∼p_data(x)[log D(x|y)] + E_z∼p_z(z)[log(1 − D(G(z|y)))]

This conditional information is supplied as a “prior” to the generator. In other words, we give an arbitrary condition ‘y’ to the GAN network, which constrains both what G generates and what the discriminator receives as input. The following figure depicts the cGAN inputs and output.

Fig 1. A- Random generation of MK characters using Random Noise vector as input, B-Controlled generation of MK characters using Random Noise vector and labeled data as input (Condition-Create MK characters with features as Male, Beard, Moustache)

Architectural details of StyleGANs

Following are the architectural changes in styleGANs generator:

  1. The discriminator D of a styleGAN is similar to the baseline progressive GAN.
  2. The generator G of a styleGAN uses baseline proGAN architecture, and the size of generated images starts from 4*4 resolution to 1024*1024 resolution on the incremental addition of layers.
  3. Bi-linear up/down-sampling is used in both discriminator and generator.
  4. Introduction of a mapping network, used to transform the input latent vector space into an intermediate latent space (w). This is done to disentangle the style and content features. The mapping function is implemented as an 8-layer Multi-Layer Perceptron (8 FC in Fig 2).
  5. The output of the mapping network is passed through a learned Affine transformation (A), that transforms intermediate latent space w to styles y=(ys, yb) that controls the Adaptive Instance Normalization module of the synthesis network. This style vector gives control to the style of the generated image.
  6. The input to AdaIN is y = (ys, yb), generated by applying the affine transformation to the output of the mapping network. The AdaIN operation is defined by the following equation:

AdaIN(xᵢ, y) = y_s,i × (xᵢ − μ(xᵢ)) / σ(xᵢ) + y_b,i

Each feature map xᵢ is normalized separately, then scaled using the scalar ys component and biased using the scalar yb component of y. The synthesis network contains 18 convolutional layers: two for each of the nine resolutions from 4×4 to 1024×1024.

7. The input is a constant matrix of dimension 4*4*512. Rather than taking a point from the latent space as input, two sources of randomness are used in generating the images: the mapping network and the noise layers. The noise vectors introduce stochastic variation into the generated image.
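The AdaIN operation itself is simple enough to sketch with NumPy (shapes and style values here are toy assumptions, not StyleGAN's actual dimensions):

```python
import numpy as np

def adain(x, ys, yb, eps=1e-5):
    """Normalize each feature map of x (channels, H, W), then scale by ys
    and shift by yb, one scalar pair per channel."""
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    return ys[:, None, None] * (x - mu) / (sigma + eps) + yb[:, None, None]

x = np.random.default_rng(0).normal(size=(2, 4, 4))   # 2 toy feature maps
out = adain(x, ys=np.array([2.0, 0.5]), yb=np.array([1.0, -1.0]))
print(out.shape)   # each map now has mean ~yb and std ~ys
```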

a) ProGAN generator

b) StyleGAN generator

Fig 2. a-ProGAN generator, b-StyleGAN generator [Karras et al., 2019]

Following is the table depicting different configuration setups on which style GANs can be trained.

Table 1: Configuration for StyleGANs

Business objectives and possible solutions

Experimental setup

Dataset

Fig 3. Sample images from the training MK dataset

Experimental design

We conducted a set of experiments to examine the performance of StyleGANs in terms of FID, the quality of the output produced, and training time vs. FID performance. In addition, we also checked the results of imposing pre-trained latent vectors on new face data and on Mortal Kombat character data. We implemented GANs and performed a feasibility analysis around the following questions:

i) How effective are StyleGANs at producing MK characters with less data, lower-resolution images, a lighter architecture, and less training?

ii) How computationally intensive are GANs? Are GANs expensive to train? How can the time and computational resources required to generate the desired output be estimated?

iii) How well do the pre-trained vectors work for style mixing on the original images?

Evaluation metrics

FID [Heusel et al. 2017] — the Frechet Inception Distance (FID) score is a metric for image generation quality that calculates the distance between feature vectors computed for real and generated images. FID is used to understand the quality of the generated images: the lower the FID score, the higher the quality. A perfect FID score is zero. Experimental platform — all experiments are performed on the AWS platform, with the following configuration.
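FID can be sketched directly from its definition as the Frechet distance between two Gaussians fitted to the feature vectors. In practice the features come from an Inception network; here they are random toy features, purely for illustration:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats1, feats2):
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu1, mu2 = feats1.mean(axis=0), feats2.mean(axis=0)
    c1 = np.cov(feats1, rowvar=False)
    c2 = np.cov(feats2, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):   # discard tiny imaginary residue
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake_good = rng.normal(size=(500, 8))          # same distribution -> low FID
fake_bad = rng.normal(loc=3.0, size=(500, 8))  # shifted distribution -> high FID
print(fid(real, fake_good), fid(real, fake_bad))
```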

Experiments results

To provide a precise view of image generation for MK characters, we conducted various experiments and obtained different results for each.

  i) Training – The training images retrieved at different numbers of kimg are depicted in the following figures. All experiments were conducted at configuration “d” (refer to Table 1). Fig 4. MK training results: 1 – 2364 kimg, 2 – 5306 kimg, 3 – 6126 kimg, 4 – 8006 kimg, 5 – 9766 kimg, 6 – 10000 kimg
  ii) Training time – The following table depicts the time taken and the FID results for the model trained to 10000 kimg. The best-trained models have an FID of 4.88 on the FFHQ dataset (Karras et al., 2019) at configuration d (refer to Table 1). We obtained a final FID score of 53.39 on the MK dataset.

Table 3. Time taken for training & FID score

iii) Real-time image generation – Following are some of the fake images generated by different trained models using random seed values.

Fig 5. Real-time image generation (using seed values) from the MK model checkpoint at 7726 kimg.

iv) Style Mixing

We use the style mixing method to generate new variations in character appearance, where two or more reference images are combined to generate new results. Here, we have used two characters to create a new character.

Fig 6. Style Mixing results on MK characters

v) Progressively growing GAN- results on MK

When progressively growing GAN is applied to MK characters, the network incrementally adds layers, first learning to generate small, low-resolution images and then progressively more complex ones. This stabilizes training and helps the generator avoid mode collapse.

Fig 7. Progressive GANs output

Conclusion

In this whitepaper, we have discussed the methodology and feasibility analysis of applying StyleGANs to Mortal Kombat characters. We have provided a detailed report on GAN types and their evolution with respect to image generation approaches, including style representation and conditional generation. The latter part presents results on GAN training time, FID scores, real-time image generation, style mixing, and ProGAN output. After ~15 days of training (at a GPU cost of $1.14 per hour), the FID score achieved was 53.39. The quality of images also improved, and can be enhanced further with longer, better-tuned training runs. Recent advancements in GANs illustrate that better GAN performance is achievable even with less data: Adaptive Discriminator Augmentation [Karras et al., 2020] and Differentiable Augmentation [Zhao et al., 2020] are recent approaches for training GANs effectively with limited data, which our CoE team is currently researching.

About Affine

Affine is a Data Science & AI Service Provider, offering capabilities across the analytical value chain from data engineering to analytical modeling and business intelligence to solve strategic & day-to-day business challenges of organizations worldwide. Affine is a strategic analytics partner to medium and large-sized organizations (majorly Fortune 500 & Global 1000) around the globe that creates cutting-edge creative solutions for their business challenges.

References

[1] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).

[2] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in neural information processing systems (pp. 6626-6637).

[3] Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.

[4] Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4401-4410).

[5] Karras, T., Aittala, M., Hellsten, J., Laine, S., Lehtinen, J., & Aila, T. (2020). Training generative adversarial networks with limited data. arXiv preprint arXiv:2006.06676.

[6] Mirza, M., & Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.

[7] Zhao, S., Liu, Z., Lin, J., Zhu, J. Y., & Han, S. (2020). Differentiable Augmentation for Data-Efficient GAN Training. arXiv preprint arXiv:2006.10738.

Recommendation Systems for Marketing Analytics

The way I perceive recommendation systems, they are something the traditional shopkeepers always used.

Remember the times in childhood when we went shopping with our mother, always to the same specific shop? The shopkeeper used to give the best recommendations for products, and we bought from that shop because we knew this shopkeeper knew us best.

What the shopkeeper did was understand our taste, our priorities, and the price range we were comfortable with, and then present the products that best matched our requirements. This is what businesses are doing, in the true sense, now.

They want to know their customers personally through their browsing behaviour and then recommend products they might like; the only difference is that they want to do it on a large scale.

For example, Amazon and Netflix understand your behaviour through what you browse, add to your basket, and order, or the movies you watch and like, and then recommend the products you are most likely to enjoy.

In a nutshell, they combine what you call business understanding with some mathematics, so that they can essentially know and learn about the products the customer likes.

So basically, a recommendation system for marketing analytics is a subclass of information filtering systems that seeks similarities between users and items in different combinations.

Below are some of the most widely used types of recommendation systems:

  1. Collaborative Recommendation system
  2. Content-based Recommendation system
  3. Demographic based Recommendation system
  4. Utility based Recommendation system
  5. Knowledge based Recommendation system
  6. Hybrid Recommendation system

Let us go into the most useful ones which the industry is using:

  • Content Based Recommendation System

The point of content-based filtering is that we should know the content of both user and item. Usually, we construct a user profile and an item profile using the content of a shared attribute space. Product attributes like images (size, dimension, colour, etc.) and the text description of the product lean towards “Content-Based Recommendation”.

This essentially means that, based upon the content I watch on Netflix, I can run an algorithm to find the most similar movies and then recommend them to other users.

For example, when you open Amazon and search for a product, you see similar products pop up below, which is the item-item similarity they have computed across all the products on Amazon. This gives us a very simple yet effective idea of how products behave with each other.

Bread and butter could be similar products in the true sense, as they go together, but their attributes can vary. In the movie industry, features like genres and reviews can tell us which movies are similar, and that is the type of similarity we get for movies.

  • Collaborative Recommendation System:

Collaborative algorithms use “user behaviour” for recommending items. They exploit the behaviour of other users and items in terms of transaction history, ratings, selections, purchase information, etc. In this case, the features of the items are not known.

When you do not look at the features of the products to calculate a similarity score, but instead look at the interactions of the products with users, you call it a collaborative approach.

From the interactions of products with users, we figure out which products are similar and then build a recommendation strategy to target the audience.

Two users who watched the same movie on Netflix can be called similar, and when the first user watches another movie, the second user gets that movie as a recommendation, based on the likes these people share.

  • Hybrid Recommendation System:

Combining any two of these systems in a manner that suits the industry is known as a hybrid recommendation system. It combines the strengths of two or more recommendation systems and eliminates the weaknesses that exist when only one recommendation system is used.

When we only use collaborative filtering, we have a problem called the “cold start” problem. Since we rely on the interactions of users with products, if a user comes to the website for the first time, we have no interactions available and therefore no recommendations to make to that customer.

To eliminate this problem, we use hybrid recommendation systems, which combine content-based systems and collaborative systems to get rid of the cold start problem. Think of it this way: item-item, user-user, and user-item interactions are all combined to give the best recommendations to the users and more value to the business.

From here, we will focus on the Hybrid Recommendation Systems and introduce you to a very strong Python library called lightfm which makes this implementation very easy.

LightFM:

The official documentation can be found in the below link:

lyst/lightfm (github.com)

LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback.

User and item latent representations are expressed in terms of their features’ representations.

It also makes it possible to incorporate both item and user metadata into the traditional matrix factorization algorithms. When multiplied together, these representations produce scores for every item for a given user; items scored highly are more likely to be interesting to the user.
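That multiplication can be illustrated with toy embeddings (all numbers below are made up):

```python
import numpy as np

# Two users and three items, each with a 2-dimensional latent representation
user_embeddings = np.array([[0.9, 0.1],     # user 0
                            [0.2, 0.8]])    # user 1
item_embeddings = np.array([[1.0, 0.0],     # item A
                            [0.0, 1.0],     # item B
                            [0.5, 0.5]])    # item C

# Multiplying them gives one score per (user, item) pair
scores = user_embeddings @ item_embeddings.T   # shape (n_users, n_items)
print(np.argsort(-scores[0]))  # item ranking for user 0, best first
```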

Interactions : The matrix containing user-item interactions.

User_features : Each row contains that user’s weights over features.

Item_features : Each row contains that item’s weights over features.

Note: the matrices should be sparse (a sparse matrix is a matrix that contains very few non-zero elements).

Predictions

fit_partial: fit the model. Unlike fit, repeated calls to this method will cause training to resume from the current model state. This is mainly useful for appending new users to the training matrix.

predict: compute the recommendation score for user-item pairs. The scores are sorted in descending order, and the top n items are recommended.

Model evaluation

AUC score: in the binary case (clicked/not clicked), the AUC score has a nice interpretation: it expresses the probability that a randomly chosen positive item (an item the user clicked) will be ranked higher than a randomly chosen negative item (an item the user did not click). Thus, an AUC of 1.0 means that the resulting ranking is perfect: no negative item is ranked higher than any positive item.

Precision@K : Precision@K measures the proportion of positive items among the K highest-ranked items. As such, this is focused on the ranking quality at the top of the list: it does not matter how good or bad the rest of your ranking is as long as the first K items are mostly positive.

Ex: if only one item of your top 5 is correct, then your precision@5 is 0.2

Note: if the first K recommended items are not available anymore (say, they are out of stock) and you need to move further down the ranking, a high AUC score gives you confidence that your ranking is of high quality throughout.
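Precision@K is easy to compute by hand for a small ranked list (the relevance flags below are illustrative):

```python
# 1 marks items the user actually interacted with; the order is the model's ranking
ranked_relevance = [1, 0, 0, 0, 0, 1, 1]

def precision_at_k(relevance, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(relevance[:k]) / k

print(precision_at_k(ranked_relevance, 5))  # 1 hit in the top 5 -> 0.2
```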

Enough of the theory; now we will move to the code and see how an implementation with lightfm works:

I have taken the dataset from Kaggle: the E-Commerce Data set of actual transactions from a UK retailer, available on www.kaggle.com.

Hope you liked the coding part, and you are ready to implement it on your own data. A further enhancement is to include product and user features.

These can also be passed as inputs to the LightFM model, so that the embeddings it learns are based on all those attributes. The more data you push into LightFM, the more training signal the model has and the better its accuracy can be.

That’s all from my end for now. Keep Learning!! Keep Rocking!!

CatBoost – A new game of Machine Learning

Gradient Boosted Decision Trees and Random Forests are among the best ML models for tabular, heterogeneous datasets.

CatBoost is an algorithm for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor of the MatrixNet algorithm that is widely used within the company for ranking tasks, forecasting and making recommendations. It is universal and can be applied across a wide range of areas and to a variety of problems.

Catboost, the new kid on the block, has been around for a little more than a year now, and it is already threatening XGBoost, LightGBM and H2O.

Why Catboost?

Better Results

Catboost achieves the best results on the benchmark, and that’s great. Moreover, when you look at datasets where categorical features play a large role, the improvement becomes significant and undeniable.

GBDT Algorithms Benchmark

Faster Predictions

While training can take longer than with other GBDT implementations, prediction is 13–16 times faster than the other libraries, according to the Yandex benchmark.

Left: CPU, Right: GPU

Batteries Included

Catboost’s default parameters are a better starting point than those of other GBDT algorithms, which is good news for beginners who want a plug-and-play model to start experimenting with tree ensembles or Kaggle competitions.

GBDT Algorithms with default parameters Benchmark

Some more noteworthy advancements in Catboost are feature interactions, object importance and snapshot support. In addition to classification and regression, Catboost supports ranking out of the box.

Battle Tested

Yandex is relying heavily on Catboost for ranking, forecasting and recommendations. This model is serving more than 70 million users each month.

The Algorithm

Classic Gradient Boosting

Gradient Boosting on Wikipedia

Catboost Secret Sauce

Catboost introduces two critical algorithmic advances – the implementation of ordered boosting, a permutation-driven alternative to the classic algorithm, and an innovative algorithm for processing categorical features.

Both techniques use random permutations of the training examples to fight the prediction shift caused by a special kind of target leakage present in all existing implementations of gradient boosting algorithms.

Categorical Feature Handling

Ordered Target Statistic

Most GBDT practitioners and Kaggle competitors are already familiar with the use of the Target Statistic (or target mean encoding). It’s a simple yet effective approach in which we encode each categorical feature with an estimate of the expected target y conditioned on the category. Well, it turns out that applying this encoding carelessly (averaging y over the training examples with the same category) results in target leakage.

To fight this prediction shift, CatBoost uses a more effective strategy. It relies on the ordering principle and is inspired by online learning algorithms, which receive training examples sequentially in time. In this setting, the value of the Target Statistic for each example relies only on the observed history.

To adapt this idea to a standard offline setting, Catboost introduces an artificial “time”: a random permutation σ1 of the training examples. Then, for each example, it uses all the available “history” to compute its Target Statistic. Note that using only one random permutation results in the early examples having a higher-variance Target Statistic than subsequent ones. To mitigate this, CatBoost uses different permutations for different steps of gradient boosting.
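A simplified, self-contained sketch of the idea (not CatBoost’s exact implementation): each example is encoded using only the target values of examples that precede it in a random permutation, with a smoothing prior for empty histories:

```python
import numpy as np

def ordered_target_statistic(categories, targets, prior=0.5, seed=0):
    """Encode each example with the smoothed mean target of *preceding*
    examples (in a random permutation) that share its category."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(categories))  # the artificial "time"
    sums, counts = {}, {}
    encoded = np.empty(len(categories))
    for idx in perm:
        c = categories[idx]
        # Use only the observed "history" for this category
        encoded[idx] = (sums.get(c, 0.0) + prior) / (counts.get(c, 0) + 1)
        # Only now reveal this example's target to future examples
        sums[c] = sums.get(c, 0.0) + targets[idx]
        counts[c] = counts.get(c, 0) + 1
    return encoded
```

Because an example’s own target is added to the history only after it has been encoded, the value y_i never leaks into its own encoding.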

One Hot Encoding

Catboost uses a one-hot encoding for all the features with at most one_hot_max_size unique values. The default value is 2.

Catboost’s Secret Sauce

Ordered Boosting

CatBoost has two modes for choosing the tree structure, Ordered and Plain. Plain mode corresponds to a combination of the standard GBDT algorithm with an ordered Target Statistic. In Ordered boosting mode we perform a random permutation of the training examples, σ2, and maintain n different supporting models M1, . . . , Mn such that the model Mi is trained using only the first i samples in the permutation. At each step, in order to obtain the residual for the j-th sample, we use the model Mj−1. Unfortunately, this algorithm is not feasible in most practical tasks, since maintaining n different models increases the complexity and memory requirements n times. Catboost implements a modification of this algorithm on the basis of the gradient boosting algorithm, using one tree structure shared by all the models to be built.

Catboost Ordered Boosting and Tree Building

In order to avoid prediction shift, Catboost uses permutations such that σ1 = σ2. This guarantees that the target yi is not used for training Mi: neither for the Target Statistic calculation nor for the gradient estimation.

Tuning Catboost

Important Parameters

  • cat_features : this parameter is a must in order to leverage Catboost’s preprocessing of categorical features. If you encode the categorical features yourself and don’t pass the column indices as cat_features, you are missing the essence of Catboost.
  • one_hot_max_size : as mentioned before, Catboost uses one-hot encoding for all features with at most one_hot_max_size unique values. In our case the categorical features have a lot of unique values, so we won’t use one-hot encoding, but depending on the dataset it may be a good idea to adjust this parameter.
  • learning_rate & n_estimators : the smaller the learning_rate, the more n_estimators are needed. The usual approach is to start with a relatively high learning_rate, tune the other parameters, and then decrease the learning_rate while increasing n_estimators.
  • max_depth : depth of the base trees; this parameter has a high impact on training time.
  • subsample : sample rate of rows; can’t be used with the Bayesian boosting type.
  • colsample_bylevel, colsample_bytree, colsample_bynode : sample rate of columns.
  • l2_leaf_reg : L2 regularization coefficient.
  • random_strength : every split gets a score, and random_strength adds some randomness to that score; this helps to reduce overfitting.

Check out the recommended spaces for tuning here

Model Exploration with Catboost

In addition to feature importance, which is quite popular for GBDT models to share, Catboost provides feature interactions and object (row) importance.

Catboost’s Feature Importance

Catboost’s Feature Interactions

Catboost’s Object Importance

SHAP values can be used for other ensembles as well

Not only does it build one of the most accurate models on whatever dataset you feed it, while requiring minimal data prep, CatBoost also gives by far the best open-source interpretation tools available today, AND a way to productionize your model fast.

That’s why CatBoost is revolutionising the game of Machine Learning, forever. And that’s why learning to use it is a fantastic opportunity to up-skill and remain relevant as a data scientist. But more interestingly, CatBoost poses a threat to the status quo of the data scientist (like myself) who enjoys a position where it’s supposedly tedious to build a highly accurate model given a dataset. CatBoost is changing that. It’s making highly accurate modeling accessible to everyone.

Image taken from CatBoost official documentation: https://catboost.ai/

Building highly accurate models at blazing speeds

Installation

Installing CatBoost is a piece of cake. Just run:

pip install catboost

Data prep needed

Unlike most Machine Learning models available today, CatBoost requires minimal data preparation. It handles:

  • Missing values for numeric variables.
  • Non-encoded categorical variables. Note that missing values have to be filled beforehand for categorical variables; common approaches replace NAs with a new category ‘missing’ or with the most frequent category.
  • For GPU users only, it also handles text variables. Unfortunately I couldn’t test this feature, as I am working on a laptop with no GPU available. [EDIT: a new upcoming version will handle text variables on CPU. See comments for more info from the head of the CatBoost team.]
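For example, with pandas, one common fix for categorical NAs before calling fit looks like this (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", None, "blue"],     # categorical with a missing value
    "price": [1.0, float("nan"), 3.0],  # numeric NaN: CatBoost handles this
})

# Categorical NAs must be filled beforehand, e.g. with a 'missing' category
df["color"] = df["color"].fillna("missing")
```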

Building models

As with XGBoost, you have the familiar sklearn syntax with some additional features specific to CatBoost.

from catboost import CatBoostClassifier  # Or CatBoostRegressor

model_cb = CatBoostClassifier()
model_cb.fit(X_train, y_train)

Or if you want a cool sleek visual about how the model learns and whether it starts overfitting, use plot=True and insert your test set in the eval_set parameter:

from catboost import CatBoostClassifier  # Or CatBoostRegressor

model_cb = CatBoostClassifier()
model_cb.fit(X_train, y_train, plot=True, eval_set=(X_test, y_test))

Note that you can display multiple metrics at the same time, even more human-friendly metrics like Accuracy or Precision. Supported metrics are listed here. See example below:

Monitoring both Logloss and AUC at training time on both training and test sets

You can even use cross-validation and observe the average & standard deviation of accuracies of your model on the different splits:

Finetuning

CatBoost is quite similar to XGBoost. To fine-tune the model appropriately, first set early_stopping_rounds to a finite number (like 10 or 50) and start tweaking the model’s parameters.

Manas Agrawal

CEO & Co-Founder
