Chatbots are here to stay. Here are 4 reasons why.

For the past several years, I’ve dedicated my life to learning about, designing, building, and writing about chatbots. The main reason chatbots, and conversational interfaces in general, fascinate me is that they offer the most natural way for humans to interact with machines. Not only is the interaction natural, it is also simple, clean, and focused on what you need right now. Think of Google’s search interface: all you can do is type a search query into a little text box. Everything that comes afterward is magic.

Still, chatbots are in a very early stage, due to a few factors:

1. A mismatch between what chatbots can do and what users expect (in other words, bad user experience). This leads to instant disappointment and therefore to low usability. It starts with the largest bots, such as Siri, and goes all the way down to the most basic ones, which confuse users about their actual capabilities. In theory, Siri claims she can do almost anything related to your iPhone, calendar, and other built-in Apple apps.

In reality, over 60% of what you ask Siri is either not understood or returns generic web search results. Just to get an idea of what she can do, here’s a list of Siri commands. Did you know she could do most of it? I didn’t before.

2. The need to educate users to form new habits. The last bot I built was focused purely on conversation and therefore had no buttons or menus. Retention was low, even though most conversations between the bot and its users were completed successfully. What I discovered was that the more buttons and menus I added, the more retention grew. Swelly is a great example of a successful bot with minimal conversation and maximum buttons.

This behavior led me to the conclusion that the majority of users are still not used to talking with machines naturally; they prefer to click buttons, as we’ve done for the past 30 years. Clicking buttons is also faster than typing sentences. However, buttons are not faster than voice, which is why I believe voice will eventually dominate the bot space. The transition from buttons to natural conversation is underway, but it is still in its early adoption stage.

3. Artificial intelligence has improved, but it is still in its infancy. The reason this comes last and not first is that I truly believe we could build great chatbots with today’s AI solutions (such as Api.ai, Wit.ai, etc.) if bots focused more on creating user habits and offering a well-designed user experience that meets user expectations.

You can read more about how to do this in a previous post I’ve written on how to improve your chatbot with 3 simple steps. Naturally, AI will only improve with time as more and more data is collected and models are trained across a multitude of domains.

For all the reasons above and more, we’re still far from seeing the real potential of chatbots. However, there are strong reasons why chatbots are here to stay and will improve exponentially over time:

The optimal human-to-machine interaction

If you think about it, over the past 30 years we’ve learned to adjust ourselves to the complexity and limitations of machines. This adjustment has been made via websites, applications, buttons, icons, and so on. But in my opinion, the optimal scenario is just the opposite: machines should adjust themselves to us humans, both in natural language understanding and in personalization.

Humans should be able to ask a machine anything naturally, instead of having to learn new interfaces, products, and habits for every service they need.

For example, let’s say you’d like to know the weather. Until recently, you had to pick a service out of many alternatives and learn how to use it. And since every service tries to be innovative and different from its alternatives, the result is a variety of UX/UI patterns, which means more learning and effort required from users. The optimal solution would be if you could just ask. Thankfully, there are already great weather assistants (such as Siri or Poncho), and there are many more to come in other domains.

Domain-specific AI

Not very long ago, companies built virtual assistants that tried to go very wide and open, but quickly realized how hard it is to understand natural language. Going back to the Siri example, Apple sought to capture many domains to position Siri as the ultimate personal assistant. This ambition failed very quickly. On the other hand, the AI solutions that have succeeded are the ones that focus narrowly on one specific domain.

Take Getdango, for example: an AI solution for predicting emojis. It does a great job predicting emojis from natural language, thanks to its narrow focus. Another example is Meekan, a scheduling assistant for teams on Slack. Meekan is a chatbot dedicated to scheduling events as easily as possible.

The power of synergy, where individual bots focus on specific domains, is the right approach to solving bigger AI challenges. You can see companies moving in this direction, such as FB Messenger’s recent release of the handover protocol, which enables two or more applications to collaborate. More notably, Amazon partnered with Microsoft to connect Alexa and Cortana into a more powerful virtual assistant. If every bot focused on one specific domain, AI as a whole would progress faster and more efficiently. Happily, that’s where we’re heading.

The power of long-term relationships

Most products today are designed to maximize immediate, short-term value for users the moment they enter an application. While web and mobile applications focus on short-term value, bots can and should focus on long-term value. Bot developers should focus on building relationships with users over time, so that value accumulates and grows with every single interaction. With time, bots should know enough about their users to maximize and personalize the experience and minimize input friction.

For example, say you’re looking for a travel planning service. The way you’d go about it today is to browse travel sites, fill in the proper forms, and teach each site about your preferences and filters every single time. A bot, instead, should know which information is worth learning about the user: personal details, preferences, budget, places the user has already visited, and so on. Bots should constantly learn from the user’s behavior and offer far more personalized responses.

The optimal way you’d be conversing with such a bot after some time would be as follows:

By then, the bot should know how many people are in your family, their ages, where you’ve already been, where you are right now, what you like to do, what you don’t, and much more. In other words, the bot shouldn’t be any different from a real travel agent.

To see how much personal information users are willing to share within a conversation, I ran an experiment with one of my bots. The bot asked users questions ranging from basic ones like “How old are you?” all the way to more personal ones like “What are you most insecure about?”. Guess how many users answered all the questions truthfully? Over 85%. More specifically, 89% of women answered all questions truthfully, while “only” 81% of men did. So if you’re a bot developer, don’t worry about users not sharing their information. Worry about which questions you should ask to enhance the user’s long-term value. This kind of information retrieval is something today’s applications cannot achieve, and it’s where chatbots have a huge advantage.

Cross-platform usability

In just a few years, mobile apps have become must-haves for smartphone users. But despite the growth in app usage and app choices, the number of apps used per user is staying the same, according to a recent report from Nielsen. Most people are tired of downloading mobile apps and learning new interfaces.

Research also says that US citizens own, on average, 3.6 connected devices, ranging from mobile devices to smart TVs and products like Amazon Alexa. That’s a lot of connected devices! Obviously, users would like to interact with your service on whichever device they’re using. So what do you do? Build an application for iOS, Android, smart TV, Alexa, smartwatch, iPad, Windows, Mac, and more? That sounds like a lot of work. And it’s going to be very hard to get users to download your app in the first place, since they’re already flooded with other apps.

This ever-growing challenge is where the beauty of messaging platforms comes in. At present, approximately 75% of all smartphone users use messaging apps such as WhatsApp, WeChat, and Facebook Messenger. Over 1.2 billion people worldwide have Messenger on their devices, every mobile device supports SMS, and obviously most people have an email account. The list goes on. Instead of building applications and spending hundreds of thousands of dollars, focus on building your bot’s back end. For the front end, integrate your bot with the messaging platforms already on users’ devices, and you’re set. If your service brings value, users will come. More importantly, it turns out that’s what users want.


The future and success of chatbots depend not only on the big four tech companies, but on the developers and entrepreneurs who continue to innovate and push boundaries in AI and conversational interfaces. Most of the current mistakes in this sector are tomorrow’s improvements and solutions. Eventually, bots will bring value that cannot be achieved with today’s applications.

To learn more about chatbots, go ahead and read the chatbots beginner’s guide. If you want to start building one, read this post on how to develop a Facebook Messenger bot.

Practical machine learning: Ridge regression vs. Lasso

For many years, programmers have tried to solve extremely complex computer science problems using traditional algorithms based on the most basic conditional statement: if this, then that. For example, if an email contains the word “free!”, it should be classified as spam.

In recent years, with the rise of exceptional cloud computing technologies, the machine learning approach to solving complex problems has accelerated dramatically. Machine learning is the science of giving computers the ability to learn and solve problems without being explicitly programmed. Sounds like black magic? Maybe. In this post, I will introduce problems that can be solved using machine learning, along with practical machine learning solutions for solving them.

Exactly as humans learn on a daily basis, to let a machine learn, you need to provide it with enough data. Once it has processed the data, it can make predictions about the future. Suppose you want to classify emails as spam or not spam. To solve this problem using machine learning, you need to provide the machine with many labeled emails, ones already assigned to the correct class of spam vs. not spam. The classifier will iterate over the samples and learn which features define a spam email. Assuming you trained the model well, it will be able to predict with high accuracy whether a future email is spam. In many cases, you won’t be able to fully understand how the model predicts the class.
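To make this concrete, here is a minimal sketch of such a spam classifier using scikit-learn. The handful of emails and labels below are invented purely for illustration; a real classifier would need far more data.

```python
# A minimal spam classifier sketch with scikit-learn.
# The tiny labeled dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free iPhone now!",                # spam
    "free!!! claim your prize",              # spam
    "meeting moved to 3pm",                  # not spam
    "are we still on for lunch tomorrow?",   # not spam
]
labels = ["spam", "spam", "not spam", "not spam"]

# Bag-of-words features feeding a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# Predict the class of an unseen email.
print(model.predict(["claim your free prize"]))  # likely ['spam']
```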

Machine learning hierarchy

The world of machine learning can be divided into two types of problems: supervised learning and unsupervised learning. In this post, we will focus only on supervised learning, the subset of problems with labeled data (that is, every email is labeled as spam or not spam). For cases where you have unlabeled data, unsupervised learning might be the proper solution.

Within supervised learning, there is a further division into regression problems and classification problems. In regression problems, the value you wish to predict is continuous, for example, a house price. In classification problems, on the other hand, the value you are about to predict is discrete, like spam vs. not spam.

The data you need to provide to train your model depends on the problem and on the value you wish to predict. Let’s assume you want to predict a house price based on different properties. In this case, each row in your dataset should (for example) consist of:

  1. features: house size, the number of rooms, floor, whether an elevator exists, etc.
  2. label: house price.

Choosing and collecting the features that best describe a house for predicting its price can be challenging. It requires market knowledge as well as access to big data sources. The features are the keys upon which the prediction of the house price will be based.

Machine learning as an optimization problem

Every machine learning problem is basically an optimization problem. That is, you wish to find either a maximum or a minimum of a specific function. The function that you want to optimize is usually called the loss function (or cost function). The loss function is defined for each machine learning algorithm you use, and this is the main metric for evaluating the accuracy of your trained model.

For the house price prediction example, after the model is trained, we are able to predict new house prices based on their features. For each predicted house price, denoted Ŷᵢ, and the corresponding actual house price Yᵢ, we can calculate the loss as:

lᵢ = (Ŷᵢ - Yᵢ)²

This is the most basic form of a loss for a single data point, used mostly in linear regression algorithms. The loss function as a whole can be written as:

L = Σᵢ (Ŷᵢ - Yᵢ)²

This simply states that our model’s loss is the sum of distances between the house prices we’ve predicted and the ground truth. This particular loss function is called quadratic loss, or least squares. We wish to minimize the loss function L as much as possible, so the predictions will be as close as possible to the ground truth.
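As a quick sketch, the quadratic loss is a one-liner in numpy (the toy prices below are made up):

```python
import numpy as np

def quadratic_loss(y_pred, y_true):
    """Sum of squared distances between predictions and ground truth."""
    return np.sum((y_pred - y_true) ** 2)

y_true = np.array([102, 127, 65])   # actual house prices
y_pred = np.array([100, 130, 70])   # our model's predictions
print(quadratic_loss(y_pred, y_true))  # (-2)**2 + 3**2 + 5**2 = 38
```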

If you’ve followed me up until now, you’re familiar with the basic concept behind every practical machine learning problem. Remember: every machine learning algorithm defines its own loss function according to its goal in life.

Linear regression

Linear regression is a basic yet super powerful machine learning algorithm. As you gain more and more experience with machine learning, you’ll notice that simple is often better than complex. Linear regression is widely used in supervised machine learning problems and, as you may have guessed, focuses on regression problems (where the value we wish to predict is continuous). It is extremely important to have a good understanding of linear regression before studying more complex learning methods. Many extensions have been developed for linear regression, some of which I’ll introduce later in this post.

The most basic form of linear regression deals with a dataset of a single feature per data point (think of it as the house size). Because we are dealing with supervised learning, each row (house) in the dataset must also include the house price (the value we wish to predict).

An example of our dataset:

House size (X)   House price (Y)
50               102
70               127
32               65
68               131
93               190
44               82
56               120

In a visual representation:

In linear regression, we wish to fit a function (model) of this form:

Ŷ = β₀ + β₁X

where X is the vector of features (the first column in the table above), and β₀, β₁ are the coefficients we wish to learn.

By learning the parameters, I mean executing an iterative process that updates β at every step so as to reduce the loss function as much as possible. Once we reach the minimum of the loss function, the iterative process is complete and the parameters are learned.

To make it even clearer: the combination of β coefficients is our trained model, which means we have a solution to the problem!

After executing the iterative process, we can visualize the solution on the same graph:


Where the trained model is:

Ŷ = -0.5243 + 1.987X

Now let’s assume we want to predict, based on our trained model, the price of a house of size 85. To predict the price, we substitute the house size and the β values we found into the model function and get the predicted house price:

Ŷ(x = 85) = -0.5243 + 1.987 × 85 ≈ 168.37
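If you want to reproduce this yourself, here is a short scikit-learn sketch that fits the model to the table above; it should recover coefficients close to -0.5243 and 1.987, and a prediction close to 168.37.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The dataset from the table above: house size (X) and house price (Y).
X = np.array([[50], [70], [32], [68], [93], [44], [56]])
y = np.array([102, 127, 65, 131, 190, 82, 120])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # roughly -0.5243 and [1.987]

# Predict the price of a house of size 85.
print(model.predict([[85]]))          # roughly [168.37]
```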

To recap what we’ve covered so far:

  1. Every machine learning problem is basically an optimization problem. That is, we want to minimize (or maximize) some function.
  2. Our dataset consists of features (X) and a label (Y). In our case, house size is the single feature and house price is the label.
  3. In linear regression problems, we want to minimize the quadratic loss which is the sum of distances between the predictions and the actual value (ground truth).
  4. In order to minimize the loss function and find the optimal β coefficients, we will execute an iterative process.
  5. To predict the label (house price) of a new house based on its size, we will use the trained model.

The iterative process for minimizing the loss function (a.k.a. learning the coefficients β) will be discussed in another post. Although it can be done with one line of code, I highly recommend reading more about iterative algorithms for minimizing loss functions, like gradient descent.
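As a small preview, here is a bare-bones gradient descent sketch for the single-feature model above. The learning rate and iteration count are arbitrary illustrative choices, and the feature is standardized first, since plain gradient descent converges poorly on unscaled data.

```python
import numpy as np

X = np.array([50, 70, 32, 68, 93, 44, 56], dtype=float)
y = np.array([102, 127, 65, 131, 190, 82, 120], dtype=float)

# Standardize the feature; gradient descent converges much faster
# when features are on a comparable scale.
mu, sigma = X.mean(), X.std()
Xs = (X - mu) / sigma

b0, b1 = 0.0, 0.0  # initial coefficients
lr = 0.01          # learning rate (an arbitrary illustrative choice)

for _ in range(5000):
    error = b0 + b1 * Xs - y            # per-sample prediction error
    b0 -= lr * 2 * error.mean()         # gradient of the mean quadratic loss w.r.t. b0
    b1 -= lr * 2 * (error * Xs).mean()  # ... and w.r.t. b1

# Map the learned coefficients back to the original feature scale.
print(b0 - b1 * mu / sigma, b1 / sigma)  # should approach -0.5243 and 1.987
```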


Linear regression with multiple features

In real-world problems, you usually have more than one feature per row (house). Let’s see how linear regression can help with multi-feature problems.

Consider this dataset:

House size (X1)   Rooms (X2)   Floor (X3)   House price (Y)
50                2            5            123
70                2            3            118
32                1            3            62
68                3            7            148
93                4            10           250
44                2            6            100
56                3            1            110

So currently we have 3 features:

  1. house size
  2. number of rooms
  3. floor

Therefore, we need to adapt our basic linear model to an extended one that can take into account the additional features for each house:

Ŷ = β₀ + β₁X₁ + β₂X₂ + β₃X₃

To solve the multi-feature linear regression problem, we will use the same iterative algorithm and minimize the loss function. The main difference is that we will end up with four β coefficients instead of two.
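Here is a sketch of the multi-feature fit, using the dataset from the table above (scikit-learn assumed, as before):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The multi-feature dataset from the table above: size, rooms, floor -> price.
X = np.array([[50, 2,  5],
              [70, 2,  3],
              [32, 1,  3],
              [68, 3,  7],
              [93, 4, 10],
              [44, 2,  6],
              [56, 3,  1]])
y = np.array([123, 118, 62, 148, 250, 100, 110])

model = LinearRegression().fit(X, y)
print(model.intercept_)  # beta_0
print(model.coef_)       # beta_1, beta_2, beta_3
```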

Overfitting in machine learning algorithms

Having more features may seem like a perfect way to improve the accuracy of the trained model (reducing the loss): the model will be more flexible and will take more parameters into account. On the other hand, we need to be extremely careful about overfitting the data. As we know, every dataset has noisy samples, for example, a house size that wasn’t measured accurately or a price that isn’t up to date. Such inaccuracies can lead to a low-quality model if training isn’t done carefully. The model might end up memorizing the noise instead of learning the trend of the data.

A visual example of a nonlinear overfitted model:

Overfitting can happen in linear models as well when dealing with multiple features. If not filtered and explored up front, some features can be more destructive than helpful: they repeat information already expressed by other features and add high noise to the dataset.

Overcoming overfitting using regularization

Because overfitting is an extremely common issue in many machine learning problems, there are different approaches to solving it. The main idea behind avoiding overfit is to simplify the model as much as possible: simple models do not (usually) overfit. On the other hand, we need to pay attention to the delicate trade-off between overfitting and underfitting a model.

One of the most common mechanisms for avoiding overfitting is called regularization. A regularized machine learning model is a model whose loss function contains an additional element that must be minimized as well. Let’s see an example:

L = Σᵢ (Ŷᵢ - Yᵢ)² + λ Σⱼ βⱼ²

This loss function includes two elements. The first is the one you’ve seen before: the sum of squared distances between each prediction and its ground truth. The second element, a.k.a. the regularization term, might seem a bit bizarre: it sums the squared β values and multiplies the sum by another parameter, λ. The reason for doing this is to “punish” the loss function for high values of the coefficients β. As mentioned before, simple models are better than complex models and usually do not overfit, so we want to keep the model as simple as possible. Remember that the goal of the iterative process is to minimize the loss function; by punishing the β values, we add a constraint that keeps them as small as possible.

There is a delicate trade-off between fitting the model and not overfitting it. This approach is called Ridge regression.

Ridge regression

Ridge regression is an extension of linear regression; it’s basically a regularized linear regression model. The λ parameter is a scalar that must be learned as well, using a method called cross-validation, which will be discussed in another post.

A super important fact to notice about Ridge regression is that it pushes the β coefficients toward lower values, but it does not force them to be exactly zero. That is, it will not get rid of irrelevant features, but rather minimize their impact on the trained model.
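A quick sketch with scikit-learn’s Ridge on the multi-feature dataset above; its alpha argument plays the role of λ, and the value here is arbitrary:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Same multi-feature dataset as above.
X = np.array([[50, 2, 5], [70, 2, 3], [32, 1, 3], [68, 3, 7],
              [93, 4, 10], [44, 2, 6], [56, 3, 1]])
y = np.array([123, 118, 62, 148, 250, 100, 110])

# alpha is scikit-learn's name for the lambda parameter; 10.0 is arbitrary.
print(LinearRegression().fit(X, y).coef_)  # unregularized coefficients
print(Ridge(alpha=10.0).fit(X, y).coef_)   # shrunk toward zero, none exactly zero
```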

Lasso method

Lasso is another extension built on regularized linear regression, but with a small twist. The Lasso loss function has the form:

L = Σᵢ (Ŷᵢ - Yᵢ)² + λ Σⱼ |βⱼ|

The only difference from Ridge regression is that the regularization term uses absolute values. But this difference has a huge impact on the trade-off we discussed before. The Lasso method overcomes the disadvantage of Ridge regression by not only punishing high values of the coefficients β, but actually setting them to zero if they are not relevant. Therefore, you might end up with fewer features in the model than you started with, which is a huge advantage.
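And the corresponding Lasso sketch: with a large enough alpha, some coefficients come out exactly zero, dropping those features from the model (again, the alpha values here are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Same multi-feature dataset as above.
X = np.array([[50, 2, 5], [70, 2, 3], [32, 1, 3], [68, 3, 7],
              [93, 4, 10], [44, 2, 6], [56, 3, 1]])
y = np.array([123, 118, 62, 148, 250, 100, 110])

# The larger alpha gets, the more coefficients are driven to exactly zero.
print(Lasso(alpha=1.0).fit(X, y).coef_)
print(Lasso(alpha=10.0).fit(X, y).coef_)
```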

Conclusions

Machine learning is becoming more and more practical and powerful. With very little programming knowledge, you can train a model to predict house prices in no time.

We’ve covered the basics of machine learning: loss functions, linear regression, and the Ridge and Lasso extensions.

There is more math involved than what I’ve covered in this post; I tried to keep it as practical and, on the other hand, as high-level as possible (someone said trade-off?).

I encourage you to take a deep dive into this amazing world.