Machine learning (ML)
Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.
MACHINE LEARNING?
At a very high level, machine learning is the process of teaching a computer system how to make accurate predictions when fed data.
Those predictions could be whether a piece of fruit in a photo is a banana or an apple, whether people are crossing the road in front of a self-driving car, whether the word book in a sentence refers to a paperback or a hotel reservation, whether an email is spam, or what was said in a YouTube video, transcribed accurately enough to generate captions.
The key difference from traditional computer software is that a human developer hasn't written code that instructs the system how to tell the difference between the banana and the apple.
Instead a machine-learning model has been taught how to reliably discriminate between the fruits by being trained on a large amount of data, in this instance likely a huge number of images labelled as containing a banana or an apple.
Data, and lots of it, is the key to making machine learning possible.
History of machine learning
1642 - Blaise Pascal invents a mechanical machine that can add, subtract, multiply and divide.
1679 - Gottfried Wilhelm Leibniz devises the system of binary code.
1834 - Charles Babbage conceives the idea for a general all-purpose device that could be programmed with punched cards.
1842 - Ada Lovelace describes a sequence of operations for solving mathematical problems using Charles Babbage's theoretical punch-card machine and becomes the first programmer.
1847 - George Boole creates Boolean logic, a form of algebra in which all values can be reduced to the binary values of true or false.
1936 - English logician and cryptanalyst Alan Turing proposes a universal machine that could decipher and execute a set of instructions. His published proof is considered the basis of computer science.
1952 - Arthur Samuel creates a program to help an IBM computer get better at checkers the more it plays.
1959 - MADALINE becomes the first artificial neural network applied to a real-world problem: removing echoes from phone lines.
1985 - Terry Sejnowski and Charles Rosenberg's artificial neural network taught itself how to correctly pronounce 20,000 words in one week.
1997 - IBM's Deep Blue beat chess grandmaster Garry Kasparov.
1999 - A CAD prototype intelligent workstation reviewed 22,000 mammograms and detected cancer 52% more accurately than radiologists did.
2006 - Computer scientist Geoffrey Hinton popularizes the term deep learning to describe neural net research.
2012 - An unsupervised neural network created by Google learned to recognize cats in YouTube videos with 74.8% accuracy.
2014 - The chatbot Eugene Goostman, posing as a 13-year-old Ukrainian boy, convinces 33% of human judges in a Turing Test contest that it is human.
2016 - Google DeepMind's AlphaGo defeats world champion Lee Sedol at Go, one of the most complex board games in the world.
2016 - LipNet, DeepMind's artificial-intelligence system, identifies lip-read words in video with an accuracy of 93.4%.
2019 - Amazon holds roughly 70% of the U.S. market for virtual assistants.
Types of machine learning
Classical machine learning is often categorized by how an algorithm learns to become more accurate in its predictions. There are four basic approaches: supervised learning, unsupervised learning, semi-supervised learning and reinforcement learning. The type of algorithm a data scientist chooses to use depends on what type of data they want to predict.
- Supervised learning. In this type of machine learning, data scientists supply algorithms with labeled training data and define the variables they want the algorithm to assess for correlations. Both the input and the output of the algorithm are specified.
- Unsupervised learning. This type of machine learning involves algorithms that train on unlabeled data. The algorithm scans through data sets looking for any meaningful connections; neither the groupings it finds nor the outputs it produces are specified in advance.
- Semi-supervised learning. This approach to machine learning involves a mix of the two preceding types. Data scientists may feed an algorithm mostly labeled training data, but the model is free to explore the data on its own and develop its own understanding of the data set.
- Reinforcement learning. Reinforcement learning is typically used to teach a machine to complete a multi-step process for which there are clearly defined rules. Data scientists program an algorithm to complete a task and give it positive or negative cues as it works out how to do so. But for the most part, the algorithm decides on its own what steps to take along the way.
Supervised machine learning?
Supervised machine learning requires the data scientist to train the algorithm with both labeled inputs and desired outputs. Supervised learning algorithms are good for the following tasks, illustrated with a short code sketch after the list:
- Binary classification. Dividing data into two categories.
- Multi-class classification. Choosing between more than two types of answers.
- Regression modeling. Predicting continuous values.
- Ensembling. Combining the predictions of multiple machine learning models to produce an accurate prediction.
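As a concrete illustration of binary classification, the first task above, here is a minimal sketch assuming scikit-learn, a built-in dataset and a logistic regression model; these are choices made for the example, not ones the article prescribes.
    # A minimal sketch of supervised binary classification with scikit-learn.
    # The dataset, model and split are assumptions made for illustration only.
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Labeled inputs (X) and desired outputs (y), as supervised learning requires.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Scale the features, then fit a simple binary classifier on the labeled data.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)

    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))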
Unsupervised machine learning?
Unsupervised machine learning algorithms do not require data to be labeled. They sift through unlabeled data to look for patterns that can be used to group data points into subsets. Some deep learning techniques, such as autoencoders, work in this unsupervised fashion, although neural networks can also be trained with supervision. Unsupervised learning algorithms are good for the following tasks, illustrated with a short clustering sketch after the list:
- Clustering. Splitting the data set into groups based on similarity.
- Anomaly detection. Identifying unusual data points in a data set.
- Association mining. Identifying sets of items in a data set that frequently occur together.
- Dimensionality reduction. Reducing the number of variables in a data set.
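The sketch below is a minimal clustering example, assuming scikit-learn's KMeans and synthetic data; again, the specifics are illustrative choices rather than anything prescribed above.
    # A minimal sketch of unsupervised clustering with scikit-learn's KMeans.
    # The synthetic data and the number of clusters are assumptions for illustration.
    from sklearn.datasets import make_blobs
    from sklearn.cluster import KMeans

    # Unlabeled data: generate points and discard the true group labels.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)   # groups points purely by similarity

    print("cluster sizes:", [int((labels == k).sum()) for k in range(3)])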
SEMI-SUPERVISED LEARNING?
The importance of huge sets of labelled data for training machine-learning systems may diminish over time, due to the rise of semi-supervised learning.
As the name suggests, the approach mixes supervised and unsupervised learning. The technique relies upon using a small amount of labelled data and a large amount of unlabelled data to train systems. The labelled data is used to partially train a machine-learning model, and then that partially trained model is used to label the unlabelled data, a process called pseudo-labelling. The model is then trained on the resulting mix of the labelled and pseudo-labelled data.
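The pseudo-labelling loop described above can be sketched in a few lines. The snippet below is a simplified illustration, assuming scikit-learn, a synthetic dataset and an arbitrary 0.9 confidence threshold; production systems differ in many details.
    # A simplified sketch of pseudo-labelling with scikit-learn.
    # The dataset, model and 0.9 confidence threshold are arbitrary choices.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, random_state=0)
    labeled, unlabeled = np.arange(50), np.arange(50, 1000)   # only 50 labels available

    # 1. Partially train a model on the small labelled set.
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])

    # 2. Pseudo-label the unlabelled data, keeping only confident predictions.
    proba = model.predict_proba(X[unlabeled])
    confident = proba.max(axis=1) > 0.9
    pseudo_labels = model.classes_[proba.argmax(axis=1)[confident]]

    # 3. Retrain on the mix of labelled and pseudo-labelled data.
    X_mix = np.vstack([X[labeled], X[unlabeled][confident]])
    y_mix = np.concatenate([y[labeled], pseudo_labels])
    final_model = LogisticRegression(max_iter=1000).fit(X_mix, y_mix)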
The viability of semi-supervised learning has been boosted recently by Generative Adversarial Networks (GANs), machine-learning systems that can use labelled data to generate completely new data, for example creating new images of Pokemon from existing images, which in turn can be used to help train a machine-learning model.
Were semi-supervised learning to become as effective as supervised learning, then access to huge amounts of computing power may end up being more important for successfully training machine-learning systems than access to large, labelled datasets.
REINFORCEMENT LEARNING?
A way to understand reinforcement learning is to think about how someone might learn to play an old school computer game for the first time, when they aren't familiar with the rules or how to control the game. While they may be a complete novice, eventually, by looking at the relationship between the buttons they press, what happens on screen and their in-game score, their performance will get better and better.
An example of reinforcement learning is Google DeepMind's Deep Q-network, which has beaten humans in a wide range of vintage video games. The system is fed pixels from each game and determines various information about the state of the game, such as the distance between objects on screen. It then considers how the state of the game and the actions it performs in game relate to the score it achieves.
Over many cycles of playing the game, the system eventually builds a model of which actions will maximize the score in which circumstance; for instance, in the case of the video game Breakout, where the paddle should be moved in order to intercept the ball.
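As a toy illustration of the same idea on a much smaller scale, the sketch below runs tabular Q-learning on an invented five-state corridor environment. It is not how the Deep Q-network itself works (that system uses a neural network over raw game pixels), but it shows the trial-and-error, score-maximizing loop in miniature.
    # A toy illustration of reinforcement learning: tabular Q-learning on a
    # made-up five-state corridor (not DeepMind's Deep Q-network, which uses a
    # neural network over raw game pixels).
    import random

    N_STATES, GOAL = 5, 4            # states 0..4; reward only for reaching state 4
    actions = [-1, +1]               # move left or right
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    alpha, gamma, epsilon = 0.5, 0.9, 0.1

    for episode in range(200):
        state = 0
        while state != GOAL:
            # Mostly act greedily, but explore with probability epsilon.
            a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
            next_state = min(max(state + actions[a], 0), N_STATES - 1)
            reward = 1.0 if next_state == GOAL else 0.0
            # Q-learning update: nudge Q towards reward plus discounted future value.
            Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
            state = next_state

    print("learned policy:", ["left" if q[0] > q[1] else "right" for q in Q])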
A Taste of Machine Learning
Machine learning can appear in many guises. We now discuss a number of applications, the types of data they deal with, and finally, we formalize the problems in a somewhat more stylized fashion. The latter is key if we want to avoid reinventing the wheel for every new application. Instead, much of the art of machine learning is to reduce a range of fairly disparate problems to a set of fairly narrow prototypes. Much of the science of machine learning is then to solve those problems and provide good guarantees for the solutions.
Applications
Most readers will be familiar with the concept of web page ranking: the process of submitting a query to a search engine, which finds web pages relevant to the query and returns them in order of relevance. In other words, the search engine returns a sorted list of web pages given a query. To achieve this goal, a search engine needs to 'know' which pages are relevant and which pages match the query. Such knowledge can be gained from several sources: the link structure of web pages, their content, the frequency with which users follow the suggested links for a query, or from examples of queries in combination with manually ranked web pages. Increasingly, machine learning rather than guesswork and clever engineering is used to automate the process of designing a good search engine.
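As a small illustration of one of those signals, the link structure of web pages, the sketch below runs a PageRank-style power iteration over an invented four-page link graph; it greatly simplifies what production search engines do.
    # A tiny PageRank-style power iteration over an invented four-page link graph,
    # illustrating how link structure alone can produce a relevance ranking.
    import numpy as np

    # links[i][j] = 1 means page i links to page j (hypothetical graph)
    links = np.array([[0, 1, 1, 0],
                      [0, 0, 1, 0],
                      [1, 0, 0, 1],
                      [0, 0, 1, 0]], dtype=float)

    transition = links / links.sum(axis=1, keepdims=True)   # follow each outlink with equal probability
    damping, n = 0.85, links.shape[0]
    rank = np.full(n, 1.0 / n)

    for _ in range(50):                                      # iterate until the ranks settle
        rank = (1 - damping) / n + damping * transition.T @ rank

    print("page ranks:", np.round(rank, 3))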
A closely related application is collaborative filtering. Internet bookstores such as Amazon and video rental sites such as Netflix use past purchase and viewing data extensively to entice users to purchase additional goods (or rent more movies). The problem is quite similar to that of web page ranking: as before, we want to obtain a sorted list (in this case of articles). The key difference is that an explicit query is missing; instead, we can only use a user's past purchase and viewing decisions to predict their future viewing and purchase habits.
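A minimal sketch of this idea follows, using an invented user-item purchase matrix and cosine similarity between users; real recommender systems at Amazon or Netflix are far more elaborate.
    # A minimal user-based collaborative-filtering sketch on an invented
    # purchase matrix; real recommender systems are far more sophisticated.
    import numpy as np

    # rows = users, columns = items; 1 means the user bought the item
    purchases = np.array([[1, 1, 0, 0],
                          [1, 1, 1, 0],
                          [0, 0, 1, 1],
                          [1, 0, 0, 0]], dtype=float)

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

    target = 3                                   # recommend for the last user
    sims = np.array([cosine(purchases[target], purchases[u]) for u in range(len(purchases))])
    sims[target] = 0.0                           # ignore the user's own history

    scores = sims @ purchases                    # weight other users' purchases by similarity
    scores[purchases[target] == 1] = -np.inf     # skip items already owned
    print("recommended item:", int(scores.argmax()))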
An equally ill-defined problem is the automatic translation of documents. At one extreme, we could aim at fully understanding a text before translating it, using a curated set of rules crafted by a computational linguist well versed in the two languages we would like to translate between.
Many security applications, e.g. for access control, use face recognition as one of their components. That is, given a photo (or video recording) of a person, recognize who this person is. In other words, the system needs to classify the faces into one of many categories (Alice, Bob, Charlie, . . . ) or decide that it is an unknown face.
Another application where learning helps is named entity recognition: the problem of identifying entities, such as places, titles, names, actions, etc. in documents. Such steps are crucial in the automatic digestion and understanding of documents. Some modern e-mail clients, such as Apple's Mail.app, now ship with the ability to identify addresses in mails and file them automatically in an address book.
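For illustration, the snippet below runs named entity recognition with the spaCy library and its small pretrained English model; the choice of library and model is an assumption, not something the text above prescribes.
    # A brief named entity recognition sketch with spaCy; the library and the
    # "en_core_web_sm" model are assumptions, not choices made by the text above.
    import spacy

    nlp = spacy.load("en_core_web_sm")   # small pretrained English pipeline
    doc = nlp("Ada Lovelace described a program for Charles Babbage's machine in London in 1842.")

    for ent in doc.ents:
        print(ent.text, "->", ent.label_)   # e.g. PERSON, GPE, DATE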
Advantages and Limitations
Machine learning has seen powerful use cases, ranging from predicting customer behavior to forming part of the operating system for self-driving cars. But just because some industries have seen benefits doesn't mean machine learning is without its downsides.
When it comes to advantages, machine learning can help enterprises understand their customers at a deeper level. By collecting customer data and correlating it with behaviors over time, machine learning algorithms can learn associations and help teams tailor product development and marketing initiatives to customer demand.
Some internet companies use machine learning as a primary driver in their business models. Uber, for example, uses algorithms to match drivers with riders. Google uses machine learning to surface the right advertisements in searches.
But machine learning comes with limitations. First and foremost, it can be expensive. Machine learning projects are typically driven by data scientists, who command high salaries. These projects also require software infrastructure that can be costly.
There is also the problem of machine learning bias. Algorithms trained on data sets that exclude certain populations or contain errors can lead to inaccurate models of the world that, at best, fail and, at worst, are discriminatory. When an enterprise bases core business processes on biased models, it risks regulatory and reputational harm.
Future of machine learning
While machine learning algorithms have been around for decades, they've attained new popularity as artificial intelligence (AI) has grown in prominence. Deep learning models, in particular, power today's most advanced AI applications.
Machine learning platforms are among enterprise technology's most competitive realms, with most major vendors, including Amazon, Google, Microsoft, IBM and others, racing to sign customers up for platform services that cover the spectrum of machine learning activities, including data collection, data preparation, data classification, model building, training and application deployment.
As machine learning continues to increase in importance to business operations and AI becomes ever more practical in enterprise settings, the machine learning platform wars will only intensify.
Human-interpretable machine learning
Explaining how a specific ML model works can be challenging when the model is complex. There are some vertical industries where data scientists have to use simple machine learning models because it's important for the business to explain how each and every decision was made. This is especially true in industries with heavy compliance burdens like banking and insurance.
Complex models can produce accurate predictions, but explaining to a layperson how an output was determined can be difficult.
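As one illustration of an interpretable model, the sketch below fits a shallow decision tree with scikit-learn and prints its rules as plain if/then statements; the dataset and depth are arbitrary choices for the example.
    # A sketch of a human-interpretable model: a shallow decision tree whose
    # rules can be printed as plain if/then statements (dataset and depth are
    # arbitrary choices for the example).
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    data = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

    # export_text turns the fitted tree into readable decision rules.
    print(export_text(tree, feature_names=list(data.feature_names)))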