Neural networks are a set of algorithms, modeled loosely after
the human brain, that are designed to recognize patterns. They interpret
sensory data through a kind of machine perception, labeling or clustering raw
input. The patterns they recognize are numerical, contained in vectors, into
which all real-world data, be it images, sound, text or time series, must be
translated.
Neural networks help us cluster and classify. You can think of
them as a clustering and classification layer on top of data you store and
manage. They help to group unlabeled data according to similarities among the
example inputs, and they classify data when they have a labeled dataset to
train on. (To be more precise, neural networks extract features that are fed to
other algorithms for clustering and classification; so you can think of deep
neural networks as components of larger machine-learning applications involving
algorithms for reinforcement learning, classification, and regression.)
What kinds of problems does deep learning solve, and more importantly, can it solve yours? To know the answer, you need to ask yourself a few questions. What outcomes do I care about? Those outcomes are labels that could be applied to data: for example, spam or not spam in an email filter, good guy or bad guy in fraud detection, angry customer or happy customer in customer relationship management. Then ask: do I have the data to accompany those labels? That is, can I find labeled data, or can I create a labeled dataset, where spam has been labeled as spam, in order to teach an algorithm the correlation between labels and inputs?
There has been a lot of advancement in using neural networks and other deep learning algorithms to obtain high performance on a variety of NLP tasks. Traditionally, the bag-of-words model, along with classifiers that use it, such as the Maximum Entropy classifier, has been successfully leveraged to make very accurate predictions in NLP tasks such as sentiment analysis. However, with the advent of deep learning research and its applications to NLP, discoveries have been made that improve the accuracy of these methods in two primary ways: using a supervised neural network to run your input through several layers of classification, and using an unsupervised neural network to optimize feature selection as a pre-training step.
The Motivation for Neural Networks and Deep Learning in NLP
At its core, deep learning (along with neural networks) is about giving the computer some data and letting it figure out how to use that data to come up with features and models that accurately represent complex tasks, such as analyzing a movie review for its sentiment. With more common machine learning algorithms, human-designed features are generally used to model the problem, and prediction becomes a task of optimizing weights to minimize a cost function. However, hand-crafting features is time-consuming, and these human-made features tend either to become too specific to the problem at hand or to be incomplete over the entire problem space.
Supervised Learning: From Regression to a Neural Network
The Maximum Entropy classifier, commonly abbreviated as the MaxEnt classifier, is a common probabilistic model used in NLP. Given some contextual information in a document (in the form of multisets of unigrams, bigrams, etc.), this classifier attempts to predict the document's class label (positive, negative, neutral). This classifier also appears in neural networks, where it's known as the softmax layer: the final (and sometimes only) layer in the network used for classification. So, we can model a single neuron in a neural network as computing the same function as a max entropy classifier:
$$h_{w,b}(x) = f(w^\top x + b)$$

Here, $x$ is our vector of inputs; the neuron computes the function $f$ with parameters $w$ (the weights) and $b$ (the bias) and outputs a single result in $h$.
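As a concrete illustration, here is a minimal sketch in Python with NumPy (the variable names and numbers are made up for illustration) of a single neuron computing $f(w^\top x + b)$ with a sigmoid, along with the softmax generalization over several class scores:

```python
import numpy as np

def sigmoid(z):
    # Squashes a real-valued score into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Normalizes a vector of scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))  # shift by the max for numerical stability
    return e / e.sum()

x = np.array([0.5, -1.2, 3.0])   # input feature vector
w = np.array([0.1, 0.4, -0.3])   # one neuron's weights
b = 0.2                          # one neuron's bias

h = sigmoid(w @ x + b)           # single output: h = f(w.x + b)

# The softmax layer is the multi-class version: one weight row per
# class (e.g. positive, negative, neutral) and one probability each.
W = np.random.randn(3, 3)
probs = softmax(W @ x + np.zeros(3))
```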
Then, a neural network with multiple neurons can simply be thought of as feeding the same input to several different classification functions at the same time. The neural network is nothing more than running a given vector of inputs ($x$ above) through many functions (as opposed to a single one), where each neuron represents a different regression function. As a result, we obtain a vector of outputs:

$$h = f(Wx + b)$$

where each row of the weight matrix $W$ holds one neuron's weights.
You can then feed this vector of outputs to another layer of logistic regression functions (or a single function), until you obtain your output: the probability that your input belongs to a certain class,

$$P(y = c \mid x) = \frac{e^{w_c^\top h + b_c}}{\sum_{c'} e^{w_{c'}^\top h + b_{c'}}}.$$
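Putting those pieces together, here is a minimal sketch of that stacked forward pass (NumPy again; the layer sizes and random weights are illustrative assumptions, and in practice the weights would come from training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = np.random.randn(4)         # input feature vector

# Hidden layer: five neurons, each its own "regression function".
W1, b1 = np.random.randn(5, 4), np.zeros(5)
h = sigmoid(W1 @ x + b1)       # vector of outputs, one per neuron

# Softmax (MaxEnt) layer over three classes.
W2, b2 = np.random.randn(3, 5), np.zeros(3)
probs = softmax(W2 @ h + b2)   # P(class | x); entries sum to 1
print(probs)
```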
Applying Neural Networks to Unsupervised Problems in NLP
In NLP, words and their surrounding contexts are
pretty important: a word surrounded by relevant context is valuable, while a word
surrounded by seemingly irrelevant context is not very valuable. Each word is
mapped to a vector defined by its features (which in turn relate to the word’s
surrounding context), and neural networks can be used to learn which features
maximize a word vector’s score.
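One way to make this concrete (a sketch in the spirit of window-based scoring models such as Collobert and Weston's, not necessarily the exact architecture implied here): embed each word in a small context window, concatenate the vectors, and have a network produce a single score, so that genuine windows from a corpus can be trained to score higher than corrupted ones:

```python
import numpy as np

def score_window(window_vecs, W, b, u):
    # Concatenate the window's word vectors into one input,
    # pass through a hidden layer, and reduce to a scalar score.
    z = np.concatenate(window_vecs)
    hidden = np.tanh(W @ z + b)
    return u @ hidden

dim, window = 4, 3                     # embedding size, window length
W = np.random.randn(8, dim * window)   # hidden-layer weights
b = np.zeros(8)
u = np.random.randn(8)                 # scoring vector

# A "real" window vs. one with the center word replaced at random:
real = [np.random.randn(dim) for _ in range(window)]
corrupt = list(real)
corrupt[1] = np.random.randn(dim)
print(score_window(real, W, b, u), score_window(corrupt, W, b, u))
# Training would push the real window's score above the corrupted one's.
```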
A valuable pre-training step for any supervised learning task in NLP (such as classifying restaurant reviews) is to generate feature vectors that represent words well. As discussed at the beginning of this post, these features are often human-designed; instead, a neural network can be used to learn them. The input to such a neural network would be a matrix defined by, for example, a sentence's word vectors.
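As a sketch of what that input might look like (the toy vocabulary, sentence, and dimensions below are invented for illustration):

```python
import numpy as np

# Toy vocabulary; in practice this is built from a large corpus.
vocab = {"the": 0, "food": 1, "was": 2, "great": 3}
embedding_dim = 4

# Randomly initialized word vectors; pre-training nudges them toward
# features that represent each word well.
embeddings = np.random.randn(len(vocab), embedding_dim)

sentence = ["the", "food", "was", "great"]

# The network's input: one row per word of the sentence.
sentence_matrix = np.vstack([embeddings[vocab[w]] for w in sentence])
print(sentence_matrix.shape)   # (4, 4): words x embedding_dim
```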
Our neural network can then be composed of several layers, where each layer sends the previous layer's output to a function. Training is achieved through backpropagation: taking derivatives with respect to the weights using the chain rule in order to optimize those weights. From this, the ideal weights that define our function (which is a composition of many functions) are learned. After training, we have a method of extracting the ideal feature vector that a given word is mapped to.
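To spell out the backpropagation step (this is the standard formulation, not something specific to this post): for a network $h(x) = f_n(f_{n-1}(\cdots f_1(x)))$ with cost $J$, the chain rule gives the gradient of $J$ with respect to layer $k$'s weights $W_k$, and gradient descent moves the weights a small step $\eta$ against it:

$$\frac{\partial J}{\partial W_k} = \frac{\partial J}{\partial f_n} \cdot \frac{\partial f_n}{\partial f_{n-1}} \cdots \frac{\partial f_k}{\partial W_k}, \qquad W_k \leftarrow W_k - \eta \, \frac{\partial J}{\partial W_k}$$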
This unsupervised neural network is powerful, especially when considered alongside traditional supervised softmax models. Running this unsupervised network on a large text collection allows the input features to be learned rather than human-designed, which often yields better results when those features are fed into a traditional, supervised neural network for classification.