# Linear separability and the boundary of wx+b

In machine learning, everyone talks about weights and activations, often in conjunction with a formula of the form `wx+b`. While reading *Machine Learning in Action* I frequently saw this formula but didn't really understand what it meant. Obviously it's a line of some sort, but what does the line mean? Where does `w` come from? I was able to muddle past this for decision trees and Naive Bayes, but when I got to support vector machines I was pretty confused. I wasn't able to follow the math, and conceptually things got muddled.

At this point, I switched over to a different book, *Machine Learning: An Algorithmic Perspective*. This book starts with a discussion of neural networks, which are directly tied to the equation `wx+b` and the concept of weights. I think it is much better than the one I was reading before. The author does an excellent job of describing the big picture of each algorithm, followed by how the math is derived. This put things in perspective for me and let me peek into the workings of each algorithm without any glossed-over magic.

## Neural networks

In a neural network, you take some set of inputs, multiply each one by some value, and if the sum of all the inputs times those values is greater than some threshold, the neuron fires. The value you multiply each input by is called a weight. The threshold is determined by an activation function.
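As a minimal sketch of that idea (the names `step` and `neuron_fires` are my own, not from the book), a single neuron is just a weighted sum followed by a threshold:

```python
# A single artificial neuron: weighted sum of inputs, then a step
# activation. Function names here are illustrative, not standard.

def step(value, threshold=0.0):
    # The activation function: fire (1) only if the sum exceeds the threshold.
    return 1 if value > threshold else 0

def neuron_fires(inputs, weights, threshold=0.0):
    # Multiply each input by its weight and sum the results.
    total = sum(x * w for x, w in zip(inputs, weights))
    return step(total, threshold)

print(neuron_fires([1.0, 0.5], [0.4, 0.6], threshold=0.5))  # weighted sum is 0.7, so it fires: 1
```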

But let's say the inputs to all these nodes are zero, and you want the neuron to fire. Zero times any weight is zero, so the node can never fire. This is why neural networks introduce a bias node. The bias node always has the value one, but it also has its own weight. It can offset inputs that are all zero but that should still trigger an activation.

A common way of calculating the activation of a neural network is matrix multiplication. Remembering that a neuron fires if the inputs times the weights sum above some value, you can find that sum by taking a row vector of the inputs and multiplying it by a column vector of the weights. This gives you a single value you can pass to a function that decides whether to fire or not. In this way you can think of a neural network as a basic classifier: given some input data, it either fires or it doesn't. If the input data doesn't cause a firing, the class can be thought of as 0; if it does, the class can be thought of as 1.
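The row-vector-times-column-vector description maps directly onto a dot product. A sketch with NumPy (the shapes and numbers are my own choices for illustration):

```python
import numpy as np

x = np.array([[0.5, 1.0, 0.25]])     # 1x3 row vector of inputs
w = np.array([[0.2], [0.4], [0.8]])  # 3x1 column vector of weights

total = x @ w                  # 1x1 result: 0.1 + 0.4 + 0.2 = 0.7
label = int(total[0, 0] > 0.5) # threshold turns the sum into a class
print(label)                   # 1
```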

Now it makes sense why the **w** and the **x** are written in bold. Bold symbols in math formulas represent vectors, not scalars.

## Classifiers

But how does this line relate to classifiers? The point of a classifier is, given some input feature data, to determine what kind of thing it is. With a binary classifier, you want to find out whether the input classifies as a one or a zero.

For two input features, the boundary (the line a support vector machine builds its margin around) is defined by:

`f(x1, x2) = w1*x1 + w2*x2 + b`
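Evaluating `f` at a point tells you which side of the line that point falls on: the line itself is where `f` equals zero, and the sign of `f` gives the class. A sketch (the weight and bias values below are made up):

```python
def f(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    # The decision function: zero on the boundary line,
    # positive on one side, negative on the other.
    return w1 * x1 + w2 * x2 + b

# Classify points by the sign of f.
print(f(1.0, 1.0))  # 1.0  -> positive side, class 1
print(f(0.0, 0.0))  # -1.0 -> negative side, class 0
print(f(0.5, 0.5))  # 0.0  -> exactly on the boundary line
```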