This blog post covers the evaluation of probabilities and probability densities for continuous variables. The motivation for this discussion came from seeing the disparities among various modeling approaches. My hope is that this blog post serves as a resource, particularly for those entering the field, on the different approaches that exist.
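Before diving in, it helps to make the distinction concrete: for a continuous variable, a density is not itself a probability; probabilities come from integrating the density over an interval. A minimal sketch using a standard normal distribution (the specific parameter choices here are illustrative, not from the post):

```python
import math

# Gaussian density and CDF written from scratch so the example is
# self-contained (no library assumptions).
def pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# A density value can exceed 1, so it cannot be a probability:
print(pdf(0.0, sigma=0.1))   # ≈ 3.99 — a density, not a probability

# A probability for a continuous variable is the integral of the
# density over an interval:
p = cdf(1.0) - cdf(-1.0)     # P(-1 < X < 1) for a standard normal
print(p)                     # ≈ 0.683
```

This gap between densities and probabilities is exactly where different modeling approaches diverge, which is the subject of the rest of the post.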
Is machine learning the answer to creating flexible, capable machines? Can learning alone solve complex tasks? Perhaps, but, more likely, creating intelligent machines will require a certain degree of design, which we can think of as an architectural prior. In other words, a learning system requires a scaffolding on which to build. There is no better example of this than the intricate evolutionary structures of the human brain. This is the type of system we are striving for: hard-coded low-level functionalities with a specific, yet incredibly flexible, higher-level architecture, resulting in the potential to quickly learn a large variety of tasks, i.e., general intelligence. Such a system walks the line between nature and nurture, trading off between pre-specified design and learning.
Normalization is a fundamental component of machine learning. Take any introductory machine learning course, and you’ll learn about the importance of normalizing the inputs to your model. The justification goes something like this: the important patterns in the data often correspond to the relative relationships between the different input dimensions. Therefore, you can make the task of learning and recognizing these patterns easier by removing the constant offset and standardizing the scales.
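The recipe described above, removing the constant offset and standardizing the scales, is just per-dimension standardization. A minimal sketch with NumPy (the data values here are made up for illustration):

```python
import numpy as np

# Hypothetical data matrix: rows are examples, columns are input
# dimensions with very different offsets and scales.
X = np.array([[170.0, 0.65],
              [180.0, 0.80],
              [160.0, 0.55]])

# Remove the constant offset (mean) and standardize the scale (std)
# independently for each input dimension.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Each column now has zero mean and unit variance, so the relative
# relationships between dimensions are on a common footing.
print(X_norm.mean(axis=0))  # ≈ [0, 0]
print(X_norm.std(axis=0))   # ≈ [1, 1]
```

In practice the mean and standard deviation are computed on the training set and reused to transform validation and test data, so that the model never peeks at held-out statistics.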
I enjoy thinking about how to apply insights from our understanding of biological intelligence to improve machine intelligence. However, the insights often flow in the opposite direction as well: machine learning often recasts findings from neuroscience and psychology in a more grounded and principled way. The following is a perspective on what machine learning can teach us about ourselves.
Deep learning has exploded in recent years. Researchers are continually coming up with new and exciting ways to compose deep networks to perform new tasks and learn new things. If I can borrow Nando de Freitas’ analogy, we’re like kids playing with Lego blocks, stacking these blocks together, trying to develop novel creations. When we stack these blocks into towers so tall that they become unstable, someone develops a new technique for stacking blocks, allowing us to continue constructing even bigger towers. Occasionally someone invents a new block, and if it’s useful, the rest of us scramble to incorporate this block into our towers.