
How does the human brain perceive reality?

Deep learning from a physics perspective

By dardani lennon · Published 2 years ago · 9 min read

Today, artificial intelligence is present in almost every aspect of our lives. AI-based features in smartphones, social media, recommendation engines, online advertising networks, and navigation tools affect us every day. Deep learning has systematically advanced the state of the art in areas such as speech recognition, autonomous driving, machine translation, and visual object recognition.

Our understanding of what makes deep neural networks (DNNs) so powerful is, however, largely heuristic: we know from experience that we achieve excellent results by using large datasets and following specific training protocols. Recently, a possible explanation has been proposed, based on a conceptual framework from physics, the renormalization group (RG), and on a type of neural network known as a restricted Boltzmann machine (RBM).

RG and RBMs as coarse-graining processes

Renormalization is a technique for studying the behavior of a physical system when information about its microscopic parts is unavailable. It is a "coarse-graining" approach that describes how the laws of physics change as we "put on blurry glasses", zooming out to examine a system at different length scales.

When we change the length scale at which we describe a physical system, our theory "navigates" through the space of all possible theories.

The importance of RG theory is that it provides a powerful framework for essentially explaining why physics itself is possible.

To describe the motion of a complex structure like a satellite, one does not need to consider the motion of all its components.


For example, to calculate the orbit of a satellite around the Earth, we just need to apply Newton's laws of motion. We do not need to consider the extremely complex behavior of the satellite's microscopic constituents to explain its motion. What we do in practice is a kind of "averaging" over the detailed behavior of the basic components of the system (the satellite, in this case).

Furthermore, RG theory seems to suggest that all our current theories about the physical world are only approximations of some unknown "true theory" (in more technical terms, this true theory "lives" near what physicists call a fixed point of the scale transformation).


RG works especially well when the system under study is at a critical point, where it exhibits self-similarity. A self-similar system is "exactly or approximately similar to a part of itself", regardless of the length scale at which it is observed. An example of a system that exhibits self-similarity is a fractal.

When the system is at a critical point, even parts that are far apart from each other show strong correlations. Every subpart influences the system as a whole, and the physical properties of the system become essentially independent of its microscopic structure.

An artificial neural network can also be viewed as an iterative coarse-graining process. A network consists of several layers and, as shown below, the earlier layers learn only lower-level features from the input data (such as edges and colors), while the deeper layers combine the lower-level features fed to them by the earlier layers into higher-level ones. In the words of Geoffrey Hinton, one of the leading figures in the field of deep learning: "First learn simple features, and then, building on those, learn more complex features, in stages." Furthermore, as in the RG process, the deeper layers keep only the features deemed relevant, de-emphasizing the irrelevant ones.

Convolutional Neural Network (CNN).
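To make this hierarchy concrete, here is a minimal PyTorch sketch of a small CNN (my own illustration, not taken from any paper discussed here; the name TinyCNN and all layer sizes are arbitrary choices). Early convolutions see small local patches, and each pooling step coarse-grains the image before the next layer combines features:

```python
# A minimal, illustrative CNN: early layers see local, low-level
# features; pooling coarse-grains; deeper layers combine features.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            # early layer: local, low-level features (edges, color blobs)
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # coarse-grain: halve the spatial resolution
            # deeper layer: combinations of the low-level features
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # coarse-grain again
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.features(x)               # (B, 32, 8, 8) for 32x32 input
        return self.classifier(h.flatten(1))

model = TinyCNN()
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```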

An exact connection

Both physics and machine learning deal with systems made of many components. Physics studies systems containing many (interacting) objects; machine learning studies complex data with a large number of dimensions. Furthermore, similarly to RG in physics, neural networks manage to categorize data, such as pictures of animals, regardless of low-level details (such as size and color).

In a paper published in 2014, the physicists Pankaj Mehta and David Schwab explained the performance of deep learning in terms of renormalization group theory. They showed that DNNs are such powerful feature extractors because they can effectively "mimic" the coarse-graining that characterizes the RG process. In their words,

The DNN architecture […] can be viewed as an iterative coarse-graining scheme, where each new high-level layer of the neural network learns increasingly abstract high-level features from the data.

In fact, in their paper they managed to show that there is indeed an exact mapping between the RG and restricted Boltzmann machines (RBMs), the two-layer neural networks that serve as building blocks for DNNs.

Figure from Mehta and Schwab's 2014 paper, illustrating the mapping between RG and a DNN built by stacking RBMs.

There are many other works in the literature linking renormalization and deep learning, following different strategies and pursuing different goals. Moreover, Mehta and Schwab worked out the mapping for only one type of neural network. For brevity, I focus here on their original paper, whose insights have led to a great deal of follow-up work on this topic.

Renormalization Group Theory

As mentioned above, renormalization is a coarse-graining technique applied to physical systems. RG theory is a general conceptual framework, so concrete schemes are needed to implement its ideas. The variational renormalization group (VRG) is one such scheme, proposed by Kadanoff, Houghton, and Yalabik in 1976.

For ease of exposition, rather than giving a fully general discussion, I will focus on one specific type of system to illustrate how the RG works: quantum spin systems. But before delving into the mathematical machinery, I will briefly explain what spin means in physics.

The concept of spin in physics

In physics, spin can be defined as "an intrinsic form of angular momentum carried by elementary particles, composite particles, and atomic nuclei". Though spin is a quantum mechanical concept with no classical counterpart, particles with spin are often (if somewhat inaccurately) pictured as little tops spinning around their own axes. Spin is closely related to magnetic phenomena.

Particle spins (black arrows) and their associated magnetic field lines

The mathematics of renormalization

Let us consider a system or ensemble of N spins. For visualization purposes, assume they can be placed on a grid as shown in the image below.

A two-dimensional lattice of spins (indicated by the small arrows). The balls are charged atoms.

Since each spin can point either up or down, spins are associated with binary variables:

$$v_i = \pm 1$$

The index i labels the position of a spin in the lattice. For convenience, I will denote an entire spin configuration by the vector $\mathbf{v} = (v_1, v_2, \dots, v_N)$.

For a system in thermal equilibrium, the probability distribution associated with a spin configuration $\mathbf{v}$ has the form:

$$P(\mathbf{v}) = \frac{e^{-H(\mathbf{v})}}{Z}$$

This is the ubiquitous Boltzmann distribution (with the temperature set to 1 for convenience). $H(\mathbf{v})$ is the so-called Hamiltonian of the system, which can be defined as "an operator corresponding to the sum of the kinetic and potential energies of all particles in the system". The denominator Z is a normalizing factor known as the partition function:

$$Z = \operatorname{Tr}_{v_i}\, e^{-H(\mathbf{v})}$$

The Hamiltonian of the system can be expressed as a sum of spin-interaction terms:

$$H(\mathbf{v}) = -\sum_{i} B_i v_i - \sum_{i,j} J_{ij}\, v_i v_j - \cdots$$

The parameters in the collection

$$\{B_i, J_{ij}, \dots\}$$

are known as coupling constants; they determine the strength of the interactions between spins and an external magnetic field (first term) or between pairs of spins (second term).

Another important quantity we need to consider is the free energy. Free energy is a concept originating in thermodynamics, where it is defined as "the energy in a physical system that can be converted to do work". Mathematically, in our case, it is given by:

$$F_{\mathbf{v}} = -\log Z = -\log\!\big(\operatorname{Tr}_{v_i}\, e^{-H(\mathbf{v})}\big)$$

The symbol "Tr" stands for trace (a notion from linear algebra). Here, it denotes the sum over all possible configurations of the visible spins $\mathbf{v}$.

In each step of the renormalization procedure, the small-scale behavior of the system is averaged out. The Hamiltonian of the coarse-grained system is expressed in terms of new coupling constants

$$\{\tilde{B}_i, \tilde{J}_{ij}, \dots\}$$

and new coarse-grained variables. In our case, the latter are the block spins $\mathbf{h}$, and the new Hamiltonian reads:

$$H^{\mathrm{RG}}[\mathbf{h}] = -\sum_{i} \tilde{B}_i h_i - \sum_{i,j} \tilde{J}_{ij}\, h_i h_j - \cdots$$

To better understand what a block spin is, consider the two-dimensional lattice below. Each arrow represents a spin. Now divide the lattice into square blocks containing 2×2 spins each. A block spin is the effective (for example, average) spin associated with each block.

In block-spin RG, the system is coarse-grained into new block variables that describe the effective behavior of blocks of spins.
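Here is a minimal sketch of this block-spin transformation, assuming a simple majority rule for combining each 2×2 block (a common textbook choice; the variational scheme discussed below learns the coarse-graining instead of fixing it by hand):

```python
# Illustrative 2x2 block-spin coarse-graining with a majority rule.
import numpy as np

rng = np.random.default_rng(0)
lattice = rng.choice([-1, 1], size=(8, 8))   # random 8x8 spin lattice

def block_spin(spins: np.ndarray) -> np.ndarray:
    """Map each 2x2 block of +-1 spins to one block spin (majority vote)."""
    n = spins.shape[0] // 2
    blocks = spins.reshape(n, 2, n, 2).sum(axis=(1, 3))  # block sums
    return np.where(blocks >= 0, 1, -1)                  # ties -> +1

coarse = block_spin(lattice)          # 8x8 -> 4x4
coarser = block_spin(coarse)          # 4x4 -> 2x2 (a second RG step)
print(lattice.shape, coarse.shape, coarser.shape)
```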

Note that the new Hamiltonian has the same structure as the original one; the physical spins are simply replaced by block spins.

Both Hamiltonians have the same structure, but with different variables and couplings.

In other words, the form of the model does not change, but its parameters do as we zoom out. Systematically iterating these steps yields a full renormalization of the theory. After several RG iterations, some parameters will be dropped while others remain. The ones that remain are called relevant operators.

The connection between successive Hamiltonians is fixed by the requirement that the free energy (described a few lines above) does not change under an RG transformation.

Variational Renormalization Group

As mentioned above, the variational renormalization group (VRG) is one way to implement the RG mapping. In this scheme, the mapping is implemented by an operator

$$T_\lambda(\mathbf{v}, \mathbf{h})$$

where λ is a set of parameters. This operator encodes the couplings between the hidden spins and the input (visible) spins, and it defines the new Hamiltonian through the relation:

$$e^{-H^{\mathrm{RG}}_{\lambda}[\mathbf{h}]} \equiv \operatorname{Tr}_{v_i}\, e^{\,T_\lambda(\mathbf{v},\mathbf{h}) - H(\mathbf{v})}$$

In an exact RG transformation, the coarse-grained system has exactly the same free energy as the original one,

$$F_{\mathbf{h}} = F_{\mathbf{v}}$$

which is equivalent to the following condition:

$$\operatorname{Tr}_{h_i}\, e^{\,T_\lambda(\mathbf{v},\mathbf{h})} = 1$$

In practice, this condition cannot be satisfied exactly. The variational scheme instead looks for the λ that minimizes the free-energy difference

$$\Delta F = F_{\mathbf{h}} - F_{\mathbf{v}}$$

or, equivalently, that best approximates the exact RG transformation.

A brief summary of RBMs

In a previous post, "Neural Quantum States: Solving the Most Challenging Problem in Modern Theoretical Physics, the Many-Body Problem", I described in detail how restricted Boltzmann machines work. Here I will give a more concise explanation.

Restricted Boltzmann machines are energy-based models used for nonlinear unsupervised feature learning. Their simplest version has only two layers:

A layer of visible units, denoted by v

A layer of hidden units, denoted by h

Illustration of a Simple Restricted Boltzmann Machine

As before, I will consider a binary visible dataset $\mathbf{v}$ with n elements, drawn from some probability distribution

$$P(\mathbf{v})$$

the probability distribution of the input (visible) data.

The hidden units of the RBM are coupled to the visible units, with an interaction energy given by:

$$E_\lambda(\mathbf{v}, \mathbf{h}) = -\sum_i c_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j$$

The subscript λ on the energy stands for the set of variational parameters {c, b, W}, where the first two elements are vectors and the third is a matrix. The goal of the RBM is to output a λ-dependent probability distribution that is as close as possible to the distribution of the input data, P(v).

The probability associated with a configuration (v, h), for parameters λ, is a function of this energy functional:

$$p_\lambda(\mathbf{v}, \mathbf{h}) = \frac{e^{-E_\lambda(\mathbf{v}, \mathbf{h})}}{Z_\lambda}, \qquad Z_\lambda = \operatorname{Tr}_{v_i, h_j}\, e^{-E_\lambda(\mathbf{v}, \mathbf{h})}$$

From this joint probability we easily obtain the variational (marginal) distribution of the visible units by summing over the hidden units. Likewise, the marginal distribution of the hidden units is obtained by summing over the visible units:

$$p_\lambda(\mathbf{v}) = \sum_{\mathbf{h}} p_\lambda(\mathbf{v}, \mathbf{h}), \qquad p_\lambda(\mathbf{h}) = \sum_{\mathbf{v}} p_\lambda(\mathbf{v}, \mathbf{h})$$

The parameters λ can be chosen to minimize the so-called Kullback-Leibler (KL) divergence, or relative entropy, which measures the dissimilarity of two probability distributions. In the present case, we are interested in the KL divergence between the true data distribution and the variational distribution of the visible units produced by the RBM. More specifically:

$$D_{\mathrm{KL}}\big(P(\mathbf{v}) \,\|\, p_\lambda(\mathbf{v})\big) = \sum_{\mathbf{v}} P(\mathbf{v}) \log \frac{P(\mathbf{v})}{p_\lambda(\mathbf{v})}$$

which vanishes when the two distributions are the same:

$$D_{\mathrm{KL}}\big(P(\mathbf{v}) \,\|\, p_\lambda(\mathbf{v})\big) = 0$$
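The following brute-force numpy sketch (illustrative only: real RBMs are trained with approximations such as contrastive divergence rather than exact enumeration, and the target distribution below is made up) puts the energy, the visible marginal, and the KL divergence together for a tiny RBM:

```python
# Brute-force sketch of a tiny RBM: energy, visible marginal, and KL
# divergence to a made-up target distribution (illustration only).
import itertools
import numpy as np

rng = np.random.default_rng(1)
n_v, n_h = 3, 2                        # visible / hidden units
c = rng.normal(size=n_v)               # visible biases
b = rng.normal(size=n_h)               # hidden biases
W = rng.normal(size=(n_v, n_h))        # couplings; lambda = {c, b, W}

def energy(v, h):
    """E_lambda(v, h) = -c.v - b.h - v.W.h"""
    return -(c @ v + b @ h + v @ W @ h)

V = [np.array(v) for v in itertools.product([0, 1], repeat=n_v)]
H = [np.array(h) for h in itertools.product([0, 1], repeat=n_h)]

Z = sum(np.exp(-energy(v, h)) for v in V for h in H)  # partition function

def p_v(v):
    """Marginal of the visible units: sum the joint over hidden states."""
    return sum(np.exp(-energy(v, h)) for h in H) / Z

# KL divergence between a made-up target P(v) and the RBM's p_lambda(v)
P = rng.dirichlet(np.ones(len(V)))     # arbitrary target distribution
kl = sum(P[i] * np.log(P[i] / p_v(V[i])) for i in range(len(V)))
print(f"KL(P || p_lambda) = {kl:.4f}")
```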

The exact mapping between RG and RBM

Mehta and Schwab showed that an exact mapping between RG and RBM is established by choosing the following expression for the variational operator:

$$T_\lambda(\mathbf{v}, \mathbf{h}) = -E_\lambda(\mathbf{v}, \mathbf{h}) + H(\mathbf{v})$$

Recall that the Hamiltonian H(v) encodes the probability distribution of the input data. With this choice of variational operator, one can quickly show that the RG Hamiltonian of the hidden spins coincides with the RBM Hamiltonian of the hidden layer:

$$H^{\mathrm{RG}}_\lambda[\mathbf{h}] = H^{\mathrm{RBM}}_\lambda[\mathbf{h}]$$

Likewise, when an exact RG transformation can be implemented, the true Hamiltonian and the variational Hamiltonian coincide:

$$H^{\mathrm{RBM}}_\lambda[\mathbf{v}] = H(\mathbf{v})$$

We therefore see that one step of the renormalization group, taking spins v to block spins h, can be mapped exactly onto a two-layer RBM with visible units v and hidden units h.
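The computation behind this identification is short enough to spell out. Following the structure of Mehta and Schwab's argument in the notation above, and defining the RBM Hamiltonian of the hidden layer by $e^{-H^{\mathrm{RBM}}_\lambda[\mathbf{h}]} \equiv Z_\lambda\, p_\lambda(\mathbf{h})$, substituting the choice of $T_\lambda$ into the VRG definition gives:

```latex
% Substituting T_lambda(v,h) = -E_lambda(v,h) + H(v) into the VRG
% definition of the coarse-grained Hamiltonian:
\begin{aligned}
e^{-H^{\mathrm{RG}}_{\lambda}[\mathbf{h}]}
  &= \operatorname{Tr}_{v_i} e^{\,T_{\lambda}(\mathbf{v},\mathbf{h}) - H(\mathbf{v})}
   && \text{(definition of the VRG map)} \\
  &= \operatorname{Tr}_{v_i} e^{-E_{\lambda}(\mathbf{v},\mathbf{h})}
   && \text{(the } H(\mathbf{v}) \text{ terms cancel)} \\
  &= Z_{\lambda}\, p_{\lambda}(\mathbf{h})
   && \text{(marginal of the RBM joint distribution)} \\
  &= e^{-H^{\mathrm{RBM}}_{\lambda}[\mathbf{h}]}
   && \text{(definition of the RBM Hamiltonian).}
\end{aligned}
```

Taking the logarithm of both sides gives the stated equality of the Hamiltonians.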

As we stack more and more RBM layers, we are actually performing more and more rounds of RG transformations.
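Schematically (a numpy sketch of my own; training is omitted, and the random weights below are placeholders for learned ones), stacking means feeding each layer's hidden activations to the next layer as its visible data, just as the block spins of one RG step become the spins of the next:

```python
# Schematic sketch of stacking RBMs: the hidden activations of one
# layer become the "visible" data of the next, mirroring RG steps.
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_activations(v, W, b):
    """p(h_j = 1 | v) for a binary RBM: one coarse-graining step."""
    return sigmoid(v @ W + b)

layer_sizes = [64, 32, 16, 8]          # shrinking layers, like block spins
v = rng.integers(0, 2, size=(5, 64)).astype(float)  # 5 binary samples

for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    W = rng.normal(scale=0.1, size=(n_in, n_out))   # would be learned
    b = np.zeros(n_out)
    v = hidden_activations(v, W, b)    # feed h of this layer to the next
    print("layer output shape:", v.shape)
```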

Application to the Ising model

On this basis, we conclude that the RBM, an unsupervised deep learning algorithm, implements the variational RG procedure. Mehta and Schwab tested their idea by applying stacked RBMs to the well-understood two-dimensional Ising model, feeding the DNN spin configurations sampled from the Ising model as input data. Their results show that the DNN appears to perform block-spin renormalization.
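For readers who want a feel for the kind of training data involved, here is a minimal Metropolis sampler for 2D Ising configurations (my own sketch; the lattice size, temperature, and number of sweeps are arbitrary choices, with the temperature set near the critical value of roughly 2.27):

```python
# Minimal Metropolis sampler for the 2D Ising model (illustrative).
import numpy as np

rng = np.random.default_rng(3)

def metropolis_sweep(spins: np.ndarray, beta: float) -> None:
    """One in-place Metropolis sweep over an L x L periodic lattice."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # energy change from flipping spin (i, j), periodic neighbours
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1

L, beta = 16, 1.0 / 2.27               # near the critical temperature
spins = rng.choice([-1, 1], size=(L, L))
for _ in range(200):                    # equilibration sweeps
    metropolis_sweep(spins, beta)
print("sample magnetization:", spins.mean())
```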

In the figure from their paper, panel A shows the architecture of the DNN. Panel B plots the learned parameters W, showing the couplings between hidden and visible units. Panel D shows how, as we move up through the layers of the DNN, block spins (the patches in the figure) gradually form. Panel E shows the RBM reconstruction of the macroscopic structure of three data samples.
