AI vision models have improved significantly over the past decade. But the resulting neural networks, though effective, share few characteristics with human vision. For example, convolutional neural networks (CNNs) tend to pay more attention to texture, while humans respond more strongly to shape.
A recent paper in Nature Human Behaviour partially closes this gap. It describes a new all-topographic neural network (All-TNN) that, when trained on natural images, develops an organized, specialized structure that more closely resembles human vision. All-TNNs better mimic human spatial biases, such as expecting to see an airplane near the top of an image rather than the bottom, and they operate on a much lower energy budget than other machine-vision networks.
“One thing you notice when you look at how knowledge is organized in the brain is that it is fundamentally different from deep neural networks, such as convolutional neural networks,” said Tim C. Kietzmann, a professor at the Institute of Cognitive Science at Osnabrück University, in Germany, who co-supervised the research.
All-TNNs learn humanlike spatial biases
Most machine vision systems in use today, including those found in apps such as Google Photos and Snapchat, use some form of convolutional neural network. A CNN replicates the same feature detector at many spatial locations (a technique called “weight sharing”). The result is a network that, when visualized, looks like a tightly repeating pattern.
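To see what weight sharing means in practice, here is a minimal PyTorch sketch (my illustration, not code from the paper): a single bank of 3×3 kernels is reused at every position of the image, so the layer's parameter count is tiny and independent of image size.

```python
import torch
import torch.nn as nn

# A convolutional layer slides ONE shared set of 3x3 kernels over
# every spatial position: 16 output channels x 3 input channels
# x 3 x 3 weights (plus biases), regardless of image size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

x = torch.randn(1, 3, 64, 64)  # a dummy 64x64 RGB image
print(conv(x).shape)           # torch.Size([1, 16, 64, 64])
print(sum(p.numel() for p in conv.parameters()))  # 448 parameters in total
```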
An All-TNN's structure is very different. It instead looks smooth, with related neurons organized into clusters but never exactly replicated. Visualizations of spatial relationships in an All-TNN resemble topographic maps of hilly terrain, or a colony of microorganisms viewed under a microscope.
This visual difference is more than a matter of prettier pictures. Kietzmann said the weight sharing used by CNNs is a fundamental departure from the biological brain. “When the brain learns something in one location, it's impossible to copy the knowledge to other locations,” he said. “In a CNN, you can. It's an engineering trick that improves learning efficiency.”
All-TNNs avoid this shortcut through fundamentally different architecture and training methods.
Instead of sharing weights, the researchers gave each spatial location in the network its own set of learnable parameters. Then, to keep this from producing chaotic, disorganized features, they added a “smoothness constraint” during training that encourages neighboring neurons to learn similar (but never identical) features.
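As a rough sketch of the general idea (my own illustration, not the authors' code), here is a locally connected layer that gives every output position its own private kernel, plus a penalty term that nudges neighboring kernels toward each other. The class and function names and the 0.1 penalty weight are illustrative choices, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    """Conv-like layer with a private kernel at every spatial
    position -- i.e., no weight sharing."""
    def __init__(self, in_ch, out_ch, h, w, k):
        super().__init__()
        self.h, self.w, self.k = h, w, k
        # one (out_ch, in_ch*k*k) weight block per output pixel
        self.weight = nn.Parameter(
            0.01 * torch.randn(h * w, out_ch, in_ch * k * k))

    def forward(self, x):
        # extract a k x k patch around every pixel: (B, in_ch*k*k, h*w)
        patches = F.unfold(x, self.k, padding=self.k // 2)
        # apply each position's own kernel to its own patch
        out = torch.einsum('bcp,poc->bop', patches, self.weight)
        return out.view(x.size(0), -1, self.h, self.w)

def smoothness_loss(layer):
    """Penalty that pulls neighboring kernels toward each other,
    so nearby neurons learn similar (but not identical) features."""
    w = layer.weight.view(layer.h, layer.w, *layer.weight.shape[1:])
    dy = (w[1:, :] - w[:-1, :]).pow(2).mean()   # vertical neighbors
    dx = (w[:, 1:] - w[:, :-1]).pow(2).mean()   # horizontal neighbors
    return dy + dx

layer = LocallyConnected2d(in_ch=3, out_ch=8, h=32, w=32, k=3)
x = torch.randn(4, 3, 32, 32)
task_loss = layer(x).pow(2).mean()               # stand-in for a real task loss
loss = task_loss + 0.1 * smoothness_loss(layer)  # 0.1 is an arbitrary weight
loss.backward()
```

Because the penalty only discourages, rather than forbids, differences between neighbors, each location can still specialize, which is what produces the clustered-but-never-repeated maps described above.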
To test whether this translates into more humanlike machine-vision behavior, the researchers asked 30 human participants to identify objects flashed briefly at different screen locations. Although All-TNNs are still far from a perfect simulation of human vision, their behavior turned out to align with human vision roughly three times as closely as the CNNs' did.
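One straightforward way to quantify this kind of behavioral alignment (a sketch of the general approach; the paper's exact analysis may differ) is to build a per-screen-location accuracy map for humans and for a model, then correlate the two maps:

```python
import numpy as np

# accuracy[i, j] = fraction of correct identifications when the
# object was flashed at grid position (i, j); random stand-in data
rng = np.random.default_rng(0)
human_map = rng.random((5, 5))
model_map = rng.random((5, 5))

# Pearson correlation between the flattened maps: higher means the
# model's spatial strengths and weaknesses mirror the humans' own
alignment = np.corrcoef(human_map.ravel(), model_map.ravel())[0, 1]
print(f"spatial-bias alignment: {alignment:.2f}")
```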
Paper co-author Zejin Lu says the closer correspondence between All-TNNs and human vision is driven by the spatial relationships the network learns. “For humans, when you detect certain objects, they have typical positions. You already know that shoes are usually on the bottom, on the ground. An airplane, it's on the top.”
Humanlike behavior doesn't mean better performance, but it does reduce energy use
The closer match between All-TNNs and human vision suggests how machines could be taught to see the world more the way humans do, but it doesn't necessarily make the network better at image classification.
CNNs remain the kings of image classification, with an accuracy of 43.2% in the reported tests. Depending on the network configuration, All-TNN classification accuracy ranged from 34.5% to 36%.
What the All-TNN lacks in accuracy, however, it makes up for in efficiency. An All-TNN consumed significantly less energy than the tested CNNs, which required more than ten times as much energy to run. Notably, this held even though the All-TNNs were roughly 13 times the size of the tested CNNs (about 107 million parameters versus about 8 million).
The All-TNN's efficiency is attributed to the network's novel structure. Though larger overall, the network can focus on the most important parts of an image instead of processing everything uniformly. “You have a lot of different neurons that could react, but only a small number are responding,” Kietzmann said.
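One crude way to turn that intuition into a number (an assumption about the general approach, not necessarily the metric used in the paper) is to sum the magnitude of unit activations during a forward pass; a network in which only a few neurons respond strongly scores low even if it has many parameters:

```python
import torch
import torch.nn as nn

def activation_energy(model, x):
    """Sum of absolute unit activations -- a crude proxy for the
    energy a forward pass would cost in hardware or in a brain."""
    total = 0.0
    def hook(_module, _inputs, out):
        nonlocal total
        total += out.abs().sum().item()
    # record the output of every ReLU during one forward pass
    handles = [m.register_forward_hook(hook)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    model(x)
    for h in handles:
        h.remove()
    return total

model = nn.Sequential(nn.Linear(100, 500), nn.ReLU(),
                      nn.Linear(500, 10))
print(activation_energy(model, torch.randn(1, 100)))
```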
The All-TNN's efficiency could have implications for machine vision on low-power devices. However, Kietzmann and Lu stressed that energy efficiency wasn't their primary goal, nor the result they found most interesting. Instead, they hope that new network architectures like the All-TNN will provide a more complete framework for understanding intelligence, both artificial and human.
Kietzmann notes that the pursuit of scale, training ever-larger models with more parameters on ever more data, seems inconsistent with understanding real brains, which get by on far less data and far less energy. Networks that try to mimic human behavior could offer an alternative to that pursuit of scale.
“This is a trend that, to our minds, is too blunt an instrument for understanding the fundamental problem of how cognition arises,” Kietzmann said.