Convolutional neural networks are structurally inspired by the visual cortex and therefore provide an opportunity for modelling the processes underlying human perception. However, research on the effects of optical illusions in such models remains limited, and plausible simulations that could explain the causes of these illusions are lacking.
Previous attempts to investigate the Müller-Lyer illusion using CNNs were limited to binary classification and 2D pictures [1]. We propose a more natural approach that returns quantitative results suitable for statistical analysis and comparison with human performance.
To recreate real-life object size measurement, we fit a feed-forward convolutional model to estimate the height of a cuboid in a presented picture. To incorporate perspective effects perceived in the physical world and enhance model validity, we use renderings of a 3D scene as training images and the cuboid's randomized height parameters as target values. Each picture is an inside or an outside view of the cuboid's edge from a randomized distance. The geometry of this scene resembles the interiors and exteriors of various buildings.
The model takes 200x200 grayscale images as input and outputs the estimated height through a single output neuron with linear activation, which makes the task a regression problem. Training adjusts the model parameters to minimize the mean squared error between estimated and target values.
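As a rough illustration of this setup, the following NumPy sketch passes a 200x200 grayscale image through one convolutional stage, a ReLU, and global average pooling, then through a single linear output neuron, and scores the estimates with the mean squared error. The layer sizes, the pooling step, and the names `conv2d_valid`, `estimate_height`, and `mse` are assumptions made for illustration only; this is not the architecture used in this work, and the training loop is omitted.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 'valid' cross-correlation, the operation CNN layers apply."""
    # windows has shape (H - kh + 1, W - kw + 1, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def estimate_height(image, kernels, weights, bias):
    """Hypothetical forward pass: conv -> ReLU -> global average pool -> linear neuron."""
    features = np.array(
        [np.maximum(conv2d_valid(image, k), 0.0).mean() for k in kernels]
    )
    # Linear activation: the weighted sum is returned without any squashing,
    # so the output can take any real value (a regression output).
    return float(features @ weights + bias)


def mse(estimates, targets):
    """Mean squared error between estimated and target heights."""
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    return float(np.mean((estimates - targets) ** 2))


# Usage with random placeholder values (assumed, not real data or trained weights).
rng = np.random.default_rng(0)
image = rng.random((200, 200))            # one 200x200 grayscale image
kernels = rng.standard_normal((4, 3, 3))  # four 3x3 filters (sizes assumed)
weights = rng.standard_normal(4)
height = estimate_height(image, kernels, weights, bias=0.5)
```

In an actual training run, the kernels, weights, and bias would be updated by gradient descent so that `mse` over the rendered training images decreases.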
Using transfer learning, we obtained height estimates for the inward- and outward-facing arrow images of the Müller-Lyer illusion and found that the inward-facing arrows are consistently estimated as larger than the outward-facing ones, as is the case in human perception.
This evidence supports the idea of a connection between the Müller-Lyer illusion and the effects of perspective found in three-dimensional environments, and it encourages further research into illusions and perception using convolutional neural networks and other quantitative methods.