In short...

The artificial neural network powering expoze.io was first exposed to millions of images to learn to detect objects, and subsequently trained on thousands of images to learn to generate the correct attention heatmaps. 

It draws on all this experience when generating the attention prediction for your image or video. That is why, when you click “create new heatmap”, your attention heatmap matches real eye-tracking results with an accuracy of 87%.

How does expoze.io work?

With expoze.io you can generate heatmaps and scores reflecting which areas of an image or video are likely to draw attention. Our platform allows you to analyze images and videos in a matter of minutes, without needing participants for your results. 

The results match traditional eye tracking with 87% accuracy. Here we explain the science behind how this is possible.

It's ANN for short

The attention predictions on our platform are generated by an Artificial Neural Network (ANN). An ANN is a collection of connected nodes. Each connection has a weight that determines how much one node influences the next, and the nodes are grouped into layers.

An ANN is trained on a large set of training data. During training, the weights of the network are updated. If trained well, the network learns to transform inputs into the desired outputs.
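As a minimal sketch of this idea (not expoze.io's actual network), here is a tiny feedforward ANN in Python: every node computes a weighted sum of the previous layer's nodes plus a bias, squashed through a sigmoid. The weights below are arbitrary, hand-picked values.

```python
import math

def forward(x, layers):
    """Propagate input x through a list of (weights, biases) layers.

    Each output node is the sigmoid of a weighted sum of the
    previous layer's nodes plus a bias."""
    for weights, biases in layers:
        x = [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, biases)
        ]
    return x

# A toy 2-input -> 2-hidden -> 1-output network with arbitrary weights.
layers = [
    ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2]),  # hidden layer: 2 nodes
    ([[1.2, -0.7]], [0.05]),                   # output layer: 1 node
]
print(forward([1.0, 0.5], layers))
```

Training means adjusting those weight and bias numbers until the output matches the desired output.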

Architecture of the network

The attention prediction produced by expoze.io is generated using a so-called generative adversarial network, or GAN. The input is the RGB values of your image or video; the output is the attention prediction. On the input side, it uses a convolutional neural network (ConvNet) that was pre-trained on over 14 million images to detect objects, so it already holds a latent representation of object identity.
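To illustrate the data flow only (not the real GAN), here is a hypothetical sketch: a generator maps an image's RGB values to a single-channel heatmap, and a discriminator scores how plausible a heatmap looks for that image. In the real network both parts are deep ConvNets with trained weights; the two functions below are deliberately trivial placeholders.

```python
import numpy as np

def generator(image):
    """Placeholder generator: maps an RGB image of shape (H, W, 3)
    to an attention heatmap of shape (H, W) with values in [0, 1].
    Here it just averages the channels and normalizes."""
    gray = image.mean(axis=2)
    heat = gray - gray.min()
    return heat / heat.max() if heat.max() > 0 else heat

def discriminator(image, heatmap):
    """Placeholder discriminator: returns a 'realness' score in [0, 1].
    During GAN training, this network is pitted against the generator."""
    return float(np.clip(heatmap.mean(), 0.0, 1.0))

rgb = np.random.rand(64, 64, 3)   # stand-in for an image's RGB values
pred = generator(rgb)             # attention heatmap, one value per pixel
score = discriminator(rgb, pred)
print(pred.shape, score)
```

The adversarial setup works because the generator is rewarded for producing heatmaps the discriminator cannot tell apart from real attention data.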

Our training data

Our network was trained on the attention data of thousands of participants, who together looked at more than 10,000 images.

Each image was seen by at least 50 participants. This resulted in a training data set with a “gold truth” attention heatmap for each image.
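A common way to build such a “gold truth” map (sketched here under assumed parameters, e.g. the Gaussian width `sigma`) is to place a Gaussian blob at every participant's fixation point and sum them:

```python
import numpy as np

def gold_truth_heatmap(fixations, height, width, sigma=5.0):
    """Aggregate many participants' fixation points (x, y) into one
    normalized heatmap by summing a Gaussian blob per fixation.
    The blob width sigma is an illustrative assumption."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for fx, fy in fixations:
        heat += np.exp(-((xs - fx) ** 2 + (ys - fy) ** 2) / (2 * sigma ** 2))
    return heat / heat.max()

# Hypothetical fixations from several participants on a 50x50 image.
fixations = [(10, 12), (11, 13), (40, 8), (25, 25), (26, 24)]
heatmap = gold_truth_heatmap(fixations, 50, 50)
print(heatmap.shape, heatmap.max())
```

Regions where many participants' fixations overlap end up hottest, which is exactly what the network is trained to reproduce.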

Training our Neural Net

Before training, the GAN has random weights, so it produces faulty heatmaps. The difference between the predicted heatmap and the “gold truth” heatmap is the error. An optimization procedure then adjusts the weights in the direction that reduces this error.

After a large number of training rounds, the network converges, producing heatmaps that are more and more similar to the “gold truth”.
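The core of a training round can be shown with a single weight: the prediction error shrinks as the weight is nudged against the error's gradient. All numbers here are purely illustrative.

```python
def train_step(w, x, target, lr=0.1):
    """One gradient-descent step on one weight. The prediction is w*x,
    the error is the squared difference from the target, and the weight
    moves opposite the error's gradient."""
    pred = w * x
    grad = 2 * (pred - target) * x   # d(error)/dw
    return w - lr * grad

w = 0.0                  # random-ish starting weight
for _ in range(50):      # 50 training rounds
    w = train_step(w, x=1.0, target=0.8)
print(round(w, 3))       # → 0.8: the weight converges to the target mapping
```

The real network repeats this same logic simultaneously over millions of weights and thousands of image/heatmap pairs.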

Validating the results

To measure the similarity between expoze.io predictions and real eye tracking, we used the MIT saliency benchmark and calculated the area-under-the-curve score (AUC-Judd). A perfect match scores 1.00.

Even eye-tracking data from an infinitely large group of participants does not perfectly match data from a different infinitely large group; the AUC score in that case is 0.92. The AUC score for expoze.io, 0.87, is only slightly lower, and higher than that of many competing solutions.
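An AUC score can be read as a ranking test: how often does the prediction assign a fixated pixel a higher saliency value than a non-fixated one? Below is a simplified sketch of that idea (the benchmark's AUC-Judd specifically thresholds at fixated locations; this pairwise-ranking form conveys the same intuition, with made-up example values).

```python
def auc(saliency, fixated):
    """Simplified AUC: fraction of (fixated, non-fixated) pixel pairs
    where the fixated pixel gets the higher predicted saliency
    (ties count as half).
    saliency: flat list of predicted values, one per pixel.
    fixated:  flat list of booleans, True where participants fixated."""
    pos = [s for s, f in zip(saliency, fixated) if f]
    neg = [s for s, f in zip(saliency, fixated) if not f]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

saliency = [0.9, 0.8, 0.3, 0.2, 0.1, 0.7]
fixated  = [True, True, False, False, False, True]
print(auc(saliency, fixated))  # → 1.0: every fixated pixel outranks every other
```

A score of 0.5 would mean the prediction is no better than chance; 0.87 means fixated locations are ranked above non-fixated ones the vast majority of the time.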

Want to learn more? We go into more detail in our white paper!

DOWNLOAD WHITEPAPER