- Published: 3 March 2025.
- Proceedings Volume 13545, ICANCT 2024.
- Conference location: Wuhan, China.
- DOI: 10.1117/12.3060400.
Computer vision / FER2013 / Proc. SPIE
A comparative study on facial expression recognition using MobileNetV2, VGG-16, ResNet and Swin Transformer
Published in Proceedings of SPIE Volume 13545.
CNN facial-expression models compared with a Swin Transformer on FER2013.
FER2013, 48x48 images, 7 emotion classes, and augmentation.
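To make the setup concrete, here is a minimal data-pipeline sketch assuming PyTorch and torchvision; the paper's exact augmentations are not listed, so the flip and rotation transforms below are illustrative placeholders.

```python
import torch
from torchvision import datasets, transforms

# FER2013 images are 48x48 grayscale; ImageNet-pretrained backbones expect
# 3-channel input at a larger resolution, so replicate the channel and
# upsample before normalizing with ImageNet statistics.
train_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),   # illustrative augmentation
    transforms.RandomRotation(10),       # illustrative augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# torchvision ships a FER2013 wrapper; it expects the Kaggle fer2013.csv
# under the given root directory.
train_set = datasets.FER2013(root="data", split="train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=2)
```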
Models are evaluated with accuracy, F1, precision, recall, loss, and confusion matrices.
The work connects benchmark-driven research with practical AI systems and user-facing interfaces.
Short paper summary.
CNN backbones and a transformer are compared under one FER setup: the paper contrasts the local-feature extraction of the CNNs with the Swin Transformer's global-feature approach.
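As a rough illustration, and assuming torchvision's pretrained model zoo, the four backbones could be loaded and re-headed for FER2013's 7 classes as below; the paper does not state the ResNet depth or Swin variant, so resnet50 and swin_t are assumptions.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # FER2013 emotion categories

def build_models() -> dict[str, nn.Module]:
    """Load ImageNet-pretrained backbones and replace their heads."""
    zoo = {
        "mobilenet_v2": models.mobilenet_v2(weights="IMAGENET1K_V1"),
        "vgg16": models.vgg16(weights="IMAGENET1K_V1"),
        "resnet50": models.resnet50(weights="IMAGENET1K_V1"),  # depth assumed
        "swin_t": models.swin_t(weights="IMAGENET1K_V1"),      # variant assumed
    }
    # Each architecture names its classification layer differently.
    zoo["mobilenet_v2"].classifier[1] = nn.Linear(1280, NUM_CLASSES)
    zoo["vgg16"].classifier[6] = nn.Linear(4096, NUM_CLASSES)
    zoo["resnet50"].fc = nn.Linear(2048, NUM_CLASSES)
    zoo["swin_t"].head = nn.Linear(768, NUM_CLASSES)
    return zoo
```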
- Pretrained MobileNetV2, VGG-16, ResNet, and Swin Transformer models.
- FER2013 benchmark with 48x48 images across 7 classes.
- 20+ training epochs with data augmentation and visual review of results.
- Accuracy, F1, precision, recall, loss, and confusion matrices.
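The metrics in the list above map directly onto scikit-learn; a minimal sketch, assuming per-model predictions and ground-truth labels have already been collected from a test loop.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score)

# Standard FER2013 label order.
CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    """Print the metrics the comparison reports for one model."""
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("macro F1:", f1_score(y_true, y_pred, average="macro"))
    # Per-class precision, recall, and F1 in one table.
    print(classification_report(y_true, y_pred, target_names=CLASSES))
    # Rows are true classes, columns are predicted classes.
    print(confusion_matrix(y_true, y_pred))
```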
A live experiment extending the site's vision work.
The portfolio also includes a browser-based Phosphene Vision Simulator built with Pulse2Percept.
Kairui (Alex)'s Phosphene Vision Simulator
This web demo lets users upload a JPG or PNG and generate AlphaAMS, ArgusII, and PRIMA simulations; a minimal pulse2percept sketch follows the list below.
- Lightweight browser demo for comparing prosthetic vision outputs.
- Upload guidance: keep files under 50 KB and avoid sensitive images.
- Shows scientific and assistive-tech interaction design.
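For reference, one of the three simulations can be sketched server-side with pulse2percept's scoreboard model; the file-size check mirrors the upload guidance above, while the filename, rho value, and preprocessing are assumptions rather than the demo's actual code.

```python
import os

import pulse2percept as p2p

IMAGE = "face.png"  # hypothetical upload path
assert os.path.getsize(IMAGE) < 50 * 1024, "demo guidance: keep files under 50 KB"

# Map the uploaded image onto Argus II's 6x10 electrode grid.
stim = p2p.stimuli.ImageStimulus(IMAGE).rgb2gray().resize((6, 10))
implant = p2p.implants.ArgusII(stim=stim)

# Scoreboard model: each active electrode renders a Gaussian phosphene.
model = p2p.models.ScoreboardModel(rho=200)  # rho (phosphene size) assumed
model.build()

percept = model.predict_percept(implant)
percept.plot()  # matplotlib view of the simulated percept

# p2p.implants.AlphaAMS() and p2p.implants.PRIMA() can be swapped in the
# same way (with the stimulus resized to each implant's grid) to produce
# the other two views the demo offers.
```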
Together, the publication and simulator show research-informed computer vision, clear interface design, and practical demo building.