
A comparative study on facial expression recognition using MobileNetV2, VGG-16, ResNet and Swin Transformer

Published in Proceedings of SPIE Volume 13545.

CNN facial-expression models compared with a Swin Transformer on FER2013.

Paper details
  • Published: 3 March 2025.
  • Proceedings Volume 13545, ICANCT 2024.
  • Conference context: Wuhan, China.
  • DOI: 10.1117/12.3060400.
Research signal: model comparison
Dataset

FER2013: 48x48 grayscale images across 7 emotion classes, with data augmentation.
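As a concrete illustration of the dataset's shape, here is a minimal sketch (not the paper's code) of parsing one row of FER2013, assuming the standard Kaggle CSV layout in which the `pixels` column holds 48*48 space-separated grayscale values:

```python
# Hedged sketch: turn one FER2013 CSV row into a normalized 48x48 grid.
# Assumes the Kaggle fer2013.csv layout: an emotion label (0-6 for the
# 7 classes) and a "pixels" string of 2304 space-separated values (0-255).

SIZE = 48  # FER2013 images are 48x48

def parse_fer2013_row(label: str, pixels: str):
    """Return (class_index, grid) with pixel values scaled to [0, 1]."""
    values = [int(v) / 255.0 for v in pixels.split()]
    if len(values) != SIZE * SIZE:
        raise ValueError("expected a 48x48 image (2304 pixel values)")
    grid = [values[r * SIZE:(r + 1) * SIZE] for r in range(SIZE)]
    return int(label), grid
```

A real pipeline would batch these grids into tensors and apply the augmentation mentioned above; this only shows the raw record structure.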

Evaluation

Accuracy, F1, precision, recall, loss, and confusion matrices.

Why it fits here

It connects disciplined model evaluation with the practical AI systems and interfaces shown elsewhere on this site.

At a glance

Short paper summary.

CNN backbones and a transformer compared under one FER setup.

  • 4 model families across CNN and transformer approaches
  • 7 FER2013 emotion categories
  • 48x48 input image size
  • 20+ training epochs with evaluation visuals
Paper overview

This paper compares CNN local-feature extractors with a Swin Transformer's global-feature approach.

Method Evaluation setup
  • Pretrained MobileNetV2, VGG-16, ResNet, and Swin Transformer models.
  • FER2013 benchmark with 48x48 images across 7 classes.
  • 20+ training epochs with augmentation and visual review.
  • Accuracy, F1, precision, recall, loss, and confusion matrices.
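The listed metrics all derive from the confusion matrix. A minimal sketch (not the paper's code) of macro-averaging them over the 7 classes:

```python
# Hedged sketch: accuracy plus macro-averaged precision, recall, and F1
# computed from a confusion matrix, as in the evaluation setup above.

def metrics_from_confusion(cm):
    """cm[i][j] = count of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = cm[i][i]
        predicted_i = sum(cm[r][i] for r in range(n))  # column sum
        actual_i = sum(cm[i])                          # row sum
        p = tp / predicted_i if predicted_i else 0.0
        r = tp / actual_i if actual_i else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if (p + r) else 0.0)
    macro = lambda xs: sum(xs) / len(xs)
    return accuracy, macro(precisions), macro(recalls), macro(f1s)
```

Macro-averaging weights each emotion class equally, which matters on FER2013 because its classes are imbalanced.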
Related demo

A live experiment extending the site's vision work.

The portfolio also includes a browser-based Phosphene Vision Simulator built with Pulse2Percept.

Phosphene demo

Kairui (Alex)'s Phosphene Vision Simulator

This web demo lets users upload a JPG or PNG and generate AlphaAMS, ArgusII, and PRIMA simulations.

  • Lightweight browser demo for comparing prosthetic vision outputs.
  • Upload guidance: keep files under 50 KB and avoid sensitive images.
  • Shows scientific and assistive-tech interaction design.
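The upload guidance can be sketched as a small validation helper. This is a hypothetical illustration, not the demo's actual code; the function name and error messages are assumptions:

```python
# Hedged sketch (hypothetical helper, not the simulator's real code):
# enforces the demo's guidance of JPG or PNG uploads under 50 KB.

MAX_BYTES = 50 * 1024  # 50 KB upload limit

# File signatures ("magic numbers") for the two accepted formats.
SIGNATURES = {b"\xff\xd8\xff": "jpg", b"\x89PNG\r\n\x1a\n": "png"}

def validate_upload(data: bytes) -> str:
    """Return the detected format, or raise ValueError if rejected."""
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 50 KB upload limit")
    for magic, fmt in SIGNATURES.items():
        if data.startswith(magic):
            return fmt
    raise ValueError("only JPG and PNG files are accepted")
```

Checking signature bytes rather than file extensions keeps a browser demo robust against misnamed files.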
Portfolio context

The publication and simulator show research-informed computer vision, interface clarity, and practical demo building.