- Published: 3 March 2025.
- Proceedings Volume 13545, ICANCT 2024.
- Conference location: Wuhan, China.
- DOI: 10.1117/12.3060400.
Computer vision / FER2013 / Proc. SPIE
A comparative study on facial expression recognition using MobileNetV2, VGG-16, ResNet and Swin Transformer
Published in Proceedings of SPIE Volume 13545.
CNN facial-expression models compared with a Swin Transformer on FER2013.
FER2013, 48x48 images, 7 emotion classes, and augmentation.
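To make the setup concrete, here is a minimal data-pipeline sketch assuming PyTorch and torchvision; the paper's exact augmentations are not listed, so the flip and rotation transforms below are illustrative placeholders.

```python
import torch
from torchvision import datasets, transforms

# FER2013 images are 48x48 grayscale; ImageNet-pretrained backbones expect
# 3-channel input at a larger resolution, so replicate the channel and
# upsample before normalizing with ImageNet statistics.
train_tf = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),   # illustrative augmentation
    transforms.RandomRotation(10),       # illustrative augmentation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# torchvision ships a FER2013 wrapper; it expects the Kaggle fer2013.csv
# under the given root directory.
train_set = datasets.FER2013(root="data", split="train", transform=train_tf)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=64, shuffle=True, num_workers=2)
```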
Models are evaluated with accuracy, F1, precision, recall, loss, and confusion matrices.
The work connects benchmark-driven research with practical AI systems and user-facing interfaces.
Short paper summary.
CNN backbones and a transformer are compared under one FER setup: the paper contrasts the local-feature extraction of the CNNs with the Swin Transformer's global-feature approach.
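As a rough illustration, and assuming torchvision's pretrained model zoo, the four backbones could be loaded and re-headed for FER2013's 7 classes as below; the paper does not state the ResNet depth or Swin variant, so resnet50 and swin_t are assumptions.

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7  # FER2013 emotion categories

def build_models() -> dict[str, nn.Module]:
    """Load ImageNet-pretrained backbones and replace their heads."""
    zoo = {
        "mobilenet_v2": models.mobilenet_v2(weights="IMAGENET1K_V1"),
        "vgg16": models.vgg16(weights="IMAGENET1K_V1"),
        "resnet50": models.resnet50(weights="IMAGENET1K_V1"),  # depth assumed
        "swin_t": models.swin_t(weights="IMAGENET1K_V1"),      # variant assumed
    }
    # Each architecture names its classification layer differently.
    zoo["mobilenet_v2"].classifier[1] = nn.Linear(1280, NUM_CLASSES)
    zoo["vgg16"].classifier[6] = nn.Linear(4096, NUM_CLASSES)
    zoo["resnet50"].fc = nn.Linear(2048, NUM_CLASSES)
    zoo["swin_t"].head = nn.Linear(768, NUM_CLASSES)
    return zoo
```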
- Pretrained MobileNetV2, VGG-16, ResNet, and Swin Transformer models.
- FER2013 benchmark with 48x48 images across 7 classes.
- 20+ training epochs with data augmentation and visual review of results.
- Accuracy, F1, precision, recall, loss, and confusion matrices.
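The metrics in the list above map directly onto scikit-learn; a minimal sketch, assuming per-model predictions and ground-truth labels have already been collected from a test loop.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score)

# Standard FER2013 label order.
CLASSES = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> None:
    """Print the metrics the comparison reports for one model."""
    print("accuracy:", accuracy_score(y_true, y_pred))
    print("macro F1:", f1_score(y_true, y_pred, average="macro"))
    # Per-class precision, recall, and F1 in one table.
    print(classification_report(y_true, y_pred, target_names=CLASSES))
    # Rows are true classes, columns are predicted classes.
    print(confusion_matrix(y_true, y_pred))
```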
A live experiment extending the site's vision work.
The portfolio also includes a browser-based Phosphene Vision Simulator built with Pulse2Percept.
Kairui (Alex)'s Phosphene Vision Simulator
This web demo lets users upload a JPG or PNG and generate AlphaAMS, ArgusII, and PRIMA simulations; a minimal pulse2percept sketch follows the list below.
- Lightweight browser demo for comparing prosthetic vision outputs.
- Upload guidance: keep files under 50 KB and avoid sensitive images.
- Shows scientific and assistive-tech interaction design.
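For reference, one of the three simulations can be sketched server-side with pulse2percept's scoreboard model; the file-size check mirrors the upload guidance above, while the filename, rho value, and preprocessing are assumptions rather than the demo's actual code.

```python
import os

import pulse2percept as p2p

IMAGE = "face.png"  # hypothetical upload path
assert os.path.getsize(IMAGE) < 50 * 1024, "demo guidance: keep files under 50 KB"

# Map the uploaded image onto Argus II's 6x10 electrode grid.
stim = p2p.stimuli.ImageStimulus(IMAGE).rgb2gray().resize((6, 10))
implant = p2p.implants.ArgusII(stim=stim)

# Scoreboard model: each active electrode renders a Gaussian phosphene.
model = p2p.models.ScoreboardModel(rho=200)  # rho (phosphene size) assumed
model.build()

percept = model.predict_percept(implant)
percept.plot()  # matplotlib view of the simulated percept

# p2p.implants.AlphaAMS() and p2p.implants.PRIMA() can be swapped in the
# same way (with the stimulus resized to each implant's grid) to produce
# the other two views the demo offers.
```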
Together, the publication and simulator show research-informed computer vision, clear interface design, and practical demo building.