Electrical and Electronics Engineering Institute

University of the Philippines - Diliman

Ubiquitous Computing Laboratory

The core mission of UCL is to build technologies that make computing devices pervasive and useful to society. Research projects in UCL include AI for edge devices (tinyML), multimodal learning (vision, speech, text, point cloud), synthetic people, human-computer interfaces, autonomous robots, and IoT protocols.

Some of our recently completed projects and publications (2021 and 2022):

Depth Pruning with Auxiliary Networks for tinyML

(to appear in IEEE ICASSP 2022)
by Josen De Leon (UP and Samsung R&D PH) and Rowel Atienza

We developed a technique that significantly reduces the footprint and latency of deep learning models by removing the model head and replacing it with an efficient auxiliary network. On the Visual Wake Words dataset (VWW, a person-detection dataset), we found that we can reduce the number of parameters by up to 93% while reducing accuracy by only 0.65%. When deployed on an ARM Cortex-M0, the MobileNet-V1 footprint drops from 336KB to 71KB and latency from 904ms to 551ms while, counter-intuitively, accuracy increases by 1%.
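To see why truncating the backbone and attaching a small head can shrink a model so dramatically, here is a minimal numpy sketch. This is illustrative only, not the published implementation: the channel widths, kernel sizes, and layer count are hypothetical stand-ins for a MobileNet-like backbone. The point it demonstrates is that most parameters live in the late, wide layers, so depth pruning plus a small auxiliary classifier removes the bulk of them.

```python
import numpy as np

def conv_params(c_in, c_out, k=3):
    # Parameter count of a k x k convolution with bias.
    return c_in * c_out * k * k + c_out

# Hypothetical MobileNet-like backbone: channel width per stage.
full_widths = [3, 32, 64, 128, 256, 512, 1024]
full_params = sum(conv_params(a, b)
                  for a, b in zip(full_widths, full_widths[1:]))

# Depth pruning: keep only the early stages of the backbone ...
kept_widths = full_widths[:4]
backbone_params = sum(conv_params(a, b)
                      for a, b in zip(kept_widths, kept_widths[1:]))

# ... and attach a small auxiliary head (global pooling + linear classifier).
num_classes = 2  # VWW: person / no person
aux_params = kept_widths[-1] * num_classes + num_classes

pruned_params = backbone_params + aux_params
reduction = 1 - pruned_params / full_params
print(f"full: {full_params}, pruned: {pruned_params}, "
      f"reduction: {100 * reduction:.1f}%")
```

Even in this toy count, dropping the last three wide stages removes well over 90% of the parameters, mirroring the trade-off reported above.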


Improving Model Generalization by Agreement of Learned Representations from Data Augmentation 

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2022.
by Rowel Atienza
GitHub: https://github.com/roatienza/agmax

If data augmentation improves model performance, why not apply two different augmentations to a given input and require that the model agree on the outputs? This simple idea further improves model performance on recognition, segmentation, and detection. We call our method Agreement Maximization, or AgMax.
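The idea can be sketched in a few lines of numpy. This is a simplified illustration, not the published AgMax objective: here a symmetric KL divergence between the model's predictions on the two augmented views stands in as the agreement term, and the logits and labels are made up.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-9):
    # KL divergence per sample between two categorical distributions.
    return np.sum(p * np.log((p + eps) / (q + eps)), axis=-1)

rng = np.random.default_rng(0)
# Logits from the same model on two different augmentations of each input.
logits_a = rng.normal(size=(4, 10))                    # batch of 4, 10 classes
logits_b = logits_a + 0.1 * rng.normal(size=(4, 10))   # mild disagreement

p, q = softmax(logits_a), softmax(logits_b)
labels = np.array([1, 2, 3, 4])

# Standard cross-entropy on each view ...
ce = -0.5 * (np.log(p[np.arange(4), labels] + 1e-9)
             + np.log(q[np.arange(4), labels] + 1e-9)).mean()

# ... plus an agreement penalty that vanishes only when both views agree.
agreement = 0.5 * (kl(p, q) + kl(q, p)).mean()
loss = ce + agreement
```

Minimizing the agreement term pushes the model toward outputs that are invariant to the augmentation, which is the intuition behind AgMax.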


Vision Transformer for Fast and Efficient Scene Text Recognition

International Conference on Document Analysis and Recognition (ICDAR) 2021. Link: ArXiv and Springer Nature
by Rowel Atienza
GitHub: https://github.com/roatienza/deep-text-recognition-benchmark

Scene text recognition (STR) enables computers to read text in natural human environments such as billboards, road signs, product labels, and paper bills. STR is more challenging than the more structured problem of document OCR. State-of-the-art (SOTA) STR models are complex (composed of three stages) and slow. Using a vision transformer, we built a fast, efficient, and robust single-stage STR model with performance comparable to SOTA. We call our model ViTSTR.
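A single-stage model of this kind emits one logit row per output position and can be decoded greedily, with no separate rectification or language-model stage. A minimal sketch of such a decode step, assuming a hypothetical character set with [s] as the end-of-sequence token (illustrative only, not the ViTSTR code):

```python
import numpy as np

# Hypothetical character set; index 0 is the end-of-sequence token [s].
charset = ["[s]"] + list("abcdefghijklmnopqrstuvwxyz0123456789")

def greedy_decode(logits):
    """Decode parallel per-position predictions: take the argmax at
    each output position and stop at the first [s] token."""
    ids = logits.argmax(axis=-1)
    text = []
    for i in ids:
        if i == 0:  # [s] marks the end of the string
            break
        text.append(charset[i])
    return "".join(text)

# Fake logits for a 5-position output spelling "cat" then [s].
T, C = 5, len(charset)
logits = np.full((T, C), -10.0)
for pos, ch in enumerate("cat"):
    logits[pos, charset.index(ch)] = 10.0
logits[3, 0] = 10.0  # end of sequence
print(greedy_decode(logits))  # → cat
```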


Scene Text Recognition using Permuted Language Modelling

by Darwin Bautista and Rowel Atienza

ViTSTR is fast and efficient but struggles when parts of the text image are corrupted. As humans, we can read scene text even when it is partially corrupted because we use language to predict the characters. Using this idea, we built a language model into our STR model and trained it using permuted language modelling. We achieved state-of-the-art results that outperform the best STR models such as TRBA, ABINet, and ViTSTR.

Figure: Network architecture of STR with permuted language modelling
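The permuted training objective can be illustrated by the attention masks it implies. A minimal numpy sketch, assuming a hypothetical `permutation_mask` helper (not the actual training code): each sampled permutation yields a mask in which a position may condition only on positions earlier in that order, so the left-to-right permutation recovers the usual causal mask, while other permutations teach the model to read in arbitrary orders — including from a corrupted character's surviving neighbors.

```python
import numpy as np

def permutation_mask(perm):
    """Attention mask for permuted language modeling: when predicting
    the token at position perm[k], the model may condition only on the
    tokens at positions perm[:k] (earlier in the sampled order)."""
    T = len(perm)
    mask = np.zeros((T, T), dtype=bool)  # mask[i, j]: i may attend to j
    for k, i in enumerate(perm):
        mask[i, perm[:k]] = True
    return mask

# The left-to-right order recovers the standard causal mask ...
causal = permutation_mask([0, 1, 2, 3])

# ... while a random order trains the model to predict each character
# from whichever characters come earlier in that sampled order.
rng = np.random.default_rng(0)
shuffled = permutation_mask(rng.permutation(4))
```

Training over many sampled permutations exposes the model to every conditioning pattern, which is what lets it fill in missing characters at inference time.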


Gaze on Objects

IEEE/CVF Computer Vision and Pattern Recognition (CVPR) Workshops 2021.
by Henri Tomas, Marcus Reyes, Raimarc Dionido, Mark Ty, Jonric Mirando, Joel Casimiro, Rowel Atienza and Richard Guinto (Samsung R&D PH)
GitHub: https://github.com/upeee/GOO-GAZE2021

Gaze signifies a person’s focus of attention. However, there is a lack of datasets that can be used to explicitly train models on gaze and the object of attention. In Gaze on Objects (GOO), we built a dataset of both synthetic and real people looking at objects. Our dataset is publicly available on GitHub.