Hamza Tahboub
Research Assistant, Northeastern University
tahboub.h [at] northeastern [dot] edu
Hello! My name is Hamza, and I am a research assistant in Professor Huaizu Jiang’s Visual Intelligence Lab. I graduated from Northeastern University with a major in computer science and mathematics.
My research centers on multimodal learning, with a specific emphasis on social interaction and egocentric video to holistically understand human behavior. I am interested in social intelligence from both an understanding and a generation point of view. My work on the former, published in TMLR, investigated why pre-trained VLMs struggle to model multiple social perception tasks simultaneously; we uncovered a phenomenon we termed “social degradation” and overcame it to achieve positive transfer across diverse social tasks.
Today, I am working on the generation side: I aim to improve the social coherence of video generation models so they produce more realistic human-centric videos.
Research Experience
- Human-centric video generation with Joseph Gu and Huaizu Jiang
- January 2026 – Present
- Addressing the weakness of video generation models in generating socially coherent scenes.
- Building diffusion and flow models to address these gaps by explicitly modeling each agent in the video.
- Representing intent, belief, and knowledge in a latent space allows the model to perform higher-order reasoning about goals and actions.
- Addressing social degradation in pre-trained vision-language models with Professors Weiyan Shi, Gang Hua, and Huaizu Jiang
- February 2025 – December 2025
- Published in TMLR. [arxiv] [openreview]
- Led a project to unify different visual social interaction understanding tasks under one model, leveraging the synergies between diverse tasks to achieve positive transfer and competitive performance overall.
- Revealed that popular VLMs of the same scale suffer a degradation that impairs their social understanding and leads to negative transfer, which I traced to reduced social decodability of the visual representations after VLM training.
- Currently extending this work to handle complex compositional social tasks.
- Egocentric Werewolf strategy classification and utterance prediction with Harrison Kim and Professors Weiyan Shi and Huaizu Jiang
- January 2024 – January 2025
- Led a project to understand subtle social cues from an egocentric perspective.
- Significantly improved performance in strategy prediction over prior methods.
- Worked on producing a strategic game-playing agent, which eventually motivated a pivot to more general social interaction understanding (project #1 above).
- Nuclei segmentation modeling with Evan Liu and Harrison Kim @ Genentech gRED
- October 2023 – December 2023
- Contributed to novel approaches and implemented state-of-the-art methods for nuclei semantic segmentation as part of the Genentech Computer Vision R&D team.
- Medical QA fine-tuning with Dr. Michael Wu, Chloe Kim, and Ayush Zenith @ Genentech gRED
- July 2023 – December 2023
- Trained ensembles of language models and NER/RE models on large-scale in-house medical datasets.
- Designed and conducted extensive experiments to evaluate the performance of different models and techniques.
- Long-form audio-visual understanding with Huaizu Jiang
- September 2023 – December 2023
- Conducted extensive literature review to scope future research directions.
- Re-implemented from scratch papers like "Towards Long Form Audio-visual Video Understanding" in PyTorch.
- Visual common sense understanding with Alberto Mario Ceballos Arroyo and Professors Byron Wallace and Huaizu Jiang
- August 2022 – August 2023
- Focused first on visual question answering commonsense datasets and explored various approaches to solving the tasks.
- Pivoted to early concepts in reasoning like chain-of-thought (CoT) prompting, discovering that CoT harmed the performance of smaller language models, contrary to popular belief at the time. We documented our findings in a preprint.