Hamza Tahboub

Northeastern University. Research Assistant.
tahboub.h [at] northeastern [dot] edu

Hello! My name is Hamza, and I am a research assistant in Professor Huaizu Jiang’s Visual Intelligence Lab. I graduated from Northeastern University with a major in computer science and mathematics.

My research centers on multimodal learning, with a particular emphasis on social interaction and egocentric video for holistically understanding human behavior. I am interested in social intelligence from both an understanding and a generation point of view. My work on the former, published in TMLR, investigated why pre-trained VLMs struggle to model multiple social perception tasks simultaneously; we uncovered a phenomenon we termed “social degradation” and overcame it to achieve positive transfer across diverse social tasks.

Today, I am working on the generation side: I aim to improve the social coherence of video generation models so that they produce more realistic human-centric videos.

Research Experience

  1. Human-centric video generation with Joseph Gu and Huaizu Jiang
    • January 2026 – Present
    • Addressing the weakness of video generation models in generating socially coherent scenes.
    • Building diffusion and flow models to address these gaps by explicitly modeling each agent in the video.
    • By representing intent, belief, and knowledge in a latent space, we allow the model to perform higher-order reasoning about goals and actions.
  2. Addressing social degradation in pre-trained vision-language models with Professors Weiyan Shi, Gang Hua, and Huaizu Jiang
    • February 2025 – December 2025
    • Published in TMLR. [arxiv] [openreview]
    • Led a project to unify different visual social interaction understanding tasks under one model, leveraging the synergies between diverse tasks to achieve positive transfer and competitive performance overall.
    • Revealed that popular VLMs of the same scale suffer a degradation that impairs their social understanding and leads to negative transfer, which I traced to reduced social decodability of the visual representations after VLM training.
    • Currently extending this work to handle complex compositional social tasks.
  3. Egocentric Werewolf strategy classification and utterance prediction with Harrison Kim and Professors Weiyan Shi and Huaizu Jiang
    • January 2024 – January 2025
    • Led a project to understand subtle social cues from an egocentric perspective.
    • Significantly improved performance in strategy prediction over prior methods.
    • Worked on producing a strategic game-playing agent, which eventually motivated a pivot to more general social interaction understanding (project #1 above).
  4. Modeling nuclei segmentation with Evan Liu and Harrison Kim @ Genentech gRED
    • October 2023 – December 2023
    • Contributed to novel approaches and implemented state-of-the-art methods for nuclei semantic segmentation as part of the Genentech Computer Vision R&D team.
  5. Medical QA fine-tuning with Dr. Michael Wu, Chloe Kim, and Ayush Zenith @ Genentech gRED
    • July 2023 – December 2023
    • Trained ensembles of language models and NER/RE models on large-scale in-house medical datasets.
    • Designed and conducted extensive experiments to evaluate the performance of different models and techniques.
  6. Long-form audio-visual understanding with Huaizu Jiang
  7. Visual common sense understanding with Alberto Mario Ceballos Arroyo and Professors Byron Wallace and Huaizu Jiang
    • August 2022 – August 2023
    • Focused first on commonsense visual question answering datasets and explored various approaches to solving the tasks.
    • Pivoted to early concepts in reasoning like chain-of-thought (CoT) prompting, discovering that CoT prompting harmed the performance of smaller language models, contrary to popular belief at the time. We documented our findings in a preprint.