Hamza Tahboub

Hello! My name is Hamza, and I am a research assistant in Professor Huaizu Jiang's Visual Intelligence Lab at Northeastern University. I graduated from Northeastern with a major in computer science and mathematics.

Research

My research focuses on video understanding and generation for social interaction: how models can understand, simulate, and generate people, relationships, goals, and behavior. I view social intelligence as a two-sided problem, where better models of human behavior can inform more coherent human-centric generation, and generation can reveal what our models truly understand. To that end, I study both human-centric methods and general methods for multimodal understanding and generation.

  1. Human-centric video generation with Professor Huaizu Jiang
    • January 2026 – Present
    • Addressing the weakness of video generation models in generating socially coherent scenes.
    • Building diffusion and flow models that address this weakness by explicitly modeling each agent in the video.
    • Representing intent, belief, and knowledge in a latent space so that the model can perform higher-order reasoning about goals and actions.
  2. Addressing social degradation in pre-trained vision-language models with Professors Weiyan Shi, Gang Hua, and Huaizu Jiang
    • February 2025 – December 2025
    • Published in TMLR. [arxiv] [openreview]
    • Led a project to unify different visual social interaction understanding tasks under one model, leveraging the synergies between diverse tasks to achieve positive transfer and competitive performance overall.
    • Revealed that popular VLMs of the same scale suffer a degradation that impairs their social understanding and leads to negative transfer, and traced it to reduced social decodability of the visual representations after VLM training.
    • Working on extending the work to handle complex compositional social tasks.
  3. Egocentric Werewolf strategy classification and utterance prediction with Harrison Kim and Professors Weiyan Shi and Huaizu Jiang
    • January 2024 – January 2025
    • Led a project to understand subtle social cues from an egocentric perspective.
    • Significantly improved performance in strategy prediction over prior methods.
    • Worked on producing a strategic game-playing agent, which eventually motivated a pivot to more general social interaction understanding.
  1. Modeling nuclei segmentation with Evan Liu and Harrison Kim @ Genentech gRED
    • October 2023 – December 2023
    • Contributed to novel approaches and implemented state-of-the-art methods for nuclei semantic segmentation as part of the Genentech Computer Vision R&D team.
  2. Medical QA fine-tuning with Dr. Michael Wu, Chloe Kim, and Ayush Zenith @ Genentech gRED
    • July 2023 – December 2023
    • Trained ensembles of language models and NER/RE models on large-scale in-house medical datasets.
    • Designed and conducted extensive experiments to evaluate the performance of different models and techniques.
  3. Long-form audio-visual understanding with Professor Huaizu Jiang
  4. Visual common sense understanding with Alberto Mario Ceballos Arroyo and Professors Byron Wallace and Huaizu Jiang
    • August 2022 – August 2023
    • Focused first on visual question answering commonsense datasets and explored various approaches to solving the tasks.
    • Pivoted to early concepts in reasoning like chain-of-thought prompting, discovering that prompting with intermediate reasoning harmed the performance of smaller language models, contrary to popular belief at the time. We documented our findings in a preprint.

Based on Jon Barron's website.