This session explores how redundancy in Transformer-based large language models (LLMs) affects efficiency and performance, with a particular focus on two core components: the attention and MLP layers. Attendees will learn about techniques such as Attention Drop and Joint Layer Drop, which remove redundant components to significantly improve memory efficiency and inference speed while preserving model performance. Using examples from Llama-2 and Mistral-7B, the session will delve into structured pruning methods, their trade-offs, and their applicability in real-world scenarios. The talk offers practical insights for researchers and engineers aiming to design more compact, efficient architectures for NLP tasks.
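To make the core idea concrete, the sketch below shows one way "attention drop" can be realized: the self-attention sub-layer of selected transformer blocks is bypassed at inference time, leaving only the residual path and the MLP. This is a minimal illustration on a toy pre-norm transformer; the block structure, the choice of which layers to drop, and all names and dimensions are assumptions for demonstration, not the speaker's exact implementation or the published method.

```python
# Minimal sketch of attention drop: skip the self-attention sub-layer in
# chosen transformer blocks, keeping the residual stream and the MLP.
# All names, layer choices, and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class Block(nn.Module):
    """A simplified pre-norm transformer block (attention + MLP)."""

    def __init__(self, d_model: int, n_heads: int, drop_attention: bool = False):
        super().__init__()
        self.drop_attention = drop_attention  # if True, bypass the attention sub-layer
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.drop_attention:
            h = self.ln1(x)
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out          # residual connection around attention
        # when drop_attention is True, only the residual stream flows through here
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x


# Example: drop the attention sub-layer in the last two of four blocks.
d_model, n_heads, n_layers = 64, 4, 4
drop_set = {2, 3}  # hypothetical choice; real methods pick layers via a redundancy score
blocks = nn.ModuleList(
    Block(d_model, n_heads, drop_attention=(i in drop_set)) for i in range(n_layers)
)

x = torch.randn(1, 16, d_model)  # (batch, sequence, hidden)
with torch.no_grad():
    for block in blocks:
        x = block(x)
print(x.shape)  # torch.Size([1, 16, 64])
```

Dropping the attention sub-layer removes its parameters and its key-value cache from the forward pass, which is where the memory and latency savings described in the abstract come from.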
Speaker(s)
Ayoub Benachour
5th year AI engineering student at UEMF
I am Ayoub Benachour, a 23-year-old final-year Artificial Intelligence engineering student at the Euromed University of Fes. My academic journey has been focused on NLP and computer vision, with hands-on experience in optimizing large language models and developing AI-powered solutions. I've worked on projects ranging from creating a Darija speech-to-text model and building mental health chatbots to detecting yellowfin tuna with YOLO models for sustainable fishing practices.
My passion lies in bridging the gap between cutting-edge AI research and real-world applications. I’ve explored model efficiency through projects like implementing GPT-2 from scratch and compressing transformers for faster deployment. I’ve also participated in hackathons, where I’ve tackled challenges like improving accessibility for dyslexic students and advancing mental health support systems.
I enjoy sharing my knowledge and engaging with the AI community, whether through conferences, workshops, or hands-on collaborations. My goal is to inspire and contribute to building efficient and inclusive AI solutions that leave a meaningful impact.