
AI Seminar Cycle

The AI Seminar Cycle is organized by Hi! PARIS, in collaboration with the ELLIS program on Theory, Algorithms, and Computations of Modern Learning Systems and the Paris ELLIS Unit.
This partnership aims to broaden the visibility of the seminar among the European AI research community.

No Free Lunch: “From a Simultaneous (Machine) Learning Impossibility to Heisenberg Uncertainty Principle”

Thursday, October 2 (2 PM – 3:30 PM) – Hybrid: Amphitheater Becquerel, École polytechnique (click to register) | Online

Research Session: “No labels, no training: Leveraging Language for Detecting Anomalous Events in Videos”

Wednesday, November 5 (11 AM – 12 PM) – Online (click to register)

Spyros Gidaris
Latent Representations for Better Generative Image Modeling

Wednesday, December 10 (11 AM – 12 PM) – Hybrid (click to register)

Abstract: This talk explores how latent representations shape modern generative models. While latent spaces (like those in VQ-VAE and VQ-GAN) are central to today’s generative architectures—from diffusion models to autoregressive approaches—their structure and properties are often overlooked. I will present three works that refine or leverage latent representations for better generative modeling. 

First, EQ-VAE addresses a key limitation of the autoencoders used in latent-based generative models: their latent spaces lack equivariance to simple semantic-preserving transformations such as rotation or scaling, which makes generation harder. We introduce a simple regularization method that enforces equivariance, reducing the complexity of the latent space without degrading reconstruction quality. This improves multiple state-of-the-art models (DiT, SiT, MaskGIT) and speeds up training.
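To make the notion of equivariance concrete, here is a minimal toy sketch in Python. It is not the EQ-VAE regularizer from the talk. It only measures how far an encoder is from commuting with a transformation, i.e. the gap between encode(transform(x)) and transform(encode(x)). The two encoders and the function names are hypothetical, chosen purely for illustration.

```python
import numpy as np

def avg_pool_encoder(img):
    """Toy 'encoder': 2x2 average pooling (height and width must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def shifted_encoder(img):
    """Toy encoder that breaks rotation equivariance:
    roll the image one pixel horizontally, then pool."""
    return avg_pool_encoder(np.roll(img, 1, axis=1))

def equivariance_gap(encoder, img, transform):
    """Mean squared difference between encode(transform(x)) and transform(encode(x))."""
    diff = encoder(transform(img)) - transform(encoder(img))
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))  # random stand-in for an image

# np.rot90 rotates by 90 degrees; 2x2 pooling commutes with it on even square inputs.
gap_pool = equivariance_gap(avg_pool_encoder, image, np.rot90)
gap_shift = equivariance_gap(shifted_encoder, image, np.rot90)
print(f"pooling encoder gap: {gap_pool:.2e}")
print(f"shifted encoder gap: {gap_shift:.2e}")
```

The pooling encoder has a gap of essentially zero, while the shifted encoder does not. A regularizer in the spirit described above would penalize a quantity of this kind during autoencoder training.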

Next, ReDi integrates pretrained semantic features into latent diffusion models. Instead of just generating low-level image latents, we jointly model them with high-level semantic features (e.g., from DINOv2). This unified approach boosts image quality and training efficiency while enabling “Representation Guidance”, a simple way to steer generation using learned semantics.

Finally, DINO-Foresight tackles video prediction. We predict future frames in the semantic feature space of pretrained vision foundation models (e.g., DINOv2), avoiding pixel-level inefficiencies. This makes forecasting simpler, faster, and more robust, enabling flexible adaptation to downstream tasks.

Together, these works highlight how better latent representations can simplify, accelerate, and improve generative modeling.

Research Session on Vision

Wednesday, February 4 (11 AM – 12 PM) – Hybrid (click to register)

Research Session on “Small talk is harder than it sounds”

Wednesday, March 4 (11 AM – 12 PM) – Online (click to register)

Abstract: Making a machine able to converse with a user in a fluid and relevant manner is challenging. In this talk, I will present several projects conducted at Kyutai on voice-based interactions. They rely on carefully designed and trained multistream causal architectures that can accommodate different use cases: spoken dialogue, visual-speech modeling, simultaneous translation, speech transcription and synthesis, and dialogue assistance for ALS patients. Live demos may be offered.

A spectral framework for closed-form relative density estimation

Wednesday, June 3 (11 AM – 12 PM) – Online (click to register)

Abstract: Estimating relative densities and information-theoretic divergences from samples is a central problem in statistics and machine learning, but standard variational approaches to Kullback-Leibler (KL) divergence estimation often require nonlinear optimization and may suffer from numerical instability because of exponential terms. This talk presents a closed-form spectral framework for relative log-density estimation in linearly parameterized probabilistic models, including unnormalized and conditional models. The key idea is to express the KL divergence as an integral of weighted chi-squared divergences. This converts divergence estimation into a family of least-squares problems and yields explicit spectral formulas depending only on first- and second-order feature moments. The framework extends naturally to kernel methods and learned representations, including neural networks, with convergence guarantees and efficient learning algorithms in both settings.
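As a small numerical illustration of the key idea, the sketch below checks one standard identity of this type on discrete distributions: KL(q || p) equals the integral over t in [0, 1] of (1 - t) times the sum over x of (p(x) - q(x))^2 / m_t(x), where m_t = (1 - t) p + t q. The exact weighting and parameterization used in the talk may differ. The function names are hypothetical, and both distributions are assumed to have full support.

```python
import numpy as np

def kl(a, b):
    """KL(a || b) for discrete distributions with full support."""
    return float(np.sum(a * np.log(a / b)))

def kl_via_chi2_integral(p, q, n_grid=20001):
    """Approximate KL(q || p) as an integral of weighted chi-squared-type terms.

    Integrand at t: (1 - t) * sum_x (p - q)^2 / ((1 - t) * p + t * q).
    """
    t = np.linspace(0.0, 1.0, n_grid)
    # Mixture m_t for every grid point, shape (n_grid, n_outcomes).
    mix = (1.0 - t)[:, None] * p[None, :] + t[:, None] * q[None, :]
    chi2_t = np.sum((p - q)[None, :] ** 2 / mix, axis=1)
    integrand = (1.0 - t) * chi2_t
    # Trapezoidal rule on the uniform grid.
    dt = t[1] - t[0]
    return float(dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1])))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.2, 0.6])
print("direct KL(q || p):      ", kl(q, p))
print("chi-squared integral:   ", kl_via_chi2_integral(p, q))
```

Each integrand is quadratic in p - q, which is what allows the divergence to be handled with least-squares tools rather than the nonlinear optimization of standard variational estimators.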

Scientific Committee