Ozgur Kara

I am a Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), where I am advised by Professor James M. Rehg. My research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. I design computationally efficient models for long-form video editing, develop frameworks that give users precise control over story and appearance, and build safe systems to protect visual media from unauthorized manipulation.

I completed a research internship at Adobe with Tobias Hinz in Summer 2024 (multi-shot video generation), and I am currently a research intern at Google, working with Du Tran on video generation.

I am open to opportunities for collaboration and am always interested in discussing new research ideas. Please feel free to contact me via email.

I am always looking for self-motivated students who want to focus on Generative AI-related projects. Feel free to reach out to me if you are interested and located at UIUC.

📃 Download my CV.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter  /  Some Travels


Health Care Engineering Systems Center

1206 W Clark St. UIUC

Urbana, IL, USA, 61801

News

  • Sep 2025: 🏆 I was recognized as an Outstanding Reviewer at ICCV 2025 (top 4%).
  • Sep 2025: 📄 Thrilled to share that our paper, DiffEye, on generating continuous eye-tracking data, has been accepted to NeurIPS 2025!
  • Aug 2025: 🗓️ The 7th edition of our workshop, CVEU, was held at SIGGRAPH 2025, where I served as a co-organizer.
  • Jun 2025: 🏆 I was recognized as an Outstanding Reviewer at CVPR 2025.
  • Jun 2025: 📄 Our paper, ShotAdapter, was accepted to CVPR 2025.
  • May 2025: 👨‍💻 I started my summer internship at Google (BAIR).
  • Dec 2024: 🎉 The 6th edition of our workshop, CVEU, was accepted for CVPR 2025, where I serve as the primary organizer.
  • Sep 2024: 👨‍🎓 I transferred to the University of Illinois Urbana-Champaign to continue my PhD!
  • Aug 2024: 🏆 I was recognized as an Outstanding Reviewer at ECCV 2024.
  • Jul 2024: 📄 Our paper on point tracking was accepted to the ECCV 2024 ILR Workshop.
  • Jun 2024: 📄 Our paper, RAVE, was accepted as a Highlight at CVPR 2024.
  • May 2024: 👨‍💻 I started my summer internship at Adobe (Firefly Team).
  • Mar 2024: 📄 Our paper on sign language recognition was accepted to FG 2024.
  • Jul 2023: 🏫 I attended the International Computer Vision Summer School (ICVSS).
  • Jun 2023: 🏫 I participated in the CIMPA Research School on Graph Structure.
  • Aug 2022: 🎓 I started my PhD at the Georgia Institute of Technology with Prof. James M. Rehg.
  • Jul 2022: 📄 Our paper on Fair Affective Robotics was accepted to the LEAP-HRI Workshop.
  • Jun 2022: 📄 Our paper, ISNAS-DIP, was accepted to CVPR 2022.
  • May 2022: 👨‍💻 I started my summer research at EPFL in the VILAB.
  • Aug 2021: ✅ I successfully completed the Google Summer of Code program.

Education

PhD, Computer Science 2024 - Present
University of Illinois Urbana-Champaign
Advisor: Prof. James M. Rehg
  • Overall GPA: 4.00/4.00
PhD (transferred) and MSc, Computer Science 2022 - 2024
Georgia Institute of Technology
Advisor: Prof. James M. Rehg
  • Overall GPA: 4.00/4.00
  • ECE Departmental Fellowship (2022-2024)
  • Otto F. and Jenny H. Krauss Fellowship (2022-2023)
BSc, Electrical-Electronics Engineering 2018 - 2022
Bogazici University
Advisors: Prof. Lale Akarun, Prof. Murat Saraclar
High School, Math and Science 2013 - 2018
Kadikoy Anadolu High School
  • Republic Honour Award (Awarded to one student in a graduating class of 340).

Research Experience

UIUC / Georgia Tech, Rehg Lab 2023 Spring - Present
Supervised by Prof. James M. Rehg
  • Diffusion-based model for human gaze trajectory generation (NeurIPS 2025).
  • Optimization-free scalable image immunization against diffusion-based editing (In Submission 2025).
  • Multi-shot video generation (CVPR 2025).
  • Zero-shot text guided video editing with diffusion models (CVPR 2024 Highlight).
  • Contributed to the preparation of a survey on Understanding Social AI (ArXiv).
  • Finding learnable directions in latent space of diffusion models for mitigating bias (poster at ICVSS 2023).
  • Point tracking with novel regularization loss in collaboration with Toyota Research Institute (ECCV Workshop 2024).
EPFL, VILAB 2022 Summer
Supervised by Asst. Prof. Amir Zamir
  • Accepted to Summer@EPFL (2% admission rate).
  • Interpretability and explainability of Vision Transformer (ViT).
Bogazici University, Perceptual Intelligence Lab 2021 Fall - 2022 Spring
Supervised by Prof. Lale Akarun
  • Transfer learning for under-resourced sign language recognition (IEEE FG 2024).
ETH Zurich, Computer Vision Lab 2021 - 2022
Supervised by Assoc. Prof. Ender Konukoglu
  • Training-free neural architecture search for image restoration (CVPR 2022).
University of Tübingen, Explainable ML Group 2021 Spring
Supervised by Prof. Zeynep Akata and Yongqin Xian
  • Few-shot and generalized zero-shot learning using generative models.
University of Cambridge, Affective Intelligence Lab 2020 - 2021
Supervised by Prof. Hatice Gunes
Bogazici University, Nanonetworking Research Group 2019 - 2020
Supervised by Prof. Ali Emre Pusane

Industry Experience

[May 2025 – Present] Research Intern
Google, Pixel Biometrics AI Research (BAIR)
  • Working on generative video layer decomposition/composition.
  • Running large-scale distributed training on a video dataset using a video diffusion transformer model.
  • Advisor: Du Tran. Collaborators: Yujia Chen, Vincent Chu, and Prof. Ming-Hsuan Yang.
[May 2024 – December 2024] Research Intern
Adobe, Firefly Team
  • Worked on diffusion transformer-based long-term multi-shot video generation.
  • This work led to a CVPR 2025 paper and a filed patent.
  • Experienced large-scale distributed training on an internal Adobe video model.
  • Advisor: Tobias Hinz. Collaborators: Krishna Kumar Singh, Prof. Feng Liu, and Duygu Ceylan.

Publications

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

O. Kara*, H. Nisar*, J. M. Rehg (* denotes equal contribution)
The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
We propose DiffEye, a diffusion-based generative model for creating realistic, raw eye-tracking trajectories conditioned on natural images, which outperforms existing methods on scanpath generation tasks.
Project Webpage / Paper
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing

T. C. Ozden*, O. Kara*, O. Akcin, K. Zaman, S. Srivastava, S. P. Chinchali, J. M. Rehg (* denotes equal contribution)
In Submission, 2025
DiffVax is an optimization-free image immunization framework that effectively protects against diffusion-based editing, generalizes to unseen content, is robust against counter-attacks, and shows promise in safeguarding video content.
Project Webpage / Paper
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

O. Kara, K. K. Singh, F. Liu, D. Ceylan, J. M. Rehg, T. Hinz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
ShotAdapter enables text-to-multi-shot video generation with minimal fine-tuning, providing users control over shot number, duration, and content through shot-specific text prompts, along with a multi-shot video dataset collection pipeline.
Project Webpage / Paper
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

O. Kara, B. Kurtkaya, H. Yesiltepe, J. M. Rehg, P. Yanardag
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing, supporting videos of any length utilizing text-to-image pretrained diffusion models.
Project Webpage / Paper / Code / HuggingFace Demo / Video
Towards Social AI: A Survey on Understanding Social Interactions

S. Lee, M. Li, B. Lai, B. Jia, F. Ryan, X. Cao, O. Kara, B. Boote, B. Shi, D. Yang, J. M. Rehg
In Submission, 2025
This is the first survey to provide a comprehensive overview of machine learning studies on social understanding, encompassing both verbal and non-verbal approaches.
Paper
Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets

A. Kindiroglu*, O. Kara*, O. Ozdemir, L. Akarun (* denotes equal contribution)
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2024
This study provides a publicly available cross-dataset transfer learning benchmark from two existing public Turkish SLR datasets.
Paper / Code
Leveraging Object Priors for Point Tracking

B. Boote, N. A. Thai, W. Jia, O. Kara, S. Stojanov, J. M. Rehg, S. Lee
Instance-Level Recognition (ILR) Workshop at European Conference on Computer Vision (ECCV), 2024 (Oral)
We propose a novel objectness regularization approach that guides points to be aware of object priors by forcing them to stay inside the boundaries of object instances.
Paper / Code
Domain-Incremental Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

N. Churamani, O. Kara, H. Gunes
IEEE Transactions on Affective Computing, 2022
We propose the novel use of Continual Learning (CL), in particular Domain-Incremental Learning (Domain-IL) settings, as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Molecular Index Modulation using Convolutional Neural Networks

O. Kara, G. Yaylali, A. Pusane, T. Tugcu
Nano Communication Networks Journal, 2022
We propose a novel convolutional neural network-based architecture for a uniquely designed molecular multiple-input-single-output topology, aimed at mitigating the detrimental effects of molecular interference in nano molecular communication.
Paper / Code
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

M. Arican*, O. Kara*, G. Bredell, E. Konukoglu (* denotes equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
ISNAS-DIP is an image-specific Neural Architecture Search (NAS) strategy designed for the Deep Image Prior (DIP) framework, offering significantly reduced training requirements compared to conventional NAS methods.
Paper / Code / Video
Towards Fair Affective Robotics: Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

O. Kara, N. Churamani, H. Gunes
Workshop on Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI), 2021
We propose the novel use of Continual Learning (CL) as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Neuroweaver: a platform for designing intelligent closed-loop neuromodulation systems

P. Sarikhani, H. Hsu, O. Kara, J. Kim, H. Esmaeilzadeh, B. Mahmoudi
Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, Elsevier, 2021
Our interactive platform enables the design of neuromodulation pipelines through a visually intuitive and user-friendly interface. (Google Summer of Code 2021 project)
Paper / Code

Service & Recognition

  • Served as a mentor in the Google Summer of Code program in 2022, 2023, 2024, and 2025. 2022-2025
  • Recognized as an Outstanding Reviewer at ECCV, ranked among the top 10% of all reviewers. 2024
  • Attended the 2023 International Computer Vision Summer School (ICVSS), ranked among the top 25% of 614 applicants (approximately 154 individuals), [Project] 2023
  • Participated in the CIMPA Research School on Graph Structure and Complex Network Analysis. 2023
  • Attended the highly competitive Summer@EPFL program, with a 2% acceptance rate. 2022
  • Placed among the top 50 teams worldwide in the Google Developer's Solution Challenge, selected from over 5,000 teams, [Project] 2022
  • Placed 3rd in the Yildiz Bootcamp and was directly invited to the Yildiz Technopark Pre-Incubation Program. 2022
  • Successfully completed Google Summer of Code with the project "Graphical User Interface for OpenAI Gym," selected among 1,205 students from 6,991 applicants (17% acceptance rate), [Project] 2021
  • Placed 3rd out of 172 projects (top 1.7%) in the TUBITAK Undergraduate Research Project Competition. 2021
  • Ranked among the top 10 teams regionwide in the Google Solution Challenge with the project titled "Torch in Darkness." 2020
  • Placed 1st out of 100 projects (top 1%) in the TUBITAK Undergraduate Research Project Competition for the project "Joint Depth Estimation and Object Detection Software," [Project] 2020
  • Placed 3rd out of 15 projects (top 20%) in the IEEE METU Pixery Hackathon with the project "Mobile Application for Blind People," [Project] 2020
  • Finalist among 81 teams in the Turkish Airlines Travel Datathon. 2019
  • Ranked 180th out of 2 million (top 0.009%) in the Turkish National University Entrance Exam. 2018
  • Received the Republic Honour Award at Kadikoy Anadolu High School, awarded to one student out of 340 annually. 2018
  • Placed 3rd nationwide in the TUBITAK High School Research Project Competition for the project "Drone for Landmine Detection Using GPS," [Project] 2018
  • Placed 1st regionwide in the TUBITAK High School Research Project Competition for the project "An Autonomous Hexapod for Helping Search Teams After Earthquake," [Project] 2017
  • Accepted into the CS Bridge program, a two-week programming course led by Stanford TAs. 2016

This website is adapted from this source code.