Ozgur Kara

I am a Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), where I am advised by Professor James M. Rehg. My research is focused on advancing the capabilities of video generative AI.

My research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. I design computationally efficient models for long-form video editing, develop frameworks that give users precise control over story and appearance, and build safe systems to protect visual media from unauthorized manipulation.

Before my PhD, I conducted research on continual learning to mitigate bias with Prof. Hatice Gunes (Affective Intelligence and Robotics Lab, University of Cambridge), on zero-shot learning with GANs with Prof. Zeynep Akata and Dr. Yongqin Xian (University of Tubingen), and on neural architecture search & image restoration with Prof. Ender Konukoglu (Computer Vision Lab, ETH Zurich).

During my PhD, I completed a research internship at Adobe with Tobias Hinz in 2024 (multi-shot video generation), and I am currently a research intern at Google, working with Du Tran (video generation).

I am open to opportunities for collaboration and am always interested in discussing new research ideas. Please feel free to contact me via email.

I am always looking for self-motivated students who want to work on Generative AI-related projects. Feel free to reach out if you are interested and based at UIUC.

πŸ“ƒ Download my CV.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter  /  Some Travels

profile photo

Health Care Engineering Systems Center

1206 W Clark St. UIUC

Urbana, IL, USA, 61801

Education

  • PhD, Computer Science - University of Illinois Urbana-Champaign (2024 - Present) (GPA: 4.00/4.00)
  • PhD (transferred) and MSc, Computer Science - Georgia Institute of Technology (2022 - 2024) (GPA: 4.00/4.00)
  • BSc, Electrical-Electronics Engineering - Bogazici University (2018 - 2022) (GPA: 3.92/4.00)
  • High School, Math and Science - Kadikoy Anadolu High School (2013 - 2018)

News

  • Sep 2025: πŸ“„ Thrilled to share that our paper, DiffEye, on generating continuous eye-tracking data has been accepted to NeurIPS 2025!
  • Aug 2025: πŸ—“οΈ The 7th edition of our workshop, CVEU, was conducted at SIGGRAPH 2025, where I served as a co-organizer.
  • Jun 2025: πŸ† I was recognized as an Outstanding Reviewer at CVPR 2025.
  • Jun 2025: πŸ“„ Our paper, ShotAdapter, was accepted to CVPR 2025.
  • May 2025: πŸ‘¨β€πŸ’»β€ I started my summer internship at Google (BAIR).
  • Dec 2024: πŸŽ‰ The 6th edition of our workshop, CVEU, has been accepted for CVPR 2025, where I serve as the primary organizer.
  • Sep 2024: πŸ‘¨β€πŸŽ“β€ I transferred to the University of Illinois Urbana-Champaign to continue my PhD!
  • Aug 2024: πŸ† I was recognized as an Outstanding Reviewer at ECCV 2024.
  • Jul 2024: πŸ“„ Our paper on Point Tracking was accepted to the ECCV 2024 ILR Workshop.
  • Jun 2024: πŸ“„ Our paper, RAVE, was accepted as a Highlight at CVPR 2024.
  • May 2024: πŸ‘¨β€πŸ’»β€ I started my summer internship at Adobe (Firefly Team).
  • Mar 2024: πŸ“„ Our paper on Sign Language Recognition was accepted to FG 2024.
  • Jul 2023: 🏫 I attended the International Computer Vision Summer School (ICVSS).
  • Jun 2023: 🏫 I participated in the CIMPA Research School on Graph Structure.
  • Aug 2022: πŸŽ“ I started my PhD at Georgia Institute of Technology with Prof. James M. Rehg.
  • Jun 2022: πŸ“„ Our paper, ISNAS-DIP, was accepted to CVPR 2022.
  • May 2022: πŸ‘¨β€πŸ’»β€ I started my summer research at EPFL in the VILAB.
  • Aug 2021: βœ… I successfully completed the Google Summer of Code program.
  • Jul 2021: πŸ“„ Our paper on Fair Affective Robotics was accepted to the LEAP-HRI Workshop.

Industry Experience

  • [May 2025 – Present] Research Intern
    Google, Pixel Biometrics AI Research (BAIR)
    • Working on generative video layer decomposition/composition.
    • Running large-scale distributed training on a video dataset using a video diffusion transformer model.
    • Supervisors: Du Tran, Yujia Chen, Vincent Chu, & Prof. Ming-Hsuan Yang.

  • [May 2024 – December 2024] Research Intern
    Adobe, Firefly Team
    • Worked on diffusion transformer-based long-term multi-shot video generation.
    • This work resulted in a CVPR 2025 paper and a filed patent.
    • Gained experience with large-scale distributed training on an internal Adobe video model.
    • Supervisors: Tobias Hinz, Krishna Kumar Singh, Prof. Feng Liu, & Duygu Ceylan.

Publications

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

O. Kara*, H. Nisar*, J. M. Rehg (* denotes equal contribution)
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
We propose DiffEye, a diffusion-based generative model for creating realistic, raw eye-tracking trajectories conditioned on natural images, which outperforms existing methods on scanpath generation tasks.
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing

T. C. Ozden*, O. Kara*, O. Akcin, K. Zaman, S. Srivastava, S. P. Chinchali, J. M. Rehg (* denotes equal contribution)
In Submission, 2025
DiffVax is an optimization-free image immunization framework that effectively protects against diffusion-based editing, generalizes to unseen content, is robust against counter-attacks, and shows promise in safeguarding video content.
Project Webpage / Paper
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

O. Kara, K. K. Singh, F. Liu, D. Ceylan, J. M. Rehg, T. Hinz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
ShotAdapter enables text-to-multi-shot video generation with minimal fine-tuning, providing users control over shot number, duration, and content through shot-specific text prompts, along with a multi-shot video dataset collection pipeline.
Project Webpage / Paper
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

O. Kara, B. Kurtkaya, H. Yesiltepe, J. M. Rehg, P. Yanardag
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing that supports videos of any length by leveraging pretrained text-to-image diffusion models.
Project Webpage / Paper / Code / HuggingFace Demo / Video
Towards Social AI: A Survey on Understanding Social Interactions

S. Lee, M. Li, B. Lai, B. Jia, F. Ryan, X. Cao, O. Kara, B. Boote, B. Shi, D. Yang, J. M. Rehg
In Submission, 2025
This is the first survey to provide a comprehensive overview of machine learning studies on social understanding, encompassing both verbal and non-verbal approaches.
Paper
Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets

A. Kindiroglu*, O. Kara*, O. Ozdemir, L. Akarun (* denotes equal contribution)
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2024
This study provides a publicly available cross-dataset transfer learning benchmark from two existing public Turkish SLR datasets.
Paper / Code
Leveraging Object Priors for Point Tracking

B. Boote, N. A. Thai, W. Jia, O. Kara, S. Stojanov, J. M. Rehg, S. Lee
Instance-Level Recognition (ILR) Workshop at European Conference on Computer Vision (ECCV), 2024 (Oral)
We propose a novel objectness regularization approach that guides points to be aware of object priors by forcing them to stay inside the boundaries of object instances.
Paper / Code
Domain-Incremental Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

N. Churamani, O. Kara, H. Gunes
IEEE Transactions on Affective Computing, 2022
We propose the novel use of Continual Learning (CL), in particular Domain-Incremental Learning (Domain-IL) settings, as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Molecular Index Modulation using Convolutional Neural Networks

O. Kara, G. Yaylali, A. Pusane, T. Tugcu
Nano Communication Networks Journal, 2022
We propose a novel convolutional neural network-based architecture for a uniquely designed molecular multiple-input-single-output topology, aimed at mitigating the detrimental effects of interference in nanoscale molecular communication.
Paper / Code
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

M. Arican*, O. Kara*, G. Bredell, E. Konukoglu (* denotes equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
ISNAS-DIP is an image-specific Neural Architecture Search (NAS) strategy designed for the Deep Image Prior (DIP) framework, offering significantly reduced training requirements compared to conventional NAS methods.
Paper / Code / Video
Towards Fair Affective Robotics: Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

O. Kara, N. Churamani, H. Gunes
Workshop on Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI), 2021
We propose the novel use of Continual Learning (CL) as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Neuroweaver: a platform for designing intelligent closed-loop neuromodulation systems

P. Sarikhani, H. Hsu, O. Kara, J. Kim, H. Esmaeilzadeh, B. Mahmoudi
Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, Elsevier, 2021
Our interactive platform enables the design of neuromodulation pipelines through a visually intuitive and user-friendly interface. (Google Summer of Code 2021 project)
Paper / Code

Service & Recognition

  • Served as a mentor in the Google Summer of Code program (2022, 2023, 2024, and 2025). 2025
  • Recognized as an Outstanding Reviewer at ECCV, ranked among the top 10% of all reviewers. 2024
  • Attended the 2023 International Computer Vision Summer School (ICVSS), ranked among the top 25% of 614 applicants (approximately 154 individuals), [Project] 2023
  • Participated in the CIMPA Research School on Graph Structure and Complex Network Analysis. 2023
  • Attended the highly competitive Summer@EPFL program, with a 2% acceptance rate. 2022
  • Placed among the top 50 teams worldwide in the Google Developer’s Solution Challenge, selected from over 5,000 teams, [Project] 2022
  • Placed 3rd in the Yildiz Bootcamp and was directly invited to the Yildiz Technopark Pre-Incubation Program. 2022
  • Successfully completed Google Summer of Code with the project "Graphical User Interface for OpenAI Gym," selected among 1,205 students from 6,991 applicants (17% acceptance rate), [Project] 2021
  • Placed 3rd out of 172 projects (top 1.7%) in the TUBITAK Undergraduate Research Project Competition. 2021
  • Ranked among the top 10 teams regionwide in the Google Solution Challenge with the project titled "Torch in Darkness." 2020
  • Placed 1st out of 100 projects (top 1%) in the TUBITAK Undergraduate Research Project Competition for the project "Joint Depth Estimation and Object Detection Software," [Project] 2020
  • Placed 3rd out of 15 projects (top 20%) in the IEEE METU Pixery Hackathon with the project "Mobile Application for Blind People," [Project] 2020
  • Finalist among 81 teams in the Turkish Airlines Travel Datathon. 2019
  • Ranked 180th out of 2 million (top 0.009%) in the Turkish National University Entrance Exam. 2018
  • Received the Republic Honour Award at Kadikoy Anadolu High School, awarded to one student out of 340 annually. 2018
  • Placed 3rd nationwide in the TUBITAK High School Research Project Competition for the project "Drone for Landmine Detection Using GPS," [Project] 2018
  • Placed 1st regionwide in the TUBITAK High School Research Project Competition for the project "An Autonomous Hexapod for Helping Search Teams After Earthquake," [Project] 2017
  • Accepted into the CS Bridge program, a two-week programming course led by Stanford TAs. 2016

This website is adapted from this source code.