Ozgur Kara

I am a Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), where I am advised by Professor James M. Rehg. My research is focused on advancing the capabilities of video generative AI.

My research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. I design computationally efficient models for long-form video editing, develop frameworks that give users precise control over story and appearance, and build safe systems to protect visual media from unauthorized manipulation.

Before my PhD, I conducted research on continual learning to mitigate bias with Prof. Hatice Gunes (Affective Intelligence and Robotics Lab, University of Cambridge), on zero-shot learning with GANs with Prof. Zeynep Akata and Dr. Yongqin Xian (University of Tubingen), and on neural architecture search & image restoration with Prof. Ender Konukoglu (Computer Vision Lab, ETH Zurich).

During my PhD, I completed a research internship at Adobe with Tobias Hinz in 2024 (multi-shot video generation), and I am currently a research intern at Google, working with Du Tran (video generation).

I am open to opportunities for collaboration and am always interested in discussing new research ideas. Please feel free to contact me via email.

I am always looking for self-motivated students who want to work on Generative AI-related projects. Feel free to reach out if you are interested and based at UIUC.

πŸ“ƒ Download my CV.

Email  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter  /  Some Travels

profile photo

Health Care Engineering Systems Center

1206 W Clark St. UIUC

Urbana, IL, USA, 61801

Education

  • PhD, Computer Science - University of Illinois Urbana-Champaign (2024 - Present) (GPA: 4.00/4.00)
  • PhD (transferred) and MSc, Computer Science - Georgia Institute of Technology (2022 - 2024) (GPA: 4.00/4.00)
  • BSc, Electrical-Electronics Engineering - Bogazici University (2018 - 2022) (GPA: 3.92/4.00)
  • High School, Math and Science - Kadikoy Anadolu High School (2013 - 2018)

News

  • Sep 2025: πŸ“„ Thrilled to share that our paper, DiffEye, on generating continuous eye-tracking data has been accepted to NeurIPS 2025!
  • Aug 2025: πŸ—“οΈ The 7th edition of our workshop, CVEU, was conducted at SIGGRAPH 2025, where I served as a co-organizer.
  • Jun 2025: πŸ† I was recognized as an Outstanding Reviewer at CVPR 2025.
  • Jun 2025: πŸ“„ Our paper, ShotAdapter, was accepted to CVPR 2025.
  • May 2025: πŸ‘¨β€πŸ’»β€ I started my summer internship at Google (BAIR).
  • Dec 2024: πŸŽ‰ The 6th edition of our workshop, CVEU, has been accepted for CVPR 2025, where I serve as the primary organizer.
  • Sep 2024: πŸ‘¨β€πŸŽ“β€ I transferred to the University of Illinois Urbana-Champaign to continue my PhD!
  • Aug 2024: πŸ† I was recognized as an Outstanding Reviewer at ECCV 2024.
  • Jul 2024: πŸ“„ Our paper on Point Tracking was accepted to the ECCV 2024 ILR Workshop.
  • Jun 2024: πŸ“„ Our paper, RAVE, was accepted as a Highlight at CVPR 2024.
  • May 2024: πŸ‘¨β€πŸ’»β€ I started my summer internship at Adobe (Firefly Team).
  • Mar 2024: πŸ“„ Our paper on Sign Language Recognition was accepted to FG 2024.
  • Jul 2023: 🏫 I attended the International Computer Vision Summer School (ICVSS).
  • Jun 2023: 🏫 I participated in the CIMPA Research School on Graph Structure.
  • Aug 2022: πŸŽ“ I started my PhD at Georgia Institute of Technology with Prof. James M. Rehg.
  • Jun 2022: πŸ“„ Our paper, ISNAS-DIP, was accepted to CVPR 2022.
  • May 2022: πŸ‘¨β€πŸ’»β€ I started my summer research at EPFL in the VILAB.
  • Aug 2021: βœ… I successfully completed the Google Summer of Code program.
  • Jul 2021: πŸ“„ Our paper on Fair Affective Robotics was accepted to the LEAP-HRI Workshop.

Industry Experience

  • [May 2025 – Present] Research Intern
    Google, Pixel Biometrics AI Research (BAIR)
    • Working on generative video layer decomposition/composition.
    • Running large-scale distributed training on a video dataset using a video diffusion transformer model.
    • Supervisors: Du Tran, Yujia Chen, Vincent Chu, & Prof. Ming-Hsuan Yang.

  • [May 2024 – December 2024] Research Intern
    Adobe, Firefly Team
    • Worked on diffusion transformer-based long-term multi-shot video generation.
    • This work resulted in a CVPR 2025 paper and a filed patent.
    • Gained experience with large-scale distributed training on an internal Adobe video model.
    • Supervisors: Tobias Hinz, Krishna Kumar Singh, Prof. Feng Liu, & Duygu Ceylan.

Publications

DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images

O. Kara*, H. Nisar*, J. M. Rehg (* denotes equal contribution)
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
We propose DiffEye, a diffusion-based generative model for creating realistic, raw eye-tracking trajectories conditioned on natural images, which outperforms existing methods on scanpath generation tasks.
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing

T. C. Ozden*, O. Kara*, O. Akcin, K. Zaman, S. Srivastava, S. P. Chinchali, J. M. Rehg (* denotes equal contribution)
In Submission, 2025
DiffVax is an optimization-free image immunization framework that effectively protects against diffusion-based editing, generalizes to unseen content, is robust against counter-attacks, and shows promise in safeguarding video content.
Project Webpage / Paper
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models

O. Kara, K. K. Singh, F. Liu, D. Ceylan, J. M. Rehg, T. Hinz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
ShotAdapter enables text-to-multi-shot video generation with minimal fine-tuning, providing users control over shot number, duration, and content through shot-specific text prompts, along with a multi-shot video dataset collection pipeline.
Project Webpage / Paper
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models

O. Kara, B. Kurtkaya, H. Yesiltepe, J. M. Rehg, P. Yanardag
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing that supports videos of any length by leveraging pretrained text-to-image diffusion models.
Project Webpage / Paper / Code / HuggingFace Demo / Video
Towards Social AI: A Survey on Understanding Social Interactions

S. Lee, M. Li, B. Lai, B. Jia, F. Ryan, X. Cao, O. Kara, B. Boote, B. Shi, D. Yang, J. M. Rehg
In Submission, 2025
This is the first survey to provide a comprehensive overview of machine learning studies on social understanding, encompassing both verbal and non-verbal approaches.
Paper
Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets

A. Kindiroglu*, O. Kara*, O. Ozdemir, L. Akarun (* denotes equal contribution)
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2024
This study provides a publicly available cross-dataset transfer learning benchmark from two existing public Turkish SLR datasets.
Paper / Code
Leveraging Object Priors for Point Tracking

B. Boote, N. A. Thai, W. Jia, O. Kara, S. Stojanov, J. M. Rehg, S. Lee
Instance-Level Recognition (ILR) Workshop at European Conference on Computer Vision (ECCV), 2024 (Oral)
We propose a novel objectness regularization approach that guides points to be aware of object priors by forcing them to stay inside the boundaries of object instances.
Paper / Code
Domain-Incremental Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

N. Churamani, O. Kara, H. Gunes
IEEE Transactions on Affective Computing, 2022
We propose the novel use of Continual Learning (CL), in particular Domain-Incremental Learning (Domain-IL) settings, as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Molecular Index Modulation using Convolutional Neural Networks

O. Kara, G. Yaylali, A. Pusane, T. Tugcu
Nano Communication Networks Journal, 2022
We propose a novel convolutional neural network-based architecture for a uniquely designed molecular multiple-input-single-output topology, aimed at mitigating the detrimental effects of interference in nanoscale molecular communication.
Paper / Code
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior

M. Arican*, O. Kara*, G. Bredell, E. Konukoglu (* denotes equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
ISNAS-DIP is an image-specific Neural Architecture Search (NAS) strategy designed for the Deep Image Prior (DIP) framework, offering significantly reduced training requirements compared to conventional NAS methods.
Paper / Code / Video
Towards Fair Affective Robotics: Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition

O. Kara, N. Churamani, H. Gunes
Workshop on Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI), 2021
We propose the novel use of Continual Learning (CL) as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems.
Paper / Code
Neuroweaver: a platform for designing intelligent closed-loop neuromodulation systems

P. Sarikhani, H. Hsu, O. Kara, J. Kim, H. Esmaeilzadeh, B. Mahmoudi
Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, Elsevier, 2021
Our interactive platform enables the design of neuromodulation pipelines through a visually intuitive and user-friendly interface. (Google Summer of Code 2021 project)
Paper / Code

Service & Recognition

  • Served as a mentor in the Google Summer of Code program (2022, 2023, 2024, and 2025). 2025
  • Recognized as an Outstanding Reviewer at ECCV, ranked among the top 10% of all reviewers. 2024
  • Attended the 2023 International Computer Vision Summer School (ICVSS), ranked among the top 25% of 614 applicants (approximately 154 individuals), [Project] 2023
  • Participated in the CIMPA Research School on Graph Structure and Complex Network Analysis. 2023
  • Attended the highly competitive Summer@EPFL program, with a 2% acceptance rate. 2022
  • Placed among the top 50 teams worldwide in the Google Developer’s Solution Challenge, selected from over 5,000 teams, [Project] 2022
  • Placed 3rd in the Yildiz Bootcamp and was directly invited to the Yildiz Technopark Pre-Incubation Program. 2022
  • Successfully completed Google Summer of Code with the project "Graphical User Interface for OpenAI Gym," selected among 1,205 students from 6,991 applicants (17% acceptance rate), [Project] 2021
  • Placed 3rd out of 172 projects (top 1.7%) in the TUBITAK Undergraduate Research Project Competition. 2021
  • Ranked among the top 10 teams regionwide in the Google Solution Challenge with the project titled "Torch in Darkness." 2020
  • Placed 1st out of 100 projects (top 1%) in the TUBITAK Undergraduate Research Project Competition for the project "Joint Depth Estimation and Object Detection Software," [Project] 2020
  • Placed 3rd out of 15 projects (top 20%) in the IEEE METU Pixery Hackathon with the project "Mobile Application for Blind People," [Project] 2020
  • Finalist among 81 teams in the Turkish Airlines Travel Datathon. 2019
  • Ranked 180th out of 2 million (top 0.009%) in the Turkish National University Entrance Exam. 2018
  • Received the Republic Honour Award at Kadikoy Anadolu High School, awarded to one student out of 340 annually. 2018
  • Placed 3rd nationwide in the TUBITAK High School Research Project Competition for the project "Drone for Landmine Detection Using GPS," [Project] 2018
  • Placed 1st regionwide in the TUBITAK High School Research Project Competition for the project "An Autonomous Hexapod for Helping Search Teams After Earthquake," [Project] 2017
  • Accepted into the CS Bridge program, a two-week programming course led by Stanford TAs. 2016

This website is adapted from this source code.