I am a Computer Science PhD student at the University of Illinois Urbana-Champaign (UIUC), where I am advised by Professor James M. Rehg. My research is focused on advancing the capabilities of video generative AI.
My research builds the next generation of video AI by tackling three core challenges: efficiency, controllability, and safety. I design computationally efficient models for long-form video editing, develop frameworks that give users precise control over story and appearance, and build safe systems to protect visual media from unauthorized manipulation.
Before my PhD, I conducted research on continual learning to mitigate bias with Prof. Hatice Gunes (Affective Intelligence and Robotics Lab, University of Cambridge), on zero-shot learning with GANs with Prof. Zeynep Akata and Dr. Yongqin Xian (University of Tübingen), and on neural architecture search and image restoration with Prof. Ender Konukoglu (Computer Vision Lab, ETH Zürich).
I have since completed a research internship at Adobe with Tobias Hinz in Summer 2024 (multi-shot video generation), and I am currently interning at Google with Du Tran (video generation).
I am open to opportunities for collaboration and am always interested in discussing new research ideas. Please feel free to contact me via email.
I am always looking for self-motivated students who want to work on Generative AI-related projects. Feel free to reach out if you are interested and based at UIUC.
DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images
O. Kara*, H. Nisar*, J. M. Rehg (* denotes equal contribution)
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
We propose DiffEye, a diffusion-based generative model for creating realistic, raw eye-tracking trajectories conditioned on natural images, which outperforms existing methods on scanpath generation tasks.
DiffVax: Optimization-Free Image Immunization Against Diffusion-Based Editing
T. C. Ozden*, O. Kara*, O. Akcin, K. Zaman, S. Srivastava, S. P. Chinchali, J. M. Rehg (* denotes equal contribution)
In Submission, 2025
DiffVax is an optimization-free image immunization framework that effectively protects against diffusion-based editing, generalizes to unseen content, is robust against counter-attacks, and shows promise in safeguarding video content. Project Webpage / Paper
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
O. Kara, K. K. Singh, F. Liu, D. Ceylan, J. M. Rehg, T. Hinz
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
ShotAdapter enables text-to-multi-shot video generation with minimal fine-tuning, providing users control over shot number, duration, and content through shot-specific text prompts, along with a multi-shot video dataset collection pipeline. Project Webpage / Paper
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
O. Kara, B. Kurtkaya, H. Yesiltepe, J. M. Rehg, P. Yanardag
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024 (Highlight)
RAVE is a zero-shot, lightweight, and fast framework for text-guided video editing that supports videos of any length by leveraging pretrained text-to-image diffusion models. Project Webpage / Paper / Code / HuggingFace Demo / Video
Towards Social AI: A Survey on Understanding Social Interactions
S. Lee, M. Li, B. Lai, B. Jia, F. Ryan, X. Cao, O. Kara, B. Boote, B. Shi, D. Yang, J. M. Rehg
In Submission, 2025
This is the first survey to provide a comprehensive overview of machine learning studies on social understanding, encompassing both verbal and non-verbal approaches. Paper
Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets
A. Kindiroglu*, O. Kara*, O. Ozdemir, L. Akarun (* denotes equal contribution)
IEEE International Conference on Automatic Face and Gesture Recognition (FG), 2024
This study provides a publicly available cross-dataset transfer learning benchmark built from two existing public Turkish SLR datasets. Paper / Code
Leveraging Object Priors for Point Tracking
B. Boote, N. A. Thai, W. Jia, O. Kara, S. Stojanov, J. M. Rehg, S. Lee
Instance-Level Recognition (ILR) Workshop at European Conference on Computer Vision (ECCV), 2024 (Oral)
We propose a novel objectness regularization approach that makes points aware of object priors by forcing them to stay inside the boundaries of object instances. Paper / Code
Domain-Incremental Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition
N. Churamani, O. Kara, H. Gunes
IEEE Transactions on Affective Computing, 2022
We propose the novel use of Continual Learning (CL), in particular Domain-Incremental Learning (Domain-IL) settings, as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems. Paper / Code
Molecular Index Modulation using Convolutional Neural Networks
O. Kara, G. Yaylali, A. Pusane, T. Tugcu
Nano Communication Networks Journal, 2022
We propose a novel convolutional neural network-based architecture for a uniquely designed molecular multiple-input-single-output topology, aimed at mitigating the detrimental effects of molecular interference in nanoscale molecular communication. Paper / Code
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
M. Arican*, O. Kara*, G. Bredell, E. Konukoglu (* denotes equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
ISNAS-DIP is an image-specific Neural Architecture Search (NAS) strategy designed for the Deep Image Prior (DIP) framework, offering significantly reduced training requirements compared to conventional NAS methods. Paper / Code / Video
Towards Fair Affective Robotics: Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition
O. Kara, N. Churamani, H. Gunes
Workshop on Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI), 2021
We propose the novel use of Continual Learning (CL) as a potent bias mitigation method to enhance the fairness of Facial Expression Recognition (FER) systems. Paper / Code
Neuroweaver: a platform for designing intelligent closed-loop neuromodulation systems
P. Sarikhani, H. Hsu, O. Kara, J. Kim, H. Esmaeilzadeh, B. Mahmoudi
Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, Elsevier, 2021
Our interactive platform enables the design of neuromodulation pipelines through a visually intuitive and user-friendly interface. (Google Summer of Code 2021 project) Paper / Code
Service & Recognition
Served as a mentor in the Google Summer of Code program in 2022, 2023, 2024, and 2025. 2022-2025
Recognized as an Outstanding Reviewer at ECCV, ranked among the top 10% of all reviewers. 2024
Attended the 2023 International Computer Vision Summer School (ICVSS), selected among the top 25% of 614 applicants (approximately 154 individuals), [Project] 2023
Participated in the CIMPA Research School on Graph Structure and Complex Network Analysis. 2023
Attended the highly competitive Summer@EPFL program, with a 2% acceptance rate. 2022
Placed among the top 50 teams worldwide in the Google Developer's Solution Challenge, selected from over 5,000 teams, [Project] 2022
Placed 3rd in the Yildiz Bootcamp and was directly invited to the Yildiz Technopark Pre-Incubation Program. 2022
Successfully completed Google Summer of Code with the project "Graphical User Interface for OpenAI Gym," selected among 1,205 students from 6,991 applicants (17% acceptance rate), [Project] 2021
Placed 3rd out of 172 projects (top 1.7%) in the TUBITAK Undergraduate Research Project Competition. 2021
Ranked among the top 10 teams regionwide in the Google Solution Challenge with the project titled "Torch in Darkness." 2020
Placed 1st out of 100 projects (top 1%) in the TUBITAK Undergraduate Research Project Competition for the project "Joint Depth Estimation and Object Detection Software," [Project] 2020
Placed 3rd out of 15 projects (top 20%) in the IEEE METU Pixery Hackathon with the project "Mobile Application for Blind People," [Project] 2020
Finalist among 81 teams in the Turkish Airlines Travel Datathon. 2019
Ranked 180th out of 2 million (top 0.009%) in the Turkish National University Entrance Exam. 2018
Received the Republic Honour Award at Kadikoy Anadolu High School, awarded to one student out of 340 annually. 2018
Placed 3rd nationwide in the TUBITAK High School Research Project Competition for the project "Drone for Landmine Detection Using GPS," [Project] 2018
Placed 1st regionwide in the TUBITAK High School Research Project Competition for the project "An Autonomous Hexapod for Helping Search Teams After Earthquake," [Project] 2017
Accepted into the CS Bridge program, a two-week programming course led by Stanford TAs. 2016
Recipient of the Outstanding Success Scholarship from the Turkish Educational Foundation (TEV). 2019-2022
Recipient of the 2247-C TUBITAK Research Internship Scholarship. 2021-2022
Workshop Co-organizer, 7th Edition of AI for Creative Visual Content Generation, Editing and Understanding, SIGGRAPH, 2025, [Webpage]
Lead Workshop Organizer, 6th Edition of AI for Creative Visual Content Generation, Editing and Understanding, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, [Webpage]
Outstanding Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Reviewer, IEEE/CVF International Conference on Computer Vision (ICCV), 2025
Reviewer, The Association for the Advancement of Artificial Intelligence (AAAI), 2025
Reviewer, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
Outstanding Reviewer, European Conference on Computer Vision (ECCV), 2024
Reviewer, International Conference on Learning Representations (ICLR), 2024/2025
Reviewer, International Conference on Machine Learning (ICML), 2024
Reviewer, Conference on Neural Information Processing Systems (NeurIPS), 2023/2025