Product-oriented AI research leader building and shipping agentic AI systems, multimodal & LLM generative AI, and foundation-model applications across Adobe products — from Adobe Analytics and Customer Journey Analytics to Firefly. Leads large cross-functional AI initiatives from architecture through customer-facing release, with 15+ technology transfers into products.
I am continuously looking for outstanding and highly motivated Ph.D. students for spring, fall, or summer research positions. If you're interested, please send me your CV and research interests.
Theory, algorithms, and large-scale systems at the intersection of personalization, model routing, agents, multimodal models, and graphs. Erdős number: 3.
Personalized text generation, graph-based retrieval, and multimodal personalization — leading personalization of AI systems at Adobe.
LLM agents beyond predefined actions, reward-weighted conversation optimization, and preference-guided code generation.
Image and video generation and editing, judge and reward modeling, post-training, and multimodal RAG.
Formal taxonomies and methods for high-fidelity LLM-based conversational user simulation and evaluation.
Foundational work on GNNs, knowledge graphs, roles, temporal embeddings, and network analysis.
Model routing, efficient algorithms, scalable inference and training, and production LLM platforms.
A few representative papers — each links directly to the full text.
A unifying taxonomy bridging personalized text generation and downstream personalization, defining the foundations of personalized LLMs.
The definitive survey of bias evaluation and mitigation for LLMs, with 1,700+ citations and a companion open benchmark.
A long-context benchmark with minimal lexical overlap, revealing sharp degradation of frontier models beyond 32K tokens.
LLM agents that write and accumulate code as a universal action space; formerly #1 on the GAIA leaderboard.
Graph construction plus an LLM-guided traversal agent for multi-document QA. Best Paper Award.
6.3 trillion tokens across 167 languages, fully released on Hugging Face, with 26K+ downloads per month.
An orchestrator–specialist agent framework that lifts VLMs to fine-grained, pixel-level visual reasoning.
Transforms pretrained transformers into subquadratic architectures with gated attention and meta-memory.
A formal taxonomy for high-fidelity synthetic users, enabling scalable evaluation of conversational AI.
Geometric consistency for vision–language contrastive learning, improving zero-shot robustness over CLIP.
Foundational treatment of structural roles in graphs, a cornerstone of role-based network representation learning.
networkrepository.com: the largest network data repository, with 250M+ downloads worldwide.
32 journal articles and 230 peer-reviewed conference papers. Every title links to the paper on Google Scholar.
Inventor on 100+ patents spanning AI personalization, agents, RAG, AI fairness, AI-as-a-judge, simulation, graph learning, knowledge graphs, recommendation, and scalable AI. Adobe Distinguished Inventor. Each entry links to a Google Patents search.
Widely used open-source frameworks, benchmarks, and data resources — including the Network Repository, the largest network data repository with 250M+ downloads.
A comprehensive benchmark compiling over 20 publicly available bias and fairness evaluation datasets for LLMs, enabling standardized evaluation across diverse dimensions.
A dual-architecture framework that unifies the exact generation fidelity of autoregressive large language models with the high-speed parallel token generation of diffusion models, sharing one key-value cache across both views for lossless, memory-efficient inference.
A cleaned, multilingual dataset of 6.3 trillion tokens spanning 167 languages for large language model development, fully released on Hugging Face.
An LLM agent framework that generates and executes Python programs as a universal action representation, creating and accumulating reusable actions for open-ended tasks.
A long-context evaluation benchmark that minimizes lexical overlap between questions and target content, requiring models to infer latent associations rather than rely on literal matching.
A framework and benchmark for personalized graph-based retrieval-augmented generation that integrates user-centric knowledge graphs to enrich personalization under sparse user history.
A benchmark for personalized long-form text generation, providing a diverse evaluation framework across long-text tasks such as personalized email writing, review generation, and topic writing.
The first benchmark evaluating large language models for anomaly detection, spanning zero-shot detection, data augmentation, and model selection.
A benchmark and multimodal retrieval-augmented generation approach for multi-document question answering over visually rich elements such as tables, charts, and slides.
A large-scale training paradigm for graph generative models, pretrained over thousands of graphs, with zero-shot, fine-tuning, and text-to-graph generation capabilities.
A knowledge graph prompting method for multi-document question answering that pairs graph construction over documents with an LLM-guided graph traversal agent.
A comprehensive benchmark for instantaneous graph learning model selection, providing extensive performance records, evaluation testbeds, and meta-graph features.
A figure-to-caption generative framework and benchmark that incorporates human feedback to optimize generated captions for reader preferences.
A framework and resources for instruction tuning large language models in 26 languages using reinforcement learning from human feedback.
A contrastive vision-language pretraining framework that enforces geometric consistency in the image and text representation spaces.
A graph neural network framework that learns an interpretable compatibility matrix, generalizing message passing to graphs with either homophily or heterophily.
An efficient framework for user stitching that encodes multi-dimensional node context from feature-based temporal walks into compact binary hashcodes.
A widely used community implementation of our popular role2vec role-based network embedding method.
A fast parallel high-performance parameterized graphlet decomposition library for massive networks. Code at .
The first interactive data repository that integrates visualization with state-of-the-art statistical methods and analytic techniques to support discovery and exploration of data in real time. NR is the largest network data repository, with over 6,000 donations across 30+ collections and growing.
Interactive visual graph mining and machine learning on the web. Visualize and explore network data easily. GraphVIS is the result of years of research in relational machine learning and graph mining. A free demo version is available at http://networkrepository.com/graphvis
A parallel high-performance library for solving the maximum clique problem on dense graphs and large sparse networks.
An interactive data repository that makes it easy to find, explore, and understand machine learning data, providing researchers with open, persistent, and accessible data alongside web-based visual analytic tools.
A package for modeling the importance and influence of nodes in dynamic networks with external interest and attributes.
Machine Learning group
Palo Alto, CA, USA. Research focused on the theory, algorithms, and applications of relational (graph-based) machine learning.
Palo Alto, CA, USA. Developed recommendation systems via collective matrix-tensor factorization.
USA. Research: machine learning and statistical relational learning. Proposed methods for role discovery in large dynamic graphs and for dynamic relational classification.
USA. Research focused on developing ML algorithms to characterize and model user behavior for detecting malicious intent and intrusions in real time. Invited back for a second year. Resulted in two papers on modeling dynamic roles in large networks.
USA. Advisor: David Aha. Co-advisor: Luke McDowell (U.S. Naval Academy). NREIP. Resulted in the JAIR paper "Transformation of Graph Data for Statistical Relational Learning".
USA. Advisor: Mark W. Powell. Summer Research Fellowship (returned to continue my research).
USA. Advisor: Mark W. Powell. Spring USRP Fellowship.
USA. Advisor: David Jensen. Co-advisor: Brian Taylor. NSF REU Fellowship. "Experimental Methods for Improving the Design of Participatory Sensing Systems".
USA. Advisor: Srinivas Mukkamala, Senior Research Scientist, ICASA.
USA. Advisor: Jean-Louis Lassez (retired, IBM T.J. Watson Research Center).
Title: "Improving Relational Machine Learning by Modeling Temporal Dependencies"
Recipient of four Ph.D. fellowships:
– National Science Foundation Graduate Fellowship (NSF GRFP)
– DoD National Defense Science and Engineering Graduate Fellowship (NDSEG)
– Bilsland Dissertation Fellowship, awarded to outstanding Ph.D. candidates
– Purdue University Frederick N. Andrews Doctoral Fellowship
Concentration in Machine Learning
Valedictorian, class of 2009. GPA: 4.0, Summa Cum Laude. Advisor: Jean-Louis Lassez (retired, IBM T.J. Watson Research Center).
Research fellowships: LLNL Scholar, Lawrence Livermore National Laboratory (Summers 2011–2012) · NREIP, Naval Research Laboratory (AI Center) (Summer 2010) · NASA Fellow, California Institute of Technology, JPL (2009) · USRP Fellow, Jet Propulsion Laboratory (2009) · NSF REU Fellow, University of Massachusetts at Amherst (Summer 2008) · Research Fellow, New Mexico Institute of Technology (Summer 2007)
Supervised and mentored 100+ students from Stanford, CMU, Berkeley, Michigan, Georgia Tech, KAIST, and many more — most resulting in top-tier publications.