publications | Jaehun Jeong

2026

A Mutual Information-based Metric for Temporal Expressivity and Trainability Estimation in Quantum Policy Gradient Pipelines

Jaehun Jeong, Donghwa Ji, and Kabgyun Jeong

2026

In recent years, various limitations of conventional supervised learning have been identified, motivating the development of reinforcement learning and of quantum reinforcement learning, which leverages quantum resources such as entanglement and superposition. Among reinforcement learning methodologies, the policy gradient method offers several benefits; for instance, it allows an agent to learn without explicit knowledge of key properties of the environment, such as state transition probabilities and the initial state distribution. Meanwhile, two indicators are often regarded as significant for learning: expressivity and, for gradient-based methods, trainability. Although a number of attempts have been made to quantify the expressivity and trainability of neural network models and parameterized quantum circuits (PQCs), approaches tailored to reinforcement learning settings have so far been lacking, despite the inherent differences between conventional supervised learning and reinforcement learning. In this study, we therefore propose revising the notion of expressivity into a temporal expressivity suited to reinforcement learning dynamics. We show that the mutual information between the action distribution and the discretized reward signal provides an upper bound on the scaled gradient norm, and we derive an information-theoretic decomposition and a residual-aware upper bound for the proposed temporal expressivity metric (MI-TET). Finally, under explicit concentration assumptions, we show that MI-TET induces an assumption-based, one-sided prescreening criterion for initialization-time gradient fragility across PQC architectures.
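To make the central quantity concrete, the following minimal Python sketch computes a plug-in estimate of the mutual information between sampled actions and a discretized reward signal. The bin edges, the function names (`discretize`, `mutual_information`), and the choice of a simple empirical (plug-in) estimator are illustrative assumptions; the abstract does not specify how the paper estimates this quantity, and this is not an implementation of MI-TET itself.

```python
import math
from collections import Counter


def discretize(rewards, edges):
    """Map each scalar reward to a bin index (hypothetical binning scheme).

    `edges` is a sorted list of bin boundaries; the bin index is the number
    of edges that the reward meets or exceeds.
    """
    return [sum(r >= e for e in edges) for r in rewards]


def mutual_information(actions, reward_bins):
    """Plug-in estimate of I(A; R_bin) in nats from paired samples.

    Empirical joint and marginal frequencies stand in for the true
    probabilities: I = sum p(a, r) * log(p(a, r) / (p(a) * p(r))).
    """
    n = len(actions)
    joint = Counter(zip(actions, reward_bins))
    p_action = Counter(actions)
    p_reward = Counter(reward_bins)
    mi = 0.0
    for (a, r), count in joint.items():
        p_ar = count / n
        mi += p_ar * math.log(p_ar / ((p_action[a] / n) * (p_reward[r] / n)))
    return mi
```

With perfectly reward-aligned actions, e.g. `mutual_information([0, 0, 1, 1], discretize([0.0, 0.1, 0.9, 1.0], [0.5]))`, the estimate is log 2 ≈ 0.693 nats; when the actions carry no information about the reward bin, it is 0. Plug-in estimates are biased upward at small sample sizes, so a practical pipeline would typically use many rollouts or a bias-corrected estimator.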

Projected Dynamic Programming for Sequential Quantum State Discrimination

Jaehun Jeong, Donghwa Ji, Hyunjun Jang, and 1 more author

2026

Sequential Quantum State Discrimination (SQSD) can be naturally framed as a sequential decision-making problem: at each time step, an agent must decide whether to perform an additional measurement to gather more information or to stop and make an optimal decision based on the current belief. In this paper, we formally cast SQSD as a static-hidden-state Partially Observable Markov Decision Process (POMDP). We show that this formulation precisely subsumes the conventional minimum-error discrimination (MED) scheme as a special one-step case. We then apply a regular grid-based discretization to the continuous belief simplex, approximate the possibly continuous measurement space with a finite library, provide rigorous mathematical bounds on the resulting errors, and analyze the computational complexity of both offline planning and online execution. Our analysis confirms that the inherent trade-off between accuracy and complexity, as well as the curse of dimensionality with respect to the number of hypotheses, is also prominently observed in the quantum regime. Finally, we work through an example of binary state discrimination to derive explicit forms of the relevant functions, and we present numerical simulations of trine state discrimination to visualize the sequential structure of our POMDP-based SQSD.
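The belief dynamics described above can be illustrated with a minimal NumPy sketch, assuming each hypothesis is a density matrix and each measurement outcome corresponds to a single POVM effect. Because the hidden state is static, a measurement only reweights the belief by the Born-rule likelihoods. The function names, the largest-remainder rule used to snap beliefs onto the grid, and the example states are illustrative assumptions, not the construction used in the paper; the closed-form Helstrom error is included only as the standard one-step MED baseline.

```python
import numpy as np


def belief_update(belief, states, effect):
    """Bayes update for a static hidden state: b'(i) is proportional to b(i) Tr(E rho_i)."""
    likelihoods = np.array([np.real(np.trace(effect @ rho)) for rho in states])
    unnormalized = np.asarray(belief, dtype=float) * likelihoods
    total = unnormalized.sum()
    if total <= 0:
        raise ValueError("outcome has zero probability under this belief")
    return unnormalized / total


def stop_value(belief):
    """Success probability of stopping now and guessing the most likely hypothesis."""
    return float(np.max(belief))


def helstrom_error(p0, rho0, rho1):
    """Minimum error probability for two hypotheses (Helstrom bound).

    P_err = (1 - ||p0 rho0 - p1 rho1||_1) / 2, where the trace norm of the
    Hermitian operator is the sum of absolute eigenvalues.
    """
    gamma = p0 * rho0 - (1 - p0) * rho1
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()
    return 0.5 * (1 - trace_norm)


def project_to_grid(belief, n):
    """Snap a belief onto the regular grid {k / n : k integer, sum k = n}.

    Uses largest-remainder rounding (one possible projection rule, assumed
    here for illustration) so the result stays on the probability simplex.
    """
    scaled = np.asarray(belief, dtype=float) * n
    counts = np.floor(scaled).astype(int)
    remainder = n - counts.sum()
    order = np.argsort(-(scaled - counts))
    counts[order[:remainder]] += 1
    return counts / n
```

For example, with hypotheses |0⟩⟨0| and |+⟩⟨+| under equal priors, observing the effect |0⟩⟨0| moves the belief from (1/2, 1/2) to (2/3, 1/3), and `helstrom_error(0.5, rho0, rho1)` returns (1 - √0.5)/2 ≈ 0.146, the error of the best single measurement followed by an immediate decision.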