Krotov’s Method¶

The following overview has been adapted from Ref [GoerzPhd2015].

Functionals¶

Krotov’s method [KonnovARC99], adapted to quantum control, considers one or more quantum systems, with a set of Hamiltonians \(\{\Op{H}_k(\{\epsilon_l(t)\})\}\) where each Hamiltonian depends on a set of time-continuous controls \(\epsilon_l(t)\). It seeks to find control fields that optimally steers a set of initial states \(\{\ket{\phi_k}\}\) in some desired way. To this end, in each iteration \(i\), it minimizes a functional of the form

\[J[\epsilon_l^{(i)}(t)] = J_T(\{\ket{\phi_k^{(i)}(T)}\}) + \sum_l \int_0^T g_a[\epsilon_l^{(i)}(t)] \mathrm{d} t + \int_0^T g_b[\{\phi^{(i)}_k(t)\}] \mathrm{d} t\,. \label{eq:J_krotov}\]

where \(\ket{\phi_k^{(i)}(T)}\) are the time-evolved states initial states \(\ket{\phi_k}\) under the controls \(\{\epsilon_l(t)\}\) of the \(i\)’th iteration.

The functional consists of three parts:

A final time functional \(J_T\). This is the “main” part of the functional, and we can usually think of \(J\) as being an auxiliary functional in the optimization of \(J_T\).
A running cost on the control fields \(g_b\). As we will see below, specific forms of running costs are required to obtain a closed-form update equation. The typical form, and the only one we consider here (and that is realized in the krotov package) is

\[g_a[\epsilon(t)] = \frac{\lambda_a}{S(t)} \Abs{\Delta\epsilon(t)}^2\,.\]

We introduce two parameters, the “inverse Krotov step width” \(\lambda_a\) and the shape function \(S(t)\) which can be used to influence desired properties of the optimized controls. \(\Delta\epsilon(t)\) is the update of the control in a single iteration of the optimization algorithm. It is best to think of this running cost as a technical requirement, and not to assign physical meaning to it.
An optional state-dependent running cost \(g_b\), e.g., to penalize population in a subspace.

The most commonly used functionals optimize for a set of initial states \(\{\ket{\phi_k}\}\) to evolve to a set of target states \(\{\ket{\phi_k^\tgt}\}\). The functionals can then be expressed in terms of the complex overlaps of the final-time states with the target states under the given control. Thus,

\[\tau_k = \Braket{\phi_k^\tgt}{\phi_k(T)}\]

in Hilbert space, or

\[\label{eq:tau_liouville} \tau_k = \tr\left[\Op{\rho}_k^{\tgt\,\dagger} \Op{\rho}_k(T) \right]\]

in Liouville space.

The following functionals \(J_T\) can be formed from these complex overlaps, taking into account that any optimization functional \(J_T\) must be real. They differ by the way they treat the phases \(\varphi_k\) in the physical optimization goal \(\ket{\phi_k(T)} \overset{!}{=} e^{i\varphi_k}\ket{\phi_k^{\tgt}}\) [PalaoPRA2003]:

Optimize for simultaneous state-to-state transitions, with completely arbitrary phases \(\varphi_k\)

(1)¶\[J_{T,\text{ss}} = 1- \frac{1}{N} \sum_{k=1}^{N} \Abs{\tau_k}^2\]

Optimize for simultaneous state-to-state transitions, with an arbitrary global phase, i.e., \(\varphi_k = \varphi_{\text{global}}\) for all \(k\) with arbitrary \(\varphi_{\text{global}}\)

(2)¶\[J_{T,\text{sm}} = 1- \frac{1}{N^2} \Abs{\sum_{k=1}^{N} \tau_k}^2 = 1- \frac{1}{N^2} \sum_{k=1}^{N} \sum_{k'=1}^{N} \tau_{k'}^* \tau_{k}\,,\]

Optimize for simultaneous state-to-state transitions, with a global phase of zero, i.e., \(\varphi_k = 0\) for all \(k\)

(3)¶\[J_{T,\text{re}} = 1-\frac{1}{N} \Re \left[\, \sum_{k=1}^{N} \tau_k \,\right] = 1-\frac{1}{N} \sum_{k=1}^{N} \frac{1}{2} \left( \tau_k + \tau_k^* \right)\]

Note that in state-to-state optimizations in Liouville space the phases \(\varphi_k\) do not appear at all. This is reflected by the fact that the overlaps \(\tau_k\) are always real-valued. Thus the three functionals above coincide when density matrices are employed.

Conditions for the update equation¶

Krotov’s method uses an auxiliary functional to disentangle the interdependence of the states and the field, allowing to find an updated field \(\epsilon^{(i+1)}(t)\) such that \(J[\epsilon^{(i+1)}] \leq J[\epsilon^{(i)}]\) is guaranteed.

Here, and in the following, we drop the index \(l\) of the controls; all equations are valid for each individual control.

In Hilbert space, Enforcing the monotonicity condition, \(J[\epsilon^{(i+1)}] \leq J[\epsilon^{(i)}]\), see Ref. [ReichJCP12], yields the equation

(4)¶\[\begin{split}\begin{split} \left.\frac{\partial g_a}{\partial \epsilon}\right\vert_{\epsilon^{(i+1)}(t)} & = 2 \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle + \right. \\ & \qquad \left. + \frac{1}{2} \sigma(t) \Bigg\langle \Delta\phi_k(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,, \end{split}\end{split}\]

with

\[\ket{\Delta \phi_k(t)} \equiv \ket{\phi_k^{(i+1)}(t)} - \ket{\phi_k^{(i)}(t)}\,.\]

Assuming the equation of motion for the forward propagation of \(\ket{\phi_k}\) to be written as

(5)¶\[\frac{\partial}{\partial t} \Ket{\phi_k^{(i+1)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{(i+1)} \Ket{\phi_k^{(i+1)}(t)}\,,\]

the co-states \(\Ket{\chi_k}\) are propagated backwards under the old pulse, i.e., the pulse from the previous iteration, as

(6)¶\[\frac{\partial}{\partial t} \Ket{\chi_k^{(i)}(t)} = -\frac{\mathrm{i}}{\hbar} \Op{H}^{\dagger\,(i)} \Ket{\chi_k^{(i)}(t)} + \left.\frac{\partial g_b}{\partial \Bra{\phi_k}}\right\vert_{\phi^{(i)}(t)}\,,\]

with the boundary condition

(7)¶\[\Ket{\chi_k^{(i)}(T)} = - \left.\frac{\partial J_T}{\partial \Bra{\phi_k}}\right\vert_{\phi^{(i)}(T)}\,.\]

Note that the backward propagation uses the adjoint Hamiltonian which becomes relevant for non-Hermitian Hamiltonians or dissipative dynamics. The working equations in Liouville space follow an analogous structure, see, e.g., Ref. [ReichJCP12].

In Eq. (4), \(\sigma(t)\) is a scalar function that must be properly chosen to ensure monotonic convergence.

First order update equation¶

In many cases, it is sufficient to set \(\sigma(t) \equiv 0\), in particular if the equation of motion is linear (\(\Op{H}\) does not depend on the states \(\ket{\phi_k(t)}\)), the functional \(J_T\) is convex, and no state-dependent constraints are used (\(g_b\equiv 0\)). Even for some types of state-dependent constraints \(\sigma(t)\) may be set to zero, specifically for keeping the population in an allowed subspace [PalaoPRA2008]. However, a state-dependent constraint adds an inhomogeneity to the equation of motion for \(\ket{\chi_k(t)}\).

In order to obtain an explicit equation for \(\epsilon^{(i+1)}(t)\), a state-dependent running cost \(g_a\) must be used. It usually takes the form

\[g_a[\epsilon(t)] = \frac{\lambda_a}{S(t)} \left(\epsilon(t) - \epsilon^{\text{ref}}(t)\right)^2\,,\]

with a scaling parameter \(\lambda_a\) and a shape function \(S(t) \in [0,1]\). When \(\epsilon^{\text{ref}}\) is set to the optimized field \(\epsilon^{(i)}\) from the previous iteration this yields

\[g_a[\epsilon^{(i+1)}(t)] = \frac{\lambda_a}{S(t)} \left(\Delta\epsilon(t)\right)^2\,, \quad \Delta\epsilon(t) \equiv \epsilon^{(i+1)}(t) - \epsilon^{(i)}(t)\,,\]

and for \(\sigma(t) \equiv 0\), the first-order Krotov update equation is re-obtained [PalaoPRA2003][SklarzPRA2002],

(8)¶\[\begin{split}\Delta\epsilon(t) = \frac{S(t)}{\lambda_a} \Im \left[ \sum_{k=1}^{N} \Bigg\langle \chi_k^{(i)}(t) \Bigg\vert \Bigg( \left.\frac{\partial \Op{H}}{\partial \epsilon}\right\vert_{{\scriptsize \begin{matrix}\phi^{(i+1)}(t)\\\epsilon^{(i+1)}(t)\end{matrix}}} \Bigg) \Bigg\vert \phi_k^{(i+1)}(t) \Bigg\rangle \right]\,.\end{split}\]

If \(S(t) \in [0,1]\) is chosen as a function that smoothly goes to zero at \(t=0\) and \(t=T\), then the update will be suppressed near the edges of the optimization time interval. Thus, a smooth switch-on and switch-off can be maintained. The scaling factor \(\lambda_a\) controls the overall magnitude of the pulse update thereby taking the role of an “inverse Krotov step width”. Values that are too large will change \(\epsilon^{(i)}(t)\) by only a small amount in every iteration, causing slow convergence. Values that are too small will cause sharp spikes in the optimized control, and numerical instabilities (including a loss of monotonic convergence).

The functional \(J_T\) enters the first-order update equation only in the boundary condition for the backward propagated co-state, Eq. (7). For the standard functionals defined in Eq. (2) and Eq. (3), this evaluates to

\[\begin{split}\begin{aligned} \left.\frac{\partial J_{T,\text{sm}}}{\partial \Bra{\phi_k}}\right\vert_{\phi_k^{(i)}(T)} &= - \left( \frac{1}{N^2} \sum_{l=1}^N \tau_l \right) \Ket{\phi_k^\tgt}\,, \\ \left.\frac{\partial J_{T,\text{re}}}{\partial \Bra{\phi_k}}\right\vert_{\phi_k^{(i)}(T)} &= - \frac{1}{2N} \Op{O} \Ket{\phi_k^\tgt}\,. \end{aligned}\end{split}\]

Second order update equation¶

Where \(\sigma(t) \neq 0\) is required, it can be approximated numerically as shown in Ref. [ReichJCP12]. In Refs [WattsPRA2015][GoerzPRA2015], non-convex final-time functionals that depend higher than quadratically on the states are considered, for a standard equation of motion given by linear Schrödinger equation. In this case,

\[\sigma(t) \equiv -\max\left(\varepsilon_A,2A+\varepsilon_A\right)\,, \label{eq:sigma_A}\]

where \(\varepsilon_A\) is a small non-negative number that can be used to enforce strict inequality in the second order optimality condition. The optimal value for \(A\) in each iteration can be approximated numerically as [ReichJCP12]

\[A = \frac{2 \sum_{k=1}^{N} \Re\left[ \langle \chi_k(T) \vert \Delta\phi_k(T) \rangle \right] + \Delta J_T} {\sum_{k=1}^{N} \Abs{\Delta\phi_k(T)}^2} \,,\]

with

\[\Delta J_T \equiv J_T(\{\phi_k^{(i+1)}(T)\}) -J_T(\{\phi_k^{(i)}(T)\})\,.\]

Time discretization¶

Fig. 1 Sequential update scheme in Krotov’s method on a time grid.

The derivation of Krotov’s method assumes time-continuous control fields. In this case, it mathematically guarantees monotonic convergence. However, for practical numerical applications, we have to consider controls on a discrete time grid.

Discretization to a time grid yields the numerical scheme shown in Fig. 1, and resolves the seeming contradiction that the calculation of \(\epsilon^{(i+1)}(t)\) requires knowledge of the states \(\ket{\Psi_k^{(i+1)}(t)}\) propagated under \(\epsilon^{(i+1)}(t)\). The scheme starts with \(\ket{\chi_k(T)}\) obtained from Eq. (7), which is backward-propagated under Eq. (6). All backward-propagated states \(\ket{\chi_k(t)}\) must be stored. The first pulse value is updated according to Eq. (8), using \(\ket{\chi_k(0)}\) and the known initial state \(\ket{\Psi_k(0)}\). Then, \(\ket{\Psi_k(0)}\) is forward-propagated by one time step under Eq. (5) using the updated pulse value. The updates proceed sequentially, until the final forward-propagated state \(\ket{\Psi_k(T)}\) is reached. For numerical stability, it is useful to define the normalized states

\[\ket{\Psi_k^{\text{bw}}(T)} = \frac{1}{\Norm{\ket{\chi_k}}} \ket{\chi_{k}(T)}\]

and then later multiply again with \(\Norm{\ket{\chi_k}}\) when calculating the pulse update.

Choice of λₐ¶

The monotonic convergence of Krotov’s method is only guaranteed in the continuous limit; a coarse time step must be compensated by larger values of the inverse step width \(\lambda_a\), slowing down convergence. Generally, choosing \(\lambda_a\) too small will lead to numerical instabilities and unphysical features in the optimized pulse. A lower limit for \(\lambda_a\) can be determined from the requirement that the change \(\Delta\epsilon(t)\) should be at most on the same order of magnitude as the guess pulse \(\epsilon^{(i)}(t)\) for that iteration. The Cauchy-Schwarz inequality applied to the update equation yields

\[\Norm{\Delta \epsilon(t)}_{\infty} \le \frac{\Norm{S(t)}}{\lambda_a} \sum_{k} \Norm{\ket{\chi_k (t)}}_{\infty} \Norm{\ket{\psi_k (t)}}_{\infty} \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty} \stackrel{!}{\le} \Norm{\epsilon^{(i)}(t)}_{\infty}\,,\]

where \(\Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty}\) denotes the supremum norm of the operator norms of the operator \(\frac{\partial \Op{H}}{\partial \epsilon}\) obtained at time \(t\). Since \(S(t) \in [0,1]\) and \(\ket{\psi_k}\) is normalized, the condition for \(\lambda_a\) becomes

\[\lambda_a \ge \frac{1}{\Norm{\epsilon^{(i)}(t)}_{\infty}} \left[ \sum_{k} \Norm{\ket{\chi_k(t)}}_{\infty} \right] \Norm{\frac{\partial \Op{H}}{\partial \epsilon}}_{\infty}\,.\]

From a practical point of view, the best strategy is to start the optimization with a comparatively large value of \(\lambda_a\), and after a few iterations lower \(\lambda_a\) as far as possible without introducing numerical instabilities. The value of \(\lambda_a\) may be adjusted dynamically with the rate of convergence. Generally, the optimal choice of \(\lambda_a\) requires some trial and error.

Rotating wave approximation¶

When using the rotating wave approximation (RWA), it is important to remember that the target transformation \(\Op{O}\) is usually defined in the lab frame, not in the rotating frame. This is relevant for the construction of \(\ket{\chi_k(T)}\). The easiest approach is to transform the result of the forward propagation \(\ket{\phi_k(T)}\) from the rotating frame to the lab frame, then construct \(\ket{\chi_k(T)}\) for the next OCT iteration, and transform \(\ket{\chi_k(T)}\) back to the rotating frame, before starting the backward-propagation for the next OCT iteration. When the RWA is used, the control fields are complex-valued. In this case, the Krotov update equation is valid for both the real and the imaginary part independently. The most straightforward implementation of the method is for real controls only, requiring that any complex control Hamiltonian is rewritten as two independent control Hamiltonians, one for the real part and one for the imaginary part of the control field. For example,

\[\epsilon^*(t) \Op{a} + \epsilon(t) \Op{a}^\dagger = \epsilon_{\text{re}}(t) (\Op{a} + \Op{a}^\dagger) + \epsilon_{\text{im}}(t) (i \Op{a}^\dagger - i \Op{a})\]

with two independent control fields \(\epsilon_{\text{re}}(t)= \Re[\epsilon(t)]\) and \(\epsilon_{\text{im}}(t) = \Im[\epsilon(t)]\).

Optimization in Liouville space¶

The control equations have been written in the notation of Hilbert space. However, they are equally valid for a gate optimization in Liouville space, by replacing states with density matrices, \(\Op{H}\) with \(\Liouville\), and inner products with Hilbert-Schmidt products, see, e.g., Ref. [ReichJCP12].