# Krotov’s Method¶

## The quantum control problem¶

Quantum optimal control methods formalize the problem of finding “control fields” that steer the time evolution of a quantum system in some desired way. For closed systems, described by a Hilbert space state \(\ket{\Psi(t)}\), this time evolution is given by the Schrödinger equation,

where the Hamiltonian \(\Op{H}(t)\) depends on one or more control fields \(\{\epsilon_l(t)\}\). We often assume the Hamiltonian to be linear in the controls,

but non-linear couplings may also occur, for example when considering non-resonant multi-photon transitions. For open quantum systems described by a density matrix \(\hat{\rho}(t)\), the Liouville-von-Neumann equation

replaces the Schrödinger equation, with the (non-Hermitian) Liouvillian \(\Liouville(t)\). The most direct example of a control problem is a state-to-state transition. The objective is for a known quantum state \(\ket{\phi}\) at time zero to evolve to a specific target state \(\ket{\phi^\tgt}\) at final time \(T\), controlling, e.g. a chemical reaction [37]. Another example is the realization of quantum gates, the building blocks of a quantum computer. In this case, the states forming a computational basis must transform according to a unitary transformation [2], see How to optimize towards a quantum gate. Thus, the control problem involves not just the time evolution of a single state, but a set of states \(\{\ket{\phi_k(t)}\}\). Generalizing even further, each state \(\ket{\phi_k(t)}\) in the control problem may evolve under a different Hamiltonian \(\Op{H}_k(\{\epsilon_l(t)\})\), see How to optimize for robust pulses.

Physically, the control fields \(\{\epsilon_l(t)\}\) might be the amplitudes of a laser pulse for the control of molecular systems or trapped atom/ion quantum computers, radio-frequency fields for nuclear magnetic resonance, or microwave fields for superconducting circuits. When there are multiple independent controls \(\{\epsilon_l(t)\}\) involved in the dynamics, these may correspond e.g., to different color lasers used in the excitation of a Rydberg atom, or different polarization components of an electric field.

The quantum control methods build on a rich field of classical control theory [38][39]. This includes Krotov’s method [40][41][42][43], which was originally formulated to optimize the soft landing of a spacecraft from orbit to the surface of a planet, before being applied to quantum mechanical problems [5][44][45][46][22]. Fundamentally, they rely on the variational principle, that is, the minimization of a functional \(J[\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}]\) that includes any required constraints via Lagrange multipliers. The condition for minimizing \(J\) is then \(\nabla_{\phi_k, \epsilon_l} J = 0\). In rare cases, the variational calculus can be solved in closed form, based on Pontryagin’s maximum principle [39]. Numerical methods are required in any other case. These start from an initial guess control (or set of guess controls, if there are multiple controls), and calculate an update to these controls that will decrease the value of the functional. The updated controls then become the guess for the next iteration of the algorithm, until the value of the functional is sufficiently small, or convergence is reached.

## Optimization functional¶

Mathematically, Krotov’s method, when applied to quantum systems [5][22], minimizes a functional of the most general form

where the \(\{\ket{\phi_k^{(i)}(T)}\}\) are the time-evolved initial states \(\{\ket{\phi_k}\}\) under the controls \(\{\epsilon^{(i)}_l(t)\}\) of the \(i\)’th iteration. In the simplest case of a single state-to-state transition, the index \(k\) vanishes. For the example of a two-qubit quantum gate, \(\{\ket{\phi_k}\}\) would be the logical basis states \(\ket{00}\), \(\ket{01}\), \(\ket{10}\), and \(\ket{11}\), all evolving under the same Hamiltonian \(\Op{H}_k \equiv \Op{H}\). The sum over \(l\) vanishes if there is only a single control. For open system dynamics, the states \(\{\ket{\phi_k}\}\) may be density matrices.

The functional consists of three parts:

A final time functional \(J_T\). This is the “main” part of the functional, and we can usually think of \(J\) as being an auxiliary functional in the optimization of \(J_T\).

A running cost on the control fields, \(g_a\). The most commonly used expression (and the only one currently supported by the

`krotov`

package) is [47](2)¶\[\begin{split}\begin{split} g_a(\epsilon_l^{(i)}(t)) &= \frac{\lambda_{a,l}}{S_l(t)} \left( \epsilon_l^{(i)}(t) - \epsilon_{l, \text{ref}}^{(i)}(t) \right)^2\,; \quad \epsilon^{(i)}_{l, \text{ref}}(t) = \epsilon_l^{(i-1)}(t)\\ &= \frac{\lambda_{a,l}}{S_l(t)} \left( \Delta\epsilon_l^{(i)}(t) \right)^2 \,, \end{split}\end{split}\]with the inverse “step width” \(\lambda_{a,l} > 0\), the “update shape” function \(S_{l}(t) \in [0, 1]\), and the Iterative control update

(3)¶\[\Delta\epsilon_l^{(i)}(t) \equiv \epsilon_l^{(i)}(t) - \epsilon_l^{(i-1)}(t)\,,\]where \(\epsilon_l^{(i-1)}(t)\) is the optimized control of the previous iteration – that is, the guess control of the current iteration \((i)\).

An optional state-dependent running cost, \(g_b\). This may be used to encode time-dependent control targets [48][49], or to penalize population in a subspace [50]. The presence of a state-dependent constraint in the functional entails an inhomogeneous term in the backward propagation in the calculation of the control updates in each iteration of Krotov’s method, see Eq. (14), and is currently not supported by the

`krotov`

package. Penalizing population in a subspace can also be achieved through simpler methods that do not require a \(g_b\), e.g., by using a non-Hermitian Hamiltonian to remove population from the forbidden subspace during the time evolution.

The most commonly used final-time functionals (cf. `krotov.functionals`

)
optimize for a set of initial states \(\{\ket{\phi_k}\}\) to evolve to a
set of target states \(\{\ket{\phi_k^\tgt}\}\). The functionals can then
be expressed in terms of the complex overlaps of the target states with the
final-time states under the given control. Thus,

in Hilbert space, or

in Liouville space.

The following functionals \(J_T\) can be formed from these complex overlaps, taking into account that any optimization functional \(J_T\) must be real. They differ by the way they treat the phases \(\varphi_k\) in the physical optimization goal \(\ket{\phi_k(T)} \overset{!}{=} e^{i\varphi_k}\ket{\phi_k^{\tgt}}\) [47]:

Optimize for simultaneous state-to-state transitions, with completely arbitrary phases \(\varphi_k\),

(5)¶\[J_{T,\text{ss}} = 1- \frac{1}{N} \sum_{k=1}^{N} \Abs{\tau_k}^2\,,\]cf.

`J_T_ss()`

.Optimize for simultaneous state-to-state transitions, with an arbitrary

*global*phase, i.e., \(\varphi_k = \varphi_{\text{global}}\) for all \(k\) with arbitrary \(\varphi_{\text{global}}\),(6)¶\[J_{T,\text{sm}} = 1- \frac{1}{N^2} \Abs{\sum_{k=1}^{N} \tau_k}^2 = 1- \frac{1}{N^2} \sum_{k=1}^{N} \sum_{k'=1}^{N} \tau_{k'}^* \tau_{k}\,,\]cf.

`J_T_sm()`

.Optimize for simultaneous state-to-state transitions, with a global phase of zero, i.e., \(\varphi_k = 0\) for all \(k\),

(7)¶\[J_{T,\text{re}} = 1-\frac{1}{N} \Re \left[\, \sum_{k=1}^{N} \tau_k \,\right]\,,\]cf.

`J_T_re()`

.

## Iterative control update¶

Starting from the initial guess control \(\epsilon_l^{(0)}(t)\), the optimized field \(\epsilon_l^{(i)}(t)\) in iteration \(i > 0\) is the result of applying a control update,

Krotov’s method is a clever construction of a particular \(\Delta\epsilon_l^{(i)}(t)\) that ensures

Krotov’s solution for \(\Delta\epsilon_l^{(i)}(t)\) is given in below (First order update and Second order update). As shown there, for the specific running cost of Eq. (2), using the guess control field \(\epsilon_l^{(i-1)}(t)\) as the “reference” field, the update \(\Delta\epsilon^{(i)}_l(t)\) is proportional to \(\frac{S_l(t)}{\lambda_{a,l}}\). Note that this also makes \(g_a\) proportional to \(\frac{S_l(t)}{\lambda_{a,l}}\), so that Eq. (2) is still well-defined for \(S_l(t) = 0\). The (inverse) Krotov step width \(\lambda_{a,l}\) can be used to determine the overall magnitude of \(\Delta\epsilon^{(i)}_l(t)\). Values that are too large will change \(\epsilon_l^{(i)}(t)\) by only a small amount in every iteration, causing slow convergence. Values that are too small will result in numerical instability, see Time discretization and Choice of λₐ. The “update shape” function \(S_l(t)\) allows to ensure boundary conditions on \(\epsilon^{(i)}_l(t)\): If both the guess field \(\epsilon^{(i-1)}_l(t)\) and \(S_l(t)\) switch on and off smoothly around \(t=0\) and \(t=T\), then this feature will be preserved by the optimization. A typical example for an update shape is

(9)¶\[\begin{split}S_l(t) = \begin{cases} B(t; t_0=0, t_1=2 t_{\text{on}}) & \text{for} \quad 0 < t < t_{\text{on}} \\ 1 & \text{for} \quad t_{\text{on}} \le t \le T - t_{\text{off}} \\ B(t; t_0=T-2 t_{\text{off}}, t_1=T) & \text{for} \quad T - t_{\text{off}} < t < T\,, \end{cases}\end{split}\]cf.

`krotov.shapes.flattop()`

, with the Blackman shape(10)¶\[B(t; t_0, t_1) = \frac{1}{2}\left( 1 - a - \cos\left(2\pi \frac{t - t_0}{t_1 - t_0}\right) + a \cos\left(4\pi \frac{t - t_0}{t_1 - t_0}\right) \right)\,,\quad a = 0.16\,,\]

which is similar to a Gaussian, but exactly zero at
\(t = t_0, t_1\). This is essential to maintain the typical boundary
condition of zero amplitude at the beginning and end of the optimized
control field. Generally, *any* part of the control field can be kept
unchanged in the optimization by choosing \(S_l(t) = 0\) for the
corresponding intervals of the time grid.

Note

In the remainder of this chapter, we review some of the mathematical details
of how Krotov’s method calculates the update in Eqs. (3), (8).
These details are not necessary to *use* the `krotov`

package as a
“black box” optimization tool, so you may skip ahead to
Using Krotov with QuTiP and come back at a later time.

## First order update¶

Krotov’s method is based on a rigorous examination of the conditions for
calculating the updated fields \(\{\epsilon_l^{(i)}(t)\}\) such that
\(J(\{\ket{\phi_k^{(i)}(t)}\}, \{\epsilon_l^{(i)}(t)\}) \leq
J(\{\ket{\phi_k^{(i-1)}(t)}\}, \{\epsilon_l^{(i-1)}(t)\})\) is true *by
construction* [42][43][47][46][22].
For a general functional of the form in
Eq. (1), with a convex final-time
functional \(J_T\), the condition for monotonic convergence is

see Ref. [47]. The notation for the derivative on the right hand side being evaluated at \({(i)}\) should be understood to apply when the control Hamiltonian is not linear so that \(\frac{\partial \Op{H}}{\partial \epsilon_l(t)}\) is still time-dependent; the derivative must then be evaluated for \(\epsilon^{(i)}_l(t)\) – or, numerically, for \(\epsilon^{(i-1)}_l(t) \approx \epsilon^{(i)}_l(t)\). If there are multiple controls, Eq. (11) holds for every control field \(\epsilon_l(t)\) independently.

For \(g_a\) as in Eq. (2), this results in an
*update*
equation [5][47][46],

with the equation of motion for the forward propagation of \(\ket{\phi_k^{(i)}}\) under the optimized controls \(\{\epsilon_l^{(i)}(t)\}\) of the iteration \((i)\),

The co-states \(\ket{\chi_k^{(i-1)}(t)}\) are propagated backwards in time under the guess controls of iteration \((i)\), i.e., the optimized controls from the previous iteration \((i-1)\), as

with the boundary condition

where the right-hand-side is evaluated for the set of states \(\{\ket{\phi_k^{(i-1)}(T)}\}\) resulting from the forward-propagation of the initial states under the guess controls of iteration \((i)\) – that is, the optimized controls of the previous iteration \((i-1)\).

For example, for the functional \(J_{T,\text{ss}}\) in Eq. (5) for a single state-to-state transition (\(N=1\)),

cf. `krotov.functionals.chis_ss()`

and the `krotov.functionals`

module
in general.

## Second order update¶

The update Eq. (12) assumes that the equation of motion is linear (\(\Op{H}\) does not depend on the states \(\ket{\phi_k(t)}\)), the functional \(J_T\) is convex, and no state-dependent constraints are used (\(g_b\equiv 0\)). When any of these conditions are not fulfilled, it is still possible to derive an optimization algorithm with monotonic convergence via a “second order” term in Eqs. (11), (12) [43][22],

The full update equation then reads

with

see Ref. [22] for the full construction of the second-order condition. In Eq. (16), \(\sigma(t)\) is a scalar function that must be properly chosen to ensure monotonic convergence.

In Refs. [28][29], a non-convex final-time functional for the optimization towards an arbitrary perfect entangler is considered. For this specific example, a suitable choice is

where \(\varepsilon_A\) is a small non-negative number. The optimal value for \(A\) in each iteration can be approximated numerically as [22]

cf. `krotov.second_order.numerical_estimate_A()`

, with

with

See the Optimization towards a Perfect Entangler for an example.

Note

Even when the second order update equation is mathematically required to guarantee monotonic convergence, very often an optimization with the first-order update equation (12) will give converging results. Since the second order update requires more numerical resources (calculation and storage of the states \(\ket{\Delta\phi_k(t)}\)), you should always try the optimization with the first-order update equation first.

## Time discretization¶

The derivation of Krotov’s method assumes time-continuous control fields. Only in this case, monotonic convergence is mathematically guaranteed. However, for practical numerical applications, we have to consider controls on a discrete time grid with \(N_T+1\) points running from \(t=t_0=0\) to \(t=t_{N_T}=T\), with a time step \(\dd t\). The states are defined on the points of the time grid, while the controls are assumed to be constant on the intervals of the time grid. See the notebook Time Discretization in Quantum Optimal Control for details.

The discretization yields the numerical scheme shown in Fig. 1 for a single control field (no index \(l\)), and assuming the first-order update is sufficient to guarantee monotonic convergence for the chosen functional. For simplicity, we also assume that the Hamiltonian is linear in the control, so that \(\partial \Op{H} / \partial \epsilon(t)\) is not time-dependent. The scheme proceeds as follows:

Construct the states \(\{\ket{\chi^{(i-1)}_k(T)}\}\) according to Eq. (15). For most functionals, specifically any that are more than linear in the overlaps \(\tau_k\) defined in Eq. (4), the states \(\{\ket{\chi^{(i-1)}_k(T)}\}\) depend on the states \(\{\ket{\phi^{(i-1)}_k(T)}\}\) forward-propagated under the optimized pulse from the previous iteration, that is, the guess pulse in the current iteration.

Perform a backward propagation using Eq. (14) as the equation of motion over the entire time grid. The resulting state at each point in the time grid must be stored in memory.

Starting from the known initial states \(\{\ket{\phi_k}\} = \{\ket{\phi_k(t=t_0=0)}\}\), calculate the pulse update for the first time step according to

(17)¶\[\Delta\epsilon^{(i)}_1 \equiv \Delta\epsilon^{(i)}(\tilde{t}_0) = \frac{S(\tilde{t}_0)}{\lambda_{a}} \Im \left[\, \sum_{k=1}^{N} \bigg\langle \chi_k^{(i-1)}(t_0) \bigg\vert \frac{\partial \Op{H}}{\partial \epsilon} \bigg\vert \phi_k(t_0) \bigg\rangle \right]\,.\]The value \(\Delta\epsilon^{(i)}_1\) is taken on the midpoint of the first time interval, \(\tilde{t}_0 \equiv t_0 + \dd t/2\), based on the assumption of a piecewise-constant control field and an equidistant time grid with spacing \(\dd t\).

Use the updated field \(\epsilon^{(i)}_1\) for the first interval to propagate \(\ket{\phi_k(t=t_0)}\) for a single time step to \(\ket{\phi_k^{(i)}(t=t_0 + \dd t)}\), with Eq. (13) as the equation of motion. The updates then proceed sequentially, using the discretized update equation

(18)¶\[\Delta\epsilon^{(i)}_{n+1} \equiv \Delta\epsilon^{(i)}(\tilde{t}_n) = \frac{S(\tilde{t}_n)}{\lambda_{a}} \Im \left[\, \sum_{k=1}^{N} \bigg\langle \chi_k^{(i-1)}(t_n) \bigg\vert \frac{\partial \Op{H}}{\partial \epsilon} \bigg\vert \phi_k^{(i)}(t_n) \bigg\rangle \right]\]with \(\tilde{t}_n \equiv t_n + \dd t / 2\) for each time interval \(n\), until the final forward-propagated state \(\ket{\phi^{(i)}_k(T)}\) is reached.

The updated control field becomes the guess control for the next iteration of the algorithm, starting again at step 1. The optimization continues until the value of the functional \(J_T\) falls below some predefined threshold, or convergence is reached, i.e., \(\Delta J_T\) approaches zero so that no further significant improvement of \(J_T\) is to be expected.

Eq. (12) re-emerges as the continuous limit of the time-discretized update equation (18), i.e., \(\dd t \rightarrow 0\) so that \(\tilde{t}_n \rightarrow t_n\). Note that Eq. (18) resolves the seeming contradiction in the time-continuous Eq. (12) that the calculation of \(\epsilon^{(i)}(t)\) requires knowledge of the states \(\ket{\phi_k^{(i)}(t)}\) which would have to be obtained from a propagation under \(\epsilon^{(i)}(t)\). By having the time argument \(\tilde{t}_n\) on the left-hand-side of Eq. (18), and \(t_n < \tilde{t}_n\) on the right-hand-side (with \(S(\tilde{t}_n)\) known at all times), the update for each interval only depends on “past” information.

For multiple objectives, the scheme can run in parallel, and each objective
contributes a term to the update. Summation of these terms yields the sum
in Eq. (12). See `krotov.parallelization`

for
details. For a second-order update, the forward propagated states from step 4,
both for the current iteration and the previous iteration, must be stored in
memory over the entire time grid.

## Pseudocode¶

A complete pseudocode for Krotov’s method as described in the previous section Time discretization is available in PDF format: krotov_pseudocode.pdf.

## Choice of λₐ¶

The monotonic convergence of Krotov’s method is only guaranteed in the continuous limit; a coarse time step must be compensated by larger values of the inverse step size \(\lambda_{a,l}\), slowing down convergence. Values that are too small will cause sharp spikes in the optimized control and numerical instabilities. A lower limit for \(\lambda_{a,l}\) can be determined from the requirement that the change \(\Delta\epsilon_l^{(i)}(t)\) should be at most of the same order of magnitude as the guess pulse \(\epsilon_l^{(i-1)}(t)\) for that iteration. The Cauchy-Schwarz inequality applied to the update equation (12) yields

where \(\norm{\partial \Op{H}/\partial \epsilon_l(t)}_{\infty}\) denotes the supremum norm of the operator \(\partial \Op{H}/\partial \epsilon_l\) obtained at time \(t\). Since \(S(t) \in [0,1]\) and \(\ket{\phi_k}\) are normalized, the condition for \(\lambda_{a,l}\) becomes

From a practical point of view, the best strategy is to start the
optimization with a comparatively large value of \(\lambda_{a,l}\),
and after a few iterations lower \(\lambda_{a,l}\) as far as
possible without introducing numerical instabilities. In principle, the value
of \(\lambda_{a,l}\) may be adjusted dynamically with respect to the
rate of convergence, via the modify_params_after_iter argument to
`optimize_pulses()`

. Generally, the ideal choice of
\(\lambda_{a,l}\) requires some trial and error, but once a suitable value
has been found, it does not have to be adjusted further. In particular, it is
not necessary to perform a line search over \(\lambda_{a,l}\).

## Complex controls and the RWA¶

When using the rotating wave approximation (RWA), it is important to remember that the target states are usually defined in the lab frame, not in the rotating frame. This is relevant for the construction of \(\ket{\chi_k(T)}\). When doing a simple optimization, such as a state-to-state or a gate optimization, the easiest approach is to transform the target states to the rotating frame before calculating \(\ket{\chi_k(T)}\). This is both straightforward and numerically efficient.

Another solution would be to transform the result of the forward propagation \(\ket{\phi_k(T)}\) from the rotating frame to the lab frame, then constructing \(\ket{\chi_k(T)}\), and finally to transform \(\ket{\chi_k(T)}\) back to the rotating frame, before starting the backward propagation.

When the RWA is used, the control fields are complex-valued. In this case the Krotov update equation is valid for both the real and the imaginary part independently. The most straightforward implementation of the method is for real controls only, requiring that any complex control Hamiltonian is rewritten as two independent control Hamiltonians, one for the real part and one for the imaginary part of the control field. For example,

with two independent control fields \(\epsilon_{\text{re}}(t)= \Re[\epsilon(t)]\) and \(\epsilon_{\text{im}}(t) = \Im[\epsilon(t)]\).

See the Optimization of a State-to-State Transfer in a Lambda System in the RWA for an example.

## Optimization in Liouville space¶

The coupled equations (12)–(14) can be
generalized to open system dynamics by replacing Hilbert space states with
density matrices, \(\Op{H}\) with \(\mathrm{i} \Liouville\), and brakets (inner products) with Hilbert-Schmidt products,
\(\langle \cdot \vert \cdot \rangle \rightarrow \langle\!\langle \cdot
\vert \cdot \rangle\!\rangle\). In full generality, \(\Op{H}\) in
Eq. (12) is the operator \(H\) on the right-hand
side of whatever the equation of motion for the forward propagation of the
states is, written in the form \(\mathrm{i} \hbar \dot\phi = H \phi\),
cf. Eq. (13). See `krotov.mu`

for details.

Note also that the backward propagation Eq. (14) uses the adjoint \(H\), which is relevant both for a dissipative Liouvillian [51][52][27] and a non-Hermitian Hamiltonian [25][53].

See the Optimization of Dissipative Qubit Reset for an example.