This study refines a methodology for solving stochastic optimal control problems whose quality criterion is the sum of a classical quality functional and an extremal measure. A two-level optimization approach to such problems has already been presented for the case in which the quality functional consists of the extremal measure alone. We show that the original time-inconsistent problem can be solved by reducing it to a two-level optimization problem: the outer problem is solved by gradient methods, since its value function is convex, and the inner problem is solved by classical methods. Experiments were carried out and confirm the validity of the theory. The results can be applied to portfolio management with quality criteria that include the Conditional Value-at-Risk (CVaR) metric. © 2024 by the authors.
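As a small illustration of why an outer problem of this kind is amenable to first-order methods, the sketch below uses the well-known Rockafellar and Uryasev representation CVaR_α(L) = min over c of { c + E[(L − c)^+] / (1 − α) }, whose objective is convex in the scalar threshold c. This is an assumption for illustration, not the authors' formulation: the loss samples are fixed, so the "inner" step reduces to evaluating a sample expectation rather than solving a control problem, and the outer minimization uses bisection on the sign of a subgradient as a simple one-dimensional stand-in for the gradient methods mentioned above. The function names are hypothetical.

```python
import numpy as np


def ru_objective(c, losses, alpha):
    # Hypothetical helper: Rockafellar-Uryasev auxiliary function
    # F(c) = c + E[(L - c)^+] / (1 - alpha), estimated from loss samples.
    return c + np.mean(np.maximum(losses - c, 0.0)) / (1.0 - alpha)


def ru_subgradient(c, losses, alpha):
    # A subgradient of F at c: 1 - P(L > c) / (1 - alpha).
    # It is nondecreasing in c because F is convex.
    return 1.0 - np.mean(losses > c) / (1.0 - alpha)


def cvar_by_bisection(losses, alpha, tol=1e-10):
    # Outer problem (illustrative): minimize the convex F over c.
    # A minimizer lies in [min(L), max(L)], so bisect on the subgradient sign.
    losses = np.asarray(losses, dtype=float)
    lo, hi = float(np.min(losses)), float(np.max(losses))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ru_subgradient(mid, losses, alpha) < 0.0:
            lo = mid  # F still decreasing: minimizer is to the right
        else:
            hi = mid  # F flat or increasing: minimizer is at or to the left
    c_star = 0.5 * (lo + hi)
    # c_star approximates the alpha-quantile (VaR); F(c_star) approximates CVaR.
    return c_star, ru_objective(c_star, losses, alpha)


if __name__ == "__main__":
    sample_losses = np.arange(1.0, 11.0)  # losses 1, 2, ..., 10
    threshold, cvar = cvar_by_bisection(sample_losses, alpha=0.8)
    print(threshold, cvar)
```

For the losses 1, 2, ..., 10 with α = 0.8, the CVaR is the mean of the worst 20% of outcomes, (9 + 10) / 2 = 9.5, and the returned value should match it up to the bisection tolerance. Any threshold in [8, 9] attains this minimum because F is flat there, which is why the optimal c itself need not be unique.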