Okay let’s get started with our discussion on dynamic optimization dynamic optimization problem so the state is the state space equation for this system is given by XT plus 1 equals to F T the function of the current state and the action that is taken so this is the state this is state this is action and F T is.
Known to is called the state transition function state transition function ok we have a performance index a.
Cost function cost function J of U 0 so T goes from 0 to capital T minus 1 so u 0 to UT minus 1 G capital T XT plus summation T equals 0 to t minus 1 G small T XD u T so this is known as terminal cost this is known as running cost okay this is.
All the recap of what we did in the previous class and then there are two things to notions of optimality that we can talk about so the first is we want optimal actions you zero-star you capital T minus one star and the other one is optimal policy which.
Is let’s say gamma T star that map’s XT to duty okay and we are currently talking about computing the optimal actions and we will be talking about maximum principal today maximum principle or minimum principle today so that will give us optimal actions and we will talk about optimal policy when we talk about dynamic programming perhaps in the next class okay all right so one.
Of the simplifications we made yesterday we were considering the case when G sub T is equal to zero so.
We only have a terminal cost J of you I’m just going to you write this.
Collection as a vector u so U is equal to G capital T X capital T okay so we only have a terminal cost we don’t have a running cost in this particular system or in this particular case and this other thing we talked about was that XT can be written as a function of U but VT only uses.
Uses only you not to UT minus one okay any questions so far this is all a recap of the previous class yes actions of the current state and the current time step is a function of the actions yes until that time step yeah yeah everything that happened before so it it also is a function of X naught which is the.
Initial State but that is that initial state is absorbed in this function VT because that doesn’t change from time to time initial status initial State that’s little t minus 1 yeah even though I’m writing it as a function of the whole set it’s just a function of only the first T terms small T terms okay all right.
So based on this this equality I can actually eliminate all X T from the.
Objective function and I can write J of U as G of T V.
Of T of U right and the goal is to minimize this over let me just write it as u bar I mean u vector in whatever space this u vector sits in okay Susan costs just choosing actions to minimize cause or minimize cost specifically it and yeah yes yes assume that it does I mean throughout this course we have always.
Assumed that the main exists okay any other question yes same value of money right oh yeah that would be part.
Of the running cost okay all right so this is the formulation so what’s the what’s a.
Necessary condition for optimality so I started with a problem that was evolving over a state space I had a bunch of actions that I need to take over time and then what I did was I suppressed the entire so by by by substituting variables I basically suppress the dynamic nature of the problem and I got a static optimization problem okay the static optimization problem is I want to minimize over all.
Actions this big cost okay so GT composition PT of this vector you okay so what would be a necessary condition for optimality any thoughts well not you someone else okay what would be.
A necessary condition for optimality so if you start is optimal then what come on someone should be able to give an answer this is the third month.
In optimization no one okay is that all there is.
A first order necessary condition and then.
There is a second order so yeah so the first order necessary condition is you star optimal implies what be equal to zero at you star right and the second order necessary condition for optimality is yeah well I wouldn’t say greater than zero I will say positive definite or positive semi-definite okay so I started with a dynamic problem I did the substitution now I came up with.
A static problem and then I can use all the results from for static optimization problem here – to conclude the first order necessary condition and second order necessary condition for optimality okay so now what I’m going to do is this expression is not very informative okay it’s some long expression.
That we don’t quite know what it looks like so what I’m going to do is.
Try to figure out a recursive equation to be able to compute this expression the derivative of this function with respect to you okay so that’s the next goal the question is what is the expression of okay derivative.
With respect to you small T of this objective.
Function so let’s do it for time for let let’s do it for time T minus one first so UT minus one of GT XD that’s given by UT minus one of GT of FD minus.
1 XT minus 1 UT minus 1 ok so this is what I have now here is a formula that we did at the very beginning of this class of this course so I just want to recall it so p1 composition p2 is a function from RN to R then gradient of P one composition.
P2 at X multiplied by gradient of P one at p2 of X oh let me write it in a cleaner way so p1 of Y y equals to P 2 of X ok that’s chain rule let’s known as chain rule chain rule for differentiation okay so is.
This is this do every everyone recalls this expression okay we did that at the very beginning of this course yes yes so I’m going to apply the the chain rule that I’ve written there in red so.
What I have is gradient well and need more space so I’m going to write it here so I’ll have gradient of UT minus 1 ft minus 1 multiplied by gradient of XT of TT okay is that clear all right let’s do gradient of UT minus to GT of XD okay now one thing to note here is that XT minus one actually doesn’t depend on UT minus one okay so this is so the derivative with respect to XT minus.
Don’t need to take into that the dependence but here is another expression so I now I’m not taking derivative with respect to UT minus 2 so I have derivative.
Of UT minus 2 of GT of FD minus 1 of F T minus 2 okay so now I have like pretty long expression so what would be the derivative here so the derivative is great Yente of UT minus 2 ft minus 2.
X the derivative of XT minus 1 f e minus 1 multiplied by the derivative of G T with respect to X D ok so I.
At these expressions for some time okay so.
Let’s stare at this expression for sometime what do we observe we see a kind of a pattern here so the last term is always the gradient with respect to GT so gradient of G T with respect to X T and then the first term is always the.
Derivative of the action over which we are taking the derivative of the of the state-transition function in which that action appears that’s the same thing we see here so UT minus 1 so derivative of F T minus 2 with respect to UT minus 2 and then we have a term here that bridges this term with this term which is the derivative of F T minus 1 with respect to XT minus 1 so that this term appearing in the middle okay so first we see the derivative with respect to.
The innermost function in the last we have the derivative with respect to the outermost function and in the middle we have the derivative with respect to the function in the middle okay so this X this this expression would appear in all such derivatives in the same fashion so let me write it No okay so we get such a long expression or derivative with respect to UT you.
Can use induction to prove it or you can use repeated chain rule to prove the same result okay now.
Let me try to give let me try to introduce another variable that represents this big multiplication and I’m going to call it PT.
Plus one okay and this will be called co-state vector so what I get is I’m defining P capital T as gradient of G T I am defining P small T as gradient with respect to X T of ft.
X PT plus 1 and I have gradient of U T of G capital T X capital T is given by okay so once I fix X naught so X naught is something that’s given X naught is the initial state that’s.
Given then I come up with some sort of control actions u 1 u 2 u 3 u naught u 1 u 2 u 3 all the way up to u capital t minus 1 okay and that fixes the trajectory the trajectory is the.
Sequence of states that would appear in the system okay so that so once I once I.
Initial state is given and I pick a set of control actions I get the entire trajectory now I want to look at how what the derivative of the final.
Cost is with respect to the actions that I’ve picked up that I have picked at various points of time and we could see that this is what the derivative.
Looks like in the long expression and then we introduced a co-state vector in order to simplify the expression and this is what we get well so I’m assuming that you want to do the optimization of action so you initially just like in the case of gradient descent you pick the initial point randomly you are essentially picking the initial actions randomly initial actions so u naught.
U 1 u 2 all the way up to u capital t minus 1.
So when you are okay so here is how to think about it when you were running gradient descent you would not pick just X 1 which is the first coordinate of X you will pick all coordinates of X randomly so you can think of UT as a coordinate of this vector U and so.
You have to pick all the coordinates time steps right yes yes now the other question that you.
Do you pick you not well not you not but the initial vector you so sometimes you of course will have to go with random guesses but sometimes you might have some domain knowledge and you can use that to initialize your you so this.
Is particularly useful in space the trajectory optimization of rockets that go into space so of course people.
Who have been trained in astrophysics and who have done this several for several years they kind of know which trajectory to take and then they want to fine-tune the trajectory using this optimization so yeah in fact you know if you.
Want to get to Mars purely on the basis of this optimal control with random initialization it will probably take millions not million several hundreds of years for you to get to Mars but if you call an astrophysics person who has studied physics and studied celestial objects what he will suggest is here is what we need to do we need to put this spacecraft in orbit around the earth then we need to get it into the orbit around the moon and then what you get is a slingshot effect so the velocity of the vehicle.
Is equal to the velocity of.
The vehicle the initial velocity of the vehicle plus the velocity of the moon so you get the velocity of the moon as part of the velocity of the vehicle so you get an initial boost without actually.
Doing spending fuel and then you go to Venus.
Okay and that’s like in the opposite direction okay so you go to Venus.
And then you get the velocity of Venus for free and then you go to Mars okay and that’s why there is a specific window in which you need to send the spacecraft.
So you won’t be able to get that kind of solution by.
Using this optimization okay so domain expertise would be needed there yes I don’t think it is because of the moving target it’s because of the distance if you want to cover this much like whatever.
15 billion kilometres and in certain amount of time like seven years you need to have a lot of velocity and how do you get that velocity it’s through using the velocity of.
Existing celestial objects in the space yes that’s exactly how the rover got.
There you know the Pathfinder and all that Mars rovers okay so anyway so this is what the derivative of GT with respect to UT looks like okay now now you will notice that the computation of the state happens in forward time the computation of co-state vector happens in back with using backward going backward in time so you start with X naught then you have this.
Vector u you feed that into your algorithm you compute X 1 all the way up to X T then you feed that into this recursion and you get PT PT minus 1 all the way up to t1 right so you you you find the co state vectors backward in time and then you get the derivative gradient of U T of J as gradient of F T evaluated at X T and u T multiplied by P T plus 1 okay okay so you compute the forward you compute using the.
Initial state and the control actions you’ve chosen you compute the states then you compute the co-state vector so you can’t truncate this at some X T and then start computing the co.
State vectors because you cannot you need the final co-state vector in order for this recursion to work then you compute the co state vectors and then you go ahead and compute the gradient of your cost function with respect to.
UT and then you can do gradient descent or whatever you want to do to optimize you use this.
Recursion to compute X 1 and then X 2 and the next three it is part of the iterative process okay okay yes and so then you update you so let’s say u bar 0 and then u bar 1 equals 2 gradient of u bar of j.
Is an alpha term here so so then your gradient descent is going to look like u bar 0 or u bar 1 equals to u bar 0 minus alpha 0 gradient.
Of u bar J of u bar u bar 0 okay and you will keep doing this iteration again and again so again for this u 1 you will get the trajectory you will get the co-state vector compute the derivative and then go through the gradient descent step again you can pick alpha naught according to whatever is your favorite method okay yes what information.
Yes so initial position which is let’s say for the spacecraft it’s on earth and then you bar.
Naught is the initial yes initial action so how much thrust are you going to give to the vehicle at every point of time yeah okay and then you keep and then you have.
This dynamical equation and based on that you will compute.
All the all the state trajectory and then you will compute the co-state and then you will update but you still need this action yeah well when I speak when I pick this vector u0 its so okay so what.
So that is u u 0 K u 1 K u TK u capital t minus 1 K right so you have used so when I am writing this I’m specifying all the actions that I am taking all the way up to time T minus 1 okay now in the context of training neural network.
Which is also a terminal cost problem this algorithm is known as back propagation algorithm okay so this is the derivation of back propagation algorithm that is used for training neural networks can you speak a little louder yes this this one yeah it does it so it is certainly a function.
Of XT minus 1 and XT minus 2 and all that yes yes yes well so once you go through this iteration what you get is U naught star u 1 star u 2 star all the way up to UT star UT minus 1 star right so it doesn’t have any state dependence so what I’m telling you is no matter what happens in the world if.
You start with this X not just blindly take u 1 star u 2 star you 3 star all the way up to u capital T minus 1 star and you will have the minimum cost ok so it is completely blind I I don’t even have to observe the state I just have to keep applying this action and I will be optimal because it’s a deterministic system so it doesn’t.
Really matter if you observe the state or not okay so strictly.
Speaking observing the state and using the state to act requires you to know the policy.
Optimal policy and that’s useful in stochastic system but may not be as useful in deterministic system but then 99.99% of the systems in the world of stochastic systems so you should observe your state yes Oh non-trivial Oh okay so a stochastic system in which.
The uncertainty is known for the next five seconds or five milliseconds or five microsecond becomes a deterministic system okay so when you are driving car even if you so you kind of know that people will not collide into you immediately right.
You will they will take like four or five seconds if everyone is far apart yes yes then the next instant becomes deterministic no this one no this is just a.
Specific yes yes so your gradient of U of J of U is gradient of U 0 of J of you gradient with respect to u 1 of T of you let’s say in the state transition transition about you thank you.
– Carol you this one yes yes so FD of X D nu T okay yeah yes if the state transition if we have to calculate the whole trajectory right yes but the state transition system depends upon X and yes we only have you not so that’s what I.
Said this is U vector not which means you have u 0k u1 k u TK and you all the way to capital T – the actions for the entire duration yes yes that’s right that’s right okay yes you mean you mean this model was not correct and then you had to add.
Another correction term to the model yes then you have to go through the entire thing all over.
Again okay if the correction is substantial then too bad okay you should have done your modeling exercise better all right so that’s the terminal cost problem now we want to solve the problem with running cost okay and how do we do that so.
So now my J is G capital t plus summation of GT T equals.
0 to capital t minus 1 and I want to find an analogous expression for this particular case when we have running cost so what’s the idea well I’m going to add another state which keeps track of the running cost so YT plus GT XD u T ok.
And y 0 is equal to 0 okay what’s the updated cost function for this particular system so my J let me call it tilde of U is equal to G capital T XT plus y capital T so by carefully augmenting this state I have transformed this problem.
Dynamic problem with running cost into a problem with terminal cost okay so I have added a new state scalar state YT and then it becomes the terminal cost problem so I can apply this result there so let me rewrite this entire set of expressions so X tilde.
Of T plus 1 equals XT plus 1 YT plus 1 equals F tilde of X del dot E and and this is given by f t XD UT + YT + GP X T UT and I’m going to define my co-state vector P tilde T plus 1.
Or patella T equals PT + ZT so so this.
Is the co-state vector corresponding to X.
Would be the co-state vector corresponding to YT okay we’ll find out what the values look like in a bit I am going to erase so what all things I do I need in order to compute you know I don’t quite.
Question so your question is I have all mented the state I have added the running cost does that affect the gradient descent algorithm steps written the first time anything that gets entangled with the Z T terms for what.
We had generated in the previous problem is is right when we’re generating the gradient.