Explained: Gambler’s Ruin, Java, Python, C/C++, Java | Processing

(Exercise 9) For this week’s exercise I will provide some code below to help you reach Part II. Recall the Gambler’s Ruin problem on the integers $\{0, 1, 2, \dots, 10\}$ with bets at each stage of an amount $a \in \{1, 2, \dots, x\}$, and a fixed (independent) probability $p$ of winning each bet. A very useful way to formulate Gambler’s Ruin has already been seen in the notes:

• to have no rewards at all, because instead we . . .
• place a boundary condition on the value function, essentially enforcing that $V_n(x) = 1$ for any $x \ge 10$. (This is like receiving a reward of +1 after reaching an $x \ge 10$.)
• Note the reward is obtained after moving, not during the action, so step rewards are 0 and also independent of the action taken.

What we therefore do is as follows: let $V_T(x)$ represent the maximized reward over $T$ steps following an optimal betting strategy starting with wealth $x$. Then if during our $T$ steps we reach a state $\ge 10$ we will have collected a reward of +1; if we haven’t reached such a state (including all cases where we have gone bankrupt already) we have a reward of 0. As discussed in lectures, this approach will maximize the probability of reaching such a state $\ge 10$.

In part (h) of Exercise 9 you should use the following ‘shell’ for a function:

    findValueFunction <- function(steps, p) {
      # Insert your code here creating V
      # Insert your code here creating W, putting ...
      # in boundary values
      while (steps ...) {
        # Insert code here for putting in ...
        # boundary values of W
        # Insert loop code here for putting ...
        # values into W using Optimality ...
        # Equation, V, and also storing the ...
        # best choice a* for each state x
        V <- W
        ...
      }
      return(list(value = W, actions = ...))
    }

findValueFunction(100,0.4), for example, will now work out $V_{100}(x)$ when $p = 0.4$. Can you see why? It even tells you which action to optimally take initially.

(The idea here is that the code increments t from 1, with V always containing $V_{t-1}$ and being used (at each x) to work out $V_t$, which is then stored in W.
Then t is incremented, W is stored into V, and the process is repeated. The value of t, although not explicit, is always steps+1.)

Exercise 9 (Al-Khwarizmi). Part I: Solving Gambler’s Ruin.

Our value function $V_T(x)$ satisfies this optimality equation:

$$V_T(x) = \max_{1 \le a \le x} \left[ p\,V_{T-1}(x + a) + (1 - p)\,V_{T-1}(x - a) \right],$$

which holds for all $T \ge 1$ and $x \in \{1, 2, 3, \dots, 9\}$. We also have boundary conditions:

• For all $T \ge 0$: $V_T(0) = 0$ and $V_T(x) = 1$ for any $x \ge 10$;
• $V_0(x) = 0$ for $x \le 9$.

You will use this Optimality Equation (for each $x \in \{1, 2, \dots, 9\}$) to calculate $V_n$ from $V_{n-1}$ for each $n \ge 1$ until you have found $V_{1000}(x)$. (I suggest you fix $p = 0.4$ until Part II.) Parts (a)-(b)-(c) are intended to be very quick; the exercise really starts at (d).

(a) Why is $X = \{0, 1, 2, \dots, 18\}$ the full set of reachable states while following the game rules? For which values of $x \in X$ do you already explicitly know the values of $V_n(x)$, for every $n \ge 0$? (i.e. boundary conditions)
(b) Create a vector, V, of length 19 to hold the values of $V_0(\cdot)$. Note that in R the first element of a vector is called element 1, so V[i] will need to contain $V_0(i-1)$.
(c) Create another vector, W, this time holding the values of $V_1(\cdot)$. Only insert values from the boundary conditions. Values not known from the boundary conditions should be set equal to NA.
(d) Define a function of $x$, $a$, $p$ and $V_{T-1}$ which calculates the bracketed term on the right-hand side (RHS) of the optimality equation for the given values.
(e) Identify the valid choices $a \in A_x$. Construct a loop over all valid $(x, a)$ combinations (we only need $1 \le x \le 9$, and I suggest an outer loop over $x$ and an inner loop over $a$). Test it by printing all $(x, a)$ pairs.
(f) Inside the loop, add a step to call the function defined in (d) for each $a \in A_x$ in order to find the best choice of $a$ at each $x$. Then for each $x \in \{1, 2, \dots, 9\}$ store the best choice of $a$ in some vector you create.
(g) In your loop above, use the vector W to store the values of $V_T(x)$, calculated as the best value of the RHS of the optimality equation when given $x$ and $V_{T-1}$. (You can use V to store $V_{T-1}$, and W for $V_T$.)
(h) Copy your code above inside the function shell described in the accompanying purple block from the notes (see above) and ...

Part II: Use your value function function to answer questions... most importantly, comment on the answers!

• Find $V_1(x)$, $V_2(x)$ and $V_3(x)$ for $x \in \{1, 2, \dots, 9\}$ when $p = 0.6$. Also find the optimal initial actions.
• How does $V_{1000}(5)$ vary for $p \in \{\tfrac{1}{10}, \tfrac{1}{3}, \tfrac{1}{2}, \tfrac{3}{4}\}$?
• What is the initial optimal betting action for each state when $p = 0.4$ and $T = 20$?
• How do $V_{1000}(x)$ and $V_{999}(x)$ compare?
• Feel free to answer other questions you pose yourself too!

Reposted from: http://www.daixie0.com/contents/3/4508.html
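Parts (b) to (h) of the exercise can be sketched roughly as follows. This is a minimal sketch in Python rather than the R the exercise asks for, so indexing is 0-based and v[x] holds V(x) directly (no i-1 shift). The names rhs, find_value_function, GOAL and MAX_STATE are illustrative and do not come from the notes, and breaking ties in favour of the smallest bet is an assumption, since the exercise does not say which of several equally good bets to report.

```python
from typing import List, Optional, Tuple

GOAL = 10        # reaching wealth >= GOAL is worth +1 (boundary condition)
MAX_STATE = 18   # largest reachable wealth: hold 9 and win a bet of 9


def rhs(x: int, a: int, p: float, v_prev: List[float]) -> float:
    """Bracketed term of the optimality equation for wealth x and bet a."""
    return p * v_prev[x + a] + (1 - p) * v_prev[x - a]


def find_value_function(
    steps: int, p: float
) -> Tuple[List[float], List[Optional[int]]]:
    """Return (V_steps, best first bet for each x) by iterating the equation.

    Hypothetical Python counterpart of the findValueFunction shell.
    """
    # V_0: 0 for x <= 9, 1 for x >= 10.
    v = [1.0 if x >= GOAL else 0.0 for x in range(MAX_STATE + 1)]
    # actions[x] is the best initial bet for x = 1..9; index 0 is unused.
    actions: List[Optional[int]] = [None] * GOAL

    for _ in range(steps):
        # Copying v carries the boundary values (x = 0 and x >= 10) into w.
        w = v[:]
        for x in range(1, GOAL):
            best_a, best_val = 1, rhs(x, 1, p, v)
            for a in range(2, x + 1):
                val = rhs(x, a, p, v)
                if val > best_val:  # strict: ties keep the smallest bet
                    best_a, best_val = a, val
            w[x] = best_val
            actions[x] = best_a
        v = w  # w (V_t) becomes the "previous" value function for t + 1
    return v, actions
```

For example, `values, actions = find_value_function(1, 0.6)` gives a value of 0.6 for x = 5, ..., 9 and 0 for x = 1, ..., 4, because with a single step only one winning bet of at least 10 - x can reach the goal. Because values are compared in floating point, two bets that are equal in exact arithmetic may occasionally be ranked by rounding error, so the reported action is one optimal choice rather than the only one.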
