5. On-line search and unknown environments

Definition

There are two kinds of search: offline search and online search.

Offline search computes a complete solution first, then commits it to the agent for execution.

Online search interleaves computation and action (search → execute → observe → search → ...).

An online search agent can build a map of the environment and find a goal state, provided there are no dead ends.

Online search is appropriate in the following situations:

1. Dynamic environments.

2. Time-constrained environments.

3. Unknown environments.

Formulation

Actions(s): returns the set of legal actions in state s.

The step cost function c(s, a, s′): returns the cost of going from s to s′ via action a. This cannot be used until the agent has tried a in s and knows that s′ is the outcome.

Goal-test(s): returns true if s is a goal state.
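The three-part formulation above can be expressed as a small interface. This is a minimal sketch: the class names and method names (OnlineSearchProblem, step_cost, TwoCellWorld) are illustrative, not from the notes.

```python
class OnlineSearchProblem:
    """Interface mirroring the formulation: Actions(s), c(s, a, s'), Goal-test(s)."""

    def actions(self, s):
        """Return the set of legal actions in state s."""
        raise NotImplementedError

    def step_cost(self, s, a, s2):
        """Cost of going from s to s2 via action a. An online agent may only
        call this after trying a in s and observing that s2 is the outcome."""
        raise NotImplementedError

    def goal_test(self, s):
        """Return True iff s is a goal state."""
        raise NotImplementedError


class TwoCellWorld(OnlineSearchProblem):
    """Trivial concrete example: states 0 and 1, goal is 1, one action 'go'."""

    def actions(self, s):
        return ["go"] if s == 0 else []

    def step_cost(self, s, a, s2):
        return 1

    def goal_test(self, s):
        return s == 1
```

The key constraint is in step_cost: an offline algorithm could query it freely, but an online agent learns costs only by acting.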

Requirement

1. Observability: the current percept identifies the current state.

2. The agent might have access to some (admissible) heuristic function h(s).

3. No dead ends: the goal must be reachable from every state; no algorithm can avoid dead ends in all environments.

4. The agent cannot determine Result(s, a) except by being in s, executing a, and observing the result.

Solution: LRTA* (Learning Real-Time A*)

1. Store a table result(s, a) recording the results of tried actions. When a is first tried in s, record the resulting state.

2. Store and update a table H(s) recording cost-to-goal estimates for visited states. When s is first met, initialise H(s) = h(s) for some admissible h(s). When s is left, update its cost estimate to H(s) = min_a C(s, a).

3. C(s, a) is the estimated cost of reaching the goal from state s via action a:

  C(s, a) = c(s, a, s′) + H(s′)   if result(s, a) is known to be s′
  C(s, a) = h(s)                  otherwise

4. Take the best local move (real-time): in state s, select the action a minimising the estimated cost C(s, a).

LRTA* updates heuristic information to escape local minima.
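The four steps above can be sketched as a single agent loop. This is a minimal illustration, assuming a tiny corridor world in which actions are moves of ±1 and every step costs 1; the names (CorridorProblem, lrta_star_run) and the specific world are my own, not from the notes.

```python
class CorridorProblem:
    """Corridor of cells 0..n-1; the goal is the rightmost cell."""

    def __init__(self, n):
        self.n = n

    def actions(self, s):
        return [a for a in (-1, +1) if 0 <= s + a < self.n]

    def step_cost(self, s, a, s2):
        return 1

    def goal_test(self, s):
        return s == self.n - 1

    def h(self, s):
        return self.n - 1 - s  # admissible: exact remaining distance


def lrta_cost(problem, s, a, s2, H):
    """C(s, a): h(s) if the outcome of a is unknown, else c(s, a, s') + H(s')."""
    if s2 is None:
        return problem.h(s)
    return problem.step_cost(s, a, s2) + H[s2]


def lrta_star_run(problem, start, max_steps=1000):
    result = {}                      # result[(s, a)] -> observed successor
    H = {}                           # learned cost-to-goal estimates
    s_prev, a_prev = None, None
    s = start
    path = [s]
    for _ in range(max_steps):
        if problem.goal_test(s):
            return path
        H.setdefault(s, problem.h(s))          # first visit: H(s) = h(s)
        if s_prev is not None:
            result[(s_prev, a_prev)] = s       # record the observed outcome
            H[s_prev] = min(                   # back up: H(s) = min_a C(s, a)
                lrta_cost(problem, s_prev, b, result.get((s_prev, b)), H)
                for b in problem.actions(s_prev))
        a = min(problem.actions(s),            # best local move by C(s, a)
                key=lambda b: lrta_cost(problem, s, b, result.get((s, b)), H))
        s_prev, a_prev = s, a
        s = s + a                              # execute and observe
        path.append(s)
    return path
```

Note how untried actions are optimistic (cost h(s)), which drives exploration, while the H updates on departure are what let the agent climb back out of local minima instead of oscillating forever.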
