Can states be mutable? #527

kpa28-git · 2023-11-17T22:53:27Z

The reason I would want this is to have an MDP that transitions to a terminated state after a non-zero reward is observed, something like this:

mutable struct MyState
    ...
    done::Bool # initialized false
end

...

function POMDPs.transition(::MyMDP, s::MyState, a)
    ... # generate new state here using `(s, a)`
    MyState(..., s.done) # propagate terminal-ness from current to next
end

function POMDPs.reward(::MyMDP, s::MyState, a)
    s.done && return 0
    ...
    reward = ... # calculate `reward` for non-terminal states
    ...
    if reward != 0
        s.done = true
    end
    ...
    return reward
end
...

POMDPs.isterminal(::MyMDP, s::MyState) = s.done

I'm not sure this would work. I could imagine mutating the current state could cause problems for some or all solvers.

EDIT: I could also make the MyMDP object mutable instead of MyState. This could avoid state mutation issues, but I'm also not sure if this is fine to do in general.

Answered by zsunberg

zsunberg · 2023-11-17T23:20:15Z

Maintainer

You're correct! Mutating the state or the MDP object will break most solvers. Philosophically, POMDPs.jl assumes that states and problems are immutable. This has significant performance and clarity advantages.

You can still implement your desired behavior with an immutable state; you just have to decide whether the problem is done inside transition. If the reward function is not too expensive (and it usually isn't compared to the other things involved in (PO)MDP solving), you could just do this:

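function POMDPs.transition(m::MyMDP, s::MyState, a)
    ... # generate new state here using (s, a)
    done = reward(m, s, a) != 0 # terminal once a non-zero reward is observed
    return Deterministic(MyState(..., done)) # Deterministic comes from POMDPTools
end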

Also, transition should return a distribution over states rather than the state itself; see e.g. POMDPTools.Deterministic or POMDPTools.SparseCat.
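For concreteness, here is a minimal self-contained sketch of the pattern above. The problem itself (WalkMDP, WalkState, the goal field, and the reward rule) is a hypothetical toy invented for illustration, not something from the original question; only the POMDPs.jl functions being extended and POMDPTools.Deterministic are real. Carrying s.done forward in transition is an extra assumption, so that a done state stays done if transition is ever called on it.

```julia
using POMDPs
using POMDPTools: Deterministic

# Hypothetical toy problem: walk along integer positions; stepping onto `goal` pays 1.
struct WalkState        # plain `struct`, so instances are immutable
    pos::Int
    done::Bool
end

struct WalkMDP <: MDP{WalkState, Int}
    goal::Int
end

POMDPs.actions(::WalkMDP) = (-1, 1)
POMDPs.discount(::WalkMDP) = 0.95
POMDPs.initialstate(::WalkMDP) = Deterministic(WalkState(0, false))

# The reward depends only on (s, a) and mutates nothing.
function POMDPs.reward(m::WalkMDP, s::WalkState, a::Int)
    s.done && return 0.0
    return s.pos + a == m.goal ? 1.0 : 0.0
end

# The next state records whether this step produced a non-zero reward.
# `s.done ||` is an assumption: it keeps an already-done state done.
function POMDPs.transition(m::WalkMDP, s::WalkState, a::Int)
    done = s.done || POMDPs.reward(m, s, a) != 0
    return Deterministic(WalkState(s.pos + a, done))
end

POMDPs.isterminal(::WalkMDP, s::WalkState) = s.done

# Usage: two steps to the right from the initial state reach the goal.
m = WalkMDP(2)
s0 = rand(initialstate(m))
s1 = rand(transition(m, s0, 1))
s2 = rand(transition(m, s1, 1))
```

Because WalkState is an immutable struct of plain fields, two states with the same pos and done compare equal, which is one reason immutable states are convenient for solvers that store states in dictionaries or sets.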

2 replies

zsunberg Nov 17, 2023
Maintainer

If that's not satisfying, there are some other techniques I can explain as well.

kpa28-git Nov 17, 2023
Author

Ah, I don't know why I didn't think of just calling reward in transition! Thanks! I think that will work for my problem. And thanks for the pointer about transition return values.
