I am working on a side project that models the inverted pendulum problem and solves it with a reinforcement learning algorithm, specifically Q-learning. I have already built a simple MDP solver for a grid world, which was the easy part.
However, after days of scouring research papers, I am still struggling to work out how to set this problem up. Nothing I have found explains how to build a framework for representing it.
When modelling the problem, can a standard Markov Decision Process be used? Or must it be a POMDP?
What is represented in each state (i.e. what state information is passed to the agent)? The position, velocity, and angle of the pendulum, etc.?
What actions can the agent take? Is it a continuous range of velocities in the +x or -x direction?
Advice on this is greatly appreciated.