"Reducing state space exploration in reinforcement learning problems by rapid identification of initial solutions and progressive improvement of them" Abstract: Reinforcement learning is mainly used when an agent needs to explore a new environment and gradually find good solutions by trial-and-error. It is also typical that the "goodness" of an action taken by the agent is not known immediately, leading to the need to treat "delayed reward". It is often argued that reinforcement learning is the kind of learning that resembles learning by humans and animals the most. Maze route finding is a typical reinforcement learning problem, where the success of an action might not be known before the goal has been reached. Classical reinforcement learning is based on propagating this success backwards to all the action decisions having participated in finding the route, which leads to slow learning and necessitates numerous tries, often using Monte-Carlo simulation. The neural network that is presented uses a new network structure based on "short term memory" for treating the current problem to be solved and "long- term memory" for storing old solutions to the problem. The reinforcement learning used is of one-shot type and does not require defining real-valued reinforcement values, so despite its simplicity, it should be a good candidate for many practical problems. Remark: The presentation will include an on-line computer simulation. The emphasis of the presentation will be on the big principles, not on the theoretical (mathematical or neurological) proof.