
Re: None

Wednesday, 01/18/2017 5:39:37 PM


Post# of 10460
Protocol program for preventing catastrophes in high-dimensional state spaces

We provide an informal overview of the protocol program for avoiding catastrophes. We focus on the differences between the high-dimensional case and the finite case described in Section 5.1. In the finite case, pruned actions are stored in a table. When the human is satisfied that all catastrophic actions are in the table, the human’s monitoring of the agent can be fully automated by the protocol program. The human may need to be in the loop until the agent has attempted each catastrophic action once – after that the human can “retire”.

In the infinite case, we replace this look-up table with a supervised classification algorithm. All visited state-actions are stored and labeled (“catastrophic” or “not catastrophic”) based on whether the human decides to block them. Once this labeled set is large enough to serve as a training set, the human trains the classifier and tests performance on held-out instances. If the classifier passes the test, the human can be replaced by the classifier. Otherwise the data-gathering process continues until the training set is large enough for the classifier to pass the test. If the class of catastrophic actions is learnable by the classifier, this protocol prevents all catastrophes and has minimal side-effects on the agent’s learning.

However, there are limitations of the protocol that will be the subject of future work:

• The human may need to monitor the agent for a very long time to provide sufficient training data. One possible remedy is for the human to augment the training set by adding synthetically generated states to it. For example, the human might add noise to genuine states without altering their labels. Alternatively, extra training data could be sampled from an accurate generative model.

• Some catastrophic outcomes have a “local” cause that is easy to block. If a car moves very slowly, then it can avoid hitting an obstacle by braking at the last second. But if a car has lots of momentum, it cannot be slowed down quickly enough. In such cases a human in-the-loop would have to recognize the danger some time before the actual catastrophe would take place.

• To prevent catastrophes from ever taking place, the classifier needs to correctly identify every catastrophic action. This requires strong guarantees about the generalization performance of the classifier. Yet the distribution on state-action instances is non-stationary (violating the usual i.i.d. assumption).
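
For anyone who wants to see the loop spelled out, here is a rough Python sketch of the infinite-case protocol as the excerpt describes it: collect human-labeled state-actions, optionally augment them with noisy copies, train a classifier, and let it replace the human only if it passes a held-out test. This is not the paper's code. The car-like features, the `human_label` rule, the 1-nearest-neighbour classifier, the 0.95 accuracy bar and all function names are assumptions made up for illustration.

```python
import random

CATASTROPHIC, SAFE = 1, 0

def human_label(features):
    """Hypothetical overseer: blocks accelerating when the obstacle is too close."""
    distance, speed, accelerate = features
    return CATASTROPHIC if accelerate and distance < 2.0 * speed else SAFE

def augment(dataset, rng, copies=2, scale=0.01):
    """Add noisy copies of genuine states without altering their labels."""
    extra = []
    for (distance, speed, accelerate), label in dataset:
        for _ in range(copies):
            noisy = (distance + rng.gauss(0, scale),
                     speed + rng.gauss(0, scale),
                     accelerate)
            extra.append((noisy, label))
    return dataset + extra

def nearest_neighbour(train):
    """Assumed classifier: predict the label of the closest training point."""
    def predict(x):
        _, label = min(
            train,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return label
    return predict

def held_out_accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)

def run_protocol(rng, batch=200, required_accuracy=0.95, max_rounds=10):
    """Gather labels until the classifier passes the held-out test.

    Returns (predict, accuracy, n_labeled); predict is None if the
    classifier never passed, i.e. the human has to stay in the loop.
    """
    labeled = []
    for _ in range(max_rounds):
        # Human-in-the-loop phase: every visited state-action gets a label.
        for _ in range(batch):
            x = (rng.uniform(0, 10), rng.uniform(0, 5), float(rng.random() < 0.5))
            labeled.append((x, human_label(x)))
        # Hold out the most recent 20% for testing, train on the rest.
        split = int(0.8 * len(labeled))
        train, test = labeled[:split], labeled[split:]
        predict = nearest_neighbour(augment(train, rng))
        accuracy = held_out_accuracy(predict, test)
        if accuracy >= required_accuracy:
            return predict, accuracy, len(labeled)
    return None, accuracy, len(labeled)

if __name__ == "__main__":
    predict, accuracy, n = run_protocol(random.Random(0))
    status = "classifier takes over" if predict else "human stays in the loop"
    print(f"{status} after {n} labels (held-out accuracy {accuracy:.2f})")
```

Note that held-out accuracy is a much weaker bar than the "identify every catastrophic action" requirement in the last bullet, and that noise added near the decision boundary can leave a copied label wrong, so treat this only as a picture of the control flow.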

https://arxiv.org/pdf/1701.04079v1.pdf
