1918f160796525d2b5f241f81d3224a941418935,agents/ppo_agent.py,PPOAgent,update_kl_coefficient,#PPOAgent#,205

Before Change


            self.tp.sess.run(self.increase_kl_coefficient, feed_dict={self.kl_coefficient: kl_coefficient})
        elif self.total_kl_divergence_during_training_process < 0.7 * kl_target:
            # kl too low => decrease regularization
            self.tp.sess.run(self.decrease_kl_coefficient, feed_dict={self.kl_coefficient: kl_coefficient})
        screen.log_title("KL penalty coefficient change = {} -> {}".format(
            kl_coefficient, self.tp.sess.run(self.policy_network.online_network.output_heads[0].kl_coefficient)))

    def post_training_commands(self):
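The Before snippet adapts the KL penalty coefficient: when the measured KL divergence overshoots the target the penalty is strengthened, and when it falls below 0.7 * kl_target it is weakened. The following is a minimal, framework-free sketch of that rule. Only the 0.7 lower threshold comes from the snippet. The upper threshold (1.3 * kl_target), the multiplicative factor (1.5), and the function name `adapt_kl_coefficient` are hypothetical assumptions, not coach's actual values or API. Because the new value is computed in plain Python, it can be logged directly without re-reading it from the graph, which appears to be the direction of the After Change.

```python
def adapt_kl_coefficient(kl_coefficient, measured_kl, kl_target,
                         high_ratio=1.3, low_ratio=0.7, factor=1.5):
    """Return an adjusted KL penalty coefficient.

    Hypothetical sketch: high_ratio and factor are assumed values;
    only low_ratio=0.7 mirrors the Before snippet.
    """
    if measured_kl > high_ratio * kl_target:
        # KL too high => increase regularization
        return kl_coefficient * factor
    if measured_kl < low_ratio * kl_target:
        # KL too low => decrease regularization
        return kl_coefficient / factor
    # KL within the tolerated band => keep the coefficient unchanged
    return kl_coefficient


old_coefficient = 1.0
new_coefficient = adapt_kl_coefficient(old_coefficient, measured_kl=0.005, kl_target=0.01)
print("KL penalty coefficient change = {} -> {}".format(old_coefficient, new_coefficient))
```

Keeping the band between the two thresholds avoids adjusting the coefficient on every update when the KL divergence is already close to the target.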

After Change


                new_kl_coefficient,
                self.policy_network.online_network.output_heads[0].kl_coefficient_ph)

        screen.log_title("KL penalty coefficient change = {} -> {}".format(kl_coefficient, new_kl_coefficient))

    def post_training_commands(self):
        if self.tp.agent.use_kl_regularization:
In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 3

Instances


Project Name: NervanaSystems/coach
Commit Name: 1918f160796525d2b5f241f81d3224a941418935
Time: 2017-10-31
Author: itai.caspi@intel.com
File Name: agents/ppo_agent.py
Class Name: PPOAgent
Method Name: update_kl_coefficient


Project Name: NervanaSystems/coach
Commit Name: 5fadb9c18e3de16cc5633175199f9e9e2c381102
Time: 2018-11-07
Author: sina.beh@gmail.com
File Name: rl_coach/graph_managers/graph_manager.py
Class Name: GraphManager
Method Name: restore_checkpoint


Project Name: ray-project/ray
Commit Name: 57544b1ff9f97d4da9f64d25c8ea5a3d8d247ffc
Time: 2020-05-11
Author: sven@anyscale.io
File Name: rllib/examples/rock_paper_scissors_multiagent.py
Class Name:
Method Name: run_heuristic_vs_learned