1918f160796525d2b5f241f81d3224a941418935,agents/ppo_agent.py,PPOAgent,update_kl_coefficient,#PPOAgent#,205

Before Change


            self.tp.sess.run(self.increase_kl_coefficient, feed_dict={self.kl_coefficient: kl_coefficient})
        elif self.total_kl_divergence_during_training_process < 0.7 * kl_target:
            # kl too low => decrease regularization
            self.tp.sess.run(self.decrease_kl_coefficient, feed_dict={self.kl_coefficient: kl_coefficient})
        screen.log_title("KL penalty coefficient change = {} -> {}".format(
            kl_coefficient, self.tp.sess.run(self.policy_network.online_network.output_heads[0].kl_coefficient)))

    def post_training_commands(self):

After Change


                new_kl_coefficient,
                self.policy_network.online_network.output_heads[0].kl_coefficient_ph)

        screen.log_title("KL penalty coefficient change = {} -> {}".format(kl_coefficient, new_kl_coefficient))

    def post_training_commands(self):
        if self.tp.agent.use_kl_regularization:
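
The change above appears to move the coefficient update out of dedicated TensorFlow increase/decrease ops and into a value computed in Python, which is then assigned through a placeholder. A minimal sketch of that computation follows, outside of any TensorFlow session. Only the 0.7 lower threshold is visible in the snippet; the function signature, the `high_ratio` threshold, and the `scale` factor are hypothetical values chosen for illustration and are not taken from the Coach source.

```python
def update_kl_coefficient(kl_coefficient, measured_kl, kl_target,
                          low_ratio=0.7, high_ratio=1.3, scale=1.5):
    """Return an adapted KL penalty coefficient.

    low_ratio=0.7 matches the threshold visible in the snippet.
    high_ratio and scale are assumptions made for this sketch.
    """
    if measured_kl > high_ratio * kl_target:
        # kl too high => increase regularization
        return kl_coefficient * scale
    if measured_kl < low_ratio * kl_target:
        # kl too low => decrease regularization
        return kl_coefficient / scale
    # kl within the tolerated band => leave the coefficient unchanged
    return kl_coefficient


if __name__ == "__main__":
    old = 1.0
    new = update_kl_coefficient(old, measured_kl=0.02, kl_target=0.01)
    print("KL penalty coefficient change = {} -> {}".format(old, new))
```

Computing the new value in plain Python keeps the old and new coefficients available for logging without an extra session call, which is consistent with the simplified `screen.log_title` line in the After Change snippet.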

In pattern: SUPERPATTERN

Frequency: 3

Non-data size: 3

Instances

Link

Project Name: NervanaSystems/coach

Commit Name: 1918f160796525d2b5f241f81d3224a941418935

Time: 2017-10-31

Author: itai.caspi@intel.com

File Name: agents/ppo_agent.py

Class Name: PPOAgent

Method Name: update_kl_coefficient

Link

Project Name: NervanaSystems/coach

Commit Name: 5fadb9c18e3de16cc5633175199f9e9e2c381102

Time: 2018-11-07

Author: sina.beh@gmail.com

File Name: rl_coach/graph_managers/graph_manager.py

Class Name: GraphManager

Method Name: restore_checkpoint

Link

Project Name: ray-project/ray

Commit Name: 57544b1ff9f97d4da9f64d25c8ea5a3d8d247ffc

Time: 2020-05-11

Author: sven@anyscale.io

File Name: rllib/examples/rock_paper_scissors_multiagent.py

Class Name:

Method Name: run_heuristic_vs_learned