def learn(self, s, a, td):
    # _, exp_v = self.sess.run([self.train_op, self.exp_v], {self.s: [s], self.a: [a], self.td_error: td[0]})
    with tf.GradientTape() as tape:
        _logits = self.model([s]).outputs
        # _probs = tf.nn.softmax(_logits)
        _exp_v = tl.rein.cross_entropy_reward_loss(logits=_logits, actions=[a], rewards=td[0])
    grad = tape.gradient(_exp_v, self.model.trainable_weights)
    self.optimizer.apply_gradients(zip(grad, self.model.trainable_weights))
After Change
def learn(self, s, a, td):
    with tf.GradientTape() as tape:
        _logits = self.model(np.array([s]))
        # Cross-entropy loss weighted by the td-error (advantage):
        # the cross-entropy measures the difference between two probability distributions,
        # here the predicted logits and the sampled action distribution;
        # it is then weighted by the td-error, so a large td-error (advantage) pushes the
        # predicted distribution more strongly towards the sampled action, and vice versa.
        _exp_v = tl.rein.cross_entropy_reward_loss(logits=_logits, actions=[a], rewards=td[0])
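Conceptually, a reward-weighted cross-entropy loss of this kind combines a sparse softmax cross entropy between the policy logits and the sampled action indices with a per-step weight (here the td-error). The sketch below illustrates that idea with a hypothetical helper, weighted_cross_entropy_sketch; it is an assumption-based illustration, not the actual implementation of tl.rein.cross_entropy_reward_loss.

import numpy as np
import tensorflow as tf

def weighted_cross_entropy_sketch(logits, actions, rewards):
    # Per-step cross entropy between the policy logits and the sampled (integer) actions.
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=actions, logits=logits)
    # Weight each step by its reward / td-error (advantage) and sum over the batch.
    return tf.reduce_sum(ce * rewards)

# Toy check: one state, two actions, a positive advantage for the sampled action.
logits = tf.constant([[0.2, 1.5]])   # shape (batch, n_actions)
actions = tf.constant([1])           # sampled action index
rewards = tf.constant([0.7])         # td-error used as the weight
print(weighted_cross_entropy_sketch(logits, actions, rewards).numpy())

Minimizing this quantity increases the log-probability of actions that were followed by a positive td-error, which is the policy-gradient update the learn method performs.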