b15037e32ded46cc254d3bf3bcffc6ce8896a3ae,tests/algorithms/test_policy_gradient.py,,test_REINFORCE,#,49
Before Change
def test_REINFORCE():
    agent = REINFORCE(policy, mdp.info, **algorithm_params)
    agent.fit(dataset)
    w = np.array([-.07071067, 2.07071068])
    assert np.allclose(w, agent.policy.get_weights())
After Change
def test_REINFORCE():
    params = dict(learning_rate=AdaptiveParameter(value=.01))
    policy = learn(REINFORCE, params)
    w = np.array([-0.0084793, 2.00536528])
    assert np.allclose(w, policy.get_weights())
In pattern: SUPERPATTERN
Frequency: 3
Non-data size: 8
Instances
Project Name: AIRLab-POLIMI/mushroom
Commit Name: b15037e32ded46cc254d3bf3bcffc6ce8896a3ae
Time: 2019-11-03
Author: carlo.deramo@gmail.com
File Name: tests/algorithms/test_policy_gradient.py
Class Name:
Method Name: test_REINFORCE
Project Name: AIRLab-POLIMI/mushroom
Commit Name: b15037e32ded46cc254d3bf3bcffc6ce8896a3ae
Time: 2019-11-03
Author: carlo.deramo@gmail.com
File Name: tests/algorithms/test_policy_gradient.py
Class Name:
Method Name: test_eNAC
Project Name: AIRLab-POLIMI/mushroom
Commit Name: b15037e32ded46cc254d3bf3bcffc6ce8896a3ae
Time: 2019-11-03
Author: carlo.deramo@gmail.com
File Name: tests/algorithms/test_policy_gradient.py
Class Name:
Method Name: test_GPOMDP