Post by @sunsetjesus • Hey

The online RL finetuning results use Cal-QL, a simple modification to CQL that makes it suitable for online finetuning: https://t.co/iZ1TKkaAqi In the PT
