-
RLVR Reward Landscape
zooming into the local reward landscape around a policy
-
Beyond the Lottery Ticket: Multiple Winning Subnetworks in Pretrained LLMs
Preliminary evidence that random parameter selection can match full-parameter RL fine-tuning.
-
Attention sink
More evidence
-
Reinforcement Learning Meets NER
an attempt at solving Named Entity Recognition with RL training.
-
Replicating GraphRAG paper
a replication of 'From Local to Global'