Reinforcement Learning Engine

Train AI agents through reward-based optimization

1. Select Language Model

Drag and drop your JSONL file here or click to browse

Upload training prompts in JSONL format

Multi-step Reinforcement

Enable for complex environments with multiple interactions

Learning Rate

0.0010.010.1

Training Iterations

The reward function evaluates model responses in Python. Must define reward_fn that returns a scalar score.

REWARD FUNCTION

PYTHON

completion: List of message dicts with model's response

**kwargs: Additional fields from JSONL (e.g., expected_result)

Must define def reward_fn(completion, **kwargs)

Return a scalar (higher is better)