Replace the reward model with real humans
Real-time and continuous human-in-the-loop ranking

Continuous human feedback for RLHF.

Flows continuously route model outputs to human annotators and aggregate pairwise preferences into live reward signals. Collect up to 6K+ human annotations per minute to refresh or replace reward-model signals with direct human feedback. Flows are lightweight, low-latency, and ttl-bounded, so your training step never blocks.

V1 · Swap

Reward model out, humans in.

policy  →  flow  →  reward  →  policy
conventional rlhf

Reward Model

A neural net trained to approximate what humans would say.

swap
with rapidata flows

The humans themselves

247 humans · live

Hundreds of real evaluators score every batch, in the loop.

conventional rlhf

A trained reward model approximates what a human would say.

with rapidata flows

Hundreds of humans actually say it, every batch.

01 · why flows

Reward models approximate preference. Flows collect it directly.

Reward models enabled major improvements in RLHF by approximating human preference at scale, because collecting feedback from real humans has historically been too slow for continuous optimization. Flows reduce that latency enough to keep humans directly in the optimization loop.

No jobs to spin up

Persistent preference pipelines

Flows stay active across batches, allowing you to continuously stream generations into the same human feedback pipeline.

ttl-bounded

Time-bounded, not blocking

Flows return whatever human feedback is available within a configurable time window, allowing training and evaluation pipelines to continue without blocking.

any modality

Image · video · audio · text

The same flow API ranks generations across modalities. Same instruction, same win-loss matrix, same reward shape.

02 · how it works

A continuous preference pipeline in four SDK calls.

01

Create a flow

Define the question shown to evaluators and set your per-item response budget. Min/max thresholds bound the variance of your reward.

flow = client.flow.create_ranking_flow(
    name="Image Quality Ranking",
    instruction="Which image looks better?",
    max_response_threshold=200,
    min_response_threshold=50,
)
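One way to read the thresholds is through sampling error. As a rough sketch, assume each response on a pair is an independent vote (an assumption for illustration, not something stated about the SDK); the standard error of an observed win rate then shrinks with the square root of the response count. The helper name win_rate_std_error is hypothetical.

```python
import math


def win_rate_std_error(n_responses: int, p: float = 0.5) -> float:
    """Standard error of a win rate estimated from n independent votes.

    Assumes each response is an independent Bernoulli vote with win
    probability p; p = 0.5 is the worst case (largest error).
    """
    if n_responses <= 0:
        raise ValueError("n_responses must be positive")
    return math.sqrt(p * (1.0 - p) / n_responses)


# Thresholds from the example above: 50 (min) and 200 (max) responses.
for n in (50, 200):
    print(f"{n} responses -> worst-case std error {win_rate_std_error(n):.3f}")
```

Under this assumption, moving from 50 to 200 responses halves the worst-case error, from about 0.071 to about 0.035.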
02

Add a flow batch

During training, push a batch of rollouts. Optionally tag it with context, and set time_to_live so the batch closes after a fixed window instead of blocking the training step.

flow_item = flow.create_new_flow_batch(
    datapoints=rollouts,                  # urls, paths, or text
    context="generations from step 12k",
    time_to_live=300,                     # seconds
)
03

Get results

Retrieve pairwise preferences, rankings, and response statistics continuously as feedback arrives. Use get_status() to access partial results while a batch is still collecting.

status  = flow_item.get_status()
results = flow_item.get_results()
matrix  = flow_item.get_win_loss_matrix()
count   = flow_item.get_response_count()
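If you would rather wait for a minimum number of responses than read whatever is there, a small polling helper is one option. The sketch below assumes get_response_count() returns an integer, which the snippet above does not confirm; wait_for_responses and FakeFlowItem are hypothetical names, and the fake item stands in for a real flow item so the example runs without network access.

```python
import time
from typing import Callable


def wait_for_responses(
    flow_item,
    min_responses: int,
    timeout_s: float,
    poll_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until enough responses arrive or the local deadline passes.

    Assumes flow_item.get_response_count() returns an int.
    """
    deadline = clock() + timeout_s
    count = flow_item.get_response_count()
    while count < min_responses and clock() < deadline:
        sleep(poll_s)
        count = flow_item.get_response_count()
    return count


class FakeFlowItem:
    """Stand-in for a real flow item: gains 40 responses per poll."""

    def __init__(self) -> None:
        self._count = 0

    def get_response_count(self) -> int:
        self._count += 40
        return self._count


# Demo with a no-op sleep so the example finishes instantly.
count = wait_for_responses(
    FakeFlowItem(), min_responses=100, timeout_s=60, sleep=lambda _s: None
)
print(count)
```

The clock and sleep parameters are injectable only so the loop can be tested without real waiting; in a training script the defaults are enough.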
04

Update flow configuration

Rewrite the instruction mid-run. Existing flow items keep their original config; only new items pick up the change.

flow.update_config(
    instruction="Which image has higher visual quality?",
)
03 · inside a running flow

What the SDK gives back.

Image Quality Ranking
flow_8f3a7c · 5 batches · 600 responses each · ttl 300s each · independent
flow live

batch      step    status      responses  ttl
batch-014  18,200  complete    600/600    closed
batch-015  18,300  complete    600/600    closed
batch-016  18,400  collecting  412/600    47s until ttl
batch-017  18,500  collecting  138/600    168s until ttl
batch-018  18,600  queued      0/600      awaiting step 18,600

Each batch is independent, with its own ttl and its own response budget. Any batch that hits its ttl returns partial results as incomplete.
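Because batches are independent, a training loop can keep several in flight and read each one once its ttl window has passed. The sketch below is local bookkeeping only and makes no SDK calls; InFlight and pop_expired are hypothetical names, the item field is a placeholder for whatever create_new_flow_batch returned, and the schedule (one batch every 100 s, ttl 300 s) is made up for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class InFlight:
    step: int            # training step that produced the rollouts
    submitted_at: float  # submission time in seconds
    ttl_s: float         # ttl passed when the batch was created
    item: object         # placeholder for the returned flow item


def pop_expired(in_flight: deque, now: float) -> list:
    """Remove and return batches whose ttl window has elapsed.

    Batches are appended in submission order with equal ttl, so the
    oldest batch is always at the left end of the deque.
    """
    done = []
    while in_flight and now - in_flight[0].submitted_at >= in_flight[0].ttl_s:
        done.append(in_flight.popleft())
    return done


# Simulated schedule: one batch submitted every 100 s, ttl 300 s.
in_flight = deque()
harvested = []
for step, now in enumerate(range(0, 600, 100)):
    for batch in pop_expired(in_flight, float(now)):
        harvested.append(batch.step)  # here you would read batch.item
    in_flight.append(InFlight(step, float(now), 300.0, item=None))

print(harvested, len(in_flight))
```

A batch may also complete before its ttl; this sketch ignores that case and only tracks the time window.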
flow_item.get_win_loss_matrix() · response_count: 312

          gen-001  gen-002  gen-003  gen-004
gen-001         -       38       42       21
gen-002        12        -       28       14
gen-003         8       22        -       19
gen-004        29       36       31        -

rows: preferred · cols: compared against · cell: # of pairwise wins
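A win-loss matrix like this still has to be reduced to one scalar per generation before it can act as a reward. The sketch below uses plain win rates, centered on the batch mean; this is one simple choice for illustration, not necessarily how Flows or any particular trainer scores items, and a Bradley-Terry fit is a common alternative. It reads the example matrix with the diagonal left out and hard-codes the values as nested lists, since the actual return type of get_win_loss_matrix() is not shown here.

```python
labels = ["gen-001", "gen-002", "gen-003", "gen-004"]

# wins[i][j] = number of times labels[i] was preferred over labels[j].
# Values copied from the example matrix; the diagonal is unused.
wins = [
    [0, 38, 42, 21],
    [12, 0, 28, 14],
    [8, 22, 0, 19],
    [29, 36, 31, 0],
]


def win_rates(wins: list[list[int]]) -> list[float]:
    """Fraction of its pairwise comparisons that each item won."""
    n = len(wins)
    rates = []
    for i in range(n):
        won = sum(wins[i][j] for j in range(n) if j != i)
        lost = sum(wins[j][i] for j in range(n) if j != i)
        total = won + lost
        # With no comparisons at all, fall back to a neutral 0.5.
        rates.append(won / total if total else 0.5)
    return rates


def centered(values: list[float]) -> list[float]:
    """Subtract the batch mean so the rewards sum to zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


rates = win_rates(wins)
rewards = centered(rates)
for label, rate, reward in zip(labels, rates, rewards):
    print(f"{label}: win rate {rate:.3f}, centered reward {reward:+.3f}")
```

In this example every generation appears in 150 comparisons, so gen-001 (101 wins) and gen-004 (96 wins) end up with positive rewards and the other two with negative ones.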
04 · rest of the surface

Other utilities.

When you know a training run is about to hit a high-cadence phase, you can preheat ahead of time so the first batch returns with the same latency as the hundredth. Other utilities let you retrieve flows you created previously, list your recent flows, and so on.

Tip
Call client.flow.preheat() about 5 minutes before a latency-sensitive sequence of batches.
flow_api.py
# Warm up internal resources before a hot phase
client.flow.preheat()

# Retrieve a flow you created earlier
flow = client.flow.get_flow_by_id("flow_8f3a7c...")

# List your recent flows
recent = client.flow.find_flows(amount=10)

# All flow items for a flow
all_items = flow.get_flow_items()

# Tear it down
flow.delete()
05 · use it

Enable online RLHF loops.

Reward models approximate preference. Flows let you collect fresh human feedback at training cadence, so RLHF systems can optimize directly against the signal they were designed to model.

image
Realism · Coherence · Composition
video
Motion · Consistency · Scene stability
audio
Clarity · Naturalness · Tone
text
Alignment · Reasoning · Style