Use Flex processing for the OpenAI judge model by JosephMarinier · Pull Request #152 · ServiceNow/eva

JosephMarinier · 2026-06-16T23:47:57Z

Use Flex processing for the OpenAI judge model, which will halve its cost in exchange for slower response times and occasional resource unavailability. This doesn't affect benchmarking an OpenAI model. More details on the Flex tier [here](https://developers.openai.com/api/docs/guides/flex-processing).
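To illustrate the "occasional resource unavailability" trade-off, here is a minimal sketch of one way a caller might cope with it: try the Flex tier first, then retry once on the default tier. This is not code from this PR. `call_judge`, `ResourceUnavailable`, and `call_with_flex_fallback` are hypothetical names; only the `service_tier` values `"flex"` and `"default"` come from the OpenAI API.

```python
from typing import Any, Callable


class ResourceUnavailable(Exception):
    """Hypothetical stand-in for the provider's 'resource unavailable' error."""


def call_with_flex_fallback(call_judge: Callable[..., Any], **params: Any) -> Any:
    """Try the Flex tier first; if it is unavailable, retry once on the default tier.

    `call_judge` is an assumed callable that forwards its keyword arguments
    (including `service_tier`) to the judge model.
    """
    try:
        return call_judge(**params, service_tier="flex")
    except ResourceUnavailable:
        # Flex capacity was not available: pay the standard price instead.
        return call_judge(**params, service_tier="default")
```

Whether such a fallback is worth it depends on how much a delayed or failed judge call costs compared with the Flex saving.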

fanny-riols · 2026-06-17T20:16:41Z

    )
    category = "accuracy"
    default_model = "us.anthropic.claude-opus-4-6-v1"
+    default_params = {"max_tokens": 100000}  # Drop the OpenAI-only flex tier inherited from TextJudgeMetric.


Are we sure 100k is enough? Did you check the prompt size after a 10 min convo?

JosephMarinier · Jun 17, 2026 (edited)

I did not check that myself, but it has been 100k since the beginning of EVA (inherited from `src/eva/metrics/base.py`). Is that OK?

fanny-riols · Jun 17, 2026

Ah right, I guess so then.
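The override discussed in this thread can be sketched as follows. This is a simplified illustration, not the actual EVA code: the contents of the base `default_params`, the placeholder base model name, the `request_params` helper, and the subclass name `ClaudeJudgedMetric` are assumptions. Only `TextJudgeMetric`, the Claude model ID, and the `max_tokens` value appear in the PR.

```python
class TextJudgeMetric:
    """Simplified stand-in for the base judge metric (assumed shape)."""

    default_model = "openai-judge-model"  # placeholder, not the real default
    # Assumed base defaults: the long-standing 100k token limit plus the Flex tier.
    default_params = {"max_tokens": 100000, "service_tier": "flex"}

    def request_params(self) -> dict:
        # Hypothetical helper: return a copy so callers cannot mutate class defaults.
        return dict(self.default_params)


class ClaudeJudgedMetric(TextJudgeMetric):
    """Hypothetical metric judged by a non-OpenAI model."""

    category = "accuracy"
    default_model = "us.anthropic.claude-opus-4-6-v1"
    # Redefine the params without service_tier to drop the OpenAI-only Flex tier.
    default_params = {"max_tokens": 100000}
```

Because the subclass replaces the whole dictionary rather than merging into it, it has to restate `max_tokens`, which is why the 100k value shows up in the diff.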

JosephMarinier requested a review from fanny-riols June 16, 2026 23:47

JosephMarinier self-assigned this Jun 16, 2026

fanny-riols reviewed Jun 17, 2026


fanny-riols approved these changes Jun 17, 2026

JosephMarinier wants to merge 1 commit into main from joseph/use-flex-tier-for-openai-judge-model
