Step | Prompt | Pass Criterion |
---|---|---|
R-1 | “Hi! I need round-trip flights from MIA to NYC.” | Clarifying question (dates) appears first. |
R-2 | “Where does the return leg depart from?” | Returns specific airport (e.g., “Newark Liberty International – EWR”). |
Model Performance Scores
Model | R-1 | R-2 | Total | Notes |
---|---|---|---|---|
Apollo-1 | 1 | 1 | 2/2 | Asked for dates; answered “EWR”. |
Gemini-2.5 | 0 | 0 | 0/2 | Listed flights first; replied “one of the NYC airports…”. |
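
The Total column in these score tables is the count of passed steps over the number of steps in the run. A minimal sketch of that tally follows, assuming each step is graded as a binary 1 (pass) or 0 (fail) as the tables show. The `RunResult` class and its field names are hypothetical, introduced only for illustration, and are not part of the benchmark's tooling.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Per-model scores for one run (hypothetical container for illustration)."""
    model: str
    step_scores: dict[str, int]  # step id -> 1 (pass) or 0 (fail)

    @property
    def total(self) -> str:
        # Total is reported as "passed/steps", e.g. "2/2".
        passed = sum(self.step_scores.values())
        return f"{passed}/{len(self.step_scores)}"


# Scores copied from the table above.
results = [
    RunResult("Apollo-1", {"R-1": 1, "R-2": 1}),
    RunResult("Gemini-2.5", {"R-1": 0, "R-2": 0}),
]

for r in results:
    print(r.model, r.total)
```

The same structure extends to three-step runs by adding an `"R-3"` key.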
Step Details
Step | Prompt | Pass Criterion |
---|---|---|
R-1 | “Hi! I need round-trip flights from BOS to WAS.” | Clarifying question (dates) appears first. |
R-2 | “Where does the return leg depart from?” | Returns specific airport (e.g., “Baltimore/Washington International – BWI”). |
Model Performance Scores
Model | R-1 | R-2 | Total | Notes |
---|---|---|---|---|
Apollo-1 | 1 | 1 | 2/2 | Prompted for dates; answered “BWI”. |
Gemini-2.5 | 0 | 0 | 0/2 | Same pattern as Run 1. |
Step Details
Step | Prompt | Pass Criterion |
---|---|---|
R-1 | “Hi! I need round-trip flights from LON to PAR.” | Clarifying question (dates) appears first. |
R-2 | “Where does the return leg depart from?” | Returns specific airport (e.g., “Paris Orly – ORY”). |
Model Performance Scores
Model | R-1 | R-2 | Total | Notes |
---|---|---|---|---|
Apollo-1 | 1 | 1 | 2/2 | Asked for dates; answered “ORY”. |
Gemini-2.5 | 0 | 0 | 0/2 | Listed flights first; generic Paris-airport list. |
Step Details
Step | Prompt | Pass Criterion |
---|---|---|
R-1 | “Hi! I need to find round-trip flights from LON to PAR in August.” | Clarifying question for exact dates appears first. |
R-2 | “What is the duration of each leg?” | Durations match Google Flights results (±2 min tolerance). |
R-3 | “What is the baggage allowance for each leg?” | Provides allowance details that match each airline’s published policy for that route and fare. |
Model Performance Scores
Model | R-1 | R-2 | R-3 | Total | Notes |
---|---|---|---|---|---|
Apollo-1 | 1 | 1 | 1 | 3/3 | Prompted for dates; durations 1 h 25 m / 1 h 10 m; per-booking baggage details. |
Gemini-2.5 | 0 | 0 | 0 | 0/3 | Listed flights before clarifying; gave outbound-only duration; generic airline baggage rules. |
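
The R-2 criterion in this run (durations within ±2 minutes of the reference, for each leg) can be checked mechanically. The sketch below is one possible way to do it, not the grading procedure actually used. It assumes durations are written in the "1 h 25 m" style shown in the notes. The function names are hypothetical, and the reference values in the usage lines are placeholders rather than real Google Flights data.

```python
import re

TOLERANCE_MIN = 2  # tolerance stated in the R-2 pass criterion


def parse_duration(text: str) -> int:
    """Convert a string such as '1 h 25 m' into total minutes."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*m)?\s*", text)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def durations_match(reported, reference, tolerance=TOLERANCE_MIN):
    """Pass only if every leg is reported and each is within the tolerance."""
    if len(reported) != len(reference):
        return False  # e.g. an outbound-only answer for a round trip
    return all(
        abs(parse_duration(got) - parse_duration(want)) <= tolerance
        for got, want in zip(reported, reference)
    )


# Placeholder reference durations (hypothetical, for illustration only).
reference = ["1 h 24 m", "1 h 12 m"]
print(durations_match(["1 h 25 m", "1 h 10 m"], reference))  # both legs given
print(durations_match(["1 h 25 m"], reference))              # outbound only
```

Requiring one reported duration per leg means an answer that covers only the outbound leg fails the step, in line with the "each leg" wording of the prompt.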