---
annotations_creators:
- human-annotated
language_creators:
- machine-generated
languages:
- en
licenses:
- cc-by-nc-4.0
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- original
task_categories:
- conditional-text-generation
- sequence-modeling
task_ids:
- conditional-text-generation-other-dialogue-generation
- dialogue-modeling
- language-modeling
---

# Dataset Card for air_dialogue

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59
- **Repository:** https://github.com/google/airdialogue
- **Paper:** https://www.aclweb.org/anthology/D18-1419/
- **Leaderboard:** https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59
- **Point of Contact:** [AirDialogue-Google](mailto:airdialogue@gmail.com), [Aakash Gupta](mailto:aakashg80@gmail.com)

### Dataset Summary

AirDialogue is a large dataset that contains 402,038 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. Human annotators are then asked to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions.

### Supported Tasks and Leaderboards

We use perplexity and BLEU score to evaluate the quality of the language generated by the model. We also compare the dialogue state generated by the model, _s_, with the ground-truth state, _s0_. Two categories of metrics are used: exact match scores and scaled scores.
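
As a rough illustration of the exact match idea, the sketch below compares a predicted action with a ground-truth action field by field. It is not the official scoring code: the function name `exact_match`, the choice of the fields `status`, `name` and `flight` (taken from the `action` record shown under Data Instances), and the `all` summary flag are assumptions made for this example. Scaled scores are not defined in this card and are not covered here.

```python
from typing import Dict


def exact_match(predicted: Dict, expected: Dict) -> Dict[str, bool]:
    """Compare two action dicts field by field and add an all-fields flag.

    Hypothetical helper: the field list is an assumption based on the
    example `action` record, not the official metric definition.
    """
    fields = ("status", "name", "flight")
    result = {field: predicted.get(field) == expected.get(field) for field in fields}
    result["all"] = all(result.values())
    return result


# Ground-truth state from the example record; the prediction is made up.
expected = {"status": "book", "name": "Emily Edwards", "flight": [1027]}
predicted = {"status": "book", "name": "Emily Edwards", "flight": [1005]}

scores = exact_match(predicted, expected)
print(scores)
```

Here the prediction matches on `status` and `name` but not on `flight`, so the `all` flag is false. Averaging these booleans over many dialogues would give per-field accuracies.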

The inference competition & leaderboard can be found here:
https://worksheets.codalab.org/worksheets/0xa79833f4b3c24f4188cee7131b120a59



### Languages

The text in the dataset is in English. The BCP 47 code is `en`.

## Dataset Structure

### Data Instances

The data is provided in two sets of files: the dialogues (`air_dialogue_data`) and the knowledge base (`air_dialogue_kb`).


BuilderConfig: `air_dialogue_data`

```
{"action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "intent": {"return_month": "June", "return_day": "14", "max_price": 200, "departure_airport": "DFW", "return_time": "afternoon", "max_connections": 1, "departure_day": "12", "goal": "book", "departure_month": "June", "name": "Emily Edwards", "return_airport": "IAD"}, "timestamps": [1519233239, 1519233244, 1519233249, 1519233252, 1519233333, 1519233374, 1519233392, 1519233416, 1519233443, 1519233448, 1519233464, 1519233513, 1519233525, 1519233540, 1519233626, 1519233628, 1519233638], "dialogue": ["customer: Hello.", "agent: Hello.", "customer: My name is Emily Edwards.", "agent: How may I help you out?", "customer: I need some help in my flight ticket reservation to attend a convocation meeting, can you please help me?", "agent: Sure, I will help you out. May I know your travelling dates please?", "customer: Thank you and my dates are 06/12 and back on 06/14.", "agent: Can I know your airport codes?", "customer: The airport codes are from DFW to IAD.", "agent: Ok, please wait a moment.", "customer: Sure.", "agent: There is a flight with connection 1 and price 200, can I proceed with this flight?", "customer: Yes, do proceed with booking.", "agent: Ok, your ticket has been booked.", "customer: Thank you for your assistance in my flight ticket reservation.", "agent: Thank you for choosing us.", "customer: You are welcome."], "expected_action": {"status": "book", "name": "Emily Edwards", "flight": [1027]}, "correct_sample": true}
```
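
Each entry in `dialogue` is prefixed with its speaker, and the example record has one timestamp per turn. The sketch below pairs the two and splits off the speaker. The `Turn` type and `parse_turns` helper are hypothetical names introduced for this example, and the `"speaker: utterance"` separator is inferred from the record above rather than from a documented format.

```python
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    speaker: str    # "customer" or "agent" in the example record
    utterance: str
    timestamp: int  # copied as-is from the `timestamps` list


def parse_turns(record: Dict) -> List[Turn]:
    """Pair each dialogue line with its timestamp and split off the speaker."""
    turns = []
    for line, timestamp in zip(record["dialogue"], record["timestamps"]):
        # Assumption: the first ": " separates the speaker from the text.
        speaker, _, utterance = line.partition(": ")
        turns.append(Turn(speaker, utterance, timestamp))
    return turns


# Abbreviated copy of the example record (first three turns only).
record = {
    "dialogue": [
        "customer: Hello.",
        "agent: Hello.",
        "customer: My name is Emily Edwards.",
    ],
    "timestamps": [1519233239, 1519233244, 1519233249],
}

turns = parse_turns(record)
for turn in turns:
    print(f"{turn.timestamp} [{turn.speaker}] {turn.utterance}")
```

Because `zip` stops at the shorter list, a record with mismatched lengths would be silently truncated; checking `len(record["dialogue"]) == len(record["timestamps"])` first may be worthwhile.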

BuilderConfig: `air_dialogue_kb`

```
{"kb": [{"return_airport": "DTW", "airline": "Spirit", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1000, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1001, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 15, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 500}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1002, "departure_month": "June", "departure_time_num": 0, "class": "business", "return_time_num": 13, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 600}, {"return_airport": "IAD", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1003, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 5, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1004, "departure_month": "June", "departure_time_num": 9, "class": "economy", "return_time_num": 11, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "AA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1005, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 17, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Frontier", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1006, "departure_month": "June", "departure_time_num": 10, "class": "economy", 
"return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "IAD", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1007, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 20, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "AA", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1008, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1009, "departure_month": "June", "departure_time_num": 18, "class": "economy", "return_time_num": 6, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "Frontier", "departure_day": "13", "departure_airport": "DTW", "flight_number": 1010, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 2, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1011, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 100}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1012, "departure_month": "June", "departure_time_num": 13, "class": "economy", "return_time_num": 22, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "IAD", 
"flight_number": 1013, "departure_month": "June", "departure_time_num": 16, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1014, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 8, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "Southwest", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1015, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 300}, {"return_airport": "DTW", "airline": "UA", "departure_day": "11", "departure_airport": "DFW", "flight_number": 1016, "departure_month": "June", "departure_time_num": 10, "class": "economy", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 200}, {"return_airport": "DFW", "airline": "AA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1017, "departure_month": "June", "departure_time_num": 14, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 400}, {"return_airport": "DTW", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1018, "departure_month": "June", "departure_time_num": 3, "class": "economy", "return_time_num": 1, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Hawaiian", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1019, "departure_month": "June", "departure_time_num": 7, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": 
"DFW", "airline": "Delta", "departure_day": "12", "departure_airport": "IAD", "flight_number": 1020, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 18, "return_month": "June", "return_day": "14", "num_connections": 2, "price": 200}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1021, "departure_month": "June", "departure_time_num": 11, "class": "business", "return_time_num": 8, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 1000}, {"return_airport": "IAD", "airline": "JetBlue", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1022, "departure_month": "June", "departure_time_num": 4, "class": "economy", "return_time_num": 14, "return_month": "June", "return_day": "13", "num_connections": 0, "price": 200}, {"return_airport": "IAD", "airline": "Frontier", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1023, "departure_month": "June", "departure_time_num": 19, "class": "economy", "return_time_num": 23, "return_month": "June", "return_day": "13", "num_connections": 1, "price": 200}, {"return_airport": "DFW", "airline": "UA", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1024, "departure_month": "June", "departure_time_num": 11, "class": "economy", "return_time_num": 19, "return_month": "June", "return_day": "15", "num_connections": 1, "price": 200}, {"return_airport": "DTW", "airline": "Hawaiian", "departure_day": "11", "departure_airport": "IAD", "flight_number": 1025, "departure_month": "June", "departure_time_num": 6, "class": "economy", "return_time_num": 10, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DTW", "airline": "UA", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1026, "departure_month": "June", "departure_time_num": 0, "class": "economy", "return_time_num": 18, "return_month": 
"June", "return_day": "14", "num_connections": 1, "price": 300}, {"return_airport": "IAD", "airline": "Delta", "departure_day": "12", "departure_airport": "DFW", "flight_number": 1027, "departure_month": "June", "departure_time_num": 17, "class": "economy", "return_time_num": 15, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 200}, {"return_airport": "IAD", "airline": "Southwest", "departure_day": "12", "departure_airport": "DTW", "flight_number": 1028, "departure_month": "June", "departure_time_num": 23, "class": "economy", "return_time_num": 13, "return_month": "June", "return_day": "14", "num_connections": 1, "price": 100}, {"return_airport": "DFW", "airline": "Spirit", "departure_day": "11", "departure_airport": "DTW", "flight_number": 1029, "departure_month": "June", "departure_time_num": 22, "class": "business", "return_time_num": 4, "return_month": "June", "return_day": "14", "num_connections": 0, "price": 800}], "reservation": 0}
```
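
The two example records appear to belong together: flight 1027 is the booked flight in the dialogue record and is also listed in this knowledge base. The sketch below filters knowledge-base entries against a customer intent. It is only an approximation: the helper `candidate_flights` is hypothetical, the matching rules (equality on airports and dates, upper bounds on price and connections) are assumptions, and `return_time` is ignored because the mapping from values such as `"afternoon"` to `return_time_num` is not documented in this card.

```python
from typing import Dict, List


def candidate_flights(kb: List[Dict], intent: Dict) -> List[int]:
    """Return flight numbers that satisfy a subset of the intent constraints."""
    matches = []
    for flight in kb:
        if (
            flight["departure_airport"] == intent["departure_airport"]
            and flight["return_airport"] == intent["return_airport"]
            and flight["departure_month"] == intent["departure_month"]
            and flight["departure_day"] == intent["departure_day"]
            and flight["return_month"] == intent["return_month"]
            and flight["return_day"] == intent["return_day"]
            and flight["price"] <= intent["max_price"]
            and flight["num_connections"] <= intent["max_connections"]
        ):
            matches.append(flight["flight_number"])
    return matches


# Intent fields copied from the example dialogue record.
intent = {
    "departure_airport": "DFW", "return_airport": "IAD",
    "departure_month": "June", "departure_day": "12",
    "return_month": "June", "return_day": "14",
    "max_price": 200, "max_connections": 1,
}

# Three of the example flights, reduced to the fields used above.
kb = [
    {"flight_number": 1005, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "13",
     "num_connections": 1, "price": 100},
    {"flight_number": 1021, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 0, "price": 1000},
    {"flight_number": 1027, "departure_airport": "DFW", "return_airport": "IAD",
     "departure_month": "June", "departure_day": "12",
     "return_month": "June", "return_day": "14",
     "num_connections": 1, "price": 200},
]

print(candidate_flights(kb, intent))  # [1027]
```

Flight 1005 fails on the return day and flight 1021 exceeds the maximum price, leaving 1027, which agrees with the `expected_action` in the example dialogue record.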

### Data Fields

BuilderConfig: `air_dialogue_data`

Provides the customer context, dialogue states and environment.

| Key name | Description |
|---|---|
| `search_action` | Search action performed by the customer |
| `action` | Action taken by the agent |
| `intent` | Intents from the conversation |
| `timestamps` | Timestamp for each dialogue turn |
| `dialogue` | Dialogue recorded between agent and customer |
| `expected_action` | Expected action from the agent (human-annotated) |
| `correct_sample` | Whether the action performed by the agent matches `expected_action` |

BuilderConfig: `air_dialogue_kb`

Provides the agent context _ca_ = (_db_, _r_).

| Key name | Description |
|---|---|
| `kb` | Available flights in the database |
| `reservation` | Whether the customer has an existing reservation |


### Data Splits

The data is split into train, dev and test sets in a ratio of 80%, 10% and 10%.

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail.

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

No personal or sensitive information is stored.

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[AirDialogue team](mailto:airdialogue@gmail.com)

For issues regarding the HuggingFace Dataset Hub implementation, contact [Aakash Gupta](mailto:aakashg80@gmail.com).

### Licensing Information

cc-by-nc-4.0

### Citation Information

@inproceedings{wei-etal-2018-airdialogue,
    title = "{A}ir{D}ialogue: An Environment for Goal-Oriented Dialogue Research",
    author = "Wei, Wei  and
      Le, Quoc  and
      Dai, Andrew  and
      Li, Jia",
    booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
    month = oct # "-" # nov,
    year = "2018",
    address = "Brussels, Belgium",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D18-1419",
    doi = "10.18653/v1/D18-1419",
    pages = "3844--3854",
    abstract = "Recent progress in dialogue generation has inspired a number of studies on dialogue systems that are capable of accomplishing tasks through natural language interactions. A promising direction among these studies is the use of reinforcement learning techniques, such as self-play, for training dialogue agents. However, current datasets are limited in size, and the environment for training agents and evaluating progress is relatively unsophisticated. We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail. Our experimental results indicate that state-of-the-art dialogue models can only achieve a score of 0.17 while humans can reach a score of 0.91, which suggests significant opportunities for future improvement.",
}

### Contributions

Thanks to [@skyprince999](https://github.com/skyprince999) for adding this dataset.