This repository contains the dataset and code for the paper "Benchmarking Query-conditioned Natural Language Inference" (Canby et al., 2025).
In sentence-level NLI, a label ℓ indicates the semantic relationship between a premise sentence s_p and a hypothesis sentence s_h. Document-level NLI conditions ℓ on a premise document d_p and a hypothesis document d_h. Query-conditioned NLI (QC-NLI) additionally conditions each label ℓ_i on a query q_i, which indicates the aspect of the two documents on which the semantic relationship should be judged.
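As a concrete illustration of the query-conditioned setting, the snippet below builds a hypothetical QC-NLI example. The field names (`premise_doc`, `hypothesis_doc`, `query`, `label`) are chosen for exposition only and are not the dataset's actual schema:

```python
# Hypothetical QC-NLI example; field names are illustrative, not the real schema.
example = {
    "premise_doc": "The phone ships with a 5,000 mAh battery. The display is 6.1 inches.",
    "hypothesis_doc": "The device has a large 6.7-inch screen and all-day battery life.",
    "query": "What is the screen size?",
    "label": "contradiction",  # judged only on the aspect the query picks out
}

# A different query over the same document pair can yield a different label.
example_2 = dict(example, query="How long does the battery last?", label="entailment")

print(example["label"], example_2["label"])
```

The point of the two examples is that the document pair is identical; only the query changes, and with it the label.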
## Prerequisites

- Python 3.8+
- Required API keys (OpenAI, Google AI)
## Installation

1. Clone this repository:

   ```bash
   git clone https://github.com/amazon-science/Query-Conditioned-NLI.git
   cd Query-Conditioned-NLI
   ```

2. Create and activate a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Set up API keys:

   ```bash
   export OPENAI_API_KEY="your-openai-key"
   export GOOGLE_API_KEY="your-google-key"
   ```

## Dataset

The QC-NLI dataset is located in the `data/` folder and includes adaptations from four existing datasets:
| Dataset | Task | Size | Label Set |
|---|---|---|---|
| SNLI (Bowman et al., 2015) | Image descriptions | 4,452 | entailment, not_entailment |
| RobustQA (Han et al., 2023) | Inconsistent document detection | 2,578 | contradiction, not_contradiction |
| RAGTruth (Niu et al., 2024) | Hallucination detection | 829 | entailment, not_entailment |
| FactScore (Min et al., 2023) | Fact verification | 13,796 | entailment, not_entailment |
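The per-source counts in the table can be tallied directly; this snippet simply restates the figures above:

```python
# Per-source example counts, taken from the dataset table above.
QC_NLI_SIZES = {
    "snli": 4452,
    "robustqa": 2578,
    "ragtruth": 829,
    "factscore": 13796,
}

total = sum(QC_NLI_SIZES.values())
print(f"QC-NLI total examples: {total:,}")  # 21,655 across the four sources
```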
## Performing the task

Use `src/perform_task.py` to evaluate models on QC-NLI data:

```bash
python src/perform_task.py \
    --dataset robustqa \
    --prompt-type zero \
    --do-merge True \
    --use-query True \
    --start-num 0 \
    --model gpro
```

Parameters:

- `--dataset`: Dataset to use. Options: `snli`, `ragtruth`, `robustqa`, `factscore_chatgpt`, `factscore_instructgpt`, `factscore_perplexityai`
- `--prompt-type`: Prompting strategy. Options:
  - `zero`: Zero-shot prompting
  - `few`: Few-shot prompting
  - `qanli`: QA+NLI (question answering followed by NLI)
- `--do-merge`: Merge `neutral` and `contradiction` into `not_entailment` (set to `True` for the experiments in the paper)
- `--use-query`: Include the query during inference (`True`/`False`)
- `--start-num`: Starting index in the dataset (typically `0`)
- `--model`: Model to use. Options:
  - `gpt`: GPT-4o
  - `gpt3`: GPT-3.5-turbo-0125
  - `gpt4`: GPT-4-0613
  - `gflash`: Gemini 1.5 Flash
  - `gpro`: Gemini 1.5 Pro
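To sweep several configurations, the invocation above can be assembled programmatically. A minimal sketch: the flag names follow the parameter list above, but the helper itself is not part of the repository, and the `subprocess` call is left commented out so the sketch runs without API keys:

```python
import subprocess  # used only if you uncomment the run() call below

MODELS = ["gpt", "gpt3", "gpt4", "gflash", "gpro"]

def build_command(dataset, model, prompt_type="zero", use_query=True, start_num=0):
    """Assemble a perform_task.py invocation from the documented flags."""
    return [
        "python", "src/perform_task.py",
        "--dataset", dataset,
        "--prompt-type", prompt_type,
        "--do-merge", "True",          # paper experiments merge labels
        "--use-query", str(use_query),
        "--start-num", str(start_num),
        "--model", model,
    ]

cmd = build_command("robustqa", "gpro")
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually launch the run
```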
## Generating QC-NLI data

Use `src/perform_generations.py` to convert existing datasets into QC-NLI format:

```bash
python src/perform_generations.py \
    --dataset snli \
    --partition train \
    --start-num 0 \
    --model gpt
```

Parameters:

- `--dataset`: Source dataset. Options: `snli`, `ragtruth`, `robustqa`, `factscore`
- `--partition`: Data partition to convert (valid partitions depend on the dataset):
  - SNLI: `train`, `val`, `test`
  - RobustQA: `all`
  - RAGTruth: `train`, `test`
  - FactScore: `chatgpt`, `instructgpt`, `perplexityai`
- `--start-num`: Starting index in the dataset (typically `0`)
- `--model`: Model for generation (same options as above)
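Because valid partitions vary by dataset, a small lookup table can catch typos before a generation run is launched. This helper is a sketch based solely on the options listed above; it is not provided by the repository:

```python
# Valid --partition values per source dataset, as documented above.
VALID_PARTITIONS = {
    "snli": {"train", "val", "test"},
    "robustqa": {"all"},
    "ragtruth": {"train", "test"},
    "factscore": {"chatgpt", "instructgpt", "perplexityai"},
}

def check_partition(dataset, partition):
    """Raise early if the partition is not valid for the chosen dataset."""
    valid = VALID_PARTITIONS.get(dataset)
    if valid is None:
        raise ValueError(f"Unknown dataset: {dataset!r}")
    if partition not in valid:
        raise ValueError(
            f"{partition!r} is not valid for {dataset}; choose from {sorted(valid)}"
        )
    return True

print(check_partition("snli", "val"))  # True
```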
## Adapting a new dataset

To adapt a new dataset to QC-NLI format:

1. Create a class extending `ExampleGenerator` in `src/generator.py`.
2. Implement the required methods:
   - `read_data(self)`: Load your dataset
   - `generate(self, idx)`: Convert the `idx`-th data example to QC-NLI format

Example structure:

```python
class YourDatasetGenerator(ExampleGenerator):
    def __init__(self, **kwargs):
        self.dname = 'your-dataset-name'
        super().__init__(**kwargs)

    def read_data(self):
        # Load your dataset
        pass

    def generate(self, idx):
        # Convert the idx-th example to QC-NLI format
        pass
```

## Citation

Coming soon!
## License

This library is licensed under the CC-BY-4.0 License.

## Contributing

See CONTRIBUTING for more information. For questions or issues, please contact marc.canby@gmail.com or open an issue on GitHub.