Accept compressed files as input to `predict` when using a `Predictor` #5237

**Is your feature request related to a problem? Please describe.**
I typically use compressed datasets (e.g. gzipped) to save disk space. This works fine with AllenNLP during training because I can write my dataset reader to load the compressed data. However, the `predict` command opens the input file itself in plain text mode and passes each line to the `Predictor`, so it fails when the input is one of my compressed files.

https://github.com/allenai/allennlp/blob/39d7e5ae06551fe371d3e16f4d93162e55ec5dcc/allennlp/commands/predict.py#L208-L218

**Describe the solution you'd like**
Either automatically detect that the file is compressed, or add a flag to `predict` that indicates that the file is compressed. One method that I have used to detect whether a file is gzipped is described [here](https://stackoverflow.com/questions/3703276/how-to-tell-if-a-file-is-gzip-compressed), although it isn't 100% accurate. I have an implementation [here](https://github.com/danieldeutsch/sacrerouge/blob/master/sacrerouge/io/util.py). Otherwise, a flag like `--compression-type` that states how the file is compressed should be sufficient. Passing the type of compression would allow support for gzip, bz2, or any other method.
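
To make the auto-detection option concrete, here is a minimal sketch that checks the leading "magic" bytes of the file and picks an opener accordingly. The helper name `open_maybe_compressed` is hypothetical and not part of AllenNLP, and the check is a heuristic: a plain-text file that happens to start with the same bytes would be misdetected.

```python
import bz2
import gzip
from typing import IO

# Leading "magic" bytes that identify each supported format.
_MAGIC_OPENERS = {
    b"\x1f\x8b": gzip.open,  # gzip
    b"BZh": bz2.open,        # bzip2
}


def open_maybe_compressed(path: str) -> IO[str]:
    """Hypothetical helper: open `path` as text, decompressing if a known magic number is found."""
    # Peek at the first few bytes in binary mode.
    with open(path, "rb") as raw:
        head = raw.read(3)
    for magic, opener in _MAGIC_OPENERS.items():
        if head.startswith(magic):
            return opener(path, "rt", encoding="utf-8")
    # No known magic number: treat the file as plain text.
    return open(path, "r", encoding="utf-8")
```

With something like this, `predict` could iterate over `open_maybe_compressed(input_file)` in place of the plain `open` call without needing a new flag.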




    def _get_json_data(self) -> Iterator[JsonDict]:
        if self._input_file == "-":
            for line in sys.stdin:
                if not line.isspace():
                    yield self._predictor.load_line(line)
        else:
            input_file = cached_path(self._input_file)
            with open(input_file, "r") as file_input:
                for line in file_input:
                    if not line.isspace():
                        yield self._predictor.load_line(line)
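
For the explicit-flag option, the file branch of the loop above could look up an opener from the value of the proposed `--compression-type` flag. The sketch below is standalone and only illustrative: the function name `iter_nonblank_lines`, the `compression_type` parameter, and the accepted values (`"gzip"`, `"bz2"`, `"xz"`) are assumptions, not existing AllenNLP options.

```python
import bz2
import gzip
import lzma
from typing import Callable, Dict, Iterator, Optional

# Hypothetical mapping from a --compression-type value to a text-mode opener.
_OPENERS: Dict[Optional[str], Callable] = {
    None: open,  # no compression
    "gzip": gzip.open,
    "bz2": bz2.open,
    "xz": lzma.open,
}


def iter_nonblank_lines(path: str, compression_type: Optional[str] = None) -> Iterator[str]:
    """Yield the non-blank lines of `path`, decompressing according to `compression_type`."""
    if compression_type not in _OPENERS:
        raise ValueError(f"Unsupported compression type: {compression_type!r}")
    opener = _OPENERS[compression_type]
    with opener(path, "rt", encoding="utf-8") as file_input:
        for line in file_input:
            # Same blank-line filter as the existing loop.
            if not line.isspace():
                yield line
```

Each yielded line would then be passed to `self._predictor.load_line(line)` exactly as in the current code, so only the way the file is opened changes.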
