When running `from promptsource.seqio_tasks import tasks`, the import takes a huge amount of time. One of the main reasons is that it queries the dataset info of every dataset (`promptsource/promptsource/seqio_tasks/tasks.py`, line 84 in dba1d41):

```python
dataset_splits = utils.get_dataset_splits(dataset_name, subset_name)
```

This is problematic for two reasons:
- It is slow: dataset info is fetched for every dataset at import time, even if only a few tasks are needed.
- It does not work well with `HF_DATASETS_OFFLINE=1`, as described in #703 (comment) (Transfer `promptsource.seqio_tasks` to https://github.com/bigscience-workshop/t-zero).

IMO both are unnecessary and should be fixed. Is there a reason why one cannot load seqio tasks dynamically, in the sense of fetching only what is necessary? Something along the lines of:

```python
import seqio

def add_seqio_task(task_name):
    # Register only the requested task, instead of every task at import time
    seqio.TaskRegistry.add(...)
```
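To make the idea a bit more concrete, here is a minimal sketch, assuming the expensive part is the per-dataset split lookup. `LazyTaskRegistry`, `declare`, `get`, and `fetch_dataset_splits` are hypothetical names for illustration only, not promptsource or seqio APIs. `fetch_dataset_splits` just stands in for `utils.get_dataset_splits` and returns fixed splits. In a real version, `build` would presumably call `seqio.TaskRegistry.add(...)` with whatever arguments `tasks.py` passes today.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for utils.get_dataset_splits. It records each call so
# we can see when the (normally slow, network-bound) lookup actually happens.
CALLS: List[Tuple[str, str]] = []


def fetch_dataset_splits(dataset_name: str, subset_name: str) -> List[str]:
    CALLS.append((dataset_name, subset_name))
    return ["train", "validation"]


class LazyTaskRegistry:
    """Hypothetical registry: declaring a task is cheap, building it is deferred."""

    def __init__(self) -> None:
        self._builders: Dict[str, Callable[[], dict]] = {}
        self._built: Dict[str, dict] = {}

    def declare(self, task_name: str, dataset_name: str, subset_name: str) -> None:
        # Only a closure is stored here; no dataset info is fetched.
        def build() -> dict:
            splits = fetch_dataset_splits(dataset_name, subset_name)
            # A real version would call seqio.TaskRegistry.add(...) here.
            return {"name": task_name, "splits": splits}

        self._builders[task_name] = build

    def get(self, task_name: str) -> dict:
        # Build the task on first request and cache it for later calls.
        if task_name not in self._built:
            self._built[task_name] = self._builders[task_name]()
        return self._built[task_name]


# Usage: declaring tasks (what the import would do) fetches nothing.
registry = LazyTaskRegistry()
for name, subset in [("super_glue", "rte"), ("glue", "mrpc")]:
    registry.declare(f"{name}_{subset}", name, subset)
assert CALLS == []

# Only the requested task triggers a lookup.
task = registry.get("glue_mrpc")
```

With this shape, the import cost no longer scales with the number of datasets, and offline runs only need the datasets that are actually used.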