Checklist
Description
The consistency parser adds white spaces before and after special characters like ".?-," and etc.
For example, for the sentence: "Hi there, I'm LawlAoux." the output for the root is "Hi there , I 'm LawlAoux ."(full tree in details).
Details
{'word': "Hi there , I 'm LawlAoux .",
'nodeType': 'S',
'attributes': ['S'],
'link': 'S',
'children': [{'word': 'Hi there',
'nodeType': 'INTJ',
'attributes': ['INTJ'],
'link': 'INTJ',
'children': [{'word': 'Hi',
'nodeType': 'UH',
'attributes': ['UH'],
'link': 'UH'},
{'word': 'there',
'nodeType': 'ADVP',
'attributes': ['ADVP'],
'link': 'ADVP',
'children': [{'word': 'there',
'nodeType': 'RB',
'attributes': ['RB'],
'link': 'RB'}]}]},
{'word': ',', 'nodeType': ',', 'attributes': [','], 'link': ','},
{'word': 'I',
'nodeType': 'NP',
'attributes': ['NP'],
'link': 'NP',
'children': [{'word': 'I',
'nodeType': 'PRP',
'attributes': ['PRP'],
'link': 'PRP'}]},
{'word': "'m LawlAoux",
'nodeType': 'VP',
'attributes': ['VP'],
'link': 'VP',
'children': [{'word': "'m",
'nodeType': 'VBP',
'attributes': ['VBP'],
'link': 'VBP'},
{'word': 'LawlAoux',
'nodeType': 'NP',
'attributes': ['NP'],
'link': 'NP',
'children': [{'word': 'LawlAoux',
'nodeType': 'NNP',
'attributes': ['NNP'],
'link': 'NNP'}]}]},
{'word': '.', 'nodeType': '.', 'attributes': ['.'], 'link': '.'}]}
As you can see, it adds spaces for the entire sentence for the root of the tree.
Related issues or possible duplicates
Environment
OS: OS X
Python version: 3.9.7
Output of pip freeze:
absl-py==1.1.0
agenda @ git+https://github.com/hyroai/agenda.git@1321e49ec433d62901e1e00dfd546c00d43db544
aiohttp==3.8.1
aioredis==2.0.1
aiosignal==1.2.0
allennlp==2.9.3
allennlp-models==2.9.3
analysis @ git+https://gitlab.com/airbud/analysis.git@d3829b9896c9a8af6a498a859695b3a00fa3556d
anyio==3.6.1
appdirs==1.4.4
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.2
asgiref==3.5.2
asttokens==2.0.5
astunparse==1.6.3
async-cache==1.1.1
async-generator==1.10
async-lru==1.0.3
async-timeout==4.0.2
asyncio==3.4.3
attrs==21.4.0
azure-common==1.1.28
azure-storage-blob==2.1.0
azure-storage-common==2.1.0
Babel==2.10.2
backcall==0.2.0
base58==2.1.1
beautifulsoup4==4.11.1
bleach==5.0.0
blis==0.7.7
boto3==1.24.8
botocore==1.27.8
breadability==0.1.20
cached-path==1.1.3
cachetools==4.2.4
catalogue==2.0.7
certifi==2022.5.18.1
cffi==1.15.0
cfgv==3.3.1
chardet==4.0.0
charset-normalizer==2.0.12
click==7.1.2
-e git+https://github.com/hyroai/cloud-utils.git@df2c01628b807064cda2a408b897cc822fb8558c#egg=cloud_utils
commonmark==0.9.1
computation-graph==38
conllu==4.4.1
coverage==6.4.1
cryptography==37.0.2
cycler==0.11.0
cymem==2.0.6
dataclass-type-validator==0.1.2
dataclasses==0.6
dataclasses-json==0.5.5
datasets==2.2.2
dateparser==1.1.1
ddsketch==2.0.3
ddtrace==1.1.4
debugpy==1.6.0
decorator==5.1.1
defusedxml==0.7.1
Deprecated==1.2.13
dill==0.3.4
distlib==0.3.4
dnspython==2.2.1
docker-pycreds==0.4.0
docopt==0.6.2
docutils==0.18.1
ecdsa==0.17.0
emoji==1.7.0
en-core-web-lg @ https://github.com/explosion/spacy-models/releases/download/en_core_web_lg-3.2.0/en_core_web_lg-3.2.0-py3-none-any.whl
en-core-web-sm @ https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.2.0/en_core_web_sm-3.2.0-py3-none-any.whl
entrypoints==0.4
et-xmlfile==1.1.0
execnet==1.9.0
executing==0.8.3
fairscale==0.4.6
falcon==3.1.0
fastapi==0.78.0
fastjsonschema==2.15.3
filelock==3.6.0
findimports==2.2.0
flaky==3.7.0
flatbuffers==1.12
fonttools==4.33.3
frozenlist==1.3.0
fsspec==2022.5.0
ftfy==6.1.1
fuzzyset2==0.1.1
gamla==132
gast==0.4.0
Geohash @ https://github.com/uriva/geohash/tarball/master
geotext==0.4.0
gitdb==4.0.9
GitPython==3.1.27
google-api-core==1.31.6
google-api-python-client==2.50.0
google-auth==1.35.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-cloud-core==2.3.1
google-cloud-dlp==1.0.0
google-cloud-kms==1.4.0
google-cloud-storage==2.4.0
google-crc32c==1.3.0
google-pasta==0.2.0
google-resumable-media==2.3.3
googleapis-common-protos==1.56.2
grpc-google-iam-v1==0.12.4
grpcio==1.46.3
gspread==5.4.0
gunicorn==20.1.0
h11==0.13.0
h5py==3.7.0
heapq-max==0.21
html2text==2020.1.16
httpcore==0.13.2
httplib2==0.20.4
httptools==0.2.0
httpx==0.18.1
httpx-auth==0.10.0
huggingface-hub==0.7.0
humanize==4.1.0
identify==2.5.1
idna==3.3
immutables==0.18
importlib-metadata==4.11.4
inflect==5.6.0
iniconfig==1.1.1
ipykernel==6.14.0
ipython==8.4.0
ipython-genutils==0.2.0
ipywidgets==7.7.0
iso4217parse==0.5.1
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.0
joblib==1.1.0
json5==0.9.8
jsonnet==0.18.0
jsonschema==4.6.0
jupyter-client==7.3.4
jupyter-core==4.10.0
jupyter-server==1.17.1
jupyterlab==3.4.3
jupyterlab-pygments==0.2.2
jupyterlab-server==2.14.0
jupyterlab-widgets==1.1.0
keras==2.9.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.3
-e git+https://github.com/hyroai/knowledge-graph.git@a3030683577ed4a3828ea72ff71842dab50c1b1c#egg=knowledge_graph
kubernetes==23.6.0
langcodes==3.3.0
libclang==14.0.1
lmdb==1.3.0
lxml==4.9.0
marisa-trie==0.7.7
Markdown==3.3.7
markdownify==0.11.2
MarkupSafe==2.1.1
marshmallow==3.16.0
marshmallow-enum==1.5.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mistune==0.8.4
mona-sdk==0.0.24
more-itertools==8.13.0
motor==2.5.1
mpu==0.23.1
multidict==6.0.2
multiprocess==0.70.12.2
murmurhash==1.0.7
mypy-extensions==0.4.3
names-dataset==3.1.0
nbclassic==0.3.7
nbclient==0.6.4
nbconvert==6.5.0
nbformat==5.4.0
nest-asyncio==1.5.5
nltk==3.7
-e git+ssh://git@gitlab.com/airbud/nlu-lib.git@e9c3577d3761cc645706d670c85c6727abc4f3cf#egg=nlu
nodeenv==1.6.0
nostril @ https://github.com/casics/nostril/tarball/master
notebook==6.4.12
notebook-shim==0.1.0
number-parser==0.2.1
numpy==1.22.4
oauth2client==4.1.3
oauthlib==3.2.0
openpyxl==3.0.10
opt-einsum==3.3.0
outcome==1.1.0
overrides==6.1.0
packaging==21.3
pandas==1.4.2
pandocfilters==1.5.0
parsec==3.13
parso==0.8.3
pathtools==0.1.2
pathy==0.6.1
pexpect==4.8.0
phonenumbers==8.12.24
pickleshare==0.7.5
Pillow==9.1.1
plac==1.3.5
platformdirs==2.5.2
plotly==5.8.2
pluggy==1.0.0
pre-commit==2.19.0
preshed==3.0.6
presidio-analyzer==2.2.28
prometheus-async==22.2.0
prometheus-client==0.14.1
promise==2.3
prompt-toolkit==3.0.29
protobuf==3.19.4
psutil==5.9.1
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
py-rouge==1.1
pyap==0.3.1
pyarrow==8.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycld2==0.41
pycountry==22.3.5
pycparser==2.21
pydantic==1.7.4
pyee==8.2.2
Pygments==2.12.0
pyhumps==3.7.1
PyJWT==1.7.1
pymongo==3.12.3
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyppeteer==1.0.2
pyrsistent==0.18.1
PySocks==1.7.1
pytest==7.1.2
pytest-asyncio==0.18.3
pytest-forked==1.4.0
pytest-instafail==0.4.2
pytest-mock==3.7.0
pytest-repeat==0.9.1
pytest-sugar==0.9.4
pytest-test-groups==1.0.3
pytest-timeout==2.1.0
pytest-tmnet @ https://testmon.org/static/c28870f08/pytest-tmnet-1.3.2.tar.gz
pytest-xdist==2.5.0
python-dateutil==2.8.2
python-dotenv==0.20.0
python-jose==3.3.0
python-Levenshtein==0.12.2
python-stdnum==1.17
pytimeparse==1.1.8
pytz==2022.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==5.4.1
pyzmq==23.1.0
redis==4.3.3
regex==2022.6.2
requests==2.28.0
requests-file==1.5.1
requests-mock==1.9.3
requests-oauthlib==1.3.1
responses==0.18.0
retrying==1.3.3
rfc3986==1.5.0
rich==12.4.4
rsa==4.8
s3transfer==0.6.0
sacremoses==0.0.53
scikit-learn==1.1.1
scipy==1.8.1
selenium==4.2.0
Send2Trash==1.8.0
sentencepiece==0.1.96
sentry-sdk==1.5.12
setproctitle==1.2.3
shortuuid==1.0.9
six==1.16.0
sklearn==0.0
smart-open==5.2.1
smmap==5.0.0
sniffio==1.2.0
sortedcontainers==2.4.0
soupsieve==2.3.2.post1
spacy==3.2.4
spacy-legacy==3.0.9
spacy-loggers==1.0.2
srsly==2.4.3
stack-data==0.2.0
starlette==0.19.1
stringcase==1.2.0
sumy==0.10.0
tabulate==0.8.9
tenacity==8.0.1
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.5.1
tensorflow==2.9.1
tensorflow-estimator==2.9.0
tensorflow-hub==0.12.0
tensorflow-io-gcs-filesystem==0.26.0
termcolor==1.1.0
terminado==0.15.0
thinc==8.0.17
threadpoolctl==3.1.0
tinycss2==1.1.1
tldextract==3.1.0
tokenizers==0.12.1
toml==0.10.2
tomli==2.0.1
toolz==0.11.2
toposort==1.7
torch==1.11.0
torchvision==0.12.0
tornado==6.1
tqdm==4.64.0
traitlets==5.2.2.post1
transformers==4.18.0
trio==0.21.0
trio-websocket==0.9.2
typeguard==2.13.3
typer==0.4.1
typing-inspect==0.7.1
typing-utils==0.1.0
typing_extensions==4.2.0
tzdata==2022.1
tzlocal==4.2
undetected-chromedriver==3.1.5.post4
Unidecode==1.3.4
uritemplate==4.1.1
urllib3==1.26.9
uvicorn==0.15.0
uvloop==0.16.0
virtualenv==20.14.1
wandb==0.12.18
wasabi==0.9.1
watchgod==0.8.2
wcwidth==0.2.5
webdriver-manager==3.7.0
webencodings==0.5.1
websocket-client==1.3.2
websockets==10.3
Werkzeug==2.1.2
widgetsnbextension==3.6.0
word2number==1.1
wptools @ https://github.com/hyroai/wptools/tarball/master
wrapt==1.14.1
wsproto==1.1.0
xmltodict==0.13.0
xxhash==3.0.0
yappi==1.3.5
yarl==1.7.2
yattag==1.14.0
zipp==3.8.0
Steps to reproduce
Example source:
_PARSER = pretrained.load_predictor("structured-prediction-constituency-parser")
_PARSER("Hi there, I'm LawlAoux.")["hierplane_tree"]["root"]
Checklist
mainbranch of AllenNLP.pip freeze.Description
The consistency parser adds white spaces before and after special characters like ".?-," and etc.
For example, for the sentence: "Hi there, I'm LawlAoux." the output for the root is "Hi there , I 'm LawlAoux ."(full tree in details).
Details
{'word': "Hi there , I 'm LawlAoux .", 'nodeType': 'S', 'attributes': ['S'], 'link': 'S', 'children': [{'word': 'Hi there', 'nodeType': 'INTJ', 'attributes': ['INTJ'], 'link': 'INTJ', 'children': [{'word': 'Hi', 'nodeType': 'UH', 'attributes': ['UH'], 'link': 'UH'}, {'word': 'there', 'nodeType': 'ADVP', 'attributes': ['ADVP'], 'link': 'ADVP', 'children': [{'word': 'there', 'nodeType': 'RB', 'attributes': ['RB'], 'link': 'RB'}]}]}, {'word': ',', 'nodeType': ',', 'attributes': [','], 'link': ','}, {'word': 'I', 'nodeType': 'NP', 'attributes': ['NP'], 'link': 'NP', 'children': [{'word': 'I', 'nodeType': 'PRP', 'attributes': ['PRP'], 'link': 'PRP'}]}, {'word': "'m LawlAoux", 'nodeType': 'VP', 'attributes': ['VP'], 'link': 'VP', 'children': [{'word': "'m", 'nodeType': 'VBP', 'attributes': ['VBP'], 'link': 'VBP'}, {'word': 'LawlAoux', 'nodeType': 'NP', 'attributes': ['NP'], 'link': 'NP', 'children': [{'word': 'LawlAoux', 'nodeType': 'NNP', 'attributes': ['NNP'], 'link': 'NNP'}]}]}, {'word': '.', 'nodeType': '.', 'attributes': ['.'], 'link': '.'}]}Related issues or possible duplicates
Environment
OS: OS X
Python version: 3.9.7
Output of
pip freeze:Steps to reproduce
Example source: