Add tests by ethan-tonic · Pull Request #26 · TonicAI/textual

ethan-tonic · 2025-03-19T16:30:47Z

No description provided.

gandersteele

Looks good, but we need coverage of synthesis mode; see the comments above. Once the env vars are added to secrets, we'll merge.

gandersteele · 2025-03-22T17:28:05Z

+def check_dataset_str(original_text: str, dataset_str: str):
+    # Extract all redacted portions using regex pattern for [ENTITY_TYPE_*]
+    redaction_pattern = r"\[([A-Z_]+)(?:_[a-zA-Z0-9]+)?\]"
+    redactions = re.findall(redaction_pattern, dataset_str)
+
+    # Replace all redactions with empty string to get the non-redacted text
+    non_redacted_text = re.sub(redaction_pattern, "", dataset_str)
+
+    # Check if the non-redacted portions exist in the original text
+    for segment in non_redacted_text.split():
+        if segment.strip():  # Skip empty segments
+            assert segment in original_text, (
+                f"Non-redacted segment '{segment}' not found in original text"
+            )
+
+    # Ensure we found at least one redaction
+    assert len(redactions) > 0, "No redactions found in the dataset string"


This is good, but note that it doesn't apply in synthesis mode. I'd suggest a similar method that
1. asserts len(spans) > 0
2. asserts that original_text[span['start']:span['end']] == span['text']
3. asserts that dataset_str[span['new_start']:span['new_end']] == span['new_text']
This is a slightly different test than yours, so it can be done in addition to it, but the main point is that it exercises synthesis mode as well. We could add additional checks that, in synthesis mode, the replacement text doesn't contain the standard redaction pattern.

This check is only used for tests that don't test with synthesis anyhow. The other tests have their own checks for this stuff.
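
For reference, the span-based check suggested in the review could look roughly like the sketch below. This is not code from the PR: the name `check_spans`, the `synthesis` flag, and the assumption that each span is a plain dict with the keys quoted in the review (`start`, `end`, `text`, `new_start`, `new_end`, `new_text`) are all hypothetical. The regex is copied from `check_dataset_str` in the diff.

```python
import re

# Same pattern used by check_dataset_str in the diff above.
REDACTION_PATTERN = r"\[([A-Z_]+)(?:_[a-zA-Z0-9]+)?\]"


def check_spans(original_text, dataset_str, spans, synthesis=False):
    """Hypothetical helper: validate span offsets against both texts.

    Assumes each span is a dict carrying the keys named in the review
    comment; the real SDK objects may be shaped differently.
    """
    # 1. At least one entity must have been detected.
    assert len(spans) > 0, "No spans found"

    for span in spans:
        # 2. Offsets into the original text point at the detected text.
        assert original_text[span["start"]:span["end"]] == span["text"]
        # 3. Offsets into the output text point at the replacement text.
        assert dataset_str[span["new_start"]:span["new_end"]] == span["new_text"]
        # Optional: synthesized replacements should not look like redaction tokens.
        if synthesis:
            assert re.search(REDACTION_PATTERN, span["new_text"]) is None, (
                f"Synthesized text '{span['new_text']}' matches the redaction pattern"
            )


# Usage with made-up data; offsets are computed rather than hard-coded.
original = "My name is John."
synthesized = "My name is Alan."
start = original.index("John")
example_spans = [{
    "start": start, "end": start + len("John"), "text": "John",
    "new_start": start, "new_end": start + len("Alan"), "new_text": "Alan",
}]
check_spans(original, synthesized, example_spans, synthesis=True)
```

Because it relies only on offsets, the same helper would work for redaction output as well (with `synthesis=False`), so it could complement `check_dataset_str` rather than replace it.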

gandersteele

LGTM

ethan-tonic added 5 commits March 19, 2025 11:59

Initial add

bef4bcd

Remove reqs

0b831bc

Fix up workflows

8b3c777

Add environment variables for test execution in GitHub Actions

98b66f3

Update resource path validation to check for 'tests' directory instead of 'textual_sdk_tests'

01ebe5b

gandersteele requested changes Mar 22, 2025

ethan-tonic added 11 commits April 4, 2025 17:37

Fixed

4b31ec1

Update file type check in verify_tables_by_file_type to include 'xlsx' format

fd99ba2

Fixes

65d48b0

Merge branch 'main' into add-tests

6685ca5

Ruff

c52c2f1

Add S3 bucket configurations to pytest workflow

e09610a

Fix tests

17408ee

Fix matching

14bd066

Refactor poll_until_file_rescans to accept a content-checking function for improved flexibility in dataset validation

e051231

Up retries

8a4e03b

Fixed

25199d0

ethan-tonic requested a review from gandersteele April 5, 2025 03:30

ethan-tonic added 4 commits April 4, 2025 23:38

Update S3 pipeline test to use new sample files and handle file deletion before upload

b0cb87f

Remove upload

430303b

Ruff

89fd65b

Update files

e323e1f

gandersteele approved these changes Apr 7, 2025

ethan-tonic merged commit 4cf99cc into main Apr 7, 2025
2 checks passed

ethan-tonic merged 20 commits into main from add-tests
