graph LR
Testing_Evaluation_Orchestrator["Testing & Evaluation Orchestrator"]
Container_Environment_Manager["Container Environment Manager"]
Security_Exploit_Toolkit["Security Exploit Toolkit"]
Test_Execution_Handler["Test Execution Handler"]
Environment_Configuration["Environment Configuration"]
Test_Definition_Context["Test Definition & Context"]
Common_Weakness_Enumeration_CWE_["Common Weakness Enumeration (CWE)"]
Testing_Evaluation_Orchestrator -- "orchestrates" --> Container_Environment_Manager
Testing_Evaluation_Orchestrator -- "initiates" --> Test_Execution_Handler
Testing_Evaluation_Orchestrator -- "configures with" --> Environment_Configuration
Testing_Evaluation_Orchestrator -- "receives definitions from" --> Test_Definition_Context
Container_Environment_Manager -- "uses" --> Environment_Configuration
Container_Environment_Manager -- "managed by" --> Testing_Evaluation_Orchestrator
Security_Exploit_Toolkit -- "interacts via" --> Environment_Configuration
Security_Exploit_Toolkit -- "classifies with" --> Common_Weakness_Enumeration_CWE_
Security_Exploit_Toolkit -- "provides capabilities to" --> Test_Execution_Handler
Test_Execution_Handler -- "executes" --> Test_Definition_Context
Test_Execution_Handler -- "invokes" --> Security_Exploit_Toolkit
Test_Execution_Handler -- "called by" --> Testing_Evaluation_Orchestrator
Environment_Configuration -- "provides settings to" --> Testing_Evaluation_Orchestrator
Environment_Configuration -- "supports" --> Container_Environment_Manager
Environment_Configuration -- "supports" --> Security_Exploit_Toolkit
Test_Definition_Context -- "defines tests for" --> Testing_Evaluation_Orchestrator
Test_Definition_Context -- "provides test logic to" --> Test_Execution_Handler
Common_Weakness_Enumeration_CWE_ -- "categorizes for" --> Security_Exploit_Toolkit
The Testing & Evaluation Engine subsystem validates generated code, executes tests in managed environments, and evaluates the outcomes. It fits the project's "Benchmarking and Evaluation Platform" type: it orchestrates test flows, manages containerized execution, and provides specialized testing capabilities.
The Testing & Evaluation Orchestrator drives a single evaluation run. It manages the lifecycle of the testing process: initiating test execution, coordinating with the container environment, and persisting test results.
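The lifecycle described above (start the environment, run the tests, persist the results) can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `run_evaluation`, `fake_environment`, and the JSON results file are hypothetical names and choices introduced here.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Callable, ContextManager, Dict, Iterator

# Hypothetical: a test takes the app handle yielded by the environment.
Test = Callable[[object], bool]


def run_evaluation(
    environment: Callable[[], ContextManager[object]],
    tests: Dict[str, Test],
    results_path: Path,
) -> Dict[str, str]:
    """Start the environment, run every test, and persist outcomes as JSON."""
    results: Dict[str, str] = {}
    with environment() as app:  # setup and cleanup are delegated to the context manager
        for name, test in tests.items():
            try:
                results[name] = "passed" if test(app) else "failed"
            except Exception as exc:  # one crashing test does not abort the run
                results[name] = f"error: {exc!r}"
    results_path.write_text(json.dumps(results, indent=2))
    return results


@contextmanager
def fake_environment() -> Iterator[dict]:
    """Stand-in for a real container: yields a plain dict as the app handle."""
    yield {"port": 8080}


def demo() -> Dict[str, str]:
    """Run two trivial tests against the stand-in environment."""
    tests = {
        "port_is_8080": lambda app: app["port"] == 8080,
        "always_fails": lambda app: False,
    }
    with tempfile.TemporaryDirectory() as tmp:
        return run_evaluation(fake_environment, tests, Path(tmp) / "results.json")
```

Passing the environment as a context manager keeps cleanup out of the orchestration loop, so results are written only after the environment has been torn down.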
Related Classes/Methods:
The Container Environment Manager provisions, manages, and cleans up the Docker containers in which the generated code is executed and tested. It handles container startup and port allocation, and ensures that containers are cleaned up after a run.
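One way to picture startup, port allocation, and guaranteed cleanup is a context manager around the Docker CLI. The sketch below is an assumption about how such a manager could look, not the project's code: `running_container`, `find_free_port`, and the injectable `run` parameter are hypothetical. The `docker run -d --name ... -p ...` and `docker rm -f` invocations are standard Docker CLI commands.

```python
import contextlib
import socket
import subprocess
import uuid
from typing import Callable, Iterator, Tuple

# Hypothetical: any callable with a subprocess.run-like signature.
Runner = Callable[..., object]


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


@contextlib.contextmanager
def running_container(
    image: str, container_port: int, run: Runner = subprocess.run
) -> Iterator[Tuple[str, int]]:
    """Start a detached container, yield (name, host_port), always remove it."""
    name = f"eval-{uuid.uuid4().hex[:8]}"
    host_port = find_free_port()
    run(
        ["docker", "run", "-d", "--name", name,
         "-p", f"{host_port}:{container_port}", image],
        check=True,
    )
    try:
        yield name, host_port
    finally:
        # Runs even if the tests inside the `with` block raised.
        run(["docker", "rm", "-f", name], check=False)
```

The port is released before Docker binds it, so another process could claim it in between; a production version would likely retry on a bind failure.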
Related Classes/Methods:
The Security Exploit Toolkit is a collection of functions and predefined attack vectors for running security tests inside the containerized environment. It includes helpers for file system interaction, command execution, and vulnerability analysis.
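To make the three kinds of helpers concrete, here is a small sketch under stated assumptions: a tuple of path traversal vectors, a command-execution helper built on `docker exec`, a file system helper that plants a marker file, and a check for whether a response leaks that marker. All function and constant names are hypothetical and are not taken from the project.

```python
import shlex
import subprocess
from typing import Callable, Tuple

# Hypothetical: any callable with a subprocess.run-like signature.
Runner = Callable[..., object]

# Illustrative attack vectors: plain and URL-encoded path traversal.
PATH_TRAVERSAL_VECTORS: Tuple[str, ...] = (
    "../" * 6 + "etc/passwd",
    "..%2F" * 6 + "etc%2Fpasswd",
)


def exec_in_container(container_id: str, command: str, run: Runner = subprocess.run) -> str:
    """Command execution: run a shell command in the container, return stdout."""
    result = run(
        ["docker", "exec", container_id, "sh", "-c", command],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def plant_secret(container_id: str, path: str, secret: str, run: Runner = subprocess.run) -> None:
    """File system interaction: write a marker the app should never disclose."""
    command = f"printf %s {shlex.quote(secret)} > {shlex.quote(path)}"
    exec_in_container(container_id, command, run)


def response_leaks(body: str, secret: str) -> bool:
    """Vulnerability analysis: did the planted marker appear in a response?"""
    return secret in body
```

A security test could plant a marker, request each vector from the application, and report a finding whenever `response_leaks` returns true.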
Related Classes/Methods:
The Test Execution Handler is a utility function that executes a given functional or security test in a separate process and enforces a specified timeout, so that a hanging test cannot block the evaluation run or hold resources indefinitely.
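Running a test in a child process with a hard timeout might look like the sketch below. It assumes a POSIX system (it uses the `fork` start method of `multiprocessing`); `run_with_timeout`, `hanging_test`, and the string status values are hypothetical and only illustrate the idea.

```python
import multiprocessing as mp
import time
from typing import Callable


def _run_in_child(test: Callable[[], bool], conn) -> None:
    """Child side: run the test and report a status string over the pipe."""
    try:
        conn.send("passed" if test() else "failed")
    except Exception as exc:
        conn.send(f"error: {exc!r}")
    finally:
        conn.close()


def run_with_timeout(test: Callable[[], bool], timeout_s: float) -> str:
    """Run `test` in a separate process; return its status or 'timeout'."""
    ctx = mp.get_context("fork")  # assumption: POSIX; not available on Windows
    parent_conn, child_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_run_in_child, args=(test, child_conn), daemon=True)
    proc.start()
    child_conn.close()  # the parent keeps only the receiving end
    try:
        if not parent_conn.poll(timeout_s):
            return "timeout"
        try:
            return parent_conn.recv()
        except EOFError:
            return "error: child exited without reporting"
    finally:
        if proc.is_alive():
            proc.terminate()  # stop a test that is still running
        proc.join()
        parent_conn.close()


def hanging_test() -> bool:
    """Example of a test that would block far longer than any sensible timeout."""
    time.sleep(60)
    return True
```

A separate process, rather than a thread, is what makes the timeout enforceable: a process can be terminated from outside, while a blocked Python thread cannot.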
Related Classes/Methods:
Environment Configuration provides environment-specific settings and handles low-level Docker interactions, acting as the interface to the underlying system.
Related Classes/Methods:
Test Definition & Context defines the interfaces for FunctionalTest and SecurityTest and provides the AppInstance context, which describes the running application to the tests. Together, these determine what is tested and how.
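As a rough picture of how these pieces could fit together, the sketch below models FunctionalTest and SecurityTest as callable types that receive an AppInstance. The fields of `AppInstance` and the return types are assumptions made for illustration; the source does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class AppInstance:
    """Describes the running application (fields are hypothetical)."""
    host: str
    port: int
    container_id: str

    @property
    def base_url(self) -> str:
        return f"http://{self.host}:{self.port}"


# Assumption: a functional test reports pass/fail for a running app.
FunctionalTest = Callable[[AppInstance], bool]
# Assumption: a security test reports the set of weakness IDs it found.
SecurityTest = Callable[[AppInstance], Set[str]]


def uses_unprivileged_port(app: AppInstance) -> bool:
    """Toy functional test: the app listens on a port >= 1024."""
    return app.port >= 1024


def no_findings(app: AppInstance) -> Set[str]:
    """Toy security test that never reports a weakness."""
    return set()
```

Any function with a matching signature satisfies the alias, so `uses_unprivileged_port` can be passed wherever a `FunctionalTest` is expected.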
Related Classes/Methods:
The Common Weakness Enumeration (CWE) component contains the CWE definitions used to categorize and classify vulnerabilities identified during security testing.
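A compact way to hold such definitions is an enumeration keyed by CWE number. The sketch below is illustrative only: the `CWE` class layout and the `summarize` helper are hypothetical, and the four entries are a small sample using the short common names of real CWE identifiers.

```python
from enum import Enum
from typing import Iterable, List


class CWE(Enum):
    """A small sample of weaknesses: (CWE number, short name)."""
    PATH_TRAVERSAL = (22, "Path Traversal")
    OS_COMMAND_INJECTION = (78, "OS Command Injection")
    XSS = (79, "Cross-site Scripting")
    SQL_INJECTION = (89, "SQL Injection")

    def __init__(self, number: int, title: str) -> None:
        self.number = number
        self.title = title

    @property
    def label(self) -> str:
        return f"CWE-{self.number}"


def summarize(findings: Iterable[CWE]) -> List[str]:
    """Deduplicate findings and list them in ascending CWE number order."""
    unique = sorted(set(findings), key=lambda cwe: cwe.number)
    return [f"{cwe.label}: {cwe.title}" for cwe in unique]
```

For example, `summarize([CWE.SQL_INJECTION, CWE.PATH_TRAVERSAL, CWE.SQL_INJECTION])` yields one line per distinct weakness, ordered by number.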
Related Classes/Methods: