Help Wanted

Contributing to Optics Framework

We welcome contributions from the community! This document outlines areas where the Optics Framework could benefit from your help. Whether you're a beginner or an experienced developer, there's something for everyone.

Each section below describes the current state, the goal, implementation details, and priority and difficulty levels to help you choose where to contribute.

Priority and Difficulty Levels

Each item is tagged with:

Priority Levels:

High Priority: Critical for production use or security
Medium Priority: Important for usability and completeness
Low Priority: Nice to have improvements

Difficulty Levels:

Beginner: Good first contribution, minimal framework knowledge needed
Intermediate: Requires some framework knowledge
Advanced: Requires deep framework understanding

1. Stateless API Layer

Priority: High | Difficulty: Advanced

Current State

The API layer maintains state in SessionManager, which stores sessions in memory only. As a result, sessions cannot be migrated between instances, which makes horizontal scaling and recovery from instance failures difficult.

Goal

Make the API layer stateless so that sessions can be migrated from one instance to another without losing context. This enables:

Horizontal scaling of API instances
Session recovery after instance failures
Load balancing across multiple instances
Zero-downtime deployments

Key Areas

Session Storage

Current: Sessions stored in-memory only (optics_framework/common/session_manager.py)

Needed:

External session storage (database, Redis, etc.)
Session serialization/deserialization
Session state export/import functionality
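As a starting point for external storage, here is a minimal sketch assuming a hypothetical `SessionStore` interface (not part of the framework today): sessions are kept as JSON strings keyed by session ID, so the in-memory backend below could later be swapped for Redis or a database without changing callers.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class SessionStore(ABC):
    """Hypothetical storage backend for serialized session state."""

    @abstractmethod
    def save(self, session_id: str, state: Dict[str, Any]) -> None: ...

    @abstractmethod
    def load(self, session_id: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def delete(self, session_id: str) -> None: ...


class InMemorySessionStore(SessionStore):
    """Holds JSON strings, mirroring what a Redis or database backend would store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, session_id: str, state: Dict[str, Any]) -> None:
        # Serializing on save rejects state that is not JSON-compatible early
        self._data[session_id] = json.dumps(state)

    def load(self, session_id: str) -> Optional[Dict[str, Any]]:
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else None

    def delete(self, session_id: str) -> None:
        self._data.pop(session_id, None)
```

Because `save()` serializes immediately, anything that cannot be turned into JSON (a live driver object, for example) fails with a `TypeError` at save time rather than surfacing later during migration.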

Session Context

Current: A session holds driver instances, element sources, and vision models, none of which can be serialized as-is

Needed:

Serialize all session state (configuration, driver state, element sources, vision models)
Reconstruct session from serialized state
Handle driver reconnection on migration
Store driver session IDs and connection details
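One way to approach these items is a snapshot that keeps only plain data (configuration plus each driver's server URL and remote session ID) and leaves live objects such as driver instances and vision models to be rebuilt on the target instance. `SessionSnapshot`, `DriverState`, and their fields are hypothetical names for this sketch; reattaching to an existing remote session is driver-specific and not shown.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class DriverState:
    """Connection details needed to reattach to a running remote driver session."""
    driver_name: str        # e.g. "appium" or "selenium"
    server_url: str         # where the remote driver server is listening
    remote_session_id: str  # ID assigned by the remote driver server


@dataclass
class SessionSnapshot:
    """Plain-data view of a session; live objects are rebuilt from it."""
    session_id: str
    config: Dict[str, Any]
    drivers: List[DriverState] = field(default_factory=list)
    element_sources: List[str] = field(default_factory=list)  # names only
    vision_models: List[str] = field(default_factory=list)    # names only

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionSnapshot":
        # asdict() turns nested dataclasses into dicts, so rebuild DriverState explicitly
        drivers = [DriverState(**d) for d in data.get("drivers", [])]
        return cls(**{**data, "drivers": drivers})


snapshot = SessionSnapshot(
    session_id="s1",
    config={"driver": "appium"},
    drivers=[DriverState("appium", "http://127.0.0.1:4723", "abc123")],
)
restored = SessionSnapshot.from_dict(snapshot.to_dict())
print(restored == snapshot)  # True
```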

API Endpoints

New Endpoints Needed:

POST /v1/sessions/{id}/export - Export session state as JSON
POST /v1/sessions/import - Import session state and recreate session
POST /v1/sessions/{id}/migrate - Migrate session to another instance

Files to Modify

optics_framework/common/session_manager.py - Add serialization support
optics_framework/common/expose_api.py - Add migration endpoints
optics_framework/common/models.py - Add session state models (Pydantic)
optics_framework/common/optics_builder.py - Support session reconstruction

2. Parallel Strategy Execution

Priority: High | Difficulty: Advanced

Current State

Strategies are currently executed sequentially (one after another) in a fallback chain. When locating an element, the framework tries Strategy 1, waits for it to complete (success or failure), then tries Strategy 2, and so on. This sequential approach, while reliable, is slower than necessary because:

Independent strategies wait unnecessarily: Many strategies can run simultaneously since they don't depend on each other
Total execution time is the sum of all attempted strategies: If Strategy 1 takes 2s to fail and Strategy 2 takes 3s to succeed, the total is 5s, although running both concurrently would finish in about 3s
Resource underutilization: CPU, I/O, and network resources are idle while waiting for sequential execution

Example of Current Sequential Flow:

# Current: Sequential execution
1. Try XPathStrategy.find_element() → Wait 2s → Fail
2. Try TextElementStrategy.find_element() → Wait 1s → Fail
3. Try TextDetectionStrategy.find_element() → Wait 3s → Success
# Total time: 6 seconds

Goal

Implement parallel strategy execution where multiple independent strategies run simultaneously, significantly reducing element location time while maintaining the same reliability and fallback behavior.

Benefits:

Faster element location: Strategies execute concurrently, returning the first successful result
Better resource utilization: CPU, I/O, and network resources used efficiently
Improved user experience: Tests execute faster, especially with multiple fallback strategies
Maintains reliability: Still supports fallback, but optimizes for the common case

Example of Desired Parallel Flow:

# Desired: Parallel execution
1. Start XPathStrategy.find_element() → (runs in background)
2. Start TextElementStrategy.find_element() → (runs in background)
3. Start TextDetectionStrategy.find_element() → (runs in background)
4. First successful result returns → Total time: ~3 seconds (time until the first strategy succeeds)
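The timing difference can be reproduced with mock strategies whose delays are scaled down from the example above (20 ms, 10 ms, and 30 ms stand in for 2 s, 1 s, and 3 s). The strategy names, `mock_strategy`, and `first_success` are illustrative only, not framework code.

```python
import asyncio
import time


async def mock_strategy(name: str, delay: float, succeeds: bool):
    """Simulate a locator strategy that needs `delay` seconds to finish."""
    await asyncio.sleep(delay)
    return name if succeeds else None


STRATEGIES = [("xpath", 0.02, False), ("text", 0.01, False), ("ocr", 0.03, True)]


async def sequential():
    # Current behaviour: each strategy waits for the previous one
    for name, delay, ok in STRATEGIES:
        result = await mock_strategy(name, delay, ok)
        if result:
            return result
    return None


async def first_success():
    # Desired behaviour: start everything, take the first non-empty result
    tasks = [asyncio.create_task(mock_strategy(*s)) for s in STRATEGIES]
    try:
        for next_done in asyncio.as_completed(tasks):
            result = await next_done
            if result:
                return result
        return None
    finally:
        for task in tasks:
            task.cancel()  # abort anything still running


def timed(coro_fn):
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


seq_result, seq_time = timed(sequential)
par_result, par_time = timed(first_success)
print(f"sequential: {seq_result} in {seq_time:.3f}s, parallel: {par_result} in {par_time:.3f}s")
```

Both versions return `'ocr'`; the sequential run takes at least the sum of the delays, while the parallel run takes roughly the delay of the successful strategy.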

Execution Semantics

Critical Requirement: Only one strategy should execute the final keyword/action. Once a strategy successfully performs the keyword, all remaining strategies must be immediately aborted.

Execution Rules:

1. Parallel Location Phase: Multiple strategies run in parallel to locate the element
   All strategies attempt to find the element simultaneously
   First successful location result is selected
   Remaining location attempts are cancelled/aborted
2. Single Action Execution: Only the winning strategy executes the keyword/action
   The strategy that successfully located the element proceeds to execute the action
   Other strategies are aborted before they can execute any actions
   This prevents duplicate actions (e.g., clicking the same button twice)
3. Fallback Behavior: If the selected strategy fails to execute the keyword
   Abort the failed strategy
   Move to the next successful location result (if available)
   If no other strategies succeeded in location, try the next strategy in sequence

Example Flow:

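import asyncio

# Parallel location phase: XPath, text and OCR strategies all start at once
async def locate_and_execute(strategies, element, keyword="press_element"):
    tasks = {asyncio.create_task(s.locate(element)): s for s in strategies}
    pending = set(tasks)
    located = []  # (strategy, result) pairs that found the element

    # Wait until at least one strategy has located the element
    while pending and not located:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if not task.cancelled() and task.exception() is None and task.result():
                located.append((tasks[task], task.result()))

    # First success wins: abort location attempts that are still running
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)  # let cleanup finish

    # Only the winning strategy executes the keyword; if it fails,
    # fall back to the next strategy that had already located the element
    for strategy, result in located:
        try:
            await strategy.execute_keyword(keyword, result)
            return True  # Done! No other strategies execute
        except Exception:
            continue
    # (retrying the remaining strategies in sequence is omitted here)
    raise ElementNotFoundError()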

Implementation Requirements:

Cancellation Tokens: Abort strategies with Task.cancel(), which raises asyncio.CancelledError inside the strategy, or with explicit cancellation tokens
Early Exit: Return immediately when first strategy succeeds
Resource Cleanup: Properly clean up aborted strategies (close connections, release locks)
Error Handling: Handle cancellation gracefully without side effects
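Inside a single strategy, graceful cancellation can look like the sketch below: `Task.cancel()` raises `asyncio.CancelledError` at the strategy's current `await`, a `finally` block releases what the strategy acquired, and the error propagates so the caller sees the task as cancelled. `FakeConnection` is a hypothetical stand-in for a driver connection or lock.

```python
import asyncio


class FakeConnection:
    """Hypothetical resource that must be released even if a strategy is aborted."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def slow_strategy(conn: FakeConnection):
    try:
        await asyncio.sleep(10)  # long-running locate, e.g. OCR on a screenshot
        return "found"
    finally:
        # Runs on success, failure, and cancellation alike
        conn.close()


async def abort_after(delay: float, conn: FakeConnection) -> bool:
    task = asyncio.create_task(slow_strategy(conn))
    await asyncio.sleep(delay)
    task.cancel()
    try:
        await task  # wait for the strategy's cleanup to finish
    except asyncio.CancelledError:
        pass
    return task.cancelled()


conn = FakeConnection()
print(asyncio.run(abort_after(0.01, conn)), conn.closed)  # True True
```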

Key Scenarios

Scenario 1: XPath with Parallel Coordinate Discovery

Current Behavior:

When given an XPath, the framework:

1. Tries find_element() using the XPath directly
2. If that fails, tries other strategies sequentially

Desired Parallel Behavior:

When given an XPath, simultaneously:

Primary path: Execute find_element() using the XPath directly
Parallel path: Search page source to find the XPath and extract coordinates
Fallback path: If XPath is not accessible, use coordinates from page source

Implementation Flow:

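import asyncio

# Parallel execution
async def locate_with_xpath(element: str):
    tasks = [
        # Task 1: Direct findElement
        asyncio.create_task(find_element_direct(element)),

        # Task 2: Parse page source for XPath and get coordinates
        asyncio.create_task(parse_page_source_for_coordinates(element)),

        # Task 3: Alternative XPath variations
        asyncio.create_task(try_xpath_variations(element)),
    ]

    try:
        # Return the first successful result as soon as it arrives
        # (asyncio.gather would wait for the slowest task)
        for next_done in asyncio.as_completed(tasks):
            try:
                result = await next_done
            except Exception:
                continue  # this path failed; keep waiting for the others
            if result:
                return result
        raise ElementNotFoundError()
    finally:
        for task in tasks:
            task.cancel()  # abort paths still running (no-op once done)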

Scenario 2: Text with Parallel XPath Discovery and OCR

Current Behavior:

When given text, the framework:

1. Tries text-based element location
2. If that fails, tries OCR sequentially
3. If that fails, tries other strategies

Desired Parallel Behavior:

When given text, simultaneously:

Path A: Search page source to find text and convert to XPath, then execute find_element()
Path B: Capture screenshot and use OCR/text detection to find coordinates
Path C: Try direct text matching in element source

Implementation Flow:

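import asyncio

# Parallel execution for text
async def locate_with_text(text: str):
    tasks = [
        # Task 1: Page source → XPath → findElement
        asyncio.create_task(page_source_to_xpath_to_find_element(text)),

        # Task 2: Screenshot → OCR → Coordinates
        asyncio.create_task(screenshot_ocr_to_coordinates(text)),

        # Task 3: Direct text element location
        asyncio.create_task(direct_text_element_location(text)),
    ]

    try:
        # Return the first successful result as soon as it arrives
        # (asyncio.gather would wait for the slowest task)
        for next_done in asyncio.as_completed(tasks):
            try:
                result = await next_done
            except Exception:
                continue  # this path failed; keep waiting for the others
            if result:
                return result
        raise ElementNotFoundError()
    finally:
        for task in tasks:
            task.cancel()  # abort paths still running (no-op once done)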

Scenario 3: Image Template with Parallel Detection Methods

Current Behavior:

When given an image template, the framework tries image detection strategies sequentially.

Desired Parallel Behavior:

Run simultaneously:

1. Template matching (OpenCV)
2. Remote OIR service
3. Alternative image detection models
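Because `ImageInterface.find_element()` is synchronous, one way to run detectors concurrently without rewriting them is `asyncio.to_thread()` (Python 3.9+). In this sketch, two hypothetical detector classes stand in for TemplateMatch and RemoteOIR, and `time.sleep` stands in for real work.

```python
import asyncio
import time


class SlowTemplateMatch:
    """Stand-in for a CPU-bound local matcher (hypothetical)."""

    def find_element(self, frame, element, index=0):
        time.sleep(0.05)
        return None  # no match in this example


class FastRemoteDetector:
    """Stand-in for a remote detection service (hypothetical)."""

    def find_element(self, frame, element, index=0):
        time.sleep(0.01)
        return (120, 340)  # centre coordinates of the match


async def detect_parallel(detectors, frame, element):
    # Run each blocking find_element() call in a worker thread
    tasks = [
        asyncio.create_task(asyncio.to_thread(d.find_element, frame, element))
        for d in detectors
    ]
    for next_done in asyncio.as_completed(tasks):
        try:
            result = await next_done
        except Exception:
            continue  # one detector failing should not stop the others
        if result:
            return result
    return None


detectors = [SlowTemplateMatch(), FastRemoteDetector()]
print(asyncio.run(detect_parallel(detectors, None, "submit.png")))  # (120, 340)
```

Cancelling a `to_thread` task does not interrupt the thread: the blocking call runs to completion and its result is discarded, which is one reason the resource-cleanup requirements above matter for CPU-heavy detectors.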

Implementation Approach

Architecture Changes

1. Strategy Execution Model

Convert from sequential to parallel execution with abort capability:

import asyncio

# Current: Sequential
class StrategyManager:
    def locate(self, element: str):
        for strategy in self.strategies:
            result = strategy.locate(element)  # Blocks here
            if result:
                return result
        raise ElementNotFoundError()

# Desired: Parallel with abort
class StrategyManager:
    async def locate(self, element: str):
        # Create one cancellable task per strategy
        tasks = [
            asyncio.create_task(strategy.locate_async(element))
            for strategy in self.strategies
        ]

        try:
            # Handle tasks in completion order, not list order, so a fast
            # strategy is never stuck waiting behind a slow one
            for next_done in asyncio.as_completed(tasks):
                try:
                    result = await next_done
                except Exception:
                    continue  # this strategy failed; wait for the others
                if result:
                    return result  # first success wins
        finally:
            # Abort all remaining tasks (cancel() is a no-op on finished ones)
            for task in tasks:
                task.cancel()

        raise ElementNotFoundError()

2. Strategy Interface Updates

Make strategies async-compatible:

from abc import ABC, abstractmethod

# Current
class LocatorStrategy(ABC):
    def locate(self, element: str, index: int = 0):
        # Synchronous implementation
        pass

# Desired
class LocatorStrategy(ABC):
    @abstractmethod
    async def locate_async(self, element: str, index: int = 0):
        # Asynchronous implementation, provided by each strategy
        ...

    def locate(self, element: str, index: int = 0):
        # Synchronous wrapper for backward compatibility
        # (run_async() is the sync-to-async utility mentioned under Challenge 2)
        return run_async(self.locate_async(element, index))
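The synchronous `locate()` wrapper depends on a `run_async()` helper. The framework's own version is not shown here; a minimal sketch, assuming it must work both with and without an event loop already running in the calling thread, could be:

```python
import asyncio
import concurrent.futures


def run_async(coro):
    """Run a coroutine to completion from synchronous code."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: the common case for synchronous callers
        return asyncio.run(coro)
    # A loop is already running (e.g. inside an API handler) and asyncio.run()
    # would fail, so run the coroutine on a private loop in a worker thread.
    # Note: this blocks the calling loop until the coroutine finishes.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def demo_locate(element: str) -> str:
    await asyncio.sleep(0)
    return f"located {element}"


async def call_from_async_code() -> str:
    # run_async() also works when an event loop is already running
    return run_async(demo_locate("Login"))


print(run_async(demo_locate("Submit")))  # located Submit
```

The thread fallback keeps old synchronous call sites working during the migration, at the cost of blocking the outer loop; long term, async callers should `await locate_async()` directly.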

3. Parallel Execution Groups with Abort

Group strategies that can run in parallel with abort capability:

import asyncio
from typing import List

class ParallelStrategyGroup:
    """Groups strategies that can execute in parallel"""

    def __init__(self, strategies: List[LocatorStrategy]):
        self.strategies = strategies
        self.priority = min(s.priority for s in strategies)

    async def execute_parallel(self, element: str):
        """Execute all strategies in parallel, return first success, abort others"""
        # Create cancellable tasks
        tasks = [
            asyncio.create_task(strategy.locate_async(element))
            for strategy in self.strategies
        ]
        pending = set(tasks)

        try:
            # Wake up whenever any strategy finishes, in completion order
            while pending:
                done, pending = await asyncio.wait(
                    pending, return_when=asyncio.FIRST_COMPLETED
                )
                for task in done:
                    if not task.cancelled() and task.exception() is None and task.result():
                        return task.result()  # first success wins
        finally:
            # Abort all remaining tasks
            self._abort_remaining(pending)

        return None

    def _abort_remaining(self, pending: set):
        """Cancel every task that has not finished yet"""
        for task in pending:
            task.cancel()

4. Dependency Management

Identify which strategies can run in parallel vs. which must be sequential:

class StrategyDependencyGraph:
    """Manages strategy dependencies and parallel execution"""

    def can_run_parallel(self, strategy1: LocatorStrategy, strategy2: LocatorStrategy) -> bool:
        """Check if two strategies can run in parallel"""
        # Strategies can run in parallel if:
        # 1. They don't modify shared state
        # 2. They don't depend on each other's results
        # 3. They use different resources (e.g., page source vs screenshot)
        return (
            not strategy1.modifies_shared_state() and
            not strategy2.modifies_shared_state() and
            strategy1.resource_type() != strategy2.resource_type()
        )
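Given a pairwise check like `can_run_parallel()`, strategies can be packed into groups that run concurrently, with the groups themselves tried in priority order. A greedy sketch, assuming each strategy exposes a `priority` and a single `resource_type` string; `StrategyInfo` and `build_parallel_groups` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StrategyInfo:
    name: str
    priority: int        # lower value = tried earlier
    resource_type: str   # e.g. "driver", "page_source", "screenshot"


def build_parallel_groups(strategies: List[StrategyInfo]) -> List[List[StrategyInfo]]:
    """Greedily place each strategy in the first group with no resource clash."""
    groups: List[List[StrategyInfo]] = []
    for strategy in sorted(strategies, key=lambda s: s.priority):
        for group in groups:
            if all(other.resource_type != strategy.resource_type for other in group):
                group.append(strategy)
                break
        else:
            groups.append([strategy])  # clashes with every existing group
    return groups


strategies = [
    StrategyInfo("xpath", 1, "driver"),
    StrategyInfo("text", 2, "page_source"),
    StrategyInfo("ocr", 3, "screenshot"),
    StrategyInfo("accessibility_id", 4, "driver"),
]
for group in build_parallel_groups(strategies):
    print([s.name for s in group])
# ['xpath', 'text', 'ocr']
# ['accessibility_id']
```

Greedy packing is simple but not optimal in general; it is enough when, as here, the number of resource types is small.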

Specific Implementation Patterns

Pattern 1: XPath with Coordinate Discovery

import asyncio

async def locate_xpath_parallel(element: str):
    """Locate element by XPath with parallel coordinate discovery"""

    async def direct_find():
        """Direct findElement using XPath"""
        return await element_source.locate(element)

    async def page_source_coords():
        """Parse page source to find XPath and get coordinates"""
        page_source = await element_source.get_page_source()
        xpath_node = parse_xpath_from_page_source(page_source, element)
        if xpath_node:
            return get_coordinates_from_node(xpath_node)
        return None

    # Execute both in parallel
    results = await asyncio.gather(
        direct_find(),
        page_source_coords(),
        return_exceptions=True
    )

    # Prefer direct find, fallback to coordinates
    if results[0] and not isinstance(results[0], Exception):
        return results[0]
    if results[1] and not isinstance(results[1], Exception):
        return results[1]
    raise ElementNotFoundError()

Pattern 2: Text with Multi-Path Discovery

import asyncio

async def locate_text_parallel(text: str):
    """Locate element by text with parallel multi-path discovery"""

    async def page_source_path():
        """Page source → Find text → Convert to XPath → findElement"""
        page_source = await element_source.get_page_source()
        xpath = find_text_in_page_source(page_source, text)
        if xpath:
            return await element_source.locate(xpath)
        return None

    async def screenshot_ocr_path():
        """Screenshot → OCR → Find coordinates"""
        screenshot = await element_source.capture()
        text_locations = await text_detection.find_element(screenshot, text)
        if text_locations:
            return text_locations[0]  # Return first match
        return None

    async def direct_text_path():
        """Direct text element location"""
        return await element_source.locate_by_text(text)

    # Execute all three paths in parallel
    results = await asyncio.gather(
        page_source_path(),
        screenshot_ocr_path(),
        direct_text_path(),
        return_exceptions=True
    )

    # Return the first successful result in priority order (gather waits for all paths)
    for result in results:
        if result and not isinstance(result, Exception):
            return result
    raise ElementNotFoundError()

Pattern 3: Resource-Aware Parallel Execution

import asyncio
from typing import List

class ResourceAwareStrategyExecutor:
    """Executes strategies in parallel while managing resources"""

    def __init__(self):
        self.page_source_lock = asyncio.Lock()
        self.screenshot_lock = asyncio.Lock()
        self.driver_lock = asyncio.Lock()

    async def execute_parallel(self, strategies: List[LocatorStrategy], element: str):
        """Execute strategies in parallel with resource management"""

        # Group strategies by resource type
        page_source_strategies = [s for s in strategies if s.uses_page_source()]
        screenshot_strategies = [s for s in strategies if s.uses_screenshot()]
        driver_strategies = [s for s in strategies if s.uses_driver()]

        # Execute groups in parallel, but serialize within groups if needed
        tasks = []

        if page_source_strategies:
            tasks.append(self._execute_with_lock(
                page_source_strategies, element, self.page_source_lock
            ))

        if screenshot_strategies:
            tasks.append(self._execute_with_lock(
                screenshot_strategies, element, self.screenshot_lock
            ))

        if driver_strategies:
            tasks.append(self._execute_with_lock(
                driver_strategies, element, self.driver_lock
            ))

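        results = await asyncio.gather(*tasks, return_exceptions=True)
        # Pick the first group result that is neither an error nor empty
        for result in results:
            if result and not isinstance(result, Exception):
                return result
        return None

    async def _execute_with_lock(self, strategies, element: str, lock: asyncio.Lock):
        """Try a group's strategies one at a time while holding the group's lock"""
        async with lock:
            for strategy in strategies:
                try:
                    result = await strategy.locate_async(element)
                except Exception:
                    continue  # fall through to the next strategy in the group
                if result:
                    return result
        return None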

Key Implementation Areas

1. Async Strategy Interface

Files to Modify:

optics_framework/common/strategies.py - Add async methods to LocatorStrategy
All strategy implementations - Add locate_async() methods

Changes Needed:

Convert locate() methods to async
Maintain backward compatibility with sync wrappers
Update strategy base class

2. Parallel Execution Engine

Files to Create/Modify:

optics_framework/common/parallel_strategy_executor.py - New parallel executor
optics_framework/common/strategies.py - Update StrategyManager for parallel execution

Features:

Parallel task execution using asyncio.create_task() and cancellation
First-success-wins semantics with immediate abort of remaining strategies
Exception handling and aggregation
Resource management (locks for shared resources)
Cancellation token support for graceful abort
Only winning strategy executes the keyword/action

3. Strategy Dependency Analysis

Files to Create:

optics_framework/common/strategy_dependencies.py - Dependency analysis

Features:

Identify which strategies can run in parallel
Group strategies by resource requirements
Manage resource locks (page source, screenshot, driver)

4. Configuration and Control

Files to Modify:

optics_framework/common/config_handler.py - Add parallel execution config
optics_framework/common/models.py - Add configuration models

Configuration Options:

strategy_execution:
  mode: "parallel"  # "sequential" or "parallel"
  max_parallel_strategies: 5
  timeout_per_strategy: 10
  resource_locks:
    page_source: true
    screenshot: true
    driver: false
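One plausible mapping of these options onto asyncio is an `asyncio.Semaphore` to cap concurrency at `max_parallel_strategies` and `asyncio.wait_for()` to bound each strategy by `timeout_per_strategy` (assumed here to be in seconds). The sketch reads an already-parsed dict (YAML loading omitted); `run_with_limits` is a hypothetical name.

```python
import asyncio


async def run_with_limits(strategy_coros, config: dict):
    """Run strategy coroutines under the configured concurrency and timeout limits."""
    opts = config["strategy_execution"]
    semaphore = asyncio.Semaphore(opts.get("max_parallel_strategies", 5))
    timeout = opts.get("timeout_per_strategy", 10)

    async def limited(coro):
        async with semaphore:  # at most N strategies run at once
            try:
                return await asyncio.wait_for(coro, timeout)
            except asyncio.TimeoutError:
                return None  # a timed-out strategy counts as "not found"

    return await asyncio.gather(*(limited(c) for c in strategy_coros))


async def fake(delay, value):
    await asyncio.sleep(delay)
    return value


config = {"strategy_execution": {"max_parallel_strategies": 2, "timeout_per_strategy": 0.05}}
print(asyncio.run(run_with_limits([fake(0.01, "a"), fake(1, "slow"), fake(0.01, "c")], config)))
# ['a', None, 'c']
```

With a limit of 2, the third strategy starts only when the first finishes, and the slow one is cut off by its timeout instead of holding up the run.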

Performance Considerations

Expected Improvements

XPath location: 40-60% faster (2-3 strategies in parallel)
Text location: 50-70% faster (3-4 strategies in parallel)
Image location: 30-50% faster (2-3 detection methods in parallel)
Overall test execution: 20-40% faster for tests with many element locations

Resource Management

CPU: Better utilization with parallel I/O operations
Memory: Slight increase due to parallel task execution
Network: Better utilization for remote services (OCR, OIR)
Driver: Careful locking to prevent driver conflicts

Testing Requirements

Unit Tests

Test parallel execution with mock strategies
Test first-success-wins behavior
Test abort/cancellation of remaining strategies when one succeeds
Test that only one strategy executes the keyword/action
Test exception handling
Test resource locking
Test graceful cancellation cleanup
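Two of these unit tests might look like the following pytest-style sketch. `MockStrategy` and `locate_and_execute` are self-contained stand-ins for the eventual executor, not existing framework code; the tests check first-success-wins, cancellation of the slower strategy, and that the keyword runs exactly once.

```python
import asyncio


class MockStrategy:
    def __init__(self, name, delay, found=True):
        self.name, self.delay, self.found = name, delay, found
        self.executed = 0
        self.cancelled = False

    async def locate(self, element):
        try:
            await asyncio.sleep(self.delay)
        except asyncio.CancelledError:
            self.cancelled = True  # record that the abort reached this strategy
            raise
        return f"{self.name}:{element}" if self.found else None

    async def execute_keyword(self, keyword, result):
        self.executed += 1


async def locate_and_execute(strategies, element, keyword="press_element"):
    tasks = {asyncio.create_task(s.locate(element)): s for s in strategies}
    pending = set(tasks)
    try:
        while pending:
            done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
            for task in done:
                if not task.cancelled() and task.exception() is None and task.result():
                    await tasks[task].execute_keyword(keyword, task.result())
                    return task.result()
        return None
    finally:
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)  # let cleanup run


def test_first_success_wins_and_others_are_cancelled():
    fast, slow = MockStrategy("xpath", 0.01), MockStrategy("ocr", 1.0)
    assert asyncio.run(locate_and_execute([slow, fast], "Submit")) == "xpath:Submit"
    assert slow.cancelled


def test_keyword_executes_exactly_once():
    a, b = MockStrategy("a", 0.01), MockStrategy("b", 0.01)
    asyncio.run(locate_and_execute([a, b], "Submit"))
    assert a.executed + b.executed == 1
```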

Integration Tests

Test with real drivers and element sources
Test performance improvements
Test backward compatibility
Test configuration options

Migration Path

Phase 1: Add Async Support (Non-Breaking)

Add async methods alongside sync methods
Maintain backward compatibility
No behavior changes

Phase 2: Parallel Execution (Opt-In)

Add configuration option for parallel execution
Default to sequential for safety
Allow users to opt-in

Phase 3: Parallel by Default

Make parallel execution the default
Keep sequential as fallback option
Optimize based on real-world usage

Files to Modify

Core Strategy System:

optics_framework/common/strategies.py - Add parallel execution support
optics_framework/common/execution.py - Update execution flow

Strategy Implementations:

optics_framework/common/strategies.py - Update all strategy classes
All element source implementations - Add async support

Configuration:

optics_framework/common/config_handler.py - Add parallel execution config
optics_framework/common/models.py - Add configuration models

New Files:

optics_framework/common/parallel_strategy_executor.py - Parallel executor
optics_framework/common/strategy_dependencies.py - Dependency management

Challenges and Considerations

Challenge 1: Resource Conflicts

Problem: Multiple strategies accessing the same resource (driver, page source) simultaneously

Solution:

Use asyncio locks for shared resources
Group strategies by resource type
Serialize access to shared resources

Challenge 2: Backward Compatibility

Problem: Existing code expects synchronous strategy execution

Solution:

Maintain sync wrappers for async methods
Use a run_async() utility for compatibility
Gradual migration path

Challenge 3: Error Handling

Problem: Multiple strategies may fail in parallel

Solution:

Aggregate all exceptions
Return the most informative error
Log all strategy attempts
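For aggregation, a small exception type that carries every strategy's failure keeps the details available for logging while surfacing a single error to the caller; picking the "most informative" error can then be done by inspecting `failures`. `StrategiesFailedError` is a hypothetical name; on Python 3.11+ the built-in `ExceptionGroup` is an alternative.

```python
from typing import Dict


class StrategiesFailedError(Exception):
    """Raised when every strategy fails; keeps each strategy's own error."""

    def __init__(self, element: str, failures: Dict[str, BaseException]):
        self.element = element
        self.failures = failures
        details = "; ".join(f"{name}: {err!r}" for name, err in failures.items())
        super().__init__(f"Element {element!r} not found by any strategy ({details})")


failures = {
    "xpath": TimeoutError("no match after 10s"),
    "ocr": ValueError("text not detected"),
}
err = StrategiesFailedError("Submit", failures)
print(err)
# Element 'Submit' not found by any strategy (xpath: TimeoutError('no match after 10s'); ocr: ValueError('text not detected'))
```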

Challenge 4: Performance Tuning

Problem: Too many parallel tasks may degrade performance

Solution:

Configurable max parallel strategies
Resource-aware task scheduling
Monitor and optimize based on metrics

Challenge 5: Ensuring Single Action Execution

Problem: Multiple strategies might try to execute the same keyword/action

Solution:

Implement a strict abort mechanism when the first strategy succeeds
Use cancellation tokens to immediately stop remaining strategies
Only allow the winning strategy to proceed to action execution
Add validation to prevent duplicate actions
Implement proper cleanup for aborted strategies
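The duplicate-action validation could be backed by a guard shared by all strategies for one keyword call: the first caller to claim it proceeds and later callers are refused. `SingleActionGuard` is a hypothetical helper. Because asyncio runs on a single thread, the check-and-set needs no lock as long as there is no `await` between the check and the set.

```python
import asyncio


class SingleActionGuard:
    """Lets exactly one strategy perform the action for one keyword call."""

    def __init__(self) -> None:
        self.owner = None

    def claim(self, strategy_name: str) -> bool:
        # No await between check and set, so this is atomic under asyncio
        if self.owner is None:
            self.owner = strategy_name
            return True
        return False

    def release(self, strategy_name: str) -> None:
        # Allows fallback to another strategy if the owner's action failed
        if self.owner == strategy_name:
            self.owner = None


async def try_press(guard, name, delay, log):
    await asyncio.sleep(delay)  # simulated location time
    if guard.claim(name):
        log.append(f"{name} pressed")


async def main():
    guard, log = SingleActionGuard(), []
    await asyncio.gather(
        try_press(guard, "xpath", 0.02, log),
        try_press(guard, "text", 0.01, log),
    )
    return log


print(asyncio.run(main()))  # ['text pressed']
```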

3. Flexible Driver-Specific Strategies

Priority: High | Difficulty: Intermediate

Current State

Strategies are generic and work across all drivers. Driver-specific capabilities (Playwright roles, CSS selectors) are not leveraged, limiting the framework's ability to use the best features of each driver.

Goal

Enable custom strategies that are specific to individual drivers, allowing each driver to expose its unique capabilities while maintaining the fallback mechanism.

Key Areas

Playwright-Specific Strategies

Playwright provides powerful locator methods that are not currently used:

get_by_role() - Locate by ARIA role (button, textbox, etc.)
get_by_test_id() - Locate by test ID attribute
get_by_label() - Locate by associated label
get_by_placeholder() - Locate by placeholder text
get_by_text() - Locate by visible text
get_by_title() - Locate by title attribute
get_by_alt_text() - Locate by alt text (for images)

Example:

# Current: Generic text strategy
element = "Submit Button"

# With Playwright role strategy:
element = "role:button[name='Submit']"
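A parser for the `role:button[name='Submit']` syntax shown above could split it into the arguments of Playwright's `page.get_by_role(role, name=...)`. The selector syntax is a proposal rather than something the framework supports today, and this sketch handles only a `name` attribute.

```python
import re
from typing import Optional, Tuple

# role:<aria-role>, optionally followed by [name='<accessible name>'] (either quote style)
ROLE_PATTERN = re.compile(r"^role:(?P<role>[a-z]+)(?:\[name=(['\"])(?P<name>.*?)\2\])?$")


def parse_role_locator(element: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (role, name) for a role locator, or None if it is not one."""
    match = ROLE_PATTERN.match(element)
    if match is None:
        return None
    return match.group("role"), match.group("name")


print(parse_role_locator("role:button[name='Submit']"))  # ('button', 'Submit')
print(parse_role_locator("role:textbox"))                # ('textbox', None)
print(parse_role_locator("Submit Button"))               # None
```

A Playwright strategy could then call `page.get_by_role(role, name=name)`; returning `None` for non-role strings lets the generic text strategy handle them as before.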

Selenium/Playwright CSS Strategies

Currently only XPath is supported. CSS selectors are usually more readable than XPath and can be faster to evaluate:

CSS selector support (not just XPath)
CSS pseudo-selectors (:first-child, :nth-of-type, etc.)
CSS attribute selectors ([data-testid="submit"])

Appium-Specific Strategies

UI Automator selectors (Android)
iOS predicate strings
Accessibility ID strategies
Class name strategies

Strategy Registration

Needed:

Driver-specific strategy factory
Strategy priority per driver
Dynamic strategy discovery mechanism
Strategy registration API for drivers
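A registration API might let drivers (or plugins) register strategy classes per driver name with a priority, and a factory return them in fallback order. The `register_strategy` decorator, `strategies_for()`, and the example classes are hypothetical, not the current API in strategies.py.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# driver name -> list of (priority, strategy class)
_REGISTRY: Dict[str, List[Tuple[int, type]]] = defaultdict(list)


def register_strategy(driver: str, priority: int) -> Callable[[type], type]:
    """Class decorator that registers a strategy for one driver."""
    def decorator(cls: type) -> type:
        _REGISTRY[driver].append((priority, cls))
        return cls
    return decorator


def strategies_for(driver: str) -> List[type]:
    """Strategy classes for a driver, lowest priority number first."""
    entries = _REGISTRY.get(driver, [])
    return [cls for _, cls in sorted(entries, key=lambda item: item[0])]


@register_strategy("playwright", priority=2)
class PlaywrightTextStrategy:
    pass


@register_strategy("playwright", priority=1)
class PlaywrightRoleStrategy:
    pass


@register_strategy("selenium", priority=1)
class CssSelectorStrategy:
    pass


print([cls.__name__ for cls in strategies_for("playwright")])
# ['PlaywrightRoleStrategy', 'PlaywrightTextStrategy']
```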

Files to Modify

optics_framework/common/strategies.py - Add driver-specific strategy support
optics_framework/engines/drivers/playwright.py - Add Playwright-specific strategies
optics_framework/engines/drivers/selenium.py - Add CSS selector strategies
optics_framework/engines/drivers/appium.py - Add Appium-specific strategies

4. Additional Image Detection Drivers

Priority: Medium | Difficulty: Intermediate to Advanced

Current State

Only two image detection models are available:

TemplateMatch (OpenCV-based) - Local template matching
RemoteOIR (Remote service) - Remote object image recognition

Goal

Add more image detection/vision models to provide users with more options, better accuracy, and different use cases.

Potential Additions

YOLO Integration

Object detection using YOLO (You Only Look Once) models:

Real-time object detection
Pre-trained models for UI elements
Custom model support

File to Create: optics_framework/engines/vision_models/image_models/yolo.py

Cloud Vision Services

AWS Rekognition: Amazon's image recognition service
Azure Computer Vision: Microsoft's vision API
Google Cloud Vision: Google's image analysis API (for object detection, not just OCR)

Files to Create:

optics_framework/engines/vision_models/image_models/aws_rekognition.py
optics_framework/engines/vision_models/image_models/azure_vision.py
optics_framework/engines/vision_models/image_models/google_vision_detection.py

Deep Learning Models

TensorFlow/PyTorch Models: Support for custom trained models
Pre-trained models for UI element detection
Model loading and inference

File to Create: optics_framework/engines/vision_models/image_models/tensorflow_model.py

Advanced OpenCV Algorithms

ORB (Oriented FAST and Rotated BRIEF)
BRISK (Binary Robust Invariant Scalable Keypoints)
AKAZE (Accelerated-KAZE)
Feature matching alternatives to SIFT

Implementation Guidelines

All image detection models must implement the ImageInterface:

from optics_framework.common.image_interface import ImageInterface

class YourImageDetection(ImageInterface):
    def find_element(self, frame, element, index=0):
        # Implementation
        pass

    def element_exist(self, frame, element):
        # Implementation
        pass

    def assert_elements(self, frame, elements, rule="all"):
        # Implementation
        pass

Files to Create

optics_framework/engines/vision_models/image_models/yolo.py
optics_framework/engines/vision_models/image_models/aws_rekognition.py
optics_framework/engines/vision_models/image_models/azure_vision.py
optics_framework/engines/vision_models/image_models/tensorflow_model.py

5. Interactive Test Creation

Priority: Medium | Difficulty: Advanced

Current State

Tests are created via CSV/YAML files or programmatically. There's no interactive test authoring tool, making it difficult for non-technical users to create tests.

Goal

Support interactive test creation where users can:

Record interactions automatically
Visually select elements on screen
Generate test steps automatically
Edit tests in a visual interface
Preview test execution step-by-step

Key Features

Test Recorder

Record user interactions and generate test steps:

Capture mouse clicks, keyboard input, scrolls
Automatically detect element locators
Generate test steps in CSV/YAML format
Support for multiple strategies (XPath, text, image)

Visual Element Selector

Click on screen to select elements:

Highlight elements on hover
Show element information (XPath, text, attributes)
Suggest multiple locator strategies
Validate element locators

Test Editor

Visual editor for test cases:

Drag-and-drop test step reordering
Edit step parameters
Add/remove test steps
Preview test structure

Element Inspector

Inspect and validate element locators:

Show all available locators for an element
Test locator validity
Suggest better locators
Show element hierarchy

Potential Implementation

Web-based UI: React/Vue.js frontend with FastAPI backend
Browser Extension: Chrome/Firefox extension for recording
Desktop Application: Electron app for test authoring
CLI Integration: Extend existing CLI with interactive mode

Files to Create

optics_framework/helper/recorder.py - Test recording functionality
optics_framework/helper/inspector.py - Element inspection tools
optics_framework/api/recorder_api.py - API endpoints for recorder
Web UI components (separate repo or optics_framework/web_ui/)

6. Code Simplification & Unification

Priority: Medium | Difficulty: Intermediate

Current State

Analysis reveals several areas with duplicate code and similar patterns that could be unified to reduce maintenance burden and improve code quality.

Areas Identified

Driver Method Patterns

Similar implementations across Appium, Selenium, and Playwright:

press_element(), enter_text(), clear_text(), scroll() have similar patterns
Common error handling
Similar event logging

Files:

optics_framework/engines/drivers/appium.py
optics_framework/engines/drivers/selenium.py
optics_framework/engines/drivers/playwright.py

Solution: Create base driver mixins or helper classes
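A mixin along these lines could use the template-method pattern: the mixin owns the shared error handling and event logging, and each driver implements only small driver-specific hooks. The hook names (`_do_press`, `_do_enter_text`), `ActionFailedError`, the method signatures, and the logger name are all hypothetical.

```python
import logging

logger = logging.getLogger("optics.drivers")  # hypothetical logger name


class ActionFailedError(Exception):
    """Uniform error raised by every driver when an action fails."""


class BaseDriverMixin:
    """Shared logging and error handling; subclasses implement the _do_* hooks."""

    driver_name = "base"

    def press_element(self, element, repeat=1):
        return self._run_action("press_element", self._do_press, element, repeat)

    def enter_text(self, element, text):
        return self._run_action("enter_text", self._do_enter_text, element, text)

    def _run_action(self, action, hook, *args):
        logger.info("%s: %s%r", self.driver_name, action, args)
        try:
            return hook(*args)
        except Exception as exc:
            # Normalize driver-specific exceptions into one error type
            raise ActionFailedError(f"{self.driver_name} {action} failed: {exc}") from exc

    def _do_press(self, element, repeat):
        raise NotImplementedError

    def _do_enter_text(self, element, text):
        raise NotImplementedError


class FakeDriver(BaseDriverMixin):
    driver_name = "fake"

    def _do_press(self, element, repeat):
        return f"pressed {element} x{repeat}"

    def _do_enter_text(self, element, text):
        raise RuntimeError("element is read-only")


driver = FakeDriver()
print(driver.press_element("Submit"))  # pressed Submit x1
```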

Element Source Patterns

Similar code in find_element, page_source, screenshot classes:

Common initialization patterns
Similar error handling
Common interface implementations

Files: optics_framework/engines/elementsources/*.py

Solution: Create common base class for element source implementations

Vision Model Patterns

OCR models share a similar structure:

Common text detection patterns
Similar configuration handling
Common error handling

Files: optics_framework/engines/vision_models/ocr_models/*.py

Solution: Enhanced base classes for vision models

Strategy Factory Pattern

The current implementation uses a hard-coded strategy registry:

Strategy registration is static
Limited flexibility for custom strategies

File: optics_framework/common/strategies.py

Solution: Make strategy registration more flexible and dynamic

Session Management

The builder pattern is complex and could be simplified:

Many intermediate steps
A simpler factory pattern could be used in some cases

File: optics_framework/common/optics_builder.py

Solution: Simplify builder pattern or add convenience methods

Specific Unification Opportunities

Base Driver Mixin: Create BaseWebDriverMixin for common web driver methods
Element Source Base: Common base class for element source implementations
Vision Model Base: Enhanced base classes for vision models
Strategy Registry: Make strategy registration more flexible
Configuration Normalization: Unify configuration handling across components
Error Handling Utilities: Standardize error handling patterns

Files to Create/Modify

Create:

optics_framework/common/base_driver_mixin.py - Common driver methods
optics_framework/common/base_element_source.py - Common element source base
optics_framework/common/error_utils.py - Error handling utilities

Modify:

optics_framework/common/strategies.py - Flexible strategy registration
optics_framework/common/optics_builder.py - Simplify builder pattern
All driver implementations to use mixins
All element source implementations to use base class

7. Test Coverage

Priority: High | Difficulty: Beginner to Intermediate

Current State

Current test structure shows limited coverage. Many core components lack comprehensive tests.

Missing Tests For

Most engine implementations (drivers, element sources, vision models)
Strategy manager and location strategies
Event system and event handlers
Session management edge cases
Error handling scenarios
Factory pattern implementations
Screenshot streaming
API layer endpoints
CLI commands

Test Files to Create

Unit Tests

tests/units/engines/drivers/test_ble.py - BLE driver tests
tests/units/engines/drivers/test_playwright.py - Playwright driver tests
tests/units/engines/drivers/test_appium.py - Appium driver tests (if missing)
tests/units/engines/elementsources/test_playwright_find_element.py
tests/units/engines/elementsources/test_playwright_page_source.py
tests/units/engines/elementsources/test_playwright_screenshot.py
tests/units/common/test_strategies.py - Strategy manager tests
tests/units/common/test_session_manager.py - Session management tests
tests/units/common/test_screenshot_stream.py - Screenshot streaming tests
tests/units/api/test_verifier.py - Verifier API tests
tests/units/common/test_factories.py - Factory pattern tests

Integration Tests¶

tests/integration/test_api_endpoints.py - API endpoint integration tests
tests/integration/test_session_lifecycle.py - Full session lifecycle
tests/integration/test_driver_fallback.py - Driver fallback mechanism

Functional Tests¶

tests/functional/test_end_to_end.py - End-to-end test execution
tests/functional/test_vision_detection.py - Vision detection workflows

Testing Guidelines¶

Use pytest for all tests
Follow existing test patterns in tests/units/
Use fixtures from tests/units/conftest.py
Aim for >80% code coverage
Test both success and failure scenarios
Include edge cases and error conditions
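The testing guidelines above can be sketched as a fixture plus three kinds of test. There is a success case, failure cases checked with `pytest.raises`, and edge cases driven by `pytest.mark.parametrize`. `InMemorySessionStore` is a hypothetical stand-in for the component under test. It is not the framework's `SessionManager`:

```python
import pytest


class InMemorySessionStore:
    """Hypothetical stand-in for the component under test."""

    def __init__(self) -> None:
        self._sessions = {}

    def create(self, session_id: str, config: dict) -> None:
        if not session_id:
            raise ValueError("session_id must be non-empty")
        if session_id in self._sessions:
            raise KeyError(f"duplicate session {session_id}")
        self._sessions[session_id] = config

    def get(self, session_id: str) -> dict:
        return self._sessions[session_id]


@pytest.fixture
def store() -> InMemorySessionStore:
    return InMemorySessionStore()


def test_create_and_get_session(store):
    store.create("s1", {"driver": "appium"})
    assert store.get("s1") == {"driver": "appium"}


def test_duplicate_session_rejected(store):
    store.create("s1", {})
    with pytest.raises(KeyError):
        store.create("s1", {})


@pytest.mark.parametrize("bad_id", ["", None])
def test_invalid_session_id_rejected(store, bad_id):
    with pytest.raises(ValueError):
        store.create(bad_id, {})


def test_unknown_session_raises(store):
    with pytest.raises(KeyError):
        store.get("missing")
```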

8. Security Improvements¶

Priority: High | Difficulty: Intermediate

Current State¶

docs/architecture/api_layer.md identifies several security improvements needed before production use.

CORS Configuration¶

Current: CORS allows all origins (allow_origins=["*"])

Needed: Configurable CORS with environment-based restrictions

File: optics_framework/common/expose_api.py

Implementation:

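import os
from fastapi.middleware.cors import CORSMiddleware
# Comma-separated origins, e.g. "https://a.example,https://b.example"; defaults to "*"
raw_origins = os.getenv("OPTICS_CORS_ORIGINS", "*")
allowed_origins = [o.strip() for o in raw_origins.split(",") if o.strip()]
allow_all = "*" in allowed_origins
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"] if allow_all else allowed_origins,
    # Browsers reject wildcard origins on credentialed requests, so only
    # enable credentials when origins are explicitly listed
    allow_credentials=not allow_all,
    allow_methods=["*"],
    allow_headers=["*"],
)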

Authentication¶

Current: No authentication

Needed:

API key authentication
Bearer token authentication
Session-based authentication

Files to Modify:

optics_framework/common/expose_api.py - Add authentication middleware
optics_framework/common/auth.py - Create authentication module (new file)

Input Validation¶

Current: Using Pydantic for validation

Enhancements Needed:

Rate limiting per IP/session
Request size limits
SQL injection prevention (if adding database support)
XSS prevention for string inputs
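Per-IP or per-session rate limiting could use a token bucket for each client. A client may burst up to `capacity` requests and then refills at `rate` requests per second. This is a sketch only. `TokenBucketLimiter` and its parameters are hypothetical. The injectable clock is there only to make the limiter testable:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class _Bucket:
    tokens: float
    updated: float


@dataclass
class TokenBucketLimiter:
    """Per-client token bucket: refills `rate` tokens/second, bursts up to `capacity`.

    Not thread-safe as written; state lives in this process only.
    """

    rate: float
    capacity: float
    clock: Callable[[], float] = time.monotonic
    _buckets: Dict[str, _Bucket] = field(default_factory=dict, init=False, repr=False)

    def allow(self, client_id: str) -> bool:
        """Consume one token for `client_id`; return False if it is over its limit."""
        now = self.clock()
        bucket = self._buckets.get(client_id)
        if bucket is None:
            # New clients start with a full bucket.
            bucket = self._buckets[client_id] = _Bucket(tokens=self.capacity, updated=now)
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.updated
        bucket.tokens = min(self.capacity, bucket.tokens + elapsed * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return True
        return False
```

In expose_api.py, an HTTP middleware could call this with the client IP or session ID as the key and return HTTP 429 when `allow()` is False. This sketch has two limits. Buckets for idle clients are never evicted, and state is per-process, so multiple API instances would need a shared store.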

9. Feature Development¶

Priority: Medium to High | Difficulty: Intermediate to Advanced

MCP Servicer¶

From Roadmap: Introduce a dedicated service to handle MCP (Model Context Protocol)

Goal: Improve scalability and modularity across the framework

Needed:

MCP protocol implementation
Service architecture design
Integration with existing framework

Omniparser Integration¶

From Roadmap: Seamlessly integrate Omniparser for robust and flexible element extraction and location

Goal: Enable more flexible element parsing capabilities

Needed:

Omniparser library integration
Parser configuration
Strategy integration

Playwright Integration Enhancements¶

From Roadmap: Add support for Playwright (partially implemented)

Needed:

Complete remaining Playwright methods
Add driver-specific strategies (see section 3)
Improve error handling

Audio Support¶

From Roadmap: Extend the framework to support audio inputs and outputs

Goal: Enable testing and verification of voice-based or sound-related interactions

Needed:

Audio input capture
Audio output verification
Speech recognition integration
Audio comparison utilities

Session Persistence¶

Identified in: docs/architecture/components.md

Current: Sessions are in-memory only, lost on process termination

Needed:

Session serialization
Session recovery on restart
Session state snapshots
Persistent storage backend

Note: This overlaps with section 1 (Stateless API Layer) but focuses on persistence rather than migration.
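Session serialization and recovery could start from a snapshot that holds only configuration that can be rebuilt. Live objects such as driver connections and loaded vision models would be re-created on restart. This is a minimal sketch. `SessionSnapshot`, its fields, and the one-JSON-file-per-session layout are assumptions, not the framework's current format:

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class SessionSnapshot:
    """Hypothetical serializable view of a session.

    Live objects (driver connections, loaded vision models) are deliberately
    excluded; they are re-created from this configuration on recovery.
    """

    session_id: str
    driver_sources: List[str]
    element_sources: List[str]
    config: Dict[str, Any] = field(default_factory=dict)
    schema_version: int = 1

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "SessionSnapshot":
        data = json.loads(payload)
        if data.get("schema_version") != 1:
            raise ValueError(f"unsupported snapshot version: {data.get('schema_version')}")
        return cls(**data)


def save_snapshot(snapshot: SessionSnapshot, directory: Path) -> Path:
    """Write to a temporary file first, then rename it over the target."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{snapshot.session_id}.json"
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(snapshot.to_json(), encoding="utf-8")
    tmp.replace(target)
    return target


def load_snapshot(path: Path) -> SessionSnapshot:
    return SessionSnapshot.from_json(path.read_text(encoding="utf-8"))
```

Writing to a temporary file and then calling `Path.replace` prevents a crash mid-write from leaving a truncated snapshot. This works because the rename is atomic on POSIX systems when both paths are on the same filesystem. The same `to_json` payload could go to Redis or a database for the migration scenario in section 1.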

Additional Drivers¶

iOS driver enhancements
Additional browser drivers (Edge, Safari)
IoT device drivers
Desktop application drivers (Windows, macOS, Linux)

10. Performance Optimizations¶

Priority: Medium | Difficulty: Intermediate

Areas Identified¶

Screenshot Streaming¶

Current: SSIM computation overhead could be optimized

Opportunities:

Parallel SSIM computation
Optimize image comparison algorithms
Cache comparison results
Use faster similarity metrics where appropriate

File: optics_framework/common/screenshot_stream.py
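One way to reduce SSIM overhead is to put a much cheaper metric in front of it. If the mean absolute difference between downscaled frames shows they are nearly identical, SSIM is skipped entirely. This sketch assumes grayscale frames stored as 2-D NumPy arrays. `frames_differ`, `downscale`, and both thresholds are hypothetical. `ssim_fn` stands in for whichever SSIM implementation screenshot_stream.py already uses:

```python
from typing import Callable

import numpy as np


def downscale(gray: np.ndarray, factor: int = 4) -> np.ndarray:
    """Block-average downscale; crops edges so both dimensions divide by `factor`."""
    h = gray.shape[0] // factor * factor
    w = gray.shape[1] // factor * factor
    g = gray[:h, :w].astype(np.float32)  # float avoids uint8 wrap-around below
    return g.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def frames_differ(prev: np.ndarray, curr: np.ndarray,
                  ssim_fn: Callable[[np.ndarray, np.ndarray], float],
                  mad_threshold: float = 1.0, ssim_threshold: float = 0.98) -> bool:
    """Return True if `curr` should be treated as a new frame.

    A cheap mean-absolute-difference check on downscaled frames skips the
    expensive SSIM call when the frames are nearly identical.
    """
    mad = float(np.abs(downscale(prev) - downscale(curr)).mean())
    if mad < mad_threshold:
        return False  # visually unchanged; SSIM not needed
    return bool(ssim_fn(prev, curr) < ssim_threshold)
```

The thresholds would need tuning against real device screenshots. The gate helps only when many consecutive frames are unchanged.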

Strategy Execution¶

Current: Strategies executed sequentially

Opportunities:

Parallel strategy attempts where safe
Early exit optimization
Strategy result caching

File: optics_framework/common/strategies.py
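Parallel strategy attempts with early exit could use a thread pool. The pool runs read-only locate strategies at the same time and returns as soon as one succeeds; section 2 covers the broader design. This sketch uses `concurrent.futures` and needs Python 3.9+ for `cancel_futures`. `locate_first` and the coordinate-returning strategy callables are hypothetical:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Optional, Sequence, Tuple

# A locate strategy returns (x, y) coordinates, or None if it found nothing.
Locator = Callable[[], Optional[Tuple[int, int]]]


def locate_first(strategies: Sequence[Locator], timeout: float = 10.0) -> Optional[Tuple[int, int]]:
    """Run locate-only strategies concurrently; return the first non-None result.

    Strategies that raise are treated as misses. Only locating happens in
    parallel: the caller performs the single action (tap, type, ...) once,
    using the winning coordinates.
    """
    if not strategies:
        return None
    deadline = time.monotonic() + timeout
    pool = ThreadPoolExecutor(max_workers=len(strategies))
    try:
        pending = {pool.submit(strategy) for strategy in strategies}
        while pending:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None  # overall deadline reached
            done, pending = wait(pending, timeout=remaining, return_when=FIRST_COMPLETED)
            for future in done:
                if future.exception() is None and future.result() is not None:
                    return future.result()
        return None
    finally:
        # Return immediately; queued strategies are cancelled, but threads that
        # are already running cannot be interrupted and finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)
```

This returns the first strategy to finish, not the highest-priority one. If strategy order encodes preference, results can differ from sequential execution. Strategies must also be safe to call concurrently against the same driver.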

Factory Caching¶

Current: Instance caching is implemented

Enhancements:

Cache invalidation strategies
Memory-aware caching
Cache size limits

File: optics_framework/common/base_factory.py
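Cache size limits and invalidation could be added by wrapping factory creation in a small LRU cache. This sketch assumes instances are keyed by a hashable configuration key. `BoundedInstanceCache` and its `on_evict` hook are hypothetical and are not part of base_factory.py:

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, TypeVar

V = TypeVar("V")


class BoundedInstanceCache(Generic[V]):
    """LRU cache for factory-created instances with a maximum entry count.

    `on_evict` lets callers release resources (e.g. quit a driver) when an
    instance is dropped, whether by eviction or explicit invalidation.
    """

    def __init__(self, max_size: int, on_evict: Optional[Callable[[V], None]] = None) -> None:
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self._max_size = max_size
        self._on_evict = on_evict
        self._items: "OrderedDict[Hashable, V]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], V]) -> V:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        instance = factory()
        self._items[key] = instance
        if len(self._items) > self._max_size:
            _, evicted = self._items.popitem(last=False)  # least recently used
            if self._on_evict:
                self._on_evict(evicted)
        return instance

    def invalidate(self, key: Hashable) -> None:
        instance = self._items.pop(key, None)
        if instance is not None and self._on_evict:
            self._on_evict(instance)

    def __len__(self) -> int:
        return len(self._items)
```

The `on_evict` hook matters most for drivers, where dropping a cached instance should also close its connection. Memory-aware caching would replace the entry count with a per-instance size estimate.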

Event Processing¶

Current: Events processed individually

Opportunities:

Batch events for better throughput
Async event processing
Event queue optimization

Files:

optics_framework/common/events.py
optics_framework/common/eventSDK.py
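Batching could be prototyped as a buffer that passes events to a sink in groups. This accepts a bounded delay in exchange for fewer downstream calls. This is a sketch only. `EventBatcher` and its parameters are hypothetical. The sink stands in for whatever events.py or eventSDK.py currently does with each event:

```python
import threading
import time
from typing import Any, Callable, List


class EventBatcher:
    """Buffer events and hand them to `sink` in batches.

    A batch is flushed when it reaches `max_size` events, or on the next
    `add()` after `max_delay` seconds have passed since the first buffered
    event. Call `flush()` on shutdown so no events are lost.
    """

    def __init__(self, sink: Callable[[List[Any]], None], max_size: int = 50,
                 max_delay: float = 1.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._sink = sink
        self._max_size = max_size
        self._max_delay = max_delay
        self._clock = clock
        self._buffer: List[Any] = []
        self._first_at = 0.0
        self._lock = threading.Lock()

    def add(self, event: Any) -> None:
        with self._lock:
            if not self._buffer:
                self._first_at = self._clock()
            self._buffer.append(event)
            due = (len(self._buffer) >= self._max_size
                   or self._clock() - self._first_at >= self._max_delay)
            batch = self._take() if due else None
        if batch:
            self._sink(batch)  # outside the lock so a slow sink doesn't block producers

    def flush(self) -> None:
        with self._lock:
            batch = self._take()
        if batch:
            self._sink(batch)

    def _take(self) -> List[Any]:
        batch, self._buffer = self._buffer, []
        return batch
```

The time-based flush runs only on the next `add()`. An idle producer therefore depends on `flush()` at shutdown, or on a background timer, to deliver its last batch. Because the sink is called outside the lock, concurrent producers may deliver batches out of order.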

Image Processing¶

Current: OCR and template matching operations

Opportunities:

Parallel image processing
Image caching
Optimize OpenCV operations
GPU acceleration where available

Files:

optics_framework/engines/vision_models/image_models/templatematch.py
optics_framework/engines/vision_models/ocr_models/*.py
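Image caching can avoid re-running OCR or template matching when the screen has not changed. Results are keyed by a hash of the screenshot's pixels. This is a sketch only. `image_key` and `ImageResultCache` are hypothetical names, and `compute` stands in for an OCR or template-matching call:

```python
import hashlib
from collections import OrderedDict
from typing import Any, Callable

import numpy as np


def image_key(image: np.ndarray) -> str:
    """Content hash that also covers shape and dtype, so reshaped data differs."""
    h = hashlib.blake2b(digest_size=16)
    h.update(str(image.shape).encode())
    h.update(str(image.dtype).encode())
    h.update(image.tobytes())  # C-order bytes of the logical array contents
    return h.hexdigest()


class ImageResultCache:
    """Memoize an expensive image function (e.g. OCR) by screenshot content."""

    def __init__(self, compute: Callable[[np.ndarray], Any], max_entries: int = 32) -> None:
        self._compute = compute
        self._max_entries = max_entries
        self._results: "OrderedDict[str, Any]" = OrderedDict()

    def __call__(self, image: np.ndarray) -> Any:
        key = image_key(image)
        if key in self._results:
            self._results.move_to_end(key)  # most recently used
            return self._results[key]
        result = self._compute(image)
        self._results[key] = result
        if len(self._results) > self._max_entries:
            self._results.popitem(last=False)  # drop least recently used
        return result
```

Hashing a full-resolution screenshot has its own cost. The cache pays off only when the wrapped operation is much slower than hashing, which is usually the case for OCR.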

11. Documentation¶

Priority: Medium | Difficulty: Beginner to Intermediate

Documentation Gaps¶

API Examples¶

Current: API reference exists but lacks practical examples

Needed:

More code examples in API reference
Common use case examples
Error handling examples

Files: docs/api_reference.md (Python API) and docs/usage/REST_API_usage.md (REST API)

Troubleshooting Guides¶

Current: Limited troubleshooting information

Needed:

Common issues and solutions
Debugging guides
Performance troubleshooting
Error code reference with solutions

Performance Tuning¶

Needed: Guide for optimizing framework performance

Topics:

Configuration optimization
Strategy selection guidance
Memory management
Parallel execution tips

Migration Guides¶

Needed: Migration guides for releases that introduce breaking changes

Topics:

Version migration guides
Configuration migration
API migration guides

Video Tutorials¶

Needed: Screen recordings for complex workflows

Topics:

Getting started walkthrough
Creating your first test
Advanced features
Troubleshooting common issues

Best Practices¶

Needed: Comprehensive best practices guide

Topics:

Test design patterns
Element locator best practices
Configuration management
CI/CD integration
Error handling strategies

Integration Examples¶

Needed: Examples with CI/CD systems

Examples:

GitHub Actions
Jenkins
GitLab CI
Azure DevOps
CircleCI

12. Infrastructure¶

Priority: Low to Medium | Difficulty: Intermediate

CI/CD Enhancements¶

Test Coverage Reporting¶

Needed:

Automated coverage reports
Coverage badges
Coverage trend tracking
Coverage thresholds

Performance Benchmarking¶

Needed:

Automated performance tests
Performance regression detection
Benchmark comparisons
Performance reports
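For performance regression detection, a CI step could compare current timings against a stored baseline. The job would fail when any benchmark slows down by more than a set tolerance. This is a minimal sketch. The function name, the input format of seconds per benchmark, and the 10% default tolerance are assumptions:

```python
from typing import Dict, List


def find_regressions(baseline: Dict[str, float], current: Dict[str, float],
                     tolerance: float = 0.10) -> List[str]:
    """Return benchmarks whose timing exceeds the baseline by more than `tolerance`.

    Timings are in seconds (lower is better). Benchmarks missing from the
    baseline are treated as new and are not flagged.
    """
    regressions = []
    for name, seconds in sorted(current.items()):
        base = baseline.get(name)
        if base is not None and base > 0 and seconds > base * (1 + tolerance):
            slowdown = seconds / base - 1
            regressions.append(f"{name}: {base:.3f}s -> {seconds:.3f}s (+{slowdown:.0%})")
    return regressions
```

A CI job would fail whenever the returned list is non-empty. If the project adopts pytest-benchmark, its `--benchmark-compare-fail` option provides similar checks without custom code.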

Security Scanning Automation¶

Needed:

Automated dependency scanning
Code security scanning
Container security scanning
Automated security updates

Tooling¶

Pre-commit Hooks Enhancements¶

Current: Basic hooks in place

Enhancements:

Additional linting rules
Documentation checks
Test coverage checks
Commit message validation improvements

Code Quality Metrics Dashboard¶

Needed:

Code quality metrics
Technical debt tracking
Code complexity metrics
Maintainability index

Dependency Update Automation¶

Needed:

Automated dependency updates
Security update prioritization
Update testing automation
Changelog generation

Getting Started¶

Choose an Area¶

Review the sections above and identify an area that interests you
Check the priority and difficulty levels
Review related documentation
Look at existing code to understand patterns

Before You Start¶

Read the Contributing Guidelines
Read the Developer Guide
Set up your development environment
Familiarize yourself with the codebase

Making Your Contribution¶

Fork the Repository: Create your own fork
Create a Branch: Use a descriptive branch name
Make Changes: Follow coding standards and patterns
Write Tests: Add tests for your changes
Update Documentation: Update relevant docs
Submit PR: Create a pull request with clear description

Need Help?¶

Open an issue on GitHub for questions
Check existing documentation
Review similar implementations in the codebase
Ask in pull request comments

Recognition¶

Contributors will be:

Listed in the project's contributors
Credited in release notes
Acknowledged in documentation (for significant contributions)

Summary¶

This document outlines many opportunities to improve the Optics Framework. Whether you're interested in:

Architecture: Stateless API, flexible strategies
Features: New drivers, interactive tools
Quality: Tests, documentation, code simplification
Performance: Optimizations and enhancements
Security: Authentication, validation, hardening

There's something for everyone. We appreciate your interest in contributing and look forward to your contributions!

Start Small

If you're new to the project, consider starting with documentation improvements or test coverage. These are great ways to learn the codebase while making valuable contributions.
