Engine Implementations¶
This document details the engine implementations in the Optics Framework, including drivers, element sources, and vision models.
Drivers¶
Drivers are responsible for executing user actions on target applications or devices. All drivers implement the DriverInterface and are located in optics_framework/engines/drivers/.
Appium Driver¶
Location: optics_framework/engines/drivers/appium.py
The Appium driver provides mobile app automation capabilities for both Android and iOS.
Key Features¶
- Native mobile app automation
- Support for Android (UiAutomator2) and iOS (XCUITest)
- Gesture support (swipe, scroll, tap)
- Hardware key simulation
- App lifecycle management
Configuration¶
driver_sources:
- appium:
enabled: true
url: "http://localhost:4723"
capabilities:
platformName: "Android"
deviceName: "emulator-5554"
appPackage: "com.example.app"
appActivity: ".MainActivity"
Key Methods¶
launch_app()- Launch with package/activitypress_coordinates()- Tap at absolute coordinatespress_element()- Tap UI elementsswipe()/swipe_percentage()- Swipe gesturesscroll()- Scroll actionsenter_text()- Text inputpress_keycode()- Hardware key codesget_app_version()- App version info
Special Features¶
- Mobile-specific keycodes (BACK, HOME, MENU, etc.)
- Mobile type command support
- UI helper integration for element interaction
Selenium Driver¶
Location: optics_framework/engines/drivers/selenium.py
The Selenium driver provides web browser automation capabilities.
Key Features¶
- Cross-browser support (Chrome, Firefox, Edge, Safari)
- Web element interaction
- JavaScript execution
- Window and frame management
Configuration¶
driver_sources:
- selenium:
enabled: true
url: "http://localhost:4444"
capabilities:
browserName: "chrome"
version: "latest"
Key Methods¶
launch_app()- Navigate to URLpress_element()- Click web elementsenter_text()- Form inputexecute_script()- JavaScript executionget_text_element()- Extract text from elements
Playwright Driver¶
Location: optics_framework/engines/drivers/playwright.py
The Playwright driver provides modern web automation with better reliability.
Key Features¶
- Modern browser automation
- Auto-waiting for elements
- Network interception
- Multi-browser support
Configuration¶
BLE Driver¶
Location: optics_framework/engines/drivers/ble.py
The BLE (Bluetooth Low Energy) driver provides non-intrusive automation for production devices.
Key Features¶
- Non-intrusive mouse and keyboard control
- Serial communication via BLE
- Coordinate mapping (pixels to mickeys)
- Production-safe automation
Use Cases¶
- Production app monitoring
- Devices without USB debugging
- DRM-protected applications
- Smart TV automation
Configuration¶
driver_sources:
- ble:
enabled: true
capabilities:
device_id: "BLE_DEVICE_ID"
port: "/dev/ttyUSB0"
pixel_width: 1920
pixel_height: 1080
mickeys_width: 32767
mickeys_height: 32767
x_invert: 1
y_invert: 1
Key Methods¶
press_coordinates()- Mouse click at coordinatespress_percentage_coordinates()- Percentage-based clicksswipe()- Mouse drag gesturesenter_text_using_keyboard()- Keyboard inputpress_keycode()- Special key codes
Coordinate Mapping¶
The BLE driver maps screen coordinates to mouse movements (mickeys): - Pixel coordinates → Mickey coordinates - Supports coordinate inversion - Configurable mapping ratios
Element Sources¶
Element sources are responsible for detecting and locating UI elements. They implement ElementSourceInterface and are located in optics_framework/engines/elementsources/.
Screenshot-Based Sources¶
AppiumScreenshot¶
Location: optics_framework/engines/elementsources/appium_screenshot.py
Captures screenshots from Appium sessions.
- Uses Appium's screenshot capability
- Returns NumPy arrays for image processing
- Supports interactive element detection
SeleniumScreenshot¶
Location: optics_framework/engines/elementsources/selenium_screenshot.py
Captures screenshots from Selenium browser sessions.
- Browser screenshot capture
- Full page or viewport screenshots
- Base64 encoding support
PlaywrightScreenshot¶
Location: optics_framework/engines/elementsources/playwright_screenshot.py
Captures screenshots from Playwright sessions.
- High-quality screenshots
- Multiple screenshot formats
- Element-specific screenshots
CameraScreenshot¶
Location: optics_framework/engines/elementsources/camera_screenshot.py
Captures screenshots from external cameras or capture cards.
- External camera support
- Video capture card support
- Production monitoring use case
Page Source-Based Sources¶
AppiumPageSource¶
Location: optics_framework/engines/elementsources/appium_page_source.py
Extracts and parses Appium page source XML.
- XML-based element location
- XPath support
- Element hierarchy navigation
- Interactive element detection
SeleniumPageSource¶
Location: optics_framework/engines/elementsources/selenium_page_source.py
Extracts and parses Selenium page source HTML.
- HTML DOM parsing
- CSS selector support
- Element attribute extraction
PlaywrightPageSource¶
Location: optics_framework/engines/elementsources/playwright_page_source.py
Extracts and parses Playwright page source.
- Modern DOM API
- Accessibility tree support
- Element state detection
Find Element Sources¶
AppiumFindElement¶
Location: optics_framework/engines/elementsources/appium_find_element.py
Direct element location using Appium's find element methods.
- Multiple locator strategies
- Element interaction
- Element state checking
SeleniumFindElement¶
Location: optics_framework/engines/elementsources/selenium_find_element.py
Direct element location using Selenium's find element methods.
- WebDriver locator strategies
- Element waiting
- Element interaction
PlaywrightFindElement¶
Location: optics_framework/engines/elementsources/playwright_find_element.py
Direct element location using Playwright's locator API.
- Auto-waiting for elements
- Multiple locator types
- Element state queries
Element Source Selection¶
Element sources are automatically matched with compatible drivers:
# Element source declares required driver type
REQUIRED_DRIVER_TYPE = "appium"
# Factory automatically matches with Appium driver
ElementSourceFactory.get_driver(config, driver_fallback)
Vision Models¶
Vision models provide image and text detection capabilities. They are located in optics_framework/engines/vision_models/.
Image Detection Models¶
TemplateMatch¶
Location: optics_framework/engines/vision_models/image_models/templatematch.py
OpenCV-based template matching using SIFT and FLANN.
Key Features¶
- SIFT feature detection
- FLANN-based matching
- Confidence thresholding
- Multiple match support (index-based)
- Template loading from project directory
Configuration¶
image_detection:
- templatematch:
enabled: true
project_path: "/path/to/project"
execution_output_path: "/path/to/output"
Methods¶
find_element()- Locate template in imageelement_exist()- Check if template existsassert_elements()- Verify multiple templates
Algorithm¶
- Load template image from project directory
- Convert input and template to grayscale
- Detect SIFT keypoints and descriptors
- Match descriptors using FLANN
- Filter matches by distance ratio
- Calculate homography for geometric verification
- Return center coordinates of matched template
RemoteOIR¶
Location: optics_framework/engines/vision_models/image_models/remote_oir.py
Remote Object Image Recognition service integration.
Key Features¶
- Remote service-based matching
- API integration
- High-accuracy matching
- Cloud-based processing
Configuration¶
image_detection:
- remote_oir:
enabled: true
url: "https://api.example.com/oir"
api_key: "your-api-key"
OCR/Text Detection Models¶
EasyOCR¶
Location: optics_framework/engines/vision_models/ocr_models/easyocr.py
EasyOCR library integration for text detection.
Key Features¶
- Multi-language support
- High accuracy
- Bounding box detection
- Confidence scores
Configuration¶
Methods¶
detect_text()- Full text detectionfind_element()- Locate specific textelement_exist()- Check text presence
GoogleVision¶
Location: optics_framework/engines/vision_models/ocr_models/googlevision.py
Google Cloud Vision API integration.
Key Features¶
- Cloud-based OCR
- High accuracy
- Document text detection
- Handwriting recognition
Configuration¶
text_detection:
- googlevision:
enabled: true
credentials_path: "/path/to/credentials.json"
project_id: "your-project-id"
PyTesseract¶
Location: optics_framework/engines/vision_models/ocr_models/pytesseract.py
Tesseract OCR engine integration.
Key Features¶
- Open-source OCR
- Multiple language support
- Configurable OCR modes
- Page segmentation modes
Configuration¶
RemoteOCR¶
Location: optics_framework/engines/vision_models/ocr_models/remote_ocr.py
Remote OCR service integration.
Key Features¶
- Remote service-based OCR
- API integration
- Custom OCR engines
- Scalable processing
Configuration¶
text_detection:
- remote_ocr:
enabled: true
url: "https://api.example.com/ocr"
api_key: "your-api-key"
Engine Discovery and Loading¶
Engines are automatically discovered by the factory system:
graph LR
A[Factory] --> B[Scan Package]
B --> C[Discover Modules]
C --> D[Load Classes]
D --> E[Check Interface]
E --> F[Register]
F --> G[Instantiate on Demand] Discovery Process¶
- Factory scans package directory (
engines/drivers/, etc.) - Discovers all Python modules
- Imports modules dynamically
- Inspects classes for interface implementation
- Registers implementations in module registry
- Instantiates on demand based on configuration
Configuration-Based Selection¶
Engines are selected based on configuration:
The factory creates InstanceFallback wrapper that tries each in order.
Extension Guide¶
Adding a New Driver¶
- Create file in
optics_framework/engines/drivers/your_driver.py - Implement
DriverInterface - Set class attributes:
- Factory automatically discovers it
Adding a New Element Source¶
- Create file in
optics_framework/engines/elementsources/your_source.py - Implement
ElementSourceInterface - If driver-dependent, set:
- Factory automatically matches with compatible driver
Adding a New Vision Model¶
- Create file in appropriate subdirectory:
image_models/for image detectionocr_models/for text detection- Implement
ImageInterfaceorTextInterface - Factory automatically discovers it
Best Practices¶
- Error Handling: Always handle driver initialization failures gracefully
- Resource Cleanup: Implement proper cleanup in
terminate()methods - Configuration Validation: Validate configuration in
__init__ - Logging: Use
internal_loggerfor debug information - Event Tracking: Integrate with
EventSDKfor action tracking - Fallback Support: Design for graceful degradation
Performance Considerations¶
- Screenshot Caching: Element sources may cache screenshots
- Connection Pooling: Drivers should reuse connections when possible
- Lazy Loading: Engines are loaded on demand, not at startup
- Parallel Execution: Multiple engines can run in parallel for fallback
Related Documentation¶
- Components - Component architecture and factory system
- Strategies - Strategy pattern and element location
- Execution - Test execution
- Extending - Creating custom engines
- Error Handling - Error codes and handling
- Architecture Decisions - Factory pattern and engine design decisions
- Factory System - Dynamic module discovery