About the Role
As a Gen AI Testing Subject Matter Expert (SME), you will be responsible for defining and implementing comprehensive testing strategies for Generative AI models, ensuring their accuracy, relevance, and compliance with industry standards.
Responsibilities
- Test Strategy & Planning
  - Define comprehensive testing strategies tailored for Gen AI models (LLMs, diffusion models, etc.).
  - Identify key testing dimensions: accuracy, relevance, coherence, bias, toxicity, hallucination, and safety.
  - Develop test plans for different stages: pre-training, fine-tuning, prompt engineering, and deployment.
- Test Case Design & Automation
  - Design test cases for both deterministic and non-deterministic outputs.
  - Create benchmark datasets and golden sets for evaluation.
  - Develop automated testing pipelines using tools like LangChain, PromptLayer, or custom frameworks.
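To illustrate the kind of work this covers: because model outputs are non-deterministic, automated checks typically assert properties of the output rather than exact strings. A minimal sketch of a golden-set runner follows; the `generate` stub and helper names here are hypothetical placeholders, not any specific framework's API.

```python
import re

def generate(prompt: str) -> str:
    # Stub model for illustration; a real pipeline would call an LLM API here.
    return f"Paris is the capital of France. (prompt: {prompt})"

def assert_contains_facts(output: str, required_terms: list[str]) -> bool:
    # Non-deterministic outputs are checked against properties, not exact strings.
    return all(re.search(term, output, re.IGNORECASE) for term in required_terms)

def run_golden_set(golden_set, n_samples: int = 3) -> dict:
    # Sample each prompt several times; a case passes only if every sample passes.
    results = {}
    for prompt, required_terms in golden_set:
        samples = [generate(prompt) for _ in range(n_samples)]
        results[prompt] = all(assert_contains_facts(s, required_terms) for s in samples)
    return results

golden = [("What is the capital of France?", ["Paris", "France"])]
print(run_golden_set(golden))
```

In practice the same harness would be wired into CI so every model or prompt change re-runs the golden set automatically.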
- Evaluation Metrics & Analysis
  - Define and apply appropriate evaluation metrics (e.g., BLEU, ROUGE, perplexity, factual consistency).
  - Analyze model outputs for hallucinations, bias, and harmful content.
  - Conduct A/B testing and human-in-the-loop evaluations.
- Prompt & Scenario Testing
  - Test prompt robustness across variations, edge cases, and adversarial inputs.
  - Validate prompt templates and chaining logic in RAG or agent-based systems.
  - Ensure consistency and reliability across different user intents and contexts.
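A typical robustness check asserts that paraphrases of the same user intent are routed consistently. The sketch below uses a hypothetical stub `classify_intent` in place of a real model call, purely to show the test shape:

```python
def classify_intent(prompt: str) -> str:
    # Stub intent router for illustration; a real system would call the model.
    p = prompt.lower()
    if "refund" in p or "money back" in p:
        return "refund_request"
    return "other"

# Paraphrases and edge cases that should all map to the same intent.
variants = [
    "I want a refund",
    "Please give me my money back",
    "REFUND. Now.",
]

results = {v: classify_intent(v) for v in variants}
assert len(set(results.values())) == 1, f"Inconsistent routing: {results}"
```

The same pattern extends to adversarial inputs: generate perturbed variants (typos, casing, injected instructions) and assert the system's behavior stays within spec.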
- Risk & Compliance Testing
  - Validate adherence to responsible AI principles (fairness, transparency, accountability).
  - Test for compliance with data privacy laws (e.g., GDPR, PDPA) and industry regulations.
  - Identify and mitigate risks related to model misuse or unintended behavior.
- Tooling & Infrastructure
  - Set up and maintain testing environments for Gen AI models (cloud-based or on-prem).
  - Integrate testing into CI/CD pipelines for continuous validation.
  - Leverage synthetic data generation and simulation tools for scalable testing.
- Collaboration & Reporting
  - Work closely with the Testing Domain Lead, data scientists, ML engineers, and product teams.
  - Document test results, issues, and recommendations clearly.
  - Provide feedback loops to improve model training, fine-tuning, and deployment.