Implement comprehensive evaluation strategies for LLM applications using automated metrics, human feedback, LLM-as-judge approaches, and benchmarking. Use when testing LLM performance, measuring AI application quality, or establishing evaluation frameworks.
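As a rough illustration of how these pieces can fit together, the sketch below combines a simple automated metric (exact match) with an LLM-as-judge score over a small evaluation set. The `judge` callable, the `EvalCase` structure, the rubric wording, and the 1-5 score scale are all assumptions for the example, not part of this skill; swap in your own model client and metrics.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class EvalCase:
    prompt: str    # input sent to the application under test
    expected: str  # reference answer used by automated checks
    actual: str    # output produced by the LLM application

def exact_match(case: EvalCase) -> float:
    """Automated metric: 1.0 if the output matches the reference exactly."""
    return 1.0 if case.actual.strip() == case.expected.strip() else 0.0

def llm_judge_score(case: EvalCase, judge: Callable[[str], str]) -> float:
    """LLM-as-judge: ask a judge model (hypothetical callable) to rate the answer 1-5."""
    rubric = (
        "Rate the ANSWER for correctness against the REFERENCE on a 1-5 scale. "
        "Reply with a single integer.\n"
        f"QUESTION: {case.prompt}\nREFERENCE: {case.expected}\nANSWER: {case.actual}"
    )
    reply = judge(rubric)
    try:
        return int(reply.strip()) / 5.0  # normalize to 0-1
    except ValueError:
        return 0.0  # unparseable judge output is treated as a failure

def run_eval(cases: list[EvalCase], judge: Callable[[str], str]) -> dict[str, float]:
    """Aggregate both metrics across the evaluation set."""
    return {
        "exact_match": mean(exact_match(c) for c in cases),
        "judge_score": mean(llm_judge_score(c, judge) for c in cases),
    }

if __name__ == "__main__":
    # Stub judge so the sketch runs without any API; replace with a real model call.
    fake_judge = lambda prompt: "4"
    cases = [
        EvalCase("What is 2+2?", "4", "4"),
        EvalCase("Capital of France?", "Paris", "paris"),
    ]
    print(run_eval(cases, fake_judge))
```

In practice the judge callable would wrap whatever model client the application already uses, and human feedback would be logged alongside these scores rather than replaced by them.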
To install, download the skill repository as a ZIP file, or add it via the plugin marketplace:

/plugin marketplace add wshobson/agents