
RELEVANCE:

A Generative AI (GenAI) evaluation framework designed to automatically evaluate creative responses from Large Language Models (LLMs).

Expanding the horizon

The evaluation framework can be adapted to assess a variety of open-ended responses beyond textual content:

Modality: GenAI Images
Description: Evaluate the relevance and creativity of images generated from textual descriptions. By applying similar metrics, one can assess how closely a sequence of generated images aligns with a series of descriptions, or the thematic consistency across a portfolio of generated artwork.
Potential evaluation focus:
– Relevance of images to descriptions.
– Thematic consistency across a series.
Example metrics:
– Permutation Entropy (assesses the diversity of visual elements across images; sketched in code below).
– Custom metrics based on visual similarity (e.g., Inception Score).

Modality: Interactive Media
Description: In interactive applications such as video games or virtual reality, the framework can be used to evaluate narrative coherence or the adaptive responses of AI entities to user interactions.
Potential evaluation focus:
– Narrative coherence (video games, VR).
– Adaptiveness of AI responses to user interactions.
Example metrics:
– Longest Increasing Subsequence (evaluates consistency of responses across interactions; sketched in code below).
– Custom metrics based on user engagement or task completion.

Modality: Educational Content
Description: For AI-generated educational material, these metrics can help assess the alignment of content with educational standards and learning objectives, ensuring the material's utility and relevance.
Potential evaluation focus:
– Alignment with educational standards.
– Achievement of learning objectives.
Example metrics:
– Content analysis to assess coverage of key topics.
– Task-specific metrics based on learner performance (e.g., MCQ accuracy).
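
The two sequence metrics named above, Permutation Entropy and Longest Increasing Subsequence, are standard algorithms rather than anything specific to RELEVANCE. The sketch below shows one way they might be computed in Python, assuming each generated response has already been reduced to a numeric score; the function names and the sample `scores` list are illustrative, not part of the framework.

```python
from bisect import bisect_left
from collections import Counter
from math import factorial, log2


def permutation_entropy(scores, order=3):
    """Normalized permutation entropy of a numeric sequence (closer to 1 = more diverse orderings)."""
    if len(scores) < order:
        raise ValueError("need at least `order` scores")
    patterns = Counter()
    for start in range(len(scores) - order + 1):
        window = scores[start:start + order]
        # Ordinal pattern = argsort of the window (ties broken by position).
        patterns[tuple(sorted(range(order), key=lambda i: window[i]))] += 1
    total = sum(patterns.values())
    entropy = -sum((c / total) * log2(c / total) for c in patterns.values())
    return entropy / log2(factorial(order))  # normalize by the number of possible patterns


def longest_increasing_subsequence(scores):
    """Length of the longest strictly increasing subsequence, via patience sorting (O(n log n))."""
    tails = []  # tails[k] = smallest tail of any increasing subsequence of length k + 1
    for s in scores:
        idx = bisect_left(tails, s)
        if idx == len(tails):
            tails.append(s)
        else:
            tails[idx] = s
    return len(tails)


if __name__ == "__main__":
    # Hypothetical relevance scores for a series of generated responses.
    scores = [0.62, 0.71, 0.68, 0.80, 0.75, 0.83, 0.90]
    print("Permutation entropy:", round(permutation_entropy(scores), 3))
    print("LIS length:", longest_increasing_subsequence(scores))
```

Read this way, a higher permutation entropy signals more varied orderings (e.g., greater visual diversity across an image series), while a longer increasing subsequence signals sustained consistency or improvement across a sequence of interactions.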

Conclusion

The integration of advanced mathematical metrics with custom relevance evaluations presents a new frontier in the automatic assessment of AI-generated content. The RELEVANCE framework enhances the depth and reliability of evaluations, ensuring AI systems remain aligned with evolving human standards and expectations. Future work should explore adapting these metrics across different forms of AI outputs, such as visual content and interactive AI systems.