
RELEVANCE:

RELEVANCE is a Generative AI (GenAI) evaluation framework designed to automatically evaluate creative responses from Large Language Models (LLMs).

| Limitation | Description |
| --- | --- |
| Data Dependency | The reliability of metrics such as Permutation Entropy depends on having a diverse and representative dataset. If the data used to train the LLM is too homogeneous or narrow in scope, the entropy score may not accurately reflect the model's ability to handle a wider range of scenarios. |
| Sensitivity to Prompt Nature | The effectiveness of these metrics varies with the nature of the prompts used. Prompts that are too narrow or specific may limit the observable diversity in responses, affecting entropy and inversion counts. For instance, a very specific prompt might not allow the LLM to showcase its ability to generate creative or diverse responses. |
| Regular Calibration | As the LLM continues to learn and evolve, the way it responds to prompts can change. To maintain the accuracy and reliability of the RELEVANCE framework, users who implement this method should regularly review these metrics to ensure they continue to capture the LLM's behavior effectively. |
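To make the entropy and inversion metrics mentioned above concrete, here is a minimal Python sketch of how a normalized permutation entropy and an inversion count might be computed over a set of response rankings. The function names and input format are illustrative assumptions, not the framework's actual API; the intent is only to show why homogeneous data drives the entropy score toward zero, as noted under Data Dependency.

```python
import math
from collections import Counter

def permutation_entropy(rankings):
    """Normalized Shannon entropy over observed ranking permutations.

    `rankings` is a list of tuples, each an ordering of the same k
    responses (an illustrative input format, not the framework's API).
    Returns 0.0 when every ranking is identical (fully homogeneous data)
    and approaches 1.0 as all k! permutations become equally likely.
    """
    counts = Counter(rankings)
    total = len(rankings)
    probs = [c / total for c in counts.values()]
    h = -sum(p * math.log2(p) for p in probs)
    k = len(rankings[0])
    max_h = math.log2(math.factorial(k))  # entropy of a uniform distribution over k! permutations
    return h / max_h if max_h > 0 else 0.0

def count_inversions(ranking):
    """Number of out-of-order pairs relative to the identity ordering.

    0 means the ranking agrees perfectly with the reference order;
    k*(k-1)/2 means it is fully reversed.
    """
    return sum(
        1
        for i in range(len(ranking))
        for j in range(i + 1, len(ranking))
        if ranking[i] > ranking[j]
    )
```

With these definitions, a batch of identical rankings (e.g. from a model evaluated on overly narrow prompts) yields an entropy of 0.0 regardless of the model's true capability, which is exactly the failure mode the Data Dependency and Sensitivity to Prompt Nature rows describe.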