Evaluation

SheerScore

A single number that communicates the degree to which AI was used in creating a work.

What is the SheerScore?

The SheerScore is simply a number that represents how deeply AI was used to produce the work being scored. That work can be code or an application, an image, a movie, a book, an article, or anything else AI can generate. The number does not imply any value judgement; that judgement is up to you. The score provides transparency so you can make it.

The higher the number the more AI was involved in creating the scored item.

How is the SheerScore Calculated?

The SheerScore is calculated from the numbers assigned across five areas, or metrics, that evaluate the depth to which AI was used in creating the work. The five areas in which AI can be used when creating a work are: Source, History, Execution, Evaluation, and Refinement. To learn more about these five areas, visit the SHEER Metrics page.

Each of the five metrics is assigned a score from 0 to 5 based on how deeply AI was used in that area. The number from each metric is multiplied by 4 to give a final score for that metric that ranges from 0 to 20. The sum of those numbers is the SheerScore.

For example, suppose a piece of work received the following scores across the five areas:

  • Source: 3
  • History: 4
  • Execution: 2
  • Evaluation: 5
  • Refinement: 1

The SheerScore would be calculated as follows:

  • Source: 3 x 4 = 12
  • History: 4 x 4 = 16
  • Execution: 2 x 4 = 8
  • Evaluation: 5 x 4 = 20
  • Refinement: 1 x 4 = 4

The total SheerScore would be: 12 + 16 + 8 + 20 + 4 = 60
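The calculation above can be sketched in a few lines of Python. This is a minimal illustration of the arithmetic described in this section; the function name and the input validation are our own additions, not part of the framework itself:

```python
# Sketch of the SheerScore calculation: each 0-5 metric score is
# multiplied by 4, and the scaled values are summed into a 0-100 total.

METRICS = ("Source", "History", "Execution", "Evaluation", "Refinement")

def sheer_score(scores: dict) -> int:
    """Return the SheerScore for a mapping of metric name -> 0-5 score."""
    for name, value in scores.items():
        if name not in METRICS:
            raise ValueError(f"unknown metric: {name}")
        if not 0 <= value <= 5:
            raise ValueError(f"{name} score must be between 0 and 5, got {value}")
    return sum(value * 4 for value in scores.values())

example = {
    "Source": 3,
    "History": 4,
    "Execution": 2,
    "Evaluation": 5,
    "Refinement": 1,
}
print(sheer_score(example))  # 60
```

Because every metric uses the same multiplier, the order of the metrics does not affect the total; the multiplication by 4 only rescales the 0-25 raw sum onto the familiar 0-100 range.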

The higher the SheerScore, the more deeply AI was used in creating the work. This simple number is designed to provide transparency about the role of AI in the creative process. Our long term plan is to provide users the ability to see the details behind each metric score so they can understand how the final SheerScore was derived.

See the SHEER Metrics page for more details on how to assess each of the five metrics.

How are SheerScores Assessed?

We designed the SheerScore to be simple and easy to understand. Here is some of our thinking behind the assessment process:

  1. The 0-5 scale for each metric is meant to give makers a simple, lightweight way to evaluate their use of AI. A broader scale (say 0-10) would provide more granularity but would also make the assessment more cognitively demanding. We believe a 0-5 scale strikes the right balance between simplicity and granularity.
  2. The multiplication by 4 is simply a way to convert the 0-5 scale into a 0-20 scale for each metric. This allows the final SheerScore to range from 0 to 100, which is a familiar scale for many people (similar to a percentage score).
  3. A 100-point scale can imply value, but it can also simply communicate depth or intensity, and that is the motivation behind this scale for the SheerScore. A higher SheerScore indicates a deeper use of AI in the creative process, but it does not imply that higher is better or worse. The value judgement is left to the user.

What about weighting the metrics differently?

We considered weighting the five metrics differently based on their perceived importance in the creative process. However, we ultimately decided against this approach for several reasons:

  • Simplicity: A simple additive model is easier to understand and communicate. Weighting would add complexity to the calculation and make it harder for users to grasp how the final score is derived.
  • Subjectivity: Makers may disagree about the importance of one metric over another depending on the context and the type of work being created. If we decided the weighting, we would be imposing subjective judgements about each metric's relative importance, which could lead to disagreements and confusion.
  • Implied value: Weighting the metrics could imply that some aspects of AI use are more valuable or important than others. We wanted to avoid making value judgements about the role of AI in the creative process and instead focus on providing transparency.

Who or What Assesses the SheerScore?

We're still in the process of designing a framework and approach for assessing SheerScores as objectively as possible and in a way that consumers can validate the accuracy of the score. Here's our current thinking.

Phase 1: Self-Evaluation

In the initial phase, we envision that makers will self-assess their work using the SheerScore framework. We will provide clear guidelines and examples to help makers evaluate their use of AI across the five metrics. This self-assessment approach allows for quick adoption and encourages makers to reflect on their use of AI in the creative process.

Phase 2: A SHEER Registry

As the SheerScore gains traction, we plan to establish a SHEER Registry where makers can submit their SheerScores (or have them automatically generated). This registry will serve as a centralized database of SheerScores and provide insights, reports, and a deeper dive into how and why the SheerScores were assigned. The registry could also facilitate peer reviews and community validation of SheerScores to enhance credibility.

At the beginning, the registry would primarily rely on self-assessments, but over time we envision incorporating third-party assessments and audits to ensure the integrity of the scores in the registry.

Phase 3: Lightweight Tooling

In order to bring some objectivity to the assessment of the SheerScore, we would like to offer tools that help makers assess the use of AI in their work. For example, we envision a background agent that works with current AI tooling to track and log the use of AI throughout the creative process. This agent could provide a report that makers can use to inform their self-assessment of the SheerScore.

Phase 4: Third-Party Evaluation

In the long term, we envision a system where independent third-party assessors can evaluate and validate SheerScores. These assessors would follow standardized guidelines and methodologies to ensure consistency and objectivity in the assessment process. This phase would add an additional layer of credibility to the SheerScore and help build trust among consumers.

We also envision large AI companies participating in the assessment process by providing transparency reports or logs that detail how their AI systems were used in the creative process. This transparency would help validate the SheerScores assigned to works created using their AI technologies.

In this phase, the SheerScore would function like a credit score or a certification like those used in the food industry (e.g., USDA Organic, Fair Trade). Independent assessors would evaluate the use of AI in the creative process and assign a SheerScore based on standardized criteria. This approach would provide consumers with confidence in the accuracy and reliability of the SheerScore.

Next Steps

We're excited about the potential of the SheerScore to provide transparency about the role of AI in the creative process. As we continue to develop and refine the SheerScore framework, we welcome feedback and collaboration from makers, consumers, and other stakeholders.

See our FAQ page for answers to common questions about the SheerScore. If you want to get involved or have feedback, please let us know so we can keep you informed as we develop the SheerScore further.

SheerScore for SheerScore

You might be curious how much AI was involved in creating the SHEER framework and the SheerScore. We intentionally did not use AI heavily to develop the framework or the scoring system. We did this not because we doubt the value of AI, but because we wanted this framework to be grounded in our own experience and in existing academic research that we read, discussed, and synthesized.

Here's how we would assess the SheerScore for the SHEER framework itself:

  1. Source: 0 - The primary idea for the 5 metrics and the SheerScore scoring system came from our reading and discussions about AI transparency, and the specific framework was developed by us without direct AI assistance.
  2. History: 0 - The SHEER framework was developed from scratch without using any existing AI-generated frameworks or templates. We did do research on existing transparency frameworks, but none were AI-generated.
  3. Execution: 2 - We used AI tools to help with naming the 5 metrics and how to structure the scoring system, but the core development of the framework was done by us.
  4. Evaluation: 0 - We did not use AI to evaluate or validate the SHEER framework. The evaluation was done by us based on our understanding of AI transparency.
  5. Refinement: 1 - We used AI tools to help with editing and refining the naming of the metrics and the overall presentation of the framework, but the core refinement and authoring was done by us.

Based on this assessment, the SheerScore for the SHEER framework would be:

  • Source: 0 x 4 = 0
  • History: 0 x 4 = 0
  • Execution: 2 x 4 = 8
  • Evaluation: 0 x 4 = 0
  • Refinement: 1 x 4 = 4

The total SheerScore for the SheerScore and framework would be: 0 + 0 + 8 + 0 + 4 = 12

Further Reading

Art-ificial Intelligence: The Effect of AI Disclosure on Evaluations of Creative Content

Foregrounding Artist Opinions: A Survey Study on Transparency, Ownership, and Fairness in AI Generative Art

The transparency dilemma: How AI disclosure erodes trust

Disclosing artificial intelligence use in scientific research and publication: When should disclosure be mandatory, optional, or unnecessary?

Labeling AI-Generated Content: Promises, Perils, and Future Directions