SHEER Metrics
The five key metrics that make up the SheerScore, each measuring a different aspect of AI involvement in the creative process.
The SHEER Framework
The SHEER Framework is a structured approach to evaluating the depth to which AI was used to create a work. A work is any product or art (such as images, music, literature, or poetry) that AI can be used to create. The framework consists of five key metrics that assess different aspects of AI involvement in the creative process. Each metric is scored on a scale from 0 to 5, with higher scores indicating deeper AI involvement.
Based on how each metric is scored, a final SheerScore is calculated that ranges from 0 to 100. This score provides a simple way to communicate the overall depth of AI involvement in the creation of a work. See the SheerScore page for more details on how the final SheerScore is calculated.
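To make the two scales concrete, here is a minimal sketch of how five 0-to-5 metric scores could map onto a 0-to-100 range. It rests on several assumptions: the field names are hypothetical placeholders rather than the framework's metric names, scores are treated as whole numbers, and every metric is weighted equally. The actual calculation is described on the SheerScore page and may differ.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MetricScores:
    """Five per-metric scores, each assumed to be an integer from 0 to 5.

    The field names are hypothetical placeholders, not the framework's
    official metric names.
    """

    ideation: int
    sourcing: int
    generation: int
    assessment: int
    refinement: int

    def __post_init__(self) -> None:
        # Reject anything outside the 0-5 scale described above.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, int) or not 0 <= value <= 5:
                raise ValueError(
                    f"{f.name} must be an integer from 0 to 5, got {value!r}"
                )


def equal_weight_score(scores: MetricScores) -> int:
    """ASSUMPTION: equal weights, not the official SheerScore formula.

    Five scores of 0-5 sum to 0-25; multiplying by 4 maps that to 0-100.
    """
    total = sum(getattr(scores, f.name) for f in fields(scores))
    return total * 4
```

Under this equal-weight assumption, a work scored 5 on every metric maps to 100, and a work scored 0 on every metric maps to 0.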
The Five SHEER Metrics
This metric measures AI's involvement in the initial creative act: how deeply AI was involved in crafting the idea that resulted in the final outcome.
Your SheerScore for this metric will be higher if:
- You relied on AI to help you come up with an idea rather than providing AI with details that it refined
- Your original prompt for the project was high-level and vague
- You needed help knowing where to start or how to think about what you wanted to create and used AI to provide that help. You then used AI's suggestions to build the final work.
Your SheerScore for this metric will be lower if:
- You developed most of the core ideas but used AI to refine them
- Your original idea was fully formed but you needed AI's help to research specifics about how to proceed
- You started with source material you or your team created and used AI to refine the material for market fit or used it to help you ideate about alternate ways to develop the product.
Examples:
- Higher score:
- Your original prompt named a subject but lacked specifics about what the work should contain, include, or avoid.
- Lower score:
- You described, in detail, precisely what you wanted to create or build, used AI tools to evaluate your idea, and took only small refinements from their responses.
- You provided AI with a body of work that was largely complete and used AI to help you address the needs of an additional customer segment using that work.
This score measures the depth to which the final work came from AI's general knowledge vs. work the maker provided to the AI tools so that they could create the final work. This score also measures how transparent the AI-created portion of the work is about its original sources.
Your SheerScore for this metric will be higher if:
- More than 50% of the content for the final work came from the tool's general, publicly-trained knowledge base.
- The maker supplied the AI tool with little or no maker-created content to help the tool develop its portion of the work.
- The AI tools were not fine-tuned, or were only lightly fine-tuned, with maker-generated content.
- The AI tools did not use, or only lightly used, maker-generated RAG content.
- The sources for the AI-created portion of the work are vague or non-existent.
Your SheerScore for this metric will be lower if:
- Less than 50% of the content for the final work came from the AI tool's general, publicly-trained knowledge base.
- The maker supplied the AI tool with maker-created content that the tool largely used to create the final work.
- The AI tools were heavily fine-tuned with maker-generated content and used this content to generate the AI-created portion of the work.
- The AI tools largely used maker-generated RAG content to generate their portion of the final work.
- The sources used for the AI-created portion of the work are transparent and the relationship between the source material and the final work is clear.
This score measures the depth to which AI was involved in generating the final creative work, that is, how much of the material content in the final work AI produced.
Your SheerScore for this metric will be higher if:
- More than 50% of the final work was generated by AI
- AI was used to automate the final work with little involvement by humans (low human-in-the-loop or HITL involvement)
Your SheerScore for this metric will be lower if:
- Less than 50% of the final work was generated by AI
- The maker supplied AI with human-generated source material and a good portion of it was carried through into the final work.
- AI was used as a labor-saving tool but humans were heavily involved in producing the final work (high human-in-the-loop involvement).
Examples:
- Higher score:
- AI generated a majority of a given code base
- AI used existing data to evaluate and generate reports with little or no human oversight
This score measures the depth to which AI was involved in assessing a work's quality, factuality, adherence to inter-disciplinary best practices, or any other measure of quality, accuracy, or adherence to norms. This metric assumes that the results of AI's assessment were largely followed or incorporated into the final work.
Your SheerScore for this metric will be higher if:
- AI was used, primarily, to assess the work with little to no HITL involvement.
- AI was given human-generated guidelines to follow in assessing the work and AI's assessment based on those guidelines was accepted with little or no human-led follow-up.
- One AI tool was used to evaluate the output of another AI tool and make changes based on that assessment with little to no human involvement.
Your SheerScore for this metric will be lower if:
- AI was used moderately or marginally to assess the final work. Most of the work's assessment was done by humans.
- AI was used to assess the work but the final work was primarily assessed by humans
Examples:
- Higher score:
- AI was used to check the color palette of a work of art or photo.
- AI was used to fact-check a document and its assessment was taken as accurate.
- AI was used to assess a code base for adherence to a group's best practices.
- AI was used to check the logical flow of a document.
This score measures the depth to which AI was used to refine a work, regardless of how involved AI was in generating the work. Refinement occurs at the stage where no new features or additional content are being added: the work is largely complete but needs slight modifications to improve quality. Refinement can be the next step after the work has been assessed.
Your SheerScore for this metric will be higher if:
- AI was used, primarily, to refine a work with little to no human oversight or evaluation.
- AI was given human-generated guidelines for what to refine in a work and used those guidelines to refine the work with little to no human oversight or evaluation.
Your SheerScore for this metric will be lower if:
- AI was used moderately or marginally to refine the final work. Most refinement was performed by humans.
- AI generated guidelines for what should be refined, but humans primarily made the final judgement about what to refine and performed those refinements.
- How deeply was AI used to refine or clean up the final product?
- Assumes the creative work is complete. No new features or additional content.
- Focuses on refactoring, grammatical cleanup, removing dead code paths, and removing or adding pixels to an image to make it cleaner and clearer.
Examples:
- Higher score:
- You used AI to fix errors in code with little to no human oversight
- You used AI to address grammatical and spelling errors in a document with little to no human oversight.
- You used AI to remove artifacts from a photo or work of art with little to no human oversight.
- AI was used to change variable names across a codebase with little to no human oversight.