How is the app organized?
This use case evaluates AI technologies' ability to give a clear and useful overview of a codebase. A useful overview is:
Easy to understand
Incremental - Able to start simple and then give increasingly complex explanations
Possibly visual
Accurate
Able to explain:
How the app is structured:
What is the file/module pattern?
What is the overall architecture?
What technologies are being used to solve key challenges?
Evaluation Dimensions
Codebase
Metadata about the repo being tested, not the tool itself.
Lines of code (LOC):
S - 0–5,000
M - 5,001–50,000
L - 50,001–250,000
XL - 250,001+
Number of files (NOF):
S - 1–100
M - 101–500
L - 501–2,000
XL - 2,001+
Popularity (1-5) (npm weekly downloads / GitHub stars):
5 - 5M+ / 50,000+ (React, Express)
4 - 1M–5M / 20,000–49,999 (Angular, Vue)
3 - 100K–1M / 5,000–19,999 (Svelte, Vite)
2 - 10K–100K / 1,000–4,999 (SolidJS, Remix)
1 - <10K / < 1,000 (Rective.js, custom/legacy frameworks)
Result Evaluation
Rubric score (1-5) (Non-specific):
5 - Excellent: Outstanding result. Fully meets or exceeds expectations.
4 - Good: Very solid result. Minor issues or gaps. Mostly meets expectations.
3 - Fair / Mixed: Acceptable but with noticeable flaws. Partially meets expectations.
2 - Poor: Low quality or off-target. Major issues. Barely meets intent.
1 - Unusable: Completely fails to meet expectations. Not usable.
How well the actual output produced by the tool/prompt combination performed.
| Metric | Scale / Format | Description |
|---|---|---|
| Accuracy | 1–5 | Is the output factually and functionally correct based on your knowledge of the repo?<br>5 - 100% of the sentences are correct<br>4 - ≥ 85% of the sentences are correct<br>3 - ≥ 60% of the sentences are correct<br>2 - ≥ 45% of the sentences are correct<br>1 - < 45% of the sentences are correct |
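To make the thresholds above unambiguous, here is a minimal scoring sketch in TypeScript (the function and its name are ours, not part of any tool):

```typescript
// Illustrative only: maps the share of factually correct sentences in a
// generated overview to the 1–5 accuracy score defined in the table above.
function accuracyScore(correctSentences: number, totalSentences: number): number {
  if (totalSentences === 0) return 1; // nothing to evaluate
  const ratio = correctSentences / totalSentences;
  if (ratio === 1) return 5;   // 100% of the sentences are correct
  if (ratio >= 0.85) return 4; // >= 85%
  if (ratio >= 0.6) return 3;  // >= 60%
  if (ratio >= 0.45) return 2; // >= 45%
  return 1;                    // < 45%
}

// Example: 17 of 20 sentences verified as correct -> ratio 0.85 -> score 4
console.log(accuracyScore(17, 20));
```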
Quality. To make quality more objective, we broke it down into the following criteria:
| Metric | Scale / Format | Description |
|---|---|---|
| Explains business logic (not just API) | Yes / No | Context beyond code |
| Has code examples | Yes / No | Does the output include code snippets? |
| References actual files/functions | Yes / No | Does the output point to real files and functions in the repo? |
| Image/Diagram Support | Yes / No | Can it produce diagrams? |
| Includes API endpoint documentation | Yes / No / N/A | For frontend/backend collaboration when relevant |
| Organization & Structure | 1–5 | 5 - Table of contents, anchor links, headings, sidebars, search<br>4 - Mostly easy to scan and jump around<br>3 - Navigation exists, but sections aren’t well labeled<br>2 - Few navigational aids<br>1 - Wall of text or scattered files with no clear way to find things |
| Depth of Explanation | 1–5 | 5 - Explains purpose, reasoning, edge cases, and usage thoroughly<br>4 - Explains most parts well, but lacks detail in a few areas<br>3 - Gives a surface-level overview, but lacks deep insight<br>2 - Very shallow; barely explains beyond the code<br>1 - No real explanation, just pasted code or vague comments |
| Digestibility | 1–5 | Is the content concise and easy to understand for your audience? |
| Formatting Consistency | 1–5 | 5 - Fully consistent formatting; clear hierarchy; code blocks well styled<br>4 - Mostly clean, with a few inconsistencies<br>3 - Some formatting issues, but readable<br>2 - Cluttered or inconsistent formatting<br>1 - Messy, broken Markdown, or unreadable layout |
| Multi-Page Output | Yes / No | Can it create full-length documentation or just fragments? |
Process
How difficult it is to use the tool effectively.
| Metric | Scale / Format | Description |
|---|---|---|
| Ease of Setup | 1–5 | How complex is the setup? Installs, config, permissions, etc. |
| Prompt Simplicity | 1–5 | Are you able to use short, natural prompts? Or does it need detailed scaffolding? |
| Run Cost | $ / Tokens / Subscription | How expensive (financially or time-wise) is each run? (See the cost sketch after this table.) |
| Repeatability | 1–5 | Can you easily reuse the approach on other repos? Is it consistent? |
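For token-billed runs, Run Cost can be normalized with a small helper like the sketch below (TypeScript; the rates in the example are placeholders, so substitute the provider's actual pricing):

```typescript
// Rough per-run cost estimate for token-billed models (illustrative only).
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
}

// Prices are expressed per one million tokens; pass in whatever the provider
// actually charges. The numbers used in the example are placeholders, not real rates.
function estimateRunCostUsd(
  usage: TokenUsage,
  inputPricePerMillionUsd: number,
  outputPricePerMillionUsd: number
): number {
  return (
    (usage.inputTokens / 1_000_000) * inputPricePerMillionUsd +
    (usage.outputTokens / 1_000_000) * outputPricePerMillionUsd
  );
}

// Example with placeholder pricing: 120k input tokens, 8k output tokens
console.log(estimateRunCostUsd({ inputTokens: 120_000, outputTokens: 8_000 }, 3, 15));
```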
Technologies and Approaches
IDE (Integrated Development Environment):
A software suite that brings together essential tools (such as a code editor, debugger, and compiler) into one interface, enabling developers to write, test, and debug code more efficiently. It's a productivity environment for building software.
AI Agent:
A self-contained AI entity capable of perceiving its environment, making decisions, and taking actions to perform specific tasks. It follows a sense-think-act loop (sketched in code after these definitions) and is typically focused on executing a single goal or a bounded set of goals.
Agentic AI:
A more advanced, autonomous AI system that not only performs tasks but also sets goals, plans across multiple steps, adapts dynamically, and may coordinate multiple AI agents or tools. It's like a conductor that directs simpler agents or tools to accomplish complex, evolving objectives.
Autonomous AI Tool (Black-box Tool):
A self-contained, goal-oriented AI application that performs complex tasks (e.g., generating tutorials from code or summarizing entire repos) with no direct control over the internal AI agent or workflow.
Users interact by providing minimal input—like a GitHub repo URL or a topic—and the tool autonomously handles planning and execution.
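To make the sense-think-act loop concrete, here is a minimal sketch in TypeScript; the types and functions are illustrative stand-ins, not any specific tool's API:

```typescript
// Minimal sense-think-act loop for a repo-exploration agent (illustrative only;
// sense/think/act are stand-ins for model calls and tool use, not a real API).
interface Observation { note: string }
interface Action { kind: "explore" | "summarize" | "done" }

function sense(step: number): Observation {
  // A real agent would read files, search results, diffs, etc. here.
  return { note: `looked at part ${step} of the repo` };
}

function think(history: Observation[]): Action {
  // A real agent would call a model here to plan the next step.
  return history.length < 3 ? { kind: "explore" } : { kind: "done" };
}

function act(action: Action): void {
  // A real agent would execute a tool here: open a file, write a summary, etc.
  console.log(`executing: ${action.kind}`);
}

function runAgent(maxSteps = 10): void {
  const history: Observation[] = [];
  for (let step = 0; step < maxSteps; step++) {
    history.push(sense(step));         // sense
    const action = think(history);     // think
    if (action.kind === "done") break; // bounded goal reached
    act(action);                       // act
  }
}

runAgent();
```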
Technology:
CodeToTutorial
VSCode
Copilot
GPT-4.1
Claude Sonnet 4
Gemini 2.5 Pro
Amazon Q
Claude Sonnet 4
Cursor
GPT-4.1
Claude Sonnet 4
Windsurf
GPT-4.1
Claude Sonnet 4
Gemini 2.5
Approaches:
Prompting for a markdown file
Prompting for a docx
Prompting for docx with images
Prompting for markdown with Mermaid diagrams
Prompting for:
Architecture and File Structure
Describe the overall architecture of the application (e.g., monolith, microservices, layered).
Explain the folder/module structure.
Identify key technologies used and their roles in solving core problems.
EXTRAS:
Prompt:
Generate a Markdown document that explains how this application works, both from a business and technical perspective. Follow this structure:
Overview
Describe what the application does from a business/user perspective.
Summarize the primary use cases and who the end users are.
Key Business Processes
List the main user flows or automated system processes.
Explain the steps involved in completing typical tasks within the app.
Architecture and File Structure
Describe the overall architecture of the application (e.g., monolith, microservices, layered).
Explain the folder/module structure.
Identify key technologies used and their roles in solving core problems.
Flowchart
Generate a flowchart (in Mermaid.js syntax) that visualizes a typical user or system process.
Include the Mermaid code directly in the markdown.
Requirements:
Write progressively: start with a high-level overview and move into more technical detail.
Ensure all explanations are accurate and based on the codebase, not assumptions.
Do not include unnecessary commentary.
Use clear and concise language.
Reference function and component names where relevant.
Include file paths when pointing out architecture or business logic locations.
Why Agent over Ask mode?
Use Agent Mode if:
You want the agent to read and analyze the full codebase (or a large portion of it).
You're looking for a deep, contextual, multi-file overview.
You're starting a new documentation or exploration task.
Use Ask Mode if:
You only want a quick, isolated answer based on the currently open file.
You're testing or iterating on a smaller piece of the codebase.
You don’t need the AI to understand the overall project structure.
If I use Cursor or Copilot with the same model, will it give me different results?
Yes — you will get different results from Cursor and Copilot, even if you're using the same model (like GPT-4.1 or Claude 3.5 Sonnet). The difference isn’t the model — it’s in how each tool structures the context and prompt that it sends to the model.
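To illustrate (hypothetically; this is not Cursor's or Copilot's actual implementation), two tools can wrap the same question and the same model in very different prompts:

```typescript
// Hypothetical sketch: same model, same question, different prompt construction.
// Nothing below reflects the real internals of Cursor or Copilot.
interface RepoContext {
  openFile: string;         // contents of the file currently open in the editor
  retrievedFiles: string[]; // snippets pulled from across the repo (e.g. via search)
}

function buildPromptToolA(question: string, ctx: RepoContext): string {
  // Tool A: only the currently open file is sent as context.
  return `Open file:\n${ctx.openFile}\n\nQuestion: ${question}`;
}

function buildPromptToolB(question: string, ctx: RepoContext): string {
  // Tool B: adds its own system instructions plus snippets from across the repo.
  return [
    "You are a codebase assistant. Cite file paths in your answer.",
    ...ctx.retrievedFiles.map((snippet, i) => `Snippet ${i + 1}:\n${snippet}`),
    `Question: ${question}`,
  ].join("\n\n");
}

// The model weights are identical, but it receives two different prompts,
// so the two tools can produce noticeably different overviews.
```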