How is the app organized?

This use case evaluates AI technologies' ability to give a clear and useful overview of a codebase. A useful overview is:

  • Easy to understand

    • Incremental - Able to start simple and then give increasingly complex explanations

    • Possibly visual

  • Accurate

  • Able to explain:

    • How the app is structured:

      • What is the file/module pattern?

    • What is the overall architecture?

    • What technologies are being used to solve key challenges?


Evaluation Dimensions

Codebase

  • Metadata about the repo being tested, not the tool itself (a sizing sketch follows the LOC and NOF lists below).

Lines of code (LOC):

  • S - 0–5,000

  • M - 5,001–50,000

  • L - 50,001–250,000

  • XL - 250,001+

Number of files (NOF):

  • S - 1–100

  • M - 101–500

  • L - 501–2,000

  • XL - 2,001+
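
As a rough illustration, a small script can bucket a repo into these size classes. A minimal sketch (Node.js/TypeScript; the thresholds are the ones above, and the naive line count includes comments, blank lines, and any text file it finds):

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Walk a repo and count files and lines (naive: counts every file,
// including comments and blank lines; skips node_modules and .git).
function countRepo(dir: string): { files: number; lines: number } {
  let files = 0;
  let lines = 0;
  for (const entry of readdirSync(dir)) {
    if (entry === "node_modules" || entry === ".git") continue;
    const path = join(dir, entry);
    if (statSync(path).isDirectory()) {
      const sub = countRepo(path);
      files += sub.files;
      lines += sub.lines;
    } else {
      files += 1;
      lines += readFileSync(path, "utf8").split("\n").length;
    }
  }
  return { files, lines };
}

// Map raw counts onto the S/M/L/XL buckets defined above.
function locBucket(lines: number): string {
  if (lines <= 5_000) return "S";
  if (lines <= 50_000) return "M";
  if (lines <= 250_000) return "L";
  return "XL";
}

function nofBucket(files: number): string {
  if (files <= 100) return "S";
  if (files <= 500) return "M";
  if (files <= 2_000) return "L";
  return "XL";
}

const { files, lines } = countRepo(process.argv[2] ?? ".");
console.log(`LOC: ${lines} (${locBucket(lines)}), files: ${files} (${nofBucket(files)})`);
```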

Popularity (1-5) (npm weekly downloads / GitHub stars; a lookup sketch follows this list):

  • 5 - 5M+ / 50,000+ (React, Express)

  • 4 - 1M–5M / 20,000–49,999 (Angular, Vue)

  • 3 - 100K–1M / 5,000–19,999 (Svelte, Vite)

  • 2 - 10K–100K / 1,000–4,999 (SolidJS, Remix)

  • 1 - <10K / < 1,000 (niche or custom/legacy frameworks)
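
Both counts can be pulled from public endpoints. A minimal sketch (assuming the public npm downloads API and the unauthenticated GitHub REST API; the thresholds encode the table above, with stars used as the fallback signal):

```typescript
// Weekly downloads from the public npm downloads endpoint.
async function weeklyDownloads(pkg: string): Promise<number> {
  const res = await fetch(`https://api.npmjs.org/downloads/point/last-week/${pkg}`);
  const body = (await res.json()) as { downloads?: number };
  return body.downloads ?? 0;
}

// Star count from the GitHub REST API (unauthenticated, rate-limited).
async function githubStars(ownerRepo: string): Promise<number> {
  const res = await fetch(`https://api.github.com/repos/${ownerRepo}`);
  const body = (await res.json()) as { stargazers_count?: number };
  return body.stargazers_count ?? 0;
}

// Map downloads/stars onto the 1-5 popularity scale above.
function popularityScore(downloads: number, stars: number): 1 | 2 | 3 | 4 | 5 {
  if (downloads >= 5_000_000 || stars >= 50_000) return 5;
  if (downloads >= 1_000_000 || stars >= 20_000) return 4;
  if (downloads >= 100_000 || stars >= 5_000) return 3;
  if (downloads >= 10_000 || stars >= 1_000) return 2;
  return 1;
}

// Example: score React.
const score = popularityScore(await weeklyDownloads("react"), await githubStars("facebook/react"));
console.log(`Popularity: ${score}`);
```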

Result Evaluation

Rubric score (1-5) (Non-specific):

  • 5 - Excellent: Outstanding result. Fully meets or exceeds expectations.

  • 4 - Good: Very solid result. Minor issues or gaps. Mostly meets expectations.

  • 3 - Fair / Mixed: Acceptable but with noticeable flaws. Partially meets expectations.

  • 2 - Poor: Low quality or off-target. Major issues. Barely meets intent.

  • 1 - Unusable: Completely fails to meet expectations. Not usable.

How well the actual output produced by the tool/prompt combo performed.

Accuracy (1–5): Is the output factually and functionally correct based on your knowledge of the repo? (a scoring sketch follows this list)

  • 5 - 100% of the sentences are correct

  • 4 - >= 85% of the sentences are correct

  • 3 - >= 60% of the sentences are correct

  • 2 - >= 45% of the sentences are correct

  • 1 - < 45% of the sentences are correct
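
To apply this scale, count the sentences in the output, verify each against the repo, and map the fraction onto the rubric. A minimal sketch of that mapping (thresholds taken directly from the list above):

```typescript
// Map the fraction of verified-correct sentences onto the 1-5 accuracy scale.
function accuracyScore(correct: number, total: number): 1 | 2 | 3 | 4 | 5 {
  const fraction = correct / total;
  if (fraction === 1) return 5; // 100% of sentences correct
  if (fraction >= 0.85) return 4;
  if (fraction >= 0.6) return 3;
  if (fraction >= 0.45) return 2;
  return 1;
}

console.log(accuracyScore(17, 20)); // 0.85 -> 4
```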

Quality. To make quality more objective, we broke it down into different criteria (an aggregation sketch follows the list):

  • Explains business logic (not just API) (Yes / No): Context beyond code.

  • Has code examples (Yes / No)

  • References actual files/functions (Yes / No)

  • Image/Diagram Support (Yes / No): Can it produce diagrams?

  • Includes API endpoint documentation (Yes / No / Doesn't apply): For frontend/backend collaboration when relevant.

  • Organization & Structure (1–5):

    • 5 - Table of contents, anchor links, headings, sidebars, search

    • 4 - Mostly easy to scan and jump around

    • 3 - Navigation exists, but sections aren’t well labeled

    • 2 - Few navigational aids

    • 1 - Wall of text or scattered files with no clear way to find things

  • Depth of Explanation (1–5):

    • 5 - Explains purpose, reasoning, edge cases, and usage thoroughly

    • 4 - Explains most parts well, but lacks detail in a few areas

    • 3 - Gives a surface-level overview, but lacks deep insight

    • 2 - Very shallow; barely explains beyond the code

    • 1 - No real explanation, just pasted code or vague comments

  • Digestibility (1–5): Is the content concise and easy for your audience to understand (e.g., does it include a “Terms & Definitions” section)?

  • Formatting Consistency (1–5):

    • 5 - Fully consistent formatting; clear hierarchy; code blocks well styled

    • 4 - Mostly clean, with a few inconsistencies

    • 3 - Some formatting issues, but readable

    • 2 - Cluttered or inconsistent formatting

    • 1 - Messy, broken Markdown, or unreadable layout

  • Multi-Page Output (Yes / No): Can it create full-length documentation, or just fragments?
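
Pulled together, these criteria can be captured as one record per tool/prompt run, which makes runs comparable side by side. A hedged sketch (the record shape and the unweighted roll-up are assumptions; this doc doesn't prescribe a storage format):

```typescript
// One quality record per tool/prompt run, mirroring the criteria above.
interface QualityRecord {
  explainsBusinessLogic: boolean;
  hasCodeExamples: boolean;
  referencesActualFiles: boolean;
  imageDiagramSupport: boolean;
  apiEndpointDocs: boolean | "n/a"; // Yes / No / Doesn't apply
  organizationAndStructure: 1 | 2 | 3 | 4 | 5;
  depthOfExplanation: 1 | 2 | 3 | 4 | 5;
  digestibility: 1 | 2 | 3 | 4 | 5;
  formattingConsistency: 1 | 2 | 3 | 4 | 5;
  multiPageOutput: boolean;
}

// Simple roll-up: average the 1-5 criteria (weighting is a judgment call).
function qualityAverage(q: QualityRecord): number {
  const scores = [
    q.organizationAndStructure,
    q.depthOfExplanation,
    q.digestibility,
    q.formattingConsistency,
  ];
  return scores.reduce((a, b) => a + b, 0) / scores.length;
}
```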


Process

How difficult it is to use the tool effectively.

  • Ease of Setup (1–5): How complex is the setup? Installs, config, permissions, etc.

  • Prompt Simplicity (1–5): Are you able to use short, natural prompts, or does it need detailed scaffolding?

  • Run Cost ($ / tokens / subscription): How expensive (financially or time-wise) is each run? (a cost sketch follows this list)

  • Repeatability (1–5): Can you easily reuse the approach on other repos? Is it consistent?
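
For token-metered tools, Run Cost can be estimated directly from usage. A minimal sketch (the per-token prices are hypothetical placeholders, not real rates; substitute the provider's current pricing):

```typescript
// Estimate the dollar cost of one run from token counts.
// Prices are hypothetical placeholders per 1M tokens; substitute real rates.
const PRICE_PER_MILLION = { input: 3.0, output: 15.0 };

function runCost(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * PRICE_PER_MILLION.input +
    (outputTokens / 1_000_000) * PRICE_PER_MILLION.output
  );
}

// Example: a large-repo overview that reads 200K tokens and writes 5K tokens.
console.log(runCost(200_000, 5_000)); // estimated dollars for this run
```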

Technologies and Approaches

  • IDE (Integrated Development Environment):
    A software suite that brings together essential tools (such as a code editor, debugger, and compiler) into one interface, enabling developers to write, test, and debug code more efficiently. It's a productivity environment for building software.

  • AI Agent:
    A self-contained AI entity capable of perceiving its environment, making decisions, and taking actions to perform specific tasks. It follows a sense-think-act loop and is typically focused on executing a single or bounded set of goals.

  • Agentic AI:
    A more advanced, autonomous AI system that not only performs tasks but also sets goals, plans across multiple steps, adapts dynamically, and may coordinate multiple AI agents or tools. It's like a conductor that directs simpler agents or tools to accomplish complex, evolving objectives.

  • Autonomous AI Tool (Black-box Tool):
    A self-contained, goal-oriented AI application that performs complex tasks (e.g., generating tutorials from code or summarizing entire repos) with no direct control over the internal AI agent or workflow.
    Users interact by providing minimal input—like a GitHub repo URL or a topic—and the tool autonomously handles planning and execution.

Technology:

  • DeepWiki (from Cognition, the makers of Devin)

  • CodeToTutorial

  • VSCode

    • Copilot

      • GPT-4.1

      • Claude Sonnet 4

      • Gemini 2.5 Pro

    • Amazon Q

      • Claude Sonnet 4

  • Cursor

    • GPT-4.1

    • Claude Sonnet 4

  • Windsurf

    • GPT-4.1

    • Claude Sonnet 4

    • Gemini 2.5

Approaches:

  • Prompting for a markdown file

  • Prompting for a docx

  • Prompting for docx with images

  • Prompting for markdown with Mermaid diagrams

Prompting for:

  1. Architecture and File Structure

    • Describe the overall architecture of the application (e.g., monolith, microservices, layered).

    • Explain the folder/module structure.

    • Identify key technologies used and their roles in solving core problems.

EXTRAS:

Prompt:

Generate a Markdown document that explains how this application works, both from a business and technical perspective. Follow this structure:

  1. Overview

    • Describe what the application does from a business/user perspective.

    • Summarize the primary use cases and who the end users are.

  2. Key Business Processes

    • List the main user flows or automated system processes.

    • Explain the steps involved in completing typical tasks within the app.

  3. Architecture and File Structure

    • Describe the overall architecture of the application (e.g., monolith, microservices, layered).

    • Explain the folder/module structure.

    • Identify key technologies used and their roles in solving core problems.

  4. Flowchart

    • Generate a flowchart (in Mermaid.js syntax) that visualizes a typical user or system process.

    • Include the Mermaid code directly in the markdown.

Requirements:

  • Write progressively: start with a high-level overview and move into more technical detail.

  • Ensure all explanations are accurate and based on the codebase, not assumptions.

  • Do not include unnecessary commentary.

  • Use clear and concise language.

  • Reference function and component names where relevant.

  • Include file paths when pointing out architecture or business logic locations.
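
For reference, the flowchart requested in step 4 of the prompt might come back looking like the snippet below (a hypothetical login flow, not tied to any specific repo):

```mermaid
flowchart TD
    A[User opens app] --> B{Has session?}
    B -- Yes --> C[Load dashboard]
    B -- No --> D[Show login form]
    D --> E[POST /api/login]
    E -- Success --> C
    E -- Failure --> D
```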

Why Agent over Ask mode?

Use Agent Mode if:

  • You want the agent to read and analyze the full codebase (or a large portion of it).

  • You're looking for a deep, contextual, multi-file overview.

  • You're starting a new documentation or exploration task.

Use Ask Mode if:

  • You only want a quick, isolated answer based on the currently open file.

  • You're testing or iterating on a smaller piece of the codebase.

  • You don’t need the AI to understand the overall project structure.

If I use Cursor or Copilot with the same model, will it give me different results?

Yes: you will get different results from Cursor and Copilot, even if you're using the same model (like GPT-4.1 or Claude Sonnet 4). The difference isn't the model itself; it's how each tool structures the context and prompt that it sends to the model.
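
As a purely hypothetical illustration of that difference (neither tool publishes its exact prompt assembly; the function names and templates below are invented):

```typescript
// Invented example: two tools wrapping the SAME user question for the
// same model with different context, yielding different completions.
const question = "How is the app organized?";

// Tool A might inline a file tree plus a few retrieved code snippets.
function toolAPrompt(fileTree: string, snippets: string[]): string {
  return `You are a coding assistant.\nRepo layout:\n${fileTree}\n\nRelevant code:\n${snippets.join("\n---\n")}\n\nQuestion: ${question}`;
}

// Tool B might summarize the repo first and send only the summary.
function toolBPrompt(repoSummary: string): string {
  return `Answer using this project summary only.\n\nSummary:\n${repoSummary}\n\nQuestion: ${question}`;
}
// Same model + same question, but different context => different answers.
```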