How an LLM Learns to Draw Like an Architect

A look at the architecture behind BIM Monkey — the AI agent that authors construction documents inside Revit

By Eric Chrabot and Barrett Eastwood · Published April 30, 2026

When we started BIM Monkey, we were trying to answer a question that should have a simple answer but doesn't: why is it 2026 and architects are still placing every callout, every dimension, every elevation marker by hand?

Construction documents — the technical drawing sets that describe a building well enough to actually build it — are the largest cost center in most architecture firms. A mid-sized commercial project requires hundreds of plan, section, elevation, and detail views, all of them placed manually, dimensioned manually, annotated manually, and graphically tuned to match firm standards manually. The tools haven't changed in twenty years. Revit gave architects parametric modeling. It did not give them parametric documentation.

The natural response, once large language models started showing real reasoning ability, was: surely an LLM can do this. And it can — almost. There are three problems that have to be solved together, and most of the public LLM-CAD experiments solve maybe one of them. This post describes how we approached all three.

We're publishing it because we think the architecture is interesting, because we want to attract architects and engineers to use it, and because we want to be public about the technical direction we're committed to.

Problem 1: An LLM needs more than a chat box. It needs a tool surface.

Most "AI in CAD" demos work like this: a chat panel sits next to your Revit window, you type a request, the AI tells you what to do, you do it. That's a chat interface, not an agent. It doesn't scale to a 200-sheet construction document set, and architects don't want a chatbot — they want the work done.

For an LLM to actually author documents, it needs the same kind of access a human user has: the ability to create sheets, place views, draw callouts, generate schedules, modify families, and do all of this transactionally — meaning if a multi-step operation fails partway through, the document doesn't end up half-baked.

The way we solved this: we wrote a Revit plugin that loads inside the Revit process itself and exposes the Revit API to an LLM agent as a structured tool surface. Each authoring operation is a named, callable tool. Every operation that mutates the document runs inside a Revit Transaction, which means the same atomicity guarantees that protect you when you click "Place View" also protect the AI when it calls the equivalent tool.

We expose hundreds of these tools, organized into the same categories an architect thinks in: sheet creation, view placement, annotation, dimensioning, scheduling, parameters, site. The LLM doesn't drive Revit's user interface; it drives Revit's underlying object model directly. The difference matters: UI-driving automation breaks when the UI changes, can't run in the background, and can't guarantee transactional integrity. A process-resident tool server has none of those three weaknesses.
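To make the pattern concrete, here is a minimal sketch in Python of a categorized tool registry whose mutating tools run inside a transaction. It is not the plugin's code and it does not use the Revit API: FakeDocument, transaction, tool, create_sheet, and place_views are all hypothetical names invented for illustration, and the "document" is just an in-memory dictionary.

```python
import copy
from contextlib import contextmanager
from functools import wraps


class FakeDocument:
    """Hypothetical stand-in for a Revit document: sheet number -> placed views."""

    def __init__(self):
        self.sheets = {}


@contextmanager
def transaction(doc):
    """Keep the changes on success; restore the prior state on any error."""
    snapshot = copy.deepcopy(doc.sheets)
    try:
        yield
    except Exception:
        doc.sheets = snapshot  # roll back the partial work
        raise


TOOLS = {}  # "category.name" -> callable the agent can invoke


def tool(category):
    """Register a function as a named tool and wrap it in a transaction."""

    def register(fn):
        @wraps(fn)
        def call(doc, **kwargs):
            with transaction(doc):
                return fn(doc, **kwargs)

        TOOLS[f"{category}.{fn.__name__}"] = call
        return call

    return register


@tool("sheets")
def create_sheet(doc, number):
    if number in doc.sheets:
        raise ValueError(f"sheet {number} already exists")
    doc.sheets[number] = []


@tool("views")
def place_views(doc, sheet, views):
    # A multi-step operation: it can fail after some views are already placed.
    for view in views:
        if view in doc.sheets[sheet]:
            raise ValueError(f"{view} is already on sheet {sheet}")
        doc.sheets[sheet].append(view)


doc = FakeDocument()
TOOLS["sheets.create_sheet"](doc, number="A-101")
try:
    # The second placement fails, so the first one is rolled back as well.
    TOOLS["views.place_views"](doc, sheet="A-101", views=["Level 1", "Level 1"])
except ValueError as err:
    print("tool failed:", err)
print(doc.sheets)  # {'A-101': []}
```

The point of the sketch is the wrapper: the agent only ever sees named tools, and a tool that fails halfway leaves the document exactly as it found it. In the real system that guarantee comes from Revit's own Transaction mechanism rather than from a snapshot copy.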

This is the part of the architecture closest to existing public work — exposing applications to LLMs via tool protocols is now standard practice. What's distinctive is the depth of the surface. Most tool integrations expose a dozen or so high-level functions. Production-grade construction document authoring requires an order of magnitude more, and it requires every one of them to be transactionally safe.

Problem 2: Generic AI doesn't know your firm's standards. And you don't want it to.

Here's a thing every practicing architect knows that most AI engineers don't: there is no single "correct" way to draw a construction document. Every firm has its own conventions. Line weights. Callout styles. Where the elevation marker goes in relation to the room it cuts. Whether dimensions are above or below the string. Which note appears in which schedule.

These conventions are not arbitrary. They're how a firm communicates its design intent to the contractors who build the work. They're also how a firm signals quality and consistency to clients who hire them again. If our system produced documents that looked like generic AI architecture, it would be useless.

So our system has to learn each firm's conventions. The conventional answer is "fine-tune a model per customer," but that has serious problems:

It requires expensive compute per customer
It locks you into a specific model architecture
It creates an opaque artifact that's hard to audit, hard to update, and hard to undo
It doesn't let the firm see what the model has learned

We took a different approach: we keep the model weights identical for every customer, and we adapt behavior through structured natural-language rules that we inject into the model's working context at the start of every session. When an architect at a firm overrides one of the AI's actions — moves a callout, changes a line weight, edits an annotation — that override is captured as a structured correction, indexed by the operation that triggered it.

The correction is reviewed by a human curator who decides whether it's a firm-specific convention or a universal drafting rule that should apply across all customers. The result is a two-tier hierarchy of behavioral rules: universal rules that apply to every project everywhere, and firm-specific rules that apply only to that firm's work. At the start of each authoring session, the rules are assembled into a single readable block of natural language and prepended to the model's instructions. The model then drafts the way the firm drafts — not because it's been retrained, but because it's been briefed.
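The flow from correction to briefing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production schema or prompt format (which are not published here): Correction, Rule, curate, assemble_briefing, and build_instructions are hypothetical names, and the example rules and firm identifiers are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Correction:
    """An architect's override, indexed by the operation that triggered it."""
    firm_id: str
    operation: str
    description: str


@dataclass(frozen=True)
class Rule:
    """A curated rule in natural language; firm_id=None means universal."""
    operation: str
    text: str
    firm_id: Optional[str] = None


def curate(correction: Correction, rule_text: str, universal: bool) -> Rule:
    """The human curator's decision: universal rule or firm-specific rule."""
    return Rule(
        operation=correction.operation,
        text=rule_text,
        firm_id=None if universal else correction.firm_id,
    )


def assemble_briefing(rules: List[Rule], firm_id: str) -> str:
    """Build one readable block: universal rules first, then this firm's."""
    universal = [r for r in rules if r.firm_id is None]
    firm = [r for r in rules if r.firm_id == firm_id]
    lines = ["Universal drafting rules:"]
    lines += [f"- [{r.operation}] {r.text}" for r in universal]
    lines.append(f"Rules specific to {firm_id}:")
    lines += [f"- [{r.operation}] {r.text}" for r in firm]
    return "\n".join(lines)


def build_instructions(base: str, rules: List[Rule], firm_id: str) -> str:
    """Prepend the briefing to the model's instructions for this session."""
    return assemble_briefing(rules, firm_id) + "\n\n" + base


# Invented example data: three overrides from two hypothetical firms.
c1 = Correction("firm-a", "place_dimension", "moved dimension text above the line")
c2 = Correction("firm-b", "place_callout", "moved a callout head off a dimension string")
c3 = Correction("firm-b", "place_dimension", "moved dimension text below the line")

rules = [
    curate(c1, "Place dimension text above the dimension line.", universal=False),
    curate(c2, "Do not overlap callout heads with dimension strings.", universal=True),
    curate(c3, "Place dimension text below the dimension line.", universal=False),
]

prompt = build_instructions(
    "Author the sheet set for the current project.", rules, "firm-a"
)
print(prompt)
```

Note that the model is never touched: the only thing that differs between firm-a and firm-b is the text returned by assemble_briefing, which anyone at the firm can read.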

A few things follow from this design choice. The model that drafts your documents tomorrow is the same model that drafted them yesterday. It just has different instructions. The firm can read every behavioral rule the system is operating under — auditability is built in. A correction made by a senior architect on Tuesday improves the system for that firm by Wednesday morning. No retraining cycle, no compute cost, no risk of weight drift.

The corrections compound. Every override the firm catches becomes durable institutional knowledge that the system applies to every future project. This is, we believe, where the moat comes from. The universal rule layer gets stronger every time we promote a correction from a single firm. The firm-specific layer gets stronger every time a senior architect catches something. Both layers survive staff turnover, model upgrades, and product iteration. They are, in the most literal sense, the firm's drafting conventions written down, except that the system reads them every time it works.

Problem 3: Even with rules, the AI needs to check its own work.

Here's the third problem, and it's the one we initially underestimated: even with a complete set of behavioral rules, the AI sometimes produces output that looks wrong in ways the rules don't catch. A callout in the right location with the right text and the right line weight can still feel off because the surrounding graphic context is wrong. Architects know it when they see it. The rules don't always articulate it.

The solution we landed on: the system audits its own output visually, against the firm's own past work. When the AI generates a view, it can autonomously capture the rendered viewport, retrieve a reference image from a library of the firm's previously approved drawings — drawings the firm itself has stamped and issued — and submit both images to a vision model with a structured comparison prompt.

The vision model assesses the new drawing's compliance with the firm's drawing standards as exhibited in their own historical output, and flags discrepancies for revision or human review.
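The audit loop described above has three steps: capture, retrieve, compare. A minimal Python sketch follows, with every external piece passed in as a function so it runs without Revit or a real vision model. All names here (audit_view, AuditResult, COMPARISON_PROMPT, and the three stub functions) are hypothetical, the prompt text is a placeholder rather than the actual prompt, and the stub "vision model" only checks whether two byte strings are equal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Placeholder only; the real comparison prompt is not published.
COMPARISON_PROMPT = (
    "Compare the candidate drawing with the firm's approved reference "
    "and list every departure from the firm's drawing standards."
)


@dataclass
class AuditResult:
    view_name: str
    discrepancies: List[str] = field(default_factory=list)

    @property
    def needs_review(self) -> bool:
        """True when anything was flagged for revision or human review."""
        return bool(self.discrepancies)


def audit_view(
    view_name: str,
    view_type: str,
    capture: Callable[[str], bytes],
    retrieve_reference: Callable[[str], bytes],
    compare: Callable[[str, bytes, bytes], List[str]],
) -> AuditResult:
    """Capture the rendered view, fetch a reference, and compare the two."""
    candidate = capture(view_name)
    reference = retrieve_reference(view_type)
    findings = compare(COMPARISON_PROMPT, candidate, reference)
    return AuditResult(view_name, list(findings))


# Stubs standing in for the viewport capture, the firm's library of
# approved drawings, and the vision model.
RENDERED: Dict[str, bytes] = {"A-201 North": b"img-1", "A-202 South": b"img-2"}
APPROVED: Dict[str, bytes] = {"elevation": b"img-1"}


def stub_capture(view_name: str) -> bytes:
    return RENDERED[view_name]


def stub_reference(view_type: str) -> bytes:
    return APPROVED[view_type]


def stub_compare(prompt: str, candidate: bytes, reference: bytes) -> List[str]:
    # A real vision model would return specific findings; this stub only
    # reports whether the two images differ at all.
    return [] if candidate == reference else ["differs from approved reference"]


for name in RENDERED:
    result = audit_view(name, "elevation", stub_capture, stub_reference, stub_compare)
    print(name, "needs review" if result.needs_review else "ok")
```

Passing the three steps in as functions keeps the control flow visible: the agent, not the architect, decides when audit_view runs, and the reference always comes from the firm's own approved library.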

This is different from generic image-comparison or design-checking tools in two important ways. First, the reference corpus is built from the firm's own approved work — not a generic standard, not a competitor's library, not a public dataset. The AI is checking whether the new drawing looks like your drawings, not whether it looks like some idealized drawing. Second, the AI initiates the check itself. The architect doesn't have to remember to ask. The system can audit its own output proactively, the way a senior reviewer would.

The combination of explicit behavioral rules and implicit visual self-audit against firm-private reference work is what gets the output close enough to firm-quality that an architect can spend their time on judgment rather than production.

What we're not publishing

We're being intentionally non-specific about a few things in this post:

The exact endpoint registry of the Revit plugin
The database schemas and the structure of stored corrections
The prompts and curation criteria used to promote rules from firm-specific to universal
The vision-model prompt structure for the visual QC step
The customer-facing review workflows

These are implementation details that don't change the architectural picture but do change how hard the system is to replicate. We're publishing the architecture because we believe documenting it openly is the right thing to do — both for our potential customers, who deserve to understand what they're buying, and for the field, which is going to see a lot of AI-CAD work over the next few years and benefits from a clear vocabulary. We're holding the implementation specifics because they're how we earn the right to keep building this.

Why we're publishing this now

A few reasons, honestly.

For architects: if you've ever spent a Friday night placing the same callout 200 times, we built this for you. We're in early-access beta testing with a small number of firms, and we'll be expanding access over the next year. If you want your firm to be one of them, we'd love to hear from you.

For other engineers in the AEC AI space: the architecture above is, in our view, the right shape for this problem. We expect a lot of other systems to converge on something similar, and we'd rather have an honest public conversation about the design tradeoffs than watch the same mistakes get made privately at five different startups.

For ourselves: publishing the architecture forces us to be clear about it. Every section above represents a decision we made deliberately and that we're prepared to defend. If we change the architecture later, we'll explain why.

Acknowledgments

The system described above wouldn't exist without Barrett's twenty years of practice at the bench, the architects at Wood Studio who've been generous with their time during the alpha, and the genuinely remarkable progress in foundation models that's made the LLM layer of the system feasible at all. We are grateful to all three.

Eric Chrabot is Co-Founder of BIM Monkey Incorporated, a Delaware C corporation.

Barrett Eastwood is Co-Founder and an Architect with Wood Studio in Seattle.

BIM Monkey is live — sign up for a free trial at bimmonkey.ai.
This post is a public technical disclosure of the BIM Monkey system architecture as of April 30, 2026. A formal technical disclosure has also been published to the Technical Disclosure Commons: https://www.tdcommons.org/dpubs_series/10010/. Permanent archive: https://web.archive.org/web/20260501033926/https://bimmonkey.ai/blog/how-an-llm-learns-to-draw-like-an-architect/