In 2026, every web agency we talk to is asking the same question: can I just use ChatGPT to convert HTML to WordPress and skip the paid tools? We ran a controlled test with GPT-5, Claude Opus 4.7, and Gemini 2.5 Pro on the same 19-page static site. Same prompt, same files, same rubric. The results are in this report — and they are not what the marketing decks suggest.
This article covers what each model produced, where each one broke, and the real cost in agency hours of getting LLM output to production parity.
Who this is for: Agencies, freelancers, and developers deciding whether vanilla LLMs can replace specialized conversion tools.
Key Takeaways
- No vanilla LLM produced a deployable WordPress theme on the first run from a 19-page real-world site.
- Claude Opus 4.7 scored highest among the three (42/80) but still missed on WordPress-specific conventions like nonces, escaping, and the template hierarchy.
- GPT-5 hallucinated WordPress hook names that do not exist (e.g., wp_enqueue_hooks instead of wp_enqueue_scripts).
- Gemini 2.5 Pro produced the cleanest HTML semantics but dropped schema markup and the multilingual setup entirely.
- "Free" LLM conversion costs 8–20 hours of internal time at $75/hr — $600–$1,500 — to reach production parity.
- A specialized AI conversion pipeline covers the same scope for a fraction of that — see the homepage for current pricing.
- For a 1–3 page personal site with no forms or SEO stakes, a vanilla LLM is fine.
1. The Test Setup
1-1. The Site (19-Page Static HTML, Real-World Structure)
We picked 19 pages because that is the median page count in WP Pro Converter signup data (internal data), so it approximates the site size most agencies are dealing with. The test site mirrors what we see weekly:
- A header with primary nav and a language toggle.
- A hero section with a custom three-image slider in vanilla JS.
- Three service pages, each using ACF-style repeating cards with image, title, and rich text.
- A blog listing page with pagination.
- A blog detail page with featured image, author, date, and category.
- A contact form posting to a PHP handler with email validation.
- A footer with social links and a newsletter signup.
- A Calendly inline embed on the contact page.
- Web fonts loaded via @font-face from a self-hosted font folder.
- Google Analytics 4 + Google Tag Manager tags.
- Schema.org JSON-LD on every page (Organization, WebSite, BreadcrumbList, and Article for blog posts).
- Mobile breakpoints at 768px and 1024px with a sticky mobile menu.
This is the kind of site an agency is paid $4,000–$12,000 to migrate to WordPress today.
1-2. The Prompt (Identical Across All Three LLMs)
We gave every model the same prompt and the same zipped HTML source. No follow-up messages, no nudging, no system prompts. We wanted to see what each model produces from a clean, generous brief.
Generate:
- A complete functions.php with proper WordPress hooks
- ACF field groups for editable sections
- A header.php and footer.php with correct asset enqueueing
- Page templates that match the original HTML structure
- SEO meta tags and schema.org JSON-LD preserved
Show me the file tree and full code for each file.
1-3. Scoring Rubric
We scored each output 0–10 across 8 dimensions, for a total of 80 possible points:
- Theme structure: correct file hierarchy, style.css header, index.php fallback.
- ACF fields generated: usable field groups with sane keys and locations.
- Asset paths correct: get_stylesheet_directory_uri() and friends used properly.
- Responsive breakpoints preserved: mobile menu, breakpoints, no layout regressions.
- Form handler wired: nonces, sanitization, escaping, error states.
- Multilingual handling: at minimum a working WPML or Polylang scaffold and translatable strings.
- SEO meta + schema: title, description, OG tags, JSON-LD preserved.
- Time-to-installable-theme: minutes from "paste output" to a theme that activates without fatal errors.
A score of 9–10 means production-ready or near it. A score of 0–3 means you are starting over.
2. GPT-5 Results
GPT-5 produced a complete file tree with 14 files and roughly 1,200 lines of PHP. The first impression was good: a real style.css header, a functions.php with add_theme_support() calls, and page templates for each top-level section.
The breaks showed up within minutes.
Hallucinated hook name. GPT-5 invented a hook that does not exist in WordPress core:
function wpc_enqueue_assets() {
    wp_enqueue_style('wpc-main', get_stylesheet_uri());
    wp_enqueue_script('wpc-slider', get_template_directory_uri() . '/js/slider.js', array('jquery'), null, true);
}
add_action('wp_enqueue_hooks', 'wpc_enqueue_assets');
The correct hook is wp_enqueue_scripts. This is the kind of bug that loads silently — no PHP error, no theme crash, just a page with no styles and no JS. You discover it in production.
Wrong template hierarchy. GPT-5 created single-blog.php for blog detail pages, but that filename only matches a custom post type named blog, and the test site used WordPress's default post post type. The correct template name is single.php (single-post.php would also match). WordPress fell back to index.php, which had its own bugs.
ACF fields under-specified. Field groups were generated, but the location rules pointed to a custom post type (services) that GPT-5 invented without registering in functions.php. The fields never appeared in the editor.
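For reference, a minimal sketch of the registration that was missing. The function names, labels, and argument values here are illustrative assumptions, not GPT-5's output; only register_post_type() and the init hook are WordPress core.

```php
<?php
/**
 * Arguments for a hypothetical "services" post type.
 * Labels and supported features are assumptions for illustration.
 */
function wpc_services_cpt_args(): array {
    return array(
        'labels'       => array('name' => 'Services', 'singular_name' => 'Service'),
        'public'       => true,
        'has_archive'  => true,
        'show_in_rest' => true, // enables the block editor for this post type
        'supports'     => array('title', 'editor', 'thumbnail'),
    );
}

/** Register the post type that the ACF location rules point at. */
function wpc_register_services_cpt() {
    register_post_type('services', wpc_services_cpt_args());
}

// Guarded so the file can also be loaded or linted outside WordPress.
if (function_exists('add_action')) {
    add_action('init', 'wpc_register_services_cpt');
}
```

Once a post type with the slug services exists, an ACF location rule that targets post_type == services has an edit screen to attach to.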
Schema dropped on the floor. The original JSON-LD blocks were not ported. The output had no wp_head injection of schema, no plugin recommendation, nothing.
| Dimension | Score (0–10) |
|---|---|
| Theme structure | 6 |
| ACF fields | 4 |
| Asset paths | 5 |
| Responsive breakpoints | 7 |
| Form handler | 3 |
| Multilingual | 2 |
| SEO + schema | 2 |
| Time-to-installable | 4 |
| Total | 33 / 80 |
3. Claude Opus 4.7 Results
Claude Opus 4.7 was the strongest of the three on raw code quality. It produced 17 files, used real WordPress core functions, and was the only model to write a functions.php that did not throw a fatal error on activation.
But "best at code" is not the same as "right for WordPress."
Template hierarchy: half right. Claude correctly named single.php, archive.php, and page.php, but it created a template-services.php file without the template header comment (a Template Name: line in the file's opening PHP comment) that WordPress uses to list custom page templates. The Page Attributes panel never showed it as a template option.
Form handler: missing nonces. The contact form posted to a custom admin-post.php action, but the nonce verification was absent:
function wpc_handle_contact() {
    $name = $_POST['name'];
    $email = $_POST['email'];
    $msg = $_POST['message'];
    wp_mail(get_option('admin_email'), 'Contact', $msg);
    wp_redirect(home_url('/thank-you'));
    exit;
}
add_action('admin_post_nopriv_wpc_contact', 'wpc_handle_contact');
No wp_verify_nonce(). No sanitize_text_field(). No is_email() validation. This is CSRF and email injection waiting to happen. A specialized pipeline would never ship this.
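A hardened version of the handler above might look like the following sketch. The nonce action and field name (wpc_contact, wpc_contact_nonce) and the contact=invalid query flag are assumptions for illustration. The form would need to emit the matching hidden field with wp_nonce_field('wpc_contact', 'wpc_contact_nonce').

```php
<?php
/** Contact handler with nonce check, sanitization, and email validation. */
function wpc_handle_contact() {
    // Reject requests that did not come from our own form (CSRF protection).
    $nonce = isset($_POST['wpc_contact_nonce'])
        ? sanitize_text_field(wp_unslash($_POST['wpc_contact_nonce']))
        : '';
    if (!wp_verify_nonce($nonce, 'wpc_contact')) {
        wp_die('Invalid request.', 'Error', array('response' => 403));
    }

    // sanitize_text_field() strips line breaks, which blocks header injection.
    $name  = sanitize_text_field(wp_unslash($_POST['name'] ?? ''));
    $email = sanitize_email(wp_unslash($_POST['email'] ?? ''));
    $msg   = sanitize_textarea_field(wp_unslash($_POST['message'] ?? ''));

    // Send the visitor back to the form on missing or invalid input.
    if ($name === '' || $msg === '' || !is_email($email)) {
        $back = wp_get_referer() ?: home_url('/');
        wp_safe_redirect(add_query_arg('contact', 'invalid', $back));
        exit;
    }

    $body = "From: {$name} <{$email}>\n\n{$msg}";
    wp_mail(get_option('admin_email'), 'Contact', $body);
    wp_safe_redirect(home_url('/thank-you'));
    exit;
}

// Guarded so the file can also be loaded or linted outside WordPress.
if (function_exists('add_action')) {
    // admin_post_nopriv_* fires for logged-out visitors only, so register both.
    add_action('admin_post_nopriv_wpc_contact', 'wpc_handle_contact');
    add_action('admin_post_wpc_contact', 'wpc_handle_contact');
}
```

Registering the admin_post_wpc_contact action as well means logged-in users, such as the site owner testing the form, reach the same handler.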
ACF: better than GPT-5, still wrong. Claude generated proper field groups in PHP using acf_add_local_field_group(), but it pointed location rules at custom post types it then forgot to register. Same bug as GPT-5, slightly more polished failure.
Multilingual: scaffolded but unwired. Claude added load_theme_textdomain() and wrapped strings in __() and _e(). But there were no .po or .mo files generated, and no WPML or Polylang configuration. The site was English-only on activation.
Schema: partial. Claude ported the Organization and WebSite schemas into header.php, but dropped BreadcrumbList and Article for blog posts.
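As a rough sketch of how the dropped Article schema could be restored, the snippet below builds the JSON-LD from post data and prints it through wp_head. The function names and the choice of properties are assumptions; the original site's JSON-LD should remain the reference for which fields to emit.

```php
<?php
/** Build a schema.org Article array from plain values (no WordPress calls). */
function wpc_article_schema(string $headline, string $url, string $published, string $author): array {
    return array(
        '@context'         => 'https://schema.org',
        '@type'            => 'Article',
        'headline'         => $headline,
        'mainEntityOfPage' => $url,
        'datePublished'    => $published,
        'author'           => array('@type' => 'Person', 'name' => $author),
    );
}

/** Print the JSON-LD block in <head>, on single blog posts only. */
function wpc_print_article_schema() {
    if (!is_singular('post')) {
        return;
    }
    $author_id = (int) get_post_field('post_author', get_the_ID());
    $schema = wpc_article_schema(
        get_the_title(),
        get_permalink(),
        get_the_date('c'), // ISO 8601 date
        get_the_author_meta('display_name', $author_id)
    );
    echo '<script type="application/ld+json">' . wp_json_encode($schema) . '</script>' . "\n";
}

// Guarded so the file can also be loaded or linted outside WordPress.
if (function_exists('add_action')) {
    add_action('wp_head', 'wpc_print_article_schema');
}
```

Keeping the array builder free of WordPress calls makes it easy to compare its output against the source site's JSON-LD before wiring it into the theme.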
| Dimension | Score (0–10) |
|---|---|
| Theme structure | 7 |
| ACF fields | 5 |
| Asset paths | 7 |
| Responsive breakpoints | 8 |
| Form handler | 3 |
| Multilingual | 3 |
| SEO + schema | 4 |
| Time-to-installable | 5 |
| Total | 42 / 80 |
4. Gemini 2.5 Pro Results
Gemini 2.5 Pro was the most cautious of the three. It produced 11 files and frequently inserted // TODO comments where it was uncertain. That honesty is useful — but uncertainty is not a deliverable.
HTML semantics: cleanest of the three. Gemini's header.php and footer.php had the tightest markup. No leftover utility classes from the source, no orphan divs. If you only cared about markup, Gemini won.
Asset paths: mostly right. Gemini used get_template_directory_uri() almost everywhere, though it switched to get_stylesheet_directory_uri() in two places. The two only differ when a child theme is active, so the mix-up did not matter for this test.
ACF: skipped. The output included a comment block:
// TODO: Generate ACF field groups for service pages.
// Recommended: install Advanced Custom Fields Pro and create:
// - service_cards (repeater)
// - hero_image (image)
// - cta_text (text)
That is a recommendation, not a deliverable. You still have to build the field group by hand.
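To show the gap between the TODO and a deliverable, here is a minimal sketch of the three fields registered in PHP with acf_add_local_field_group(). The keys, labels, sub-fields, and the services location rule are assumptions; the repeater type requires ACF Pro, and the sketch assumes a services post type is registered elsewhere in the theme.

```php
<?php
/** Field group config for the three fields named in the TODO comment. */
function wpc_service_field_group(): array {
    return array(
        'key'    => 'group_wpc_service',
        'title'  => 'Service Page',
        'fields' => array(
            array('key' => 'field_wpc_hero_image', 'label' => 'Hero Image', 'name' => 'hero_image', 'type' => 'image'),
            array('key' => 'field_wpc_cta_text', 'label' => 'CTA Text', 'name' => 'cta_text', 'type' => 'text'),
            array(
                'key'        => 'field_wpc_service_cards',
                'label'      => 'Service Cards',
                'name'       => 'service_cards',
                'type'       => 'repeater', // ACF Pro field type
                'sub_fields' => array(
                    array('key' => 'field_wpc_card_image', 'label' => 'Image', 'name' => 'image', 'type' => 'image'),
                    array('key' => 'field_wpc_card_title', 'label' => 'Title', 'name' => 'title', 'type' => 'text'),
                    array('key' => 'field_wpc_card_body', 'label' => 'Body', 'name' => 'body', 'type' => 'wysiwyg'),
                ),
            ),
        ),
        // Assumption: a "services" post type exists; adjust to the real target.
        'location' => array(
            array(
                array('param' => 'post_type', 'operator' => '==', 'value' => 'services'),
            ),
        ),
    );
}

// Guarded so the file can also be loaded or linted outside WordPress.
if (function_exists('add_action')) {
    add_action('acf/init', function () {
        if (function_exists('acf_add_local_field_group')) {
            acf_add_local_field_group(wpc_service_field_group());
        }
    });
}
```

The sub-fields mirror the image, title, and rich text cards described in the test site.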
Schema: dropped entirely. No JSON-LD anywhere in the output. The original site had structured data on every page; the WordPress version had none.
Multilingual: ignored. No load_theme_textdomain(), no __() wrappers, no language toggle component. Gemini treated multilingual as out of scope despite the prompt explicitly listing it.
Form handler: TODO. Same pattern. A scaffolded <form> with a comment saying "wire to your preferred form plugin."
| Dimension | Score (0–10) |
|---|---|
| Theme structure | 5 |
| ACF fields | 2 |
| Asset paths | 7 |
| Responsive breakpoints | 6 |
| Form handler | 1 |
| Multilingual | 1 |
| SEO + schema | 1 |
| Time-to-installable | 4 |
| Total | 27 / 80 |
5. Side-by-Side Score Table
| Dimension | GPT-5 | Claude Opus 4.7 | Gemini 2.5 Pro | WP Pro Converter |
|---|---|---|---|---|
| Theme structure | 6 | 7 | 5 | 10 |
| ACF fields | 4 | 5 | 2 | 10 |
| Asset paths | 5 | 7 | 7 | 10 |
| Responsive breakpoints | 7 | 8 | 6 | 9 |
| Form handler | 3 | 3 | 1 | 9 |
| Multilingual | 2 | 3 | 1 | 9 |
| SEO + schema | 2 | 4 | 1 | 10 |
| Time-to-installable | 4 | 5 | 4 | 8 |
| Total | 33 / 80 | 42 / 80 | 27 / 80 | 75 / 80 |
The specialized pipeline does not score a perfect 80. Two honest weaknesses: turnaround is hours, not seconds (LLMs return output instantly), and the service is paid, not free. We stand by both trade-offs.
6. What Vanilla LLMs Get Right
Being fair to the LLMs: there is real value in what they produce. If you treat the output as a starting point, not a deliverable, here is what works.
- Basic HTML semantics. All three models clean up source markup well. Stray inline styles, redundant wrapper divs, and non-semantic tags get rationalized.
- Header and footer extraction. Pulling shared layout into header.php and footer.php is a routine operation, and every model handled it.
- Style.css headers. Every model produced a valid theme header block with Theme Name, Version, and Description. This is the lowest bar, but it is met.
- Simple ACF field naming. When fields were generated, the keys followed sane conventions (hero_title, service_cards, cta_text). Naming is the easy part of ACF; locations and registration are where it falls apart.
If you have a small site, the patience to debug, and you treat LLM output as a 30%-finished draft, you can get there. The question is whether your hourly rate makes that math work.
7. What Vanilla LLMs Consistently Break
We have written a separate piece on the most common HTML-to-WordPress conversion mistakes; the LLM-specific failures are a subset of those mistakes, plus a few new ones.
- Hallucinated WordPress functions and hook names. wp_enqueue_hooks, register_post_meta_field, acf_register_group_local: none of these exist in core or in ACF. They look plausible. An invented hook name fails silently because the callback never runs; an invented function triggers a fatal "Call to undefined function" error the first time it is called.
- Asset path mismatches. style.css referenced as a relative path instead of through get_stylesheet_directory_uri(), or template parts loaded with include instead of get_template_part().
- Custom JS dependencies broken silently. When a slider depends on jQuery or a third-party module, LLMs frequently drop the dependency declaration in wp_enqueue_script(), so the script may load before, or without, the library it needs.
- Form handlers either missing or insecure. No nonces, no sanitization, no escaping. This is a compliance and security issue, not a polish issue.
- Schema markup dropped. Original JSON-LD blocks rarely make it into the WordPress output. The site loses its structured data and its rich result eligibility.
- Multilingual handling abandoned. Despite explicit prompt instructions, none of the three models scaffolded a real WPML or Polylang setup.
- No verification layer. You discover failures in production, not before. There is no second pair of eyes on the output.
- No way to preview output before committing time. You spend hours iterating prompts before knowing whether the result is even close. The cost compounds with every revision.
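One cheap verification step for the first failure in this list is a static scan of the generated PHP for hook names that are not on a known list. The sketch below is a hypothetical helper, not part of any tool mentioned here. Its allowlist is a small illustrative subset of real core hooks plus the two contact-form actions from the test, and it only sees hook names written as string literals.

```php
<?php
// Illustrative allowlist: a few real core hooks plus this theme's form actions.
// A real check would use a fuller list and include any custom hooks.
const WPC_KNOWN_HOOKS = array(
    'after_setup_theme', 'init', 'wp_enqueue_scripts', 'wp_head',
    'wp_footer', 'widgets_init',
    'admin_post_wpc_contact', 'admin_post_nopriv_wpc_contact',
);

/**
 * Return hook names passed as string literals to add_action() or
 * add_filter() in $source that do not appear in $known.
 */
function wpc_find_unknown_hooks(string $source, array $known = WPC_KNOWN_HOOKS): array {
    preg_match_all(
        '/\badd_(?:action|filter)\s*\(\s*([\'"])([^\'"]+)\1/',
        $source,
        $matches
    );
    // $matches[2] holds the captured hook names.
    return array_values(array_diff(array_unique($matches[2]), $known));
}
```

Running wpc_find_unknown_hooks() over the contents of GPT-5's functions.php would flag wp_enqueue_hooks before the theme is ever activated. A flagged name is a prompt to check the documentation, not proof of a bug, since plugins and themes define their own hooks.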
8. The Real Cost of "Free" LLM Conversion
The LLM API call is free or nearly free. The engineering hours after the call are not. This is the math agencies skip when they price a "DIY with ChatGPT" approach.
For a deeper breakdown of conversion economics, see our HTML to WordPress conversion cost guide.
Assume a $75/hour internal rate (conservative for senior agency dev time in North America or Europe). Here is what we observed bringing each LLM output to production parity on the test site:
| Path | Hours to production parity | Internal cost ($75/hr) |
|---|---|---|
| GPT-5 | 14 | $1,050 |
| Claude Opus 4.7 | 9 | $675 |
| Gemini 2.5 Pro | 18 | $1,350 |
| Specialized AI conversion pipeline | 0.5 (review only) | $37.50 + service fee |
The 0.5 hours for the specialized pipeline is the time to review the delivered theme and confirm it matches the design. Current service pricing is on the homepage.
Even against the most efficient LLM (Claude at $675 in dev time), the specialized pipeline lands well under the dev-hour cost. The break-even is roughly 1–2 hours of dev time. If you spend more than 90 minutes prompting, you have already lost the cost argument.
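The table's arithmetic is simple enough to rerun with your own rate. This sketch uses the observed hours and the $75/hour assumption from this section; the function and constant names are ours, for illustration.

```php
<?php
const WPC_HOURLY_RATE = 75.0; // USD per hour, the internal rate assumed above

/** Internal labor cost in USD for a given number of cleanup hours. */
function wpc_labor_cost(float $hours, float $rate = WPC_HOURLY_RATE): float {
    return $hours * $rate;
}

// Hours to production parity observed on the 19-page test site.
$observed_hours = array(
    'GPT-5'                              => 14,
    'Claude Opus 4.7'                    => 9,
    'Gemini 2.5 Pro'                     => 18,
    'Specialized pipeline (review only)' => 0.5,
);

foreach ($observed_hours as $path => $hours) {
    printf("%-36s %5.1f h  \$%8.2f\n", $path, $hours, wpc_labor_cost($hours));
}
```

Passing a second argument, for example wpc_labor_cost(9, 120.0), shows how quickly the gap widens at a higher internal rate. The service fee is not included, since it is not stated here.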
9. When a Vanilla LLM Is Actually Fine
We are not anti-LLM. We use them daily. There are real cases where a vanilla LLM is the right tool.
- A 1–3 page personal site with no forms, no ACF fields, no SEO migration concerns.
- A static prototype you intend to throw away in a month.
- A developer who already knows WordPress well and wants the LLM to scaffold boilerplate they will then heavily edit.
- A site with no schema, no multilingual, and no client expectations — your blog, your hobby project, your local meetup site.
If you are reading this and the project fits, our static HTML vs WordPress decision guide has more detail on whether a WordPress conversion is even the right move.
10. When You Need a Specialized Pipeline
The math flips the moment any of the following are true:
- The site is more than 5 pages.
- There are forms that need security review.
- ACF fields need to be editable by a non-technical client.
- SEO migration is in scope (schema, meta, redirects).
- The design must match pixel-perfect to the original.
- The conversion is for a paying client where a broken theme costs more than the price of a specialized service.
- You want to see the converted output before committing to a build path.
For agencies, this is most projects. The economics of LLM conversion only work for the smallest, lowest-stakes engagements.
11. About WP Pro Converter
WP Pro Converter is an AI-powered service that converts static HTML websites into fully functional WordPress themes, preserving the original design pixel-perfectly. Built by Utsubo, an award-winning creative studio headquartered in Osaka, Japan.
The service uses a specialized conversion pipeline — not a single prompt. For current pricing and plan details, see the homepage.
Ready to Convert Your HTML Site?
Tired of debugging hallucinated WordPress hooks? WP Pro Converter uses a specialized AI pipeline to transform your static site into a fully functional WordPress theme — preserving your design pixel-perfectly. See current plans and pricing on the homepage.
Questions? Contact us at: contact@utsubo.co
13. Pre-Conversion Checklist
Before you feed your site to any tool — LLM or specialized — work through this list.
- Inventory every page. Real count, not "around 20."
- List every form on the site and where it submits.
- Note every third-party embed (Calendly, YouTube, Maps, Stripe, etc.).
- Pull every schema.org JSON-LD block into a single reference file.
- Identify which sections need to be editable by a client (those become ACF fields).
- Document the responsive breakpoints and any sticky or fixed elements.
- Decide on multilingual scope: WPML, Polylang, or none.
- Confirm the deployment target: shared hosting, managed WordPress, or self-hosted.
If you can answer all eight, any conversion path will go smoother. If you cannot answer most, the project is not ready for any tool yet.
In summary: across the same 19-page test site, no vanilla LLM produced a deployable WordPress theme on the first run. Claude Opus 4.7 came closest at 42/80, GPT-5 followed at 33/80, and Gemini 2.5 Pro at 27/80. The labor cost to bring any LLM output to production parity exceeded the price of a specialized HTML-to-WordPress conversion service in every case we measured. Vanilla LLMs are a useful starting point for small projects; for client work, the specialized pipeline wins on time, security, and accuracy.
FAQs
Can I use ChatGPT to convert my site to WordPress? You can use ChatGPT to scaffold a WordPress theme, but the output will not be deployable on the first run for a real-world site. In our 19-page test, GPT-5 hallucinated hook names, dropped schema markup, and left ACF field groups pointing to unregistered post types. Plan on 8–20 hours of dev work to bring the output to production parity.
Why does ChatGPT hallucinate WordPress functions? Large language models predict plausible code, not correct code. WordPress has thousands of hooks, filters, and template tags with overlapping names and conventions. When a model has not seen a specific hook in enough training data, it generates one that "looks right", like wp_enqueue_hooks instead of wp_enqueue_scripts. The result parses and runs, but silently does nothing.
Is Claude better than ChatGPT for WordPress? In our test, Claude Opus 4.7 scored 42/80 vs GPT-5's 33/80, so yes — Claude was stronger on raw code quality. But "best of three" is not the same as "production-ready." Claude still missed nonces on form handlers and dropped two of the four schema types. Both models produced output that needed significant cleanup.
What about Gemini? Gemini 2.5 Pro produced the cleanest HTML markup of the three but skipped ACF field groups, dropped schema entirely, and ignored the multilingual requirement. It scored 27/80. If you only care about clean markup, Gemini is fine. If you need WordPress-specific scaffolding, it is the weakest of the three.
How much does WP Pro Converter cost? For current plans and pricing in USD and JPY, see the homepage. Pricing is significantly less than the dev-hour equivalent of cleaning up vanilla LLM output for the same site.
Can I see the output before paying? Yes — the dashboard walks you through the conversion before you commit. Vanilla LLMs cannot offer this, since you have to spend hours prompting before knowing whether the output is usable.
How long does conversion take? Most conversions are delivered within 24–72 hours of submission, depending on site complexity. The pipeline runs the AI conversion immediately; the rest of the time covers a verification pass that catches issues no model will catch on its own.
Is the converted theme really pixel-perfect? Pixel-perfect to the original HTML, yes — that is the standard we hold every delivery to. The verification step compares the rendered WordPress theme to the source HTML at multiple breakpoints before delivery. If something does not match, it goes back into the pipeline before you see it.