Gemini Split Test Dashboard

Sites Test Websites

Site	Name	Niche (Location)
S1	Professional Lawyer	Law Firm (Dubai)
S2	The Groomer	Pet Grooming (Dubai)
S3	Now Consultant	Accounting (Dubai)
S4	PAL Auto Garage	Auto Repair (Dubai)
S5	Jazz Lounge Spa	Spa (Dubai)
S6	La Clé du Barbier	Barbershop (France)
S7	Avocat Fiscaliste Semon	Tax Attorney (Paris)
S8	Tygroo	Auto Service (France)
S9	Lucas Sebban	Criminal Law (Paris)
S10	Cabinet Benane	Family Law (Paris)

Phase 1 Finding the Optimal Config (R1–R7)

R1: Instruction Level

full vs medium vs lean vs zero (20 tests)

Winner: lean / full (tied 8.6)

Level	S1	S2	S3	S4	S5	Avg
Full	8 demo	9 demo	9 demo	8 demo	9 demo	8.6
Medium	8 demo	7 demo	9 demo	9 demo	9 demo	8.4
Lean	9 demo	9 demo	8 demo	8 demo	9 demo	8.6
Zero	8 demo	8 demo	8 demo	8 demo	9 demo	8.2

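Insight: Lean matches full (8.6) with a shorter prompt, so the extra instructions add nothing. Zero guidance scores slightly lower (8.2), so a minimal amount of guidance still helps.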
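
The averages and the tied winner can be recomputed from the table rows. The sketch below is a minimal illustration; the names `r1_scores`, `round_averages`, and `winners` are made up for this example and are not part of the test harness.

```python
from statistics import mean

# Scores copied from the R1 table, in site order S1..S5.
r1_scores = {
    "full":   [8, 9, 9, 8, 9],
    "medium": [8, 7, 9, 9, 9],
    "lean":   [9, 9, 8, 8, 9],
    "zero":   [8, 8, 8, 8, 9],
}


def round_averages(scores):
    """Return {variant: mean score rounded to one decimal}."""
    return {name: round(mean(vals), 1) for name, vals in scores.items()}


def winners(scores):
    """Return every variant sharing the top average, so ties stay visible."""
    avgs = round_averages(scores)
    best = max(avgs.values())
    return sorted(name for name, avg in avgs.items() if avg == best)


print(round_averages(r1_scores))
print(winners(r1_scores))
```

Returning all tied variants instead of a single maximum keeps a result like "lean / full" from collapsing into whichever key happens to come first.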

R2: Image Source

real vs stock vs mix vs none (20 tests)

Winner: real / mix / none (tied 8.6)

Source	S1	S2	S3	S4	S5	Avg
Real	8 demo	9 demo	8 demo	9 demo	9 demo	8.6
Stock	7 demo	5 demo	7 demo	7 demo	4 demo	6.0
Mix	8 demo	9 demo	8 demo	9 demo	9 demo	8.6
None	9 demo	9 demo	8 demo	8 demo	9 demo	8.6

Insight: Stock tanked at 6.0. Gemini finds its own images when given none.

R3: Design Control

specified vs vibe vs free (15 tests)

Winner: specified / free (tied 9.0)

Control	S1	S2	S3	S4	S5	Avg
Specified	9 demo	9 demo	9 demo	9 demo	9 demo	9.0
Vibe	9 demo	9 demo	8 demo	9 demo	4 demo	7.8
Free	9 demo	9 demo	9 demo	9 demo	9 demo	9.0

Insight: Free design = same quality as specified, zero prompt cost. Vibe had one bad outlier (S5 scored 4), which pulled its average down to 7.8.

R4: Model Comparison

Pro vs Flash vs Lite (15 tests)

Winner: Pro (9.0)

Model	S1	S2	S3	S4	S5	Avg	$/M in	$/M out
Pro	9 demo	9 demo	9 demo	9 demo	9 demo	9.0	$1.25	$10.00
Flash	9 demo	9 demo	8 demo	9 demo	9 demo	8.8	$0.10	$0.40
Lite	7 demo	4 demo	8 demo	6 demo	8 demo	6.6	$0.02	$0.10
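
The price columns can be turned into a rough per-site cost and a price ratio between models. This is a sketch only: the token counts are hypothetical placeholders (the dashboard does not report tokens per site), and `site_cost` and `price_ratio` are illustrative names.

```python
# Prices and averages copied from the R4 table (USD per million tokens).
PRICES = {
    "pro":   {"in": 1.25, "out": 10.00, "avg_score": 9.0},
    "flash": {"in": 0.10, "out": 0.40,  "avg_score": 8.8},
    "lite":  {"in": 0.02, "out": 0.10,  "avg_score": 6.6},
}


def site_cost(model, input_tokens, output_tokens):
    """Cost in USD of one generation for the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000


def price_ratio(expensive, cheap):
    """How many times more the first model costs, per input and output token."""
    a, b = PRICES[expensive], PRICES[cheap]
    return {"in": a["in"] / b["in"], "out": a["out"] / b["out"]}


# Hypothetical token counts, NOT measured in these tests.
IN_TOKENS, OUT_TOKENS = 5_000, 25_000

for model in PRICES:
    print(model, round(site_cost(model, IN_TOKENS, OUT_TOKENS), 4))
print(price_ratio("pro", "flash"))
```

Because output tokens dominate HTML generation, the output price ratio matters more than the input ratio when comparing Pro and Flash.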

R5: Temperature

0.3 vs 0.5 vs 0.7 (15 tests)

Winner: 0.5 (9.0)

Temp	S1	S2	S3	S4	S5	Avg
0.3	9 demo	9 demo	8 demo	9 demo	9 demo	8.8
0.5	9 demo	9 demo	9 demo	9 demo	9 demo	9.0
0.7	9 demo	8 demo	9 demo	9 demo	8 demo	8.6

Insight: 0.5 = perfect 9s. 0.3 slightly lower (8.8). 0.7 inconsistent (8.6).

R6: Best Combo Validation

lean + none + free + pro + 0.5 (5 tests)

9.0 on all 5 niches

Site	Score	Demo
S1 Professional Lawyer	9	demo
S2 The Groomer	9	demo
S3 Now Consultant	9	demo
S4 PAL Auto Garage	9	demo
S5 Jazz Lounge Spa	9	demo

Config locked: lean + none + free + pro + temp 0.5 = 9.0/10.

R7: Section Completeness

lean vs checklist vs skeleton (15 tests)

Winner: lean / checklist (tied 8.6)

Method	S1	S2	S3	S4	S5	Avg
Lean	9 demo	9 demo	8 demo	8 demo	9 demo	8.6
Checklist	9 demo	8 demo	8 demo	9 demo	9 demo	8.6
Skeleton	9 demo	8 demo	9 demo	7 demo	9 demo	8.4

Key finding: More sections = more image placeholders = lower scores. Image quality is the bottleneck, not sections.

Phase 2 Feedback Loop (R9)

R9: 2-Pass Generate → Score → Fix

All 10 sites. Generate, screenshot, AI score, critique → regenerate.

Avg +0.4 overall (7.9 → 8.3), images +1.1 (6.1 → 7.2)

Site	Niche	Pass 1	Pass 2	Δ	Img P1→P2	Demos
S1	Law Firm	8	9	+1	7→8	P1 P2
S2	Pet Grooming	7	7	0	5→4	P1 P2
S3	Accounting	8	9	+1	6→9	P1 P2
S4	Auto Repair	9	9	0	9→9	P1 P2
S5	Spa	9	8	-1	9→5	P1 P2
S6	Barbershop	9	9	0	9→9	P1 P2
S7	Tax Attorney	8	9	+1	7→9	P1 P2
S8	Auto Service	7	7	0	3→5	P1 P2
S9	Criminal Law	7	9	+2	2→9	P1 P2
S10	Family Law	7	7	0	4→5	P1 P2
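
The per-site rows can be aggregated to separate the overall gain from the image gain. A minimal sketch, with the tuple layout and variable names chosen for this example only:

```python
from statistics import mean

# (site, pass1, pass2, img_pass1, img_pass2), copied from the R9 table.
r9 = [
    ("S1", 8, 9, 7, 8), ("S2", 7, 7, 5, 4), ("S3", 8, 9, 6, 9),
    ("S4", 9, 9, 9, 9), ("S5", 9, 8, 9, 5), ("S6", 9, 9, 9, 9),
    ("S7", 8, 9, 7, 9), ("S8", 7, 7, 3, 5), ("S9", 7, 9, 2, 9),
    ("S10", 7, 7, 4, 5),
]

# Mean change in the overall score between Pass 1 and Pass 2.
overall_delta = round(mean(p2 - p1 for _, p1, p2, _, _ in r9), 1)
# Mean change in the image sub-score between Pass 1 and Pass 2.
image_delta = round(mean(i2 - i1 for _, _, _, i1, i2 in r9), 1)

print("overall:", overall_delta, "images:", image_delta)
```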

Rules: Pass 1 = 9 → skip Pass 2 (no gain possible, and S5 regressed 9 → 8). Pass 1 = 8 → run Pass 2 (3 of 3 reached 9). Pass 1 = 7 → Pass 2 rarely helps (1 of 4 improved). Images = #1 bottleneck.
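
The rules can be checked against the table by tallying outcomes per Pass 1 score. The sketch below is one reading of the data, not a tested policy; `tally_by_pass1` and `should_run_pass2` are hypothetical helper names.

```python
from collections import defaultdict

# (pass1, pass2) overall scores for S1..S10, copied from the R9 table.
r9_passes = [(8, 9), (7, 7), (8, 9), (9, 9), (9, 8),
             (9, 9), (8, 9), (7, 7), (7, 9), (7, 7)]


def tally_by_pass1(pairs):
    """Count improved / same / regressed outcomes for each Pass 1 score."""
    buckets = defaultdict(lambda: {"improved": 0, "same": 0, "regressed": 0})
    for p1, p2 in pairs:
        outcome = "improved" if p2 > p1 else "regressed" if p2 < p1 else "same"
        buckets[p1][outcome] += 1
    return {score: dict(counts) for score, counts in buckets.items()}


def should_run_pass2(pass1_score, target=9):
    """Assumed routing rule: only spend a second pass when below the target."""
    return pass1_score < target


for score, counts in sorted(tally_by_pass1(r9_passes).items()):
    print(score, counts, "run pass 2:", should_run_pass2(score))
```

With only three or four sites per bucket, the tallies are indications rather than guarantees.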

Phase 3 Prompt Engineering & Image Bank (R10–R14)

R10: Anti-Slop Design Rules

Editorial typography + distinctive palettes + layout variety

Better aesthetics, same scores

Site	P1	P2	Img P1	Demos
S7 Tax Attorney	9	9	8	P1 P2
S9 Criminal Law	7	8	3	P1 P2
S10 Family Law	8	8	5	P1 P2

R11: Layout-Only Rules

4 layout rules only (~60 words). No font/color constraints.

9/10 all sites on Pass 1

Site	P1	P2	Img P1	Demos
S7 Tax Attorney	9	9	8	P1 P2
S9 Criminal Law	9	9	8	P1 P2
S10 Family Law	9	9	8	P1 P2

Breakthrough: 4 layout rules = all 9s on Pass 1. No Pass 2 needed. Font/color rules in R10 caused breakage.
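
A short rule block like this can be appended to the prompt with a word budget, so it cannot silently grow. The sketch below is illustrative only: the dashboard names the four rules (asymmetry, scale contrast, variety, atmosphere) but does not publish their wording, so the rule text, the `build_prompt` helper, and the opening instruction are all hypothetical.

```python
# Hypothetical wording: only the four rule names come from the dashboard.
LAYOUT_RULES = [
    "Asymmetry: break the centered grid in at least one section.",
    "Scale contrast: pair one oversized element with small supporting text.",
    "Variety: do not repeat the same section layout twice in a row.",
    "Atmosphere: use background texture, gradient, or imagery to set mood.",
]


def build_prompt(business_brief, rules=LAYOUT_RULES, max_rule_words=60):
    """Assemble a Markdown prompt; refuse rule blocks over the word budget."""
    rule_words = sum(len(rule.split()) for rule in rules)
    if rule_words > max_rule_words:
        raise ValueError(f"rules use {rule_words} words, budget is {max_rule_words}")
    lines = [
        "Build a single-page website for this business.",
        "",
        business_brief,
        "",
        "## Layout rules",
    ]
    lines += [f"- {rule}" for rule in rules]
    return "\n".join(lines)


print(build_prompt("Tax attorney in Paris."))
```

Enforcing the budget in code is one way to guard against gradually adding rules until the prompt drifts past the size that scored well.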

R12: CRO + Content Rules

R11 + 10 CRO/content rules (~200 words)

REGRESSION

Site	P1	P2	Img P1	Demos
S7 Tax Attorney	9	8	7	P1 P2
S9 Criminal Law	7	9	2	P1 P2
S10 Family Law	7	8	3	P1 P2

Lesson: More rules = worse output. 14 rules diluted Gemini's attention.

R13: Niche Rules + Image Bank ⭐

R11 layout rules + law-firm design direction + copywriting + 38 Nano Banana Pro images. Markdown. Temp 0.5. Single pass.

NEW BEST: 9/10, images 9/10

Site	Visual	Sections	Images	Copy	Mobile	Overall	Demo
S7 Tax Attorney	9	10	9	9	8	9	demo
S9 Criminal Law	9	10	9	9	8	9	demo
S10 Family Law	9	10	9	9	8	9	demo

Image bank solved the #1 bottleneck. Images: avg 4-8 → consistent 9. Niche direction adds polish without rule-count bloat.
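
One way an image bank could feed the prompt is a lookup by niche and page slot. This is a sketch under assumptions: the dashboard does not describe how the bank is stored or selected from, so the `BankImage` fields, the slot names, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankImage:
    url: str    # hypothetical location of a pre-generated image
    niche: str  # e.g. "law"
    slot: str   # e.g. "hero", "team", "office"


def pick_images(bank, niche, slots):
    """Return {slot: url} for each slot with a match in the bank.

    Slots without a match are left out instead of falling back to stock
    photos, so the prompt never references an image that does not exist.
    """
    chosen = {}
    for slot in slots:
        match = next(
            (img for img in bank if img.niche == niche and img.slot == slot),
            None,
        )
        if match is not None:
            chosen[slot] = match.url
    return chosen


bank = [
    BankImage("https://example.com/bank/law-hero-01.jpg", "law", "hero"),
    BankImage("https://example.com/bank/law-team-01.jpg", "law", "team"),
    BankImage("https://example.com/bank/spa-hero-01.jpg", "spa", "hero"),
]
print(pick_images(bank, "law", ["hero", "team", "map"]))
```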

R14: XML + Temp 1.0 + Verbosity

Same as R13 but XML tags, temp 1.0 (Google guide), explicit verbosity. Single pass.

Also 9/10 but slower

Site	Visual	Sections	Images	Copy	Mobile	Overall	Demo
S7 Tax Attorney	9	9	9	9	8	9	demo
S9 Criminal Law	9	10	9	9	8	9	demo
S10 Family Law	9	10	8	9	8	9	demo

Verdict: XML + temp 1.0 = no improvement. R14 scored slightly lower on images (8 vs 9 on S10) and sections (9 vs 10 on S7), and ran 25% slower. R13 wins.

Phase 3 Evolution (3 French Law Sites)

Site	R9	R10	R11	R12	R13 ⭐	R14
S7 Tax	8→9	9→9	9→9	9→8	9 (img:9)	9 (img:9)
S9 Criminal	7→9	7→8	9→9	7→9	9 (img:9)	9 (img:9)
S10 Family	7→7	8→8	9→9	7→8	9 (img:9)	9 (img:8)

Summary Final Results

Metric	Value
Best Score	9/10
Tests Run	105+
Rounds	14
Cost/Site (Pro)	~$0.30
Gen Time	~2 min
Images in Bank	38

Winning Configuration (R13)

Parameter	Value	Why
Instructions	Skeleton + niche rules	HTML skeleton ensures all sections, niche rules add polish
Images	Curated image bank (38)	Solved #1 bottleneck: avg 4-8 → consistent 9
Design	Niche direction	Navy/Gold + DM Serif Display/Inter
Layout	4 creativity rules	Asymmetry, scale contrast, variety, atmosphere
Copy	Niche-specific	Outcome headlines, proper CTAs, trust signals
Prompt	Markdown	XML didn't improve. Simpler and faster.
Model	gemini-3.1-pro	9.0/10. Flash at 8.8 for 12.5x less on input and 25x less on output
Temp	0.5	Beats 0.3 and 0.7
Passes	Single	9/10 first try. Pass 2 only when <9
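
The table can be captured as a small immutable config object so a run is reproducible. This is a sketch, not the harness used for these tests: the `GenerationConfig` class and its field names are assumptions, and the 0 to 2 temperature range is an assumed sanity bound, not a documented limit.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class GenerationConfig:
    model: str = "gemini-3.1-pro"
    temperature: float = 0.5
    prompt_format: str = "markdown"
    layout_rule_count: int = 4
    image_bank_size: int = 38
    target_score: int = 9  # a second pass is only considered below this

    def __post_init__(self):
        # Assumed sanity range; adjust to whatever the API actually accepts.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature out of range: {self.temperature}")


winning = GenerationConfig()

# A budget variant that swaps only the model. The model id is a placeholder,
# since the dashboard does not give the exact Flash identifier.
budget = replace(winning, model="FLASH_MODEL_ID_PLACEHOLDER")

print(asdict(winning))
print(budget.model, budget.temperature)
```

Freezing the dataclass means a variant has to be created with `replace`, which keeps the locked configuration from being mutated mid-run.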

Key Learnings

1	Images are everything. AI image bank (Nano Banana Pro, $0.01/image) eliminated broken/irrelevant images.
2	Fewer rules beat more. R11 (4 rules, 60 words) = 9/10. R12 (14 rules, 200 words) = 7-9/10.
3	Niche direction > generic freedom. Palette + fonts + copy style adds polish without penalty.
4	XML and temp 1.0 are hype. R14 matched or underperformed R13.
5	Feedback loop has diminishing returns. Great for 8→9, risky at 9 (can regress).
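6	Skeleton templates work once images are solved. Pre-defined sections ensure completeness (R13 sections: 10/10) without limiting creativity. Without the image bank, the R7 skeleton scored 8.4 against 8.6 for lean.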
