Build Day 0: Engineering the Virtual CTO Advisor with Google Cloud Vertex AI

AI-Powered Strategic Guidance, Grounded in Real-World Experience

Aug 18, 2025

You’ve heard me talk about AI in healthcare, smart manufacturing, and even agriculture. But today, I’m bringing it home to something deeply relevant to our field – IT strategy. This is an update on a project I’ve been working on behind the scenes: the Virtual CTO Advisor.

My vision? An AI that can answer complex IT strategy questions, generate foundational frameworks, and guide architectural decisions, all based on decades of my public experience and published thought leadership.

But here’s the hard truth: This isn’t about prompting a chatbot and calling it a day. My experience building this has made it abundantly clear why ad-hoc "Custom GPTs" are not sufficient for production-level strategic AI. Today, I want to walk you through the intentional decisions I’m making as I move towards building this tool using Google Cloud's Vertex AI platform.

From Hype to Foundation: The CTO Advisor Story

I started this project with a simple idea: capture the essence of my strategic advice in a consumable, accessible format. My initial experiments involved tools like OpenAI's Custom GPT builder – the kind you see showcased in demos.

The results were… disappointing.

I meticulously fed it my key frameworks and policies, only to find it would consistently:

Hallucinate wildly: Inventing internal processes or misstating foundational advice.
Ignore basic instructions: Defying explicit constraints and pulling from generic, irrelevant knowledge.
Lack reliability: Providing inconsistent or completely inaccurate answers.

Frankly, it was unusable – even for a demo. That experience taught me a hard lesson: reliable, production-grade AI requires a robust underlying architecture, not just a polished front end.

The Vertex AI Decision: Simplicity, Control, and Scalability

So, what’s the alternative? After evaluating the options, I landed on Google Cloud's Vertex AI. Why? Because it offers the critical combination of:

Managed Services: This is huge for a solo operator. I don’t need to worry about patching servers, scaling inference clusters, or managing the embedding infrastructure. Vertex AI handles all of that, allowing me to focus on the AI’s knowledge and persona.
Powerful Model Capabilities: I'm starting with Gemini 2.5 Flash. Why Flash?
- Cost-Efficiency: It's significantly cheaper per token than Gemini 2.5 Pro, crucial for a non-revenue-generating project.
- Performance: It’s designed for high-throughput tasks like RAG, which will be the backbone of the Virtual CTO Advisor.
- Large Context Window: With a 1-million-token context window, it can accept a large amount of retrieved context in a single prompt, which gives RAG more material to ground each answer in.
- Fine-Tuning Support: Critically, it supports fine-tuning via Vertex AI, allowing me to imbue it with my voice and strategic nuances.
Seamless RAG Integration: Vertex AI's Vector Search is a game-changer. It provides a managed, scalable, and efficient way to index and retrieve information from my corpus of content – so the AI’s responses stay grounded in my published work.
A Complete AI Ecosystem: Vertex AI provides the embedding models, tuning tools, managed endpoints, and pipeline orchestration needed to build a complete, end-to-end AI application.
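
To make the retrieval idea behind RAG concrete, here is a minimal sketch of nearest-neighbor lookup by cosine similarity. It is not the Vertex AI Vector Search API: the in-memory `TOY_INDEX`, the document ids, and the three-dimensional vectors are all hypothetical stand-ins for a managed index and real embeddings, which have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the ids of the k index entries most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings"; a real index would hold model-generated vectors.
TOY_INDEX = {
    "hybrid-cloud-framework": [0.9, 0.1, 0.0],
    "ai-governance-post": [0.1, 0.9, 0.1],
    "vendor-selection-video": [0.2, 0.1, 0.9],
}

if __name__ == "__main__":
    # A query vector that points mostly in the same direction as the first entry.
    print(top_k([0.8, 0.2, 0.1], TOY_INDEX, k=1))
```

A managed vector index does the same job at scale, typically with approximate rather than exhaustive search, which is why I would rather rent that capability than build it.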

The Architectural Blueprint: A Look Under the Hood

Here’s how it all comes together at an architectural level:

Data Storage: Google Cloud Storage serves as the central repository for all raw and processed content (documents, transcripts, embeddings).
Data Ingestion Pipeline:
- Cloud Functions: Triggered periodically to scrape The CTO Advisor website and Substack, and to retrieve YouTube video transcripts via the YouTube API.
- Document AI: Used to process PDFs and extract text from various document types.
- Python Scripts (in Cloud Functions/Cloud Run): Clean text, segment it into context-aware chunks, tag it with metadata (source, topic, date), and generate vector embeddings using Vertex AI Embeddings.
- Vertex AI Vector Search: Indexes these embeddings for semantic retrieval.
AI Inference Layer:
- Gemini 2.5 Flash Fine-Tuned Model: Deployed on a Vertex AI Endpoint.
- RAG Orchestration: A Cloud Run service houses the Python code that:
  - Accepts user queries via an HTTP API.
  - Uses embeddings and Vertex AI Vector Search to retrieve context.
  - Constructs a prompt for the fine-tuned Gemini 2.5 Flash model.
  - Handles token verification (Firebase Authentication) to identify users.
  - Applies rate limiting using Cloud Memorystore (Redis).
  - Returns the final, grounded, persona-aligned response.
Frontend:
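- Static HTML/CSS/JavaScript hosted directly on Cloud Storage, configured for static website hosting and accessible via virtual.thectodvisor.com (with HTTPS via an external Application Load Balancer with Cloud CDN enabled, since a Cloud Storage bucket cannot serve HTTPS on a custom domain by itself).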
- Integrates Firebase Authentication SDK for seamless user sign-in.
MLOps & Governance:
- Vertex AI Pipelines & Model Registry: For automating retraining, versioning models, and managing workflows.
- Cloud Logging & Monitoring: To observe performance, costs, and errors.
- Cloud Firestore/BigQuery: For storing user feedback and analyzing usage trends.
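
Two steps in the blueprint above are easy to picture in code: chunking content with metadata during ingestion, and constructing a grounded prompt at query time. The following is a minimal sketch under stated assumptions, not the production pipeline. The `Chunk` class, `chunk_text`, `build_prompt`, the word-window sizes, and the prompt wording are all hypothetical, and a fixed word window is a simplification of truly context-aware chunking. The embedding and model calls are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # e.g. blog, Substack, YouTube transcript
    topic: str
    date: str


def chunk_text(text, source, topic, date, max_words=50, overlap=10):
    """Split text into overlapping word windows, each tagged with metadata."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(Chunk(" ".join(window), source, topic, date))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_prompt(question, chunks):
    """Assemble a prompt: instructions, numbered sources, then the question."""
    context = "\n".join(
        f"[{i}] ({c.source}, {c.date}) {c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered sources below. "
        "If they do not cover the question, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(120))  # placeholder content
    pieces = chunk_text(sample, "blog", "strategy", "2025-08-18")
    print(len(pieces))
    print(build_prompt("How should I approach hybrid cloud?", pieces[:1]))
```

The overlap between windows is there so that a sentence split across a chunk boundary still appears whole in at least one chunk. Carrying the source and date into the prompt is what lets a response be traced back to a specific piece of published work.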

Why This Matters for IT Leaders

The Virtual CTO Advisor project is more than just a personal AI experiment; it's a real-world case study of how to approach AI implementation in an enterprise setting:

Data is Paramount: The quality of your AI is fundamentally limited by the quality of your data.
Model Choice Matters: The right model, fine-tuned correctly, with the right orchestration, makes all the difference. Gemini 2.5 Flash, when paired with RAG and a strong dataset, is a highly capable and cost-effective starting point.
Managed Services = Speed & Simplicity: Leveraging services like Vertex AI and Cloud Run allows a solo developer or small team to build and deploy production-ready systems without becoming infrastructure experts.
Grounding is Non-Negotiable: For strategic applications, the ability to tie responses back to verified sources (like my published work) is crucial for credibility and preventing misinformation.

I’m still deep in the trenches, refining the data processing and prompt engineering to capture my voice and strategic nuance. But the early results are incredibly promising. This isn't just a demo; it's a glimpse into a future where AI can genuinely empower IT leaders with actionable, trustworthy strategic guidance.

Stay tuned for more updates!
