Google Cloud Next 2026 · Knowledge Graph Story

From Conference Talks to a Map of the AI Ecosystem

How 1,146 sessions, 444 slide decks, and 323 video transcripts became 8,285 nodes and 24,421 edges of structured knowledge.

By Felipe Hoffa · Source on GitHub · Session Explorer
1,146 sessions scraped (from the session explorer)
444 slide decks (PDFs analyzed)
323 YouTube transcripts (full session recordings)
~5M words processed (477 source files)
8,285 graph nodes (entities extracted)
24,421 graph edges (relationships found)

The Data Collection Story

Google Cloud Next 2026 was one of the largest cloud conferences of the year — over a thousand sessions spanning AI infrastructure, developer tools, enterprise platforms, security, and industry verticals.

The challenge: how do you make sense of that much material at once?

The answer was to treat the entire conference as a corpus and build a knowledge graph from it. Here's how the data pipeline worked, layer by layer.

The Pipeline: Layer by Layer

Five steps turned scattered conference artifacts into a graph you can query and reason about.

1. Session Metadata — 1,146 sessions

The first layer was the structured index from the Google Cloud Next 2026 session explorer. Each entry had a title, description, speaker names, company affiliations, topic tags, room, and timing. This became the skeleton: every session title and speaker was a candidate entity in the graph.
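
To make that skeleton concrete, here is a minimal sketch of how a session-index entry could be normalized into candidate entities. The field names and the sessions.json file are illustrative assumptions, not the actual scraper output.

```python
# Sketch: normalize session-index entries into candidate graph entities.
# The field names and the sessions.json file are illustrative assumptions,
# not the actual scraper output.
import json

def candidate_entities(session: dict) -> list[dict]:
    """One candidate node per session, speaker, company, and topic tag."""
    nodes = [{"id": session["title"], "type": "Session"}]
    for speaker in session.get("speakers", []):
        nodes.append({"id": speaker["name"], "type": "Person"})
        if speaker.get("company"):
            nodes.append({"id": speaker["company"], "type": "Company"})
    for tag in session.get("topics", []):
        nodes.append({"id": tag, "type": "Topic"})
    return nodes

with open("sessions.json") as f:   # the 1,146 entries from the session explorer
    sessions = json.load(f)

candidates = [n for s in sessions for n in candidate_entities(s)]
print(len(candidates), "candidate entities")
```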

2. Slide Decks — 444 PDFs, ~2,200 slides analyzed

For 444 of the 1,146 sessions, a slide deck was available on the Google site. Each PDF was downloaded, converted to images, and processed with vision analysis — extracting product names, architecture diagrams, metrics, and key claims from individual slides. This yielded 2,205 slide-level analysis records and per-session brief summaries.
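
As a rough sketch of that slide pipeline, assuming pdf2image for page rendering; the analyze_slide() function is a placeholder for whichever vision model was actually used, and its output fields are assumptions.

```python
# Sketch of the slide pipeline: PDF -> page images -> per-slide analysis.
# pdf2image is an assumption (it needs poppler installed), and analyze_slide()
# is a placeholder for whichever vision model the project actually used.
from pathlib import Path
from pdf2image import convert_from_path

def analyze_slide(image) -> dict:
    # Placeholder: swap in a real vision-model call that pulls product names,
    # metrics, architecture elements, and key claims out of the slide image.
    return {"products": [], "metrics": [], "claims": []}

records = []
for pdf in sorted(Path("decks").glob("*.pdf")):      # the 444 downloaded decks
    for slide_no, image in enumerate(convert_from_path(pdf, dpi=150), start=1):
        records.append({"deck": pdf.name, "slide": slide_no, **analyze_slide(image)})

print(len(records), "slide-level analysis records")  # ~2,205 in the real run
```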

Slides are uniquely valuable: they contain the distilled, intentional message a speaker wants to convey. A bullet point on a slide is more signal than two minutes of verbal framing around it.

3. YouTube Transcripts — 323 full recordings

For 323 sessions, a YouTube recording was available. Full audio transcripts were retrieved when available, or generated when needed, and stored — capturing the speaker's actual words, including Q&A, live demos, and audience interactions that never appear in a slide deck.
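
A sketch of the retrieval step, assuming the youtube-transcript-api package and a plain list of video IDs; the actual tooling behind the 323 transcripts may differ.

```python
# Sketch: pull captions for each recorded session.
# Assumes the youtube-transcript-api package (its pre-1.0 get_transcript call)
# and a plain-text list of video IDs; the project's actual tooling may differ.
from pathlib import Path
from youtube_transcript_api import YouTubeTranscriptApi

out_dir = Path("transcripts")
out_dir.mkdir(exist_ok=True)

for video_id in Path("video_ids.txt").read_text().split():
    try:
        segments = YouTubeTranscriptApi.get_transcript(video_id)
    except Exception as err:        # no captions available -> generate elsewhere
        print(f"{video_id}: skipped ({err})")
        continue
    (out_dir / f"{video_id}.txt").write_text(" ".join(s["text"] for s in segments))
```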

The combination of slides + transcript for the same session was particularly powerful: slides provided structured claims while transcripts provided context, nuance, and the path that led to each conclusion.

4. Graphify — Entity & Relationship Extraction

All 477 source files (~5 million words) were fed through graphify, an AI-powered knowledge graph builder that reads raw text and extracts named entities, concepts, products, companies, and people — along with the semantic relationships between them.

Most of the graph was built from connections directly sourced from the conference material itself. A further 114 edges were added through cross-document reasoning and kept only when they cleared a 0.85 confidence threshold.
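
A sketch of that filtering rule; the edge fields and the "inferred" flag are assumptions about the extraction output shape, not graphify's documented schema.

```python
# Sketch: keep cross-document (inferred) edges only above the confidence bar.
# The edge fields and the "inferred" flag are assumptions about the extraction
# output shape, not graphify's documented schema.
CONFIDENCE_THRESHOLD = 0.85

def keep_edge(edge: dict) -> bool:
    if not edge.get("inferred"):     # directly sourced from a session artifact
        return True
    return edge.get("confidence", 0.0) >= CONFIDENCE_THRESHOLD

raw_edges = [
    {"src": "ADK", "dst": "MCP", "inferred": False},
    {"src": "Model Armor", "dst": "OWASP Top 10", "inferred": True, "confidence": 0.91},
    {"src": "Gemini CLI", "dst": "Cloud Run", "inferred": True, "confidence": 0.62},
]
edges = [e for e in raw_edges if keep_edge(e)]   # the 0.62 edge is dropped
```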

5. Graph Output — 8,285 nodes · 24,421 edges · 44 communities

The resulting graph was then analyzed for community structure using modularity-based detection. 44 distinct communities emerged — each representing a cluster of entities that are more densely connected to each other than to the rest of the graph. Community names are derived from the most distinctive nodes in each cluster.
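
A minimal sketch of that community step; whether the real run used Louvain, Leiden, or greedy modularity isn't stated, so networkx's greedy_modularity_communities stands in here, on a toy edge list.

```python
# Sketch: modularity-based community detection and hub-based naming.
# The specific algorithm used by the project isn't stated, so
# networkx's greedy_modularity_communities stands in here.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

sample_edges = [("BigQuery", "Vertex AI"), ("Vertex AI", "Gemini"),
                ("MCP", "ADK"), ("ADK", "Vertex AI"), ("GKE", "Cloud Run")]
G = nx.Graph(sample_edges)

communities = greedy_modularity_communities(G)
print(len(communities), "communities")           # 44 in the full conference graph

# Name each community after its most distinctive (highest-degree) members.
for i, members in enumerate(communities):
    hubs = sorted(members, key=G.degree, reverse=True)[:3]
    print(f"community {i}: {', '.join(hubs)}")
```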

The Raw Graph — Beautiful, Barely Readable

This is the full output: 8,285 entities and 24,421 relationships rendered as a force-directed graph. Each dot is a node, each line a relationship, each color a community.

Full knowledge graph — 8,285 nodes, 24,421 edges

44 color-coded communities are visible as rough clusters, but the density of the core makes individual relationships impossible to read. This is the starting point for simplification.
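
For reference, a minimal sketch of how such a rendering could be produced with networkx and matplotlib; the actual visualization tool behind the published figure isn't specified, so this is only illustrative.

```python
# Sketch: a force-directed rendering colored by community, in the spirit of
# the figure above. The real rendering tool isn't specified.
import matplotlib.pyplot as plt
import networkx as nx

def draw_full_graph(G: nx.Graph, communities, path: str = "full_graph.png") -> None:
    pos = nx.spring_layout(G, seed=42)                     # force-directed layout
    zone_of = {n: i for i, members in enumerate(communities) for n in members}
    colors = [zone_of[n] for n in G.nodes]
    nx.draw_networkx_nodes(G, pos, node_size=8, node_color=colors, cmap="tab20")
    nx.draw_networkx_edges(G, pos, width=0.3, alpha=0.1)
    plt.axis("off")
    plt.savefig(path, dpi=300)
```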

Distilling the Ecosystem

Rather than showing every node, the strategy graph surfaces the architecture underneath: six distinct zones around which Google Cloud organized its Agentic AI story at Next '26.

The simplification works by promoting only the highest-degree hub nodes within each detected community and collapsing everything else around them. Intra-layer connections become solid lines within each bubble; cross-layer connections become dashed lines between zones. The result is a map you can actually read and reason about.
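
A sketch of that promotion-and-collapse rule; the hubs-per-zone cutoff below is an assumption, not the exact threshold used.

```python
# Sketch of the simplification rule: promote the top-degree hubs in each
# community, then classify surviving relationships as intra-zone (solid) or
# cross-zone (dashed). The hubs-per-zone cutoff is an assumption.
import networkx as nx

def strategy_graph(G: nx.Graph, communities, hubs_per_zone: int = 5):
    zone_of = {n: i for i, members in enumerate(communities) for n in members}
    hubs = {n for members in communities
              for n in sorted(members, key=G.degree, reverse=True)[:hubs_per_zone]}

    intra, cross = [], []                 # solid lines vs. dashed lines
    for u, v in G.edges:
        if u in hubs and v in hubs:
            (intra if zone_of[u] == zone_of[v] else cross).append((u, v))
    return hubs, intra, cross
```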

Conceptual strategy graph — six ecosystem zones

Six zones: Protocol/Tool Fabric (MCP, ADK, A2A Protocol) · Agent Control Plane (operational + security) · Governed Data Substrate (BigQuery et al.) · Agent Platform (Vertex AI, Gemini) · Runtime/Infra (GKE, Cloud Run) · Enterprise Surface (Google Workspace, NotebookLM)

The picture is pretty simple: Google appears to be assembling a full enterprise agent stack. At the bottom are the protocols and tools that let agents reach other systems. Then come the control layers that secure them, the data systems that ground them, the model and agent platform that powers them, the infrastructure that runs them, and the products where people actually use them.

The graph makes that structure visible. A long list of launches starts to read like one architecture for enterprise agents.

What Can You Ask This Graph?

A knowledge graph built from 5 million words of conference material isn't just a visualization — it's a queryable model of an entire industry conversation.

The kinds of questions it can help answer span from high-level strategy down to very specific session recommendations; a sketch of how two of these queries run against the graph follows the list.

Strategic orientation
What technologies is Google betting on most heavily this year?
By edge count: BigQuery (240), Gemini (211), Vertex AI (206), GKE (131). Together they form a clear stack — data → model → AI platform → runtime. Everything else in the conference attaches to one of these four.
Which products are newly central vs. established for years?
New but already high-degree: MCP (76 edges for a protocol announced this year), Model Armor (68), Gemini CLI (60), ADK. Long-established: BigQuery, GKE, Cloud Run — their high degree reflects years of ecosystem entrenchment, not just this conference.
Where is Google trying to own the full stack vs. partnering?
Google owns the AI layer tightly — Gemini, Vertex AI, GEAP, and ADK are all deeply integrated with each other and rarely reference external equivalents. Partnering happens at the edges: security (Palo Alto Networks, Mandiant, Wiz), enterprise apps (SAP, Salesforce), and hardware (NVIDIA).
Ecosystem & partners
Which companies appear most consistently alongside Google Cloud?
Anthropic (46 edges) and Palo Alto Networks (45) lead all non-Google companies by a wide margin. Then Mandiant (32), Salesforce (27), NVIDIA (26). Anthropic's presence is notable — it's a competitor in AI models, yet appears heavily as a reference point throughout the conference.
What independent software vendors cluster around GKE vs. Vertex AI?
GKE clusters with infrastructure and DevOps vendors — Datadog, Harness, Chronosphere, and Wiz appear frequently in its community. Vertex AI clusters with AI platform vendors and model providers — Anthropic, Hugging Face, LangChain, and Replit appear in its orbit.
Who are the surprising co-presenters across sessions?
BNY + JP Morgan Chase + Bridgewater on a single AI strategy session — three competing financial firms sharing a stage. Atlassian + Datadog + Harness openly discussing the future of developer experience despite competing for the same budget. MongoDB + Palo Alto Networks + Snowflake co-presenting on SaaS architecture. U.S. DOT + City of LA + FDA — three government agencies at a cloud conference, together.
Practitioner guidance
If I'm building agentic systems, which sessions are most interconnected?
By degree within the agentic cluster: BigQuery for agentic AI (58), AgentOps with BigQuery + ADK + MCP (57), Conversational Analytics agents + MCP (51), Elastic Agent Builder (48), What's new in Google Cloud's agent platform (45). These sessions share the most concepts with the rest of the graph — start here for maximum coverage.
What's the recommended learning path from data engineering to AI agents?
The graph traces a clear path: BigQuery → Conversational Analytics → ADK → GEAP. Data engineering sessions connect to analytics agents, which connect to the ADK for building custom agents, which connect to GEAP for production deployment. Following the edge chain gives you a natural progression.
Which talks reference the same architectural patterns?
Sessions referencing MCP + ADK together form a tight cluster — they share architecture diagrams showing agents, tool servers, and orchestration layers. Sessions on GEAP evaluations cluster with sessions on agent observability and quality flywheels, reflecting a shared pattern around continuous agent improvement.
Competitive intelligence
How does Google frame its AI products relative to the broader market?
Google consistently frames its stack as integrated vs. the market's fragmented point solutions. Vertex AI, Gemini, GEAP, and ADK are presented as a unified platform. Competitors like Anthropic are referenced as model providers that run on Google infrastructure — positioning Google as the platform, not just another model vendor.
Which product categories are most contested across the conference?
Agent security is the most contested zone — Palo Alto Networks, Wiz, Mandiant, and Google's own Model Armor and Cloud Armor all appear in overlapping sessions. Agent orchestration is second — ADK, Anthropic, LangChain, and Replit all compete for this layer. Data platforms see Snowflake, MongoDB, and BigQuery discussed in the same sessions.
What security concerns show up most frequently, and in which zones?
Prompt injection and OWASP Top 10 for agents are the most-cited emerging threats, concentrated in the Agent Control Plane zone. Zero Trust and Cloud IAM appear across the Security and Runtime zones. Traditional concerns like data exfiltration remain present in the Governed Data Substrate cluster.
Pattern & trend detection
What concepts consistently co-occur across unrelated sessions?
Vertex AI bridges 26 different communities — more than any other node. Agentic AI as a concept spans 23 communities with only 93 edges, punching well above its weight. BigQuery bridges 20 communities despite being a data warehouse — it shows up in agent sessions, security analytics, and streaming pipelines far outside its original scope.
Which topics bridge the most different communities in the graph?
Vertex AI (26 communities), Agentic AI concept (23), Gemini Enterprise (21), BigQuery (20), GKE (16), MCP (16). The key insight: bridging score and degree don't rank identically. Agentic AI has 93 edges but 23 bridges; BigQuery has 240 edges but only 20 bridges. Structural position reveals things raw popularity misses.
What was talked about far more than the official agenda suggested?
MCP appears across every zone of the graph despite having no dedicated track in the official agenda. Supply Chain & Logistics had only 5 official sessions but its concepts appear across retail, data, and agent sessions. Prompt injection and agent security surfaced across sessions nominally about entirely different topics.
Gap analysis
What topics are underrepresented relative to their market importance?
Mobile and Web (6 sessions), Games (7), and Education (8) are strikingly thin for a conference this size. CI/CD appears in only 2 sessions — surprisingly low given how central developer workflow is to the cloud story. These gaps reflect deliberate editorial choices about what Google wanted to foreground in 2026.
Which communities are isolated — barely connected to the rest of the graph?
11 communities consist of a single node with zero external edges — completely disconnected from the rest of the graph. These are entities that appeared in only one session and weren't referenced anywhere else. Community 32 (3 nodes, 0 external edges) is the smallest isolated cluster with more than one node.
What did speakers avoid saying even when the topic was adjacent?
Cost does show up in the corpus, but it is less prominent than governance, security, architecture, and evaluation in the agent sessions. That contrast stood out while reviewing the transcripts and slide decks. The clearest cost signal came from Geotab: Daniel Lewis said their eval pipelines consume more tokens than their production traffic. OpenAI is nearly absent from the graph — referenced far less than Anthropic despite being the dominant market reference point. Sessions on enterprise AI almost universally avoided discussing failure rates or production incidents.
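
Most of the figures quoted above fall out of two simple graph queries: ranking nodes by degree and counting how many other communities each node connects to. A minimal sketch, assuming a networkx graph whose nodes carry a "community" attribute; the attribute name and helpers are illustrative, not the project's actual query code.

```python
# Sketch of two queries behind the answers above: ranking nodes by degree
# ("what Google is betting on") and counting how many other communities each
# node connects to ("bridging"). Assumes nodes carry a 'community' attribute.
from collections import defaultdict
import networkx as nx

def top_by_degree(G: nx.Graph, k: int = 5) -> list[tuple[str, int]]:
    return sorted(G.degree, key=lambda kv: kv[1], reverse=True)[:k]

def bridging_counts(G: nx.Graph) -> dict[str, int]:
    """Communities other than its own that each node touches directly."""
    reached = defaultdict(set)
    for u, v in G.edges:
        cu, cv = G.nodes[u]["community"], G.nodes[v]["community"]
        if cu != cv:
            reached[u].add(cv)
            reached[v].add(cu)
    return {n: len(c) for n, c in reached.items()}
```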

What Google Said About Graphs

The graph isn't just an outside analysis trick. Google repeatedly described agent-ready systems in graph-shaped terms: entities, relationships, connected context, and knowledge that agents can reason over.

In the opening keynote (session explorer), Google introduced Knowledge Catalog as a system that can read files, extract entities, map relationships, and learn business semantics. In What's new with data and AI governance: Building the catalog for AI (session explorer), that idea became even clearer: the goal is a connected context layer that agents can reason over directly.

This wasn't just abstract metadata talk. The source material also includes concrete product pushes around graph-native data systems. Spanner Graph appears in What's new in Spanner: enterprise-scale AI, search, graph, and analytics, where Google describes multimodel support, SQL PGQ, integrated graph algorithms, graphs on views, and UI-based graph modeling. BigQuery Graph appears in multiple decks as native property-graph support for relationship analytics, visualization, and graph-grounded reasoning on enterprise data.

That's what makes this feel new. For years, products like BigQuery and Spanner were mainly framed as places to store, query, and scale data. At Next '26, Google was also positioning them as systems for modeling relationships directly — not just rows and columns, but connected structures that agents and analytical workflows can traverse. This is Google reframing databases for the agent era.

And it goes beyond two product names. Other graph-shaped concepts also show up in the graph itself: BigQuery Property Graph, Property Graph, GQL / ISO GQL, Graph Analytics, Graph RAG, Knowledge Graph Grounding, Vertex AI Graph Neural Networks, and security-oriented nodes like Wiz Security Graph and SCC Security Graph.

That matters because it validates the method of this page. The conference graph here is external and independently built, but it is not conceptually alien to Google's own framing. If anything, the company's product story is shifting from isolated data systems toward connected context systems: not just documents, not just tables, and not just prompts, but systems that understand entities, relationships, provenance, and business meaning well enough for agents to act on them.

Where Geotab Landed

Three sessions at Google Cloud Next 2026 featured Geotab speakers — not as generic customer references, but as working operators paired with Google product leads.

Junaid Gill (Associate Vice President, Geotab) joined Greg Brosman (Senior Product Manager, Google Cloud) and John Murray (Group Product Manager, Google Cloud) on governing a secure agentic ecosystem.

Francois-Xavier Jeannet (Team Lead, Data & AI Governance, Geotab) joined Anit Patinker (Lead Product Manager, Google Cloud) and Shelley Hershkovitz (Product Manager, Google Cloud) on agent security at scale.

Daniel Lewis (Distinguished Data Scientist, Geotab) joined Dima Melnyk (Senior Product Manager, Google Cloud) and Alex Martin (Product Manager, Google Cloud) on the agent-quality flywheel.

Featured session video: Engineer the agent-quality flywheel, with Daniel Lewis (Geotab), Dima Melnyk (Google Cloud), and Alex Martin (Google Cloud)

All sessions with Geotab speakers →

On the conceptual map, those talks land in a tight zone around platform, governance, and evaluation. One Geotab example helps explain why: their internal hackathon generated 86 agent submissions, and 2 of those later made it into production.

That gap helps explain why Geotab mattered on stage. In earlier years, Geotab often appeared as the customer saying, in effect, BigQuery works and here's how we use it. This time the posture felt different. AI is still early, but Geotab has already been building agentic systems, and Google is bringing them on stage not merely as a reference customer, but as a partner with production lessons that are helping shape the products now being pushed more broadly.

"Anyone at Geotab should be able to build an agent."

"The models are ready. The people are motivated. What's missing is the platform."

Geotab slide from Govern your agents: Architecting a secure agentic ecosystem

"Out of the 86 agents submitted, how many made it to production? 2."

That may be the cleanest summary of the real enterprise problem: generating agent ideas is easy; getting them into production is the work.

Geotab slide presented by Junaid Gill in Govern your agents: Architecting a secure agentic ecosystem

"All of our evals consume way more tokens than our production system does from our customers."

Maybe that's what it takes: at scale, the hard part is not getting agents to demo, but building the evaluation machinery that keeps them trustworthy in production.

Daniel Lewis, Geotab, in Engineer the agent-quality flywheel

Geotab sessions mapped onto the conceptual graph

Orange dots mark each Geotab session.

Two of the three Geotab sessions landed inside the Agent Control Plane: one on governing agents and securing the agentic ecosystem with GEAP, and one on agent security at scale through the OWASP Top 10 for agents. The third landed in the Agent Platform zone, focused on the agent-quality flywheel and GEAP evaluations.

That cluster makes sense. Geotab was not up there to show a flashy demo. It was there to talk about the hard part: getting agents into production, keeping them secure, and building the evaluation loop that makes them reliable. That is also why the control-plane layer matters so much on this map. The hackathon result was not a failure case. It was an example of how an exploratory funnel narrows into a small number of production-worthy agents, and why governance, security, evaluation, and operational control matter so much once a team tries to ship them.

The Two Big Takes

First: graphing the conference reveals structure you cannot reliably see by reading sessions one by one. Once 1,146 sessions, 444 slide decks, and 323 transcripts are turned into nodes and edges, repeated patterns become visible: which products sit at the center, which protocols connect otherwise separate systems, which themes cluster together, and which ideas show up across product, security, data, infrastructure, and partner talks.

That is where many of the strongest insights in this story come from. MCP starts to look like connective tissue. BigQuery starts to look like a governed substrate for AI systems. Cloud Run, GKE, security controls, evaluation loops, and enterprise data products start to read as parts of the same architecture.

Second: building the graph was fairly straightforward. Cleaning up the conference corpus was the hard part. The source material was messy: decks, transcripts, session pages, repeated product names, overlapping abstractions, and different levels of specificity. graphify made the workflow feel direct: feed in the corpus, extract grounded entities and relationships, and simplify the result into a graph that a human can actually read.

The graph also shows what Google seems to believe will matter in production. The center of gravity is a working stack: agents connected through MCP and A2A, grounded in enterprise data, deployed on managed runtimes, evaluated continuously, and wrapped in governance, identity, and security controls.

If you want to test that thesis, browse the session explorer, or open the full interactive map and follow the edges yourself. The graph, the map, and the session index feel like the right publishable artifacts here, without redistributing downloaded slides or transcripts.

One encouraging part of this project is how manageable it was. It took a few steps, and Graphify made the workflow pretty approachable.

A future attendee could use a graph like this during the next conference to decide which sessions to prioritize, spot the hubs everyone is converging around, and follow emerging ideas across tracks in real time. It works well as a retrospective, and it would be even more useful live.

Read and discuss this knowledge-graph story on LinkedIn →