Building Agentic Apps with Gemma 4: From Zero to Autonomous
> 4 levels of agentic development with Gemma 4: tool use, structured output, multi-step workflows, and multi-agent systems with production observability.
Published: May 2026
Author: Essa Mamdani
Category: AI Engineering / Agents
Read Time: 14 minutes
The Agentic Shift
2025 was the year of RAG. 2026 is the year of agents.
Gemma 4 doesn't just generate text—it generates actions. Native function calling, structured outputs, and multi-step reasoning make it the ideal foundation for autonomous systems that actually work in production.
This article shows you how to build reliable agents with Gemma 4, from simple tool use to fully autonomous workflows.
Level 1: Basic Tool Use (The Foundation)
Native Function Calling
Unlike older models that needed prompt hacks for tool use, Gemma 4 understands tool schemas natively:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import json

model = AutoModelForCausalLM.from_pretrained("google/gemma-4-27b")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-27b")

# Define your tools
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a recipient",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]

# Gemma 4 automatically generates tool calls when needed
messages = [
    {"role": "user", "content": "What's the weather in Tokyo? Also email my team that I'm working from home."}
]

response = model.generate(
    tokenizer.apply_chat_template(messages, tools=TOOLS, return_tensors="pt"),
    max_new_tokens=512
)

# Output:
# [tool_calls: [{"name": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}},
#   {"name": "send_email", "arguments": {"to": "team@company.com", "subject": "WFH today", "body": "Working remotely due to weather."}}]]
```
Tool Execution Loop
```python
import json

class GemmaAgent:
    def __init__(self, model, tools):
        self.model = model
        # Each tool schema is assumed to carry its callable under an "implementation" key
        self.tools = {t["function"]["name"]: t["function"]["implementation"] for t in tools}

    def run(self, user_input: str, max_iterations: int = 10) -> str:
        messages = [{"role": "user", "content": user_input}]

        for i in range(max_iterations):
            # Generate response
            response = self.model.generate(messages)

            # Check for tool calls
            if "tool_calls" in response:
                messages.append({"role": "assistant", "content": response})

                # Execute tools
                for tool_call in response["tool_calls"]:
                    result = self.tools[tool_call["name"]](**tool_call["arguments"])
                    messages.append({
                        "role": "tool",
                        "name": tool_call["name"],
                        "content": json.dumps(result)
                    })
            else:
                return response["content"]

        return "Max iterations reached"
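The loop can be exercised end to end without a real model. Below is a condensed copy of the same loop plus a stubbed model that emits one canned tool call and then a final answer; all names and the `implementation` key are illustrative, not part of any official schema:

```python
import json

def get_weather(location, unit="celsius"):
    # Stub tool implementation for the demo
    return {"location": location, "temp_c": 18}

class GemmaAgent:  # condensed copy of the loop above
    def __init__(self, model, tools):
        self.model = model
        self.tools = {t["function"]["name"]: t["function"]["implementation"]
                      for t in tools}

    def run(self, user_input, max_iterations=10):
        messages = [{"role": "user", "content": user_input}]
        for _ in range(max_iterations):
            response = self.model.generate(messages)
            if "tool_calls" in response:
                messages.append({"role": "assistant", "content": response})
                for call in response["tool_calls"]:
                    result = self.tools[call["name"]](**call["arguments"])
                    messages.append({"role": "tool", "name": call["name"],
                                     "content": json.dumps(result)})
            else:
                return response["content"]
        return "Max iterations reached"

class StubModel:
    """Fake model: emits one canned tool call, then a final answer."""
    def __init__(self):
        self.turn = 0

    def generate(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"tool_calls": [{"name": "get_weather",
                                    "arguments": {"location": "Tokyo"}}]}
        # Second turn: read the tool result back out of the history
        temp = json.loads(messages[-1]["content"])["temp_c"]
        return {"content": f"It is {temp} C in Tokyo."}

TOOLS = [{"type": "function",
          "function": {"name": "get_weather",
                       "description": "Get current weather",
                       "implementation": get_weather}}]

answer = GemmaAgent(StubModel(), TOOLS).run("Weather in Tokyo?")
print(answer)  # It is 18 C in Tokyo.
```

Swapping the stub for a real model changes nothing in the loop itself, which is the point: the control flow is independent of the model behind it.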
Level 2: Structured Output for Reliable Pipelines
JSON Schema Enforcement
Gemma 4 can be constrained to specific output formats, eliminating parsing failures:
```python
from pydantic import BaseModel
from typing import List, Optional

class TaskBreakdown(BaseModel):
    steps: List[str]
    estimated_duration_minutes: int
    dependencies: List[str]
    risks: List[str]

class CodeReview(BaseModel):
    issues: List[dict]  # [{"severity": "high", "line": 42, "description": "..."}]
    suggestions: List[str]
    security_concerns: Optional[List[str]]
    overall_score: int  # 1-10

# Force Gemma 4 to output valid JSON matching the schema
response = model.generate(
    "Review this Python function for security issues...",
    response_format={"type": "json_object", "schema": CodeReview.schema()}
)

# Guaranteed valid JSON, guaranteed correct types
review = CodeReview.parse_raw(response)
```
State Machines with Structured Output
```python
class AgentState(BaseModel):
    current_phase: str  # "research", "planning", "execution", "review"
    completed_tasks: List[str]
    pending_tasks: List[str]
    context: dict
    next_action: str

# Agent that maintains explicit state
class StateMachineAgent:
    def execute(self, goal: str):
        state = AgentState(
            current_phase="research",
            completed_tasks=[],
            pending_tasks=[goal],
            context={},
            next_action="gather_requirements"
        )

        while state.current_phase != "complete":
            # Gemma 4 decides the next state transition
            state_json = model.generate(
                f"Current state: {state.json()}\nDetermine next state and action",
                response_format={"type": "json_object", "schema": AgentState.schema()}
            )
            state = AgentState.parse_raw(state_json)

            # Execute the action
            self.tools[state.next_action](state.context)

            if len(state.pending_tasks) == 0:
                state.current_phase = "complete"
```
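The state-machine shape itself can be tested without a model or the (hypothetical) `response_format` parameter. Here is a stdlib-only sketch where a scripted planner stands in for the Gemma 4 call; the names are illustrative:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    current_phase: str
    completed_tasks: tuple = ()
    pending_tasks: tuple = ()

def scripted_planner(state):
    """Stands in for the model call: pop one task, finish when none remain."""
    if not state.pending_tasks:
        return replace(state, current_phase="complete")
    done, *rest = state.pending_tasks
    return replace(state,
                   current_phase="execution",
                   completed_tasks=state.completed_tasks + (done,),
                   pending_tasks=tuple(rest))

state = State(current_phase="research", pending_tasks=("gather", "draft"))
while state.current_phase != "complete":
    state = scripted_planner(state)

print(state.completed_tasks)  # ('gather', 'draft')
```

Keeping the state immutable and letting each step return a new `State` makes every transition easy to log and replay, which pays off when you later swap the scripted planner for a real model.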
Level 3: Multi-Step Autonomous Workflows
The ReAct Pattern (Reasoning + Acting)
```python
class ReActAgent:
    """Reasoning and Acting agent with Gemma 4"""

    SYSTEM_PROMPT = """You are an autonomous agent. Solve tasks by following this loop:
1. THINK: Analyze the current state and plan your next action
2. ACT: Choose exactly one tool to use
3. OBSERVE: Process the result and decide if the task is complete

Format your response as:
Thought: [your reasoning]
Action: [tool_name]([params])
"""

    def solve(self, task: str, tools: dict):
        history = [f"Task: {task}"]

        for step in range(20):  # Safety limit
            prompt = self.SYSTEM_PROMPT + "\n\n" + "\n".join(history)

            response = self.model.generate(prompt)

            # Parse Thought and Action
            thought = self._extract(response, "Thought:")
            action = self._extract(response, "Action:")

            history.append(f"Thought: {thought}")

            if "finish" in action.lower():
                return self._extract(response, "Final Answer:")

            # Execute tool
            tool_name, params = self._parse_action(action)
            result = tools[tool_name](**params)

            history.append(f"Action: {action}")
            history.append(f"Observation: {result}")

        return "Maximum steps reached"
```
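The class leaves `_extract` and `_parse_action` undefined. One way they might look, assuming the `Action: tool_name({"json": "args"})` wire format from the system prompt above (that format is this article's convention, not a model guarantee):

```python
import json
import re

def extract(text, label):
    """Return the rest of the line following `label`, or "" if absent."""
    match = re.search(rf"{re.escape(label)}\s*(.*)", text)
    return match.group(1).strip() if match else ""

def parse_action(action):
    """Parse 'tool_name({"json": "args"})' into (name, params)."""
    match = re.match(r"(\w+)\((.*)\)\s*$", action, re.DOTALL)
    if not match:
        raise ValueError(f"Unparseable action: {action!r}")
    name, raw = match.group(1), match.group(2).strip()
    params = json.loads(raw) if raw else {}
    return name, params

text = 'Thought: need the weather\nAction: get_weather({"location": "Tokyo"})'
print(extract(text, "Thought:"))              # need the weather
print(parse_action(extract(text, "Action:")))  # ('get_weather', {'location': 'Tokyo'})
```

Raising on an unparseable action (rather than guessing) matters: malformed actions are the most common ReAct failure, and a loud error lets the retry logic in the next section kick in.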
Agent with Self-Correction
```python
class SelfCorrectingAgent:
    def execute_with_retry(self, task: str, max_retries: int = 3):
        result = None
        for attempt in range(max_retries):
            try:
                result = self.execute(task)

                # Self-evaluation
                evaluation = self.model.generate(
                    f"Task: {task}\nResult: {result}\n"
                    f"Evaluate if this result is correct and complete. "
                    f"If not, explain what's wrong."
                )

                if "correct" in evaluation.lower() and "complete" in evaluation.lower():
                    return result
                else:
                    # Incorporate feedback and retry
                    task += f"\nPrevious attempt had issues: {evaluation}"

            except Exception as e:
                if attempt == max_retries - 1:
                    raise
                task += f"\nPrevious error: {str(e)}"

        return result
```
Level 4: Multi-Agent Systems
Coordinator + Worker Pattern
```python
import json

class CoordinatorAgent:
    """Distributes tasks to specialized worker agents"""

    def __init__(self, workers: dict):
        self.workers = workers
        self.model = load_gemma_4_27b()

    def orchestrate(self, complex_task: str):
        # Break down the task into subtasks
        plan = json.loads(self.model.generate(
            f"Break this complex task into subtasks: {complex_task}\n"
            f"Available workers: {list(self.workers.keys())}",
            response_format={"type": "json_object"}
        ))

        results = {}
        for subtask in plan["subtasks"]:
            worker = self.workers[subtask["worker"]]

            # Execute with context from previous subtasks
            context = {k: results[k] for k in subtask.get("dependencies", [])}
            result = worker.execute(subtask["description"], context)
            results[subtask["id"]] = result

        # Synthesize the final answer
        return self.model.generate(
            f"Synthesize these results into a coherent response: {results}"
        )

# Specialized workers
class ResearchWorker:
    """Searches and synthesizes information"""
    def execute(self, query: str, context: dict):
        search_results = self.tools["web_search"](query)
        return self.model.generate(
            f"Synthesize these search results: {search_results}"
        )

class CodeWorker:
    """Writes and tests code"""
    def execute(self, requirement: str, context: dict):
        code = self.model.generate(
            f"Write code for: {requirement}\nContext: {context}"
        )
        test_results = self.run_tests(code)
        return {"code": code, "tests": test_results}
```
Debate Pattern (For High-Stakes Decisions)
```python
class DebateAgent:
    """Multiple agents debate to reach consensus"""

    def debate(self, proposition: str, num_rounds: int = 3):
        agents = [
            {"name": "Advocate", "stance": "pro", "model": load_gemma_4_9b()},
            {"name": "Skeptic", "stance": "con", "model": load_gemma_4_9b()},
            {"name": "Synthesizer", "stance": "neutral", "model": load_gemma_4_27b()}
        ]

        debate_log = []

        for round_num in range(num_rounds):
            for agent in agents[:2]:  # Advocate and Skeptic
                response = agent["model"].generate(
                    f"You are {agent['name']}. Debate this proposition: {proposition}\n"
                    f"Previous arguments: {debate_log}"
                )
                debate_log.append(f"{agent['name']}: {response}")

            # Synthesizer evaluates
            evaluation = agents[2]["model"].generate(
                f"Evaluate these arguments and identify the strongest points:\n"
                f"{debate_log}"
            )
            debate_log.append(f"Synthesizer: {evaluation}")

        # Final judgment
        return agents[2]["model"].generate(
            f"Based on this debate, provide a final reasoned judgment:\n"
            f"{debate_log}"
        )
```
Production Patterns
Observability for Agents
```python
from opentelemetry import trace
from dataclasses import dataclass
import time

@dataclass
class AgentTrace:
    agent_id: str
    prompt: str
    response: str
    tool_calls: list
    latency_ms: float
    token_count: int
    model_version: str

tracer = trace.get_tracer("gemma4.agent")

class ObservableAgent:
    def execute(self, task: str):
        with tracer.start_as_current_span("agent.execution") as span:
            start = time.time()

            # Log the task
            span.set_attribute("task", task)

            # Execute
            response = self.model.generate(task)
            latency = (time.time() - start) * 1000

            # Record metrics
            trace_data = AgentTrace(
                agent_id=self.id,
                prompt=task,
                response=response,
                tool_calls=self.extract_tool_calls(response),
                latency_ms=latency,
                token_count=len(self.tokenizer.encode(response)),
                model_version="gemma-4-27b"
            )

            # Send to monitoring
            self.monitoring.record(trace_data)

            # Alert on anomalies
            if latency > 5000:  # 5 seconds
                self.alerts.send(f"Agent {self.id} slow: {latency}ms")

            return response
```
Error Recovery Strategies
```python
class ResilientAgent:
    def execute_with_resilience(self, task: str):
        strategies = [
            self._normal_execution,
            self._retry_with_simplified_prompt,
            self._retry_with_context_window_management,
            self._fallback_to_smaller_model,
            self._human_escalation
        ]

        for strategy in strategies:
            try:
                result = strategy(task)
                if self._is_valid(result):
                    return result
            except Exception as e:
                self.logger.warning(f"Strategy failed: {e}")
                continue

        raise AgentFailureException("All strategies exhausted")

    def _retry_with_context_window_management(self, task: str):
        """Reduce context when hitting token limits"""
        max_tokens = self.model.config.max_position_embeddings

        while len(self.tokenizer.encode(task)) > max_tokens * 0.8:
            # Summarize older context
            task = self.model.generate(
                f"Summarize this conversation concisely: {task}"
            )

        return self.model.generate(task)
```
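The strategy ladder itself is just control flow and can be verified in isolation. A minimal sketch with stub strategies (the stubbed failure and fallback behavior are invented for the demo):

```python
class AgentFailureException(Exception):
    pass

class StubResilientAgent:
    """Same ladder as above, with stub strategies so it runs standalone."""
    def __init__(self):
        self.attempts = []

    def _normal_execution(self, task):
        self.attempts.append("normal")
        raise TimeoutError("model timed out")  # simulated failure

    def _fallback_to_smaller_model(self, task):
        self.attempts.append("fallback")
        return f"answer to: {task}"

    def _is_valid(self, result):
        return bool(result)

    def execute_with_resilience(self, task):
        for strategy in (self._normal_execution,
                         self._fallback_to_smaller_model):
            try:
                result = strategy(task)
                if self._is_valid(result):
                    return result
            except Exception:
                continue  # fall through to the next strategy
        raise AgentFailureException("All strategies exhausted")

agent = StubResilientAgent()
print(agent.execute_with_resilience("summarize logs"))  # answer to: summarize logs
print(agent.attempts)  # ['normal', 'fallback']
```

Ordering strategies from cheapest to most drastic means the common case pays no extra cost, and human escalation is only ever reached after everything automated has failed.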
Performance Optimizations
Parallel Tool Execution
```python
import asyncio

class ParallelAgent:
    async def execute_parallel_tools(self, tool_calls: list):
        """Execute independent tools concurrently"""

        # Group by dependency
        independent = [t for t in tool_calls if not t.get("depends_on")]
        dependent = [t for t in tool_calls if t.get("depends_on")]

        # Execute independent tools in parallel (tools are assumed to be async)
        tasks = [
            self.tools[t["name"]](**t["params"])
            for t in independent
        ]
        results = list(await asyncio.gather(*tasks))

        # Execute dependent tools sequentially
        for tool_call in dependent:
            dep_results = {k: results[v] for k, v in tool_call["depends_on"].items()}
            result = await self.tools[tool_call["name"]](**tool_call["params"], **dep_results)
            results.append(result)

        return results
```
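The payoff of `asyncio.gather` is easy to see with two stub tools that each sleep to simulate I/O; the tool names and latencies below are invented for the demo:

```python
import asyncio
import time

async def fetch_weather(city):
    await asyncio.sleep(0.1)  # simulated I/O latency
    return f"{city}: 18C"

async def fetch_news(topic):
    await asyncio.sleep(0.1)
    return f"{topic}: 3 headlines"

async def main():
    start = time.perf_counter()
    # Two independent tools run concurrently, as in the agent above
    results = await asyncio.gather(fetch_weather("Tokyo"), fetch_news("AI"))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)  # ['Tokyo: 18C', 'AI: 3 headlines']
print(f"{elapsed:.2f}s")  # roughly 0.1s, not the 0.2s a sequential run would take
```

Note that `gather` preserves argument order in its result list, which is what lets the dependent-tool pass above index into `results` by position.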
Caching Strategies
```python
import hashlib

class CachingAgent:
    def __init__(self):
        self.response_cache = {}
        self.embedding_cache = {}

    def cached_generate(self, prompt: str, **kwargs):
        """Cache responses for identical prompts"""
        cache_key = hashlib.md5(f"{prompt}{str(kwargs)}".encode()).hexdigest()

        if cache_key in self.response_cache:
            return self.response_cache[cache_key]

        response = self.model.generate(prompt, **kwargs)
        self.response_cache[cache_key] = response
        return response

    def semantic_cache(self, prompt: str, threshold: float = 0.95):
        """Cache based on semantic similarity"""
        prompt_embedding = self.get_embedding(prompt)

        for cached_prompt, cached_response in self.response_cache.items():
            cached_embedding = self.embedding_cache.get(cached_prompt)
            if cached_embedding is not None:
                similarity = cosine_similarity(prompt_embedding, cached_embedding)
                if similarity > threshold:
                    return cached_response

        response = self.model.generate(prompt)
        self.response_cache[prompt] = response
        self.embedding_cache[prompt] = prompt_embedding
        return response
```
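`cosine_similarity` and `get_embedding` are left undefined above. A stdlib-only sketch of the similarity half is below; the bag-of-words "embedding" is a toy stand-in for a real embedding model, used only to make the example runnable:

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine similarity between two sparse dict vectors."""
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def get_embedding(text):
    """Toy bag-of-words 'embedding'; stands in for a real embedding model."""
    return Counter(text.lower().split())

a = get_embedding("weather in Tokyo today")
b = get_embedding("tokyo weather right now")
print(round(cosine_similarity(a, a), 2))  # 1.0
print(cosine_similarity(a, b) > cosine_similarity(a, get_embedding("stock prices")))  # True
```

In production you would use a real embedding model and a vector index rather than a linear scan, but the threshold logic in `semantic_cache` is unchanged either way.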
Real-World Example: Autonomous DevOps Agent
```python
import json
import time

class DevOpsAgent:
    """Monitors infrastructure, detects issues, and fixes them autonomously"""

    def __init__(self):
        self.model = load_gemma_4_27b()
        self.tools = {
            "get_metrics": prometheus_client.query,
            "get_logs": elasticsearch_client.search,
            "restart_service": kubectl.restart,
            "scale_deployment": kubectl.scale,
            "send_alert": pagerduty.trigger
        }

    def run(self):
        while True:
            # 1. Collect observability data
            metrics = self.tools["get_metrics"]("uptime, latency, error_rate")
            logs = self.tools["get_logs"]("level=ERROR", limit=100)

            # 2. Analyze with Gemma 4
            analysis = json.loads(self.model.generate(
                f"Analyze these metrics and logs for anomalies:\n"
                f"Metrics: {metrics}\nLogs: {logs}",
                response_format={"type": "json_object", "schema": Analysis.schema()}
            ))

            # 3. Decide and act
            if analysis["severity"] == "critical":
                for action in analysis["recommended_actions"]:
                    self.tools[action["tool"]](**action["params"])

                self.tools["send_alert"](
                    summary=f"Auto-remediated: {analysis['issue']}",
                    details=analysis
                )

            time.sleep(60)  # 1-minute monitoring loop
```
The Bottom Line
Gemma 4's native agentic capabilities eliminate the "glue code" that made previous agent systems brittle:
- No prompt engineering for tool use—schemas are first-class citizens
- Reliable structured output—JSON that actually validates
- Long context for state management—256K tokens for complex workflows
- Fast enough for real-time agents—27B serves at 50+ tokens/sec
The future of AI isn't chatbots. It's autonomous systems that observe, reason, and act. Gemma 4 is the first open model that makes this practical at production scale.
Essa Mamdani is the creator of AutoBlogging.Pro and builds agentic systems that actually ship.
Follow: essa.mamdani.com | GitHub: @essamamdani