Building Agentic Apps with Gemma 4: From Zero to Autonomous

> 4 levels of agentic development with Gemma 4: tool use, structured output, multi-step workflows, and multi-agent systems with production observability.

Audio version coming soon

Verified by Essa Mamdani

Building Agentic Apps with Gemma 4: From Zero to Autonomous

Published: May 2026
Author: Essa Mamdani
Category: AI Engineering / Agents
Read Time: 14 minutes

The Agentic Shift

2025 was the year of RAG. 2026 is the year of agents.

Gemma 4 doesn't just generate text—it generates actions. Native function calling, structured outputs, and multi-step reasoning make it the ideal foundation for autonomous systems that actually work in production.

This article shows you how to build reliable agents with Gemma 4, from simple tool use to fully autonomous workflows.

Level 1: Basic Tool Use (The Foundation)

Native Function Calling

Unlike older models that needed prompt hacks for tool use, Gemma 4 understands tool schemas natively:

python
1from transformers import AutoModelForCausalLM, AutoTokenizer
2import json
3
4model = AutoModelForCausalLM.from_pretrained("google/gemma-4-27b")
5tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-27b")
6
7# Define your tools
8TOOLS = [
9    {
10        "type": "function",
11        "function": {
12            "name": "get_weather",
13            "description": "Get current weather for a location",
14            "parameters": {
15                "type": "object",
16                "properties": {
17                    "location": {"type": "string", "description": "City name"},
18                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
19                },
20                "required": ["location"]
21            }
22        }
23    },
24    {
25        "type": "function", 
26        "function": {
27            "name": "send_email",
28            "description": "Send an email to a recipient",
29            "parameters": {
30                "type": "object",
31                "properties": {
32                    "to": {"type": "string"},
33                    "subject": {"type": "string"},
34                    "body": {"type": "string"}
35                },
36                "required": ["to", "subject", "body"]
37            }
38        }
39    }
40]
41
42# Gemma 4 automatically generates tool calls when needed
43messages = [
44    {"role": "user", "content": "What's the weather in Tokyo? Also email my team that I'm working from home."}
45]
46
47response = model.generate(
48    tokenizer.apply_chat_template(messages, tools=TOOLS, return_tensors="pt"),
49    max_new_tokens=512
50)
51
52# Output:
53# [tool_calls: [{"name": "get_weather", "arguments": {"location": "Tokyo", "unit": "celsius"}},
54#               {"name": "send_email", "arguments": {"to": "team@company.com", "subject": "WFH today", "body": "Working remotely due to weather."}}]]

Tool Execution Loop

python
1class GemmaAgent:
2    def __init__(self, model, tools):
3        self.model = model
4        self.tools = {t["function"]["name"]: t["function"]["implementation"] for t in tools}
5        
6    def run(self, user_input: str, max_iterations: int = 10) -> str:
7        messages = [{"role": "user", "content": user_input}]
8        
9        for i in range(max_iterations):
10            # Generate response
11            response = self.model.generate(messages)
12            
13            # Check for tool calls
14            if "tool_calls" in response:
15                messages.append({"role": "assistant", "content": response})
16                
17                # Execute tools
18                for tool_call in response["tool_calls"]:
19                    result = self.tools[tool_call["name"]](**tool_call["arguments"])
20                    messages.append({
21                        "role": "tool",
22                        "name": tool_call["name"],
23                        "content": json.dumps(result)
24                    })
25            else:
26                return response["content"]
27        
28        return "Max iterations reached"

Level 2: Structured Output for Reliable Pipelines

JSON Schema Enforcement

Gemma 4 can be constrained to specific output formats, eliminating parsing failures:

python
1from pydantic import BaseModel
2from typing import List, Optional
3
4class TaskBreakdown(BaseModel):
5    steps: List[str]
6    estimated_duration_minutes: int
7    dependencies: List[str]
8    risks: List[str]
9
10class CodeReview(BaseModel):
11    issues: List[dict]  # [{"severity": "high", "line": 42, "description": "..."}]
12    suggestions: List[str]
13    security_concerns: Optional[List[str]]
14    overall_score: int  # 1-10
15
16# Force Gemma 4 to output valid JSON matching schema
17response = model.generate(
18    "Review this Python function for security issues...",
19    response_format={"type": "json_object", "schema": CodeReview.schema()}
20)
21
22# Guaranteed valid JSON, guaranteed correct types
23review = CodeReview.parse_raw(response)

State Machines with Structured Output

python
1class AgentState(BaseModel):
2    current_phase: str  # "research", "planning", "execution", "review"
3    completed_tasks: List[str]
4    pending_tasks: List[str]
5    context: dict
6    next_action: str
7
8# Agent that maintains explicit state
9class StateMachineAgent:
10    def execute(self, goal: str):
11        state = AgentState(
12            current_phase="research",
13            completed_tasks=[],
14            pending_tasks=[goal],
15            context={},
16            next_action="gather_requirements"
17        )
18        
19        while state.current_phase != "complete":
20            # Gemma 4 decides next state transition
21            state_json = model.generate(
22                f"Current state: {state.json()}\nDetermine next state and action",
23                response_format={"type": "json_object", "schema": AgentState.schema()}
24            )
25            state = AgentState.parse_raw(state_json)
26            
27            # Execute the action
28            self.tools[state.next_action](state.context)
29            
30            if len(state.pending_tasks) == 0:
31                state.current_phase = "complete"

Level 3: Multi-Step Autonomous Workflows

The ReAct Pattern (Reasoning + Acting)

python
1class ReActAgent:
2    """Reasoning and Acting agent with Gemma 4"""
3    
4    SYSTEM_PROMPT = """You are an autonomous agent. Solve tasks by following this loop:
5    1. THINK: Analyze the current state and plan your next action
6    2. ACT: Choose exactly one tool to use
7    3. OBSERVE: Process the result and decide if the task is complete
8    
9    Format your response as:
10    Thought: [your reasoning]
11    Action: [tool_name]([params])
12    """
13    
14    def solve(self, task: str, tools: dict):
15        history = [f"Task: {task}"]
16        
17        for step in range(20):  # Safety limit
18            prompt = self.SYSTEM_PROMPT + "\n\n" + "\n".join(history)
19            
20            response = self.model.generate(prompt)
21            
22            # Parse Thought and Action
23            thought = self._extract(response, "Thought:")
24            action = self._extract(response, "Action:")
25            
26            history.append(f"Thought: {thought}")
27            
28            if "finish" in action.lower():
29                return self._extract(response, "Final Answer:")
30            
31            # Execute tool
32            tool_name, params = self._parse_action(action)
33            result = tools[tool_name](**params)
34            
35            history.append(f"Action: {action}")
36            history.append(f"Observation: {result}")
37        
38        return "Maximum steps reached"

Agent with Self-Correction

python
1class SelfCorrectingAgent:
2    def execute_with_retry(self, task: str, max_retries: int = 3):
3        for attempt in range(max_retries):
4            try:
5                result = self.execute(task)
6                
7                # Self-evaluation
8                evaluation = self.model.generate(
9                    f"Task: {task}\nResult: {result}\n"
10                    f"Evaluate if this result is correct and complete."
11                    f"If not, explain what's wrong."
12                )
13                
14                if "correct" in evaluation.lower() and "complete" in evaluation.lower():
15                    return result
16                else:
17                    # Incorporate feedback and retry
18                    task += f"\nPrevious attempt had issues: {evaluation}"
19                    
20            except Exception as e:
21                if attempt == max_retries - 1:
22                    raise
23                task += f"\nPrevious error: {str(e)}"
24        
25        return result

Level 4: Multi-Agent Systems

Coordinator + Worker Pattern

python
1class CoordinatorAgent:
2    """Distributes tasks to specialized worker agents"""
3    
4    def __init__(self, workers: dict):
5        self.workers = workers
6        self.model = load_gemma_4_31b()
7    
8    def orchestrate(self, complex_task: str):
9        # Break down task into subtasks
10        plan = self.model.generate(
11            f"Break this complex task into subtasks: {complex_task}\n"
12            f"Available workers: {list(self.workers.keys())}",
13            response_format={"type": "json_object"}
14        )
15        
16        results = {}
17        for subtask in plan["subtasks"]:
18            worker = self.workers[subtask["worker"]]
19            
20            # Execute with context from previous subtasks
21            context = {k: results[k] for k in subtask.get("dependencies", [])}
22            result = worker.execute(subtask["description"], context)
23            results[subtask["id"]] = result
24        
25        # Synthesize final answer
26        return self.model.generate(
27            f"Synthesize these results into a coherent response: {results}"
28        )
29
30# Specialized workers
31class ResearchWorker:
32    """Searches and synthesizes information"""
33    def execute(self, query: str, context: dict):
34        search_results = self.tools["web_search"](query)
35        return self.model.generate(
36            f"Synthesize these search results: {search_results}"
37        )
38
39class CodeWorker:
40    """Writes and tests code"""
41    def execute(self, requirement: str, context: dict):
42        code = self.model.generate(
43            f"Write code for: {requirement}\nContext: {context}"
44        )
45        test_results = self.run_tests(code)
46        return {"code": code, "tests": test_results}

Debate Pattern (For High-Stakes Decisions)

python
1class DebateAgent:
2    """Multiple agents debate to reach consensus"""
3    
4    def debate(self, proposition: str, num_rounds: int = 3):
5        agents = [
6            {"name": "Advocate", "stance": "pro", "model": load_gemma_4_9b()},
7            {"name": "Skeptic", "stance": "con", "model": load_gemma_4_9b()},
8            {"name": "Synthesizer", "stance": "neutral", "model": load_gemma_4_27b()}
9        ]
10        
11        debate_log = []
12        
13        for round in range(num_rounds):
14            for agent in agents[:2]:  # Advocate and Skeptic
15                response = agent["model"].generate(
16                    f"You are {agent['name']}. Debate this proposition: {proposition}\n"
17                    f"Previous arguments: {debate_log}"
18                )
19                debate_log.append(f"{agent['name']}: {response}")
20            
21            # Synthesizer evaluates
22            evaluation = agents[2]["model"].generate(
23                f"Evaluate these arguments and identify the strongest points:\n"
24                f"{debate_log}"
25            )
26            debate_log.append(f"Synthesizer: {evaluation}")
27        
28        # Final judgment
29        return agents[2]["model"].generate(
30            f"Based on this debate, provide a final reasoned judgment:\n"
31            f"{debate_log}"
32        )

Production Patterns

Observability for Agents

python
1from opentelemetry import trace
2from dataclasses import dataclass
3import time
4
5@dataclass
6class AgentTrace:
7    agent_id: str
8    prompt: str
9    response: str
10    tool_calls: list
11    latency_ms: float
12    token_count: int
13    model_version: str
14
15tracer = trace.get_tracer("gemma4.agent")
16
17class ObservableAgent:
18    def execute(self, task: str):
19        with tracer.start_as_current_span("agent.execution") as span:
20            start = time.time()
21            
22            # Log task
23            span.set_attribute("task", task)
24            
25            # Execute
26            response = self.model.generate(task)
27            latency = (time.time() - start) * 1000
28            
29            # Record metrics
30            trace_data = AgentTrace(
31                agent_id=self.id,
32                prompt=task,
33                response=response,
34                tool_calls=self.extract_tool_calls(response),
35                latency_ms=latency,
36                token_count=len(self.tokenizer.encode(response)),
37                model_version="gemma-4-27b"
38            )
39            
40            # Send to monitoring
41            self.monitoring.record(trace_data)
42            
43            # Alert on anomalies
44            if latency > 5000:  # 5 seconds
45                self.alerts.send(f"Agent {self.id} slow: {latency}ms")
46            
47            return response

Error Recovery Strategies

python
1class ResilientAgent:
2    def execute_with_resilience(self, task: str):
3        strategies = [
4            self._normal_execution,
5            self._retry_with_simplified_prompt,
6            self._retry_with_context_window_management,
7            self._fallback_to_smaller_model,
8            self._human_escalation
9        ]
10        
11        for strategy in strategies:
12            try:
13                result = strategy(task)
14                if self._is_valid(result):
15                    return result
16            except Exception as e:
17                self.logger.warning(f"Strategy failed: {e}")
18                continue
19        
20        raise AgentFailureException("All strategies exhausted")
21    
22    def _retry_with_context_window_management(self, task: str):
23        """Reduce context when hitting token limits"""
24        max_tokens = self.model.config.max_position_embeddings
25        
26        while len(self.tokenizer.encode(task)) > max_tokens * 0.8:
27            # Summarize older context
28            task = self.model.generate(
29                f"Summarize this conversation concisely: {task}"
30            )
31        
32        return self.model.generate(task)

Performance Optimizations

Parallel Tool Execution

python
1import asyncio
2
3class ParallelAgent:
4    async def execute_parallel_tools(self, tool_calls: list):
5        """Execute independent tools concurrently"""
6        
7        # Group by dependency
8        independent = [t for t in tool_calls if not t.get("depends_on")]
9        dependent = [t for t in tool_calls if t.get("depends_on")]
10        
11        # Execute independent tools in parallel
12        tasks = [
13            self.tools[t["name"]](**t["params"]) 
14            for t in independent
15        ]
16        results = await asyncio.gather(*tasks)
17        
18        # Execute dependent tools sequentially
19        for tool_call in dependent:
20            dep_results = {k: results[v] for k, v in tool_call["depends_on"].items()}
21            result = self.tools[tool_call["name"]](**tool_call["params"], **dep_results)
22            results.append(result)
23        
24        return results

Caching Strategies

python
1from functools import lru_cache
2import hashlib
3
4class CachingAgent:
5    def __init__(self):
6        self.response_cache = {}
7        self.embedding_cache = {}
8    
9    def cached_generate(self, prompt: str, **kwargs):
10        """Cache responses for identical prompts"""
11        cache_key = hashlib.md5(f"{prompt}{str(kwargs)}".encode()).hexdigest()
12        
13        if cache_key in self.response_cache:
14            return self.response_cache[cache_key]
15        
16        response = self.model.generate(prompt, **kwargs)
17        self.response_cache[cache_key] = response
18        return response
19    
20    def semantic_cache(self, prompt: str, threshold: float = 0.95):
21        """Cache based on semantic similarity"""
22        prompt_embedding = self.get_embedding(prompt)
23        
24        for cached_prompt, cached_response in self.response_cache.items():
25            cached_embedding = self.embedding_cache.get(cached_prompt)
26            if cached_embedding:
27                similarity = cosine_similarity(prompt_embedding, cached_embedding)
28                if similarity > threshold:
29                    return cached_response
30        
31        response = self.model.generate(prompt)
32        self.response_cache[prompt] = response
33        self.embedding_cache[prompt] = prompt_embedding
34        return response

Real-World Example: Autonomous DevOps Agent

python
1class DevOpsAgent:
2    """Monitors infrastructure, detects issues, and fixes them autonomously"""
3    
4    def __init__(self):
5        self.model = load_gemma_4_27b()
6        self.tools = {
7            "get_metrics": prometheus_client.query,
8            "get_logs": elasticsearch_client.search,
9            "restart_service": kubectl.restart,
10            "scale_deployment": kubectl.scale,
11            "send_alert": pagerduty.trigger
12        }
13    
14    def run(self):
15        while True:
16            # 1. Collect observability data
17            metrics = self.tools["get_metrics"]("uptime, latency, error_rate")
18            logs = self.tools["get_logs"]("level=ERROR", limit=100)
19            
20            # 2. Analyze with Gemma 4
21            analysis = self.model.generate(
22                f"Analyze these metrics and logs for anomalies:\n"
23                f"Metrics: {metrics}\nLogs: {logs}",
24                response_format={"type": "json_object", "schema": Analysis.schema()}
25            )
26            
27            # 3. Decide and act
28            if analysis["severity"] == "critical":
29                for action in analysis["recommended_actions"]:
30                    self.tools[action["tool"]](**action["params"])
31                    
32                self.tools["send_alert"](
33                    summary=f"Auto-remediated: {analysis['issue']}",
34                    details=analysis
35                )
36            
37            time.sleep(60)  # 1-minute monitoring loop

The Bottom Line

Gemma 4's native agentic capabilities eliminate the "glue code" that made previous agent systems brittle:

No prompt engineering for tool use—schemas are first-class citizens
Reliable structured output—JSON that actually validates
Long context for state management—256K tokens for complex workflows
Fast enough for real-time agents—27B serves at 50+ tokens/sec

The future of AI isn't chatbots. It's autonomous systems that observe, reason, and act. Gemma 4 is the first open model that makes this practical at production scale.

Essa Mamdani is the creator of AutoBlogging.Pro and builds agentic systems that actually ship.

Follow: essa.mamdani.com | GitHub: @essamamdani

#AI#Gemma 4#Agents#Autonomous