SuperGemma4-26B-Uncensored: The Local LLM That Fixed Google's Gemma 4
> A deep technical dive into SuperGemma4-26B-Uncensored by Jiunsong: how it fixes Gemma 4's tokenizer bugs, beats the stock model on benchmarks, and became one of the best local agent models for Apple Silicon.
In the crowded landscape of local Large Language Models, most releases promise one of three things: speed, capability, or uncensored behavior. Rarely do you find a model that delivers all three without falling apart. Enter SuperGemma4-26B-Uncensored, a community fine-tune by the Korean developer Jiunsong (@songjunkr) that has quietly become one of the most impressive local AI releases of April 2026.
Built on top of Google's Gemma 4 26B-A4B Mixture-of-Experts (MoE) architecture, SuperGemma4 does not just remove censorship filters. It fundamentally improves the model's practical capabilities, fixes critical tokenizer and tool-call bugs present in the stock release, and optimizes it specifically for Apple Silicon (MLX) and cross-platform GGUF inference.
Here is the complete deep dive into what makes SuperGemma4 special, who built it, and why it matters for the future of local AI.
The Base: Google Gemma 4 26B-A4B
Before understanding SuperGemma4, you need to understand the foundation. Google's Gemma 4 26B-A4B is a Mixture-of-Experts (MoE) model released in April 2026 under the Apache 2.0 license. Despite having 25.2 billion total parameters, it only activates approximately 3.8 billion parameters per token during inference. This is what "A4B" stands for: ~4 billion active parameters.
MoE Architecture Breakdown
- Total Experts: 128 (plus one shared expert)
- Active Experts per Token: 8
- Context Window: Up to 256K tokens
- License: Apache 2.0
This sparse architecture is why Gemma 4 26B has generated so much interest among local AI enthusiasts: it delivers quality approaching a ~31B dense model at a fraction of the compute cost. However, the stock instruction-tuned version (google/gemma-4-26B-A4B-it) ships with aggressive safety filters, occasional tokenizer jank, and a chat template that sometimes forces general prompts into coding or tool-call behavior.
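To make the sparse-activation math concrete, here is a toy top-k routing sketch in Python. It is purely illustrative (random weights, a tiny hidden size of 64), not Gemma 4's actual routing code; only the 128-expert / 8-active shape is taken from the spec above.

```python
import numpy as np

# Toy top-k MoE routing: 128 experts, 8 active per token.
# Illustrative only -- not Gemma 4's real implementation.
NUM_EXPERTS, TOP_K, HIDDEN = 128, 8, 64

rng = np.random.default_rng(0)
router_weights = rng.standard_normal((HIDDEN, NUM_EXPERTS))
token = rng.standard_normal(HIDDEN)

# The router scores every expert, but only the top-k are executed.
logits = token @ router_weights
top_k_idx = np.argsort(logits)[-TOP_K:]

# Softmax over the selected experts' logits gives the mixing weights.
weights = np.exp(logits[top_k_idx] - logits[top_k_idx].max())
weights /= weights.sum()

# Each "expert" here is a random linear layer; the output is the weighted
# sum of the 8 active experts (Gemma 4 additionally adds a shared expert
# that is always active).
experts = rng.standard_normal((NUM_EXPERTS, HIDDEN, HIDDEN))
output = sum(w * (token @ experts[i]) for w, i in zip(weights, top_k_idx))
print(output.shape)  # (64,)
```

The point is that all 128 experts are scored, but only 8 expert matmuls actually run per token, which is where the "A4B" compute savings come from.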
Who Is Jiunsong?
Jiunsong (real name: Jun Song, handle @songjunkr) is a Korean AI developer and fine-tuner who has rapidly gained a cult following in the local LLM community. While many uncensored models are produced by anonymous teams or automated pipelines, Jiunsong takes a craftsman-like approach. He manually verifies each release, runs local benchmarks, and iterates based on community feedback.
His SuperGemma4 line is not a one-off experiment. It is the result of multiple iterations, with the "Fast v2" release representing a mature, production-ready build.
What Makes SuperGemma4 Different?
Most "uncensored" models simply apply abliteration techniques to strip refusal patterns. The result is often a model that says "yes" to everything but collapses into broken outputs, loses coding ability, or becomes unstable on tool-use prompts. SuperGemma4 takes a different path.
1. Zero Refusals Without Capability Loss
Jiunsong claims 0/100 refusals on standard refusal tests. But unlike typical uncensored models, SuperGemma4 does not sacrifice practical capability. In fact, it exceeds the stock model on nearly every benchmark category.
2. Fixes the Stock Gemma 4 Tool-Call & Tokenizer Bugs
One of the most annoying issues with the base gemma-4-26B-A4B-it is its chat template. Simple questions like "What is the weather today?" can sometimes be routed into coding mode or tool-call behavior. SuperGemma4 embeds a neutral chat template directly into the GGUF and MLX files, eliminating this prompt-routing bug entirely.
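If you want to sanity-check the template fix yourself, one option is to render the chat template with transformers and confirm that a plain question is not wrapped in tool-call scaffolding. This is a hedged sketch: it assumes the MLX repo from this article ships standard tokenizer files, which mlx-lm conversions normally do.

```python
from transformers import AutoTokenizer

# Render the embedded chat template for a plain question and inspect it.
# Repo ID taken from this article; adjust if the upload layout differs.
tok = AutoTokenizer.from_pretrained(
    "Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2"
)
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "What is the weather today?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(rendered)
# With the neutral template, this should be a plain user turn with no
# tool-call or code-mode scaffolding injected around the question.
```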
3. The "Fast" Line: Speed + Quality
The "Fast" suffix is not marketing fluff. Jiunsong derived SuperGemma4 from a verified "Fast" weight line that outperforms the original local baseline in both throughput and quality.
Quick Bench Overall Score:
- Stock Gemma 4 26B IT (4-bit): 91.4
- SuperGemma4 Fast: 95.8 (+4.4)
Average Generation Speed (MLX, same machine):
- Stock Gemma 4 26B IT: 42.5 tok/s
- SuperGemma4 Fast: 46.2 tok/s (+8.7%)
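Numbers like these are easy to sanity-check at home with the mlx_lm Python API. A rough timing sketch follows (model ID from this article; generate's keyword arguments can vary slightly between mlx-lm versions):

```python
import time
from mlx_lm import load, generate

# Rough tokens-per-second check on Apple Silicon.
# Requires `pip install mlx-lm`; model ID from this article.
model, tokenizer = load("Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2")

prompt = "Explain mixture-of-experts routing in two sentences."
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
elapsed = time.perf_counter() - start

# Count generated tokens to estimate throughput (includes prompt
# processing time, so this slightly understates pure generation speed).
n_tokens = len(tokenizer.encode(text))
print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")
```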
4. Category-by-Category Improvements
| Category | Stock Gemma 4 | SuperGemma4 Fast | Delta |
|---|---|---|---|
| Code | 92.3 | 98.6 | +6.3 |
| Browser Workflows | 87.5 | 89.6 | +2.1 |
| Logic | 86.9 | 95.2 | +8.3 |
| System Design | 97.8 | 98.9 | +1.1 |
| Korean | 90.7 | 95.0 | +4.3 |
These are not marginal gains. A +6.3 improvement in code generation and a +8.3 jump in logic reasoning put SuperGemma4 in a completely different league for local agent workloads.
Two Formats, One Identity
Jiunsong releases SuperGemma4 in two primary formats, both sharing the same underlying weights and behavior:
1. MLX 4-bit (Apple Silicon Optimized)
- Model: `Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2`
- Size: ~13GB
- Best For: MacBook Pro, Mac Studio, any Apple Silicon device
- Launch Command:
```bash
mlx_lm.server \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --port 8080
```
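mlx_lm.server exposes an OpenAI-compatible HTTP API, so once the server is up you can hit it with any OpenAI-style client. A minimal example with requests, using the port from the launch command above:

```python
import requests

# Query the local mlx_lm.server via its OpenAI-compatible chat endpoint.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "What is the weather today?"}
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```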
2. GGUF Q4_K_M (Cross-Platform)
- Model: `Jiunsong/supergemma4-26b-uncensored-gguf-v2`
- File: `supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf`
- Best For: llama.cpp, Ollama, Windows/Linux local inference
- Verified On: Apple M4 Max with llama.cpp
Real-World Performance (GGUF on M4 Max):
- General Korean prompt processing: 222.0 tok/s
- Code prompt processing: 704.9 tok/s
- Generation speed: 89.4 tok/s
The "Uncensored" Debate: Done Right
The term "uncensored" often carries negative connotations in the AI community, and for good reason. Many uncensored models are sloppily produced, losing coherence, safety context, or alignment in the process. SuperGemma4 is a case study in how to do it right.
Jiunsong's approach preserves the model's ability to refuse genuinely harmful requests through natural reasoning, rather than through hardcoded corporate policy blocks. The result is an AI assistant that feels less like a customer service bot and more like a competent, opinionated collaborator.
As one community member put it: "A simple guy at home can create a better model than Mythos?"
Community Reception & Cultural Impact
SuperGemma4 has become a breakout hit on HuggingFace, consistently ranking in the top trending models. It has sparked discussions across Reddit (r/LocalLLaMA, r/LocalLLM), Hacker News, and Korean AI communities.
Key community observations:
- "Gemma 4 26b is the perfect all-around local model." — r/LocalLLaMA user
- "Tested gemma4:26b vs qwen3:30b on my local RTX 4090 for real document workflow. Gemma won." — r/LocalLLM user
- "SuperGemma4 just casually destroyed every model on our leaderboard except Opus 4.6 and GPT-5.2." — Community benchmarker
The model's popularity also highlights a broader shift in the AI landscape: the best models are no longer coming exclusively from trillion-dollar labs. Individual developers with deep expertise, consumer GPUs, and rigorous evaluation pipelines can now produce competitive, sometimes superior, alternatives.
How to Run It Locally
MLX (macOS)
```bash
mlx_lm.generate \
  --model Jiunsong/supergemma4-26b-uncensored-mlx-4bit-v2 \
  --prompt "Write a Python function that returns prime numbers up to n." \
  --max-tokens 512
```
GGUF (llama.cpp / Ollama)
Download the Q4_K_M file from HuggingFace and load it into your preferred llama.cpp frontend. The neutral template embedded in the GGUF is detected automatically; do not manually pass a --chat-template path, as overriding it can corrupt responses.
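A minimal sketch of that workflow with llama-cpp-python (the file name is the one listed above; `n_gpu_layers=-1` offloads all layers to Metal or CUDA where available). Note that no chat template is passed in, so recent llama-cpp-python versions use the one embedded in the GGUF metadata:

```python
from llama_cpp import Llama

# Load the local quantized GGUF; file name from this article.
# No chat_format/template is specified, so the library reads the
# neutral template embedded in the GGUF metadata.
llm = Llama(
    model_path="supergemma4-26b-uncensored-fast-v2-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload everything to GPU/Metal if available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is the weather today?"}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```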
Conclusion: The New Standard for Local Agents?
SuperGemma4-26B-Uncensored is more than a censorship removal. It is a comprehensive upgrade to Google's already impressive Gemma 4 26B-A4B base. By fixing tokenizer bugs, improving tool-call behavior, boosting speed, and raising benchmark scores across the board, Jiunsong has created what many in the community consider the definitive local build of Gemma 4.
If you are building local AI agents, running offline coding assistants, or simply want the best open-weight model for Apple Silicon without corporate safety filters getting in the way, SuperGemma4 is currently the model to beat.