No API Needed, Generate Images on Mobile and Computer – 1-bit Diffusion Model Shrinks 7.75GB to 0.93GB

Published 2026-07-02 10:41:46

The 7.75GB image generation model can't run on your phone.

The 0.93GB one can.

PrismML's newly released Bonsai Image 4B quantizes the diffusion transformer weights of FLUX.2 Klein 4B from 16-bit floating point down to just two values, {-1, +1}. The transformer shrinks by 8.3× (7.75GB to 0.93GB) while retaining about 88% of image quality.

On an iPhone 17 Pro Max it generates an image in 9.4 seconds; on a Mac with an M4 Pro, 6 seconds. Open the Hugging Face demo in a browser and inference runs locally, with no registration and no API key.

It is open source under Apache 2.0, and free.

This is not "a little smaller" — it's a qualitative leap from "must have a GPU with large VRAM" to "runs on a phone."

What Exactly Did 1-bit Quantization Do?

The conclusion first: PrismML did not retrain the model; it applied extreme compression to the weights of an existing one.

The transformer weights of FLUX.2 Klein 4B were originally stored in FP16 (16-bit floating point), each weight taking 2 bytes. Bonsai Image 4B quantizes these weights to only two values {-1, +1} (binary version) or three values {-1, 0, +1} (ternary version), plus a set of FP16 scaling factors to compensate for precision loss.
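
To make the idea concrete, here is a minimal sketch of group-wise binary and ternary quantization in plain Python. It is not PrismML's actual method, which the announcement does not spell out here: the mean-absolute-value scale, the tiny group size, and the `threshold_ratio` parameter are all illustrative assumptions.

```python
def quantize_binary(weights, group_size=4):
    """Map each weight to -1 or +1 and keep one scale per group.

    Assumption: the scale is the mean absolute value of the group.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scales.append(sum(abs(w) for w in group) / len(group))
        codes.extend(1 if w >= 0 else -1 for w in group)
    return codes, scales


def quantize_ternary(weights, group_size=4, threshold_ratio=0.5):
    """Map each weight to -1, 0, or +1 and keep one scale per group.

    Assumption: weights whose magnitude is at most
    threshold_ratio * mean(|w|) are set to 0 (hypothetical rule).
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        mean_abs = sum(abs(w) for w in group) / len(group)
        threshold = threshold_ratio * mean_abs
        group_codes = [
            0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in group
        ]
        # The scale is averaged over the weights that were not zeroed out.
        kept = [abs(w) for w, c in zip(group, group_codes) if c != 0]
        scales.append(sum(kept) / len(kept) if kept else 0.0)
        codes.extend(group_codes)
    return codes, scales


def dequantize(codes, scales, group_size=4):
    """Rebuild approximate weights: code times the scale of its group."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    weights = [0.5, -0.25, 0.05, -1.0]
    for name, quantize in (("binary", quantize_binary), ("ternary", quantize_ternary)):
        codes, scales = quantize(weights)
        print(name, codes, dequantize(codes, scales))
```

In this toy example the ternary variant can send the small weight 0.05 to exactly 0 instead of forcing it to a full-magnitude +1, which is the intuition behind the extra state.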

Comparison of the two variants:

Variant	Transformer Size	Compression Ratio	Quality Retention
1-bit Bonsai Image 4B	0.93 GB	8.3×	~88%
Ternary Bonsai Image 4B	1.21 GB	6.4×	~95%
FLUX.2 Klein 4B (original)	7.75 GB	1×	100%
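
The compression ratios in the table follow directly from the listed sizes. A quick check, using only the numbers above (the variable and function names are my own):

```python
ORIGINAL_GB = 7.75  # FLUX.2 Klein 4B transformer size from the table

# Quantized transformer sizes from the table, in GB.
SIZES_GB = {"1-bit": 0.93, "ternary": 1.21}


def compression_ratio(original_gb, quantized_gb):
    """How many times smaller the quantized transformer is."""
    return original_gb / quantized_gb


ratios = {
    name: round(compression_ratio(ORIGINAL_GB, size), 1)
    for name, size in SIZES_GB.items()
}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name}: {ratio}x smaller")
```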

Including the text encoder and VAE, the complete deployment package on Apple Silicon for the 1-bit version is only 3.42GB — the original FLUX.2 requires nearly 16GB.

The 1-bit version stores an effective 1.125 bits per weight, the ternary version 1.71 bits per weight. The extra 0 state adds expressiveness, which brings image quality closer to the original.
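
One reading that reproduces both bits-per-weight figures: each weight costs log2(levels) bits, plus one FP16 scale shared by a group of weights. The group size of 128 below is my assumption, chosen because it matches the published numbers; the article itself does not state it.

```python
import math


def bits_per_weight(num_levels, scale_bits=16, group_size=128):
    """Effective storage cost per weight.

    num_levels: 2 for binary {-1, +1}, 3 for ternary {-1, 0, +1}.
    scale_bits: size of each scaling factor (FP16 per the article).
    group_size: weights sharing one scale (assumed, not from the source).
    """
    return math.log2(num_levels) + scale_bits / group_size


if __name__ == "__main__":
    print("binary :", round(bits_per_weight(2), 3))
    print("ternary:", round(bits_per_weight(3), 2))
```

Under these assumptions the function gives 1.125 for two levels and about 1.71 for three, in line with the figures above.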

PrismML conducted evaluations on three benchmarks — GenEval (object composition and attribute binding), HPSv3 (human preference and aesthetic quality), and DPG-Bench (dense prompt following). The ternary version retained 88%, 95%, and 99.8% of FLUX.2's performance respectively, with an overall 95% quality retention rate.

In other words, without a side-by-side comparison the difference is hard to spot.

Three Steps to Run

Bonsai Image's GitHub demo supports macOS, Linux, and Windows (no WSL2 required).

Step 1: Clone and install.

macOS / Linux:

git clone https://github.com/PrismML-Eng/Bonsai-image-demo.git
cd Bonsai-image-demo
./setup.sh

Windows (PowerShell):

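git clone https://github.com/PrismML-Eng/Bonsai-image-demo.git
cd Bonsai-image-demo
Set-ExecutionPolicy -Scope CurrentUser RemoteSigned  # allow local scripts to run
.\setup.ps1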

The setup script automatically pulls model weights — macOS uses MLX format, Linux/Windows use Gemlite format.

Step 2: Choose model version.

# Recommended ternary version (better quality)
./scripts/download_model.sh

# For smallest 1-bit version
./scripts/download_model.sh binary

Step 3: Launch.

Launch the Web Studio (FastAPI + Next.js) with one command:

./scripts/serve.sh
# Open browser to localhost:3000

Or generate directly from command line:

./scripts/generate.sh -p "An icy bonsai tree in a rainy forest, photo realistic." --size 1024x1024 --seed 9909

The default size is 512×512, which suits quick previews; use 1024×1024 for final output. Both dimensions must be multiples of 32.
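
If you script the generator, it helps to validate sizes before launching a run. Below is a small sketch that builds the argument list for `generate.sh`, using only the flags shown above (`-p`, `--size`, `--seed`). The helper names `snap_to_32` and `build_generate_cmd` are hypothetical and not part of the repository.

```python
def snap_to_32(value):
    """Round a dimension to the nearest multiple of 32 (minimum 32)."""
    return max(32, (value + 16) // 32 * 32)


def build_generate_cmd(prompt, width, height, seed,
                       script="./scripts/generate.sh"):
    """Return the argv list for one generate.sh run.

    Raises ValueError when a dimension is not a positive multiple of 32,
    mirroring the constraint stated in the article.
    """
    for name, value in (("width", width), ("height", height)):
        if value <= 0 or value % 32 != 0:
            raise ValueError(
                f"{name}={value} is not a multiple of 32; "
                f"try {snap_to_32(value)}"
            )
    return [script, "-p", prompt, "--size", f"{width}x{height}",
            "--seed", str(seed)]


if __name__ == "__main__":
    prompt = "An icy bonsai tree in a rainy forest, photo realistic."
    # Preview the same prompt with a few seeds at the quick 512x512 size.
    for seed in (1, 2, 3):
        print(build_generate_cmd(prompt, 512, 512, seed))
```

To actually run a command, pass the returned list to `subprocess.run(cmd, check=True)` from the repository root.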

Windows note: make sure the NVIDIA driver is up to date, and install the Visual C++ Redistributable (vcredist). 1024×1024 may run out of memory (OOM) on GPUs with less than 4GB of VRAM; if it does, drop to 512×512.

Don't want to deploy it yourself? There is a WebGPU demo on Hugging Face: open it in a browser and it runs locally. iPhone users can also download the Bonsai Studio app.

Three Scenarios, Three Qualitative Leaps

📱 Local image generation on mobile.

Previously, FLUX.2 Klein 4B could not run on an iPhone because there was not enough memory. The 1-bit Bonsai Image generates a 512×512 image in 9.4 seconds using only 1.5GB of active memory. PrismML also released an iOS app, Bonsai Studio, so you can try it directly on an iPhone.

🌐 WebGPU image generation in browser.

Open the Hugging Face demo page, enter a prompt, and the image is generated locally in the browser. No registration, no API key, no queue. Your data stays on your device and is never uploaded to a server.

🖥️ Low-cost server deployment.

Previously, batch generation might require multiple A100s. Now a single consumer-grade GPU or even a CUDA iGPU can run it. Deployment costs drop dramatically.

Image generation is inherently iterative — you don't just generate one image; you tweak prompts, change seeds, compare results. Local inference transforms this loop from "waiting for the server" to "instant feedback," fundamentally changing the creative experience.

PrismML included this statement in their announcement, which I think captures the model's value:

"Cloud APIs will continue to be the right choice for many products. But cloud-only generation imposes certain product constraints: every prompt is a remote request, every iteration carries marginal serving cost, and every interaction adds round-trip latency."

In other words: cloud APIs have their place, but if every prompt change means waiting for a server, paying a fee, and absorbing latency, creative flow is disrupted. Local inference lets you experiment freely at zero marginal cost.

Bonsai Image 4B is for people who want to run image generation themselves without buying an A100 or waiting in API queues. Giving up roughly 12% of quality (from 100% down to about 88%) buys the jump from "can't run" to "generates in seconds on the device," which is a worthwhile trade for that audience.

The "DeepSeek Moment" for Image Generation

My first reaction to Bonsai Image 4B reminded me of DeepSeek.

Not technically similar, but similar in paradigm.

DeepSeek proved language models can achieve near GPT-4 performance with far less compute. Bonsai Image does something analogous in image generation: proving a diffusion model compressed under 1GB can approach the quality of a 7.75GB full model.

And like DeepSeek, it is open: Bonsai Image ships under Apache 2.0.

The idea of 1-bit diffusion models has circulated in academia for some time, but Bonsai Image 4B is, as far as I can tell, the first to deliver it at product-grade quality. 9.4 seconds, 1.5GB of memory, on a phone: taken together, these numbers describe a shipped product, not just a published paper.

Image generation is transitioning from "cloud privilege" to "local standard."

Reference Links:

PrismML Official Announcement: https://prismml.com/news/bonsai-image-4b

WebGPU Browser Demo: https://huggingface.co/spaces/webml-community/bonsai-image-webgpu

GitHub: https://github.com/PrismML-Eng/Bonsai-image-demo

Found this useful? Share it with that friend who's still waiting in the API queue.
