If you are asking, "What are the best AI models to run with 8GB RAM and 4GB VRAM?", you are probably in the same place many local AI beginners start.
Maybe you are entering the homelab stage, or taking your first steps into running local AI models. But instead of a powerful AI workstation, the only hardware you have is an old gaming laptop collecting dust, a budget PC, or an old work machine you want to repurpose.
The good news is that you can still run useful local AI models on this kind of hardware.
The important thing is to stay realistic.
An 8GB RAM and 4GB VRAM machine is weak by modern local AI standards. You should not expect large models, heavy coding agents, huge context windows, or instant replies. RAM and VRAM have a major effect on how smoothly local models run. More memory usually means more room for larger models, longer prompts, and faster responses.
But that does not mean your machine is useless. With the right small models, careful settings, and realistic expectations, you can still use Ollama for general chat, study help, summaries, simple coding help, and local AI experimentation.
Keeping that in mind, here are good Ollama model starting points for 8GB RAM and 4GB VRAM.
Direct Answer
For a system with 8GB RAM and 4GB VRAM, good Ollama models to try first are:
| Model | Best For | Why Start Here |
|---|---|---|
| llama3.2:3b | General chat and first test | Small enough to test easily and more useful than tiny models |
| gemma3:4b | General use, summaries, study help | Good general-purpose starting point |
| qwen3:4b | Reasoning, coding-style prompts, structured answers | Useful if you want technical help on weak hardware |
| phi3.5 | Lightweight assistant tasks | Small model that can be practical on limited systems |
| llama3.2:1b | Very weak hardware or speed testing | Fast fallback, but limited quality |
These are good starting points, not guaranteed best models for every machine.
Performance depends on model size, quantization, context length, CPU, GPU, RAM, VRAM, operating system, drivers, and background apps.
If you are unsure, start with:

```bash
ollama run llama3.2:3b
```

Then test:

```bash
ollama run gemma3:4b
```

and:

```bash
ollama run qwen3:4b
```
What 8GB RAM and 4GB VRAM Can Realistically Run
A machine with 8GB RAM and 4GB VRAM can run local AI, but it is close to the lower end of comfortable usage.
You can realistically expect:
- small models to work better than large models
- 1B to 4B models to be the safest range
- some 7B models to run, but often slowly
- long prompts to slow things down
- large context windows to create memory pressure
- Open WebUI to add extra overhead
- Docker containers and browser tabs to reduce available RAM
- CPU-only fallback to be much slower than GPU-assisted inference
You should not expect this hardware to comfortably run:
- large 13B+ models
- heavy coding agents
- big document analysis workflows
- long-context chat sessions
- large vision models
- multiple AI services at the same time
If your machine has 8GB RAM and 4GB VRAM, start small.
A small model that runs smoothly is usually more useful than a larger model that technically runs but feels painful to use.
Recommended Models
1. llama3.2:3b
llama3.2:3b is a good first model to test because it is small, simple, and practical for basic local AI use.
Use it for:
- general chat
- simple explanations
- rewriting text
- summarizing short passages
- testing whether Ollama is working properly
Command:
```bash
ollama pull llama3.2:3b
ollama run llama3.2:3b
```
This is a good first checkpoint model. If this feels too slow, your system may be under memory pressure, or too many background apps may be open.
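If you want to confirm memory pressure instead of guessing, a quick check helps. A minimal sketch for Linux, assuming an NVIDIA GPU for the VRAM check (on Windows, use the Performance tab in Task Manager instead):

```bash
# Check free system RAM (Linux)
free -h

# Check VRAM usage, assuming an NVIDIA GPU with drivers installed
nvidia-smi
```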
2. gemma3:4b
gemma3:4b is a good general-purpose model to try after your first small-model test.
Use it for:
- study help
- explanations
- summaries
- brainstorming
- general assistant use
- beginner local AI experiments
Command:
```bash
ollama pull gemma3:4b
ollama run gemma3:4b
```
On 8GB RAM and 4GB VRAM, this may feel slower than a 1B or 3B model, but it can be a useful step up in quality.
3. qwen3:4b
qwen3:4b is a useful starting point if you want a model for reasoning, structured answers, or coding-style help.
Use it for:
- coding explanations
- small debugging questions
- technical questions
- structured answers
- reasoning-style prompts
Command:
```bash
ollama pull qwen3:4b
ollama run qwen3:4b
```
Do not expect it to replace a powerful cloud coding model. On weak hardware, it is better for smaller tasks like explaining code, reviewing short snippets, or helping you think through errors.
4. phi3.5
phi3.5 is another lightweight model worth testing.
Use it for:
- short answers
- basic reasoning
- lightweight assistant tasks
- simple explanations
- quick local experiments
Command:
```bash
ollama pull phi3.5
ollama run phi3.5
```
It is a good option if 4B models feel too heavy but 1B models feel too limited.
5. llama3.2:1b
llama3.2:1b is the fallback option for very weak systems.
Use it for:
- speed testing
- very basic chat
- checking Ollama setup
- extremely lightweight prompts
Command:
```bash
ollama pull llama3.2:1b
ollama run llama3.2:1b
```
This model is not ideal for serious reasoning or coding. Think of it as a "does my local AI setup work?" model.
Recommended Testing Order
If you are new to local AI, test models in this order:
1. llama3.2:1b
2. llama3.2:3b
3. phi3.5
4. gemma3:4b
5. qwen3:4b
Why this order? Because it moves from lighter models to heavier models. This helps you understand your machine's limits without immediately overwhelming it.
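If you want to make this comparison less manual, you can time the same one-off prompt against every model in the list. This is a rough sketch, not a proper benchmark; it assumes you have already pulled each model, and it relies on `ollama run` accepting a prompt as an argument and exiting after the answer:

```bash
#!/usr/bin/env bash
# Rough speed comparison: run the same prompt through each model in the
# recommended testing order and time the response.
for model in llama3.2:1b llama3.2:3b phi3.5 gemma3:4b qwen3:4b; do
  echo "=== $model ==="
  time ollama run "$model" "Explain what Ollama is in one sentence."
done
```

The first run of each model includes load time, so repeat a run if you want a fairer comparison.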
Models to Avoid on 8GB RAM and 4GB VRAM
Some models may technically run, but that does not mean they will be pleasant to use.
On 8GB RAM and 4GB VRAM, avoid starting with:
| Avoid | Why |
|---|---|
| 13B models | Usually too heavy for smooth use on this hardware |
| 30B+ models | Not realistic for this setup |
| Large coding models | Often require more memory and longer context |
| Large vision models | Image tasks can increase memory usage |
| Huge context windows | Can slow down or crash weak systems |
| Heavy agent workflows | Agents use more context, tools, and repeated calls |
| Running Ollama + Open WebUI + many Docker containers | 8GB RAM can get used up quickly |
"If the model downloads, my PC can run it well."
That is a common beginner mistake. Downloading a model only means you have the model file. It does not guarantee good performance.
Recommended Ollama Commands
Pull and run llama3.2:3b

```bash
ollama pull llama3.2:3b
ollama run llama3.2:3b
```

Pull and run gemma3:4b

```bash
ollama pull gemma3:4b
ollama run gemma3:4b
```

Pull and run qwen3:4b

```bash
ollama pull qwen3:4b
ollama run qwen3:4b
```

Pull and run phi3.5

```bash
ollama pull phi3.5
ollama run phi3.5
```

Pull and run llama3.2:1b

```bash
ollama pull llama3.2:1b
ollama run llama3.2:1b
```

List installed models

```bash
ollama list
```

See running models

```bash
ollama ps
```
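Show model details

`ollama show` prints details such as parameter count, quantization, and context length, which helps when judging whether a model fits your memory:

```bash
ollama show llama3.2:3b
```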
Remove a model

```bash
ollama rm model-name
```

Example:

```bash
ollama rm llama3.2:1b
```
Settings and Tips for Weak Hardware
1. Keep your prompts short at first
Before testing large prompts, start with simple ones:
- "Explain what Ollama is in simple terms."
- "Summarize this paragraph in 5 bullet points: [paste paragraph]"
- "Explain this error message: [paste error]"
Do not immediately paste huge files, logs, PDFs, or codebases.
2. Avoid large context windows
Many models advertise large context windows, but a large context window can use more memory and make weak hardware much slower. You can often lower the context window yourself, as shown in the sketch after the list below.
For 8GB RAM and 4GB VRAM:
- avoid long conversations at first
- avoid huge pasted documents
- clear the chat when testing performance
- use smaller prompts when possible
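Here is that sketch. In an interactive `ollama run` session, you can request a smaller context window with the `/set` command; the default value and exact behavior depend on your Ollama version, so treat the number below as an example:

```bash
# Start an interactive session
ollama run llama3.2:3b

# Then, inside the session, request a smaller context window:
# >>> /set parameter num_ctx 2048
```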
3. Close background apps
Before running local AI, close:
- extra browser tabs
- games
- video editors
- virtual machines
- unnecessary Docker containers
- other AI tools
- heavy IDE windows
On 8GB RAM, background apps matter a lot.
4. Test in terminal before Open WebUI
Open WebUI is useful, but it adds overhead.
If your system is weak, first test Ollama directly:
```bash
ollama run llama3.2:3b
```
Once you know the model works, then try connecting it to Open WebUI.
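If the terminal test works and you want to confirm that Open WebUI will be able to reach Ollama, you can also hit Ollama's HTTP API directly. A minimal sketch, assuming Ollama is listening on its default port 11434:

```bash
# Ask the local Ollama API for a short, non-streamed completion
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```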
5. Use one model at a time
Do not keep switching between many models while testing. Load one model, test it properly, then move to the next.
Use:

```bash
ollama ps
```

to check what is currently loaded.
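If a model you are done with is still loaded, recent Ollama versions can unload it explicitly without deleting it from disk (on older versions, idle models unload after a timeout on their own):

```bash
# Unload a model from memory; the files stay on disk
ollama stop llama3.2:3b
```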
6. Be patient on the first run
The first response can be slower because the model needs to load. Test a few prompts before deciding that a model is unusable.
Upgrade Advice
Best first upgrade: 16GB RAM
For most 8GB systems, the best first upgrade is usually moving to 16GB RAM.
This helps with:
- running Ollama
- using Open WebUI
- keeping browser tabs open
- running Docker
- avoiding memory pressure
- using your laptop normally while testing AI
Better GPU: more VRAM
A 4GB GPU can be useful for small models, but it is limiting. If you are buying hardware specifically for local AI, more VRAM gives you more room for larger models and smoother performance.
SSD upgrade
If your old machine still uses a hard drive, upgrade to an SSD. It will not make the AI model smarter, but it will make the whole system feel much better.
Do not upgrade blindly
Before spending money, test small models first. You may find that a 3B or 4B model is enough for your actual use case.
FAQ
Can I run Ollama on 8GB RAM?
Yes. You can run Ollama on 8GB RAM, but you should start with small models. 1B, 3B, and some 4B models are the safest starting points.
Is 4GB VRAM enough for Ollama?
4GB VRAM is enough to experiment with small local AI models. It is not ideal for large models, long-context use, large coding models, or heavy AI agents.
What is the best Ollama model for 8GB RAM?
There is no single best model for every 8GB system. Good starting points include llama3.2:3b, gemma3:4b, qwen3:4b, phi3.5, and llama3.2:1b.
Can I run 7B models on 8GB RAM?
Sometimes, depending on quantization, operating system, available memory, and background apps. But for beginners, 7B models may feel slow or frustrating on 8GB RAM.
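If you do want to experiment with a 7B-class model anyway, a heavier quantization gives it the best chance of fitting in 8GB RAM. Treat the tag below as an illustration only; quantization tags vary by model, so confirm the exact tag on the model's Ollama library page before pulling:

```bash
# Hypothetical example: a more aggressively quantized 7B-class model.
# Check the model's page on the Ollama library for the tags that actually exist.
ollama pull llama3.1:8b-instruct-q4_0
```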
Which model should I try first?
Start with:

```bash
ollama run llama3.2:3b
```

If that works well, try:

```bash
ollama run gemma3:4b
```

or:

```bash
ollama run qwen3:4b
```
Which model is best for coding on 8GB RAM and 4GB VRAM?
Try qwen3:4b as a starting point. It can help with small coding questions and explanations, but do not expect it to replace stronger cloud coding models.
Why is Ollama slow on my laptop?
Common reasons include:
- the model is too large
- RAM is full
- context length is too high
- too many apps are open
- the GPU has limited VRAM
- the model is running mostly on CPU (see the check after this list)
- your system is swapping to disk
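To run that last check, look at what Ollama reports for the loaded model; in recent versions, the `ollama ps` output includes a processor column showing whether the model is on CPU, GPU, or split between both:

```bash
# Shows loaded models and how they are placed across CPU and GPU
ollama ps
```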
Should I use Open WebUI with 8GB RAM?
You can, but test Ollama in the terminal first. Open WebUI is useful, but it adds extra overhead. If your system is struggling, keep your setup simple.
Try the Local AI Model Recommender
Not sure which model fits your exact machine?
Enter your RAM, VRAM, operating system, use case, and priority. The tool gives you model suggestions, Ollama commands, warnings, upgrade advice, setup steps, a beginner checklist, and shareable result links.
Try the Recommender

Related Guides
If you are still checking whether your laptop can run local AI at all, read: Can my laptop run local AI?
If you are comparing 8GB, 16GB, and 32GB systems, read: How much RAM do you need for Ollama?
If you are upgrading to 16GB RAM, read: Best Ollama models for 16GB RAM.
If you want a GPU-focused 4GB VRAM model shortlist, read: Best local AI models for 4GB VRAM.
If you want a coding-focused model shortlist for weak hardware, read: Best Ollama models for coding on low-end PCs .
Disclaimer
These recommendations are estimates, not benchmarks.
Local AI performance depends on your exact hardware, model quantization, context length, operating system, drivers, background apps, CPU, GPU, RAM, and VRAM.
Use this page as a starting point, then test models yourself.