
How to Run OpenAI’s GPT-oss Locally (for Free!) Using LM Studio

by Nick Smith

OpenAI just did something rare. They released two high-powered, open-weight language models (GPT-oss-20b and GPT-oss-120b) that you can run for free on your own computer, locally, without any subscriptions. No internet required after the initial downloads, and no chat limits either.

And with LM Studio, the whole process takes about 15 minutes. We already showed you how to run DeepSeek on it for free. You did that already, too… right?

At first, I legitimately thought the name of it was GPT-ass, which, honestly, wouldn’t have been too far off from a stupid standpoint, given their current nonsensical, engineer-riddled naming convention. Instead, it’s GPT-oss, just one letter off, and it stands for “Open-Source Systems.”

Let’s break it all down: what GPT-oss is, why it matters, what kind of hardware you need, and how to get it running today.

What is GPT-oss from OpenAI?

There are two models in the GPT-oss family (so far):

  • GPT-oss-20b (21 billion parameters)
  • GPT-oss-120b (117 billion parameters)

Both are released under the Apache 2.0 license and designed to work on real-world hardware. You don’t need a high-end datacenter setup. However, the 120b model does need an 80GB GPU to run, so 99% of us are out of luck there, especially my degenerate staff running Windows 95.

But GPT-oss-20b? That one can run on a laptop with 16GB of memory. Even something like a MacBook Air M2.

The smaller model performs about as well as OpenAI’s o3-mini on reasoning tasks. Not bad for something you can run in your bedroom or on the plane.

Why Should You Care?

Running GPT-oss (or any LLM, for that matter) locally gives you:

  • Total privacy. Nothing gets sent to the cloud.
  • No usage limits or API caps.
  • Full control over responses.
  • And it costs nothing once it’s installed.

The GPT-oss models can follow instructions, use tools like a Python interpreter or a web search plugin, and handle structured output. They're built for integration. Developers can also fine-tune them or add safety guardrails if needed.
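"Built for integration" mostly means plain HTTP: LM Studio can expose a local OpenAI-compatible server (by default at http://localhost:1234/v1), so any language that can POST JSON can talk to the model. Here's a minimal sketch using only Python's standard library; the port is LM Studio's default, and the model identifier in the comment is an example that depends on which build you downloaded:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default local server address

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }

def ask(model: str, prompt: str) -> str:
    """POST a prompt to the local server and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires LM Studio's server to be running with a model loaded, e.g.:
# print(ask("openai/gpt-oss-20b", "Summarize GGUF quantization in one sentence."))
```

Because the endpoint mimics OpenAI's API shape, existing OpenAI client libraries also work if you point their base URL at the local server.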

These models aren’t just for chatting. You can use them for scripting tasks, summarizing documents, building bots, and even running agents.

Minimum Hardware Requirements for GPT-oss

Let’s keep it simple.

For GPT-oss-20b:

  • 16 GB RAM or VRAM minimum
  • Compatible with newer GPUs from NVIDIA, AMD, and Apple Silicon
  • Can run on CPU-only machines if you’re patient

For GPT-oss-120b:

  • You’ll need 60–80 GB of VRAM
  • Built for workstations or multi-GPU rigs
  • Requires hardware like the NVIDIA H100 or AMD MI300

As we already said, for the average person, GPT-oss-20b is the way to go. It fits within the limits of a good laptop or desktop, especially if you use a quantized version.
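The memory math behind that advice is simple: a model's weight footprint is roughly parameter count times bytes per parameter, so dropping from 16-bit weights to a ~4-bit quantization cuts the requirement by about 4x. A quick sketch (these are ballpark figures; real files add overhead for the KV cache, metadata, and context):

```python
# Rough weight-memory estimate: parameters * bits per parameter / 8.
# Treat the results as ballpark figures, not exact file sizes.

def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Approximate weight memory in decimal gigabytes."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"gpt-oss-20b @ {label}: ~{weight_gb(21, bits):.1f} GB")
# It's the 4-bit quantization that brings the 21-billion-parameter
# model under the 16 GB line quoted above.
```

Run the same function with 117 billion parameters and you'll see why the 120b model lands in multi-GPU territory even when quantized.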

What Is LM Studio?

LM Studio is a desktop app that lets you run open-source language models with almost no setup.

It has a clean interface, supports many model formats, and runs on Windows, macOS, and Linux. Under the hood, it uses llama.cpp to handle the local inference.

You just download the model and start chatting. No coding required.

Get it at lmstudio.ai.

How To Run GPT-oss Locally in LM Studio (Step-by-Step)

1. Download LM Studio

Go to lmstudio.ai and install it.

2. Launch the App

Open it up. Update it (if it’s outdated). You’ll see tabs for models, chat, settings, etc.

3. Search for GPT-oss-20b

Head to the “Models” section. Search for gpt-oss-20b.

If you see gpt-oss-120b and you've got a monster rig, go for it. But most people should stick with 20b.

4. Pick a Quantized GGUF Model (Optional but Recommended)

You’ll see different versions of the model. Pick a quantized GGUF build, such as a 4-bit variant. Quantized versions use less memory and run faster, with only a small hit to quality.

5. Download the Model

Click “Download.” The model weighs in at several gigabytes, so this may take a while depending on your connection.

6. Start Chatting

Once it’s downloaded, go to the “Chat” tab, select your model at the top, and start typing.

That’s it. You’re running GPT-oss on your own computer, fully offline.

What About Hallucinations?

OpenAI intentionally left the chain of thought (CoT) unfiltered during training. That makes the model easier to monitor, but it also means you’ll see more hallucinations in the raw reasoning.

In other words, gpt-oss sometimes makes stuff up more than other AI models. This is normal with open models, especially smaller ones.

OpenAI recommends filtering or summarizing CoT outputs before showing them to users. But for your own projects, you’re in control of how you handle that.

The trade-off? You get transparency. You can actually see how the model is reasoning, even if it occasionally goes off the rails.

As always, fact-check any important information.

Real Use Cases

People are already using GPT-oss locally for:

  • Writing code
  • Brainstorming
  • Taxes
  • Therapy
  • Medical advice
  • Sensitive business information or data
  • And more

Now you get to join them. Because of us. Yeah… just give us the credit, okay?

Wrapping It Up

Running GPT-oss locally on your computer is easier than ever. The 20b model delivers solid performance on consumer hardware, and LM Studio makes the setup painless.

It’s a practical, powerful tool for anyone who wants local AI without the red tape or cost.

Have you tried it yet? Let us know how it went in the comments below. Also, let us know if we missed anything important. Sometimes that happens. Not really, though.

Until next time, remember to run the prompts and prompt the planet.

Tired of AI filters and data-harvesting in tools like ChatGPT? Try Venice today, built for more creative freedom and privacy. Get 20% off Venice Pro for a limited time with promo code RUNTHE20. Disclosure: This is an affiliate link, and I may earn a commission if you purchase.


4 comments

Lo August 8, 2025 - 2:43 pm

The model doesn’t load in llm

```
🥲 Failed to load the model

Error loading model.

(Exit code: 18446744072635810000). Unknown error. Try a different model and/or config.
```

Ryzen 7950x3d
64 Gb
Radeon RX 7900 XTX

Lo August 8, 2025 - 2:45 pm

I meant in lm studio

Lo August 8, 2025 - 3:10 pm

Found a solution.
To get it to load i had to select the Vulkan llama.cpp runtime.

Nick Smith August 9, 2025 - 10:27 am

I’m glad it worked

