I'm honestly tempted to buy an M4 Apple Silicon computer, mainly for its ability to run local LLM models with unified RAM.
Overall I think they are too expensive for the offering, but being able to play around with LLMs without shelling out RTX 5090 kinds of money is tipping the balance.
I wonder what Apple people's experiences have been?

Comments
djsumdog (72d ago): What models do you plan on running? I've found all the local coding models that can run on a 3080 Ti to be pretty terrible. Are the larger local models better than Claude 4 or GPT?
I find the chatbots utterly annoying and worthless. I hate how their bullshit is now crammed into search results on DDG/Google.
What would you use local LLMs for?
jonathands (72d ago): @djsumdog Mostly I want to try the coding models, but it would really be general lab stuff.
A friend showed me Qwen Coder running on a 24 GB MacBook Pro and I was impressed with the result.
My main hangup with going NVIDIA is that the setup would cost more for less total LLM-usable RAM: for the price of a single 5090 with 32 GB, I can buy an M4 Pro Mac mini with 64 GB of unified RAM (at least here in the jungle).
I understand it's not a replacement for Claude/Codex though.
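Some rough back-of-envelope math behind that RAM comparison (a sketch, not from the thread; the bits-per-weight and overhead numbers are ballpark assumptions for Q4-style quants, not vendor specs):

```python
# Rough estimate of memory needed to run a quantized LLM locally.
# Assumes ~4.5 bits per weight (Q4_K-ish) plus a few GB for KV cache / runtime overhead.

def approx_gb(params_billion: float, bits_per_weight: float = 4.5, overhead_gb: float = 4.0) -> float:
    """Very rough memory footprint of a quantized model, in GB."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

for size in (8, 14, 30, 70):
    print(f"{size:>3}B model: ~{approx_gb(size):.0f} GB")

# A ~30B Q4 quant lands around 20 GB, so a 32 GB 5090 runs it comfortably;
# the 70B class wants ~40+ GB, which is where 64 GB of unified RAM starts to matter.
```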
jonathands (72d ago): @afaIk Two things mostly: having the ability to play and learn locally is a great plus, since it's a (very much needed) upgrade anyway,
and depending on what I'm doing, cloud services get incredibly expensive for me due to exchange rates. Yes, the computer would be expensive, but it's there no matter what.
Still, this is a good question; maybe I'll test some 30B models on Groq to gauge the cost before making any purchase.
I'll probably wait till the M5 gets released though.
12bitfloat (72d ago): @jonathands A Mac mini with M4 Pro and 64 GB is $2500; an RTX 5090 is ~$2200.
I'd personally keep your current PC and upgrade just the GPU.
Then you have better support too! Most serious stuff is CUDA-only (e.g. if you want to train some weird model or whatever).
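For what it's worth, the CUDA-only point bites hardest for training; for inference most stacks also target Apple's Metal backend. A quick sketch (plain PyTorch, nothing project-specific assumed) of picking whichever backend is available:

```python
import torch

# Pick the best available backend: CUDA on NVIDIA, MPS (Metal) on Apple silicon, else CPU.
# Plenty of research/training code still assumes CUDA, which is the "better support" argument above.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
print(device, (x @ x).shape)
```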
jonathands (72d ago): @retoor Right now I'm thinking mostly of using it for coding with tools like SST/OpenCode and Cline, with medium/smaller models like Qwen Coder 30B, DevMistral, etc.
Maybe also using local models as a testbed for future LLM-enabled applications before going online;
imagine a local AI-enabled development machine.
Also there is the economics/politics of it: while the hardware is (obscenely) expensive out here, it's something you own.
While I don't think I'll get rid of APIs/subscriptions, I don't have any hope they will become cheaper, so it's nice to have options.
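On the "local testbed before going online" idea, one low-friction pattern (a sketch, assuming an Ollama server on its default port and the `openai` Python package; the model tag is a placeholder for whatever you've pulled) is to talk to the local model through its OpenAI-compatible endpoint, so moving the app to a hosted provider later is just a base-URL/key/model change:

```python
from openai import OpenAI

# Local testbed: Ollama exposes an OpenAI-compatible API on localhost:11434.
# Swap base_url / api_key / model later to point the same app at a cloud provider.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused-locally")

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # placeholder: whatever model you've pulled locally
    messages=[{"role": "user", "content": "Write a Python function that slugifies a title."}],
)
print(resp.choices[0].message.content)
```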
jonathands (72d ago): @12bitfloat Also, for some reason, specifically here the M4 is slightly cheaper than the RTX 5090.
I have to stick to local vendors due to warranty (super important for me) and taxes (which are many, and probably why the RTX is so expensive here).
jonathands (72d ago): @12bitfloat Upgrading just the GPU is not an option; my PC is just too old, so I'd have to replace the whole thing.
BordedDev (72d ago): @djsumdog Yes, the models improve dramatically once you go beyond that size. You can run the 480B Qwen Coder model if you have enough VRAM/RAM (24+ GB VRAM + 100+ GB RAM) and it can write pretty good code. For the isspam challenge, it even created a well-performing implementation. But cost-wise it can be a lot cheaper to just rent a cloud GPU for the 10 seconds it needs to generate (if you can fit it in the cloud GPU's VRAM) than to buy the hardware outright, depending on usage of course.
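The 24+ GB VRAM plus 100+ GB RAM trick is layer offloading: keep as many transformer layers on the GPU as fit and leave the rest in system RAM. A minimal sketch with llama-cpp-python (the GGUF path is a placeholder, the layer count needs tuning to your card, and a 480B model will be slow however you split it):

```python
from llama_cpp import Llama

# Split a big GGUF model between GPU VRAM and system RAM.
# n_gpu_layers controls how many transformer layers live on the GPU;
# everything that doesn't fit stays in RAM and runs on the CPU.
llm = Llama(
    model_path="/models/qwen-coder-480b-q4.gguf",  # placeholder path/quant
    n_gpu_layers=20,   # tune to whatever fits in your VRAM
    n_ctx=8192,        # context window; this also consumes memory
)

out = llm("Write an is_spam(text) function in Python.", max_tokens=256)
print(out["choices"][0]["text"])
```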
afaIk (72d ago): @retoor The specs are sick! My X230 is a bit too old spec-wise too; anyway, nice rugged machine.
BordedDev (72d ago): @jonathands I just run LLMs locally, mostly because I have so little time to experiment and I have the resources (3090 + 128 GB RAM), which often means I let it run while I'm doing other things. I'm tempted by the same as you, but I know it's kind of a waste of money because of that limited time. I have used RunPod before, but for the big AIs you're not going to be allowed to run them yourself. I'm mainly playing around with STT (Voxtral) and TTS.
AFAIK @retoor uses both OpenRouter and Groq.
gitstashio (70d ago): All right, here goes:
- When you say you are running an LLM locally, like when running ollama run qwencoder:8b, does that mean no sort of data is collected by Alibaba? Does that also mean that if I unplug the internet from my laptop forever, I would have a stable qwencoder model and no other data would ever be needed by the model?
So unlike the breaking changes introduced by ChatGPT when they make you update the model, if I were to use my qwencoder to code something for me, I would never have any breaking changes?
- If yes, does that mean that theoretically I "trained" my local LLM on all the questions and answers I fed it, and if I lose my MacBook I have to "start over"?
- Yes, as retoor said, I use an M1 MacBook to answer questions, and only ask it to write simple code.
- Why does no one explain these things in simple terms? Millions of "AI will take your job" videos and only 2 about running LLMs locally.
BordedDev (70d ago): @gitstashio When you run it locally, yes, it will work without internet. You can even register your own functionality, e.g. to read files from disk.
Your AI will not learn anything "new" without additional tooling. As in, if you restart llama.cpp or whatever you use, it will not remember. You can "inject" additional knowledge either by fine-tuning (this is essentially further training the AI) or by doing stuff with RAG and pre-inserting data before prompting. Where you store this additional knowledge is up to you; it doesn't need to live with the model file. RAG data gets requested and added to the context, so it can live on a separate computer (think of it like an API for data the AI can use).
jonathands (70d ago): @gitstashio > why does no one explain these things in simple terms: millions of AI will take your job videos and only 2 about running LLMs locally
Because it's easier and more lucrative to create desperation and fear than to actually teach stuff.
jonathands (70d ago): @gitstashio You can run several open-weights models, the equivalent of open source for LLMs, on your own hardware.
OpenAI released GPT-OSS, there's Qwen Coder like you mentioned, Gemma, Llama from Meta, etc.
The models themselves are stuck with the knowledge they had during training. Some models can do RAG (I don't know about the specific open-weights ones), and there are people who do fine-tuning, including LoRA, on local models, which requires some pretty hefty hardware; I'm not sure Apple silicon is well suited for this.
My interest is mostly in consuming and prompt engineering; Apple silicon seems to do great with those.
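For the LoRA point: the reason it is lighter than full fine-tuning is that the base weights stay frozen and you only train small low-rank adapter matrices. A hedged sketch with Hugging Face transformers + peft (the model name and target modules are illustrative; running this comfortably is where the "pretty hefty hardware" comes in, and the tooling is indeed most mature on CUDA):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small open-weights base model (name is illustrative).
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")

# LoRA: freeze the base model, train small rank-r adapters on the attention projections.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # typical choice; depends on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # usually well under 1% of the full model

# From here you'd run a normal training loop (e.g. the transformers Trainer) on your own data.
```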
jonathands (70d ago): I'm testing stuff on OpenRouter with Cline and Roo Code (Roo is just amazing). Despite Qwen Coder 30B being able to generate complex Next.js projects, it's not very good at making them work.
I had it create a Svelte project and then, 3 prompts later, decide Svelte was not really required (despite my prompts) and recreate everything in pure HTML/JS.
All in all, I think I'm stuck with cloud providers at least for now, as I was aiming for a 48 GB machine and it feels like that's not up to the task.
BordedDev (69d ago): @jonathands Yeah, Svelte, as much as I like it, doesn't have the same market share as reshat (react). But with AI I've found it easier to just work with templates like Jinja; no need for a full framework (because it always seems to edit styles.css XD).
BordedDev (47d ago): @jonathands A friend of mine tried the "Svelte for AI" part of their docs and said it helped immensely.
