I'm honestly tempted to buy an M4 Apple Silicon computer, mainly for its ability to run local LLM models with unified RAM.
Overall I think they are too expensive for the offering, but being able to play around with LLMs without shelling out RTX 5090 kinds of money is tipping the balance.
I wonder what Apple people's experiences have been?

Comments
djsumdog (72d ago): What models do you plan on running? I've found all the local coding models that can run on a 3080 Ti to be pretty terrible. Are the larger local models better than Claude 4 or GPT?
I find the chatbots utterly annoying and worthless. I hate how their bullshit is now crammed into search results on DDG/Google.
What would you use local LLMs for?
jonathands (72d ago): @djsumdog Mostly I want to try the coding models, but it would really be general lab stuff.
A friend showed me Qwen Coder running on a 24 GB MacBook Pro and I was impressed with the result.
My main hangup with going NVIDIA is that the setup would cost more for less total LLM-usable RAM: for the price of a single 5090 with 32 GB, I can buy an M4 Pro Mac mini with 64 GB of unified RAM (at least here in the jungle).
I understand it's not a replacement for Claude/Codex though.
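Some rough back-of-envelope math behind that RAM comparison (a sketch, not from the thread; the bits-per-weight and overhead numbers are ballpark assumptions for Q4-style quants, not vendor specs):

```python
# Rough estimate of memory needed to run a quantized LLM locally.
# Assumes ~4.5 bits per weight (Q4_K-ish) plus a few GB for KV cache / runtime overhead.

def approx_gb(params_billion: float, bits_per_weight: float = 4.5, overhead_gb: float = 4.0) -> float:
    """Very rough memory footprint of a quantized model, in GB."""
    weight_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

for size in (8, 14, 30, 70):
    print(f"{size:>3}B model: ~{approx_gb(size):.0f} GB")

# A ~30B Q4 quant lands around 20 GB, so a 32 GB 5090 runs it comfortably;
# the 70B class wants ~40+ GB, which is where 64 GB of unified RAM starts to matter.
```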
jonathands (72d ago): @afaIk Two things mostly: having the ability to play and learn locally is a great plus, since it's a (very much needed) upgrade anyway,
and depending on what I'm doing, cloud services get incredibly expensive for me due to exchange rates. Yes, the computer would be expensive, but it's there no matter what.
Still, this is a good question; maybe I'll test some 30B models on Groq to gauge the cost before making any purchase.
I'll probably wait till the M5 gets released though.
12bitfloat (72d ago): @jonathands A Mac mini with M4 Pro and 64 GB is $2500; an RTX 5090 is ~$2200.
I'd personally keep your current PC and upgrade just the GPU.
Then you have better support too! Most serious stuff is CUDA-only (e.g. if you want to train some weird model or whatever).
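For what it's worth, the CUDA-only point bites hardest for training; for inference most stacks also target Apple's Metal backend. A quick sketch (plain PyTorch, nothing project-specific assumed) of picking whichever backend is available:

```python
import torch

# Pick the best available backend: CUDA on NVIDIA, MPS (Metal) on Apple silicon, else CPU.
# Plenty of research/training code still assumes CUDA, which is the "better support" argument above.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

x = torch.randn(1024, 1024, device=device)
print(device, (x @ x).shape)
```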
jonathands (72d ago): @retoor Right now I'm thinking mostly of using it for coding with tools like SST/OpenCode and Cline, with medium/smaller models like Qwen Coder 30B, DevMistral, etc.
Maybe also using local models as a testbed for future LLM-enabled applications before going online;
imagine a local AI-enabled development machine.
Also there is the economics/politics of it: while the hardware is (obscenely) expensive out here, it's something you own.
While I don't think I'll get rid of APIs/subscriptions, I don't have any hope they will become cheaper, so it's nice to have options.
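On the "local testbed before going online" idea, one low-friction pattern (a sketch, assuming an Ollama server on its default port and the `openai` Python package; the model tag is a placeholder for whatever you've pulled) is to talk to the local model through its OpenAI-compatible endpoint, so moving the app to a hosted provider later is just a base-URL/key/model change:

```python
from openai import OpenAI

# Local testbed: Ollama exposes an OpenAI-compatible API on localhost:11434.
# Swap base_url / api_key / model later to point the same app at a cloud provider.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused-locally")

resp = client.chat.completions.create(
    model="qwen2.5-coder:32b",  # placeholder: whatever model you've pulled locally
    messages=[{"role": "user", "content": "Write a Python function that slugifies a title."}],
)
print(resp.choices[0].message.content)
```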
jonathands (72d ago): @12bitfloat Also, for some reason, specifically here the M4 is slightly cheaper than the RTX 5090.
I have to stick to local vendors due to warranty (super important for me) and taxes (which are many, and probably why the RTX is so expensive here).
jonathands (72d ago): @12bitfloat Upgrading just the GPU is not an option; my PC is just too old, so I'd have to replace the whole thing.
BordedDev (72d ago): @djsumdog Yes, the models improve dramatically once you go beyond that size. You can run the 480B Qwen Coder model if you have enough VRAM/RAM (24+ GB VRAM + 100+ GB RAM) and it can write pretty good code. For the isspam challenge, it even created a well-performing implementation. But cost-wise it can be a lot cheaper to just rent a cloud GPU for the 10 seconds it needs to generate (if you can fit it in the cloud GPU's VRAM) than to buy the hardware outright, depending on usage of course.
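The 24+ GB VRAM plus 100+ GB RAM trick is layer offloading: keep as many transformer layers on the GPU as fit and leave the rest in system RAM. A minimal sketch with llama-cpp-python (the GGUF path is a placeholder, the layer count needs tuning to your card, and a 480B model will be slow however you split it):

```python
from llama_cpp import Llama

# Split a big GGUF model between GPU VRAM and system RAM.
# n_gpu_layers controls how many transformer layers live on the GPU;
# everything that doesn't fit stays in RAM and runs on the CPU.
llm = Llama(
    model_path="/models/qwen-coder-480b-q4.gguf",  # placeholder path/quant
    n_gpu_layers=20,   # tune to whatever fits in your VRAM
    n_ctx=8192,        # context window; this also consumes memory
)

out = llm("Write an is_spam(text) function in Python.", max_tokens=256)
print(out["choices"][0]["text"])
```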
afaIk (72d ago): @retoor The specs are sick! My X230 is a bit too old spec-wise too; anyway, nice rugged machine.
BordedDev (72d ago): @jonathands I just run LLMs locally, mostly because I have so little time to experiment and I have the resources (3090 + 128 GB RAM), which often means I let it run while I'm doing other things. I'm tempted by the same as you, but I know it's kind of a waste of money because of that limited time. I have used RunPod before, but for the big AIs you're not going to be allowed to run them yourself. I'm mainly playing around with STT (Voxtral) and TTS.
AFAIK @retoor uses both OpenRouter and Groq.
gitstashio (70d ago): All right, here goes:
- When you say you are running an LLM locally, like when running ollama run qwencoder:8b, does that mean no sort of data is collected by Alibaba? Does that also mean that if I unplug the internet from my laptop forever, I would have a stable qwencoder model and no other data would ever be needed by the model?
So unlike the breaking changes introduced by ChatGPT when they make you update the model, if I were to use my qwencoder to code something for me, I would never have any breaking changes?
- If yes, does that mean that theoretically I "trained" my local LLM on all the questions and answers I fed it, and if I lose my MacBook I have to "start over"?
- Yes, as retoor said, I use an M1 MacBook to answer questions, and only ask it to write simple code.
- Why does no one explain these things in simple terms? Millions of "AI will take your job" videos and only 2 about running LLMs locally.
BordedDev (70d ago): @gitstashio When you run it locally, yes, it will work without internet. You can even register your own functionality, e.g. to read files from disk.
Your AI will not learn anything "new" without additional tooling. As in, if you restart llama.cpp or whatever you use, it will not remember. You can "inject" additional knowledge either by fine-tuning (this is essentially further training the AI) or by doing stuff with RAG and pre-inserting data before prompting. Where you store this additional knowledge is up to you; it doesn't need to live with the model file. RAG data gets requested and added to the context, so it can live on a separate computer (think of it like an API for data the AI can use).
jonathands (70d ago): @gitstashio > why does no one explain these things in simple terms: millions of AI will take your job videos and only 2 about running LLMs locally
Because it's easier and more lucrative to create desperation and fear than to actually teach stuff.
jonathands (70d ago): @gitstashio You can run several open-weights models, the equivalent of open source for LLMs, on your own hardware.
OpenAI released GPT-OSS, there's Qwen Coder like you mentioned, Gemma, Llama from Meta, etc.
The models themselves are stuck with the knowledge they had during training. Some models can do RAG (I don't know about the specific open-weights ones), and there are people who do fine-tuning, including LoRA, on local models, which requires some pretty hefty hardware; I'm not sure Apple silicon is well suited for this.
My interest is mostly in consuming and prompt engineering; Apple silicon seems to do great with those.
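For the LoRA point: the reason it is lighter than full fine-tuning is that the base weights stay frozen and you only train small low-rank adapter matrices. A hedged sketch with Hugging Face transformers + peft (the model name and target modules are illustrative; running this comfortably is where the "pretty hefty hardware" comes in, and the tooling is indeed most mature on CUDA):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a small open-weights base model (name is illustrative).
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B")

# LoRA: freeze the base model, train small rank-r adapters on the attention projections.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # typical choice; depends on the architecture
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # usually well under 1% of the full model

# From here you'd run a normal training loop (e.g. the transformers Trainer) on your own data.
```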
jonathands (70d ago): I'm testing stuff on OpenRouter with Cline and Roo Code (Roo is just amazing). Despite Qwen Coder 30B being able to generate complex Next.js projects, it's not very good at making them work.
I had it create a Svelte project and then, 3 prompts later, decide Svelte was not really required (despite my prompts) and recreate everything in pure HTML/JS.
All in all, I think I'm stuck with cloud providers at least for now, as I was aiming for a 48 GB machine and it feels like that's not up to the task.
BordedDev (69d ago): @jonathands Yeah, Svelte, as much as I like it, doesn't have the same market share as reshat (react). But with AI I've found it easier to just work with templates like Jinja; no need for a full framework (because it always seems to edit styles.css XD).
BordedDev (47d ago): @jonathands A friend of mine tried the "Svelte for AI" part of their docs and said it helped immensely.
