Not entirely clear, but my best guess is that it will basically have an MCP implementation so that the browser can be controlled directly by an LLM
I think that’s basically what e.g. the chatgpt browser is. Despite the… hostile… response on the fediverse, I suspect it will end up being the way a lot of people interact with the internet in a few years.
baatliwala@lemmy.world 2 days ago
Serious and long answer because you won’t find people actually providing you one here: in theory (heavy emphasis on theory), an “agentic” world would be fucking awesome.
Agents
You know how you have been programmed so that when you search something on Google, you need to be terse and to the point? The most elaborate you get is “Best Indian restaurants near me”, but you don’t normally do more than that.
Well, in reality people just love rambling on or providing lots of additional info, so the natural language processing capabilities of LLMs are tremendously helpful. Like, what you actually want to ask is “Best Indian restaurants near me, but make sure it’s not more than 5km away and my chicken tikka plate doesn’t cost more than ₹400, and also I hope it’s near a train station so I can catch a train that will take me home by 11pm latest”. But you don’t put all that into fucking Google, do ya?
“Agents” will use a protocol that works completely in the background called the Model Context Protocol (MCP). The idea is that you put all that information into an LLM (ideally speak it, because no one actually wants to type all that) and each service will have its own MCP server. Google will have one, so your agent can narrow the results down to restaurants near a train station and less than 5km away. Your restaurant will have one, so your agent will automatically make a reservation for you. Your train operator will have one, so your agent will automatically book the train ticket for you. You don’t need to pull up each app individually; it will all happen in the background. At most you will get a “confirm all the above?”. How cool is that?
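To make the flow concrete, here’s a minimal sketch of the shape of that exchange: the LLM turns your rambling request into a structured tool call, and a client routes it to whichever MCP server owns that tool. All the server and tool names here are made up for illustration; real MCP speaks JSON-RPC over stdio or HTTP, this only mimics the idea.

```python
import json

# Hypothetical catalogue of MCP servers and the "tools" each exposes.
SERVERS = {
    "maps": {"search_restaurants"},
    "restaurant": {"make_reservation"},
    "rail": {"book_ticket"},
}

def route_tool_call(call_json: str):
    """Dispatch a model-emitted tool call to whichever server owns the tool."""
    call = json.loads(call_json)
    for server, tools in SERVERS.items():
        if call["tool"] in tools:
            return server, call["arguments"]
    raise ValueError(f"no server exposes tool {call['tool']!r}")

# The LLM boils the spoken request down to structured calls like this:
call = json.dumps({
    "tool": "search_restaurants",
    "arguments": {"cuisine": "Indian", "max_distance_km": 5,
                  "max_price": 400, "near": "train station"},
})
server, args = route_tool_call(call)
```

The follow-up calls (`make_reservation`, `book_ticket`) would flow through the same dispatcher, which is why you only ever see one “confirm all the above?” at the end.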
Uses
So, what companies now want to do is leverage agents for everything, making use of NLP capabilities.
Let’s say you maintain a spreadsheet or database of how your vehicle is maintained and what repairs you have done. Why do you want to manually type it in each time? Just tell your agentic OS “hey, add that I spent ₹5000 replacing this car part at this location to my vehicle maintenance spreadsheet. Oh, and I also filled in petrol on the way.” and boom, your OS does it for you.
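Under the hood, all the agent really does is extract a structured record from the sentence and append a row. A tiny sketch (the field layout is invented for illustration):

```python
import csv
import io

def log_maintenance(sheet, date, description, cost, location):
    """Append one maintenance entry -- the structured record the agent
    would extract from the spoken sentence. Field order is made up."""
    csv.writer(sheet).writerow([date, description, cost, location])

# Two rows from the one spoken request above:
sheet = io.StringIO()  # stands in for the real spreadsheet file
log_maintenance(sheet, "2025-01-10", "replaced car part", 5000, "workshop")
log_maintenance(sheet, "2025-01-10", "petrol top-up", 1500, "en route")
```

The hard part isn’t the write, it’s the extraction, which is exactly the NLP bit LLMs are good at.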
You want to add a new user to a Linux server. You just say “create a new user alice, add them to these local groups, and give them sudo access as well. But also make sure they are forced to change their password every year”.
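What the agent would actually do is translate that sentence into a handful of standard shell commands. A sketch of that translation as a dry-run plan (nothing is executed; group names are assumptions):

```python
def plan_user_commands(username, groups, sudo=True, password_max_days=365):
    """Turn a parsed natural-language request into the shell commands
    an agent would run, returned as a plan for the user to confirm."""
    cmds = [f"useradd -m {username}"]
    if groups:
        cmds.append(f"usermod -aG {','.join(groups)} {username}")
    if sudo:
        cmds.append(f"usermod -aG sudo {username}")  # 'wheel' on RHEL-likes
    # Force a password change every `password_max_days` days:
    cmds.append(f"chage -M {password_max_days} {username}")
    return cmds

plan = plan_user_commands("alice", ["developers", "docker"])
```

Showing the plan before running it is also exactly the “confirm all the above?” safety valve mentioned earlier.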
You have accounts across 3 banks and you want to create a visualisation of your spending? Maybe you also want to flag some anomalous spends? You tell your browser to fetch all that information and it will do it for you.
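The “flag anomalous spends” part doesn’t even need an LLM once the data is fetched; a crude statistical cutoff would do. A toy sketch, standing in for whatever the agent would actually run:

```python
from statistics import mean, stdev

def flag_anomalies(spends, z=2.0):
    """Flag spends more than `z` sample standard deviations above the
    mean -- a crude stand-in for a real anomaly detector."""
    mu, sigma = mean(spends), stdev(spends)
    return [s for s in spends if s > mu + z * sigma]

spends = [400, 520, 380, 450, 9000, 410]
flagged = flag_anomalies(spends)
```

The agent’s real job is the plumbing: logging into 3 banks, normalising the exports, then handing clean numbers to something like this.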
You can tell your browser to track an item’s price and instantly buy it if it goes below a certain amount.
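That price tracker is just a poll-and-compare loop around two hooks the agent would wire up to the shop’s MCP server. A minimal sketch with both hooks stubbed out (neither talks to a real store):

```python
def check_price(fetch_price, threshold, buy):
    """Poll a product's price once; trigger the buy hook if it has
    dropped below the threshold. Both hooks are hypothetical stand-ins
    for calls into a shop's MCP server."""
    price = fetch_price()
    if price < threshold:
        buy(price)
        return True
    return False

bought = []
check_price(lambda: 349.0, 400.0, bought.append)  # below threshold: buys
check_price(lambda: 425.0, 400.0, bought.append)  # above threshold: skips
```

“Instantly buy” is also exactly the kind of unattended action the downsides below are about.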
Flying somewhere? Tell your browser to compare airline policies, maybe check out their history of delays and cancellations.
And because it’s all natural language, LLMs can easily ask you to clarify something.
Obvious downsides
So all this sounds awesome, but let’s get to why this will only work in theory unless there is a huge shift:
LLMs still suck in terms of accuracy. Yes, they are decent, but still not at the level that’s needed, and they still make stupid errors. They are also no longer making the generational leaps they used to.
LLMs are not easy to self-host. They are one of the genuine use cases for cloud compute.
This means they are going to be expensiveeeeee and also energy hogs
Commercial companies actually want you to land on their pages. Yes, it’s good for them that your OS will do it for you and they still get a page hit, but as of now that is absolutely not what companies want. How are they going to serve you ads?
korazail@lemmy.myserv.one 2 days ago
I really like this comment. It covers a variety of use cases where an LLM/AI could help with the mundane tasks and calls out some of the issues.
The ‘accuracy’ aspect is my 2nd greatest concern: an LLM agent that I told to find me a nearby Indian restaurant, and which then hallucinated one, is not going to kill me. I’ll deal, but be hungry and cranky. When that LLM (they are notoriously bad at numbers) updates my spending spreadsheet with a 500 instead of a 5000, that could have a real impact on my long-term planning, especially if it’s somehow tied into my actual bank account and makes up numbers. As we/they embed AI into everything, the number of people who think they have money because the AI agent queried their bank balance, saw 15, and turned it into 1500 will be too damn high. I don’t ever foresee trusting an AI agent to do anything important for me.
“trust”/“privacy” is my greatest fear, though. There’s documentation from the major players that prompts are used to train the models. I can’t immediately find an article link because ‘chatgpt prompt train’ finds me a ton of slop about the various “super” prompts I could use. Here’s OpenAI’s ToS about how they will use your input to train their model unless you specifically opt out: openai.com/…/how-your-data-is-used-to-improve-mod…
Note that that means when you ask for an Indian restaurant near your home address, OpenAI now has that address in its data set and may hallucinate that address as an Indian restaurant in the future. The result being that some hungry, cranky dude may show up at your doorstep asking, “where’s my tikka masala?”. This could be a net gain, though; new bestie.
The real risk, though, is that your daily life is now collected, collated, harvested and added to the model’s data set, all without your clear, explicit consent: using these tools requires accepting a ToS that most people will not really read and understand. Maaaaaany people will expose otherwise sensitive information to these tools without understanding that their data becomes visible as part of that action.
To get a little political, I think there’s a huge downside on the trust aspect: these companies have your queries (prompts), and I don’t trust them to maintain my privacy. If I ask something like “where to get an abortion in Texas”, I can fully see OpenAI selling that prompt to law enforcement. That’s an egregious example for impact, but imagine someone being able to query prompts (using an AI which might make shit up) and asking “who asked about anti-X topics” or “pro-Y”.
My personal use of ai: I like the NLP paradigm for turning a verbose search query into other search queries that are more likely to find me results. I run a local 8B model that has, for example, helped me find a movie from my childhood that I couldn’t get google to identify.
There’s a use case here, but I can’t accept this as a SaaS-style offering. Any modern gaming machine can run one of these LLMs and get the value without the privacy tradeoff.
Adding agent power just opens you up to having your tool make stupid mistakes on your behalf. These kinds of tools need to have oversight at all times. They may work for 90% of the time, but they will eventually send an offensive email to your boss, delete your whole database, wire money to someone you didn’t intend, or otherwise make a mistake.
I kind of fear the day that you have a crucial confrontation with your boss and the dialog goes something like:
Why did you call me an asshole?
I didn’t, the AI did, and I didn’t read the response as closely as I should have.
Oh, OK.
baatliwala@lemmy.world 2 days ago
Oh fuck, yeah, I somehow forgot to put data ingestion as one of the major negatives lmao. Yeah, those LLMs are gonna know literally everything about you.