I’m actually working on this problem right now for my master’s capstone project. I’m almost done with it; I can have it generating a series of steps to try and fetch me something based on simple objectives like “I’m thirsty”, and then in simulation fetching me a drink or looking through rooms that might have a fix, like contextually knowing the kitchen is a great spot to check.
There’s also a lot of research into using the latest advancements in reasoning and contextual awareness via LLMs to work towards better more complicated embodied AI. I wrote a blog post about a lot of the big advancements here.