Why OpenAI Operator is pretty cool... but will not be coming for anyone's jobs anytime soon. ;-)

A few days ago, OpenAI announced Operator, a research preview of an agent that can use its own browser to perform tasks for you.

The possibilities of tech like this are very interesting, essentially allowing you to automate away any tedious tasks that could be done by a person sitting in front of a web browser for long enough. And in fact, when you first log in, the home screen presents you with several possibilities, ranging from booking dinner reservations to finding you a hotel in NYC to aggregating the latest political news.

However, it’s always a good idea to test out claims yourself before believing the hype, especially in an over-hyped industry with a $1 trillion market cap, such as Generative AI. ;P

Use case: Make a post including links to jobs on LinkedIn

Of course, I’m skipping right past all of those sample ideas, because those have probably all been battle-tested for this preview to make it look awesome. I want to instead see how it reacts to a “real-world” scenario, where someone just asks it to do something that’s on their mind, and I happen to have a request at the ready.

The past couple of years have been rough for folks across many industries, but particularly so for those in the DevRel / Community space. The job market however seems to be picking up in early 2025, and I recently posted a few jobs that sounded interesting to my LinkedIn.

That post, as the kids say, “did numbers,” so clearly folks out there are hungry for this kind of information. But as one commenter, Chris Ward, pointed out, ‘Hiring is back, but sadly so much of it back to "US only"’.

So let’s test Operator to see if it can make a post that highlights a few DevRel roles that are available in Europe.

How OpenAI Operator works

Before we get into what happened with that request though, let’s talk a bit about how OpenAI Operator works.

Essentially, OpenAI Operator is a model based on GPT-4o that runs in a virtual machine with access to the Chromium web browser. (Your Operator conversations are kept separate and distinct from the rest of your ChatGPT conversations.)

It analyzes your prompts to determine how best to respond. If it’s a task that involves online activities like browsing websites, filling out forms, or interacting with web applications, it uses Chromium. If the task can be completed through conversation or doesn't require internet access, it is handled within the chat interface like a “normal” ChatGPT chat.

When in browser mode, it performs the tasks “live” through a series of screenshots taken at regular intervals. This allows you to see the progress and actions being taken in the virtual machine in real-time. (You can also share the sequence of screenshots it took afterwards as a video containing all steps.)

Example:

(There’s also an option to get an in-browser notification when it finishes, since certain tasks could take a long time.)

Above the browser window, it shows a bit of info about how it’s thinking about the task / what it’s trying to do, as well as adjustments along the way. Example:

Worked for 2 minutes:

Searching LinkedIn for DevRel jobs

Filtering results to show jobs

Adjusting location filter to Europe

Clearing field, entering "Europe" location

Selecting European Union for jobs

Filtering results by recent postings

Applying past week filter for jobs

Compiling recent EU job listings

Reviewing job listings, preparing summary

In certain situations, such as hitting a CAPTCHA, needing to confirm a sensitive action, or needing personal information (e.g. username / password), it will pause what it’s doing and seek human input to ensure accuracy and security. (You can also “Take control” of the browser at any time and get hands on keys/mouse.)

After making manual edits, you then “return control” to the Operator, along with a note of what you did (since it stops recording at this point).
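To make that handoff concrete, here’s a minimal Python sketch of the pause-and-resume pattern described above. To be clear, this is not Operator’s actual code or API: the Step type, the NEEDS_HUMAN categories, and the ask_human callback are all names I made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical step kinds that get handed to a human.
# Made up for this sketch; not Operator's real categories.
NEEDS_HUMAN = {"captcha", "login", "confirm_sensitive"}


@dataclass
class Step:
    kind: str         # e.g. "click", "type", "captcha", "login"
    description: str  # human-readable summary of the step


def run_task(steps: Iterable[Step],
             ask_human: Callable[[Step], str]) -> List[str]:
    """Run steps in order, pausing for a human on sensitive ones.

    ask_human returns a note describing what the human did while in
    control, since the agent does not observe that part itself.
    """
    log = []
    for step in steps:
        if step.kind in NEEDS_HUMAN:
            note = ask_human(step)  # human takes control, then returns it
            log.append(f"human: {step.description} ({note})")
        else:
            log.append(f"agent: {step.description}")
    return log


steps = [
    Step("click", "open linkedin.com"),
    Step("login", "enter username/password"),
    Step("click", "open Jobs"),
]
log = run_task(steps, ask_human=lambda s: "logged in, entered email code")
```

The point of the note is the same as in the real thing: whatever happens while the human is driving is invisible to the agent unless you tell it afterwards.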

If you’re happy with the results of an Operator task, you can “Save” the task and, in a similar way to creating your own GPTs, specify a default prompt and website(s) to go along with it.

These then show up as “Pinned” on your home page, so are good for things you want to run frequently.

Things Operator cannot do?

  • It has no knowledge of persistent state between chats; each one is its own fresh universe.

  • Further, there seems to be some sort of a time cut-off (several hours) at which point the chat box will say “Conversation closed” and you’re not allowed to type in it anymore.

  • Except, ironically, it also has no knowledge of time, so for example you can’t ask it to check a value on a website for you hourly.
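For that last one, the hourly check would have to live outside Operator for now. Here’s a rough Python sketch of what that could look like, assuming you supply your own fetch function that reads the value you care about (the function names and parameters here are mine, not anything Operator provides).

```python
import time
from typing import Callable, List, Tuple


def poll(fetch: Callable[[], str],
         interval_seconds: float,
         iterations: int,
         sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, str]]:
    """Call fetch() a fixed number of times, waiting between calls.

    fetch is a placeholder for whatever reads the value from the site.
    sleep is injectable so the loop can be tested without really waiting.
    """
    readings = []
    for i in range(iterations):
        readings.append((i, fetch()))
        if i < iterations - 1:  # no need to wait after the final check
            sleep(interval_seconds)
    return readings
```

Calling poll(my_fetch, 3600, 24) would check once an hour for a day. In practice you’d more likely hand this to cron or a similar scheduler, but the idea is the same: something other than the agent has to own the clock.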

So how did OpenAI Operator do?

There doesn’t seem to be a way to export the chat from Operator, so I’ll do my best to give the play-by-play. (See also the video of browser activity.)

Starting prompt:

Post an update to LinkedIn that contains a list of jobs available in the DevRel space in Europe this week.

At first, things went pretty smoothly:

  • ✅ Interpreted from the prompt that it should start on Linkedin.com

  • ✅ Figured out it would need to log in in order to do anything further, and prompted me to take control to enter username/password.

    • Note: At this point, LinkedIn prompted me to enter a code from my email because it said I was acting suspicious. Which is probably fair, because I was logged in on my laptop at whatever IP at the same time. :)

    • Note #2: Because this required human intervention, it was not recorded in the video.

  • ✅ ~0:04: Navigated to the LinkedIn Jobs list, and fiddled with the search words and filters there to find “DevRel jobs in Europe” in location “European Union” that had a Date Posted of the Past Week.


  • ✅ Having done this, reported back to the chat with a summary of what it had found, and asked for a quality control check.

    Next prompt:

That is sufficient. Could you please craft a LinkedIn message that contains links to each of those jobs, but prompt me to manually approve it before it is posted.

Here’s where things got a bit “interesting”…

  • ✅ ~0:35: Navigated to the home page and opened the post window.

  • ✅ ~0:40: Inserted the above text, plus some LinkedIn greeting cruft, plus “[Link]” placeholders next to each job.

  • ❌ ~0:41: Attempted to replace [Link] placeholders with URLs to jobs and got stuck in an infinite loop of highlighting text and failing to replace it.

    (I’m honestly not sure what exactly happened here… it’s like it’s trying to highlight only “[Link]” and ends up accidentally highlighting a lot more [maybe because the first part of the message is off-screen and it lost its way-finding?], then gets confused about why that happens, then tries again, over and over.)

Anyway, after several minutes of this, I manually intervene. Prompt:

Hi, you seem to be stuck at this step. Is there something I can do to help?

It responds by asking me to manually copy/paste the Markdown into the text area for it??

❌ ~1:33: Ok, now replacing URLs has worked a bit better, but… it’s pretty sloppy. You can see that some links are https://… (as intended), others are [https://… or [Lhttps://…, and there’s even a VERY special ttpss:// :P

❌ I try various tricks to get it to fix these links, including attempting to instruct it directly, and asking it to do automated verification itself. It was confidently wrong every single time. :P (For example, assuming job postings from 5 minutes ago were “no longer available” vs. that it screwed up the URLs.)
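For what it’s worth, the kind of mangling described above is easy to catch mechanically outside of Operator. Here’s a minimal Python sketch using only the standard library. It only checks the shape of each link (clean https:// scheme plus a host), not whether the job behind it still exists, and the example.com URLs in the usage note are placeholders I made up, not the real links from the post.

```python
from urllib.parse import urlparse


def is_clean_https_url(text: str) -> bool:
    """True if text is a single well-formed https:// URL with a host."""
    if text != text.strip() or " " in text:
        return False  # stray whitespace means something got glued on
    try:
        parsed = urlparse(text)
    except ValueError:
        return False  # urlparse rejects some malformed bracketed hosts
    return parsed.scheme == "https" and bool(parsed.netloc)


def find_bad_links(links):
    """Return the links that fail the check, in their original order."""
    return [link for link in links if not is_clean_https_url(link)]
```

Feeding it a list like "https://example.com/jobs/1", "[https://example.com/jobs/2", "[Lhttps://example.com/jobs/3" and "ttpss://example.com/jobs/4" would flag the last three, which are the same three flavors of breakage I saw in the draft.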

Finally I give up and try a different approach. Prompt:

Hm. I would like all jobs in this post to be:

1) jobs that are currently available

2) open to candidates living/working in Europe

3) contaning proper, valid URLs connecting people reading this post to said jobs

Is it possible we need to start the process over, or do you think the post can be fixed with only minor adjustments?

Apparently, Operator agrees with me that starting over is best. ;)

BONUS: How could this have gone better? Let’s Ask!

The prior example highlights the importance of Prompt Engineering, so your request is clear and the bot doesn’t have to guess. Adding a new requirement mid-stream (not only a list of jobs but ALSO links to said jobs) seems to have flummoxed it.

Anywho, snark aside, I decided to do some “meta” prompting to ask what I should’ve asked instead:

Ok… let’s try that! Prompt:

Please find and compile a list of Developer Relations jobs available in Europe this week. Draft a LinkedIn post with the job titles, companies, locations, and links, ensuring the links are correctly formatted. Share the draft with me for review before posting.

This one is fascinating, because it takes a totally different approach. Instead of starting with LinkedIn, it searches Bing for Developer Relations jobs and starts clicking into websites like https://devrelcareers.com/ and https://www.remoterocketship.com/

As before, it compiles a summary (this time with links!) and asks for input:

Looking great!

Great, now just one last step… post it for me!

Aaaaand… NO! 🤣

A couple things that could be going on here:

  1. The capabilities of OpenAI Operator truly changed within the 12 hours of these attempts and it no longer supports posting things to websites (seems doubtful).

  2. This feature’s getting popular and therefore expensive, so they’ve made it dumber on purpose to cut down on costs (would not be the first time OpenAI has done this).

  3. The subtle shift in the prompt, from “Post an update to LinkedIn” to “Draft a LinkedIn post” turned off some inner capability in support of keeping costs down.

So what’s the deal, is OpenAI Operator gonna put me out of a job?

To its credit, OpenAI Operator did successfully automate several tasks with minimal direction:

  • Searching for jobs from numerous sources

  • Figuring out how to filter those jobs appropriately (by location, by time frame)

  • Compiling a summary of research

  • Asking for the human to review said research

  • Drafting LinkedIn copy

  • Posting LinkedIn updates (once a human has manually supplied their credentials).

What it did NOT do great on, though:

  • It didn’t deal well with instructions changing mid-stream.

  • It doesn’t seem to know when it’s stuck repeating the same instructions and getting nowhere.

  • It seems to be incapable of checking its own work.

  • When it thinks it’s checked its own work, it’s wrong.

  • It confidently lies about its correctness, even after I repeatedly pointed out its errors.

In other words, you can’t just set this thing off to do things without you watching it carefully (which, in fairness, is pointed out in the text below the Operator chat box). Sometimes you even need to roll your sleeves up to get it manually unstuck.
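The “stuck repeating the same instructions” problem in particular is the sort of thing a supervising script could flag. Here’s a hedged Python sketch of one way to do it: watch a sliding window of recent actions and raise a flag when one action shows up too often. The window size and repeat threshold are arbitrary numbers I picked, and the action strings are just labels; I have no idea how (or whether) Operator tracks this internally.

```python
from collections import deque
from typing import Hashable, Iterable, Optional


def first_stuck_action(actions: Iterable[Hashable],
                       window: int = 6,
                       max_repeats: int = 2) -> Optional[int]:
    """Return the index at which the agent looks stuck, or None.

    "Stuck" means the current action appears more than max_repeats
    times within the last `window` actions. Both thresholds are
    assumptions for this sketch, not tuned values.
    """
    recent = deque(maxlen=window)  # old actions fall off automatically
    for i, action in enumerate(actions):
        recent.append(action)
        if recent.count(action) > max_repeats:
            return i
    return None
```

On a trace like open post box, paste text, then “select [Link]” and “replace fails” alternating over and over, this trips on the third “select [Link]”, which is a lot sooner than the several minutes I waited before stepping in.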

Also, real talk: If adding a requirement (links to jobs) mid-stream threw it for such a loop, it essentially means that OpenAI Operator is currently incapable of doing work for actual clients because they never know what they want up front. ;-)

So it feels like we are still quite a ways off from delegating any and all knowledge work to agents like this. (The 2-minute video is a bit misleading, as it only captures browser screen frames in rapid succession; the task overall took about an hour whereas “DIY” it would’ve taken about 10 minutes.)

On the other hand, it seems like a very interesting tool to play around with, because the promise of being able to automate, even to a limited extent, “anything you can do in a browser” is pretty fascinating.

What would / will YOU build with this tool? 👀