AI models are editors, not writers

Status: ongoing
Writing is thinking, and so outsourcing writing is outsourcing thinking. Humans should retain the agency to think for themselves, or we'll miss the insight and learning that makes writing valuable. Progress with Cursor and Claude Code shows that agents can effectively understand and edit files in a complex codebase, and this suggests they could do the same for a complex writing project with lots of background context. This makes them good candidates for editors.
Editors are more like thinking partners, providing an outside view on a writing project while understanding its context and intentions. They can zoom out and talk about ideas and structure, or zoom in and provide line edits. That kind of sophisticated feedback, from someone who understands what the writer is trying to do, is invaluable. Of course, human editors are too expensive for the vast majority of writers.
So how might we design an interface through which AI models can act more like editors?
Design considerations
Please note that all screenshots are from a working prototype.
My working principle is that the user should always understand exactly what the model is doing, and easily track any changes made to their work.
As part of a previous piece of work, I implemented an in-context diff view for viewing, accepting, and rejecting edits made by an AI model, so the user could see both the old and new text inline in the main prose. This prototype reuses those ideas, with the implementation updated to work with the Anthropic text editor tool.
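As a rough sketch of how tool output can become reviewable edits rather than silent changes, the snippet below records each str_replace command from the text editor tool as a pending suggestion the user can accept or reject. The Suggestion shape and helper names are hypothetical, not the prototype's actual code.

```typescript
// Each str_replace from the model becomes a pending suggestion,
// so the old and new text can be shown together in the prose.
interface Suggestion {
  id: string;
  path: string;
  oldText: string;
  newText: string;
  status: "pending" | "accepted" | "rejected";
}

interface StrReplaceInput {
  command: "str_replace";
  path: string;
  old_str: string;
  new_str: string;
}

const suggestions: Suggestion[] = [];

function handleStrReplace(input: StrReplaceInput): Suggestion {
  // Record the edit instead of mutating the document directly.
  const suggestion: Suggestion = {
    id: crypto.randomUUID(),
    path: input.path,
    oldText: input.old_str,
    newText: input.new_str,
    status: "pending",
  };
  suggestions.push(suggestion);
  return suggestion;
}

function accept(s: Suggestion, doc: string): string {
  s.status = "accepted";
  // String.replace swaps only the first occurrence, matching the
  // tool's expectation that old_str identifies a unique location.
  return doc.replace(s.oldText, s.newText);
}

function reject(s: Suggestion): void {
  s.status = "rejected"; // the original text is left untouched
}
```

Deferring the mutation is what makes the in-context diff possible: both versions exist until the user decides.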
Beyond that, the primary considerations were as follows.
Providing context on workspace structure
I used an editable readme file in which the user describes the structure of their project. The model sees the readme and understands how to navigate the workspace.
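A minimal sketch of what this can look like at the prompt level, assuming the readme is simply injected into the system prompt alongside a file listing (the helper and its wording are illustrative):

```typescript
// Build a system prompt that gives the model the user's own
// description of the workspace plus the files it can open.
function buildSystemPrompt(readme: string, files: string[]): string {
  return [
    "You are an editor working inside the user's writing workspace.",
    "The user describes their project structure as follows:",
    readme,
    "Files available through the text editor tool:",
    files.map((f) => `- ${f}`).join("\n"),
  ].join("\n\n");
}

const prompt = buildSystemPrompt(
  "drafts/ holds works in progress; notes/ holds background research.",
  ["readme.md", "drafts/essay.md", "notes/sources.md"],
);
```

Because the readme is editable, the user can steer how the model navigates the workspace without touching the prompt itself.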
Understanding where edits are located in a document
IDEs solve this well: a scrollbar-style component represents the vertical extent of the document, with colored bars marking the location of each change. This prototype uses the same idea.
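Sketched below, one way to compute those bars: each edit's character offset becomes a fractional position on the overview strip. The names and the minimum-height heuristic are assumptions, not details from the prototype.

```typescript
// Map edit locations to fractional positions on a scrollbar-style strip.
interface EditMarker {
  top: number;    // 0..1, fraction of the strip's height
  height: number; // floor so that one-character edits stay visible
}

function markersFor(
  editOffsets: number[], // character offset of each edit in the document
  docLength: number,
  minHeight = 0.005,
): EditMarker[] {
  return editOffsets.map((offset) => ({
    top: Math.min(offset / docLength, 1),
    height: minHeight,
  }));
}

// Each marker renders as an absolutely positioned bar, e.g.
// style={{ top: `${m.top * 100}%`, height: `${m.height * 100}%` }}.
```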
Understanding which files contain edits
Again, IDEs solve this well: there is a file tree, and edited files are assigned a distinct color. This prototype uses the same idea, adding a badge showing the count of changes in each file.
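Sketch of the counting step, assuming each pending edit carries the path of the file it targets (the PendingEdit shape is hypothetical):

```typescript
// Count pending edits per file so the file tree can color edited
// files and badge each with its change count.
interface PendingEdit {
  path: string;
  status: "pending" | "accepted" | "rejected";
}

function changeCounts(edits: PendingEdit[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of edits) {
    if (e.status !== "pending") continue;
    counts.set(e.path, (counts.get(e.path) ?? 0) + 1);
  }
  return counts;
}
```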
Technical considerations
On a technical level, the prototype is a set of documents hosted on Liveblocks and visualised in a Tiptap editor. This setup provides real-time collaborative editing with simple scaling and easy implementation. Text edits are made using the Anthropic text editor tool, then converted from plain text into executable Tiptap commands by a custom pipeline.
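The pipeline itself isn't shown here, but a minimal sketch of the core conversion step might look like the following: find the tool's old_str in the ProseMirror document and replace that range via a Tiptap command. This version applies the edit directly and assumes the match falls within a single text node; a real pipeline would also handle matches spanning nodes, and would surface the change as an accept/reject suggestion rather than applying it outright.

```typescript
import { Editor } from "@tiptap/core";

// Convert a plain-text str_replace into a Tiptap edit by locating
// old_str inside a text node and replacing that document range.
function applyStrReplace(
  editor: Editor,
  oldStr: string,
  newStr: string,
): boolean {
  let applied = false;
  editor.state.doc.descendants((node, pos) => {
    if (applied || !node.isText || !node.text) return;
    const index = node.text.indexOf(oldStr);
    if (index === -1) return;
    // ProseMirror positions: pos is the start of this text node.
    const from = pos + index;
    const to = from + oldStr.length;
    editor.chain().insertContentAt({ from, to }, newStr).run();
    applied = true;
  });
  return applied;
}
```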
Notes
- This strongly feels like the MAYA ("most advanced yet acceptable") interface. The in-context diff view makes AI changes trivial to understand, and giving an agent freedom to work across the entire workspace provides the extra intelligence needed to make sophisticated suggestions.
- Opus can use complex context to make good suggestions. I find the suggestions genuinely useful and massively prefer doing it this way compared to working in a chat interface and copy/pasting. I have had worse results with Sonnet, to the point of not using it.
- The primary issue is cost. Operating over a large workspace uses a lot of tokens, and a single complex query can cost $3-5. That can be optimised, but it will remain expensive regardless, and beyond the means of many writers. To make this work, it would need to be either an enterprise product for high-value users or a consumer product heavily subsidised by venture capital. Alternatively, if Anthropic launches "sign in with Claude" or similar, this may become viable, since usage would at least be covered by a single all-in subscription. Admittedly, this is a problem for all agent products in 2025.
- More like a feature than a product? There are existing workspaces like Notion for which an implementation like this would be obvious and trivial. It isn't clear why a user would make a long-term switch to a new generalised tool, unless it focused strongly on a niche market that generalised tools cannot adequately serve.