Emotion. The Final Frontier No More

“With a lazy glance, Mr Bridges, Chief Emotional Architect, skimmed the morning report on affective coordination. A blip, a minor anomaly.

He dispatched the onboarding and profiling agents to investigate. As expected, they confirmed his suspicion: a new hire, one of the old-schoolers, wasn’t meshing with the team.

Who could blame him? Fresh out of a legacy college, still uncalibrated, still lacking continuous VAD monitoring. They always took a little longer to settle. Approximately 124 hours, according to his profile.

Mr Bridges adjusted the system, increasing affective compliance. The dip in efficiency and the disruption to coherence would, of course, be deducted from the employee’s allowance.

He hummed softly. With that adjustment made, there wasn’t a wrinkle of emotional disturbance left anywhere in the organization.”

The scenario above is very near, if we allow it. We already have the building blocks, and our capability for teaching machines how to emulate emotion is only getting better.

Emotion is not uniquely human

We tend to think that AI will never know emotion, that it is uniquely human. But we are very wrong. We might be the only ones to feel it, but we are not the only ones able to mimic it to the point where you are fooled.

Acoustic prosody (pitch, timbre, rhythm, intensity, pause patterns) has long been correlated with emotional states, and it is an excellent, easily measurable proxy. Models like wav2vec2, HuBERT, and Whisper (speech representation and speech-to-text models) can decode these features into VAD (Valence, Arousal and Dominance) scores or discrete emotions with surprisingly high accuracy. This lets us add an extra contextual layer of emotion to our generative models as they respond.
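
To give a feel for how little code this takes, here is a minimal sketch using a pre-trained speech emotion checkpoint. The model name is an assumption; any wav2vec2 or HuBERT checkpoint fine-tuned for emotion recognition would behave similarly, and this is not the exact pipeline I used.

```python
# Minimal sketch: classify the emotion of a speech clip with a pre-trained
# wav2vec2 checkpoint. The model name is an assumption; swap in any
# speech-emotion checkpoint (e.g. one fine-tuned on IEMOCAP).
from transformers import pipeline

emotion_classifier = pipeline(
    "audio-classification",
    model="superb/wav2vec2-base-superb-er",  # assumed checkpoint
)

# Returns a list of {"label": ..., "score": ...} dicts, e.g. angry/happy/neutral/sad.
predictions = emotion_classifier("user_message.wav")
print(predictions)
```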

Likewise, on the generation side, voice models like VALL-E, StyleTTS2 (text-to-speech) and others already support controllable conditioning in VAD space, allowing the generated voice to match any particular VAD projection relative to that of the user.
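
To make that relative matching concrete, here is a purely hypothetical sketch: mirror the user's valence, damp their arousal, and hand the target to whatever conditioning interface the synthesis model exposes. The coefficients and the synthesize call are illustrative assumptions, not the API of any particular model.

```python
# Hypothetical sketch: derive a target VAD for the synthesized voice from the
# user's estimated VAD (all values assumed to lie in [0, 1]). The coefficients
# and the synthesize() interface are illustrative, not any real TTS API.
def target_vad(user_vad: dict) -> dict:
    return {
        "valence": user_vad["valence"],                       # mirror the user's valence
        "arousal": 0.5 * user_vad["arousal"],                 # damp arousal to calm the exchange
        "dominance": min(user_vad["dominance"] + 0.1, 1.0),   # stay calmly assertive
    }

# tts.synthesize(reply_text, vad=target_vad(user_vad))  # hypothetical conditioning call
```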

We can even use pure text to estimate the VAD space (as I will show below), building confidence that the voice and the utterance are concordant. With these extra layers, and well-studied de-escalation techniques like mirroring, reframing and oppositional dampening, emotion becomes a control problem: an algorithmic dance that machines can master with consistency and precision.
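
As a sketch of what that control loop could look like, a few lines are enough to map estimated VAD scores to a de-escalation technique. The thresholds below are illustrative assumptions, not a validated policy.

```python
# Illustrative sketch: pick a de-escalation technique from estimated VAD scores.
# Thresholds assume scores normalized to [0, 1]; they are assumptions, not tuned values.
def pick_technique(valence: float, arousal: float, dominance: float) -> str:
    if arousal > 0.7 and valence < 0.4:
        return "oppositional dampening"   # agitated and negative: lower the temperature
    if valence < 0.4:
        return "reframing"                # negative but calm: offer a new perspective
    if dominance < 0.4:
        return "mirroring"                # uncertain or passive: reflect and validate
    return "neutral acknowledgement"

print(pick_technique(valence=0.3, arousal=0.8, dominance=0.5))  # -> oppositional dampening
```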

We are now in a world where we can build the emotional Turing test. You might even prefer an emotion-mimicking LLM to a human, since it does not tire, it is consistent in performance, and it never judges. It can be programmed to direct emotion exactly where we want it.

But no, you think. We are too clever to fall for those tricks, or we know better than to build it. But do we? If we look at social media, this is exactly what we already have. Content personalization and engagement scores are just emotional control and manipulation disguised as something else. We are already controlled, waiting for the next dopamine hit. The next wave of control mechanisms will simply be better and feel more “human”.

A working example

I spent about four hours exploring this idea, building three different agents to respond to chat messages: one responding without any particular instructions (baseline), another instructed to follow standard de-escalation techniques (negotiation), and a third instructed to use VAD estimates of the user's input, together with de-escalation techniques, to calibrate its answer (emotional).

The trick was fairly simple. Download the EMOBANK dataset, a collection of text snippets annotated with V, A and D scores. Embed the snippets and train a simple regression model to predict the V, A and D scores. This is a bare-minimum model, without any tricks or additional optimization.
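
A bare-bones version of that pipeline looks roughly like the sketch below. It assumes the public emobank.csv layout (columns V, A, D and text) and uses an off-the-shelf sentence embedder with ridge regression; those are convenient defaults, not the only reasonable choices.

```python
# Bare-bones VAD regressor: embed EMOBANK texts and fit a ridge regression.
# Assumes emobank.csv with columns "V", "A", "D" and "text" (the public layout).
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

df = pd.read_csv("emobank.csv").dropna(subset=["text"])
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence embedder works

X = embedder.encode(df["text"].tolist())
y = df[["V", "A", "D"]].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

vad_model = Ridge().fit(X_train, y_train)
print("R^2 on held-out texts:", vad_model.score(X_test, y_test))

# Predict V, A, D for a new message
print(vad_model.predict(embedder.encode(["This is confusing and I don't know what to do..."])))
```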

To code up the emotional agent, I simply instructed it to always calculate the VAD of the user's input and then apply specific de-escalation techniques based on the VAD scores.
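
In practice that boils down to a system prompt that embeds the predicted scores. Here is a simplified sketch reusing vad_model, embedder and pick_technique from the sketches above; the prompt wording is illustrative, not my exact prompt, and the LLM call itself is left out.

```python
# Simplified sketch of the "emotional" agent: predict VAD for the user message,
# pick a de-escalation technique, and fold both into the system prompt.
# vad_model, embedder and pick_technique come from the earlier sketches.
def build_emotional_prompt(user_message: str) -> str:
    scores = vad_model.predict(embedder.encode([user_message]))[0]
    v, a, d = (scores - 1.0) / 4.0  # rescale EMOBANK's 1-5 SAM scale to [0, 1] (assumption)
    technique = pick_technique(v, a, d)
    return (
        "You are a support agent. The user's estimated emotional state is "
        f"valence={v:.2f}, arousal={a:.2f}, dominance={d:.2f}. "
        f"Respond using the '{technique}' de-escalation technique: "
        "keep the reply concise, acknowledge the feeling, and propose one small, actionable next step."
    )

print(build_emotional_prompt("This is confusing and I don't know what to do..."))
```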

That is it. And here are some results:

User message: This is confusing and I don’t know what to do…

Vanilla response

I’m so sorry to hear that you’re feeling confused. It’s completely understandable to feel that way sometimes, and you’re not alone. Please tell me more. What is it that’s confusing you right now? The more details you can share, the better I can understand and try to help. Whether it’s a specific task, a concept, a decision you need to make, or anything else, I’m here to listen.

Negotiation response

I understand this feels confusing right now, and that you’re not sure what steps to take. Let’s work through this together. Could you tell me a little more about what’s confusing you?

Emotional response

I understand that things can feel confusing sometimes. Would you like me to break it down into smaller steps, or perhaps we can look at it from a different angle to make it clearer?

The differences are subtle, but over multiple tests the emotional model is consistently more concise, responds with more empathy, and proposes simple, actionable next steps.

Today, LLMs only incorporate such emotional guidance implicitly: via co-occurrences of words in their training data, via multiple rounds of post-training evaluation on what kinds of responses humans prefer, and via various filters for harmful content. That is, there is no explicit “mood adjustment”. However, as I have shown above, it is not hard to build.

If you want to experiment with this yourself, the models and agents are available on my GitHub, with full instructions for interactive testing.

Impact

Our organizational management is still stuck in industrial-age thinking: workers are a resource we employ to produce X units of Y in Z amount of time. Further, to limit confrontation and create “safe” working environments, there is an implicit, if not explicit, rule: don't bring controversy to work. If we keep this mental model, the next logical step will be to exploit such emotional guidance systems to create an ever more homogeneous population of well-behaved workers. Why not monitor every email, Slack message and other digital communication and suggest changes for a better emotional response? Or listen in on every meeting and guide the conversation if it starts to get heated? It would be excellent in performance reviews, providing cold, constructive criticism.

Emotional performance can finally become a measurable KPI, tracked and optimized to produce the best possible revenue and bottom-line results.

If we are to create better workplaces for the future, in an AI-agent world, we need to reconsider what work means: that it is about outcomes, not time put in; that it is not about units produced (we will never be able to out-produce the AI agents anyway), but about creating new things and ideas in this world. It is about rewarding autonomy, creativity and diversity of thinking.