Equation of intelligence (AGI and more)

Yashraj Erande
5 min read · Nov 26, 2023
Artificial General Intelligence (AGI) Q*?

Equation of artificial general intelligence.

AGI = ability to (solve problems [A*], learn [Q-learning], communicate [LLM])

The world is abuzz with the possibility of the Q* algorithm leading to the creation of AGI (Artificial General Intelligence).

As I tried to study it, I took a fantastic trip down memory lane, back to 2001–2002 when Pooja Erande and I were working on NLP in Dr. Pushpak Bhattacharyya’s lab (one of the best teachers and humans I have had the good fortune of knowing). Those memories are for a personal post.

This post is about what I learnt about Q* and its implications for business.

I will keep it brief and as simple as possible. So I will compromise some precision for brevity.

First things first: as of the date of publishing this, I haven't come across any public application of Q* or any confirmation of AGI.

Now for more fun things.

Warning: go through the next section with some tolerance for geeky stuff, or skip it entirely.

Q* = combination (A*, Q-learning)

  • A* = a search algorithm which can find the shortest (lowest-cost) path between a start state and a goal state.
  • Q-learning = a technique through which an agent learns the quality (reward or penalty) of its decisions in a given environment.

The A* search algo takes the following inputs:

  1. Start state
  2. Goal state
  3. Cost function — cost incurred in going from start state to current state
  4. Heuristic function — cost likely to be incurred in going from current state to goal state
  5. Graph of all possible states between start and goal state
  6. Moves allowed within the graph — e.g. up, down, left, right, diagonal
  7. Boundary of the graph — maximum width and depth beyond which the search should not proceed

Based on these inputs, the algo tries to find the lowest-cost way from start to goal by repeatedly expanding neighboring nodes. Neighboring nodes are checked for eligibility (i.e. they are within the graph boundary, have not already been evaluated, and can be legally reached given the moves allowed) and then prioritised by their (lowest) expected cost. This process continues till we reach the goal state.
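To make this concrete, here is a minimal sketch of A* in Python. It is illustrative only (nothing here comes from any Q* disclosure): I assume a 5×5 grid, 4-way moves, unit step costs and a Manhattan-distance heuristic.

```python
import heapq

def a_star(start, goal, neighbors, cost, heuristic):
    # frontier entries: (g + h, g, state, path-so-far)
    frontier = [(heuristic(start), 0, start, [start])]
    best_g = {start: 0}  # lowest cost found so far to reach each state
    while frontier:
        _, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return path  # lowest-cost path found
        for nxt in neighbors(state):
            g2 = g + cost(state, nxt)
            if g2 < best_g.get(nxt, float("inf")):  # skip already-better nodes
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + heuristic(nxt), g2, nxt, path + [nxt]))
    return None  # goal unreachable within the graph boundary

# Illustrative 5x5 grid with 4-way moves (the "moves allowed" and "boundary" inputs)
def neighbors(p):
    x, y = p
    return [(x + dx, y + dy) for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]
            if 0 <= x + dx < 5 and 0 <= y + dy < 5]

path = a_star((0, 0), (4, 4), neighbors,
              cost=lambda a, b: 1,                                # cost function
              heuristic=lambda p: abs(p[0] - 4) + abs(p[1] - 4))  # heuristic function
print(path)
```

The inputs listed above map onto the function arguments: start, goal, cost and heuristic directly, with the graph, its allowed moves and its boundary folded into neighbors.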

Now, suppose you have a mathematical problem. You know what a valid answer could look like: for example, in a two-variable simultaneous equation, you know the answer would be x = a real number and y = a real number. And you know all the legal mathematical moves allowed (e.g. BODMAS). Then you can attempt to solve the problem as a search.

The Q-learning technique works as follows:

An (AI) agent is given a table comprising the different states of the world, the actions that can be taken in those states, and the reward for each action. This is the Q-table (Q = quality of a decision / action). The agent is expected to take actions that maximize reward and minimize penalty.

Q-table example: see the sketch below, and some better sources at the end of the post.

The agent takes actions and updates the Q-table with new values based on what it actually experiences as reward or penalty.
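Concretely, the standard Q-learning update blends the old table value with what the agent just experienced (the immediate reward plus the discounted best future value). A minimal sketch follows; the two-state Q-table and all its numbers are made up, and the two knobs alpha and gamma correspond to the agent attributes described next.

```python
# Illustrative Q-table: state -> {action: expected reward}
Q = {
    "s1": {"left": 0.0, "right": 1.2},
    "s2": {"left": 0.8, "right": -0.5},
}

alpha = 0.1   # speed of learning: how dramatically the table is updated
gamma = 0.9   # long-term orientation: how much future reward counts

def update(state, action, reward, next_state):
    best_future = max(Q[next_state].values())
    # Nudge the old estimate towards (immediate reward + discounted future value)
    Q[state][action] += alpha * (reward + gamma * best_future - Q[state][action])

update("s1", "right", reward=1.0, next_state="s2")  # Q["s1"]["right"] moves from 1.2 to ~1.25
```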

The agent has some attributes:

  • Speed of learning. How rapidly and dramatically the agent updates the Q-table values: does it make minor updates every time it experiences a reward or penalty different from what is already in the Q-table, or does it make major updates? (This is alpha in the sketch above.)
  • Long-term vs. short-term orientation. Does the agent choose an action based on the immediate reward expectation, or on the longer-term cumulative reward expectation in the Q-table it has been given? (This is gamma above.)

In some ways these form the ‘nature’ of the (AI) agent.

The agent uses a mix of exploration and exploitation to maximize its lifetime reward. Exploration means it takes an action that is not currently known to provide the highest expected reward, to check whether the actual reward is higher or lower than expected. Exploitation means it takes the actions it has already learnt to be most rewarding. Over a period of time, it forms a policy: the agent's plan for maximizing rewards.

Each time it takes an action, the state could change, and a different set of action-reward pairs then emerges for the new state, as in the Q-table above.
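Putting the pieces together, here is a sketch of the exploration-vs-exploitation loop using the common epsilon-greedy rule. The toy environment, the epsilon value and all rewards are assumptions for illustration only.

```python
import random

def choose_action(Q, state, epsilon=0.2):
    if random.random() < epsilon:
        return random.choice(list(Q[state]))   # exploration: try something else
    return max(Q[state], key=Q[state].get)     # exploitation: best known action

def train(Q, env_step, start_state, episodes=500, alpha=0.1, gamma=0.9):
    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            action = choose_action(Q, state)
            reward, next_state, done = env_step(state, action)
            best_future = max(Q[next_state].values())
            # same Bellman update as in the earlier sketch
            Q[state][action] += alpha * (reward + gamma * best_future - Q[state][action])
            state = next_state
    return Q

# Toy environment: from s1, "right" pays off and ends the episode
def env_step(state, action):
    if state == "s1" and action == "right":
        return 1.0, "s2", True        # (reward, next_state, done)
    return -0.1, "s1", False          # small penalty, stay in s1

Q = {"s1": {"left": 0.0, "right": 0.0}, "s2": {"left": 0.0, "right": 0.0}}
train(Q, env_step, "s1")
print(max(Q["s1"], key=Q["s1"].get))  # the learnt policy at s1: "right"
```

Always picking the best-known action from the learnt Q-table is, in effect, the policy described above.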

[non geeky stuff starts here] What did I take away?

A* and Q-learning could represent two fundamental attributes of human intelligence:

  • A* = ability to do structured problem solving. Go from a problem definition to a solution.
  • Q-learning = ability to learn. Experiment. React to a changing environment.

An AI agent could go from probabilistic outputs (current LLMs) to deterministic outputs.

Independent of AGI or not, some thoughts that crossed my mind:

Which business decisions can be modelled such that they can be solved by A* + Q-learning combined?

  • Capital allocation decisions. E.g. credit decisions, collections decisions, venture capital decisions…?
  • Network / logistics decisions? Where to set up branches…ATMs…warehouses…logistics hubs…?
  • HR decisions? How many people to hire…which level…which geography…which skill set…?

In 5–10 years, what are the chances that there is a Q* Bank or Q* NBFC? Perhaps built on Web-3 technology. CBDC as currency. Where the policy / regulation is embedded in the code.

Are we ready for it?

Should India or APAC take the lead? This part of the world has a few critical things going for it — data density + young population.

Role of humans

My experience has taught me that humans are amazing at validating a hypothesis, but may not be amazing at taking decisions.

Decisions require probabilistic thinking, which doesn't scale easily across human teams. Models (AI) tend to be better at it.

But models have limitations. They are bound by their training data set, synthetic or not. So black swans, or shifts in the underlying data, can lead to catastrophe.

So the best is to combine the two: humans validate the key inputs that go into making decisions, and models come up with the decision.

BTW, here is something that hasn't settled in my mind yet.

Computers could always solve math problems once we wrote the right algos. So what's new? Perhaps they can now (dis)prove theorems. Perhaps transformers can now solve math problems by themselves?

Comments / feedback welcome.

This article is not a very satisfying one. No closure. It raises more questions than it answers. With all the fog surrounding this, it could be just speculation. But it doesn't seem like science fiction: each of these technologies exists.

Thoughts / comments / clarifications / corrections / gyan — welcome in the comments!

PS: views are personal.

Sources (there are many; some below):

https://towardsdatascience.com/reinforcement-learning-explained-visually-part-4-q-learning-step-by-step-b65efb731d3e

https://www.freecodecamp.org/news/an-introduction-to-q-learning-reinforcement-learning-14ac0b4493cc/


Yashraj Erande

MD and Partner BCG | Former Founder Growth Source / Protium (NBFC FinTech) | Economic Times 40 Under 40