When to imitate and how it matters for price discovery: An agent-based approach
We analyze a Scarf economy with three commodities. Agents have Leontief utility functions and are grouped by the commodity they produce. Each agent attempts to maximize its utility and engages in bilateral trade with others, based on its initial endowment and a subjective perception of "fair" prices. Trade failures (either unsatisfied demand or involuntary inventory) signal price misperceptions in the economy. A modified gradient-descent approach is employed to adjust these misperceived prices; this adjustment rule also serves as the individual learning model.
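The abstract does not spell out the exact adjustment rule, but the idea of correcting price misperceptions from trade-failure signals can be sketched as follows. This is a minimal illustration, not the paper's actual rule: the function name, the linear step, and the simplex renormalization are all assumptions.

```python
import numpy as np

def update_prices(prices, excess_demand, eta=0.05):
    """Gradient-style adjustment of an agent's subjective prices.

    Positive excess demand for a good (unsatisfied demand) suggests the
    agent undervalues it; negative excess demand (involuntary inventory)
    suggests overvaluation. We nudge the subjective price vector in the
    direction of the imbalance and renormalize so prices stay on the
    simplex. All parameter names and the step form are illustrative.
    """
    new_prices = prices + eta * excess_demand
    new_prices = np.clip(new_prices, 1e-9, None)  # keep prices positive
    return new_prices / new_prices.sum()          # renormalize to sum 1
```

Under this sketch, a good the agent could not buy enough of (positive excess demand) drifts up in its subjective price vector, while a good it could not sell drifts down.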
In Gintis (2007), neither individual nor social learning alone can ensure price discovery. To converge to the competitive equilibrium (CE), agents have to learn both from their own experience (individual learning) and from others' (social learning). Earlier studies have shown that individual and social learning can generally lead to different results (Vriend, 2000). We integrate these two learning schemes as a first attempt towards a generalized learning formulation for artificial agents. In contrast to random (unconscious) switching, our agents consciously decide when to imitate (social learning) and when not to (individual learning). We apply reinforcement learning to model this discrete choice as a two-armed bandit problem.
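A standard way to implement such a discrete choice with an intensity-of-choice parameter is a logit (softmax) rule over the arms' learned attractions. The sketch below is an assumption about the general family of rules, not the paper's specification; all names and parameters are illustrative.

```python
import math

def softmax_probs(attractions, beta):
    """Logit choice probabilities over the bandit arms.

    beta is the intensity of choice: beta = 0 yields uniform random
    switching between imitating and not imitating, while a large beta
    almost deterministically picks the arm with the highest attraction
    (pure exploitation).
    """
    m = max(attractions)                        # shift for numerical stability
    weights = [math.exp(beta * (a - m)) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

def update_attraction(attractions, arm, payoff, alpha=0.1):
    """Exponential-smoothing reinforcement update of the chosen arm."""
    attractions[arm] += alpha * (payoff - attractions[arm])
    return attractions
```

The same rule extends directly from two arms (imitate / do not imitate) to three, which is how a hybrid strategy turns the problem into a three-armed bandit.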
We test this variant of the Gintis model with three sizes of the economy: 30, 120 and 300 agents. We find that the basic results are qualitatively size-independent. As in Gintis (2007), social learning alone does not lead the economy to the CE. With reinforcement learning, the intensity of choice turns out to be a key parameter for price convergence. Our results indicate that an economy needs to remain flexible in order to converge to the CE, i.e., a fraction of agents must keep exploring while the others exploit. We thereby extend Gintis' findings systematically, lending support to the significance of a balance between exploration and exploitation when searching for the most efficient configuration of the economy. Finally, we extend the binary choice to a ternary one, a three-armed bandit problem, by including a hybrid strategy in the learning model. The above results are robust to this extension.