Adaptive Multi-Strategy Market Making Agent

Anton Kolonin1,2,3, Ikram Ansari1

1 SingDAO Ltd, Gros-Islet, St. Lucia

2 SingularityNET Foundation, Amsterdam, Netherlands

Abstract. We propose an architecture for algorithmic trading agents for liquidity provision on centralized exchanges. These agents implement what we call an adaptive market making multi-strategy, which is based on a limit order grid with continuous experiential learning. The concept exploits the definition of artificial general intelligence (AGI) as an ability to “reach complex goals in complex environments given limited resources”, treated here as universal multi-parameter optimization. We present a basic reference implementation of the architecture, back-tested on historical crypto-finance market data and capable of providing almost 1000% excess return (“alpha”) under the evaluated market conditions.

Keywords: adaptive agent, back-testing, centralized exchange, continuous learning, experiential learning, liquidity provision, market making.


The subject of algorithmic trading is attracting the attention of investors, developers, and scientists due to high potential financial returns and high demand for automated business applications for investment, liquidity provision, and trading across all sorts of financial markets, including crypto-currencies. One of its popular applications is so-called “yield farming” in the crypto-industry, which makes it possible to create investment portfolios consisting of crypto-assets used for automated liquidity provision, also called market making. Yield farming can be performed either on centralized exchanges (CEX) such as Binance or on decentralized ones (DEX) with smart contracts, such as Uniswap or Balancer on the Ethereum blockchain. Accordingly, there are many studies on how machine learning and artificial intelligence can be applied to it, such as attempts to learn efficient market making strategies [1,2,3,4]. Unfortunately, the known results are not that exciting so far: they demonstrate an ability to learn some basic principles of trading with limit orders, and some ability to outperform “hodling” strategies (buy and hold on a rising market) only in very specific conditions. More effort is therefore required in this area.

An important part of automated trading is price prediction [5,6], which can take the form of either predicting the direction of a price change as a classification problem or predicting a specific price level as a regression problem. The latter appears more critical for market making. That is because conventional trading with market orders can accept a predicted change of price direction as a trading signal to either sell or buy. In turn, market making with limit orders on a CEX does not necessarily need to sell or buy; it just needs to set appropriate price levels for the bid and ask limit orders, according not just to the anticipated direction of the price movement but to the actual target level of the move. Unfortunately, the high volatility and manipulative nature of the crypto market pose challenges even for the former, let alone the latter, so even more work is needed in this direction, if the problem can be solved at all.

In this paper we extend our earlier work on the matter [7], focusing on the methodology and architecture for algorithmic trading agents for liquidity provision on centralized exchanges, implementing what we call an adaptive market making multi-strategy based on a limit order grid with continuous experiential learning. The concept exploits a definition of artificial general intelligence (AGI) as an ability to “reach complex goals in complex environments given limited resources” [8], treated as universal multi-parameter optimization. Below we present a basic reference implementation of the architecture, back-tested on historical crypto-finance market data and capable of providing almost 1000% excess return (“alpha”) under the evaluated market conditions. Along the way, we assess the value of the ability to predict the price during such activity, as well as the drawbacks of not being able to do it properly.

Adaptive Market Making Methodology

For the initial experiment we have designed and implemented a limit order grid market making “macro-strategy”, where an individual market making agent creates a grid of limit orders, with each individual order in the grid representing a specific “micro-strategy”. In turn, each of the micro-strategies may have its own individual parameters. The agent executing the “macro-strategy” has the option to revise the set of different “micro-strategy” sub-agents, as if they were controllable sub-personalities within a single super-person in total control of its own “multi-personality”; that is why we call this a “multi-strategy”.

The classical approach to using experiential or reinforcement learning would be to create an action space for a market making agent, with actions such as creating bid and ask orders with different spreads [1,2,3], and to learn the behavioral model based on historical data. No significant performance results have been obtained with this approach in studies on historical and live crypto-trading data. We presume that this might be due to the following factors. First, the stochastic nature of the crypto market might not make it possible to learn, on long historical intervals covering a variety of market conditions, a single model that works well across all such conditions. Second, building the operational space of agents from order-level actions might be too fine-grained, so that no statistically confident experience with corresponding feedback can be collected for any specific order creation or cancellation in the corresponding market situations.

These considerations have led us to a few decisions that simplify the methodology of the initial experiments discussed further and make it more efficient and risk-tolerant. First, we have replaced the operational space of actions with an operational space of strategies executed for determined time intervals. Second, the feedback or reward for using a strategy is evaluated as the profit or loss over the period of strategy execution. Third, in order to speed up learning and mitigate risk, we made it possible for an agent to execute a certain number of strategies at a time, having its “personality” split into several “sub-persona” child agents, each of them running its own “micro-strategy”, while the parent agent “macro-strategy” is designated to control and manage the child agents. Fourth, each of the child “micro-strategy” agents can be run either in “real mode”, trying to make real trades on the market, or in “virtual mode”, just watching the live structure of the limit order book on the exchange aligned with the stream of trades being closed and performing “virtual market making”, as we do in our back-testing framework [7].

In the current architecture evaluated in the course of the presented work, each of the “micro-strategy” child agents is able to create only one limit order at a time, where the position of the order on the bid or ask side is defined by the price dynamics and the spread is one of the “micro-strategy” parameters. The order cancellation policy of such agents is defined by the conservatism parameter of the “micro-strategy”: orders can be either never cancelled until completion, or cancelled if there is a need to create an order on the other side of the mid price, or cancelled whenever a mid price change requires the current bid or ask price to be updated. That is, the operational space of a child agent can be denoted as P(s,c), where P is a point in the parameter space, s is the spread in percent and c is the order cancellation conservatism.
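The point P(s,c) and the three cancellation policies can be illustrated with a minimal sketch. The names `Conservatism`, `MicroStrategy` and `should_cancel` are hypothetical and not taken from the reference implementation, and the sketch assumes that the least conservative policy also cancels when the order side flips.

```python
from dataclasses import dataclass
from enum import Enum


class Conservatism(Enum):
    """Hypothetical labels for the three cancellation policies in the text."""
    NEVER_CANCEL = 0         # keep the order until it is completed
    CANCEL_ON_SIDE_FLIP = 1  # cancel only if an order is needed on the other side
    CANCEL_ON_MID_MOVE = 2   # cancel on any mid price change as well


@dataclass(frozen=True)
class MicroStrategy:
    """A point P(s, c) in the operational space of a child agent."""
    spread_pct: float           # s, spread in percent
    conservatism: Conservatism  # c, order cancellation conservatism


def should_cancel(strategy, open_side, wanted_side, mid_changed):
    """Decide whether the single open order of a child agent is cancelled.

    open_side / wanted_side are "bid" or "ask"; mid_changed tells whether
    the mid price moved since the order was placed.
    """
    side_flipped = open_side != wanted_side
    if strategy.conservatism is Conservatism.NEVER_CANCEL:
        return False
    if strategy.conservatism is Conservatism.CANCEL_ON_SIDE_FLIP:
        return side_flipped
    # Assumption: the least conservative policy includes the side-flip case.
    return side_flipped or mid_changed
```

For example, an agent at P(0.4, CANCEL_ON_SIDE_FLIP) keeps its bid order when the mid price moves but the bid side is still wanted, and cancels it only once an ask order is needed instead.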

The “macro-strategy” of a parent agent is designed to start its market making activity with all of its child “micro-strategy” agents, each of them placed at an individual point P(s,c) in the operational space, so that the space is covered evenly by a grid of N unique configurations. Each of the N child agents is given a 1/N share of the parent agent’s budget to invest in its orders. The first round of trading is executed from starting time t0 over period T with order refresh rate dt, and then the parent agent evaluates the losses and returns of all of its children. For the next round of trading, starting at time t1 and lasting the same period T, the top M most profitable agents are selected and each is given a much larger budget, a 1/M share of the parent agent’s budget. At the same time, while the M winners are doing the “real” market making with the real budget, all of the remaining agents keep market making in “virtual mode” against the live market data. At the end of that round, the returns and losses of all of the agents are collected and the new M winners are selected for the subsequent round starting at t2, while the “real” profits and losses are accumulated.

The profitability of an agent is assumed to mean positive returns as well as positive excess returns (“alpha”) compared to a “hodler” strategy agent which just holds the same budget as given to a market making agent. If the number of agents with positive excess return is less than M for a certain round then only that number of agents is selected for “real” operations in the next round with the real budget shared between them. If no agents have positive return exceeding the “hodler” return, the next round is skipped for “real” market making but “virtual” operations are continued in order to attempt to find suitable “micro-strategies” for subsequent rounds.
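The selection rule described in the last two paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, returns are expressed as fractions of the budget over one round, and excess return is taken as the difference between an agent's return and the “hodler” return, which the text does not define formally.

```python
from itertools import product


def build_grid(spreads, conservatisms):
    """Cover the operational space evenly with N = len(spreads) * len(conservatisms) points."""
    return list(product(spreads, conservatisms))


def select_winners(returns, hodler_return, m):
    """Pick at most m agents with positive return and positive excess return.

    returns maps a configuration (s, c) to its return over the last round.
    Assumption: alpha = agent return - hodler return.
    """
    eligible = [
        cfg for cfg, r in returns.items()
        if r > 0 and (r - hodler_return) > 0
    ]
    eligible.sort(key=lambda cfg: returns[cfg], reverse=True)
    return eligible[:m]


def allocate_budget(budget, winners):
    """Share the real budget equally between winners; an empty result means the round is skipped."""
    if not winners:
        return {}
    share = budget / len(winners)
    return {cfg: share for cfg in winners}
```

With fewer than M eligible agents the real budget is simply split between those that qualify, and an empty allocation corresponds to a round where only “virtual” market making continues.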

Optionally, each of the child agents may make decisions relying not on the current market price (mid price), but rather on its future projection, predicted by a machine learning algorithm for every new time point one refresh period dt ahead. In the current work we used only a simple linear regression algorithm relying just on the historical price data. For the “ground truth” prediction baseline we used the historical data looked up at the following data point, dt ahead, in the course of back-testing.
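A price projection of this kind can be sketched with an ordinary least-squares line fitted to a window of recent prices and extrapolated one step ahead. The text does not specify the regression features or window length, so regressing price on the time index, as well as the name `predict_next_price`, are assumptions made for illustration only.

```python
def predict_next_price(prices):
    """Extrapolate the next price from a least-squares line over recent prices.

    prices is a sequence of equally spaced historical prices (oldest first).
    Assumption: price is regressed on the time index 0..n-1.
    """
    n = len(prices)
    if n == 0:
        raise ValueError("at least one price is required")
    if n == 1:
        return float(prices[0])  # nothing to fit, fall back to last known price
    mean_t = (n - 1) / 2
    mean_p = sum(prices) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in enumerate(prices))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_p - slope * mean_t
    return intercept + slope * n  # value of the fitted line at the next index
```

In back-testing, the “ground truth” baseline would replace the returned value with the actual price observed dt later.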

Preliminary Experimental Results

The methodology described above has been implemented and tested relying on the back-testing framework described in our earlier work [7], with results presented in Fig. 1.

The experiments were run on BTC/USDT data from Binance for a 6-day period starting 2021-6-21 17:00, relying on per-minute snapshots of the limit order book data and the full scope of trades data. The number N of “micro-strategy” agents was 18, so there were 6 different spread settings (0.0%, 0.2%, 0.4%, 0.6%, 0.8%, 1.0%) and 3 different order cancellation conservatism settings as described above. The number M of winning agents for “real trading” was 3. The strategy evaluation period T was 2 days, so only three rounds were executed in each of the experiments. The experiments were run for order refresh periods of 1 hour (left side of Fig. 1) and 1 minute (right side of Fig. 1). The first set of experiments for the two refresh rates was run without predictions (top of Fig. 1). The second set was run with “ground truth” predictions to evaluate the baseline, that is, the maximum returns achievable given ultimate predictive abilities (middle of Fig. 1). The third set was run using basic Linear Regression on price data (bottom of Fig. 1), with mean absolute percentage error (MAPE) about 9% better than just using the “last known price” from the previous data point on the given price data for the historical interval.
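The comparison of a predictor against the “last known price” baseline can be sketched as below. The function names are hypothetical, and reading “about 9% better” as a relative reduction of MAPE is an assumption, since the text does not state whether the improvement is relative or absolute.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def improvement_over_last_price(actual, model_predictions, previous_prices):
    """Relative MAPE reduction (in percent) of a model versus the naive baseline.

    previous_prices[i] is the last known price before actual[i], i.e. the
    prediction made by simply repeating the previous data point.
    """
    baseline = mape(actual, previous_prices)
    model = mape(actual, model_predictions)
    return 100.0 * (baseline - model) / baseline
```

A positive value means the model predicts the next price more accurately than repeating the previous one; a negative value means it does worse than the naive baseline.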

Each of the 6 experiments (with 2 refresh rates and three prediction setups) involved assessments of three kinds of returns based on the same initial budget given to an agent executing specific “macro-strategy”: “hodler” - just holding investments into base currency during the entire period of testing; all “micro-strategies” being executed together with 1/N of allocated budget; “macro-strategy” described in the previous section being the subject of a given study.

The results in Fig. 1 (top) clearly show about 800-1000% (8-10 times) excess return compared to the “hodler” when using the suggested “macro-strategy”, for either refresh rate. At the same time, market making with all possible “micro-strategies” at once can provide significant (350%) “alpha” compared to “hodling” in the case of the hourly refresh rate, but underperforms the “hodler” in the case of the minutely refresh rate. We consider this the key result of the given work, deserving further attention and exploration.

The other two sets of experiments have shown that the ability to predict the price during such activity is key to high returns, and that not being able to do it properly leads to rather high losses. That is, using the “ground truth” level of price prediction (not achievable in real life) makes the “alpha” skyrocket to 5000-20000% (50-200 times) excess returns, as seen in the middle of Fig. 1. On the other hand, price prediction with high MAPE causes outright losses, which are still substantially smaller when using the adaptive “macro-strategy” suggested in this work.

Fig. 1. Overall returns using different “macro-strategies”. Top – not using predictions, middle – using “ground truth” predictions, bottom – using predictions by Linear Regression. Left three bars – hourly refresh rate (dt = 1 hour), right three bars – minutely refresh rate (dt = 1 minute). Each group of three bars indicates overall returns/losses by strategy (left to right): hodler, all “micro-strategies” acting together with no selection, “macro-strategy” described above.


The proposed algorithmic market making methodology is designed for liquidity provision architecture at and https://www.singularity- The preliminary results point at the potential business value of using the adaptive market making multi-strategy based on a limit order grid with continuous experiential learning in the area of decentralized finance, automatically generating significant excess returns without manual intervention for ongoing adjustment of market making strategy parameters under constantly changing market conditions.

Apart from that, the results point at the need for extra care in using machine learning for price prediction, and the need for careful assessment of prediction quality before integrating it into production pipelines.

Our future work will be dedicated to a) testing the developed methodology and architecture against extended time intervals covering different market conditions for different assets and trading pairs, with different strategy evaluation periods, parameter space discretization, and winner selection; b) improving the adaptive experiential learning towards more intelligent navigation of an operational space of greater dimensionality, involving more complex “micro-strategies” with a greater number of parameters; c) involving evolutionary/genetic programming in “micro-strategy” selection and evolution; d) incorporating the latest developments of the price prediction domain in the agent “micro-strategies”.


    1. Ganesh S., et al.: Reinforcement Learning for Market Making in a Multi-agent Dealer Market. arXiv:1911.05892v1 [q-fin.TR] 14 Nov 2019.

    2. Sadighian J.: Deep Reinforcement Learning in Cryptocurrency Market Making. arXiv:1911.08647v1 [q-fin.TR] 20 Nov 2019.

    3. Sadighian J.: Extending Deep Reinforcement Learning Frameworks in Cryptocurrency Market Making. arXiv:2004.06985v1 [q-fin.TR] 15 Apr 2020.

    4. Guéant O., et al.: Dealing with the Inventory Risk. A solution to the market making problem. arXiv:1105.3115 [q-fin.TR] 3 Aug 2012.

    5. Tsantekidis A.: Using Deep Learning for price prediction by exploiting stationary limit order book features. arXiv:1810.09965 [cs.LG] 23 Oct 2018.

    6. Yanjun C., et al.: Financial Trading Strategy System Based on Machine Learning. Hindawi / Mathematical Problems in Engineering Volume 2020, Article ID 3589198, 13 pages.

    7. Raheman A.: Architecture of Automated Crypto-Finance Agent. arXiv:2107.07769 [cs.AI] 16 Jul 2021.

    8. Goertzel B.: Artificial General Intelligence: Concept, State of the Art, and Future Prospects. Journal of Artificial General Intelligence 5(1) 1-46, 2014. DOI: 10.2478/jagi-2014-0001.
