stefan-jansen / machine-learning-for-trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Repository Overview (README excerpt)
ML for Trading - 2nd Edition

This book aims to show how ML can add value to algorithmic trading strategies in a practical yet comprehensive way. It covers a broad range of ML techniques from linear regression to deep reinforcement learning and demonstrates how to build, backtest, and evaluate a trading strategy driven by model predictions.

In four parts with **23 chapters plus an appendix**, it covers on **over 800 pages**:
• important aspects of data sourcing, **financial feature engineering**, and portfolio management,
• the design and evaluation of long-short **strategies based on supervised and unsupervised ML algorithms**,
• how to extract tradeable signals from **financial text data** like SEC filings, earnings call transcripts, or financial news,
• how to use **deep learning** models like CNNs and RNNs with market and alternative data, generate synthetic data with generative adversarial networks, and train a trading agent using deep reinforcement learning.

This repo contains **over 150 notebooks** that put the concepts, algorithms, and use cases discussed in the book into action. They provide numerous examples that show:
• how to work with and extract signals from market, fundamental, and alternative text and image data,
• how to train and tune models that predict returns for different asset classes and investment horizons, including how to replicate recently published research, and
• how to design, backtest, and evaluate trading strategies.

> We **highly recommend** reviewing the notebooks while reading the book; they are usually in an executed state and often contain additional information not included due to space constraints.

In addition to the information in this repo, the book's website contains chapter summaries and additional information.

Join the ML4T Community!
To make it easy for readers to ask questions about the book's content and code examples, as well as the development and implementation of their own strategies and industry developments, we are hosting an online platform. Please join our community to connect with fellow traders interested in leveraging ML for trading strategies, share your experience, and learn from each other!

What's new in the 2nd Edition?

First and foremost, this book demonstrates how you can extract signals from a diverse set of data sources and design trading strategies for different asset classes using a broad range of supervised, unsupervised, and reinforcement learning algorithms. It also provides relevant mathematical and statistical knowledge to facilitate the tuning of an algorithm or the interpretation of the results. Furthermore, it covers the financial background that will help you work with market and fundamental data, extract informative features, and manage the performance of a trading strategy.

From a practical standpoint, the 2nd edition aims to equip you with the conceptual understanding and tools to develop your own ML-based trading strategies. To this end, it frames ML as a critical element in a process rather than a standalone exercise, introducing the end-to-end ML for trading workflow from data sourcing, feature engineering, and model optimization to strategy design and backtesting.

More specifically, the ML4T workflow starts with generating ideas for a well-defined investment universe, collecting relevant data, and extracting informative features. It also involves designing, tuning, and evaluating ML models suited to the predictive task. Finally, it requires developing trading strategies to act on the models' predictive signals, as well as simulating and evaluating their performance on historical data using a backtesting engine.
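The workflow stages described above (data, features, model, strategy, backtest) can be sketched end to end in a few lines. The following is a minimal illustration using only the Python standard library, not code from the book or its notebooks. The synthetic random-walk prices, the single trailing-return feature, the one-variable least-squares model, and all function names are assumptions made for this example.

```python
import random
import statistics


def simulate_prices(n_days=500, seed=42):
    """Data sourcing stand-in: synthetic daily prices (hypothetical random walk)."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1 + rng.gauss(0.0003, 0.01)))
    return prices


def make_dataset(prices, lookback=5):
    """Feature engineering: trailing `lookback`-day return vs. next-day return."""
    features, targets = [], []
    for t in range(lookback, len(prices) - 1):
        features.append(prices[t] / prices[t - lookback] - 1)
        targets.append(prices[t + 1] / prices[t] - 1)
    return features, targets


def fit_ols(x, y):
    """Model: fit y = intercept + slope * x by ordinary least squares."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def backtest(x, y, intercept, slope):
    """Strategy: hold the asset when the predicted return is positive, else stay flat."""
    return [(1 if intercept + slope * xi > 0 else 0) * yi for xi, yi in zip(x, y)]


def run_workflow():
    """Run all stages, fitting on the first 70% of days and testing on the rest."""
    x, y = make_dataset(simulate_prices())
    split = int(len(x) * 0.7)
    intercept, slope = fit_ols(x[:split], y[:split])
    returns = backtest(x[split:], y[split:], intercept, slope)
    total = 1.0
    for r in returns:
        total *= 1 + r  # compound the daily strategy returns
    return {
        "n_test_days": len(returns),
        "total_return": total - 1,
        "mean_daily_return": statistics.fmean(returns),
    }


if __name__ == "__main__":
    print(run_workflow())
```

The model is fitted on the earlier part of the sample and evaluated only on later, unseen days, which mirrors the separation between model tuning and backtesting in the workflow. A real application would replace every stage with something more substantial, including transaction costs and a proper backtesting engine.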
Once you decide to execute an algorithmic strategy in a real market, you will find yourself iterating over this workflow repeatedly to incorporate new information and a changing environment.

The second edition's emphasis on the ML4T workflow translates into a new chapter on strategy backtesting, a new appendix describing over 100 different alpha factors, and many new practical applications. We have also rewritten most of the existing content for clarity and readability.

The trading applications now use a broader range of data sources beyond daily US equity prices, including international stocks and ETFs. The book also demonstrates how to use ML for an intraday strategy with minute-frequency equity data. Furthermore, it extends the coverage of alternative data sources to include SEC filings for sentiment analysis and return forecasts, as well as satellite images to classify land use.

Another innovation of the second edition is to replicate several trading applications recently published in top journals:
• Chapter 18 demonstrates how to apply convolutional neural networks to time series converted to image format for return predictions based on Sezer and Ozbayoglu (2018).
• Chapter 20 shows how to extract risk factors conditioned on stock characteristics for asset pricing using autoencoders based on Autoencoder Asset Pricing Models by Shihao Gu, Bryan T. Kelly, and Dacheng Xiu (2019), and
• Chapter 21 shows how to create synthetic training data using generative adversarial networks based on Time-series Generative Adversarial Networks by Jinsung Yoon, Daniel Jarrett, and Mihaela van der Schaar (2019).

All applications now use the latest available (at the time of writing) software versions such as pandas 1.0 and TensorFlow 2.2. There is also a customized version of Zipline that makes it easy to include machine learning model predictions when designing a trading strategy.
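To make the idea of a "time series converted to image format" from the Chapter 18 item above more concrete, here is a small sketch of one way to turn a price series into a 2D grid that a convolutional network could consume. It is a simplified illustration, not necessarily the procedure used in the cited paper or in the chapter's notebooks. The choice of simple moving averages as rows, the per-row min-max scaling, and the function names are assumptions made for this example.

```python
def moving_average(series, window):
    """Simple moving average; entry i averages series[i : i + window]."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def series_to_image(series, windows, width):
    """Build a len(windows) x width grid from a 1D series.

    Row r holds the last `width` values of the moving average computed with
    windows[r], min-max scaled to [0, 1] so every row shares the same range.
    """
    image = []
    for window in windows:
        ma = moving_average(series, window)
        if len(ma) < width:
            raise ValueError("series too short for this window and width")
        row = ma[-width:]
        lo, hi = min(row), max(row)
        span = hi - lo
        # A constant row has no range to scale, so map it to zeros.
        image.append([(v - lo) / span if span > 0 else 0.0 for v in row])
    return image


if __name__ == "__main__":
    prices = [float(p) for p in range(1, 21)]  # hypothetical toy series
    for row in series_to_image(prices, windows=[2, 3, 4], width=5):
        print(row)
```

Each row of the grid is one indicator and each column one time step, so neighboring pixels are related along both axes, which is the property convolutional filters exploit.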
Installation, data sources and bug reports

The code examples rely on a wide range of Python libraries from the data science and finance domains. You do not need to install all libraries at once; doing so increases the likelihood of encountering version conflicts. Instead, we recommend that you install the libraries required for a specific chapter as you go along.

> Update March 2022: , , , and are now avai…