Machine Learning for Trading: Overview.
Artificial Intelligence (AI) and Machine Learning (ML) are revolutionizing nearly every area of our lives. Did you know that machine learning is becoming increasingly important in trading?
You may be surprised to learn that machine learning hedge funds have already significantly outperformed generalized hedge funds, as well as traditional quant funds, according to a report by ValueWalk. ML and AI systems can be incredibly useful tools for humans navigating the decision-making process around investments and risk assessment.
The impact of human emotions on trading decisions is often the greatest hindrance to outperformance. Algorithms and computers make decisions and execute trades faster than any human can, and they do so free from the influence of emotions.
There are several different types of algorithmic trading. Some examples are as follows:
Trade execution algorithms, which break trades up into smaller orders to minimize the impact on the stock price. An example is a Volume Weighted Average Price (VWAP) strategy.
Strategy implementation algorithms, which make trades based on signals from real-time market data. Examples are trend-based strategies that involve moving averages, channel breakouts, price level movements, and other technical indicators.
Stealth/gaming algorithms, which are geared towards detecting and taking advantage of price movements caused by large trades and/or other algorithmic strategies.
Arbitrage opportunities. An example would be a stock that trades on two separate markets at two different prices, where the difference can be captured by selling the higher-priced shares and buying the lower-priced shares.
When algorithmic trading strategies were first introduced, they were wildly profitable and swiftly gained market share. In May 2017, the market research firm Tabb Group said that high-frequency trading (HFT) accounted for 52% of average daily trading volume. But as competition has increased, profits have declined. In this increasingly difficult environment, traders need a new tool to give them a competitive advantage and increase profits. The good news is that the tool is here now: machine learning.
Machine learning involves feeding an algorithm data samples, usually derived from historical prices. The data samples consist of variables called predictors, as well as a target variable, which is the expected outcome. The algorithm learns to use the predictor variables to predict the target variable.
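To make the idea of predictors and a target concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the predictors are the last few one-bar returns and the target is the return of the following bar; the function name make_samples and the price list are hypothetical.

```python
import numpy as np

def make_samples(prices, n_lags=3):
    """Build (predictors, target) samples from a price series.

    Predictors: the last n_lags one-bar returns (an illustrative choice).
    Target: the return of the following bar.
    """
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1] - 1.0
    X, y = [], []
    for t in range(n_lags, len(returns)):
        X.append(returns[t - n_lags:t])  # predictors: returns known before bar t
        y.append(returns[t])             # target: the return that follows
    return np.array(X), np.array(y)

prices = [100, 101, 102, 101, 103, 104, 102]  # hypothetical closing prices
X, y = make_samples(prices, n_lags=3)
print(X.shape, y.shape)
```

Each row of X is one sample, and the matching entry of y is what a learning algorithm would be trained to predict from it.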
Machine learning offers a number of important advantages over traditional algorithmic programs. The process can accelerate the search for effective algorithmic trading strategies by automating what is often a tedious, manual process. It also increases the number of markets an individual can monitor and respond to. Most importantly, it offers the ability to move from finding associations based on historical data to identifying and adapting to trends as they develop. If you can automate a process that others are performing manually, you have a competitive advantage. If you can increase the number of markets you are in, you have more opportunities. And in the zero-sum world of trading, if you can adapt to changes in real time while others are standing still, your advantage will translate into profits.
There are multiple strategies that use machine learning to optimize algorithms, including linear regression, neural networks, deep learning, support vector machines, and naive Bayes, to name a few. Well-known funds such as Citadel, Renaissance Technologies, Bridgewater Associates, and Two Sigma Investments are pursuing machine learning strategies as part of their investment approach. At Sigmoidal, we have the experience and know-how to help traders incorporate ML into their own trading strategies.
Our case study.
In one of our projects, we designed an intelligent asset allocation system that used Deep Learning and Modern Portfolio Theory. The task was to implement an investment strategy that could adapt to rapid changes in the market environment.
The base AI model was responsible for predicting asset returns from historical data. This was accomplished with Long Short-Term Memory (LSTM) units, a sophisticated generalization of the recurrent neural network. This architecture can store information over multiple timesteps by means of a memory cell, a property that enables the model to learn long and complicated temporal patterns in data. As a result, we were able to predict future asset returns, as well as the uncertainty of our estimates, using a novel technique called Variational Dropout.
To strengthen our predictions, we fed the model a wealth of market data, such as currencies and indices, in addition to the historical returns of the relevant assets. This resulted in over 400 features that we used to make final predictions. Of course, many of these features were correlated. This problem was mitigated by Principal Component Analysis (PCA), which reduces the dimensionality of the problem and decorrelates the features.
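The decorrelating effect of PCA can be illustrated with a short Python sketch. This is not the project's code: the data is synthetic, the function name pca_transform is hypothetical, and the projection is computed with a plain singular value decomposition.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project samples onto the top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 2))
# Four correlated features built from two underlying factors plus noise.
X = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.1 * rng.normal(size=500),
    base[:, 1],
    base[:, 0] - base[:, 1] + 0.1 * rng.normal(size=500),
])
Z = pca_transform(X, n_components=2)
corr = np.corrcoef(Z, rowvar=False)  # correlation matrix of the new features
print(np.round(corr, 6))
```

The four input columns are strongly correlated, while the two projected columns have an off-diagonal correlation of zero up to rounding error.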
We used the return and risk (uncertainty) predictions for all assets as inputs to a mean-variance optimization algorithm, which uses a quadratic solver to minimize risk for a given return. This method determines an asset allocation that is diversified and ensures the lowest possible level of risk given the return predictions.
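As a rough illustration of minimizing risk for a given return, the sketch below solves the simplest version of the problem in closed form. It assumes only two equality constraints (weights sum to one, portfolio return equals the target) and allows short positions, so it is a simplification of what a real quadratic solver with bounds would do. The function name and all numbers are hypothetical.

```python
import numpy as np

def min_variance_weights(mu, cov, target_return):
    """Weights minimizing w' cov w subject to w'mu = target_return, sum(w) = 1.

    Short positions are allowed here; a production optimizer would usually
    add bounds and use a quadratic-programming solver instead.
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n = len(mu)
    ones = np.ones(n)
    # KKT system of the equality-constrained quadratic program.
    kkt = np.zeros((n + 2, n + 2))
    kkt[:n, :n] = 2.0 * cov
    kkt[:n, n] = mu
    kkt[:n, n + 1] = ones
    kkt[n, :n] = mu
    kkt[n + 1, :n] = ones
    rhs = np.concatenate([np.zeros(n), [target_return, 1.0]])
    return np.linalg.solve(kkt, rhs)[:n]

mu = [0.05, 0.08, 0.12]             # hypothetical predicted returns
cov = np.diag([0.04, 0.09, 0.16])   # hypothetical predicted risk (variances)
w = min_variance_weights(mu, cov, target_return=0.08)
print(np.round(w, 4), round(float(w @ cov @ w), 4))
```

Holding only the second asset would also return 8%, but with a variance of 0.09; the optimized mix reaches the same return with less variance by diversifying.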
The combination of these models produced an investment strategy that generated an 8% annualized return, which was 23% higher than any other benchmark strategy tested over a two-year period. Contact us to learn more.
AI strategies outperform.
It is difficult to find performance data for AI strategies given their proprietary nature, but the hedge fund research firm Eurekahedge has published some informative data. The chart below shows the performance of the Eurekahedge AI / Machine Learning Hedge Fund Index versus traditional quant and hedge funds from 2018 to 2018. The index tracks 23 funds in total, of which 12 are still live.
Eurekahedge notes that:
"AI / Machine Learning hedge funds have outperformed both traditional quants and the average hedge fund since 2018, delivering annualized returns of 8.44% over this period, compared with 2.62%, 1.62% and 4.27% for CTAs, trend followers and the average global hedge fund, respectively."
Eurekahedge also provides the following table with the key takeaways:
Table 1: Performance in numbers - AI / Machine Learning Hedge Fund Index vs. quants and traditional hedge funds.
AI / Machine Learning hedge funds have outperformed the average global hedge fund in all years excluding 2018. Barring 2018 and 2018, returns for AI / Machine Learning hedge funds have outpaced those of traditional CTA / managed futures strategies, while underperforming systematic trend-following strategies only in 2018, when the latter realized strong gains from short energy futures. Over the five-, three- and two-year annualized periods, AI / Machine Learning hedge funds have outperformed both traditional quants and the average global hedge fund, delivering annualized gains of 7.35%, 9.57% and 10.56%, respectively. AI / Machine Learning hedge funds have also posted better risk-adjusted returns over the last two- and three-year annualized periods than all peers depicted in the table, with Sharpe ratios of 1.51 and 1.53, respectively. While returns have been more volatile than those of the average hedge fund (compare with the Eurekahedge Hedge Fund Index), AI / Machine Learning funds have posted considerably lower annualized volatilities than systematic trend-following strategies.
Eurekahedge also notes that AI / Machine Learning hedge funds are "negatively correlated to the average hedge fund (-0.267)" and have "zero to marginally positive correlation to CTA / managed futures and trend-following strategies", which points to potential diversification benefits from an AI strategy.
The data above illustrates the potential of using AI and machine learning in trading strategies. Fortunately, traders are still in the early stages of incorporating this powerful tool into their trading strategies, which means the opportunity remains relatively untapped and the potential significant.
Here is an example of an AI application in practice:
Imagine a system that can monitor stock prices in real time and predict stock price movements based on the news flow. That is exactly what AZFinText does. This article describes an experiment that used a Support Vector Machine (SVM) to trade the S&P 500 and produced excellent results. Below is a table showing how it performed relative to the top 10 quantitative mutual funds in the world:
Strategy using Google Trends.
Another experimental trading strategy used Google Trends as a variable. There is a plethora of articles on using Google Trends as a sentiment indicator for a market.
The experiment in this paper tracked changes in the search volume of a set of 98 search terms (some of them related to the stock market). The term "debt" turned out to be the strongest and most reliable indicator for predicting price movements in the DJIA.
Below is a cumulative performance chart. The red line represents a "buy and hold" strategy. The Google Trends strategy (blue line) massively outperformed it, with a return of 326%.
Can I learn ML myself?
Applying machine learning to trading is a vast and complicated topic that takes time to master. But if you are interested, as a starting point we recommend:
Once you are familiar with these materials, there is a popular Udacity course on how to apply the basics of machine learning to market trading.
If you want to speed up the learning process, you can hire a consultant. Make sure to ask tough questions before starting a project.
Or you can schedule a short call with us to explore what can be done.
I need more specific examples applicable to my industry.
By incorporating machine learning into your trading strategies, your portfolio can capture more alpha. But implementing a successful ML investment strategy is difficult: you will need extraordinary, talented people with experience in trading and data science to get there. Let us help you get started.
Better Strategies 4: Machine Learning.
Deep Blue was the first computer to win a game against a reigning world chess champion. That was in 1996, and it took 20 years until another program, AlphaGo, could defeat the best human Go player. Deep Blue was a model-based system with hardwired chess rules. AlphaGo is a data mining system, a deep neural network trained with thousands of Go games. Not improved hardware, but a breakthrough in software was essential for the step from beating top chess players to beating top Go players.
In this 4th part of the mini-series, we will look into the data mining approach to developing trading strategies. This method does not care about market mechanisms. It just scans price curves or other data sources for predictive patterns. Machine learning or "Artificial Intelligence" is not always involved in data mining strategies. In fact, the most popular, and surprisingly profitable, data mining method works without any fancy neural networks or support vector machines.
Machine learning principles.
A learning algorithm is fed with data samples, normally derived in some way from historical prices. Each sample consists of n variables x_1 .. x_n, commonly named predictors, features, signals, or simply input. These predictors can be the price returns of the last n bars, or a collection of classical indicators, or any other imaginable function of the price curve (I've even seen the pixels of a price chart image used as predictors for a neural network!). Each sample also normally includes a target variable y, like the return of the next trade after taking the sample, or the next price movement. In the literature you can also find y named label or objective. In a training process, the algorithm learns to predict the target y from the predictors x_1 .. x_n. The learned 'memory' is stored in a data structure named model that is specific to the algorithm (not to be confused with a financial model for model-based strategies!). A machine learning model can be a function with prediction rules in C code, generated by the training process. Or it can be a set of connection weights of a neural network.
The predictors, features, or whatever you call them, must carry enough information to predict the target y with some accuracy. They must also often fulfill two formal requirements. First, all predictor values should be in the same range, like -1 .. +1 (for most R algorithms) or -100 .. +100 (for Zorro or TSSB algorithms). So you need to normalize them in some way before sending them to the machine. Second, the samples should be balanced, i.e. equally distributed over all values of the target variable. So there should be about as many winning as losing samples. If you do not observe these two requirements, you'll wonder why you're getting bad results from the machine learning algorithm.
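Both requirements can be sketched in a few lines of Python. The helper names scale_to_range and balance_classes are hypothetical, min-max scaling is only one of several possible normalizations, and undersampling the larger class is just one way to balance samples.

```python
import numpy as np

def scale_to_range(X, lo=-1.0, hi=1.0):
    """Linearly rescale every predictor column to [lo, hi]."""
    X = np.asarray(X, dtype=float)
    xmin, xmax = X.min(axis=0), X.max(axis=0)
    span = np.where(xmax > xmin, xmax - xmin, 1.0)  # guard constant columns
    return lo + (X - xmin) / span * (hi - lo)

def balance_classes(X, y, seed=0):
    """Undersample the larger class so wins and losses are equally frequent."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    win_idx = np.flatnonzero(y > 0)
    loss_idx = np.flatnonzero(y <= 0)
    k = min(len(win_idx), len(loss_idx))
    keep = np.concatenate([rng.choice(win_idx, k, replace=False),
                           rng.choice(loss_idx, k, replace=False)])
    keep.sort()  # preserve the original time order
    return X[keep], y[keep]

X = [[1, 10], [2, 20], [3, 40], [4, 30], [5, 50]]  # hypothetical predictors
y = [1, 1, 1, -1, 1]                               # +1 = win, -1 = loss
Xs = scale_to_range(X)
Xb, yb = balance_classes(Xs, y)
print(Xs.min(), Xs.max(), yb)
```

In a real strategy the scaling limits would be taken from the training period only and then reused on new data, so that no future information leaks into the predictors.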
Regression algorithms predict a numeric value, like the magnitude and sign of the next price move. Classification algorithms predict a qualitative sample class, for instance whether it precedes a win or a loss. Some algorithms, such as neural networks, decision trees, or support vector machines, can be run in both modes.
A few algorithms learn to divide samples into classes without needing any target y. That's unsupervised learning, as opposed to supervised learning using a target. Somewhere in between is reinforcement learning, where the system trains itself by running simulations with the given features and using the outcome as the training target. AlphaZero, the successor of AlphaGo, used reinforcement learning by playing millions of Go games against itself. In finance there are few applications for unsupervised or reinforcement learning. 99% of machine learning strategies use supervised learning.
Whatever signals we're using for predictors in finance, they will most likely contain much noise and little information, and will be nonstationary on top of it. Therefore financial prediction is one of the hardest tasks in machine learning. More complex algorithms do not necessarily achieve better results. The selection of the predictors is critical to success. It is no good to use lots of predictors, since this simply causes overfitting and failure in out-of-sample operation. Therefore data mining strategies often apply a preselection algorithm that determines a small number of predictors out of a pool of many. The preselection can be based on correlation between predictors, on significance, on information content, or simply on prediction success with a test set. Practical experiments with feature selection can be found in a recent article on the Robot Wealth blog.
Here is a list of the most popular data mining methods used in finance.
1. Indicator soup.
Most trading systems we're programming for clients are not based on a financial model. The client just wanted trade signals from certain technical indicators, filtered with other technical indicators in combination with more technical indicators. When asked how this hodgepodge of indicators could be a profitable strategy, he normally answered: "Trust me. I'm trading it manually, and it works."
It did indeed. At least sometimes. Although most of those systems did not pass a walk-forward analysis (WFA) test, and some not even a simple backtest, a surprisingly large number did. And those were also often profitable in real trading. The client had systematically experimented with technical indicators until he found a combination that worked in live trading with certain assets. This way of trial-and-error technical analysis is a classical data mining approach, just executed by a human and not by a machine. I cannot really recommend this method, and a lot of luck, not to speak of money, is probably involved, but I can testify that it sometimes leads to profitable systems.
2. Candle patterns.
Not to be confused with those Japanese candle patterns that had their best-before date long, long ago. The modern equivalent is price action trading. You're still looking at the open, high, low, and close of candles. You're still hoping to find a pattern that predicts a price direction. But you're now data mining contemporary price curves for collecting those patterns. There are software packages for that purpose. They search for patterns that are profitable by some user-defined criterion, and use them to build a specific pattern detection function. It could look like this one (from Zorro's pattern analyzer):
This C function returns 1 when the signals match one of the patterns, otherwise 0. You can see from the lengthy code that this is not the fastest way to detect patterns. A better method, used by Zorro when the detection function does not need to be exported, is sorting the signals by their magnitude and checking the sort order. An example of such a system can be found here.
Can price action trading really work? Just like the indicator soup, it's not based on any rational financial model. One can at best imagine that sequences of price movements cause market participants to react in a certain way, thereby establishing a temporary predictive pattern. However, the number of patterns is quite limited when you only look at sequences of a few adjacent candles. The next step is comparing candles that are not adjacent, but arbitrarily selected within a longer time period. This way you're getting an almost unlimited number of patterns, but at the cost of finally leaving the realm of the rational. It is hard to imagine how a price move can be predicted by some candle patterns from weeks ago.
Still, a lot of effort is going into that. A fellow blogger, Daniel Fernandez, runs a subscription website (Asirikuy) specialized in data mining candle patterns. He refined pattern trading down to the smallest details, and if anyone would ever achieve any profit this way, it would be him. But to his subscribers' disappointment, trading his patterns live (QuriQuant) produced very different results than his wonderful backtests. If profitable price action systems really exist, apparently no one has found them yet.
3. Linear regression.
The simple basis of many complex machine learning algorithms: predict the target variable y by a linear combination of the predictors x_1 .. x_n.
$$ y = a_0 + a_1 x_1 + \dots + a_n x_n $$
The coefficients a_n are the model. They are calculated to minimize the sum of squared differences between the true values y_i from the training samples and their predictions \hat{y}_i from the formula above:
$$ \min \sum_i (y_i - \hat{y}_i)^2 $$
For normally distributed samples, the minimization is possible with some matrix arithmetic, so no iterations are required. In the case n = 1, with only one predictor variable x, the regression formula reduces to
$$ y = a + b x $$
which is simple linear regression, as opposed to multivariate linear regression where n > 1. Simple linear regression is available in most trading platforms, for instance with the LinReg indicator in TA-Lib. With y = price and x = time it's often used as an alternative to a moving average. Multivariate linear regression is available in the R platform through the lm(..) function that comes with the standard installation. A variant is polynomial regression. Like simple regression it uses only one predictor variable x, but also its square and higher powers, so that x_n == x^n:
$$ y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n $$
With n = 2 or n = 3, polynomial regression is often used to predict the next average price from the smoothed prices of the last bars. The polyfit function of MatLab, R, Zorro, and many other platforms can be used for polynomial regression.
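As a small illustration, the following Python sketch uses NumPy's polyfit, standing in for the platforms named above, to fit a degree-2 polynomial to the last bars and extrapolate one bar ahead. The prices are hypothetical and were chosen to lie exactly on the curve 10 + 0.3 t + 0.1 t^2 so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical smoothed prices of the last 6 bars (exactly 10 + 0.3 t + 0.1 t^2).
prices = np.array([10.0, 10.4, 11.0, 11.8, 12.8, 14.0])
t = np.arange(len(prices))

coeffs = np.polyfit(t, prices, deg=2)          # coefficients, highest power first
next_price = np.polyval(coeffs, len(prices))   # extrapolate to the next bar, t = 6
print(np.round(coeffs, 4), round(float(next_price), 4))
```

With real, noisy prices the fit will not be exact, and extrapolating a polynomial more than a bar or two ahead quickly becomes unreliable.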
4. Perceptron.
Often referred to as a neural network with only one neuron. In fact a perceptron is a regression function like above, but with a binary result, thus called logistic regression. It's not regression though, it's a classification algorithm. Zorro's advise(PERCEPTRON, ...) function generates C code that returns either 100 or -100, depending on whether the predicted result is above a threshold or not:
You can see that the sig array is equivalent to the features x_n in the regression formula, and the numeric factors are the coefficients a_n.
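The idea of a weighted sum of signals compared against a threshold can be sketched in Python as follows. This uses the classic perceptron update rule on toy data, which is an assumption made for illustration and not necessarily how Zorro trains its coefficients; only the 100 / -100 output convention is taken from the text.

```python
import numpy as np

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron rule; y must contain +1 / -1 labels."""
    X = np.asarray(X, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified: nudge the boundary
                w += lr * yi * xi
                b += lr * yi
    return w, b

def predict(sig, w, b):
    """Return 100 or -100 depending on which side of the threshold sig falls."""
    return 100 if np.dot(sig, w) + b > 0 else -100

# Toy, linearly separable samples: the label is the sign of the first signal.
X = [[0.5, 0.1], [0.8, -0.3], [-0.4, 0.2], [-0.9, -0.1]]
y = [1, 1, -1, -1]
w, b = train_perceptron(X, y)
print([predict(x, w, b) for x in X])
```

The learned w plays the role of the numeric factors and sig the role of the features; on data that is not linearly separable this simple rule never settles, which is one motivation for the methods that follow.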
5. Neural networks.
Linear or logistic regression can only solve linear problems. Many do not fall into this category; a famous example is predicting the output of a simple XOR function. And most likely also predicting prices or trade returns. An artificial neural network (ANN) can tackle nonlinear problems. It's a bunch of perceptrons that are connected together in an array of layers. Any perceptron is a neuron of the net. Its output goes to the inputs of all neurons of the next layer, like this:
Like the perceptron, a neural network also learns by determining the coefficients that minimize the error between sample prediction and sample target. But this now requires an approximation process, normally by backpropagating the error from the output to the inputs, optimizing the weights on the way. This process imposes two restrictions. First, the neuron outputs must now be continuously differentiable functions instead of the simple perceptron threshold. Second, the network must not be too deep; it must not have too many 'hidden layers' of neurons between inputs and output. This second restriction limits the complexity of problems that a standard neural network can solve.
When using a neural network for predicting trades, you have a lot of parameters with which you can play around and, if you're not careful, produce a lot of selection bias:
Number of hidden layers.
Number of neurons per hidden layer.
Number of backpropagation cycles, named epochs.
Learning rate, the step width of an epoch.
Momentum, an inertia factor for the adaptation of the weights.
Activation function.
The activation function emulates the perceptron threshold. For backpropagation you need a continuously differentiable function that generates a 'soft' step at a certain x value. Normally a sigmoid, tanh, or softmax function is used. Sometimes it's also a linear function that just returns the weighted sum of all inputs. In this case the network can be used for regression, for predicting a numeric value instead of a binary outcome.
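To show how a sigmoid 'soft step' and one hidden layer together handle the XOR problem mentioned earlier, here is a tiny Python sketch. The weights are hand-picked for illustration rather than learned by backpropagation, and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Soft step: close to 0 for very negative z, close to 1 for very positive z."""
    return 1.0 / (1.0 + np.exp(-z))

def xor_net(x1, x2):
    """2-2-1 network with hand-picked (not trained) weights."""
    h_or = sigmoid(20 * x1 + 20 * x2 - 10)    # fires if at least one input is 1
    h_and = sigmoid(20 * x1 + 20 * x2 - 30)   # fires only if both inputs are 1
    return sigmoid(20 * h_or - 20 * h_and - 10)  # "OR but not AND"

outputs = {(a, b): round(float(xor_net(a, b))) for a in (0, 1) for b in (0, 1)}
print(outputs)
```

No single neuron can produce this mapping, because the two classes of XOR cannot be separated by one straight line; the hidden layer provides the extra nonlinearity.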
Neural networks are available in the standard R installation (nnet, a single hidden layer network) and in many packages, for instance RSNNS and FCNN4R.
6. Deep learning.
Deep learning methods use neural networks with many hidden layers and thousands of neurons, which cannot be trained effectively by conventional backpropagation. Several methods for training such huge networks have become popular in recent years. They usually pre-train the hidden neuron layers to achieve a more effective learning process. A Restricted Boltzmann Machine (RBM) is an unsupervised classification algorithm with a special network structure that has no connections between the hidden neurons. A Sparse Autoencoder (SAE) uses a conventional network structure, but pre-trains the hidden layers in a clever way by reproducing the input signals on the layer outputs with as few active connections as possible. Those methods allow very complex networks for tackling very complex learning tasks. Such as beating the world's best human Go player.
Deep learning networks are available in the deepnet and darch R packages. Deepnet provides an autoencoder, Darch a restricted Boltzmann machine. I have not yet experimented with Darch, but here's an example R script using the Deepnet autoencoder with 3 hidden layers for trade signals through Zorro's neural() function:
7. Support vector machines.
Like a neural network, a support vector machine (SVM) is another extension of linear regression. When we look at the regression formula again,
$$ y = a_0 + a_1 x_1 + \dots + a_n x_n $$
we can interpret the features x_n as coordinates of an n-dimensional feature space. Setting the target variable y to a fixed value determines a plane in that space, called a hyperplane since it has more than two (in fact, n-1) dimensions. The hyperplane separates the samples with y > 0 from the samples with y < 0. The coefficients a_n can be calculated in such a way that the distances of the plane to the nearest samples, which are called the 'support vectors' of the plane, hence the algorithm name, are maximal. This way we have a binary classifier with optimal separation of winning and losing samples.
The problem: normally those samples are not linearly separable; they are scattered around irregularly in the feature space. No flat plane can be squeezed between winners and losers. If it could, we would have simpler methods to calculate that plane, for instance linear discriminant analysis. But for the common case we need the SVM trick: adding more dimensions to the feature space. For this the SVM algorithm produces more features with a kernel function that combines any two existing predictors into a new feature. This is analogous to the step above from simple regression to polynomial regression, where more features are also added by taking the sole predictor to the n-th power. The more dimensions you add, the easier it is to separate the samples with a flat hyperplane. This plane is then transformed back to the original n-dimensional space, getting wrinkled and crumpled on the way. By clever selection of the kernel function, the process can be performed without actually computing the transformation.
Like neural networks, SVMs can be used not only for classification, but also for regression. They also offer some parameters for optimizing and possibly overfitting the prediction process:
Kernel function. You normally use an RBF kernel (radial basis function, a symmetric kernel), but you also have the choice of other kernels, such as sigmoid, polynomial, and linear.
Gamma, the width of the RBF kernel.
Cost parameter C, the 'penalty' for wrong classifications in the training samples.
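To give the gamma parameter a concrete meaning, here is a minimal Python sketch of the RBF kernel in the form used by libsvm, k(x, y) = exp(-gamma * |x - y|^2). The function name and the sample points are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: 1 for identical points, decaying towards 0 with distance."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * diff @ diff))

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])            # identical feature sets
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.5)  # squared distance 25
print(k_same, k_far)
```

A larger gamma makes the kernel narrower, so each training sample influences only its immediate neighborhood, which tends to increase the risk of overfitting.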
An often used SVM is the libsvm library. It's also available in R in the e1071 package. In the next and final part of this series I plan to describe a trading strategy using this SVM.
8. K-nearest neighbor.
Compared with the heavy ANN and SVM stuff, this is a nice simple algorithm with a unique property: it needs no training. So the samples are the model. You could use this algorithm for a trading system that learns permanently by simply adding more and more samples. The nearest neighbor algorithm computes the distances in feature space from the current feature values to the k nearest samples. A distance in n-dimensional space between two feature sets (x_1 .. x_n) and (y_1 .. y_n) is calculated just as in 2 dimensions:
$$ d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} $$
The algorithm simply predicts the target from the average of the k target variables of the nearest samples, weighted by their inverse distances. It can be used for classification as well as for regression. Software tricks borrowed from computer graphics, such as an adaptive binary tree (ABT), can make the nearest neighbor search pretty fast. In my past life as a computer game programmer, we used such methods in games for tasks like self-learning enemy intelligence. You can call the knn function in R for nearest neighbor prediction, or write a simple function in C for that purpose.
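The prediction rule just described can be sketched in Python as follows. This is a brute-force version without any tree-based speedup; the function name, the small eps guard against division by zero, and the toy samples are assumptions made for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Predict the target of x as the inverse-distance-weighted mean
    of the targets of its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    # Euclidean distance from x to every stored sample.
    dist = np.sqrt(((X_train - np.asarray(x, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + eps)  # eps avoids division by zero
    return float(weights @ y_train[nearest] / weights.sum())

X_train = [[0, 0], [1, 0], [0, 1], [5, 5]]  # hypothetical feature sets
y_train = [1, 1, -1, -1]                    # +1 = win, -1 = loss
pred = knn_predict(X_train, y_train, [0.1, 0.1], k=3)
print(round(pred, 3))
```

A positive prediction leans towards a win. 'Learning' a new sample amounts to appending it to X_train and y_train, which is why no separate training step is needed.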
9. K-means.
This is an approximation algorithm for unsupervised classification. It has some similarity, not only in its name, to k-nearest neighbor. For classifying the samples, the algorithm first places k random points in the feature space. Then it assigns to each of those points all the samples with the smallest distances to it. The point is then moved to the mean of these nearest samples. This will generate a new sample assignment, since some samples are now closer to another point. The process is repeated until the assignment does not change anymore by moving the points, i.e. each point lies exactly at the mean of its nearest samples. We now have k classes of samples, each in the neighborhood of one of the k points.
This simple algorithm can produce surprisingly good results. In R, the kmeans function does the trick. An example of the k-means algorithm for classifying candle patterns can be found here: Unsupervised candlestick classification for fun and profit.
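The iteration described above can be sketched in a few lines of Python. One deviation is assumed for simplicity: the k starting points are drawn from the samples themselves rather than placed anywhere in the feature space. The function name and the toy data are hypothetical.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate between assigning samples and moving centers."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct samples (a simplification of "k random points").
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assign every sample to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # the assignment no longer changes
        labels = new_labels
        # Move every center to the mean of its assigned samples.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]  # two obvious groups
labels, centers = kmeans(X, k=2)
print(labels)
```

Which group receives label 0 and which label 1 depends on the random start, so only the grouping itself is meaningful.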
10. Naive Bayes.
This algorithm uses Bayes' Theorem for classifying samples of non-numeric features (i.e. events), such as the candle patterns mentioned above. Suppose that an event X (for instance, that the Open of the previous bar is below the Open of the current bar) appears in 80% of all winning samples. What is then the probability that a sample is winning when it contains event X? It's not 0.8 as you might think. The probability can be calculated with Bayes' Theorem:
$$ P(Y|X) = \frac{P(X|Y) \, P(Y)}{P(X)} $$
P(Y|X) is the probability that event Y (for instance, winning) occurs in all samples containing event X (in our example, Open(1) < Open(0)). According to the formula, it is equal to the probability of X occurring in all winning samples (here, 0.8), multiplied by the probability of Y in all samples (around 0.5 if you followed my advice above about balanced samples), and divided by the probability of X in all samples.
If we are naive and assume that all events X are independent of each other, we can calculate the overall probability that a sample is winning by simply multiplying the probabilities P(X|winning) for every event X. This way we end up with this formula:
$$P(Y \mid X_1, \dots, X_n) = s \cdot P(X_1 \mid Y) \cdot P(X_2 \mid Y) \cdots P(X_n \mid Y)$$
with a scaling factor s. For the formula to work, the features should be selected so that they are as independent as possible, which imposes an obstacle for using Naive Bayes in trading. For instance, the two events Close(1) < Close(0) and Open(1) < Open(0) are most likely not independent of each other. Numerical predictors can be converted to events by dividing the number into separate ranges.
The Naive Bayes algorithm is available in the ubiquitous e1071 R package.
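The multiplication of per-event probabilities can be sketched as below. This is not the e1071 implementation: the function names, the event names, and the toy samples are hypothetical, Laplace smoothing is an added assumption to avoid zero probabilities, and only the events present in a sample are multiplied.

```python
from collections import Counter

def train_naive_bayes(samples, outcomes):
    """samples: list of sets of event names; outcomes: list of bools
    (True = win). Returns per-class priors and event probabilities."""
    model = {}
    events = set().union(*samples)
    for cls in (True, False):
        rows = [s for s, o in zip(samples, outcomes) if o == cls]
        counts = Counter(e for s in rows for e in s)
        # Laplace smoothing (an added assumption) avoids zero probabilities
        probs = {e: (counts[e] + 1) / (len(rows) + 2) for e in events}
        model[cls] = (len(rows) / len(samples), probs)
    return model

def predict_win_probability(model, sample):
    """Multiply P(X|class) over all observed events, then normalize;
    the normalization plays the role of the scaling factor s."""
    scores = {}
    for cls, (prior, probs) in model.items():
        score = prior
        for e in sample:
            score *= probs.get(e, 0.5)   # unknown events count as neutral
        scores[cls] = score
    return scores[True] / (scores[True] + scores[False])

# Hypothetical toy data: each sample is the set of events it contains
samples = [{"open_up", "close_up"}, {"open_up"}, {"open_up", "close_up"},
           {"close_up"}, set(), {"open_up"}]
outcomes = [True, True, True, False, False, False]
model = train_naive_bayes(samples, outcomes)
p = predict_win_probability(model, {"open_up", "close_up"})
print(p)
```

Normalizing the two class scores so that they sum to 1 is one simple way to obtain the scaling factor without computing the probability of the event combination directly.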
11. Decision and regression trees.
Those trees predict an outcome or a numeric value based on a series of yes/no decisions, in a structure like the branches of a tree. Any decision is either the presence of an event or not (in the case of non-numerical features) or a comparison of a feature value with a fixed threshold. A typical tree function, generated by Zorro's tree builder, looks like this:
How is such a tree produced from a set of samples? There are several methods; Zorro uses the Shannon information entropy, which already had an appearance on this blog in the Scalping article. At first it checks one of the features, let's say x1. It places a hyperplane with the plane formula x1 = t into the feature space. This hyperplane separates the samples with x1 > t from the samples with x1 < t. The dividing threshold t is selected so that the information gain (the difference between the information entropy of the whole space and the sum of the information entropies of the two divided sub-spaces) is at maximum. This is the case when the samples in the subspaces are more similar to each other than the samples in the whole space.
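The threshold search for a single feature can be sketched as follows. This is an illustration, not Zorro's tree builder: the function names and toy data are hypothetical, and the sub-space entropies are weighted by their share of the samples, which is the usual convention and an assumption here.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def best_threshold(values, labels):
    """Find the split threshold t on one feature that maximizes the
    information gain. Returns (t, gain)."""
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    xs = sorted(set(values))
    # Candidate thresholds: midpoints between adjacent distinct values
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        # Sub-space entropies, weighted by their share of the samples
        split = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(labels)
        if base - split > best_gain:
            best_t, best_gain = t, base - split
    return best_t, best_gain

# Hypothetical toy feature: low values lose (0), high values win (1)
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
t, gain = best_threshold(values, labels)
print(t, gain)
```

A full tree builder would apply this search recursively to each sub-space and each feature, then prune the result.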
This process is then repeated with the next feature x2 and two hyperplanes splitting the two subspaces. Each split is equivalent to a comparison of a feature with a threshold. By repeated splitting, we soon get a huge tree with thousands of threshold comparisons. Then the process is run backwards by pruning the tree and removing all decisions that do not lead to substantial information gain. Finally we end up with a relatively small tree as in the code above.
Decision trees have a wide range of applications. They can produce excellent predictions superior to those of neural networks or support vector machines. But they are not a one-fits-all solution, since their splitting planes are always parallel to the axes of the feature space. This somewhat limits their predictions. They can be used not only for classification, but also for regression, for instance by returning the percentage of samples contributing to a certain branch of the tree. Zorro's tree is a regression tree. The best known classification tree algorithm is C5.0, available in the C50 package for R.
For improving the prediction even further or overcoming the parallel-axis limitation, an ensemble of trees can be used, called a random forest. The prediction is then generated by averaging or voting the predictions from the single trees. Random forests are available in the R packages randomForest, ranger and Rborist.
Conclusion.
There are many different data mining and machine learning methods at your disposal. The critical question: what is better, a model-based or a machine learning strategy? There is no doubt that machine learning has a lot of advantages. You don't need to care about market microstructure, economy, trader psychology, or similar soft stuff. You can concentrate on pure mathematics. Machine learning is a much more elegant, more attractive way to generate trade systems. It has all advantages on its side but one. Despite all the enthusiastic threads on trader forums, it tends to mysteriously fail in live trading.
Every second week a new paper about trading with machine learning methods is published (a few can be found below). Take all those publications with a grain of salt. According to some papers, fantastic win rates in the range of 70%, 80%, or even 85% have been achieved. Although win rate is not the only relevant criterion (you can lose even with a high win rate), 85% accuracy in predicting trades is normally equivalent to a profit factor above 5. With such a system the involved scientists should be billionaires by now. Unfortunately I never managed to reproduce those win rates with the described method, and didn't even come close. So maybe a lot of selection bias went into the results. Or maybe I'm just too stupid.
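The relation between win rate and profit factor is simple arithmetic. The sketch below assumes that the average win and the average loss have the same size, which is an assumption for illustration and not stated in the article; the function name is hypothetical.

```python
def profit_factor(win_rate, avg_win=1.0, avg_loss=1.0):
    """Gross profit divided by gross loss for a given win rate.
    Equal average win and loss sizes are an assumption."""
    return (win_rate * avg_win) / ((1.0 - win_rate) * avg_loss)

for rate in (0.70, 0.80, 0.85):
    print(f"win rate {rate:.0%}: profit factor {profit_factor(rate):.2f}")
```

With equal win and loss sizes, an 85% win rate gives 0.85 / 0.15, so the profit factor is indeed above 5.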
Compared with model-based strategies, I've not seen many successful machine learning systems so far. And from what one hears about the algorithmic methods used by successful hedge funds, machine learning still seems to be rarely used. But maybe this will change in the future with the availability of more processing power and the arrival of new algorithms for deep learning.
Classification using deep neural networks: Dixon et al. 2018
Price direction prediction using ANN & SVM: Kara et al. 2018
Empirical comparison of learning algorithms: Caruana et al. 2006
Mining stock market tendency using GA & SVM: Yu, Wang, Lai 2005
The next part of this series will deal with the practical development of a machine learning strategy.
30 thoughts on “Better Strategies 4: Machine Learning”
Nice post. There is a lot of potential in this approach towards the market.
Btw, are you using the code editor that comes with Zorro? How is it possible to get such a color configuration?
The colorful script is produced by WordPress. You can't change the colors in the Zorro editor, but you can replace it with other editors that support individual colors, for example Notepad++.
Is it then possible for Notepad++ to detect the Zorro variables in the scripts? I mean, is BarPeriod highlighted as it is in the Zorro editor?
Theoretically yes, but for this you would have to configure the syntax highlighting of Notepad++ and enter all variables in the list. As far as I know, Notepad++ also cannot be configured to display the function description in a window, as the Zorro editor does. There is no perfect tool…
I agree with the last paragraph. I have tried many machine learning techniques after reading various 'peer reviewed' papers. But reproducing their results remains elusive. When I live test with ML, I can't seem to outperform random entry.
ML fails in live trading? Maybe the training of the ML has to be done with price data that also includes the historical spread, roll, tick and so on?
I think reason #1 for live failure is data mining bias, caused by biased selection of inputs and parameters for the algo.
Thanks to the author for the great series of articles.
However, it should be noted that we don't need to narrow our view to predicting only the next price move. It may happen that the next move goes against our trade in 70% of cases, but it is still worth making the trade. This happens when the price finally does go in the right direction, but before that it may make some steps against us. If we delay the trade by one price step, we will not enter the mentioned 30% of trades, but in exchange we will improve the result of the remaining 70% by one price step. So the criterion is which value is higher: N*average_result or 0.7*N*(average_result + price_step).
Nice post. If you just want to play around with some machine learning, I implemented a very simple ML tool in Python and added a GUI. It’s implemented to predict time series.
Thanks JCL I found very interesting your article. I would like to ask you, from your expertise in trading, where can we download reliable historical forex data? I consider it very important due to the fact that Forex market is decentralized.
Thanks in advance!
There is no really reliable Forex data, since every Forex broker creates their own data. They all differ slightly dependent on which liquidity providers they use. FXCM has relatively good M1 and tick data with few gaps. You can download it with Zorro.
Thanks for writing such a great article series JCL… a thoroughly enjoyable read!
I have to say though that I don’t view model-based and machine learning strategies as being mutually exclusive; I have had some OOS success by using a combination of the elements you describe.
To be more exact, I begin the system generation process by developing a ‘traditional’ mathematical model, but then use a set of online machine learning algorithms to predict the next terms of the various different time series (not the price itself) that are used within the model. The actual trading rules are then derived from the interactions between these time series. So in essence I am not just blindly throwing recent market data into an ML model in an effort to predict price action direction, but instead develop a framework based upon sound investment principles in order to point the models in the right direction. I then data mine the parameters and measure the level of data-mining bias as you’ve described also.
It’s worth mentioning however that I’ve never had much success with Forex.
Anyway, best of luck with your trading and keep up the great articles!
Thanks for posting this great mini series JCL.
I recently studied a few of the latest papers about ML trading, deep learning especially. Yet I found that most of them evaluated the results without a risk-adjusted index, i.e., they usually used the ROC curve or PNL to support their experiment instead of the Sharpe Ratio, for example.
Also, they seldom mentioned the trading frequency in their experiment results, making it hard to evaluate the potential profitability of those methods. Why is that? Do you have any good suggestions to deal with those issues?
ML papers normally aim for high accuracy. Equity curve variance is of no interest. This is sort of justified because the ML prediction quality determines accuracy, not variance.
Of course, if you want to really trade such a system, variance and drawdown are important factors. A system with lower accuracy and worse prediction can in fact be preferable when it’s less dependent on market conditions.
“In fact the most popular – and surprisingly profitable – data mining method works without any fancy neural networks or support vector machines.”
Would you please name those most popular & surprisingly profitable ones. So I could directly use them.
I was referring to the Indicator Soup strategies. For obvious reasons I can’t disclose details of such a strategy, and have never developed such systems myself. We’re merely coding them. But I can tell that coming up with a profitable Indicator Soup requires a lot of work and time.
Well, I am just starting a project which uses simple EMAs to predict price. It just selects the correct EMAs based on past performance and algorithm selection, which gives it some rustic degree of intelligence.
Jonathan. orrego@gmail offers services as MT4 EA programmer.
There are following issues with ML and with trading systems in general which are based on historical data analysis:
1) Historical data doesn’t encode information about future price movements.
Future price movement is independent and not related to the price history. There is absolutely no reliable pattern which can be used to systematically extract profits from the market. Applying ML methods in this domain is simply pointless and doomed to failure and is not going to work if you search for a profitable system. Of course you can curve fit any past period and come up with a profitable system for it.
The only thing which determines price movement is demand and supply, and these are often the result of external factors which cannot be predicted. For example: a war breaks out somewhere, or another major disaster strikes, or someone just needs to buy a large amount of a foreign currency for some business/investment purpose. These sorts of events will cause significant shifts in the demand and supply structure of the FX market. As a consequence, prices begin to move, but nobody really cares about price history, just about the execution of the incoming orders. An automated trading system can only be profitable if it monitors a significant portion of the market and takes the supply and demand into account for making a trading decision. But this is not the case with any of the systems being discussed here.
2) Race to the bottom.
Even if (1) wouldn’t be true and there would be valuable information encoded in historical price data, you would still face following problem: there are thousands of gold diggers out there, all of them using similar methods and even the same tools to search for profitable systems and analyze the same historical price data. As a result, many of them will discover the same or very similar “profitable” trading systems and when they begin actually trading those systems, they will become less and less profitable due to the nature of the market.
The only sure winners in this scenario will be the technology and tool vendors.
I will be still keeping an eye on your posts as I like your approach and the scientific vigor you apply. Your blog is the best of its kind – keep the good work!
One hint: there are profitable automated systems, but they are not based on historical price data but on proprietary knowledge about the market structure and operations of the major institutions which control these markets. Let’s say there are many inefficiencies in the current system but you absolutely have no chance to find the information about those by analyzing historical price data. Instead you have to know when and how the institutions will execute market moving orders and front run them.
Thanks for the extensive comment. I often hear these arguments and they do sound intuitive; the only problem is that they are easily proven wrong. The scientific way is experiment, not intuition. Simple tests show that past and future prices are often correlated – otherwise every second experiment on this blog would have had a very different outcome. Many successful funds, for instance Jim Simons’ Renaissance fund, are mainly based on algorithmic prediction.
One more thing: in my comment I have been implicitly referring to the buy side (hedge funds, traders etc.), not to the sell side (market makers, banks). The latter always has the edge because they sell at the ask and buy at the bid, pocketing the spread as an additional profit to any strategy they might be running. Regarding Jim Simons’ Renaissance: I am not so sure that they have not transitioned over time to the sell side in order to stay profitable. There is absolutely no information available about the nature of their business besides the vague statement that they are using solely quantitative algorithmic trading models…
Thanks for the informative post!
Regarding the use of some of these algorithms, a common complaint which is cited is that financial data is non-stationary…Do you find this to be a problem? Couldn’t one just use returns data instead which is (I think) stationary?
Yes, this is a problem for sure. If financial data were stationary, we’d all be rich. I’m afraid we have to live with what it is. Returns are not any more stationary than other financial data.
Hello sir, I developed some set of rules for my trading which identifies supply demand zones than volume and all other criteria. Can you help me to make it into automated system ?? If i am gonna do that myself then it can take too much time. Please contact me at svadukia@gmail if you are interested.
Sure, please contact my employer at info@opgroup.de. They’ll help.
Technical analysis has always been rejected and looked down upon by quants, academics, or anyone who has been trained by traditional finance theories. I have worked for the proprietary trading desk of a first tier bank for a good part of my career, surrounded by those ivy-league elites with backgrounds in finance, math, or financial engineering. I must admit none of those guys knew how to trade directions. They were good at market making, product structures, index arb, but almost none could make money trading directions. Why? Because none of these guys believed in technical analysis. Then again, if you are already making your millions, why bother taking the risk of trading direction with your own money. For me, luckily, my years of training in technical analysis allowed me to really retire after being laid off in the great recession. I look only at EMA, slow stochastics, and MACD; and I have made money every year since I started in 2009. Technical analysis works, you just have to know how to use it!!
Better Strategies 5: A Short-Term Machine Learning System.
It’s time for the 5th and final part of the Build Better Strategies series. In part 3 we’ve discussed the development process of a model-based system, and consequently we’ll conclude the series with developing a data-mining system. The principles of data mining and machine learning have been the topic of part 4. For our short-term trading example we’ll use a deep learning algorithm, a stacked autoencoder, but it will work in the same way with many other machine learning algorithms. With today’s software tools, only about 20 lines of code are needed for a machine learning strategy. I’ll try to explain all steps in detail.
Our example will be a research project – a machine learning experiment for answering two questions. Does a more complex algorithm – such as, more neurons and deeper learning – produce a better prediction? And are short-term price moves predictable by short-term price history? The last question came up due to my scepticism about price action trading in the previous part of this series. I got several emails asking about the “trading system generators” or similar price action tools that are praised on some websites. There is no hard evidence that such tools ever produced any profit (except for their vendors) – but does this mean that they all are garbage? We’ll see.
Our experiment is simple: We collect information from the last candles of a price curve, feed it in a deep learning neural net, and use it to predict the next candles. My hypothesis is that a few candles don’t contain any useful predictive information. Of course, a nonpredictive outcome of the experiment won’t mean that I’m right, since I could have used wrong parameters or prepared the data badly. But a predictive outcome would be a hint that I’m wrong and price action trading can indeed be profitable.
Machine learning strategy development.
Step 1: The target variable.
To recap the previous part: a supervised learning algorithm is trained with a set of features in order to predict a target variable. So the first thing to determine is what this target variable shall be. A popular target, used in most papers, is the sign of the price return at the next bar. Better suited for prediction, since less susceptible to randomness, is the price difference to a more distant prediction horizon, like 3 bars from now, or same day next week. Like almost anything in trading systems, the prediction horizon is a compromise between the effects of randomness (fewer bars are worse) and predictability (fewer bars are better).
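A direction target with a configurable horizon could be derived from a price series as sketched below. The function name and the price values are hypothetical, and the list is assumed to be ordered from oldest to newest bar.

```python
def direction_targets(prices, horizon=3):
    """Sign of the price difference `horizon` bars ahead, for every bar
    that has enough future data: +1 up, -1 down, 0 unchanged."""
    targets = []
    for i in range(len(prices) - horizon):
        diff = prices[i + horizon] - prices[i]
        targets.append((diff > 0) - (diff < 0))
    return targets

# Hypothetical closing prices, oldest first
prices = [1.10, 1.11, 1.09, 1.12, 1.08, 1.08]
targets = direction_targets(prices, horizon=3)
print(targets)
```

The last `horizon` bars get no target because their future price is not yet known, which is why the result is shorter than the price list.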
Sometimes you’re not interested in directly predicting price, but in predicting some other parameter – such as the current leg of a Zigzag indicator – that could otherwise only be determined in hindsight. Or you want to know if a certain market inefficiency will be present in the near future, especially when you’re using machine learning not directly for trading, but for filtering trades in a model-based system. Or you want to predict something entirely different, for instance the probability of a market crash tomorrow. All this is often easier to predict than the popular tomorrow’s return.
In our price action experiment we’ll use the return of a short-term price action trade as target variable. Once the target is determined, next step is selecting the features.
Step 2: The features.
A price curve is the worst case for any machine learning algorithm. Not only does it carry little signal and mostly noise, it is also nonstationary and the signal/noise ratio changes all the time. The exact ratio of signal and noise depends on what is meant with “signal”, but it is normally too low for any known machine learning algorithm to produce anything useful. So we must derive features from the price curve that contain more signal and less noise. Signal, in that context, is any information that can be used to predict the target, whatever it is. All the rest is noise.
Thus, selecting the features is critical for success – much more critical than deciding which machine learning algorithm you’re going to use. There are two approaches for selecting features. The first and most common is extracting as much information from the price curve as possible. Since you do not know where the information is hidden, you just generate a wild collection of indicators with a wide range of parameters, and hope that at least a few of them will contain the information that the algorithm needs. This is the approach that you normally find in the literature. The problem of this method: Any machine learning algorithm is easily confused by nonpredictive predictors. So it won’t do to just throw 150 indicators at it. You need some preselection algorithm that determines which of them carry useful information and which can be omitted. Without reducing the features this way to maybe eight or ten, even the deepest learning algorithm won’t produce anything useful.
The other approach, normally for experiments and research, is using only limited information from the price curve. This is the case here: Since we want to examine price action trading, we only use the last few prices as inputs, and must discard all the rest of the curve. This has the advantage that we don’t need any preselection algorithm since the number of features is limited anyway. Here are the two simple predictor functions that we use in our experiment (in C):
The two functions are supposed to carry the necessary information for price action: per-bar movement and volatility. The change function is the difference of the current price to the price of n bars before, divided by the current price. The range function is the total high-low distance of the last n candles, also divided by the current price. And the scale function centers and compresses the values to the +/-100 range, so we divide them by 100 for getting them normalized to +/-1. We remember that normalizing is needed for machine learning algorithms.
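The two predictors can be approximated from their description as follows. This is a sketch, not the original C code: the function names, the list layout (oldest bar first, current bar last), and the sample candles are assumptions, and Zorro's scale function is left out because its exact behavior is not given here.

```python
def change(closes, n):
    """Difference of the current price to the price n bars before,
    divided by the current price. `closes` is ordered oldest to newest."""
    return (closes[-1] - closes[-1 - n]) / closes[-1]

def price_range(highs, lows, closes, n):
    """Total high-low distance of the last n candles,
    divided by the current price."""
    return (max(highs[-n:]) - min(lows[-n:])) / closes[-1]

# Hypothetical candles, oldest first
closes = [1.00, 1.02, 1.01, 1.04]
highs = [1.01, 1.03, 1.02, 1.05]
lows = [0.99, 1.00, 1.00, 1.02]
print(change(closes, 1), price_range(highs, lows, closes, 2))
```

Both values are relative to the current price, so they are comparable across assets with different price levels; the additional normalization to +/-1 would be applied on top of them.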
Step 3: Preselecting/preprocessing predictors.
When you have selected a large number of indicators or other signals as features for your algorithm, you must determine which of them is useful and which not. There are many methods for reducing the number of features, for instance:
Determine the correlations between the signals. Remove those with a strong correlation to other signals, since they do not contribute to the information.
Compare the information content of signals directly, with algorithms like information entropy or decision trees.
Determine the information content indirectly by comparing the signals with randomized signals; there are some software libraries for this, such as the R Boruta package.
Use an algorithm like Principal Components Analysis (PCA) for generating a new signal set with reduced dimensionality.
Use genetic optimization for determining the most important signals just by the most profitable results from the prediction process. Great for curve fitting if you want to publish impressive results in a research paper.
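The first method in the list, removing strongly correlated signals, can be sketched like this. The function names, the signal names, the values, and the 0.9 limit are all assumptions for illustration; a constant signal would need special handling because its variance is zero.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def drop_correlated(signals, limit=0.9):
    """Keep a signal only if its absolute correlation with every
    already kept signal stays below `limit`."""
    kept = {}
    for name, series in signals.items():
        if all(abs(pearson(series, other)) < limit for other in kept.values()):
            kept[name] = series
    return kept

# Hypothetical signals: the second is almost a multiple of the first
signals = {
    "rsi": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rsi_scaled": [2.0, 4.0, 6.0, 8.0, 10.5],
    "noise": [3.0, 1.0, 4.0, 1.0, 5.0],
}
kept = drop_correlated(signals)
print(list(kept))
```

The result depends on the order of the signals, since the first of two correlated signals is the one that survives.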
For our experiment we do not need to preselect or preprocess the features, but you can find useful information about this in articles (1), (2), and (3) listed at the end of the page.
Step 4: Select the machine learning algorithm.
R offers many different ML packages, and each of them offers many different algorithms with many different parameters. Even if you have already decided on the method (here, deep learning), you still have the choice among different approaches and different R packages. Most are quite new, and there is not much empirical information to help your decision. You have to try them all and gain experience with different methods. For our experiment we’ve chosen the Deepnet package, which is probably the simplest and easiest to use deep learning library. This keeps our code short. We’re using its Stacked Autoencoder (SAE) algorithm for pre-training the network. Deepnet also offers a Restricted Boltzmann Machine (RBM) for pre-training, but I could not get good results from it. There are other and more complex deep learning packages for R, so you can spend a lot of time checking out all of them.
How pre-training works is easily explained, but why it works is a different matter. To my knowledge, no one has yet come up with a solid mathematical proof that it works at all. Anyway, imagine a large neural net with many hidden layers:
Training the net means setting up the connection weights between the neurons. The usual method is error backpropagation. But it turns out that the more hidden layers you have, the worse it works. The backpropagated error terms get smaller and smaller from layer to layer, causing the first layers of the net to learn almost nothing. Which means that the predicted result becomes more and more dependent on the random initial state of the weights. This severely limited the complexity of layer-based neural nets and therefore the tasks that they can solve. At least until 10 years ago.
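The shrinking of the error terms can be made concrete with a deliberately simplified model. The sketch assumes a chain with one sigmoid neuron per layer, a connection weight of 1, and a neuron input of 0, where the sigmoid derivative reaches its maximum of 0.25; all of these are assumptions chosen for illustration.

```python
import math

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)          # never larger than 0.25

def error_scale(layers, weight=1.0, z=0.0):
    """Factor by which an error term shrinks when it is propagated
    back through `layers` sigmoid layers (one neuron per layer)."""
    return (weight * sigmoid_derivative(z)) ** layers

for n in (1, 3, 5, 10):
    print(n, error_scale(n))
```

Even in this best case for the sigmoid, each additional layer multiplies the error term by at most a quarter, so the first layers of a deep net receive almost no training signal.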
In 2006 scientists in Toronto first published the idea to pre-train the weights with an unsupervised learning algorithm, a restricted Boltzmann machine. This turned out to be a revolutionary concept. It boosted the development of artificial intelligence and allowed all sorts of new applications from Go-playing machines to self-driving cars. In the case of a stacked autoencoder, it works this way:
1) Select the hidden layer to train; begin with the first hidden layer. Connect its outputs to a temporary output layer that has the same structure as the network’s input layer.
2) Feed the network with the training samples, but without the targets. Train it so that the first hidden layer reproduces the input signal – the features – at its outputs as exactly as possible. The rest of the network is ignored. During training, apply a ‘weight penalty term’ so that as few connection weights as possible are used for reproducing the signal.
3) Now feed the outputs of the trained hidden layer to the inputs of the next untrained hidden layer, and repeat the training process so that the input signal is now reproduced at the outputs of the next layer.
4) Repeat this process until all hidden layers are trained. We have now a ‘sparse network’ with very few layer connections that can reproduce the input signals.
5) Now train the network with backpropagation for learning the target variable, using the pre-trained weights of the hidden layers as a starting point.
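The layer-by-layer pre-training can be sketched in plain Python as below. This is a toy illustration and not the Deepnet implementation: all function names, the random data, the layer sizes, and the training parameters are assumptions, the weight penalty is modeled as simple L2 weight decay, and the final supervised backpropagation step is omitted.

```python
import math
import random

def train_autoencoder(data, n_hidden, epochs=200, rate=0.05, penalty=1e-4, seed=365):
    """Train one tanh hidden layer so that a temporary linear output
    layer reproduces the inputs. Returns the hidden (weights, biases)."""
    rng = random.Random(seed)
    n_in = len(data[0])
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    for _ in range(epochs):
        for x in data:
            h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                 for row, b in zip(w1, b1)]
            y = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
            err = [yi - xi for yi, xi in zip(y, x)]     # reconstruction error
            # Backpropagate the error to the hidden layer
            dh = [(1.0 - h[j] ** 2) * sum(err[i] * w2[i][j] for i in range(n_in))
                  for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    w2[i][j] -= rate * (err[i] * h[j] + penalty * w2[i][j])
                b2[i] -= rate * err[i]
            for j in range(n_hidden):
                for i in range(n_in):
                    w1[j][i] -= rate * (dh[j] * x[i] + penalty * w1[j][i])
                b1[j] -= rate * dh[j]
    return w1, b1

def encode(data, w1, b1):
    """Outputs of a trained hidden layer for every sample."""
    return [[math.tanh(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(w1, b1)] for x in data]

def pretrain_stack(data, hidden_sizes):
    """Greedy pre-training: each layer learns to reproduce the outputs
    of the previously trained layer."""
    layers = []
    for n_hidden in hidden_sizes:
        w, b = train_autoencoder(data, n_hidden)
        layers.append((w, b))
        data = encode(data, w, b)     # outputs feed the next layer
    return layers

# Hypothetical features in the +/-1 range: 30 samples, 4 features each
rng = random.Random(1)
features = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(30)]
layers = pretrain_stack(features, [3, 2])
```

The temporary output layer (w2, b2) is discarded after each layer is trained; only the hidden weights are kept as the starting point for the later supervised training.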
The hope is that the unsupervised pre-training process produces an internal noise-reduced abstraction of the input signals that can then be used for easier learning the target. And this indeed appears to work. No one really knows why, but several theories – see paper (4) below – try to explain that phenomenon.
Step 5: Generate a test data set.
We first need to produce a data set with features and targets so that we can test our prediction process and try out parameters. The features must be based on the same price data as in live trading, and for the target we must simulate a short-term trade. So it makes sense to generate the data not with R, but with our trading platform, which is anyway a lot faster. Here’s a small Zorro script for this, DeepSignals.c:
We’re generating 2 years of data with features calculated by our above defined change and range functions. Our target is the result of a trade with 3 bars life time. Trading costs are set to zero, so in this case the result is equivalent to the sign of the price difference at 3 bars in the future. The adviseLong function is described in the Zorro manual; it is a mighty function that automatically handles training and predicting and allows to use any R-based machine learning algorithm just as if it were a simple indicator.
In our code, the function uses the next trade return as target, and the price changes and ranges of the last 4 bars as features. The SIGNALS flag tells it not to train the data, but to export it to a .csv file. The BALANCED flag makes sure that we get as many positive as negative returns; this is important for most machine learning algorithms. Run the script in [Train] mode with our usual test asset EUR/USD selected. It generates a spreadsheet file named DeepSignalsEURUSD_L.csv that contains the features in the first 8 columns, and the trade return in the last column.
Step 6: Calibrate the algorithm.
Complex machine learning algorithms have many parameters to adjust. Some of them offer great opportunities to curve-fit the algorithm for publications. Still, we must calibrate parameters since the algorithm rarely works well with its default settings. For this, here’s an R script that reads the previously created data set and processes it with the deep learning algorithm (DeepSignal.r):
We’ve defined three functions neural.train, neural.predict, and neural.init for training, predicting, and initializing the neural net. The function names are not arbitrary, but follow the convention used by Zorro’s advise(NEURAL,..) function. It doesn’t matter now, but will matter later when we use the same R script for training and trading the deep learning strategy. A fourth function, TestOOS, is used for out-of-sample testing our setup.
The function neural.init seeds the R random generator with a fixed value (365 is my personal lucky number). Otherwise we would get a slightly different result any time, since the neural net is initialized with random weights. It also creates a global R list named “Models”. Most R variable types don’t need to be created beforehand, some do (don’t ask me why). The '<<-' operator is for accessing a global variable from within a function.
The function neural.train takes as input a model number and the data set to be trained. The model number identifies the trained model in the “Models” list. A list is not really needed for this test, but we’ll need it for more complex strategies that train more than one model. The matrix containing the features and target is passed to the function as second parameter. If the XY data is not a proper matrix, which frequently happens in R depending on how you generated it, it is converted to one. Then it is split into the features (X) and the target (Y), and finally the target is converted to 1 for a positive trade outcome and 0 for a negative outcome.
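The split into features and a binary target can be illustrated outside of R as follows. The function name and the sample rows are hypothetical; each row is assumed to hold the features first and the trade return in the last column, as in the exported data set.

```python
def split_features_target(rows):
    """Split rows of [features..., trade_return] into features X and a
    binary target Y (1 = positive trade outcome, 0 = otherwise)."""
    X = [list(row[:-1]) for row in rows]
    Y = [1 if row[-1] > 0 else 0 for row in rows]
    return X, Y

# Hypothetical rows: two features followed by the trade return
rows = [[0.1, -0.2, 0.5], [0.3, 0.0, -0.4], [0.0, 0.1, 0.0]]
X, Y = split_features_target(rows)
print(X, Y)
```

A trade return of exactly zero is counted as a negative outcome here, which is a design choice of this sketch.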
The network parameters are then set up. Some are obvious, others are free to play around with:
The network structure is given by the hidden vector: c(50,100,50) defines 3 hidden layers, the first with 50, second with 100, and third with 50 neurons. That’s the parameter that we’ll later modify for determining whether deeper is better. The activation function converts the sum of neuron input values to the neuron output; most often used are sigmoid that saturates to 0 or 1, or tanh that saturates to -1 or +1.
We use tanh here since our signals are also in the +/-1 range. The output of the network is a sigmoid function since we want a prediction in the 0..1 range. But the SAE output must be “linear” so that the Stacked Autoencoder can reproduce the analog input signals on the outputs.
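The saturation behavior of the two activation functions is easy to check numerically. This is a small stand-alone sketch in Python; the input sums are arbitrary values chosen for the illustration:

```python
import math

def sigmoid(x):
    """Logistic function: squashes any input sum into the 0..1 range."""
    return 1.0 / (1.0 + math.exp(-x))

# arbitrary neuron input sums, from strongly negative to strongly positive
for s in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"sum={s:+6.1f}  sigmoid={sigmoid(s):.4f}  tanh={math.tanh(s):+.4f}")
```

For large positive or negative sums, sigmoid approaches 1 or 0 and tanh approaches +1 or -1, which is why tanh fits inputs in the +/-1 range and sigmoid fits an output that is read as a 0..1 prediction.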
The learning rate controls the step size for the gradient descent in training; a lower rate means finer steps and possibly more precise prediction, but longer training time.
Momentum adds a fraction of the previous step to the current one. It prevents the gradient descent from getting stuck at a tiny local minimum or saddle point.
The learning rate scale is a multiplication factor for changing the learning rate after each iteration (I am not sure what this is good for, but there may be tasks where a lower learning rate on higher epochs improves the training).
An epoch is a training iteration over the entire data set. Training will stop once the number of epochs is reached. More epochs usually mean better prediction, but longer training.
The batch size is the number of random samples – a mini batch – taken out of the data set for a single training run. Splitting the data into mini batches speeds up training since the weight gradient is then calculated from fewer samples. The higher the batch size, the better the training, but the more time it will take.
The dropout is the fraction of randomly selected neurons that are disabled during a mini batch. This way the net learns only with a part of its neurons. This seems a strange idea, but can effectively reduce overfitting.
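How learning rate, momentum, and learning rate scale interact can be seen in a toy example. The sketch below is plain Python and minimizes a one-dimensional made-up loss; it is not the deepnet training loop, and the function descend and its parameter names are chosen for this illustration only:

```python
def descend(gradient, w, learning_rate=0.1, momentum=0.5,
            learning_rate_scale=1.0, epochs=100):
    """Gradient descent with momentum on a single weight w."""
    step = 0.0
    for _ in range(epochs):
        # the new step is a fraction of the previous step plus the gradient step
        step = momentum * step - learning_rate * gradient(w)
        w += step
        # optionally shrink (or grow) the learning rate after each iteration
        learning_rate *= learning_rate_scale
    return w

# toy loss (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3
w_final = descend(lambda w: 2.0 * (w - 3.0), w=0.0)
print(w_final)
```

A real network does the same thing for every connection weight at once, with the gradient computed from one mini batch at a time.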
All these parameters are common for neural networks. Play around with them and check their effect on the result and the training time. Properly calibrating a neural net is not trivial and might be the topic of another article. The parameters are stored in the model together with the matrix of trained connection weights, so they need not be given again in the prediction function, neural.predict. It takes the model and a vector X of features, runs it through the layers, and returns the network output, the predicted target Y. Compared with training, prediction is pretty fast since it only needs a couple thousand multiplications. If X was a row vector, it is transposed and this way converted to a column vector, otherwise the nn.predict function won’t accept it.
Use RStudio or some similar environment for conveniently working with R. Edit the path to the .csv data in the file above, source it, install the required R packages (deepnet, e1071, and caret), then call the TestOOS function from the command line. If everything works, it should print something like this:
TestOOS first reads our data set from Zorro’s Data folder. It splits the data into 80% for training (XY.tr) and 20% for out-of-sample testing (XY.ts). The training set is trained and the result stored in the Models list at index 1. The test set is further split into features (X) and targets (Y). Y is converted to binary 0 or 1 and stored in Y.ob, our vector of observed targets. We then predict the targets from the test set, convert them again to binary 0 or 1 and store them in Y.pr. For comparing the observation with the prediction, we use the confusionMatrix function from the caret package.
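Why prediction is cheap becomes clear when the forward pass is written out: it is just one multiply-and-add per connection plus an activation per neuron. The following Python sketch assumes tanh hidden layers and a sigmoid output as described above; the model layout (a list of weight and bias pairs) and all numbers are invented for the example and do not reflect deepnet’s internal format:

```python
import math

def predict(layers, x):
    """Run feature vector x through the layers and return the network output.

    layers: list of (weights, biases) pairs, one per layer; tanh is applied
    on hidden layers and sigmoid on the final layer.
    """
    activations = list(x)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [math.tanh(s) for s in sums]                # hidden layer
        else:
            activations = [1.0 / (1.0 + math.exp(-s)) for s in sums]  # output layer
    return activations[0]

toy_model = [
    ([[0.5, -0.3], [0.1, 0.8]], [0.0, 0.1]),  # hidden layer: 2 inputs -> 2 neurons
    ([[1.2, -0.7]], [0.05]),                  # output layer: 2 neurons -> 1 output
]
y = predict(toy_model, [0.4, -0.2])
print(y)
```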
A confusion matrix of a binary classifier is simply a 2×2 matrix that tells how many 0’s and how many 1’s were predicted wrongly and correctly. A lot of metrics are derived from the matrix and printed in the lines above. The most important at the moment is the 62% prediction accuracy. This may hint that I bashed price action trading a little prematurely. But of course the 62% might have been just luck. We’ll see that later when we run a WFO test.
A final piece of advice: R packages are occasionally updated, with the possible consequence that previous R code suddenly might work differently, or not at all. This really happens, so test carefully after any update.
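For readers who want to see what is behind the accuracy figure, here is a minimal Python sketch of a 2×2 confusion matrix. It is not caret’s confusionMatrix; the observed outcomes and network outputs are made up, and the 0.5 threshold for converting outputs to 0 or 1 is an assumption of the example:

```python
def confusion_matrix(observed, predicted):
    """Return 2x2 counts, indexed as matrix[predicted][observed]."""
    matrix = [[0, 0], [0, 0]]
    for o, p in zip(observed, predicted):
        matrix[p][o] += 1
    return matrix

def accuracy(matrix):
    """Share of correct predictions: the diagonal divided by the total."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

y_ob = [1, 0, 1, 1, 0, 0, 1, 0]                 # made-up observed outcomes
raw = [0.7, 0.4, 0.6, 0.3, 0.2, 0.8, 0.9, 0.1]  # made-up network outputs
y_pr = [1 if r > 0.5 else 0 for r in raw]       # binarize at the assumed threshold
cm = confusion_matrix(y_ob, y_pr)
print(cm, accuracy(cm))
```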
Step 7: The strategy.
Now that we’ve tested our algorithm and got some prediction accuracy above 50% with a test data set, we can finally code our machine learning strategy. In fact we’ve already coded most of it; we just have to add a few lines to the above Zorro script that exported the data set. This is the final script for training, testing, and (theoretically) trading the system (DeepLearn.c):
We’re using a WFO cycle of one year, split into a 90% training and a 10% out-of-sample test period. You might ask why I earlier used two years’ data and a different split, 80/20, for calibrating the network in step 5. This is for using differently composed data for calibrating and for walk forward testing. If we used exactly the same data, the calibration might overfit it and compromise the test.
The selected WFO parameters mean that the system is trained with about 225 days of data, followed by a 25-day test or trade period. Thus, in live trading the system would retrain every 25 days, using the prices from the previous 225 days. In the literature you’ll sometimes find the recommendation to retrain a machine learning system after any trade, or at least any day. But this does not make much sense to me. When you have used almost 1 year’s data for training a system, it can obviously not deteriorate after a single day. Or if it did, and only produced positive test results with daily retraining, I would strongly suspect that the results are artifacts of some coding mistake.
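The day counts follow from simple arithmetic. The sketch below, in Python, assumes 1-hour bars, a WFO period of 252*24 bars, and a 10% out-of-sample share, as used in this article; the variable names are made up:

```python
bars_per_day = 24       # 1-hour bars
wfo_period = 252 * 24   # one WFO cycle, in bars
oos_fraction = 0.10     # out-of-sample share of each cycle

test_days = wfo_period * oos_fraction / bars_per_day
train_days = wfo_period * (1 - oos_fraction) / bars_per_day
print(f"train {train_days:.1f} days, test {test_days:.1f} days")
```

The exact values come out slightly above the rounded figures in the text, which is why the training period is quoted as "about" 225 days.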
Training a deep network really takes a long time, in our case about 10 minutes for a network with 3 hidden layers and 200 neurons. In live trading this would be done by a second Zorro process that is automatically started by the trading Zorro. In the backtest, the system trains at any WFO cycle. Therefore using multiple cores is recommended for training many cycles in parallel. The NumCores variable at -1 activates all CPU cores but one. Multiple cores are only available in Zorro S, so a complete walk forward test with all WFO cycles can take several hours with the free version.
In the script we now train both long and short trades. For this we have to allow hedging in Training mode, since long and short positions are open at the same time. Entering a position is now dependent on the return value from the advise function, which in turn calls either the neural.train or the neural.predict function from the R script. So we’re here entering positions when the neural net predicts a result above 0.5.
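The entry rule itself is tiny. Here is a hedged Python sketch with two stand-in "models" that simply return a fixed prediction; entry_signals and its arguments are hypothetical names, not Zorro or R functions:

```python
def entry_signals(long_model, short_model, features, threshold=0.5):
    """Return the trade directions whose prediction exceeds the threshold."""
    signals = []
    if long_model(features) > threshold:
        signals.append("long")
    if short_model(features) > threshold:
        signals.append("short")
    return signals

# stand-in models: plain functions returning a fixed prediction in 0..1
only_long = entry_signals(lambda f: 0.62, lambda f: 0.41, [0.1, -0.3])
both = entry_signals(lambda f: 0.55, lambda f: 0.58, [0.1, -0.3])
print(only_long, both)
```

Because the long and the short model are trained separately, both can exceed the threshold on the same bar, which is the reason hedging must be allowed.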
The R script is now controlled by the Zorro script (for this it must have the same name, DeepLearn.r, only with a different extension). It is identical to our R script above since we’re using the same network parameters. Only one additional function is needed for supporting a WFO test:
The neural.save function stores the Models list – it now contains 2 models for long and for short trades – after every training run in Zorro’s Data folder. Since the models are stored for later use, we do not need to train them again for repeated test runs.
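The idea of storing trained models and reloading them later can be sketched in Python with the pickle module. This is only an analogy to what the R function does: the file format is not the one Zorro or R uses, the weights are placeholders, and the file is written to a temporary folder:

```python
import os
import pickle
import tempfile

def save_models(models, path):
    """Write the whole collection of trained models to one file."""
    with open(path, "wb") as handle:
        pickle.dump(models, handle)

def load_models(path):
    """Read the models back, so no retraining is needed."""
    with open(path, "rb") as handle:
        return pickle.load(handle)

# placeholder "models": one for long and one for short trades
models = {"long": [[0.1, -0.2]], "short": [[0.3, 0.4]]}
path = os.path.join(tempfile.mkdtemp(), "models_example.bin")
save_models(models, path)
restored = load_models(path)
print(restored == models)
```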
This is the WFO equity curve generated with the script above (EUR/USD, without trading costs):
EUR/USD equity curve with 50-100-50 network structure.
Although not all WFO cycles get a positive result, it seems that there is some predictive effect. The curve is equivalent to an annual return of 89%, achieved with a 50-100-50 hidden layer structure. We’ll check in the next step how different network structures affect the result.
Since the neural.init, neural.train, neural.predict, and neural.save functions are automatically called by Zorro’s adviseLong/adviseShort functions, there are no R functions directly called in the Zorro script. Thus the script can remain unchanged when using a different machine learning method. Only the DeepLearn.r script must be modified and the neural net, for instance, replaced by a support vector machine. For trading such a machine learning system live on a VPS, make sure that R is also installed on the VPS, the needed R packages are installed, and the path to the R terminal is set up in Zorro’s ini file. Otherwise you’ll get an error message when starting the strategy.
Step 8: The experiment.
If our goal had been developing a strategy, the next steps would be the reality check, risk and money management, and preparing for live trading just as described under model-based strategy development. But for our experiment we’ll now run a series of tests, with the number of neurons per layer increased from 10 to 100 in 3 steps, and 1, 2, or 3 hidden layers (deepnet does not support more than 3). So we’re looking into the following 9 network structures: c(10), c(10,10), c(10,10,10), c(30), c(30,30), c(30,30,30), c(100), c(100,100), c(100,100,100). For this experiment you need an afternoon even with a fast PC and in multiple core mode. Here are the results (SR = Sharpe ratio, R2 = slope linearity):
We see that a simple net with only 10 neurons in a single hidden layer won’t work well for short-term prediction. Network complexity clearly improves the performance, however only up to a certain point. A good result for our system is already achieved with 3 layers x 30 neurons. Even more neurons won’t help much and sometimes even produce a worse result. This is no real surprise, since for processing only 8 inputs, 300 neurons can likely not do a better job than 100.
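The nine structures of the test series, and the total neuron count of each, can be enumerated with a few lines of Python. This is only a bookkeeping sketch; the tuples correspond to the hidden vectors c(10), c(10,10), and so on:

```python
# 10, 30, or 100 neurons per layer, with 1, 2, or 3 hidden layers
structures = [tuple([neurons] * layers)
              for neurons in (10, 30, 100)
              for layers in (1, 2, 3)]

for hidden in structures:
    print(hidden, "total neurons:", sum(hidden))
```

The list makes the comparison in the text concrete: the 3 x 30 structure has 90 neurons in total, the 3 x 100 structure has 300.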
Conclusion.
Our goal was determining if a few candles can have predictive power and how the results are affected by the complexity of the algorithm. The results seem to suggest that short-term price movements can indeed be predicted sometimes by analyzing the changes and ranges of the last 4 candles. The prediction is not very accurate – it’s in the 58%..60% range, and most systems of the test series become unprofitable when trading costs are included. Still, I have to reconsider my opinion about price action trading. The fact that the prediction improves with network complexity is an especially convincing argument for short-term price predictability.
It would be interesting to look into the long-term stability of predictive price patterns. For this we would have to run another series of experiments and modify the training period (WFOPeriod in the script above) and the 90% IS/OOS split. This takes longer since we must use more historical data. I have done a few tests and found so far that a year seems to be indeed a good training period. The system deteriorates with periods longer than a few years. Predictive price patterns, at least of EUR/USD, have a limited lifetime.
Where can we go from here? There’s a plethora of possibilities, for instance:
Use inputs from more candles and process them with far bigger networks with thousands of neurons.
Use oversampling for expanding the training data. Prediction always improves with more training samples.
Compress time series, for instance with spectral analysis, and analyze not the candles, but their frequency representation with machine learning methods.
Use inputs from many candles – such as 100 – and pre-process adjacent candles with one-dimensional convolutional network layers.
Use recurrent networks. Especially LSTM could be very interesting for analyzing time series – and to my knowledge, they have been rarely used for financial prediction so far.
Use an ensemble of neural networks for prediction, such as Aronson’s “oracles” and “committees”.
Papers / Articles.
(3) V. Perervenko, Selection of Variables for Machine Learning.
I’ve added the C and R scripts to the 2018 script repository. You need both in Zorro’s Strategy folder. Zorro version 1.474 and R version 3.2.5 (64 bit) were used for the experiment, but it should also work with other versions.
69 thoughts on “Better Strategies 5: A Short-Term Machine Learning System”
I’ve tested your strategy using 30min AAPL data but “sae.dnn.train” returns all NaN in training.
(It works just decreasing neurons to less than (5,10,5)… but accuracy is 49%)
Can you help me to understand why?
Thanks in advance.
If you have not changed any SAE parameters, look into the .csv data. It is then the only difference to the EUR/USD test. Maybe something is wrong with it.
Another fantastic article, jcl. Zorro is a remarkable environment for these experiments. Thanks for sharing your code and your approach – this really opens up an incredible number of possibilities to anyone willing to invest the time to learn how to use Zorro.
The problem with AAPL 30min data was related to the normalizing method I used (X-mean/SD).
The features range was not between -1:1 and I assume that sae.dnn needs it to work…
Anyway performances are not comparable to yours 🙂
I have one question:
why do you use Zorro for creating the features in the csv file and then opening it in R?
why not create the file with all the features in R in a few lines and do the training on the file when you are already in R? instead of getting inside Zorro and then to R.
When you want R to create the features, you must still transmit the price data and the targets from Zorro to R. So you are not gaining much. Creating the features in Zorro results usually in shorter code and faster training. Features in R make only sense when you need some R package for calculating them.
Really helpful and interesting article! I would like to know if there are any English version of the book:
“Das Börsenhackerbuch: Finanziell unabhängig durch algorithmische Handelssysteme”
I am really interested on it,
Not yet, but an English version is planned.
Thanks JCL! Please let me now when the English version is ready, because I am really interested on it.
Works superbly (as always). Thank you very much. One small note: if you have the package “dlm” loaded in R, TestOOS will fail with the error “Error in TestOOS() : cannot change value of locked binding for 'X'”. This is due to there being a function X in the dlm package, so the name is locked when the package is loaded. Easily fixed by either renaming occurrences of the variable X to something else, or temporarily detaching the dlm package with: detach("package:dlm", unload=TRUE)
Thanks for the info with the dlm package. I admit that ‘X’ is not a particular good name for a variable, but a function named ‘X’ in a distributed package is even a bit worse.
Results below were generated by a revised version of DeepSignals.r – the only change was the use of an LSTM net from the rnn package on CRAN. The authors of the package regard their LSTM implementation as “experimental” and do not feel it is as yet learning properly, so hopefully more improvement to come there. (Spent ages trying to accomplish the LSTM element using the mxnet package but gave up as couldn’t figure out the correct input format when using multiple training features.)
Will post results of full WFO when I have finished the LSTM version of DeepLearn.r.
Confusion Matrix and Statistics
95% CI : (0.5699, 0.5956)
No Information Rate : 0.5002
P-Value [Acc > NIR] : <2e-16
Mcnemar's Test P-Value : 0.2438
Pos Pred Value : 0.5844
Neg Pred Value : 0.5813
Detection Rate : 0.2862
Detection Prevalence : 0.4897
Balanced Accuracy : 0.5828
Results of WFO test below. Again, only change to original files was the use of LSTM in R, rather than DNN+SAE.
Walk-Forward Test DeepLearnLSTMV4 EUR/USD.
Simulated account AssetsFix.
Bar period 1 hour (avg 87 min)
Simulation period 15.05.2018-07.06.2018 (12486 bars)
Test period 04.05.2018-07.06.2018 (6649 bars)
Lookback period 100 bars (4 days)
WFO test cycles 11 x 604 bars (5 weeks)
Training cycles 12 x 5439 bars (46 weeks)
Monte Carlo cycles 200.
Assumed slippage 0.0 sec.
Spread 0.0 pips (roll 0.00/0.00)
Contracts per lot 1000.0.
Gross win/loss 3628$ / -3235$ (+5199p)
Average profit 360$/year, 30$/month, 1.38$/day.
Max drawdown -134$ 34% (MAE -134$ 34%)
Total down time 95% (TAE 95%)
Max down time 5 weeks from Aug 2018.
Max open margin 40$
Max open risk 35$
Trade volume 5710964$ (5212652$/year)
Transaction costs 0.00$ spr, 0.00$ slp, 0.00$ rol.
Capital required 262$
Number of trades 6787 (6195/year, 120/week, 25/day)
Percent winning 57.6%
Max win/loss 16$ / -14$
Avg trade profit 0.06$ 0.8p (+12.3p / -14.8p)
Avg trade slippage 0.00$ 0.0p (+0.0p / -0.0p)
Avg trade bars 1 (+1 / -2)
Max trade bars 3 (3 hours)
Time in market 177%
Max open trades 3.
Max loss streak 17 (uncorrelated 11)
Annual return 137%
Profit factor 1.12 (PRR 1.08)
Sharpe ratio 1.79
Kelly criterion 2.34
R2 coefficient 0.435
Ulcer index 13.3%
Prediction error 152%
Confidence level AR DDMax Capital.
Portfolio analysis OptF ProF Win/Loss Wgt% Cycles.
EUR/USD .219 1.12 3907/2880 100.0 XX/\//\X///
EUR/USD:L .302 1.17 1830/1658 65.0 /\/\//\////
EUR/USD:S .145 1.08 2077/1222 35.0 \//\//\\///
Interesting! For a still experimental LSTM implementation that result looks not bad.
Sorry for being completely off topic but could you please point me to the best place where i can learn to code trend lines?? I’m a complete beginner, but from trading experience i see them as an important part of what i would like to build…
Robot Wealth has an algorithmic trading course for that – you can find details on his blog robotwealth/.
I think you misunderstand the meaning of pretraining. See my articles https://mql5/ru/articles/1103.
I think there is more fully described this stage.
I don’t think I misunderstood pretraining, at least not more than everyone else, but thanks for the links!
Can you paste your LSTM R code, please?
Could you help me answering some questions?
I have few question below:
1.I want to test Commission mode.
If I use interactive broker, I should set Commission = ? in normal case.
2. If I press the “trade” button, I see in the log that the script will use DeepLearn_EURUSD.ml.
So in real trading it will use DeepLearn_EURUSD.ml to get the model to trade?
And use neural. predict function to trade?
3.If I use the slow computer to train the data ,
I should move DeepLearn_EURUSD.ml to the trade computer?
I test the real trade on my interactive brokers and press the result button.
Can I use Commission=0.60 to train the neural and get the real result?
Result button will show the message below:
Trade Trend EUR/USD.
Bar period 2 min (avg 2 min)
Trade period 02.11.2018-02.11.2018.
Spread 0.5 pips (roll -0.02/0.01)
Contracts per lot 1000.0.
Commission should be normally not set up in the script, but entered in the broker specific asset list. Otherwise you had to change the script every time when you want to test it with a different broker or account. IB has different lot sizes and commissions, so you need to add the command.
to the script when you want to test it for an IB account.
Yes, DeepLearn_EURUSD.ml is the model for live trading, and you need to copy it to the trade computer.
Do I write assetList("AssetsIB.csv") in the right place?
So below code’s result includes Commission ?
I test the result with Commission that seems pretty good.
Annual +93% +3177p.
BarPeriod = 60; // 1 hour.
WFOPeriod = 252*24; // 1 year.
NumCores = -1; // use all CPU cores but one.
Spread = RollLong = RollShort = Commission = Slippage = 0;
if(Train) Hedge = 2;
I run DeepLearn.c in the IB paper trade.
The code “LifeTime = 3; // prediction horizon” seems to close the position that you open after 3 bars(3 hours).
But I can’t see it close the position on third bar close.
I see the logs below:
Closing prohibited – check NFA flag!
[EUR/USD::L4202] Can’t close 1@1.10995 at 09:10:51.
In my IB paper trade, it the default order size is 1k on EUR/USD.
How to change the order size in paper trade?
Thank you very much.
IB is an NFA compliant broker. You can not close trades on NFA accounts. You must set the NFA flag for opening a reverse position instead. And you must enable trading costs, otherwise including the commission has no effect. I don’t think that you get a positive result with trading costs.
Those account issues are not related to machine learning, and are better asked on the Zorro forum. Or even better, read the Zorro manual where all this is explained. Just search for “NFA”.
I do some experiment to change the neural’s parameter with commission.
The code is below:
BarPeriod = 60; // 1 hour.
WFOPeriod = 252*24; // 1 year.
NumCores = -1; // use all CPU cores but one.
Spread = RollLong = RollShort = Slippage = 0;
if(Train) Hedge = 2;
I get the result with commission that Annual Return is about +23%.
But I don’t complete understand the zorro’s setting and zorro’s report.
Walk-Forward Test DeepLearn EUR/USD.
Simulated account AssetsIB.csv
Bar period 1 hour (avg 86 min)
Simulation period 15.05.2018-09.09.2018 (14075 bars)
Test period 23.04.2018-09.09.2018 (8404 bars)
Lookback period 100 bars (4 days)
WFO test cycles 14 x 600 bars (5 weeks)
Training cycles 15 x 5401 bars (46 weeks)
Monte Carlo cycles 200.
Simulation mode Realistic (slippage 0.0 sec)
Spread 0.0 pips (roll 0.00/0.00)
Contracts per lot 20000.0.
Gross win/loss 24331$ / -22685$ (+914p)
Average profit 1190$/year, 99$/month, 4.58$/day.
Max drawdown -1871$ 114% (MAE -1912$ 116%)
Total down time 92% (TAE 41%)
Max down time 18 weeks from Dec 2018.
Max open margin 2483$
Max open risk 836$
Trade volume 26162350$ (18916130$/year)
Transaction costs 0.00$ spr, 0.00$ slp, 0.00$ rol, -1306$ com.
Capital required 5239$
Number of trades 1306 (945/year, 19/week, 4/day)
Percent winning 52.5%
Max win/loss 375$ / -535$
Avg trade profit 1.26$ 0.7p (+19.7p / -20.3p)
Avg trade slippage 0.00$ 0.0p (+0.0p / -0.0p)
Avg trade bars 2 (+2 / -3)
Max trade bars 3 (3 hours)
Time in market 46%
Max open trades 3.
Max loss streak 19 (uncorrelated 10)
Annual return 23%
Profit factor 1.07 (PRR 0.99)
Sharpe ratio 0.56
Kelly criterion 1.39
R2 coefficient 0.000
Ulcer index 20.8%
Confidence level AR DDMax Capital.
10% 29% 1134$ 4153$
20% 27% 1320$ 4427$
30% 26% 1476$ 4656$
40% 24% 1649$ 4911$
50% 23% 1767$ 5085$
60% 22% 1914$ 5301$
70% 21% 2245$ 5789$
80% 19% 2535$ 6216$
90% 16% 3341$ 7403$
95% 15% 3690$ 7917$
100% 12% 4850$ 9625$
Portfolio analysis OptF ProF Win/Loss Wgt% Cycles.
EUR/USD .256 1.07 685/621 100.0 /X/XXXXXXXXXXX.
The manual is your friend:
Great read…I built this framework to use XGB to analyze live ETF price movements. Let me know what you think:
Hi, deep learning researcher and programmer here. 🙂
Great blog and great article, congratulations! I have some comments:
– if you use ReLUs as activation functions, pretraining is not necessary.
– AE generally refers to networks with the same input and output; I would rather call the proposed network an MLP (multi-layer perceptron).
Do you think it is possible to use Python (like TensorFlow) or LUA (like Torch7) based deep learning libraries with Zorro?
I have also heard that ReLUs make a network so fast that you can brute force train it in some cases, with no pretraining. But I have not yet experimented with that. The described network is commonly called ‘SAE’ since it uses autoencoders, with indeed the same number of inputs and outputs, for the pre-training process. I am not familiar with Torch7, but you can theoretically use Tensorflow with Zorro with a DLL based interface. The network structure must still be defined in Python, but Zorro can use the network for training and prediction.
Would you do YouTube Tutorials to your work, this series of articles. And where can I subscribe this kinda of algorithmic trading tutorials. Thanks for your contribution.
I would do YouTube tutorials if someone paid me very well for them. Until then, you can subscribe to this blog with the link on the right above.
Why not feed economic data from a calendar like forexfactory into the net as well? I suggested that several times before. This data is what makes me a profitable manual trader (rookie though), if there is any intelligence in these neuronal networks it should improve performance greatly. input must be name (non farm payrolls for example or some unique identifier) , time left to release, predicted value (like 3-5 days before) last value and revision. Some human institutional traders claim its possible to trade profitably without a chart from this data alone. Detecting static support and resistance areas (horizontal lines) should be superior to any simple candle patterns. It can be mathematically modeled, as the Support and Resistance indicator from Point Zero Trading proves. Unfortunately i dont have a clue how Arturo the programmer did it. I imagine an artificial intelligence actually “seeing” what the market is focussed on (like speculation on a better than expected NFP report based on other positive Data in the days before, driving the dollar up into the report). “seeing” significant support and resistance levels should allow for trading risk, making reasonable decisions on where to place SL and TP.
We also made the experience that well chosen external data, not derived from the price curve, can improve the prediction. There is even a trading system based on Trump’s twitter outpourings. I can’t comment on support and resistance since I know no successful systems that use them, and am not sure that they exist at all.
thank you very much for everything that you did so far.
I read the book (German here, too) and am working through your blog articles right now.
I already learnt a lot and still am learning more and more about the really important stuff (other than: Your mindset must be perfect and you need to have well-defined goals. I never was a fan of such things and finally I found someone that is on the same opinion and actually teaches people how to correctly do it).
So, thank you very much and thanks in advance for all upcoming articles that I will read and you will post.
As a thank you I was thinking about sending you a corrected version of your book (there are some typos and wrong articles here and there…). Would you be interested in that?
Again thank you for everything and please keep up the good work.
Thanks! And I’m certainly interested in a list of all my mistakes.
Thank you for this interesting post. I ran it on my pc and obtained similar results as yours. Then I wanted to see if it could perform as well when commission and rollover and slippage were included during test. I used the same figures as the ones used in the workshops and included in the AssetFix.csv file. The modifications I did in your DeepLearn.c file are as follows:
Spread = RollLong = RollShort = Commission = Slippage = 0;
The results then were not as optimistic as without commission:
Walk-Forward Test DeepLearn_realistic EUR/USD.
Simulated account AssetsFix.
Bar period 1 hour (avg 86 min)
Simulation period 09.05.2018-27.01.2017 (16460 bars)
Test period 22.04.2018-27.01.2017 (10736 bars)
Lookback period 100 bars (4 days)
WFO test cycles 18 x 596 bars (5 weeks)
Training cycles 19 x 5367 bars (46 weeks)
Monte Carlo cycles 200.
Simulation mode Realistic (slippage 5.0 sec)
Spread 0.5 pips (roll -0.02/0.01)
Contracts per lot 1000.0.
Gross win/loss 5608$ / -6161$ (-6347p)
Average profit -312$/year, -26$/month, -1.20$/day.
Max drawdown -635$ -115% (MAE -636$ -115%)
Total down time 99% (TAE 99%)
Max down time 85 weeks from Jun 2018.
Max open margin 40$
Max open risk 41$
Trade volume 10202591$ (5760396$/year)
Transaction costs -462$ spr, 46$ slp, -0.16$ rol, -636$ com.
Capital required 867$
Number of trades 10606 (5989/year, 116/week, 24/day)
Percent winning 54.9%
Max win/loss 18$ / -26$
Avg trade profit -0.05$ -0.6p (+11.1p / -14.8p)
Avg trade slippage 0.00$ 0.0p (+1.5p / -1.7p)
Avg trade bars 1 (+1 / -2)
Max trade bars 3 (3 hours)
Time in market 188%
Max open trades 3.
Max loss streak 19 (uncorrelated 12)
Annual return -36%
Profit factor 0.91 (PRR 0.89)
Sharpe ratio -1.39
Kelly criterion -5.39
R2 coefficient 0.737
Ulcer index 100.0%
Confidence level AR DDMax Capital.
Portfolio analysis OptF ProF Win/Loss Wgt% Cycles.
EUR/USD .000 0.91 5820/4786 100.0 XX/\XX\X\X/X/\\X\\
I am a complete beginner with Zorro, maybe I made a mistake? What do you think?
No, your results look absolutely ok. The predictive power of 4 candles is very weak. This is just an experiment for finding out if price action has any predictive power at all.
Although it apparently has, I have not yet seen a really profitable system with this method. From the machine learning systems that we’ve programmed so far, all that turned out profitable used data from a longer price history.
Thank you for the great article, it’s exactly what I needed in order to start experimenting with ML in Zorro.
I’ve noticed that the results are slightly different each time despite using the random seed. Here it doesn’t matter thanks to the large number of trades but for example with daily bars the performance metrics fluctuate much more. My question is: do you happen to know from where does the randomness come? Is it still the training process in R despite the seed?
It is indeed so. Deepnet apparently uses also an internal function, not only the R random function, for randomizing some initial value.
any idea about how to use machine learning like in this example with indicators? you could do as better strategy 6.
would be very interesting.
Is grid search inside the neural.train function allowed? I get an error when I try it.
Besides, Andy, how did you end up defining the LSTM structure using rnn? It is not clear to me after looking inside the package.
where is the full code?(or where is the repository?)
You said” Use genetic optimization for determining the most important signals just by the most profitable results from the prediction process. Great for curve fitting” How about after using genetic optimization process for determining the most profitable signals , match and measure the most profitable signals with distance metrics/similarity analysis(mutual information, DTW, frechet distance algorithm etc…) then use the distance metrics/similarity analysis as function for neural network prediction? Does that make sense ?
Distance to what? To each other?
Yes find similar profitable signal-patterns in history and find distance between patterns/profitable signals then predict the behavior of the profitable signal in the future from past patterns.
Was wondering about this point you made in Step 5:
“Our target is the return of a trade with 3 bars life time.”
But in the code, doesn’t.
mean that we are actually predicting the SIGN of the return, rather than the return itself?
Yes. Only the binary win/loss result, but not the magnitude of the win or loss, is used for the prediction.
“When you used almost 1 year’s data for training a system, it can obviously not deteriorate after a single day. Or if it did, and only produced positive test results with daily retraining, I would strongly suspect that the results are artifacts by some coding mistake.”
There is an additional trap to be aware of related to jcl’s comment above that applies to supervised machine learning techniques (where you train a model against actual outcomes). Assume you are trying to predict the return three bars ahead (as in the example above – LifeTime = 3;). In real time you obviously don’t have access to the outcomes for one, two and three bars ahead with which to retrain your model, but when using historical data you do. With frequently retrained models (especially if using relatively short blocks of training data) it is easy to train a model offline (and get impressive results) with data you will not have available for training in real time. Then reality kicks in. Therefore truncating your offline training set by N bars (where N is the number of bars ahead you are trying to predict) may well be advisable…
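The truncation suggested in this comment can be sketched in a few lines of Python. The function name and the data are made up; the only point is that the last N rows of a training block, whose outcomes would not yet be known in real time, are dropped before training:

```python
def usable_training_rows(rows, horizon):
    """Drop the last `horizon` rows, whose trade outcomes are not yet known.

    rows: training samples in time order, oldest first.
    horizon: number of bars ahead that the target looks (3 in this article).
    """
    if horizon <= 0:
        return list(rows)
    return list(rows[:-horizon])

bars = list(range(10))  # stand-in for 10 time-ordered samples
print(usable_training_rows(bars, 3))
```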
Amazing work, could you please share the WFO code as well? I was able to run the code up to neural.save but was unable to generate the WFO results.
Thank you very much.
The code above does use WFO.
Dear jcl, in the text you mentioned that you could predict the current leg of the zigzag indicator. Could you please elaborate on how to do that? What features and responses would you recommend?
I would never claim that I could predict the current leg of zigzag indicator. But we have indeed coded a few systems that attempted that. For this, simply use not the current price movement, but the current zigzag slope as a training target. Which parameters you use for the features is completely up to you.
Good work. I was wondering if you ever tried using something like a net long-short ratio of the asset (i.e. the FXCM SSI index, real-time live data) as a feature to improve prediction?
Not with the FXCM SSI index, since it is not available as historical data as far as I know. But similar data of other markets, such as order book content, COT report or the like, have been used as features to a machine learning system.
I see, thanks. What is the experience with those? Do they have any predictive power? If you know any materials on this, I would be very interested to read them. (FYI, the SSI index can be exported from FXCM Trading Station: daily data from 2003 for most currency pairs.)
Thanks for the info with the SSI. Yes, additional market data can have predictive power, especially from the order book. But since we gathered this experience with contract work for clients, I’m not at liberty to disclose details. However we plan an own study with ML evaluation of additional data, and that might result in an article on this blog.
Thanks jcl, looking forward to it! There is a way to record SSI ratios in a CSV file from a Lua strategy script (FXCM’s scripting language) for live evaluation. Happy to give you some details if you decide to evaluate this (drop me an email). MyFxbook also has a similar indicator, but unfortunately no historical data for that one.
Does the random forest algorithm have any advantage over deep nets or neural networks for classification problems in financial data? To make it clearer: I use the slope colour changes of a number of moving averages and oscillators for trading decisions (buy, sell, hold). Sometimes one oscillator’s colour change lags while another is faster, etc. There is no problem picking tops and bottoms, but it is quite challenging to know when to hold. Since random forests don’t need normalization, do they have any advantage over deep nets or neural networks for classification? Thanks.
This depends on the system and the features, so there is no general answer. In the systems we did so far, a random forest or single decision tree was sometimes indeed better than a standard neural network, but a deep network beats anything, especially since you need not care as much about feature preselection. We meanwhile do most ML systems with deep networks.
I see, thank you. I have seen some new implementations of LSTM which sound interesting. One is called phased LSTM; another one is from Yarin Gal, who uses a Bayesian technique (Gaussian process) as dropout: cs.ox.ac.uk/people/yarin.gal/website/blog_2248.html.
I hooked up the news flow from forexfactory into this algo and predictive power has improved by 7%.
I downloaded forexfactory news history from 2018, then used an algo to convert it into a value from -1 to 1 for EUR. This value becomes another input parameter for the neural network training. I think there is real value there. Let me see if we can get the win ratio to 75%, and then I think we have a real winner on our hands.
The neural training somehow only yields results with EURUSD.
Has anyone tried GBPUSD or EURJPY?
That’s also my experience. There are only a few asset types with which price pattern systems seem to really work, and that’s mainly EUR/USD and some cryptos. We also had pattern systems with GBP/USD and USD/JPY, but they work less well and need more complex algos. Most currencies don’t expose patterns at all.
Georgia Institute of Technology College of Computing.
Online Master of Science.
Computer Science (OMS CS)
CS 7646: Machine Learning for Trading.
Course Creator and Instructor: Tucker Balch.
This course introduces students to the real-world challenges of implementing machine learning based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches like linear regression, Q-Learning, KNN and regression trees and how to apply them to actual stock trading situations.
This course is composed of three mini-courses:
Prerequisites.
All types of students are welcome! The ML topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
If you answer "no" to the following questions, it may be beneficial to refresh your knowledge of the prerequisite material prior to taking CS 7646:
- Do you have a working knowledge of basic statistics, including probability distributions (such as normal and uniform), and the calculation of and differences between mean, median, and mode?
- Do you understand the difference between geometric mean and arithmetic mean?
- Do you have strong programming skills? Take this quiz if you would like help determining the strength of your programming skills.
Course Preview.
Late Policy - For each day late, -5% on the assignment.
- Mini-course 1: Two homework assignments and two programming projects.
- Mini-course 2: Two homework assignments, two programming projects, and a test.
- Mini-course 3: Three programming projects and a test.
*Percentage weights for each of these are still being determined.
Required Course Readings.
We will use the following textbooks:
- For Mini-course 1: Python for Finance by Yves Hilpisch
- For Mini-course 2: What Hedge Funds Really Do by Romero and Balch
- For Mini-course 3: Machine Learning by Tom Mitchell (see note)
*Note: The Mitchell book is expensive (as of this writing, $212) but it is also required for the OMS ML course. Also, we're working with the publisher to offer a less expensive paperback version.
Minimum Technical Requirements.
Browser and connection speed: An up-to-date version of Chrome or Firefox is strongly recommended. We also support Internet Explorer 9 and the desktop versions of Internet Explorer 10 and above (not the metro versions). 2+ Mbps recommended; at minimum 0.768 Mbps download speed.
Operating system:
- PC: Windows XP or higher with latest updates installed
- Mac: OS X 10.6 or higher with latest updates installed
- Linux: Any recent distribution that has the supported browsers installed.
Other information.
Office hours.
Tuesdays and Thursdays, from 4:30-5:30.
Plagiarism.
All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. In most cases I expect that all submitted code will be written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone. More detail will be provided in the course syllabus.
Application of Machine Learning Techniques to Trading.
Auquan recently concluded another version of QuantQuest, and this time, we had a lot of people attempt Machine Learning with our problems. It was good learning for both us and them (hopefully!). This post is inspired by our observations of some common caveats and pitfalls during the competition when trying to apply ML techniques to trading problems.
Creating a Trade Strategy.
The final output of a trading strategy should answer the following questions:
- DIRECTION: identify if an asset is cheap, expensive or at fair value
- ENTRY TRADE: if an asset is cheap/expensive, should you buy/sell it?
- EXIT TRADE: if an asset is fairly priced and we hold a position in that asset (bought or sold it earlier), should you exit that position?
- PRICE RANGE: which price (or range) to make this trade at
- QUANTITY: amount of capital to trade (for example, shares of a stock)
Machine Learning can be used to answer each of these questions, but for the rest of this post, we will focus on answering the first, Direction of trade.
Strategy Approach.
There are two approaches to building strategies: model-based and data mining. These are essentially opposite approaches. In model-based strategy building, we start with a model of a market inefficiency, construct a mathematical representation (e.g. of price or returns) and test its validity in the long term. This model is usually a simplified representation of the true complex model, and its long-term significance and stability need to be verified. Common trend-following, mean reversion and arbitrage strategies fall in this category.
In the data mining approach, on the other hand, we first look for price patterns and attempt to fit an algorithm to them. What causes these patterns is not important, only that the patterns identified will continue to repeat in the future. This is a blind approach and we need rigorous checks to distinguish real patterns from random patterns. Trial-and-error TA, candle patterns and regression on a large number of features fall in this category.
Clearly, Machine Learning lends itself easily to the data mining approach. Let’s look into how we can use ML to create a trade signal by data mining.
You can follow along with the steps in this model using this IPython notebook. The code samples use Auquan’s Python-based free and open source toolbox. You can install it via pip: `pip install -U auquan_toolbox`. We use scikit-learn for ML models. Install it using `pip install -U scikit-learn`.
Using ML to create a Trading Strategy Signal — Data Mining.
Before we begin, a sample ML problem setup looks like below.
We create features which could have some predictive power (X), a target variable that we’d like to predict(Y) and use historical data to train a ML model that can predict Y as close as possible to the actual value. Finally, we use this model to make predictions on new data where Y is unknown. This leads to our first step:
Step 1 — Setup your problem.
In our framework above, what is Y?
Are you predicting price at a future time, future return/PnL or a buy/sell signal, or are you optimizing portfolio allocation, trying to achieve efficient execution, etc.?
Let’s say we’re trying to predict price at the next time stamp. In that case, Y(t) = Price(t+1). Now we can complete our framework with historical data.
Note Y(t) will only be known during a backtest, but when using our model live, we won’t know Price(t+1) at time t. We make a prediction Y(Predicted, t) using our model and compare it with actual value only at time t+1. This means you cannot use Y as a feature in your predictive model.
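A minimal sketch of this target definition, assuming prices are held in a pandas Series ordered by time. The price values and the names `prices`, `frame` and `train` are made up for illustration.

```python
import pandas as pd

prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.0], name="price")

# Y(t) = Price(t+1): shift the series one step back in time so that each
# row holds the price that will only be known at the next time stamp.
frame = pd.DataFrame({"price": prices, "Y": prices.shift(-1)})

# The last row has no known future price, so it cannot be used for training.
train = frame.dropna()
print(train)
```

The column `Y` is only ever used as the training target. Passing it in as a feature would leak the future into the model.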
Once we know our target, Y, we can also decide how to evaluate our predictions. This is important to distinguish between different models we will try on our data. Choose a metric that is a good indicator of our model efficiency based on the problem we are solving. For example, if we are predicting price, we can use the Root Mean Square Error as a metric. Some common metrics(RMSE, logloss, variance score etc) are pre-coded in Auquan’s toolbox and available under features.
For demonstration, we’re going to use a problem from QuantQuest (Problem 1). We are going to create a prediction model that predicts the future expected value of basis, where:
basis = Price of Stock - Price of Future
basis(t) = S(t) - F(t)
Y(t) = future expected value of basis = Average(basis(t+1), basis(t+2), basis(t+3), basis(t+4), basis(t+5))
Since this is a regression problem, we will evaluate the model on RMSE. We’ll also use Total Pnl as an evaluation criterion.
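For reference, RMSE can be computed in a few lines. This is a plain NumPy sketch with a hypothetical helper name `rmse`; it is not the pre-coded metric from Auquan’s toolbox.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root Mean Square Error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Two perfect predictions and one that is off by 2: sqrt((0 + 0 + 4) / 3)
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(score)
```

Because the errors are squared before averaging, a single large miss weighs more heavily than several small ones.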
Our Objective: Create a model so that predicted value is as close as possible to Y.
Step 2: Collect Reliable Data.
You need to think about what data will have predictive power for the target variable Y? If we were predicting Price, you could use Stock Price Data, Stock Trade Volume Data, Fundamental Data, Price and Volume Data of Correlated stocks, an Overall Market indicator like Stock Index Level, Price of other correlated assets etc.
You will need to setup data access for this data, and make sure your data is accurate, free of errors and solve for missing data(quite common). Also ensure your data is unbiased and adequately represents all market conditions (example equal number of winning and losing scenarios) to avoid bias in your model. You may also need to clean your data for dividends, stock splits, rolls etc.
If you’re using Auquan’s Toolbox, we provide access to free data from Google, Yahoo, NSE and Quandl. We also pre-clean the data for dividends, stock splits and rolls and load it in a format that rest of the toolbox understands.
For our demo problem, we are using the following data for a dummy stock ‘MQK’ at minute intervals for trading days over one month (about 8,000 data points): Stock Bid Price, Ask Price, Bid Volume, Ask Volume; Future Bid Price, Ask Price, Bid Volume, Ask Volume; Stock VWAP, Future VWAP. This data is already cleaned for dividends, splits and rolls.
Auquan’s Toolbox has downloaded and loaded the data into a dictionary of dataframes for you. We now need to prepare the data in a format we like. The function ds.getBookDataByFeature() returns a dictionary of dataframes, one dataframe per feature. We create a new dataframe for the stock with all the features.
Step 3: Split Data.
This is an extremely important step! Before we proceed any further, we should split our data into training data to train your model and test data to evaluate model performance. Recommended split: 60–70% training and 30–40% test.
Since training data is used to estimate model parameters, your model will likely be overfit to training data, and training data metrics will be misleading about model performance. If you do not keep any separate test data and use all your data to train, you will not know how well or badly your model performs on new unseen data. This is one of the major reasons why well trained ML models fail on live data: people train on all available data and get excited by training data metrics, but the model fails to make any meaningful predictions on live data that it wasn’t trained on.
There is a problem with this method. If we repeatedly train on training data, evaluate performance on test data and optimise our model till we are happy with performance we have implicitly made test data a part of training data. Eventually our model may perform well for this set of training and test data, but there is no guarantee that it will predict well on new data.
To solve for this we can create a separate validation data set. Now you can train on training data, evaluate performance on validation data, optimise till you are happy with performance, and finally test on test data. This way the test data stays untainted and we don’t use any information from test data to improve our model.
Remember once you do check performance on test data don’t go back and try to optimise your model further. If you find that your model does not give good results discard that model altogether and start fresh. Recommended split could be 60% training data, 20% validation data and 20% test data.
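A 60/20/20 split can be sketched as below. The helper name `chronological_split` is hypothetical. It assumes the data is ordered by time and is split without shuffling, so that validation and test data always lie after the training data.

```python
def chronological_split(data, train_frac=0.6, val_frac=0.2):
    """Split a time-ordered sequence into train, validation and test parts."""
    n = len(data)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return data[:train_end], data[train_end:val_end], data[val_end:]


bars = list(range(100))          # stand-in for 100 time-ordered observations
train, val, test = chronological_split(bars)
print(len(train), len(val), len(test))
```

Keeping the order intact matters for time series: a random shuffle would place future observations in the training set.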
For our problem we have three datasets available. We will use one as the training set, the second as the validation set and the third as our test set.
To each of these, we add the target variable Y, defined as average of next five values of basis.
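One way to build this target with pandas is sketched below. The helper `future_mean` and the sample prices are illustrative assumptions, not the notebook’s code.

```python
import pandas as pd


def future_mean(series, horizon=5):
    """Average of the next `horizon` values: mean(x[t+1], ..., x[t+horizon])."""
    # rolling(horizon).mean() at position t+horizon covers t+1..t+horizon,
    # so shifting it back by `horizon` aligns that average with time t.
    return series.rolling(horizon).mean().shift(-horizon)


stock = pd.Series([11.0, 13.0, 15.0, 17.0, 19.0, 21.0, 23.0])
future = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])
basis = stock - future                 # basis(t) = S(t) - F(t)
target = future_mean(basis, horizon=5)
print(target)
```

The last five rows have no complete future window, so their target is NaN and they should be dropped before training.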
Step 4: Feature Engineering.
Now comes the real engineering. The golden rule of feature selection is that the predictive power should come primarily from the features and not from the model. You will find that the choice of features has a far greater impact on performance than the choice of model. Some pointers for feature selection:
- Don’t randomly choose a very large set of features without exploring their relationship with the target variable.
- Features with little or no relationship with the target variable will likely lead to overfitting.
- Your features might be highly correlated with each other; in that case, a smaller number of features will explain the target just as well.
- I generally create a few features that make intuitive sense, then look at the correlation of the target variable with those features, as well as their inter-correlation, to decide what to use.
- You could also try ranking candidate features according to Maximal Information Coefficient (MIC), performing Principal Component Analysis (PCA) and other methods.
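The correlation checks described in these pointers can be sketched with pandas. The data here is synthetic, and the feature names and the 0.95 threshold are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
signal = rng.normal(size=n)
features = pd.DataFrame({
    "f_signal": signal,
    "f_copy": 2.0 * signal + rng.normal(scale=0.01, size=n),  # near duplicate
    "f_noise": rng.normal(size=n),                            # unrelated
})
target = pd.Series(signal + rng.normal(scale=0.5, size=n))

# Correlation of each feature with the target.
target_corr = features.corrwith(target)

# Pairs of features that are highly correlated with each other.
feature_corr = features.corr()
threshold = 0.95
redundant = [
    (a, b)
    for i, a in enumerate(feature_corr.columns)
    for b in feature_corr.columns[i + 1:]
    if abs(feature_corr.loc[a, b]) > threshold
]
print(target_corr)
print(redundant)
```

Here `f_noise` shows almost no relationship with the target, and `f_copy` adds nothing beyond `f_signal`, so one of that pair could be dropped.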
Feature Transformation/Normalization:
ML models tend to perform well with normalization. However, normalization is tricky when working with time series data because future range of data is unknown. Your data could fall out of bounds of your normalization leading to model errors. Still you could try to enforce some degree of stationarity:
- Scaling: divide features by standard deviation or interquartile range
- Centering: subtract historical mean from current value
- Normalization: both of the above, (x - mean)/stdev over the lookback period
- Regular normalization: standardize data to the range -1 to +1 over the lookback period by computing (x - min)/(max - min), which lies between 0 and 1, and then re-centering, for example 2(x - min)/(max - min) - 1
Note since we are using historical rolling mean, standard deviation, max or min over lookback period, the same normalized value of feature will mean different actual value at different times. For example, if the current value of feature is 5 with a rolling 30-period mean of 4.5, this will transform to 0.5 after centering. Later if the rolling 30-period mean changes to 3, a value of 3.5 will transform to 0.5. This may be a cause of errors in your model; hence normalization is tricky and you have to figure what actually improves performance of your model(if at all).
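A rolling version of centering and normalization might look like the sketch below. It assumes the statistics are taken over the previous `lookback` values only, excluding the current one; the article does not say which convention it uses. The helper names and sample values are made up.

```python
import pandas as pd


def rolling_center(series, lookback):
    """Subtract the mean of the previous `lookback` values."""
    return series - series.shift(1).rolling(lookback).mean()


def rolling_zscore(series, lookback):
    """(x - mean) / stdev, both taken over the previous `lookback` values."""
    past = series.shift(1).rolling(lookback)
    return (series - past.mean()) / past.std()


x = pd.Series([5.0, 4.0, 5.0, 2.0, 4.0, 3.5])
centered = rolling_center(x, lookback=2)
zscores = rolling_zscore(x, lookback=2)
print(centered)
```

In this series, the value 5.0 at position 2 (previous mean 4.5) and the value 3.5 at position 5 (previous mean 3.0) both become 0.5 after centering, which is the effect described above: the same transformed value can stand for different raw values at different times.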
If you are using our toolbox, it already comes with a set of pre coded features for you to explore.
For this first iteration in our problem, we create a large number of features, using a mix of parameters. Later we will try to see if we can reduce the number of features.
Step 5: Model Selection.
The choice of model will depend on the way the problem is framed. Are you solving a supervised (every point X in feature matrix maps to a target variable Y ) or unsupervised learning problem(there is no given mapping, model tries to learn unknown patterns)? Are you solving a regression (predict the actual price at a future time) or a classification problem (predict only the direction of price(increase/decrease) at a future time).
Some common supervised learning algorithms to get you started are:
I recommend starting with a simple model, for example linear or logistic regression and building up to more sophisticated models from there if needed. Also recommend reading the Math behind the model instead of blindly using it as a black box.
Step 6: Train, Validate and Optimize (Repeat steps 4–6)
Now you’re ready to finally build your model. At this stage, you really just iterate over models and model parameters. Train your model on training data, measure its performance on validation data, and go back, optimize, re-train and evaluate again. If you’re unhappy with a model’s performance, try using a different model. You loop over this stage multiple times till you finally have a model that you’re happy with.
Only when you have a model whose performance you like, proceed to the next step.
For our demo problem, let’s start with a simple linear regression.
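The notebook’s own code is not reproduced here. As a rough sketch of the idea, the following fits scikit-learn’s `LinearRegression` on synthetic data and scores it on a held-out validation slice. The data, the coefficients and the split sizes are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the feature matrix X and target Y.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 3))
y = X @ np.array([0.5, -1.0, 0.0]) + rng.normal(scale=0.1, size=300)

# Time-ordered split: first 200 rows for training, last 100 for validation.
X_train, y_train = X[:200], y[:200]
X_val, y_val = X[200:], y[200:]

model = LinearRegression().fit(X_train, y_train)
val_pred = model.predict(X_val)
val_rmse = float(np.sqrt(np.mean((y_val - val_pred) ** 2)))

print("coefficients:", model.coef_)
print("validation RMSE:", val_rmse)
```

The fitted coefficients are available as `model.coef_`, one per feature column.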
Look at the model coefficients. We can’t really compare them or tell which ones are important, since they are all on different scales. Let’s try normalization to bring them to the same scale and also enforce some stationarity.
The model doesn’t improve on the previous model, but it’s not much worse either. And now we can actually compare coefficients to see which ones are actually important.
Let’s look at the coefficients.
We can clearly see that some features have a much higher coefficient than others, and probably have more predictive power.
Let’s also look at correlation between different features.
The areas of dark red indicate highly correlated variables. Let’s create/modify some features again and try to improve our model.
For example, I can easily discard features like emabasisdi7 that are just a linear combination of other features.
See, our model performance does not change, and we only need a few features to explain our target variable. I recommend playing with more features above, trying new combinations etc to see what can improve our model.
We can also try more sophisticated models to see if change of model may improve performance.
K Nearest Neighbours.
Decision Trees.
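A hedged sketch of how these two model types could be compared on a validation set with scikit-learn. The data is synthetic, and the hyperparameters (`n_neighbors=5`, `max_depth=4`) are arbitrary starting points, not the settings used in the notebook.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear target, as a stand-in for real features.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=400)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

models = {
    "knn": KNeighborsRegressor(n_neighbors=5),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

val_rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    val_rmse[name] = float(np.sqrt(np.mean((y_val - pred) ** 2)))

# Baseline: always predict the training mean.
baseline = float(np.sqrt(np.mean((y_val - y_train.mean()) ** 2)))
print(val_rmse, "baseline:", baseline)
```

Comparing against a naive baseline gives a quick sanity check that a model has learned anything at all.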
Step 7: Backtest on Test Data.
This is the moment of truth. We run our final, optimized model from last step on that Test Data that we had kept aside at the start and did not touch yet.
This provides you with realistic expectation of how your model is expected to perform on new and unseen data when you start trading live. Hence, it is necessary to ensure you have a clean dataset that you haven’t used to train or validate your model.
If you don’t like the results of your backtest on test data, discard the model and start again. DO NOT go back and re-optimize your model, this will lead to over fitting! (Also recommend to create a new test data set, since this one is now tainted; in discarding a model, we implicitly know something about the dataset).
For backtesting, we use Auquan’s Toolbox.
Step 8: Other ways to improve model.
Besides collecting more data, creating better features or trying more models, there’s a few things you can try to train your model better.
1. Rolling Validation.
Market conditions rarely stay the same. Say you have data for a year and you use Jan-August to train and Sep-Dec to test your model; you might end up training over a very specific set of market conditions. Maybe there was no market volatility in the first half of the year and some extreme news caused markets to move a lot in September. Your model will not have learned this pattern and will give you junk results.
It might be better to try a walk forward rolling validation — train over Jan-Feb, validate over March, re-train over Apr-May, validate over June and so on.
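The walk-forward scheme can be expressed as a small generator of index ranges. The function name `walk_forward_splits` is hypothetical, and the sketch assumes equally sized, non-overlapping blocks, as in the month example above.

```python
def walk_forward_splits(n_samples, train_size, val_size):
    """Yield (train_indices, val_indices) for consecutive, non-overlapping folds."""
    start = 0
    while start + train_size + val_size <= n_samples:
        train = range(start, start + train_size)
        val = range(start + train_size, start + train_size + val_size)
        yield train, val
        start += train_size + val_size


# 12 "months" indexed 0..11: train on two, validate on the next, then move on.
folds = [(list(tr), list(va)) for tr, va in walk_forward_splits(12, 2, 1)]
for train_idx, val_idx in folds:
    print("train", train_idx, "validate", val_idx)
```

In every fold the validation block lies strictly after its training block, so no future data is used for training.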
2. Ensemble Learning.
Some models may work well at predicting certain scenarios and others at predicting other scenarios. Or a model may be overfitting heavily in a certain scenario. One way of reducing both error and overfitting is to use an ensemble of different models. Your prediction is the average of the predictions made by many models, with errors from different models likely getting cancelled out or reduced. Some common ensemble methods are Bagging and Boosting.
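The simplest form of this idea, plain averaging of two different models, can be sketched as follows. This is neither Bagging nor Boosting, and it is not the method used in the notebook. The data is synthetic and the model choices are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * np.abs(X[:, 2]) + rng.normal(scale=0.2, size=400)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


models = [LinearRegression(), DecisionTreeRegressor(max_depth=5, random_state=0)]
preds = [m.fit(X_train, y_train).predict(X_val) for m in models]

# Ensemble prediction: the element-wise average of the individual predictions.
ensemble_pred = np.mean(preds, axis=0)

individual = [rmse(y_val, p) for p in preds]
combined = rmse(y_val, ensemble_pred)
print("individual:", individual, "ensemble:", combined)
```

The RMSE of an averaged prediction can never exceed that of the worst member, but it is not guaranteed to beat the best one, so the comparison on validation data is still needed.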
To keep this post short, I will skip these methods, but you can read more about them here.
Let’s try an ensemble method for our problem.
Variance score: 0.95.
All the code for the above steps is available in this IPython notebook. You can read more below:
That was quite a lot of information. Let’s do a quick Recap:
- Frame your problem
- Collect reliable data and clean the data
- Split data into training, validation and test sets
- Create features and analyze behavior
- Choose an appropriate training model based on behavior
- Use training data to train your model to make predictions
- Check performance on the validation set and re-optimize
- Verify final performance on the test set
Phew! But that’s not it. You only have a solid prediction model now. Remember what we actually wanted from our strategy? You still have to:
- Develop a signal to identify trade direction based on the prediction model
- Develop a strategy to identify entry/exit points
- Build an execution system to identify sizing and price
And then you can finally send this order to your broker, and make your automated trade!
Important Note on Transaction Costs : Why are the next steps important? Your model tells you when your chosen asset is a buy or sell. It however doesn’t take into account fees/transaction costs/available trading volumes/stops etc. Transaction costs very often turn profitable trades into losers. For example, an asset with an expected $0.05 increase in price is a buy, but if you have to pay $0.10 to make this trade, you will end up with a net loss of -$0.05. Our own great looking profit chart above actually looks like this after you account for broker commissions, exchange fees and spreads:
Transaction fees and spreads take up more than 90% of our Pnl! We will discuss these in detail in a follow-up post.
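The arithmetic of the $0.05 example above can be written as a small check. The function name and parameters are hypothetical, and a real cost model would also cover spreads, exchange fees and available volume.

```python
def net_pnl(expected_move, cost_per_trade, quantity=1):
    """Expected profit per trade after costs, in the same currency units."""
    return (expected_move - cost_per_trade) * quantity


# An expected $0.05 gain that costs $0.10 to trade is a net loss of $0.05.
result = net_pnl(expected_move=0.05, cost_per_trade=0.10)
worth_trading = result > 0
print(round(result, 2), worth_trading)
```

A signal is only worth acting on when the predicted move exceeds the total cost of the round trip.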
Finally, let’s look at some common pitfalls.
DO’s and DONT’s.
- AVOID OVERFITTING AT ALL COSTS!
- Don’t retrain after every datapoint: This was a common mistake people made in QuantQuest. If your model needs re-training after every datapoint, it’s probably not a very good model. That said, it will need to be retrained periodically, just at a reasonable frequency (for example, retraining at the end of every week if making intraday predictions).
- Avoid biases, especially lookahead bias: This is another reason why models don’t work. Make sure you are not using any information from the future. Mostly this means: don’t use the target variable Y as a feature in your model. It is available to you during a backtest but won’t be available when you run your model live, making your model useless.
- Be wary of data mining bias: Since we are trying a bunch of models on our data to see if anything fits, without an inherent reason for why it fits, make sure you run rigorous tests to separate random patterns from real patterns which are likely to occur in the future. For example, what might seem like an upward trending pattern explained well by a linear regression may turn out to be a small part of a larger random walk!
Avoid Overfitting.
This is so important, I feel the need to mention it again.
- Overfitting is the most dangerous pitfall of a trading strategy.
- A complex algorithm may perform wonderfully on a backtest but fail miserably on new unseen data. Such an algorithm has not really uncovered any trend in the data and has no real predictive power. It is just fit very well to the data it has seen.
- Keep your systems as simple as possible. If you find yourself needing a large number of complex features to explain your data, you are likely overfitting.
- Divide your available data into training and test data and always validate performance on real out-of-sample data before using your model to trade live.
Webinar Video : If you prefer listening to reading and would like to see a video version of this post, you can watch this webinar link instead.
Team Auquan.
Auquan aims to engage people from diverse backgrounds to apply the skills from their respective fields to develop high quality trading strategies. We believe that extremely talented people equipped with the right knowledge and attitude can design successful trading algorithms.