Mode of Operation

Introduction

The sequence to get to a usable trained neural net:

  1. Convert the raw data into suitable sample files.
  2. Create data sample sets with specific characteristics, e.g. rapid increases or neutral samples.
  3. Create training sets by using selected data sample sets.
  4. Train a neural net.
  5. Test a neural net.

Anchor sets and anchor files are an intermediate step in creating proper data sample sets, as a basis for training sets.

Data preparation

The raw data comes in two separate files, one for the BID data and one for the ASK data. We combine these two files into a single file containing the timestamp, the ASK data, and the BID data. All files are in CSV format, using the semicolon (;) as a separator.

The input files have the following path and name:

  • input_path/base_filename_BID_month.csv for the bid values and
  • input_path/base_filename_ASK_month.csv for the ask values.

The output file has the following path and name:

  • output_path/base_filename_month_X_Y.csv
    where X is the type of decimal separator and Y denotes whether extra data points are added for plotting graphs in Excel.
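
The merge step can be sketched as follows. This is a minimal illustration, not the tool's actual code: it assumes each input row has the layout timestamp;value with no header line, and that only timestamps present in both files are kept. The function name merge_bid_ask is hypothetical.

```python
import csv
import io

def merge_bid_ask(bid_file, ask_file, out_file):
    """Merge two semicolon-separated streams into timestamp;ask;bid rows."""
    # Assumed input layout (not stated in the text): timestamp;value, no header.
    bids = {row[0]: row[1] for row in csv.reader(bid_file, delimiter=";") if row}
    writer = csv.writer(out_file, delimiter=";", lineterminator="\n")
    for row in csv.reader(ask_file, delimiter=";"):
        # Assumption: keep only timestamps that occur in both files.
        if row and row[0] in bids:
            writer.writerow([row[0], row[1], bids[row[0]]])

# In-memory stand-ins for the BID and ASK input files.
bid = io.StringIO("0;1.1000\n1;1.1002\n2;1.1001\n")
ask = io.StringIO("0;1.1003\n1;1.1005\n3;1.1006\n")
out = io.StringIO()
merge_bid_ask(bid, ask, out)
merged = out.getvalue()
```

Rows are written in the order of the ASK file, with the ASK value before the BID value, matching the column order described above.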

Data Sample Sets

A Data Sample is a sequence of data points (timestamp, ask-value, bid-value). Each sample is stored in the database. A Data Sample can produce a plot of its data in SVG. We collect data samples in a Data Sample Set with a specific name. Data Sample Sets are also stored in the database.

Properties of a Data Sample:

  • time
    An array of timestamps. Always starts at 0.
  • ask
    The ask-values on each timestamp.
  • bid
    The bid-values on each timestamp.
  • start
    The timestamp of the original starting point of the sample.
  • anchor
    The timestamp which is the anchor point of this data sample, e.g. a decision point. The data sample contains a certain period of time before the anchor point and may also contain data after the anchor point.
  • min
    Bandwidth minimal value.
  • max
    Bandwidth maximum value.

The bandwidth is calculated over a given period of the recent past preceding the data sample, for example 14400 seconds (4 hours).
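
The properties and the bandwidth calculation could be modelled roughly as below. This is a sketch under assumptions: the text does not say which price series feeds the bandwidth, so the hypothetical bandwidth function takes a plain list of (timestamp, value) pairs, and the window is assumed to end just before the start of the sample.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataSample:
    time: List[float]   # seconds relative to the first point, always starts at 0
    ask: List[float]    # ask-value at each timestamp
    bid: List[float]    # bid-value at each timestamp
    start: float        # original timestamp of the first point
    anchor: float       # timestamp of the anchor (decision) point
    min: float          # bandwidth minimum
    max: float          # bandwidth maximum

def bandwidth(series, sample_start, period=14400):
    """Return (lowest, highest) value in the period before sample_start."""
    # Assumption: the window is [sample_start - period, sample_start).
    window = [v for t, v in series if sample_start - period <= t < sample_start]
    if not window:
        raise ValueError("no data points in the bandwidth period")
    return min(window), max(window)

history = [(0, 1.10), (3600, 1.12), (10000, 1.08), (14400, 1.20)]
low, high = bandwidth(history, sample_start=14400)
sample = DataSample(time=[0, 1], ask=[1.21, 1.22], bid=[1.20, 1.21],
                    start=14400, anchor=14401, min=low, max=high)
```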

Creating a Data Sample Set

Long-take

Taking a Long position means expecting the ratio of the currency pair to increase. This method finds such increases in the prepared data files.

Parameters:

  • Pattern size
    The (short) period of time in which the increase must take place.
  • Increase amount
    The minimum increase.
  • Maximum drops
    The increase is valid if it does not contain more than this maximum number of drops (decreases) within the pattern size period.
  • Data density
    Require a minimum number of (raw) data points for the data sample.
  • Samples before
    The number of seconds before the anchor point.
  • Samples after
    The number of seconds after the anchor point.
  • Bandwidth period
    The period of the recent past, preceding the data sample, over which the bandwidth is calculated.
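
One way the search for increases might work is sketched below. It is an illustration only: the text does not define how drops are counted or where the anchor is placed, so the sketch assumes that a drop is any point lower than its predecessor and that the anchor is the point where the increase starts. The data density, samples before/after, and bandwidth parameters are left out. Values are given as integers (e.g. pips) to keep the arithmetic exact.

```python
def find_long_anchors(series, pattern_size, increase, max_drops):
    """Return timestamps at which a qualifying increase starts.

    series: list of (timestamp, value), sorted by timestamp.
    """
    anchors = []
    for i, (t0, v0) in enumerate(series):
        drops = 0
        prev = v0
        for t, v in series[i + 1:]:
            if t - t0 > pattern_size:
                break  # left the pattern size period
            if v < prev:
                drops += 1  # assumption: a drop is a fall from the previous point
                if drops > max_drops:
                    break
            prev = v
            if v - v0 >= increase:
                anchors.append(t0)  # assumption: anchor at the start of the rise
                break
    return anchors

prices = [(0, 100), (1, 110), (2, 105), (3, 130), (4, 120), (5, 110)]
one_drop_allowed = find_long_anchors(prices, pattern_size=3, increase=25, max_drops=1)
no_drops_allowed = find_long_anchors(prices, pattern_size=3, increase=25, max_drops=0)
```

With one drop allowed, the rise from 100 to 130 qualifies despite the dip to 105; with no drops allowed, only the rise that starts at the dip itself remains.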

Neutral

This kind of data sample set is used in the training process to teach the Neural Net not to take a position. It is crucial that samples which represent a positive decision do not overlap with neutral samples. Therefore, you can select which samples to avoid when creating a neutral data sample set.

Parameters:

  • Related sets
    The data sample sets whose samples must not overlap with the samples in this neutral set.
  • Data density
    Require a minimum number of (raw) data points for the data sample.
  • Samples before
    The number of seconds before the anchor point.
  • Samples after
    The number of seconds after the anchor point.
  • Bandwidth period
    The period of the recent past, preceding the data sample, over which the bandwidth is calculated.
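
The overlap check could look roughly like this. It is a sketch under assumptions: every sample is taken to span from (anchor - samples before) to (anchor + samples after), and the related samples are assumed to use the same two values as the neutral set. The names overlaps and neutral_anchors are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if the closed intervals [a_start, a_end] and [b_start, b_end] intersect."""
    return a_start <= b_end and b_start <= a_end

def neutral_anchors(candidates, related_anchors, before, after):
    """Keep candidate anchors whose sample span touches no related sample."""
    kept = []
    for c in candidates:
        span = (c - before, c + after)
        # Assumption: related samples use the same before/after lengths.
        if not any(overlaps(*span, r - before, r + after) for r in related_anchors):
            kept.append(c)
    return kept

# The related sample around anchor 210 spans 150..240 and rules out candidate 200.
kept = neutral_anchors([100, 200, 400], related_anchors=[210], before=60, after=30)
```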

From file

The data samples are built around the anchor points listed in a CSV file.

Parameters:

  • Sample size
    The number of seconds before the anchor point. The anchor point is the last point in the sample.
  • Bandwidth period
    The period of the recent past, preceding the data sample, over which the bandwidth is calculated.
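
Cutting a sample around one anchor point might be sketched as follows. This assumes the prepared data is available as (timestamp, ask, bid) tuples and that the start property is the timestamp of the first point actually found in the window; the function name and the dictionary layout are hypothetical.

```python
def sample_from_anchor(points, anchor, sample_size):
    """Build a sample that ends at the anchor and reaches sample_size seconds back."""
    window = [p for p in points if anchor - sample_size <= p[0] <= anchor]
    if not window:
        return None  # no data available around this anchor
    start = window[0][0]  # assumption: start is the first point in the window
    return {
        "start": start,
        "anchor": anchor,
        "time": [t - start for t, _, _ in window],  # relative time, begins at 0
        "ask": [a for _, a, _ in window],
        "bid": [b for _, _, b in window],
    }

points = [(10, 1.2, 1.1), (20, 1.3, 1.2), (30, 1.4, 1.3), (40, 1.5, 1.4)]
sample = sample_from_anchor(points, anchor=30, sample_size=15)
```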

Anchor Sets

An Anchor Set is a list of anchor points (timestamps), stored in the database. The purpose of an anchor set is that you can create an anchor file from it. From this file you can then create a data sample set, which in turn can be used as a training set.

Creating an Anchor Set

You can create an anchor set together with a data sample set. Both get the same name, so you can see that they are associated with each other, and the data sample set becomes the source of the anchor set. When creating the anchor set together with the data sample set, you can also fill the anchor set with all anchor points of the data sample set.

Editing an Anchor Set

When the editing screen opens, it shows a list of all anchor points and a plot of the first sample of the associated data sample set. Selected anchor points are shown in the plot if they fall within its range. To add an anchor point, click the desired point in the plot; the new anchor point is added to the list and selected. Pressing "Delete" removes the selected anchor points from the list.

You may navigate to other samples in the associated data sample set by clicking the Prev and Next buttons.

Creating an Anchor File

Select the anchor sets to be included in the file. All anchors are sorted before being written to the file.
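
Combining several anchor sets into one sorted file could be done along these lines. The layout of the anchor file is not specified in the text beyond being CSV, so writing one timestamp per line is an assumption, as is keeping duplicates.

```python
import io
from itertools import chain

def write_anchor_file(anchor_sets, out_file):
    """Write all anchors of the selected sets in ascending order."""
    # Assumed file layout: one timestamp per line; duplicates are kept.
    for timestamp in sorted(chain.from_iterable(anchor_sets)):
        out_file.write(f"{timestamp}\n")

out = io.StringIO()
write_anchor_file([[300, 100], [200]], out)
anchor_file = out.getvalue()
```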

Training Sets

A Training Set is a file, in a format which can be read by FANN, containing input and output data for a Neural Net. This file can be used to train a Neural Net using FANN. Once trained, a Neural Net can be tested with any Data Sample stored in the database.

Creating a Training Set

Select the Data Sample Sets you want in the Training Set and define the destination output file. The samples are written in the order in which the database returns them, so their order in the file is effectively arbitrary.
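
For orientation, a FANN training data file starts with a header line giving the number of training pairs, the number of inputs, and the number of outputs, followed by alternating lines of input values and output values. The sketch below writes such a file. How a data sample is turned into input and output values is not described here, so the example pairs (three inputs, one output that is 1 for a long position and 0 for neutral) are purely hypothetical.

```python
import io

def write_fann_training_file(pairs, out_file):
    """Write (inputs, outputs) pairs in the FANN training data layout."""
    n_inputs = len(pairs[0][0])
    n_outputs = len(pairs[0][1])
    # Header: number of pairs, inputs per pair, outputs per pair.
    out_file.write(f"{len(pairs)} {n_inputs} {n_outputs}\n")
    for inputs, outputs in pairs:
        out_file.write(" ".join(str(x) for x in inputs) + "\n")
        out_file.write(" ".join(str(x) for x in outputs) + "\n")

# Hypothetical encoding: three scaled values in, one decision value out.
pairs = [([0.1, 0.2, 0.3], [1]), ([0.3, 0.2, 0.1], [0])]
out = io.StringIO()
write_fann_training_file(pairs, out)
training_file = out.getvalue()
```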

Training a Neural Net

The input file, containing the training set, is defined by:

  • training_set_path/base_filename.data

The trained Neural Net is stored as:

  • neural_net_path/base_filename.net

Neural Net Configuration

The number of input and output neurons is defined in the training set file. We only need to define the number of neurons in the hidden layer.

The training set is repeatedly offered to the net to train it. The maximum number of such repeats must be defined as the maximum number of epochs. A report can be produced after a certain number of epochs.

Finally, the Neural Net is considered properly trained if the error of the output is below a certain threshold, given as the Desired maximum error.
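
The interplay of the three settings (maximum number of epochs, report interval, desired maximum error) can be illustrated with a generic loop. This is not FANN's code; train_epoch is a hypothetical callback that runs one pass over the training set and returns the current error.

```python
def train(train_epoch, max_epochs, epochs_between_reports, desired_error):
    """Repeat training until the error is low enough or the epochs run out."""
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = train_epoch()  # one pass over the whole training set
        if epoch % epochs_between_reports == 0:
            print(f"epoch {epoch}: error {error:.6f}")
        if error < desired_error:
            return epoch, error  # considered properly trained
    return max_epochs, error  # gave up at the epoch limit

# Stand-in for a real net: the error simply follows a fixed sequence.
errors = iter([0.5, 0.2, 0.05, 0.01])
result = train(lambda: next(errors), max_epochs=10,
               epochs_between_reports=2, desired_error=0.1)
```

In this run the loop stops after the third epoch, because that is the first epoch whose error falls below the desired maximum error.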

Testing a Neural Net

The Neural Net definition is taken from the file:

  • neural_net_path/base_filename.net

The Data sample ID is the ID of the sample as it is stored in the database.