
PropheZy

Z/Yen were asked whether PropheZy could predict song popularity from historical song data. The two spreadsheets below show some positive results, though the test data was sparse. The data, from the 1980s, covered song, genre, artist, position in chart, week in year and date. Pre-processing turned text into numbers. However, as PropheZy tends to perform better on classes than on exact values in sparse data sets, the number of weeks was banded into groups of 10, e.g. band 1 = 1-9 weeks, band 2 = 10-19 weeks. We then built a model excluding the last three songs and tried to predict those.
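The banding rule can be sketched in a few lines of Python. This is a minimal illustration, assuming bands of ten weeks where band 1 = 1-9 weeks and band 2 = 10-19 weeks, which is consistent with the band figures quoted below; `week_band` is a hypothetical helper, not part of PropheZy.

```python
def week_band(weeks: int) -> int:
    """Map weeks in the chart to a band of ten weeks (hypothetical helper).

    Assumes band 1 = 1-9 weeks, band 2 = 10-19 weeks, band 3 = 20-29 weeks, ...
    """
    if weeks < 1:
        raise ValueError("weeks in the chart must be at least 1")
    return weeks // 10 + 1


# The last three songs charted for 10, 15 and 8 weeks.
print([week_band(w) for w in (10, 15, 8)])  # [2, 2, 1]
```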

The first sheet (General) uses the predictor with no class. The results are good but consistently high: PropheZy predicts 16, 24 and 16 weeks in the charts for the last three songs, whereas they actually charted for 10, 15 and 8 weeks respectively (60%, 60% and 100% high). Likewise, it predicts bands 2, 3 and 2 when the actual bands were 2, 2 and 1. These cases fell on band boundaries. This is a case of predicting from very, very little data. PropheZy is predicting weeks in the charts using only position, major genre and minor genre. There are no duplicate artists (so no information there), and including the date actually reduces predictability a little because the data set is so limited.
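The gap between the General sheet's predictions and the actual chart runs can be checked with a few lines of Python. `overestimate_pct` is a hypothetical helper name; the numbers are the ones quoted in the paragraph above.

```python
def overestimate_pct(predicted: int, actual: int) -> float:
    """Percentage by which a prediction exceeds the actual value."""
    return 100 * (predicted - actual) / actual


predicted = [16, 24, 16]  # weeks in the charts predicted by the General sheet
actual = [10, 15, 8]      # weeks the three songs actually charted
print([overestimate_pct(p, a) for p, a in zip(predicted, actual)])
# [60.0, 60.0, 100.0]
```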

The second sheet (Class) tries to predict the week band. At first it looks no better: it predicts 2, 2 and 2 when the actual bands were 2, 2 and 1 (so still one off). However, looking behind the bands, it is actually picking up the pattern of those three songs (10, 15 and 8 weeks), but the banding reduces it to the same general class. In summary, while much more data would help, PropheZy has indicated that the last three songs would have performed about where they did.

Little should be read into song prediction without a much richer dataset, but this example does show that small datasets do have some predictive power.

The A14 is a strategic route of national importance which connects the motorways of the Midlands and the North of England to the City of Cambridge, East Anglia, the ports of Felixstowe and Harwich and the M11 to the south.

Many sections of the A14 are currently operating close to capacity, with an average of 65,000 to 85,000 vehicles per day using the route. Up to 25% of the traffic is heavy goods vehicles, which is about twice the national average for this type of road. Consequently, the A14 experiences severe congestion, particularly during peak hours, which results in unreliable journey times.

Z/Yen was asked to use PropheZy, the world-class risk/reward prediction engine, to predict journey times on the A14 in an attempt to help alleviate conditions.

We were provided with historical journey times from the Highways Agency dating back one year for various segments of the A14 as well as supplementary information on events such as accidents and roadworks which may have affected journey times.

For our trials we picked two specific sections of road (code-named AL2282 and AL229), 6.9 km and 1.7 km long respectively. We built a model from the historical data for each segment, each model taking into account approximately 30,000 journeys. We then used the models to predict 2,880 journey times for the most recent month for which we had known data (September 2011). To make the predictions, we split all the journey times into five-second bands and predicted the band into which each new journey would fall. In building the models we included the following predictive parameters for each journey:

  • Time of day and day of week;
  • Previous average journey band: the average journey band for all previous journeys occurring at the same time of day;
  • Previous five journey bands: the bands of the five immediately preceding journeys;
  • Event indicator: number of events such as roadworks or accidents which occurred during this period.
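One way to assemble these parameters for a single journey is sketched below. The `Journey` class and `build_features` function are hypothetical; "same time of day" is assumed to mean the same clock time (hour and minute), and the event count is passed in rather than derived from the Highways Agency event data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Journey:
    start: datetime  # when the journey began
    band: int        # five-second band of its journey time


def build_features(history: list[Journey], start: datetime, events: int) -> dict:
    """Predictive parameters for a journey starting at `start`.

    `history` holds earlier journeys on the same segment, oldest first.
    """
    # Journeys that started at the same clock time on earlier days (assumption).
    same_time = [j.band for j in history
                 if (j.start.hour, j.start.minute) == (start.hour, start.minute)]
    return {
        "time_of_day": start.strftime("%H:%M"),
        "day_of_week": start.weekday(),  # 0 = Monday
        "prev_avg_band": mean(same_time) if same_time else None,
        "prev_five_bands": [j.band for j in history[-5:]],
        "events": events,  # e.g. roadworks or accidents in the period
    }


# Illustrative history, not Highways Agency data.
history = [
    Journey(datetime(2011, 9, 1, 8, 0), 10),
    Journey(datetime(2011, 9, 1, 8, 15), 12),
    Journey(datetime(2011, 9, 2, 8, 0), 14),
]
print(build_features(history, datetime(2011, 9, 5, 8, 0), events=1))
```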

In comparing our predictions to the known data, we found PropheZy to be capable of delivering a high level of accuracy. A selection of our results is detailed in the tables below.

Segment AL2282

Most of the historical journey times for AL2282 fell between 200 and 350 seconds, and this range was split into 30 distinct five-second bands. The table below lists the number of cases in which PropheZy either predicted the correct band or was within one to five bands of it. The counts are cumulative: each row includes the cases in the rows above it.

Prediction      Cases (N)  % of total
Correct               525      18.23%
Within 1 band       1,366      47.43%
Within 2 bands      2,028      70.42%
Within 3 bands      2,413      83.78%
Within 4 bands      2,600      90.28%
Within 5 bands      2,689      93.37%
Total               2,880     100.00%
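Cumulative "within n bands" counts like these can be computed from paired predicted and actual bands as below. This is an illustrative sketch with a hypothetical function name, and the sample bands are made up for demonstration; they are not the A14 results.

```python
def within_k_table(predicted: list[int], actual: list[int], max_k: int = 5):
    """Cumulative counts of predictions within k bands of the actual band.

    Returns (k, count, percent) rows; k = 0 means the exact band was predicted.
    Each row includes the cases counted in the rows before it.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty band lists")
    misses = [abs(p - a) for p, a in zip(predicted, actual)]
    rows = []
    for k in range(max_k + 1):
        count = sum(1 for m in misses if m <= k)
        rows.append((k, count, round(100 * count / len(misses), 2)))
    return rows


# Made-up bands for illustration only.
for row in within_k_table([3, 4, 6, 10], [3, 3, 3, 3], max_k=3):
    print(row)
```

The percentage column is simply count / total; for example, 525 exact hits out of 2,880 rounds to 18.23%.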

Segment AL229

Most of the historical journey times for AL229 fell between 55 and 115 seconds. In this case, all the journey times within this range were split into 12 five-second bands.

Prediction      Cases (N)  % of total
Correct             1,019      35.38%
Within 1 band       2,257      78.37%
Within 2 bands      2,593      90.03%
Within 3 bands      2,751      95.52%
Within 4 bands      2,843      98.72%
Within 5 bands      2,862      99.38%
Total               2,880     100.00%
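The band counts quoted for the two segments follow from five-second banding: the 150 seconds between 200 and 350 give 30 bands, and the 60 seconds between 55 and 115 give 12. The sketch below maps a journey time to its zero-based band. It assumes, hypothetically, that times outside the main range are clamped into the first or last band; the source does not say how such journeys were handled, and the function names are illustrative only.

```python
import math


def band_count(low: float, high: float, width: float = 5.0) -> int:
    """Number of bands of `width` seconds needed to cover [low, high)."""
    return math.ceil((high - low) / width)


def journey_band(seconds: float, low: float, high: float, width: float = 5.0) -> int:
    """Zero-based five-second band for a journey time.

    Assumption: journeys outside [low, high) are clamped to the end bands.
    """
    n = band_count(low, high, width)
    index = int((seconds - low) // width)
    return min(max(index, 0), n - 1)


print(band_count(200, 350), band_count(55, 115))  # 30 12
print(journey_band(263, 200, 350))  # 12
```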

The Financial Laboratory, aka Financial £aboratory Club, was a £1.9 million limited liability partnership conducting joint research into financial risk management over 1997 and 1998.  The Financial Laboratory members were BZW, The London Stock Exchange, Royal & Sun Alliance and Z/Yen.  The programme manager was the Ministry of Defence's DERA (Defence Evaluation and Research Agency), Europe's largest research and development organisation with a turnover of £1.1 billion.  Supporting organisations included Silicon Graphics, the Worshipful Company of Information Technologists (a City livery company), COMAX Secure Business Services, City University and City University Business School.

The Financial Laboratory members' contribution of £1.2 million was augmented by a £750,000 Department of Trade and Industry award for "The Use of Synthetic Environments for Risk Management", following a Foresight Challenge recommendation on 21 June 1996 by Sir Robert May, the Government's Chief Scientific Adviser and Head of the Office of Science and Technology. The Foresight Panel rated the science 'alpha'. There was significant press coverage (e.g. Financial Times, 1 July 1996 and 16 January 1997) in advance of the research.

The Financial Laboratory used military technology to tackle the problems posed by financial risk.  Risk management is increasingly seen by bankers and other financiers as the core discipline of most financial service companies.  Leadership in qualitative and quantitative risk management is essential to staying at the forefront of the financial markets.  Financial regulators nationally and internationally focus on risk management processes in setting standards, trading limits and capital adequacy requirements.  Recent high-profile risk management failures, such as Barings, Daiwa, Metallgesellschaft and Sumitomo, increase the urgency of applying advanced techniques to a complex and vital area.

The Financial Laboratory applied technologies in regular military use, such as virtual reality, supercomputing, secure networks, signal analysis, psychology, heads-up displays and operational analysis, to some of the leading-edge problems in finance, e.g. interest rate prediction and trade error detection. The Financial Laboratory research programme consisted of five areas:

  • visualisation of financial environments - using virtual reality and computer graphics to provide real-world interaction with abstract financial markets; 
  • psychology of risk-taking - examining the selection, training, motivation, stress and other psychological factors affecting decision-makers;
  • group dynamics - developing war-gaming techniques for situations such as bond auctions; 
  • financial mathematics - exploring the capabilities of algorithms from domains such as weapon targeting and radar signal processing in financial prediction; 
  • information technology - developing leading-edge applications with technology such as supercomputers or network firewalls to demonstrate likely advances.

The programme linked technology for visualisation to a series of projects which simulated an abstract world of markets for direct interaction.  Nine specific projects were agreed as the basis for supporting the programme - intraday anomaly detection, optimising dealer performance, advanced risk evaluation, government debt auctions, interest rate modelling, capital requirements, networks, public access and common risk visualisation environments.

 

More information:

PropheZy

Michael Mainelli, "Risk/Reward In Virtual Financial Communities", Information Services & Use, Volume 23, Number 1, IOS Press (2003) pages 9-17.

Michael Mainelli and Martin Dooney, "Military Minds Train on Financial Targets", Investment & Pensions Europe, page 14 (March 1997).

Ian Harris, "Assets in Wonderland", Berkely Morgan Money Matters (1996/1997).

James Flint, "Market Wars", Wired UK (Dec 1996).

Perhaps the best way to understand PropheZy is to look at it in action. We have numerous applications, but predicting television audiences is a relatively easy example to understand. Our data comes from Peaktime, a French company that is, in effect, the "Reuters" of television data in Europe. The data covered 1 December 2002 to 15 December 2002, 06.00-24.00, on a day-by-day basis for the UK television channels BBC1, BBC2, ITV1, Channel 4, Five and Sky 1. The available columns were Channel, Title, Date, Time of Broadcast, Duration, Genre, TVR (television rating), Audience (000s) and Share %.

We prototyped four different predictors from the data in a single afternoon. Each predictor used a different prime dimension:

  • audience numbers;

  • broadcast time;

  • channel;

  • genre.

Each predictor tries to predict "audience share %", although the target can naturally be changed. Audience numbers is not a particularly good prime dimension unless we do some analytical work, as you will see later.

The predictors work in Excel. For instance, having built a predictor with "broadcast time" as the prime dimension (the one you want the most confidence about), you can alter one or more parameters and get a new predicted audience share %. You can also supply partial data and have PropheZy fill in the gaps, rather than predicting audience share %.
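PropheZy's own algorithm is not described here, so the sketch below substitutes a plain nearest-neighbour method purely to illustrate the idea of filling in a missing field from similar records. The field names, the `fill_missing` function and the sample rows are hypothetical; they are not Peaktime data.

```python
from collections import Counter
from statistics import mean


def fill_missing(records: list[dict], query: dict, field: str, k: int = 3):
    """Estimate query[field] from the k records most similar to `query`.

    Stand-in technique (not PropheZy's): similarity is a plain count of
    mismatched values over the other fields the query supplies.
    """
    known = [f for f, v in query.items() if v is not None and f != field]

    def mismatches(rec: dict) -> int:
        return sum(rec[f] != query[f] for f in known)

    nearest = sorted(records, key=mismatches)[:k]  # stable sort keeps ties in input order
    values = [rec[field] for rec in nearest]
    if all(isinstance(v, (int, float)) for v in values):
        return mean(values)                      # numeric field: neighbour average
    return Counter(values).most_common(1)[0][0]  # categorical field: majority vote


# Hypothetical rows for illustration only.
records = [
    {"channel": "Five", "genre": "Factual", "hour": 21, "share": 5.0},
    {"channel": "Five", "genre": "Factual", "hour": 22, "share": 4.0},
    {"channel": "BBC1", "genre": "Drama", "hour": 21, "share": 26.0},
    {"channel": "Five", "genre": "Factual", "hour": 20, "share": 6.0},
]
print(fill_missing(records, {"channel": None, "genre": "Factual", "hour": 21, "share": 5.0}, "channel"))  # Five
print(fill_missing(records, {"channel": "Five", "genre": "Factual", "hour": 21, "share": None}, "share"))  # 5.0
```

Changing one field in the query (say, the hour) and re-running gives a crude what-if, loosely analogous to altering a parameter in the spreadsheet.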

An output sheet from each of the four predictors is attached. For the output sheets we took eight programmes (Bagpuss, Breakfast, Animal Hospital, As Time Goes By, Arrest and Trial, Art Now, Cash in the Attic and Casualty) and used them to exercise the predictors. The four output sheets are broadly in agreement, indicating that the data is probably quite good for this type of application. For ease of reference:

  • black text comes from Peaktime;

  • blue text marks numbers we altered;

  • red text is predicted by PropheZy.

Peaktime Forecasts - BCastTime as Class.xls

Peaktime Forecasts - Channel ID as Class.xls

Peaktime Forecasts - Genre ID as Class.xls

Peaktime Forecasts - Audience Number as Class.xls

Taking each programme in turn:

  • Bagpuss - we used this to predict itself, thus no blue numbers - perfect;

  • Breakfast - we changed the day of broadcast from Sunday to Monday - share decreased from 8.2% to 7.6%, except when audience numbers was the prime dimension, where share went up to 11.7%;

  • Animal Hospital - we changed the day of broadcast from Wednesday to Sunday - share increased from 4.6% to 7.05%, except when audience numbers was the prime dimension where share went down to 6.65%;

  • As Time Goes By - we changed show time from 14:45 to 14:00 - share went up from 9.4% to 10.5%;

  • Arrest and Trial - we didn't fill in the channel and PropheZy rightly predicted Channel 5;

  • Art Now - we changed the broadcast time from 17:11 to peak time 21:00 - share went up from 11.9% to 18.05%, except when audience numbers was the prime dimension where it hit 24.9%;

  • Cash in the Attic - we moved it from BBC1 to C4 - share fell markedly from 40.4% to 4.8%;

  • Casualty - like Bagpuss, a direct test - hit 26.0% accurately.

However, Excel is really just the development environment. You can build an HTML front-end to interrogate your model in a few minutes and roll it out globally across the internet. We have built numerous predictors, including:

  • straight-through processing exceptions for investment bank trading, an application very similar to anti-money-laundering;

  • trading price anomalies to benchmark against quantitative trading units;

  • price-your-audit for accountancy firms and finance directors;

  • customers likely to buy or prospects worth mailshots;

  • data cleansing (using PropheZy to fill in blanks in data);

  • credit risk predictions;

  • reserve level requirements;

  • grant-giving success;

  • medical analytics;

  • performance measurement setting (PropheZy can predict what you "ought" to achieve despite variables such as geography, assets, input quality) for property managers and, full circle, television producers.

Anyway, this gives you some background to open up discussions with Z/Yen!