Besides Research

New breakpoint analysis of English Soccer Data

In geophysics, we are especially interested in investigating changes in various temporal curves. For this purpose, I developed an R function to scan and identify important breakpoints in any time series (this function is included in the R package ACA).

Here I show a new analysis of changes in scoring patterns over time in English soccer. This data set is not exactly what a geophysicist would usually process, but it is great fun and relevant to a wide audience. A previous analysis of this time series was published by James P. Curley (jc3181 AT columbia DOT edu).

James P. Curley’s results

James P. Curley used the breakout function in the BreakoutDetection R package. The breakout algorithm, referred to as E-Divisive with Medians (EDM), uses permutation tests to assess changes in medians between steady states. James P. Curley set the minimum number of observations between change points to 10. As James P. Curley states, the change-points identified by the BreakoutDetection package define 7 periods of time (1881-1898, 1898-1923, 1923-1933, 1933-1950, 1950-1961, 1961-1974, 1974-2013) and are probably those that we would expect.
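To give an idea of what "a permutation test on a change in medians" means, here is a minimal Python sketch. It is not the EDM algorithm and not the BreakoutDetection code: it looks for a single split only, the score (median gap weighted by segment sizes) is my own simplification, and the function names and the toy series are hypothetical. The only thing taken from the text above is the minimum segment size of 10.

```python
import random
from statistics import median


def best_median_split(series, min_size):
    """Return (index, score) of the best single split of the series.

    The score is the absolute gap between the medians of the two segments,
    weighted by i * (n - i) / n so that balanced splits are favoured.
    Each segment keeps at least min_size observations.
    """
    n = len(series)
    best_index, best_score = None, 0.0
    for i in range(min_size, n - min_size + 1):
        gap = abs(median(series[:i]) - median(series[i:]))
        score = gap * i * (n - i) / n
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def permutation_p_value(series, min_size, n_perm=200, seed=0):
    """Estimate how often a shuffled series scores at least as high."""
    rng = random.Random(seed)
    _, observed = best_median_split(series, min_size)
    shuffled = list(series)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, score = best_median_split(shuffled, min_size)
        if score >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic toy series (NOT the soccer data): 20 high values, then 20 low ones.
toy = [30 + (k * 7) % 5 for k in range(20)] + [20 + (k * 7) % 5 for k in range(20)]

split, score = best_median_split(toy, min_size=10)
p_value = permutation_p_value(toy, min_size=10)
print(split, score, p_value)
```

On this toy series the split lands at index 20, where the level drops, and shuffling the values almost never reproduces such a clean separation, so the estimated p-value is small.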

The interpretation and the plot from James P. Curley are reproduced below:

“The most interesting one is probably the one identified at 1923. The offside law was changed in 1925 to enable more scoring. The other revealing one is at 1974. Soccer tactics were evolving throughout the 1960s with the adoption of ‘method football’ and more focus on defense. This shift is picked up by this analysis at the 1974 breakpoint.”

Soccer data change-point

The SDScan (ACA R package) results

The change-points identified by the ACA R package define 6 periods of time (1881-1896, 1896-1914, 1914-1924, 1924-1935, 1935-1967, 1967-2013). Some breakpoints are very similar to those of the BreakoutDetection package: 1896 (1898 from BreakoutDetection), 1924 (1923 from BreakoutDetection), 1935 (1933 from BreakoutDetection). The ACA package finds an additional change-point in 1914 (the beginning of WWI!). For the points after 1935, the two methods do not agree: the ACA package fails to detect the effect of WWII, but it locates the last change-point noticeably earlier (1967, whereas BreakoutDetection puts it in 1974).

With the ACA R package, the effect of the offside law (changed in 1925) is detected in 1924 and the effect of ‘method football’ (introduced in the mid-1960s) is detected as early as 1967.
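The comparison between the two sets of change-points can be written down as a small Python sketch. The years are the ones quoted above. The matching rule (pair each ACA year with the closest unused BreakoutDetection year, if it lies within a tolerance) and the 2-year tolerance are my own assumptions for illustration, and the function name is hypothetical.

```python
# Change-point years quoted in the text above.
aca = [1896, 1914, 1924, 1935, 1967]
breakout = [1898, 1923, 1933, 1950, 1961, 1974]


def match_change_points(first, second, tolerance):
    """Pair each year in `first` with the closest unused year in `second`.

    A pair is kept only if the two years differ by at most `tolerance`.
    Returns (matched pairs, unmatched years of first, unmatched years of second).
    """
    matched = []
    unmatched_first = []
    remaining = list(second)
    for year in first:
        candidates = [other for other in remaining if abs(other - year) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda other: abs(other - year))
            matched.append((year, closest))
            remaining.remove(closest)
        else:
            unmatched_first.append(year)
    return matched, unmatched_first, remaining


# Assumed tolerance of 2 years (an illustrative choice, not a published setting).
matched, only_aca, only_breakout = match_change_points(aca, breakout, tolerance=2)

print(matched)        # [(1896, 1898), (1924, 1923), (1935, 1933)]
print(only_aca)       # [1914, 1967]
print(only_breakout)  # [1950, 1961, 1974]
```

With this tolerance, three change-points are shared by the two methods, 1914 and 1967 are specific to ACA, and 1950, 1961 and 1974 are specific to BreakoutDetection.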


Statistics for the change-points are:


They show that the first two discontinuities (1967 and 1924) are very strong change-points (just have a look at their large DisSNR values!).

For any questions or comments, please email me at amorese AT ipgp DOT fr