In geophysics, we are especially interested in investigating changes in various temporal curves. For this purpose, I developed an R function to scan and identify important breakpoints in any time series (this function is included in the R package ACA).
Here I show a new analysis of changes in scoring patterns over time in English soccer. This data set is not exactly what a geophysicist would usually process, but it is really fun and relevant to a wide audience. The previous analysis of this time series was published by James P. Curley (jc3181 AT columbia DOT edu) here
James P. Curley’s results
James P. Curley used the breakout function in the BreakoutDetection R package. The breakout algorithm is referred to as E-Divisive with Medians (EDM) and uses permutation tests to assess changes in medians between steady states. James P. Curley set the minimum number of observations between change points to 10. As stated by JPC, the change-points identified by the BreakoutDetection package define 7 periods of time (1881-1898, 1898-1923, 1923-1933, 1933-1950, 1950-1961, 1961-1974, 1974-2013) and are probably those that we would expect.
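To make the idea of a permutation test on medians more concrete, here is a minimal Python sketch. It is not the EDM algorithm and not the code of the BreakoutDetection package: it only tests one candidate split point, on synthetic values that I made up for the illustration (they are not the soccer data). The function name median_shift_pvalue, the number of permutations and the seed are all assumptions.

```python
import random
from statistics import median


def median_shift_pvalue(series, split, n_perm=2000, seed=0):
    """Permutation p-value for a change in median at index `split`.

    Illustration only: a single candidate split, not the full EDM search.
    """
    observed = abs(median(series[:split]) - median(series[split:]))
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    pooled = list(series)
    hits = 0
    for _ in range(n_perm):
        # Shuffling destroys any time ordering, which mimics "no change".
        rng.shuffle(pooled)
        stat = abs(median(pooled[:split]) - median(pooled[split:]))
        if stat >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself.
    return (hits + 1) / (n_perm + 1)


# Synthetic series (assumed values): 15 "high" seasons, then 15 "low" ones.
step_series = [3.0 + 0.01 * i for i in range(15)] + [2.0 + 0.01 * i for i in range(15)]
# Synthetic series with the same median on both sides of the split.
flat_series = [1, 2, 3] * 10

p_step = median_shift_pvalue(step_series, split=15)
p_flat = median_shift_pvalue(flat_series, split=15)
```

A small p-value for the step series means that shuffled versions of the data almost never show a median difference as large as the observed one, whereas the flat series gives no evidence of a change.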
The interpretation and the plot from James P. Curley are reproduced below:
“The most interesting one is probably the one identified at 1923. The offside law was changed in 1925 to enable more scoring. The other revealing one is at 1974. Soccer tactics were evolving throughout the 1960s with the adoption of ‘method football’ and more focus on defense. This shift is picked up by this analysis at the 1974 breakpoint.”
The SDScan (ACA R package) results
The change-points identified by the R ACA package define 6 periods of time (1881-1896, 1896-1914, 1914-1924, 1924-1935, 1935-1967, 1967-2013). Some breakpoints are very similar to those of the BreakoutDetection library: 1896 (1898 from BreakoutDetection), 1924 (1923 from BreakoutDetection), 1935 (1933 from BreakoutDetection). The R ACA package finds an additional change-point in 1914 (the beginning of WWI!). For the points after 1935, the two methods do not fully agree: the R ACA package fails to detect the effect of WWII, and it places the last change-point noticeably earlier (1967, whereas BreakoutDetection puts it in 1974).
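The comparison above can be checked mechanically. The short Python sketch below pairs the breakpoints of the two methods when they fall within a tolerance of 2 years. The years are the ones quoted in this post; the 2-year tolerance and the helper name match_breakpoints are my assumptions for the illustration and are not part of either package.

```python
def match_breakpoints(a, b, tol=2):
    """Pair each year in `a` with the closest unused year in `b` within `tol` years."""
    matched, only_a = [], []
    remaining = sorted(b)
    for year in sorted(a):
        close = [y for y in remaining if abs(y - year) <= tol]
        if close:
            best = min(close, key=lambda y: abs(y - year))
            matched.append((year, best))
            remaining.remove(best)  # each year in `b` is used at most once
        else:
            only_a.append(year)
    return matched, only_a, remaining


# Breakpoints read from the periods listed in this post.
aca = [1896, 1914, 1924, 1935, 1967]
breakout = [1898, 1923, 1933, 1950, 1961, 1974]

matched, only_aca, only_breakout = match_breakpoints(aca, breakout)
print("matched:", matched)
print("ACA only:", only_aca)
print("BreakoutDetection only:", only_breakout)
```

With this tolerance, three pairs are matched, 1914 and 1967 are specific to ACA, and 1950, 1961 and 1974 are specific to BreakoutDetection.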
Statistics for the change-points are:
They show that the first two discontinuities (1967 and 1924) are very strong change-points (just look at their large DisSNR values!).
For any questions or comments, please email me at amorese AT ipgp DOT fr