Previous topic

Quality check of time series data

Next topic

CR Database

This Page

Finding peaks in a vectorΒΆ

In the following example we try to find peaks in some artificially generated data.

First we make a test time series data set for 4 antennas:

>>> data = hArray(float, [4,512], name='Random series with peaks')

and fill it with random data that have arbitrary offsets:

>>> data.random(-1024,1024)
>>> data[...] += Vector([-128.,256., 385.,-50.])

Then we put some peaks at location 2-3, 32, and 64-67 in each of the 4 data sets:

>>> for i in range(4):
...     data[i,[2,3,32,64,65,67],...] = Vector([4096.,5097,-4096,4096,5099,3096])

Now, we reverse-engineer and try finding all 5 sigma peaks:

>>> nsigma = 5

First make a scratch array that will contain the locations of the peaks. A location is actually a ‘slice’ in the array, i.e. given by its beginning and ending position (plus one). The length of the return array must be pre-allocated and should be long enough to contain all peaks (at maximum as long as the input array):

>>> datapeaks = hArray(int, [4,5,2], name="Location of peaks")

Now, retrieve the mean and RMS of the array to set the thresholds above and below which one considers a peak to be significant:

>>> datamean = data[...].mean()
>>> datathreshold2 = data[...].stddev(datamean)
>>> datathreshold2 *= nsigma
>>> datathreshold1 = datathreshold2*(-1)
>>> datathreshold1 += datamean
>>> datathreshold2 += datamean

Finally, we determine the input parameters for the search algorithm:

>>> maxgap = Vector(int, len(datamean), fill=10)

The gap vector tells the algorithm how many samples can be between two values that are above threshold, so that the two peaks are considered as one:

>>> minlength = Vector(int, len(datamean), fill=1)

A minimum length can be specified to exclude peaks that consists of only a single or a few values (no relevant here, so set to 1, i.e. all peaks are relevant). Then call hFindSequenceOutside() (or hFindSequenceBetween(), hFindSequenceGreatererThan(), hFindSequenceLessEqual() ...):

>>> npeaks = datapeaks[...].findsequenceoutside(data[...], datathreshold1, datathreshold2, maxgap, minlength)

The return value is the number of peaks found (in each row of the data set):

>>> npeaks
Vector(int, 4, fill=[3,3,3,3])

And the slices are actually contained in the return vector for each antenna:

>>> datapeaks.mprint()
[2,4,32,33,64,66,0,0,0,0]
[2,4,32,33,64,66,0,0,0,0]
[2,4,32,33,64,66,0,0,0,0]
[2,4,32,33,64,66,0,0,0,0]