This follows on very nicely from my previous post, ‘Where Is the Rest of the Video?’, where we had to analyze video metadata in order to answer questions about missing footage. In the case here, we have some discrepancies with the frame rate. Let us take a closer look.
The first thing to point out is that I have recently changed one of the Amped FIVE Program Options and I thought it was worthy of an initial mention.
I am coming across many more CCTV files with audio streams. Now, I am not saying that they all have noise, but they do have a stream. As such, I have changed my video engine default to FFMS with Audio.
This engine is now the default on new installations. However, if you have updated from a previous version, the old settings are retained.
Upon loading the video into Amped FIVE, I can see from the File Info tool that an audio stream has been detected, but the waveform envelope in the Player bar is empty.
This is a common issue. The file has an audio track and a stream identifier, but there is no audio inside it. Sometimes there is audio, but it is only a quiet click, usually every second. If you ever come across this, it is always worth paying attention to the timing and duration of the audio. The audio is often used for timing, so it can help to identify what length the video should be if there are frame rate issues, such as the video playing too slowly or too fast.
Another thing to highlight is that these ‘blank’ audio streams can often cause scrubbing problems. If this is the case, you can switch back to the FFMS Video Engine to stop any audio decoding attempts.
Before I digress any further, let us get back to the ‘Frame Rate’ issue here.
The File Info tool, seen in the image above, also tells me that the video has 1208 frames and a duration of 00:01:20.786. A frame rate of 14.95 has been calculated. (1208 / 80.786).
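The calculation in brackets is easy to reproduce. The lines below are only a sketch of the arithmetic, not a description of how Amped FIVE computes the value, and the helper name timecode_to_seconds is my own.

```python
def timecode_to_seconds(timecode: str) -> float:
    """Convert an HH:MM:SS.mmm string into seconds."""
    hours, minutes, seconds = timecode.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


frame_count = 1208
duration = timecode_to_seconds("00:01:20.786")

# Average frame rate = number of frames / duration in seconds.
average_fps = frame_count / duration
print(f"{duration:.3f} s, {average_fps:.2f} FPS")
```

Note that this is an average over the whole clip, so it says nothing yet about how evenly the frames are spaced.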
The video plays back fine and persons in the footage appear to walk normally. There is also a hard-encoded date and timestamp. This is where the date and time information is encoded into the video rather than overlaid on top.
By frame counting the changes between each second, I count 15 individual frames per second.
Now, everything is pointing me towards 15 frames per second.
I suppose then that now is a good time to point out how valuable it is to have information from the DVR. When completing a recovery, one of the many pieces of information I get from the device is the settings per channel, such as frame rate. If I had a record of that then this would help further along the line……because things are about to get mixed up!
When analyzing the Advanced File Info, the summary reports 25 FPS. This value is read by the FFMS decoder directly from the container.
This video does originate from a UK based CCTV system so seeing 25 FPS is not unexpected, due to it being the PAL frame rate standard, but why is this different from what I counted earlier? The first thing that springs to mind is the audio. Is this causing an issue, and is the audio length the same as the video? Moving to the MediaInfo tab I can see that both video and audio have a 1min and 20sec duration….BUT…. MediaInfo reports that the video has…… 30 FPS!
At this point, it is common to swear loudly and go home…hoping that by the morning, the file fairies would have fixed it and you won’t have to deal with three different frame rates!
Unfortunately, that didn’t happen so it’s up to us to find out what the true frame rate is and attempt to identify why we have 30, 25 and 15 FPS being presented.
FFprobe is the next tab in the Advanced File Info.
This brings a lot more information and I have edited the text and output to only show what we are interested in.
The video stream has a codec time base of 1/30
The container frame rate is 25/1
The average frame rate using the set duration time of each frame is 30/1
The timebase of the video stream is 1/90000
The timebase duration is 7271910
The time duration is 80.799
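To keep the conflicting values straight, it can help to put them side by side as exact fractions. This is a minimal sketch using the figures listed above; the dictionary keys are descriptive labels of my own rather than the exact field names in the FFprobe output.

```python
from fractions import Fraction

# Values as listed above. The keys are my own labels (an assumption),
# not necessarily the names used in the FFprobe output.
reported = {
    "codec_time_base": Fraction("1/30"),
    "container_frame_rate": Fraction("25/1"),
    "average_frame_rate": Fraction("30/1"),
    "stream_time_base": Fraction("1/90000"),
}

# A time base is seconds per tick, so its reciprocal is a rate.
candidate_rates = {
    "codec (1 / codec time base)": 1 / reported["codec_time_base"],
    "container": reported["container_frame_rate"],
    "average": reported["average_frame_rate"],
}

for source, rate in candidate_rates.items():
    print(f"{source}: {float(rate):g} FPS")

# The metadata alone offers two different answers, and neither is 15.
distinct_rates = sorted(set(candidate_rates.values()))
```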
Let us now use this information, along with the frame analysis output, to try and figure out what is going on, so we can document our decision making.
The first thing to look at is the ‘probe_score’ at the bottom of the [FORMAT] section. A score of 100 means that the software has analyzed the file and the codec and the container match what is expected and the file complies with all standards. A score below 25 will probably not result in you being able to decode the file with the FFmpeg libraries as it will not know what it is.
We have a score of 52.
Remember, we are dealing with the exports from a CCTV system, so there may be many things within the data that cause conflicts and confusion. Codec and container standards are written from the decoding point of view. Most CCTV manufacturers want to control the decoding within their own system of hardware and software so removing a file from that system may result in ‘misunderstandings’. It is up to us, as analysts, to understand the data and identify what is relevant.
We must understand what the true frame rate is. If I am asked what I did with the ‘other’ frames, as it should be 25 or 30 frames per second, I must be able to answer correctly.
Is it 30? Is it 25? Is it 15? Is it 14.95?…. or is it something else?
Looking at the timebase first, why do we often see 1/90000 regardless of the frame rate?
For transport streams, the timebase, or ‘tbn’ as it is sometimes displayed, is always 90 kHz. This is the time scale of ISO/IEC 13818-1. Other containers do not need to comply with this, but many do for continuity and ease.
In a standard PAL file, you will often see that each packet has a duration of 3600 ‘ticks’ of the 90 kHz clock. 90,000 / 3600 = 25. That’s the frame rate of standard PAL video!!
Where do we see these values? In the Frame Analysis.
For our file, we have 3000 ‘ticks’ per packet.
This gives us the 30 FPS result. (90,000 / 3000 = 30).
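The same conversion works for any packet duration expressed in ticks. A small sketch, assuming a 90 kHz clock as described above; the function name packet_rate is my own.

```python
CLOCK_HZ = 90_000  # ticks per second of the 90 kHz clock


def packet_rate(ticks_per_packet: int, clock_hz: int = CLOCK_HZ) -> float:
    """Rate implied by a packet duration: clock ticks per second / ticks per packet."""
    return clock_hz / ticks_per_packet


pal_rate = packet_rate(3600)   # standard PAL packet duration
file_rate = packet_rate(3000)  # packet duration seen in our file

print(pal_rate, file_rate)
```

Bear in mind that this only tells us how long each packet claims to last, not how many unique frames were actually recorded each second.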
Next we have Timebase Duration… is this the same as our time duration?
7271910 (Timebase Duration) / 90000 (Timebase) = 80.799 seconds. So yes, it does match the duration of the video we have.
In the image above I am only displaying the video frames. You may notice that the DTS, the Decoding Time Stamp values, are all extremely high. (These values were the same as the PTS, the Presentation Time Stamps, but I have cropped the image for display purposes).
This would indicate to me that the data stream is a direct trim from the original storage device into the container file. Again, it must be remembered that this small clip is just a tiny part of a huge continuous recording that was once on a hard drive inside a CCTV recording device.
If it had been cleaned and re-indexed specifically for the export, then the PTS would start at 0.
We can see that although the codec frame rate is set to 30, there are only 15 frames per second.
We have the ability to do a lot more with this data by using the ‘Open in Excel’ button at the bottom of the Frame Analysis window.
I have copied the pkt_pts_Time column into a new sheet and then cleaned the values to make them easier to read. I have then calculated the duration of each frame according to the PTS. In the box you will see the pattern, and see that they total 1 second.
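The same calculation can be done outside of a spreadsheet. The sketch below is only an illustration: the PTS values are hypothetical and are not the ones from this file (those are in the image), and the function name frame_durations is my own. The idea is simply that the duration of a frame is the gap between its PTS and the next one.

```python
from fractions import Fraction

TIME_BASE = Fraction(1, 90000)  # stream timebase, seconds per tick

# Hypothetical PTS values for 16 consecutive frames, NOT taken from the file.
# A large base mimics the very high timestamps of a direct trim.
BASE = 5_400_000_000
offsets = [0, 6000, 12000, 15000, 24000, 30000, 36000, 42000,
           45000, 54000, 60000, 66000, 72000, 75000, 84000, 90000]
pts_ticks = [BASE + offset for offset in offsets]


def frame_durations(pts, time_base):
    """Seconds between each frame's PTS and the PTS of the frame after it."""
    return [(later - earlier) * time_base for earlier, later in zip(pts, pts[1:])]


durations = frame_durations(pts_ticks, TIME_BASE)
total = sum(durations)

for index, duration in enumerate(durations, start=1):
    print(f"frame {index:2d}: {float(duration):.4f} s")
print(f"{len(durations)} frames, {float(total):.3f} s in total")
```

In this made-up data some frames are held for one 3000 tick slot and others for three, yet the 15 durations still add up to exactly one second. That is the kind of pattern worth looking for.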
Patterns like this are really important to identify in cases involving motion analysis.
Before we move on, how about a recap?
We have a file that presents 15 FPS. It has a codec frame rate of 30 FPS, but the container reports a frame rate of 25 FPS.
We can identify 15 unique frames per second in the frame analysis.
We can identify that the video stream has a timebase of 30 FPS, but each frame is held longer than its packet duration during presentation.
And what about the audio rate?
Each audio packet has a duration of 0.04 seconds (40 ms), which is 3600 ‘ticks’ of the 90 kHz clock, so there are 25 packets per second. That’s where the 25 FPS comes from!!!
There are many reasons why companies use standard frame rates on their exports. It is usually to ensure that multiple cameras can be exported to the same container format regardless of the frame rate of the original source. Also, when various cameras are multiplexed into the same container, each one can have different frame rates.
However, it is vital that we identify what the stream’s rate ‘should’ be. If I had decoded this video using software that read the frame rate from the codec, it would play back at 30 frames per second.
If I had decoded the video using the container’s reported frame rate, it would play back at 25 frames per second.
Depending on the player/decoder, duplicate frames may get added without me knowing, and then this would present me with the wrong frame count. The player/decoder may simply speed the video up. With the difference here being 10-15 frames, that would have been noticeable but, if it were only one or two then it may not be. Something captured by the camera being displayed too fast could be very detrimental to the investigation.
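To put some numbers on the speed-up scenario, here is a rough sketch. It assumes a player that shows all 1208 frames at the assumed rate without adding or dropping any, which is only one of the behaviours described above.

```python
frame_count = 1208
true_duration = 80.786  # seconds, from the File Info tool

# Frame rates a player might take from the codec or the container.
assumed_rates = {"codec": 30, "container": 25}

playback = {}
for label, fps in assumed_rates.items():
    seconds = frame_count / fps          # how long the clip would last
    speed = true_duration / seconds      # how much faster than real time
    playback[label] = (seconds, speed)
    print(f"{label}: {seconds:.2f} s, {speed:.2f}x too fast")
```

At 30 FPS the clip would be over in about 40 seconds, roughly double speed. At 25 FPS it would last about 48 seconds.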
Nowhere in the file’s metadata does it report the true frame rate. If I had used the information presented to me, or I had utilized software that takes the frame rates from the container or codec, I could have changed the video considerably. Amped FIVE’s default decoder, FFMS, is designed to present you with the frames that are there, and to display them according to the duration of the stream. It has ignored the 30 FPS. It has ignored the 25 FPS. Using this, I can rely on the 14.95 FPS that FFMS is presenting to me.
I am also able to analyze right down to the frame level to answer any questions related to the timing.
That is the answer to our question, “What is the frame rate?”.
The CCTV device was probably set to 15 FPS, so why have we not got this exactly?
There are two main reasons.
Firstly, minor fluctuations in the number of frames captured during each second will often occur. Some seconds may have 16 frames, whereas some may have only 13 or 14. Secondly, when the export is completed, it must start at an I frame (Reference frame), which may be halfway through a second, and this is exactly what occurs in this file.
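Both effects show up if you count the frames that fall inside each clock second. The frame times below are hypothetical, made up purely to illustrate the counting, and assume a nominal 15 FPS on a 90 kHz clock.

```python
from collections import Counter

CLOCK_HZ = 90_000
STEP = CLOCK_HZ // 15  # 6000 ticks between frames at a nominal 15 FPS

# Hypothetical frame times in ticks, measured from the start of the clock
# second in which the export begins. The export starts half a second in.
frame_ticks = [45_000 + STEP * n for n in range(40)]
frame_ticks.remove(201_000)  # simulate one frame missed by the recorder

# Integer division by the clock rate gives the clock second of each frame.
frames_per_second = Counter(tick // CLOCK_HZ for tick in frame_ticks)

for second in sorted(frames_per_second):
    print(f"second {second}: {frames_per_second[second]} frames")
```

The first and last seconds are only partly covered by the export, and the second with the missed frame comes up one short of the full 15.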
Frame rate issues can be very confusing, but now that I have fully understood the data being reported and how it is presented, I can continue with my investigation…
I hope that you have enjoyed this post, walking through the minefield that is format and codec frame rates. Dealing with CCTV exports is never simple!