The final post in our File Formats and Conversion blog series reviews the final stage of the encoding/decoding process. You will learn how decisions made during video decoding can affect data reliability and image representation.
Hi everyone, welcome back to this last post in the current series. We have covered proprietary data, learned all about codecs and formats, and then explored the reasons and methods for video conversion. We now turn to video decoding.
In “Video Decoding: From File Conversion to Forensic Integrity”, we will look at:
- Video Decoding Process: Decoding is the reverse process of encoding, where decisions made during encoding (what data to keep or discard) are interpreted. Proper decoding ensures that video data is accurately presented for forensic analysis.
- GPU Decoding vs Forensic Decoding: While non-linear video editors (NLEs) can use GPU power to decode video, forensic analysis requires exact repeatability and accuracy at the pixel level, making GPU-based decoding unsuitable for current forensic purposes.
- Direct Loading of Files: The “Attempt Direct Loading” option within Amped FIVE allows users to bypass stream extraction in some proprietary formats. However, doing so may result in incomplete or incorrect video decoding, such as missing timestamps or corrupted pixel data.
- Video Loader: The Video Loader tool within Amped FIVE provides details about the original file and its conversion. Users can compare the original file with the newly formatted one and analyze the video stream more accurately.
- Video Decoding Engines: Amped FIVE provides several decoding engines like FFMS (FFmpeg Source), VfW (Video for Windows), DirectShow, and FFmpeg. Each engine interprets the video data differently, allowing for flexibility in decoding options.
- Engine Comparison: Different decoding engines can result in variations in frame count, timing, and playback. For example, FFMS decodes the video based on frame timestamps, while other engines may misinterpret the frame rate or duration due to container data.
- Color Range Control: Video can be encoded with either full or limited color range. Amped FIVE allows users to override the default decoding settings, ensuring that all luminance values (especially in dark footage) are correctly displayed during analysis.
- Chroma Upsampling: This setting controls how color is added back during the video decoding process. It allows forensic analysts to see how color interpolation affects the edges of objects and may improve the speed of decoding, especially in high-resolution videos.
- Playback Modes: Amped FIVE offers multiple playback modes, including average frame rate, PTS (Presentation Timestamp), and data timestamps. These modes help to accurately display video based on a specific data source, ensuring proper timing and frame synchronization.
- Forensic Challenges: Decoding video for forensic analysis is complex, as proprietary formats, varying codecs, and container issues can introduce errors. Forensic analysts must carefully select the right decoding methods to ensure accurate results.
We have learned previously that codecs use a two-stage approach: the encoding stage and the decoding stage. During encoding, decisions are made on what to keep and what to discard. A GoPro camera recording in either H.264 or H.265 may be keeping a lot of information. The processing chips on those devices have to be fairly powerful to manage and deal with all the data efficiently.
Decoding all that data will also be fairly challenging, and that's where hardware decoding comes in.
There is a difference, though, between a Non-Linear Editor (NLE) for editing video and making films, and an application designed for the forensic analysis of video at pixel and block levels. The NLE can use the power of a Graphics Processing Unit (GPU) to decode the video. However, there is no guarantee that you will get the same result twice. It will be close, but “nearly the same” may not cut it in the courtroom.
The forensic video community must use procedures aligned with forensic science to ensure that results are repeatable, reproducible, and explainable. Consequently, it is difficult at the moment to offload some of the decoding to the GPU, for example. It may become possible in the future, but for the time being our video decoding guarantees that the pixel values and processing results are exactly the same every time.
Our GoPro example above used a good-quality device recording high-quality video. What about the $39 Wi-Fi camera that has captured your evidence? Has that device been able to process all the details required? It will have recorded an image, but it will not have worried too much about the details. Nine times out of ten, it's those small details that you need.
So, how you decode those small details will affect how much they can be relied upon. Correct decoding will also enable more opportunities for restoration and enhancement.
Attempt Direct Loading
We briefly looked at this in the last post on video conversion.
After dragging a file into Amped FIVE, if the file is proprietary or not known, then this message will appear.
The Attempt Direct Loading button allows you to bypass any attempt at stream extraction or formatting. A small number of single-stream proprietary formats retain a large amount of standard video data. This allows the streams to be played back with standards-based video players. However, there are some warnings. Firstly, although the video may be displayed, other data, such as non-standard audio or, more commonly, a data timestamp, will not be identified. As you have bypassed the stream extraction stage, the time data may be in the file you are decoding, but it cannot be understood by a standard multimedia framework.
This causes the next issue. If the standard multimedia framework incorrectly interprets that data as part of the video stream, it may corrupt the video decoding and cause errors in the displayed pixels, errors in the correct frame referencing, and incorrect frame numbering.
Lastly, you may see something like this.
This is an example of a standard multimedia framework interpreting a proprietary date/time timestamp as PTS data.
The purpose of Direct Loading is to give you the option of comparing the file without any of the intermediate data management. As can be seen in the image below, although there has been a conversion of the time scale, the important PTS durations are the same in our new, standard container.
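If you want to verify this outside of FIVE, the check is straightforward to script. Below is a minimal sketch, assuming the PyAV bindings for FFmpeg are installed and using purely hypothetical filenames, that lists the PTS deltas of the video packets in two files so they can be compared side by side:

```python
import av  # PyAV: Python bindings for the FFmpeg libraries

def pts_deltas(path):
    """Return the time deltas (seconds) between successive video PTS values."""
    with av.open(path) as container:
        stream = container.streams.video[0]
        # Convert each packet's PTS from time-base units into seconds
        pts = sorted(float(p.pts * stream.time_base)
                     for p in container.demux(stream) if p.pts is not None)
    return [b - a for a, b in zip(pts, pts[1:])]

# Hypothetical filenames, purely for illustration
print(pts_deltas("evidence.dav")[:5])    # directly loaded proprietary file
print(pts_deltas("converted.mkv")[:5])   # the newly formatted container
```

Even though the two containers may use different time scales, once converted to seconds the deltas should match.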
Video Loader
Let us now head to the Video Loader filter settings.
In the last post on video conversion, we completed several processes where we started with a proprietary file. At the end of that process, if selected in the Convert DVR settings, the results are loaded automatically into new chains within Amped FIVE.
It is necessary to log the conversion process, and that is the purpose of the Original File tab within the Video Loader.
Original File
This tab displays the original filename and path. You can see in this example that the original file loaded was in the DAV format. Convert DVR has extracted the video stream, formatted it into an MKV, extracted the Timestamp, and then loaded those into the Viewer. However, the original file was the .dav.
You may also see that the Loader has a file list for multiple files that may have been concatenated and then loaded as a newly formatted, single file.
Lastly, at the bottom, you have further options to load this original file back into Convert DVR for further conversions, or to analyze the original file within Advanced File Info. This allows comparative analysis of the data between the original file and the newly formatted one.
Now that we have an understanding of this tab, let us go back to the main parameters and look at the options there.
Video Engine
The first dropdown is an important one: it dictates which engine is to be used for decoding and controlling the video.
There are several options, and we will look at the ones most commonly used.
“FFMS”
“FFMS” stands for FFmpeg Source. It is a powerful multimedia library based around FFmpeg, but it provides frame- and sample-accurate decoding. Many video tools use it, and this can often be identified by an ffms.dll or ffms2.dll being present within, or associated with, the application.
You can see that we have split it into two: there is no point in decoding audio if there is no audio to decode.
“FFMS” creates a frame index behind the scenes, which keeps scrubbing and timing all synchronized. Using “FFMS” allows a higher level of stream accuracy, both at the timing and pixel level. We will look at both shortly.
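To make the idea of indexing concrete, here is a rough sketch, assuming PyAV and a hypothetical filename, of what building a frame index involves: decoding the stream once and recording each frame's timestamp, so any frame can later be located exactly rather than approximately.

```python
import av  # PyAV bindings for the FFmpeg libraries

def build_frame_index(path):
    """Decode the stream once and map frame number -> PTS in seconds.
    Conceptually, this is what FFMS does when it indexes a file, so that
    any later seek lands on exactly the frame requested."""
    with av.open(path) as container:
        stream = container.streams.video[0]
        return {n: float(frame.pts * stream.time_base)
                for n, frame in enumerate(container.decode(stream))}

index = build_frame_index("evidence.mkv")  # hypothetical filename
print(len(index), "frames indexed; frame 0 displays at", index[0], "s")
```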
“Video For Windows (VfW) and DirectShow (DShow)”
These engines use Microsoft's multimedia frameworks.
You may want to use these or observe how a file decodes with them for several reasons.
You may be unlucky and have a file that requires the installation of a proprietary codec. Thankfully, there are not that many now, but some of the older systems that utilized them are still in the wild.
As we have learned in previous posts, the codec must be installed on the computer as a library resource. Then, when that video file is opened using the framework, the file's 4CC (FourCC code) is matched to the library resource, and the video can be decoded.
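The 4CC itself is just four bytes inside the file. As a rough illustration, the naive sketch below (Python, hypothetical filename) scans an AVI for the first ‘strh’ stream header and reads the codec identifier the framework would try to match:

```python
import struct

def video_fourcc(path):
    """Naively scan an AVI for the first 'strh' (stream header) chunk and,
    if it describes a video stream, return the codec 4CC it declares."""
    data = open(path, "rb").read()
    pos = data.find(b"strh")
    if pos == -1:
        return None
    # Layout: 'strh' id (4 bytes) + chunk size (4) + fccType (4) + fccHandler (4)
    fcc_type, fcc_handler = struct.unpack_from("<4s4s", data, pos + 8)
    if fcc_type == b"vids":  # 'vids' marks a video stream
        return fcc_handler.decode("ascii", "replace")
    return None

print(video_fourcc("evidence.avi"))  # e.g. 'XVID', 'H264', or a proprietary 4CC
```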
Although the MS Framework comes with many standard codecs built-in, always be careful when installing proprietary CCTV codecs into a working host machine. Use a Virtual Machine or a Sandbox environment. Such codecs have been known to cause conflicts, resulting in the incorrect decoding of other formats, and may also carry security risks.
The most common reason to use these engines is to evaluate how a standard Windows Decoder deals with the file. This may then answer why another user reports a certain value, such as a duration or frame count.
Remember that a proprietary codec is very different from a proprietary container format.
“FFmpeg”
This video engine will use the native FFmpeg decoder. It will simply read the file information available and won't analyze the stream as “FFMS” does. The decoder uses the information within the file header to decode the stream.
Again, it is here as another option. Although both “FFmpeg” and “FFMS” use the same library to interpret the data, the data they use to decode the video is different.
It is rather similar to FFprobe and MediaInfo within Advanced File Info. Both tools read the same file data, but they differ in how they interpret it and how it is presented.
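You can see the two kinds of data side by side with a few lines of code. A minimal sketch, assuming PyAV and a hypothetical filename: the first two values come straight from the container header, while the third is measured from the stream itself.

```python
import av  # PyAV bindings for the FFmpeg libraries

with av.open("evidence.avi") as container:  # hypothetical filename
    v = container.streams.video[0]
    # Header-level data: what a native FFmpeg-style read reports
    print("header frame rate :", v.average_rate)
    print("header frame count:", v.frames)
    # Stream-level data: what an FFMS-style index measures from the packets
    pts = [p.pts for p in container.demux(v) if p.pts is not None]
    duration = float((max(pts) - min(pts)) * v.time_base)
    print("measured duration from PTS:", duration, "s")
```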
Engine Comparison
Using the engines above, with an MPEG-4 Part 2 video inside an AVI container, the following differences are observed.
We learned in a previous post how the AVI format can add “empty” chunks to ensure video is presented at the correct speed. Taking that knowledge, we can quickly identify what data is being used by the video engines.
Starting with “FFmpeg”, it can read the number of real frames. However, it then reads the base frame rate set in the container header: 25 fps. Remember that the container is not really suited for the video that's inside it!
5994 / 25 = 239.76 seconds, or 3 mins, 59 secs, and 760 ms.
This error results in the decoder playing the video too fast.
“DShow” reads the base frame rate and the duration of the video as set in the container header. For this reason, we have left the duration uncolored.
There are only 5994 chunks of video, so the empty chunks are added during decode to ensure the video plays at the right speed to match the set duration.
Although this decoder plays the video at the correct speed, it has added thousands of duplicate frames to do so.
“FFMS” reads the frame count and also the stream data itself. Within the video stream, each frame has a set Presentation Timestamp (PTS), which tells a decoder when to display the picture. The time deltas between successive Presentation Timestamps are added together, and this gives us the duration.
5994 frames / 1198.835 seconds = 4.999 fps.
This decoder plays the video accurately.
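Putting the figures above into a short worked example makes the difference obvious:

```python
frames = 5994            # real video frames in the file
header_fps = 25          # base frame rate stored in the AVI header
pts_duration = 1198.835  # seconds, summed from the stream's PTS deltas

# "FFmpeg" engine: trusts the header, so the video plays far too fast
print(frames / header_fps)    # 239.76 s -> 3 mins, 59 secs, 760 ms

# "FFMS" engine: trusts the stream, recovering the true rate
print(frames / pts_duration)  # ~4.999 fps
```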
You will also notice that, as “FFMS” is highly configurable, there are additional controls when using that engine. Let's look at those now.
Color Range
When a video is encoded with the most common colorspace for video, YUV420P, it can have either the “Full” or “Limited” color range. There is a compression benefit to using “Limited”. Rather than using the “Full” range with all luminance values, 0-255, it only needs to use 16-235.
When that video gets decoded, the decoder will identify that the video is using a “Limited” color range and then remap the values during playback. There will be some loss of precision in this process, but it is not noticeable to the viewer.
A common issue within surveillance footage is that a video is encoded with the “Full” range, but the stream presents itself as “Limited” to the decoder. The remap then clips the darkest and lightest values, hiding them. As you may imagine, restoring and enhancing a dark area without first decoding this valuable data could be problematic.
This override function allows you to assess the color range in the video and force a “Full” range if required. It’s not something that needs to be done for every video. However, for very dark footage, it may be worth assessing the color range to see if valuable data is being hidden from you.
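The remap itself is a simple linear stretch. Here is a minimal sketch, assuming NumPy, of what a “Limited” to “Full” luma conversion does:

```python
import numpy as np

def limited_to_full(y):
    """Remap limited-range luma (16-235) onto the full range (0-255).
    y is a uint8 NumPy array of luminance values."""
    stretched = (y.astype(np.float32) - 16.0) * (255.0 / (235.0 - 16.0))
    return np.clip(np.round(stretched), 0, 255).astype(np.uint8)

# Limited-range black (16) and white (235) become true black and white
print(limited_to_full(np.array([16, 128, 235], dtype=np.uint8)))  # [  0 130 255]
```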
Watch this short tutorial on Color Range.
Chroma Upsampling
The next dropdown deals with how color is added during decoding.
You may remember earlier in the series where we discussed color compression. We learned that there may be no requirement to save all the color information if that color can be added during the video decoding stage.
Looking again at the Colorspace YUV420P, it’s the numbers that dictate how much luminance and color are stored.
- The first number relates to luminance: in every row of 4 pixels, there are 4 values.
- The next number details that in the first row of 4 pixels, only 2 have color.
- The last number details that in the second row of 4 pixels, none have their own color.
So, for every 2×4 block of pixels (8), only 2 have a color value. Putting this into an 8×8 pixel block, it is decoded something like this.
The upsampling has to be achieved through interpolation of the sampled values, and having control of that interpolation enables you to view how the edges of objects change.
Decoding speed can also improve. The upsampled values are calculated during decoding, so a simpler method means fewer calculations and a faster decode. This could be useful during playback of video at resolutions higher than 2K, where block-level analysis is not required.
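As an illustration of the difference, here is a minimal sketch, assuming NumPy and SciPy, that upsamples a tiny 4:2:0 chroma plane two ways: nearest-neighbour replication versus bilinear interpolation.

```python
import numpy as np
from scipy.ndimage import zoom  # provides bilinear interpolation

# A tiny 2x2 chroma plane, as stored for a 4x4 block of pixels in 4:2:0
chroma = np.array([[50., 200.],
                   [50., 200.]])

# Nearest-neighbour: each stored sample is repeated over a 2x2 pixel block.
# Fastest option, but the edge between the two colors stays hard and blocky.
nearest = np.repeat(np.repeat(chroma, 2, axis=0), 2, axis=1)

# Bilinear: values between samples are interpolated. Edges look smoother,
# but the in-between colors are synthesized, not recorded.
bilinear = zoom(chroma, 2, order=1)

print(nearest)
print(np.round(bilinear))
```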
Playback Mode
The final stage of video decoding, after the pixels are produced, is controlling the rate of playback.
In Amped FIVE there are several options to evaluate and use as required. The default can be changed in the Program Options.
Average (FFMS)
This utilizes the FFMS engine to present an average rate, calculated from the frame count and the sum of all the frame duration deltas.
PTS
This is the default Playback Mode. It displays each frame as dictated by the Presentation Timestamp (PTS) associated with that picture. These values form part of the container formatting and are not within the raw frame data itself.
Timestamp
Data timestamps, subtitle timestamps, and manually added timestamps can all be used to control the playback rate.
This is very useful with streams extracted from proprietary containers where no PTS data is found. The originating system used the data timestamp to control the presentation of the images, so using this option replicates the behavior of the proprietary player.
Any of these modes could be used, depending on the source file and its capabilities. Having the ability to evaluate the data and compare it across modes can be very helpful.
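To see how the modes differ, here is a minimal sketch with hypothetical per-frame values, showing when each frame would be displayed under each mode:

```python
def display_times(mode, pts, data_ts):
    """Return when each frame should be shown (seconds) under a playback mode.
    pts and data_ts are hypothetical per-frame values for illustration."""
    if mode == "average":
        fps = (len(pts) - 1) / (pts[-1] - pts[0])  # one constant rate
        return [i / fps for i in range(len(pts))]
    if mode == "pts":
        return list(pts)       # trust each frame's Presentation Timestamp
    if mode == "timestamp":
        return list(data_ts)   # trust the recorder's data timestamps
    raise ValueError(mode)

pts = [0.0, 0.2, 0.4, 0.9, 1.1]   # irregular PTS (variable frame rate)
ts  = [0.0, 0.2, 0.4, 1.0, 1.2]   # DVR data timestamps for the same frames
print(display_times("average", pts, ts))  # evenly spaced, hides the real gap
print(display_times("pts", pts, ts))      # preserves the recorded gap
```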
Program Options
Remember that the control of files coming into FIVE can be configured using the Import tab within Program Options.
Finally
The intention of this series was to carry on along the pathway from our previous CCTV Acquisition series. Because of the variety of formats we have to deal with, the history behind them and the reasons certain processes are required had to be documented.
We hope that those new to this wonderful niche world of Forensic Video gain fresh insight into the challenges of proprietary data. We also aim to highlight the solutions available with applications such as Amped FIVE.
Proprietary data, along with the challenge of managing various codecs and formats, requires careful analysis, understanding, and interpretation.
This is one area of forensics where you will never stop learning.
We will leave you with this.
Imagine 16 five-thousand-piece jigsaw puzzles, with all the pieces in one single box.
There are no separate box lids to show what each puzzle should look like. There is no “key” to how the pieces should join together.
That’s our world!
Video is not easy, but with us all working together to identify new formats, tackle new challenges, and build new functionality, we can at least lighten the load.
As they say at the end of the cartoon: That’s All Folks!