In this first post from our new Video Formats and Conversion blog series, we cover some of the terminology we will be using throughout. Looking ahead, we also run through the upcoming posts to give you an overview of what to expect over the next few months.
Hey everyone. Welcome to this short blog series on video formats and conversion.
Over the next few months, we will be answering some of the most common questions on video formats and conversion that come up during training or through support. In “Getting Started with Video Formats and Conversion”, we will see that there is a lot of conflicting information within this niche corner of digital forensics. This often causes confusion, which can result in difficulties when processing, analyzing, and reporting on video surveillance data.
This series is designed for Forensic Video Technicians and Analysts. However, if you are involved in CCTV installation, monitoring, Digital Multimedia Evidence (DME) storage, or investigation, you may still find some of the points useful.
As you may imagine, video formats are a huge subject. Consequently, we will only cover the most common formats and codecs, rather than the rare and obscure. This is not an “everything you will ever need to know” series. Forensic Video Analysis is a career of lifelong learning, so see this as a starting point rather than the finishing line!
Linked with formats are the inevitable conversion requirements. These take on a whole new level of complexity when dealing with surveillance video for investigations, as conversion is often not a simple A-to-B process. As such, an understanding of the native data is vital.
Once we understand the data we have, we can decide on the next steps. If you do not understand what you are starting with, incorrect processing can damage the very data you are relying on.
Data Reliability
Reliability is the key here.
How can you rely on the data? Where has it come from? How have the values been calculated? In this series, we will answer these questions and many more like them.
First, let us consider why a series like this is important. In the same way that CCTV Acquisition required a full blog series, the challenges of the data those systems produce must also be acknowledged. Why is that? You are told to “go and get the video”. It’s only a video, isn’t it? Well, in most cases, it is not.
It is volatile digital data that may be required to establish facts in an investigation. It must be fully understood and interpreted correctly. This refers not only to your interpretation but also to how a computer application interprets it. Welcome, then, to the world of proprietary data.
Proprietary Data
The internet is full of articles, blogs, and videos about codecs and containers. These terms, which we will learn more about later, are all used on the premise that the data conforms to a known standard. An unfortunate result of web scraping and AI generation is the increase in articles that are so completely wrong they border on being dangerous to an unsuspecting CCTV investigator.
International organizations standardize and document how moving pictures should be written, and then how they should be stored, or containerized. This is great, but many years ago, nobody mentioned this to the thousands of CCTV manufacturers around the world.
We have ended up in a rather bizarre situation. Those responsible for capturing video that may be used to establish facts at a later time do not have to document how they are storing that video. More importantly, there is no requirement for them to use any known standard.
Luckily, over the past 20 years, the use of a standard starting point has overtaken completely proprietary encoding. You will still find fully proprietary formats though, and we will discuss some options for dealing with them later in the series.
So, within a file of proprietary data, some of it may have started out conforming to a standard. This is good news: if we can identify that data, then we can decode it. This stage is documented in many guidance documents as the Lossless Extraction of Data from Proprietary Formats.
This brings us nicely to terminology, as it would be helpful to know what a proprietary format is.
Terms
Here we have a summary of terms that will be used during the series.
Proprietary
Within this context, proprietary relates to data that has been developed or controlled by a specific company. There are two distinct types: open and closed. Open means that there is a published document explaining the formation of the data. As you have probably guessed, closed means the opposite: the developing company has not published any detail of the data structure.
Standard
A standard data type means that there is an internationally accepted method and structure, ratified through an organization or committee. The creation and publication of the standard ensures that interested parties can then develop hardware and software to utilize that data type.
Format
In our multimedia world, format refers to how the data is structured and stored. The format is often referred to as the container, and you may see the two words put together as “Container Format”.
Video and audio data that are not “formatted”, or “containerized”, may not be correctly decodable. As such, there is a synergy between the multimedia and the container in which it resides. Because they are containers, they can hold several items: video data, audio data, time data, and metadata.
Codec
Codec stands for COmpressor and DECompressor, or COder and DECoder, depending on what you read. Think of a codec as a language: there are many different languages, and there are many different codecs. It is the codec that encodes the multimedia and then decodes it when it is needed, such as during playback. The standards for codecs fully document how the coded data is formed, which makes it possible to reverse the process and decode the data for playback or conversion.
Stream
Inside a format, the multimedia that has been encoded is stored as streams. The format dictates the types of stream that can reside inside: not all streams encoded with all codecs can go into all formats, so there are some compatibility limitations.
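If you want to see this in practice, a quick way is to ask a standards-based tool to list what it finds inside a file. The sketch below uses FFmpeg’s ffprobe driven from Python purely as an illustration; it assumes ffprobe is installed and on your path, and the file name is just a placeholder.

```python
# A minimal sketch: list the streams inside a container using FFmpeg's ffprobe.
# Assumes ffprobe is installed and on the PATH; "cctv_export.avi" is a placeholder.
import json
import subprocess

def list_streams(path: str) -> None:
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    info = json.loads(result.stdout)
    print("Container:", info["format"]["format_name"])
    for stream in info["streams"]:
        # Each stream has an index (0, 1, ...), a type, and a codec name.
        print(stream["index"], stream["codec_type"], stream.get("codec_name", "unknown"))

if __name__ == "__main__":
    list_streams("cctv_export.avi")
```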
There is another type of stream: the subtitle. These also conform to a standard so that standards-based players can overlay the text.
Conversion
There are various stages and types of conversion and, again, this is often a subject of confusion. Let us break them down here; later in the series, we will look at each in more depth.
- Extraction: The process of identifying and extracting streams, codecs, and other associated data from within a proprietary format.
- Encode: The process of transforming raw stream data into a coded form utilizing a codec.
- Mux: The process of placing the streams into a container format.
- Transcode: The process of decoding multimedia streams using a codec and encoding that data with another codec.
- Reformat: The process of copying the streams from one format to another compatible container format.
- Reindex: The process of copying the digital data from one format and placing it inside the same format type.
- Concatenation: The process of joining video segments together in consecutive order to create a single video stream.
Within this section, we must also consider the term “Lossless”. Some processes retain all the native information when extracting, moving, and changing data within a conversion stage. The data may be formatted very differently, but nothing has been “lost”.
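To make the lossless point a little more concrete, here is a minimal sketch of the difference between a reformat and a transcode using the ffmpeg command line (driven from Python only for readability). It assumes ffmpeg is installed, and the file names are placeholders rather than anything from a real case.

```python
# A sketch of the difference between a lossless reformat and a transcode,
# using the ffmpeg command line (assumed to be installed). File names are placeholders.
import subprocess

SOURCE = "cctv_export.avi"

# Reformat: copy the existing streams into a new container without re-encoding.
# "-c copy" means no codec is involved, so nothing is decoded and nothing is lost.
subprocess.run(["ffmpeg", "-i", SOURCE, "-c", "copy", "reformatted.mkv"], check=True)

# Transcode: decode the video stream and re-encode it with another codec (here H264).
# The coded data itself is changed, so this is not a lossless process.
subprocess.run(["ffmpeg", "-i", SOURCE, "-c:v", "libx264", "-c:a", "copy",
                "transcoded.mp4"], check=True)
```

The first command moves the streams untouched into a new container; the second decodes and re-encodes the video, so the original coded data is replaced.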
Multimedia Framework
With many video codecs, audio codecs, and several different container formats, handling multimedia is very demanding. To ensure the data is passed from one process to the next correctly, frameworks exist that use the international standards. Within each framework are several libraries containing the components required to complete a process, from capture to writing and from playback to conversion. Building a framework “from the ground up” would be a huge task, so the main ones now come from the likes of Apple, Microsoft, and, of course, FFmpeg. Integrated into most tech companies’ digital infrastructure, the FFmpeg framework of libraries can be found everywhere.
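As a small illustration of what “a framework of libraries” means in practice, here is a hedged sketch using the third-party PyAV bindings (an assumption for illustration only, installed with pip install av), which expose FFmpeg’s libavformat and libavcodec libraries from Python. The file name is a placeholder.

```python
# A minimal sketch using PyAV, which wraps FFmpeg's libraries:
# libavformat identifies the container and demuxes it, libavcodec decodes the streams.
import av

container = av.open("cctv_export.avi")
print("Container:", container.format.name)

# Decode the first video stream: the framework turns coded data back into pictures.
for frame in container.decode(video=0):
    print("Decoded a", frame.width, "x", frame.height, "frame at pts", frame.pts)
    break  # one frame is enough for this illustration

container.close()
```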
Examples
Let us now put all of this information into some context.
AVI
Here we have a file named “cctv_export” with a file extension of AVI. AVI stands for Audio Video Interleave. This standard format is part of a family of multimedia containers that use the RIFF standard. Resource Interchange File Format (RIFF) files store their data in indexed chunks.
The format will have a header containing information about the streams inside and an index to ensure efficient file navigation during playback.
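If you are curious what those indexed chunks look like at the byte level, here is a minimal sketch that walks the top-level chunks of a RIFF file using Python’s struct module. It is an illustration only, and the file name is a placeholder.

```python
# A hedged sketch: walk the top-level chunks of a RIFF file (such as an AVI).
# The file name is a placeholder for your own export.
import struct

def walk_riff(path: str) -> None:
    with open(path, "rb") as f:
        magic, size, form = struct.unpack("<4sI4s", f.read(12))
        if magic != b"RIFF":
            raise ValueError("Not a RIFF file")
        print("Form type:", form.decode("ascii"))  # e.g. "AVI "
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            print(chunk_id.decode("ascii", "replace"), chunk_size, "bytes")
            # Skip the chunk data (chunks are padded to an even length).
            f.seek(chunk_size + (chunk_size & 1), 1)

walk_riff("cctv_export.avi")
```

In a typical AVI you would see LIST chunks holding the header and the movie data, plus the idx1 chunk that carries the index mentioned above.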
The first stream (which will have the number 0) uses a standard video codec, MP4.
The next stream (1) uses a standard audio codec, MP3.
The AVI container supports multiple streams, but it does have limitations. One of the main ones, for our work, is that it does not efficiently support variable frame rates. We will take a further look at this during the series.
You may have noticed that MPEG4 has various parts, and this is where more confusion often occurs.
- MPEG4 Part 2 refers to visual encoding.
- MPEG4 Part 3 refers to audio encoding.
- MPEG4 Part 14 refers to the Container Format MP4.
You could have an MP4 container with MP4 video inside. We will look more at the MP4 format during the series.
SCD
Here we have a file with the file extension of SCD. Whereas an online search for AVI may return a gazillion results, a search for this file extension may return only a few. One of them may be a user manual for an unbranded DVR, explaining that exported files will be in the SCD format and that the manufacturer’s proprietary player will be required to decode them.
There is no standard or whitepaper on this format. It is, therefore, a closed proprietary format.
However, the video has been encoded using a standard: H264, also known as AVC (Advanced Video Coding) or MPEG4 Part 10 (another part!). The fact that the manufacturer has started with a standard does help us. However, as the formatting is not standard, the two streams are muxed into a format that is not understood by standard multimedia frameworks such as FFmpeg.
Next, there is unknown data with no identifiable structure that would allow it to be extracted and formatted correctly. It could be audio, but due to the closed nature of the format, it may never be possible to establish the data type or how to decode it correctly.
Inside the container is also an index system that references every second in real time. This index system is also proprietary but, through research and testing, it is possible to decode it. Due to the proprietary nature of the formatting, the identifiable data will need to be extracted and reformatted to ensure the footage can be decoded forensically.
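To give a flavour of what identifying the standard data inside a proprietary container can involve, here is a hedged sketch that scans a file for H264 (Annex-B) start codes and reports the NAL unit types it finds. It illustrates the principle only; real extraction tools validate far more than this, and the file name is a placeholder.

```python
# A hedged sketch: scan a proprietary export for H264 (Annex-B) start codes and
# report the NAL unit types found. Illustrative only, not a recovery tool.
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI", 7: "SPS", 8: "PPS"}

def scan_for_h264(path: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    pos = 0
    while True:
        pos = data.find(b"\x00\x00\x01", pos)
        if pos == -1 or pos + 3 >= len(data):
            break
        nal_type = data[pos + 3] & 0x1F  # low 5 bits of the NAL header byte
        print(f"offset {pos:#x}: {NAL_TYPES.get(nal_type, f'type {nal_type}')}")
        pos += 3

scan_for_h264("export.scd")
```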
Finally
As you have learned, Digital Multimedia Evidence can be a minefield of standards, proprietary data, and the unknown.
Let us take the RIFF data type that we looked at in the AVI example. Some proprietary video formats also use this structure, but the format is not AVI and they do not use the AVI extension. Then we have other manufacturers that use known extensions, such as MP4, even though the format does not conform to the standard (MPEG4 Part 14).
This series aims to highlight some of these pitfalls so you are better prepared to analyze, process, interpret, and report on the data you have. When you deal with data forensically, you maximize the opportunities that restoration and enhancement afford, you are not restrained by proprietary decoding limitations, and you ensure that the imagery retains its integrity.
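Because the extension cannot be trusted, a useful first habit is to look at the opening bytes of the file rather than its name. The sketch below is a simple illustration of that idea, recognizing only the two signatures discussed above; the file name is a placeholder.

```python
# A hedged sketch: check a file's opening bytes rather than trusting its extension.
def sniff(path: str) -> str:
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:4] == b"RIFF":
        return f"RIFF-based file, form type {head[8:12].decode('ascii', 'replace')!r}"
    if head[4:8] == b"ftyp":
        return "ISO Base Media file (the MP4 family)"
    return "no standard signature found - possibly proprietary"

print(sniff("clip.mp4"))
```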
The next post in this series will look at proprietary data in much more detail as this is often where the most problems occur. Until then, look after yourselves.