Dear friends, welcome to this Tip Tuesday! It’s an unusual one, since we won’t show a trick for a specific Amped solution as we usually do. We still have a very good tip for you, though: we’ll guide you through some of the freely available online datasets that you can use to test and validate Amped FIVE, Amped Authenticate, Amped Replay, and Amped DVRConv. Yes, you read that correctly! There is lots of data out there that you can use to run your own experiments and increase your confidence in the reliability of our software. Keep reading to find out more.
Validation is a crucial step in the development and adoption of any forensic tool. People working with digital data are generally lucky: validating their tools only takes some software and hardware, not someone’s blood or fingerprints. Users of Amped products are even luckier: they only need data, since Amped solutions run on virtually any decently-equipped Windows computer.
We’ve already dedicated a tip to showing that both Amped FIVE and Amped Authenticate come with their own samples folder, where you can find a lot of different sample cases to test the software on and play with. However, those are just samples; they’re not a large body of data designed to test software performance.
When we implement an algorithm, it normally comes from the scientific literature, where its validation protocol and results are presented. Before beginning the implementation, we review the published data, and once finished, we run internal tests to check the correctness of our implementation. We also share our internal results from time to time, as we did for PRNU Identification.
More and more often, though, users ask us whether they can validate some of our algorithms on their own. And we always encourage them to do so! Of course, you’ll need some data… and here is where this tip hopefully helps you.
Let’s start with Amped Authenticate. If you’re interested in validating Amped Authenticate’s Camera Identification feature, you’re very lucky: there are several free datasets online that you can use, explicitly made for PRNU testing. Some examples:
- The VISION Dataset contains images and videos obtained with 35 smartphones/tablets, along with their WhatsApp and Facebook shared versions. The paper presenting the dataset is freely accessible here, while the data is on this FTP server hosted by the University of Florence (Italy).
- The SOCRatES (SOurce Camera REcognition on Smartphones) dataset features images from more than 100 devices; you can request access from the authors free of charge.
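Once you have run a source-identification test over one of these datasets, you will want to summarize the outcome. The following is a minimal sketch, assuming you have written down one similarity score per test image together with whether the image really came from the reference device. The function name, the score values, and the threshold are all hypothetical and are not taken from Amped Authenticate.

```python
def rates_at_threshold(scores, same_device, threshold):
    """Return (true positive rate, false positive rate) for a decision threshold.

    scores      -- one similarity score per test image (hypothetical values)
    same_device -- True if the image really comes from the reference device
    threshold   -- scores at or above this value count as a "match"
    """
    if len(scores) != len(same_device):
        raise ValueError("scores and same_device must have the same length")

    tp = fp = fn = tn = 0
    for score, is_match in zip(scores, same_device):
        declared_match = score >= threshold
        if declared_match and is_match:
            tp += 1
        elif declared_match and not is_match:
            fp += 1
        elif not declared_match and is_match:
            fn += 1
        else:
            tn += 1

    # Guard against empty classes so the function never divides by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Illustrative numbers only: three images from the device, three from others.
example_scores = [120.0, 80.0, 30.0, 10.0, 45.0, 70.0]
example_labels = [True, True, True, False, False, False]
tpr, fpr = rates_at_threshold(example_scores, example_labels, threshold=60.0)
```

Repeating the call for several thresholds gives you the points of a ROC curve, which is a common way to report this kind of experiment.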
If your goal is to test the forgery localization performance of Amped Authenticate’s Local Analysis filters, you can find many datasets online. Some examples:
- The IEEE IFS-TC Image Forensics Challenge provides hundreds of forged images, at various resolutions. All images are in the PNG format, along with their ground truth mask to be used for evaluation.
- The Copy-Move Forgery Dataset, described in this paper, provides 50 hand-made and 680 automatically-generated copy-move forgeries (generated starting from 20 pictures and applying different scale/rotation to the cloned object). Images are available in the BMP format.
- The Realistic Tampering Dataset v. 2.0 contains 220 realistic forgeries created by hand in modern photo-editing software (GIMP and Affinity Photo) and covers various challenging tampering scenarios involving both object insertion and removal. Images are all available in the TIFF format.
- The Wild Web Dataset is a very large collection of forgeries gathered from various web and social media sources, accompanied by ground truth binary masks localizing the forgery, and by the image sources that were used to perform the forgery, wherever these are available. In contrast to other datasets, it contains no unspliced images. The dataset is made of 80 realistic hand-made forgeries and more than 10,000 strongly related sub-cases (i.e., other versions of the fake image as found on the web, with different resolution, format, compression quality, etc.).
- The dataset used for validating the paper A Framework for Decision Fusion in Image Forensics based on Dempster-Shafer Theory of Evidence, freely available on this page. It contains 70 realistic hand-made forgeries and 4800 automatically generated, visually undetectable forgeries. All images are in the JPEG format.
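Several of the datasets above ship a ground truth mask with each forged image, so a localization result can be scored pixel by pixel. Here is a minimal sketch, assuming you have already turned both the filter output and the ground truth into boolean arrays of the same size in which True marks tampered pixels. How you export and threshold the map is up to you, and mask polarity differs between datasets, so check it first. The function name is hypothetical.

```python
import numpy as np


def localization_scores(predicted, truth):
    """Compare a binary localization map with a ground truth mask.

    Both inputs are 2-D arrays where True (or non-zero) marks tampered pixels.
    Returns precision, recall, F1 and intersection-over-union as floats.
    """
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if predicted.shape != truth.shape:
        raise ValueError("predicted and truth must have the same shape")

    tp = int(np.count_nonzero(predicted & truth))    # tampered, flagged
    fp = int(np.count_nonzero(predicted & ~truth))   # pristine, flagged
    fn = int(np.count_nonzero(~predicted & truth))   # tampered, missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy 4x4 example: the forged area is a 2x2 block, and the detection
# covers it plus one extra column.
truth_mask = np.zeros((4, 4), dtype=bool)
truth_mask[1:3, 1:3] = True
predicted_mask = np.zeros((4, 4), dtype=bool)
predicted_mask[1:3, 1:4] = True
scores = localization_scores(predicted_mask, truth_mask)
```

Because the Wild Web Dataset contains no unspliced images, scores like these only tell you how well forgeries are localized there; false alarms on untouched images need a dataset that also includes pristine pictures.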
If you’re more on the Amped FIVE side, there’s also a lot out there waiting for you.
- You’ll find a very nice license plate dataset on this page (available for download at this link), which presents the dataset associated with the paper Eyes on the Target: Super-Resolution and License-Plate Recognition in Low-Quality Surveillance Videos. It’s a great resource to validate the recent Perspective Stabilization filter, and possibly compare its performance with Amped FIVE’s Super-resolution filter. Citing from the linked page: “The dataset is a collection of 200 real-world traffic videos, in which the movement of the vehicles is away from the camera (one target license plate per video). All collected streams are 1080p HD videos @30 fps (video codec H.264, without additional compression) and contain only Brazilian license-plates. As we have a good resolution of the license plate in the beginning of each video, we manually identified the correct characters of its target license plate and created its ground-truth file. Unlike the beginning of the video, the license-plate alphanumerics in the last frames are harder to recognize.”
- The MPlayer samples collection, 54 GB of various common and uncommon multimedia formats, is the right dataset if you want to test the famous Amped video conversion engine shared by Amped Replay, Amped FIVE, and Amped DVRConv. We recommend that you read the header.txt file before downloading huge amounts of data.
- The Digital Camera Review page maintained by ImagingResource.com is a great place to find images of a specific camera model (e.g., to test or enlarge Amped Authenticate’s JPEG Quantization Table database). You’ll have to manually browse and look for sample shots, but it’s a very rich collection.
- The RAISE Dataset is a very good resource if you’re looking for camera-original RAW files.
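For the license plate dataset mentioned above, each video comes with the correct characters of its target plate, so you can score what you manage to read after enhancement. The sketch below is one possible way to do it: it normalizes both strings and uses the edit distance to compute a character-level accuracy. The function names are hypothetical, and the layout of the dataset's ground-truth files is not assumed here; you simply pass in the two strings.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1,      # delete from a
                               current[j - 1] + 1,   # insert into a
                               substitution))
        previous = current
    return previous[-1]


def normalize_plate(text):
    """Keep letters and digits only, in upper case."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def plate_accuracy(read_text, ground_truth):
    """Character-level accuracy in [0, 1] of a plate reading."""
    read_text = normalize_plate(read_text)
    ground_truth = normalize_plate(ground_truth)
    if not ground_truth:
        raise ValueError("ground truth must contain at least one character")
    errors = edit_distance(read_text, ground_truth)
    return max(0.0, 1.0 - errors / len(ground_truth))


# Made-up plate strings, one wrong character out of seven.
accuracy = plate_accuracy("abc-1284", "ABC1234")
```

Scoring the same plate before and after applying a filter, on the same frames, gives a simple number to compare across videos.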
You’re all set! Just download data and start testing, and remember: if you find some interesting case or spot any bug/unexpected behavior, contact our support team and let us know! Have a great week and… happy testing!
One last thing: this post is just a tip, but some websites tirelessly keep track of publicly available datasets for image processing. Some examples are:
- The dataset collection maintained by Stefan Winkler.
- The image database page published by ImageProcessingPlace.com.
We hope you enjoyed this Tuesday tip! Stay tuned and don’t miss the next ones. You can also follow us on LinkedIn, YouTube, Twitter, and Facebook: we’ll post a link to every new Tip Tuesday so you won’t miss any!