The Hacker Factor Blog by Dr. Neal Krawetz - 1w ago
One of the ongoing research projects at FotoForensics involves image ballistics. Specifically, I want to be able to look at a picture's signature and determine if it came from a specific camera or application. (I'm currently on the third major revision to my ballistic system.)

The JPEG file format is a container. It usually contains one big picture, but it can also contain thumbnail or preview images. (The terms "preview" and "thumbnail" are interchangeable.) These previews are usually variants of the main image and located within a JPEG application block. But in some cases, the preview may be appended after the main image or stored in a format that violates the JPEG Standard. Appending non-standard data to the JPEG happens often enough that it doesn't surprise me. My current ballistic system generates signatures for the main image and all thumbnail images.

Last week, the ballistic system indexer alerted me that something was "odd". The picture's contents were not anything unusual -- just some guy giving his dog a bath. (What a cute puppy!!!) The picture's signature and metadata appeared to be a camera-original from an Android device, but the file size was too large. After a little investigating, I realized that there was a video appended to the end of the JPEG! The video was about 3 seconds long and showed the guy next to a very happy puppy.

Needles in Haystacks

I've seen short videos like this before -- mostly from Apple devices -- but I had never noticed them appended to JPEG images. I created a small script to search the 3.2 million pictures at FotoForensics and tag every instance of an appended video:
2012: 0 instances
2013: 0 instances
2014: 0 instances
2015: 1 instance
2016: 33 instances
2017: 205 instances
2018: 481 instances
2019: 276 instances (Jan - June)

No wonder I hadn't noticed it -- I only have 996 instances out of 3.2 million pictures.
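
A search like this could be sketched as follows. This is not the script used at FotoForensics; it is a minimal illustration that assumes the video is an MP4 whose "ftyp" box appears somewhere after the main image's end-of-image marker. The function names `find_jpeg_end` and `appended_video_offset` are hypothetical. The JPEG is walked segment by segment so that an end-of-image marker inside an embedded thumbnail is not mistaken for the end of the main picture.

```python
import struct
from typing import Optional


def find_jpeg_end(data: bytes) -> int:
    """Return the offset just past the main image's EOI marker, or -1."""
    if data[:2] != b"\xff\xd8":              # every JPEG starts with SOI
        return -1
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            return -1                        # lost marker synchronization
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:                   # EOI: end of the main image
            return pos + 2
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                         # standalone markers have no length
            continue
        if pos + 4 > len(data):
            return -1
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + seg_len                   # skips APPn blocks and their thumbnails
        if marker == 0xDA:                   # SOS: entropy-coded data follows
            while True:
                idx = data.find(b"\xff", pos)
                if idx < 0 or idx + 1 >= len(data):
                    return -1
                nxt = data[idx + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos = idx + 2            # stuffed byte or restart marker
                elif nxt == 0xFF:
                    pos = idx + 1            # fill byte
                else:
                    pos = idx                # a real marker: resume parsing
                    break
    return -1


def appended_video_offset(data: bytes) -> Optional[int]:
    """Return the offset of an MP4 found after the JPEG, or None.

    Assumption: the MP4 begins with an 'ftyp' box (4-byte size, then 'ftyp').
    """
    end = find_jpeg_end(data)
    if end < 0:
        return None
    idx = data.find(b"ftyp", end)
    if idx < end + 4:                        # also covers idx == -1
        return None
    return idx - 4                           # the box starts at its size field
```

If the returned offset is larger than the end of the JPEG, then something sits between the picture and the video. Note that this simple check would also match an MP4 stored uncompressed inside an appended zip file.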

Microsoft Living Images

There are a couple of different systems that permit capturing video with pictures. As far as I can tell, the first instance was Microsoft's "Living Image" system. This functionality was initially available on Microsoft's short-lived Lumia mobile devices (2014-2017) and is still available on Windows 10. Here's a sample that I made on my own system. (Yes, I have one Windows 10 system for testing.) You can click on the image to view it at FotoForensics.

Note: Because of the additional resources needed, I normally do not have thumbnail extraction or video support enabled for the public FotoForensics site. For this picture only, I have explicitly enabled this support.

On anything other than Microsoft Photo, it just looks like a JPEG. (I took my colorful egg picture and drew a blue swirl on it.) However, appended to the end of the JPEG is a zip file. If you rename the file from .jpg to .zip, then you can unpack it. The zip contains 3 files:
  1. "formats/living/living.jpg": This is a JPEG image. The filename is "living.jpg" because it is a Microsoft "Living Image".

  2. "formats/living/living.mp4": This MP4 video shows the swirl being drawn.

  3. "[Content_Types].xml": There's a small XML/HTML file that says there are JPEG and MP4 contents.
Even though there is a zip wrapper to bundle these three files together, the JPEG and MP4 are not compressed by the zip file. (Then again, it's hard to further compress video.)
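
Renaming the file is not strictly necessary. As a minimal sketch, Python's standard `zipfile` module locates the archive from the end of the file, so it can read a zip that has other data (here, the JPEG) in front of it. The helper name `list_appended_zip` is hypothetical, and the sketch assumes the appended data is a well-formed zip.

```python
import io
import zipfile


def list_appended_zip(data: bytes):
    """List the members of a zip archive appended to other data.

    Returns (name, size_in_bytes, stored_without_compression) tuples.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [
            (info.filename, info.file_size,
             info.compress_type == zipfile.ZIP_STORED)
            for info in zf.infolist()
        ]
```

Calling `list_appended_zip(open(path, "rb").read())` on a Living Image should list the three files described above, with the last field showing whether each one was stored without compression.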

Here's what the "living.jpg" image looks like:

And here's the video:

Microsoft's "Living Image" format permits people to annotate or alter images. The main JPEG image shows the final results, the embedded "living.jpg" shows the "before" contents, and the video shows the animated transition from start to finish.

Apple Live Photos

Not to be outdone, Apple introduced "Live Photo" in 2015. Basically, it captures a few seconds of video around the time the picture is taken. (Snapping one of these pictures can take up to 3 seconds.) This allows non-Apple users to see the photo, and Apple users can either see the picture or the animation. There's a couple of ways Apple users can apply these videos:
  • Selective frames: Did your friend just blink while you were snapping the picture? You can step through the frames and choose the perfect one rather than being stuck with a bad photo.

  • Blending: You can blend frames from the video with the main photo to create movement effects. There's even one blend method that removes all motion. Cars, birds, and sudden photobombs are all digitally removed.

  • Sharpening: Given the frames from the video, the main photo can be digitally sharpened. (This approach is better than auto-sharpening based only on the source picture because the motion's impact can be included in the calculations.)
Keep in mind, when Apple's management and marketeers say, "These are still-photos, they are not videos", they are lying to you. There is one photo and one MP4 video. (You know it's a lie when you can point to the associated video file.) As far as I can tell, Apple stores them as two separate files. As two independent files, there can be a problem of file separation. For example, someone can send you the picture without the associated video. At FotoForensics, I often see the post-processed photo and not the associated video.

Google Motion Photos

Of course, if Microsoft has a feature and Apple has a feature, then Google must also get that feature. In 2017, Google added Motion Photos to their Android devices. These are usually 1-3 second videos attached to the end of the JPEG. Of course, Android being Android, I'm seeing lots of inconsistencies:
  • Appended. The puppy picture that started me down this path was literally an MP4 appended to the end of the JPEG. No separation, no headers; just a video concatenated to a JPEG.

  • Appended with small header. I've mostly seen this with Motorola Android devices: there's a small non-JPEG header appended to the end of the image that is followed by the MP4 video. The proprietary "MakerNotes" metadata may reference the post-JPEG header and contents. (I say "may" because sometimes the metadata makes no reference to the attached video.)

  • Appended with large header. I have a few examples of Android devices appending a huge amount of non-JPEG information to the end of the image, followed by the MP4 video. Again, the proprietary "MakerNotes" metadata may reference the post-JPEG header and contents.

  • Duration: Most videos are either 1 second or 3 seconds long. The shortest has been 0.8 seconds and the longest so far was nearly 8 seconds of video.

  • Audio. Most of these videos have no audio. While I haven't reviewed all of these videos, at least one does include audio.

  • Encoding. The MP4 file format is a container. It can contain audio and video tracks. Each of these tracks can be encoded using a wide range of codecs. I've seen a half-dozen different MP4 video codecs, including one example that is not supported by older ffmpeg players. (Time to upgrade!)
I haven't seen any of these Android videos with the blending, sharpening, or annotating features found in Microsoft's Living Image or Apple's Live Photo systems. But as an open architecture, I'm sure it will be coming eventually.
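
To illustrate the "MP4 is a container" point from the list above: an MP4 file is a sequence of boxes, each starting with a 4-byte big-endian size and a 4-byte type. The following is a minimal sketch for listing the top-level boxes of an extracted video. The function name `top_level_boxes` is hypothetical, and the sketch does not descend into nested boxes (such as the tracks inside "moov").

```python
import struct


def top_level_boxes(mp4: bytes):
    """Return (type, offset, size) for each top-level box in an MP4 buffer."""
    boxes = []
    pos = 0
    while pos + 8 <= len(mp4):
        size, kind = struct.unpack(">I4s", mp4[pos:pos + 8])
        header = 8
        if size == 1:                        # a 64-bit size follows the type
            if pos + 16 > len(mp4):
                break
            (size,) = struct.unpack(">Q", mp4[pos + 8:pos + 16])
            header = 16
        elif size == 0:                      # box runs to the end of the file
            size = len(mp4) - pos
        if size < header or pos + size > len(mp4):
            break                            # truncated or not a valid box
        boxes.append((kind.decode("latin-1"), pos, size))
        pos += size
    return boxes
```

A typical result starts with an "ftyp" box, followed by boxes such as "moov" (the track descriptions) and "mdat" (the encoded audio and video data).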

As an aside, I'm seeing the Android community becoming very confused by these very similar product names: Living Images, Live Photos, Motion Photos... Google also has something called "Motion Stills". If you search Google for any combination of Living, Live, Motion with the synonyms Images, Photos, Pictures, then you'll likely find references to these appended video formats. (Uh, isn't "Motion Pictures" going to cause confusion and trademark issues?)

Interesting Behaviors

With few exceptions, the videos are not very incriminating: outdoor shots, people, pets, etc. These types of uses are exactly what this feature is intended for.

However, there are also some odd "mistake" videos. (At least, I think they are mistakes.) One of the most common is the "long selfie". That is, the final JPEG looks like an impromptu selfie, but the video shows them holding the pose, eyes looking briefly at the computer or mirror to check the pose, and more staged posing for the entire 3 second video. It's like a human statue with moving eyes in a selfie stance. The final picture may appear impromptu, but the video shows that it is clearly planned.

There are lots of videos from people who don't seem to realize that they need to hold the camera for 3 seconds. The videos usually show 1-2 seconds of stable contents and then a massive blur as the user lowered the camera before the full 3 seconds were up. A few of the videos end with some really good "shoe" shots. Personally, I think this suggests a usability issue; 3 seconds is a long time to hold the camera for a simple photo.

Only a few pictures are interesting with regards to fraud detection. One picture shows an ID, but the motion in the video reveals that it's a photo of a computer monitor displaying the ID and not a direct photo of the ID. If this is supposed to show possession for authentication, then the video shows the fraud. Another shot features a new sofa, but the video shows that it is a 2-dimensional photo -- probably from a magazine. (I have other ways to detect a picture of a picture, but having the video is a good alternative confirmation.) There's also a picture that was digitally altered and the video shows the artist inserting the alteration.

Increasing in Volume

I'm seeing a slow and steady increase in these video-appended JPEG files. I don't think these video previews will become mainstream in the near future, but I do expect to encounter them more often. I'm currently beta-testing some code that helps render, display, and analyze these videos. Meanwhile, I'm ramping up for the inevitable "deep fakes". It's one thing to have photos with high-quality alterations, but it's a new level of complexity if there is an attached deep-fake video that supports the altered picture.
The number of reasons for wanting to create fraudulent photos dwarfs the number of different tools and techniques. For example:
  • Fame. A powerful picture may generate notoriety. And if you can't take a powerful picture, then you can always use PhotoShop to make an average picture look better.

  • Reward. Some photo contests have had to disqualify entries due to digital alterations. But the risk might be worth taking: many photo contests have little or no penalty for cheating, and they rarely announce who was disqualified prior to announcing winners.

  • Ego. Pictures can be altered to make someone look better, or make someone else look worse.

  • "Helping". Often, investigators receive pictures that have been annotated or enhanced to highlight a specific aspect. The user might think they are being helpful by pointing out a specific element in the photo. But in reality, their alterations may make it more difficult to distinguish real photos from deceptive alterations.

  • Education. Let's say you want to learn how to detect fake pictures. Where do you start? If you have a collection of fake photos, then you can use them for training. But if you don't have a collection, then you can always build some examples. Even with access to 3.1 million pictures at FotoForensics, I often have trouble finding that one picture that perfectly demonstrates a technique. Sometimes it's easier to make it than to search for it. (While my training classes include pictures that I created, this blog entry only features pictures that other people uploaded.)

  • Sow doubt. I've seen people intentionally introduce altered photos into sets of unaltered pictures, just so they can later point out the fakes. This way, they can cast doubt on the unaltered pictures. (This is a common approach used by Holocaust deniers.)

  • Propaganda. An altered picture at just the right time can direct the audience's actions and influence their emotions.

  • Scam. Scammers often provide pictures as "proof" of ownership, existence, damage, authority, etc. Scams, by themselves, have a wide range of motivations. The scam could be for direct gain, indirect gain, or intended to cause confusion or misdirection.

  • Demonstration. An altered picture could be a proof-of-concept. This could be for a presentation, research project, threat assessment, or justifying some kind of action. Sometimes the demonstration is used as an example for an alternate revenue stream -- like selling specialized software that helps alter pictures.
In part 1 of this series, I showed some examples of common forgeries, but a single picture really doesn't provide enough information to identify the motivation. In part 2 and part 3, I identified a large set of altered photos from AvtoVAZ, a Russian car manufacturer. In part 4, I detailed how the AvtoVAZ group has been continually improving their alteration method. In this final part of the series, I'm going to dive into why the AvtoVAZ group appears to be using their tool and photo alteration technique.

The Direct Use

A single altered picture usually lacks enough information to identify the underlying motivation. In part 4, I mentioned that these thousands of altered car photos could be used for internal training -- and this could be the case if I never saw these pictures uploaded by anyone other than the group directly associated with AvtoVAZ. However, FotoForensics occasionally received pictures with this same unique alteration method coming from other sources.

For example:

Between 2013 and 2014, someone at VSK Insurance Joint Stock Company (in/near Moscow) uploaded nearly two dozen pictures to FotoForensics for analysis. Some of the pictures, such as the example above, explicitly use the unique method developed by the AvtoVAZ group. However, other pictures in the series did not feature AvtoVAZ vehicles or this specific alteration approach. While the AvtoVAZ group almost exclusively uploaded photos of AvtoVAZ vehicles, this person at VSK uploaded all sorts of vehicles. (Of the 23 pictures in this set, 3 used the AvtoVAZ group's alteration method.)

According to Bloomberg, VSK is a Russian insurance company. This appears to be an insurance investigator using FotoForensics to evaluate pictures. The variety of pictures appears to be associated with a variety of claims. (The 23 pictures were not from the same claim and were uploaded over the course of a year.) Some of the photos appear to be fraudulent, and a few of the fraud photos used the AvtoVAZ group's method. This type of usage pattern is typical for real insurance investigators; they don't evaluate one type of suspicious claim or one class of fraudulent activity, they evaluate lots of different types of potential fraud. Most likely, there was something odd about a claim and it happened to include photos using the AvtoVAZ group's technique; the pictures probably didn't set off the initial concerns.

Since these pictures from the AvtoVAZ group are outside of the group's internal use, this is likely insurance fraud. If the Russian insurance company didn't notice the forgery, then they might have paid out a claim for a fake photo. However, this could be something else, like some kind of training material or information disclosure to VSK. (Without information from VSK, I'll never know. And they didn't respond to my 2014 email inquiry.)

Other Sightings

One picture is a single data point. You usually can't draw a conclusion from a single data point. But with the AvtoVAZ group, I have thousands of pictures spanning years. This introduces an entirely new evaluation dimension: volume analysis.

The following graph shows all of the uploads that I have identified as coming from the AvtoVAZ group.

Each vertical line identifies the number of uploads per day. Initially, this group uploaded lots of pictures almost every day. During this time, they were making changes to their tools and revising their techniques. In the graph, you can see days with 1-2 uploads followed by a high-volume day (e.g., 2013-09-02 had 1 upload and 2013-09-03 had 2 uploads, then 2013-09-04 had 10 uploads and the next two days had 16 uploads each). These low-to-high oscillations correspond with changes in their methods and subsequent high-volume testing.

This testing went on for about a year. Then they settled down. There would be long periods of no activity followed by short bursts of activity. Occasionally there would be some development, but this was mostly short bursts of activity.
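
The per-day counting behind this kind of volume graph can be sketched in a few lines. This is only an illustration, not the FotoForensics tooling: the function names and the `factor` threshold are assumptions, and the sample data is limited to the five days quoted above.

```python
from collections import Counter
from datetime import date


def daily_counts(upload_dates):
    """Count uploads per calendar day."""
    return Counter(upload_dates)


def ramp_ups(counts, factor=3):
    """Return days whose volume is at least `factor` times the previous active day.

    `factor` is an arbitrary illustrative threshold, not a tuned value.
    """
    days = sorted(counts)
    return [cur for prev, cur in zip(days, days[1:])
            if counts[cur] >= factor * counts[prev]]


# The five days quoted in the text: 1, 2, 10, 16, and 16 uploads.
uploads = (
    [date(2013, 9, 2)] * 1 + [date(2013, 9, 3)] * 2 + [date(2013, 9, 4)] * 10
    + [date(2013, 9, 5)] * 16 + [date(2013, 9, 6)] * 16
)
counts = daily_counts(uploads)
print(ramp_ups(counts))
```

With this data, the only day flagged is 2013-09-04, where the volume jumps from 2 uploads to 10.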

Now for the fun part: the AvtoVAZ group wasn't the only group using these tools and techniques. Each burst of activity was followed by a new group using the same tools and techniques. Here's a graph showing everyone who uses the AvtoVAZ method:

This graph tells an interesting story. Each burst by the AvtoVAZ group was followed by a group in a different country. For example, between December 4 and December 11, 2013, the AvtoVAZ group only had two days of activity: 6 uploads on 2013-12-06, and 2 on 2013-12-10. This was followed by two uploads from Vietnam on 2013-12-10:

The Vietnam photos are not car pictures; these are shipping containers. But they have the same embedded photo timestamp, same low ELA result, and same altered preview image (embedded picture in the JPEG file) that is seen with the AvtoVAZ group. These two pictures from Vietnam use the AvtoVAZ group's tools and techniques.

On 2013-12-12 (2 days after the Vietnam pictures), the AvtoVAZ group uploaded 2 more pictures. This was followed by 2 pictures from Chile. The next day, the AvtoVAZ group uploaded 2 more pictures, followed by 1 picture from Pakistan. And the cycle repeats. A pause in uploads, a burst of pictures from AvtoVAZ, and then pictures from Italy (Feb 2014). Pause, burst, then Poland (Oct 2014). ... Pause, burst, then Venezuela (May 2018). Pause, burst, then China and a Chinese affiliate in the United States (Oct 2018). Here are some example pictures from these other groups:

Italy (Feb 2014)
Venezuela (May 2018)
China via an affiliate hosted in the United States (Oct 2018)

This pattern is very specific. Most companies have a "code freeze" period before releases or demonstrations. During this freeze, there are few or no uploads from the AvtoVAZ group. Then there is a short burst of activity, while they demonstrate their capabilities to some other group. Then the other group tries a few example pictures. This appears to be a sales demonstration.
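
One way to check a "burst, then a new group" pattern is to measure how many days separate a new group's first upload from the most recent activity by the suspected seller. The sketch below is an assumption-laden illustration: the function names and the 3-day `window` are hypothetical, and the only dates used are the December 2013 ones given above.

```python
from bisect import bisect_right
from datetime import date


def days_since_last_activity(vendor_days, sighting):
    """Days between `sighting` and the latest vendor upload on or before it."""
    days = sorted(vendor_days)
    i = bisect_right(days, sighting)
    if i == 0:
        return None                          # no earlier vendor activity at all
    return (sighting - days[i - 1]).days


def follows_burst(vendor_days, sighting, window=3):
    """True if the sighting falls within `window` days after vendor activity.

    `window` is an illustrative choice, not a measured value.
    """
    lag = days_since_last_activity(vendor_days, sighting)
    return lag is not None and lag <= window


avtovaz_days = [date(2013, 12, 6), date(2013, 12, 10)]
vietnam_first_seen = date(2013, 12, 10)
print(days_since_last_activity(avtovaz_days, vietnam_first_seen))
```

For the Vietnam uploads, the lag is 0 days: they arrived on the same day as the AvtoVAZ group's second burst.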

In the volume graph, I have only segmented the data by country. However, there are at least 3 different groups in China and 2 different groups in Russia (beyond the AvtoVAZ group) that received demonstrations and are regularly using the software. One of the Chinese groups is focused on trucks. A different Chinese group is focused on damaged mass-produced products. One Russian group is focused on buildings and boxes, while a different Russian group is forging pictures of damaged tires.

Two pictures from one Chinese group. This group is focused on trucks. You can see the embedded timestamp that varies in color, size, position, and formatting. (Don't trust the metadata because the metadata was attached to the pictures using the AvtoVAZ method.)
Two pictures from a different Chinese group. This group is focused on mass-produced products.
Two pictures from a Russian group (not the AvtoVAZ group). They are focused on buildings and boxes.
A second Russian group. This group uploaded hundreds of photos claiming to show tire damage. In these two pictures, it's the same tire but the digitally added damaged rectangle has been shifted.

This group used the latest/greatest version of the AvtoVAZ group's software. The embedded timestamp is optional and was not enabled for these pictures.

In some cases, like Chile, China, and Russia, the groups began to regularly use FotoForensics as part of their ongoing workflow. (And for clarity: Yes, the Russian AvtoVAZ group appears to sell to other Russians who use the tools to defraud other Russians.)

Identifying Buyers

One of the nice things about network addresses is that they are registered to owners. I can tell when corporations and government organizations use the public FotoForensics site because the network addresses identify the users.

I normally don't look at who is doing the uploads to the public site unless there is a high volume or it triggers one of my upload alerts. For example, "Who is uploading all of these photos of broken furniture? Oh, someone at XYZ Insurance." And "That's a lot of cars. Oh, ABC Insurance." This is also how I can distinguish the Russian VSK insurance uploads from the AvtoVAZ group.

With each of these other groups that use the AvtoVAZ method, none come from domains that are registered to insurance or corporate addresses. This means that the AvtoVAZ group is not selling to registered businesses. Instead, they appear to be selling to other organized groups that are focused on a wide range of likely fraudulent and criminal activity. If photo alteration tools are weapons to be used for fraud and scams, then the AvtoVAZ group are weapon dealers.
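
The lookup itself amounts to matching an address against registered network ranges. The following sketch uses a made-up table: the ranges are reserved documentation addresses, the owner names are the placeholder names from the examples above, and a real lookup would query registration (WHOIS) data instead.

```python
import ipaddress

# Hypothetical registration table. The ranges are reserved for documentation
# (RFC 5737) and the owners are placeholders, not real registrations.
REGISTERED = [
    (ipaddress.ip_network("192.0.2.0/24"), "XYZ Insurance"),
    (ipaddress.ip_network("198.51.100.0/24"), "ABC Insurance"),
]


def registered_owner(address: str):
    """Return the registered owner of an address, or None if unlisted."""
    ip = ipaddress.ip_address(address)
    for network, owner in REGISTERED:
        if ip in network:
            return owner
    return None


print(registered_owner("192.0.2.17"))
print(registered_owner("203.0.113.5"))
```

An upload from an address that maps to no corporate registration, like the second lookup here, is the case described above.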

There's one more thing about the network addresses... Back in part 3, I mentioned that some of the network addresses were coming directly from AvtoVAZ's corporate network. Those sightings started on 2015-02-25 and are very closely tied to the bursts of activity that are associated with demonstrating the software to third-party groups. The demonstration comes from AvtoVAZ's corporate network, and then other groups around the world begin using the software. It appears that these demonstrations are a corporate-sponsored activity.

Total Recall

Companies, organizations, and individuals usually don't start making something without a reason. Since I know when the AvtoVAZ group began their testing (Aug 2013), I started looking for some kind of precipitating event that inspired the tool's development. I mean, why would they want to build this type of forgery system? So far, I can only find one possible catalyst: a vehicle recall.

Back in May 2013, AvtoVAZ announced a recall impacting 30,000 Lada cars. As I understand it, a recall must be reported back to the associated agency. (In the United States, car manufacturers must coordinate recall actions with the National Highway Traffic Safety Administration. I assume that Russia has something similar.) I suspect that it is easier and cheaper to create a tool that can forge VIN plates as "proof of work done" than to perform the actual recall.

Branching Out

Of course, few fraud groups stick with one type of fraud. On 2016-05-19, someone in Shchekin, Russia (south of Moscow) uploaded a fake Russian passport. Now, FotoForensics receives lots of fake Russian IDs, so this by itself isn't noteworthy. However, a week later (2016-05-26), the same type of fake passport was uploaded by someone in Tol’yatti who has ties to the AvtoVAZ group. And a few days after that (2016-05-29), the same type of fake passport was uploaded by someone in Samara (the city adjacent to Tol’yatti). Here are the three passport photos, in order:

At first glance, these might appear to be different passports. The passport numbers and photos are different. However, the desk, lighting, camera angle, etc. are all the same. This is a template and someone entered in the passport information; each passport was digitally added to the template. Moreover, this fraud technique is associated with the AvtoVAZ group. It appears that, in 2016, the AvtoVAZ group began branching out from cars to fake passports.

Heads-Up

A single picture can be used to identify alterations and handling. It can even identify tool marks, techniques, and individual artifacts. However, a set of pictures can identify additional information.
  • In part 1 of this series, I showed some examples of common forgeries. With a single picture, it is possible to identify inconsistencies that denote alterations, techniques, and methods.

  • Part 2 demonstrated how a series of pictures provides more information. A group of pictures permits identifying techniques, scope, and variations.

  • Part 3 traced the alterations to a specific group with strong ties to a specific car manufacturer.

  • Part 4 identified changes to the group's tools and techniques. These alterations happened slowly over time, and were traced back to a specific code base from a specific developer.

  • And now, in part 5, I have correlated the volume of uploads from the AvtoVAZ group with uploads from other groups. This shows a strong interaction between these groups. The AvtoVAZ group, using network addresses from within the AvtoVAZ corporate network, appears to give demonstrations to other groups right before those groups begin using the software. This helps identify a possible motive: they appear to be giving demonstrations before distributing the software. These are likely sales to other organized crime groups.
This deep dive should serve as a warning to insurance companies and other examiners about an increase in a very sophisticated fraud method that is being distributed around the world. If you haven't seen fraudulent photos using this method yet, then know that it's coming.

More generally, this deep dive illustrates how photo forensics is more than simply looking at pictures and determining "real" or "fake". A deep dive also requires tracking trends and patterns, identifying suspects and motivations, and determining the impact and future directions.

This month-long deep dive into the AvtoVAZ group's photos is my first time devoting an entire month to a specific topic. I certainly hope you enjoyed this series as much as I enjoyed writing it.
The quality of any project stems from a combination of tools and techniques. The tools provide the raw capabilities, while the techniques relate to how the tools are used. Habitual behavior, training, and skill level all impact the techniques. A paintbrush is just a tool, but in the hands of a trained artist, it can create a masterpiece. (In contrast, the same paintbrush in my hands will make a mess.) A piano is a tool but good music requires a good technique.

With digital photo fraud, the tools determine how an image can be altered. The technique determines the quality of the final results.

In part 1 of this series, I described some of the common techniques used for common forgeries, and presented examples that used advanced forgery techniques. In part 2, I identified an advanced forgery method that uses an uncommon tool and very distinct technique. And in part 3, I associated the technique with a specific group of people who have direct ties to AvtoVAZ, a Russian car manufacturer. In today's part 4, I'll share how the tools and the techniques used by the AvtoVAZ group have changed since I first encountered them in 2013; they have been slowly evolving and improving.

Tracking Techniques

For the first four months, this group was very active, uploading hundreds of pictures. During this time, they adjusted their techniques. For example:
  • Many of their initial pictures had fake metadata attached to the files. (The metadata was copied from some other files.) They didn't start using backdating timestamps (without replacing all of the metadata) until a month into their development cycle.

  • Initially, they replaced entire images and attached them to metadata, but they rarely did selective editing. However, every now and then a picture would appear to have altered content, such as a change to the VIN.

  • Their tool permits inserting the timestamp into the pictures, and the user appears to have the option to specify the timestamp's font, format, color, and position. It took them nearly a year to closely replicate (close, but not exact) the position, font, color, and size used by Canon and other camera manufacturers. (I think they have a few Canon cameras and Android smartphones for testing, but just guessed at other camera brands.)

  • When this group first started, most of their pictures were small -- often around 640x480 pixels. And they were all low quality pictures. It took them about a month to start using larger pictures more often. Then they didn't consistently use higher quality images until about 4 months later (January 2014).
One reason for using low quality images might have been to deter detection from error level analysis (ELA). When they moved to larger and higher quality pictures, they became better at hiding their alterations. For example:

If you only depend on ELA, then you might not notice the edits to these pictures.
  • The first image shows some really bad cut-and-paste numbers. The last "4" has very clear pasted-in rectangles. And if you look closer, you might notice that the last 5 digits ("99764") were all pasted in and the nines are identical. But just using ELA, you would only notice a low quality picture. (This picture was one of the first that they uploaded. This is an image before applying the copied metadata and embedded timestamp.)

  • The second picture has replicated zeros (XTA11173080000014). The compression rate was normalized to hide the alterations from ELA.

  • The third picture also has replicated digits, but you'll never be able to tell that with ELA.
These pictures demonstrate an improving technique. They have the ability to alter the VIN numbers while avoiding the related compression artifacts. After altering the image, they attach metadata, add in a timestamp, and replace thumbnail images. In a few pictures, the edits are visually obvious. In some, the numbers appear suspect -- it's not conclusive, but it's enough to start digging deeper. And in most, the numbers appear unaltered. (Are they not altering every picture, or are they good enough to not be noticed?)
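
For readers unfamiliar with ELA: the general idea is to resave the picture as a JPEG at a known quality and look at how much each pixel changes. The sketch below shows only the difference step, as a rough illustration rather than the FotoForensics implementation. It assumes grayscale pixel values held in nested lists; the resave itself needs an imaging library and is omitted. The function name `error_levels` and the `scale` factor are assumptions.

```python
def error_levels(original, resaved, scale=10):
    """Per-pixel error level between an image and its resaved copy.

    Both inputs are rows of 0-255 grayscale values with identical dimensions.
    `scale` only brightens small differences for viewing (illustrative value).
    """
    return [
        [min(255, scale * abs(a - b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(original, resaved)
    ]


# Toy 2x2 example: three pixels barely change, one changes by 20 levels.
before = [[100, 100], [100, 180]]
after = [[101, 99], [100, 160]]
print(error_levels(before, after))
```

In the toy example, the one pixel that changes a lot stands out against its neighbors. When the whole picture is already at a uniformly low quality, or the compression rate has been normalized, there is no such contrast to see.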

As an aside: The VIN format can include a checksum digit. This checksum is required for North American automobiles and used by some manufacturers outside of North America. However, AvtoVAZ does not appear to use the checksum, so that cannot be used to identify these VIN alterations.
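
For reference, here is a minimal sketch of the North American check digit calculation: each character is mapped to a number, multiplied by a position weight, and the sum modulo 11 gives the 9th character (with 10 written as "X"). The function names are hypothetical. The letters I, O, and Q are not valid in a VIN, so they raise a `KeyError` here.

```python
# Letter-to-number transliteration (I, O, and Q are not used in VINs).
TRANSLIT = {
    **{str(d): d for d in range(10)},
    **dict(zip("ABCDEFGH", range(1, 9))),
    **dict(zip("JKLMN", range(1, 6))),
    "P": 7, "R": 9,
    **dict(zip("STUVWXYZ", range(2, 10))),
}
# Position weights; the check digit itself (position 9) has weight 0.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit(vin: str) -> str:
    """Compute the expected check digit for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has exactly 17 characters")
    remainder = sum(TRANSLIT[c] * w for c, w in zip(vin, WEIGHTS)) % 11
    return "X" if remainder == 10 else str(remainder)


def vin_checksum_ok(vin: str) -> bool:
    """True if the 9th character matches the computed check digit."""
    return vin[8].upper() == vin_check_digit(vin)
```

For a manufacturer that uses the checksum, altering a single digit usually makes `vin_checksum_ok` return False. That shortcut is not available for VINs that never carried a valid check digit.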

Each change to the technique seemed to take a few weeks to spread across group members. Before adopting changes to their techniques, this group would upload tests. For example:

Visually, both of these pictures appear to show the same content. However, one was resaved at a lower quality. This appears to be part of a test to reduce the ELA difference between the timestamp and the rest of the image.

Evolving Tools

People use tools and tools act consistently. Different users may apply different options, but the alteration methods provided by the tool should be consistent. If the tool's capabilities change, then the tool changed.

In this case, the tool appears to have undergone updates. Tools under development usually demonstrate incremental changes -- and that is what I saw here. For example, the initial version (August 2013) could replace the main image and thumbnail images. However, it only inserted the timestamp into the main image.

The second major revision (September 2013 to mid-2014) added the visible timestamp to thumbnail images.

The third major change (mid-2014 to present) makes adding the visible timestamp to the photo optional. It also uses a different method for merging an image with existing metadata. This is also when the group changed their techniques to include cropping and content alteration more often. For example:

Although these two pictures have different dates, they are both the same photo. (One is a cropped version of the other.) Both pictures were uploaded to FotoForensics on 2014-04-18.

This picture came directly from AvtoVAZ. It was uploaded to FotoForensics on 2015-05-27. The metadata claims to be from a Canon camera and indicates scaling (the image size does not match the dimensions in the metadata), but that's common for emailed photos. However, the JPEG compression level (ELA) shows an obvious alteration: someone changed the VIN number.

This picture came from a network address in Tol’yatti, Russia (same city as AvtoVAZ's headquarters). Although the VIN number has been altered, they modified their technique to deter ELA detection. (Since ELA and basic metadata were the only public analyzers, they could only test against those.)

This final picture lacks the embedded timestamp, contains camera-original metadata, and looks perfectly fine under ELA. And yet, it used the same tool. This picture has been scaled and cropped to fit the dimensions, and the paint on the door was digitally altered. (You'll need something more than ELA to spot the edits. I suspect that a dent or scratch was removed.)

Tracing Tools Tools don't appear out of nowhere. Either the AvtoVAZ group created this alteration program or they acquired it from somewhere else. On the chance that they initially acquired the tool from elsewhere, I went looking for any kind of photo alteration software that could generate pictures with these specific attributes:
  1. The tool needed to add a timestamp to the visible portion of the image.

  2. The user should be able to optionally select the timestamp's font, size, color, position, and language.

  3. In these pictures, the timestamp was added but the rest of the image was unaltered. Often, there was no indication of a resave to the non-timestamp portion of the photo. So the tool needed to add a timestamp to the picture without re-compressing the rest of the image. (This is a very unusual and critical feature.)
As it turns out, I found one (and only one) program that does this: Gena PhotoStamper by Kozasoft. (When I first found the software, the domain was online. It's offline today, but you can still see the web site at the Internet Archive.)

According to their web page (and their bold emphasis), the features included:
Key Features:
  • Lossless stamp. PhotoStamper doesn't modify any single pixel outside datestamp area
  • 100% Reversible. You can remove the stamp from the image to roll back to original image pixels
  • Shell Extension. You can stamp your pictures right from the Explorer window
  • Multi-language support. Various date/time formats for different languages
  • Stamp appearance is fully customizable (position, size, font, color, etc.)
  • Background area to make stamp reading easier
  • EXIF support. PhotoStamper reads image date and time from EXIF data
The 'features' web page even includes an ELA-style image showing the impact from their alterations.

The tool only alters the visible timestamp.

PhotoStamper I did a little diving into how PhotoStamper works. It uses a very creative technique that includes copying the lossy encoding method and replacing parts of the lossless encoding.

The JPEG file format uses two types of compression. The first is a grid-based lossy compression format. Every time the JPEG is encoded, the quality of each grid degrades. (Well, eventually it will stop degrading, but that takes a lot of re-saves.) The lossy-encoded grids are then packed using a lossless compression technique. (While the JPEG Standard stores the coding tables in "Define Huffman Table" or DHT segments, it is really more like a semi-optimized run-length encoding and not an optimized Huffman encoding.) Each of these lossless encoding blocks is called a Minimum Coded Unit (MCU). There are 3 MCUs per lossy JPEG grid (one for luminance and two for chrominance).

PhotoStamper does two things. First, it encodes the timestamp using the same lossy settings as the rest of the picture. Second, it identifies which lossless MCU components should contain the timestamp and merges the new MCU elements into the previously encoded stream. This is a very clever trick. Using this approach, they only alter the JPEG grids that include the timestamp.

Undo! In the description, Gena PhotoStamper says it has a lossless undo function. This is an interesting addition. I looked over the AvtoVAZ group's pictures and immediately found examples of this:

Buried inside each of these images from the AvtoVAZ group is an extra JPEG APP segment called "GenaPhotoStamperd". The segment is completely ignored when rendering the image. Inside this segment is a rectangular JPEG thumbnail image. To undo the stamped image (recreating the original image), the tool just needs to merge the lossless MCU elements from this thumbnail image back into the main image. For example, here's the "GenaPhotoStamperd" image from the first image:

This small, rectangular image is all that is needed to recreate the original picture without the timestamp.
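Looking for this kind of artifact does not require a full JPEG decoder. The sketch below walks the marker segments at the front of a JPEG and reports any APP segment whose payload contains the "GenaPhotoStamperd" string. It is an illustration only: the function names are hypothetical, and since the exact APP number and payload layout are not described here, the code simply searches the payload of every APPn segment.

```python
import struct


def iter_jpeg_segments(data: bytes):
    """Yield (marker, payload) for each segment before the compressed scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker; skip it
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # SOS (scan data follows) or EOI
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length


def find_app_segments(data: bytes, needle: bytes = b"GenaPhotoStamperd"):
    """Return (APP number, payload size) for APPn segments containing needle."""
    return [
        (marker - 0xE0, len(payload))
        for marker, payload in iter_jpeg_segments(data)
        if 0xE0 <= marker <= 0xEF and needle in payload
    ]
```

Usage would be something like `find_app_segments(open("photo.jpg", "rb").read())`; an empty list means no such segment was found before the scan data.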

Most of the pictures uploaded by the AvtoVAZ group do not have this extra GenaPhotoStamperd data. Instead, they have all of the remaining artifacts along with incremental improvements. This means they have access to the source code in order to make modifications to the tool. The development changes can be summarized as:
  1. Gena PhotoStamper was created.

  2. The AvtoVAZ group altered the program to remove the GenaPhotoStamperd data, replaced the thumbnail image, and implemented an option to copy metadata from another photo.

  3. Next, they altered it to include the timestamp in the thumbnail image.

  4. Finally, they altered it to make the timestamp optional.
These are not the only changes that have impacted the output format. However, there have not been any major code changes since 2014. The later changes (2015 through late 2017) resulted in pretty minor tweaks; these could reflect a change to their technique and not a change to the tool. (I may not always be able to distinguish a 2016 picture from their 2014 solution, but I can distinguish 2013 from 2014.)

This timeline shows active development and incremental steps. It also identifies a common source code tree based on PhotoStamper.

PhotoStamper Software by Kozasoft Gena PhotoStamper is a very rare tool. Since it generates a very distinct GenaPhotoStamperd undo image, I searched the 3.1 million pictures at FotoForensics for instances of this artifact. I found a total of 39. Two came from a guy in Egypt (2012-07-13, before the AvtoVAZ group started using FotoForensics; a kid and a selfie), one came from France, one from Pakistan, one from the United States ... and the other 34 came from the AvtoVAZ group.

Before it went offline, Gena PhotoStamper was provided by Kozasoft. (It appears to have been a one-person company.) According to the WHOIS record, this domain was registered to Gena Kozlov from Saint Petersburg, Russia. The domain name expired in 2018, but I grabbed a copy of the WHOIS record back in 2013:
Domain Name: kozasoft.com

Registry Domain ID: 96920192_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.1and1.com
Updated Date: 2007-08-17T19:56:40Z
Creation Date: 2003-04-17T12:16:11Z

Registry Registrant ID:
Registrant Name: Gena Kozlov
Registrant Organization:
Registrant Street: Glavpochtamp Do Vostrebovaniya
Registrant City: Saint-Petersburg
Registrant State/Province:
Registrant Postal Code: 190000
Registrant Country: RU
Registrant Phone: +7.8129328587
Registrant Email: gena_kozlov@yahoo.com
(The street name translates as "the main post office, on demand")

Gena Kozlov started Kozasoft in 2003 with a program called "Gallery". It provided a way to view all of your pictures in a single page. (Back in 2003, the photo viewers for Apple and Microsoft were seriously lacking. This was probably a good idea at the time.)

In 2007, he introduced PhotoStamper so that users could see the timestamps in the pictures within the photo gallery. Given the purpose of the gallery, PhotoStamper was probably a good addition. The last version of PhotoStamper (v.2.7.1) was released in July 2011. Then things took an odd turn.

I could find public postings and social media appearances of Gena Kozlov up through mid-2012. Then he just vanished. (Today, I can find other people with the same name, but this specific guy seems to have vanished.) On 13-August-2013, the AvtoVAZ group began bulk-uploading to FotoForensics, using both PhotoStamper and a modified version of PhotoStamper.

I want to emphasize: I cannot find anything that says Gena Kozlov moved to Tol’yatti or started to work for AvtoVAZ in 2012-2013 or is involved in anything related to how his tool is used by the AvtoVAZ group. All I can tell from the photo artifacts is that someone with direct ties to AvtoVAZ has Kozlov's software and is making code changes.

Fast Forward to Today At this point, I am confident that a group associated with AvtoVAZ is making intentional forgeries with a modified version of Gena PhotoStamper. However, even if they are developing fraudulent photos, this is not necessarily a problem. For example, many companies make intentional forgeries for internal testing. Some companies even use FotoForensics to help in their analysis and training. In my next blog entry, I'll cover how the AvtoVAZ group uses these photos, including what they are doing today.
Some methods to deceptively alter pictures are better than others. In part 1, I showed some of the typical approaches for altering photos and metadata. (These methods are usually good enough if nobody looks too closely.) In part 2, I detailed a set of pictures that use a high quality forgery approach. These pictures are all associated with vehicles, and specifically the Russian automobile manufacturer called AvtoVAZ (or as they say in Russian, АвтоВАЗ). The next question becomes: who is behind this effort?

There are a couple of clues about the people behind these photos, including GPS data, photo content, and network addresses.

Clue #1: Content As I mentioned in part 2, all of the photos show AvtoVAZ vehicles and AvtoVAZ parts. Some of the pictures were even taken inside their buildings (factories, repair shops, and dealerships with AvtoVAZ signs on the walls) or show employees in work uniforms.

(In this blog entry, I'm not including the photos of documents with signatures or pictures with recognizable people, including a manager posing in his office with a bookshelf full of AvtoVAZ manuals and binders.)

These pictures show signs and labels with words like "Lada". Car manufacturers typically have a wide range of brands. Just as the Ford Motor Company creates cars with names like Bronco and Fusion, AvtoVAZ's cars include the Lada and Vesta.

These pictures were not publicly available on the Internet. (Google Image Search, TinEye, Yandex, and other search engines couldn't find them.) The easiest way to get these kinds of photos from AvtoVAZ is to have a working relationship with AvtoVAZ, or to directly work for AvtoVAZ.

Clue #2: Types of Alterations Besides adding in the timestamp to the photos, this group sometimes copies or alters metadata. There are a couple of picture sets that each include the exact same, extremely precise GPS data. Here is one set that shares the same GPS information:

Each of these pictures has the exact same GPS coordinates (53.532881 N, 49.430775 E), same metadata layout, and the same file structure. The only observed differences:
  • The EXIF date and time differ. These are plain text fields and relatively easy to alter. If we assume that the dates are unaltered, then we have pictures that span months from a camera that has been moving around the location. However, if it was moving, then the pictures should not have the exact same GPS coordinates. (And if we assume they are different cameras, then the GPS really should be different.) This means that the metadata was copied and the dates were altered.

  • The camera ISO values, which denote the photo sensitivity, differ. While this group can change this value, the changes do not visually match the photos.

  • In some of the pictures, the orientation (a single byte value) was changed from horizontal to vertical. They also swapped the width and height dimensions to match.
There are no other differences among these pictures. The forgers have a tool that appears to take a photo as a template, permits altering parts of the EXIF data (date/time, ISO, and orientation), and inserts the timestamp into the picture. In other pictures, I have observed timestamps in MakerNotes that also match the EXIF data, so they are likely changing those, too. While I have seen them modify these few metadata fields, I have not seen anything to suggest that they are altering any of the other types of metadata. They either use the existing metadata in the template file or they copy the entire metadata from another file; they do not appear to be doing any deep edits to the other metadata fields.
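The "same exact GPS, different dates" pattern is easy to check for once the metadata has been extracted. Here is a rough sketch, assuming each picture has already been reduced to a small record; the field names (`name`, `gps`, `datetime`) and the function name are made up for illustration and are not how FotoForensics stores anything.

```python
from collections import defaultdict


def find_copied_gps(records):
    """Group pictures by exact GPS coordinates and flag groups whose
    EXIF date/time values differ (a hint that metadata was copied)."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("gps") is not None:
            groups[rec["gps"]].append(rec)

    suspicious = {}
    for gps, recs in groups.items():
        dates = {r["datetime"] for r in recs}
        # More than one picture, identical coordinates, different dates.
        if len(recs) > 1 and len(dates) > 1:
            suspicious[gps] = sorted(r["name"] for r in recs)
    return suspicious
```

Exact equality is the point here: two real photos taken months apart will almost never agree to six decimal places, so no distance tolerance is used.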

Besides identifying the type of alterations that this group makes, this is also an important clue about who they are. External attackers (e.g., organized crime gangs) rarely attack just one vendor for years. Instead, they branch out and attack lots of vendors. With insurance fraud, the gangs typically go after lots of insurance companies. For drug theft, they will visit lots of hospitals and pharmacies. Gamblers with a trick will hit lots of casinos. In contrast, if all you have is data from your company or your group, then you will only use the data you have. These are your internal attackers.

In this case, we have a group that has spent literally years developing a fraud method that only uses photos from AvtoVAZ and repeatedly performs a limited number of variations. The people developing this forgery method are likely internal to AvtoVAZ.

Clue #3: GPS As I also previously mentioned, the GPS format is difficult to forge. (It's not just plain text.) Moreover, photos with GPS data are uncommon. That's even the case with these AvtoVAZ pictures -- very few contain GPS data. (But "few" still means that some exist.) In this case, the available GPS data does not appear to be intentionally altered. Here are a few examples: (Click on any picture to see it at FotoForensics.)

The GPS coordinates in this picture's metadata are 58.58626172 N, 49.61483764 E. According to Google Maps, this is a location just outside the back of a large car dealership in Kirov, Russia. Google's labels on some of the adjacent buildings include "Lada" and "Vesta".

GPS does not work well inside buildings; it really needs a direct view of the sky. A GPS location taken from inside a building usually records either a location right outside the nearest window, or the last sample location before the user entered the building (an exterior door). That matches the coordinates seen here.
The GPS coordinates (53.53304000 N, 49.43476000 E) identify an AvtoVAZ dealership and repair shop in Tol’yatti, Russia.

This GPS data also includes a dilution of precision (DoP) record. When you turn on your cellphone's map, you often see a large circle. The GPS coordinates identify the center of the circle, but your actual location could be anywhere in the circle. The circle is the DoP -- the accuracy range. In this case, the DoP is "3" (pretty inaccurate), meaning that the smartphone device couldn't get a strong enough signal or long enough view of the sky. In this case, the coordinates denote the street outside the building, but the accuracy range includes the building.
The GPS data (53.53312944 N, 49.43427917 E) identifies another location slightly further down the street from the same AvtoVAZ dealership in Tol’yatti, Russia. Again, the photo includes a DoP with a value of "3" and the dealership is within the accuracy range.
The GPS coordinates in this picture (53.53209389 N, 49.43443472 E) identify a third location around the same AvtoVAZ dealership in Tol’yatti, Russia. This time, it's outside the door of a building located at the back of the lot.

Even if the metadata was copied from some other source photo, they do not appear to be altering the GPS data. This means that they have access to source photos that are GPS tagged with locations inside AvtoVAZ-related facilities.

When it comes to the GPS coordinates, all of the locations are associated with AvtoVAZ or vehicle repair shops that are working on AvtoVAZ automobiles.
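Part of why GPS is harder to edit than a date string: EXIF stores each coordinate as three binary rational numbers (degrees, minutes, seconds) plus a separate N/S or E/W reference, not as readable text. The sketch below shows the conversion to the decimal form quoted in this post. The function name is hypothetical, and the sample rationals in the usage note are back-computed from one of the decimal values above, not read from the actual file.

```python
from fractions import Fraction


def exif_gps_to_decimal(dms, ref):
    """Convert EXIF-style GPS rationals to signed decimal degrees.

    dms is ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den));
    ref is the hemisphere letter: "N", "S", "E", or "W".
    """
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    if ref in ("S", "W"):    # south and west are negative
        value = -value
    return float(value)
```

For example, `exif_gps_to_decimal(((53, 1), (31, 1), (59265984, 1000000)), "N")` gives roughly 53.53312944, matching one of the Tol’yatti latitudes. A forger who wants to move a photo has to rewrite several binary fields consistently, which is more work than retyping a date.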

Clue #4: Network With these pictures, my FotoForensics server recorded the network address from every upload. The vast majority of them geolocate to in or near the city of Tol’yatti in Russia. For example, one network address geolocates to Tol’yatti and uploaded 964 pictures between 2013 and 2014.

Tol’yatti is an important location because AvtoVAZ is headquartered in Tol’yatti. The majority of photos from this group were uploaded from around AvtoVAZ HQ.

But it doesn't stop there.
  • One address is "vpn.vaz.ru". This is a VPN server for the AvtoVAZ domain (vaz.ru). It uploaded 55 pictures.

  • Another is "nat76.vaz.ru" and uploaded 26 pictures.

  • Another is "nat77.vaz.ru" and uploaded 98 pictures.

  • Another is "nat78.vaz.ru" and uploaded 85 pictures.
These did not come from "near" the AvtoVAZ network; these are from inside their corporate network. (My coworker just shouted, "The calls are coming from inside the house!")

This fraud method of altering pictures with different timestamps is either directly sponsored by AvtoVAZ, or it's a very targeted crime group that happens to have employees inside AvtoVAZ. From this point on, I'm going to call them the "AvtoVAZ group".

Clue #5: Iterations Seeing pictures from AvtoVAZ that were uploaded by AvtoVAZ's network does not mean that AvtoVAZ is actually altering the pictures or developing the fraud technique. For example, maybe they received the pictures from some other source and uploaded them for analysis. And maybe they were investigating fraud and not actually involved in fraud.

However, pictures associated with the AvtoVAZ group include iterations. This shows the development and application of the forged photos. For example:

Both of these pictures show the same barcode photo. I don't mean two different photos of the same barcode; I mean, "one picture is a cropped version of the other picture." Both pictures have metadata that matches the same kind of camera: a SONY DSC-W810. And yet, the metadata dates differ by over a month. (Also, both embedded dates have the wrong font, color, position, and format for this camera. The dates are fake.) Both were uploaded to FotoForensics by nat77.vaz.ru (directly from AvtoVAZ) on 2016-11-02; they were uploaded 20 seconds apart. This shows that someone inside AvtoVAZ is actively working on these pictures.

However, there are other pictures of these barcodes. Below is one example:

This picture is one of three that feature a screenshot of a user logged into the internal AvtoVAZ network. (In this example, I have explicitly redacted the name, phone number, and email address of the logged in user. FotoForensics contains the original, unredacted screenshot.) The set of unredacted screenshots shows:
  • He's logged into a web form at "http://portal:7003/wicket/..." and viewing warranty defect pages (the screenshots are for ПССС Codes 39405 and 39412). The claims are from 05.03.2016 and 18.06.2016.

  • The screenshots are all from 20.10.2016 and were uploaded to FotoForensics on 2016-11-02. The screenshots were uploaded 20 minutes after the barcodes (without the screenshots). The screenshot uploads used a different browser and different internal AvtoVAZ network address. There were likely 2 different people within AvtoVAZ working on these images -- one with the raw pictures and one with the screenshots.

  • Two screenshots are for one warranty claim page that shows five attached pictures, and the other is for a claim with four pictures. Both claims include these barcode photos. Someone managed to insert these forgeries into warranty claims -- that's fraud.
I don't know if this is a fraud investigator within AvtoVAZ, or someone providing confirmation that the forgeries were accepted.

As another example, these next four pictures show the same VIN plates (not the same photos; 4 different photos of the same subject matter and under the same lighting conditions). These four pictures have very different dates.

All 4 pictures were uploaded to FotoForensics last month, on 2019-05-24. Yet, they all show different timestamps in the metadata and embedded in the pictures. According to the (fake) metadata, they were photographed on 4 different dates (years apart), come from 3 different cameras and were handled four different ways. This shows active development. (And I have plenty of other examples.)

Clue #6: Filenames The screenshots from within the AvtoVAZ network show that they track claims by "ПССС" codes. (It translates to English as "PSSS", but I don't know what it stands for.) Each claim appears to have a different code.

While some of the network addresses directly track back to AvtoVAZ's network, many of the network addresses simply geolocate to the city of Tol’yatti. However, these uploads from outside of AvtoVAZ's network range still appear to use these "ПССС" codes. For example:

This picture was uploaded to FotoForensics on 2013-08-13. It came from the most active uploader's network address and had the filename "ПССС 35018 АГО 6553(Рис 2)дата большими буквами.png". The name translates as "PSSS 35018 AGO 6553 (Figure 2) date in capital letters.png". The descriptive filename ("date in capital letters") matches the date that was added to the picture. This suggests that the user at this network address is working with AvtoVAZ and is testing different text insertion formats.

The user at this address uploaded this picture to FotoForensics four times between 2013-08-13 and 2013-08-21. The filenames were:
  • 1376387907262_P1090987.JPG
  • 1376800510905_P1090987.JPG
  • 1377082107334_P1090987.JPG
  • ПССС 35018 АГО 6750.JPG
The first 3 filenames include a timestamp and a camera filename format. The timestamp is in milliseconds since the Unix epoch. (E.g., 1376387907262 decodes as 2013-08-13 09:58:27.262 UTC.) The filename format (P1090987.JPG) is consistent with a Kodak, Olympus, or Panasonic camera. However, the metadata claims that the picture is from a Canon (wrong filename format). The final uploaded filename matches an AvtoVAZ claim number.
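Decoding that filename prefix is straightforward. Here is a small sketch that treats the leading 13 digits as milliseconds since the Unix epoch, which reproduces the 2013-08-13 09:58:27.262 value quoted above when read as UTC. The regular expression and function name are my own illustration of the pattern, not anything taken from the uploaders' tooling.

```python
import re
from datetime import datetime, timedelta, timezone

# 13 digits, an underscore, then the camera-style filename.
PREFIXED_NAME = re.compile(r"^(\d{13})_(.+)$")


def split_prefixed_filename(name):
    """Return (UTC datetime, original filename), or None if the name
    does not follow the <13-digit-timestamp>_<filename> pattern."""
    match = PREFIXED_NAME.match(name)
    if match is None:
        return None
    millis = int(match.group(1))
    # Integer arithmetic avoids floating-point rounding of the milliseconds.
    when = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    when += timedelta(milliseconds=millis % 1000)
    return when, match.group(2)
```

Calling `split_prefixed_filename("1376387907262_P1090987.JPG")` returns the 2013-08-13 timestamp along with "P1090987.JPG", while a claim-number filename such as "ПССС 35018 АГО 6750.JPG" returns None.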

The user at this address uploaded 964 pictures, of which 39 had AvtoVAZ claim numbers in the filenames. I have been able to validate some of the claim numbers (because they uploaded screenshots from their internal AvtoVAZ desktops). In order to have an accurate claim number, the user would need to have a direct relationship with AvtoVAZ.

The other filename format (with the initial timestamp before the camera's filename) was explicitly seen coming from addresses inside AvtoVAZ's network. That format is also widely used by other network addresses around Tol’yatti. Since this format is very uncommon, it appears to be a direct correlation to the internal AvtoVAZ network. Even though these network addresses may not trace to AvtoVAZ's internal network, their pictures appear to have originated from within AvtoVAZ.

Clue #7: User-Agent Strings The AvtoVAZ network addresses are outbound proxies; they relay traffic for a large number of people. If everyone uses the same network address, then how do you tell them apart?

One way uses the type of browser. Different user-agent strings likely denote different people. The pictures that came directly from AvtoVAZ indicate 5 different user-agent strings, but only 3 are used often and one is used very often. Moreover, I don't see any evidence of people testing with different browsers. (There is no rapid switching between browsers.) There are likely a minimum of 3 different people using these pictures from inside AvtoVAZ. That's not a big group.

Outside of AvtoVAZ is a different story. There are dozens of different user-agent strings, but only a handful are extremely frequent. I think there are 2-3 main developers, 6-12 total developers (many only last months or a few years), and dozens of testers. As scope goes, this is not a one-person operation.

Answering 'Who' So who is behind this operation? It appears to be an organized group primarily based in and around the AvtoVAZ headquarters in Tol’yatti, Russia. At least part of their group is operating from inside AvtoVAZ's network. While I cannot rule out some of the AvtoVAZ employees being fraud investigators, some people with internal access appear to be involved in the testing and development of this fraud method. As a group, they are actively developing a tool that permits altering images, digitally adding timestamps, editing metadata, and splicing in metadata.
The vast majority of uploads at FotoForensics are mundane and innocuous: selfies, food, memes, celebrities, anything seen at Reddit, etc. If someone somewhere thinks a picture looks fake or incredible, then someone will likely upload it for evaluation.

In my last blog entry (Fraud and Deception: Part 1), I mentioned some types of fraudulent photos. I usually don't go looking for explicit fraud. Instead, these deceptive pictures just rise to the surface because they look so different from the typical content.

Cars! Beginning in August 2013, FotoForensics began receiving pictures of automobile VIN plates. A typical VIN plate contains the vehicle identification number (a series of letters and numbers) as well as the manufacturer's name and other information. Interspersed within the pictures of VIN plates were photos of vehicle parts.

If it were just one picture, then I probably wouldn't have noticed. But it wasn't just one picture. On the first day, they uploaded 27 photos.

(Click on any picture to view it at FotoForensics.)

Over the remainder of the month, they uploaded 242 pictures. By the end of 2013 (four and a half months later), they had uploaded 833 pictures.

This bulk uploading triggered a ton of alerts. At the time, I was also developing the trend detector; the widely-uploaded uncommon theme generated alerts. Between the bulk uploading and trend alerts, they definitely caught my attention, so I took a closer look. (And I've been passively watching them ever since.)

Trends and Patterns The initial cluster of pictures all had the same very distinct attributes:
  • Most pictures did not have selective editing. A few had added text, but most did not. At first glance, they did not appear to be altering the vehicle parts or VIN numbers.

  • The metadata identified a wide range of cameras and software applications. This is odd because usually people focus on one type of device or one application. It is unusual to see a wide range from a single group of uploads. (Seriously: how many cameras do you own? You probably have a smartphone or two, a web cam, and maybe a point-and-shoot. Regular people might have a few cameras that they use regularly, and one that they usually rely on. Professional photographers may have a dozen. But this group? Hundreds.)

  • Some pictures appeared to include camera-original metadata, but others had clearly fake metadata. This is unusual in that pictures from a single group usually have either one or the other, but don't alternate.

  • But most importantly: they each had a timestamp visibly embedded in the photo's corner.
At FotoForensics, there are different degrees of rare or uncommon. For example, JPEG files can contain thumbnail or preview images, but they are rare; only about 18% of pictures at FotoForensics include these secondary pictures. (Most pictures that we receive come from Facebook, Twitter, media outlets, WhatsApp, Telegram, etc. -- services and applications that remove thumbnail images.) GPS metadata is rare -- only about 1% of pictures uploaded to FotoForensics contain any kind of GPS metadata. Child porn is very rare; fewer than 0.1% of uploads are child porn (and those all get reported to NCMEC, because that's what Federal law requires). But pictures with embedded timestamps in the corner? Excluding pictures that use this forgery technique, there might be a few thousand out of the current 3.1 million images (fewer than 0.06%). "Extremely uncommon" is an understatement. Although most digital cameras have the option to enable embedding timestamps in the visible photo, almost nobody enables that option.

Checking the Time Cameras normally don't default to embedding visible timestamps on images. If your camera does support it, then you may only have the option to turn it on or off. Some cameras include additional options, such as the location (which corner of the photo) or formatting (e.g., m/d/y or d/m/y). If you have a really fancy camera, then you might have the option to select the font's color. In general, there are not many options available.

I want to emphasize: if the camera doesn't support changing the font's color, then you only get one color. And if the camera doesn't support changing the size or font or language, then you can't change those either. I went scouring through manuals for different cameras and noticed a very specific attribute: the color, size, format, language, and position of the timestamps in these car photos rarely matched the format supported by the camera. Moreover, the image quality often didn't match the camera.

They were explicitly adding timestamps to photos.

For example:

I'm not a car guy, so I don't know what part of the car is shown in this photo. (Is that the engine?) But I do know pictures. The metadata in this photo of an engine part appears to be camera original. However, the compression level quality (ELA) is definitely not original. Moreover, the font, color, size, and language (Russian) are wrong for a Panasonic DMC-S1.

With this picture of a VIN plate, someone added text to the top. It's in Russian and translates as "Photo of good quality". The metadata identifies a Canon PowerShot SX100 IS camera and post-processing with an Adobe application. I'm not too concerned with the use of Adobe. Rather, I'm more concerned about the quality of the timestamp. ELA indicates that the timestamp is at the same quality as the text at the top; the timestamp was not added by the camera.

According to the metadata, both of these pictures are from the same camera: a Canon PowerShot A4050 IS running firmware version 1.01 revision 2. And yet, the embedded timestamps are different sizes, colors, and positions. (That's not possible with this camera.)

If you look closer, then you might notice other problems. For example:
  • Both have very low quality ELA results; these are not camera-original images.

  • The first picture says the date is "12/08/2013" in the photo, but the metadata says it was captured on "2013:08:09". The embedded timestamp in the photo is off by 3 days when compared to the metadata.

  • Both pictures have metadata that says "Date Stamp Mode: Off". The embedded timestamp should not be in the photo.
That's right: all of the metadata is fake. They took real camera metadata and altered it just enough to appear different at a superficial level. Then they digitally added in a timestamp to the photo. Finally, they attached the metadata to the altered image.
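The date mismatch in the second bullet is simple arithmetic once both values are parsed. Here is a minimal sketch, assuming the visible stamp uses day/month/year ordering (the reading under which "12/08/2013" is 3 days after the metadata date) and that the metadata value starts with the usual EXIF "YYYY:MM:DD" form. The function name is hypothetical.

```python
from datetime import datetime


def stamp_vs_exif_days(visible_stamp, exif_datetime, stamp_format="%d/%m/%Y"):
    """Days between the date drawn on the photo and the EXIF capture date.

    Positive means the visible stamp is later than the metadata date.
    """
    stamped = datetime.strptime(visible_stamp, stamp_format).date()
    # EXIF dates look like "2013:08:09 14:03:22"; only the date part is used.
    captured = datetime.strptime(exif_datetime[:10], "%Y:%m:%d").date()
    return (stamped - captured).days
```

`stamp_vs_exif_days("12/08/2013", "2013:08:09")` returns 3. A camera that stamps its own photos should give 0 here, so any nonzero result is worth a closer look.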

Here's another example:

In these two pictures, the embedded timestamps differ in size, color, location, language, and even date. But they are both embedded over the exact same picture. The first one is a JPEG with full metadata, while the second one is a PNG that was converted from a JPEG. This confirms that the timestamp was not part of the camera capture process; the embedded timestamp was added later. (If it had been added by the camera, then the background behind one of the timestamps would be different due to some kind of removal of the older timestamp. There are no distortions, so it was never there.)

Variety is the spice of life (Raznoobraziye - pryanost' zhizni) In most of these examples, they reused low quality old photos and tested with small dimensions. But that isn't always the case. Often they used higher quality images and larger dimensions. Here are a few examples:

In many of the larger and high quality pictures, they appear to have started with a camera-original photo (or template image) and then added in the timestamp. The timestamp's value usually matches the EXIF metadata's timestamp information. Since the EXIF's timestamp is stored in plain text, they appear to be altering the plain text EXIF data and then embedding a matching timestamp into the image.

Most of the examples that I have included so far are from Canon cameras, but that isn't all that they used. Here are photos that claim to be from other types of devices:

These match the metadata of a Sony DSC-H70, Samsung ES70, and Nikon P520. This group uploaded lots of pictures that appear to come from lots of different cameras.

At this point, I am certain that they are (1) reusing old photos, (2) digitally adding in the timestamps, and (3) optionally replacing metadata.

Above and Beyond

With over 3 million pictures at FotoForensics, I've seen plenty of pictures from lots of different groups that contain altered images. Replacing metadata is rare, but not totally unexpected. In particular, there's a common ballistic approach that evaluates the existing metadata. It basically checks if camera x has metadata fields y with values z in a specific order. Copying metadata from a known camera defeats this ballistic approach. (If this was all they were doing, then I wouldn't be impressed.)
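As a rough illustration of the field-ordering part of that check, here is a minimal sketch. It assumes the field names have already been extracted in file order, one per line; the helper name matches_field_order and the signature file are hypothetical, and a real system would compare values as well.

```shell
# Hypothetical helper: succeed only if the field names on stdin appear in
# exactly the same order as the known-camera signature file given as $1.
matches_field_order() {
    cmp -s "$1" -
}
```

For example, printf 'Make\nModel\nOrientation\n' | matches_field_order camera.sig returns success only when camera.sig lists those three names in that order. Copied metadata passes this test, which is why it is so easy to defeat.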

However, there are other ballistic methods. For example, JPEG compression uses a combination of quantization tables (Q tables) and Huffman tables (stored in "Define Huffman Table", or DHT, segments). Even if the metadata is wrong or missing, the combination of Q tables and Huffman tables may still be unique enough to identify the last device or application that encoded the JPEG. And this is where this car forgery approach differs from other approaches: they replicate the encoding ballistic signature. (This is some seriously advanced software.)
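To make the encoding signature idea concrete, here is a minimal sketch that dumps the quantization (DQT, marker 0xFFDB) and Huffman (DHT, marker 0xFFC4) segments that appear before the first scan of a JPEG. The function name jpeg_tables is hypothetical, and this is only an illustration of where the tables live; it is not the ballistic system described in this blog entry.

```shell
# Hypothetical helper: print the payload bytes (in decimal) of every DQT and
# DHT segment found before the first start-of-scan marker in the file $1.
jpeg_tables() {
    od -An -v -tu1 "$1" | awk '
        { for (i = 1; i <= NF; i++) b[n++] = $i }    # one array slot per byte
        END {
            if (n < 4 || b[0] != 255 || b[1] != 216) exit 1   # must start with SOI (FF D8)
            pos = 2
            while (pos + 3 < n && b[pos] == 255) {
                marker = b[pos + 1]
                if (marker == 218) break                      # SOS: image data follows
                len = b[pos + 2] * 256 + b[pos + 3]           # length includes these 2 bytes
                if (marker == 219 || marker == 196) {
                    line = (marker == 219) ? "DQT" : "DHT"
                    for (j = pos + 4; j < pos + 2 + len && j < n; j++) line = line " " b[j]
                    print line
                }
                pos += 2 + len
            }
        }'
}
```

Piping the output through cksum gives a short value that can be compared across files. Two pictures with different tables were certainly not encoded the same way, while matching tables only say that the encoders are consistent with each other.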

For example, that initial picture from a Panasonic DMC-S1 is too low quality to be a camera original. And yet, the ballistic signature perfectly matches an original photo from a Panasonic DMC-S1. With most pictures, they even replace embedded thumbnail images, including those embedded in proprietary MakerNotes, and they encoded those preview pictures with the correct ballistic signatures. (If it wasn't for a few small mistakes in how they copied data, I wouldn't be able to detect them at all. And no, I'm not going to detail those mistakes because I don't want them to improve their forgery method.)

This group has demonstrated the ability to edit timestamps and replace metadata with data copied from other photos. This means that I have no reason to believe that the photo matches the metadata or that the times are correct. Moreover, they take steps to retain the original ballistic signatures, making these alterations very difficult to detect. As forgery methods go, this is one of the best. (I've only seen a few other methods that are better.)

ABTOBA3

There is one other commonality that you have probably noticed from these pictures: ABTOBA3. It's the vehicle manufacturer's name, written at the top of every VIN plate. In English, it's spelled AvtoVAZ and they are the biggest Russian car company. There are other Russian car manufacturers (GAZ, UAZ, NAMI, Yarovit Motors, etc.), but all of these photos are hyperfocused on AvtoVAZ. In the thousands of pictures that FotoForensics has received from this group, nearly all have been from AvtoVAZ.

This brings up some questions. Such as: Who is creating these forgeries? Is this activity sponsored by AvtoVAZ or is someone developing photos to scam AvtoVAZ? Maybe the developer has easy access to this type of car for some other reason? And how many people are involved here? Is this one person, or a large group? In my next blog entry, I'll dive deeper into the 'who'.
There are as many different reasons to alter photographs as there are approaches. Every alteration technique leaves detectable attributes that can be used to construct a fingerprint for the technique. Sometimes these attributes are automatically generated by specific applications. These fingerprints and toolmarks can be used to determine a ballistic signature, which maps the technique to a specific approach.

I have previously written about some of the more obvious types of fraud and deception. Here are a few examples:
  • "Body by Victoria": compression rate analysis and different types of signal analysis identified alterations in a Victoria's Secret photo.

  • "Re: Smithsonian": image analysis identified alterations in a photography contest, resulting in some disqualifications due to excessive digital modifications.

  • "Blurring The Truth": blur and colorspace analysis identified an altered news photo.

  • "We Have A Winner!": evaluation of fake lottery tickets that were used for gaining Internet notoriety.

  • "Unbelievable": identified digital alterations made to win the World Press Photo contest. This included a deep dive into Adobe's XMP metadata (a very distinct toolmark).

  • "Pretty in Pink": traced a picture used for a catfishing scam.
In each of these cases, there were distinct attributes in the metadata, color profiles, compression level, and other detectable artifacts that could easily identify alterations and related deceptive practices.

(Reminder: All of these pictures in this blog entry came from the public FotoForensics service. The FAQ explicitly says that the public site offers no privacy, images may be viewed by other people, and pictures may be used for research, and this blog entry is discussing research findings. If you want privacy, then use the commercial FotoForensics Lab service.)

Everyone Starts Somewhere

There are different reasons for altering pictures. In my opinion, most of them are not used for anything related to "fraud". Edits for vanity, humor, and memes are very common. Similarly, enhancing colors or focusing the subject matter (e.g., cropping out excess content) are expected in many situations. In contrast, pictures intended for fraud or deception typically stand out. There are no selfies, no obvious humor, nothing that looks like a meme, and no attempt to remove extraneous content.

Most of these deceptive edits stand out for one reason or another. Usually the artist (person doing the edit) doesn't take very good steps to cover any evidence of the edits. These often stand out as simplistic or amateurish. For example:

(Click on the pictures to view them at FotoForensics.)

The style of the first picture (box) is often used to provide proof of shipping or proof of delivery. However, even if you don't visually see the problems (e.g., text was added), error level analysis (ELA) quickly highlights many of the issues with this picture. In addition, metadata shows that all camera-original information has been stripped out of the file. (If you're an analyst and were expecting a camera-original image, then you should be asking "where did the metadata go?")

The second picture shows a badly made fake diploma. ELA shows the most recent alterations and the metadata identifies the tools that were used. (Photoshop!) This picture is just one of hundreds that FotoForensics has received. They all come from a fake diploma service based in Ukraine and they primarily cater to people throughout Africa. (Want to see another one?)

Practice Makes Perfect

It takes the right combination of the right tools, right knowledge, and right kind of thinking to identify any alterations. Simple alterations usually stand out immediately. However, some alterations show sufficiently advanced techniques that they might pass a preliminary examination. The better the forgery method, the deeper the examination needs to go.

For example, back in 2013, FotoForensics received a picture showing bags of drugs that were ready for sale.

At first glance, the compression level appears consistent and the metadata has all of the fields that are expected from an Apple iPhone 4 on iOS 6.1.3. It even has GPS coordinates that pinpoint a house in the UK. When I showed this during a presentation to law enforcement, the overall consensus was that they were ready to kick in the door.

Here is the GPS metadata as seen when extracted with ExifTool:
GPS Latitude Ref                : North
GPS Latitude                    : 54 deg 34' 40.80"
GPS Longitude Ref               : West
GPS Longitude                   : 1 deg 20' 34.80"
GPS Altitude Ref                : Above Sea Level
GPS Altitude                    : 27.20873786 m
GPS Time Stamp                  : 00:00:00
GPS Img Direction Ref           : Magnetic North
GPS Img Direction               : 117.6573939
GPS Date Stamp                  : 2013:05:07

A preliminary evaluation should identify the timestamp and coordinates. A deeper look might confirm the metadata ordering (latitude before longitude before altitude, etc. Yup! That's an iPhone!). However, an even deeper examination identifies some abnormalities:
  • The GPS metadata includes a date stamp. The date stamp identified the correct date (as recorded by other EXIF data fields). However, FotoForensics has received literally hundreds of examples from iPhone 4 and 4S devices running iOS 6.1.3; as a research service, I can use these examples for generating a comparison. As far as I can tell, iOS 6.1.3 never records the GPS date stamp; it only records the GPS time stamp. The appearance of the GPS date stamp in an iOS 6.1.3 GPS record is an unexpected attribute and inconsistent with the iPhone's default signature.

  • The GPS metadata also includes a time stamp: "00:00:00". This is inconsistent with an iPhone 4; the time should denote when the GPS was last sampled prior to capturing the photo. It should be at (or a few seconds before) the EXIF creation time stamp. But if the person is inside, then it could be earlier by minutes or longer. In any case, there is a 1 in 86,400 chance that it would last be sampled at exactly 00:00:00, and it is even lower considering that it would need to be the exact second that the phone entered the house, over 18 hours earlier. This time stamp indicates that someone forgot to set the time when they altered the metadata.

  • The GPS metadata includes a compass direction and the relative angle. The direction is "117.6573939" degrees relative to "Magnetic North". However, nearly all other pictures with GPS coordinates from iPhone 4 and 4S devices running iOS 6.1.3 are relative to "True North" and not "Magnetic North". Of the fewer than a dozen examples seen at FotoForensics that are relative to "Magnetic North", all appear post-processed by an unidentified application.

  • The GPS is very accurate in this file. The internal data structure encodes the latitude and longitude as a triplet of numerators and denominators. In this case, latitude is 54.000000 (54/1) 34.000000 (34/1) 40.800000 (4080/100) and the longitude is 1.000000 (1/1) 20.000000 (20/1) 34.800000 (3480/100). The problem is, any iPhone 4 running iOS 6.1.3 uses a different encoding method: the third component should always be 0 (0/1), and the second number should always have a denominator of 100, not 1. This is the wrong GPS encoding format for this device.
There are many different tools for extracting metadata. However, this type of GPS alteration is so sophisticated that most common tools won't detect it. Even if you rely on powerful metadata extractors like ExifTool or Exiv2, you'll never notice the wrong internal encoding because these tools only display the converted numeric values.
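To see why the converted values hide the encoding, here is a minimal sketch that turns three numerator/denominator pairs into decimal degrees. The helper name dms_to_decimal is hypothetical, and the second set of values is my own re-encoding of the same latitude in the style described above (fraction carried by the minutes, seconds set to 0/1); it is not taken from a real file.

```shell
# Hypothetical helper: convert three EXIF rationals (degrees, minutes, seconds),
# each written as numerator/denominator, into decimal degrees.
dms_to_decimal() {
    LC_ALL=C awk -v d="$1" -v m="$2" -v s="$3" 'BEGIN {
        split(d, D, "/"); split(m, M, "/"); split(s, S, "/")
        printf "%.6f\n", D[1] / D[2] + (M[1] / M[2]) / 60 + (S[1] / S[2]) / 3600
    }'
}

dms_to_decimal 54/1 34/1 4080/100    # encoding found in the drug photo
dms_to_decimal 54/1 3468/100 0/1     # same latitude, fraction carried by the minutes
```

Both calls print 54.578000. Any tool that only shows the converted coordinate cannot tell these two encodings apart, so the raw rationals have to be inspected directly.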

But there is one more issue with the GPS coordinates: they pinpoint a house. Depending on the mapping tool, the coordinates are either directly on the center of the house, or along the center of the wall that is between two windows. The thing is, GPS does not work well inside houses -- it really needs a direct view of the sky. A GPS location taken from inside a building usually records either a location right outside the nearest window, or the last sample location before the user entered the building (an exterior door). The GPS location is inconsistent with a device inside a home. (And we know the person wasn't outside because the photo has indoor lighting. You can even see the light fixture reflected in the plastic bags.)

Wash, Rinse, Repeat

While I have reasons to doubt the GPS information in the drug photo, I don't know if the encoding method was intentional or a side-effect of some unknown modification approach. In particular, I have a few other examples from an iPhone 4/4S running iOS 6.1.3 that include a GPS Date Stamp and the wrong lat/lon encoding. However, all of those other pictures (1) use True North, (2) include a non-zero GPS Time Stamp (even if it isn't always correct), and (3) have artifacts denoting post-processing by some other application (many used an Adobe product). Here are two examples that were uploaded by different people:

Both of these examples include metadata from iPhone 4/4S devices running iOS 6.1.3. Each also includes the unexpected GPS Date Stamp and wrong encoding method for latitude and longitude. The first picture (river) also includes artifacts denoting Windows Photo Gallery, while the second picture (food) was post-processed by an Adobe product and last saved on an Apple device.

As alteration methods go, this approach is better than your average amateurish attempt. This drug photo forgery would pass a simple initial inspection, but fails on a much deeper dive. (I suspect that the GPS coordinates in the drug photo were intentionally added to mislead any investigators.)
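The text-level GPS checks discussed for the drug photo can be sketched as a small filter over ExifTool's "name : value" output. This is a minimal sketch; the function name gps_anomalies is hypothetical, and the three rules only encode the observations above for iPhone 4/4S devices on iOS 6.1.3. They are not general-purpose tests.

```shell
# Hypothetical helper: read ExifTool-style "Field : value" lines on stdin and
# report the three GPS oddities discussed in the text.
gps_anomalies() {
    awk -F ' *: ' '
        $1 == "GPS Date Stamp"                                  { print "unexpected GPS Date Stamp: " $2 }
        $1 == "GPS Time Stamp" && $2 == "00:00:00"              { print "GPS Time Stamp is exactly 00:00:00" }
        $1 == "GPS Img Direction Ref" && $2 == "Magnetic North" { print "direction is relative to Magnetic North" }
    '
}
```

Feeding it the GPS listing shown earlier reports all three findings. It cannot see the wrong internal lat/lon encoding, because that information is already lost by the time the values are printed.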

Nearing Perfection

It takes time and effort to develop the tools and skills needed to create professional quality forgeries. These are pictures that often pass a superficial and moderately deep examination. Without the right tools, techniques, and knowledge, an examiner may never notice the forgery. For example:

This picture (red car) uses a professional quality forgery method. (Don't trust the metadata!) Among other things, the forger attached real metadata to a low quality image. If you didn't notice the image quality, then you might not notice that the picture wasn't a camera original.

Going Pro

I've decided to do something a little different this month. Rather than blogging about a wide range of topics, I'm going (to try) to do one big deep dive that is divided across several blog entries. And rather than evaluating amateurs, I will focus on one specific professional fraud technique, including how it was developed and how it is currently being used. In my next blog entry, I'm going to start diving into specific photos, JPEG attributes, and details about a very sophisticated fraud method's ballistic signature (with just a hint of networking).
The Hacker Factor Blog by Dr. Neal Krawetz - 2M ago
Whenever I mention my honeypot, there's always someone who asks how to set one up. Configuring a basic honeypot on a Linux system isn't too difficult for entry-level programmers.

(Note: This is an extremely technical blog entry. If you don't understand basic networking (interfaces, IP addresses, ports, etc.) or don't have a working understanding of shell scripting in Linux, then this might just look like technobabble. And to the extreme techies: there's a lot of additional functionality that I'm skipping over because I want to keep this at an introductory networking level.)

There are a couple of reasons why you might want to run a honeypot. For example:
  • Curiosity: There's a lot of stuff happening on the network that you're probably not seeing. A honeypot helps you learn what is going on. You can even identify the "normal" amount of network traffic and identify unexpected traffic patterns.

  • Fun: For networking geeks, this is absolutely entertainment. And for people interested in networking, this is a great way to get started.

  • Socializing: I know of some closed-door groups where people who run honeypots gather and compare notes. It's nice to be able to share findings and observations in order to determine if something is just happening to you, or is happening to everyone. (But mentioning this on Tinder might not get you a date. In theory.)

  • Validate claims: Lots of organizations make statements about how hostile the Internet can be, or how quickly a compromise can happen. But who verifies these findings? With a honeypot, you can do it.
Scanners vs Honeypots

There are lots of different types of honeypots. However, when new people want to start collecting packet data, I find that they often confuse scanners with honeypots.

A network scanner queries every computer on the network or every port on a computer. This is active scanning. Some scanning tools, like zmap and masscan, can scan a single port on all 4 billion IPv4 addresses in a few hours. Often, bots will scan first and then queue up the results for deeper analysis. For example, Shodan continually scans for all computers that have open HTTPS (web) servers. When they find a web server, they will queue it up for a follow-up scan. It's during the follow-up that they index your HTTP headers, TLS certificates, your web server configuration, etc.

(Most services won't notice active scanning. But the ones that do might not react well to your scans. They might try to get your scanner shut down, blacklist you as a hostile network service, or worse.)

In contrast, a honeypot is passive. Rather than scanning the entire Internet for potential targets or victims, a honeypot quietly sits and waits for a scanner to come by. A regular user should never accidentally stumble across a honeypot. The only systems that should find your honeypot will be scanners and attackers.

Types of Honeypots

There are two types of honeypots: interactive and non-interactive. While honeypots passively wait for a packet to arrive, an interactive honeypot can interact with the incoming network traffic, while non-interactive honeypots give no outward appearance of doing anything.

An interactive honeypot might appear as a vulnerable network service. For example, a honeypot might pretend to be an SSH daemon in order to record login attempts. This is a great way to identify the most used passwords that are guessed by brute-force attackers. However, interactive honeypots come with some risks. For example:
  • What if the attacker finds a flaw in the honeypot? They could potentially use the honeypot to compromise the computer.

  • Some honeypots, like fake FTP and fake proxy services, might need to generate some outbound connections in response to hostile requests. (You don't want them to immediately realize that you're a honeypot, right?) However, those outbound connections could end up relaying attacks to someone else.

  • Some honeypots, in an effort to appear real, support file uploading and downloading. This can be a great way to have the attacker upload their exploitation tools. However, it also means that they can upload porn and child porn and use your server as remote storage.

  • There are many pre-built honeypots that emulate mail servers, telnet and ssh services, web servers, proxies, and more. I'm usually hesitant about these systems because they are often unsupported and poorly documented. Many have not been updated in years. (Are they stable code? Or just dropped projects?)
Personally, I try to stay away from most types of interactive honeypots.

The alternative is a non-interactive honeypot. This type of system just sits and listens. It never interacts with the scanner/attacker; ideally, the scanner/attacker should not even realize it is there. While you won't collect passwords or data files, you can collect statistics about who is doing the scanning and the types of network services that are attacked most often.

If you've never built a honeypot before, then you want to start with a non-interactive honeypot.

Configuring a Non-interactive Honeypot Server

A simple honeypot can listen to one port, multiple ports, or every port. However, you don't want it to listen on a port that is running an active service. Otherwise, it will just be collecting everything -- scanners, attackers, and legitimate users.

In my honeypot configuration, my server has two network interface cards. One card has an IP address but no running services. I have even disabled all outbound traffic -- including TCP RST packets. (TCP RST packets are typically sent when a scanner tries to connect to a port that doesn't run any services.) For example:
iptables -A OUTPUT -o eth0 -j DROP
This command blocks all outbound network traffic on interface eth0. With this configuration, anyone scanning my honeypot would receive nothing back. It is indistinguishable from having an unused IP address.
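If the rule is applied from a startup script, it helps to avoid adding it twice and to confirm that it is active. The following is a minimal sketch; the function name ensure_drop_rule is hypothetical, it must be run as root, and it assumes an iptables version that supports the -C (check) option.

```shell
# Hypothetical helper: add the outbound DROP rule for interface $1 only if it
# is not already present, then list the OUTPUT chain with packet counters.
ensure_drop_rule() {
    iface="$1"
    iptables -C OUTPUT -o "$iface" -j DROP 2>/dev/null ||
        iptables -A OUTPUT -o "$iface" -j DROP
    iptables -L OUTPUT -v -n
}
```

Calling ensure_drop_rule eth0 more than once leaves a single rule in place. The packet counter on that rule increases each time the server tries to answer a scanner.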

My second network interface card has a running Secure Shell (SSH) daemon. This is how I connect into the honeypot server for system management and to collect the data files.

If you don't have two network interfaces, then try running SSH on a non-standard port (something in the 40000-65000/tcp range is probably a good idea). For example, you might move your ssh daemon (/etc/ssh/sshd_config) to be on port 41244. This way, typical scanners and common attackers won't find it and you can filter out all traffic using SSH on your non-standard port.
iptables -A OUTPUT -o eth0 -p tcp --sport 41244 -j ACCEPT
iptables -A OUTPUT -o eth0 -j DROP

These commands say to drop all network traffic exiting through eth0 except for SSH replies, which leave from TCP port 41244. (The order matters: the ACCEPT rule must come before the DROP rule.)

Using a Packet Sniffer

The second thing you're going to need is some kind of packet sniffer. There are lots of options out there, and some are better than others.
  • Wireshark: Wireshark is an extremely powerful packet capture and packet decoder. If you have collected packets, then this is the easy way to decode them. However, wireshark has a problem with long-term captures. You see, wireshark tracks TCP sessions. (Each TCP session is a combination of source address, source port, destination address, and destination port.) Each new TCP session requires a little bit of allocated memory. And since wireshark never forgets a TCP session, it will eventually run out of memory and crash. On my servers, wireshark can run for a few days before crashing.

  • tshark: Wireshark is a graphical application; tshark is the text-based version. Like wireshark, tshark will crash after a few days due to too many tracked sessions.

  • snort: Snort is a powerful packet sniffer and intrusion detection system (IDS). In the old days, snort was a good choice for a simple packet sniffer and honeypot. But it has grown over the years and become much more focused on the IDS portion. In my opinion, snort is not a good choice for a simple honeypot because it tries to do too much.

  • Bro: Like snort, Bro (renamed Zeek in 2018) is a full IDS system. It is very powerful, very configurable, very large, and very overkill for a simple honeypot.

  • tcpdump: This is as simple as it gets. No session tracking, minimal packet decoding, and it can run forever. This is just a raw packet sniffer with optional filtering rules. If you don't know where to start, then start here.
There are other packet capture options, but tcpdump is a great starting place. (Tcpdump can get overwhelming after a few filter rules, so I wrote my own packet sniffer for my honeypots. But for a simple honeypot, tcpdump is fine.)

Configuring tcpdump

While tcpdump can capture entire packets, a really basic honeypot just needs to record IP addresses, port numbers, and network protocols (typically TCP, UDP, or ICMP). The basic tcpdump command for doing this is:
sudo tcpdump -tttt -q -l -i eth0 -n -s0
  • "-tttt" tells tcpdump to preface each line of packet output with the current date and time in a human readable form. E.g., "2019-05-17 14:39:54.576416" (that's down to the fraction of a second).

  • "-q" tells it to be quiet (minimize the output).

  • "-l" (lowercase 'L') directs it to use line-buffering for output. This is great for streaming results into a file or another application.

  • "-i eth0" specifies the network interface for sniffing the traffic. If you leave it off, then tcpdump picks a default interface (the lowest-numbered active one, excluding loopback), which may not be the one you want. If you have one network interface devoted to the honeypot (e.g., eth0), then use "-i" to tell tcpdump where to listen. (The interface "eth0" may not be right for you. If you don't know your interface's name, use sudo tcpdump -D to list every available interface.)

  • "-n" prevents hostname and protocol lookups. Querying DNS and doing name resolution lookups can slow things down. Also, if DNS queries attempt to use the same interface where tcpdump listens, then it could end up capturing its own network requests, causing a feedback loop and infinite network traffic. If you want to look up hostnames on a honeypot, then do it after the fact and not during data collection.

  • "-s0" sets the snapshot length: how many bytes of each packet tcpdump captures. A value of 0 resolves to the default size.
If you just run this command, then you are likely to see lots of network traffic. (Quick! Press control-C to kill it!) Some of the traffic may be caused by you. Others will be caused by your local routers or network infrastructure. You're going to want to filter these out. Fortunately, tcpdump has a really easy filtering system. You just need to list everything that needs exclusion.

For example, my honeypot immediately saw network routing traffic (VRRP, STP, and ARP). So I filtered those out:
sudo tcpdump -tttt -q -l -i eth0 -n -s0 not vrrp and not stp and not arp
With those gone, I began to see local IPv6 traffic (from addresses beginning with "fe80:") and UDLD traffic. (UDLD is used by Cisco routers. These requests always have the ethernet address 01:00:0C:CC:CC:CC.) So I filtered those out:
sudo tcpdump -tttt -q -l -i eth0 -n -s0 not vrrp and not stp and not arp and not net fe80::/16 and not ether host 01:00:0C:CC:CC:CC

If you only have one network interface card, then you can also add in a rule to filter out your traffic. For example, add in and not port 41244. And if you see network traffic to and from other computers on the same network (not just your own traffic), then you can add in a filter to only catch the honeypot's traffic. For example, add in and host followed by the honeypot's network address. (On Linux, use hostname -I to list all of your external network addresses.)
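As the exclusion list grows, typing the filter by hand gets error-prone. Here is a minimal sketch that builds the "not ... and not ..." expression from a list of terms; the function name build_filter is hypothetical.

```shell
# Hypothetical helper: join every argument into a tcpdump filter of the form
# "not A and not B and not C".
build_filter() {
    filter=""
    for term in "$@"; do
        filter="${filter:+$filter and }not $term"
    done
    printf '%s\n' "$filter"
}
```

For example, sudo tcpdump -tttt -q -l -i eth0 -n -s0 "$(build_filter vrrp stp arp 'net fe80::/16' 'ether host 01:00:0C:CC:CC:CC')" runs the same capture as the long command above, and adding another exclusion is just one more argument.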

With all of these filter rules, I'm left with a simple listing that tells me who is scanning my honeypot. (I replaced my honeypot's IP address with the word "honey"):
2019-05-17 14:56:29.448483 IP
2019-05-17 14:56:51.254279 IP
2019-05-17 14:57:34.326819 IP
2019-05-17 14:57:42.540136 IP
2019-05-17 14:58:05.864077 IP
2019-05-17 14:58:14.398041 IP
2019-05-17 14:58:16.858401 IP
2019-05-17 14:58:29.418584 IP
2019-05-17 14:58:29.423488 IP
2019-05-17 14:58:31.468471 IP
2019-05-17 14:59:07.238115 IP
2019-05-17 14:59:11.985728 IP
2019-05-17 14:59:32.908400 IP
2019-05-17 14:59:06.462259 IP
2019-05-17 14:59:09.028231 IP
In about three minutes, I collected data from 15 packets. (Your volume will vary based on your network neighborhood, upstream filtering, and bandwidth.) This really isn't much data; a month might be a megabyte. But it shows the date and time any system on the Internet tried to access my honeypot. (And since I'm not running any services on my honeypot, nothing should be trying to access it so everything is suspicious.) This log tells me their network address and port number (the final "." number after each address) and where they were going. Port 4450/tcp is the Common ASCII Messaging Protocol (CAMP). 11211/tcp is used by memcached and the Apple iCal Server. 5060/udp is SIP (voice over IP telephony). You can use tools at Speedguide.net and Internet Storm Center to look up each port's purpose and common risks.

Analyzing Logs

It's easy enough to capture the output from tcpdump into a file.
sudo tcpdump -tttt -q -l -i eth0 -n -s0 not vrrp and not stp and not arp and not net fe80::/16 and not ether host 01:00:0C:CC:CC:CC > honeypot.log

From here, the log file (e.g., honeypot.log) can be easily imported into a spreadsheet or parsed with command-line tools or basic shell scripting. Personally, I use a simple bash script (tcpdumpfilter.sh) to normalize the data format:
#!/bin/bash
# tcpdumpfilter.sh: normalize "tcpdump -q" lines read from stdin.
while read -r Date Time IP ipsrc lt ipdst proto ; do
  # Remove the trailing colon that tcpdump puts after the destination
  ipdst="${ipdst%:}"
  # Normalize the protocol name; ICMP has no ports, so use 'x' as the port
  case "$proto" in
    (tcp*) proto="TCP" ;;
    (UDP*) proto="UDP" ;;
    (ICMP6*) proto="ICMP6"; ipsrc="$ipsrc.x"; ipdst="$ipdst.x" ;;
    (ICMP*) proto="ICMP"; ipsrc="$ipsrc.x"; ipdst="$ipdst.x" ;;
  esac
  # input: tcpdump line with address "dot" port
  # output: tcpdump line with address "space" port
  echo "$Date $Time $IP ${ipsrc%.*} ${ipsrc##*.} $lt ${ipdst%.*} ${ipdst##*.} $proto"
done

This script makes the data easier to parse because it separates out the port numbers and keeps the data columns. (ICMP doesn't have port numbers, so they are listed as 'x'.)
cat honeypot.log | tcpdumpfilter.sh
2019-05-17 14:56:29.448483 IP 44860
2019-05-17 14:56:51.254279 IP 53260
2019-05-17 14:57:34.326819 IP 47041
2019-05-17 14:57:42.540136 IP 36684
2019-05-17 14:58:05.864077 IP 53279
2019-05-17 14:58:14.398041 IP 44532
2019-05-17 14:58:16.858401 IP 56470
2019-05-17 14:58:29.418584 IP 5134
2019-05-17 14:58:29.423488 IP 5134
2019-05-17 14:58:31.468471 IP 52989
2019-05-17 14:59:07.238115 IP 45446
2019-05-17 14:59:11.985728 IP 46209
2019-05-17 14:59:32.908400 IP 41579
2019-05-17 14:59:06.462259 IP 57633
2019-05-17 14:59:09.028231 IP 51219
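The address/port split in the script relies on two shell parameter expansions: "%.*" removes the shortest suffix that starts with a dot, and "##*." removes the longest prefix that ends with a dot. Here is a minimal sketch of just that step; the function name split_endpoint is hypothetical, and 192.0.2.10 is a documentation-only placeholder address rather than one from my logs.

```shell
# Hypothetical helper: split tcpdump's "address.port" form (with an optional
# trailing colon) into "address port".
split_endpoint() {
    endpoint="${1%:}"                                   # drop a trailing colon, if any
    printf '%s %s\n' "${endpoint%.*}" "${endpoint##*.}" # text before / after the last dot
}

split_endpoint "192.0.2.10.22:"
```

The call prints "192.0.2.10 22". Because only the last dot matters, the same expansions work for any dotted address with a port appended to the end.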
Now I can start doing data analysis. For example, what if I want to find every targeted destination port? I can capture all of my tcpdump data into a log file (honeypot.log) and then parse the file!
$ cat honeypot.log | tcpdumpfilter.sh | awk '{print $8,$9}' | sort -n | uniq -c
1 22 TCP
1 161 UDP
2 445 TCP
1 2323 TCP
1 3305 TCP
1 3499 TCP
1 4450 TCP
2 5060 UDP
1 5359 TCP
1 9224 TCP
1 9833 TCP
1 11211 TCP
1 49049 TCP

The sort command orders the values numerically, and uniq -c counts how many times each port was seen. In the few minutes that I ran my tcpdump, I already caught two SIP (5060/udp) and two Windows SMB (445/tcp) queries. If you let this run for a few days, you'll see what is attacked most often. Eventually you can compare attack frequencies over time and determine when a new exploit comes out because there will be an increase in those port scans.
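Once the log covers a few days, a ranking by frequency is usually more useful than a listing by port number. Here is a minimal sketch that reads the normalized output of tcpdumpfilter.sh; the function name top_ports is hypothetical, and it assumes the same column layout as above (destination port in field 8, protocol in field 9).

```shell
# Hypothetical helper: print the N most frequently targeted port/protocol
# pairs (default 10), most frequent first, as "count port/PROTO".
top_ports() {
    awk '{ count[$8 "/" $9]++ } END { for (k in count) print count[k], k }' |
        sort -k1,1nr -k2,2 | head -n "${1:-10}"
}
```

For example, cat honeypot.log | tcpdumpfilter.sh | top_ports 5 lists the five most targeted ports. Saving that output once a day makes it easy to spot a port that suddenly climbs the list.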

Another useful command identifies who is scanning your honeypot:
$ cat honeypot.log | tcpdumpfilter.sh | awk '{print $4}' | sort | uniq -c

The awk command prints the 4th field: the scanner/attacker's IP address. The results are sorted and counted. Typically, you'll see a wide range of addresses that are only seen a few times, and a small set of addresses that are seen over and over.

You can also start doing DNS and WHOIS lookups to find out who is behind each scan. In this brief sample, I caught one pseudo-"researcher": one of the addresses resolves to "zg-0301f-15.stretchoid.com". According to their web site:
Stetchoid is a platform that helps identify an organization's online services.

Sometimes this activity is incorrectly identified by security systems, such as firewalls, as malacious. Our activity is completely harmless. However, if you would prefer that we do not scan your infrastructure, please submit the following information: ...

Stretchoid doesn't identify who they are. They don't identify what they are doing or why they scanned my honeypot on 161/udp. (Port 161/udp is reserved for the simple network management protocol (SNMP) and is often vulnerable to attack.) And yes, they spelled "malicious" wrong -- probably so it won't come up in google searches for "stretchoid and malicious".

This short capture also caught a couple of cloud providers (LeaseWeb and OVH SAS). And it caught a VPN operated by "Asiamax Technology Limited VPN Service Provider Hong Kong". And it caught some addresses from known-hostile subnets -- almost everything you'll see from these subnets is scans and attacks. They are either looking for targets to attack or blindly attacking.

There are two things that I typically look for in this host grouping. First, hosts with frequency counts that are much larger than any of the others should stand out. For example, you might see most addresses with 1-5 sightings over a few days, and one host with 100 or 1000 sightings. Those indicate really active (and hostile) network addresses.
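The first check can be sketched as a simple threshold over the normalized log. This is a minimal sketch; the function name noisy_hosts and the default threshold of 100 sightings are hypothetical choices, and it assumes the source address is in field 4 as in the output of tcpdumpfilter.sh.

```shell
# Hypothetical helper: print "count address" for every source address seen at
# least N times (default 100), busiest first.
noisy_hosts() {
    awk -v min="${1:-100}" '
        { seen[$4]++ }
        END { for (a in seen) if (seen[a] >= min) print seen[a], a }' |
        sort -k1,1nr -k2,2
}
```

For example, cat honeypot.log | tcpdumpfilter.sh | noisy_hosts 50 lists only the addresses with 50 or more sightings. The right threshold depends on how long the capture ran and how busy your network neighborhood is.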

The second thing to look for are large blocks of sequential addresses. For example, the last two addresses are sequential ( and If I waited long enough, I would likely see every address from that subnet. This is because they distribute their scanning/attacking load across an entire network range. This simplifies the identification of known-hostile subnets.
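Spotting sequential addresses by eye gets tedious with a long list. Here is a small Python sketch that groups IPv4 addresses into runs of consecutive values using the standard ipaddress module. The function name, the minimum run length, and the sample addresses are illustrative assumptions.

```python
import ipaddress

def sequential_runs(addresses, min_length=2):
    """Group IPv4 addresses into runs of consecutive values.

    Returns a list of runs; each run is a list of dotted-quad strings.
    Runs shorter than `min_length` are ignored.
    """
    values = sorted({int(ipaddress.IPv4Address(a)) for a in addresses})
    runs, current = [], []
    for value in values:
        # Close the current run when the next address is not adjacent.
        if current and value != current[-1] + 1:
            if len(current) >= min_length:
                runs.append(current)
            current = []
        current.append(value)
    if len(current) >= min_length:
        runs.append(current)
    return [[str(ipaddress.IPv4Address(v)) for v in run] for run in runs]

# Made-up observations from documentation address ranges.
seen = ["203.0.113.12", "192.0.2.5", "203.0.113.11",
        "198.51.100.80", "203.0.113.13"]
print(sequential_runs(seen))
```

Converting each address to an integer first means adjacency is a simple "plus one" check, even across octet boundaries.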

If you dive deeper into the analysis, then you are likely to see clusters form. For example, you will probably see groups of non-sequential addresses repeatedly do the same kind of scan at the same time (+/- 1 second). Those are often triangulation attacks -- based on the time differences between the server's responses, they can try to geolocate your server. (I see them often from China and clients on cloud providers like Digital Ocean.) Other clusters may be trying to map out all Internet routes. And since your honeypot doesn't respond to anything, they can't tell that you see them and they can't map your honeypot.
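A rough way to surface these clusters is to group log events by port and look for several distinct hosts hitting the same port within about a second of each other. The Python sketch below does only that grouping. It cannot tell you why the hosts are coordinated, and the event tuple layout, the one-second window, and the three-host minimum are all assumptions made for the example.

```python
from collections import defaultdict

def coordinated_groups(events, window=1.0, min_hosts=3):
    """Find groups of distinct hosts probing the same port at about the same time.

    `events` is an iterable of (timestamp_seconds, address, port) tuples.
    Returns a list of (port, [addresses]) for each group found.
    """
    by_port = defaultdict(list)
    for when, address, port in events:
        by_port[port].append((when, address))

    groups = []
    for port, hits in by_port.items():
        hits.sort()
        start = 0
        while start < len(hits):
            end = start
            # Extend the group while hits stay within `window` of the first one.
            while (end + 1 < len(hits)
                   and hits[end + 1][0] - hits[start][0] <= window):
                end += 1
            hosts = sorted({addr for _, addr in hits[start:end + 1]})
            if len(hosts) >= min_hosts:
                groups.append((port, hosts))
            start = end + 1
    return groups

# Made-up events: three hosts probe port 123 within one second.
events = [
    (100.0, "192.0.2.10", 123),
    (100.4, "198.51.100.3", 123),
    (100.9, "203.0.113.8", 123),
    (250.0, "192.0.2.99", 123),
    (100.2, "192.0.2.50", 53),
]
print(coordinated_groups(events))
```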

From Theory to Practice This tcpdump solution is great for a "my first honeypot". But eventually you will probably want to collect more types of data. This is where Bro and Snort come in -- they can log access attempts and specific types of exploits.

For my own honeypot, I log addresses, ports, protocols, common types of network attacks, and any kind of "unexpected" data. I've been turning my logs into a list that maps network addresses back to the kind of activity: known "research" groups, known hostile networks, known cloud providers (and whether they act as unknown researchers or unknown hostile networks), VPNs, proxies, Tor, etc. Some parts of my list are updated daily, while other parts are updated less often.

On my various network services, I use this list to quickly determine who is coming into my site. For example, I can spot a web query from Shodan, BinaryEdge, and other "research" groups before my web server receives the query. This allows me to stop them from accessing / attacking / "just checking" my site before the packet reaches the web server. By the same means, I can block known-hostile network traffic before it can connect to anything. (My list is really long, so I created my own DNS blacklist server for rapidly looking up addresses in this list. I've also been toying with the idea of selling access to my list.)
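To illustrate the idea of such a list (not the actual implementation, which is served through a DNS blacklist for speed), here is a minimal Python sketch that maps network ranges to labels and checks an address against them. The table contents, labels, and function names are hypothetical, and the ranges are reserved documentation networks.

```python
import ipaddress

# Hypothetical entries; a real list would be built from honeypot logs.
KNOWN_NETWORKS = [
    (ipaddress.ip_network("192.0.2.0/24"), "research"),
    (ipaddress.ip_network("198.51.100.0/24"), "hostile"),
    (ipaddress.ip_network("203.0.113.0/25"), "cloud"),
]

def classify(address, networks=KNOWN_NETWORKS):
    """Return the label of the first network containing the address."""
    ip = ipaddress.ip_address(address)
    for network, label in networks:
        if ip in network:
            return label
    return "unknown"

def should_drop(address):
    """Drop traffic from anything labeled as a scanner or hostile network."""
    return classify(address) in {"research", "hostile"}

print(classify("192.0.2.55"), should_drop("192.0.2.55"))
print(classify("203.0.113.200"), should_drop("203.0.113.200"))
```

A linear scan like this is fine for a few hundred entries. A very long list calls for a faster lookup structure, which is presumably why a dedicated lookup server makes sense.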

Advanced Uses Beyond collecting data about scans and targets, a simple honeypot can also be used to monitor outbound traffic. For example, a network administrator may collect data about typical office network traffic (with appropriate permissions). This way, they can set a baseline for normal usage. If there's ever a spike in unusual traffic, then the administrator can jump in and see what's going on. It might be the start of a computer virus or worm, or maybe something else.
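As a sketch of the baseline idea, the following Python snippet flags a measurement that sits well above the historical average. It assumes you already have per-hour connection counts; the three-standard-deviation threshold, the function name, and the sample numbers are arbitrary choices for illustration.

```python
from statistics import mean, stdev

def is_spike(baseline, current, threshold=3.0):
    """Return True if `current` exceeds the baseline mean by more than
    `threshold` sample standard deviations."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline samples")
    average = mean(baseline)
    spread = stdev(baseline)
    return current > average + threshold * spread

# Made-up hourly outbound connection counts for a quiet office network.
hourly_connections = [120, 135, 110, 128, 131, 117, 125]

print(is_spike(hourly_connections, 130))   # ordinary hour
print(is_spike(hourly_connections, 900))   # something is going on
```

A real baseline would likely account for time of day and day of week, since office traffic is rarely flat.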

With a honeypot, you can also measure "hype". For example, Microsoft recently patched a really bad remote desktop vulnerability. The issue permits a worm to infect vulnerable systems and is being promoted by descriptions like "UPDATE NOW! Critical, remote, 'wormable' Windows vulnerability" and "Microsoft warns wormable Windows bug could lead to another WannaCry".

Now, I certainly don't want to downplay the significance of this vulnerability. People should patch their Windows systems...
Read Full Article
I've previously written about network "researchers" (with air quotes) who are constantly scanning and attacking servers on the Internet. Here's a quick summary:
  • Many of these researchers don't identify themselves.

  • Most don't publish their research or describe what they are doing.

  • Many "research" groups use cloud services and don't identify their network addresses. As the recipient of the attack, you cannot immediately identify who sent it.

  • They often use the exact same exploit packets used by botnets and malware. You can't tell intent from the packet; the packets look the same.
If you can find the researchers behind any given attack:
  • Most make vague claims about making the world a better place.

  • Some claim that they are not hacking because that's not the intent. But this goes against Ethical Hacking 101 and the basics for any legitimate audit: Always get permission first. These "researchers" don't have permission.
Most of these vulnerability "research" services are relentless in their attacks. As one person wrote at thedailywtf, "At this point those so called 'researchers' are worse than real bad actors."

Last week Tiago Henriques, the CEO of scanning company BinaryEdge, published an article titled, "I want to block internet scanners." His basic premise is that (1) he's been receiving "stop scanning me" notices for years, (2) people don't like being scanned, and (3) you can't stop groups like BinaryEdge from scanning you.

His article really set me off. I think it shows how he, as well as most of the other "research" groups out there, are out of touch with reality.

Pump up the volume The first heading in the article is titled "The Internet is a noisy place!" He explains that there are lots of people scanning IP addresses. As someone with servers being scanned, I've been keeping metrics. The distribution is roughly:
  • 25% of scanning packets that hit my honeypots are from known "research" groups: Shodan, Shadowserver, Team-Cymru.org, University of Michigan, University of Washington, comsys.rwth-aachen.de, researchscan.eecs.berkeley.edu, binaryedge.ninja, intruder.io, scan.security.ipip.net, telnetscanproject.org, internet-census.org, Net Systems Research, QuadMetrics, United Protection Security Ltd, research.findmalware.org, ampereinnotech.com, etc. (Not the entire list.) These groups are just the ones that I've been able to identify and backtrace to owners.

  • 25% are from cloud services with no clear owner. As far as I can tell (spot-checking), almost none are infected systems or open proxies. Rather, most are operated by unidentified organizations who are using the cloud service for anonymity. Some may be research groups, while others may be malicious attackers. Malicious attackers are usually rapid and either hyperfocused on one protocol or slinging a wide range of exploits. In contrast, "researchers" typically use a small set of exploits and repeatedly scan for the same things. Most of these cloud systems behave like unidentified "researchers".

  • 25% are from known-hostile subnets. Vanoppen.biz, OkServers, a couple of Chinese subnets, etc.

  • 25% appear to be infected hosts or botnet nodes. (Mostly Mirai-like infections. I've been working with Bad Packets to track these.)
I use this information to block access from known-hostile networks (both malicious and "researchers"). And I'm not the only person doing this. Lots of administrators have similar blacklists. As one person wrote, "Who made Shodan, ShadowServer and Rapid 7 the official ones or are they just community vigilantes?"

Yes, there are lots of systems scanning the entire Internet for vulnerable systems. But an estimated 50% of these scans come from self-proclaimed "researchers". If the researchers stopped scanning, or provided an easy way to identify the researchers (for blocking), or scanned less frequently, then the noise level would drop dramatically. As a result, administrators could more easily identify the true malicious attacks.

Benign, like a tumor Henriques describes some scanning groups as "benign". He lists BinaryEdge, Rapid7, Censys and Shodan as benign because they:
  • Respect blacklisting: you can opt out of their scans.

  • Perform no active exploitation: they scan, but don't exploit.

  • Provide benefits for you: You can check what data they have collected about your IP address.
Personally, I haven't tried to opt out of those four "benign" scanning services. Years ago, I tried to opt out of scanning from the University of Michigan, but they never removed me. Michigan partnered up with "benign" Censys, but Censys still scans me daily.

I also tried to opt out of Shadowserver. However, they ignored the request. On their page, they include a link that is supposed to show everyone who has opted out from scans. But the link's destination page doesn't exist. (Or maybe nobody ever successfully opted out?) And I tried to opt out of a few others. None have ever honored an opt-out request.

With regards to "no active exploitation": I think Shadowserver is one of the few groups that admits that they might cause problems: "We intend no harm, but if we are causing problems, please contact us". The big problem here is that the recipient of the scan cannot identify the intent. (There's no "intent" flag in the TCP header.) The packets generated by these "research" groups look no different from actual attacks.

At the server, there are only microseconds to make a decision.
  1. A packet arrives.

  2. Based on the packet's contents, you make a decision: friend or foe, harmless or harmful, legitimate or malicious. There's no time to do a Whois lookup or trace the sender's origins. If it looks like an NTP amplification attack, then it is an NTP amplification attack. If it looks like a DNS cache injection attack, then it is a DNS cache injection attack. As the staff at the Internet Storm Center wrote, "a scan is a scan is a scan".

  3. The server reacts to the packet. This may grant access, block the packet, drop the packet, or flag for tracking. The local intrusion detection and prevention systems (IDS/IPS) should generate alerts regarding any scans or possible attacks.
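The three steps above can be sketched in a few lines of Python. This is a toy illustration rather than a real firewall: the rule table, the packet dictionary layout, and the action names are all hypothetical. The point is that the decision uses only what is in the packet, with no Whois or other lookups.

```python
# Hypothetical signature table: (protocol, destination port) -> action.
RULES = {
    ("udp", 123): "drop",    # looks like an NTP probe
    ("udp", 161): "drop",    # looks like an SNMP probe
    ("tcp", 443): "allow",   # expected web traffic
}

def decide(packet, blocked_sources=frozenset()):
    """Pick an action using only the packet's own fields.

    `packet` is a dict with "src", "proto", and "dport" keys (an assumed
    layout for this sketch). Anything unrecognized is flagged for tracking.
    """
    if packet["src"] in blocked_sources:
        return "drop"
    return RULES.get((packet["proto"], packet["dport"]), "flag")

probe = {"src": "192.0.2.10", "proto": "udp", "dport": 123}
print(decide(probe))
```

Notice that nothing in the function can depend on the sender's intent, because intent is not a field in the packet.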
Moreover, these scans are not harmless since they incur an actual cost in both time and money. As Tony Carothers wrote,
So when the security analysis notices unidentified addresses or services, the effort to classify the activity begins. This may take an hour sometimes, and from my experience time is always the resource we never have enough of. This is where the cost is incurred by the end user being scanned. The time spent to identify and update their internal databases.

Finally, there's the "benefit for you". I don't mind services where I can connect and ask it to scan my host for specific risks. However, that's different from services that scan the entire Internet. In the former case, I asked for the scan. In the latter case, I didn't ask for it -- they just forced it on me.

The benefit from these scans really goes back to these "research" groups and not the people they are scanning:
  • Some research groups charge for the data. That's right, they scan my server and then want me to pay to learn what they found. If I'm going to pay someone, then I'm going to get someone who is certified and have a signed contract for doing the audit properly.

  • Some, like Shadowserver, want you to subscribe before showing you your scan results.

  • While typically not advertised, most of these "research" groups have some private customers (with deep pockets) who pay for interesting findings.

  • In the worst case (which is pretty common), they will scan your server and make the results public for everyone to see. This way, you may not know that you have a problem, but the people who want to exploit problems can find you very quickly. (Right, Shodan?)
At no time do any of these "research" groups inform you of any vulnerabilities before giving your information to someone else. This goes against the whole concept of responsible disclosure.

Then again, I'm not sure how reliable their scans are. For example, Shadowserver is constantly scanning for NTP exploits. So, I set up a honeypot with a very vulnerable NTP server. Since Shadowserver publishes their scanning subnets, I put in a firewall filter to only allow packets from Shadowserver. They came in, scanned the NTP server, saw that it was vulnerable, and never told me that my NTP server had a problem. I've done similar experiments with Censys, University of Michigan, and other "research" groups; none ever tried to tell me about any detected problems. I like to hope that they inform big organizations -- like the US Government or Google -- but I just don't know. None of them seem to make any effort to contact smaller services. (Either that, or they are not looking for what they claim. That brings up an entire trust issue.)

Answering Questions In the section titled "You're targeting me on your attacks!", Henriques correctly points out that few attacks are targeted. Rather, these "researchers" are attacking everyone. (It's not just you.) However, that doesn't make me feel any better.

He goes on to list some questions that his company is trying to answer:
  • "If a vendor fails to provide a patch, how many systems worldwide are affected?" This is a good question and does require active scanning. However, it doesn't require scanning the Internet every few hours; once a day or once every few days should be plenty. (If an IP address wasn't vulnerable 3 hours ago, then it's probably still not vulnerable right now.) There's no justification for the volume of scans. Also, I rarely see researchers (other than Shodan) scanning for brand new vulnerabilities.

  • "If a vulnerability for a router is released how many ISP's worldwide have that router deployed?" As with the patch issue: if I didn't have a vulnerable router yesterday, then I probably don't have a vulnerable router today. There is no justification for scanning multiple times per day.

  • "Do we see a specific country attacking another?" This question does not require scanning, and scanning will not help answer this. Answering this question requires passive honeypots or detectors to observe incoming traffic. Unfortunately, organizations like BinaryEdge, Shodan, stretchoid.com, and ipip.net (not the full list) use cloud providers all over the world. Without identifying every researcher, one might mistakenly conclude that an increase in attacks from the Netherlands is a hostile country and not a group of cloud providers being used by "researchers".
Henriques points out that attackers often relay through other countries. To detect this, you need honeypots or detectors that can see the initial attack, sustained attacks, and subsequent compromises. That's not something that can be done through active scanning. (I'm a big fan of passive monitoring; I'm not a fan of widespread, long-term active scanning.)

Finally, Henriques notes that "malicious actors" differ from the "benign" scans because they follow the scan with an attack. Personally, I'm not willing to open my production servers to the followup packets in order to distinguish benign from malicious. Reconnaissance for an attack is part of the attack. At the server, I can see the reconnaissance scan, but I cannot distinguish the reconnaissance from "researchers" from the reconnaissance from malicious actors. A scan is a scan is a scan. And many types of scans are attacks.

Alternatives In the section titled "So are you saying my worries are unfounded for wanting to block internet scanners?", Henriques claims that blocking non-malicious actors lowers your visibility into your own exposure. I strongly disagree here since none of these scanning organizations will inform me about my vulnerabilities when they are discovered.
  • If I had no visibility or didn't know before, then having these organizations learn of my risks -- and not tell me -- does not increase my visibility or knowledge. It offers me no protection.

  • If I saw their scans (meaning that I have visibility), then continually scanning me just raises the noise level so that I might miss a malicious attacker. (Remember the big Target compromise? It wasn't that they didn't receive alerts about a possible compromise. It was that they didn't react to the alerts. With too many alerts, real problems get buried in the noise.)
Henriques claims to provide a benefit to the sites being scanned, but I'm just not seeing it. My interpretation is that blocking his scans means he can't provide a constant flow of updates to his paying customers, like vendors with patches or router providers.

If you find yourself being attacked by these "researchers", what should you do? Here are two good options:
  • Drop. You can drop their packets. Don't send back an ICMP undeliverable or TCP RST; just don't respond. This way, they can't distinguish "no host" from "stop scanning me".

  • Poisoned Honeypot. Set up a moving honeypot so the scanners incorrectly index vulnerable systems. For example, run a honeypot that responds to NTP and DNS exploits that researchers look for, only make it accessible to known research groups, and change the IP address every few hours or after every scan. This way, you can poison their collected data results. They might think there is a sudden increase in vulnerable systems when it's really just a moving honeypot. (Bonus points if they publish a warning about vulnerable systems due to your moving honeypot. That changes the moving honeypot from an attack on their data collection to an attack on their reputation.)
Analogies At the beginning of the article, Henriques stated that he would not use analogies. His paper includes similes, metaphorical examples, and emotional descriptions, but no analogies and no empirical evidence. He then ended his paper with an analogy:
If I had to use an analogy, which I typically hate doing, its as if you have your building, with the windows without any curtains, internet wide scanners are the people on the street, and you are asking 5 out of 1000 on the street to cover their eyes and not look at your windows. And it just so happens that these 5 are the neighbors that would be respectful and tell you "Hey, I can see inside your house because you have no curtains".

If you don't want people to see inside your house, you need to put curtains on your windows.
This analogy is apt, but he's not standing on the street. Scanning requires sending data right up to the server and looking at how it responds. However, regardless of whether someone has curtains, walking up to a house and trying to look in windows is usually illegal. In California, it's Penal Code Section 647(i) and (j):
(i) Who, while loitering, prowling, or wandering upon the private property of another, at any time, peeks in the door or window of any inhabited building or structure, without visible or lawful business with the owner or occupant.

(j) (1) A person who looks through a hole or opening, into, or otherwise views, by means of any instrumentality, including, but not limited to, a periscope, telescope, binoculars, camera, motion picture camera, camcorder, or mobile phone, the interior of a bedroom, bathroom, changing room, fitting room, dressing room, or tanning booth, or the interior of any other area in which the occupant has a reasonable expectation of privacy, with the intent to invade the privacy of a person or persons inside. This subdivision does not apply to those areas of a private business used to count currency or other negotiable instruments.

All 50 States in the United States, and most other countries, have some kind of laws restricting loitering, prowling, and/or "peeping Toms".

"I can't be bothered anymore." I tried to chat with Henriques after his paper came out, but he said he "can't be bothered anymore" to discuss his recent paper.

Keep in mind: This is his reply to my first-ever interaction with him. This wasn't after some long conversation or ongoing debate. And at least he was slightly more professional than "Shadowops", who just jumped in and told me to "Go gaslight yourself".

Binary Logic The CEO of BinaryEdge was very specific concerning his opinion on his scanning:
  • He is aware that people think his scans look like attacks. He is so aware, that he wrote a paper on the topic.

  • In his paper, he shows no intention of stopping. Instead, he claims it is for the greater good and that we harm ourselves by losing visibility if we opt out.

  • When I tried to discuss his practices, he stated that (1) I'm wrong (with no evidence to counter my opinion), and (2) he can't be bothered to have this discussion. Even though he just released a paper on this topic.
Some of the results from some scanning groups do have beneficial purposes for the security community as a whole. On rare occasions, they may release advisories, white papers, or work with vendors for a patch before any public disclosure. Personally, I use them in a more passive way: if I want to know what the next big exploit will be, I look for an increase in scans from Shodan. Shodan usually looks for new exploits before they are publicly announced. But even in these few beneficial cases, let's not pretend that they wear white hats.

To Henriques: If it looks like an attack, then it's an attack. You shouldn't expect system administrators to wait around to see if the reconnaissance leads to a compromise. And if people are constantly telling you to stop attacking them, then you should realize that you are attacking them. Perhaps you should reconsider your scanning technique. Finally, I should not have to opt-out of your "service" when I never opted in.
Read Full Article
Lately I've been having a fun discussion with my friend, Bill, about some of the good uses for Python. A lot of people promote Python as a good "my first programming language" for new software developers. One person at the Python Conference (PyCon) even pointed out that "Python is the official language for education in French high schools". (This comes from a report at InfoTech News.) Personally, I recommend JavaScript as a first language. Bill asked if JavaScript was a good fit.

Why Learn Programming? There's a lot of reasons why people might want to pick up programming. For example:
  • Jobs and money. If you are a really good programmer, then you can at least get a job (people are always looking to hire programmers) and those jobs often pay well. A lot of people think that programming is a good career move.

  • Curiosity. TV shows like "The Big Bang Theory", "Silicon Valley", "Mr. Robot", and "The I.T. Crowd" all feature programmers having fun. I know a few people who took programming classes because they wanted to know if it really was this fun. (Personally, it depends on how easily you take to it. I really enjoy programming. But I can't watch Silicon Valley because it's too close to the truth and the truth hurts.)

  • New skills. Even if you're not looking for a new job, learning how to program can really teach new skills. For example, have you ever tried to configure a home wireless router, add a spam rule to your mailbox, or write a macro for some Excel spreadsheet? If you know how to program (even a little), then all of those tasks suddenly become much easier.
There's a lot of reasons why someone might want to learn how to program. But remember: programming is not for everyone. And switching jobs to become a high paid programmer is often more of a cliché than reality. Also keep in mind, teenagers and adults will learn differently than really young kids. (Teaching really young kids how to program should be a different blog entry.)

The next question is: where to start? There are plenty of good choices for "my first programming language". In my opinion, JavaScript and PHP are both great choices. I think languages like C, C++, Java, and Perl are pretty advanced starting points and make better "my next language" options. But Python? I think Python is a horrible choice for a first language.

Goals I've learned way too many programming languages, and I've learned them in lots of different ways.
  • I've taken after school programs to learn Fortran and C.
  • I've taken formal college courses that taught languages like Assembly, C, Pascal, Lisp, and Scheme.
  • I used books and manuals to learn Rexx, PHP, and Perl.
  • I picked up Shell (Bourne, Csh, Ksh, Bash, Ash, and Dash) by looking at a few examples and reading man pages.
  • I've been through short seminars and training sessions on Ruby, Java, R, and other languages.
  • I've even taken a college graduate course on denotational semantics (all languages have the same basic characteristics, even if the syntax changes) -- where we learned a new computer language in every session, just to see how it's all the same.
One thing that has come through really clearly in all of this overly-educated experience is the set of steps needed for understanding how to program in the first place. There are a couple of things that you must get out of your very first programming language -- regardless of the language. These are:
  • Variables and Values. What is a variable? How are they used? This is such a fundamental concept that most programmers don't even think about it. But as a fundamental concept, it is the first thing you need to understand. (Kind of like learning Calculus: if you don't understand the fundamental concept of "the limit", then you're not going to understand the math.) To make a long definition short: values are numbers, text, or objects, and variables store values. (Comparing with English: values are "nouns" and variables are "pronouns".)

  • Instructions and Operators. These are the basic commands and syntax. Every common programming language has some kind of assignment statement, some kind of conditional statement, and some kind of looping construct. And regardless of the language, you're probably going to need these. (Again, comparing with English, these are your "verbs".)

  • Flow. Where does the code start? How does it process instructions? When or where does it end? This also includes the order of precedence for a complex set of instructions and operators.

  • Functions. Depending on the language, these may be called functions, procedures, methods, constructors and destructors, modules, routines, or subroutines (or "subs"). It's all basically the same thing: blocks of code that perform specific tasks. Functions go along with parameters (a type of variable) and results (another type of variable).

  • Scope. Variables may not exist everywhere. Instead, they can be defined and used as needed. The scope defines the context where a variable exists.

  • Inputs and Outputs. How do you communicate with the program, and how does it communicate back results?

  • Comments and Maintainability. I can't emphasize this enough. The comments should be clear enough that, ten years from now, you can look back at the code and immediately understand what it is doing. Do not assume that you'll remember everything about the programming language or the program. Code is never "self explanatory".
In my opinion, these are the absolute basics. Ideally, there are additional advanced topics that you should get from your first programming languages. For example: typing and casting, memory management, inter-process communications, and parallel processing. There are also concepts like abstract data types (arrays, lists, dictionaries, hashes, stacks, structures, unions, etc.) and common search and sorting methods. But all of those are more advanced topics and not absolutely necessary for brand new programmers.
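To make the list of basics concrete, here is one tiny program that touches each item. It is written in Python only because that is the language discussed later in this post; the same shape works in JavaScript or PHP. The names and numbers are made up for the example.

```python
# Comments: say what the code is for, so it still makes sense in ten years.

TAX_RATE = 0.08            # a variable holding a value (global scope)

def total_with_tax(prices):
    """Function: takes a parameter and returns a result."""
    subtotal = 0.0         # local scope: exists only inside this function
    for price in prices:   # looping construct
        if price > 0:      # conditional statement
            subtotal = subtotal + price   # assignment plus an operator
    return subtotal * (1 + TAX_RATE)

# Flow: execution starts at the top of the file and runs downward, so the
# function is defined before it is called here.
# Output: print() reports the result. (The input is hard-coded for brevity.)
print(round(total_with_tax([10.0, 5.0, -1.0]), 2))
```

Trying to use subtotal outside the function raises an error, which is the quickest way to see what scope means.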

Even though most programming languages have the same basic capabilities, the constructs and integrated features of each language make each of them different. As an analogy, every screwdriver in your toolbox has the same basic purpose, but some are big, some are small, and some are designed for specific tasks. (While I could write a word processor in Perl, I really wouldn't want to support it. And C can do complex string parsing, but other languages can do it more easily.)

If it were just these basic fundamentals, then any language could be a good first language. However, there are three other things that you should get from your first language: consistency, support, and transference. The more consistent a language is, the more likely someone won't become discouraged. (We don't want to drive away new programmers with confusing exceptions.) The better the support (documentation, examples, etc.), the easier the language is to learn. And most importantly: if this isn't going to be your only language, then the lessons learned from it should be easy to transfer to your 2nd, 3rd, and 12th language.

Given these requirements, I still think JavaScript and PHP are great first languages. But Python? No way -- it still sucks.

Python issue #1: Consistency There are three basic paradigms for programming languages: procedural, functional, and object oriented. (They are not the only ones, but they are the big ones.)
  • Functional languages heavily rely on functions for everything. There are often few or no variables (but potentially lots of parameters), and very few global constructs. Examples include Lisp, Forth, Scheme, and Lambda (yes, there's a language called Lambda where every function and every variable is named "lambda").

  • Procedural languages use blocks of code called procedures. (Even if the nomenclature in the language calls it a function or method or subroutine, it's technically a procedure: a series of processing steps.) These languages heavily rely on variables and often use global structures that are accessible from any part of the code.

  • Object-oriented languages are a half-way point between functional and procedural, and are heavily dependent on scope. The main structures are called objects, and objects can have variables and functions (mostly methods, constructors, and destructors). Variables can exist globally to the program, locally to an object, or locally to a function inside an object. Moreover, these local object variables can be externally accessible (public) or restricted to use within the object (private).
(This is nowhere near the entire list. I'm sure that object-oriented programmers are shouting things like "What about inheritance, encapsulation, and polymorphism?" That's all more complex than the very beginning stuff.)

As a heuristic, it's pretty easy to tell these apart. If you see code calling functions with basic names and some parameters, such as "myfunction(a,b)", then it's procedural. If you see names with dots separating a method's handle from an object (e.g., "myobject.myfunction(a,b)") then it's object oriented. (Most languages use dots to access internal components to objects.) And if you see lots of functions calling functions, such as "funa(fun(b),fun(c))", then it's probably functional.
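Here is the same small task (adding up a list of numbers) written in each of the three styles, so the visual differences in the heuristic are easy to see. This is only an illustrative sketch in Python; the function and class names are made up.

```python
from functools import reduce

prices = [3, 4, 5]

# Procedural: a plain function called with parameters, e.g. total(prices).
def total(values):
    result = 0
    for value in values:
        result += value
    return result

# Object oriented: a method reached through an object with a dot,
# e.g. cart.total().
class Cart:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(self.values)

# Functional: functions passed to, and nested inside, other functions.
def functional_total(values):
    return reduce(lambda a, b: a + b, map(int, values), 0)

print(total(prices), Cart(prices).total(), functional_total(prices))
```

All three produce the same answer; only the shape of the call changes.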

While many languages have support for functional, procedural, and object-oriented constructs, most languages are primarily focused on one type of construct. C and Bash use procedural. C++ and Java are object oriented. Lisp and Scheme are functional. PHP, Perl, and JavaScript are usually procedural but do easily support object-oriented functionality. (Technically, you can use a procedural language to implement a functional or object-oriented style, but it may not be as trivial as using a native object-oriented language. Like I said, most languages are effectively equivalent.)

And then there's Python. Like PHP and Perl, Python natively supports object-oriented programming. However, most of the code I've seen programmers build is procedural, not object-oriented. Complex add-on modules, like BeautifulSoup, SciPy, and TensorFlow are object oriented, but most people seem to integrate those modules into procedural programs. You can even see this in a lot of the sample code. For example, SciPy's samples use procedural top-level functions calling the object-oriented constructs that come from add-in modules.

However, let's ignore the add-on modules for Python and focus strictly on the core instructions. To add a value to a list, you use mylist.append(value). This is an object-oriented construct. But to identify the number of items in a list (the length, or len, of a list), you use len(mylist) -- a procedural construct. This means that the Python developers couldn't decide which basic paradigm to use when designing the language.
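The two spellings look like this side by side. For completeness, the sketch also shows that the built-in len() works by calling the list's own __len__ method, so a dotted form does exist; it just isn't the one Python programmers are taught to write.

```python
mylist = [1, 2]

mylist.append(3)          # object-oriented style: a method on the list
print(len(mylist))        # procedural style: a built-in function

# len() delegates to the object's __len__ method behind the scenes.
print(mylist.__len__())
```

A newcomer has to memorize which operations are methods and which are built-in functions, since the syntax gives no hint.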

There's also the issue about precedence. To split a string into parts, you might use mylist=mystring.split(' ') to separate the string at every space. In this example, the primary object is mystring and the internal method is split(). However, to recombine the components back into a string with spaces, you would use newstring=' '.join(mylist). With most other programming languages, the join function is associated with the object that is being joined: newstring=mylist.join(' '). Compared to every other common language, Python has it backwards.
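The two quirks described above (append versus len, and split versus join) can be seen side by side in a few lines. This is only an illustration; the variable names and sample strings are arbitrary.

```python
# Adding to a list is a method on the list object...
mylist = []
mylist.append("dog")
mylist.append("bath")

# ...but asking for its size is a top-level function call.
count = len(mylist)

# Splitting is a method of the string being split.
mystring = "a very happy puppy"
parts = mystring.split(" ")

# Joining is a method of the separator string, not of the list.
newstring = " ".join(parts)

# Lists have no join() method, so parts.join(" ") would raise AttributeError.
list_has_join = hasattr(parts, "join")
```

Here count is 2, newstring is identical to mystring, and list_has_join is False.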

While these quirks are unique to Python, the inconsistencies teach new programmers really bad habits. Consistency makes code easy to learn, build, and maintain. Inconsistency makes new programmers frustrated.

Python issue #2: Support Even if you have taken a class on Python and built your first Python program, you probably haven't memorized every core construct. New programmers often find themselves repeatedly asking "how do I create a new list?" or "is a search needle,haystack or haystack,needle?" or "how do I <insert common task>?" This is where online documentation and community support come in.

C has been around for almost 50 years. There are plenty of online resources and examples. But even so, not all of them are as easy to follow as they could be. This is one reason why C is a difficult first language. As the saying goes, C has all the power of an assembly language with all of the ease of an assembly language.

PHP and JavaScript have absolutely outstanding support. The online manuals at php.net and Mozilla developer provide very readable descriptions of every function -- what it does, how to call it, what it returns, and error conditions. They even include plenty of examples. Using php.net, I taught myself PHP in under an hour. And when I first came across 'promises' (an advanced JavaScript topic), the documentation at Mozilla made it trivial to understand (and dislike, but it's understandable). Because of these excellent resources, I think PHP and JavaScript are ideal "my first language" options for new programmers.

And then there's Python. The documentation at python.org was clearly written by programmers who already know the language. They use cryptic variable names, generalized examples, and complex descriptions. Even knowing how to use a list in Python, I have trouble following their documentation. Most examples for Python are either overly technical or overly generalized; they don't just say in plain language how any particular topic works.

While I know some self-taught Python developers who learned Python this way, I don't know anyone who says it was easy. The lack of good documentation and support makes Python a really bad first language.

(If you want good Python documentation, look at w3schools. They also have good documentation for PHP and JavaScript. However, the native documentation for PHP and JavaScript is a little better than w3schools. In contrast, the native documentation for Python sucks.)

Python issue #3: Transference Like most languages, Python supports strings and numbers and binary data. (Sure, some people complain that it only supports one type of anonymous lambda function, but that's way too advanced for beginning programmers to debate.) I don't even have an issue with how Python does dynamic type casting. Rather, my concern is with how Python handles variables, parameters, and scoping. In particular, Python uses lazy binding and mutable default values. There are some great examples at SoPython's Common Gotchas In Python, such as:
x = 23
f = lambda: x
x = 42
print(f())
#expected result: 23
#actual result: 42

def frob(x=[]):
    x.append(1)
    print(len(x))

frob()
frob()
#expected result: 1
#actual result: 2

value = 42 #ascii e
valuе = 23 #cyrillic e
print(value)
#expected result: 23
#actual result: 42

Every language has quirks, but Python has more than most. For anyone learning Python as a second programming language, these are very annoying quirks. Notice the wording used by SoPython: "expected result". The expectation is compared to nearly every other programming language out there. If you wrote this type of code in C, PHP, JavaScript, or any other common language, you would get the expected result. Instead, Python generates different results.
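For readers who run into the first two gotchas, the sketch below shows the usual Python idioms for getting the "expected" behavior. It assumes frob is meant to append one item to its list and report the length; the function bodies and variable names are illustrative and not taken from SoPython.

```python
# Lazy binding: capture the current value with a default argument,
# which is evaluated once, when the lambda is defined.
x = 23
f = lambda x=x: x
x = 42
captured = f()      # the value x had when f was created

# Mutable default: use None as the default and build a new list per call.
def frob(x=None):
    if x is None:
        x = []      # fresh list on every call that omits x
    x.append(1)
    return len(x)

first = frob()
second = frob()
```

With these changes, captured is 23 and both calls to frob return 1. The fact that a workaround is needed at all is the quirk being discussed.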

These quirks are a big reason why Python is a really bad choice for a first programming language. The lessons and knowledge you acquire from Python's handling of fundamental topics regarding variables, parameters, and scoping are not transferable to other languages. In contrast, if you learn C, Java, PHP, JavaScript, or most other languages, then there are fewer quirks and a lot more consistency. You can take what you learned from any of these non-Python languages and rapidly transfer the knowledge to other programming languages. Going from PHP or JavaScript to other languages has a much lower learning curve than going from Python to anything else.

Rather than learning Python first and then viewing every other language as weird, you can learn every other language first and then realize that Python is very weird. Weird by default is not a good choice for a first programming language.
(Hey Bill, did this answer your question?)
The Hacker Factor Blog by Dr. Neal Krawetz - 2M ago
A few times each year, someone writes in to complain about my site's use of ads. (Sometimes they complain about hackerfactor.com, and other times it's fotoforensics.com.) The problem is, I don't use ads. In fact, I take personal pride in knowing that I have two sites in the Majestic Million (the top 1 million web destinations; the top 0.05% of Internet destinations) and both of my sites are among only the tiniest fraction of popular sites with zero ad content and no links to third-party services.

If you see ads in your browser when viewing any of my web sites, then it's not because of anything I'm doing. More likely than not, your browser or computer is infected with adware, spyware, or other kinds of malware.

Detecting Problems Over at FotoForensics, I have a tutorial about malware. The tutorial includes a couple of simple tests to see if your browser is showing signs of an infection or data hijacking. For example, the first test checks for unexpected cookies and known indicators in your HTTP request. Basically, some malware alters your HTTP header. If my server sees a header that indicates malware, then this test flags it.

The second test looks for ad blockers and problematic plugins. Every browser should be using an ad blocker. Besides being annoying, ads can be used to track user activities and perform data collection. Ads are also occasionally a vector for infection. Similarly, browser extensions for Flash, Silverlight, and Java can be used to infect your computer.

These tests have worked pretty well. Over the years, I've had a few people write in with requests for assistance with their infected systems. However, I don't want that kind of liability. My test page will tell you if it detects problems, but I'm not going to fix your computer. There are other companies that specialize in removing malware.

Unfortunately, I occasionally find indicators of a problem that are not associated with HTTP headers, unexpected cookies, or online ads. Instead, I see browsers or intermediary proxies altering web content. (I usually see this with "free" open proxies, but over the years I have also caught a few Tor exit nodes altering content.) Usually the altered web content includes trackers or ads.

New Test! I just added a new test to my malware tutorial. Test #3 checks for unexpected HTML alterations. Here's how the test works:
  1. I use an iframe (a web page in a web page) to download some very simple HTML:
    <html><head></head><body>Cat <a href='/'>Link</a> computer</body></html>
    (Yes, it really does say "Cat Link computer".)

  2. The test checks the iframe's content to see if it matches the known HTML.
This test is designed to catch software that rewrites hyperlinks, adds in links for ads around common words (e.g., "cat" or "computer"), or inserts additional HTML. It's also really small -- so any other alterations will be easily noticed.

Expected Alterations As a web developer, you might think that the web browser would store the HTML exactly as the server transmitted it. However, that's usually not the case. Most web browsers make a few changes as they parse the HTML structure. For example:
  • Last newline: My test transmits the HTML and ends with a newline character. Most browsers remove the final newline.

  • Inserted newline: My test sends 3 words: "Cat", "Link", and "computer". There is no line break between the word "computer" and the end-of-body tag (</body>). However, most web browsers will insert a newline before the end-of-body tag.

  • Quotes: HTML tags can contain attributes. My only attribute uses single quotes (href='/'). However, most web browsers will rewrite them to use double quotes (href="/").

  • Firefox: Beginning around Firefox 60, the Firefox browser began adding in a class to the HTML tag: <html >. You won't see this alteration with Firefox on Linux or Mac; it's only on Firefox for Windows. (Or the Tor Browser. The Tor Browser will do this on every platform.) Firefox also adds in trigger events to the HTML tag, but those don't show up in the raw HTML. In contrast, Chrome doesn't add events or classes to the HTML.
These are all harmless alterations; they don't change how the web page renders or how content appears. But my detection tool looks for any change, so it needs to account for these minor changes.
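The comparison described above can be sketched in a few lines. This is not the actual test code (which runs in the browser); it is a rough Python approximation, and the names EXPECTED, normalize, and is_altered are hypothetical. It also simplifies the Firefox case by ignoring any attribute on the html tag, where a real detector would probably allow only the specific known value.

```python
import re

# The HTML as the server sends it, including the trailing newline.
EXPECTED = "<html><head></head><body>Cat <a href='/'>Link</a> computer</body></html>\n"

def normalize(html):
    """Undo the harmless browser rewrites listed above (approximation)."""
    html = html.strip()                               # final newline removed
    html = html.replace("\n</body>", "</body>")       # newline inserted before </body>
    html = re.sub(r"<html\s[^>]*>", "<html>", html)   # attributes added to the html tag
    html = html.replace('"', "'")                     # double quotes vs. single quotes
    return html

def is_altered(observed):
    """Return True if the observed HTML differs beyond the expected rewrites."""
    return normalize(observed) != normalize(EXPECTED)

# What a browser might report after parsing the page (class value is made up).
browser_view = '<html class="example"><head></head><body>Cat <a href="/">Link</a> computer\n</body></html>'

# A page where something wrapped the word "Cat" in an ad link (URL is made up).
injected_view = '<html><head></head><body><a href="http://ads.example/">Cat</a> <a href="/">Link</a> computer\n</body></html>'

harmless = is_altered(browser_view)    # False: only expected rewrites
suspicious = is_altered(injected_view) # True: an extra link was inserted
```

Because the test page is so small, anything left over after normalization stands out immediately.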

Unexpected Alterations Any other change to the HTML content means that something other than the default browser made a change. For example, some anti-virus products insert hooks into web pages. This way, they can intercept web queries in real time. Kaspersky, Avira, Norton, and Dr.Web are all known to alter web content. These alterations should be detected by this simple test. (As far as this new test can tell, Sophos anti-virus doesn't alter anything.)

In my test code, I flag Avira, Norton, and Dr.Web as harmless alterations. (This is not a product endorsement.) In contrast, I flag Kaspersky Anti-virus as a problem based on security concerns voiced by the US and UK governments. While it is an area of debate whether Kaspersky's software is scanning for sensitive information, I'd rather not risk it -- especially when there are plenty of free alternatives that do not have the same concerns.

Any other observed changes will be flagged as problematic; you should investigate the cause. It could be something simple, like another anti-virus system that I haven't identified, a corporate proxy that scans for malware and alters URLs, or some desirable plugin, add-on, or extension. However, it could be something else.

The current test keeps everything in your browser; it does not report back to me. However, if Test #3 flags the content as an unexpected alteration (you'll know because of the red skull and crossbones), please consider sharing the alteration with me. If it's something common or expected from some known software, then I'll update the detector so it won't flag it as problematic. And if it's a malicious alteration, I'd like to take a look -- even if I can't help you fix the problem.