Putting People in Their Place With Metadata - HCCI-214 Session 4

Previously in this course, we've discussed how CEOs end up sharing their private email contacts publicly, how private company files end up indexed by Google, and how mistakes in web server configuration can give crooks a shopping list of vulnerable data within your organization.  These days, though, it's nearly impossible to conduct business without sharing some data.  Every business needs a website, and such a website will typically offer whitepapers, case studies, press packets, and more to the public.  A problem arises, however, when software embeds semi-private information about a document's author within each document.  That information is known as "metadata."  Metadata, in the wrong hands, can be reconstituted into usable intelligence about a target organization.  In this exercise we'll review examples of this, some more successful than others.

How Metadata is Added to a File

Whenever a file is written, your operating system records when it was created and when it was last modified.  The file system stores those timestamps alongside the file, and this applies to all files.  Many applications go further and embed additional details, such as the author's name and the software used, inside the file itself, where they travel with every copy.  Hidden information about the circumstances under which a file was created, edited, accessed, and more is called metadata.  Professors love metadata because when a student insists their paper was on time even though the email didn't go through, the professor can point to the metadata and ask why the paper was edited 2 hours after the deadline.  Bosses love metadata because it shows which employee "accidentally" deleted the printout counts from the networked printer, a deletion that made it impossible to hold employees accountable for their supply usage.  Prosecutors love metadata because it can show who logged into which device and did what at what time.  Metadata tells anyone who cares to look which cellphone you used to take your photo, which version of Adobe Photoshop you used to edit it (and potentially whether or not it was pirated), and so much more.  Metadata is the friend of the truth and the enemy of deceit and obfuscation.  It's also the tool we'll be using to map out the interior structure of a government agency in this exercise.
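To make the timestamp idea concrete, here is a minimal Python sketch that asks the operating system for the times it tracks for a file.  The function name file_times is invented for this illustration.  It reads only the modified and accessed times, because the meaning of the third common timestamp (creation time versus last status change) differs between operating systems.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path):
    """Return the modified and accessed times the OS tracks for path, in UTC."""
    info = os.stat(path)
    return {
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Create a throwaway file so the example runs on any machine.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as handle:
        handle.write(b"term paper draft")
        demo_path = handle.name
    for label, stamp in file_times(demo_path).items():
        print(label, stamp.isoformat())
    os.remove(demo_path)
```

Note that these values come from the file system, so they can change when a file is copied or downloaded.  Metadata embedded inside a document by the application that wrote it is a separate matter.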

Example 1: Reading the US State Department's Metadata

[Figure: Metadata for a US State Department file]

Government organizations are great for security demonstrations because by law, most information about them is already within the public record.  Things like job title, email address, telephone, and salary are public record.  So though these organizations don't always make it easy to contact them, no one is getting in trouble for posting the Department of Environmental Protection's employee directory.  So, that concludes our disclaimer!

For the sake of example, consider: https://hiu.state.gov/Products/SyriaIraq_YearInReview2014_2015Jan29_HIU_U1173.pdf  -- This is the February 2015 "Syria Iraq Conflict 2014 Year in Review" document from the US State Department.  It's a beautiful map, by the way.  The document is unclassified and has been posted online as a PDF for our perusal.  Who really wrote it, though?  How long did they spend on it?  What software did they use?  What else do we know about the author?

  1. Download the PDF to a file folder on your PC.
  2. Right-click the file and open it in a PDF viewer such as Adobe Reader.  Double-clicking may open the PDF in a browser plugin instead, but Adobe Reader is better for our purposes.
  3. From Adobe Reader, click the File menu and select Properties from the drop-down.  A Properties window will pop up, listing File, Title, Author, Subject, Keywords, Created, Modified, Application, and more.

Often, this will be enough to show us who made the document and when.  If you see something along the lines of "jsmith" in the Author field, you can make a pretty good guess that John or Jane Smith wrote the document.  If you wanted to email jsmith at whatever the organization's email domain is (e.g. jsmith@state.gov), you'd have a fair shot at reaching them.
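The same fields can sometimes be read without a viewer at all.  What follows is a deliberately naive sketch using only Python's standard library: it scans raw PDF bytes for literal-string entries such as /Author.  The function name scan_pdf_info and the sample bytes are invented for this illustration.  It assumes the information dictionary is stored unencrypted, uncompressed, and as plain literal strings, which is not true of every PDF, so a dedicated PDF library is the more robust choice in practice.

```python
import re

# Keys commonly found in a PDF document information dictionary.
INFO_KEYS = (
    b"Title", b"Author", b"Subject", b"Keywords",
    b"Creator", b"Producer", b"CreationDate", b"ModDate",
)

# Matches "/Key (value)" where value is a literal string without nested parentheses.
_PATTERN = re.compile(
    rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(((?:\\.|[^\\()])*)\)"
)


def scan_pdf_info(raw):
    """Return the first literal-string value found for each known key in raw PDF bytes."""
    found = {}
    for key, value in _PATTERN.findall(raw):
        # Keep the first occurrence only; escape sequences are left as-is.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found


if __name__ == "__main__":
    # Hypothetical fragment, not taken from any real document.
    sample = b"<< /Title (Year in Review) /Author (jsmith) /Creator (Example Layout App) >>"
    print(scan_pdf_info(sample))
```

To try it on a downloaded file, read the file in binary mode and pass the bytes to scan_pdf_info.  An empty result does not prove the document is clean; it may only mean the metadata is stored in a form this sketch cannot see.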

The State Department is pretty good at security, which is great for us.  It has populated the Author field with the name of the department which published the file, and it clearly made a point of creating a new copy of the file and then publishing it in short order (the Created and Modified fields are very close in time).  This effectively robs any miscreants of the ability to easily acquire data about that organization.  Adobe Reader, however, is only one tool for gathering metadata.  Other tools, such as Adobe Photoshop and the free ExifTool, can view metadata as well.  In most cases, these tools would reveal additional information about the author, but because the State Department not only encrypted the document but also used multiple layers of password protection, these tools came up empty too.  Again, it's good news when the government has great security.
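The gap between Created and Modified can be computed rather than eyeballed.  PDF files store these values as strings of the form D:YYYYMMDDHHmmSS followed by an optional time zone offset.  The sketch below parses that form and returns the difference.  The function names and the sample values are invented for this illustration, and treating a date with no offset as UTC is an assumption of the sketch, not a rule of the format.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSS followed by an optional Z, or +HH'mm' / -HH'mm'.
_PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})"
    r"(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?"
)


def parse_pdf_date(text):
    """Parse a full-length PDF date string into a timezone-aware datetime."""
    match = _PDF_DATE.match(text)
    if match is None:
        raise ValueError(f"unsupported PDF date: {text!r}")
    groups = match.groups()
    year, month, day, hour, minute, second = (int(g) for g in groups[:6])
    sign, tz_hour, tz_minute = groups[6:]
    offset = timedelta(0)  # Assumption: no offset (or Z) is treated as UTC.
    if sign in ("+", "-"):
        offset = timedelta(hours=int(tz_hour or 0), minutes=int(tz_minute or 0))
        if sign == "-":
            offset = -offset
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))


def edit_window(created, modified):
    """Return how long elapsed between the Created and Modified values."""
    return parse_pdf_date(modified) - parse_pdf_date(created)


if __name__ == "__main__":
    # Hypothetical values; a short window suggests a fresh copy made for publishing.
    print(edit_window("D:20150129101500-05'00'", "D:20150129102000-05'00'"))
```

A window of a few minutes is consistent with a clean copy exported just before publication.  A window of weeks suggests the published file is the working file, edit history and all.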

Example 2: Audubon Society Caught by Metadata

Those engaging in corporate intelligence gathering will find that corporations seldom have such robust security.  For example, a political watchdog group in New Jersey was able to demonstrate in 2014 that the New Jersey Audubon Society had used donor dollars to hire an outside consultant tasked with training employees in fundraising tactics that were less than ethical.  This information came to light because a PowerPoint presentation left on a public server listed the consultant's name as author within its metadata, complete with the dates that work was started and completed.
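For modern Office files (.pptx, .docx, .xlsx), the author and date fields usually sit in a small XML file inside what is really a ZIP archive.  The sketch below reads that file with Python's standard library.  The function name read_core_properties is invented for this illustration, and the sketch assumes the properties are stored at the conventional docProps/core.xml location.  It does not handle the older binary .ppt format, and the source does not say which format the Audubon presentation used.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the Office core properties part.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly label -> element to look up.
FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return author and date fields from an Office file given a path or file object."""
    with zipfile.ZipFile(source) as package:
        # Assumption: the core properties live at the conventional location.
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        node = root.find(tag, NAMESPACES)
        if node is not None and node.text:
            result[label] = node.text
    return result
```

Calling read_core_properties with the path to a presentation returns a small dictionary.  The author and last_modified_by entries are the ones most likely to name an individual.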

Example 3: Metadata and Combination Attacks

As a final example, New Jersey and a number of other states implemented a new primary school standardized testing program in 2015 called "PARCC."  The rigorous test took up a great deal of time previously spent teaching and thus became the subject of much controversy and protest.  Fortunately, the PARCC consortium, working with test developer Pearson and various state governments, had developed a series of educational PDFs intended to alleviate concerns among parents and educators alike.  Unfortunately for PARCC, protesters seeking a link between special interest groups and the government found that metadata made available through the PARCC marketing materials made their job a lot easier.  Employee names left in the Author field of various PARCC files permitted protesters to map out the organization's hierarchy in a way that otherwise might not have been possible.  PARCC was also subject to combination intelligence efforts: armed with the names of PARCC employees who were never intended to be public facing, protesters gained the ability to target individuals within the organization directly.
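The mapping step is mostly bookkeeping.  Once Author values have been collected from a set of published files, grouping the files by author shows who produced what.  The sketch below is a minimal version of that grouping, useful for seeing what your own public documents give away.  The function name and the sample records are invented for this illustration.

```python
from collections import defaultdict


def group_by_author(records):
    """Group (filename, author) pairs by a normalized author name."""
    grouped = defaultdict(list)
    for filename, author in records:
        # Collapse whitespace and ignore case so "J Smith " and "j smith" match.
        name = " ".join(author.split()).casefold()
        if name:
            grouped[name].append(filename)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical records, as might be collected from a public website.
    records = [
        ("parent-guide.pdf", "J Smith"),
        ("teacher-faq.pdf", "j smith "),
        ("overview.pdf", "Communications Office"),
        ("blank.pdf", ""),
    ]
    for author, files in group_by_author(records).items():
        print(author, files)
```

Files with an empty Author field drop out of the result, which is exactly the outcome a publisher wants.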

Using nothing more than the first and last names of employees (often hourly) who draft documents for a target company, it is possible to:

  1. Perform a targeted Directory Harvest Attack (DHA) against the company's email servers, resulting in the leak of a complete employee email directory.
  2. Cross-reference employee names with public records and social networking.
  3. Contact those employees directly, for harassment, employment poaching, or even with offers of pay in exchange for information leaks.
  4. Take up valuable work time by targeting those employees with contact via email or, in more political cases, petitions.

Employees singled out and approached as a result of this sort of competitive intelligence are more likely to:

  1. Feel attacked and threatened, both personally and professionally.
  2. Make damaging and mistaken statements about their organization's policies.
  3. Respond to attacks / queries and inadvertently reveal confidential organizational information.

How to Limit Metadata

While the gains from metadata competitive intelligence efforts can often be modest in the near term, the variety of threats posed by such disclosures in the hands of a seasoned professional is nearly limitless.

It is extremely challenging as an organization to remove metadata from each and every document produced.  One method to limit exposure is to register purchased software to the organization or department, rather than to employees.  Another method is to ensure that PCs are configured not to use the name of the employee as the name of their documents folder.  Software packages tend to automatically populate the author field from those places, so they are a good place to start with security.
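One way to check whether those measures are working is to audit Author values before documents are published.  The sketch below flags any document whose Author field is something other than an approved organizational name.  The function name, the approved list, and the sample records are all invented for this illustration; how the Author values are collected is left to whatever tool your organization already uses.

```python
def flag_personal_authors(records, approved):
    """Return (filename, author) pairs whose author is not an approved name."""
    allowed = {name.strip().casefold() for name in approved}
    flagged = []
    for filename, author in records:
        cleaned = author.strip()
        # An empty Author field reveals nothing, so it is not flagged.
        if cleaned and cleaned.casefold() not in allowed:
            flagged.append((filename, cleaned))
    return flagged


if __name__ == "__main__":
    # Hypothetical policy: only these names may appear in the Author field.
    approved = ["Example Health System", "Marketing Department"]
    records = [
        ("whitepaper.pdf", "Marketing Department"),
        ("case-study.pdf", "jsmith"),
        ("press-packet.pdf", ""),
    ]
    for filename, author in flag_personal_authors(records, approved):
        print(f"{filename}: Author field contains {author!r}")
```

A check like this fits naturally as the last step before a file is uploaded, since it catches the documents that slipped past the software registration and folder-name settings.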

It is possible, though not always practical, to add false trails to metadata which limit its competitive intelligence use.  Such efforts are sometimes known as "honeypotting," the term for any security hole created specifically to gain information about an attacker, or "poisoning the well" (mixing erroneous data in with usable data).  It's important that any such efforts not detract from the professionalism of the organization.  No prospective investor wants to hear that they've been mistakenly emailing a dummy account which exists as a security feature.  That scenario can occur with "catch-all" email servers, which falsely report every address an attacker tests as legitimate: mail to any address is accepted, and well-meaning corporations simply fail to keep up with the occasional valid email (with a misspelled recipient) which ends up in the spam folder.

Finally, there are many solutions for removing metadata or preventing it from being attached to a file to begin with, but they are outside of the scope of this article.


The entirety of Christopher Lotito's Health Care CFO Competitive Intelligence Master Class can be found online here: http://www.christopherlotito.org/search/label/HCCI-215%20Course -- The self-paced course runs through late March 2015 and the content will remain online after.

Contact the author on LinkedIn or via the comment form.  http://www.linkedin.com/in/christopherlotito
