Enabling security through effective interface design

Kudos to the Mozilla Firefox team. I upgraded to Firefox 3 today, and shortly thereafter went to Travelocity to schedule a trip. To my great pleasure, I noticed that SSL certificate information is now displayed in the URL bar, with a green background to indicate the site is trusted.

This information has always been available to users, but how to access it - or even that one should - was never intuitively obvious. The little lock showed up, so everything was encrypted, meaning I was fine, right? With this interface, you not only clearly see that the certificate is valid, but also who it was issued to. That used to require a bit of clicking around - something few were willing to do. Admit it, how often did you check?

Not only that, but the most important details appear at the click of a button - not in a separate window, but as a pop-out. Of course, the complete details are still available as well.

This is precisely how the industry can empower users to act securely and make the right decisions without a second thought. More integration of security features into interface design is exactly what we need, and I'm glad to see the Mozilla team start to walk that path.


Reducing malware analysis with code comparison techniques

This is another topic that I file under "someone must have certainly done this already"...

We're struggling with the influx of custom malware, which has exploded since 2006. The skills necessary to reverse engineer code are hard to find, and expensive when they surface. As a result, bandwidth is always limited for an organization faced with the need to understand the inner workings of malware to assess the damage, scope, and impact of a system compromised by custom code.

There have been a few discussions within my team recently about how these valuable skills can be focused. For years we've worked to reduce the set of malware that necessitates deep analysis by identifying techniques that enable us to make inferences about the unknown code by comparing it to similar known code, or making assumptions based on its context. Discussion has heated up on this topic of late, especially since a colleague began using an intriguing, if unproven, statistical technique to group malware.

The first question that should come to the reader's mind is, "haven't the anti-virus companies already solved this problem?" They should have. But we've seen first-hand that if they know how to solve this problem, it is either ineffectively implemented or not implemented at all in their code. I could tell stories, but that's not the point of this entry.

The technique that keeps coming to mind as promising is to represent a program's control flow as a graph, then search other code flow graphs for isomorphisms to identify identical or similar executables. Identifying complete isomorphisms between graphs is a well-studied problem. For one such example, this paper discusses its utility with VLSI hardware, comparing circuit diagrams to chip layouts. It stands to reason that a similar technique could be applied to what I'll call the identical software flow problem.
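To make the idea concrete, here is a minimal, brute-force sketch in Python: basic blocks become nodes, branches become directed edges, and two flow graphs "match" if some relabeling of one graph's nodes maps its edge set exactly onto the other's. The graphs and block names below are invented for illustration, and real tools would use far smarter matching algorithms than this factorial-time search.

```python
# Toy "identical software flow" check: two control flow graphs are
# isomorphic if some node relabeling maps one edge set onto the other.
# Brute force is fine for toy graphs but explodes factorially with size.
from itertools import permutations

def is_isomorphic(edges_a, edges_b):
    nodes_a = sorted({n for e in edges_a for n in e})
    nodes_b = sorted({n for e in edges_b for n in e})
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    set_b = set(edges_b)
    # Try every relabeling of A's nodes onto B's nodes
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        if all((mapping[s], mapping[d]) in set_b for s, d in edges_a):
            return True
    return False

# Two samples whose block names differ but whose flow structure is identical
sample_a = [("entry", "check"), ("check", "decode"),
            ("check", "exit"), ("decode", "check")]
sample_b = [("b0", "b1"), ("b1", "b2"), ("b1", "b3"), ("b2", "b1")]
print(is_isomorphic(sample_a, sample_b))  # True
```

Because only structure is compared, renaming, rebasing, or re-ordering the blocks of a sample would not defeat the match - which is exactly the property that makes this attractive against recompiled custom malware.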

Those with an interest in computational complexity theory will find the following both relevant and intriguing: the graph isomorphism problem has not been proven to be NP-complete, nor is it known to be solvable in polynomial time - it is in NP, but possibly in neither of those subclasses. Special thanks to Wikipedia for this link (huge PDF), which discusses solving the graph isomorphism problem efficiently in practice despite the absence of a known polynomial-time algorithm.

The problem of identifying similar pieces of code, which I'll call the software flow similarity problem, is much more involved and, from what I can tell, much less studied. In this case, subsets of the flow control graphs would be compared between pieces of code. Some key questions here are:
  1. How big or complex must the subset be, as compared to the complete flow graph, to be meaningful?
  2. How many matches of graph subsets must be identified to confidently call code segments similar?
This is but one technique, and determining software similarity is likely to involve a number of others - computed, observed, statistical, or what have you. I feel this approach would be a very strong indicator on its own, although it would be far more difficult to implement and study than some other heuristic approaches. I'm going to continue searching for papers that discuss these techniques; it seems hard to believe no one has done this before.

Nerd humor

Thanks to my girlfriend for finding this one...

The image isn't coming out so well in blogger, so if you don't have uber-perfect vision, the original is here.


Introducing Ex-Tip

In this post, I'd like to introduce a tool I've been working on called Ex-Tip. Begun as a GCFA Gold practical and developed in Perl, the code is very immature at this point. I intend to develop it through a SourceForge site I've registered for that purpose, although I haven't yet uploaded the code. I'll communicate updates through this blog.

Full disclosure: I do not consider myself to be a developer. The version 0.1 implementation was designed as a proof-of-concept to demonstrate the utility of an easily-extensible, multiple input-output timeline generation tool. It was not designed with memory or computational efficiency in mind, and it has many limitations that can be addressed through further development. Of course, I welcome any feedback or offers of help.

Here is the introduction section of the paper that this code was meant to accompany:

Tools exist to construct timelines based on modify, access, and create times of files on various filesystems to aid in forensic investigations. Sleuthkit's mactime in concert with fls or mac-robber is a common example. However, in most investigations, the timeline needs of the forensic analyst have become far broader than simple file activity. Investigations often necessitate a step-by-step recreation of events, pulling time data associated with Windows registry entries, anti-virus logs, intrusion detection systems, and any other data available to supplement filesystem activity. At times, both in the lab and in the field, investigators find new time-stamped data that warrants inclusion in a timeline, such as custom application logs. As the digital forensics field matures, the list of critical data available grows longer, as does the number of timeline visualization tools available for data presentation. Adding to the complexity, the nature of these data sources is dynamic as software versions change.

All of this considered, one can see that a gap has emerged between the timeline data needed by analysts and the flexible, portable tools available to easily consume this data - aggregation, normalization, and visualization, to be specific. This paper describes an extensible framework to achieve these ends, with plug-ins provided for common timeline data sources and output formats as proof-of-concept.
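The aggregation-and-normalization idea the excerpt describes could be sketched as follows. Ex-Tip itself is Perl and its real plug-in interface differs; this is only a Python illustration of the concept, and both input formats below ("epoch|path" filesystem records and "ISO-8601 message" anti-virus lines) are invented for the example.

```python
# Sketch of a plug-in timeline framework: every input plug-in emits
# normalized (epoch, source, description) records, and the core does
# nothing but aggregate and sort them into one timeline.
from datetime import datetime, timezone

def fls_plugin(lines):
    """Hypothetical 'epoch|path' body-file style input."""
    for line in lines:
        epoch, path = line.split("|", 1)
        yield (int(epoch), "filesystem", path)

def av_log_plugin(lines):
    """Hypothetical 'ISO-8601 message' anti-virus log input."""
    for line in lines:
        stamp, detail = line.split(" ", 1)
        dt = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
        yield (int(dt.timestamp()), "antivirus", detail)

def build_timeline(*streams):
    """Aggregate normalized records from all plug-ins, sorted by time."""
    return sorted(event for stream in streams for event in stream)

timeline = build_timeline(
    fls_plugin(["1213777500|C:/WINDOWS/system32/evil.dll"]),
    av_log_plugin(["2008-06-18T09:15:00 Trojan.Agent quarantined"]),
)
for epoch, source, detail in timeline:
    print(epoch, source, detail)
```

Adding a new data source - a custom application log, say - then means writing one small generator that yields normalized records, with no change to the core; output formats could plug in the same way on the far side of the sort.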

Image courtesy http://www.timemiser.com/