This page describes our review of existing formats and explains why we do not use any of them. It summarizes the benefits and drawbacks of each format, together with our subjective remarks.
The Intrusion Detection Message Exchange Format (IDMEF) is a format created specifically for exchanging information about security events between detection probes.
It is based on and very tightly coupled with XML, and its structure makes heavy use of XML paradigms.
It has a rigid structure, although some fields are dynamic (for example, it can represent multiple sources or destinations of an attack). The structure is also very deep and wordy: to address the IP address of an attack source, we have to use the locator “Alert.Source.Node.Address.address”. Some fields are also recursive, so the depth of the structure is not limited and can be arbitrary.
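To illustrate what such a deep locator means for a consumer, here is a minimal Python sketch that resolves a dotted locator against a nested structure with repeated elements. The dictionary layout and the helper name `resolve` are assumptions made for illustration; they are not part of IDMEF or of any IDMEF library.

```python
def resolve(message, locator):
    """Walk a nested dict/list structure along a dotted locator.

    Repeated elements (e.g. several Source entries) are modelled as
    lists and expanded, so the result is a list of all matching values.
    """
    nodes = [message]
    for key in locator.split("."):
        next_nodes = []
        for node in nodes:
            # A repeated element is a list of dicts; a single one is a dict.
            children = node if isinstance(node, list) else [node]
            for child in children:
                if isinstance(child, dict) and key in child:
                    next_nodes.append(child[key])
        nodes = next_nodes
    # Flatten a final level of lists so callers get plain values.
    flat = []
    for node in nodes:
        flat.extend(node if isinstance(node, list) else [node])
    return flat


# Hypothetical message with two attack sources (documentation addresses).
alert = {
    "Alert": {
        "Source": [
            {"Node": {"Address": {"address": "192.0.2.10"}}},
            {"Node": {"Address": {"address": "198.51.100.7"}}},
        ]
    }
}

addresses = resolve(alert, "Alert.Source.Node.Address.address")
print(addresses)
```

Even in this simplified model, five levels of nesting have to be traversed to reach a single IP address, and every level may be repeated.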
IDMEF has limited means to allow extensibility:
The specification often mentions subclassing and aggregation; however, these are reserved for the specification authors and (possibly) for future official versions of IDMEF.
The format is often redundant: timestamps are represented both in a machine-readable format and in a human-readable form, which creates ambiguity in case of an error (which representation should be authoritative?).
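The ambiguity can be shown with a small consistency check. The following Python sketch assumes that the machine-readable form is an NTP timestamp in hexadecimal "seconds.fraction" notation and that the human-readable form is an ISO-like string without fractional seconds. The function names and the one-second tolerance are hypothetical.

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_OFFSET = 2208988800


def ntpstamp_to_datetime(ntpstamp):
    """Convert a stamp such as '0xbc71f4f5.0xef449129' to a UTC datetime."""
    seconds_hex, fraction_hex = ntpstamp.split(".")
    seconds = int(seconds_hex, 16) - NTP_UNIX_OFFSET
    fraction = int(fraction_hex, 16) / 2**32
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=int(fraction * 1_000_000)
    )


def timestamps_agree(ntpstamp, human, tolerance_seconds=1.0):
    """Return True if both representations describe the same moment."""
    machine = ntpstamp_to_datetime(ntpstamp)
    readable = datetime.strptime(human, "%Y-%m-%dT%H:%M:%S%z")
    return abs((machine - readable).total_seconds()) <= tolerance_seconds


stamp = "0xbc71f4f5.0xef449129"
print(timestamps_agree(stamp, "2000-03-09T10:01:25Z"))  # consistent pair
print(timestamps_agree(stamp, "2000-03-09T11:01:25Z"))  # one hour off
```

When the check fails, the receiver still has no rule telling it which of the two values to trust; the sketch can only detect the disagreement.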
The format is also sometimes inconsistent: it supports a long list of historic (or even obsolete) network protocols in one field, yet URL and SNMP are classes of their own.
The incident type is free text, but there are specific classes for buffer overflow, correlation and tool, and for nothing else.
The format can be validated against a thoroughly defined schema (which is a good thing); however, the schema itself is not enough, because some values must be validated separately (for example timestamps and IP addresses).
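As a sketch of the extra validation step, the following Python helpers check values that a schema would typically accept as plain strings. The helper names and the chosen timestamp layout are assumptions for illustration, not part of IDMEF tooling.

```python
import ipaddress
from datetime import datetime


def is_valid_ip(text):
    """Accept only syntactically valid IPv4 or IPv6 addresses."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def is_valid_timestamp(text):
    """Accept only real calendar moments in the assumed layout."""
    try:
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")
        return True
    except ValueError:
        return False


# Both of the rejected values below are well-formed strings,
# so a purely structural schema check would let them through.
print(is_valid_ip("192.0.2.10"), is_valid_ip("999.1.1.1"))
print(is_valid_timestamp("2000-03-09T10:01:25Z"),
      is_valid_timestamp("2000-02-30T10:01:25Z"))
```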
IDMEF is in fact the basis for our design; it is the format able to describe the widest range of incident types. Its basic problems are verbosity, depth and the attempt to describe “everything”. In our experience, several of its means for grouping and structuring various types of information are never used in real-life scenarios. Another drawback is the need for complex libraries: lightweight detection probes are usually not able to generate IDMEF messages directly and need some kind of intermediate format and translator.
X-ARF (Network Abuse Reporting) is a format aimed at the exchange of incident reports by email between security teams. It is not intended for exchange between machines; however, we have included it in the analysis because security event messages often leave their perimeter and get exchanged between various groups of security specialists.
The basis of the event description is a set of keys and values, enclosed in a MIME/DSN-encoded container along with a human-readable description and optional attachments. The structure is rigid and also static: every event can describe only one source and one destination.
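The general shape of such a container can be sketched with the Python standard library. This is only an approximation: it builds a plain multipart message, leaves out the DSN specifics, and uses made-up key names and a made-up attachment name (`report.txt`) rather than anything taken from the X-ARF schemas.

```python
from email.message import EmailMessage


def build_report(human_text, fields):
    """Build a message with a human-readable part and a key/value part."""
    msg = EmailMessage()
    msg.set_content(human_text)
    machine_part = "".join(f"{key}: {value}\n" for key, value in fields.items())
    # Adding an attachment turns the message into multipart/mixed.
    msg.add_attachment(machine_part, filename="report.txt")
    return msg


def parse_fields(msg):
    """Extract the key/value pairs from the machine-readable part."""
    for part in msg.iter_attachments():
        if part.get_filename() == "report.txt":
            fields = {}
            for line in part.get_content().splitlines():
                if ":" in line:
                    # Split on the first colon only, values may contain colons.
                    key, value = line.split(":", 1)
                    fields[key.strip()] = value.strip()
            return fields
    return {}


# Illustrative keys only: exactly one source and one destination fit.
fields = {"report-type": "login-attack",
          "source": "192.0.2.10",
          "destination": "198.51.100.7"}
msg = build_report("We observed repeated failed logins.", fields)
print(parse_fields(msg))
```

Because the keys form a flat mapping, a second source or destination has no natural place in the message, which matches the limitation described above.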
Different types of events have specific schemas: a small set of values is common to all types, but every type also has its own set of distinct descriptors. So far only four schemas (four types of security incidents) are defined.
This format also has some reusable ideas:
The Incident Object Description Exchange Format (IODEF) is also meant for the exchange of incident reports between CSIRT teams. This is evident, for example, from the ability to communicate a recommended or required action (the Action element).
IODEF is able to describe a whole case and its timeframe, not only a single event: each decision or communication action can be documented.
Like IDMEF, it has limited means of extension, usable only in several explicitly defined places.
The specification also suggests that some kind of interoperability with IDMEF is possible. According to our observation, there are only two ways: IDMEF can be recomposed into IODEF, or it can be inserted as is into an IncidentAlert wrapper node. The problems with the latter approach are evident: analysis tools must be able to decipher the complexity of both formats.
There is a lot of freedom in the representation of the Assessment field; types and semantics depend on attributes. A sender can choose one representation and keep using it, but a receiver has to know which one is used or, in the worse case, be able to interpret all of them.
Interesting ideas to reuse:
This is the format used by the Warden system for the exchange of simple events from security probes (honeypots, SSH logs, netflow probes, etc.).
It could be described as an analogue of netflow for security incidents. Warden messages are rigid, static messages which (similarly to X-ARF) cannot describe multiple sources or destinations. Also, information about the target of an attack is incomplete by design (users usually do not want to reveal the addresses of their honeypot systems); however, that complicates more general usage, because for incidents against production infrastructure this information may be vital for analysis.
There is no extensibility, just a free-text field for additional data, mostly meant for human inspection.
The format is used as a SOAP-serialized HTTP RPC call; the advantage is simple implementation, the disadvantage is the necessary complexity of marshalling and RPC libraries.
Incident timeframe information is incomplete: bulk incidents can be aggregated, but the format does not specify whether the timestamp describes the start of the aggregation window, its end, or the moment of aggregation.
A specific feature of this format is the tagged description of detection probes: a receiver can deduce probe capabilities without exact knowledge of the detection software. However, the reader must explicitly request this information via another RPC call. The code is under the open-source/libre 3-clause BSD license.
AbuseHelper is a project for automated processing of incident notifications. Its architecture is designed as a network of “agents”, which receive, gather, modify and process messages. Agents communicate over the XMPP (Jabber) protocol, and there are a number of agents for a multitude of existing external sources.
The messages are free-form sets of keys and values, which do not specify any semantics. One key can contain one value or a set of unique values (so it has limited support for more than one source or destination).
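The data model described above can be approximated by a mapping from keys to sets of values. The following Python sketch is our own illustration; the class name `Event` and its methods are hypothetical and do not reproduce the AbuseHelper API.

```python
from collections import defaultdict


class Event:
    """A free-form event: every key maps to a set of unique values."""

    def __init__(self):
        self._data = defaultdict(set)

    def add(self, key, value):
        # Adding the same value twice has no effect, values stay unique.
        self._data[key].add(value)

    def values(self, key):
        # Unknown keys yield an empty list and are not created as a side effect.
        return sorted(self._data.get(key, ()))


event = Event()
event.add("ip", "192.0.2.10")
event.add("ip", "192.0.2.10")    # duplicate, ignored
event.add("ip", "198.51.100.7")  # second source in the same event
print(event.values("ip"))
```

Multiple sources fit into one event this way, but nothing ties a particular "ip" value to a particular "asn" or "country" value, which is why the support for multiple sources is only limited.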
The code is under the open-source/libre MIT license and is well covered by unit tests, but practically without documentation. Further documentation is apparently available on a closed site of ClarifiedNetworks.
By inspecting the code in the “contrib” directory we can obtain a (fundamentally incomplete) list of keys:
abuse_email, as_description, asn, as_name, av detection, av result, bgpranking, collab url, count, country, country_code, description, description url, download, dummy counter, expert, feed, feed url, geoip cc, gwikipagename, host, id, id:close, id:open, ip, last seen, latitude, longitude, md5, netblock, observation time, sandbox analysis, sbl id, sha1, source, source time, target, type, updated, url, virustotal, webshot
This freedom in key assignment can cause inconsistencies in processing; semantics must be guessed from the code of the generating agents or by best-effort heuristics.
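A best-effort heuristic of this kind might look like the following Python sketch. The normalization rule and the alias table are our assumptions; in particular, treating "geoip cc" and "country_code" as equivalent is a guess made only for illustration.

```python
import re

# Assumed equivalence between keys, for illustration only.
ALIASES = {"geoip_cc": "country_code"}


def normalize_key(key):
    """Unify case and separators, then apply the alias table."""
    canonical = re.sub(r"[\s\-]+", "_", key.strip().lower())
    return ALIASES.get(canonical, canonical)


def normalize_event(event):
    """Merge the value sets of keys that normalize to the same name."""
    normalized = {}
    for key, values in event.items():
        normalized.setdefault(normalize_key(key), set()).update(values)
    return normalized


raw = {"geoip cc": {"CZ"}, "country_code": {"CZ", "DE"}, "AV Detection": {"yes"}}
print(normalize_event(raw))
```

Such a table has to be maintained by hand and can silently merge keys that only look alike, which is exactly the risk of a format without defined semantics.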
An important note is that none of the mentioned formats directly supports anonymization.