TB for incident response investigations?

Hello!

I just discovered TB through the Winter Festival offer, and from what I understand of it, I feel it could help me do my job better. But before committing to learning it, I would like to make sure it is, indeed, suitable for what I have in mind.

I work on the investigation side of computer incident response. I must quickly ingest a large quantity of information related to IT systems, individuals and timestamps, in a very fluid context (new information coming in non-stop that can change the current understanding of the events at any point) and ultimately make sense of it.

By the end of my engagement, the goal is to know how an intruder got in, what path they took across the IT infrastructure, and exactly what they did / planted / exfiltrated.

Visually, this is usually shown as a graph of various connected IT systems, where we follow a trail of many hijacked user or service accounts and connections from/to other systems, with the attacker switching accounts regularly, thus making the timing element crucial (so we have both a graph and a timeline).

This is very intense, and I usually find myself writing down in a hurry tons of IP addresses, machine names, user names, user account names, filenames, and various timestamps (file modification, system access, etc.). I am piecing this information together from both a large OneNote document coming from the forensic team and from my own interviews on-site.

And as you can probably guess by now, there are a lot of “I know I’ve seen this user account/IP address/user name somewhere…” moments, or “what system(s) does this account belong to, again?”.

OneNote, Excel and other tools have so far not helped me make interesting relationships “emerge” as new data keeps pouring in. It is still a very manual and intuitive process.

That’s where I feel TB could help.

I’m mainly interested in two aspects of what I have understood of how TB works:

  • Seeing new connections emerge as I enter new data (I might, for example, get a machine name + its IP address from a suspicious behavior. I don’t have time to investigate it fully at the moment. The next day, I get from another source several IP addresses, with no machine names attached. One of them corresponds to the previous machine. I would like to see the link between the two pieces of information emerge)

  • Having a useful timeline self-construct. Being able to see that this machine cannot have been compromised from this other machine, because the attacker accessed it before the other one. Or better yet, imagine we have seen a specific user account trying to connect to a specific machine at some specific date, in a way that makes us take notice. We can’t investigate right now because there are much more obvious trails to follow at the moment, but I make a note about it. Later, we discover another machine that was clearly compromised, and it turns out that this is where the intruder gained access to this account, and from that point on started using it to connect to other machines. If this new event happened before the previous one, then the initial machine becomes much more interesting for our investigation. This is the kind of emerging insight I’m after.
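The cross-referencing behaviour described in these two points can be sketched in plain Python, just to illustrate the logic. (All machine names, IPs and timestamps below are made up, and Tinderbox itself would express this with agents and attributes rather than code; this is only a sketch of the matching idea.)

```python
from datetime import datetime

# Hypothetical observations, as they might be jotted down during an
# engagement. Each is a loose record; fields are filled in only when known.
observations = [
    {"machine": "SRV-FILE01", "ip": "10.1.2.178", "seen": "2024-03-01T09:12"},
    {"ip": "10.1.2.178", "note": "outbound beacon", "seen": "2024-03-02T14:03"},
    {"ip": "10.1.7.169", "note": "failed logons", "seen": "2024-03-02T14:40"},
]

# Index observations by IP so a new entry automatically "links up"
# with anything previously recorded about the same address.
by_ip = {}
for obs in observations:
    by_ip.setdefault(obs["ip"], []).append(obs)

# The beacon observation now inherits the machine name recorded earlier
# for the same IP, even though the second source supplied only the address.
linked = by_ip["10.1.2.178"]
machine = next((o["machine"] for o in linked if "machine" in o), None)
print(machine)  # SRV-FILE01

# Timeline: sorting every observation chronologically is the raw material
# for spotting "this cannot have happened from there" contradictions.
events = sorted(observations, key=lambda o: datetime.fromisoformat(o["seen"]))
```

In Tinderbox terms, the dictionary lookup plays the role of an agent query over a user attribute, and the sort is what a timeline view does for you automatically.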

I am very interested in the way TB allows data to be entered unstructured (or using a simple arbitrary text structure to make things easier). In the heat of an engagement, I often only have a couple of minutes back at my laptop to dump whatever new information I’ve gathered from interviews, or to insert new data from the forensic team, before being called up again. So this “pour data in, let agents figure out the connections” approach really works for me.

What are your thoughts? Can TB help with some / all of these use cases?

Thank you for your time!

Best regards,

JB.


I think this could work.

What intrigues me about Tinderbox and your work, really, is not so much facilitating individual investigations — though it ought to help there — but rather gradually forming a “little black book” of information and techniques that could accumulate and grow over months and years.

For example, you might hypothetically note a pair of IP addresses x.x.x.178 and x.x.y.169 that seemed to be working together at Acme, and also notice something a little odd about 169. Six months later, you’re working on a different problem, but there’s x.x.y.169 again! And — now that you know to look for it — there’s the same oddity.

It’s still going to be a manual and intuitive process: that’s why we need investigators! The easy parts that can be automated and don’t require intuition are built into the hardware. But I do believe that writing things down, keeping your hands on the data, and avoiding premature formalization will eventually lead to real insights.

Thanks for your answer! Indeed, I was only focusing on my own tasks down in the trenches, and so I had imagined keeping separate TB documents for each engagement (besides, many IPs are internal, and therefore unique to each engagement). But the prospect of finding commonalities between public IPs across different engagements — even at my lower level — is indeed interesting: it could produce useful intel down the road that complements what’s already out there, and that intel might even reach me faster if it is already in my TB notes, rather than relying on an analyst to point it out for me.

Of course, there are already solutions that help analysts manage such Indicators of Compromise (IoCs), but they are larger, team-oriented products (and sometimes quite expensive!).

I guess I’ll take advantage of the Winter Festival and get a TB licence, even though it’ll probably be a while before I can really focus and use it on the ground! But I’m very intrigued by the prospect (I even tried to use Gephi at some point, without much success, because it was not suited to very quick and unstructured entry).

An unknown here is scale: web/net issues scale very big very quickly. Tinderbox is a desktop app, so concerns about scale matter. However, these can be mitigated, and the effort is worthwhile: where I’ve found Tinderbox pre-eminent is its facility in helping one explore an unknown or partly known problem space. What we (think we) know about the problem is often easily overtaken by deeper factors discovered only by exploring structure in the emergent information.

So, will you likely plug Tinderbox into the internet firehose? I hope not, given the likely inbound volume. But using it to figure out structure on sub-sets gives you a force multiplier in your analysis. Tinderbox’s flexible import and, more importantly, output allow you to push larger datasets out to tools/libraries that are better at large-scale visualisations. This probably seems counter-intuitive, as we all want to believe there is a tool we’ve not yet found that just does it all, soup to nuts. Experience suggests otherwise.

Without seeing the amount of data in an incident, it’s hard to guess how much of the task you may do in Tinderbox, but I’d assert it can/will be a key part. A current lazy assumption about data analysis is to just stick everything into a big unstructured database and fire in queries. Tinderbox encourages a slightly different approach: collect basic info about the study objects, then add and formalise metadata on those objects as you tease out their inter-relations. Tinderbox also has strong, robust prototyping, making it easy to make rapid changes to large groups of notes. Good export structures make it easy to export data such as edge/node lists for use in network visualisation tools.
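As a rough illustration of the edge/node export just mentioned, here is a minimal Python sketch that writes CSV files in the column layout Gephi’s spreadsheet importer recognises ("Source"/"Target" for edges, "Id"/"Label" for nodes). The machine and account names are invented, and this is not Tinderbox’s actual export template, just the target format such an export would produce:

```python
import csv

# Hypothetical lateral-movement edges:
# (source machine, target machine, account used, timestamp).
edges = [
    ("WS-042", "SRV-FILE01", "svc_backup", "2024-03-01T09:12"),
    ("SRV-FILE01", "DC-01", "admin_j", "2024-03-02T14:03"),
]

# Gephi's CSV importer recognises "Source" and "Target" column headers;
# any extra columns become edge attributes.
with open("edges.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Source", "Target", "Account", "Timestamp"])
    w.writerows(edges)

# Node list: every machine appearing at either end of an edge.
nodes = sorted({n for s, t, *_ in edges for n in (s, t)})
with open("nodes.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["Id", "Label"])
    w.writerows((n, n) for n in nodes)
```

The same two files could just as well be fed to NetworkX, D3 or sigma.js; the point is that a plain edge/node pair is the lingua franca between a notes tool and the visualisation tools.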

For instance, this is some of my recent PhD work, codified/analysed in Tinderbox before being exported to Gephi for visualisation:

It could as easily have been sent to D3, etc. Here, some citation data analysed in Tinderbox has been exported as edge/node data to Gephi, then exported to sigma.js. Sure, the following is a cruddy vis style, but I’m talking about the validity of the pipeline rather than the exact structure/vis style:

https://www.shoantel.com/proj/acm-ht/2018/index.html

Don’t let any of this put you off doing things inside Tinderbox with maps, but also don’t rush to assume the problem space is too big to address with a desktop app.

Thank you! There will be no scaling issue, as on a typical mid-sized engagement we are talking about a couple hundred items of interest at most (internal IP addresses, very few external IPs, user names, accounts names, machine names), and sometimes much less. The complexity comes from their interactions and the associated timing.

I like what you say about capturing the basic available info at the time, then coming back to it once we understand a bit more and can figure out new potential interactions, adding to the notes and improving the related agents (good thing you mention that TB makes it possible to enrich many notes from their prototype).


Excellent to hear that scale won’t be a factor as that opens up Tinderbox’s views to you a lot more and lessens the need to learn export up front.

Do ask if you’ve further questions. By all means start new thread(s) if the questions are on specific topics as that helps later readers find useful information.

Welcome to the Tinderbox community. 🙂

Thank you! I guess I’ll start with a good tutorial and some small use cases to get going. And this might even open up new ideas, too. I have a lot of ideas about mapping information during security audits, bridging the physical and IT worlds. From what I’ve seen, TB might be the key to that too. Very exciting perspectives 🙂
