
Tracking Threat Actors through YARA Rules and Virus Total – SANS DFIR Summit 2016

December 23, 2019

(light music) (audience clapping)

– Great. Thank you, Jake. You know, you can go back on Twitter and actually watch the collapse of my NetWars experience as I started falling in the standings. (laughs) But welcome, everyone. Welcome back from lunch. My name is Kevin Perlow, and to my right is Allen Swackhamer. We’re just gonna go ahead and jump right in. We have a lot of technical content that we want to cover for you to digest along with your meal.

We’re gonna start with basic YARA rules, and from there we’re gonna build out a framework for implementing those YARA rules in a way that produces actionable intelligence for your SOC. We’re gonna come back to this diagram a couple of times, so if you’re ever lost as to where we are, we’ll bring it back up so you can keep track of what’s going on. The first thing here
is: what is a YARA rule? We’re not gonna spend too much time getting into the intricacies of how you actually write one of these. It’s a little bit too dry for this, but come talk to us afterwards.

A YARA rule is a way to take strings and other artifacts left over from compiling a program and write a rule on them. The idea is that if you have malware samples from the same family or made by the same authors, you can write YARA rules for them and start to group and categorize them and do additional research. We mentioned strings; occasionally there are also indicators of compromise left over in those strings. You can also write rules based on opcodes, hex, or regular expressions, so there are a couple of different ways to write these rules. The one up there on the screen is extremely basic, but it’s just to give you an idea of what this literally looks like.

Now, we do have a couple of examples up here. We’re gonna go through them pretty quickly, but when you review the slides later on, once they’re posted online, feel free to dig into these. This first one here is actually a ransomware sample that we were tracking back in, I think it was, September of last year, and it had some fairly unique strings inside of it. If you look at it, you can see a file password and the executable being built out. After about a month and a half, this disappeared, but it’s an example of how you can track a campaign. This slide here shows a
point-of-sale malware example. It’s a little bit different: this one is hex-based, really opcode-based, so that’s something else you can look at when you write these rules.

This one here I’m gonna call a Vawtrak dropper, but that’s really kind of a misnomer. This is a rule based on a malicious Office document campaign that was dropping Pony in the user’s temp folder, and you can see that in the strings in the box on the right. If you go back and look at the YARA rule to the left there, you’ll see a couple of different things. We’ve got the Office document file header written out. We’ve got the executable written out there. We’ve got a couple of different hex values, one of which tracks whether the document has VBA, so that we’re not just pulling in any old Office document that happens to trigger the rule. And we have some file paths in there and a little bit of whitelisting.

One of the things that we
really like to track are these Office documents, and the next slide has the JavaScript files, I think. These are all from malicious spam campaigns. The reason we want to track those is that you can actually go both forwards and backwards in the malware kill chain. Going forwards, you can see what indicators the file is gonna produce, for example if it’s downloading something. In this case, these Pony files send POST requests to Russian C2 servers, then send a GET request and pull down Vawtrak from a compromised web server. That’s what I mean by going forwards. Going backwards in the kill chain, if you’re using the VirusTotal Intelligence portal, you can actually go and look at which emails contained these attachments. We’ll get to that specific part a little bit later, but that’s what we mean by that.

Clicker. Clicker’s not working. Come on. The next slide, when I get it up here, is… Does anyone have a battery? Ah, there we are. We’re good. Excellent, thank you. So this is a
JavaScript dropper that we wanted to work from. This is actually where we want to start. A malicious JavaScript file tends to be this giant block of text that you see on the left, but you can run it through what’s called a beautifier. That puts it into a more readable structure and gives you a visual idea of what the file looks like. That really helps with categorizing these files and with writing your YARA rules, because over the course of a spam campaign there will be a couple of different variants. What stays consistent is the visual structure and the variables, so you can start writing a rule based on the logic around that.

We’re gonna stick with this JavaScript file as we move forward in the presentation. It came from a Proofpoint article; they did some really good research and found that these JavaScript files were pulling down RockLoader. RockLoader acted as an intermediary file, which then downloaded Locky. Obviously Locky has been in the news quite a bit recently, and we’re gonna be tracking it throughout the rest of this presentation.

When we talk about
writing the rule here, we want to write it, like I said, off of these structures. We’ve got a couple of things going on here. On the left, if you deobfuscate this (it takes time to do), you wind up with the executable path that the file is gonna be placed in and run from. On the right, you wind up with a GET request, and that’s your network-based indicator. We also have two blue boxes there; those are just other checks that the JavaScript file ran. By putting all of these things together, you can write a YARA rule.

From there, you take your YARA rule and put it into the VirusTotal Intelligence portal. Whenever someone uploads a file that matches the criteria, the rule triggers and VirusTotal sends a notification. Now I’m gonna turn it over to Allen, who’s gonna talk about what you can do with these notifications.

– Okay. How many people here have
actually used Elasticsearch or Kibana to visualize data? Okay. All right. You should be familiar with some of the concepts I’m about to go through. (clears throat)

Once we’ve put the YARA rule on VirusTotal and files have actually been flagged, we receive a notification. There are a couple of different ways to receive that notification from VirusTotal. You can have VirusTotal email you each notification hit, so it comes into your mailbox (you may get spammed if you get a lot of notifications), or you can pull it from a REST API endpoint and get it JSON-formatted. We’re gonna use the JSON format. (clears throat)

Let me just step back a little bit here too. All the source code we have up here is in a project I’ve released. There’s working code on GitHub, and there’ll be a link at the end, so you can implement everything we talk about here in your own environment if you have a VirusTotal Intelligence subscription. Anyway, once we get the notification document from VirusTotal, we’re going to parse and index it, then push it into the Elastic stack. Let’s take a look at what we actually get
back from VirusTotal. This is the JSON document, and it contains a couple of different things. We get the name of the YARA rule that fired and the ruleset, which is a collection of YARA rules. We also get basic file metadata, such as hashes and file type. And we get antivirus scan data and match information, meaning the specific strings that the YARA engine on VirusTotal’s side matched on.

Once we get this notification, we’re gonna want to do some stuff with it: specifically, try to extract some indicators. There are a couple of different ways of doing that. One is static extraction; for example, there’s a RAT decoder framework that can extract obfuscated configuration files from malware. That’s a good way of doing it. In this room, after this talk, there’s gonna be a talk on FLOSS, which uses the binary itself to deobfuscate strings that reside in that binary. But for this talk we’re gonna focus on dynamic execution, specifically using Cuckoo Sandbox to extract both file system and network behaviors from VirusTotal notification alerts en masse.

At this point, we’ve got the notification sent in from VirusTotal. We’re gonna take that notification down, the JSON document, and index it into Elasticsearch for visualization and data storage, and we’re also gonna send it over to Cuckoo to process. So why even put this
into Elasticsearch? Why not MySQL or SQLite or something like that? (clears throat) Because Elasticsearch allows you to aggregate on this data: it gives you counts of how many times certain notifications have hit, and it lets you do trend-based analysis with date histograms, so you can see notifications over time, how they appear, their spikes, and so on. You can also see whether a file has been resubmitted multiple times. And you can export the data, either through the Kibana web interface with the data table visualization or programmatically through the Elasticsearch API.

This is a Kibana dashboard. You’re probably familiar with these, but essentially it’s just five visualizations. The two here are area charts of notifications over time. I’m limiting this down to just the RAT ruleset, so I’ve got a RAT ruleset with a multitude of YARA rules on VirusTotal, and you can see what it’s flagging go up and down over time. In the top middle, the chart is split by file type, so you can see it’s overwhelmingly Win32 executables, but there’s also a little bit of unknown at the very top of that area chart.

Cuckoo Sandbox. Why are we submitting
this to Cuckoo Sandbox when I know there are other sandboxes? Cuckoo is very customizable. It’s an open source project, and it’s written in Python. I’m using Cuckoo 2.0. I’ve modified the Elasticsearch reporting module just a little bit so the data visualizes a little better in Elasticsearch, and there’s a pull request for that against Cuckoo 2.0’s main branch. You may be familiar with the modified Cuckoo forks; Cuckoo 2.0, the main line, has actually integrated a lot of those features, and this work is integrated into main-line Cuckoo, not the forks.

There are a lot of other popular sandboxes out there that we could push this information into. VirusTotal does run files through its own Cuckoo Sandbox, and you can retrieve that data; however, it’s very limited. Other public sandboxes are also Cuckoo-backed, and you could also send samples into a proprietary sandbox such as Hybrid Analysis.

So what did we actually do with the Elasticsearch reporting module? We added this template. It’s simple, and it does a couple of different things. It sets the shard count to one for log-type ingestion. We turned on compression. We added task_id, which maps directly back to the task ID that Cuckoo has in its database, and report_time, the time at which Cuckoo actually processed the sample. We also set strings to not_analyzed. This is very important later on when we do data aggregations, because Elasticsearch won’t split values on common delimiters like whitespace, dots, or dashes, so IPs will aggregate on the full IP instead of being broken apart.

Back to the Elastic stack and what we’re wanting
to see out of Cuckoo. We’re gonna want to get the files written to disk and our network IOCs. Cuckoo also does some normalization of the antivirus data, and it queries VirusTotal too.

How are we gonna do this? Once we’ve got the notifications into Elasticsearch, we need to pull them out somehow. This is an example of using Elasticsearch’s scan and scroll APIs, an efficient way of programmatically exporting results from indexes. You can see page equals… That’s the main function here that’s doing all this stuff. The q equals at the very bottom is actually Lucene query syntax, the same syntax you know and love if you use Kibana, in the search field at the top. You can basically go into the Discover tab in Kibana, type out whatever search you want, pull that exact search string out, and put it right here, and you’ll retrieve all of the documents that match it. When this code finishes executing, hashes, which is a Python set, will hold the unique set of all the hashes that hit for a specific query. This query is limited to just the rockdownloader notifications, so at the end, hashes is just gonna be all the rockdownloader notifications. Then we send those over to the Cuckoo API using Python Requests. Again, I’ve got all this code up on GitHub as well. It runs automatically: from the moment VirusTotal sends the notification, it goes into Elasticsearch and is also sent into Cuckoo.

What do we really
get from Cuckoo? We get a ton of stuff, actually. In the summary information, we get files that are manipulated, written, or touched, registry keys, mutexes, command lines called, and a lot of other things like that. (clears throat) What you see on the top right here is what the Elasticsearch reporting module in Cuckoo is indexing. By default, it indexes just the target file metadata, the behavioral summary information (which is what you see here), and the VirusTotal results. If you want to, you can modify it to add additional things.

Here is a dashboard. We’re limiting this down: you see at the top it says *.js, so this is just the rockdownloader stuff, and we’re limiting the Elasticsearch data to just the rockdownloader notifications. You see the data aggregations, which cover every single run: 52 runs, 52 unique files, on the top left. Under resolved hosts, you can see that… That means 52 different samples beaconed out to that domain. So you can come in here and do some stacking analysis of this data, and I’m gonna give it over to Kevin to talk about that.

– Cool. Thanks, Allen. What we’ve done here is a
lot of technical content. We built a YARA rule. We built all this infrastructure with Elasticsearch and Cuckoo. What’s the point? What you see on the screen is the result of running that YARA rule in the VirusTotal Intelligence portal, and you can see we came up with, I think it’s 700, yeah, 747 different files here. What we get out of this is a lot of actionable data for our SOC. A lot of times you’ll see a threat intelligence report that says Russia is pushing out ransomware. I guess that’s nice, but it doesn’t do a whole lot for us. When we wind up with actual indicators (domains, IP addresses, host-based indicators), we can do something with them.

In this example, we have these domain-based and IP address-based indicators. With a lot of ransomware variants, it’s pretty common for a key exchange to happen over the network before the encryption can actually start. So we can take these indicators and, before our users even run into the spam email they’ve been sent, already have them blocked. We might still see some beaconing, but nothing will happen: no files will be encrypted, and no network drives will be encrypted. From a protection standpoint, we’re in a pretty good spot.

For the host-based indicators, if we aren’t able to
block them in time, or if we search back and see we’ve already got people calling out to those locations, the host-based indicators will help us identify where the binary might be located on the hard drive itself; here you would see it’s in the temp folder. At the bottom of this, it’s a little bit hard to see, but there’s also VirusTotal antivirus vendor normalization. That takes all the different antivirus detections across all the vendors and aggregates them for you. Again, it’s not a perfect match, but it gives you an idea of what you might be looking at, especially if you’re working with a more generic YARA rule.

This next slide just shows a couple of different ways to look at the data, but we want to get to this one over here. This is another run, from a different rule, written for TreasureHunter point-of-sale malware. Here you wind up with network-based indicators again, of course, but on the host-based side you also wind up with these alternate data streams. What you can do with Kibana here is click on one of these, and it will filter out all of the other files that don’t match, the ones that weren’t creating that alternate data stream, and keep all
your files that match that.

A hypothetical third case would be if you had network-based indicators that were calling out to, say, France and Russia. You could filter based on that geographical characteristic, then hop back in and see which files are communicating with those locations. Are they from a different threat actor? Are they from a different spam campaign? Has this infrastructure changed over time? You can start to build that level of intelligence.

The last thing we want to get to is additional API calls that you can make. In VirusTotal itself, if you have access to the Intelligence portal, you can do a couple of different things. For one, you can look at parent objects. A parent object is typically gonna be a zip file or an email file that someone’s uploaded to VirusTotal. VirusTotal extracts the components of a zip file, or the attachment from an email file, and while the extracted file stays part of what was uploaded, it also gets its own hash, its own file entry, and its own report on VirusTotal. The advantage there: as I mentioned earlier, you take these malicious attachments, these JavaScript files, these Word documents, and you can go backwards
and see who uploaded this attachment. I think everyone in this room knows we shouldn’t be uploading emails to VirusTotal, but people do it, and we can get the information from it since they’re so kind. That lets you ask questions like whether people are spoofing the sender address, for example. You might then go and look at your own logs and say: we’re not seeing these network-based indicators. Is that because we’re blocking the emails, or because we’re just not being targeted by them?

A good use case for the zip files is an executable that uses a side-loaded DLL. Sometimes people will upload the executable and the DLL together in a zip file. Because they become subcomponents, maybe your rule only matched on the executable, but to do dynamic testing or reverse engineering, you also need the library. You would go back to that zip file as the parent object and pull the DLL down from there.

There’s also some network infrastructure pivoting you can do from this. On the right, you see the results of running a whole bunch of IP addresses through the VirusTotal Intelligence API. You can get executables that were downloaded from the same URL, or executables that called out to the same IP address. That lets you expand your understanding of their network
infrastructure as well.

Some other things we’ve put up there: CentralOps, which is a really good whois resource. We have a pretty interesting case from last fall where someone was pushing out the Swifi banking trojan. Now, I’ll say right now that the CentralOps API is not free; it’s, I think, $20 for 1,000 API calls, which can still be a fairly large amount depending on what you’re doing. In this case, we had someone pushing out the Swifi banking trojan. We went and looked up the IP. The person registered to that IP address and that domain, (laughs) three years prior to this, had been involved in a search engine optimization scheme out in Ukraine. From doing additional research on this person, we found he’d been involved in a kidnapping scheme. It was kind of ridiculous. It turned out he got into malware, and the Ukrainians went and put a bounty on his head, literally a go-kill-this-person bounty. We got that from CentralOps and from doing research off of that. That’s an example of where you want to do this sort of pivoting.

The last thing we have
up there is PassiveTotal. We had a really good passive DNS talk here earlier on; just as a reminder, you can expand your understanding of network infrastructure based on that. With PassiveTotal, you can also get historical DNS records, and you can see, for example, whether a page had previously been flagged by Kaspersky as malicious. You can use that API as well. These are just ways to take your research to the next step and get real threat intelligence out of it.

Just to bring it all back together: we built a YARA rule for a JavaScript dropper and identified over 700 files that were triggering it. We pulled these files down, indexed the notifications into Elasticsearch, ran the files in Cuckoo, and put the results into Elasticsearch and Kibana as well. From there, we did additional pivoting in the VirusTotal Intelligence portal. We wound up with something that we can deliver to our SOC that is truly actionable, something they can handle. Again, the source code is available on Allen’s GitHub page if you want to take a look at it. At this point, we’ll go ahead and take questions.

– [Asker] On the
original JavaScript you were talking about, how long did it take you to decode it and figure out the obfuscation, et cetera? How much time are you guys spending? And maybe you’re really good at it ’cause you’ve been doing it for a long time.

– Yeah.

– [Asker] What’s the ramp-up time to get up to speed on that?

– That’s a really good question. I’ll give you something more specific, but my first answer would be: of course, it depends. With JavaScript in particular, if you look a month back at those Locky downloaders, you could look at the JavaScript for two seconds, put the pieces together, and say here is where it’s gonna go. The one you’re asking about took me probably about a day; if I were to do it again, it would take maybe an hour or two, but that particular one was kind of confusing to me. There are a couple of things specific to these JavaScript droppers that you’ll usually see that will help, though. One of ’em is that they almost always wrap the GET request in a try and catch. My understanding is that the purpose behind that is that if for some reason it can’t call out, it doesn’t hit you with an error; the user just doesn’t see anything happen. That’s one tip on where to look. For Office documents, the Vawtrak ones from the fall were pretty easy, but the Dridex ones and some Locky ones can take a while. We did have a live incident where we were dealing with a brand new ransomware sample that hadn’t been reported, and we’ve never seen it again. For that one, in the interest of getting information really quickly, I ran it in the sandbox with the Office debugger open, put in a couple of breakpoints, and just took my chances like that. If you’re ever in a pinch, that’s something else you can do.

– One thing to note too
about the Office documents: if they’re DOCX and zipped up, VirusTotal will actually decompress them and extract the macros, and you can write your YARA signatures based on the macros. So you don’t have to match on the actual zip file; you can match on the child documents. The YARA implementation is something that’s very powerful about VirusTotal.

– [Kevin] Yeah.

– [Asker] Hi. In terms of use cases, how does your process work? Is this primarily something you go to after you have some kind of trigger in your environment, or does it go the other way, where you say, okay, there’s a campaign out there, let’s see if anything similar is happening in our environment? I’m just curious about the human process side.

– Yeah, a multitude of different ways. There are a lot of mailing lists out there where you can see a lot of the current spam campaigns. I’ve been able to write signatures off of techniques used in a spam campaign and then later detect that same spam campaign inside our environment. That’s very common. But if you get a targeted campaign, you can start to track the threat actors targeting you. For example, TA530 recently had a Word document macro spam campaign where the file name was the company name followed by something like invoice or orders, dot doc. We were able to write a signature on the PNG inside the document that they were using and upload that to VirusTotal. Not only did we see that we got hit by it, but because the company name was in the actual file name of the document, we saw all these other companies get hit by it as well. We could actually go
through and say yep, someone from this company submitted this file, someone from that company submitted this file, and so on. So you can see how the threat landscape as a whole is shaping up, because your company is just one part of the bigger landscape that these people are sending spam waves out to, if you’re focused on spam waves. Yeah, definitely. With the TA530 stuff, we were able to identify network IOCs from the same campaign, but from other people, before it hit our network. We were able to detect C2 (clears throat) from a file that they sent to other people who uploaded it to VirusTotal. We pulled it down, submitted it to our sandbox, got that domain out, and were able to put blocking and detection into our own environment before we saw documents calling out to that domain.

– One thing to add onto that, too. If for some reason you don’t have access to the VirusTotal Intelligence portal, for example because unlimited access is too cost-prohibitive, other organizations do aggregate something like a known good database, so you can go that route. You can also just start aggregating your own database for this kind of research and development. That’s another option you have.

– [Host] Hey guys, thank you so much for presenting. Awesome presentation. (audience clapping) Well done. (light music)
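
The talk describes YARA rules built from text strings, hex or opcode patterns, and regular expressions, combined under a condition. Real rules are written in YARA’s own language and evaluated by the YARA engine (or its yara-python bindings); purely as an illustration, here is a minimal plain-Python sketch that emulates those three string types and an “N of them” condition. It is a toy model, not YARA’s matching semantics, and the rule name, strings, and sample bytes are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


def hex_to_regex(hex_string: str) -> bytes:
    """Convert a YARA-style hex string like "4D 5A ?? 00" into a bytes regex.

    Only literal bytes and the "??" single-byte wildcard are supported in this
    sketch; YARA jumps such as [2-4] and alternatives such as (AA|BB) are not.
    """
    parts = []
    for token in hex_string.split():
        if token == "??":
            parts.append(b".")  # any one byte (compiled with DOTALL below)
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    return b"".join(parts)


@dataclass
class ToyRule:
    """Hypothetical stand-in for a YARA rule: named patterns plus 'N of them'."""
    name: str
    text_strings: Dict[str, str] = field(default_factory=dict)
    hex_strings: Dict[str, str] = field(default_factory=dict)
    regex_strings: Dict[str, bytes] = field(default_factory=dict)
    min_matches: int = 1  # emulates a condition such as "2 of them"

    def _patterns(self) -> dict:
        patterns = {}
        for ident, literal in self.text_strings.items():
            patterns[ident] = re.compile(re.escape(literal.encode()))
        for ident, hex_string in self.hex_strings.items():
            patterns[ident] = re.compile(hex_to_regex(hex_string), re.DOTALL)
        for ident, expr in self.regex_strings.items():
            patterns[ident] = re.compile(expr)
        return patterns

    def match(self, data: bytes) -> List[str]:
        """Return identifiers of matching strings, or [] if the condition fails."""
        hits = [ident for ident, pat in self._patterns().items() if pat.search(data)]
        return hits if len(hits) >= self.min_matches else []


# Hypothetical rule loosely modeled on the dropper examples in the talk.
rule = ToyRule(
    name="hypothetical_dropper",
    text_strings={"$temp_path": "\\AppData\\Local\\Temp\\"},
    hex_strings={"$mz": "4D 5A ?? 00"},
    regex_strings={"$gate": rb"https?://[\w.-]+/[\w/]+\.php"},
    min_matches=2,
)
sample = b"MZ\x90\x00 ... C:\\Users\\x\\AppData\\Local\\Temp\\a.exe http://example.com/gate.php"
print(rule.match(sample))  # ['$temp_path', '$mz', '$gate']
```

In practice the same logic would live in a `.yar` file and run on VirusTotal’s side; the sketch only makes the string-plus-condition structure concrete.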

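The Office-document rule described in the talk combined the Office file header, a value indicating VBA, file paths, and some whitelisting. The sketch below expresses that combination of conditions in Python. Assumptions: it uses the standard OLE2 compound-file signature (D0 CF 11 E0 A1 B1 1A E1, which a YARA rule could test as `uint32(0) == 0xE011CFD0`) and treats the UTF-16LE stream name `_VBA_PROJECT` as the VBA indicator; the hex values actually on the slide may differ, and the path markers and whitelist entries are hypothetical.

```python
# Standard signature at offset 0 of OLE2 compound files (legacy .doc/.xls/.ppt).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Compound-file directory entries store names as UTF-16LE, so a document with a
# VBA project should contain this stream name in that encoding (an assumption
# for this sketch; it is one of several possible VBA indicators).
VBA_MARKER = "_VBA_PROJECT".encode("utf-16-le")


def looks_like_macro_dropper(data: bytes, path_markers, whitelist=()) -> bool:
    """header AND VBA marker AND any path marker AND NOT any whitelist marker."""
    if not data.startswith(OLE2_MAGIC):
        return False  # not an OLE2 Office document
    if VBA_MARKER not in data:
        return False  # no VBA project, so skip "any old Office document"
    if any(marker in data for marker in whitelist):
        return False  # known-good content we chose to exclude
    return any(marker in data for marker in path_markers)


# Hypothetical markers: a dropped executable under the user's temp folder.
PATH_MARKERS = [b"\\AppData\\Local\\Temp\\", b"%TEMP%"]

doc = OLE2_MAGIC + b"\x00" * 16 + VBA_MARKER + b"\x00" * 8 + b"%TEMP%\\invoice.exe"
print(looks_like_macro_dropper(doc, PATH_MARKERS))  # True
```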

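The talk notes that variants within one JavaScript spam campaign tend to keep the same structure and variable names while literal values change, which is why running the code through a beautifier helps. As a rough sketch of that idea (not the presenters’ tooling), the code below normalizes a script by replacing string and numeric literals with placeholders and standardizing whitespace, then hashes the resulting skeleton so variants collapse to one fingerprint. It uses simple regular expressions rather than a real JavaScript parser, so comments, template literals, and regex literals are not handled; the sample scripts are hypothetical.

```python
import hashlib
import re

# Double- or single-quoted string literals, allowing backslash escapes.
STRING_LITERAL = re.compile(r"\"(?:\\.|[^\"\\])*\"|'(?:\\.|[^'\\])*'")
# Integer or decimal literals that stand alone (not part of an identifier).
NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")
WHITESPACE = re.compile(r"\s+")
# Whitespace around punctuation carries no structure once code is beautified.
SPACE_AROUND_PUNCT = re.compile(r"\s*([^\w\s])\s*")


def skeleton(js_source: str) -> str:
    """Keep identifiers and statement shape; drop literal values and spacing."""
    text = STRING_LITERAL.sub('"S"', js_source)
    text = NUMBER_LITERAL.sub("0", text)
    text = WHITESPACE.sub(" ", text).strip()
    return SPACE_AROUND_PUNCT.sub(r"\1", text)


def fingerprint(js_source: str) -> str:
    return hashlib.sha256(skeleton(js_source).encode("utf-8")).hexdigest()


# Two hypothetical variants: same structure and variable names, different URLs.
variant_a = '''
var xhr = new ActiveXObject("MSXML2.XMLHTTP");
xhr.open("GET", "http://a.example/1.exe", false);
var tries = 4;
'''
variant_b = 'var xhr=new ActiveXObject("MSXML2.XMLHTTP"); xhr.open("GET","http://b.example/77.exe",false); var tries=9;'

print(skeleton(variant_b))
# var xhr=new ActiveXObject("S");xhr.open("S","S",false);var tries=0;
print(fingerprint(variant_a) == fingerprint(variant_b))  # True
```

Keeping identifiers intact matches the observation that variable names stay consistent across a campaign; if a campaign also randomizes variable names, those would need normalizing too.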

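The notification JSON described in the talk carries the rule name, the ruleset, file metadata such as hashes and file type, antivirus scan data, and the matched strings. A minimal sketch of flattening such a document before indexing it follows. The key names (`rule_name`, `ruleset_name`, `sha256`, `type`, `positives`, `total`, `matches`) and the top-level `notifications` list are assumptions for illustration, not VirusTotal’s documented schema; check them against the payloads your account actually returns.

```python
import json
from typing import Dict, List


def flatten_notification(raw: Dict) -> Dict:
    """Map one notification (hypothetical key names) to a flat indexable doc."""
    positives = raw.get("positives", 0)
    total = raw.get("total", 0)
    return {
        "rule_name": raw.get("rule_name"),
        "ruleset_name": raw.get("ruleset_name"),
        "sha256": raw.get("sha256"),
        "md5": raw.get("md5"),
        "file_type": raw.get("type"),
        "positives": positives,
        "total": total,
        # Detection ratio as a float so Kibana can chart or bucket it.
        "detection_ratio": round(positives / total, 3) if total else 0.0,
        # De-duplicated, sorted matched string identifiers.
        "matched_strings": sorted(set(raw.get("matches", []))),
    }


def parse_feed(payload: str) -> List[Dict]:
    feed = json.loads(payload)
    return [flatten_notification(item) for item in feed.get("notifications", [])]


example_payload = json.dumps({
    "notifications": [{
        "rule_name": "rockdownloader",
        "ruleset_name": "droppers",
        "sha256": "0" * 64,
        "md5": "f" * 32,
        "type": "JavaScript",
        "positives": 12,
        "total": 48,
        "matches": ["$get_request", "$exe_path", "$get_request"],
    }]
})
docs = parse_feed(example_payload)
print(docs[0]["detection_ratio"], docs[0]["matched_strings"])
# 0.25 ['$exe_path', '$get_request']
```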

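The talk lists what the modified Cuckoo template does: one shard, compression, `task_id` and `report_time` fields, and strings set to not_analyzed so values like IPs and domains aggregate whole. Here is a sketch of what such a template could look like, written as a Python dict in the Elasticsearch 2.x syntax of that era. This is not the template from the presenters’ pull request; the index pattern `cuckoo-*` and the dynamic-template name are assumptions. Newer Elasticsearch versions express the same idea differently (`keyword` fields instead of not_analyzed strings, `index_patterns` instead of `template`, and no `_default_` mapping).

```python
import json


def cuckoo_index_template(pattern: str = "cuckoo-*") -> dict:
    """Elasticsearch 2.x-style index template (hypothetical index pattern)."""
    return {
        "template": pattern,  # which new indices the template applies to
        "settings": {
            "number_of_shards": 1,              # small, log-style indices
            "index.codec": "best_compression",  # trade CPU for disk space
        },
        "mappings": {
            "_default_": {
                "dynamic_templates": [{
                    # Every new string field is stored as one exact term, so
                    # terms aggregations count "198.51.100.7" or "evil-site.com"
                    # whole instead of as analyzed tokens.
                    "strings_not_analyzed": {
                        "match_mapping_type": "string",
                        "mapping": {"type": "string", "index": "not_analyzed"},
                    }
                }],
                "properties": {
                    "task_id": {"type": "long"},      # joins back to Cuckoo's DB
                    "report_time": {"type": "date"},  # when Cuckoo processed it
                },
            }
        },
    }


# The body would be sent with PUT /_template/<name> to an ES 2.x cluster.
print(json.dumps(cuckoo_index_template(), indent=2))
```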

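The export step described in the talk pages through every notification matching a Lucene query and keeps a Python set of unique hashes. A minimal sketch of that scroll loop follows; the client calls are passed in as functions so the paging logic can be shown (and tested) without a cluster. The field name `sha256`, the index name, and the query in the usage comment are hypothetical; the commented `es.search(..., q=..., scroll=...)` and `es.scroll(scroll_id=..., scroll=...)` calls come from the official elasticsearch-py client, whose exact parameters vary by version.

```python
from typing import Callable, Dict, Set


def collect_hashes(first_page: Callable[[], Dict],
                   next_page: Callable[[str], Dict],
                   hash_field: str = "sha256") -> Set[str]:
    """Follow scroll pages until one comes back empty; return unique hashes."""
    hashes: Set[str] = set()
    page = first_page()
    while True:
        hits = page["hits"]["hits"]
        if not hits:
            break  # scroll exhausted
        for hit in hits:
            value = hit["_source"].get(hash_field)
            if value:
                hashes.add(value)  # a set drops resubmitted duplicates
        page = next_page(page["_scroll_id"])
    return hashes


# Wiring it to a real cluster might look like this (not executed here):
# from elasticsearch import Elasticsearch
# es = Elasticsearch(["http://localhost:9200"])
# hashes = collect_hashes(
#     lambda: es.search(index="vt-notifications", q="rule_name:rockdownloader",
#                       scroll="2m", size=500),
#     lambda scroll_id: es.scroll(scroll_id=scroll_id, scroll="2m"),
# )
```

Because the `q` string is ordinary Lucene syntax, it can be copied straight from Kibana’s Discover search bar, as the talk suggests.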

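Submitting the collected samples to Cuckoo goes through its REST API (the `api.py` service, which listens on port 8090 by default), where `POST /tasks/create/file` with a multipart `file` field returns JSON containing a `task_id`, the same task ID the talk says the reporting module stores as task_id. The sketch below assumes the samples have already been downloaded to local paths (the VirusTotal download step is not shown) and that `requests` is installed; the host name is an assumption for your deployment, and this is not the presenters’ code.

```python
import os
from typing import Callable, Dict, Iterable

CUCKOO_API = "http://localhost:8090"  # assumption: default api.py address


def submission_url(base: str = CUCKOO_API) -> str:
    return base.rstrip("/") + "/tasks/create/file"


def submit_sample(path: str, base: str = CUCKOO_API, timeout: int = 30) -> int:
    """Upload one file and return the Cuckoo task ID."""
    import requests  # third-party; imported here so the helpers work without it

    with open(path, "rb") as handle:
        response = requests.post(
            submission_url(base),
            files={"file": (os.path.basename(path), handle)},
            timeout=timeout,
        )
    response.raise_for_status()
    return response.json()["task_id"]


def submit_all(paths: Iterable[str],
               submit: Callable[[str], int] = submit_sample) -> Dict[str, int]:
    """Submit each distinct path once, preserving order; map path -> task ID."""
    return {path: submit(path) for path in dict.fromkeys(paths)}
```

Usage might be `task_ids = submit_all(["/samples/a.js", "/samples/b.js"])`; passing `submit` as a parameter keeps the de-duplication logic testable without a running sandbox.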

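The Kibana panels in the talk show counts such as 52 distinct samples beaconing to the same domain, which is the basis for stacking analysis: common values point to shared infrastructure, and rare ones are worth a closer look. The same count can be computed directly from Cuckoo JSON reports, as sketched below. The report layout used here (`target.file.sha256` and a `network.domains` list of `{"domain": ..., "ip": ...}` entries) is an assumption about Cuckoo 2.0’s report format; adjust the keys to the reports you actually have. The sample data is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set


def stack_domains(reports: Iterable[Dict]) -> Counter:
    """Count how many distinct samples (by SHA-256) resolved each domain."""
    samples_per_domain: Dict[str, Set[str]] = defaultdict(set)
    for report in reports:
        sha256 = report["target"]["file"]["sha256"]
        for entry in report.get("network", {}).get("domains", []):
            # A set per domain means reruns and repeated lookups count once.
            samples_per_domain[entry["domain"]].add(sha256)
    return Counter({domain: len(samples) for domain, samples in samples_per_domain.items()})


reports = [
    {"target": {"file": {"sha256": "s1"}},
     "network": {"domains": [{"domain": "c2.example", "ip": "192.0.2.10"},
                             {"domain": "c2.example", "ip": "192.0.2.10"}]}},
    {"target": {"file": {"sha256": "s2"}},
     "network": {"domains": [{"domain": "c2.example", "ip": "192.0.2.10"}]}},
    {"target": {"file": {"sha256": "s2"}},  # same sample run a second time
     "network": {"domains": [{"domain": "c2.example", "ip": "192.0.2.10"}]}},
    {"target": {"file": {"sha256": "s3"}},
     "network": {"domains": [{"domain": "rare.example", "ip": "198.51.100.7"}]}},
]

counts = stack_domains(reports)
print(counts.most_common())  # [('c2.example', 2), ('rare.example', 1)]
```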

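In the Q&A, Allen points out that zipped Office documents can have their macros extracted so rules can target the child documents. Locally, you can at least find where the macros live: in the Office Open XML format, the VBA project is stored as a `vbaProject.bin` part (for example `word/vbaProject.bin` in a macro-enabled Word file). The sketch below only locates and reads those parts with the standard `zipfile` module; `vbaProject.bin` is itself an OLE2 file with compressed VBA source, so decompressing the macro text is a further step (tools such as oletools’ olevba do this) that is not shown. The in-memory document is hypothetical.

```python
import io
import zipfile
from typing import Dict


def vba_parts(data: bytes) -> Dict[str, bytes]:
    """Return {part_name: bytes} for VBA project parts in an OOXML file."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            return {
                name: archive.read(name)
                for name in archive.namelist()
                if name.lower().endswith("vbaproject.bin")
            }
    except zipfile.BadZipFile:
        return {}  # not a ZIP container (e.g. a legacy OLE2 .doc)


# Build a tiny hypothetical macro-enabled document in memory to exercise it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<w:document/>")
    archive.writestr("word/vbaProject.bin", b"\xd0\xcf\x11\xe0 placeholder")

parts = vba_parts(buffer.getvalue())
print(sorted(parts))  # ['word/vbaProject.bin']
```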