Ramblings

Introspective narcissism since the 2000s.




Antimalware Software


An antimalware program is a type of software used in the detection of and defense against malicious programs and exploits.

Electronic devices can store all kinds of data, most importantly login credentials, financial information and personal information such as pictures, chats, and sensitive files about or from the owner's bank, work, family or other private matters. This makes those devices valuable targets for destruction, theft or being held hostage. Sometimes it's not about the value of the files on the device, but about the device itself - some attacks exploit devices for cryptomining, botnets and other nefarious activities where what matters is the device's resources as an independent machine, rather than anything specific on that machine.

Most commonly, this is done through software, which is then called “Malware” (Malicious Software). The defining feature of malware is that the actions it takes are unauthorized by the device owner, unwanted and generally destructive in one way or another, and it will typically not notify the user or give them a choice in the matter1). Not all attacks are carried out with malware though; a lot of the time it is a matter of exploiting bugs in (oftentimes outdated) software and gaining control over a machine that way, so keep your devices and software up-to-date. Other attacks are carried out by other machines on the network scanning for vulnerable devices or by a bad actor inserting themselves between the user and the internet.

How protection works

0. Basics

In general, the objective of an antivirus is to (1) prevent malicious code from ever executing on your system in the first place or, failing that, to (2) terminate a process if it is found to have done something malicious.

1. Signature Detection

Files are identifiable. With fancy mathematics, any computer file can be broken down into a numerical value that, in most cases, is enough to identify identical files. A signature. The cool thing about being able to identify identical files is that all you need now is a list of files you already know are malicious. Once you have that, you can just compare that list against any computer of your choice and you'll be able to weed out all the malicious programs you already know are malicious, quickly and efficiently.

From here on, your life is simple. If someone gets infected, they call you and say “hey, we got infected by something, we don't know what”, you find the source of the infection, you determine the signature of the file, add it to your list and with the next hourly “Intelligence Update” everyone using your product will be immunized to that particular sample.

Obviously, this is a (1) type of approach. Every time you want to run a program or open a file, the antivirus will check if the signature of that program or file matches with any of the signatures from its database. The advantages of this are overwhelming: Immunization on a massive scale. Once a malicious program is found, it is added to the list and within an hour the entire world can be immune against it, as your antivirus product will stop the execution of the program before it ever gets a chance.
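
To make the idea concrete, here is a minimal sketch of a signature check in Python. It assumes SHA-256 as the "fancy mathematics" and a plain in-memory set as the signature list; the function names and the sample file contents are made up for illustration, and real products use their own formats and far larger databases.

```python
import hashlib
import tempfile
from pathlib import Path


def file_signature(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_known_malicious(path: Path, known_bad: set[str]) -> bool:
    """Check a file against a set of known-bad signatures (hypothetical list)."""
    return file_signature(path) in known_bad


def demo() -> tuple[bool, bool]:
    """Create two throwaway files and check both against a one-entry list."""
    with tempfile.TemporaryDirectory() as tmp:
        bad = Path(tmp) / "bad.bin"
        good = Path(tmp) / "good.bin"
        bad.write_bytes(b"pretend this is a malicious sample")
        good.write_bytes(b"pretend this is a harmless program")
        known_bad = {file_signature(bad)}  # the "signature list"
        return is_known_malicious(bad, known_bad), is_known_malicious(good, known_bad)
```

Calling `demo()` flags the first file and lets the second one through. Note that changing even a single byte of the "malicious" file would produce a different signature, which is exactly the weakness the later layers try to cover.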

Most of the cheap antivirus products on the market rely almost entirely on signatures. Windows Defender, for example, almost exclusively relies on cloud-based protection which is essentially just a very long list of known malicious file signatures. Extremely scalable, light on resources, it's great.
The downside, of course, is that to be protected from a piece of malware, someone has to know about it in advance, at which point you might as well check file signatures yourself. Still, there is value in automating that process and having a pool of millions of users and companies that can submit samples to a “crowdsourced” signature pool, curated by cybersecurity experts who can tell genuinely malicious files from false positives. In the field of signature-based detection, good antivirus products and bad antivirus products differ only in the completeness of their signature database. More is better.

A good test for that is to throw a pool of 1000 random malware samples at each product to see how many it catches, based on signatures alone. The percentage you get back from that is your metric. Top antivirus products consistently score roughly 98-99%, with Windows Defender typically being on the lower end of the ranking.
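
The metric itself is simple arithmetic. A tiny sketch, assuming each sample's result is recorded as a boolean (the function name is made up):

```python
def detection_rate(results: list[bool]) -> float:
    """Percentage of samples caught; each entry is True if the sample was detected."""
    if not results:
        raise ValueError("need at least one sample")
    return 100.0 * sum(results) / len(results)
```

A product that catches 985 out of 1000 samples gets `detection_rate([True] * 985 + [False] * 15)`, which works out to 98.5.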

However, until a signature is known to be malicious, a signature-based approach will not stop a malicious program2). That is not a reason to dismiss signature-based detection, or antiviruses as a whole. Most malicious files are known. You, as an individual, are unlikely to run into a piece of software that isn't already on someone's signature list. Yes, unknown malware not yet on anyone's signature list exists, but on the internet it is vastly, vastly outnumbered by software we already know is malicious. On the internet, malware can easily linger, sometimes for months, sometimes for years. I'm sure the malicious Minecraft 1.6.4 Portal mod that I downloaded back then can still be downloaded today, and that was over 10 years ago! And sure, nobody still downloads mods for Minecraft 1.6.4, but the point is that files that old are still making the rounds today. It can be 10 years, it can be 1 year, it can be a month or barely even a day - in cybersecurity terms these are all old files. Detections roll in very, very quickly (we are talking hours), and mass immunization is a valid tool against malware.

Critics will harp on and on about “Zero-Days”, with which they mean “malware that isn't yet detected” as if that was the only thing out there. Yes, undetected malware is a thing, but for the home user that signature list is still extremely valuable. A good signature list is basically like a vaccine that makes you immune to 98% of known pathogens. Undeniably, a strong first line of defense. But even then, signature-based detection is not the only way to defend against malware.

2. Static Analysis

Back to the start. We are an antivirus product. We are trying to protect the user from malicious code. We have our signature list and we're using it. Imagine now that the user is trying to open a program. Of course, what we do is we first halt the execution process and wait to check whether that program is in our signatures. If it is not then that's a good sign, but it doesn't guarantee that the program is safe to use.

The next step is to take an actual look at the program. Our signatures are a quick and easy way to rule out known offenders before having to put in any actual work, but now that this is not a known offender, we have to make sure it isn't an offender at all. The way to do this is by taking the program apart and looking inside to see what it is programmed to do. The difficulty here is to distinguish between actually malicious activity and activity that just looks funny. The problem with malware is that it does things that in principle could also be done by non-malicious software. Uploading your login credentials/cookies to a server is what a password manager might do. Encrypting your files is just normal and otherwise valid cryptography, but used against you. Any kind of action that is evil in this context can be friendly in another context. An action can never be inherently tied to malice, and it is the job of an antivirus to make an educated guess based on the information available to it. Of course, sometimes it's quite obvious: Does the program use any known exploits? Does it want to connect to any server infrastructure that we already know is controlled by bad actors? Those clues can sometimes be numerous and paint a very clear picture (those are the easy ones), but more often than not malware authors are sneaky and make it look like their program is just “one of those programs”. Creating a tool that can determine this without human oversight is hard. But - and that's the really cool thing about this - at the end you will have a tool that is capable of detecting malware even if it is previously unknown. No signatures.
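
As a rough illustration of that "educated guess", here is a toy static scanner in Python. It never runs the program; it only looks for suspicious byte strings and adds up weights. The indicator list, the weights, the threshold and the hostname `c2.bad.example` are all assumptions invented for this sketch, and real static analysis goes far deeper (disassembly, import tables, unpacking and so on).

```python
# Hypothetical indicators and weights; a real product would maintain thousands.
INDICATORS: dict[bytes, int] = {
    b"c2.bad.example": 5,           # made-up server "known" to belong to bad actors
    b"vssadmin delete shadows": 4,  # deleting backups is rarely friendly
    b"Login Data": 2,               # touches a browser credential store
    b"CryptEncrypt": 1,             # encryption alone proves nothing
}
THRESHOLD = 5  # assumed cut-off for "probably malicious"


def static_score(data: bytes) -> tuple[int, list[str]]:
    """Return the summed weight and the indicators found in the raw bytes."""
    hits = [indicator for indicator in INDICATORS if indicator in data]
    score = sum(INDICATORS[indicator] for indicator in hits)
    return score, [hit.decode() for hit in hits]


def looks_malicious(data: bytes) -> bool:
    """Educated guess: flag the program once enough indicators pile up."""
    return static_score(data)[0] >= THRESHOLD
```

A sample containing both `CryptEncrypt` and `vssadmin delete shadows` reaches the threshold and gets flagged, while a backup tool that merely calls `CryptEncrypt` does not. That is the point made above: no single action is inherently malicious, it is the combination that paints the picture.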

But, if that fails, we still have one more tool in our arsenal: Comparison. Malware is often changed only slightly, which leads to a myriad of different “strains” that, ultimately, are still the same piece of malware, just slightly altered. Authoring a completely new type of malware would not only take, well, authoring a novel piece of malware, it would also mean finding and then using a new attack vector. But these are limited. There are only so many vulnerabilities open at any time before new ones are found and the old ones are closed. This means that most malware is most often just a “strain” of already known malware, just slightly altered, which means their programming will be similar.
We can use this to our advantage. If we can't find anything in the program that is obviously malicious, we can just check if the program is generally similar to other programs we already know are malicious.
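
One simple way to sketch "generally similar" is to compare the sets of short byte sequences two programs contain. The example below uses Jaccard similarity over 4-byte n-grams; this is a stand-in chosen for brevity, not how any particular product does it (real engines tend to use more robust fuzzy-hashing schemes), and the 0.6 threshold is an assumption.

```python
def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """All distinct runs of n consecutive bytes in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the two n-gram sets: 0.0 (disjoint) to 1.0 (identical)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def resembles_known_malware(sample: bytes, known_samples: list[bytes],
                            threshold: float = 0.6) -> bool:
    """True if the sample is close enough to any known-malicious sample."""
    return any(similarity(sample, known) >= threshold for known in known_samples)


# Made-up "programs" standing in for real binaries.
known = b"steal_cookies(); upload_to_server(); encrypt_all_files(); demand_ransom();"
variant = b"steal_cookies(); upload_to_server(); encrypt_all_files(); demand_payment();"
unrelated = b"print('hello world'); open_settings_window(); play_music();"
```

Here `variant` has a different signature than `known`, so a signature list would miss it, but `resembles_known_malware(variant, [known])` still catches it, while `unrelated` scores close to zero.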

All of this is called Static Analysis. The advantage to this approach is that it can get a pretty good insight into what a program may do without loading it into memory yet. This is critical because conventional malware can remain dormant on disk without causing harm; it is when it is loaded into memory (be it through the user executing it, or because of a scheduled task or through an autorun entry) that it starts doing malicious things. Analysing software without loading it into memory lets us look at the program without risking getting infected.

3. Sandboxing

In cybersecurity terms, a sandbox refers to a protected virtual environment segmented off from the rest of the system, filled with all the sand imaginable but, ultimately, constrained to the sandbox. Basically, it's a simulated environment in which a program can freely run as if it were running on your machine, without actually running on your machine. Well, of course it is running “on your machine”, but it's sequestered away in a protected environment. The goal of doing this is to just... observe. If we let the program run, would it do anything objectionable? The sandbox offers the advantage of testing out a program without the risk of infection3).

Static analysis is the analysis of a program as it sits on disk - statically. Static analysis is limited by practical constraints - reverse-engineering an entire program is extremely resource-intensive, if possible at all, so the insight you can gain from it is limited. But in the sandbox we are now working with a live sample and can observe its behavior as it unfolds. When you actually just run the program and let it do its thing, you can log virtually everything it does. The cool thing is: if it does do anything malicious, its damage remains constrained to the sandbox and all data outside of the sandbox is safe.

Sandboxing is already a standard in smartphones. On operating systems like Android and iOS, pretty much everything on those phones runs sequestered in individual sandboxes which can barely, if at all, interact. Since all damage is always limited to the scope of an application's sandbox, there is very little damage malware can cause on those platforms. The real threats on those platforms lie in hijacking otherwise friendly apps whose sandbox contains valuable data (your browser, for example) or socially engineering the user into doing the malicious thing themselves.
On desktop platforms like Windows, sandboxing is becoming more and more of a thing as well but, because of their open and interactive structure, it isn't widespread yet. Those platforms haven't quite arrived at treating their resources like, well, resources, where access to each is tightly controlled. The openness is good in that it means fewer hurdles for programmers writing useful software. Tightly controlled means everyone is kept out until confirmed otherwise, and access is only permitted in the way the controller allows. If the controller doesn't want you to access a resource in a certain way then you're out of luck. Shoutout to iOS.

Anyway, the point of sandboxing is to give a program room to do its thing so we can see what it does. There is a problem though - if we sandbox every program the user wants to open, how long do we keep it in the sandbox before deciding that the program is OK? What if the malware is programmed to wait for a minute? Making the user wait that long every time is not acceptable, which makes this one of the major weaknesses of sandboxing. Additionally, lots of malware can recognize sandbox environments, including Virtual Machines, and will refuse to execute in them. That refusal is a recognized warning sign in and of itself, but it still eats into the effectiveness of sandboxing as a detection tool. Of course, doing it like smartphones do - running everything in a sandbox by default instead of only using the sandbox for analysis - would still be really useful, but that is not where we are.
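
The waiting problem is easy to show with a toy simulation. Nothing is actually executed here: a "program" is just a made-up timeline of actions, and the "sandbox" only sees what happens within its time budget. All names and action labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    at_second: float  # when the program would perform the action
    action: str       # made-up label for what it does


# Assumed set of actions the sandbox considers objectionable.
SUSPICIOUS = {"encrypt_user_files", "read_browser_cookies"}


def sandbox_verdict(timeline: list[Event], budget_seconds: float) -> str:
    """Observe the program only for budget_seconds, then decide."""
    observed = [event for event in timeline if event.at_second <= budget_seconds]
    if any(event.action in SUSPICIOUS for event in observed):
        return "malicious"
    return "nothing observed"


impatient = [Event(0.5, "open_window"), Event(2.0, "encrypt_user_files")]
patient = [Event(0.5, "open_window"), Event(60.0, "encrypt_user_files")]
```

With a 10 second budget, `impatient` is caught and `patient` walks right through; only a budget longer than its delay (say 120 seconds) would catch it, and nobody wants to wait two minutes for every program to start.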

The cool thing about sandboxing, though, is that it too is able to independently identify malware, even if the sample wasn't previously known. It's yet another layer of defense.

4. Behavioral Detection

Behavioral detection is what truly distinguishes good products from terrible ones. However, it is also the hardest to get right, if you get it working at all.
As humans, we don't care about cybersecurity in terms of software or code. The ones and zeroes in play are just a means to an end. What we are really trying to stop is hackers getting ahold of your login credentials or your company's top secret files, or perhaps encrypting all your files and demanding money for their decryption. In short, we care about the end that malware is working towards: its behavior.

When Signatures, Static Analysis and Sandboxing all return negative, it probably is time to let the program execute. But, behavioral detection keeps watching. If a program acts up and starts doing funny things - for example if it starts encrypting files - it will notice and shut that program down. If a program suddenly starts deleting a bunch of shit, or gives orders to another program to delete a bunch of shit - shut it down. If a program does funny things with your boot configuration, your autoruns, downloading files from the internet or uploading stuff from your PC - very suspicious. Behavioral detection is my favorite type of detection, because it addresses the exact thing that we, you and I, are ultimately talking about: the malicious action itself.

The cool thing about behavioral detection is that it works entirely independently and does not discriminate. Even the most trusted corporation on the planet might one day make a mistake and publish an update to their software that has a vulnerability. Or maybe they got hacked and are now being used in a supply chain attack - behavioral detection does not care whether something is done by a program from a trusted source or by a program your internet friend told you to download4). Games on Steam are generally considered to be “safe”, but fake Steam games exist and can live for over a month before being taken down. The location in which your browser's session cookies are stored is sacred and nobody should get to access it and then just casually make a sneaky transmission over the internet. No program should be able to load hundreds of files into memory per minute, garble their contents and then save them back to disk. Windows does not like programs that it does not know. It is quite hard these days to run scripts and programs you, or someone you know, made. Sometimes it will just refuse outright. But behavioral detection can genuinely make a difference here.
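
To give a feel for what such a rule might look like, here is a toy monitor for the "garble hundreds of files per minute" case. It assumes it gets told about every file write (process name, timestamp, new content), counts only writes whose content looks random (high Shannon entropy, which is what encrypted data looks like), and flags a process that does too many of them within a time window. The class name, the limits and the way events arrive are all assumptions for this sketch; hooking real file writes is the operating-system-specific hard part that is left out here.

```python
import math
from collections import Counter, defaultdict, deque


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: about 8.0 for random-looking data, low for plain text."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(data).values())


class RewriteMonitor:
    """Flags processes that write many random-looking files in a short time."""

    def __init__(self, max_rewrites: int = 100, window_seconds: float = 60.0,
                 entropy_threshold: float = 7.0) -> None:
        self.max_rewrites = max_rewrites
        self.window_seconds = window_seconds
        self.entropy_threshold = entropy_threshold
        self._recent: dict[str, deque] = defaultdict(deque)

    def record_write(self, process: str, timestamp: float, new_content: bytes) -> bool:
        """Return True if the process should be shut down."""
        if shannon_entropy(new_content) < self.entropy_threshold:
            return False  # ordinary-looking content is not counted
        recent = self._recent[process]
        recent.append(timestamp)
        # Forget suspicious writes that fell out of the time window.
        while recent and timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) > self.max_rewrites
```

Note what the rule does not look at: who published the program, whether it is signed, or where it was downloaded from. It also shows why this is hard to get right, since compressed archives and media files are high-entropy too, so a naive rule like this would need a lot of tuning before it stops shutting down your backup software.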

The advantage of this approach is that, regardless of whatever program you throw at it - known or not, popular or not, new or not, published by a trusted source or not, considered perfectly safe or not, system online or not, downloaded from a shady website or not, run from an unknown USB drive or not - behavioral detection can spot them all (while non-malicious programs are fine). Additionally, behavioral detection can spot all other kinds of exploits not delivered directly through an executable file. Even if a trusted program is weaponized, even if the source of the problem is a malicious image file or sound file, even if the source of the problem is a programming oversight in the code of a programming library which lets remote hackers send arbitrary code to your machine, which it will then execute - behavioral detection sees that. Even supply chain attacks, wherein bad actors gain access to a trusted program's development infrastructure and insert malicious code into it, which will then be distributed quickly to a vast number of people, especially businesses, can be caught by behavioral detection.

The downside is that behavioral detection is hard. To understand which system operations exactly are malicious is already difficult enough for humans to agree on. Then putting that from words on paper into actual code, all the while working around the limitations and kinks of the operating system's own security measures… yeah, it's a pain. But - the ambition is there and some of the results are quite impressive. It is, like all other things, just yet another layer of protection, and all layers of protection have cracks and weaknesses. Seriously, as much as I am praising behavioral detection here, I am praising the concept of behavioral detection - actual implementations vary in quality and are often held back by serious capability restrictions or just plain poor quality - current behavioral detection products on the market are not to be relied upon. In fact, no one single product should ever be solely relied upon.

How protection doesn't work

1. Common Sense


You are perfectly right, humans are the biggest threat to their PC. People are the Problem. Therefore, we should not trust humans with keeping a PC safe. Not this guy, not your mom, not you - regardless of how good you think you are. Get proper antimalware.

Common sense is the single most frequent piece of advice that can be found on the internet. And it's true - human judgement can be a good and sometimes even the most effective layer of protection against threats of all kinds. In all cases, common sense should be the first layer of defense. But human judgement is prone to failure - that's why car accidents happen all the time. That's why most plane crashes happen. To say that you should primarily use common sense is to say that you should just drive better, that you should just “not make mistakes”. This is not how reality works.

With heavy and potentially dangerous machines, operator training and safe handling standards are one half of the equation. The other half sits in the design department with skilled and knowledgeable people who recognize that even the most knowledgeable and experienced operator will make a mistake. A brief moment of distraction, inattention. Tiredness, exhaustion or sometimes maybe just plain stupidity. Any number of cognitive biases, confirmation bias for example. Pilots will know best that the biggest threat to modern airplanes is the human sitting in the cockpit, and they are extensively trained to resist the kind of errors humans often make. To deny this reality is to deny decades of statistics and the science of risk management. The human factor is rule #1. Don't fall victim to rule #1.

Proper education and training are a crucial part of protecting users from cybersecurity threats, but on their own they are terrible advice to give, especially if expressed as “common sense” (which everyone thinks they have, unlike education and training) or if described as “enough” precaution (or anything close to it). It is not enough, it isn't even close to enough.

Risk management, a proper science that would never even think about suggesting something as ridiculous as this, is about minimizing risks at every stage of the process - at the human level, sure, but also at the mechanical level. That's why 50% of the resources of product design go into researching how humans could possibly fuck up using the product, and then minimizing the ways in which it can happen in the first place or how to minimize the potential damage.

And that still doesn’t cover risks beyond your control. Supply chain attacks, insecure devices on your network, careless family members, colleagues, or even vendors whose own security might be weak, remote code execution vulnerabilities: all of these are threats you cannot fix with judgment alone. The digital landscape has countless entry points far beyond simply “not downloading shady files”, and treating common sense as the primary defense ignores the true scale of the problem.

2. Windows Defender?

There is a pervasive myth that common sense plus Windows Defender are enough to keep you safe. Or that Windows Defender is as safe as or safer than other products on the market. Safety, of course, is measured in risk, and the only truth here is that different combinations of precautions lead to different levels of risk. But still, the idea is that common sense plus Windows Defender is enough for the risk to be “low enough” for these people.

Maybe. In fact, that’s the setup I personally rely on. But the problem is that common sense is empty advice and, for much of its history, Windows Defender was genuinely terrible. The people who say “Common Sense + Defender” today are the same people who once said “just common sense.” The only real shift is that now we’re debating Defender itself.

Defender has been a properly horrible antimalware product for the longest time. Keeping a signature list of known malware is the most basic form of antimalware and any respectable antivirus product should ace this by default. However, Defender consistently missed well-known, widely publicized malware, including samples that were years old. Back then, tests showed detection rates around ~70% 5), while products like Kaspersky or Bitdefender hit 98%. Despite that, people insisted Defender was “good enough”, indicative of a complete lack of understanding of risk management or cybersecurity.

Now, Defender has improved. It finally catches the obvious malware, its detection rates have climbed to around 95%, and it’s become a passable baseline product. Ironically, this means that people who were wrong for years are now accidentally right, but for the wrong reasons. Their logic hasn’t improved - the facts just shifted closer to their narrative.

Even so, Defender still has major shortcomings. Its scanning is largely signature-based, with minimal static analysis and weak behavioral detection. There’s some anti-ransomware with protected folders, but it has questionable reliability. It actually gets worse: a single shell command can disable it completely, delete its signature definition files, or set the whole PC as an exception. Registry tweaks and group policy edits allow malware to bypass it. And these are not obscure attacks - they’re widely known in the cybersecurity community. Most of these weaknesses still exist and properly advanced malware will get through.

1)
Unlike stuff like kernel-level anticheat, which does notify you and which does give you a choice.
2)
Windows Defender is already checking out at this point, because it has almost nothing to offer beyond signatures.
3)
Yes yes I know actually making sure your samples don't escape the sandbox is a whole different story.
4)
And then “disable your firewall before running”.
antimalware_software.1780035862.txt.gz · Last modified: by ultracomfy
