  [0x01] introduction
  [0x02] how software firewalls work
  [0x03] process Infection without external .dll
  [0x04] problems of implementation
  [0x05] how to implement it
  [0x06] limits of this implementation
  [0x07] workaround: another infection method
  [0x08] conclusion
  [0x09] last words
  [0x0A] references
  [0x0B] injector source code
==Phrack Inc.==
Volume 0x0b, Issue 0x3e, Phile #0x0d of 0x10
|=--=[ Using Process Infection to Bypass Windows Software Firewalls ]=--=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ rattle ]=--------------------------------=|
-[0x01] :: introduction --------------------------------------------------
This entire document refers to a feature of software firewalls available for Windows, which is called "outbound detection". This feature has nothing to do with the original idea of a firewall, blocking incoming packets from the net: The outbound detection mechanism is meant to protect the user from malicious programs that run on his own computer - programs attempting to communicate with a remote host on the Internet and thereby leaking sensitive information. In general, the outbound detection controls the communication of local applications with the Internet.
In a world with an increasing number of trojan horses, worms and viruses spreading in the wild, this is actually a very handy feature and certainly, it is of good use. However, ever since I first learned about software firewalls, I have been wondering whether they could actually provide a certain level of security at all: After all, they are just software supposed to protect you against other software, and this sounds like a bad idea to me.
To make a long story short, this outbound detection can be bypassed, and that's what will be discussed in this paper. Moreover, I believe that if it is possible to bypass this one restriction, it is somehow possible to bypass other restrictions as well. Personal firewalls are software trying to control other software. It should in any case be possible to turn this around by 180 degrees, and create a piece of software that controls the software firewall.
How to achieve this in practice is also part of the discussion that will follow: I will not just keep on talking about abstract theory. It will be explained and illustrated with sample source code how to bypass a software firewall by injecting code into a trusted process. It might be interesting to you that the method of runtime process infection that will be presented and explained does not require an external DLL - the bypass can be performed by a tiny stand-alone executable.
Thus, this paper is also about coding, especially Win32 coding. To understand the sample code, you should be familiar with Windows, the Win32 API and basic x86 Assembler. It would also be good to know something about the PE format and related things, but it is not necessary, as far as I can see. I will try to explain everything else as precisely as possible.
Note: Numbers enclosed in parentheses within the document, such as (1), are references to further sources. See [0x0A] for more details.
-[0x02] :: how software firewalls work -----------------------------------
Of course, I can only speak about the software firewalls I have seen and tested so far, but I am sure that these applications are among the most widely used ones. Since all of them work in a very similar way, I assume that the concept is a general concept of software firewalls.
Almost every modern software firewall provides features that simulate the behaviour of hardware firewalls by allowing the user to block certain ports. I have not had a close look at these features and once more I want to emphasize that breaking these restrictions is outside the scope of this paper.
Another important feature of most personal firewalls is the concept of giving privileges and different levels of trust to different processes that run on the local machine to provide a measure of outbound detection. Once a certain executable creates a process attempting to access the network, the executable file is checksummed by the software firewall and the user is prompted whether or not he wants to trust the respective process.
To perform this task, the software firewall is most probably installing kernel mode drivers and hooks to monitor and intercept calls to low level networking routines provided by the Windows OS core. Accordingly, the user can trust a process to connect() to another host on the Internet, to listen() for connections or to perform any other familiar networking task. The main point is: As soon as the user gives trust to an executable, he also gives trust to any process that has been created from that executable. However, once we change the executable, its checksum would no longer match and the firewall would be alerted.
So, we know that the firewall trusts a certain process as long as the executable that created it remains the same. We also know that in most cases, a user will trust his web browser and his email client.
-[0x03] :: process Infection without external .dll -----------------------
The software firewall will only calculate and analyze the checksum for an executable upon process creation. After the process has been loaded into memory, it is assumed to remain the same until it terminates.
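The trust model described above can be illustrated with a small, portable C sketch. It does not reflect how any particular firewall is implemented: the FNV-1a hash and the names image_checksum, firewall_trusts and the two demo functions are assumptions made purely for illustration. The point it shows is that a check made against the file on disk at process creation says nothing about what the loaded copy in memory looks like later.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in checksum (FNV-1a). Real products use their own algorithms;
 * this one is only an assumption for the sketch. */
uint32_t image_checksum(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical firewall decision, taken once at process creation:
 * the executable file is trusted if its checksum is the known one. */
int firewall_trusts(const unsigned char *file_on_disk, size_t len,
                    uint32_t trusted_sum)
{
    return image_checksum(file_on_disk, len) == trusted_sum;
}

/* Returns 1 if the file is still trusted although the in-memory
 * copy has been modified after "loading". */
int memory_patch_goes_unnoticed(void)
{
    unsigned char file_on_disk[] = "original browser code";
    unsigned char in_memory[sizeof file_on_disk];
    uint32_t trusted = image_checksum(file_on_disk, sizeof file_on_disk);

    memcpy(in_memory, file_on_disk, sizeof file_on_disk); /* "load"   */
    in_memory[0] ^= 0xFF;                                 /* "infect" */

    return firewall_trusts(file_on_disk, sizeof file_on_disk, trusted)
        && memcmp(in_memory, file_on_disk, sizeof file_on_disk) != 0;
}

/* Returns 1 if changing the file itself breaks the trust decision. */
int file_patch_is_detected(void)
{
    unsigned char file_on_disk[] = "original browser code";
    uint32_t trusted = image_checksum(file_on_disk, sizeof file_on_disk);

    file_on_disk[0] ^= 0xFF;  /* modify the executable on disk */

    return !firewall_trusts(file_on_disk, sizeof file_on_disk, trusted);
}
```

Both demo functions return 1 in this toy model: the modified file is rejected, while the modified memory image is never looked at.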
And since I have already spoken about runtime process infection, you certainly have guessed what will follow. If we cannot alter the executable, we will directly go for the process and inject our code into its memory, run it from there and bypass the firewall restriction.
If this was a bit too fast for you, no problem. A process is loaded into random access memory (RAM) by the Windows OS as soon as a binary, executable file is executed. Simplified, a process is a chunk of binary data that has been placed at a certain address in memory. In fact, there is more to it. Windows does a lot more than just writing binary data to some place in memory. For making the following considerations, none of that should bother you, though.
For all of you who are already familiar with means of runtime process infection - I really dislike DLL injection for this purpose, simply because there is definitely no option that could be considered less elegant or less stealthy.
In practice, DLL injection means that the executable that performs the bypass somehow carries the additional DLL it requires. Not only does this heavily increase the size of the entire code, but this DLL also has to be written to disk on the affected system to perform the bypass. And to be honest - if you are really going to write some sort of program that needs a working software firewall bypass, these are exactly the flaws you want to avoid. Therefore, the presented method of runtime process infection will work completely without the need of any external DLL and is written in pure x86 Assembly.
To sum it all up: All that is important to us now is the ability to get access to a process' memory, copy our own code into that memory and execute the code remotely in the context of that process.
Sounds hard? Not at all. If you have a well-founded knowledge of the Win32 API, you will also know that Windows gives a programmer everything he needs to perform such a task. The most important API call that comes to mind probably is CreateRemoteThread(). Quoting MSDN (1):
The CreateRemoteThread function creates a thread that runs in the address space of another process.
  HANDLE CreateRemoteThread(
    HANDLE hProcess,
    LPSECURITY_ATTRIBUTES lpThreadAttributes,
    DWORD dwStackSize,
    LPTHREAD_START_ROUTINE lpStartAddress,
    LPVOID lpParameter,
    DWORD dwCreationFlags,
    LPDWORD lpThreadId
  );
Great, we can execute code at a certain memory address inside another process and we can even pass one DWORD of information as a parameter to it. Moreover, we will need the following two API calls:

  VirtualAllocEx()
  WriteProcessMemory()

They give us the power to inject our own arbitrary code into the address space of another process - and once it is there, we will create a thread remotely to execute it.

To sum everything up: We will create a binary executable that carries the injection code as well as the code that has to be injected in order to bypass the software firewall. Or, speaking in high-level programming terms: We will create an exe file that holds two functions, one to inject code into a trusted process and one function to be injected.
-[0x04] :: problems of this implementation -------------------------------
It all sounds pretty easy now, but it actually is not. For instance, you will hardly be able to write an application in C that properly injects another (static) C function into a remote process. In fact, I can almost guarantee you that the remote process will crash. Although you can call the relevant API calls from C, there are many more underlying problems with using a high level language for this purpose. The essence of all these problems can be summed up as follows: compilers produce ASM code that uses hardcoded offsets. A simple example: Whenever you use a constant C string, this C string will be stored at a certain position within the memory of your resulting executable, and any reference to it will be hardcoded. This means that when your process needs to pass the address of that string to a function, the address will be completely hardcoded in the binary code of your executable.
Consider:
  #include <stdio.h>

  int main(void)
  {
      printf("Hello World");
      return 0;
  }
Assume that the string "Hello World" is stored at offset 0x28048 inside your executable. Moreover, the executable is known to load at a base address of 0x00400000. In this case, the binary code of your compiled and linked executable will somewhere refer to the address 0x00428048 directly.
A disassembly of such a sample application, compiled with Visual C++ 6, looks like this:
  00401597  ...
  00401598  push 0x00428048    ; the hello world string
  0040159D  call 0x004051e0    ; address of printf
  004015A2  ...

What is the problem with such a hardcoded address? If you stay inside your own address space, there is no problem. However ... once you move that code to another address space, all those memory addresses will point to entirely different things. The hello world string in my example is more than 0x20000 = 131072 bytes away from the actual program code. So, if you inject that code into another process space, you would have to make sure that at 0x00428048, there is a valid C string ... and even if there was something like a C string, it would certainly not be "Hello World". I guess you get the point.
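The effect can be reproduced in miniature without touching another process. The following portable C sketch is only an analogy: the struct blob type and all function names are made up for this illustration, and a plain memcpy() stands in for moving code and data to a different address. It keeps one absolute reference and one reference relative to the start of the blob, then checks which of the two still finds the string after the copy.

```c
#include <stddef.h>
#include <string.h>

/* A toy "module": two references to a string, plus the string itself. */
struct blob {
    const char *abs_ref;  /* absolute address, like push 0x00428048 */
    size_t      rel_ref;  /* offset from the start of the blob      */
    char        text[12]; /* "Hello World" plus terminating NUL     */
};

void blob_init(struct blob *b)
{
    memcpy(b->text, "Hello World", 12);
    b->abs_ref = b->text;                     /* hardcoded location  */
    b->rel_ref = offsetof(struct blob, text); /* position independent */
}

/* Resolve the relative reference against wherever the blob lives now. */
const char *blob_text_relative(const struct blob *b)
{
    return (const char *)b + b->rel_ref;
}

/* Returns 1 if the absolute reference inside the copy points at the
 * copy's own string, 0 if it still points at the old location. */
int absolute_ref_survives_copy(void)
{
    struct blob original, copy;

    blob_init(&original);
    memcpy(&copy, &original, sizeof copy);   /* "move" the blob */

    return copy.abs_ref == copy.text;
}

/* Returns 1 if base + offset still yields "Hello World" after the
 * copy, even when the original has been wiped. */
int relative_ref_survives_copy(void)
{
    struct blob original, copy;

    blob_init(&original);
    memcpy(&copy, &original, sizeof copy);   /* "move" the blob    */
    memset(&original, 0, sizeof original);   /* old location is gone */

    return strcmp(blob_text_relative(&copy), "Hello World") == 0;
}
```

The absolute reference keeps pointing at the old location, while the base-plus-offset reference follows the data. Position independent code has to address all of its own data in the second way.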
This is just a simple example and does not even involve all the problems that can occur. The addresses of all function calls are hardcoded as well, like the address of the printf function in our sample. In another process space, these functions might be somewhere else or they could even be missing completely - and this leads to the weirdest errors you can imagine. The only way to make sure that all the addresses are correct and that every single CPU instruction fits is to write the injected code in ASM.

Note: There are several working implementations for an outbound detection bypass for software firewalls on the net using a dynamic link library injection. This means that the implementation itself consists of one executable and a DLL. The executable forces a trusted process to load the DLL, and once it has been loaded into the address space of this remote process, the DLL itself performs any arbitrary networking task. This way of bypassing the detection works very well and it can easily be implemented in a high level language, but I dislike the dependency on an external DLL, and therefore I decided to code a solution with one single stand-alone executable that does the entire injection by itself. Refer to (2) for an example of a DLL injection bypass.
Also, LSADUMP2 (3) uses exactly the same measure to grab the LSA secrets from LSASS.EXE and it is written in C.
-[0x05] :: how to implement it -------------------------------------------
Until now, everything is just theory. In practice, you will always encounter all kinds of problems when writing code like this. Furthermore, you will have to deal with questions of detail that are only partially related to the main problem. Thus, let us leave the abstract part behind and think about how to write some working code.
Note: I strongly recommend that you browse the source code in [A] while reading this part, and it would most definitely be a good idea to have a look at it before reading [0x0B].
First of all, we want to avoid as many hardcoded elements as possible. And the first thing we need is the file path to the user's default browser. Rather than generally referring to "C:\Program Files\Internet Explorer\iexplore.exe", we will query the registry key at "HKCR\htmlfile\shell\open\command".
Ok, this will be rather easy, I assume you know how to query the registry. The next thing to do is calling CreateProcess(). The wShowWindow member of the STARTUPINFO structure passed to the function should be something like SW_HIDE (with STARTF_USESHOWWINDOW set in dwFlags, otherwise the value is ignored) in order to keep the browser window hidden.
Note: If you want to make entirely sure that no window is displayed on the user's screen, you should put more effort into this. You could, for instance, install a hook to keep all windows hidden that are created by the process or do similar things. I have only tested my example with Internet Explorer and the SW_HIDE trick works well with it. In fact, it should work with most applications that have a more or less simple graphical user interface.
To ensure that the process has already loaded the most essential libraries and has reached a generally stable state, we use the WaitForInputIdle() call to give the process some time for initialization.
So far, so good - now we proceed by calling VirtualAllocEx() to allocate memory within the created process and with WriteProcessMemory(), we copy our networking code. Finally, we use CreateRemoteThread() to run that code and then, we only have to wait until the thread terminates. All in all, the injection itself is not all that hard to perform.
The function that will be injected can receive a single argument, one double word. In the example that will be presented in [0x0B], the injected procedure connects to www.phrack.org on port 80 and sends a simple HTTP GET request. After receiving the header, it displays it in a message box. Since this is just a very basic example of a working firewall bypass code, our injected procedure will do everything on its own and does not need any further information.

However, we will still use the parameter to pass a 32 bit value to our injected procedure: its own "base address". Thus, the injected code knows at which memory address it has been placed, in the context of the remote process. This is very important as we cannot directly read from the EIP register and because our injected code will sometimes have to refer to memory addresses of data structures inside the injected code itself.

Once injected and placed within the remote process, the injected code basically knows nothing. The first important task is finding the kernel32.dll base address in the context of the remote process and, from there, getting the address of the GetProcAddress function to load everything else we need. I will not explain in detail how these values are retrieved, the entire topic cannot be covered by this paper. If you are interested in details, I recommend the paper about Win32 assembly components by the Last Stage of Delirium research group (4). I used large parts of their write-up for the code that will be described in the following paragraphs.

In simple terms, we retrieve the kernel32 base address from the Process Environment Block (PEB) structure, a pointer to which can be found inside the Thread Environment Block (TEB). The TEB of the current thread can always be reached through the FS segment register, thus we can easily get the PEB address as well. And since we know where kernel32.dll has been loaded, we just need to loop through its exports section to find the address of GetProcAddress(). If you are not familiar with the PE format, don't worry.
A dynamic link library contains a so-called exports section. Within this section, the offsets of all exported functions are assigned to human-readable names (strings). In fact, there are two arrays inside this section that interest us. There are actually more than 2 arrays inside the exports section, but we will only use these two lists. For the rest of this paper, I will treat the terms "list" and "array" equally, the formal difference is of no importance at this level of programming. One array is a list of standard, null-terminated C-strings. They contain the function names. The second list holds the function entry points (the offsets).
We will do something very similar to what GetProcAddress() itself does: We will look for "GetProcAddress" in the first list and find the function's offset within the second array this way.
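The lookup just described can be sketched in portable C against a made-up table. This is a simplified model and not the injected code from [0x0B]: struct toy_exports, toy_find_export and the sample offsets are assumptions for illustration, and the sketch assumes that a name and its entry point share the same index. In a real PE export directory, the link between the two goes through an additional array of ordinals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the two export arrays discussed in the text. */
struct toy_exports {
    size_t             count;
    const char *const *names;    /* array #1: function names        */
    const uint32_t    *offsets;  /* array #2: entry point offsets   */
};

/* Walk the name list; on a match, return the offset stored at the
 * same index in the second array. Returns 0 if the name is absent. */
uint32_t toy_find_export(const struct toy_exports *ex, const char *wanted)
{
    for (size_t i = 0; i < ex->count; i++) {
        if (strcmp(ex->names[i], wanted) == 0)
            return ex->offsets[i];
    }
    return 0;
}

/* Made-up sample data; these are not real kernel32.dll offsets. */
static const char *const sample_names[] = {
    "GetModuleHandleA", "GetProcAddress", "LoadLibraryA"
};
static const uint32_t sample_offsets[] = {
    0x1000u, 0x2000u, 0x3000u
};
const struct toy_exports sample_exports = {
    3, sample_names, sample_offsets
};
```

With the sample table, toy_find_export(&sample_exports, "GetProcAddress") yields 0x2000. Adding such an offset to the module's base address gives the address of the function inside the process.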
Unfortunately, Microsoft came up with an idea for their DLL exports that makes everything much more complicated. This idea is named "forwarders" and basically means that one DLL can forward the export of a function to another DLL. Instead of pointing to the offset of a function's code inside the DLL, the offset from the second array may also point to a null-terminated string. For instance, the function HeapAlloc() from kernel32.dll is forwarded to the RtlAllocateHeap function in ntdll.dll. This means that the alleged offset of HeapAlloc() in kernel32.dll will not be the offset of a function that has been implemented in kernel32.dll, but it will actually be the offset of a string that has been placed inside kernel32.dll. This particular string is "NTDLL.RtlAllocateHeap". After a while, I figured out that this forwarder string is placed immediately after the function's name in array #1. Thus, you will find this chunk of data somewhere inside kernel32.dll:
  48 65 61 70 41 6C 6C 6F    HeapAllo
  63 00 4E 54 44 4C 4C 2E    c.NTDLL.
  52 74 6C 41 6C 6C 6F 63    RtlAlloc
  61 74 65 48 65 61 70 00    ateHeap.
= "HeapAlloc
|