# Reverse Engineering of the proprietary protocol used by the HMU

In the previous post we extracted data from the Home Monitoring Unit by emulating either the modem or the backend server, taking advantage of the lack of mutual authentication between the device and the backend server. We discovered that the device first authenticates itself to the backend service before sending some raw data. In this post we will go through the reverse engineering of the proprietary protocol used to send the data and then use this knowledge to decrypt the data we extracted.

## Reverse engineering the protocol

Before diving into the analysis of the data, I need to provide you with more detail on why we are starting with the binary analysis. When I gathered these binaries, I did not have the firmware yet (and was actually not even sure I would get it at some point) because I was waiting for help with the soldering of the connectors to the board (see post 2). I thus decided to try to understand the protocol by just analyzing the raw data obtained.

### Binary analysis

When confronted with such unknown data, there are some common steps that can be taken to catch low-hanging fruit, if any. We can first check whether the data is encoded, compressed or encrypted in some way. To do so, the strings utility is quite handy to check for any ASCII text in the data. In this case it gave nothing. We can also use binwalk to see if there are any known headers (compressed files for instance) inside the binary, but again, no result in this case. Finally, we can calculate the entropy of the data to get an idea of whether it is compressed or encrypted.
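
The entropy measurement mentioned above can be reproduced without any external tool. The snippet below is a minimal sketch of the Shannon entropy in bits per byte, which is the figure that `ent` reports. The function name `shannon_entropy` is my own and is not taken from the script used in this project.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # A single repeated byte carries no information ...
    print(f"{shannon_entropy(b'A' * 256):.2f} bits per byte")
    # ... while a buffer using every byte value equally often reaches the maximum.
    print(f"{shannon_entropy(bytes(range(256))):.2f} bits per byte")
```

Values close to 8 point to compressed, encrypted or random data, while plain text and structured logs usually score much lower.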


    $ cat hex_data.txt | xxd -r -p | ent
    Entropy = 7.926409 bits per byte.

As shown by the listing above, the entropy is very close to 8 bits per byte, meaning that it is either random data (which I could not exclude at that time) or encrypted data. In my understanding, three kinds of data could be sent using that communication channel:

• Patient’s data
• Communication related to software updates or HMU configuration
• Logs (errors, etc.)

However, it seemed unlikely to me that patient data was being sent at that moment because we had no pacemaker connected to the HMU. My guess was mainly on logs because no “OTA” update mechanism is ever mentioned in the HMU manual.

From there, I collected more data samples over a 48-hour monitoring session of the HMU while my computer was emulating the modem. That way I had both data from the same session (i.e. sent with the HMU restarting the process on its own) and data from different sessions (me restarting the device manually). Data captured in different sessions is shown in the listing below, and as one can clearly tell, it is not completely random!

    05072e00000d[REDACTED]08090d909787d0c71d... # session 1 (msg13)
    0502fe000000[REDACTED]0802b5742b63f5343d... # session 2 (msg1)
    05030e000001[REDACTED]080d4caa019dee6028... # session 2 (msg2)
    05030e000002[REDACTED]080db29ad304f90ffe... # session 2 (msg3)

Indeed, the first 12 bytes were clearly not random, but most likely belonged to a header, as the analysis presented in Table 1 shows. However, it does not seem to follow any known protocol.

| Byte(s) | Observation |
|---------|-------------|
| 0 | Always 0x05 |
| 1-2 | Corresponds approximately to the length of the data sent |
| 3-5 | A counter that increments each time a packet is sent |
| 6-9 | The same 4-byte sequence, which corresponds to the HMU ID once converted to integer |
| 10 | Always 0x08, which I assumed is the start of the encryption packet |
| 11 | Always between 0x00 and 0x0e, which could be a padding (thus the encryption) |

Based on that table, I made the following assumption on the protocol header:

In addition to the potential padding, another observation that we can make here is that the potential length of the packet (bytes 1 and 2 in the table) is always congruent to 14 modulo 16 (and by extension to 6 modulo 8). In other words, the data was likely encrypted using a block cipher such as DES, 3DES, AES or Blowfish, which all have a 64-bit or 128-bit block size. I was quite confident about this finding, but a little bit stuck: even if the algorithm was DES, which is not secure anymore, it would have been difficult for me to brute force the key with my personal computer. Fortunately, I soon got my hands on the firmware and was able to move forward.

### Filling the gap with the firmware

Being a complete novice in reverse engineering, I was not sure what the proper way to go was. It was indeed my first time having to deal with an embedded device’s firmware and my thesis deadline was approaching. Luckily, I had an introductory lab on microcontrollers back in school and I relied heavily on it during this phase of my project.

#### Setting up everything in Ghidra

To stay in line with one of the goals of my project, which was to use only COTS equipment and software, I decided to go for Ghidra as the reverse engineering tool. Ghidra was made open source on GitHub by the NSA at the beginning of April 2019, one month before I started to work on the firmware. An obvious benefit of Ghidra is that it was said to be the “open source equivalent of IDA”, but on the other hand, there was very little documentation and few tutorials available at that time.
Ghidra is, however, quite user-friendly I must say, and it perfectly fitted my needs.

As a reminder, in the second post of this series, we managed to dump several parts of the memory. The next step was thus to load those binaries into Ghidra at the right base addresses and, as always when it comes to microcontrollers, this information can be found in the datasheet, using the memory map. In the case of the AT91RM9200, that gives us:

• Bootloader: 0x0000 0000
• SDRAM: 0x1000 0000
• Flash: 0x2000 0000

Also, thanks to the datasheet, we know that we are facing ARM v4t and that this is little endian. To set everything up, I followed this wiki from Travis Goodspeed for the project md380tools available on GitHub, which proved very helpful as a starting point.

#### Finding the functions of interest

Given that I had less than 3 weeks to finish my project (both research and writing), I knew it would not be possible to do a thorough analysis of the firmware. I therefore settled on reverse engineering the proprietary protocol that is used to communicate with the backend server. To do so, we can leverage the log/debug strings in the firmware. For instance, grepping for “get” in the strings output of the RAM or Flash gives the interesting results presented in the listing below:

    $ cat sdram.img | strings | grep -i "get"
    ...
    Error: GetContainerFromGroup: sanity check failed
    Error: GetContainerFromGroup: CRC error in msg container
    GetDataFromMessageLayer sanity: status not OK
    Wrong frame ID in GetDataFromMessageLayer
    CRC check in GetDataFromMessageLayer
    GetDataFromCompressionLayer sanity
    GetDataFromEncryptionLayer: too many padding bytes
    GetDataFromTransportLayer sanity
    CRC check in GetDataFromTransportLayer
    GetDataFromTransportLayer:start
    TransportLayerToFifo: GetDataFromTransportLayer()
    GetDataFromEncryptionLayer:start
    TransportLayerToFifo: GetDataFromEncryptionLayer()
    GetDataFromCompressionLayer:start
    TransportLayerToFifo: GetDataFromCompressionLayer()
    GetDataFromMessageLayer:start
    TransportLayerToFifo: GetDataFromMessageLayer()
    Get Container from Group:start
    ...
    GetDataFromEncryptionLayer: wrong ID byte (%02Xh): expected TRIPLE_DES_CBC (%02Xh) or AES_CBC (%02Xh)!
    GetDataFromEncryptionLayer: wrong ID byte (%02Xh): expected DES (%02Xh), TRIPLE_DES_CBC (%02Xh) or AES_CBC (%02Xh)!
    ...
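
For readers without the `strings` utility at hand, the same search can be approximated in a few lines of Python. This is only a rough sketch: the helper names `extract_strings` and `grep_strings` are hypothetical, and the printable-character range and the minimum length of 4 are assumptions chosen to resemble the usual behaviour of `strings`.

```python
import re


def extract_strings(blob: bytes, min_len: int = 4) -> list[str]:
    """Return every run of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(blob)]


def grep_strings(blob: bytes, keyword: str, min_len: int = 4) -> list[str]:
    """Case-insensitive filter, roughly `strings | grep -i keyword`."""
    needle = keyword.lower()
    return [s for s in extract_strings(blob, min_len) if needle in s.lower()]


if __name__ == "__main__":
    # Tiny made-up blob standing in for a memory dump such as sdram.img.
    blob = (
        b"\x00\x01GetDataFromEncryptionLayer:start\x00\xff"
        b"\x02abc\x00CRC check in GetDataFromTransportLayer\x00"
    )
    for line in grep_strings(blob, "get"):
        print(line)
```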

These different strings confirmed my suspicion that a proprietary protocol was indeed used here. Having the strings is nice, but now we need to get to the corresponding code. To do so, we can use the “Show References” feature of Ghidra (right click on the address where a string is stored > References > Show References To Address). Applied to the string related to the encryption error, we end up in a big function (there are two functions in reality: GetDataFromEncryptionLayer and PackToEncryptionLayer). Using the decompiler of Ghidra, we can look at the reconstructed C code, which contains an interesting pattern:


    int var;

    FUNCTION_P(start_string_A);

    var = FUNCTION_A(something);
    if (var == 0) {
        // Do something
    }
    else {
        FUNCTION_P(error_string_A);
    }

FUNCTION_P was clearly a “print-like” function, and FUNCTION_A the function referenced in both start_string_A and error_string_A. Figure 2 shows parts of the decompiled code where the most important functions and variables have been renamed, and with comments to highlight the different layers of the protocol.

This procedure can be repeated to identify many more functions in the firmware, and thus to get a better understanding of the code. I will not give more details in this post, but more information is available in the thesis.

If we sum up, we were able to identify the following:

• The general function in charge of packing the data (whose code is partially exposed in Figure 2)
• The functions handling the encapsulation for each layer:
  • PackToMessageLayer
  • PackToCompressionLayer
  • PackToEncryptionLayer
  • PackToTransportLayer
• The compression format used is “deflate”, which is a “lossless compressed data format” defined in RFC 1951.
• The encryption algorithms used: DES, 3DES CBC and AES CBC

If we look at part of the PackToEncryptionLayer (figure 3), we can see the different algorithms used for encryption, and also the “op code” used to represent each of them: 6 stands for DES, 7 for 3DES CBC and 8 for AES CBC. The 8 looks familiar, right?
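
The op codes above are enough to tell which cipher a captured packet announces. Here is a small sketch, assuming (as the header analysis suggested) that the op code is the first byte of the Encryption Layer. The names `ENCRYPTION_OPCODES` and `identify_cipher` are mine, not the firmware's.

```python
# Op codes as read from the decompiled PackToEncryptionLayer function.
ENCRYPTION_OPCODES = {0x06: "DES", 0x07: "TRIPLE_DES_CBC", 0x08: "AES_CBC"}


def identify_cipher(encryption_layer: bytes) -> str:
    """Name the cipher announced by the first byte of an Encryption Layer packet."""
    if not encryption_layer:
        raise ValueError("empty packet")
    opcode = encryption_layer[0]
    return ENCRYPTION_OPCODES.get(opcode, f"unknown (0x{opcode:02x})")


if __name__ == "__main__":
    # Bytes following the HMU ID in the "session 2 (msg1)" capture.
    print(identify_cipher(bytes.fromhex("0802b5742b63f5343d")))  # AES_CBC
```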

With the new information in our possession we can now have a full view of the protocol used here, as presented in Figure 4:

## Decryption!

### Modem and GPRS data

Alright, almost there! We now have everything we need to decrypt the data we extracted from the HMU. Everything… except the AES key. Having the firmware, I was confident I could understand where the key was coming from. As can be seen in Figure 4, the key can be identified in the PackToEncryptionLayer function, and it is thus possible to trace where it comes from, which I did. I also wrote a Python script to automate the process of analyzing the data (I started it when I had no firmware, hence the additional info that can be observed at step 1 in the listing below). I tried to decrypt the data with what I believed to be the key, but it did not work. Sad, huh?

I could not give up just yet and decided to “brute force” the AES key. My reasoning was the following: “given the components on the board, the key has to be loaded in RAM at some point before the encryption is performed, so if I put a breakpoint just before the execution of the PackToEncryptionLayer function, and dump the RAM at that moment, the AES key has to be in it. If it is in it, then I just have to try all possible keys, which for a 2MB RAM is… 2 097 152 keys, and testing around 2 million keys is possible even on my personal laptop using python.”

But the next question was: how can we detect which key is the correct one? At that point we know that the Compression Layer is encapsulated in the Encryption Layer. Conveniently, we also know that the Compression Layer’s data will start with a known header: 0x9F (Compression Layer’s “op code”) + 0x1F8B (header of a gzip file). I thus implemented this check in my script, and it worked like a charm! Out of curiosity, I checked the obtained key against the one I had found by looking into the code directly and it was the same. I am still unsure why I got it wrong the first time, but if I had to bet, it would be on the late hour of the night and me forgetting to take the endianness into account…
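
The search described above boils down to sliding a key-sized window over the dump and checking the first decrypted block against the known header. The sketch below illustrates that idea only and is not the actual cm-decrypt.py code. All names are hypothetical, the IV is assumed to be available from the packet, and `toy_decrypt_block` is a plain XOR stand-in so that the example runs without a crypto library. A real run would pass an AES single-block decryption function instead.

```python
from typing import Callable, Iterator, Optional, Tuple


def candidate_keys(dump: bytes, key_len: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, key) for every key_len-byte window of the memory dump."""
    for offset in range(len(dump) - key_len + 1):
        yield offset, dump[offset:offset + key_len]


def find_key(
    dump: bytes,
    first_block: bytes,
    iv: bytes,
    decrypt_block: Callable[[bytes, bytes], bytes],
    known_prefix: bytes,
    key_len: int = 16,
) -> Optional[Tuple[int, bytes]]:
    """Return (offset, key) of the first window that yields the known prefix."""
    for offset, key in candidate_keys(dump, key_len):
        # CBC: first plaintext block = D_key(first ciphertext block) XOR IV
        raw = decrypt_block(key, first_block)
        plain = bytes(a ^ b for a, b in zip(raw, iv))
        if plain.startswith(known_prefix):
            return offset, key
    return None


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    """Stand-in for a real block cipher (NOT AES): XOR the block with the key."""
    return bytes(a ^ b for a, b in zip(block, key))


if __name__ == "__main__":
    key = bytes(range(1, 17))
    iv = bytes(range(16, 32))
    prefix = bytes([0x9F, 0x1F, 0x8B])  # op code + gzip magic, as given above
    plaintext = prefix + bytes(13)
    # Toy CBC encryption of one block (XOR is its own inverse).
    first_block = toy_decrypt_block(key, bytes(a ^ b for a, b in zip(plaintext, iv)))
    dump = bytes(100) + key + bytes(100)  # made-up "RAM" with the key planted at 100
    print(find_key(dump, first_block, iv, toy_decrypt_block, prefix))
```

Only one block has to be decrypted per candidate, which is why going through roughly two million windows stays cheap.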

Anyway, I was pretty happy with my script, because it finds the key in both the RAM and Flash in less than one minute (much less in the Flash):

    $ python3 cm-decrypt.py validation-tests/file_modem.bin -b validation-tests/ram_dump/ram_b_PackToEncryptionLayer.img

    ** CM DECRYPT v1.0 **

    [*] Opening validation-tests/file_modem.bin...

    -- FILE INFORMATION --

    File size: 2514 bytes (hex: 9D2)
    File created: Sat May 18 00:42:41 2019
    File modified: Sat May 18 00:42:41 2019
    File entropy: 7.91 bits per byte

    [*] Data sanitized!
    [*] Brute force mode
    [*] Binary: validation-tests/ram_dump/ram_b_PackToEncryptionLayer.img

    ** Transport Layer **

    Type of packets: 5 (hex: 05)
    Length: 2478 bytes (hex: 09ae)
    Unknown: 2 (hex: 02)
    Packet ID: 0 (hex: 0000)
    CM ID: [REDACTED] (hex: [REDACTED])
    Checksum: 47070 (hex: b7de)

    [*] Brute forcing the AES key ...
    [*] 2097152 keys to try
    ..............................................................
    ..............................................................
    ..................
    [*] Key Found in 37.69s!
    Key: [REDACTED]
    Addr: 0x0015d180

Once the decompression layer is taken care of, we end up with the Message Layer. However, this layer mainly contained log messages (indicating the HMU version and various network connection errors). This was kind of expected as there is no pacemaker connected to the HMU. However, this is also the layer that would have contained the patient data.

    $ python3 cm-decrypt.py -k [REDACTED] validation-tests/file_modem.bin

    ** CM DECRYPT v1.0 **

    [*] Opening validation-tests/file_modem.bin...

    -- FILE INFORMATION --

    File size: 2514 bytes (hex: 9D2)
    File created: Sat May 18 00:42:41 2019
    File modified: Sat May 18 00:42:41 2019
    File entropy: 7.91 bits per byte

    [*] Data sanitized!
    [*] Decrypt mode

    ** Transport Layer **

    Type of packets: 5 (hex: 05)
    Length: 2478 bytes (hex: 09ae)
    Unknown: 2 (hex: 02)
    Packet ID: 0 (hex: 0000)
    CM ID: [REDACTED] (hex: [REDACTED])
    Checksum: 47070 (hex: b7de)

    ** Encryption Layer **

    Length of the packet: 2466 (hex: 9a2) (div by 16? No)
    Type of packet: 8 (hex: 08) => AES_CBC
    IV: 5c8aff9dd3a7bb54f04915bc18e96fbb
    Key: [REDACTED]

    ** Compression Layer **

    Compression packet: Yes
    Magic header: 0x1f8b => gzip compressed data, from Unix
    Entropy: 7.72

    ** Message Layer **

    Size of the recovered data: 28538 bytes
    Number of packets: 105
    Entropy: 3.94
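
The Transport Layer fields printed in the output above map onto the first 10 bytes of a packet. The snippet below is a sketch of how they could be parsed, assuming big-endian fields (0x09ae is printed as 2478) and the field widths suggested by that output. The class and function names are hypothetical, the CM ID is replaced by zeros since the real one is redacted, and the checksum is left out because its position in the packet is not given here.

```python
import struct
from typing import NamedTuple


class TransportHeader(NamedTuple):
    packet_type: int  # always 0x05 in the captures
    length: int       # 2 bytes
    unknown: int      # 1 byte of unknown purpose
    packet_id: int    # 2-byte counter
    cm_id: int        # 4-byte device identifier


def parse_transport_header(packet: bytes) -> TransportHeader:
    """Parse the first 10 bytes of a packet, assuming big-endian fields."""
    if len(packet) < 10:
        raise ValueError("need at least 10 bytes")
    return TransportHeader(*struct.unpack(">BHBHI", packet[:10]))


if __name__ == "__main__":
    # Header values from the output above, with a placeholder CM ID.
    header = parse_transport_header(bytes.fromhex("0509ae020000" + "00000000"))
    print(header)
    # The length field is congruent to 14 modulo 16, the block-cipher hint noted earlier.
    print(header.length % 16)  # 14
```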

All of this was performed on the T-Line version, and one thing remained to be tried: was it the same key on the GSM version? As we also have some data coming from the GSM version, I tried to decrypt it using the key obtained on the T-Line version but had no success, which is a good thing! However, now knowing that the key was stored in the flash, we tried to dump the firmware of the GSM version. Without a proper connector, we just used jumper wires held against the pins by two people while a third one launched the OpenOCD “script”. The operation crashed after a few seconds due to the difficulty of holding still and we did not manage to get the full firmware (we have obtained it since, as we acquired much better equipment). However, with only the partial dump and the brute force script, we obtained the key without any problem in less than 10 seconds and decrypted the data in exactly the same way as on the T-Line version, confirming that both devices use the same protocol (but individual crypto material).

### SMS data

As presented in the previous post, we are able to extract not only data sent through GPRS but also SMS data, directly by eavesdropping on the communication line between the microcontroller and the modem. Taking only the payload from the SMS, we are left with the following raw data (for instance):


    0604cac0441230acf7b8ef24287fa3954ca200afbbdc44c170dbca0f125cda043f298b476f6377c4f22f2
    5ff99e976f9b066ca00aa5a25ab8a47218d21f232ed41aed4e0f22f42e6189968d7cf6b965b73a768d7cf
    6b965b73a768d7cf6b965b73a7f6beb75a443cfcfe

No need to go through the whole analysis process all over again. It seemed that we were facing the Encryption Layer, with DES encryption (indicated by the leading “06”). Naively, I assumed that I could simply adapt the brute force script to handle DES encryption instead of AES CBC. I tried it but got no result after all keys were tried. That could mean two things:

• The DES key was not in the partial dump we obtained (no way to check this time as I did not have the code).
• The data that was encrypted was not compressed and thus not starting with the magic header 0x091F8B.

Given that I had the AES key, I assumed that the second option was more likely, and also assumed that I could probably determine the key using the entropy of the decrypted data instead of the magic header. The drawback of this is that all keys have to be tried, and each candidate requires decrypting the data rather than just checking a header, so it takes longer to finish. However, as explained above, that was not really a problem, especially given a partial dump. The idea is thus to try all possible keys, compute the entropy of the decrypted data, and always keep the key giving the lowest entropy (which ends up being around 4).
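
The selection rule described above can be sketched as follows. This is an illustration under assumptions rather than the real script: the names are hypothetical, the key length of 8 bytes matches a DES key, and `toy_decrypt` is a hash-based stand-in cipher so that the example runs with the standard library only. What matters is that a wrong key produces output that looks random, while the right key produces low-entropy log text.

```python
import hashlib
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def lowest_entropy_key(dump: bytes, ciphertext: bytes, decrypt, key_len: int = 8):
    """Try every key_len-byte window; return (entropy, offset, key) of the best one."""
    best = None
    for offset in range(len(dump) - key_len + 1):
        key = dump[offset:offset + key_len]
        score = shannon_entropy(decrypt(key, ciphertext))
        if best is None or score < best[0]:
            best = (score, offset, key)
    return best


def toy_decrypt(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT DES): XOR with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


if __name__ == "__main__":
    key = bytes.fromhex("1337c0decafebabe")                   # made-up key
    logs = b"ERROR: connection failed, retrying in 30 s\n" * 8  # made-up log text
    ciphertext = toy_decrypt(key, logs)                       # XOR stream: encrypt == decrypt
    dump = bytes(64) + key + bytes(64)                        # made-up partial dump
    entropy, offset, found = lowest_entropy_key(dump, ciphertext, toy_decrypt)
    print(f"offset={offset} key={found.hex()} entropy={entropy:.2f}")
```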

This method worked surprisingly well: the script found the DES key and I was able to decrypt the data. Again, it was log messages, containing mainly the application version along with errors. The use of DES to encrypt this data is much more of a security issue, as intercepting SMS traffic does not require physical access to the device. However, I learned later when discussing with the manufacturer that SMS messages are not used to send patient data, only the HMU’s logs.

## Conclusion

We have reached the end of this long post! This part of the project was probably the one I enjoyed the most: I have always been interested in cryptography, and having the opportunity to face it somewhere other than in a CTF was very exciting. I also learned a lot during the process, especially about reverse engineering.

Now, you might wonder what one can do with the different findings shown in the previous posts. In the final post of this series, we will see how we can use the obtained knowledge to mount a Man-in-the-Middle attack between the HMU and the backend server.