
Notice This article is published 2349 days ago, some contents may be deprecated.

AI Summary

This article discusses the potential security risks of Shadowsocks stream ciphers. Although Shadowsocks encryption is strong, stream ciphers lack data integrity, making them vulnerable to tampering. Through a redirect attack experiment, the author demonstrates that attackers can decrypt some traffic using a Shadowsocks server under certain conditions. Users are advised to use AEAD encryption or other tools to enhance security.

Shadowsocks is a "magical Internet browsing" tool that has accompanied countless users for many years. In recent years, however, as the GFW (Great Firewall) has grown taller, some Shadowsocks traffic can be reliably identified, and then: "Comrade, your ladder (circumvention tool) has collapsed".

Although it is well known that "the protocol can be identified", we still believe that Shadowsocks encryption is well done, and there should be no way to crack the plaintext information.

However, a paper I read one night shook my confidence in the data security of Shadowsocks a little and made me re-examine the security of stream ciphers. When I did security outreach in the past, I always mentioned that "ECB is not safe", but now I find that other modes and stream ciphers are not necessarily safe either: they do not guarantee the integrity of the data, so the data can be tampered with.

Shadowsocks is still relatively safe if configured properly, or at least it can make the current cracking methods useless. If you are still not at ease after reading this article, you can use other scientific Internet browsing tools.
Some of the pictures in this article are quoted from Wikipedia and may need to be viewed in a proper way (if you are in some countries).

Some basic concepts

Considering that some of you may not have a basic knowledge of cryptography and are not very familiar with the Shadowsocks protocol, here is a brief introduction. If you already have enough knowledge, you can skip to the next section.

Stream Cipher

Many people have heard of symmetric encryption algorithms such as AES, but they don't know the details. For example, what is IV? What is block-based encryption?

In fact, the AES algorithm itself is not designed to handle input of arbitrary length. It can only process 16 bytes at a time (whether it is AES-128 or AES-256). We call these 16 bytes a "data block", and AES is a block cipher. To work around this limitation, we can cut the plaintext into 16-byte blocks, encrypt each block with AES, and then concatenate the encrypted blocks.

Isn't it simple? In fact, this is the ECB mode, and AES-256 used this way is called AES-256-ECB. The problem is that identical plaintext blocks always encrypt to identical ciphertext blocks, and since 16 bytes is small, real data easily contains many duplicate blocks. For example, here is a picture:

Original image

Everyone knows this is a little penguin. Let's encrypt it with ECB and take a look:

Encrypted using ECB mode

You should still be able to see the little penguin in the picture. ECB mode cannot hide the characteristics of the plaintext. Please try not to use it.

There are some improved methods. One of them, CBC mode, supplies an initialization vector (IV) as the 0th ciphertext block in addition to the key. Before encrypting the i-th block, XOR its plaintext with the encrypted result of the previous block, that is:

Cipher[0] = IV
Cipher[i] = Encrypt(Plain[i] ^ Cipher[i - 1])

This method is much safer than the previous ECB.
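The difference between the two approaches can be seen in a small Python sketch. The Python standard library has no AES, so `toy_encrypt_block` below is a hypothetical stand-in built from SHA-256: it is deterministic like a block cipher but cannot be decrypted, which is enough to compare the ciphertext patterns. All function names are made up for illustration.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_encrypt_block(key, block):
    # Assumption: stand-in for AES, NOT a real (invertible) block cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key, plain):
    # Every block is encrypted independently.
    return [toy_encrypt_block(key, p) for p in split_blocks(plain)]

def chained_encrypt(key, iv, plain):
    # Cipher[0] = IV; Cipher[i] = Encrypt(Plain[i] ^ Cipher[i - 1])
    cipher = [iv]
    for p in split_blocks(plain):
        cipher.append(toy_encrypt_block(key, xor(p, cipher[-1])))
    return cipher[1:]

key = b"k" * 32
iv = bytes(range(BLOCK))
plain = b"A" * (BLOCK * 3)  # three identical plaintext blocks

ecb = ecb_encrypt(key, plain)
cbc = chained_encrypt(key, iv, plain)
print("distinct ECB blocks:", len(set(ecb)))      # identical blocks collapse
print("distinct chained blocks:", len(set(cbc)))  # chaining hides the repetition
```

With ECB the three identical plaintext blocks give one repeated ciphertext block, which is exactly why the penguin stays visible; with chaining each block depends on everything before it.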

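There are other ways as well. All of them (including ECB) are collectively referred to as "block cipher modes of operation", and some of them, such as CFB, OFB and CTR, effectively turn a block cipher into a stream cipher; these are the "stream ciphers" this article is about. The little penguin from earlier, after being encrypted with a mode other than ECB, looks like this and is completely unrecognizable: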

Modes other than ECB result in pseudo-randomness

Man-in-the-middle Attack

A man-in-the-middle (MITM) attack is a fun thing. Suppose an attacker uses some method (such as a phishing hotspot) to make the victim's packets pass through him; the attacker can then view or modify the contents of those packets. It's like Alice passing a note to Bob in class. Alice originally wrote "I like you", but when the note passed through Marvin, he replaced it with his own, which said "Get out". That shows how much a MITM attack can affect us.

Note: Passing notes in class is not a good behaviour. Children should not imitate it.

The most commonly experienced MITM attack in daily life may be: when accessing some HTTP websites, suddenly a "broadband expiration needs to be renewed" notification pops up. This kind of MITM attack is done by some unscrupulous operators, but we are more accustomed to calling it "traffic hijacking".

Data Integrity

This paragraph is a bit lazy, so I'll just quote from Baidu Baike (with some modifications):

Integrity is one of the three basic points of information security, which means that users, processes, or hardware components have the ability to verify the accuracy of what they send or transmit, and that processes or hardware components will not be changed in any way.

In plain language, it means: ensuring that data is not tampered with during transmission, and the content of the note that Bob received is exactly the same as the one that Alice passed out.

The ECC memory error correction algorithm that everyone has heard of can ensure data integrity. It can detect that memory data has been tampered with (such as being hit by high-energy particles or partial damage to hardware), and try to restore the damaged part through some algorithms;
The digital signature in HTTPS is also a method to ensure data integrity. It can detect whether the transmitted data has been tampered with (such as MITM attack) and immediately stop data transmission to prevent users or servers from suffering losses due to receiving false data.

Shadowsocks Protocol Basics

Although the Shadowsocks client talks SOCKS5 to local applications, SOCKS5 itself is not the focus of this article. We only need to look at how data is transmitted between the Shadowsocks client and server.

According to the official documentation, the data sent by the client to the server starts with the stream cipher IV (the IV is generated by the client and placed in the packet unencrypted), followed by the encrypted data. The plaintext of that encrypted data has the following format:

[Target Address][Data]

Where data can be of any length; as for the target address, Shadowsocks uses the SOCKS5 representation:

[Type (1 byte)][Hostname][Port (2 bytes)]

Where the type is an enumeration value of 1 byte:

0x01: The hostname is an IPv4 address;
0x03: The hostname is a variable-length string, with the first byte indicating the length (up to 255), followed by the data;
0x04: The hostname is an IPv6 address.
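As a minimal sketch of this address format, the following Python function splits a decrypted request into host, port and remaining data. The function name `parse_address` is hypothetical and the sketch skips the length checks a real server would need.

```python
import ipaddress
import struct

def parse_address(buf):
    """Split [Type][Hostname][Port][Data] into (host, port, data)."""
    atyp = buf[0]
    if atyp == 0x01:                      # IPv4: 4 raw bytes
        host = str(ipaddress.IPv4Address(buf[1:5]))
        offset = 5
    elif atyp == 0x03:                    # hostname: 1 length byte, then the name
        length = buf[1]
        host = buf[2:2 + length].decode("ascii")
        offset = 2 + length
    elif atyp == 0x04:                    # IPv6: 16 raw bytes
        host = str(ipaddress.IPv6Address(buf[1:17]))
        offset = 17
    else:
        raise ValueError(f"unknown address type {atyp:#04x}")
    (port,) = struct.unpack(">H", buf[offset:offset + 2])  # big-endian port
    return host, port, buf[offset + 2:]

print(parse_address(b"\x03\x0bexample.com\x00\x50GET / HTTP/1.1\r\n"))
```

Note that the server has no other way to tell where the address ends and the data begins, so whatever bytes appear at the start of the decrypted stream decide where the rest is sent.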

The process of a proxy is as follows:

The client encrypts this data and sends it to the server;
The server decrypts it after receiving it, and gets [Type (1 byte)][Hostname][Port (2 bytes)][Data];
The server sends the data part directly to Hostname:Port;
The server encrypts the data returned by the host using the same algorithm (if the encryption algorithm uses a stream cipher, a new IV will be generated and used, and it will be placed at the beginning of the packet), and sends it to the client;
The client can get the data returned by the host after decryption.

Redirect attack - Weakness of Shadowsocks Stream Ciphers

Going back to the paper mentioned at the beginning, the author's discovery is this: if an attacker captures a packet returned by a Shadowsocks server and knows the first seven bytes of its data part, the attacker can use that same Shadowsocks server to decrypt most of the packet without knowing the password (losing at most 16 bytes).

The author's idea is as follows:

Suppose there is a Shadowsocks server, and the attacker captures a packet returned by this Shadowsocks server through sniffing or other ways.

To know the plaintext content, the attacker either needs to brute force the password (which is almost impossible as people's security awareness increases), or find a way to use this Shadowsocks server to help decrypt.

The author chose the latter, that is, to find a way to turn this packet into a packet sent by the client, so that the server can decrypt it and proxy it to the server specified by the attacker, which is called a Redirect attack.

As described in the previous section, the plaintext of a packet sent by the Shadowsocks client has the format [Type (1 byte)][Hostname][Port (2 bytes)][Data]. If the attacker can exploit the weakness of the encryption mode to tamper with that plaintext, the attacker can change the hostname to the attacker's own server address. The Shadowsocks server will then think the client wants to access the attacker's server and will send the data part of the decrypted packet there.

First consider how to tamper with the data. Suppose the encryption algorithm used by this Shadowsocks server is AES-256-CFB; then decryption works as described in Wikipedia:

Cipher[0] = IV
Plain[i] = Encrypt(Cipher[i - 1]) ^ Cipher[i]

Here the IV and each block of ciphertext and plaintext are all 16 bytes long.

The author noticed that the key never changes and that the IV from the packet returned by the server can simply be reused. If only the first block of ciphertext is modified, only the first two blocks of plaintext change. More importantly, since the first block of plaintext is the XOR of the first block of ciphertext and a fixed value a, the attacker can completely control the first block of plaintext by modifying the first block of ciphertext! The specific method is as follows:

Suppose the first block of ciphertext is c1, the first block of plaintext is p1, and a is the value the server derives from the IV and the key (for CFB, a = Encrypt(IV)). Then:

Given a ^ c1 == p1
Then a ^ c1 ^ X == p1 ^ X
According to the associative law of XOR
It can be obtained that a ^ (c1 ^ X) == (p1 ^ X)
That is to say, the XOR operation performed by the attacker on c1 will be completely reflected in p1

The attack method is not complicated:

Suppose that p1 is finally changed to q1, we can think that p1 ^ X == q1
XOR both sides with p1, it can be obtained that p1 ^ X ^ p1 == q1 ^ p1
Due to the properties of XOR, the two p1 on the left cancel each other out
It can be obtained that X == q1 ^ p1
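The two derivations above can be checked with a few lines of Python. This is only a sketch: the value `a` is random here because the attacker never learns it, and the block contents `p1` and `q1` are arbitrary 16-byte examples chosen for illustration.

```python
import os

def xor(x, y):
    return bytes(i ^ j for i, j in zip(x, y))

a = os.urandom(16)            # derived from IV and key; unknown to the attacker
p1 = b"original block 1"      # 16-byte plaintext the attacker happens to know
c1 = xor(a, p1)               # ciphertext block seen on the wire

q1 = b"attacker's block"      # 16-byte plaintext the attacker wants instead
X = xor(q1, p1)               # X == q1 ^ p1
tampered_c1 = xor(c1, X)      # c1 ^= (q1 ^ p1)

# The receiver computes a ^ tampered_c1 and obtains q1, without a ever leaking.
print(xor(a, tampered_c1))
```

The attacker only ever touches c1, p1 and q1, which is why knowing the plaintext is enough and the key is never needed.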

All the attacker needs to do is c1 ^= (q1 ^ p1). But there is a problem: the attacker does not know what p1 is! Fortunately, p1 is part of the plaintext data, and many protocols used on the Internet begin with a few fixed bytes. HTTP is one of them.

In the third decade of the 21st century, almost everyone should have switched to HTTP/1.1 long ago, so a plain HTTP response almost always starts with the 8 bytes HTTP/1.1. Can the attacker squeeze the target address into such a small space? After all, apart from the 1-byte type and the 2-byte port, only 5 bytes remain for the host.

A hostname (type 0x03) also needs a length byte, which leaves at most 4 bytes for the name itself, and most attackers cannot obtain a domain name that short. So only IPv4 can be considered, and with IPv4 a total of 7 bytes is enough! For example:

01 c0 a8 01 03 12 12
-- ----------- -----

The three parts marked by the lines mean: the address type is IPv4, the address is 192.168.1.3, and the port is 4626 (0x1212).
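For any other listener address, the 7 bytes can be built as in the following sketch. The helper name `ipv4_target` is made up for illustration.

```python
import ipaddress
import struct

def ipv4_target(host, port):
    # 1 byte type (0x01 = IPv4) + 4 bytes address + 2 bytes big-endian port
    return b"\x01" + ipaddress.IPv4Address(host).packed + struct.pack(">H", port)

target = ipv4_target("192.168.1.3", 4626)
print(target.hex(" "))  # 01 c0 a8 01 03 12 12
```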

So we can define:

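# c1 is the first 16-byte ciphertext block of the captured packet
p1 = b'HTTP/1.'
q1 = b'\x01\xc0\xa8\x01\x03\x12\x12'
new_c_part = bytes(c ^ p ^ q for c, p, q in zip(c1[0:7], p1, q1))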

Replace the first 7 bytes of c1 with this 7-byte new_c_part, and then send the whole modified packet (with its original IV) straight back to the same Shadowsocks server.

The Shadowsocks server tries to decrypt it, and after decryption, it finds that the plaintext is like this:

01 c0 a8 01 03 12 12 XX XX XX XX XX XX XX XX XX

The server will think this is a legitimate client request, so it forwards the following string of XX (plaintext data) to 192.168.1.3:4626 according to the requirements of the first 7 bytes.

The attacker has been waiting here for a long time. The method is very simple, just start a port listening with nc:

$ nc -l -p 4626

Since the attacker modified c1, and CFB mode uses c1 to decrypt p2, the 16 bytes of p2 come out garbled. The attacker can eventually recover all data except p2. The command line screenshot in the paper illustrates this. The received data starts at the 8th byte of the original plaintext (the first 7 bytes were HTTP/1.), so the first 9 bytes are correct, the next 16 bytes are garbled, and everything after that is completely correct:

1 304 Not???????????????? Sat, 26 Jan 2019 07:15:21 GMT
Connection: close
Via: 1.1 varnish
Cache-Control: max-age=600
ETag: W/"5c45d22a-127"
Expires: Sat, 26 Jan 2019 06:59:41 GMT
Age: 0
....
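The whole attack can be reproduced offline with a short Python simulation. This is a sketch under several assumptions: `toy_encrypt_block` is a SHA-256 based stand-in for AES (CFB only ever uses the forward direction of the block cipher, so a non-invertible function is enough), the HTTP response is a made-up sample modelled on the output above, and the "server" is just a local function call rather than a real Shadowsocks instance.

```python
import hashlib
import os

BLOCK = 16

def toy_encrypt_block(key, block):
    # Assumption: stand-in for AES-256; not a real block cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key, iv, plain):
    out, prev = b"", iv
    for i in range(0, len(plain), BLOCK):
        prev = xor(toy_encrypt_block(key, prev), plain[i:i + BLOCK])
        out += prev
    return out

def cfb_decrypt(key, iv, cipher):
    # Plain[i] = Encrypt(Cipher[i - 1]) ^ Cipher[i], with Cipher[0] = IV
    out, prev = b"", iv
    for i in range(0, len(cipher), BLOCK):
        block = cipher[i:i + BLOCK]
        out += xor(toy_encrypt_block(key, prev), block)
        prev = block
    return out

key = os.urandom(32)  # server secret; the attacker steps below never read it
response = (b"HTTP/1.1 304 Not Modified\r\n"
            b"Date: Sat, 26 Jan 2019 07:15:21 GMT\r\n"
            b"Connection: close\r\n\r\n")

# 1. Packet from server to client, captured by the attacker: IV + ciphertext.
iv = os.urandom(BLOCK)
captured = iv + cfb_encrypt(key, iv, response)

# 2. The attacker rewrites the first 7 ciphertext bytes and keeps the IV.
p1 = b"HTTP/1."
q1 = b"\x01\xc0\xa8\x01\x03\x12\x12"
body = captured[BLOCK:]
forged = captured[:BLOCK] + xor(xor(body[:7], p1), q1) + body[7:]

# 3. The server decrypts the forged packet as if it were a client request.
decrypted = cfb_decrypt(key, forged[:BLOCK], forged[BLOCK:])
address, forwarded = decrypted[:7], decrypted[7:]

print(address.hex(" "))   # 01 c0 a8 01 03 12 12
print(forwarded[:9])      # b'1 304 Not'
print(forwarded[9:25])    # 16 garbled bytes (the damaged p2)
print(forwarded[25:])     # rest of the response, intact
```

The printed pieces line up with the screenshot: 9 correct bytes, one damaged 16-byte block, then clean plaintext.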

The defense measures given by the author are:

Stop using shadowsocks-py, shadowsocks-go, go-shadowsocks2 and shadowsocks-nodejs;
Use only shadowsocks-libev, and only with AEAD encryption.

The reasons are as follows: shadowsocks-libev has long rejected reused IVs, which prevents this attack to a certain extent; and as long as the encryption algorithm is an AEAD algorithm, the data cannot be tampered with undetected, so the attack described in this article no longer works.
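Both defenses can be illustrated with a small Python sketch. It is not how any Shadowsocks implementation actually does it: real AEAD ciphers such as AES-GCM or ChaCha20-Poly1305 build the authentication tag into the cipher, whereas this sketch assumes a separate HMAC-SHA256 tag from the standard library, and the IV check is a plain in-memory set. The names `seal` and `open_sealed` are hypothetical.

```python
import hashlib
import hmac
import os

IV_LEN, TAG_LEN = 16, 32

def seal(mac_key, iv, ciphertext):
    # Append a tag computed over everything the receiver will rely on.
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag

def open_sealed(mac_key, packet, seen_ivs):
    iv = packet[:IV_LEN]
    ciphertext = packet[IV_LEN:-TAG_LEN]
    tag = packet[-TAG_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: packet was modified")
    if iv in seen_ivs:
        raise ValueError("IV reuse: packet rejected")
    seen_ivs.add(iv)
    return iv, ciphertext  # only now is it safe to decrypt

mac_key = os.urandom(32)
# Random bytes stand in for the encrypted payload.
packet = seal(mac_key, os.urandom(IV_LEN), os.urandom(48))

# The same kind of edit as in the redirect attack: flip bits in the first block.
forged = packet[:IV_LEN] + bytes([packet[IV_LEN] ^ 0x49]) + packet[IV_LEN + 1:]

seen = set()
for name, pkt in [("forged", forged), ("original", packet), ("replayed", packet)]:
    try:
        open_sealed(mac_key, pkt, seen)
        print(name, "accepted")
    except ValueError as err:
        print(name, "rejected:", err)
```

The forged packet fails the tag check before any decryption happens, and replaying the untouched packet fails the IV check, so the server never forwards attacker-chosen plaintext.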

Impact on the Public

Although the paper only gives examples for the HTTP protocol and the CFB mode, in theory this method works against any protocol whose first 7 bytes are known, combined with any stream-cipher-like mode that lacks integrity protection. You cannot guarantee that everything you access through the proxy is HTTPS. Even if it is HTTPS, if it is a domestic website and some irresistible force obtains the private key of its certificate, your TLS traffic may still be decrypted.

However, there are some comforting things:

As the GFW has grown stronger, everyone has gradually realized that "encryption is not enough" and has switched to tools with obfuscation. Since the attacker does not know the obfuscation parameters (and may not even know which traffic belongs to the proxy), this method no longer works.

Most tools have disabled old encryption algorithms or even force the use of algorithms with GCM or Poly1305. These are AEAD algorithms, which go a long way toward ensuring data security. TLS 1.3 also makes AEAD mandatory, which guarantees its security to some extent.

If you are still using Shadowsocks or its derivative tools, and still use ordinary stream ciphers for encryption, then please immediately follow the defence measures given by the author, for your server, and yourself.

Update 2020-02-15

Added supplements related to IV reuse.

R·e^x / Zeng

MUGer, hacker, developer, amateur UI designer, punster, Japanese learner.
