
Writeup: Hidden Signal - upCTF (Crypto)

Feri Harjulianto

Challenge

A leaked password database. The data looks random. It isn't. Find the signal.

Flag: upCTF{m4rk0v_w4s_h3r3_4ll_4l0ng}

Analysis

We're given passwords.txt — a file with 4000 lines of what appears to be random uppercase letters, each line thousands of characters long.

YSOTUIQJZKSTWJLQXNRXKTIAJDGEWRXPPUOQZZUB...
MRELFKCPMMZFOQRACXCCWCDJQYPRJKSFZOUDVJQNI...

Step 1: Find the Signal Characters

Frequency analysis of the entire file reveals two distinct populations:

  • 26 uppercase letters — each appearing ~1.307M times (uniform noise)
  • 37 other characters (lowercase a-z, digits 0-9, underscore _) — each appearing ~2-4K times

The non-uppercase characters are the hidden signal embedded in the noise.
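
A minimal sketch of that frequency count is below. The helper name split_populations and the two-line sample are made up for illustration; on the real data you would pass in the full contents of passwords.txt instead.

```python
from collections import Counter


def split_populations(text):
    """Tally every character, then separate uppercase noise from everything else."""
    counts = Counter(ch for ch in text if not ch.isspace())  # skip newlines
    noise = {ch: n for ch, n in counts.items() if ch.isupper()}
    signal = {ch: n for ch, n in counts.items() if not ch.isupper()}
    return noise, signal


# Hypothetical miniature input; the real file has 4000 much longer lines.
sample = "YSOTU6dkyIQJZ\nMRELm53FKCPM\n"
noise, signal = split_populations(sample)
print("noise symbols: ", sorted(noise))
print("signal symbols:", sorted(signal))
```

On the real file, comparing the count ranges of the two dictionaries is what makes the two populations stand out.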

Step 2: Examine the Signal Structure

Extracting non-uppercase characters per line shows:

  • Each line contains exactly 25 signal characters
  • They are contiguous — a 25-char block embedded at a random position within the uppercase padding
  • 4000 lines x 25 chars = 100,000 signal characters total

Example extracted signals:

6dky0pzw_n7m38syf0z5n_rny
m53y90mq4c_7cro_ngvgxpckm
ojtbn1ewq15mx7932lqqitpv1

Each line's signal looks random individually.
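
The per-line checks from this step can be sketched with a regular expression. The function name signal_block and the demo line are assumptions for illustration: the demo glues a few padding letters from the sample output around the first extracted signal, since the real offsets are not shown here.

```python
import re

# One run of lowercase letters, digits or underscores (the 37-symbol signal alphabet).
SIGNAL_RUN = re.compile(r"[a-z0-9_]+")


def signal_block(line):
    """Return (offset, block) if the line holds exactly one contiguous signal run, else None."""
    runs = list(SIGNAL_RUN.finditer(line))
    if len(runs) != 1:
        return None  # no signal at all, or signal split into several pieces
    return runs[0].start(), runs[0].group()


# Hypothetical line: real lines are thousands of characters long.
demo = "YSOTUIQJ" + "6dky0pzw_n7m38syf0z5n_rny" + "ZKSTWJLQ"
offset, block = signal_block(demo)
print(offset, len(block), block)
```

Running signal_block over every line and confirming that it never returns None, and that every block has length 25, would verify the two claims in the list above.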

Step 3: Column Frequency Analysis

The key insight: while the signal characters appear random per-line, analyzing the most frequent character at each column position across all 4000 lines reveals a clear bias:

Col   Top Char   Frequency
0     m          562 (14%)
1     4          569 (14%)
2     r          550 (13%)
3     k          507 (12%)
4     0          594 (14%)
5     v          545 (13%)
6     _          530 (13%)
...   ...        ...

With 37 possible characters, a uniform random source would give about 108 occurrences (4000 / 37, or 2.7%) per character per column. The dominant characters appear at 11-14%, roughly four to five times the expected rate, which is far too large a deviation to be chance.
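
To put a number on how unlikely that is, here is a rough back-of-the-envelope check. It assumes each column of a uniform source would behave like a binomial draw, which is a modelling assumption of this sketch rather than something the challenge states. The observed count is the column 0 value for 'm' from the table.

```python
import math

LINES = 4000      # rows in passwords.txt
ALPHABET = 37     # a-z, 0-9 and underscore
OBSERVED = 562    # count of 'm' in column 0, taken from the table

p = 1 / ALPHABET
expected = LINES * p                      # mean under the uniform hypothesis
sigma = math.sqrt(LINES * p * (1 - p))    # binomial standard deviation
z = (OBSERVED - expected) / sigma         # distance from the mean in sigmas

print(f"expected {expected:.1f}, sigma {sigma:.1f}, z-score {z:.1f}")
```

A z-score in the dozens leaves no realistic room for coincidence, so the top character in each column can be trusted.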

Reading the most frequent character from each column:

m4rk0v_w4s_h3r3_4ll_4l0ng

Solution Script

from collections import Counter

# Read the dump and close the file afterwards; splitlines() also copes with \r\n endings
with open('passwords.txt') as f:
    lines = f.read().strip().splitlines()

# Extract non-uppercase signal characters from each line
signals = [''.join(c for c in line if not c.isupper()) for line in lines]

# Most frequent character per column position
flag = ''.join(
    Counter(s[col] for s in signals).most_common(1)[0][0]
    for col in range(25)
)

print(f'upCTF{{{flag}}}')
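
As a sanity check on the column-majority idea, the same tally can be run on synthetic data. Everything about the generator below is an assumption: the real challenge generator is unknown, and this toy version simply shows the secret character in each column with a fixed probability (BIAS) and a uniform character otherwise.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + string.digits + "_"   # 37 signal symbols
SECRET = "m4rk0v_w4s_h3r3_4ll_4l0ng"                       # 25 chars, from the flag
BIAS = 0.12                                                # assumed, not measured

rng = random.Random(0)  # fixed seed so the run is repeatable


def fake_signal():
    """One 25-char signal: each column shows the secret char with probability BIAS."""
    return "".join(
        ch if rng.random() < BIAS else rng.choice(ALPHABET)
        for ch in SECRET
    )


signals = [fake_signal() for _ in range(4000)]

# Same column-majority vote as the solution script.
recovered = "".join(
    Counter(s[col] for s in signals).most_common(1)[0][0]
    for col in range(len(SECRET))
)
print(recovered == SECRET)
```

Even though every individual fake signal looks random, the majority vote across 4000 lines recovers the secret, which is the same effect the solver exploits.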

Flag

upCTF{m4rk0v_w4s_h3r3_4ll_4l0ng}

The flag, "Markov was here all along", references Markov chains. These were likely the method used to generate the random-looking uppercase noise, so that each line appears to be a plausible random password while a statistical signal hides in the embedded characters.
