Writeup: Hidden Signal - upCTF (Crypto)
Challenge
A leaked password database. The data looks random. It isn't. Find the signal.
Flag: upCTF{m4rk0v_w4s_h3r3_4ll_4l0ng}
Analysis
We're given passwords.txt — a file with 4000 lines of what appears to be random uppercase letters, each line thousands of characters long.
YSOTUIQJZKSTWJLQXNRXKTIAJDGEWRXPPUOQZZUB...
MRELFKCPMMZFOQRACXCCWCDJQYPRJKSFZOUDVJQNI...
Step 1: Find the Signal Characters
Frequency analysis of the entire file reveals two distinct populations:
- 26 uppercase letters, each appearing ~1.307M times (uniform noise)
- 37 other characters (lowercase a-z, digits 0-9, underscore _), each appearing ~2-4K times
The non-uppercase characters are the hidden signal embedded in the noise.
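
The count behind this split can be sketched as follows. This is a minimal illustration, assuming the file contains only uppercase padding plus the signal alphabet listed above; the helper name `split_populations` and the tiny `sample` string are hypothetical stand-ins, not part of the original solve.

```python
from collections import Counter

def split_populations(text):
    """Count every non-whitespace character, then split the counts
    into uppercase noise and everything else (the candidate signal)."""
    counts = Counter(ch for ch in text if not ch.isspace())
    noise = {ch: n for ch, n in counts.items() if ch.isupper()}
    signal = {ch: n for ch, n in counts.items() if not ch.isupper()}
    return noise, signal

# Tiny stand-in for passwords.txt (the real file has 4000 much longer lines)
sample = "QWERTYm4rkASDF\nZXCVb0v_wHJKL\n"
noise, signal = split_populations(sample)
print(len(noise), "noise symbols,", len(signal), "signal symbols")
print(sorted(signal))
```

On the real data, calling `split_populations(open('passwords.txt').read())` and comparing the two dictionaries' counts should make the gap between the populations obvious.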
Step 2: Examine the Signal Structure
Extracting non-uppercase characters per line shows:
- Each line contains exactly 25 signal characters
- They are contiguous — a 25-char block embedded at a random position within the uppercase padding
- 4000 lines x 25 chars = 100,000 signal characters total
Example extracted signals:
6dky0pzw_n7m38syf0z5n_rny
m53y90mq4c_7cro_ngvgxpckm
ojtbn1ewq15mx7932lqqitpv1
Each line's signal looks random individually.
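
The structural claims in this step (one contiguous block, 25 characters wide) can be checked with a short sketch like the one below. It assumes the signal alphabet from Step 1; `signal_block` and `check_structure` are hypothetical helper names, and the third sample line is deliberately broken to show what a failure looks like.

```python
import re

# Assumed signal alphabet from Step 1: lowercase letters, digits, underscore
SIGNAL_RUN = re.compile(r"[a-z0-9_]+")

def signal_block(line):
    """Return (offset, block) if the line holds exactly one contiguous
    run of signal characters, else None."""
    runs = [(m.start(), m.group()) for m in SIGNAL_RUN.finditer(line)]
    return runs[0] if len(runs) == 1 else None

def check_structure(lines, width=25):
    """Return indices of lines that lack a single run of `width` signal chars."""
    bad = []
    for i, line in enumerate(lines):
        found = signal_block(line)
        if found is None or len(found[1]) != width:
            bad.append(i)
    return bad

sample = [
    "ABCD" + "6dky0pzw_n7m38syf0z5n_rny" + "XYZ",
    "Q" + "m53y90mq4c_7cro_ngvgxpckm" + "WERTY",
    "ABm53y9CD0mq4c_7cro_ngvgxpckmEF",  # signal split into two runs, so flagged
]
print(check_structure(sample))
```

An empty list from `check_structure` over all lines of the file would confirm that every line carries exactly one 25-character block.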
Step 3: Column Frequency Analysis
The key insight: while the signal characters appear random per-line, analyzing the most frequent character at each column position across all 4000 lines reveals a clear bias:
| Col | Top Char | Frequency |
|---|---|---|
| 0 | m | 562 (14%) |
| 1 | 4 | 569 (14%) |
| 2 | r | 550 (13%) |
| 3 | k | 507 (12%) |
| 4 | 0 | 594 (14%) |
| 5 | v | 545 (13%) |
| 6 | _ | 530 (13%) |
| ... | ... | ... |
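
A table of this shape can be produced with a sketch along these lines. It assumes `signals` is a list of the per-line extracted blocks; `column_table` is a hypothetical helper, and the `toy` data is made up purely to show the output format.

```python
from collections import Counter

def column_table(signals, width=25):
    """For each column, return (column, top character, count, share of lines)."""
    rows = []
    for col in range(width):
        # Skip any line too short to have this column
        counts = Counter(s[col] for s in signals if len(s) > col)
        char, n = counts.most_common(1)[0]
        rows.append((col, char, n, n / sum(counts.values())))
    return rows

# Toy signals of width 3, standing in for the 4000 real 25-char blocks
toy = ["m4r", "mxr", "q4r", "m4z"]
for col, char, n, share in column_table(toy, width=3):
    print(f"| {col} | {char} | {n} ({share:.0%}) |")
```
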
With 37 possible characters, a uniform distribution would give ~108 occurrences (4000/37, about 2.7%) per character per column. The dominant characters appear at 11-14%, roughly four to five times that rate, which is far too large a deviation to be chance.
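
To put a number on that deviation, here is a rough sketch that assumes a simple binomial model: under the null hypothesis, each of the 37 characters is equally likely in every column. The function name `z_score` is hypothetical; the observed counts are copied from the table above.

```python
import math

def z_score(observed, n_lines=4000, alphabet=37):
    """Standard deviations between an observed column count and the
    count expected if all characters were equally likely."""
    p = 1 / alphabet
    expected = n_lines * p
    sigma = math.sqrt(n_lines * p * (1 - p))  # binomial standard deviation
    return (observed - expected) / sigma

expected = 4000 / 37
print(f"expected per character per column: {expected:.1f} ({1 / 37:.1%})")

# Observed top counts copied from the table above
table = [("m", 562), ("4", 569), ("r", 550), ("k", 507),
         ("0", 594), ("v", 545), ("_", 530)]
for char, count in table:
    print(f"{char}: {count} -> z = {z_score(count):.1f}")
```

Under this model even the smallest listed count (507) sits well over 30 standard deviations above the uniform expectation, so the bias cannot plausibly be noise.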
Reading the most frequent character from each column:
m4rk0v_w4s_h3r3_4ll_4l0ng
Solution Script
from collections import Counter

# split() on whitespace drops blank lines and stray '\r' characters
with open('passwords.txt') as f:
    lines = f.read().split()

# Extract non-uppercase signal characters from each line
signals = [''.join(c for c in line if not c.isupper()) for line in lines]

# Most frequent character per column position
flag = ''.join(
    Counter(s[col] for s in signals).most_common(1)[0][0]
    for col in range(25)
)
print(f'upCTF{{{flag}}}')
Flag
upCTF{m4rk0v_w4s_h3r3_4ll_4l0ng}
The flag "Markov was here all along" references Markov chains, likely the method used to generate the file: each line looks like a plausible random password, while the embedded characters hide a statistical signal that only emerges across many lines.