Dossier · 5 steps · 4 edges

Approved

MITM unencrypted RTP → call eavesdropping

Most internal SIP deployments still use RTP without SRTP. From the same VLAN, ARP-spoof the IP phone + PBX, capture RTP, decode in Wireshark to .wav.

Filed by AD Knowledge Base · Published 2026-05-26

§ Kill-chain

Exfiltration

Exfil sensitive audio

T1041 · Exfiltration Over C2 Channel

Initial Access

LAN foothold

T1078 · Valid Accounts

Credential Access

ARP-spoof phone + PBX

N-ARP-SPOOF · ARP Spoofing / Cache Poisoning

Collection

Wireshark RTP → audio

T1056 · Input Capture

Collection

Capture RTP streams

VOIP-RTP-CAPTURE · RTP Stream Capture

§ Context

Assumed environment: foothold on the office LAN. SIP/RTP runs unencrypted between the desk phones and the PBX. ARP / DHCP guard is not configured on the switch.
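Whether a deployment matches the "unencrypted RTP" condition can often be checked from the SDP bodies carried in SIP signalling: the transport profile on each `m=` line is `RTP/AVP` for plain RTP and `RTP/SAVP` (or a related secure profile) for SRTP. The following is a minimal audit sketch under that assumption. The function name `unencrypted_media` and the sample SDP are hypothetical, and opportunistic-SRTP setups may advertise keys under `RTP/AVP`, so hits are candidates to review rather than proof.

```python
# Transport profiles that indicate SRTP (plain RTP uses RTP/AVP or RTP/AVPF).
SECURE_PROFILES = {"RTP/SAVP", "RTP/SAVPF", "UDP/TLS/RTP/SAVP", "UDP/TLS/RTP/SAVPF"}


def unencrypted_media(sdp: str) -> list[str]:
    """Return the m= lines whose transport profile is not an SRTP profile."""
    flagged = []
    for line in sdp.splitlines():
        line = line.strip()
        if not line.startswith("m="):
            continue
        # SDP media line layout: m=<media> <port> <proto> <fmt> ...
        fields = line[2:].split()
        if len(fields) >= 3 and fields[2].upper() not in SECURE_PROFILES:
            flagged.append(line)
    return flagged


# Hypothetical SDP offer (addresses come from a documentation range).
EXAMPLE_SDP = """v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
"""

if __name__ == "__main__":
    for media_line in unencrypted_media(EXAMPLE_SDP):
        print("plain RTP offered:", media_line)
```

Run against SDP captured at the PBX, a check like this gives a rough inventory of which endpoints still negotiate plain RTP.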

§ Steps

01
Exfil sensitive audio · Exfiltration
T1041 · Exfiltration Over C2 Channel
02
LAN foothold · Initial Access
T1078 · Valid Accounts
03
ARP-spoof phone + PBX · Credential Access
N-ARP-SPOOF · ARP Spoofing / Cache Poisoning
04
Wireshark RTP → audio · Collection
T1056 · Input Capture
05
Capture RTP streams · Collection
VOIP-RTP-CAPTURE · RTP Stream Capture
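To see why a captured plain-RTP stream is directly decodable, it helps to look at the 12-byte fixed RTP header: the payload type names the codec (static types 0 and 8 are PCMU and PCMA), and the sequence number and timestamp give the playback order. The sketch below parses that header from raw bytes. The `RtpHeader` and `parse_rtp_header` names and the sample packet are illustrative assumptions, not part of any tool named in this dossier.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte fixed RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return RtpHeader(
        version=version,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=b0 & 0x0F,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Hypothetical packet: version 2, payload type 0 (PCMU), 160 bytes of audio.
SAMPLE = bytes([0x80, 0x00]) + struct.pack("!HII", 1, 160, 0x1234ABCD) + b"\xff" * 160

if __name__ == "__main__":
    print(parse_rtp_header(SAMPLE))
```

With SRTP the header is still sent in the clear, but the payload that follows it is encrypted, which is what stops the decode-to-audio step.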

§ References

§ Frequently asked

What is the "MITM unencrypted RTP → call eavesdropping" attack path?

Most internal SIP deployments still use RTP without SRTP. An attacker on the same VLAN ARP-spoofs the IP phone and the PBX, captures the RTP streams, and decodes them to .wav in Wireshark. The path chains 5 steps drawn from real-world offensive-security techniques.

What starting position does this attack require?

A foothold on the office LAN (LAN foothold, T1078 · Valid Accounts), on the same VLAN as the IP phones and the PBX. The path also assumes that SIP/RTP runs unencrypted and that ARP / DHCP guard is not configured on the switch.

What is the final impact of this kill-chain?

Call eavesdropping. Captured RTP streams (VOIP-RTP-CAPTURE, Collection) are decoded to audio, and the sensitive recordings are then exfiltrated (Exfil sensitive audio, T1041). From here, an operator typically pivots into post-exploitation or maintains persistence.

How can defenders detect or prevent this attack?

Detection and prevention vary per step. The conditions listed under Context point at the two main preventive controls: encrypting media with SRTP and enabling ARP / DHCP guard on the access switches. For the remaining steps, refer to each linked MITRE ATT&CK entry under "References"; each technique page lists defensive controls, detection telemetry, and known threat-actor usage.
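One detection signal for the ARP-spoofing step is a single MAC address answering for several IP addresses in a host's ARP table, for example the PBX address resolving to the same MAC as an unrelated workstation. The sketch below flags such MACs in a snapshot of (IP, MAC) pairs. How the snapshot is collected is left out, the function name and sample entries are hypothetical, and legitimate setups (routers doing proxy ARP, multi-homed hosts, redundancy protocols) can trigger it, so treat results as leads.

```python
from collections import defaultdict
from typing import Iterable


def macs_claiming_multiple_ips(
    arp_entries: Iterable[tuple[str, str]],
) -> dict[str, list[str]]:
    """Map each MAC seen for more than one IP to the sorted list of those IPs."""
    ips_by_mac: dict[str, set[str]] = defaultdict(set)
    for ip, mac in arp_entries:
        # Normalise case so 00:00:5E... and 00:00:5e... count as the same MAC.
        ips_by_mac[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical ARP table as seen from a desk phone (documentation ranges only).
SNAPSHOT = [
    ("192.0.2.1", "00:00:5e:00:53:01"),   # gateway
    ("192.0.2.5", "00:00:5E:00:53:99"),   # PBX address, suspicious MAC
    ("192.0.2.77", "00:00:5e:00:53:99"),  # workstation with the same MAC
]

if __name__ == "__main__":
    for mac, ips in macs_claiming_multiple_ips(SNAPSHOT).items():
        print(f"{mac} answers for {', '.join(ips)}")
```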

§ Related dossiers

Cloudflare account compromise → Worker rewrite → mass cred theft

Mass SMS phish → Okta-style portal → SaaS sprawl (0ktapus)

Vesting beneficiary replace → silently drain stream

Apple Pay Express Transit relay → high-value contactless fraud

Insider admin panel coercion → mass account takeover (Twitter 2020)

MITM HL7 v2 → tamper lab orders / results