HW Decoder Freeze on Mid-Playback DRM Key Rotation (Live DASH, HW_SECURE_ALL)

:warning: Before you continue


Before submitting a bug report, please review our troubleshooting documentation at Troubleshoot Issues | Vega Troubleshooting

If you still want to file a bug report, please make sure to fill in all the details below and provide the necessary information.

NOTE: PLEASE ONLY REPORT A SINGLE BUG USING THIS TEMPLATE.
If you’re experiencing multiple issues, please file a separate report for each.


:backhand_index_pointing_right: Bug Description


1. Summary

On Vega (Fire TV Stick 4K Select, SDK 0.22), DRM-protected live DASH playback freezes silently within ~20 seconds of an in-band Widevine license rotation. The freeze is not surfaced as an error: HTMLMediaElement.error remains null, readyState stays at 3 (HAVE_FUTURE_DATA), paused=false, and Shaka Player’s MSE append loop continues advancing buffered.end normally — but the underlying HW decoder stops producing frames. currentTime stops advancing while wall-clock time keeps moving, so liveLatency rises 1:1 with real time.

The license server pushes a fresh license approximately every 15 minutes during ongoing playback (standard live OTT key rotation, not license expiration). The previous license is still valid at the moment of rotation; this is a mid-playback key rotation scenario, not a license-expired scenario.
The freeze does not self-clear. Shaka’s stalldetected event fires repeatedly (86 times across one ~30 minute session) and Shaka’s stallSkip: 0.5 configuration triggers seeks, but the decoder surface does not recover. The only intervention that restores playback is a full VideoPlayer.deinitialize() followed by recreating the VideoPlayer and re-loading via Shaka. A Shaka-only reload that skips videoPlayer.deinitialize() consistently SIGABRTs with app-demux-sourc, indicating the native pipeline state is also corrupted by the freeze.

App Name: Starz
App Link on Amazon Appstore (found through Developer Console → Actions column in App List → View on Amazon.com):

Bug Severity
Select one that applies

  • Impacts operation of app
  • Blocks current development
  • Improvement suggestion
  • Issue with documentation (If selected, please share the doc link and describe the issue)
  • Other

2. Steps to Reproduce

  1. Build a Vega/React Native app using @amazon-devices/react-native-w3cmedia and Shaka Player, configured for live DASH playback with Widevine HW_SECURE_ALL for video and SW_SECURE_CRYPTO for audio.
  2. Configure Shaka Player for live: bufferingGoal: 15, bufferBehind: 2, liveSync: true.
  3. Begin playback of a live DASH stream where the license server rotates Widevine keys at a periodic cadence (in our case, every ~15 minutes; the new license is delivered as a fresh MediaKeySession license response while playback is ongoing).
  4. Wait for the first key rotation event (i.e. a license response received >60 seconds after initial load with a new key).
  5. Within ~10–30 seconds of the rotation, the decoder freezes while everything upstream of the decoder reports healthy.

3. Observed Behavior

Explain what actually happened, noting any discrepancies or malfunctions.

After the new license response is delivered to the EME layer:

- `currentTime` stops advancing (frozen for 28+ seconds in our captures).
- `liveLatency` rises linearly 1:1 with wall-clock time — clean proof that the playhead is stuck while real time keeps moving.
- `getVideoPlaybackQuality().totalVideoFrames` stops advancing (only ~6 frames produced over 28 seconds).
- `droppedFrames` does *not* increase — the decoder isn't dropping frames, it's not emitting them at all.
- `buffered.end` keeps growing — Shaka's MSE `SourceBuffer.appendBuffer` is healthy. Network, manifest, segments, and decryption (from the JS layer's perspective) all report fine.
- `readyState` remains 3 (HAVE_FUTURE_DATA), `error` is `null`, `paused` is `false`, `ended` is `false`.
- `isBuffering` (Shaka diagnostic) reports `false`.
- Shaka's `stalldetected` event fires repeatedly with no effect on the surface.
- The `liveSync` configuration (`liveSyncPlaybackRate: 1.3`, `liveSyncMaxLatency: 5`) eventually triggers a forward `currentTime` snap of ~5 seconds, but the freeze immediately resumes at the new value.

The only reliable recovery is a full `VideoPlayer.deinitialize()` + recreate + Shaka re-load (~4 seconds, customer-visible black screen). A Shaka-only reload that skips `videoPlayer.deinitialize()` consistently SIGABRTs with `app-demux-sourc`.

#### Diagnostic snapshot at moment of freeze

`[FREEZE-EVIDENCE]` snapshots taken every ~2 seconds from the moment of rotation onward:

| Δ from rotation | currentTime | totalFrames | buffered.end | liveLatency | readyState | isBuffering |
|---:|---:|---:|---:|---:|---:|---:|
| **0** (rotation detected) | 24023289.577 | 18430 | 24023305.89 | 35.57 | 3 | true |
| +5.4 s | 24023295.010 | 18562 | 24023317.90 | 35.57 | 3 | true |
| +13.0 s | 24023302.542 | 18888 | 24023317.90 | 35.57 | 3 | true |
| **+17.9 s (freeze begins)** | **24023304.011** | **19119** | **24023329.91** | **39.00** | **3** | **true** |
| +22.3 s | 24023304.011 | 19125 | 24023329.91 | 43.47 | 3 | true |
| +27.5 s | 24023304.011 | 19125 | 24023329.91 | 48.61 | 3 | true |
| +40.8 s | 24023304.011 | 19125 | 24023335.90 | 61.94 | 3 | false |

Key observations:
- `currentTime` is pinned at `24023304.011` for ~28 seconds straight.
- `liveLatency` rises +25.0 seconds in 25.0 seconds of wall-clock — exact 1:1 ratio.
- `totalFrames` advances only 6 frames over 28 seconds.
- `droppedFrames` stays at 1 (no new drops — decoder has stopped producing output, not dropping it).
- `buffered.end` continues to grow normally — network and append paths are fully healthy.

#### Where the problem appears to be

| Layer | State during freeze | Verdict |
|---|---|---|
| Network (Shaka segment fetch) | `buffered.end` keeps advancing | :white_check_mark: OK |
| Manifest parsing | no errors | :white_check_mark: OK |
| DRM license fetch | response received in ~600–900 ms | :white_check_mark: OK |
| MSE `SourceBuffer.appendBuffer` | buffered range grows | :white_check_mark: OK |
| W3C `HTMLMediaElement` | `readyState=3`, `error=null`, `paused=false` | :white_check_mark: reports OK |
| Shaka stall detector | fires `stalldetected` repeatedly | ⚠ detects, can't fix |
| Shaka `stallSkip: 0.5` | seeks fire, surface does not unstick | :x: ineffective |
| **HW decoder output / compositor** | **`currentTime` stuck, frames not rendering** | :x: **frozen** |

The problem appears to sit between "license response received" and "frame rendered to surface." All conventional recovery levers (Shaka stall skip, MSE recovery, W3C error event) are inert because none of those layers see anything wrong. The freeze persists through Shaka unload/load cycles; only a full `VideoPlayer.deinitialize()` clears it, indicating native pipeline state is corrupted by the rotation.

4. Expected Behavior

Describe what you expected the SDK to do under normal operation.

When the license server delivers a new license during ongoing playback (key rotation), the new key should be bound to the active decoder session before subsequent encrypted samples are processed, so playback continues uninterrupted. This is the normal expected behavior for live DRM streams — DRM key rotation is a standard pattern for long-lived live channels.

If the new key cannot be bound for any reason, the decoder should at minimum:
1. Surface an error to the W3C `HTMLMediaElement.error` so the application can react, OR
2. Allow Shaka's stall recovery (`stallSkip`) to actually unstick the surface, OR
3. Allow a Shaka-only `unload`/`load` cycle to fully recover without requiring `VideoPlayer.deinitialize()` and without the SIGABRT in `app-demux-sourc`.

4.a Possible Root Cause & Temporary Workaround

Fill out anything you have tried. If you don’t know, N/A is acceptable

**Suspected root cause:** A native-side issue in binding the rotated Widevine key to the active HW decoder session for `HW_SECURE_ALL` content. The new license arrives and is processed by the EME layer (license response received successfully, JS-layer path looks healthy), but the HW decoder enters a wedged state where it stops emitting frames. No JS-layer error is surfaced.

**Workaround we currently use:** A pre-emptive soft reload triggered ~5 seconds after each detected key rotation. The reload tears down the entire `VideoPlayer` (`deinitialize()` + recreate + Shaka re-load) and takes ~3.8–4.0 seconds end-to-end, during which the user sees a black screen. This avoids the freeze but is not an acceptable user experience for a customer-facing live product.

We have not found any other intervention that recovers playback once the freeze occurs.

5. Logs or crash report

(Please make sure to provide relevant logs as attachment)

6. Environment

Please fill out the fields related to your bug below:

  • SDK Version: Run vega --version (v0.22+) or kepler --version (v0.21 and earlier) and paste output

  • App State: [Foreground/Background]

  • OS Information: Please ssh into the device via vega exec vda shell (or kepler exec vda shell for v0.21 and earlier) and copy the output from cat /etc/os-release into the answer section below. Note, if you don’t have a simulator running or device attached, the command will respond with vda: no devices/emulators found

7. Example Code Snippet / Screenshots / Screengrabs


:backhand_index_pointing_right: Playback Issues


If this is a playback issue, please provide your content URL, any pre-conditions (like geo-location), and let us know if it’s x86 or arm7.


Architecture: armv7 — Fire TV Stick 4K Select 


Pre-conditions:

  - Device: Fire TV Stick 4K Select (callie), armv7, Vega OS 0.22
  - Stream type: Live DASH (linear). VOD on the same DRM is unaffected.
  - DRM: Widevine, videoRobustness=HW_SECURE_ALL, audioRobustness=HW_SECURE_CRYPTO, multikey, server-driven key rotation ~14 min
  - Player: @amazon-devices/react-native-w3cmedia 2.1.80, inline mode (VideoPlayer + KeplerVideoSurfaceView), Shaka (Amazon fork) 4.x
  - Account: authenticated Starz test account with live entitlement (ghostlocker test channel)
  - Network: CloudFront CDN, ~120 ms TTFB, no proxy


Please share the following details in addition:_

  • Player SDK: Shaka
  • Player SDK Version: 4.8.5
    • Audio Codecs: AAC-LC stereo
    • Video Codecs: h.264
    • Manifest Types: dash

Q: If applicable, please provide your media/content url
If this is created dynamically, tokenized, etc please provide a way for us to access it

Q: Are there any special headers required to reproduce the issue you are facing?

[N/A or Insert Headers]

Additionally please provide the following if possible
Provide Screenshots / Screengrabs / Logs. Please include as much information as you can that will help debug.

<# Vega HW Decoder Freeze on DRM Key Rotation

**Date:** 2026-05-18
**Device:** Fire TV Stick 4K Select (`callie`, ARMv7), Vega SDK 0.22
**Source log:** `metro.log` (~30-minute live session)

All recovery paths disabled for this capture — the natural decoder behavior on DRM key rotation is recorded unmodified.

---

## Summary

3 DRM license rotations occurred during the session. **Every rotation was followed within seconds by a playback freeze.** Playback never recovered without external `VideoPlayer.deinitialize()` + recreate.

| Rotation | License `t` since load | Time to freeze after rotation | Frozen `currentTime` | Outcome |
|---|---:|---:|---:|---|
| #1 | 619.9 s | +17.9 s | 24023304.011 | 28 s of stuck `currentTime`, then 5 s `liveSync` jump, stalled again |
| #2 | 673.7 s | < 1 s | 24023314.453 | freeze chained directly onto #1's stall recovery |
| #3 | 1529.7 s | (already stalled) | 24024171.122 | 86 cumulative `stalldetected`, still no recovery |

---

## Freeze signature

After the license response is received, the W3C VideoPlayer enters a state where the playback surface stops updating while every other diagnostic reports as healthy.

`metro.log` lines 1670–1774, around rotation #1:

| Δ from rotation | `currentTime` | `totalFrames` | `buffered.end` | `liveLatency` | `readyState` | `isBuffering` |
|---:|---:|---:|---:|---:|---:|---:|
| −2.1 s | 24023287.495 | 18357 | 24023299.87 | 35.57 | 3 | true |
| **0** | **24023289.577** | **18430** | **24023305.89** | **35.57** | **3** | **true** ← rotation-detected |
| +3.0 s | 24023292.599 | 18495 | 24023311.88 | 35.57 | 3 | true |
| +5.4 s | 24023295.010 | 18562 | 24023317.90 | 35.57 | 3 | true |
| +8.0 s | 24023297.624 | 18632 | 24023317.90 | 35.57 | 3 | true |
| +10.5 s | 24023300.083 | 18767 | 24023317.90 | 35.57 | 3 | true |
| +13.0 s | 24023302.542 | 18888 | 24023317.90 | 35.57 | 3 | true |
| +15.6 s | 24023303.911 | 18943 | 24023323.97 | 36.87 | 3 | true |
| **+17.9 s** | **24023304.011** | **19119** | **24023329.91** | **39.00** | **3** | **true** ← **freeze begins** |
| +19.9 s | 24023304.011 | 19119 | 24023329.91 | 41.00 | 3 | true |
| +22.3 s | 24023304.011 | 19125 | 24023329.91 | 43.47 | 3 | true |
| +24.7 s | 24023304.011 | 19125 | 24023329.91 | 45.82 | 3 | true |
| +27.5 s | 24023304.011 | 19125 | 24023329.91 | 48.61 | 3 | true |
| +29.9 s | 24023304.011 | 19125 | 24023335.90 | 51.00 | 3 | false |
| +32.4 s | 24023304.011 | 19125 | 24023335.90 | 53.52 | 3 | false |
| +34.8 s | 24023304.011 | 19125 | 24023335.90 | 55.89 | 3 | false |
| +36.8 s | 24023304.011 | 19125 | 24023335.90 | 57.93 | 3 | false |
| +38.8 s | 24023304.011 | 19125 | 24023335.90 | 59.94 | 3 | false |
| +40.8 s | 24023304.011 | 19125 | 24023335.90 | 61.94 | 3 | false |
| +42.8 s | 24023304.011 | 19125 | 24023335.90 | 63.95 | 3 | false |
| +45.9 s | 24023309.281 | 19313 | 24023335.90 | 61.74 | 1 | false |

### Observations

- **`currentTime` frozen at `24023304.011` for ~28 s** straight (+17.9 s through +42.8 s).
- **`liveLatency` rose linearly +25.0 s in 25.0 s of wall clock** — exact 1:1 ratio. This is the cleanest possible proof that the playhead is stuck while real time keeps moving.
- **`totalFrames` advanced only 6 frames in 28 s** (19119 → 19125) — the decoder essentially stopped producing output.
- **`droppedFrames` stayed at 1** for the entire freeze — no dropped frames, the decoder just stops emitting.
- **`buffered.end` kept growing** (24023329.91 → 24023335.90) — Shaka's `MediaSource` append loop is fully healthy. Nothing wrong with the manifest, segments, or network.
- **`readyState` stayed at 3 (HAVE_FUTURE_DATA)** the entire time. From the W3C API's perspective the player has data and is ready.
- **`isBuffering` reported as `false`** from +29.9 s onward — Shaka itself does not consider the player to be buffering.

The "recovery" at +45.9 s is a `liveSync` jump (`liveSyncPlaybackRate: 1.3`, `liveSyncMaxLatency: 5`) — `currentTime` snapped forward by 5.27 s and the freeze immediately resumed at a new value.

---

## Where the problem is

The freeze is below Shaka and below the W3C MediaSource API surface. Everything upstream of the decoder reports healthy:

| Layer | State during freeze | Verdict |
|---|---|---|
| Network (Shaka segment fetch) | `buffered.end` keeps advancing | ✅ OK |
| Manifest parsing | no errors | ✅ OK |
| DRM license fetch | response received in 618 ms (rotation #1) | ✅ OK |
| MSE `SourceBuffer.appendBuffer` | buffered range grows | ✅ OK |
| W3C `HTMLMediaElement` | `readyState=3`, `error=null`, `paused=false` | ✅ reports OK |
| Shaka stall detector | fires `stalldetected` 86 times across session | ⚠ detects, can't fix |
| Shaka `stallSkip: 0.5` | seeks fire, surface does not unstick | ❌ ineffective |
| **HW decoder output / compositor** | **`currentTime` stuck, frames not rendering** | ❌ **frozen** |

**The problem sits between "license response received" and "frame rendered to surface".** All conventional recovery levers (Shaka stall skip, MSE recovery, W3C error) are inert because none of those layers see anything wrong.

The only intervention that recovers playback is a full `VideoPlayer.deinitialize()` followed by recreating the `VideoPlayer` and re-loading via Shaka. A Shaka-only reload (skipping `videoPlayer.deinitialize`) consistently SIGABRTs with `app-demux-sourc`, indicating the native pipeline state is also corrupted.

---

## Cumulative damage in the session

The freeze does not self-clear, so subsequent rotations layer on top of an already broken state:

- **Rotation #2** (54 s after #1, while still stalled): freeze starts immediately, `currentTime` pinned at 24023314.453.
- **Rotation #3** (~15 min later): `stallsDetected = 86`, `liveLatency = 64.6 s`. Shaka has been firing stall events the entire time with no effect.

---

## Log artifacts

| Tag | Where | Use |
|---|---|---|
| `[KEY-EXCHANGE] request/response #N ... rotation=true/false` | `src2/shakaplayer/ShakaPlayer.ts` | every DRM license event with timing and size |
| `[FREEZE-EVIDENCE] rotation-detected {...}` | `src2/hooks/videoPlayer/useLiveDecoderWatchdog.ts` | full player snapshot at rotation moment |
| `[FREEZE-EVIDENCE] tick stalls30s=N {...}` | same | full player snapshot every 2 s |

`[FREEZE-EVIDENCE]` JSON fields:

- `currentTime` — `VideoPlayer.currentTime` (W3C)
- `totalFrames` / `droppedFrames` — `getVideoPlaybackQuality()`
- `buffered` — `VideoPlayer.buffered[0]` start-end
- `liveLatency` — Shaka `getStats().liveLatency`
- `readyState`, `networkState`, `paused`, `ended`, `seeking`, `playbackRate` — `VideoPlayer` direct
- `isBuffering`, `stallsDetected` — Shaka diagnostics
>

:backhand_index_pointing_right: Additional Context


This blocks our production launch of DRM-protected live linear playback on Vega. Live DRM key rotation is a standard pattern for long-lived live channels in OTT, and we currently have no path to ship without either (a) a platform fix that allows rotated keys to bind cleanly to the active decoder session, or (b) a Shaka/W3CMedia-level recovery mechanism that doesn't require a full `VideoPlayer.deinitialize()` and ~4-second customer-visible black screen.

We're happy to share full instrumented logs, our diagnostic harness, and our `softReload` implementation with the team. We're also open to testing pre-release fixes on this device and content to help shorten the verification cycle.

Hi @Oleh_Mazur

Welcome to Amazon Developer Community !!

Thank you for this exceptionally thorough bug report - the diagnostic evidence clearly isolates the issue to the native HW decoder/compositor layer below the W3C API surface.

A few notes:

  1. This appears to be a platform-level issue. Your evidence that buffered.end keeps growing, readyState stays at 3, error is null, and only VideoPlayer.deinitialize() recovers playback (while a Shaka-only reload SIGABRTs) all point to a native pipeline issue with binding rotated Widevine keys to an active HW_SECURE_ALL decoder session.
  2. Your workaround (pre-emptive soft reload on key rotation) is reasonable given the constraints, though we understand the ~4s black screen is not acceptable for production.
  3. To help the media team investigate, could you please also provide:
    - The full metro.log from the captured session (you referenced it but it’s not attached)
    - The ACR/crash report from the SIGABRT when attempting Shaka-only reload
    (the app-demux-sourc crash)
    - Your @amazon-devices/react-native-w3cmedia version (you mentioned 2.1.80 -
    please confirm this is the latest available to you)
  4. Confirm your manifest permissions include both DRM services:

     [[wants.service]]
     id = "com.amazon.drm.key"
  
     [[wants.service]]
     id = "com.amazon.drm.crypto"
  
     [[needs.privilege]]
     id = "com.amazon.privilege.security.file-sharing"

This is being checked internally in the meantime.

Warm Regards,
Ivy

Thank you for the quick response.

Regarding point 3:

  • Since I am a new user on the forum, I am currently unable to attach any files and receive an error when trying to do so.
  • The crash does not happen consistently. At the moment I’m unable to reproduce it again in order to provide a crash report, but the video freezes and visual artifacts appear. I can also provide a video recording, but I am facing the same attachment limitation mentioned above.
  • I also tried updating @amazon-devices/react-native-w3cmedia to the latest available version (“@amazon-devices/react-native-w3cmedia”: “^2.2.20”), but this did not change the behavior.

Regarding point 4:

  • Yes, all of those permissions are present.

Hello @Oleh_Mazur

We would need logs to check on this issue, can you please share the logs during the time of the issue to check on this.

Warm Regards,
Ivy

Since I’m unable to attach any files here, I uploaded them to GitHub instead.

Hi @Oleh_Mazur ,

The logs attached are not the journalctl logs, they are react native logs.
Requesting you to kindly share the loggingctl/journalctl logs.

Warm Regards,
Ivy

Added the required logs to the folder:

Hi @Oleh_Mazur

As we could check, the logs are filtered app logs. However, to proceed further, we would need logs from all the system components as well.
Requesting you to kindly share complete logs.

Warm Regards,
Ivy

Okay, I recorded another session with all components logs included. They can be found in the same repository, in the following folder:
jun_2_2026_complete_logs

Hello @Oleh_Mazur

The issue seems to be happening after the DRM session was successfully created, the segments that would decrypt with rotated drm key arrives for decode. Probably the decryption of new segments is not being done with the right key but we do not have enough evidence to prove that point.

We need amznDecrypt log to see what happens during the decryption of new segments.

The attached log does not have the amznDecrypt logs. can you please provide us the content so we can repro the issue at our end?

Warm Regards,
Ivy

Hi Ivy,

I collected the logs using loggingctl log -f, however I do not see any entries with the amznDecrypt tag in the output.

Could you please clarify:

  1. Is there a specific command or log configuration required to enable amznDecrypt logging?
  2. Are these logs expected to be available through loggingctl, or should they be collected from a different device log artifact?
  3. If possible, could you provide an example of an amznDecrypt log line or the expected log tag/process name?

At the moment I am unable to find any amznDecrypt entries in the device logs.

Additionally, for context, the same stream URL and DRM workflow appear to work correctly on other platforms. We have also observed cases on Vega where frames eventually start rendering after the stall, but with significant delay and heavy stuttering.

Hi @Oleh_Mazur,

Thank you for the follow-up. You’re right that amznDecrypt logs are not available through the standard loggingctl log -f output at default log levels - these are internal platform-level logs that may require elevated log level configuration to surface.

To enable more verbose logging for the DRM/decryption layer, please try the following:

  1. Enter the device shell:
    vega exec vda shell
  2. Increase the log level for the media/DRM processes:
    loggingctl config --set-level media debug
  3. Then stream logs with a filter:
    loggingctl log -f | grep -i "decrypt\|cdm\|drm\|widevine\|key"
  4. Reproduce the key rotation freeze and capture the full output.

If amznDecrypt entries still don’t appear, please share the content - it would allow us to reproduce the issue on our side.

Warm Regards,
Ivy

Hello Ivy_Mahajan,

Sorry for the somewhat delayed response.

I have set the logs to the requested level:

{
“message”: “media → debug”,
“note”: [
“Journal MaxLevelStore value: debug”,
“DEFAULT:msev → info”
],
“status”: “success”
}

I uploaded the logs to the jun_11_2026_debug_level folder in the same repository.

Hi @Oleh_Mazur

Thank you for sharing all requested logs.
Is it possible for you to share the content that we can test ourselves?

Reason: We need drmsesrver and amazondecrypt logs which you can not fetch.
The recent logs and previous logs point to some bug during the decrypt and decode session after the DRM init is done.
As amazonDecrypt and drmserver logs are not present, we are not able to debug any further.

Note: On Starz app we do not see any live video, so we would need the content to actually pinpoint the point of failure.

Warm Regards,
Ivy

(post deleted by author)

Hi Ivy,
Unfortunately, I’m unable to provide a URL that reproduces the key rotation issue due to the sensitive nature of the content and related information.

However, I believe I was able to capture the relevant logs (I’m not sure if it’s enough, but it’s at least more than I had before.) . The issue was that I needed to configure the log level for the individual media/DRM-related processes rather than for the application as a whole.

The logs have been uploaded to the jun_17_2026_loggingctl folder. Please let me know if you need any additional information or if there are any other logs you would like me to collect.