livekit

mirror of https://github.com/livekit/livekit.git synced 2026-06-27 15:18:39 -04:00

Author	SHA1	Message	Date
Raja Subramanian	23090163ce	Configurable migration wait duration for longer waits in simulation. (#4624 ) Only applies if it is more than the default 3 seconds.	2026-06-26 20:01:32 +05:30
Raja Subramanian	1faab0c48e	Add support for data blob (a.k.a. async participant attributes) (#4619 ) * Async attributes on participant. How is it different from existing participant attributes? 1. Async attributes can be added one at a time. 2. These are not included in `ParticipantInfo`. 3. Get an attribute by participant identity and async attribute ID as and when needed. * clean up * get full definitions, not just ids * listener OnDataTrackSchema * name length config * data blob * deps * static check * Add missing request ID * Update protocol commit * Wire up StoreDataBlobResponse * Pass request ID through in GetDataBlobResponse * deps * atomic * sctp at 1.9.5 * remove proto clone --------- Co-authored-by: Jacob Gelman <3182119+ladvoc@users.noreply.github.com>	2026-06-24 14:42:37 +05:30
laosun	6658dd5454	Echo offered audio payload types in single-PC subscriber answer (#4614 ) In single peer connection mode, when the server answers a subscriber's offer, configureSenderAudio set the sender codec preferences from the server MediaEngine's payload types. The answer could therefore advertise Opus on a payload type the offerer never offered (server PT 111 vs offered PT 109). Chrome tolerates this; Firefox decodes 0 samples (silence) -- packets are received but never decoded. The forwarded RTP already uses the offered PT, so only the answer SDP was inconsistent. This regressed in v1.12.0 once the single-PC MediaEngine became a union of publish+subscribe codecs. Parse the remote offer's audio rtpmap and remap the sender audio codec preferences to echo the offered payload types (RFC 3264 6.1) before SetCodecPreferences. Fixes #4599 Co-authored-by: laosun <14806343+cnvipstar@users.noreply.github.com>	2026-06-23 10:36:41 +05:30
Raja Subramanian	4facbc582a	Move lock to addPendingTrack function. (#4617 ) Wrapping the function with the lock outside in the only invocation was not needed.	2026-06-23 10:30:59 +05:30
Ryan Gaus	86a79f83fc	fix: report participant capabilities in ParticipantInfo (#4606 )	2026-06-22 09:23:33 -04:00
Raja Subramanian	f7085535da	Tighten up publish latency stat. (#4615 ) Previously it was anchored to participant transitioning to `ACTIVE` if the add track request happened before that. But that has a few issues: 1. `ACTIVE` is for the primary peer connection, which could be the subscriber peer connection. 2. `ACTIVE` also includes data channel establishment. Switch to the first connected time of the publisher peer connection to get a more accurate measure of track publish time.	2026-06-22 17:36:06 +05:30
CloudWebRTC	cfedcc71d0	feat: acquire requested video layer directly at HIGH quality by default (#4595 ) * feat: acquire requested video layer directly at HIGH quality by default Two changes that together remove the visible low->high quality ramp for a new subscriber (both publisher-first and subscriber-first join orders): 1. Default a subscriber's initial video quality to HIGH on bind instead of LOW for adaptive stream, so the subscribed max layer is the top layer. Adaptive stream clients can still scale down afterwards based on viewport. 2. On initial layer acquisition the forwarder/selector latch directly onto the allocator's target (the requested top layer) instead of opportunistically latching onto the first lower key frame that arrives. A short initial-acquisition grace aims the target at the requested layer; if it does not show up in time, the target falls back to the highest layer seen so acquisition never stalls. Always on - no configuration flag. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat: gate start-at-desired-quality behind EnableStartAtDesiredQuality flag Put the "acquire requested video layer directly at HIGH quality" behavior behind a per-subscriber EnableStartAtDesiredQuality flag (default off, so the original low->high ramp-up is restored unless enabled). Plumbed from config.RTC.EnableStartAtDesiredQuality through ParticipantParams -> SubscribedTrack/DownTrack -> Forwarder -> simulcast selector, gating all three behavior changes: the HIGH default on bind, the forwarder's initial-acquisition grace, and the selector's direct-latch-onto-target. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * remove config. 
--------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-19 20:12:54 +08:00
Raja Subramanian	a011d995da	Do not call nil callback (#4607 )	2026-06-18 23:24:33 +05:30
Raja Subramanian	e7c63aa537	Log subscription limit breaches (#4603 )	2026-06-18 00:08:39 +05:30
Raja Subramanian	67ca7a12cf	Record more RTC cancellation points. (#4600 ) There are several places the participant can drop off after initiating a connection attempt. Count those places as cancellation including when participant is closed due to specific reasons. Cancels should be discounted when determining RTC/ICE connectivity success/failure percentage.	2026-06-17 20:43:29 +05:30
Paul Wells	12a023ae45	agent: thread attributes map from dispatch to job (#4598 ) * agent: thread simulation flag from dispatch to job Reads simulation from AgentDispatch / RoomAgentDispatch and copies it onto Job in agent.LaunchJob and the inline room-agent path so workers see the flag. Stacked on top of livekit/protocol#1629. * agent: replace simulation bool with attributes map Threads the renamed attributes map (was bool simulation) from dispatch to job and bumps the protocol pseudo-version. * deps	2026-06-16 01:53:01 -07:00
cnderrauber	9746c9a9d6	Enforce subscription permission to data track (#4588 ) * Enforce subscription permission to data track * use the same revoke path as media track * nil check	2026-06-12 16:02:12 +08:00
shishirng	08ab361e8e	[WIP] rtc: add RestartSessionTimer to re-anchor participant session duration (#4566 ) * rtc: add RestartSessionTimer to re-anchor participant session duration Exposes ParticipantImpl.RestartSessionTimer so the session timer can be re-anchored to the actual join time. Duration is only ever emitted once the participant becomes active, so re-anchoring at join keeps pre-join wall-clock out of the reported/billed duration. Adds the method to the LocalParticipant interface (fake regenerated) and a local protocol replace to pick up SessionTimer.Reset. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * tidy * update protocol * report ended at for inactive sessions --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Paul Wells <paulwe@gmail.com>	2026-06-11 10:02:25 -07:00
cnderrauber	7dc6877738	Preserve original expiry when refreshing token (#4580 ) Avoids shortening the token expiration time on refresh, which caused client reconnects to fail after the network was down for a long time (>5min).	2026-06-10 14:51:10 +08:00
Raja Subramanian	8d2b827f44	Add prom metrics for peer connection state. (#4574 ) * Add prom metrics for peer connection state. By direction (PUBLISHER vs SUBSCRIBER) and state ("started" -> "connected"). This gives a way to track peer connections failing to finish establishment. The RTC active count can be useful for primary peer connection, but not for non-primary. This counter can be used to track any and can generally be used to understand success/failure rate of peer connection establishment. * add a couple of more states * clean up and avoid duplicate reporting fully established * staticcheck	2026-06-09 16:11:03 +05:30
Paul Wells	77ecf920ff	rtc: report participant session end time on room move (#4561 ) MoveToRoom resets the participant reporter resolver to receive new (room, participant_session) keys for the destination, but the source room's participant_session row never gets an end_time — the periodic duration scrape only emits one once disconnectedAt is set, and a move doesn't transition the participant to DISCONNECTED. Report end_time immediately before the reset so the row is closed out cleanly.	2026-06-03 21:35:39 -07:00
cnderrauber	63be96f631	Prevent panic from nil (illegal) syncState.Subscriptions message (#4560 )	2026-06-04 10:32:24 +08:00
cnderrauber	356ae211a3	Config documentation for advertise_internal_ip and skip_external_ip_validation (#4552 ) See https://github.com/livekit/mediatransportutil/pull/88	2026-06-01 14:37:08 +08:00
shishirng	7c319a67d4	rtc: prevent duration reporting for inactive participants (#4550 ) Added a check to ensure that duration is not published for participants that never became active.	2026-05-27 14:39:04 -04:00
Paul Wells	cde8962709	rtc: emit per-data-track bytes via BytesTrackStats (#4540 ) Data tracks (the new _data_track datachannel) previously only updated a private dataTrackStats that logged a single summary at Close. Bytes never reached the OnTrackStats -> TelemetryService.TrackStats pipeline that media tracks and signal channels feed. Wire DataTrack (UPSTREAM, publisher-home) and DataDownTrack (DOWNSTREAM, per-subscriber) into BytesTrackStats on the same 5s cadence, mirroring the media-track convention: subscriber's country and ID with publisher's track ID for DOWNSTREAM. Cross-region proxy DataTracks leave the stats pointer nil (no publisher reporter on that node, and relayed bytes would double-count). Legacy dataTrackStats packet-loss/frame counters are preserved.	2026-05-23 17:42:55 -07:00
Raja Subramanian	062d12197f	Use NACKQuueInterface type. (#4538 ) And some extra logging for subscription permission when it fails.	2026-05-21 23:00:51 +05:30
Paul Wells	7f08b04c1e	Add IsIntentionalDisconnect helper (#4537 ) Shared helper for callers that need to distinguish intentional/expected participant closures (client leave, admin action, room teardown, migration) from connection failures. Extracted from cloud's IsClosedIntentionally switch so cloud-side code paths can share a single source of truth.	2026-05-20 11:42:51 -07:00
Raja Subramanian	1ab2bf043b	Clean up packet size logging (#4536 ) Reverting - https://github.com/livekit/livekit/pull/4521 - https://github.com/livekit/livekit/pull/4525 There are TWCC feedback packets that are larger than MTU. Seems to happen under a couple of conditions 1. Bad client data, i.e. severely out-of-order packets, bad sequence numbers, etc. 2. On an ICE restart - this is rare, but it seemed to be flaky network with some packets arriving and some not and causing a lot of gaps. In either case, not much to do. If fragmentation/re-assembly back to publisher works, the feedback will make it through. If not, feedbacks will be missed and clients have to work with some missing data which is not unexpected and the protocol is designed to handle. However, filed pion/interceptor issue just in case - https://github.com/pion/interceptor/issues/416	2026-05-20 23:58:05 +05:30
cnderrauber	8ab92a80f6	Don't require media sections when joining (#4535 ) * Don't require media sections when joining Clients other than browsers (rust/libwebrtc is known) can fail to fire the ontrack event when an extra media section is reused to subscribe to a track, so disable this feature on the server side and let the client determine if extra media sections are needed. * lint	2026-05-20 13:28:51 +08:00
Paul Wells	019a6640ae	rtc: report participant kind code and details (#4534 ) * rtc: report participant kind code and details Plumb ParticipantKind and KindDetails through MediaTrack and BytesTrackStats so track-level reporting can record the numeric kind code plus details codes on every participant_session aggregation, alongside the existing Kind string. Also picks up the new kind fields on resolved BytesSignalStats participants. Adds deployment/agentID/version to the agent worker logger.	2026-05-18 23:20:52 -07:00
Raja Subramanian	b32933b0d4	Log details of RTCP packets. (#4525 ) * Log details of RTCP packets. Seeing large (> MTU) packets on publisher peer connection RTCP. The four types there are - RTCP Receiver Reports - NACK - TWCC - PLI Can't think of what would be blowing up in size. RTCP Receiver Report and PLI are fixed in size. NACKs vary, but the limit is 100 NACKs which should fit in 400 bytes even if all of them are spread apart in the sequence number space. TWCC varies, but a feedback packet is sent every 100ms or when it holds 100 packets. So, that also should not be too big. Logging packet details to understand this better. * revert debug	2026-05-14 18:55:00 +05:30
Raja Subramanian	ef2e5efe14	Log large packets receive/send. (#4521 ) * Log large packets receive/send. Seeing cases of servers reporting need for segmentation/re-assembly of packets. So, logging packet receive/send for RTP/RTCP to check if anything is seeing more than 1400 byte packets. * log downtrack RTCP too	2026-05-13 16:04:53 +05:30
Raja Subramanian	20d4a3a168	Populate data track loggers with context (#4514 )	2026-05-09 10:14:48 +05:30
Paul Wells	803999efad	rename agent environment to deployment (#4506 ) * rename agent environment to deployment * deps	2026-05-05 14:19:40 -07:00
Paul Wells	253f977d32	add duration seconds reporting (#4500 ) * add duration seconds reporting * deps * deps	2026-05-02 06:19:23 -07:00
Paul Wells	ffab3bd308	add agent environment (#4498 ) * add agent environment * lint * psrpc error * deps	2026-05-01 19:30:06 -07:00
Raja Subramanian	ccdf23c8a6	Use mediatransportutil/codec package, no functional change (#4497 )	2026-05-01 20:06:29 +05:30
olafal0	f51798bcf6	Fix publish-only limitations being incorrectly applied to receivers (#4495 ) * Fix publish-only limitations being incorrectly applied receive-side in a single PC * `StaticConfigurations` disabled some codecs for publish only, which worked in dual PC * In single PC, the server incorrectly disabled these codecs in both directions * Dual PC mode is unchanged; single PC handles per-direction filtering correctly * Filter recv-side codecs to publish list in single-PC SDP answer * Confirm H264 is present in offer in test	2026-04-30 18:49:34 +05:30
Raja Subramanian	a002337db1	Legacy TrackInfo.Simulcast flag. (#4493 ) * Legacy TrackInfo.Simulcast flag. When AddTrack did not send SimulcastCodecs, the legacy `Simulcast` flag was not set. Fix it by setting the flag when a second layer is published. * staticcheck * use the existing PrimaryReceiver function	2026-04-29 22:43:33 +05:30
Paul Wells	d7c2daf1ac	report all simulcast layers (#4491 )	2026-04-28 10:45:32 -07:00
Jacob Gelman	19b9e8c00a	Additional data tracks logging (#4489 ) * Additional data track logging * Track total bytes published * Rename field	2026-04-28 21:26:07 +09:00
David Chen	743d9c8b3a	add support for client capabilities (#4461 ) * update protocol version * only check for client capability to strip packet trailer	2026-04-27 17:58:36 -07:00
Raja Subramanian	fc47e47866	Close peer connection unconditionally to unblock set local/remote (#4485 ) * Close peer connection unconditionally to unblock set local/remote description operations. Have been chasing a leak where participants have a lot of connectivity issues and analysed a goref with Claude. Output below. Jo Turk quickly patched sctp for reported issue - https://github.com/pion/sctp/pull/465. This PR moves the peer connection close to before waiting for events queue to be drained as event queue could be blocked on `SetLocal/RemoteDescription` hanging. The scenario is a bit far-fetched as a lot of things have to happen, but it does point to a scenario where things could hang. Remains to be seen if this helps. Note that closing the peer connection early could mean the contained objects (like data channels) could all be closed as part of the peer connection close. But, still keeping the explicit clean up path (which should effectively become no-op) to minimise changes. ------------------------------------------------------------------ The wedge is in pion/sctp's blocking-write gate, called synchronously from inside the PC's operations queue. Five things have to be true at the same time, and on this build they all are: 1. SCTPTransport.Start is synchronous in the SetRemoteDescription op The stuck stack: PeerConnection.SetRemoteDescription.func2 (peerconnection.go:1363) → startRTP → startSCTP → SCTPTransport.Start (sctptransport.go:141) → DataChannel.open (datachannel.go:178) → datachannel.Dial → Client → Stream.WriteSCTP → Association.sendPayloadData (association.go:3141) ← blocks here SCTPTransport.Start synchronously sends the DCEP "OPEN" for each pre-negotiated channel. The operations.start goroutine runs SetRemoteDescription's logic; it does not return until Start does. 2. The wait has no deadline Stream.WriteSCTP (stream.go:289) calls sendPayloadData(s.writeDeadline, ...). 
s.writeDeadline is the default zero-value deadline.Deadline — never armed, because DataChannel.Dial doesn't call Stream.SetWriteDeadline. So the <-ctx.Done() arm of the wait select can never fire. 3. EnableDataChannelBlockWrite(true) puts SCTP into a serialized-write gate At livekit-server/pkg/rtc/transport.go:362 livekit calls se.EnableDataChannelBlockWrite(true). That flips the sendPayloadData path to: // association.go:3138-3148 if a.blockWrite { for a.writePending { a.lock.Unlock() select { case <-ctx.Done(): // never (no deadline) case <-a.writeNotify: // only fires when writeLoop fully drains pendingQueue } a.lock.Lock() } a.writePending = true } 4. writeNotify only fires after the writeLoop drains everything The only place notifyBlockWritable is called is gatherOutbound (association.go:3085-3088), and only when len(chunks) > 0 && a.pendingQueue.size() == 0 — i.e., the writeLoop actually managed to move all pending chunks to inflight. If cwnd is full and SACKs stop arriving, the writeLoop wakes up, sees zero room, sends nothing, and writePending stays true. 5. There is no association-level abort timer for data writes At association.go:764: assoc.t3RTX = newRTXTimer(timerT3RTX, assoc, noMaxRetrans, rtoMax) noMaxRetrans means the retransmission timer never gives up. INIT has maxInitRetrans, but data does not. There is no equivalent of TCP's tcp_retries2 → ETIMEDOUT → ABORT. So once the path is dead post-handshake, t3RTX keeps firing into the void and the association never transitions out of established on its own. What it takes to wake it up Only an external close: somebody has to terminate the underlying DTLS conn (which makes Association.readLoop's netConn.Read fail, which closes closeWriteLoopCh, which lets timerLoop exit). But — and this is the kicker — readLoop's defer at association.go:976-996 closes everything except it does not call notifyBlockWritable. 
So even if readLoop unwinds, any goroutine parked on <-a.writeNotify stays parked unless it was watching ctx (which here it isn't). So the trigger sequence on this pod was almost certainly: 1. Peer establishes ICE+DTLS+SCTP, association goes established. 2. Peer disappears (ICE silently fails, NAT rebinding, OS sleep, kill -9, etc.). 3. The first DCEP-OPEN for one of livekit's pre-negotiated channels is queued; cwnd never opens because no SACKs return. 4. writePending is now true for the lifetime of the process, with no deadline, no ctx, no kill. 5. The PC's operations queue is wedged, SetRemoteDescription never returns, livekit-server's handleRemoteOfferReceived event handler is parked, the participant is never torn down, and the SCTP timerLoop pins the entire participant graph in memory until OOM-kill. Realistic fixes (in order of how clean they are) 1. Upstream: in pion/sctp, broadcast notifyBlockWritable() (or close writeNotify) inside readLoop's defer cleanup, so a closed association unblocks any pending writers. This is the right fix. 2. livekit-server: wrap pc.SetRemoteDescription(...) with a timeout, and on timeout call pc.Close() — Close ultimately tears down the DTLS conn, which lets readLoop exit (point 1 still needs to be true for the writer goroutine to actually unblock, though). 3. Workaround: call stream.SetWriteDeadline(...) on the SCTP stream before issuing the DCEP open, so the ctx arm of the select can fire. Requires reaching past webrtc.DataChannel though. 4. Heaviest hammer: don't pre-negotiate the data channels inline with SetRemoteDescription — open them lazily after PC reaches connected so a stuck open never blocks signaling. Without (1), even (2) leaves the writer goroutine itself parked forever — but at least the PC and its participant-side state would be released; only the SCTP goroutine subtree (much smaller) would leak. * revert probe stop change * handle nil offer	2026-04-27 21:38:46 +05:30
Raja Subramanian	3a7f2628b0	Turn off transceiver re-use on Safari. (#4474 ) There are issues with insertable streams + Safari which causes tracks to go missing mid-stream sometimes.	2026-04-23 19:04:10 +05:30
Raja Subramanian	701a37c2d1	Convert sort.Slice -> slices.SortFunc (#4472 ) * Convert sort.Slice -> slices.SortFunc * active speaker loudness in descending order	2026-04-23 15:12:24 +05:30
Raja Subramanian	31083307ec	do not log data track stats if not started (#4468 )	2026-04-23 10:46:33 +05:30
Anunay Maheshwari	9ee06635d6	feat(pion/ice): replace deprecated NAT1To1 with SetAddressRewriteRules (#4466 ) * feat(pion/ice): replace deprecated NAT1To1 with SetAddressRewriteRules * update deps	2026-04-22 12:49:36 +05:30
Raja Subramanian	dbf5cf6196	Store concrete ICE candidate for remote candidates. (#4458 )	2026-04-17 13:14:47 +05:30
Raja Subramanian	3cfb71e7ca	Use Muted in TrackInfo to propagate the published track's muted state. (#4453 ) * Use Muted in TrackInfo to propagate the published track's muted state. When the track is muted as a receiver is created, the receiver potentially was not getting the muted property. That would result in quality scorer expecting packets. Use TrackInfo consistently for mute and apply the mute on start up of a receiver. * update mute of subscriptions	2026-04-16 01:03:40 +05:30
Raja Subramanian	69aa94797b	Some drive-by clean up (#4452 )	2026-04-15 12:23:33 +05:30
Raja Subramanian	6c81f67858	Add subscriber stream start event notification (#4449 )	2026-04-14 22:08:31 +05:30
cnderrauber	ce1bf47b5c	Revert "fix: ensure num_participants is accurate in webhook events (#4265 ) (#…" (#4448 ) This reverts commit `cdb0769c38`.	2026-04-13 22:21:22 +08:00
Onyeka Obi	cdb0769c38	fix: ensure num_participants is accurate in webhook events (#4265 ) (#4422 ) * fix: ensure num_participants is accurate in webhook events (#4265) Three fixes for stale/incorrect num_participants in webhook payloads: 1. Move participant map insertion before MarkDirty in join path so updateProto() counts the new participant. 2. Use fresh room.ToProto() for participant_joined webhook instead of a stale snapshot captured at session start. 3. Remove direct NumParticipants-- in leave path (inconsistent with updateProto's IsDependent check), force immediate proto update, and wait for completion before triggering onClose callbacks. * fix: use ToProtoConsistent for webhook events instead of forcing immediate updates	2026-04-13 09:26:14 +08:00
Raja Subramanian	c91e79af35	Switch to stdlib maps, slices (#4445 ) * Switch to stdlib maps, slices * slices	2026-04-13 00:11:48 +05:30
David Zhao	4b3856125c	chore: pin GH commits and switch to golangci-lint (#4444 ) * chore: pin GH commits * switch to golangci-lint-action * fix lint issues	2026-04-11 13:04:22 -07:00

1537 Commits