Commit 3c24b5a

Backport BBRv3 for k5.15
Signed-off-by: Nicholas Sun <nicholas-sun@outlook.com>
1 parent 04f6068 commit 3c24b5a

27 files changed

Lines changed: 5209 additions & 0 deletions

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
From 8f02746d442e04734984510d49cb703b4af8a91b Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Tue, 11 Jun 2019 12:26:55 -0400
Subject: [PATCH 01/23] net-tcp_bbr: broaden app-limited rate sample detection

This commit is a bug fix for the Linux TCP app-limited
(application-limited) logic that is used for collecting rate
(bandwidth) samples.

Previously the app-limited logic only looked for "bubbles" of
silence in between application writes, by checking at the start
of each sendmsg. But "bubbles" of silence can also happen before
retransmits: e.g. bubbles can happen between an application write
and a retransmit, or between two retransmits.

Retransmits are triggered by ACKs or timers. So this commit checks
for bubbles of app-limited silence upon ACKs or timers.

Why does this commit check for app-limited state at the start of
ACKs and timer handling? Because at that point we know whether
inflight was fully using the cwnd. During processing the ACK or
timer event we often change the cwnd; after changing the cwnd we
can't know whether inflight was fully using the old cwnd.

Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc
Change-Id: I37221506f5166877c2b110753d39bb0757985e68
---
net/ipv4/tcp_input.c | 1 +
net/ipv4/tcp_timer.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6849094e5..23ac1f0e3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3811,6 +3811,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)

prior_fack = tcp_is_sack(tp) ? tcp_highest_sack_seq(tp) : tp->snd_una;
rs.prior_in_flight = tcp_packets_in_flight(tp);
+ tcp_rate_check_app_limited(sk);

/* ts_recent update must be made after we are sure that the packet
* is in window.
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index a8592c187..d00dbeb29 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -613,6 +613,7 @@ void tcp_write_timer_handler(struct sock *sk)
goto out;
}

+ tcp_rate_check_app_limited(sk);
tcp_mstamp_refresh(tcp_sk(sk));
event = icsk->icsk_pending;

--
2.39.3
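Reviewer note on PATCH 01: the ordering argument in the commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code. `struct conn_model`, `check_app_limited()` and `handle_ack()` are hypothetical names, and the conditions are a simplified approximation of what an app-limited check looks at (little unsent data, nothing queued below TCP, inflight below cwnd, all losses retransmitted).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the connection state that an
 * app-limited check reads. This is NOT the kernel's struct tcp_sock.
 */
struct conn_model {
	uint32_t unsent_bytes; /* bytes written by the app but not yet sent */
	uint32_t mss;          /* maximum segment size, in bytes */
	uint32_t queued_bytes; /* bytes still queued below TCP (qdisc/NIC) */
	uint32_t in_flight;    /* packets currently in flight */
	uint32_t cwnd;         /* congestion window, in packets */
	uint32_t lost_out;     /* packets marked lost */
	uint32_t retrans_out;  /* retransmitted packets still outstanding */
	uint32_t delivered;    /* total packets delivered so far */
	uint32_t app_limited;  /* 0, or the delivery count that ends the bubble */
};

/* Mark the connection app-limited if there is a "bubble": nothing
 * substantial left to send, and the cwnd is not the limiting factor.
 */
static bool check_app_limited(struct conn_model *c)
{
	bool bubble = c->unsent_bytes < c->mss &&
		      c->queued_bytes == 0 &&
		      c->in_flight < c->cwnd &&
		      c->lost_out <= c->retrans_out;

	if (bubble) {
		uint32_t mark = c->delivered + c->in_flight;

		c->app_limited = mark ? mark : 1; /* keep the marker nonzero */
	}
	return bubble;
}

/* Model of an ACK handler: run the check first, against the cwnd that
 * was in force while the packets were sent, and only then let the
 * event change the cwnd.
 */
static bool handle_ack(struct conn_model *c, uint32_t new_cwnd)
{
	bool limited = check_app_limited(c);

	c->cwnd = new_cwnd;
	return limited;
}
```

With 8 packets in flight and a cwnd of 10, `handle_ack()` detects the bubble even if the same ACK cuts the cwnd to 5. Running the check after the cut would compare 8 against 5 and miss it, which is the situation the commit message describes.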
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
From 955df87f53c7b4519534d795fc47389df11813a7 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Sun, 24 Jun 2018 21:55:59 -0400
Subject: [PATCH 02/23] net-tcp_bbr: v2: shrink delivered_mstamp,
 first_tx_mstamp to u32 to free up 8 bytes

Free up some space for tracking inflight and losses for each
bw sample, in upcoming commits.

These timestamps are in microseconds, and are now stored in 32
bits. So they can only hold time intervals up to roughly 2^12 = 4096
seconds. But Linux TCP RTT and RTO tracking has the same 32-bit
microsecond implementation approach and resulting deployment
limitations. So this is not introducing a new limit. And these should
not be a limitation for the foreseeable future.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 238a7e6b5d51625fef1ce7769826a7b21b02ae55
Change-Id: I3b779603797263b52a61ad57c565eb91fe42680c
---
include/net/tcp.h | 9 +++++++--
net/ipv4/tcp_rate.c | 7 ++++---
2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index d8920f84f..ef15d509a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -800,6 +800,11 @@ static inline u32 tcp_stamp_us_delta(u64 t1, u64 t0)
return max_t(s64, t1 - t0, 0);
}

+static inline u32 tcp_stamp32_us_delta(u32 t1, u32 t0)
+{
+ return max_t(s32, t1 - t0, 0);
+}
+
static inline u32 tcp_skb_timestamp(const struct sk_buff *skb)
{
return tcp_ns_to_ts(skb->skb_mstamp_ns);
@@ -874,9 +879,9 @@ struct tcp_skb_cb {
/* pkts S/ACKed so far upon tx of skb, incl retrans: */
__u32 delivered;
/* start of send pipeline phase */
- u64 first_tx_mstamp;
+ u32 first_tx_mstamp;
/* when we reached the "delivered" count */
- u64 delivered_mstamp;
+ u32 delivered_mstamp;
} tx; /* only used for outgoing skbs */
union {
struct inet_skb_parm h4;
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index 042e27f54..cbb2cbe0f 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -99,8 +99,9 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
/* Record send time of most recently ACKed packet: */
tp->first_tx_mstamp = tx_tstamp;
/* Find the duration of the "send phase" of this window: */
- rs->interval_us = tcp_stamp_us_delta(tp->first_tx_mstamp,
- scb->tx.first_tx_mstamp);
+ rs->interval_us = tcp_stamp32_us_delta(
+ tp->first_tx_mstamp,
+ scb->tx.first_tx_mstamp);

}
/* Mark off the skb delivered once it's sacked to avoid being
@@ -149,7 +150,7 @@ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
* longer phase.
*/
snd_us = rs->interval_us; /* send phase */
- ack_us = tcp_stamp_us_delta(tp->tcp_mstamp,
+ ack_us = tcp_stamp32_us_delta(tp->tcp_mstamp,
rs->prior_mstamp); /* ack phase */
rs->interval_us = max(snd_us, ack_us);

--
2.39.3
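Reviewer note on PATCH 02: the effect of storing microsecond timestamps in 32 bits can be checked with a user-space sketch of the same arithmetic. `stamp32_us_delta()` and `truncate_stamp()` are hypothetical names that mirror the shape of `tcp_stamp32_us_delta()` in the patch; they are not kernel functions. A 32-bit microsecond counter wraps every 2^32 us, which is about 4295 seconds, the same order as the "roughly 2^12 = 4096 seconds" in the commit message.

```c
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit microsecond timestamp. */
static uint32_t truncate_stamp(uint64_t us)
{
	return (uint32_t)us;
}

/* Elapsed microseconds from t0 to t1, both already truncated to 32 bits.
 * The unsigned subtraction wraps modulo 2^32, so a counter wrap between
 * t0 and t1 still yields the right answer. The result is then read as a
 * signed value and clamped at 0, like max_t(s32, t1 - t0, 0).
 *
 * Note: converting an out-of-range value to int32_t is
 * implementation-defined before C23; this sketch assumes the usual
 * two's-complement behavior of GCC and Clang.
 */
static uint32_t stamp32_us_delta(uint32_t t1, uint32_t t0)
{
	int32_t delta = (int32_t)(t1 - t0);

	return delta > 0 ? (uint32_t)delta : 0;
}
```

In this sketch, two stamps taken 512 us apart across a counter wrap (`0xFFFFFF00` and `0x00000100` after truncation) still give a delta of 512. Because of the signed clamp, a gap of 2^31 us (about 2147 seconds) or more reads as 0 rather than as a large interval.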
Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
From 85a795dfc3c36ee86a7db11b16e5a9bee49a9c94 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Sat, 5 Aug 2017 11:49:50 -0400
Subject: [PATCH 03/23] net-tcp_bbr: v2: snapshot packets in flight at transmit
 time and pass in rate_sample

CC algorithms may want to snapshot the number of packets in flight at
transmit time and pass in rate_sample, to understand the relationship
between inflight and losses or ECN signals, to try to find the highest
inflight value that has acceptable levels of loss/ECN marking.

We split out the code to set an skb's tx.in_flight field into its own
function, so that this code can be used for the TCP_REPAIR "fake send"
code path that inserts skbs into the rtx queue without sending them.

Effort: net-tcp_bbr
Origin-9xx-SHA1: b3eb4f2d20efab4ca001f32c9294739036c493ea
Origin-9xx-SHA1: e880fc907d06ea7354333f60f712748ebce9497b
Origin-9xx-SHA1: 330f825a08a6fe92cef74d799cc468864c479f63
Change-Id: I7314047d0ff14dd261a04b1969a46dc658c8836a
---
include/net/tcp.h | 9 +++++++--
net/ipv4/tcp_output.c | 1 +
net/ipv4/tcp_rate.c | 20 ++++++++++++++++++++
3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ef15d509a..808f01070 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -873,8 +873,7 @@ struct tcp_skb_cb {
union {
struct {
/* There is space for up to 24 bytes */
- __u32 in_flight:30,/* Bytes in flight at transmit */
- is_app_limited:1, /* cwnd not fully used? */
+ __u32 is_app_limited:1, /* cwnd not fully used? */
unused:1;
/* pkts S/ACKed so far upon tx of skb, incl retrans: */
__u32 delivered;
@@ -882,6 +881,10 @@ struct tcp_skb_cb {
u32 first_tx_mstamp;
/* when we reached the "delivered" count */
u32 delivered_mstamp;
+#define TCPCB_IN_FLIGHT_BITS 20
+#define TCPCB_IN_FLIGHT_MAX ((1U << TCPCB_IN_FLIGHT_BITS) - 1)
+ u32 in_flight:20, /* packets in flight at transmit */
+ unused2:12;
} tx; /* only used for outgoing skbs */
union {
struct inet_skb_parm h4;
@@ -1027,6 +1030,7 @@ struct ack_sample {
struct rate_sample {
u64 prior_mstamp; /* starting timestamp for interval */
u32 prior_delivered; /* tp->delivered at "prior_mstamp" */
+ u32 tx_in_flight; /* packets in flight at starting timestamp */
s32 delivered; /* number of packets delivered over interval */
long interval_us; /* time for tp->delivered to incr "delivered" */
u32 snd_interval_us; /* snd interval for delivered packets */
@@ -1151,6 +1155,7 @@ static inline void tcp_ca_event(struct sock *sk, const enum tcp_ca_event event)
}

/* From tcp_rate.c */
+void tcp_set_tx_in_flight(struct sock *sk, struct sk_buff *skb);
void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb);
void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
struct rate_sample *rs);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d46fb6d70..b455d9f24 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2633,6 +2633,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
skb->skb_mstamp_ns = tp->tcp_wstamp_ns = tp->tcp_clock_cache;
list_move_tail(&skb->tcp_tsorted_anchor, &tp->tsorted_sent_queue);
tcp_init_tso_segs(skb, mss_now);
+ tcp_set_tx_in_flight(sk, skb);
goto repair; /* Skip network transmission */
}

diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index cbb2cbe0f..50a292d48 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -34,6 +34,24 @@
* ready to send in the write queue.
*/

+void tcp_set_tx_in_flight(struct sock *sk, struct sk_buff *skb)
+{
+ struct tcp_sock *tp = tcp_sk(sk);
+ u32 in_flight;
+
+ /* Check, sanitize, and record packets in flight after skb was sent. */
+ in_flight = tcp_packets_in_flight(tp) + tcp_skb_pcount(skb);
+ if (WARN_ONCE(in_flight > TCPCB_IN_FLIGHT_MAX,
+ "insane in_flight %u cc %s mss %u "
+ "cwnd %u pif %u %u %u %u\n",
+ in_flight, inet_csk(sk)->icsk_ca_ops->name,
+ tp->mss_cache, tp->snd_cwnd,
+ tp->packets_out, tp->retrans_out,
+ tp->sacked_out, tp->lost_out))
+ in_flight = TCPCB_IN_FLIGHT_MAX;
+ TCP_SKB_CB(skb)->tx.in_flight = in_flight;
+}
+
/* Snapshot the current delivery information in the skb, to generate
* a rate sample later when the skb is (s)acked in tcp_rate_skb_delivered().
*/
@@ -66,6 +84,7 @@ void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb)
TCP_SKB_CB(skb)->tx.delivered_mstamp = tp->delivered_mstamp;
TCP_SKB_CB(skb)->tx.delivered = tp->delivered;
TCP_SKB_CB(skb)->tx.is_app_limited = tp->app_limited ? 1 : 0;
+ tcp_set_tx_in_flight(sk, skb);
}

/* When an skb is sacked or acked, we fill in the rate sample with the (prior)
@@ -94,6 +113,7 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
rs->prior_mstamp = scb->tx.delivered_mstamp;
rs->is_app_limited = scb->tx.is_app_limited;
rs->is_retrans = scb->sacked & TCPCB_RETRANS;
+ rs->tx_in_flight = scb->tx.in_flight;
rs->last_end_seq = scb->end_seq;

/* Record send time of most recently ACKed packet: */
--
2.39.3
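Reviewer note on PATCH 03: the clamp in `tcp_set_tx_in_flight()` matters because the value is stored in a 20-bit bitfield, and an unsigned bitfield silently keeps only the low bits of whatever is assigned to it. The sketch below models that in user space. `struct tx_snapshot`, `set_tx_in_flight()` and the `IN_FLIGHT_*` macros are hypothetical stand-ins, and the kernel's `WARN_ONCE()` diagnostic is left out.

```c
#include <stdint.h>

#define IN_FLIGHT_BITS 20
#define IN_FLIGHT_MAX  ((1U << IN_FLIGHT_BITS) - 1)

/* Hypothetical stand-in for the per-skb transmit snapshot. */
struct tx_snapshot {
	unsigned int in_flight : IN_FLIGHT_BITS; /* packets in flight at tx */
	unsigned int unused    : 32 - IN_FLIGHT_BITS;
};

/* Record packets in flight just after a send of skb_pcount packets,
 * saturating at the largest value the bitfield can hold instead of
 * letting the assignment drop the high bits.
 */
static void set_tx_in_flight(struct tx_snapshot *snap,
			     uint32_t already_in_flight,
			     uint32_t skb_pcount)
{
	uint32_t in_flight = already_in_flight + skb_pcount;

	if (in_flight > IN_FLIGHT_MAX)
		in_flight = IN_FLIGHT_MAX;
	snap->in_flight = in_flight;
}
```

Without the clamp, assigning 2^20 to the field would store 0, so a rate sample taken at an extreme inflight would look like one taken with nothing in flight. Saturating at `IN_FLIGHT_MAX` (1048575 packets) keeps the recorded value conservative instead.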
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
From 2065525b978247b5bf7cd5086856196e6ec49d8d Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Thu, 12 Oct 2017 23:44:27 -0400
Subject: [PATCH 04/23] net-tcp_bbr: v2: count packets lost over TCP rate
 sampling interval

For understanding the relationship between inflight and packet loss
signals, to try to find the highest inflight value that has acceptable
levels of packet losses.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 4527e26b2bd7756a88b5b9ef1ada3da33dd609ab
Change-Id: I594c2500868d9c530770e7ddd68ffc87c57f4fd5
---
include/net/tcp.h | 4 ++++
net/ipv4/tcp_rate.c | 3 +++
2 files changed, 7 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 808f01070..ebf984b01 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -885,6 +885,7 @@ struct tcp_skb_cb {
#define TCPCB_IN_FLIGHT_MAX ((1U << TCPCB_IN_FLIGHT_BITS) - 1)
u32 in_flight:20, /* packets in flight at transmit */
unused2:12;
+ u32 lost; /* packets lost so far upon tx of skb */
} tx; /* only used for outgoing skbs */
union {
struct inet_skb_parm h4;
@@ -1029,9 +1030,12 @@ struct ack_sample {
*/
struct rate_sample {
u64 prior_mstamp; /* starting timestamp for interval */
+ u32 prior_lost; /* tp->lost at "prior_mstamp" */
u32 prior_delivered; /* tp->delivered at "prior_mstamp" */
u32 tx_in_flight; /* packets in flight at starting timestamp */
+ s32 lost; /* number of packets lost over interval */
s32 delivered; /* number of packets delivered over interval */
+ s32 delivered_ce; /* packets delivered w/ CE mark over interval */
long interval_us; /* time for tp->delivered to incr "delivered" */
u32 snd_interval_us; /* snd interval for delivered packets */
u32 rcv_interval_us; /* rcv interval for delivered packets */
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index 50a292d48..f3413e5e2 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -83,6 +83,7 @@ void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb)
TCP_SKB_CB(skb)->tx.first_tx_mstamp = tp->first_tx_mstamp;
TCP_SKB_CB(skb)->tx.delivered_mstamp = tp->delivered_mstamp;
TCP_SKB_CB(skb)->tx.delivered = tp->delivered;
+ TCP_SKB_CB(skb)->tx.lost = tp->lost;
TCP_SKB_CB(skb)->tx.is_app_limited = tp->app_limited ? 1 : 0;
tcp_set_tx_in_flight(sk, skb);
}
@@ -109,6 +110,7 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb,
if (!rs->prior_delivered ||
tcp_skb_sent_after(tx_tstamp, tp->first_tx_mstamp,
scb->end_seq, rs->last_end_seq)) {
+ rs->prior_lost = scb->tx.lost;
rs->prior_delivered = scb->tx.delivered;
rs->prior_mstamp = scb->tx.delivered_mstamp;
rs->is_app_limited = scb->tx.is_app_limited;
@@ -163,6 +165,7 @@ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
return;
}
rs->delivered = tp->delivered - rs->prior_delivered;
+ rs->lost = tp->lost - rs->prior_lost;

/* Model sending data and receiving ACKs as separate pipeline phases
* for a window. Usually the ACK phase is longer, but with ACK
--
2.39.3
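Reviewer note on PATCH 04: the patch follows the same snapshot-and-subtract pattern already used for `delivered`. The running loss counter is copied into the skb at transmit time, and the difference is taken when that skb is (s)acked. A user-space sketch of that pattern follows. All type and function names here are hypothetical, and `loss_percent()` is only an illustration of how a consumer might use the two deltas, not the threshold logic BBR itself applies.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the connection-wide running counters,
 * the per-packet snapshot, and the resulting per-interval sample.
 */
struct conn_counters { uint32_t delivered; uint32_t lost; };
struct tx_stamp      { uint32_t delivered; uint32_t lost; };
struct loss_sample   { int32_t delivered;  int32_t lost;  };

/* At transmit time: remember the counters as they stand right now. */
static struct tx_stamp snapshot_on_send(const struct conn_counters *c)
{
	struct tx_stamp s = { c->delivered, c->lost };

	return s;
}

/* When the packet is (s)acked: the unsigned differences give the
 * packets delivered and lost over the interval, even if a running
 * counter wrapped around in between.
 */
static struct loss_sample sample_on_ack(const struct conn_counters *c,
					const struct tx_stamp *s)
{
	struct loss_sample rs;

	rs.delivered = (int32_t)(c->delivered - s->delivered);
	rs.lost = (int32_t)(c->lost - s->lost);
	return rs;
}

/* Illustrative consumer: integer loss rate over the interval, in percent. */
static uint32_t loss_percent(const struct loss_sample *rs)
{
	int64_t total = (int64_t)rs->delivered + rs->lost;

	if (total <= 0 || rs->lost <= 0)
		return 0;
	return (uint32_t)((int64_t)rs->lost * 100 / total);
}
```

For example, a snapshot taken at delivered = 1000, lost = 20 and an ACK processed at delivered = 1095, lost = 25 yields a sample of 95 delivered and 5 lost, which this sketch reports as a 5% loss rate.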
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
From 4b14c2af997dbd6f246ed2f1eb72d889e8a22c53 Mon Sep 17 00:00:00 2001
From: Neal Cardwell <ncardwell@google.com>
Date: Mon, 19 Nov 2018 13:48:36 -0500
Subject: [PATCH 05/23] net-tcp_bbr: v2: export FLAG_ECE in rate_sample.is_ece

For understanding the relationship between inflight and ECN signals,
to try to find the highest inflight value that has acceptable levels
of ECN marking.

Effort: net-tcp_bbr
Origin-9xx-SHA1: 3eba998f2898541406c2666781182200934965a8
Change-Id: I3a964e04cee83e11649a54507043d2dfe769a3b3
---
include/net/tcp.h | 1 +
net/ipv4/tcp_input.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ebf984b01..a498d3f01 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1047,6 +1047,7 @@ struct rate_sample {
bool is_app_limited; /* is sample from packet with bubble in pipe? */
bool is_retrans; /* is sample from retransmission? */
bool is_ack_delayed; /* is this (likely) a delayed ACK? */
+ bool is_ece; /* did this ACK have ECN marked? */
};

struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 23ac1f0e3..70418310e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3910,6 +3910,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
delivered = tcp_newly_delivered(sk, delivered, flag);
lost = tp->lost - lost; /* freshly marked lost */
rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
+ rs.is_ece = !!(flag & FLAG_ECE);
tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
tcp_xmit_recovery(sk, rexmit);
--
2.39.3
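Reviewer note on PATCH 05: the `!!(flag & FLAG_...)` idiom collapses a masked bit to exactly 0 or 1 before it is stored. The sketch below shows the idiom in user space. The `DEMO_FLAG_*` values are placeholders chosen for illustration (the kernel defines its own `FLAG_*` bits in net/ipv4/tcp_input.c), and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder flag bits for illustration only. */
#define DEMO_FLAG_ECE               0x40u
#define DEMO_FLAG_ACK_MAYBE_DELAYED 0x10000u

/* Hypothetical stand-in for the boolean part of a rate sample. */
struct demo_rate_sample {
	bool is_ack_delayed;
	bool is_ece;
};

/* Export per-ACK flag bits as booleans. "!!" maps any nonzero masked
 * value to 1 and zero to 0.
 */
static struct demo_rate_sample export_ack_flags(uint32_t flag)
{
	struct demo_rate_sample rs;

	rs.is_ack_delayed = !!(flag & DEMO_FLAG_ACK_MAYBE_DELAYED);
	rs.is_ece = !!(flag & DEMO_FLAG_ECE);
	return rs;
}

/* Counter-example: storing the masked value in a narrow integer without
 * normalizing keeps only the low 8 bits, so a flag above bit 7 reads
 * as 0 even though it was set.
 */
static uint8_t narrow_without_normalizing(uint32_t flag)
{
	return (uint8_t)(flag & DEMO_FLAG_ACK_MAYBE_DELAYED);
}
```

With a real `bool` destination the conversion alone would already yield 0 or 1, so in this sketch the `!!` mainly guards against the narrowing problem shown in `narrow_without_normalizing()` and keeps the intent explicit.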
