otel: add tcp metrics by AgraVator · Pull Request #12652 · grpc/grpc-java

AgraVator · 2026-02-11T07:25:38Z

No description provided.

sauravzg · 2026-02-17T10:19:36Z

core/src/main/java/io/grpc/internal/ClientTransportFactory.java

Can you review go/java-practices/null from the overall PR's perspective, in terms of nullable vs. optional? I am not sure if we have a local Java convention to prefer or avoid null.

sauravzg · 2026-02-17T10:19:44Z

netty/src/main/java/io/grpc/netty/NettyServer.java

  private final int maxMessageSize;
  private final int maxHeaderListSize;
  private final int softLimitHeaderListSize;
+  private MetricRecorder metricRecorder;


How is this being set? I don't see a constructor or a setter that assigns this value.

sauravzg · 2026-02-17T10:19:50Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+import java.util.Collections;
+import java.util.List;
+
+final class TcpMetrics {


optional: Javadocs for classes might be helpful

sauravzg · 2026-02-17T10:20:10Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+  }
+
+  static final class Metrics {
+    final LongCounterMetricInstrument connectionsCreated;


Add nullability annotations for fields where needed. Also consider the previous discussion about the nullability style guide.

sauravzg · 2026-02-17T10:20:13Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+   * Safe metric registration or retrieval for environments where TcpMetrics might
+   * be loaded multiple times (e.g., shaded and unshaded).
+   */
+  private static LongCounterMetricInstrument safelyRegisterLongCounter(


Do we need the safelyRegi... private abstraction?

Given that all registration happens during construction and this class is final, can't we simply ensure we dedupe our metrics during construction, then catch and rethrow the error from the constructor? I.e., ensure that the input is valid and catch/throw from a single point which should never be reached.
The fact that "we are passing a valid set of metrics for registration" can probably be trivially unit tested when we construct a TcpMetrics instance.

This should reduce code bloat without any sacrifice to safety, but at the cost of "duplicate input will throw an error instead of being handled intuitively", which is a fair expectation IMO.
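
A minimal sketch of the single-point validation described above. `InstrumentSpec` and `ValidatedInstrumentSet` are hypothetical stand-ins invented for illustration, not gRPC types, and the real registry API is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an instrument definition; not a gRPC type.
final class InstrumentSpec {
  final String name;

  InstrumentSpec(String name) {
    this.name = name;
  }
}

// Validates the full instrument list once, in the constructor, so that a
// duplicate name fails fast at a single point instead of being caught and
// patched up at each registration call.
final class ValidatedInstrumentSet {
  private final List<InstrumentSpec> specs;

  ValidatedInstrumentSet(List<InstrumentSpec> specs) {
    Set<String> seen = new HashSet<>();
    for (InstrumentSpec spec : specs) {
      if (!seen.add(spec.name)) {
        throw new IllegalArgumentException("Duplicate instrument name: " + spec.name);
      }
    }
    this.specs = Collections.unmodifiableList(new ArrayList<>(specs));
  }

  List<InstrumentSpec> specs() {
    return specs;
  }
}
```

A unit test that constructs the set with the production list of names would then be enough to show that the error path is never reached in practice.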

sauravzg · 2026-02-17T10:20:18Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+      java.util.List<String> labelValues = getLabelValues(channel);
+      try {
+        if (channel.getClass().getName().equals(epollSocketChannelClassName)) {
+          Class<?> tcpInfoClass = Class.forName(epollTcpInfoClassName);


This is quite a lot of reflection in the hot exporting path. So, a couple of questions.

Why do we choose reflection over typecasting and calling the actual methods? Reducing dependency surface?

Is there a way to optimise this, such as caching the value at the per-channel level? Something like creating a runnable per channel. I'm not sure how it would be possible to share this data for the final collection on channel inactive.

sauravzg · 2026-02-17T10:20:21Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+          Method rttMethod = tcpInfoClass.getMethod("rtt");
+
+          long totalRetrans = (Long) totalRetransMethod.invoke(info);
+          int retransmits = (Integer) retransmitsMethod.invoke(info);


Are these cumulative metrics or are these total retransmits since the connection start? If the latter, the current logic is incorrect.

Ah, I see. Then I think we should not be using a counter for this... Will clarify in the gRFC.
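
To illustrate the concern: if each reading is a running total since the connection started, adding every reading to a counter counts earlier retransmits again on each report. The sketch below shows the delta conversion a counter would need in that case. `CumulativeToDelta` is a hypothetical helper, not code from the PR, and the thread itself leans toward a different instrument type rather than this conversion.

```java
// Converts successive cumulative readings into the increments a counter expects.
final class CumulativeToDelta {
  private long lastReading;

  // Returns how much the cumulative value grew since the previous call.
  long delta(long cumulativeReading) {
    long increment = cumulativeReading - lastReading;
    lastReading = cumulativeReading;
    return increment;
  }
}
```

With readings of 5, 8, and 8, the increments are 5, 3, and 0, whereas adding the raw readings to a counter would report 21.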

ejona86

"otel" is a very surprising prefix to use for the PR/commit, as opentelemetry isn't actually changed at all.

ejona86 · 2026-02-26T22:51:56Z

netty/src/main/java/io/grpc/netty/NettyServerBuilder.java

+   * @since 1.81.0
+   */
+  @CanIgnoreReturnValue
+  public NettyServerBuilder setMetricRecorder(


MetricRecorder is @Internal, so this would need to be internal. On client-side we have addMetricSink(); we always use MetricRecorderImpl. Why do that differently here? This approach won't allow having multiple MetricSinks. Although I really don't understand what is going on, since ServerImplBuilder has addMetricSink().

ejona86 · 2026-02-26T22:59:58Z

core/src/main/java/io/grpc/internal/ClientTransportFactory.java

+    @Nullable
+    public MetricRecorder getMetricRecorder() {
+      return metricRecorder;
+    }


LoadBalancer.Helper.getMetricRecorder() is non-null and ManagedChannelImpl creates a MetricRecorderImpl unconditionally. Make this non-null as well? I see that NameResolver allows it to be null; seems we should just change that. LoadBalancer used to be that way and it was changed after a bug in the Helper implementation caused the channel to panic.

If we need the optimization, we can have the MetricRecorder return whether a particular instrument is enabled. But I see no need to add an additional condition to usages for the metric recorder being missing entirely.
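
One way to keep the getter non-null, sketched with hypothetical types: default to a recorder that does nothing, so call sites never need a null check. `SimpleRecorder`, `NoopSimpleRecorder`, and `SketchTransportOptions` are invented for illustration and are not the gRPC `MetricRecorder` API. The no-op default is an assumption of this sketch; the comment above notes that the channel already creates a real recorder unconditionally.

```java
import java.util.Objects;

// Hypothetical, simplified recorder interface; not the gRPC MetricRecorder API.
interface SimpleRecorder {
  void addCount(String instrument, long value);
}

// A recorder that drops everything. Handing this out instead of null lets
// call sites record unconditionally.
final class NoopSimpleRecorder implements SimpleRecorder {
  @Override
  public void addCount(String instrument, long value) {}
}

// Options holder whose recorder is never null.
final class SketchTransportOptions {
  private SimpleRecorder recorder = new NoopSimpleRecorder();

  SketchTransportOptions setRecorder(SimpleRecorder recorder) {
    this.recorder = Objects.requireNonNull(recorder, "recorder");
    return this;
  }

  SimpleRecorder getRecorder() {
    return recorder;
  }
}
```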

ejona86 · 2026-02-26T23:07:46Z

core/src/main/java/io/grpc/internal/InternalServer.java

+   * Sets the MetricRecorder for the server. This optional method allows setting
+   * the MetricRecorder after construction but before start().
+   */
+  default void setMetricRecorder(MetricRecorder metricRecorder) {}


I want to avoid setters-before-start for this API, as they get wonky and prevent using final. Let's just construct it within the builder and pass it as an argument to ServerImplBuilder.ClientTransportServersBuilder.buildTransportServers()? I'd agree that might not be how it is done long-term, but it is probably closer to the final form than a setter. Right now ServerImpl doesn't even need MetricRecorder itself.
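
A rough sketch of the constructor-injection shape described above, using hypothetical types throughout. `SketchRecorder`, `SketchTransportServer`, and `SketchServerBuilder` are invented names, and the instrument name is a placeholder; none of this is the real `ServerImplBuilder` or `InternalServer` API.

```java
import java.util.Objects;

// Hypothetical stand-in for a metrics recorder; not the gRPC MetricRecorder API.
interface SketchRecorder {
  void addCount(String instrument, long value);
}

// The transport server receives its recorder at construction, so the field
// can be final and there is no window in which it is unset.
final class SketchTransportServer {
  private final SketchRecorder recorder;

  SketchTransportServer(SketchRecorder recorder) {
    this.recorder = Objects.requireNonNull(recorder, "recorder");
  }

  void onConnectionCreated() {
    recorder.addCount("connections_created", 1);
  }
}

// The builder owns the recorder and passes it down when it builds the
// transport server, instead of calling a setter before start().
final class SketchServerBuilder {
  private final SketchRecorder recorder;

  SketchServerBuilder(SketchRecorder recorder) {
    this.recorder = Objects.requireNonNull(recorder, "recorder");
  }

  SketchTransportServer buildTransportServer() {
    return new SketchTransportServer(recorder);
  }
}
```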

ejona86 · 2026-02-26T23:11:29Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+          optionalLabels);
+
+      if (epollAvailable) {
+        packetsRetransmitted = safelyRegisterLongCounter(registry,


The expectation was that instruments would all be registered unconditionally.

ejona86 · 2026-02-26T23:13:56Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+    try {
+      return registry.registerLongCounter(name, description, unit, requiredLabelKeys,
+          optionalLabelKeys, false);
+    } catch (IllegalStateException e) {


No, no, no. IllegalStateException means "you, the programmer, messed up and should not have called in this state." It is a bug. Prevent the exception from happening; don't try to clean up afterward.

Since these metrics are shared across transports, the registrations need to be as well. I'd say we should put it in grpc-core as is done for SubchannelMetrics, but I realize now that would be broken if we shade grpc-core into transports. So these really need to be defined in grpc-api, although they can be in an Internal* class. (We'll need to fix SubchannelMetrics as well.)

ejona86 · 2026-02-26T23:50:47Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+      this(metricRecorder, target, DEFAULT_METRICS);
+    }
+
+    Tracker(MetricRecorder metricRecorder, String target, Metrics metrics) {


@VisibleForTesting?

ejona86 · 2026-02-26T23:50:50Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+          "io.netty.channel.epoll.EpollTcpInfo");
+    }
+
+    Tracker(MetricRecorder metricRecorder, String target, Metrics metrics,


@VisibleForTesting?

ejona86 · 2026-02-26T23:53:16Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+    private io.netty.util.concurrent.ScheduledFuture<?> reportTimer;
+
+    void channelActive(Channel channel) {
+      if (metricRecorder != null && target != null) {


When would the target be null, and if it is null why would we stop metrics? Seems we should worst-case use "" or something of the style <unknown target>. Otherwise you're silently losing metrics, which is the worst failure mode for metrics as then you have lies.
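
A small sketch of the fallback described above. `TargetLabels` is a hypothetical helper, and the placeholder string simply follows the style mentioned in the comment.

```java
// Supplies a placeholder label so metrics are still recorded when the
// target is missing, instead of being silently dropped.
final class TargetLabels {
  static final String UNKNOWN_TARGET = "<unknown target>";

  private TargetLabels() {}

  static String orUnknown(String target) {
    return target == null ? UNKNOWN_TARGET : target;
  }
}
```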

ejona86 · 2026-02-26T23:56:11Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+              Collections.singletonList(target), labelValues);
+        }
+      } catch (Throwable t) {
+        // Epoll not available or error getting tcp_info, just ignore.


This comment only applies to a portion of the try block. Limit the scope of the try block to just where it is needed.
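
A sketch of what narrowing the try block could look like, with hypothetical names. Only the optional read is guarded, so a failure while recording is not swallowed along with it.

```java
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

final class NarrowTryExample {
  private NarrowTryExample() {}

  // Guards only the optional, failure-prone read. An exception thrown by
  // record still propagates to the caller.
  static boolean readAndRecord(LongSupplier optionalRead, LongConsumer record) {
    long value;
    try {
      value = optionalRead.getAsLong();
    } catch (RuntimeException e) {
      // Optional source unavailable; nothing to record.
      return false;
    }
    record.accept(value);
    return true;
  }
}
```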

ejona86 · 2026-02-26T23:57:59Z

netty/src/main/java/io/grpc/netty/TcpMetrics.java

+            return;
+          }
+          synchronized (accessorCache) {
+            channelReflectionAccessor = accessorCache.get(epollTcpInfoClassName);


Please no specialized cache just for testing. I'd suggest a single field and if epollTcpInfoClassName is provided for testing then simply create a new one (and don't cache it). And we can do all that in the constructor instead of checking every invocation. We do have some "interesting" tools that might help testing this (e.g., StaticTestingClassLoader), but I'll have to look over the tests to see how much help it provides.
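
One possible reading of the single-field suggestion, sketched with a hypothetical `CachedIntAccessor`: the reflective lookup happens once in the constructor and the result is held in one final field, so a test can construct a fresh instance with a different class name and no shared cache is needed. The class and method names passed in are assumptions of the caller.

```java
import java.lang.reflect.Method;

// Resolves a reflective accessor once, at construction, so the periodic
// reporting path only pays for Method.invoke and not for Class.forName or
// getMethod on every call.
final class CachedIntAccessor {
  private final Method method; // null when the class or method is unavailable

  CachedIntAccessor(String className, String methodName) {
    Method resolved = null;
    try {
      resolved = Class.forName(className).getMethod(methodName);
    } catch (ReflectiveOperationException | LinkageError e) {
      // The optional class is not usable here; leave the accessor disabled.
    }
    this.method = resolved;
  }

  boolean isAvailable() {
    return method != null;
  }

  // Returns the value read from target, or -1 if the accessor is unavailable
  // or the call fails.
  int read(Object target) {
    if (method == null) {
      return -1;
    }
    try {
      return (Integer) method.invoke(target);
    } catch (ReflectiveOperationException | RuntimeException e) {
      return -1;
    }
  }
}
```

For example, `new CachedIntAccessor("java.lang.String", "length")` stands in for the optional Netty class in a test, and an unknown class name yields a disabled accessor rather than an exception.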

AgraVator added 2 commits February 11, 2026 12:55

otel: add tcp metrics

508c141

add optional labels

5ef3b1c

AgraVator closed this Feb 11, 2026

AgraVator reopened this Feb 11, 2026

AgraVator added 5 commits February 13, 2026 14:25

tcp: add jitter

c4111e2

tcp: add null check

172f6c4

fix test

6996afd

increase coverage

42a834b

formatting changes

65487c6

AgraVator marked this pull request as ready for review February 16, 2026 16:14

add missing metric collection when channel becomes inactive

2b44889

sauravzg reviewed Feb 17, 2026

tcp: suggested changes

2990c28

AgraVator requested a review from ejona86 February 26, 2026 09:54

ejona86 reviewed Feb 27, 2026

3 participants