2026-05-27 infra

Expressive Power of Floating-Point Neural Networks with Arbitrary Reduction Orders and Inexact Activation Implementations

Yeachan Park, Geonho Hwang, Wonyeol Lee, Sejun Park


Key claim

First-layer distinguishability is the key condition for universal representability in floating-point neural networks.

This paper studies the expressive power of floating-point neural networks under more realistic execution semantics, namely arbitrary reduction orders and inexact activation implementations. It establishes a framework for determining when such networks can represent arbitrary functions, and identifies distinguishability in the first layer as the crucial condition for universal representability. This finding broadens the understanding of practical activation functions in floating-point neural networks.
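As a rough illustration of what "distinguishability in the first layer" might look like in practice, the sketch below checks whether any neuron from a small candidate set gives different float16 outputs for two inputs. It is a simplified proxy and not the paper's formal definition: the helper names (`to_f16`, `neuron`, `distinguishable`), the choice of tanh, and the step-by-step rounding to float16 are all assumptions made for this example.

```python
import math
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def neuron(x: float, w: float, b: float) -> float:
    """Hypothetical first-layer neuron: tanh(w*x + b), rounded to float16 after each step."""
    pre = to_f16(to_f16(w * x) + b)
    return to_f16(math.tanh(pre))


def distinguishable(x: float, y: float, params) -> bool:
    """True if at least one (w, b) pair yields different outputs for x and y."""
    return any(neuron(x, w, b) != neuron(y, w, b) for w, b in params)


# tanh saturates to 1.0 in float16 for both inputs, so one neuron with b = 0 collides.
print(distinguishable(5.0, 6.0, [(1.0, 0.0)]))
# Adding a shifted neuron (b = -5) moves the inputs out of the saturated region.
print(distinguishable(5.0, 6.0, [(1.0, 0.0), (1.0, -5.0)]))
```

The point of the example is only that rounding can make distinct inputs collide under one neuron while a different choice of first-layer parameters keeps them apart.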


Novelty

8.5/10

The paper introduces a new framework for understanding the expressivity of floating-point neural networks, significantly extending prior work.

Reliability

7.0/10

The claims are supported by a solid theoretical foundation, though empirical validation is limited.

Deep reliability assessment

The methodology supports formal expressivity claims for finite floating-point domains under specified execution semantics, especially necessary and sufficient conditions based on first-layer distinguishability. It does not by itself show that practical trained networks will learn such constructions efficiently, that real hardware libraries always satisfy the assumed bounded-ulp activation conditions, or that the constructed networks are deployable in size or latency.
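To make the bounded-ulp condition concrete, here is a minimal sketch of how the ulp error of an activation implementation could be measured against a reference. Several things here are assumptions for illustration: `fast_tanh` is a hypothetical cheap rational approximation, `math.tanh` stands in for the exact function even though it is not guaranteed to be correctly rounded, and the measurement uses Python's binary64 floats rather than the lower-precision formats the paper targets.

```python
import math


def ulp_error(approx: float, reference: float) -> float:
    """Distance between approx and reference, in units of ulp(reference)."""
    return abs(approx - reference) / math.ulp(reference)


def fast_tanh(x: float) -> float:
    """Hypothetical rational approximation of tanh, for illustration only."""
    return x * (27.0 + x * x) / (27.0 + 9.0 * x * x)


def max_ulp_error(impl, reference, inputs) -> float:
    """Largest observed ulp error of impl against reference over the given inputs."""
    return max(ulp_error(impl(x), reference(x)) for x in inputs)


# Sample grid on [-2, 2]; a real audit would enumerate the whole finite input domain.
grid = [i / 64.0 for i in range(-128, 129)]
worst = max_ulp_error(fast_tanh, math.tanh, grid)
print(f"worst observed error: {worst:.3g} ulp")
```

A check of this kind only bounds the error on the sampled inputs; confirming that a real library meets a bounded-ulp assumption would require exhaustive enumeration or a documented accuracy guarantee.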

Reproducibility

No open source code, dataset, or benchmark artifact is mentioned in the provided abstract/introduction/conclusion excerpts; this appears to be a theoretical paper relying on formal proofs rather than experiments.

Discussion questions

1. Is first-layer pairwise distinguishability the right core assumption for practical expressivity, or is it mainly a worst-case condition for exact representation over finite domains?
2. For builders using float16, bfloat16, or float8 on GPUs, does this result meaningfully reduce risk from nondeterministic reduction orders and approximate activations, or are the constructed universal networks too large or artificial to matter?
3. What concrete hardware or math-library behavior would falsify the claimed practical relevance: unbounded activation ulp error, activation implementations violating the mild output conditions, or empirical collisions where supposedly distinguishable inputs cannot be separated?
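For readers unfamiliar with why reduction order matters (question 2), the sketch below sums the same three numbers in every sequential order and collects the distinct results. It is an illustration under stated assumptions: it uses Python's binary64 floats rather than float16, bfloat16, or float8, and it covers only left-to-right orderings, not the tree-shaped reductions a GPU might perform. The function names are invented for this example.

```python
from itertools import permutations


def sequential_sum(values) -> float:
    """Left-to-right floating-point sum, rounding after every addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def results_over_orders(values) -> set:
    """Distinct sums obtained by adding the same values in every sequential order."""
    return {sequential_sum(order) for order in permutations(values)}


# 1.0 is absorbed when added to +/-1e16 first, but survives if the large terms cancel first.
terms = [1e16, 1.0, -1e16]
outcomes = results_over_orders(terms)
print(sorted(outcomes))  # [0.0, 1.0]
```

The same inputs produce two different sums depending on order alone, which is the kind of nondeterminism the paper's execution semantics are meant to account for.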

Key figure

No Figure 1 or architectural diagram is available in the provided text; the key conceptual diagram would likely show a floating-point neural network whose first layer separates distinct floating-point inputs under arbitrary reduction orders and inexact activation implementations.