Organizers: Luis Fredes and Camille Male
Oversmoothing has long been identified as a major limitation of Graph Neural Networks (GNNs): if the weights of the GNN are sufficiently bounded, input node features are smoothed at each layer and converge to a non-informative representation. This assumption is crucial: if, on the contrary, the weights are sufficiently large, then oversmoothing may not happen. In theory, a GNN could thus learn not to oversmooth. However, this does not really happen in practice, which prompts us to examine oversmoothing from an optimization point of view. In this paper, we analyze backward oversmoothing, that is, the notion that the backpropagated errors used to compute gradients are also subject to oversmoothing from output to input. With non-linear activation functions, we outline the key role of the interaction between forward and backward smoothing.
Moreover, we show that, due to backward oversmoothing, GNNs provably exhibit many spurious stationary points: as soon as the last layer is trained, the whole GNN is at a stationary point. As a result, we can exhibit regions where gradients are near-zero while the loss stays high.
The proof relies on the fact that, unlike forward oversmoothing, backward errors are subject to linear oversmoothing even in the presence of non-linear activation functions, such that the average of the output error plays a key role. Additionally, we show that this phenomenon is specific to deep GNNs, and we exhibit a counter-example with Multi-Layer Perceptrons. This paper is a step toward a more complete comprehension of the optimization landscape specific to GNNs.
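As a rough illustration of the smoothing described in this abstract, the sketch below propagates signals on a toy graph. Everything in it is an assumption made for illustration and is not taken from the paper: the 5-node path graph, the propagation matrix `P = D^{-1}(A + I)`, identity weights, and the absence of activation functions. In this simplified linear setting, forward features become identical across nodes, while the backpropagated error is multiplied by `P.T` at each layer and ends up depending only on the column sums (equivalently, the average) of the output error.

```python
import numpy as np

def random_walk_matrix(adj):
    """Row-stochastic propagation matrix P = D^{-1}(A + I), with self-loops."""
    a = adj + np.eye(adj.shape[0])
    return a / a.sum(axis=1, keepdims=True)

def row_spread(x):
    """Largest gap between any two nodes, taken over all feature columns."""
    return float(np.ptp(x, axis=0).max())

# Hypothetical toy graph: a path on 5 nodes.
adj = np.zeros((5, 5))
for i in range(4):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
P = random_walk_matrix(adj)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # forward signal: node features
err = rng.normal(size=(5, 3))  # backward signal: error at the output
total_err = err.sum(axis=0)    # preserved by P.T because rows of P sum to 1

spreads = [row_spread(x)]
for _ in range(200):
    x = P @ x        # forward pass of one linear layer (identity weights)
    err = P.T @ err  # backward pass through the same layer
    spreads.append(row_spread(x))

# Stationary distribution of P: proportional to (degree + 1) for each node.
pi = np.array([2.0, 3.0, 3.0, 3.0, 2.0]) / 13.0
print("feature spread across nodes:", spreads[0], "->", spreads[-1])
print("backward error matches pi * sum:", np.allclose(err, np.outer(pi, total_err)))
```

With bounded weights and non-linear activations the forward behaviour is more subtle, which is the regime the abstract addresses; the sketch only shows the linear mechanism.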
Networks?
...
To be determined
To apply kernel methods, we need kernels that are easy to compute and sufficiently rich. How can one design "good" kernels on non-Euclidean spaces, in particular on symmetric spaces, which arise very often in applications? We propose a new result, the "Lp Godement theorem", as the main tool for answering this question. We study in particular the case of symmetric spaces that are "cones" (cones of covariance matrices), where the answer takes a very concrete form, with supporting applications. Finally, what can be done if no positive definite kernel can be found? We will show that a great deal can be achieved with kernels that are the difference of two positive definite kernels: instead of learning in an RKHS, one can learn in an RKKS (reproducing kernel Krein space).
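To make the last point concrete, here is a minimal sketch of a kernel that is a difference of two positive definite kernels. The setting is an assumption chosen for simplicity (points on the real line, two Gaussian kernels with hypothetical bandwidths 1.0 and 0.5) and is not the symmetric-space construction of the talk. Because the resulting Gram matrix has a zero diagonal and is not the zero matrix, its eigenvalues must take both signs, so it is indefinite.

```python
import numpy as np

def gaussian_gram(x, bandwidth):
    """Gram matrix of the Gaussian kernel exp(-(s - t)^2 / (2 * bandwidth^2))."""
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

# Hypothetical sample points and bandwidths, chosen only for illustration.
x = np.linspace(-1.0, 1.0, 6)
k_plus = gaussian_gram(x, 1.0)   # positive definite
k_minus = gaussian_gram(x, 0.5)  # positive definite
k_diff = k_plus - k_minus        # difference of two positive definite kernels

eig = np.linalg.eigvalsh(k_diff)
print("smallest eigenvalue:", eig.min())
print("largest eigenvalue:", eig.max())
```

A Gram matrix of this kind cannot come from an inner product in an RKHS, which is why the abstract turns to Krein spaces.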
To be determined