Graph Convolutional Networks for node classification in signed graphs-Part 1

Introduction

Signed graphs are a type of graph that can simultaneously express positive and negative relationships. These data structures have been receiving increasing attention due to the rising popularity of online social networks. For example, in social graphs, people create positive relationships, such as friendships, trust, and approval, as well as negative relationships, such as foes, distrust and disapproval. Compared to unsigned graphs that only represent positive relationships, signed graphs are able to reflect more complicated and complex interactions among different entities in graphs.

To realize the potential underlying signed graph data, several graph representation learning techniques have been proposed. Among them, Graph Convolution Network (GCN) proposed by Kipf and Welling (2016), an outstanding method that facilitates machine learning work on graphs and achieves fruitful results in many real-world data, thus providing promising approaches to signed graphs. In this blog, we will show several approaches which utilize GCN advancements to apply to node classification tasks in signed graphs.

What are the key components of signed graphs, and how to capture their information?

Before going to the models, we need to understand the interpretation of important components in signed graphs. A signed graph consists of nodes which are connected by positive and negative edges. Unlike unsigned graphs, where edges only mean the similarity between connected nodes, edges in signed graphs can have different meanings depending on the signs of the edges. In particular, positive edges tend to represent connections that indicate similarity, proximity, or affinity between nodes. Meanwhile, negative edges typically signify relationships of dissimilarity, opposition, or conflict, implying a sense of distance or disagreement.

Capturing structural information from unsigned graphs with only positive edges has been extensively studied in the field of machine learning. However, when it comes to solving machine learning problems in signed graphs, where both positive and negative links exist, there is still considerable room for improvement. This area poses unique challenges and requires further research to develop more effective approaches.

Following are several possible ways to capture negative links in signed graphs:

Treat negative as positive edges: This approach disregards the distinction between positive and negative edges and treats all edges as positive edges.
Ignore negative edges: This approach involves focusing solely on the positive relationships and disregarding the negative edges altogether.
Nodes with the same enemies receive the same information: This approach assumes that nodes with shared enemies are likely to be similar or have common attributes.
Nodes with the same enemies can receive different information: This approach recognizes that nodes with common enemies may still have distinct characteristics or relationships.

General ideas of methodology

We consider GCN as our foundation method where different signed models will be built. You can read a quick introduction to GCN here. One of the defining features of GCN is the neural message-passing mechanism which allows nodes to aggregate messages from their neighbourhood and then pass these messages to a neural network layer. This process can be formulated as follows:

$\mathbf{x}_v^{(\ell + 1)} = \sigma\Bigg(\mathbf{W}^{(\ell)} \sum_{w \in \mathcal{N}(v) \, \cup \, \{ v \}} \frac{1}{\sqrt{d_wd_v}} \cdot \mathbf{x}_w^{(\ell)}\Bigg)$

A simple example

Let’s dive into a simple example to see how these methods work.

We consider a signed graph $\mathcal{G} = (\mathcal{V}, \mathcal{E^+}, \mathcal{E^-})$ comprising two fully connected communities in Figure 2. $\mathcal{V}$ represents the set of $n$ nodes, $\mathcal{E^+ \subseteq V \times V}$ denotes the set of positive edges, and $\mathcal{E^- \subseteq V \times V}$ denotes the set of negative edges, regarding that $\mathcal{E^+ \cap E^-} = \phi$ .

The first community consists of nodes $A, B, C$ , while the second community includes nodes $D, E, F$ . Solid lines represent edges within communities, while dashed lines indicate connections between communities. Each node is associated with two features, and a two-dimensional coordinate system is used to depict them in Figure 3.

The signs of edges are created based on a simple intuition that nodes belonging to the same class tend to be near each other and share a certain degree of similarity. Thus, they should be connected by positive links. On the contrary, nodes in different classes are likely to be dissimilar; therefore, they should be linked by negative connections.

This example will explore the workings of several signed models and examine how they generate new node embeddings, which can then facilitate node classification tasks. Node classification involves predicting labels, such as types, categories, or attributes, for all nodes in the graph $\mathcal{G}$ . Given that we only have access to true labels for a subset of nodes in the training set, our goal is to predict the labels for the remaining nodes based on the available training data.

Figure 2. A signed graph $\mathcal{G}$ with two communities

Figure 2. A signed graph $\mathcal{G}$ with two communities

Figure 3. Two-dimensional representation of nodes in graph $\mathcal{G}$

Treat negative as positive edges

An initial approach to handling signed graphs is to directly apply the conventional GCN without explicitly considering the meaning of negative links. The advantage of this approach is that it allows for the direct application of existing GCN methods without the need for additional changes or adjustments.

Figure 4 shows information sources from different neighbouring nodes to the target node $A$ . The graph aggregation process can be expressed in vector form as follows:

$\overrightarrow{OU_A}=\frac{\overrightarrow{OA} + \overrightarrow{OB} + \overrightarrow{OC} + \overrightarrow{OD} + \overrightarrow{OE} + \overrightarrow{OF}}{6}$
where $U_i$ is the updated representation of node $i$ in the first layer in the two-dimensional coordination. Here, the normalization constant is equal to 1/6 as each node has six connections. This function means that the new representation of node $A$ is the average of node $A$ neighbourhood’s information, including the node itself.

Since GCN does not distinguish between positive and negative edges during the update of node embeddings, all nodes in the graph end up having indistinguishable representations, as illustrated in Figure 5. In this approach, GCN does not effectively leverage negative information. Therefore, this approach should be applied only in cases where the negative information is minimal or negligible compared to the positive information.

Figure 4. Node $A$ embedding, $U_A$ , is the average of positive and negative information

Figure 5. After the overall update, every node has the same embedding

Ignore negative edges

It seems that the original GCN struggles to utilize the information from negative links; thus, why not just simply remove negative edges? Pos-GCN provides an approach which specifically focuses on positive links and ignores negative links in signed graphs. This approach simplifies even further the model by only considering positive relationships, which can be advantageous in certain scenarios.

The graph aggregation process for all nodes now turns into the following:
$\overrightarrow{OU_A}=\frac{\overrightarrow{OA} + \overrightarrow{OB} + \overrightarrow{OC}}{3}$

Notice that the normalization coefficient is now equal to 1/3 as nodes only aggregate information from node $A$ ‘s three positive connections, as shown in Figure 6.

Figure 6. $U_A$ only aggregates information from positive edges

Figure 6. $U_A$ only aggregates information from positive edges

Figure 7. After the overall update, two communities are clearly defined

As the two communities in the graph are fully connected, the updated node representations for nodes within the same community become identical. Specifically, $\overrightarrow{OU_A} = \overrightarrow{OU_B} = \overrightarrow{OU_C}$ and $\overrightarrow{OU_D} = \overrightarrow{OU_E} = \overrightarrow{OU_F}$ . This can be observed in Figure 7, where nodes $A$ , $B$ , and $C$ are transformed into $U_A$ , $U_B$ , and $U_C$ , respectively, while nodes $D$ , $E$ , and $F$ are transformed into $U_D$ , $U_E$ , and $U_F$ , respectively. The updated node representations clearly exhibit the formation of two distinct communities. This transformation greatly facilitates the classification task between the two communities since nodes belonging to the same community now possess highly similar features.

The Pos-GCN approach aligns with the homophily assumption, which suggests that connected nodes are more likely to share similarities compared to nodes without connections. Pos-GCN can be particularly effective when there is sufficient positive information available in the graph. However, in cases where graphs are sparse and lack significant positive connections, completely discarding negative information may result in a loss of important data and potentially limit the model’s performance. This is because negative links can still provide valuable insights and contribute to the overall understanding of the graph structure.

Conclusion

In this blog post, we have introduced two simple approaches that extend the traditional GCN to signed graphs. The attractive aspect of these approaches is that methods originally designed for unsigned graphs can be directly applied to signed graphs. However, treating negative edges as positive edges or removing negative edges may introduce incorrect assumptions into the model (given that negative and positive edges have different meanings) or overlook valuable information present in negative edges, which could have been extracted from the complex network structure of signed networks. Hence, in the next part, we will explore other signed methods that can integrate different treatments for negative edges, providing more comprehensive solutions.

Reference

Kipf, T. N., and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. In ICLR.

Featured photo by Pixabay from Pexels: https://www.pexels.com/photo/close-up-photography-of-yellow-green-red-and-brown-plastic-cones-on-white-lined-surface-163064/

Data and Methods Exploration Group

Blog on Data Science research and projects conducted by lab members

Graph Convolutional Networks for node classification in signed graphs-Part 1

Leave a comment Cancel reply

Share this:

Related

Leave a comment Cancel reply