Weightedness and Directedness of a Graph in Bibliometrics

What is it?

In bibliometrics, and more broadly in network analysis, a graph is a formal representation of relationships between entities (such as authors, publications, or institutions), which is often also drawn as a network visualization. The weightedness and directedness of a graph are two important characteristics. Weightedness refers to whether the edges (connections) in the graph carry weights that signify the strength or intensity of the connection. For example, in a co-authorship network, the weight could represent the number of papers co-authored by two researchers. Directedness indicates whether the edges have a direction, representing asymmetric relationships. In a citation network, for example, an edge from paper A to paper B would signify that A cites B, not the other way around.
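The two examples above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the `papers` records, author names, and paper identifiers are invented toy data, and storing the graphs as a `Counter` and a set of tuples is just one of several possible representations.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each paper lists its authors and the papers it cites.
papers = {
    "P1": {"authors": ["Ada", "Ben"], "cites": []},
    "P2": {"authors": ["Ada", "Ben", "Cem"], "cites": ["P1"]},
    "P3": {"authors": ["Cem", "Dee"], "cites": ["P1", "P2"]},
}

# Undirected and weighted: unordered author pair -> number of joint papers.
# Sorting the authors makes ("Ada", "Ben") and ("Ben", "Ada") the same key.
coauthor_weights = Counter()
for record in papers.values():
    for a, b in combinations(sorted(record["authors"]), 2):
        coauthor_weights[(a, b)] += 1

# Directed and unweighted: (citing, cited) tuples; the order carries the direction.
citation_edges = {
    (citing, cited)
    for citing, record in papers.items()
    for cited in record["cites"]
}

print(coauthor_weights[("Ada", "Ben")])   # joint papers of Ada and Ben
print(("P2", "P1") in citation_edges)     # P2 cites P1
print(("P1", "P2") in citation_edges)     # P1 does not cite P2
```

In the co-authorship graph the pair key has no order, whereas in the citation graph swapping the two elements of a tuple describes a different relationship.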

Why is it Important?

Weightedness and directedness determine which analyses can sensibly be applied to network data. For instance, these characteristics determine which community detection algorithms are applicable and which are likely to produce meaningful results.

Directedness refers to the presence of directed edges in a graph, indicating the direction of flows, e.g. of information, between nodes. In bibliometrics or altmetrics, directedness can be an important factor when identifying communities. For example, in a directed social network where interactions are one-way (e.g. following a user on Twitter), community detection algorithms that honor direction may prioritize densely connected groups of users who receive and share information more frequently with each other.

Weightedness refers to the presence of edges with different weights in a graph. In the context of community detection, weightedness can be used to capture the significance or strength of connections between nodes within a community. For instance, in altmetrics, connections to users with many followers or a high level of influence may be given higher weights, and weight-aware algorithms will then favor communities that are densely connected through these strong ties. Similarly, the analysis of co-authorship structures at the author or institutional level will benefit from considering the weight (i.e. the number of co-published papers) rather than relying only on the binary information whether co-authorship occurred, i.e. yes or no.

The relevance of directedness and weightedness becomes evident when considering different algorithms for community detection. For example, the Louvain method is a popular community detection algorithm that optimizes modularity, a measure that compares the edges inside communities with what would be expected by chance. Edge weights can enter this measure directly. The original formulation assumes an undirected graph; directed variants of modularity exist but have to be chosen explicitly. This approach allows for the identification of communities that are not only densely connected but also held together by strong ties. In a similar vein, the Leiden algorithm, a rather elegant extension of the Louvain method, works the same way in this respect. Another community detection algorithm, the Girvan-Newman algorithm, identifies communities by repeatedly removing the edge with the highest edge betweenness centrality. Whether directedness and weightedness are taken into account depends on how betweenness is computed: shortest paths can follow edge directions, and weights are then typically read as distances rather than strengths.
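To illustrate why weights matter for modularity-based methods such as Louvain and Leiden, the following sketch computes the modularity of a fixed partition once with and once without edge weights. It assumes an undirected graph without self-loops. The function `modularity`, the edge dictionaries, and the partition are illustrative names and toy data, and the code only scores a given partition; it does not search for communities.

```python
def modularity(edges, community):
    """Modularity of a partition of an undirected graph.

    edges: dict mapping (u, v) to a weight, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    total = sum(edges.values())   # total edge weight m
    inside = {}                   # weight of edges within each community
    strength = {}                 # summed (weighted) degree per community
    for (u, v), w in edges.items():
        for node in (u, v):
            label = community[node]
            strength[label] = strength.get(label, 0) + w
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0) + w
    # Q = sum over communities of: inside / m - (strength / 2m)^2
    return sum(
        inside.get(label, 0) / total - (strength[label] / (2 * total)) ** 2
        for label in strength
    )

# Toy graph: two strongly tied pairs joined by one weak edge.
weighted = {("A", "B"): 5, ("B", "C"): 1, ("C", "D"): 5}
binary = {pair: 1 for pair in weighted}   # same edges, weights discarded
partition = {"A": "left", "B": "left", "C": "right", "D": "right"}

q_weighted = modularity(weighted, partition)
q_binary = modularity(binary, partition)
print(q_weighted > q_binary)
```

With weights, the split between the two pairs scores higher than in the binary version of the same graph, because the weak bridging edge counts for less.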

Overall, understanding weightedness and directedness is important in exploratory bibliometric analysis for several reasons. First of all, the characteristics allow for a more nuanced and accurate representation of the relationships in a scholarly network and thereby also a more informed interpretation. It makes a difference whether a network is directed or not, e.g. whether a paper references another paper or is cited by it. It also makes a difference whether a network is weighted, i.e. whether a co-publication link between two organizations is recorded as simply existing or not, or whether a strength (e.g. the Salton cosine of co-publications) is observed. In short, honoring these characteristics enables researchers to conduct more detailed and specific analyses, like identifying influential authors (in directed graphs) or strong collaboration networks (in weighted graphs). Furthermore, directed and weighted graphs provide deeper insights into the nature and intensity of scholarly interactions, which can be crucial for understanding scientific communication and collaboration patterns. Weighted networks can also offer clear and informative visual representations of complex relationships by showing the strength of ties as the width of the connecting edges. Finally, layout algorithms, which position the nodes in a visualization, benefit from weighted graphs because the weights can be integrated into the calculation, thereby placing strongly related nodes closer together than less strongly related nodes.
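As a small worked example of a strength measure, the sketch below uses the commonly cited form of the Salton cosine: the number of joint publications divided by the square root of the product of both units' total publication counts. The function name `salton_cosine` and the publication counts are hypothetical and serve only as an illustration.

```python
from math import sqrt

def salton_cosine(joint, total_a, total_b):
    """Joint publications normalized by the geometric mean of both outputs."""
    if total_a == 0 or total_b == 0:
        return 0.0   # no publications, so no measurable tie
    return joint / sqrt(total_a * total_b)

# Hypothetical counts: two organizations with 100 and 25 papers, 10 of them joint.
strength = salton_cosine(joint=10, total_a=100, total_b=25)
print(strength)   # 10 / sqrt(2500)
```

A binary network would record this relationship simply as "exists", while the normalized weight also reflects how large the collaboration is relative to the output of both partners.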

How Does it Work?

Usually, the two graph characteristics can be deduced from the way the network data has been collected and constructed. When using data from a third party, it is always advisable to double-check whether the assumption being made, e.g. that an author collaboration matrix is undirected and weighted, actually holds. In an author-based collaboration network, researchers may decide to give the relationship between the first author (or the last author in the PI position, depending on the field) and all other authors a direction, for instance to signify some sort of managerial role. The way such networks are constructed thereby follows a) the weightedness and directedness that can actually be achieved with the underlying data, and b) the research question and the rationale behind the construction process.

Some characteristics can be checked. If weights greater than 1 are present in a dataset, then the graph is not unweighted. If only weights of 0 or 1 exist, this does NOT imply that the graph is unweighted; it may be a weighted graph whose relationships simply happen to take only the values 0 and 1. For all intents and purposes it CAN be treated as unweighted, though. When the transpose (rows and columns are switched, i.e. the matrix is flipped along its main diagonal) is subtracted cell-wise from the matrix representing a graph and the result is NOT 0 for at least one cell, then the graph should be directed, assuming no mistake has been made in constructing it. If all cells are 0, then the graph CAN be undirected. Yet, if the graph is directed by construction and merely happens to be symmetric, it should still be treated as directed.
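The checks described above can be sketched as follows, assuming the graph is available as a square adjacency matrix stored as a list of lists. The function names and the two toy matrices are illustrative. Note that `has_nonbinary_weights` is slightly broader than the rule of thumb in the text: it flags any value other than 0 or 1, which also covers fractional weights.

```python
def has_nonbinary_weights(matrix):
    """True if any cell holds a value other than 0 or 1."""
    return any(value not in (0, 1) for row in matrix for value in row)

def transpose_difference(matrix):
    """Cell-wise difference between the matrix and its transpose."""
    n = len(matrix)
    return [[matrix[i][j] - matrix[j][i] for j in range(n)] for i in range(n)]

def looks_directed(matrix):
    """True if at least one cell of the transpose difference is not 0."""
    return any(cell != 0 for row in transpose_difference(matrix) for cell in row)

# Toy co-authorship matrix: symmetric, with counts of joint papers.
collaboration = [
    [0, 3, 1],
    [3, 0, 0],
    [1, 0, 0],
]
# Toy citation matrix: row cites column, entries are 0 or 1.
citation = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]

print(has_nonbinary_weights(collaboration), looks_directed(collaboration))
print(has_nonbinary_weights(citation), looks_directed(citation))
```

A result of `False` from `looks_directed` only shows that the matrix is symmetric. As noted above, whether the graph should be treated as undirected still depends on how it was constructed.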

Limitations

Always check, never assume

Developers of an algorithm implementation sometimes assume that users know what they are doing and optimize accordingly. For instance, developers might choose to use only the upper or lower triangle of the matrix representing a graph, because both triangles hold the same information in an undirected graph anyway and because, for N nodes, doing \(\dfrac{N(N-1)}{2}\) calculations is less work than doing \(N^2-N\) calculations. If the graph is in fact directed, the information in the other triangle is then silently ignored. In short: An implementation can produce results even IF the data does not follow the requirements of the method.
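The following sketch mimics such an optimization with two hypothetical functions. It is not taken from any particular library; it only shows that a routine reading the upper triangle alone returns a plausible-looking number for a directed matrix without raising an error.

```python
def total_weight_upper_triangle(matrix):
    """Sums weights above the diagonal only (implicitly assumes symmetry)."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))

def total_weight_all_cells(matrix):
    """Sums all off-diagonal weights, as needed for a directed graph."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)

# Toy directed, weighted matrix: most of the weight sits below the diagonal.
directed = [
    [0, 2, 0],
    [0, 0, 0],
    [4, 1, 0],
]

n = len(directed)
print(n * (n - 1) // 2, n * n - n)            # cells visited by each variant
print(total_weight_upper_triangle(directed))  # ignores the edges in row 3
print(total_weight_all_cells(directed))
```

Both functions run without complaint, but only the second one sees every edge of the directed graph.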