The post New paper on the role of stabilizing and communicating symptoms appeared first on Psych Networks.
As two graduate students in the Psychological Methods department at the University of Amsterdam, we became familiar with the work of Cramer and Borsboom on conceptualizing mental disorders as complex networks of interacting symptoms. This conceptualization emphasizes the role of symptoms and their interactions within and across disorders, and has inspired novel theoretical definitions of clinical concepts such as core symptoms and comorbidity^{1}.
We often found ourselves discussing the potential of tools and metrics from other research areas using network analytic techniques. In the summer of 2016 we came across Santo Fortunato’s Community detection in graphs (2010) – an excellent paper on various applications and implications of network analytic techniques^{2}. One specific sentence caught our attention:
“Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules. So, vertices with a central position in their clusters, i.e. sharing a large number of edges with the other group partners, may have an important function of control and stability within the group; vertices lying at the boundaries between modules play an important role of mediation and lead the relationships and exchanges between different communities.” (p. 3)
Reading this passage immediately sparked a discussion on the numerous possibilities of utilizing the community detection toolbox to develop empirical definitions of these theoretical concepts. The notion of “vertices with a central position within their cluster […] may have an important function of control and stability within the group” can readily be translated to the idea of core symptoms. Similarly, the idea that “vertices lying at the boundaries between modules play an important role [… in] exchanges between different communities” can be mapped onto the theoretical definition of comorbidity within the network perspective on psychopathology.
In our paper, entitled “The role of stabilizing and communicating symptoms given overlapping communities in psychopathology”, we aspired to complement the statistical toolbox of the network approach to psychopathology by exploring what overlapping community detection analysis has to offer. Using community detection and inspecting the differential role of symptoms within and between communities offers a framework to study the clinical concepts of comorbidity, heterogeneity and hallmark symptoms. Symptoms with many and strong connections within a community, defined as stabilizing symptoms, could be thought of as the core of a community, whereas symptoms that belong to multiple communities, defined as communicating symptoms, facilitate the communication between problem areas.
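The two definitions can be made concrete with a minimal R sketch. It assumes that a weighted symptom network and an overlapping community assignment are already given. The symptom names, edge weights and community memberships are hypothetical. Summed within-community edge weight is only one simple way to operationalize "many and strong connections within a community", and the paper's own metrics may differ.

```r
# Toy weighted network of 5 symptoms (hypothetical values, not SCL-90 data)
syms <- paste0("S", 1:5)
W <- matrix(0, 5, 5, dimnames = list(syms, syms))
W["S1", "S2"] <- 0.40; W["S1", "S3"] <- 0.35; W["S2", "S3"] <- 0.20
W["S3", "S4"] <- 0.25; W["S4", "S5"] <- 0.30; W["S3", "S5"] <- 0.15
W <- W + t(W)  # make the matrix symmetric (undirected network)

# Hypothetical overlapping communities: S3 belongs to both A and B
communities <- list(A = c("S1", "S2", "S3"), B = c("S3", "S4", "S5"))

# Within-community strength: summed edge weight to fellow community members
within_strength <- sapply(communities, function(members) {
  s <- setNames(rep(NA_real_, length(syms)), syms)
  s[members] <- rowSums(W[members, members, drop = FALSE])
  s
})

# Stabilizing candidate per community: member with the highest within-community strength
stabilizing <- apply(within_strength, 2, function(s) names(which.max(s)))

# Communicating symptoms: members of more than one community
n_memberships <- sapply(syms, function(v) sum(sapply(communities, function(m) v %in% m)))
communicating <- names(n_memberships)[n_memberships > 1]

stabilizing    # A = "S1", B = "S4"
communicating  # "S3"
```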
We applied community detection to a large dataset (N=2089) assessing a variety of psychological problems using the Symptom Checklist 90, and identified 18 communities of closely related symptoms. Importantly, these communities are empirically derived rather than theoretically defined. In the paper we illustrate how the proposed definitions of the differential role of symptoms can inform us about the structure of the psychopathological landscape, both globally and locally. As such, we adopted established metrics from network science to accelerate our understanding of the psychopathological landscape.
Figure 1. Illustration of (a) the local structure of the Feelings of Worthlessness community, (b) its connections to other communities, and (c) a symptom-level example of its connection to the Worried about Sloppiness community.
From our perspective, this endeavour highlights that diving into the world of network science across all kinds of research areas can inspire great advances for the toolbox we use to study psychopathology networks. Drawing inspiration from fields concerned with complex systems such as brain networks, economic networks and social networks, the options seem infinite – and we cannot wait to explore them.
Footnotes:
The post Estimating psychological networks via Information Filtering Networks appeared first on Psych Networks.
Markov Random Fields (MRFs) have quickly become the state of the art in psychological network modeling for obtaining between-subjects networks. The implementation for binary data is called the Ising Model, and for continuous or ordinal data, Gaussian Graphical Models (GGMs) have been used^{1}. The beauty of these models is that a zero entry between two variables in the adjacency matrix (i.e. the matrix that encodes the parameters that we then plot as networks) means that the two variables are conditionally independent, given all other variables.
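For Gaussian data, the link between zeros and conditional independence can be seen directly. Partial correlations are obtained by standardizing the inverse of the covariance matrix (the precision matrix), so a zero in the precision matrix is a zero partial correlation. The following is a minimal base-R sketch with a made-up correlation matrix; no estimation or regularization is involved.

```r
# Hypothetical 3-variable correlation matrix (not from any real dataset):
# x1 and x3 correlate only because both are related to x2
Sigma <- matrix(c(1.00, 0.50, 0.25,
                  0.50, 1.00, 0.50,
                  0.25, 0.50, 1.00), nrow = 3, byrow = TRUE)
K <- solve(Sigma)                       # precision (inverse covariance) matrix
pcor <- -K / sqrt(diag(K) %o% diag(K))  # standardize and flip the sign
diag(pcor) <- 0
round(pcor, 3)
# x1 and x3 have a marginal correlation of 0.25 but a partial correlation of 0:
# there is no x1-x3 edge in the network
```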
In a new paper entitled “Network Structure of the Wisconsin Schizotypy Scales-Short Forms: Examining Psychometric Network Filtering Approaches”, Christensen et al. (2018)^{2} introduce Information Filtering Networks (IFNs) to the psychological literature and compare them to lasso-regularized models. Like MRFs, IFNs are partial correlation networks. The two models differ mainly in how they address a common challenge that Christensen et al. describe very well:
Networks contain multiple connections across all possible pairs of variables (e.g., symptoms, items) included in the model and therefore are likely to have spurious edges (i.e., multiple comparisons problem). Thus, filtering is necessary to minimize spurious connections and to increase the interpretability of the network. This, however, introduces a problem known as sparse structure learning (Zhou, 2011): How best to reduce the complexity and dimensionality of the network while retaining relevant information?
This is a longstanding problem, and many different solutions have been proposed. In MRFs based on lasso regularization, the number of edges is determined largely by fit (i.e. by minimizing the extended BIC; see our regularized partial correlation network tutorial). This has a number of advantages. It controls for multiple testing (i.e. for the numerous regressions that the model estimates under the hood). The procedure results in a parsimonious/sparse network structure that is somewhat easier to interpret. And setting coefficients to exactly zero means they need not be estimated anymore, which reduces the number of parameters. The default lasso procedure sacrifices sensitivity for specificity: edges in the estimated network are very likely also present in the data, but some (weak) edges in the data might not be recovered by the lasso.
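For reference, the extended BIC that the EBICglasso procedure minimizes is commonly written as:

```latex
\mathrm{EBIC} = -2L + E\log(n) + 4\gamma E\log(p)
```

Here L is the log-likelihood of the model, E the number of non-zero edges, n the sample size, p the number of nodes, and γ a hyperparameter (often set to 0.5) that controls how strongly additional edges are penalized. Larger values of γ favor sparser networks.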
Christensen et al. consider the lasso “biased” because the edges included in MRFs are a function of sample size. I wouldn’t call this a bias, but it is correct that the lasso depends on power:

- In situations of low power, it sets even moderately large edges to zero, because it cannot reliably distinguish these edges from zero.
- In extremely large samples, it sets nearly no edges to zero, because it can very reliably distinguish even tiny edges from zero^{3}.

I think of this as a feature that works very similarly to standard statistical methods used in psychological research. If a correlation of 0.2 is estimated in a small sample with little power, its confidence interval (CI) is wide and often overlaps with 0 (e.g. CI [-0.6; 0.8]). In this case, we treat the coefficient as not significantly different from zero. But if we have sufficient power, a correlation coefficient of 0.2 might well be distinguishable from zero (e.g. CI [0.1; 0.3]).
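The confidence-interval analogy can be checked numerically. This sketch uses the standard Fisher z-transformation. The two sample sizes (n = 20 and n = 400) are arbitrary illustrations, so the resulting intervals differ from the rounded example numbers above.

```r
# Approximate confidence interval for a Pearson correlation via Fisher's z
fisher_ci <- function(r, n, level = 0.95) {
  z  <- atanh(r)                      # Fisher z of the correlation
  se <- 1 / sqrt(n - 3)               # standard error on the z scale
  q  <- qnorm(1 - (1 - level) / 2)    # normal quantile, 1.96 for 95%
  tanh(c(lower = z - q * se, upper = z + q * se))  # back-transform to r
}

small <- fisher_ci(0.2, n = 20)   # low power: interval includes 0
large <- fisher_ci(0.2, n = 400)  # high power: interval excludes 0
round(small, 2)
round(large, 2)
```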
How are edges chosen in the IFNs, if not based on the lasso? The paper by Barfuss et al. 2016^{4} provides a fantastic introduction to IFNs, and also covers the lasso, ridge regression, and the elastic net.
IFNs do not address spurious relations by choosing edges based on fit to the data. Instead, IFNs estimate a fixed number of edges based on the formula “3 * nodes – 6”. A network with 20 nodes has 54 edges (out of 190 potential; ~28%). The networks the authors estimate in the paper, with 60 nodes, have 174 edges (out of 1770 potential; ~10%). I see two main challenges here.
First, if two networks differ from each other (i.e. one has many connections, the other fewer), estimating the same number of edges in both might artificially inflate the similarity between the structures, or it might do the opposite. Without simulation studies, which the paper does not contain, we do not know, and conclusions are premature.

Second, I could find no rationale for why “3 * nodes – 6” would be a reasonable number of edges to expect in psychological network structures. There might be some general rules that emerge across psychological networks, and maybe it turns out to be a good approximation. But it seems a strong assumption to me. The procedure is similar to running a linear regression with 10 predictors and deciding before looking at your data that 3 will be different from 0.

It is also worth noting that IFNs get sparser as the number of nodes grows. With k=10 nodes, 24 of 45 edges are estimated (about 53%), but with k=100, 294 of 4950 edges are estimated (6%).
Below is a visualization of the relationship between the number of nodes and the sparsity (i.e. the % of estimated IFN edges in relation to all potential edges):
nodes <- 5:100
ifn_edges <- 3 * nodes - 6             # edges estimated by an IFN
all_edges <- nodes * (nodes - 1) / 2   # all potential edges in the network
plot(nodes, ifn_edges / all_edges, ylab = "Sparsity", xlab = "Number of Nodes",
     main = "Sparsity of Information Filtering Networks")
This behavior serves the goal of estimating sparse network structures. But it's also easy to think of scenarios in which IFNs will do a bad job of recovering the true network structure. For instance, imagine a true network structure that has many nodes and is dense: the IFN will always lead to a very sparse network. Then again, we can also envision scenarios in which the lasso would not perform well, such as a dense true structure estimated in a small sample.
There are some statements in the paper I disagree with, and I post them here as post-publication review in the hopes that it will lead to a dialogue with the authors so we can resolve potential misunderstandings together.
First, Christensen et al. write that when items share a lot of variance, the focus of MRFs on unique variance will remove the shared variance, leaving items disconnected^{5}. They reference the paper by Forbes et al. 2017, discussed previously, which was written under the same assumption. But the opposite is the case: if all items share a lot of variance, i.e. if a unidimensional factor model describes the data well, the network will be fully connected. In other words, if you simulate data from a unidimensional factor model, you get a fully connected network model, not an empty one. This has been shown many times, both in simulation studies and in mathematical proofs. See below, where we first simulate data (n=300) from a unidimensional factor model and then fit an MRF to the data. The resulting network is fully connected, with considerable partial correlations (all code available here).
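# Requires the lavaan and bootnet packages
library("lavaan")
library("bootnet")
# Lavaan simulation: one factor, six indicators with fixed population loadings
population.model <- ' f1 =~ x1 + 0.8*x2 + 1.2*x3 + 0.7*x4 + 0.5*x5 + 0.8*x6'
set.seed(1337)
myData <- simulateData(population.model, sample.nobs = 300L)
# Population moments implied by the model vs. sample moments
fitted(sem(population.model))
round(cov(myData), 3)
round(colMeans(myData), 3)
# Fit a one-factor model to the simulated data
myModel <- ' f1 =~ x1 + x2 + x3 + x4 + x5 + x6'
fit <- sem(myModel, data = myData)
summary(fit)
# Estimate a regularized partial correlation network (EBICglasso) on the same data
network2 <- estimateNetwork(myData, default = "EBICglasso")
plot(network2, details = TRUE)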
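The same conclusion also follows without simulation, from the model-implied covariance matrix. This base-R sketch assumes a factor variance and residual variances of 1. The values lavaan's simulateData actually uses may differ, but any positive residual variances combined with non-zero loadings yield a fully connected partial correlation network.

```r
# Population loadings from the lavaan model above (x1's loading is 1)
lambda <- c(1, 0.8, 1.2, 0.7, 0.5, 0.8)
theta  <- rep(1, 6)  # assumption: unit residual variances, factor variance 1

Sigma <- tcrossprod(lambda) + diag(theta)  # implied covariance: lambda lambda' + Theta
K <- solve(Sigma)                          # precision matrix
pcor <- -K / sqrt(diag(K) %o% diag(K))     # population partial correlations
diag(pcor) <- 0
round(pcor, 2)
all(pcor[upper.tri(pcor)] > 0)             # every pair of items is connected
```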
Second, the authors state that "The shrinkage of correlations below a certain threshold [when using the lasso] also contributes to reduced reproducibility because variables can be eliminated based on statistical significance rather than theory." The lasso does not eliminate variables; it eliminates edges. The goal is to estimate a model that describes the data well, whilst avoiding the estimation of spurious relations by finding a good balance between false positives and false negatives. IFNs do the same: they are data-driven models that differ from MRFs only in the strategy they use to obtain a parsimonious structure. So if there is any criticism regarding theory (which is an argument one can make), it applies to both models.
Third, the authors conclude that the pitfalls of the lasso-based MRFs are "biased comparability, reduced reproducibility, and the elimination of hierarchical information".

By biased comparability they mean that the lasso regularizes in proportion to power, which is true. Suppose we simulate data for n=200 and n=2000, both times from the same true network structure, and estimate networks for both datasets. The network in n=200 will likely be sparser than the network in n=2000 (i.e. have fewer edges), because that is how the lasso operates. It has more power in n=2000 to reliably distinguish small edges from zero, similar to t-tests or linear regressions that can more reliably detect differences (e.g. from zero) in larger samples. But this also means that this main criticism can simply be circumvented in one of two ways: a) make sure sample sizes are similar when comparing network structures, which is commonly done^{6}; or b) use permutation tests that take sample size into account when comparing network structures, which is exactly what the Network Comparison Test was developed for.

The second point, "reduced reproducibility", is primarily based on assertions of Forbes et al. 2017, all of which have been thoroughly refuted^{7}. Christensen et al. add to the argument by comparing 2 network structures from 2 datasets, and find that IFNs have higher replicability than MRFs^{8}. Even if we, for the sake of argument, do not object to the way Christensen et al. conduct the comparison, the finding that IFNs replicate better in this specific case allows no conclusion whatsoever about the replicability of the models in general. And obviously, the authors retrieve the same number of edges across the two network structures, because IFNs a priori estimate the same number of edges whenever the number of nodes is the same.

As for their last point, "elimination of hierarchical information", I do not understand how IFNs get around that.
Finally, it is odd to read Christensen et al.'s repeated criticism of partial correlations and conditional dependence relations, given that the network model they put forward is itself a partial correlation network / conditional independence network.
In general, what I would have loved to see in the paper is a simulation study that actually shows how IFNs perform, which is necessary to vet any proposed methodology. That is, it is crucial to show that if you simulate data from a known structure X, your methodology will do well in recovering that structure.
I'm extremely thankful Christensen et al. 2018 brought IFNs to the world of psychological network modeling. From my perspective, we have two different approaches, and it is premature to conclude that one approach is inherently superior to the other; this goes both ways, obviously. The benefits likely depend on the context, such as the true network structure the data come from, prior knowledge about the network structure, the sample size, and the number of nodes.
Sacha Epskamp, who programs faster than light, has implemented IFNs in our R-package bootnet and sent around example code I will paste below. This was possible because Christensen et al. implemented the estimation routine in the package NetworkToolbox.
You can estimate the models via estimateNetwork(default = "TMFG"). The code below estimates an MRF and an IFN on the same data and compares them superficially. As the dataset we use the openly available BFI data.
# Install packages (bootnet is also available from CRAN):
devtools::install_github("sachaepskamp/bootnet")
library("NetworkToolbox")  # provides the TMFG estimation routine
library("bootnet")
library("psych")           # for the bfi data
library("qgraph")
# Estimate networks, first a Gaussian Graphical Model, then an Information Filtering Network:
data(bfi)
LassoNetwork <- estimateNetwork(bfi[, 1:25], default = "EBICglasso")
TMFGNetwork <- estimateNetwork(bfi[, 1:25], default = "TMFG")
# Average layout so networks can be compared:
Layout <- averageLayout(LassoNetwork, TMFGNetwork)
# Plot both networks side by side using the same layout:
layout(t(1:2))
plot(LassoNetwork, layout = Layout, title = "EBIC glasso")
plot(TMFGNetwork, layout = Layout, title = "Triangulated Maximally Filtered Graph")
I've worked a lot on stability of network models in recent years, together with Sacha Epskamp, and a natural question that follows this work is: how stable are IFNs, and how stable are they compared to MRFs?
One quick look at one dataset (obviously, this does not generalize beyond this specific dataset) shows fairly low stability for some edges but excellent stability for others, which is not surprising. Imagine you have 20 nodes in a network, with 30 strong edges, 30 moderate edges, 30 weak edges, and 100 absent edges. The IFN will always estimate 20*3-6 = 54 edges. This means it will correctly estimate the 30 strong edges, but then pick 24 of the 30 equally strong moderate edges. Every time you bootstrap the network, you will pick a random selection of 24 moderately strong edges. Overall, the estimation of the very strong and very weak edges will be stable (i.e. always very similar when bootstrapping), but moderately strong edges will be estimated with low precision.
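The thought experiment above can be sketched in R, but the sketch is deliberately crude in two ways. It treats the IFN as simply keeping the 54 largest edges, whereas the actual TMFG algorithm builds a planar graph and does not select edges this way. It also mimics bootstrapping by adding a small amount of noise to hypothetical true edge weights.

```r
set.seed(2018)
# Hypothetical true edge weights for a 20-node network (190 potential edges)
true_w <- c(rep(0.5, 30), rep(0.2, 30), rep(0.05, 30), rep(0, 100))
group  <- rep(c("strong", "moderate", "weak", "absent"), times = c(30, 30, 30, 100))
k <- 3 * 20 - 6  # 54 edges retained, following the IFN formula
B <- 500         # number of pseudo-bootstrap replicates

# Each replicate: perturb the true weights slightly, then keep the k largest edges
selected <- replicate(B, {
  est <- true_w + rnorm(length(true_w), sd = 0.01)  # crude stand-in for sampling error
  rank(-abs(est), ties.method = "first") <= k
})

# Proportion of replicates in which each edge was retained, averaged per group
freq <- rowMeans(selected)
round(tapply(freq, group, mean), 2)
# strong edges are always kept, weak and absent edges never,
# and each moderate edge is kept in only about 80% of the replicates
```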
Estimating and bootstrapping the MRF and the IFN on the BFI dataset leads to the following edge weights and CIs (again, code and graph from Sacha)^{9}:
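# Nonparametric bootstrap of both networks (1000 resamples, 8 cores)
boot1lasso <- bootnet(LassoNetwork, nBoots = 1000, nCores = 8)
boot1tmfg <- bootnet(TMFGNetwork, nBoots = 1000, nCores = 8)
# Sample edge weights with bootstrapped CIs, ordered by the sample estimates
plot(boot1lasso, labels = FALSE, order = "sample")
plot(boot1tmfg, labels = FALSE, order = "sample")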
So there is plenty of future work to do, and I hope the next paper on IFNs will contain some simulation studies to vet their performance in different situations.
Acknowledgements
We discussed the paper in our lab group, and the blog post is a summary of many points raised there. Obviously, all mistakes in the blog post are mine alone.
Footnotes
The post FAQ on network stability, part II: Why is my network unstable? appeared first on Psych Networks.
This blog post is the second part of the series and highlights issues related to the stability or accuracy of network models. It’s largely based on our Tutorial on Regularized Partial Correlation Networks, forthcoming in Psychological Methods, and on a recent discussion on Facebook. Many of the points below were raised by Payton Jones, Denny Borsboom, and Sacha Epskamp, so all credit to them. This is just an accessible summary.
As described in the Facebook discussion, there are several common reasons for stability problems.
As you know by now, model stability is related to power. Power comes from 1) more participants and 2) fewer nodes in the network, because fewer nodes mean fewer parameters to estimate^{2}. If you have a highly unstable model, with parameters that are all over the place, the reason is probably the same as for factor models, regressions, and t-tests: too few participants for the parameters you estimate.
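Part of why nodes cost power is that the number of pairwise edge parameters grows quadratically with the number of nodes. This minimal R sketch assumes a hypothetical sample of 500 participants and counts only edges, not thresholds, intercepts, or variances.

```r
# Number of pairwise edge parameters in a network with p nodes
n_edges <- function(p) p * (p - 1) / 2

p <- c(10, 20, 40)
n <- 500  # hypothetical sample size
data.frame(nodes = p,
           edges = n_edges(p),
           participants_per_edge = round(n / n_edges(p), 1))
```

Doubling the number of nodes from 20 to 40 roughly quadruples the number of edges to estimate, from 190 to 780.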
Outliers can lead to problems, especially in small samples. Remember that the bootstrapping routines we use in the R-package bootnet to look at the stability of your network model resample your data. If parameters differ a lot depending on whether the few people with severe outliers are included in the sample or not, then you might end up with imprecise results^{3}.
Network models as currently implemented often use regularization to err on the side of sparsity, which sets some edge coefficients exactly to zero. Think of this as a threshold that edges must reach; if they don't, they are set to zero^{4}. Suppose you have many edges that are just barely above this threshold, and you use bootstrapping routines. Each time you estimate your network on bootstrapped data, different weak edges are likely to survive regularization, which will lead to an unstable network.
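The threshold intuition can be illustrated with the soft-thresholding operator. Note that this is the exact lasso solution only for orthonormal predictors; the graphical lasso used for network estimation is more involved, so treat the sketch as an analogy. The penalty, edge values, and noise level are hypothetical.

```r
set.seed(1)
# Soft-thresholding: shrink toward zero by lambda, and set to exactly zero below it
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

lambda <- 0.10  # hypothetical penalty ("threshold")
B <- 1000       # pseudo-bootstrap replicates
near   <- soft_threshold(0.11 + rnorm(B, sd = 0.02), lambda)  # true edge just above lambda
strong <- soft_threshold(0.40 + rnorm(B, sd = 0.02), lambda)  # clearly strong edge

mean(near != 0)    # survives in only part of the replicates (about 0.69 in expectation)
mean(strong != 0)  # survives in every replicate
```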
Relatedly, regularization assumes that your true network is sparse. If this is not the case (i.e. if your true network is dense), this can result in estimation and stability issues, especially when edge weights are similar to each other.
Closeness becomes 0 for all nodes if a network has at least one node that is unconnected to the rest. Imagine you have a very weakly connected network: it is possible that in half of the bootstrapped networks, one node is unconnected. Since we look at the similarity of centrality across bootstraps in bootnet, this would result in a very low stability coefficient. Similarly, the bootstrapping routines drop cases to determine whether the same centrality order emerges when subsetting the data. This can also lead to unconnected nodes, which dramatically reduces closeness stability.
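The following is a small base-R sketch of this effect. It assumes closeness is defined as the inverse of the summed shortest-path distances, with distance taken as the inverse absolute edge weight. The network and weights are hypothetical.

```r
# All-pairs shortest path lengths (Floyd-Warshall); distance = 1 / |edge weight|
shortest_paths <- function(W) {
  n <- nrow(W)
  D <- ifelse(W != 0, 1 / abs(W), Inf)  # no edge -> infinite distance
  diag(D) <- 0
  for (k in seq_len(n)) for (i in seq_len(n)) for (j in seq_len(n)) {
    if (D[i, k] + D[k, j] < D[i, j]) D[i, j] <- D[i, k] + D[k, j]
  }
  D
}

# Closeness as the inverse of the summed distances to all other nodes
closeness <- function(W) 1 / rowSums(shortest_paths(W))

W <- matrix(0, 4, 4)
W[1, 2] <- W[2, 1] <- 0.3
W[2, 3] <- W[3, 2] <- 0.2
W[3, 4] <- W[4, 3] <- 0.1  # weakest edge; node 4 hangs on it
closeness(W)               # all positive

W_cut <- W
W_cut[3, 4] <- W_cut[4, 3] <- 0  # e.g. a bootstrap sample in which this edge is set to zero
closeness(W_cut)                 # 0 for every node, not just node 4
```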
Betweenness centrality may be unstable if there are (a) multiple plausible shortest paths connecting for instance two communities X and Y, and if (b) these multiple shortest paths are roughly equally strong.
Here is an example of this situation (code and full output available here) that Sacha put together for this blog post. The code also works nicely as a tutorial on how to set up your own brief simulation study, using the netSimulator() function we described in our recent tutorial paper on network power estimation.
First we create a network structure that has 2 bridges, and simulate a dataset with n=5000. This should give us ample power for stable estimation.
As expected, network estimation looks highly stable. The correlation of the estimated network with the true network is high, and sensitivity and specificity are good (run the code yourself to see the corresponding plots). The correlation stability coefficient for strength centrality is 0.44. This means you can drop 44% of your dataset and still retain, with 95% probability, a correlation of at least 0.7 between the order of strength centrality in your subsampled data/network and the order of strength centrality in your full data/network.
For betweenness, however, the correlation stability coefficient is 0. Why? Every time you bootstrap and estimate a network structure, one of the two edges connecting the two communities is likely to be a tiny bit larger than the other. The nodes of that edge then get very high betweenness centrality, while the nodes of the other edge get very small betweenness centrality. Which edge "wins" varies across bootstraps, leading to highly unstable betweenness results.
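The mechanism can be reproduced in a toy example in base R. This is not Sacha's netSimulator() code. It uses a hypothetical six-node network with two equally strong bridges and mimics bootstrapping by adding small noise to the edge weights. Betweenness is computed with a simplified counter that is valid only when shortest paths are unique.

```r
set.seed(42)

shortest_paths <- function(W) {  # Floyd-Warshall; distance = 1 / weight
  n <- nrow(W)
  D <- ifelse(W > 0, 1 / W, Inf)
  diag(D) <- 0
  for (k in seq_len(n)) for (i in seq_len(n)) for (j in seq_len(n)) {
    if (D[i, k] + D[k, j] < D[i, j]) D[i, j] <- D[i, k] + D[k, j]
  }
  D
}

# Betweenness when shortest paths are unique (true almost surely once weights
# carry continuous noise): count node pairs (s, t) whose shortest path runs via v
betweenness_unique <- function(W) {
  D <- shortest_paths(W)
  n <- nrow(W)
  sapply(seq_len(n), function(v) {
    hits <- 0
    for (s in seq_len(n)) for (t in seq_len(n)) {
      if (s < t && s != v && t != v && abs(D[s, v] + D[v, t] - D[s, t]) < 1e-9) {
        hits <- hits + 1
      }
    }
    hits
  })
}

# Hypothetical true network: communities {1,2,3} and {4,5,6} (edges 0.4),
# joined by two equally strong bridges 2-5 and 3-6 (edges 0.2)
W_true <- matrix(0, 6, 6)
W_true[1:3, 1:3] <- 0.4
W_true[4:6, 4:6] <- 0.4
W_true[2, 5] <- W_true[5, 2] <- 0.2
W_true[3, 6] <- W_true[6, 3] <- 0.2
diag(W_true) <- 0

# Crude stand-in for bootstrapping: small symmetric noise on the existing edges
res <- t(replicate(200, {
  noise <- matrix(rnorm(36, sd = 0.01), 6, 6)
  noise[lower.tri(noise)] <- t(noise)[lower.tri(noise)]
  betweenness_unique(W_true + noise * (W_true > 0))
}))
colnames(res) <- paste0("node", 1:6)

# Which bridge node has the higher betweenness flips from replicate to replicate
table(res[, "node2"] > res[, "node3"])
```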
We stumbled across this when analyzing a dataset of about 8000 participants; results are described in detail here (pp. 5 and 6). To investigate this further, we dropped 5% of the sample 1000 times in this dataset, and plotted Betweenness for all items:
As you can see, items V6, V8, V11 and V14 (the 4 items forming the two bridges across communities) showed pronounced betweenness, leading to a correlation stability coefficient of 0.
When you estimate the correlations of your items as input for the Gaussian Graphical Model^{5} via polychoric correlations, this can lead to problems in case you have (a) a small dataset and (b) very skewed items / infrequent categories. This leads to zeroes in the marginal crosstables, resulting in unstable results. For instance, here we show bootstrap results of highly unstable edges even though N is high (if you cannot access the paper due to the paywall, see here). Even if this problem does not exist in your raw data, keep in mind that it might be introduced once you start bootstrapping the data, because, as Sacha wrote:
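The entropy point is easy to verify for a single skewed item. In this sketch, a hypothetical item has a category endorsed by only 3 of 200 people. That category is missing from a bootstrap sample with probability (197/200)^200, roughly 5%. The cells of a bivariate cross-table hold even fewer observations, so empty cells there are more likely still.

```r
set.seed(7)
# Hypothetical skewed item with 4 response categories; category 3 is rare (3 of 200)
item <- c(rep(0, 150), rep(1, 35), rep(2, 12), rep(3, 3))
raw_categories <- length(unique(item))

B <- 1000
boot_categories <- replicate(B, length(unique(sample(item, replace = TRUE))))

mean(boot_categories < raw_categories)  # share of bootstrap samples missing a category
all(boot_categories <= raw_categories)  # resampling never adds categories
```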
Bootstrapping will reduce entropy, as the collection of all unique outcomes in your bootstrapped dataset is by definition equal or smaller than the collection of all unique outcomes in your raw dataset.
This is related to skip questions. Especially in large epidemiological datasets, it is quite common to skip certain symptoms depending on others. For instance, if a person does not endorse at least one of the two core symptoms of Major Depression, the other 7 secondary symptoms are usually not queried. These missing values are commonly replaced by zeros (zero-imputation), which can lead to considerable problems described in more detail here. These problems are also visible in stability analysis. When we bootstrapped a very large dataset (that we expected to be very stable), we found that the core items that determine the skip pattern were extremely unstable. In the plot below, the grey areas indicate the 95% CI of the edge weight parameter estimates; the x-axis is parameter strength, and the y-axis the node in question.
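One of these problems can be illustrated with a small simulation under hypothetical assumptions. A core symptom is present in 30% of people. Two secondary symptoms are generated independently of each other, and of the core symptom, among the people who are asked about them. After zero-imputation, the skip rule alone makes the secondary symptoms correlated in the full sample.

```r
set.seed(123)
n <- 5000
core <- rbinom(n, 1, 0.3)  # screening (core) symptom, present in about 30%
s1 <- rbinom(n, 1, 0.5)    # two secondary symptoms, generated independently
s2 <- rbinom(n, 1, 0.5)    # of each other and of the core symptom

# Skip rule + zero-imputation: not asked when the core symptom is absent, coded as 0
s1[core == 0] <- 0
s2[core == 0] <- 0

cor(s1, s2)                        # clearly positive in the full sample (about 0.41 in expectation)
cor(s1[core == 1], s2[core == 1])  # close to 0 among those who were actually asked
```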
As Denny Borsboom stated:
The most important dark horse is network structure. Some network structures are very hard to estimate even with massive datasets and others work very well at small sample sizes. E.g. Isingfit has trouble recovering scale free networks even at very high sample sizes, see https://www.nature.com/articles/srep05918.
A scale-free network is a specific type of network structure in which the degree distribution follows a power law. The paper referenced above shows that IsingFit, the R package developed to estimate Ising Models in psychological data, performs very well if data come from random or small-world networks. However, it does not perform well when data were generated from scale-free networks:
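A power-law degree distribution implies a few hubs with very high degree. To make this more tangible, the sketch below grows a network by preferential attachment, one common way to generate scale-free networks in the spirit of Barabási–Albert. It then compares that network with a random graph that has the same number of edges. The network size and seed are arbitrary choices.

```r
set.seed(99)
n <- 1000

# Preferential attachment (one new edge per new node): each new node links to an
# existing node with probability proportional to that node's current degree
deg_sf <- c(1, 1)  # start: nodes 1 and 2 connected to each other
for (v in 3:n) {
  target <- sample(length(deg_sf), 1, prob = deg_sf)
  deg_sf[target] <- deg_sf[target] + 1
  deg_sf <- c(deg_sf, 1)  # the new node has degree 1
}

# Random (Erdos-Renyi style) graph with the same number of edges, for comparison
m <- sum(deg_sf) / 2
all_pairs <- combn(n, 2)
chosen <- all_pairs[, sample(ncol(all_pairs), m)]
deg_random <- tabulate(as.vector(chosen), nbins = n)

c(max_degree_scale_free = max(deg_sf), max_degree_random = max(deg_random))
```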
The above pattern of results, involving adequate network recovery with high specificity and moderately high sensitivity, is representative for almost all simulated conditions. The only exception to this rule results when the largest random and scale-free networks (100 nodes) are coupled with the highest level of connectivity. In these cases, the estimated coefficients show poor correlations with the coefficients of the generating networks, even for conditions involving 2000 observations […].
Without stability analysis, inference cannot follow. This is why we previously suggested adding stability as a third step to the network psychometrics routine: network estimation, network inference, network stability. But as pointed out at the recent Winter School here in Amsterdam, it makes more sense to change the order to network estimation, network stability, network inference, for obvious reasons.
There is another benefit to stability analysis: You might not notice that there were problems in the network estimation itself, such as zeroes in the marginal cross-tables, but these often come to light in the stability analysis. As such, you can use the bootnet routines as a way to identify potential issues in your data as well.
The post Collection of PTSD network papers & recent conference talks appeared first on Psych Networks.
Last year, 15 articles using network analytic methods in PTSD/psychotraumatology research were published, a 150% increase compared to 2016. Furthermore, the European Journal of Psychotraumatology published a special issue on “Symptomics”, with several articles and an editorial on network analysis in psychotraumatology. These publications reflect an increasing interest in network analysis in psychotraumatology research. The number of panels, presentations and posters at the 33rd annual meeting of the International Society for Traumatic Stress Studies (ISTSS) in Chicago in November 2017 also demonstrated that network analysis is currently one of the “hot topics” in this field^{1}.
For those who could not attend the 33rd ISTSS meeting or want to go through the presentations again, I have started to collect presentations and posters from the speakers. Many of the speakers agreed to share their slides or poster. These resources can be found on the OSF. The names of the speakers and their talks are also provided here:
Apart from the collection of talks and posters, I have put together a list of all papers using network analytic methods in psychotraumatology research. I have now updated it for the first time. It includes more publications (27 in total) and basic information on each publication, namely the authors, the journal, the sample size, the population's trauma type, and links to the paper and the supplementary materials. Thus, the reading list can be used as a starting point for a literature review or to find specific publications or supplementary materials. The current version of the reading list, which will be updated on a regular basis, can be found on the OSF or as an interactive list on ResearchGate. Updates will be announced via Twitter.
Please let me know if you want to share your slides, notice missing papers or broken links, or have a general question.
Footnotes:
The post 7 new papers on network replicability appeared first on Psych Networks.
Our paper entitled “Replicability and generalizability of PTSD networks: A cross-cultural multisite study of PTSD symptoms in four trauma patient samples” was published a few days ago in Clinical Psychological Science (PDF). I described the results of the paper in more detail in a previous blog post. In summary, the paper compared estimated network structures across four different datasets, for the first time in the literature. Specifically, we compared networks of PTSD symptoms across 4 moderate to large clinical datasets of patients receiving treatment for PTSD. We found considerable similarities (and some differences) across network structures, item endorsement levels, and centrality indices. See the paper and blog post for details.
» Fried, E. I., Eidhof, M. B., Palic, S., Costantini, G., Huisman-van Dijk, H. M., Bockting, C. L. H., … Karstoft, K. I. (2017). Replicability and generalizability of PTSD networks: A cross-cultural multisite study of PTSD symptoms in four trauma patient samples. Clinical Psychological Science. PDF.
It is well known that depressed patients suffer from numerous symptoms that go beyond the DSM criteria for Major Depression, such as anger, irritability, or anxiety. In 2016, we investigated^{1} whether DSM symptoms are more central than non-DSM symptoms in a large clinical population, and found that this was not the case.
Kendler et al. 2017 published a paper last week in the Journal of Affective Disorders that conceptually replicates this previous paper in a different, very large clinical sample of severely depressed Han Chinese women (the CONVERGE data). It is a conceptual replication because the population in CONVERGE differs considerably from the STAR*D data used in the first paper, and because item content also differed.
The results are the same: DSM symptoms were not more central than non-DSM symptoms.
This means that, in terms of network psychometrics, there is nothing special about the DSM symptoms of depression compared to non-DSM symptoms. This makes sense, given that the DSM symptoms were chosen largely for historical rather than empirical, scientific, or psychometric reasons^{2}.
» The Centrality of DSM and non-DSM Depressive Symptoms in Han Chinese Women with Major Depression (2017). Kendler, K. S., Aggen, S. H., Flint, J., Borsboom, D., & Fried, E.I. Journal of Affective Disorders. PDF.
In another paper published in the same journal, van Loo et al. divided the CONVERGE sample mentioned above into 8 subgroups based on 4 variables of genetic and environmental risk: family history (present vs absent), polygenic risk score (low vs high), age at onset (early vs late), and severe adversity^{3} (present vs absent).
The network structures did not significantly differ across these 4 variables^{4}.
I was surprised by these remarkable similarities across subgroups, which (in contrast to my own work) could be interpreted as evidence for one common pathway to depression. Then again, CONVERGE is a very specific sample with recurrent, severe symptomatology, and I'm looking forward to seeing attempts to replicate these results in less severely depressed samples.
» van Loo, H. M., van Borkulo, C. D., Peterson, R. E., Fried, E. I., Aggen, S. H., Borsboom, D., & Kendler, K. S. (2017). Robust symptom networks in recurrent major depression across different levels of genetic and environmental risk. Journal of Affective Disorders. PDF.
I have already covered the paper by Forbes et al. and the rebuttal by Borsboom et al., both published in the Journal of Abnormal Psychology, in detail in a recent blog post, and will not reiterate the points here. In sum, both papers investigated the degree to which several network models replicate across two very large community datasets, with the following results:
And here are the edge weights as a heat map, to stress how strong the replication is^{5}:
» Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017). Evidence that Psychopathology Symptom Networks have Limited Replicability. Journal of Abnormal Psychology, 126(7). PDF.
» Borsboom, D., Fried, E. I., Epskamp, S., Waldorp, L. J., van Borkulo, C. D., van der Maas, H. L. J., & Cramer, A. O. J. (2017). False alarm? A comprehensive reanalysis of “evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger (2017). Journal of Abnormal Psychology, 126(7). PDF.
There was considerable disagreement between the two teams of authors about whether these network structures replicate across the datasets. Both teams agreed, however, that readers should decide for themselves by reading both papers.
A letter published in JAMA Psychiatry last week reports a non-replication of a prior finding by van Borkulo et al. 2015^{6}. I described the paper in more detail in a previous blog post; the relevant point here is this: Schweren et al. 2017 split a sample into two subgroups based on time 2 (treatment responders and non-responders), and then compared the time 1 networks of these two groups for connectivity (i.e. the sum of all absolute connections in the network structures). In contrast to van Borkulo et al. 2015, Schweren et al. found no significant difference between the groups. Put differently: the networks replicated across the subgroups, not unlike in the paper by van Loo et al. 2017 above. Again, we need follow-up work on this, since the effect was in the direction predicted by van Borkulo et al. 2015, and since the statistical test used requires a lot of power to detect differences.
» Schweren, L., van Borkulo, C. D., Fried, E. I., & Goodyer, I. M. (2017). Assessment of Symptom Network Density as a Prognostic Marker of Treatment Response in Adolescent Depression. JAMA Psychiatry, 1–3. PDF.
In early 2017, Psychological Medicine published a network analysis of OCD and depression comorbidity authored by McNally et al., who used network models to estimate undirected and directed network structures in a cross-sectional sample of 408 adults with primary OCD.
Last week, Jones et al. 2017 published a paper that looked into the network structure of the same OCD and depression items in 87 adolescents. The publication has found a home in the Journal of Anxiety Disorders, and is entitled “A Network Perspective on Comorbid Depression in Adolescents with Obsessive-compulsive Disorder” (PDF).
Interestingly, the authors use the same items in both papers, so Payton Jones wrote a blog post in which he compares the results of both papers. Payton writes that this is not a direct replication — samples differ considerably from each other — but also notes that “the similarities (e.g., the parts that ‘replicate’) are likely to say something universal about how OCD and depression work, and the differences (e.g., the parts that ‘don’t replicate’) might tell us about what makes adults and adolescents unique (or they might be spurious – we’ll have to be careful)”.
I admit that I am surprised the authors obtained a network structure at all in such a small sample (with n = 87 for 300 parameters, I would have expected the lasso to shrink all edges to zero), and I am even more surprised that the network structures seem to resemble each other fairly well (the correlation between the adjacency matrices is 0.67).
» Jones, P. J., Mair, P., Riemann, B. C., Mugno, B. L., & McNally, R. J. (2017). A Network Perspective on Comorbid Depression in Adolescents with Obsessive-compulsive Disorder. Journal of Anxiety Disorders. PDF.
A core component of establishing the replicability of findings is to formally compare network structures. One way to find out whether network structures differ from each other is the Network Comparison Test (NCT), developed by Claudia van Borkulo, which I described in a bit more detail in the last tutorial blog post. However, the NCT requires a lot of power to detect differences, so a negative result (p > 0.05) can mean (a) that there is no difference between the networks, or (b) that you do not have sufficient power to detect a difference. Note also that the NCT uses Pearson correlations by default, and there are many situations in which Pearson correlations are not appropriate for your data.
Therefore, we might want to complement the investigation of differences (e.g. via the NCT) with an estimate of the similarity of networks. One way to do that is to test all individual edges via the NCT and report how many of them do not differ. We did this in the Clinical Psychological Science paper referenced above, where we compared each pair of the 4 networks:
“Of all 120 edges for each comparison of networks, only 2 edges (1.7%; comparison networks 1 vs. 2 and 1 vs. 4) to 8 edges (6.7%; comparison networks 3 vs. 4) differed significantly across the networks, with a mean of significantly different edges across the 6 comparisons of 3.1 edges.”
That means that while, for instance, networks 1 and 2 differed significantly from each other in the omnibus NCT (not shown; i.e. the structures are not exactly the same), only 1.7% of all 120 edges of networks 1 and 2 differed significantly in the post hoc tests, which gives us a measure of similarity^{7}.
But the most common metric for the similarity of two networks in the literature is the correlation coefficient between the edge weights of their adjacency matrices. Since we usually compare regularized network structures, with sparse adjacency matrices in which many elements are exactly zero, correlations might not be the best idea here, and it would be nice to run a simulation study to see exactly what happens under which conditions. I've used correlation coefficients as well, in several papers, because I think they offer some very rough insight, but I wanted to highlight here that this is probably not the best way forward.
Another topic that deserves more attention is the replicability of findings in time-series studies. For instance, Madeline Pe^{8} wrote a great paper showing that the connectivity of the temporal network structure of a depressed sample is higher than that of a healthy sample, and I'm not aware of any replications of this finding. There is one group-level^{9} and one idiographic paper^{10} on critical slowing down, and I'm very curious to what degree these phenomena will replicate in other data. And then, of course, there are many papers fitting exploratory models to time-series data, and I am looking forward to seeing to what degree these models replicate in similar datasets. Such data-gathering and modeling efforts will also allow us to ask to what degree the intra-individual network structures of individuals (e.g. depressed patients) resemble each other. Preliminary evidence^{11} suggests that there are marked differences.
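The correlation-based similarity metric can be sketched in a few lines of base R, assuming two symmetric weighted adjacency matrices with the same node order. Only the upper triangle is used, so each undirected edge enters once and the diagonal is ignored. The two small matrices are hypothetical and deliberately sparse, to show one reason the number is hard to interpret: edges that are zero in both networks also feed into the correlation.

```r
# Correlate the unique edge weights (upper triangle, diagonal excluded) of two networks
edge_correlation <- function(adj1, adj2, method = "pearson") {
  stopifnot(identical(dim(adj1), dim(adj2)))
  ut <- upper.tri(adj1)
  cor(adj1[ut], adj2[ut], method = method)
}

# Two hypothetical sparse 4-node networks (exact zeros, as after regularization)
a1 <- matrix(0, 4, 4)
a2 <- matrix(0, 4, 4)
a1[1, 2] <- 0.30; a1[2, 3] <- 0.20; a1[3, 4] <- 0.25
a2[1, 2] <- 0.25; a2[2, 3] <- 0.20; a2[1, 4] <- 0.10
a1 <- a1 + t(a1)   # make both matrices symmetric
a2 <- a2 + t(a2)

edge_correlation(a1, a2)
# Number of edges that are zero in both networks; these also count toward the correlation
sum(a1[upper.tri(a1)] == 0 & a2[upper.tri(a2)] == 0)
```

Passing method = "spearman" is one easy variant, but it does not remove the shared-zeros issue, which is why a simulation study would be the better way to judge such metrics.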
Footnotes
The post 7 new papers on network replicability appeared first on Psych Networks.
The post Does network connectivity predict future depression? Non-replication paper published appeared first on Psych Networks.
The blog is structured into 5 sections:
Then I’ll eat breakfast.
Back in 2015, Claudia van Borkulo, a postdoc in the Psychosystems Group in Amsterdam, and colleagues published a paper entitled “Association of Symptom Network Structure With the Course of Depression” in JAMA Psychiatry^{2} that has gathered nearly 80 citations since^{3}.
In the paper, the authors took 515 participants from the NESDA cohort with a past-year diagnosis of MDD, and split the sample into two groups based on the follow-up timepoint 2 years later: treatment responders vs. treatment non-responders. Van Borkulo et al. then estimated network structures at timepoint 1 for these two groups, and found that at baseline the non-responder group had higher network connectivity (i.e. the sum of the absolute values of all edges in the network structure) than the responders.
The test statistic for the difference in network connectivity was 1.79 (p = .01). In two sensitivity analyses, the test statistic remained significant (analysis 1: 1.55, p = .04; analysis 2: 1.65, p = .02).
So what is global connectivity or global density, and why might it be relevant? Connectivity is the sum of everything going on in a network structure, and how to make sense of it depends on what kind of network you have. In a temporal network estimated from time-series data, for instance, you could add up all temporal cross-lagged effects per patient (sad mood predicting anhedonia; concentration problems predicting sleep problems; etc.), and then see whether more is going on for patient A than for patient B. You could add the autoregressive coefficients to that (sad mood on sad mood; anhedonia on anhedonia), and you'd have a measure of temporal density. You could interpret this as the sum of “activation” carried from one timepoint to the next, and compute some kind of R^2 (if the temporal connectivity is 0, then you also explain no variance from one timepoint to the next).
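Here is a minimal base-R sketch of that temporal density, assuming a lag-1 coefficient matrix B in which B[i, j] is the effect of item j at t-1 on item i at t. This orientation is an assumption (software packages differ in how they store it), and the coefficients and item names are invented.

```r
# Hypothetical lag-1 coefficients: B[i, j] = effect of item j at t-1 on item i at t
items <- c("sad", "anhedonia", "sleep")
B <- matrix(c(0.40, 0.10, 0.00,
              0.20, 0.30, 0.05,
              0.00, 0.15, 0.35),
            nrow = 3, byrow = TRUE, dimnames = list(items, items))

# Cross-lagged effects: off-diagonal cells (one item predicting a different item)
cross_lagged <- sum(abs(B[row(B) != col(B)]))
# Autoregressive effects: diagonal cells (each item predicting itself)
autoregressive <- sum(abs(diag(B)))

temporal_density <- cross_lagged + autoregressive
```

Unlike the undirected case, there is no division by 2 here: B[i, j] and B[j, i] are two different directed effects, not one edge stored twice.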
In cross-sectional between-subjects networks, connectivity is simply a sum of all absolute edge values^{4}. It means that, in a given population, items tend to have stronger conditional dependence relations, which usually also means items have higher correlations; in a factor model perspective, this can translate into higher Cronbach’s Alpha, i.e. scale reliability.
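The link to Cronbach's Alpha can be illustrated with the standardized alpha formula, alpha = k * r_bar / (1 + (k - 1) * r_bar), where k is the number of items and r_bar is the mean inter-item correlation. The sketch below only shows that alpha rises with r_bar for a fixed k; note that it uses marginal correlations, whereas network connectivity is built from partial correlations, so the correspondence is loose.

```r
# Standardized Cronbach's alpha from the number of items k and the mean inter-item correlation
standardized_alpha <- function(k, r_bar) {
  (k * r_bar) / (1 + (k - 1) * r_bar)
}

# For 10 items, alpha increases as the average correlation among items increases
alphas <- standardized_alpha(10, c(0.2, 0.3, 0.4))
alphas
```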
How do you compute global density? You can download a freely available example dataset here, including code^{5}.
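library("qgraph")    # network plotting
library("bootnet")   # network estimation
load("data.Rda")     # loads the example dataset into the object data
# Regularized partial correlation network (EBICglasso), here based on Pearson correlations
network1 <- estimateNetwork(data, default="EBICglasso", corMethod="cor")
graph1 <- plot(network1, layout="spring", cut=0)
# Global density (connectivity): sum of absolute edge weights, each edge counted once
sum(abs(network1$graph))/2
#global density: 8.48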
We load the data, estimate a network, plot the network (not shown), and then compute connectivity. We have to divide the absolute sum by 2 because all edges are encoded twice in the adjacency matrix (A with B but also B with A).
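The division by 2 can be checked on a tiny hypothetical matrix in base R. The shortcut sum(abs(W)) / 2 assumes a zero diagonal; summing the upper triangle gives the same number without relying on that.

```r
# A hypothetical symmetric 3 x 3 weighted adjacency matrix with a zero diagonal
W <- matrix(c( 0.0, 0.3, -0.1,
               0.3, 0.0,  0.2,
              -0.1, 0.2,  0.0), nrow = 3, byrow = TRUE)

# Every undirected edge appears twice (W[i, j] and W[j, i]), hence the division by 2
global_strength_half <- sum(abs(W)) / 2
# Equivalent: count each edge once via the upper triangle
global_strength_upper <- sum(abs(W[upper.tri(W)]))
```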
How do you compare global connectivity across two networks? Claudia van Borkulo developed the Network Comparison Test (NCT) for that purpose.
First, let's split our data into two sub-datasets that we want to compare.
library("dplyr")
data1 <- slice(data, c(1:110))
data2 <- slice(data, c(111:221))
We then estimate and plot two network structures:
network2 <- estimateNetwork(data1, default="EBICglasso", corMethod="cor")
network3 <- estimateNetwork(data2, default="EBICglasso", corMethod="cor")
graph2 <- plot(network2, layout="spring", cut=0, details=T)
graph3 <- plot(network3, layout="spring", cut=0, details=T)
L <- averageLayout(graph2,graph3)
layout(t(1:2))
graph2 <- plot(network2, layout=L, cut=0, maximum=0.29)
graph3 <- plot(network3, layout=L, cut=0, maximum=0.29)
Haha ... I swear I only split the data once (back in summer this year, for a workshop in Spain), so for this tutorial, from an educational perspective, I guess we're lucky to get such pronounced differences in connectivity.
sum(abs(network2$graph))/2 #0.5436501
sum(abs(network3$graph))/2 #5.794574
Now, on to the NCT, a permutation test of the null hypothesis that both networks have identical connectivity:
library("NetworkComparisonTest")  # provides NCT()
nct_results <- NCT(data1, data2, it=1000, binary.data=FALSE)
This gives us 3 main outcomes:
nct_results$glstrinv.sep # connectivity values: 0.54 vs 5.79
nct_results$glstrinv.real # difference in connectivity: 5.25
nct_results$glstrinv.pval # global strength invariance: 0.71
The first line replicates connectivity that we estimated by hand above; the second lists the difference that was used as test statistic in the permutation test; followed by the p-value.
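The logic of such a permutation test can be sketched in base R. This is not the NCT itself: as a simplifying assumption, it uses unregularized partial correlations (from the inverse correlation matrix) instead of EBICglasso, and simulated continuous data. The idea is the same, though: pool both samples, repeatedly reassign rows to two groups at random, and see how often the permuted difference in global strength is at least as large as the observed one.

```r
# Unregularized partial correlation network from the inverse correlation matrix
pcor_network <- function(x) {
  P <- solve(cor(x))
  pc <- -P / sqrt(outer(diag(P), diag(P)))
  diag(pc) <- 0
  pc
}

# Global strength: sum of absolute edge weights, each edge counted once
global_strength <- function(x) {
  pc <- pcor_network(x)
  sum(abs(pc[upper.tri(pc)]))
}

# Permutation test of the absolute difference in global strength between two groups
perm_test_strength <- function(x1, x2, n_perm = 500) {
  observed <- abs(global_strength(x1) - global_strength(x2))
  pooled <- rbind(x1, x2)
  n1 <- nrow(x1)
  perm_diffs <- replicate(n_perm, {
    idx <- sample(nrow(pooled), n1)  # randomly reassign rows to "group 1"
    abs(global_strength(pooled[idx, , drop = FALSE]) -
          global_strength(pooled[-idx, , drop = FALSE]))
  })
  # Share of permuted differences at least as large as the observed one (+1 correction)
  list(observed = observed,
       p_value = (sum(perm_diffs >= observed) + 1) / (n_perm + 1))
}

# Usage with simulated data: two groups drawn from the same distribution
set.seed(1)
x1 <- matrix(rnorm(100 * 5), ncol = 5)
x2 <- matrix(rnorm(100 * 5), ncol = 5)
res <- perm_test_strength(x1, x2, n_perm = 200)
res$p_value
```

With small groups, the permuted differences spread widely, so even a sizeable observed difference can fall well inside that distribution; this is the power issue in a nutshell.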
We now have a perfect example showing that the NCT requires sufficient power to detect differences in connectivity between two networks: networks that do seem to differ in connectivity (both graphically and numerically), as in the example above, are not found to differ significantly, due to the small sample size here (Claudia shows this clearly in her NCT validation paper, which is, I believe, still under revision). So if the NCT is non-significant and you have two small samples, this can mean that there is no difference, or that there is not enough power to detect it. This is important to keep in mind.
(Note that you can compare networks in many more ways than just connectivity. The Network Comparison Test can also test for differences in network structure and in all individual edges; further, there are other approaches available, such as assessing the similarity of network structures. For 2 example papers where we did all of this, including all code, see 1 and 2.)
On to the new paper by Schweren et al. 2017 that was a conceptual replication of van Borkulo et al. 2015. We also split the data based on timepoint 2 into responders and nonresponders, but the baseline networks for these two groups do not differ in connectivity:
Global connectivity was higher in poor responders, similar to the paper by van Borkulo et al., but the difference was not significant (good responders, 3.6; poor, 4.3; p = 0.15).
It's very cool to see critical replications and non-replications, and I think we should do that much more for substantive findings that might be relevant. For this particular effect, I am still not sure what to think, given that the result went in the same direction, and the connectivity difference between the networks wasn't negligible.
How about we do a high-powered follow-up study with multiple larger samples? I have the STAR*D data lying around (~3500 patients) that I could contribute, if anybody is interested in replicating this with higher power: let me know, let's join up, let's do it. Code from two papers, expertise, and plenty of co-authors interested in this are available! And I'm sure I'm not saying too much if I include Claudia and Lizanne here as potentially interested candidates for collaborations ;)!
One fairly severe complication is that in the tutorial above, I used corMethod="cor" when estimating networks, for simplicity, which forces estimateNetwork() to use Pearson instead of polychoric correlations for the skewed ordinal items in the data (these are PTSD symptoms in a subclinical sample, so there is substantial skew). This is, of course, inappropriate. If I repeat the analysis with polychoric correlations, the connectivity of the two networks is 8.30 vs. 8.38, virtually identical (compared to 0.54 vs. 5.79 with Pearson correlations, a pronounced difference). The strongest edge is now 0.48 instead of 0.28.
The results change dramatically, which highlights the importance of looking at the distribution and type of your data and making sure you use the appropriate correlations. Sacha Epskamp and I describe this in some detail in the FAQ of our new tutorial paper on regularized partial correlations, forthcoming in Psychological Methods.
There is considerable evidence, across numerous datasets I reviewed in a recent empirical paper and also in the two very large datasets I studied for that paper, that network structures of depression symptoms in healthy people are more highly interconnected than network structures of depression symptoms in depressed people^{6}. Many explanations have been offered for this, and although I've worked on this for over 3 years now, I really haven't figured out how to make sense of these findings, or how to reconcile them with the findings by Claudia and Lizanne. If you're interested (or have suggestions to offer), I summarized the paper in this blog post.
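Why Pearson correlations can mislead with skewed ordinal items can be illustrated without any network package. The base-R sketch below simulates a latent bivariate normal variable pair with a known correlation, cuts both variables at high thresholds so that most people score 0 (a crude stand-in for skewed symptom ratings; the thresholds are an arbitrary assumption), and compares the Pearson correlation of the ordinal scores with the latent one. It does not estimate polychoric correlations itself; that requires a dedicated function from an add-on package.

```r
set.seed(42)
n   <- 100000
rho <- 0.6
z1 <- rnorm(n)
z2 <- rho * z1 + sqrt(1 - rho^2) * rnorm(n)   # latent pair with correlation rho

# Cut at high thresholds to mimic skewed ordinal items scored 0..3 (most people score 0)
to_ordinal <- function(z) findInterval(z, qnorm(c(0.70, 0.85, 0.95)))
y1 <- to_ordinal(z1)
y2 <- to_ordinal(z2)

r_latent  <- cor(z1, z2)   # close to rho
r_pearson <- cor(y1, y2)   # Pearson on the skewed ordinal scores: attenuated
c(latent = r_latent, pearson_ordinal = r_pearson)
```

Polychoric correlations aim to recover the latent correlation rather than the attenuated one, which is one reason the connectivity estimates can change so much between the two approaches.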
And now:
Footnotes:
The post Network models do not replicate … not. appeared first on Psych Networks.
Today’s blog tackles the replicability of network models, and I will provide my personal take on the topic here. The blog does not reflect the view of my colleagues or co-authors^{1}, and is a very personal tale … full of woe and wonder.
The Journal of Abnormal Psychology just decided to go ahead with publishing a paper although it contains at least one known serious error and a number of major problems. I first read this paper, entitled “Evidence that psychopathology networks do not replicate”, about half a year ago (the title has changed since), when we were invited by Abnormal to write a commentary. In the paper, Forbes, Wright, Markon and Krueger (from here on FWMK) fit 4 network models to two large community datasets of 18 depression and anxiety symptoms, and investigated whether the network models replicate across the 2 datasets.
FWMK conclude:
“Popular network analysis methods produce unreliable results”, “Psychopathology networks have limited replicability”, “poor utility”, and, later, “current psychopathology network methodologies are plagued with substantial flaws”.
That is devastating, and worrisome for anybody who has used network models before. So we (Denny Borsboom, Sacha Epskamp, Lourens Waldorp, Claudia van Borkulo, Han van der Maas, Angelique Cramer, and I) sat down, took a closer look at the paper, and found the following problems. Just to highlight this again: I’m describing the version of the paper we received after it was accepted for publication.
This is a remarkable collection of issues for a paper that draws the strong conclusions about methodology I quoted above, and it took us considerable time to identify them all by digging through FWMK’s code. We informed the editor that the paper contains a number of serious flaws, and were surprised that Abnormal decided to go ahead with publishing it. We followed the editor’s invitation to write a commentary, in which we re-analyzed the data and fixed the mistakes we identified; all code and results are available online.
A number of curious things happened next. First, because the editor gave us only a few weeks to reply, and because we wanted to re-analyze the data to make sense of the implausible network structures in the manuscript, we asked the editor to confirm in writing, before we started working on the commentary, that this was the final accepted version of the manuscript; the editor confirmed it was.
Second, we then found out that we had to apply for one of the two datasets, and pay for it: the replicability paper by FWMK, with very strong claims about a whole family of psychometric models, was itself not reproducible^{2}.
Third, while we were working on our re-analysis, FWMK changed the final accepted version of the manuscript (which the editor had guaranteed us was final) not once, but twice (final manuscript 1, final manuscript 2, final manuscript 3). In total, FWMK fixed the DAG errors, rewrote parts of the paper, and changed the title (to “Evidence that psychopathology networks have limited replicability”) and the results, but left the discussion and conclusions untouched. The paper was not peer-reviewed after the changes^{3}, and the incorrect estimation of the relative importance networks persisted^{4}, as did association networks based on non-positive-definite correlation matrices, implausible correlations of 0.95 among depression symptoms due to zero-imputation, and the application of linear regression to binary data. And, of course, the devastating conclusions about network methodology in general.
Let’s ignore the fact that the editor refused to give us even one more week to write the commentary, despite the authors changing the paper twice while we were writing the commentary and re-analyzing all the data. And let’s ignore the fact that the editor and the reviewers insisted we cannot call any aspect of the manuscript an “error” or “wrong”, but needed to use words such as “statistical inaccuracy”, while they were happy to have FWMK draw extremely strong conclusions about methodology based on (mis)applying models to two datasets. And let’s also ignore for now that the editor specifically asked us to declare conflicts of interest because we “teach network models”, despite the fact that FWMK also teach (e.g. factor models), despite the fact that Steinley et al. (also invited to comment on FWMK’s paper) also teach network analysis, and despite the fact that it is fairly uncommon to read: “We report severe conflicts of interests regarding the t-tests used in this manuscript … because we teach them to students.”^{5}
In any case, ignoring the very weird review and publication process, the main point here is that the Journal of Abnormal Psychology and FWMK decided to go ahead with publishing a paper that contained significant errors that the editor and the authors were aware of.
In the final paper that was published yesterday …
These facts are not available to readers of the paper.
I haven’t addressed the biggest problem yet. Assume we don’t understand regressions very well: it is a new methodology. You fit a regression of mortality on smoking to a large community dataset 1, and then you fit the same regression to a large community dataset 2. The coefficients are very similar. You write this up and call your paper “Regression methodology replicates well”.
Now imagine the opposite: the coefficients are very different across the two datasets, and you call your paper “Regression methodology does not replicate”.
Both conclusions are equally absurd. Why? Because you cannot vet the methodology of regression analysis by applying it to two different datasets. After all, the results could be different because the datasets differ, and you might find different results if you look into different data. You need simulation studies for that^{6}.
FWMK published pretty much that paper, except that they fit network models, and not regressions, to two datasets, and then draw conclusions about network methodology. Actually, they kind of did publish that paper, because network models are a bunch of regressions, with a bit of regularization on top.
Irrespective of the results FWMK obtained, conclusions about methodology do not follow from fitting models to two datasets — because you do not know the true model in these datasets, which could differ. To vet methodology, you want to simulate data from a given true model and see if you can estimate it back reliably with your statistical procedure.
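The simulation logic can be sketched in base R: define a true network, simulate data from it, estimate the network back, and check how well the estimate matches the truth as the sample size grows. The true model (a 5-node chain) is invented for illustration, and, as a simplifying assumption, the estimator is plain unregularized partial correlations rather than the regularized methods discussed here; a real simulation study would use the estimator under evaluation and many replications per condition.

```r
set.seed(2017)
p <- 5

# Hypothetical true model: a chain network defined through its precision matrix K
K <- diag(p)
for (i in 1:(p - 1)) K[i, i + 1] <- K[i + 1, i] <- -0.4
Sigma <- solve(K)  # implied covariance matrix

# Partial correlations implied by a precision matrix (zero diagonal)
pcor_from_precision <- function(P) {
  pc <- -P / sqrt(outer(diag(P), diag(P)))
  diag(pc) <- 0
  pc
}
true_pcor <- pcor_from_precision(K)

# Draw n multivariate normal observations with covariance Sigma
simulate_data <- function(n) matrix(rnorm(n * p), ncol = p) %*% chol(Sigma)

# Estimate the network back and compare it to the truth, for growing sample sizes
ut <- upper.tri(K)
sample_sizes <- c(100, 1000, 10000)
recovery <- sapply(sample_sizes, function(n) {
  est <- pcor_from_precision(solve(cov(simulate_data(n))))
  cor(est[ut], true_pcor[ut])
})
names(recovery) <- sample_sizes
recovery
```

Because the true model is known here, any mismatch is attributable to the estimator and the sample size, which is exactly what fitting a model to two real datasets cannot tell you.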
This point (vetting methodology requires more than fitting a method to two datasets) is one of the main points we made in our invited rejoinder. Unfortunately, the authors did not address it in their rebuttal. To see how weird the claim is that network methodology is flawed because it does not replicate across 2 datasets, let’s exchange “network model” with “factor model”: Quilty et al. (2013) fit 12 different factor models for the MADRS depression scale, established in the prior literature, to a new dataset, and found that only 1 of these 12 models provides acceptable fit. If FWMK read that paper, would they conclude that “factor models do not replicate” and are “plagued by substantial flaws”? Of course not. They would, like me, conclude that these datasets really seem to differ in their correlation matrices and factor structures. This is not a shortcoming of factor models, but due to differences in the data.
The conclusions about network models in general the authors draw from fitting network models to two datasets do not follow. But we can ask the question instead: how well do network models replicate across these two specific datasets FWMK used, similar to asking: how well do factor models replicate across these two specific datasets? This is an interesting empirical question, which fits the outlet of the paper, an applied clinical journal.
Network models — just like factor models — produce parameters, and the question of replicability is how similar these parameters are across the models fit to two datasets. An Ising Model, for instance, has (k * (k-1)) / 2 edge parameters (where k is the number of items)^{7}, so in the case of the FWMK data with 18 items, 153 parameters.
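The edge count is simple arithmetic, k(k-1)/2, which equals the number of cells above the diagonal of a k x k matrix. A quick base-R check for the 18 FWMK items (the Ising Model additionally has one threshold parameter per item, which this count leaves out):

```r
# Number of pairwise edge parameters among k items
n_edge_parameters <- function(k) k * (k - 1) / 2

n_edge_parameters(18)                # 153 edges for the 18 FWMK items
sum(upper.tri(matrix(0, 18, 18)))    # same count: cells above the diagonal
```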
There are many ways to compare these 153 parameters across two models. The quickest (and dirtiest) way is to correlate them. FWMK fit three network models (the Association Network is simply a visualization of the correlation matrix, which we won’t count as a ‘model’ here): Ising Models, Relative Importance Networks, and DAGs. The correlations of parameters are 0.95 for the Ising Model and 0.98 for the relative importance network^{8}. These correlations are not provided in the original paper; the authors instead report the percentage of edges in the network models of dataset 1 that are also identified in the models fitted to dataset 2: 86.3% for the Ising Model, 98.3% for the relative importance network^{9}, and 79.4% for the DAGs.
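A percentage-of-replicated-edges metric can be sketched in base R. The exact definition FWMK used may differ; this version assumes that an edge is "present" when its weight is non-zero and reports the share of edges present in network 1 that are also present in network 2. The two small matrices are hypothetical.

```r
# Share (in %) of edges present in network 1 that are also present in network 2
replicated_edge_share <- function(adj1, adj2, tol = 0) {
  ut <- upper.tri(adj1)
  present1 <- abs(adj1[ut]) > tol
  present2 <- abs(adj2[ut]) > tol
  100 * sum(present1 & present2) / sum(present1)
}

# Two hypothetical sparse 4-node networks
a1 <- matrix(0, 4, 4)
a2 <- matrix(0, 4, 4)
a1[1, 2] <- 0.30; a1[2, 3] <- 0.20; a1[3, 4] <- 0.25
a2[1, 2] <- 0.25; a2[2, 3] <- 0.20; a2[1, 4] <- 0.10
a1 <- a1 + t(a1)
a2 <- a2 + t(a2)

replicated_edge_share(a1, a2)   # 2 of the 3 edges in a1 are also present in a2
```

Note that this metric ignores edge signs and magnitudes and is not symmetric in general (swapping the two networks can change the result), which is one reason to report it alongside other similarity measures.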
Even if the authors hadn’t made mistakes, and even if we take their results at face value — I fail to understand how FWMK went from their own results to calling their paper “Evidence that psychopathology networks have limited replicability”. In our re-analysis of the data, we used some additional metrics to assess replicability in addition to correlations of parameters and % of edges that replicate, all of which are reported in the main table of our rejoinder.
Of note, we also used the Network Comparison Test, a validated statistical test to compare Ising Models across datasets; you can think about this as being similar to measurement invariance tests in the factor modeling literature^{10}. The result of the test was that no significant difference between the two Ising Models could be detected, which was expected given the very high correlation among parameters, and the very high replicability of individual edges.
Now, I cannot stress enough (and we do so in the commentary as well) that the high replicability of network models in these two datasets does not make network models great, or replicable. In fact, it says very little about network models in general, which is what simulation studies are for, and we conclude in our commentary that the stunning similarity of the network models comes from the zero-imputation of skipped items the authors performed. In my own work on network replicability (4 clinical datasets of patients receiving treatment for PTSD, no skip questions), the similarity of network structures is somewhat lower than in FWMK … but more on that later.
The authors used two datasets that had skip questions for anxiety and depression symptoms. For instance, symptoms 3 to 9 for Major Depression were not coded if people did not have at least one of symptoms 1 or 2. These missing values are commonly replaced with 0s, which is what FWMK also did. In both datasets. You see where this is going: you induce the same spurious correlations in both datasets and then assess the replicability of statistical models that rely on the (partial) correlations among items. This makes investigating replicability very difficult, because you cannot distinguish the signal in the data from the spurious associations induced by replacing skipped items with zeros. In the data of FWMK, listwise deletion leads to correlation coefficients of 0.33 on average^{11}, while zero-imputation leads to a non-positive-definite correlation matrix, the nearest positive definite approximation of which features average correlations of 0.95. This is not a plausible correlation matrix.
Now, is it OK to impute skipped items with 0s? I cannot answer that question in general here. Is it commonplace? Absolutely. Does it make sense to induce spurious correlations in two datasets at the same time when you want to compare how well a statistical model based on item covariances generalizes from one dataset to the other? No: that is a very big problem. FWMK not only ignored the topic completely in their original paper, but conclude in their rebuttal that “zero-imputation is thus a potential limitation of extant network approaches”. But obviously, factor or IRT models, and even regressions, would have exactly the same issues: if you replace missings on items 3 to 9 with 0s when people do not have item 1 or 2, you will create spurious dependencies among items 3 to 9 (because they often get 0s together), and you will also create spurious dependencies between items 3 to 9 and items 1 and 2 (because 3 to 9 depend on the presence of 1 and 2). Concluding that “zero-imputation is thus a potential limitation of regression analysis” would be just as silly as the conclusion FWMK draw. The authors’ rebuttal that other network papers have estimated network models on zero-imputed data does not change the fact that they ignored an issue that was already discussed in the very first empirical network paper by the Amsterdam Psychosystems group^{12} and several other papers^{13}, that the strategy altered the correlations among items in their datasets dramatically, and that it introduced the same spurious relationships into the two datasets they wanted to compare.
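The mechanism is easy to reproduce in a small base-R simulation. The sketch below invents a skip structure in which two binary gate items decide whether three follow-up items are asked at all. Among the people who were asked, the follow-up items are generated completely independently of each other, so any correlation among them after zero-imputation is purely an artifact of the skip structure. All probabilities and sample sizes are arbitrary assumptions.

```r
set.seed(123)
n <- 5000
# Two binary gate items (think: the two core symptoms that decide whether follow-ups are asked)
gate1 <- rbinom(n, 1, 0.3)
gate2 <- rbinom(n, 1, 0.3)
asked <- gate1 == 1 | gate2 == 1

# Three binary follow-up items, generated independently of each other (no true association)
follow <- matrix(rbinom(n * 3, 1, 0.5), ncol = 3)
follow[!asked, ] <- NA   # skipped for people without a gate symptom

mean_offdiag <- function(r) mean(r[upper.tri(r)])

# Listwise: only the people who were actually asked
r_listwise <- cor(follow[asked, ])

# Zero-imputation: skipped items set to 0
follow_zero <- follow
follow_zero[is.na(follow_zero)] <- 0
r_zero <- cor(follow_zero)

c(listwise = mean_offdiag(r_listwise), zero_imputed = mean_offdiag(r_zero))
```

The zero-imputed items correlate substantially even though they are independent among everyone who was asked, simply because they all become 0 together whenever both gate items are absent.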
In addition, other researchers who used the approach, like Borsboom & Cramer^{14}, clearly stated: “The emphasis on free availability of data and replicability of the reported analyses occasionally means that the analyses may not be fully appropriate for the data (e.g., when computing partial correlations on dichotomous variables); in these cases, which will be indicated to the reader, the empirical results have the main purpose of illustration rather than interpretation in meaningful substantive terms.” This differs from the devastating conclusions FWMK draw about a whole family of statistical models.
Another point I find important to highlight is that FWMK use a different layout for the networks to show how different they are. To show you why this is a problem, let me give you an example: are the two network models below — that I just made up, they don’t have anything to do with the results of FWMK — the same or not?
They are exactly identical, and here is the code (download the Rdata file here).
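library("qgraph")
library("bootnet")
load("data.Rdata")                        # loads the example dataset into the object data
pdf("blog1.pdf", width=9.5, height=5.5)
layout(t(1:2))                            # two plots side by side
# Estimate the same network twice from the same data
n1<-estimateNetwork(data, default="EBICglasso")
n2<-estimateNetwork(data, default="EBICglasso")
g1<-plot(n1, layout="spring", cut=0)
# Only the layout differs: the repulsion argument changes the spring layout
g2<-plot(n2, layout="spring", cut=0, repulsion=0.00000001)
dev.off()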
On the right side, I simply changed the repulsion argument so that the nodes would be very far apart: all edges have literally the same weights, both in the model result and graphically. This visualization is uninformative, and it is very similar to giving people two correlation matrices in which you have reordered the rows and columns of one, and then asking them how similar the matrices are. Just as rows and columns need to be in the same order to compare two matrices, nodes need to be in the same place to compare two networks.
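The matrix version of this argument fits in a few lines of base R. A hypothetical labelled adjacency matrix is reordered; cell-by-cell the two matrices look different, but once rows and columns are realigned by node name they are the same network. Realigning by node identity is the matrix counterpart of giving both network plots the same layout, which is what averageLayout() does in the tutorial code earlier in this blog.

```r
# Hypothetical labelled 3 x 3 weighted adjacency matrix
nodes <- c("A", "B", "C")
W <- matrix(c(0,   0.3, 0.1,
              0.3, 0,   0.2,
              0.1, 0.2, 0),
            nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# The same network with rows and columns in a different order
perm <- c("C", "A", "B")
W_shuffled <- W[perm, perm]

identical(unname(W), unname(W_shuffled))   # FALSE: comparing cells by position misleads
isTRUE(all.equal(W, W_shuffled[nodes, nodes]))   # TRUE: realigned by name, same network
```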
The same holds for network structures. Note that the visual comparison is not very important anyway — we should compare models statistically, not based on visualization, as I've highlighted in a previous blog post and in many recent reviews. But FWMK decided to provide graphs in their paper in a way that is uninformative, so it makes sense to post the updated graphs from our commentary here (click the thumbnail for a larger and more legible version of the networks).
Update November 12: Dr Forbes commented below, strongly pushing back against the argument that layouts should be constrained ("ridiculous", "outrageous"). I am honestly surprised, and had anticipated that we could agree on this point after the explanations above. So I will give this yet another try: below is a visualization (provided by Sacha Epskamp) of the adjacency matrices, in the form of heatmaps, that are used as input for the network graphs in the two datasets. This is just another way to visualize the edge weights of the networks. Not only does it make clear that FWMK's conclusion that networks do not replicate is not warranted; it also shows why a constrained layout is important. I honestly cannot imagine anybody arguing that we should not constrain the layout of these heatmaps so that the same edges are in the same rows and columns across the two datasets. Constrained layouts enable comparison rather than "obscuring the differences", and constraining the layouts of the network graphs is exactly the same point. A full-size PDF, along with reproducible code and data, is available here (thanks Sacha).
The Ising Model was developed in the 1920s, is very well understood, and has been used in physics, machine learning, artificial intelligence, biometrics, economics, image processing, neural networks, and many other disciplines. Although its application to psychology happened only fairly recently^{15}, I find it remarkable that FWMK, having never worked with the model before and having fitted it to two datasets, feel confident enough to conclude that it is "plagued with substantial flaws". I would not have the confidence to use, for instance, machine learning methodology for the first time and then write a paper confidently attacking a class of well-established statistical models.
If you read the paper more closely, much of their argument revolves around centrality estimates, so it is worth mentioning that network estimation and network inference (the interpretation of network topology after you have estimated it) are different analytic steps and should not be confused^{16}. And while network methodology is worked out fairly well, inference is indeed a lot more difficult; I will return to it at the end of this post, where I hope to find common ground with FWMK. There are definitely problems with network inference, and we need a ton of thorough investigations to work these out.
In FWMK's paper and rebuttal, there are many instances where the authors confuse methodology with the interpretation of methodology, i.e. inference. If you want to criticize the methodology of regression analysis, go ahead and perform a simulation study. If you want to criticize how people interpret regression coefficients, because you do not think these interpretations follow from the regression model, then the problem is not the model but the inference. These are two very different things. The paper by FWMK is clearly focused on the first (see the title, the abstract, or the general scientific summary, which are about methodology, not interpretation). The rebuttal, in several sections, pretends that it was about interpretation all along, but then, again, concludes that "current psychopathology network methodologies are plagued with substantial flaws".
It does not follow.
Borsboom et al. — our group — published a brief rejoinder to the rebuttal here.
I've made mistakes, and I think over the course of a scientific career, everybody will. And it is really important to highlight that this is not the issue here. The issue is that a team of authors for the first time used a specific class of psychometric models, made major mistakes in the implementation of these models, drew inferences that do not follow from the results, and then, in their rebuttal, instead of clearly identifying and correcting these mistakes, one-upped their original conclusions with even harsher ones.
Interestingly, when you read their rebuttal, you will notice that FWMK do not refute any of the points of our rejoinder. Instead, they develop two new arguments: they repeatedly cite the second commentary written on their paper, by Steinley et al., to support their case, and they bring a new topic to the table: PTSD network replicability.
The commentary of Steinley et al. was not actually a commentary on the paper of FWMK, but a critique of the stability of network models, in which the authors propose a new methodology to vet network models. Sacha Epskamp looked into this methodology and provided a thorough and reproducible refutation here^{17}. To summarize, Steinley et al. simulate data from what they suggest is a proper null model, a random model, but they actually simulate data from a Rasch Model, which is a fully connected network model, not an empty one. So instead of a flat eigenvalue curve, which you would expect when there is no structure in the correlation matrix, they simulated from a model that produces one very strong first eigenvalue. Their conclusions therefore do not follow: results that cannot be distinguished from this null model are not, as they interpret them, "indistinguishable from what would be expected by chance", because chance does not produce a fully connected network or a Rasch model.
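The eigenvalue argument can be illustrated with a small base-R simulation. This is a sketch under stated assumptions, not the code of Steinley et al. or of Sacha Epskamp: the sample size, the number of items, the ability distribution (standard deviation 2) and the item difficulties are arbitrary choices. Items drawn independently give a roughly flat eigenvalue curve, whereas items drawn from a Rasch model give one dominant first eigenvalue.

```r
set.seed(2017)
n <- 2000  # persons
p <- 10    # binary items

# Independent items: no structure in the correlation matrix
X_indep <- matrix(rbinom(n * p, size = 1, prob = 0.5), nrow = n)

# Rasch model: one latent trait drives all items
theta <- rnorm(n, mean = 0, sd = 2)       # person abilities (assumed distribution)
b     <- seq(-1, 1, length.out = p)       # item difficulties (assumed values)
prob  <- plogis(outer(theta, b, "-"))     # P(x = 1) = logistic(theta - b)
X_rasch <- matrix(rbinom(n * p, size = 1, prob = prob), nrow = n)

# Eigenvalues of the item correlation matrices, largest first
ev_indep <- eigen(cor(X_indep), symmetric = TRUE, only.values = TRUE)$values
ev_rasch <- eigen(cor(X_rasch), symmetric = TRUE, only.values = TRUE)$values

round(ev_indep[1:3], 2)  # roughly flat, all close to 1
round(ev_rasch[1:3], 2)  # one dominant first eigenvalue
```

Data generated like `X_rasch` therefore come from a model with a strong common structure, which is why it cannot serve as a "chance" baseline.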
I will return to the second point of FWMK's rebuttal, PTSD network replicability, later in this post.
It's time to move forward, and make the best of this awkward situation. And I'd argue that this shouldn't be too hard, actually, because FWMK highlight several points in their paper and rebuttal I agree with.
Vetting statistical methodology and interpreting statistical parameters adequately are crucial before drawing substantive (e.g. clinical) inferences. FWMK highlight centrality as a problematic parameter that has been thoroughly over-interpreted by some researchers, and I couldn't agree more. All the workshops and lectures on centrality I gave in 2017 contained at least one slide on being careful with interpreting centrality, and here is part of a review I wrote six weeks ago:
"My main concern is the terminology and conclusions surrounding centrality. Many previous papers weren’t entirely clear about the potential relevance of central symptoms — maybe the authors could invest a bit more work in this. Playing devil’s advocate here, the most central symptom is likely the most difficult to treat, because after turning it “off” (thinking in terms of the binary Ising Model) it would likely be turned on again due to all the connections. This means that many of the authors’ conclusions regarding treatment do not necessarily follow, and I would be much more careful with clinical implications."
This is from one of my papers accepted a few days ago:
"It is important to highlight that centrality does not automatically translate to clinical relevance and that highly central symptoms are not automatically viable intervention targets. Suppose a symptom is central because it is the causal endpoint for many pathways in the data: Intervening on such a product of causality would not lead to any changes in the system. Another possibility is that undirected edges imply feedback loops (i.e. A–B comes from A⇄B), in which case a highly central symptom such as insomnia would feature many of these loops. This would make it an intervention target that would have a strong effect on the network if it succeeded, but an intervention with a low success probability, because feedback loops that lead back into insomnia would turn the symptom ‘on’ again after we switch it ‘off’ in therapy. A third example is that a symptom with the lowest centrality, unconnected to most other symptoms, might still be one of the most important clinical features. No clinician would disregard suicidal ideation or paranoid delusions as unimportant just because they have low centrality values in a network. Another possibility is that a symptom is indeed highly central and causally impacts on many other nodes in the network, but might be very difficult to target in interventions. As discussed in Robinaugh et al. (Robinaugh et al., 2016), “nodes may vary in the extent to which they are amenable to change” (p. 755). In cognitive behavioral therapy, for example, clinicians usually try to reduce negative emotions indirectly by intervening on cognitions and behavior (Barlow, 2007). Finally, a point we discuss in more detail in the limitations, centrality can be biased in case the shared variance between two nodes does not derive from an interaction, but from measuring the same latent variable.
In sum, centrality is a metric that needs to be interpreted with great care, and in the context of what we know about the sample, the network characteristics, and its elements. If we had to put our money on selecting a clinical feature as an intervention target in the absence of all other clinical information, however, choosing the most central node might be a viable heuristic."
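The "turned on again" argument can be sketched with the conditional probability of a single node in an Ising model with 0/1 coding, P(x_i = 1 | rest) = logistic(τ_i + Σ_j ω_ij x_j). The threshold of -2, the five edges of weight 0.8, and the helper name `p_on` are hypothetical values chosen for illustration; they are not estimates from any dataset.

```r
# Conditional probability that node i is "on" in an Ising model with 0/1 coding:
#   P(x_i = 1 | other nodes) = logistic(tau_i + sum_j omega_ij * x_j)
p_on <- function(tau_i, omega_i, x_others) {
  plogis(tau_i + sum(omega_i * x_others))
}

tau   <- -2            # hypothetical threshold: the node tends to be off by itself
omega <- rep(0.8, 5)   # hypothetical positive edges to five other symptoms

# Probability that the node is (re)activated when k of its neighbours are still on
p_by_active <- sapply(0:5, function(k) {
  p_on(tau, omega, c(rep(1, k), rep(0, 5 - k)))
})
round(p_by_active, 2)  # 0.12 0.23 0.40 0.60 0.77 0.88
```

Switching the node off in therapy does not change its edges: as long as most of its neighbours remain active, the model assigns a high probability to the node switching back on.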
It is also true that analyses of the accuracy of centrality indices show that they are often not estimated very reliably, and in recent months I have largely stopped analyzing betweenness and closeness centrality because they often fail to meet minimal criteria for parameter stability.
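A minimal sketch of the idea behind checking centrality stability: estimate strength centrality (the sum of absolute edge weights per node) in the full sample, re-estimate it on random subsets with half of the cases dropped, and correlate the two. For simplicity, the sketch uses unregularized partial correlations on simulated Gaussian data from a made-up six-node network; bootnet's case-dropping bootstrap works with the estimator you choose and summarizes such correlations in a CS-coefficient, which this sketch does not compute. The helper names `pcor_network` and `strength` are hypothetical.

```r
set.seed(1)

# Unregularized partial correlation network (hypothetical helper)
pcor_network <- function(X) {
  P <- solve(cor(X))                        # precision matrix
  W <- -P / sqrt(outer(diag(P), diag(P)))   # partial correlations
  diag(W) <- 0
  W
}

# Strength centrality: sum of absolute edge weights per node
strength <- function(W) colSums(abs(W))

# Simulated data from a made-up 6-node network in which node 1 is a hub
p <- 6
K <- diag(p)                          # precision matrix of the true model
K[1, 2:p] <- K[2:p, 1] <- -0.3
K[2, 3] <- K[3, 2] <- -0.2
X <- matrix(rnorm(1000 * p), ncol = p) %*% chol(solve(K))

s_full <- strength(pcor_network(X))

# Case-dropping: drop 50% of the cases at random, re-estimate, and correlate
# the subset strength values with the full-sample strength values
drop_prop <- 0.5
cors <- replicate(100, {
  keep <- sample(nrow(X), size = round((1 - drop_prop) * nrow(X)))
  cor(s_full, strength(pcor_network(X[keep, ])))
})
summary(cors)
```

Repeating this for increasing values of `drop_prop` shows how quickly the centrality order breaks down, which is the question a stability analysis asks before any centrality index is interpreted.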
This brings us to the accuracy of statistical parameters, and I agree with FWMK that this is a crucial topic. When writing up one of my first network papers in 2015^{18}, I was very unhappy that I could only obtain a rank order of centrality values, without knowing whether node A was substantially (or significantly, if you want) more central than node B. So Sacha Epskamp and I sat down for months to find solutions to the problem, and we ended up developing what later became bootnet, a package for testing the accuracy of network parameters. Our tutorial paper on network accuracy^{19} was published a few months ago and has already gathered about 60 citations (I also wrote a brief blog post on the topic here). This means that many researchers adopted the package quickly because they are interested in the accuracy of network parameters to help them draw proper inferences. I think that's a great sign for the field^{20}.
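The basic idea of a nonparametric bootstrap for edge-weight accuracy can be sketched in a few lines of base R: resample cases with replacement, re-estimate the network, and take percentiles of the re-estimated edge weight as an interval. This is an illustration under simplifying assumptions (simulated data from a four-node chain, unregularized partial correlations, 1,000 resamples, the hypothetical helper `pcor_network`), not bootnet's implementation; in bootnet itself, this is handled by the `bootnet()` function.

```r
set.seed(2)

# Unregularized partial correlation network (hypothetical helper)
pcor_network <- function(X) {
  P <- solve(cor(X))                        # precision matrix
  W <- -P / sqrt(outer(diag(P), diag(P)))   # partial correlations
  diag(W) <- 0
  W
}

# Simulated data from a known 4-node chain 1 - 2 - 3 - 4 (true edge 1-2 = 0.4)
p <- 4
K <- diag(p)
for (i in 1:(p - 1)) K[i, i + 1] <- K[i + 1, i] <- -0.4
X <- matrix(rnorm(500 * p), ncol = p) %*% chol(solve(K))

# Nonparametric bootstrap: resample cases, re-estimate, keep edge 1-2
edge_12 <- pcor_network(X)[1, 2]
boot_12 <- replicate(1000, {
  rows <- sample(nrow(X), replace = TRUE)
  pcor_network(X[rows, ])[1, 2]
})

# 95% percentile interval around the estimated edge weight
ci_12 <- quantile(boot_12, c(0.025, 0.975))
round(c(estimate = edge_12, ci_12), 2)
```

The interval, rather than the point estimate alone, is what tells you whether one edge (or one node) is meaningfully stronger than another.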
I have always highlighted, e.g. in the stability sections of my network analysis workshops, that bootnet is definitely not the answer but a starting point, and that we need more methodologists to pick up critical work. So if FWMK suggest using split-half reliability for network studies instead of bootnet, I think that's an interesting complementary approach, and I would like to see some simulation studies to find out how this method performs when we know the true model.
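One possible reading of split-half reliability for networks, sketched in base R: split the sample randomly into two halves, estimate a network in each, and correlate the edge weights. The exact procedure FWMK propose may differ; the simulated six-node chain, the unregularized partial correlations, and the helper name `pcor_network` are assumptions for illustration. Because the data are simulated from a known network, this is also the kind of setup a simulation study of the method would start from.

```r
set.seed(3)

# Unregularized partial correlation network (hypothetical helper)
pcor_network <- function(X) {
  P <- solve(cor(X))                        # precision matrix
  W <- -P / sqrt(outer(diag(P), diag(P)))   # partial correlations
  diag(W) <- 0
  W
}

# Simulated data from a known 6-node chain network
p <- 6
K <- diag(p)
for (i in 1:(p - 1)) K[i, i + 1] <- K[i + 1, i] <- -0.3
X <- matrix(rnorm(2000 * p), ncol = p) %*% chol(solve(K))

# Random split into two halves of equal size
half <- sample(rep(1:2, length.out = nrow(X)))
W1 <- pcor_network(X[half == 1, ])
W2 <- pcor_network(X[half == 2, ])

# Split-half agreement: correlation of the edge weights across the two halves
upper <- upper.tri(W1)
split_half_r <- cor(W1[upper], W2[upper])
round(split_half_r, 2)
```

Varying the sample size, the number of nodes, and the true network in such a setup would show how the split-half index behaves when the true model is known.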
I also agree with FWMK that we need to be vocal about potential challenges and misinterpretations. But I think that many of us have a pretty good track record here. I do that at least once a month when I review network papers, have written several blog posts to safeguard against misinterpretation of network models and parameters (e.g. don't overinterpret networks visually; don't interpret coefficients without looking at the accuracy of these coefficients), and have written three papers on the topic. The first paper with Angelique Cramer discusses at length 5 challenges to network theory and methodology, and one of these is replicability^{21}. The second paper — with Sacha Epskamp and Denny Borsboom — is a tutorial on estimating the accuracy of network parameters, which contains a section on replicability. For instance, we state that "[t]he current replication crisis in psychology stresses the crucial importance of obtaining robust results, and we want the emerging field of psychopathological networks to start off on the right foot"^{22}. Third, because we have seen some people misinterpret regularization, Sacha Epskamp and I wrote a tutorial paper in which we explain regularization to applied researchers, and tackle some common misconceptions (e.g. conditioning on sum-scores) and problems^{23}. And I'm obviously not the only person who has been critical here. I urge you to read Sacha Epskamp's dissertation, which features several outstanding papers. You will find a thorough, careful discussion of network inference, and both the bootnet paper we wrote together, and the discussion of Sacha's dissertation, deal critically with centrality metrics. Or look at the great critical work by Kirsten Bulteel and colleagues^{24}, or Berend Terluin and colleagues^{25}, or Sinan Guloksuz and colleagues^{26}.
And this goes beyond papers: I have also given talks at numerous conferences urging applied researchers to be careful when interpreting networks, e.g. at APS 2016^{27}, and my workshops always include sections on challenges, limitations, and common misconceptions^{28}. I know this is a lot of "me, me, me", but I want to clarify that if the topic is interpretation and application, I really hope we can work on it together instead of against each other. I am very much interested in this, and have spent most of the last three years working on it. I'd like to be an ally, not an adversary.
FWMK also make an interesting point that I hadn't given much thought before: the overall interpretation of the results of factor models and network models rests on somewhat different parts of the parameter space. For factor models, it is often about the number of factors, and maybe how much variance they can explain in the data, while few researchers would write in the results section that "the factor loading of x1 is substantially larger than the factor loading of x2". We do, however, often see interpretations of the strongest edges and most central items in network analysis. This means that we can separate local features from global features of the parameter space. Global features are, for instance, the number of factors vs the number of communities, while local features are the specific factor loadings vs the specific centrality estimates. As I highlighted on Twitter a few days ago, global features will replicate similarly badly or well for factor and network models, and local features will also replicate similarly badly or well for both, because the models are mathematically equivalent. The only difference can be due to the precision of parameters, which is a function of the number of parameters we estimate; precision will be somewhat lower for networks, which is one reason network models often use regularization techniques. Apart from that, replicability will be the same for both types of models, and in our commentary we show that the replicability metric FWMK invented to vet local features of network models performs equally badly for local features of factor models. This means that any conclusion FWMK draw about network methodology must hold for factor methodology, and this is not an opinion, but based on mathematical proofs.
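The link between local features of the two model families can be illustrated numerically. A minimal base-R sketch, assuming a one-factor model with made-up standardized loadings: the covariance matrix implied by the factor model is inverted and rescaled into a partial correlation network. The result is fully connected, and its edge weights increase with the loadings of the two items involved, so the strongest edge connects the two items with the highest loadings. This is a single numerical case, not the general proof referred to above.

```r
# One-factor model with standardized items: Sigma = lambda lambda' + Psi
lambda <- c(0.8, 0.7, 0.6, 0.5, 0.4)            # made-up factor loadings
Sigma  <- tcrossprod(lambda) + diag(1 - lambda^2)

# The same covariance matrix expressed as a partial correlation network
P <- solve(Sigma)                               # precision matrix
W <- -P / sqrt(outer(diag(P), diag(P)))         # partial correlations
diag(W) <- 0
round(W, 2)
```

Any sampling error in the loadings therefore propagates into the edge weights and centrality estimates of the equivalent network, and vice versa.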
The rebuttal of FWMK also contains a table on PTSD network papers.
The table, while somewhat incomplete (you can find a full list of all PTSD network papers I know of here), clearly highlights the substantial heterogeneity in the PTSD network literature, which is the very reason that I started a large interdisciplinary multisite PTSD network replicability project at the end of 2015. In our paper, accepted in Clinical Psychological Science just this week (I have written up the main results in a blog post here), we estimate networks in four clinical datasets of patients receiving treatment for PTSD, which I find more informative than community data when it comes to network structures. The data also have no skip problems, and thus circumvent most of the problems inherent in the data of FWMK.
There are a ton of other challenges that lie ahead, many of which are not only statistical, but conceptual (again, the methodology really is just fine). If you are interested, the footnotes above link to numerous papers on these challenges. To tackle these issues, it is crucial that both applied and methodological researchers pick up network methodology, distinguish clearly between network theory and network methodology, advance network theory, vet network methodology, think clearly about interpretation, and voice their criticisms.
If possible, however, I would like to do that in cooperation with others, rather than as adversaries. Working together is a lot more fun than writing critical commentaries and blogs. This is consistent with Sacha Epskamp's personal take on the topic; he published a post-publication review of the target paper discussed here on PubPeer. Like me, Sacha concludes that there are numerous important issues that need to be explored further in the future and highlights critical work he and colleagues conducted, but he also points out that the conclusions by FWMK do not follow from the evidence they present.
The post Network models do not replicate … not. appeared first on Psych Networks.
]]>The post Network analysis summer school & workshop materials online appeared first on Psych Networks.
]]>The post The expanded network approach: moving beyond symptoms? appeared first on Psych Networks.
]]>If you are reading this, it is likely that you have heard of the network approach to mental disorders. Since the dawn of medicine, scientists have tried to isolate disorders which give rise to symptoms. Just a few years ago, the network approach to mental disorders flipped this view on its head by asserting that some problems may in fact represent interacting symptoms which give rise to disorders.
This view has been revolutionary, sparking the advent of important new research. However, after several years of research following the network view, the initial tenets may need to be expanded.
In a recent review of the network approach, Denny Borsboom elucidated four key principles of the initial network view. Principle 2, in particular, states that mental disorders are best represented by networks emerging from interactions among “nodes”, which correspond to symptoms from diagnostic manuals.
It seems evident to clinicians and researchers alike that symptoms do indeed interact with one another, and certainly play a role in the emergence of mental disorders; but can we really blame them for everything?
Part of the problem with using symptoms from diagnostic manuals comes from the imperfect history of psychopathology research. Symptoms from the DSM or other manuals are not God-given – they have been carefully chosen by past researchers, often using consensus as a final criterion. Unfortunately, they were not picked with the network perspective in mind. Instead:
Past researchers often gave precedence to “hallmark” symptoms that were unique to a specific problem, and easily differentiable from other problems (such as anhedonia in depression). This means that diagnostic manuals often ignore symptoms that are shared among many disorders, such as concentration problems and sad mood. Ironically, these “transdiagnostic” features seem to be some of the most important nodes in at least some recent network analyses.
Symptoms are generally only identified if a patient or their family identifies them as problematic. But are we so naïve as to think that only the most problematic facets of a phenomenon play a role in that phenomenon? Researchers have known for a long time that many causally important aspects of mental disorders are “ego-syntonic” (that is, they are not immediately distressing to the patient). For instance, a patient suffering from OCD is likely to report uncomfortable obsessions and compulsions as potential problems to fix, but may not realize that his or her cognitive style of intolerance of uncertainty is contributing to the problem. This focus on only the visibly distressing aspects of a disorder (similar to a medical check-up) seems to make intuitive sense for clients whose distress brought them into therapy in the first place, but it may not provide a complete picture of the true causal structure of the problem.
When imagining a causal system of a mental disorder, we can also hypothesize that there may be some protective factors involved. These potential “brakes” in the system will obviously be missed by symptom manuals, which focus only on the negatives.
Ironically, many symptom manuals were designed in direct opposition to the network view. That is, those designing the measures intentionally tried to pick symptoms which did not interact with one another, for psychometric reasons. For instance, clinicians are often trained in diagnostic interviews to attempt to separate “simple tiredness from sleep issues” from “a different, more profound tiredness associated with depression”, and only count the latter as a symptom. This is obviously problematic if we are attempting to put symptoms into a network, which focuses on the relatedness of symptoms to one another in the emergence of a disorder.
In addition, some manualized symptoms really do seem to be reflective of problems, rather than causally important. For instance, “weight loss” is often included as a symptom of depression. It is easy to see how weight loss could be reflective of other depression symptoms. However, it is somewhat difficult to imagine that weight loss is responsible for causing depression (at least in a majority of people). If networks are supposedly causal systems, it doesn’t make much sense to include this particular symptom.
Recently, our lab published a commentary on Borsboom’s review, proposing an expanded view which may help with some of these problems. Instead of relying wholly on symptoms from diagnostic manuals, we propose that networks should include variables that are plausible causal candidates in the etiology or maintenance of mental disorders. This still includes symptoms, but also includes other variables such as cognitive factors, biological factors, and protective factors.
We aren’t the only ones who have thought along these lines. In Eiko Fried and Angelique Cramer’s recent comprehensive summary of challenges to the network approach, they discuss problems with the term “symptom” and how focusing primarily on symptoms has led network researchers to ignore important elements of mental disorders. Relatedly, the paper includes a discussion of how more static components that play a role in mental disorders (e.g., an initial trauma that sparked PTSD, or personality traits) might be conceptualized from a dynamic systems point of view.
By focusing network analyses only on symptoms, what have we missed? Between the commentary and the review, here are some suggested possibilities: impairment of functioning, information processing biases, maladaptive (or adaptive) schemas, metacognitive beliefs, life events, self-esteem, social interactions, rejection events, physical activities, and substance abuse.
In the meantime, some researchers have already been putting the rubber to the road by including non-symptom nodes in network analyses. Here are some prime examples of nodes in recent empirical publications that follow the expanded network approach:
Are symptoms sufficient for network analyses going forward? Or are other causal candidates also important to include as nodes? The future is up to you!
Written by: Payton Jones (payton_jones@g.harvard.edu) and Haley Elliott (haleyelliott@college.harvard.edu)
]]>The post Talking to Laura Bringmann: ESM data, network models, & an exciting project appeared first on Psych Networks.
]]>Since Laura and colleagues (i.e. Casper Albers, Jojanneke Bastiaansen, and Yoram Kunkels) have started a fascinating project on addressing the very gap that our meeting was about, I decided to use the opportunity to pick Laura’s brain regarding her project. We recorded a 15-minute discussion in which we talk about ESM data, potential advantages of temporal over cross-sectional data, recent methodological advances in network modeling, time-varying network models, and how to choose the best model for the data from the large number of models that are available now.
Most importantly, Laura introduces the fantastic international and multidisciplinary research project she has been setting up with colleagues. For this project, different groups of researchers independently analyze the same dataset of one patient from the dataset shared by Aaron Fisher that he recently described here in more detail. The goal for each of the research groups is then to provide clinical recommendations for this particular patient based on their results. Laura’s main question is how similar these clinical recommendations are.
You can listen to the audio file here:
Laura and colleagues are currently analysing the results of the project we discuss above, and aim to disseminate them around spring 2018. We hope you enjoyed the conversation; feel free to leave feedback below. If there is sufficient interest, I might do more of these on network-related topics in the near future.
References
The papers Laura and I talk about are listed below.
People & labs
And here are the labs and people Laura mentioned: the Ilab, the TRANS-ID project, the Idiographic Dynamics Lab of Aaron Fisher, the Utrecht Dynamic Modeling Lab, Peter Molenaar, Jojanneke Bastiaansen, Yoram Kunkels, and Casper Albers, Noémi Schuurman, Denny Borsboom, and Francis Tuerlinckx.
Phew. We probably forgot someone, and apologize profusely already. Let us know!
Credits: the intro song is “Electric Lady Supastar” by Scomber, licensed under Creative Commons Attribution Noncommercial (3.0). The song was not changed or adapted.
]]>