When I see limitations in published network papers that I find sufficiently relevant to address, I discuss the papers here, and try to provide ways forward. For instance, Madhoo & Levine (2016) drew inferences from visually inspecting their networks, which inspired me to write a tutorial blog post on the topic. And when the paper by Afzali et al. (2016) could have benefitted from additional stability analyses, I wrote a tutorial blog post on network stability.
Today’s blog tackles the replicability of network models, and I will provide my personal take on the topic here. The blog does not reflect the view of my colleagues or co-authors [1], and is a very personal tale … full of woe and wonder.
The Journal of Abnormal Psychology just decided to go ahead with publishing a paper although there is at least one known serious error, and a number of major problems. I first read this paper, entitled “Evidence that psychopathology networks do not replicate”, about half a year ago (the title has changed since), when we were invited by Abnormal to write a commentary. In the paper, Forbes, Wright, Markon and Krueger, from here on FWMK, fit 4 network models to two large community datasets of 18 depression and anxiety symptoms, and investigated whether the network models replicate across the 2 datasets.
Their conclusions: “Popular network analysis methods produce unreliable results”, “Psychopathology networks have limited replicability”, “poor utility”, and, later, “current psychopathology network methodologies are plagued with substantial flaws”.
That is devastating, and is worrisome for anybody who has used network models before. So we (Denny Borsboom, Sacha Epskamp, Lourens Waldorp, Claudia van Borkulo, Han van der Maas, Angelique Cramer, and I) sat down and took a closer look at the paper, and found the following problems. And just to highlight that again, I’m describing the version of the paper we received after it was accepted for publication.
- For the Directed Acyclic Graphs (DAGs), the authors had used the correlation matrix instead of the raw data as input for the R-package bnlearn, leading to nonsensical results.
- For the relative importance networks, FWMK had deleted the strongest edges from the estimated networks without mentioning this in the paper itself. This procedure makes no sense, nobody has done that before, and it is akin to deleting the strongest regression coefficients from a regression, or the strongest factor loadings from a factor model, and not reporting that you do so in the paper.
- The authors fit these relative importance networks, which are based on linear regressions, to binary symptom data, which violates basic distributional assumptions; again, nobody in the literature has done that before.
- For the association networks (simple visualizations of the correlation matrices), they had based their analysis on non-positive definite correlation matrices without acknowledging it in the paper. The correlation matrix was non-positive definite because the authors imputed a ton of missing data based on skip questions with 0s, dramatically distorting the correlation matrix. The distortion is so severe that the average correlation among depression items was 0.33 before imputation (after listwise deletion), and 0.95 after imputation.
- Only one of the four models, the Ising Model, was estimated correctly.
This is a remarkable collection of issues for a paper that draws the strong conclusions about methodology I quoted above, and it took us a considerable amount of time to identify them all by digging through the code of FWMK. We informed the editor that the paper contains a number of serious flaws, and were surprised that Abnormal decided to go along with the publication of the paper. We followed the editor’s invitation to write a commentary, in which we re-analyzed the data and fixed the mistakes we identified; all code and results are available online.
A number of curious things happened next. First, because the editor gave us only a few weeks to reply, and because we wanted to re-analyze the data to make sense of the implausible network structures in the manuscript, we asked the editor to confirm in writing before we started working on the commentary that this is the final accepted version of the manuscript; the editor confirmed it is.
Second, we then found out that we had to apply for one of the two datasets, and pay for it, because the replicability paper by FWMK with very strong claims about a whole family of psychometric models was actually not reproducible [2].
Third, while we were working on our re-analysis, FWMK changed the final accepted version of the manuscript (that we were guaranteed by the editor to be final) not once, but twice [3]. In total, FWMK fixed the DAG errors, rewrote parts of the paper, changed the title (“Evidence that psychopathology networks have limited replicability”) and the results, but left the discussion and conclusions untouched. The paper was not peer-reviewed after the changes [4], and the incorrect estimation of the relative importance networks persisted [5], as did association networks based on non-positive definite correlation matrices, implausible correlations among depression symptoms of 0.95 due to zero-imputation, and the application of linear regression to binary data. And, of course, the devastating conclusions about network methodology in general.
Let’s ignore the fact that the editor refused to give us even one more week to write the commentary, despite the authors changing the paper twice while we were writing up the commentary and re-analyzing all data. And let’s ignore the fact that the editor and the reviewers insisted we cannot call any aspect of the manuscript an “error” or “wrong”, but needed to use words such as “statistical inaccuracy”, while they were happy to have FWMK draw extremely strong conclusions about methodology based on (mis)applying models to two datasets. And let’s also ignore for now that the editor specifically asked us to declare conflicts of interest because we “teach network models”, despite the fact that FWMK also teach (e.g. factor models), despite the fact that Steinley et al. (also invited to comment on FWMK’s paper) also teach network analysis, and despite the fact that it is fairly uncommon to read: “We report severe conflicts of interest regarding the t-tests used in this manuscript … because we teach them to students.” [6]
In any case, ignoring the very weird review and publication process, the main point here is that the Journal of Abnormal Psychology and FWMK decided to go ahead with publishing a paper that contained significant errors that the editor and the authors were aware of.
In the final paper that was published yesterday …
- … 25% of the models (relative importance networks) are wrong
- … 25% of the models (the association networks) are based on non-positive definite correlation matrices with highly implausible values
- … 25% of the models (DAGs) were re-estimated after final acceptance of the paper, and the respective sections rewritten, without peer-review
- … and 25% of the models (Ising Models) do not contain the only validated statistical test for assessing replicability (the Network Comparison Test); using this test in their data leads to the opposite of the conclusion FWMK draw.
These facts are not available to readers of the paper.
I haven’t addressed the biggest problem yet. Assume we don’t understand regressions very well: it is a new methodology. You fit a regression of mortality on smoking to a large community dataset 1, and then you fit the same regression to a large community dataset 2. The coefficients are very similar. You write this up and call your paper “Regression methodology replicates well”.
Now imagine the opposite: the coefficients are very different across the two datasets, and you call your paper “Regression methodology does not replicate”.
Both conclusions are equally absurd. Why? Because you cannot vet the methodology of regression analysis by applying it to two different datasets. After all, the results could be different because the datasets differ, and you might find different results if you look into different data. You need simulation studies for that [7].
FWMK published pretty much that paper, except that they fit network models, and not regressions, to two datasets, and then draw conclusions about network methodology. Actually, they kind of did publish that paper, because network models are a bunch of regressions, with a bit of regularization on top.
Irrespective of the results FWMK obtained, conclusions about methodology do not follow from fitting models to two datasets — because you do not know the true model in these datasets, which could differ. To vet methodology, you want to simulate data from a given true model and see if you can estimate it back reliably with your statistical procedure.
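To make this concrete, here is a minimal sketch of what such a simulation study looks like, in base R. Everything in it is an assumption for illustration: the 6-node chain network, the edge weight of 0.3, and the sample sizes are made up and have nothing to do with the data of FWMK, and I use plain unregularized partial correlations rather than any of the estimators discussed in this post.

```r
set.seed(1)

k <- 6  # number of nodes (hypothetical)

# Hypothetical TRUE model: a chain network, encoded as a precision matrix
K <- diag(k)
for (i in 1:(k - 1)) {
  K[i, i + 1] <- K[i + 1, i] <- -0.3
}

# Convert a precision matrix into a matrix of partial correlations
pcor_from_precision <- function(P) {
  pc <- -cov2cor(P)
  diag(pc) <- 0
  pc
}

true_pcor <- pcor_from_precision(K)
Sigma <- solve(K)  # covariance matrix implied by the true model

# Simulate n observations from the true model (multivariate normal)
simulate_data <- function(n) {
  Z <- matrix(rnorm(n * k), n, k)
  Z %*% chol(Sigma)
}

# For several sample sizes: estimate the network from simulated data and
# correlate the estimated edge weights with the true edge weights
sample_sizes <- c(100, 1000, 10000)
recovery <- sapply(sample_sizes, function(n) {
  X <- simulate_data(n)
  est_pcor <- pcor_from_precision(solve(cov(X)))
  cor(true_pcor[upper.tri(true_pcor)], est_pcor[upper.tri(est_pcor)])
})
names(recovery) <- sample_sizes
print(round(recovery, 3))
```

Because the true network is known here, `recovery` tells us how well the procedure estimates it back at each sample size. That is exactly the information you cannot get from fitting a model to two empirical datasets.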
This point — vetting methodology requires more than fitting a method to two datasets — is one of the main points we made in our invited rejoinder. Unfortunately, the authors did not address this point in their rebuttal. To see how weird the claim is that network methodology is flawed because it does not replicate across 2 datasets, let’s exchange “network model” with “factor model”, and fit 12 different factor models for the MADRS depression scale established in the prior literature to a new dataset (Quilty et al., 2013). We find that only 1 of these 12 models provides acceptable fit. If FWMK read that paper, would they conclude that “factor models do not replicate” and are “plagued by substantial flaws”? Of course not. They would, like me, conclude that these datasets really seem to differ in the correlation matrices and factor structures. This is not a shortcoming of factor models, but due to differences in data.
Do networks replicate or not?
The conclusions about network models in general the authors draw from fitting network models to two datasets do not follow. But we can ask the question instead: how well do network models replicate across these two specific datasets FWMK used, similar to asking: how well do factor models replicate across these two specific datasets? This is an interesting empirical question, which fits the outlet of the paper, an applied clinical journal.
Network models, just like factor models, produce parameters, and the question of replicability is how similar these parameters are across the models fit to two datasets. An Ising Model, for instance, has (k * (k-1)) / 2 edge parameters (where k is the number of items) [8], so in the case of the FWMK data with 18 items, 153 parameters.
There are many ways to compare these 153 parameters across two models. The quickest (and dirtiest) way is to correlate these parameters. FWMK fit three network models (the Association Network is simply a visualization of the correlation matrix, which we won’t count as a ‘model’ here): Ising Models, Relative Importance Networks, and DAGs. The correlations of parameters are 0.95 for the Ising Model, and 0.98 for the relative importance network [9]. These correlations are not provided in the original paper; the authors instead report the % of edges of network models in dataset 1 that are also identified in the models fitted to dataset 2: 86.3% for the Ising Model, 98.3% for the relative importance network [10], and 79.4% for the DAGs.
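For readers who want to see what these two quick metrics look like in practice, here is a small base R sketch. The edge weights are randomly generated placeholders, not the estimates from FWMK or from our commentary, and the thresholding step is only a crude stand-in for the sparsity that regularization produces.

```r
set.seed(2)

k <- 18                      # number of items, as in the FWMK data
n_edges <- k * (k - 1) / 2   # number of unique edges: 153

# Hypothetical edge weights for the network in dataset 1:
# roughly 40% of edges are present, the rest are exactly zero
w1 <- rbinom(n_edges, 1, 0.4) * runif(n_edges, 0.05, 0.5)

# Hypothetical edge weights for dataset 2: the same structure plus noise,
# with very small values set to zero (crude stand-in for regularization)
w2 <- w1 + rnorm(n_edges, sd = 0.03)
w2[abs(w2) < 0.04] <- 0

# Metric 1: correlation of edge weights across the two networks
edge_weight_cor <- cor(w1, w2)

# Metric 2: % of edges present in network 1 that are also present in network 2
pct_replicated <- 100 * mean(w2[w1 != 0] != 0)

print(c(n_edges = n_edges,
        edge_weight_cor = edge_weight_cor,
        pct_replicated = pct_replicated))
```

Note that the two metrics answer different questions: the first uses the size of every edge weight, the second only whether an edge is present or absent.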
Even if the authors hadn’t made mistakes, and even if we take their results at face value — I fail to understand how FWMK went from their own results to calling their paper “Evidence that psychopathology networks have limited replicability”. In our re-analysis of the data, we used some additional metrics to assess replicability in addition to correlations of parameters and % of edges that replicate, all of which are reported in the main table of our rejoinder.
Of note, we also used the Network Comparison Test, a validated statistical test to compare Ising Models across datasets; you can think about this as being similar to measurement invariance tests in the factor modeling literature [11]. The result of the test was that no significant difference between the two Ising Models could be detected, which was expected given the very high correlation among parameters, and the very high replicability of individual edges.
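To give a flavor of how a permutation-based comparison of two networks works, here is a generic base R sketch. I want to be clear that this is not the Network Comparison Test and not the code we used: it compares unregularized partial correlation networks on simulated continuous data, and the test statistic (the largest absolute edge difference), the sample sizes, and the number of permutations are all assumptions chosen for illustration.

```r
set.seed(6)

k <- 5; n1 <- 300; n2 <- 300

# Simulated data: both groups come from the SAME model, so there is
# no true difference between the two networks
A <- matrix(rnorm((n1 + n2) * k), n1 + n2, k)
A[, 2] <- A[, 2] + 0.5 * A[, 1]
group <- rep(1:2, c(n1, n2))

# Partial correlation network from raw data
pcor <- function(X) {
  P <- solve(cov(X))
  pc <- -cov2cor(P)
  diag(pc) <- 0
  pc
}

# Test statistic: largest absolute difference in any edge weight
max_diff <- function(X, g) {
  max(abs(pcor(X[g == 1, ]) - pcor(X[g == 2, ])))
}

observed <- max_diff(A, group)

# Reference distribution: shuffle group membership and recompute
perm <- replicate(500, max_diff(A, sample(group)))

# Permutation p-value (with the usual +1 correction)
p_value <- (1 + sum(perm >= observed)) / (1 + length(perm))
print(p_value)
```

The logic is the same as in any permutation test: if group membership does not matter, shuffling it should produce differences as large as the observed one fairly often.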
Now, I couldn’t stress enough — and we do so in the commentary as well — that the high replicability of network models in these two datasets does not make network models great, or replicable. In fact, it says very little about network models in general — that’s what simulation studies are for — and we conclude in our commentary that the stunning similarity of the network models comes from the skip imputation the authors performed. In my own work on network replicability (4 clinical datasets of patients receiving treatment for PTSD, no skip questions), the similarity of network structures is somewhat lower than in FWMK … but more of that later.
The authors used two datasets that had skip questions for anxiety and depression symptoms. For instance, symptoms 3 to 9 for Major Depression were not coded if people did not have at least symptoms 1 or 2. These missing values are commonly replaced with 0s, which is what FWMK also did. In both datasets. You see where this is going: you induce the same spurious correlations in both datasets and then assess the replicability of statistical models that rely on the (partial) correlations among items. This makes investigating replicability very difficult, because you cannot distinguish the signal in the data from the spurious associations induced by replacing skip-out items by zeros. In the data of FWMK, listwise deletion leads to correlation coefficients of 0.33 in the data [12], while zero-imputation leads to a non-positive definite correlation matrix, the nearest positive definite matrix of which features average correlations of 0.95. This is not a plausible correlation matrix.
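The mechanism is easy to demonstrate with simulated data. The base R sketch below is a toy example under assumptions I made up for illustration (9 binary symptoms driven by one latent severity variable, a skip rule on symptoms 1 and 2, 5000 people); it does not use or reproduce the data of FWMK.

```r
set.seed(3)

n <- 5000; k <- 9

# Hypothetical data: 9 binary symptoms that all depend on a latent severity
theta <- rnorm(n)
X <- sapply(1:k, function(j) as.integer(theta + rnorm(n, sd = 1.5) > 1))

# Skip rule: symptoms 3 to 9 are not assessed if symptoms 1 and 2 are both absent
skip <- X[, 1] == 0 & X[, 2] == 0
X_obs <- X
X_obs[skip, 3:k] <- NA

mean_offdiag <- function(R) mean(R[upper.tri(R)])

# Option 1: listwise deletion (only people who passed the screening items)
r_listwise <- mean_offdiag(cor(X_obs[!skip, 3:k]))

# Option 2: zero-imputation of the skipped items
X_zero <- X_obs
X_zero[is.na(X_zero)] <- 0
r_zero <- mean_offdiag(cor(X_zero[, 3:k]))

print(round(c(listwise = r_listwise, zero_imputed = r_zero), 2))
```

In this toy example the average correlation among symptoms 3 to 9 is higher after zero-imputation than after listwise deletion, because everybody who is skipped out receives the same block of zeros. If you do this in two datasets, you put the same artificial structure into both.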
Now, is it ok to impute skip missing data with 0s? I cannot answer that question here in general. Is it commonplace? Absolutely. Does it make sense to induce spurious correlations in two datasets at the same time when you want to compare how well a statistical model based on item covariances generalizes from one dataset to the other? It is a very big problem. FWMK not only ignored the topic completely in their original paper, but also conclude in their rebuttal that “zero-imputation is thus a potential limitation of extant network approaches”. But obviously, factor or IRT models, and even regressions, would have exactly the same issues: if you replace missings on items 3 to 9 by 0s in case people do not have item 1 or 2, you will create spurious dependencies among items 3 to 9 (because they often get 0s together), and you will also create spurious dependencies between items 3 to 9 and items 1 and 2 (because 3 to 9 depend on the presence of 1 and 2). Concluding that “zero-imputation is thus a potential limitation of regression analysis” would be just as silly as the conclusion FWMK draw. The authors’ rebuttal that other network papers in the past have based the estimation of network models on data after zero imputation does not change the fact that they ignored an issue that was discussed already in the very first empirical network paper by the Amsterdam Psychosystems group [13] and several other papers [14], that the strategy altered the correlation among items in their datasets dramatically, and that the strategy introduced the same spurious relationships in both datasets they wanted to compare.
In addition, other researchers who used the approach, like Borsboom & Cramer [15], clearly stated: “The emphasis on free availability of data and replicability of the reported analyses occasionally means that the analyses may not be fully appropriate for the data (e.g., when computing partial correlations on dichotomous variables); in these cases, which will be indicated to the reader, the empirical results have the main purpose of illustration rather than interpretation in meaningful substantive terms.” This differs from the devastating conclusions FWMK draw about a whole family of statistical models.
Another point I find important to highlight is that FWMK use a different layout for the networks to show how different they are. To show you why this is a problem, let me give you an example: are the two network models below — that I just made up, they don’t have anything to do with the results of FWMK — the same or not?
They are exactly identical, and here is the code (download the Rdata file here).
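    library("qgraph")
    library("bootnet")

    load("data.Rdata")

    pdf("blog1.pdf", width = 9.5, height = 5.5)
    layout(t(1:2))  # two panels side by side

    # Estimate the same network twice from the same data
    n1 <- estimateNetwork(data, default = "EBICglasso")
    n2 <- estimateNetwork(data, default = "EBICglasso")

    # Left: default spring layout; right: identical network, different repulsion
    g1 <- plot(n1, layout = "spring", cut = 0)
    g2 <- plot(n2, layout = "spring", cut = 0, repulsion = 0.00000001)

    dev.off()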
On the right side, I simply changed the repulsion argument so that nodes would be very far apart: all edges are literally the same weight, both in the model result and graphically. This visualization is uninformative, and it is very similar to giving people two correlation matrices where you change columns and rows and then ask them how similar the matrices are. To enable the comparison of rows and columns of two matrices, nodes need to be in the same place in two networks.
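The correlation matrix analogy can be checked in a few lines of base R. The data below are simulated and the permutation of variables is arbitrary; both are assumptions for illustration only.

```r
set.seed(4)

# Simulated data with 5 variables; V1 and V2 are strongly related
X <- matrix(rnorm(200 * 5), 200, 5)
X[, 2] <- X[, 1] + rnorm(200, sd = 0.5)
colnames(X) <- paste0("V", 1:5)

R <- cor(X)

# The same matrix with rows and columns in a different order
shuffled <- c(3, 5, 1, 4, 2)
R_shuffled <- R[shuffled, shuffled]

# Compared cell by cell (ignoring labels), the two matrices look different ...
looks_same_cellwise <- isTRUE(all.equal(unname(R), unname(R_shuffled)))

# ... but once rows and columns are put in the same order, they are identical
R_aligned <- R_shuffled[colnames(R), colnames(R)]
same_after_alignment <- isTRUE(all.equal(R, R_aligned))

print(c(looks_same_cellwise = looks_same_cellwise,
        same_after_alignment = same_after_alignment))
```

For network plots, the equivalent of aligning rows and columns is to give both graphs the same node positions. As far as I know, in qgraph this can be done by passing the layout of the first plot to the second (e.g. `layout = g1$layout`) or by using `averageLayout()`; please check the package documentation before relying on this.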
The same holds for network structures. Note that the visual comparison is not very important anyway — we should compare models statistically, not based on visualization, as I’ve highlighted in a previous blog post and in many recent reviews. But FWMK decided to provide graphs in their paper in a way that is uninformative, so it makes sense to post the updated graphs from our commentary here (click the thumbnail for a larger and more legible version of the networks).
Update November 12: Dr Forbes commented below, strongly pushing back against the argument that layouts should be constrained (“ridiculous”, “outrageous”). I am honestly surprised, and had anticipated that we could agree on this point after the explanations above. So I will give this yet another try: below is a visualization (provided by Sacha Epskamp) of the adjacency matrices (in the form of heatmaps) that are used as input for the network graphs, across the two datasets. This is just another way to visualize the edge weights of the networks. Not only does it make clear that the conclusion of FWMK that networks do not replicate is not warranted, it also shows why a constrained layout is important, and I could honestly not see anybody argue that we should not constrain the layout of these heat maps to ensure the same edges are in the same rows and columns across 2 datasets. Constrained layouts enable comparison rather than “obscuring the differences”; constraining the layouts of the network graphs is exactly the same point. Click for full size PDF; reproducible code and data here (thanks Sacha).
Network estimation versus network inference
The Ising Model was developed in the 1920s, is very well understood, and has been used in physics, machine learning, artificial intelligence, biometrics, economics, image processing, neural networks, and many other disciplines. Although its application in psychology only happened fairly recently [16], I find it remarkable that FWMK, after having never worked with the model before, and after fitting it to two datasets, feel confident to conclude that it is “plagued with substantial flaws”. I would not have the confidence to use, for instance, machine learning methodology for the first time, and then write a paper confidently attacking a class of well-established statistical models.
If you read the paper more closely, much of their argument surrounds centrality estimates, and so it is worth mentioning that network estimation and network inference (the interpretation of network topology after you estimated it) are different analytic steps and should not be confused [17]. And while network methodology is worked out fairly well, the inference is indeed a lot more difficult, and I will return to it at the end of the blog where I hope to find common ground with FWMK. There are definitely problems with network inference, and we need a ton of thorough investigations to work these out.
In FWMK’s paper and rebuttal, there are many instances where the authors confuse methodology with the interpretation of methodology, i.e. inference. If you want to criticize the methodology of regression analysis, go ahead and perform a simulation study. If you want to criticize how people interpret regression coefficients, because you do not think these interpretations follow from the regression model, then models are not the problem, but inference. But these two points are very different things, and while the paper by FWMK is clearly focused on the first (see title, abstract, or general scientific summary that is about methodology, not interpretation), the rebuttal pretends it was about interpretation all along, in several sections, but then, again, concludes that “current psychopathology network methodologies are plagued with substantial flaws”.
It does not follow.
Borsboom et al. — our group — published a brief rejoinder to the rebuttal here.
Mistakes are normal and can happen to everybody
I’ve made mistakes, and I think over the course of a scientific career, everybody will. And it is really important to highlight that this is not the issue here. The issue is that a team of authors for the first time used a specific class of psychometric models, made major mistakes in the implementation of these models, drew inferences that do not follow from the results, and then, in their rebuttal, instead of clearly identifying and correcting these mistakes, one-upped their original conclusions with even harsher ones.
Interestingly, when you read their rebuttal, you will notice that FWMK don’t refute any of the points of our rejoinder. Instead, they develop two new arguments: they cite the second commentary that was written on their paper by Steinley et al. numerous times to support their argument, and bring a new argument to the table: PTSD network replicability.
The commentary of Steinley et al. was not actually a commentary on the paper of FWMK, but a critique of the stability of network models, in which the authors propose a new methodology to vet network models. Sacha Epskamp looked into these models and provided a thorough and reproducible refutation of the methodology here [18]. To summarize, Steinley et al. simulate data from what they suggest to be a proper null model, a random model, but they actually simulate data from a Rasch Model, which is a fully connected network model, not an empty one. So instead of a flat eigenvalue curve where you have no structure in the correlation matrix, they simulated from a model that leads to one very strong first eigenvalue. Their conclusions, therefore, do not follow, because deviations from their null model are not, as they interpret, “indistinguishable from what would be expected by chance”; chance does not lead to a fully connected network or a Rasch model.
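The difference between the two kinds of "null" data is easy to see in a toy simulation. The base R sketch below is my own illustration under made-up settings (10 items, 2000 people, item difficulties between -1 and 1); it is neither the simulation of Steinley et al. nor the code from Sacha's refutation.

```r
set.seed(5)

n <- 2000; k <- 10

# Data without any structure: independent binary items
X_indep <- matrix(rbinom(n * k, 1, 0.5), n, k)

# Data from a Rasch model: P(X = 1) = logistic(theta - b)
theta <- rnorm(n, sd = 1.5)            # person parameters (hypothetical)
b <- seq(-1, 1, length.out = k)        # item difficulties (hypothetical)
P <- plogis(outer(theta, b, "-"))      # n x k matrix of response probabilities
X_rasch <- matrix(rbinom(n * k, 1, P), n, k)

# Eigenvalues of the two correlation matrices
ev_indep <- eigen(cor(X_indep))$values
ev_rasch <- eigen(cor(X_rasch))$values

print(round(rbind(independent = ev_indep, rasch = ev_rasch), 2))
```

With independent items, all eigenvalues hover around 1 (a flat curve). With Rasch data, the first eigenvalue clearly stands out, because every item is related to every other item through the common person parameter.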
I will reply to the second point in the rebuttal of FWMK, PTSD, later.
It’s time to move forward, and make the best of this awkward situation. And I’d argue that this shouldn’t be too hard, actually, because FWMK highlight several points in their paper and rebuttal I agree with.
Vetting statistical methodology, and adequate interpretation of statistical parameters, is crucial before drawing substantive (e.g. clinical) inference. FWMK highlight centrality as a problematic parameter that has been thoroughly over-interpreted by some researchers, and I couldn’t agree more. All workshops and lectures on centrality I gave in 2017 contain at least one slide on being careful with interpreting centrality, and here is part of a review I wrote 6 weeks ago:
“My main concern is the terminology and conclusions surrounding centrality. Many previous papers weren’t entirely clear about the potential relevance of central symptoms — maybe the authors could invest a bit more work in this. Playing devil’s advocate here, the most central symptom is likely the most difficult to treat, because after turning it “off” (thinking in terms of the binary Ising Model) it would likely be turned on again due to all the connections. This means that many of the authors’ conclusions regarding treatment do not necessarily follow, and I would be much more careful with clinical implications.”
This is from one of my papers accepted a few days ago:
“It is important to highlight that centrality does not automatically translate to clinical relevance and that highly central symptoms are not automatically viable intervention targets. Suppose a symptom is central because it is the causal endpoint for many pathways in the data: Intervening on such a product of causality would not lead to any changes in the system. Another possibility is that undirected edges imply feedback loops (i.e. A—B comes from A↔B), in which case a highly central symptom such as insomnia would feature many of these loops. This would make it an intervention target that would have a strong effect on the network if it succeeded—but an intervention with a low success probability, because feedback loops that lead back into insomnia would turn the symptom ‘on’ again after we switch it ‘off’ in therapy. A third example is that a symptom with the lowest centrality, unconnected to most other symptoms, might still be one of the most important clinical features. No clinician would disregard suicidal ideation or paranoid delusions as unimportant just because they have low centrality values in a network. Another possibility is that a symptom is indeed highly central and causally impacts on many other nodes in the network, but might be very difficult to target in interventions. As discussed in Robinaugh et al. (Robinaugh et al., 2016), “nodes may vary in the extent to which they are amenable to change” (p. 755). In cognitive behavioral therapy, for example, clinicians usually try to reduce negative emotions indirectly by intervening on cognitions and behavior (Barlow, 2007). Finally, a point we discuss in more detail in the limitations, centrality can be biased in case the shared variance between two nodes does not derive from an interaction, but from measuring the same latent variable.
In sum, centrality is a metric that needs to be interpreted with great care, and in the context of what we know about the sample, the network characteristics, and its elements. If we had to put our money on selecting a clinical feature as an intervention target in the absence of all other clinical information, however, choosing the most central node might be a viable heuristic.”
And it is true that analyses of the accuracy of centrality estimates show that they are often not estimated very reliably, and I have largely stopped analyzing betweenness and closeness centrality in recent months because they often fail to meet minimal criteria for parameter stability.
This brings us to the accuracy of statistical parameters, and I agree with FWMK that this is a crucial topic. When writing up one of my first network papers in 2015 [19], I was very unhappy that I could only obtain an order of centrality values, without knowing whether node A was substantially (or significantly, if you want) more central than node B. So Sacha Epskamp and I sat down for months and tried to find solutions to the problem, and we ended up developing what later became bootnet, a package for testing the accuracy of network parameters. Our tutorial paper on network accuracy [20] was published a few months ago and has already gathered about 60 citations (I wrote a brief blog on the topic here), which means that many researchers adopted the package quickly because they are interested in the accuracy of network parameters to help them draw proper inference. I think that’s a great sign for a field [21].
And I have always highlighted that bootnet is definitely not the answer™, but a starting point, e.g. in my network analysis workshops in the sections on stability, and that we need more methodologists to pick up critical work. So if FWMK suggest using split-half reliability for network studies instead of bootnet, I think that’s an interesting complementary approach, and I would like to see some simulation studies to find out how this method performs when we know the true model.
I also agree with FWMK that we need to be vocal about potential challenges and misinterpretations. But I think that many of us have a pretty good track record here. I do that at least once a month when I review network papers, have written several blog posts to safeguard against misinterpretation of network models and parameters (e.g. don’t overinterpret networks visually; don’t interpret coefficients without looking at the accuracy of these coefficients), and have written three papers on the topic. The first paper with Angelique Cramer discusses at length 5 challenges to network theory and methodology, and one of these is replicability [22]. The second paper, with Sacha Epskamp and Denny Borsboom, is a tutorial on estimating the accuracy of network parameters, which contains a section on replicability. For instance, we state that “[t]he current replication crisis in psychology stresses the crucial importance of obtaining robust results, and we want the emerging field of psychopathological networks to start off on the right foot” [23]. Third, because we have seen some people misinterpret regularization, Sacha Epskamp and I wrote a tutorial paper in which we explain regularization to applied researchers, and tackle some common misconceptions (e.g. conditioning on sum-scores) and problems [24]. And I’m obviously not the only person who has been critical here. I urge you to read Sacha Epskamp’s dissertation, which features several outstanding papers. You will find a thorough, careful discussion of network inference, and both the bootnet paper we wrote together, and the discussion of Sacha’s dissertation, deal critically with centrality metrics. Or look at the great critical work by Kirsten Bulteel and colleagues [25], or Berend Terluin and colleagues [26], or Sinan Guloksuz and colleagues [27].
And this goes beyond papers: I also gave talks at numerous conferences urging applied researchers to be careful about interpreting networks, e.g. at APS 2016 [28]; and my workshops always include sections on challenges, limitations, and common misconceptions [29]. I know this is a lot of me, me, me, but I want to clarify that if the topic is interpretation and application, I really hope we can work together on this instead of against each other. I am very much interested in this, and have spent most of the last 3 years working on it. I’d like to be an ally, not an adversary.
FWMK also make an interesting point that I hadn’t given much thought before: the overall interpretation of the results of factor models and network models rests on somewhat different parts of the parameter space. For factor models, it is often about the number of factors, and maybe how much variance they can explain in the data, while few researchers would write up in the results section that “factor loading for x1 is substantially larger than factor loading of x2”. We do, however, often see interpretations about strongest edges and most central items in network analysis. This means that we can separate local features from global features of the parameter space. Global features are, for instance, the number of factors vs the number of communities, while local features are the specific factor loadings vs the specific centrality estimates. As I highlighted on Twitter a few days back, global features will replicate similarly badly or well for factor and network models, and local features will also replicate similarly badly or well for both, because the models are mathematically equivalent. The only difference here can be due to the precision of parameters, which is a function of number of parameters we estimate; precision will be somewhat lower for networks, which is one reason network models often use regularization techniques. Apart from that, replicability for both types of models will be the same, and we show that the replicability metric FWMK invented to vet local features of network models performs equally badly for local features of factor models in our commentary. This means that any conclusion FWMK draw about network methodology must hold for factor methodology, and this is not an opinion, but based on mathematical proofs.
The rebuttal of FWMK also contains a table on PTSD network papers.
The table, while somewhat incomplete (you can find a full list of all PTSD network papers I know of here), clearly highlights the substantial heterogeneity in the PTSD network literature, which is the very reason that I started a large interdisciplinary multisite PTSD network replicability project at the end of 2015. In our paper that was accepted in Clinical Psychological Science just this week (I have written up the main results in a blog post here), we estimate networks in 4 clinical datasets of patients receiving treatment for PTSD, which I find to be more informative than community data when it comes to network structures. The data also have no skip problems, and thus circumvent the majority of the problems inherent in the data of FWMK.
There are a ton of other challenges that lie ahead, many of which are not only statistical, but conceptual (again, the methodology really is just fine). If you are interested, I mention numerous papers above that I link to in footnotes. To tackle these issues, it is crucial that both applied and methodological researchers pick up network methodology, distinguish clearly between network theory and network methodology, advance network theory, vet network methodology, think clearly about interpretation, and voice their criticisms.
If possible, however, I would like to do that in cooperation with others, rather than as adversaries. Working together is a lot more fun than writing critical commentaries and blogs. This is consistent with the personal take of Sacha Epskamp on the topic, who published a post-publication review about the target paper discussed here on PubPeer. Like me, Sacha concludes that there are numerous important issues that need to be explored further in the future, highlights critical work he and colleagues conducted — but also points out that the conclusions by FWMK do not follow from the evidence they present.
Update December 04 2017
Sacha Epskamp posted his personal comments on the paper as a post-publication review on pubpeer.com.
- Or maybe it does, in any case I wrote it and I’m responsible for the content. Sacha Epskamp, whom I sent the draft prior to publishing, said he’d want to sign it, so I mention his name here specifically.
- The authors did share their code, and were very responsive to our emails, which is what enabled us to identify the problems.
- March 4th 2019: I had originally uploaded the 3 different final versions here, but just found out that Google Scholar indexed them, which leads to confusion about what the final version of the paper is. I have therefore removed the links to prior versions, which are from now on available upon request.
- FWMK brought to my attention that 2 editors looked over the paper after these changes, however.
- In their rebuttal, the authors state that readers should consider the relative importance networks from our reanalysis instead of those provided in their original paper.
- We obviously also teach factor models, and regressions, and Bayesian statistics, and t-tests … as you can see, we have extremely severe conflicts here.
- I mention simulations several times in this blog; mathematical derivations are preferred, of course, but since I can hardly spell the word ‘math’, I prefer simulation studies because I understand what they can and cannot do.
- There are also k threshold parameters, but let’s ignore them here to keep things simple.
- Correlations don’t work for DAGs, see our commentary.
- FWMK concluded 74.2% originally, but due to an error in their implementation of the methods, they concede in their rebuttal that our estimate is the correct one.
- This test has been developed by Claudia van Borkulo and colleagues for the Ising Model, the network most commonly used so far in the literature. This test is very conservative and tests whether all edges across two networks are exactly identical; van Borkulo, C. D., Boschloo, L., Kossakowski, J. J., Tio, P., Schoevers, R. A., Borsboom, D., … Boschloo, L. (2017). Comparing network structures on three aspects. http://doi.org/10.13140/RG.2.2.29455.38569
- Note that I wouldn’t argue listwise deletion is the way to go here, but the differences between the two methods of dealing with missing data are astonishing.
- See footnotes 6-10 in Cramer, A. O. J., Waldorp, L. J., van der Maas, H. L. J., & Borsboom, D. (2010). Comorbidity: a network perspective. The Behavioral and Brain Sciences, 33(2–3), 137–50. http://doi.org/10.1017/S0140525X09991567
- e.g. Boschloo, L., van Borkulo, C. D., Rhemtulla, M., Keyes, K. M., Borsboom, D., & Schoevers, R. A. (2015). The Network Structure of Symptoms of the Diagnostic and Statistical Manual of Mental Disorders. Plos One, 10(9), e0137621. http://doi.org/10.1371/journal.pone.0137621; the paper contains a section entitled “Sensitivity analyses (dealing with skip-related missingness)”, in which Boschloo et al. state: “As implied by the skip logic, the skip-related missing values on the non-screening questions were imputed with zeros, indicating absence. This imputation strategy may have artificially induced strong connections within diagnoses and weak or absent connections between diagnoses”.
- Borsboom, D., & Cramer, A. O. J. (2013). Network analysis: an integrative approach to the structure of psychopathology. Annual Review of Clinical Psychology, 9, 91–121. http://doi.org/10.1146/annurev-clinpsy-050212-185608
- van Borkulo, C. D., Borsboom, D., Epskamp, S., Blanken, T. F., Boschloo, L., Schoevers, R. A., & Waldorp, L. J. (2014). A new method for constructing networks from binary data. Scientific Reports, 4(5918), 1–10. http://doi.org/10.1038/srep05918
- See e.g. this introductory tutorial: Epskamp, S., Borsboom, D., & Fried, E. I. (2017). Estimating Psychological Networks and their Accuracy: A Tutorial Paper. Behavior Research Methods, 1–34. http://doi.org/10.3758/s13428-017-0862-1
- Preprint updated March 9th 2018
- Fried, E. I., Epskamp, S., Nesse, R. M., Tuerlinckx, F., & Borsboom, D. (2016). What are “good” depression symptoms? Comparing the centrality of DSM and non-DSM symptoms of depression in a network analysis. Journal of Affective Disorders, 189, 314–320. http://doi.org/10.1016/j.jad.2015.09.005.
- Epskamp, S., Borsboom, D., & Fried, E. I. (2017). Estimating Psychological Networks and their Accuracy: A Tutorial Paper. Behavior Research Methods, 1–34. http://doi.org/10.3758/s13428-017-0862-1
- You know who didn’t use bootnet for testing stability of their analysis? FWMK ;) …
- Fried, E. I., & Cramer, A. O. J. (2017). Moving forward: challenges and directions for psychopathological network theory and methodology. Perspectives on Psychological Science, 1–22. http://doi.org/10.1177/1745691617705892
- Epskamp, S., Borsboom, D., & Fried, E. I. (2017). Estimating Psychological Networks and their Accuracy: A Tutorial Paper. Behavior Research Methods, 1–34. http://doi.org/10.3758/s13428-017-0862-1
- Epskamp, S., & Fried, E. I. (2017). A Tutorial on Regularized Partial Correlation Networks. Accepted in Psychological Methods. https://arxiv.org/abs/1607.01367
- Bulteel, K., Tuerlinckx, F., Brose, A., & Ceulemans, E. (2016). Using raw VAR regression coefficients to build networks can be misleading. Multivariate Behavioral Research. See also this really cool preprint!
- Terluin, B., de Boer, M. R., & de Vet, H. C. W. (2016). Differences in Connection Strength between Mental Symptoms Might Be Explained by Differences in Variance: Reanalysis of Network Data Did Not Confirm Staging. Plos One, 11(11), e0155205. http://doi.org/10.1371/journal.pone.0155205; I also wrote a blog summarizing the paper.
- Guloksuz, S., Pries, L., & Van Os, J. (2017). Application of network methods for understanding mental disorders: Pitfalls and promise. Psychological Medicine, 1-10. doi:10.1017/S0033291717001350.
- “How to increase robustness and replicability in psychopathological network research”.
- Here are all materials for the September 2017 workshop I gave in Madrid.