The post Fixed-margin sampling & networks: New commentary on network replicability appeared first on Psych Networks.


The methodological journal *Multivariate Behavioral Research* just published our latest contribution to the debate surrounding the replicability of psychological networks (the pre-print and code were already available on OSF). To recap, last year the *Journal of Abnormal Psychology* published a series of four papers:

- A paper claiming networks have limited replicability (Forbes, Wright, Markon, & Krueger, 2017a)
- Our commentary on this paper showing the networks to replicate well in a comprehensive re-analysis^{1} (Borsboom et al., 2017)
- A commentary by Steinley, Hoffman, Brusco, & Sher (2017) introducing a new method and claiming networks do not differ from what is expected by chance, supporting arguments of Forbes et al.
- And a rebuttal of the original authors (Forbes, Wright, Markon, & Krueger, 2017b), relying on the work by Steinley et al. (2017) as well as a literature review in PTSD networks to present further evidence that networks have limited replicability.

Papers 1, 2, and to some extent 4 have already been discussed extensively online, and I will not discuss them in detail again here. The Psychosystems group posted a short statement on its blog, Eiko posted a longer blog on the whole process, and I posted a public post-publication peer review on pubpeer (the original authors responded to these, so make sure to read their comments as well as ours to get a fair and balanced overview). We mentioned working on a (critical) commentary on paper 3 in these discussions as well.

In an unprecedented display of scientific integrity, Douglas Steinley himself responded by inviting us to submit this commentary to the prestigious methodological journal *Multivariate Behavioral Research* instead of the *Journal of Abnormal Psychology*, which we happily accepted. This brings me to the topic at hand. In this blog post, I will summarize the two main points of our commentary, showing that the conclusions drawn in paper 3 are unwarranted. Next, I will showcase an example not discussed in our commentary, in which the proposed methodology has strong utility, by re-analyzing a 10-year-old network of the DSM-IV-TR.

The commentary by Steinley et al. (2017) (paper 3) introduces a new method for creating *“confidence intervals”* from network parameters and descriptives. We term this method “fixed-margin sampling” as it entails generating new random binary datasets while keeping the margins (row and column totals) intact. These sampled datasets can subsequently be used to create intervals for any statistic. Using this method, the authors conclude that *“many of the results are indistinguishable from what would be expected by chance”*, labeling such findings *“uninteresting”*, and suggesting that *“previously published findings using* [eLasso] *should be reevaluated using the above testing procedure.”* Forbes et al. (2017b) reiterate the last statement in paper 4: *“this finding highlights the central role that Steinley et al.’s (2017) proposed method should have in psychopathology network research going forward.”*
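To make the idea of keeping the margins intact concrete, here is a minimal base R sketch. It uses simple "checkerboard" swaps, which is one well-known way to randomize a binary matrix under fixed row and column totals; it is not the Matlab routine of Steinley et al. (2017), and the function name `swap_randomize` and the toy data are made up for illustration. A swap chain like this needs many attempted swaps before the result is approximately a random draw, so treat it as a teaching aid rather than a replacement for a dedicated sampler.

```r
# Sketch of fixed-margin sampling via "checkerboard" swaps (base R only).
# NOTE: swap_randomize() is a hypothetical helper for illustration; it is not
# the routine used by Steinley et al. (2017).
swap_randomize <- function(X, n_attempts = 5000) {
  for (k in seq_len(n_attempts)) {
    rows <- sample(nrow(X), 2)  # two distinct rows (persons)
    cols <- sample(ncol(X), 2)  # two distinct columns (symptoms)
    sub <- X[rows, cols]
    # A 2 x 2 block of the form (1 0 / 0 1) or (0 1 / 1 0) can be flipped
    # without changing any row or column total.
    is_checkerboard <- sub[1, 1] == sub[2, 2] &&
      sub[1, 2] == sub[2, 1] &&
      sub[1, 1] != sub[1, 2]
    if (is_checkerboard) X[rows, cols] <- 1 - sub
  }
  X
}

set.seed(1)
# Hypothetical toy data: 20 persons by 6 binary symptoms
X <- matrix(rbinom(20 * 6, 1, 0.4), nrow = 20, ncol = 6)
X_new <- swap_randomize(X)

# Margins of the original and the resampled data are identical
all(rowSums(X_new) == rowSums(X))
all(colSums(X_new) == colSums(X))
```

Any statistic of interest can then be recomputed on many such resampled matrices to obtain an interval under fixed margins.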

In our new commentary in *Multivariate Behavioral Research*, we show that the work of Steinley et al. (2017) relies on a misinterpretation of psychological networks. The crux of the matter lies in this paragraph:

“Clearly, psychopathology networks fall into the class of affiliation matrices where the connections are measured between observation and diagnostic criteria. The relationships between the criteria are then derived by transforming the two-mode affiliation matrix to a one-mode so-called “overlap/similarity” matrix between the criteria, where traditional network methods are applied to this overlap/similarity matrix.”

Steinley et al. (2017) interpret Ising models used in psychology as one-mode projections of so-called two-mode or bipartite graphs. That means that they interpret a standard person by symptom data matrix:

| | Depressed mood | Fatigue |
|---|---|---|
| Bob | 1 | 1 |
| Alice | 0 | 1 |

To actually encode a network:

Depressed mood — Bob — Fatigue — Alice

Of which the symptom by symptom network is a so-called *projection*:

Depressed mood – Fatigue

That is, depressed mood and fatigue interact with one another because they share one person: Bob. Similarly, Bob and Alice interact with one another because they share one symptom: fatigue. But this is not the intention of the Ising model, which is a model for conditional independencies. In fact, one core assumption in many multivariate statistical models is that the cases (Bob and Alice) are *independent*, which means they do not interact with one another because they share a symptom. The symptom fatigue is also a different property of Alice and Bob, and not an entity in the world they both interact with.

While keeping the column totals intact has little to no effect in generating such data, keeping the row totals (in this case: number of symptoms per person) intact has a striking effect; it leads to highly one-dimensional models being used as the null distribution:

This means that, due to latent variable – network equivalences, fixed-margin sampling takes a fully connected network model as the null distribution against which estimated network models are tested. Such a procedure will lead to false conclusions about the importance of estimated network parameters. We show in our commentary that the method performs poorly in classifying true effects as interesting and false effects as uninteresting.

Fixed-margin sampling generates data under a particular kind of unidimensionality: a model in which each item is interchangeable (Verhelst, 2008). Such a model is also known as the Rasch model. As the DSM classification of disorders typically treats symptoms as interchangeable, it is interesting to see how well combining fixed-margin sampling with the eLasso Ising estimation method performs as a non-parametric test for the Rasch model. This may be worthwhile, as it would give us insight into where the data diverge from the Rasch model and where alternative explanations are thus warranted (although not required). We investigated this in two simulation studies. In one simulation study, we simulated data under the following model:

By varying the C parameter (correlation between factors), we can change the model from two independent variables (C = 0) to one latent variable (C = 1), and by increasing the R parameter (residual effect), we can add two violations of the one- or two-factor model. The results are as follows:

The colored areas in the background show the probability of flagging the edges related to parameter R as not being in line with the Rasch model. It shows that the method works very well in detecting these local violations of the Rasch model. The boxplots show global departures and should be high if all edges are flagged as departures from the Rasch model. This should be the case in the C = 0 condition but doesn’t happen often. This shows that while this method is powerful in detecting local departures from the Rasch model, it is far less powerful in detecting global departures from the Rasch model. As such, I would recommend using this method to gain insight into where unidimensionality does not hold, but not to use it as a test for the Rasch model itself by counting the number of flagged edges.

While fixed-margin sampling should not be used to assess psychometric networks that are based on estimating statistical models from large sample sizes of independent cases (e.g., the Ising model), the method has strong utility in the analysis of one-mode network structures that are derived from bipartite graphs. One such network is actually the first network I ever constructed and analyzed: the DSM-IV-TR network (Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011):

I worked on this network about 10 years ago as an undergraduate student, long before we even entertained the notion of estimating network models from data. All the code and data used for the network visualizations are still online. To create this network, we created an affiliation matrix of 439 symptoms by 148 disorders, encoding if a symptom was listed as a symptom of a disorder in the DSM-IV-TR. The data are simply a matrix with 439 rows and 148 columns, with 0 indicating a symptom is not listed in a disorder and 1 indicating a symptom is listed in a disorder. This dataset can subsequently be transformed to a 439 by 439 adjacency matrix encoding if symptoms are both listed in at least one shared disorder by multiplying the data with its transpose and making every non-zero element one^{2}.
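The transformation from affiliation matrix to adjacency matrix described above can be sketched in a few lines of base R. The toy matrix below (5 symptoms, 3 disorders) is hypothetical and only stands in for the real 439 by 148 data; setting the diagonal to zero is my own assumption here, made so that symptoms are not counted as connected to themselves.

```r
# Hypothetical toy affiliation matrix: 5 symptoms (rows) by 3 disorders (columns)
A <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 1, 0,
              0, 0, 1,
              0, 0, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("S", 1:5), paste0("D", 1:3)))

# Entry [i, j] counts the disorders in which symptoms i and j are both listed
overlap <- A %*% t(A)

# Binarize: an edge means "listed together in at least one disorder"
adj <- (overlap > 0) * 1
diag(adj) <- 0  # assumption: drop self-loops

sum(adj) / 2  # number of edges: 3 (S1-S2, S2-S3, S4-S5)
```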

While the dataset used for this network looks similar to a dataset you may use when estimating an Ising model (zeroes and ones), it is actually a very different kind of data. In an Ising model, the more cases we add, the more precise our estimates of the network model become: if we double the sample size from 10,000 to 20,000 we would not expect a completely different model, merely to be able to estimate the parameters even more precisely. In the DSM-IV-TR affiliation matrix, however, this is not the case: doubling the number of symptoms listed will fundamentally change the interpretation of the model (doubling the number of nodes), and doubling the number of disorders listed will fundamentally change the structure of the network. We also cannot do this, as we already listed all symptoms and disorders from the DSM-IV-TR. Rather than columns representing random stochastic variables and rows representing independent realizations, the columns and rows both simply represent static entities: words in a book. The network structure is simply a description of this book, and the equivalent of adding more cases would be to test more books (e.g., Tio, Epskamp, Noordhof, & Borsboom, 2016).

This means we also cannot bootstrap the dataset, as resampling symptoms with replacement or dropping symptoms hardly makes sense. So what can we do? The fixed-margin sampling method described by Steinley et al. (2017) actually gives a very nice new tool for investigating such structures. Given that some symptoms are listed in many disorders (e.g., insomnia is listed in 17 disorders), and some disorders feature many symptoms (e.g., Schizoaffective Disorder lists 33 symptoms), we would expect certain levels of connectivity by chance alone. If that is the case, the network structure itself is not very *interesting*, and investigating the symptom and disorder sum totals would be sufficient by itself.

I re-investigated the dataset using fixed-margin sampling and constructed 1,000 networks (code available here). These are three random samples of the generated networks:

In this case, there is no need for any quantitative analysis and the plots themselves already reveal a remarkable difference between the networks expected by chance alone and the network observed in the DSM-IV-TR: the fixed-margin sampling networks are far denser (more edges) and more interconnected. This means that we can conclude that there is structure in the DSM-IV-TR, and symptoms are not randomly assigned to disorders. Of course, there is a structure in the DSM-IV-TR imposed by the chapters alone (e.g., mood disorders, personality disorders, etcetera). A follow-up analysis could be to split up the data per chapter, apply fixed-margin sampling to each block, and subsequently combine the data again. Three snapshots of these networks are as follows:

These look **much** more similar to the observed DSM-IV-TR network, which means that the clustering per chapters already explains a lot of the structure. However, these networks are still denser (number of edges ranging from 3,513 to 3,674, compared to 2,626 in the observed network), meaning that investigating the graph structure is still interesting. When looking at strength centrality, we can see that in the high ranges of strength centrality the observed node strengths are *less* than could be expected by chance:

Here, red dots indicate nodes with a strength that was not in the expected interval by fixed-margin sampling.
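The logic behind flagging nodes can be sketched as follows. This is not the code used for the figure; it is a self-contained base R illustration on a small hypothetical affiliation matrix, using a simple swap sampler (the helper names `swap_randomize` and `node_degree` are made up). Because the projected network is unweighted, node strength here is simply the number of neighbours.

```r
# Hypothetical helpers for illustration (base R only)
swap_randomize <- function(X, n_attempts = 1000) {
  for (k in seq_len(n_attempts)) {
    r <- sample(nrow(X), 2)
    s <- sample(ncol(X), 2)
    sub <- X[r, s]
    # Flip a 2 x 2 checkerboard; this keeps all row and column totals fixed
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] && sub[1, 1] != sub[1, 2]) {
      X[r, s] <- 1 - sub
    }
  }
  X
}
node_degree <- function(A) {
  adj <- (A %*% t(A) > 0) * 1  # project symptoms-by-disorders to symptoms-by-symptoms
  diag(adj) <- 0
  rowSums(adj)
}

set.seed(123)
# Hypothetical affiliation matrix: 30 symptoms by 10 disorders
A <- matrix(rbinom(30 * 10, 1, 0.2), nrow = 30)

observed <- node_degree(A)
# One row per node, one column per fixed-margin sample
sampled <- replicate(100, node_degree(swap_randomize(A)))

lower <- apply(sampled, 1, quantile, probs = 0.025)
upper <- apply(sampled, 1, quantile, probs = 0.975)
flagged <- observed < lower | observed > upper  # outside the expected interval

which(flagged)
```

Since the toy matrix is itself random, few or no nodes should be flagged here; with real data such as the DSM-IV-TR matrix, flagged nodes are those whose strength is not explained by the margins alone.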

To conclude, our new manuscript shows that the fixed-margin sampling routine proposed by Steinley et al. (2017) should not be used to evaluate psychometric network models, but shows promise in detecting local departures from Rasch models. Furthermore, the method of fixed-margin sampling is highly valuable in analyzing typical network structures that are constructed rather than estimated. I think that the combination of our commentary in the *Journal of Abnormal Psychology* last year (Borsboom et al., 2017) and the new commentary discussed in this blog post safely puts most of the criticism raised in last year's series of papers to rest, and I look forward to moving this discussion further by discussing the crucial challenges network analysis faces in the coming years, of which there are many (see, e.g., comment # 5 on the pubpeer discussion, several publications on challenges to network analysis, and continued debate on the interpretation of networks).

If you would like to study fixed-margin sampling yourself, all code for our simulations is available on the Open Science Framework. Fully replicating the analysis as proposed by Steinley et al. (2017) requires both R and Matlab, however. For R-based alternatives, the R packages RaschSampler and vegan should have similar performance.

Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J. (2011). The Small World of Psychopathology. *PLoS ONE*, *6*(11), e27407.

Borsboom, D., Fried, E., Epskamp, S., Waldorp, L., Van Borkulo, C., Van Der Maas, H., & Cramer, A. (2017). False alarm? A comprehensive reanalysis of “Evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger. *Journal of Abnormal Psychology*, *126*(7), 989–999. http://doi.org/10.17605/OSF.IO/TGEZ8

Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017a). Evidence that Psychopathology Symptom Networks have Limited Replicability. *Journal of Abnormal Psychology*, *126*(7), 969–988. http://doi.org/10.1037/abn0000276

Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017b). Further evidence that psychopathology networks have limited replicability and utility: Response to Borsboom et al. and Steinley et al. *Journal of Abnormal Psychology*, *126*(7), 1011–1016.

Steinley, D., Hoffman, M., Brusco, M. J., & Sher, K. J. (2017). A Method for Making Inferences in Network Analysis: Comment on Forbes, Wright, Markon, and Krueger (2017). *Journal of Abnormal Psychology*, *126*(7), 1000–1010.

Tio, P., Epskamp, S., Noordhof, A., & Borsboom, D. (2016). Mapping the manuals of madness: Comparing the ICD-10 and DSM-IV-TR using a network approach. *International Journal of Methods in Psychiatric Research*, *25*(4), 267–276. http://doi.org/10.1002/mpr.1503

Verhelst, N. D. (2008). An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals. *Psychometrika*, *73*(4), 705–728. http://doi.org/10.1007/s11336-008-9062-3

**Footnotes**


The post (Mis)interpreting Networks: An Abbreviated Tutorial on Visualizations appeared first on Psych Networks.

Network analysis is an exploding field! I absolutely love seeing the constant flow of new papers and new researchers using network methods.

With such a quickly growing science, it’s difficult to keep up! Although I have personally found the network community to be very welcoming, friendly, open, and accessible, that doesn’t negate the fact that there is just a lot of information to keep up with.

As I work to keep up and learn new information, I’ve become aware of some mistakes I made early on. This tutorial is intended to keep you from making the same mistakes that I did.

At this point, I’ve seen at least a few dozen symposium presentations on network analysis, many of them from researchers just starting out with network analysis. Here are some of the most frequent errors:

“The somatic symptoms of depression were out on the periphery, barely part of the network”

“Extraversion is right in the middle of the personality network”

This misinterpretation pops up all the time. I blame linguistics.

In reality, there are several different types of node centrality, and none of them necessarily correspond to network plots. You have your centrality values and centrality plots—use those instead of looking at the network plot. Eiko wrote about this and similar centrality interpretation problems in a recent blog post.

“As you can see, sad mood and agitation were on opposite ends of the network”

“Surprisingly, weight gain and weight loss were right next to each other”

Again, not so. A good way to reality check is to look at the edges: if node distance corresponds perfectly to node similarity, all edges of a certain thickness should have exactly the same length, and all edges of the same length should have the same thickness (hint: that’s rarely if ever true).
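The reality check can also be done numerically. Below is a minimal base R sketch with a hypothetical weight matrix and hypothetical node coordinates; with qgraph, my assumption is that the coordinates would be taken from the `layout` element of the plotted object. If plotted distance encoded edge strength, the rank correlation between absolute edge weight and edge length would be close to -1.

```r
set.seed(42)
n <- 8
# Hypothetical symmetric weight matrix (e.g., partial correlations)
W <- matrix(0, n, n)
W[upper.tri(W)] <- round(runif(n * (n - 1) / 2, -0.4, 0.4), 2)
W[abs(W) < 0.1] <- 0          # drop very weak edges
W <- W + t(W)

# Hypothetical 2D node coordinates (one row per node)
coords <- matrix(runif(n * 2), ncol = 2)

D <- as.matrix(dist(coords))    # Euclidean distances between plotted nodes
edges <- upper.tri(W) & W != 0  # each existing edge once

# If distance encoded edge strength, this would be close to -1
rho <- cor(abs(W[edges]), D[edges], method = "spearman")
rho
```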

“Intrusive thoughts were far to the right, close to the depression cluster”

This one is rarer but pops up occasionally. I see people make this error especially when there are meaningful clusters. Resist temptation; if you want to know if a node is “close to the depression cluster”, use bridge centrality instead.

“As you can see from the plots, the networks did not replicate well, indicating that edges in network analyses are mostly comprised of measurement error”

*Cough*. Relating to Error #3, in most network plots rotation is totally arbitrary (the enemy’s gate is down!). In addition, certain types of network plots (e.g., force-directed) are very unstable even with similar networks. This can wreak some serious havoc when trying to interpret multiple networks.

In my experience it’s much more informative to use a correlational approach (e.g., do the edges correlate? does centrality correlate?) to judge replicability (Eiko discussed these and similar metrics in the section “A word of caution” in this blog post). For plotting, it’s best to either use a consistent averaged layout for both plots or the Procrustes method (see Figure 6 in the full tutorial).
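As a rough illustration of the correlational approach, here is a base R sketch with two hypothetical weight matrices on the same nodes (the second is the first plus a little noise, standing in for a replication sample). The helper `make_net` is made up for this example; in practice you would plug in your two estimated networks.

```r
set.seed(7)
p <- 6
# Hypothetical edge weights of two networks on the same six nodes
w1 <- runif(p * (p - 1) / 2, 0, 0.4)
w2 <- w1 + rnorm(length(w1), sd = 0.05)

# Hypothetical helper: build a symmetric weight matrix from a vector of edge weights
make_net <- function(w, p) {
  m <- matrix(0, p, p)
  m[upper.tri(m)] <- w
  m + t(m)
}
net1 <- make_net(w1, p)
net2 <- make_net(w2, p)

# Do the edges correlate?
edge_cor <- cor(net1[upper.tri(net1)], net2[upper.tri(net2)])

# Does strength centrality (sum of absolute edge weights per node) correlate?
strength_cor <- cor(colSums(abs(net1)), colSums(abs(net2)))

c(edges = edge_cor, strength = strength_cor)
```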

One way to fix the interpretation problem is to stop making any visual interpretation! Certainly, we shouldn’t pretend we understand 20-dimensional causal information just because we made a 2-dimensional plot of partial correlations (!).

But the whole point of a visualization is to help us understand our data better. And although we should stick to the numbers for our research conclusions, there is something to be said for exploratory hypothesis generation that comes from good visualizations (as long as you don’t pretend that these hypotheses were confirmatory all along).

So our second option is to try and do the best we can to make accurate visualizations, while simultaneously reining ourselves in with visual interpretations. Here is a super quick overview of some of the options.

This is a short version of the open-access tutorial. You’ll need the qgraph and networktools packages for the code to work, and we’ll get some data from package MPsychoR. First some code for getting a network:

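```
library(qgraph)
library(networktools)
library(MPsychoR)

# Example data from MPsychoR
data(Rogers)

# Estimate a regularized partial correlation network
mynetwork <- EBICglasso(cor_auto(Rogers), nrow(Rogers))

# Plot with the spring layout and keep the qgraph object for later
myqgraph <- qgraph(mynetwork, layout="spring")
```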

Most networks you see "in the wild" are plotted with the Fruchterman-Reingold algorithm. This algorithm works by treating each network edge like a spring: it pulls when connected nodes get too far apart and pushes when they get too close.

This creates really nice-looking networks in which nodes never overlap, and edges are mostly about the same length (the "resting state" for the spring forces). In very sparse networks, it can be a good way to visualize clusters. But all of the Big Four are dangerous here.

`MDSnet(myqgraph, MDSadj=mynetwork)`

Multidimensional scaling solves our Error #2—distances between nodes actually become interpretable in an MDS plot. In other words, the algorithm works so that nodes placed close together usually share a strong relationship, and nodes far apart do not. This is, of course, accounting for the fact that we've squashed everything down into just two dimensions—so stay careful with interpretations!
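To see the principle in isolation, here is a small base R sketch using classical MDS (`cmdscale`) on hypothetical data. It is not what `MDSnet` does internally, and turning correlations into dissimilarities as 1 - |r| is just one possible choice that I assume here for simplicity.

```r
set.seed(3)
# Hypothetical data: 200 observations on 6 variables forming two blocks
f1 <- rnorm(200)
f2 <- rnorm(200)
dat <- cbind(sapply(1:3, function(i) f1 + rnorm(200)),
             sapply(1:3, function(i) f2 + rnorm(200)))
colnames(dat) <- paste0("V", 1:6)

R <- cor(dat)
# Assumption: turn association into dissimilarity as 1 - |r|
dissim <- as.dist(1 - abs(R))
coords <- cmdscale(dissim, k = 2)  # classical MDS in two dimensions

# Distances between plotted nodes; same-block nodes should sit closer together
d <- as.matrix(dist(coords))
round(d, 2)
```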

`PCAnet(myqgraph, cormat=cor_auto(Rogers))`

You've probably heard of PCA—but for plotting a network? PCA is a simplification method—it tries to squash all of your complex data down into just a few variables. This is perfect for us, because our plots have only two (count 'em) dimensions! The idea here is that we give each node a score on Component #1 and on Component #2, and then use these scores to plot on an X/Y axis (this solves Error #3). We preserve complexity in the form of network edges but make the plot as simple as two principal components. If you're feeling adventurous, you could even come up with labels for what the dimensions might mean.
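The idea of giving each node a score on two components can be sketched in base R as follows. The data are hypothetical, and this is only meant to show where such coordinates could come from; it is not necessarily how `PCAnet` computes them.

```r
set.seed(11)
# Hypothetical data: 300 observations on 5 variables sharing one common factor
g <- rnorm(300)
dat <- sapply(1:5, function(i) 0.7 * g + rnorm(300))
colnames(dat) <- paste0("V", 1:5)

R <- cor(dat)
e <- eigen(R)

# Component loadings: eigenvectors scaled by the square root of their eigenvalues
loadings <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
rownames(loadings) <- colnames(dat)
colnames(loadings) <- c("PC1", "PC2")

# Each node's (x, y) position is its loading on the first two components
round(loadings, 2)
```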

`EIGENnet(myqgraph)`

If you liked PCA, you're in for a treat with eigenmodels. PCA is great, but it requires that you either have the original data or a correlation matrix from that original data. In other words, PCA isn't really based on your network per se, it's just based on the same data that generated the network. Thankfully, someone (https://www.stat.washington.edu/~pdhoff/code.php) came up with a way to extract latent variables from symmetric relational data (AKA undirected network data). The interpretation is similar to PCA plotting, but everything comes straight from the network itself.

And that's it!

If you liked the abbreviated version, you can check out the full tutorial for a deeper look at the same concepts and some more sophisticated code. Happy visualizations!

Citation:

Jones, P. J., Mair, P., & McNally, R. (2018). Visualizing Psychological Networks: A Tutorial in R. Frontiers in Psychology, 9, 1742. https://doi.org/10.3389/fpsyg.2018.01742


The post How to interpret centrality values in network structures (not) appeared first on Psych Networks.

- A paper by Madhoo & Levin 2016 prompted me to write a tutorial on community detection
- A paper by Terluin et al. 2016 led to a blog post on differential variability
- A paper by Afzali et al. 2017 prompted me to write a tutorial on network stability
- A paper by Guloksuz et al. 2017 led me to write a piece on challenges of the network approach
- And a paper by Forbes et al. 2017 led to another blog post on stability

I wanted to write about centrality inference for a while, and a new paper published in *Molecular Psychiatry*, one of the leading journals in psychiatry in terms of impact factor and visibility, convinced me I should write this up. The paper is entitled “The symptom network structure of depressive symptoms in late-life: Results from a European population study”, by Murri and colleagues. This paper is written up in a similar way to many other papers, and I really don’t mean to single out this specific paper or the specific authors here. It just comes at a time where I don’t want to prepare my course or review the paper for Abnormal … so here we go.

After estimating network structures, e.g. among symptoms, in between-subjects (cross-sectional) or within-subjects (time-series) data^{1}, researchers often calculate centrality estimates. This provides information about the inter-connectedness of a variable. There are different ways to do that, and many different centrality measures exist.

For instance, this R syntax creates a small network, and shows that the green node has a centrality of 6 because it is connected to 6 other variables:

```
library("qgraph")
AM <- matrix(0,10,10)
AM[1,2] <- AM[2,1] <- AM[2,3] <- AM[3,2] <- AM[2,4] <- AM[4,2] <- AM[3,4] <- AM[4,3] <- AM[3,6] <- AM[6,3] <- AM[3,8] <- AM[8,3] <- AM[3,9] <- AM[9,3] <- AM[3,10] <- AM[10,3] <- AM[4,7] <- AM[7,4] <- AM[5,7] <- AM[7,5] <- AM[5,10] <- AM[10,5] <- AM[9,10] <- AM[10,9] <- 1
gr <- list(c(1,2,4:10), 3)
names <- c("1","3","6","3","2","1","2","1","2","3")
N <- qgraph(AM, groups = gr, color=c('#cccccc', '#3CB371'), labels=names,
border.width=3,edge.width=2, vsize=9,
border.color='#555555', edge.color="#555555", label.color="#555555")
```
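The node labels in that plot are degree values, and they can be checked without plotting. Here is a short base R sketch that rebuilds the same adjacency matrix from an edge list and counts the neighbours of each node:

```r
# Edge list of the example network (pairs of connected nodes)
edges <- rbind(c(1, 2), c(2, 3), c(2, 4), c(3, 4), c(3, 6), c(3, 8),
               c(3, 9), c(3, 10), c(4, 7), c(5, 7), c(5, 10), c(9, 10))

AM <- matrix(0, 10, 10)
AM[edges] <- 1     # fill one triangle ...
AM <- AM + t(AM)   # ... and mirror it to get a symmetric matrix

degree <- rowSums(AM)  # number of neighbours of each node
degree                 # 1 3 6 3 2 1 2 1 2 3
```

Node 3 (the green node) has the highest degree, 6, matching the label in the plot.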

In the abstract of their paper, Murri et al. conclude, after estimating a network structure in cross-sectional data:

*Death wishes, depressed mood, loss of interest, and pessimism had the highest values of centrality. Insomnia, fatigue and appetite changes had lower centrality values […]. In conclusion, death wishes, depressed mood, loss of interest, and pessimism constitute the “backbone” that sustains depressive symptoms in late-life. Symptoms central to the network of depressive symptoms may be used as targets for novel, focused interventions and in studies investigating neurobiological processes central to late-life depression.*

I am not sure this necessarily follows, and I will explain below why.

Researchers often estimate centrality values after the network structures are estimated, and then use these to draw substantive inferences. One common inference in cross-sectional data is that central symptoms are the most important symptoms, another that we should intervene on central symptoms. Murri et al. above describe that central symptoms "sustain" depression, which has a clear temporal component to it.

There are a number of statistical and substantive concerns you should keep in mind here. And just to clarify this again, this is not an exercise in finger-pointing. While I have always tried to be careful, and while my colleagues will tell you how careful I try to be when it comes to causal language (thanks in large part to my education as a postdoc in the lab of Francis Tuerlinckx), I am sure a few sentences have slipped through my fingers in papers I am co-author on. In my own work, the strongest statement I could find is in my first network paper, where we found loneliness to play a crucial role in bereavement, in longitudinal data. In the abstract, we concluded that "future studies should examine interventions that directly target such symptoms". In the discussion, we wrote "that intervention programs should directly target loneliness". This is supported by and embedded in the clinical literature on the relation between loneliness and bereavement, but writing this paper today I would clarify that this conclusion does not follow from the network model alone.

So what are the main concerns and pitfalls when interpreting centrality values?

For my very first paper published in 2014, I analyzed the relations between 14 specific depression symptoms and impairment. It turned out that some symptoms explained a lot more variance than other symptoms. One of the reviewers raised the concern of *differential variability*, which I have not forgotten since. Differential variability means that items differ in their variability (standard deviation and variance), and items with little to no variability cannot relate to other variables. Since centrality is a function of relations among items, such floor or ceiling effects that can stem from differential variability will affect centrality estimates. Terluin et al. 2016 wrote a paper specifically about this for network models, which I discuss in more detail elsewhere. This means that when you estimate centrality, you should consider checking means and variances of items, and try to understand how these values determine network structures and centrality estimates. In other words, what happens if you correlate centrality values and standard deviations of your items … and they result in a correlation of 0.5? It's worth thinking about this.
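The check suggested above takes only a few lines. Here is a base R sketch on hypothetical binary symptom data; as a stand-in for an estimated network I simply use absolute zero-order correlations, which is an assumption made for brevity (in a real analysis you would use the edge weights of your estimated network).

```r
set.seed(2014)
n <- 500
# Hypothetical symptom data: a common factor plus noise, cut at different
# thresholds so that items differ in how often they are endorsed
g <- rnorm(n)
thresholds <- c(-0.5, 0, 0.5, 1, 1.5, 2)
dat <- sapply(thresholds, function(th) as.numeric(0.8 * g + rnorm(n) > th))
colnames(dat) <- paste0("item", seq_along(thresholds))

# Stand-in for an estimated network: absolute zero-order correlations
W <- abs(cor(dat))
diag(W) <- 0
strength <- colSums(W)  # strength centrality per item

item_sd <- apply(dat, 2, sd)

# How strongly does centrality track item variability?
sd_cor <- cor(strength, item_sd)
sd_cor
```

A sizeable correlation here would suggest that differences in centrality partly reflect differences in item variability rather than anything substantive.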

Another issue is reliable estimation: Are you sure the most central symptom in your network is actually meaningfully more central than the other symptoms? This is similar to other statistics, where the mean height of 177 cm in a group of men vs 171 cm in a group of women does not tell you if there is a meaningful (or statistically significant) difference between the height of men and women, unless you know the sample size and distributions. You can test this statistically, and probably want to do that before drawing inferences. We describe here how to do that in detail, via the centrality difference test.

What if 3 nodes in your network actually measure the same latent variable, such as the CES-D scale that captures sad mood, feeling blue, and feeling depressed? Your network will feature strong edges between these nodes, and their centrality will be very high, but intervening on either to decrease the others would not be a real "network intervention" because all you do is reducing sadness by intervening on sadness. That may be interesting all by itself, but the edges between these 3 items are not legitimate putative causal relations: They are simply shared variances due to measuring the same thing multiple times, as we highlight (along with a potential solution to this) in our challenges paper published in *Perspectives on Psychological Science*.

There is also the danger of conditioning on colliders or other estimation problems. Conditioning on colliders, for instance, will induce artificial edges in your network that are not part of the true model. In other words, be careful not to confuse an estimated parameter (like an edge weight) for the truth … obviously, this applies to all models, and not only network models.
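A small simulation shows how conditioning on a collider induces an edge. In this base R sketch (all variables are simulated, nothing comes from real data), A and B are independent, and C is caused by both. The marginal correlation between A and B is near zero, but their partial correlation given C, which is what a partial correlation network would display, is clearly negative.

```r
set.seed(99)
n <- 5000
A <- rnorm(n)
B <- rnorm(n)          # A and B are independent by construction
C <- A + B + rnorm(n)  # C is a collider: a common effect of A and B
dat <- cbind(A, B, C)

# Partial correlations from the inverse covariance matrix
P <- -cov2cor(solve(cov(dat)))
diag(P) <- 1

marginal_AB <- cor(dat)[1, 2]  # close to 0
partial_AB <- P[1, 2]          # around -0.5 in expectation for this setup

round(c(marginal = marginal_AB, partial = partial_AB), 2)
```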

Finally, there is the issue of mixing levels. Network models in cross-sectional data are estimated on between-subjects data, and as has been highlighted in recent work, it does not automatically follow that such results lead to proper conclusions regarding the within-subjects level. I am not saying it never follows, and I think these levels might align quite often, but it is an empirical question we have not yet answered. Here is a fairly strongly worded recent investigation by Fisher et al. published in *PNAS* showing that there are important differences between these levels; Simpson's paradox is also highly relevant in this context.

Now let's assume our network is estimated without any problems or bias, and concentration problems is the most central depression symptom in the network structure of symptoms based on cross-sectional data. Can we conclude that it is the "most important" problem, and that we should focus our interventions on concentration problems?

As we wrote up in the discussion section of a recent paper we published in *Clinical Psychological Science*, this conclusion does not necessarily follow, for various reasons.

*It is important to highlight that centrality does not automatically translate to clinical relevance and that highly central symptoms are not automatically viable intervention targets. Suppose a symptom is central because it is the causal endpoint for many pathways in the data: Intervening on such a product of a causal chain would not lead to any changes in the system.*

In other words, the endpoint of a causal chain would end up being a highly central symptom in your network structure *if* there are many problems that lead to this specific symptom. Given the cross-sectional nature of your data, you cannot find evidence for this temporal relationship, and will, in this case, draw wrong causal inferences that do not follow from the results. So I urge caution with these and similar interpretations.

*Another possibility is that undirected edges imply feedback loops (i.e., A—B comes from A↔B), in which case a highly central symptom such as insomnia would feature many of these loops. This would make it an intervention target that would have a strong effect on the network if it succeeded — but an intervention with a low success probability, because feedback loops that lead back into insomnia would turn the symptom “on” again after we switch it “off” in therapy.*

Put differently, it is an important and non-trivial question whether it might be worthwhile to intervene on peripheral (non-central) symptoms, because the probability of switching them off permanently is higher: few other symptoms will keep them in their original state. If sleep problems lead to 5 other problems, but are at the same time the consequence of 5 problems, it will be nearly impossible to simply target insomnia via interventions because you don't target the *causes* of insomnia.

*A third example is that a symptom with the lowest centrality, unconnected to most other symptoms, might still be one of the most important clinical features. No clinician would disregard suicidal ideation or paranoid delusions as unimportant just because they have low centrality values in a network. Another possibility is that a symptom is indeed highly central and causally affects many other nodes in the network but might be very difficult to target in interventions. As discussed by Robinaugh, Millner, and McNally (2016), “Nodes may vary in the extent to which they are amenable to change” (p. 755).*

I believe these are significant challenges to common centrality interpretations. We conclude in the paper by stating:

*In sum, centrality is a metric that needs to be interpreted with great care and in the context of what we know about the sample, the network characteristics, and its elements. If we had to put our money on selecting a clinical feature as an intervention target in the absence of all other clinical information, however, choosing the most central node might be a viable heuristic.*

There is other critical work on centrality on the way. One paper that is accepted in the *Journal of Consulting and Clinical Psychology*, by Rodebaugh and colleagues, features a detailed empirical investigation of centrality in both cross-sectional and time-series data. You can find the preprint here. The most relevant parts of the abstract read:

*We first estimated a state-of-the-art regularized partial correlation network based on participants with social anxiety disorder (N = 910) to determine which symptoms were more central. Next, we tested whether change in these central symptoms were indeed more related to overall symptom change in a separate dataset of participants with social anxiety disorder who underwent a variety of treatments (N = 244). […] Centrality indices successfully predicted how strongly changes in items correlated with change in the remainder of the items. Findings were limited to the measure used in the network and did not generalize to three other measures related to social anxiety severity. In contrast, infrequency of endorsement ^{2} showed associations across all measures. […] The transfer of recently published results from cross-sectional network analyses to treatment data is unlikely to be straightforward.*

The whole idea of network theory is that things are complicated. We should draw inferences proportional to this level of complexity, and be careful of over-interpreting our data. Obviously, this is just as important for time-series analyses, where we have time as an additional (and very important) dimension, but that only buys us Granger-causality, and only helps with a few of the issues described above.

A crucial step forward is to actually *test interventions* in patients based on centrality (and other) estimations. I am excited to see such projects putting network theory to the test; they provide a fantastic opportunity for falsification of network theory that we should all embrace.

**EDIT 11-29-2018:**

There are two new preprints discussing centrality inferences critically, which you can find here (Dablander & Hinne, 2018) and here (Bringmann et al., 2018).

The post How to interpret centrality values in network structures (not) appeared first on Psych Networks.

The post Bootstrapping edges after regularization: clarifications & tutorial appeared first on Psych Networks.

One of the core features of the R package bootnet is bootstrapping of network edge weights. Bootstrapping is a procedure where you estimate your network structure and parameters of interest many times (e.g. 1000), each time with a slightly different sample. You obtain these different samples by drawing people from your data randomly with replacement. This means that in your first bootstrap, Bob might be in there 3 times (but not Alice), whereas in the second bootstrap, Alice is in there twice (but Bob is absent). The larger the sample, and the more similar the people to each other, the more stable your parameters will be.
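The resampling logic can be sketched in a few lines of base R. This is a stripped-down illustration for a single correlation on invented data, not what bootnet does internally (bootnet re-estimates the whole network in every resample); the sample size, effect size, and seed are arbitrary assumptions.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical data: 200 people, two correlated variables (invented for illustration)
n <- 200
a <- rnorm(n)
b <- 0.3 * a + rnorm(n)
dat <- data.frame(a, b)

n_boots <- 1000
boot_cor <- numeric(n_boots)
for (i in seq_len(n_boots)) {
  # Draw n people with replacement: some appear several times, others not at all
  rows <- sample(n, size = n, replace = TRUE)
  boot_cor[i] <- cor(dat$a[rows], dat$b[rows])
}

# Spread of the parameter across resamples
quantile(boot_cor, probs = c(0.025, 0.975))
```

A narrow interval means the parameter barely changes from resample to resample; a wide one means it is estimated with little precision.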

We put together bootnet to give you an idea about the stability of the edge weights and other parameters. If the edge between two nodes, A and B, is widely different every time you resample, it means your bootstrapped 95% CI will be all over the place. I have described bootnet and its functionalities in a previous tutorial blog post, and we have a tutorial paper on bootnet that was published in 2017.

Here, for instance, is the output of the `bootnet()` function in bootnet from our recent CPS paper for dataset 1, available in the supplementary materials:

On the y-axis are all 120 edges in the network (labels omitted to keep it legible), the x-axis shows the strength of the edge weights (you can see that nearly all edge weights are positive). The red dots are the point estimates of all edges, the grey area the “95% bootstrapped CI”, as I used to call it. The important point is that many of the CIs will just overlap with zero. How do we interpret this? Usually, if a point estimate of a parameter is 0.1 (e.g. a correlation), we do not know if that parameter is different from zero. This is a normal situation in statistics, and the reason why we usually look at the CI coverage: if the CI includes 0, we interpret the parameter as likely not being different from 0.

In the case of regularized partial correlation networks, the story is different. If an edge is 0.1 *after regularization*, that means we have two types of information about the parameter: 1) our best guess is that the parameter is 0.1; 2) our best guess is that the parameter is different from 0.

Why? Because we use regularization, a well-validated, sophisticated statistical technique, to only keep coefficients in the network that are not zero^{1}. Obviously, regularization can still lead to errors, there are situations in which regularization does not do well, and there are numerous other methods that should be considered when estimating networks (for a summary on these points, see Sacha Epskamp’s recent blog post). But the main point is that we have to interpret the 95% CI of regularized edge weights differently than we usually do.

For the supplementary materials of a network paper on depression symptoms & inflammation that we are about to submit, Jonas Haslbeck helped us look at this topic from a somewhat different angle, and also provided some insights on the topic. It would take me more sentences to reiterate what Jonas said very concisely, so I will simply paste the relevant part of the supplementary materials here:

“In order to quantify the uncertainty associated with all edge-estimates, we computed a bootstrapped sampling distribution based on 100 bootstrap samples, for each of the edge-estimates. For each of the six networks estimated in the main article we present summaries of the p(p-1)/2 bootstrapped sampling distributions, one for each edge parameter. Specifically, we display the 5% and 95% quantiles of the bootstrapped sampling distribution and show the proportion of nonzero estimates on point that indicates the mean of the sampling distribution.

Because we use regularization to estimate the network models, all edge-estimates are biased towards zero, which implies that all sampling distributions are biased towards zero. Thus, these sampling distributions are not Confidence Intervals (CIs) centered on the true (unbiased) parameter value. This means that if the quantiles of the bootstrapped sampling distribution overlap with zero it could be that the corresponding CI does not overlap with zero. However, if the quantiles of the bootstrapped sampling distribution do not overlap with zero, we know that also the corresponding CI does not overlap with zero (explained in detail in Epskamp, Borsboom & Fried, 2017). Further details of bootstrap analyses are available in the supplemented R code.”

Jonas also produced the following plot^{2}:

The numbers show how often an edge was estimated non-zero in the 100 bootstraps. As you can see, the edge C2—C7 was included in all networks, and while the 95% bootstrapped CI of D1—C1 does include zero, it was estimated to be non-zero in 78% of the 100 estimated networks. The code for these plots can be found in the supplementary materials of our paper.
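To see how a quantile interval can include zero while the edge is still nonzero in most resamples, here is a toy sketch in base R. It is not the estimator or code from the supplementary materials: soft-thresholding of a single correlation is used only as a simple stand-in for lasso-type shrinkage, and the data, penalty `lambda`, and seed are invented for illustration.

```r
set.seed(2)  # arbitrary seed

# Hypothetical data with a weak association between two variables
n <- 100
a <- rnorm(n)
b <- 0.2 * a + rnorm(n)

# Soft-thresholding: shrink towards zero and set small values to exactly zero.
# Used here only as a simple stand-in for lasso-type shrinkage.
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

lambda <- 0.1    # arbitrary penalty chosen for illustration
n_boots <- 100
boot_est <- replicate(n_boots, {
  rows <- sample(n, size = n, replace = TRUE)
  soft_threshold(cor(a[rows], b[rows]), lambda)
})

quantile(boot_est, probs = c(0.05, 0.95))  # quantiles of the bootstrapped sampling distribution
mean(boot_est != 0)                        # proportion of bootstraps with a nonzero estimate
```

Reporting both numbers side by side, as in the plot above, keeps the two pieces of information apart: how large the estimate tends to be, and how often it survives shrinkage at all.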

And as announced a while ago on Twitter, Sacha has recently implemented a function somewhat similar to what Jonas had put together in bootnet 1.1. This version of bootnet is currently available on github, and should be on CRAN soon. As Sacha explained in the blog, you can now plot the quantile intervals only for the times the parameter was not set to zero, in addition to a box indicating how often the parameter was set to zero.

```
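# Install the development version of bootnet from GitHub
install.packages("devtools")
library("devtools")
install_github("sachaepskamp/bootnet")
library("bootnet")
library("psych")
# Load the BFI example data (in recent psych releases the data may live in psychTools)
data(bfi)
# Estimate a regularized network on the first five items
network1 <- estimateNetwork(bfi[,1:5], default = "EBICglasso")
# Bootstrap the network 500 times, using 8 cores
boot1 <- bootnet(network1, nBoots = 500, nCores = 8)
# Plot intervals only for nonzero estimates, plus how often each edge was set to zero
plot(boot1, plot = "interval", split0 = TRUE, order = "sample", labels = FALSE)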
```

The above code will lead to a few warnings (ignore for the purpose of this tutorial^{3}), and leads to the following figure:

The saturation is proportional to how often an edge was included in the network. The figure doesn't scale too well at present (i.e. to more than 5 or 10 nodes), but it's something you'd likely report in the supplementary materials anyway, and not in the main part of your paper.

Thanks to Sacha and Jonas for the work they've put into this. Oh, and you know what's also new? Bootnet estimates and tells you how long your coffee break should be ...

The post APS 2018: Collection of all network presentations appeared first on Psych Networks.

Large conferences like APS can sometimes be too generalist and too broad, and can lack detailed information and a focus on more specialist questions. This was not the case at APS regarding network modeling. I counted 5 symposia on network-related topics, and several more in which network talks were featured. It felt like a mini-conference on networks, with a ton of familiar faces in the symposia, and also a ton of new faces. There were many insightful talks, but also very in-depth discussions with many very well informed audience members. Not only were there numerous in-depth talks; together, the presentations also covered a very wide range of topics, ranging from the equivalence between factor and network models, nomothetic analyses in cross-sectional and longitudinal data, idiographic analyses in time-series data, and clinical trials with interventions based on network models, all the way to numerous statistical extensions such as novel centrality indices, or entirely new network models.

I was most excited to see several talks about the integration of 1) new data assessment strategies (e.g. via smartwatches); 2) new methodological tools to analyze such data; and 3) empirical studies in everyday clinical practice. When we submitted our own symposium on the topic to APS, we ended the symposium title with a question mark: “From description to intervention: Can network models based on ambulatory assessments provide novel treatment targets?” At APS, I realized how many groups in the world are working on this intersection of methodology and clinical psychology. This is crucial and will allow us to actually test and try to falsify network theory, and will show us how useful the framework really is. But talks were not limited to clinical psychology and methodology – personality is also becoming a hot topic that network models are applied to more often.

Here is a brief summary of all talks & presenters; the order is simply the order by which I obtained the presentations. You can find all slides on the Open Science Framework^{1}.

- Richard McNally | Bayesian Network Analyses of Symptoms in Patients with Bipolar Disorder
- Riet van Bork | Simplicity of networks and factor models
- Adela Isvoranu | Big5 In Schizophrenia: Personality through Exploratory Structural Equation Modeling and Network Analysis
- Alexandre Heeren | Mapping Network Connectivity among Symptoms of Social Anxiety and Comorbid Depression in People with Social Anxiety Disorder
- Sacha Epskamp | Intra-individual Networks and Latent Variable Models
- Sacha Epskamp | Personalized Networks in Clinical Practices: Recent developments, Challenges and Future Directions.
- Emorie Beck | Idiographic Personality: A Methodological Perspective on Measuring and Modeling Individuals
- Adriene Beltz | Behavioral Networks in Oral Contraceptive Users: Exploring Ovarian Hormone Links to Gendered Cognition and Personality Qualities
- Casper Albers | Changing Individuals Modelling smooth and sudden changes in temporal dynamics
- Aaron Fisher | Data-Driven Case Conceptualization: Applying Research to Routine Care
- Payton Jones | Bridge centrality: identifying bridge symptoms in psychopathology networks
- Charlotte Vrijen | Personalized interventions based on experience sampling can effectively improve pleasure
- Tim Kaiser | Intersession Processes in Psychotherapy
- Oisín Ryan | Centrality and Interventions in Continuous-Time Dynamical Networks
- Siwei Liu | Can We Use the Random Effects Estimates in Multilevel Models to Characterize Individuals?
- Benjamin Bellet | Bereavement Outcomes as Causal Systems
- Angelique Cramer | The baby and the bathwater: The promise of both nomothetic and idiographic (network) modeling
- Aidan Wright | Toward an Individualized Psychology: Promises and Challenges in Modeling the Individual
- Katherine Jonas | A Comparison of Network Models and Latent Variable Models for Longitudinal Data
- Date van der Veen | Prel@pse– Preventing Relapse in OCD, a proof of principle study
- Julia Möller | Mixed emotions about school: A co-endorsement network analysis of positive and negative emotions

In case you are interested, I wrote two other posts about APS 2018 on my personal blog: The first covers issues with transparency, inclusion, and open science at APS; the second summarizes our APS symposium entitled “Measurement Schmeasurement”, featuring talks by Jessica Flake, Mijke Rhemtulla, Andre Wang, Scott Lilienfeld, and yours truly.

I also want to highlight briefly that psych-networks.com is transitioning more into a community platform, featuring many guest bloggers. I want this site to become a hub of communication among network researchers in psychology, where they can post new papers, new ideas, discuss hot topics, and so forth. So if you want to write something, please contact me, and I’d be very happy to see if we can make it work! This opportunity is meant for everybody, from very early career researchers all the way to professors. Since most guest bloggers last year were male, I would like to feature more female guest bloggers … help me make it happen!

The post New paper on the role of stabilizing and communicating symptoms appeared first on Psych Networks.

As two graduate students in the Psychological Methods department at the University of Amsterdam, we were familiarized with the work of Cramer and Borsboom on conceptualizing mental disorders as complex networks of interacting symptoms. This conceptualization signifies the role of symptoms and their interactions within and across disorders, and has inspired novel theoretical definitions of clinical concepts such as core symptoms and comorbidity^{1}.

We often found ourselves discussing the potential of tools and metrics from other research areas using network analytic techniques. In the summer of 2016 we came across Santo Fortunato’s Community detection in graphs (2010) – an excellent paper on various applications and implications of network analytic techniques^{2}. One specific sentence caught our attention:

“Identifying modules and their boundaries allows for a classification of vertices, according to their structural position in the modules. So, vertices with a central position in their clusters, i.e. sharing a large number of edges with the other group partners, may have an important function of control and stability within the group; vertices lying at the boundaries between modules play an important role of mediation and lead the relationships and exchanges between different communities.” (p. 3)

Reading this passage immediately sparked a discussion on the numerous possibilities of utilizing the community detection toolbox to develop empirical definitions of these theoretical concepts. The notion of “vertices with a central position within their cluster […] may have an important function of control and stability within the group” can readily be translated to the idea of core symptoms. Similarly, the idea that “vertices lying at the boundaries between modules play an important role [… in] exchanges between different communities” can be mapped onto the theoretical definition of comorbidity within the network perspective on psychopathology.

In our paper, entitled “The role of stabilizing and communicating symptoms given overlapping communities in psychopathology”, we aspired to complement the statistical toolbox of the network approach to psychopathology by exploring what overlapping community detection analysis has to offer. Using community detection and inspecting the differential role of symptoms within and between communities offers a framework to study the clinical concepts of comorbidity, heterogeneity and hallmark symptoms. Symptoms with many and strong connections within a community, defined as stabilizing symptoms, could be thought of as the core of a community, whereas symptoms that belong to multiple communities, defined as communicating symptoms, facilitate the communication between problem areas.

We applied community detection to a large dataset (N=2089) assessing a variety of psychological problems using the Symptom Checklist 90. We identified 18 communities of closely related symptoms. Importantly, these communities are empirically derived instead of theoretically defined. In the paper we illustrate how the proposed definitions on the differential role of symptoms can inform us on the structure of the psychopathological landscape: both globally as well as locally. As such, we adopted established metrics in network science to accelerate our understanding of the psychopathological landscape.

Figure 1. Illustration of (a) the local structure of Feelings of Worthlessness community, (b) its connection to other communities; and (c) a symptom-level example of its connection to the community Worried about Sloppiness.

From our perspective, this endeavour highlights that diving into the world of network science across all kinds of research areas can inspire great advances for the toolbox we use to study psychopathology networks. Drawing inspiration from fields concerned with complex systems such as brain networks, economic networks and social networks, the options seem infinite – and we cannot wait to explore them.

**Footnotes:**

The post Estimating psychological networks via Information Filtering Networks appeared first on Psych Networks.

Markov Random Fields (MRF) have quickly become the state-of-the-art in psychological network modeling for obtaining between-subjects networks. The implementation for binary data is called the *Ising Model*, and for continuous or ordinal data, *Gaussian Graphical Models* (GGMs) have been used^{1}. The beauty of these models is that a zero entry between two variables in the adjacency matrix (i.e. the matrix that encodes the parameters that we then plot as networks) means that the two variables are conditionally independent, given all other variables.
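For the Gaussian case, this property can be checked directly in base R. The sketch below uses a hypothetical three-node chain A - B - C with made-up parameter values: the zero between A and C sits in the inverse covariance (precision) matrix, and the partial correlations are obtained by standardizing that matrix and flipping the sign.

```r
# Hypothetical precision (inverse covariance) matrix for a chain A - B - C:
# the zero in position (A, C) encodes "no edge between A and C"
K <- matrix(c( 1.0, -0.4,  0.0,
              -0.4,  1.0, -0.4,
               0.0, -0.4,  1.0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

Sigma <- solve(K)              # implied covariance matrix
marginal <- cov2cor(Sigma)     # ordinary (marginal) correlations

# Partial correlations: standardize the precision matrix and flip the sign
partial <- -cov2cor(K)
diag(partial) <- 1

round(marginal, 3)  # A and C are correlated ...
round(partial, 3)   # ... but conditionally independent given B
```

A and C are related in the ordinary correlation matrix only because both are connected to B; once B is conditioned on, nothing is left, which is why the network shows no A - C edge.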

In a new paper entitled “Network Structure of the Wisconsin Schizotypy Scales-Short Forms: Examining Psychometric Network Filtering Approaches”, Christensen et al. (2018)^{2} introduce Information Filtering Networks (IFNs) to the psychological literature, and compare them to lasso regularized models. Like MRFs, IFNs are partial correlation networks, and the two models differ mainly in one key aspect: addressing a common challenge that Christensen et al. describe very well:

Networks contain multiple connections across all possible pairs of variables (e.g., symptoms, items) included in the model and therefore are likely to have spurious edges (i.e., multiple comparisons problem). Thus, filtering is necessary to minimize spurious connections and to increase the interpretability of the network. This, however, introduces a problem known as sparse structure learning (Zhou, 2011): How best to reduce the complexity and dimensionality of the network while retaining relevant information?

This is a longstanding problem, and many different solutions have been proposed. In MRFs based on lasso regularization, the number of edges is determined largely by fit (i.e. minimizing the extended BIC, see our regularized partial correlation network tutorial). This has a number of advantages: it controls for multiple testing (i.e. for the numerous regressions that the model estimates under the hood); the procedure results in a parsimonious/sparse network structure that is somewhat easier to interpret; and putting coefficients to exact zero means they need not be estimated anymore, which reduces the number of parameters. The default lasso procedure sacrifices sensitivity for specificity, meaning that edges in the estimated network are also very likely in the data, but that some (weak) edges in the data might not be recovered by the lasso.

Christensen et al. consider the lasso as “biased” because edges included in MRFs are a function of sample size. I wouldn’t call this a bias, but it is correct that the lasso puts even moderately large edges to zero in situations of low power because it cannot reliably distinguish these edges from zero, whereas the lasso will put nearly no edges to zero in extremely large samples because it can very reliably distinguish even tiny edges from zero^{3}. I think about the lasso as a feature that works very similarly to standard statistical methods used in psychological research: if a correlation of 0.2 is estimated in a small sample with little power, its confidence intervals (CIs) are large and often overlap with 0 [CI -0.6; 0.8]. In this case, we treat the coefficient as not significantly different from zero. But if we have sufficient power, a correlation coefficient of 0.2 might well be distinguishable from zero [CI 0.1;0.3].
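The analogy with ordinary confidence intervals can be sketched with the Fisher z-transformation in base R. The helper `cor_ci()` and the two sample sizes are my own illustrative choices, so the resulting intervals are not the ones quoted above; the point is only how strongly the interval width depends on n.

```r
# Approximate 95% CI for a correlation via the Fisher z-transformation
cor_ci <- function(r, n, level = 0.95) {
  z <- atanh(r)                         # Fisher z
  se <- 1 / sqrt(n - 3)                 # standard error on the z scale
  crit <- qnorm(1 - (1 - level) / 2)
  tanh(c(lower = z - crit * se, upper = z + crit * se))
}

cor_ci(r = 0.2, n = 20)    # small sample: interval includes zero
cor_ci(r = 0.2, n = 1000)  # large sample: interval excludes zero
```

The same coefficient of 0.2 is indistinguishable from zero in the small sample and clearly different from zero in the large one, which mirrors how the lasso treats edges at different sample sizes.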

How are edges chosen in the IFNs, if not based on the lasso? The paper by Barfuss et al. 2016^{4} provides a fantastic introduction to IFNs, and also covers the lasso, ridge regression, and the elastic net.

IFNs deal with the issue of minimizing spurious relations not by choosing the edges based on fit to the data. Instead, IFNs estimate a fixed number of edges based on the formula “3 * nodes – 6”. A network with 20 nodes has 54 (out of 190; ~28%) edges, and the networks the authors estimate in the paper, with 60 nodes, have 174 (out of 1770 potential; ~10%) edges. I see two main challenges here.

First, in case two networks differ from each other (i.e. one has many connections, the other fewer), estimating the same number of edges in both networks might artificially inflate similarity between the structures, or it might lead to the opposite. Without simulation studies, which the paper does not contain, we do not know, and conclusions are premature. Second, I could find no rationale why “3 * nodes – 6” would be a reasonable formula for the number of edges we expect in psychological network structures. There might be some general rules that emerge across psychological networks, and maybe it turns out to be a good approximation. But it seems a strong assumption to me. The procedure is similar to running a linear regression with 10 predictors and determining before looking at your data that 3 will be different from 0. It is also worth noting that IFNs get sparser with a larger number of nodes: with k=10 nodes, 24 of 45 edges are estimated (about 53%), but with k=100, 294 of 4950 edges are estimated (6%).

Below is a visualization of the relationship between the number of nodes and the sparsity (i.e. % of estimated IFN edges in relation to all potential edges):

```
# Number of nodes to consider
nodes <- 5:100
ifn_edges <- 3 * nodes - 6            # edges in IFN
all_edges <- nodes * (nodes - 1) / 2  # all potential edges in network
# Proportion of all potential edges that the IFN estimates
plot(nodes, ifn_edges / all_edges, ylab = "Sparsity", xlab = "Number of Nodes",
     main = "Sparsity of Information Filtering Networks")
```

This behavior serves the goal of estimating sparse network structures. But it's also easy to think of scenarios in which IFNs will do a bad job at recovering the true network structure. For instance, imagine a scenario where the true network structure has many nodes and is dense: the IFN will always lead to a very sparse network. Then again, we can envision scenarios as well where the lasso would not perform well, such as a dense true structure estimated in a small sample.

There are some statements in the paper I disagree with, and I post them here as post-publication review in the hopes that it will lead to a dialogue with the authors so we can resolve potential misunderstandings together.

First, Christensen et al. write that in a case of a lot of shared variance among items, the focus of MRFs on unique variances will remove the shared variances, leaving items disconnected^{5}. They reference the paper of Forbes et al. 2017 discussed previously that was written under the same assumption. But the opposite is the case: if all items share a lot of variance, i.e. if a unidimensional factor model describes the data well, the network will be *fully connected*. In other words, if you simulate data from a unidimensional factor model, you get a fully connected, not an empty network model; this has been shown many times both in simulation studies and in mathematical proofs. See below where we first simulate data (n=300) from a unidimensional factor model, and then fit a MRF to the data that is fully connected, resulting in considerable partial correlations (all code available here).

```
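# Lavaan simulation
library("lavaan")
library("bootnet")
# Population model: one factor with six indicators
population.model <- ' f1 =~ x1 + 0.8*x2 + 1.2*x3 + 0.7*x4 + 0.5*x5 + 0.8*x6'
set.seed(1337)
myData <- simulateData(population.model, sample.nobs=300L)
# Population moments versus sample moments
fitted(sem(population.model))
round(cov(myData), 3)
round(colMeans(myData), 3)
# Fit the unidimensional factor model to the simulated data
myModel <- ' f1 =~ x1 + x2 + x3 + x4 + x5 + x6'
fit <- sem(myModel, data=myData)
summary(fit)
# Estimate and plot the regularized partial correlation network
network2 <- estimateNetwork(myData, default="EBICglasso")
plot(network2, details=TRUE)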
```
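The same point can be checked at the population level, without simulation noise or regularization, in base R. This is a sketch under an assumption: I take the factor variance and all residual variances to be 1, which I understand to be what `simulateData()` uses by default for this model. The partial correlations then follow directly from the model-implied covariance matrix.

```r
# Loadings from the population model above; factor variance and residual
# variances are assumed to be 1 (an assumption about the simulation defaults)
lambda <- c(x1 = 1, x2 = 0.8, x3 = 1.2, x4 = 0.7, x5 = 0.5, x6 = 0.8)

# Model-implied covariance matrix of a one-factor model
Sigma <- tcrossprod(lambda) + diag(length(lambda))
dimnames(Sigma) <- list(names(lambda), names(lambda))

# Partial correlations from the inverse covariance matrix
K <- solve(Sigma)
partial <- -cov2cor(K)
diag(partial) <- 1

round(partial, 3)
# Every off-diagonal entry is nonzero: the network is fully connected
all(partial[upper.tri(partial)] != 0)
```

Under these assumptions every pair of items keeps a positive partial correlation, so a unidimensional factor model implies a fully connected network rather than an empty one.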

Second, the authors state that "The shrinkage of correlations below a certain threshold [when using the lasso] also contributes to reduced reproducibility because variables can be eliminated based on statistical significance rather than theory." The lasso does not eliminate variables, it eliminates edges. The goal is to estimate a model that describes the data well, whilst avoiding the estimation of spurious relations by finding a good balance between false positives and false negatives. IFNs do the same: they are data-driven models that differ from MRFs in that they use a different strategy to obtain a parsimonious structure. So if there is any criticism regarding theory (which is an argument one can make), it applies to both models.

Third, the authors conclude that the pitfalls of the lasso-based MRFs are "biased comparability, reduced reproducibility, and the elimination of hierarchical information". With biased comparability they mean that the lasso regularizes proportional to power, which is true: if we simulate data for n=200 and n=2000, both times from the same true network structure, and estimate networks for both datasets, the network in n=200 will likely be sparser than the network in n=2000 (i.e. fewer edges), because that is how the lasso operates. It has more power in n=2000 to reliably distinguish small edges from zero, similar to t-tests or linear regressions that can more reliably detect differences (e.g. from zero) in larger samples. But this also means that this main criticism can simply be circumvented by either a) making sure sample size is similar when comparing network structures, which is commonly done^{6}, or b) by using permutation tests that take sample size into account when comparing network structures, which the Network Comparison Test, developed exactly for this purpose, does. The second point, "reduced reproducibility", is primarily based on assertions of Forbes et al. 2017, all of which have been thoroughly refuted^{7}. Christensen et al. add to the argument by comparing 2 network structures of 2 datasets, and find that IFNs have higher replicability than MRFs^{8}. Even if we, for the sake of the argument, do not object to the way Christensen et al. conduct the comparison, the conclusion that IFNs replicate better in this specific case allows no conclusion whatsoever about replicability of models in general. And obviously, the authors retrieve the same number of edges across the two network structures because IFNs a priori estimate the same number of edges in case the number of nodes is the same. For their last point, "elimination of hierarchical information", I do not understand how IFNs get around that.

Finally, it is odd to read Christensen et al.'s repeated criticism of partial correlations and conditional dependence relations … given that the network model they put forward is a partial correlation network / conditional independence network.

In general, what I would have loved to see in the paper is a simulation study that actually shows how IFNs perform, which is necessary to vet any proposed methodology. That is, it is crucial to show that if you simulate data from a known structure X, your methodology will do well in recovering that structure.

I'm extremely thankful Christensen et al. 2018 brought IFNs to the world of psychological network modeling. From my perspective, we have two different approaches, and it is premature to conclude that one approach is inherently superior to the other; this goes both ways, obviously. The benefits likely depend on the context, such as the true network structure the data come from, prior knowledge about the network structure, the sample size, and the number of nodes.

Sacha Epskamp, who programs faster than light, has implemented IFNs in our R-package bootnet and sent around example code I will paste below. This was possible because Christensen et al. implemented the estimation routine in the package NetworkToolbox.

You can estimate the models via `estimateNetwork(default="TMFG")`. The code below estimates an MRF and an IFN on the same data, and compares them superficially. As dataset we use the BFI data that are openly available.

```
# Install packages:
devtools::install_github("sachaepskamp/bootnet")
library("NetworkToolbox")
library("bootnet")
library("psych")
library("qgraph")
# Estimate networks, first a Gaussian Graphical Model, then an Information Filtering Network:
data(bfi)
LassoNetwork <- estimateNetwork(bfi[,1:25], default = "EBICglasso")
TMFGNetwork <- estimateNetwork(bfi[,1:25], default = "TMFG")
# Average Layout so networks can be compared:
Layout <- averageLayout(LassoNetwork, TMFGNetwork)
# Plot both networks using the same layout:
layout(t(1:2))
plot(LassoNetwork, layout = Layout, title = "EBIC glasso")
plot(TMFGNetwork, layout = Layout, title = "Triangulated Maximally Filtered Graph")
```

I've worked a lot on stability of network models in recent years, together with Sacha Epskamp, and a natural question that follows this work is: how stable are IFNs, and how stable are they compared to MRFs?

One quick look in one dataset — obviously, this does not generalize to anything but this specific dataset — leads to fairly low stability in some edges, but excellent stability in others, which is not surprising. Imagine you have 20 nodes in a network, with 30 strong edges, 30 moderate edges, 30 weak edges, and 100 absent edges. The IFN will always estimate 20*3-6 = 54 edges. This means it will correctly estimate the 30 strong edges, but then pick 24 of the 30 equally strong moderate edges. Every time you bootstrap the network, you will pick a random selection of 24 moderately strong edges. Overall, the estimation of the very strong and very weak edges will be stable (i.e. always very similar when bootstrapping), but moderately strong edges will be estimated with low precision.
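
The thought experiment above can be sketched in a few lines of base R. This is not a TMFG implementation: it only reproduces the edge count 3 * 20 - 6 and, as a stand-in for bootstrapping, fills the remaining slots with a random subset of the equally strong moderate edges. The seed and the number of replications are arbitrary choices.

```r
set.seed(1)  # arbitrary seed for reproducibility

n_nodes <- 20
n_strong <- 30
n_moderate <- 30

# A TMFG on p nodes always contains 3 * p - 6 edges
tmfg_edges <- 3 * n_nodes - 6

# Slots left for moderate edges once all strong edges are included
moderate_slots <- tmfg_edges - n_strong

# Stand-in for bootstrapping: in each replication, a random subset of the
# equally strong moderate edges fills the remaining slots
n_boots <- 1000
selected <- replicate(n_boots, {
  seq_len(n_moderate) %in% sample(n_moderate, moderate_slots)
})

# Proportion of replications in which each moderate edge was included
inclusion <- rowMeans(selected)
summary(inclusion)
```

Each moderate edge is included in roughly 24 out of 30 replications, so none of them is estimated as reliably present or reliably absent.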

Estimating and bootstrapping MRF vs IFN using the BFI dataset leads to the following edge weights and CIs (again, codes & graph from Sacha)^{9}:

```
boot1lasso <- bootnet(LassoNetwork, nBoots =1000, nCores = 8)
boot1tmfg <- bootnet(TMFGNetwork, nBoots =1000, nCores = 8)
plot(boot1lasso, labels = FALSE, order = "sample")
plot(boot1tmfg, labels = FALSE, order = "sample")
```

So there is plenty of future work to do, and I hope the next paper on IFNs will contain some simulation studies to vet their performance in different situations.

**Acknowledgements**

We discussed the paper in the labgroup, and the blog post is a summary of many points raised there. Obviously, all mistakes in the blog post are my mistakes only.

**Footnotes**

The post Estimating psychological networks via Information Filtering Networks appeared first on Psych Networks.

]]>The post FAQ on network stability, part II: Why is my network unstable? appeared first on Psych Networks.

]]>This blog post is the second part of the series, and highlights issues related to stability or accuracy of network models. It’s largely based on our Tutorial on Regularized Partial Correlation Networks forthcoming in *Psychological Methods*, and on a recent discussion on Facebook. Many of the points below were raised by Payton Jones, Denny Borsboom, and Sacha Epskamp, so all credit to them. This is just an accessible summary.

As described in the Facebook discussion, there are several common reasons for stability problems, which I go through below.

As you know by now, model stability is closely tied to power, and power comes from 1) more participants and 2) fewer nodes in the network (because this means you have fewer parameters to estimate)^{2}. If you have a highly unstable model, with parameters that are all over the place, the reason is probably the same as for factor models, regressions, and t-tests: too few participants for the number of parameters you estimate.
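
To get a feeling for how quickly the number of parameters grows with the number of nodes, here is a minimal sketch that counts only the pairwise edges; depending on the model, additional parameters (such as thresholds or intercepts) come on top. The helper name `n_edge_parameters` is made up for this illustration.

```r
# Number of pairwise edge parameters in a network with p nodes
n_edge_parameters <- function(p) p * (p - 1) / 2

nodes <- c(10, 20, 40)
data.frame(nodes = nodes, edges = n_edge_parameters(nodes))
```

Doubling the number of nodes roughly quadruples the number of edges to estimate, which is why adding nodes costs power so quickly.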

Outliers can lead to problems, especially in small samples. Remember that the bootstrapping routines we use in the R-package bootnet to look at the stability of your network model resample your data. If parameters differ a lot depending on whether the few people with severe outliers are included in the sample or not, then you might end up with imprecise results^{3}.

Network models as currently implemented often use regularization to err on the side of sparsity: This puts edge coefficients exactly to zero. Think about this as some sort of threshold that edges must reach. If they don’t, they are put to zero^{4}. If you have many edges that are just barely above this threshold, and you use bootstrapping routines, it’s likely that each time you estimate your network based on bootstrapped data, different weak edges survive regularization, which will lead to an unstable network.
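
The threshold intuition can be illustrated with a small base R sketch. It is deliberately simplified: instead of the lasso, it uses a hypothetical hard threshold on a single Pearson correlation, placed just below the value observed in the full sample, and checks how often the edge survives across bootstrap samples. All numbers (sample size, effect, threshold, seed) are assumptions for illustration.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Simulate two weakly related variables (assumed values for illustration)
n <- 300
x <- rnorm(n)
y <- 0.2 * x + rnorm(n)

# Hypothetical hard threshold placed just below the observed correlation,
# so the edge barely survives in the full sample
observed <- abs(cor(x, y))
threshold <- observed - 0.01

# Re-estimate the correlation in bootstrap samples and check survival
n_boots <- 1000
survives <- replicate(n_boots, {
  idx <- sample(n, replace = TRUE)
  abs(cor(x[idx], y[idx])) > threshold
})

# Share of bootstrap samples in which the edge is retained
mean(survives)
```

An edge that sits barely above the threshold is retained in some bootstrap samples and dropped in others, which is the instability described above.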

Relatedly, regularization assumes that your true network is sparse. If this is not the case (i.e. if your true network is dense), and especially when edge weights are similar to each other, this can result in estimation and stability issues.

Imagine the results of bootstrapping show that your centrality coefficients are estimated unreliably, or, put differently, that each time you take a subsample of your sample, the order of centrality estimates comes out differently. Usually, this implies a problem, and there are many possible reasons for it. The first explanation we usually think of is that there is a clear order of centrality in the true model (node A -> node B -> node C -> node D), where A is the most central node and D the least central node, and that we lack the power to recover this order reliably. But there is a second reason that I want to highlight here: imagine you have a strongly connected true network where all 4 nodes are roughly equally central, so A = B = C = D. What happens if you bootstrap this? With every bootstrap sample, sampling variability would lead to a different centrality order, and the results would appear highly unstable, when in truth this instability merely reflects sampling error around equal centralities. Not knowing the true model, there is no way for us to find out which of the two situations we are in, but it seems important to add that information here.
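
The second scenario is easy to simulate in base R. The sketch below assumes a true model in which 4 nodes are all equally correlated (r = 0.3, an arbitrary choice) and, for simplicity, computes strength from plain correlations rather than from a regularized partial correlation network. It then records which node comes out as most central in each bootstrap sample.

```r
set.seed(3)  # arbitrary seed for reproducibility

# True model: 4 nodes that are all equally correlated (assumed r = 0.3),
# so every node has the same true strength centrality
p <- 4
sigma <- matrix(0.3, p, p)
diag(sigma) <- 1
n <- 500
dat <- matrix(rnorm(n * p), n, p) %*% chol(sigma)
colnames(dat) <- c("A", "B", "C", "D")

# Strength as the sum of absolute correlations with all other nodes
strength <- function(d) {
  r <- abs(cor(d))
  diag(r) <- 0
  colSums(r)
}

# Record which node comes out as most central in each bootstrap sample
n_boots <- 1000
top_node <- replicate(n_boots, {
  idx <- sample(n, replace = TRUE)
  names(which.max(strength(dat[idx, ])))
})

# Share of bootstrap samples in which each node ranks first
table(top_node) / n_boots
```

Which node ranks first changes from one bootstrap sample to the next, even though no node is more central than the others in the true model.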

Closeness in a network becomes 0 if a network has at least one node that is unconnected to the rest. Imagine you have a very weakly connected network: It is possible that in half of the bootstrapped networks, one node is unconnected. Since we look at the similarity of centrality across bootstraps in bootnet, this would result in a very low stability coefficient. Similarly, since we drop cases in the bootstrapping routines (to determine if the same centrality order emerges when subsetting the data), this can lead to unconnected nodes, which dramatically reduces closeness stability.
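
Why a single unconnected node pulls closeness to 0 can be seen in a small base R sketch. It assumes closeness is defined as the inverse of the summed shortest path lengths to all other nodes, with edge length taken as the inverse of the edge weight; the function below is a hand-written illustration, not the code used in bootnet or qgraph.

```r
# Closeness as 1 / (sum of shortest path lengths to all other nodes),
# with edge length defined as the inverse of the absolute edge weight
closeness <- function(w) {
  d <- ifelse(abs(w) > 0, 1 / abs(w), Inf)
  diag(d) <- 0
  p <- nrow(d)
  # Floyd-Warshall algorithm for all shortest paths
  for (k in seq_len(p)) {
    for (i in seq_len(p)) {
      for (j in seq_len(p)) {
        d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
      }
    }
  }
  1 / rowSums(d)
}

# Weakly connected chain of 4 nodes: 1 - 2 - 3 - 4
w <- matrix(0, 4, 4)
w[1, 2] <- w[2, 1] <- 0.3
w[2, 3] <- w[3, 2] <- 0.3
w[3, 4] <- w[4, 3] <- 0.3
closeness_connected <- closeness(w)

# Same network after the only edge of node 4 drops out
w_isolated <- w
w_isolated[3, 4] <- w_isolated[4, 3] <- 0
closeness_isolated <- closeness(w_isolated)

closeness_connected
closeness_isolated
```

Once node 4 is unreachable, every node has an infinite distance to it, so the summed distances are infinite and closeness is 0 for all nodes, not just for the unconnected one.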

Betweenness centrality may be unstable if there are (a) multiple plausible shortest paths connecting for instance two communities X and Y, and if (b) these multiple shortest paths are roughly equally strong.

Here is an example of this situation (code and full output available here) that Sacha just put together for this blog post; the code also nicely functions as a tutorial on how to set up your own brief simulation study, using the `netSimulator()` function we described in our recent tutorial paper on network power estimation.

First we create a network structure that has 2 bridges, and simulate a dataset with n=5000. This should give us ample power for stable estimation.

As expected, network estimation looks highly stable; the correlation of the estimated network with the true network is high, and sensitivity and specificity are good (run the codes yourself to see the corresponding plots). The correlation stability for strength centrality is 0.44, meaning you can drop 44% of your dataset and still retain a correlation of about 0.7 between the order of strength centrality in your subsampled data/network and the order of strength centrality in your full data/network.

For Betweenness, however, the centrality stability coefficient is 0. Why? Every time you bootstrap and estimate a network structure, one of the two edges connecting the two communities is likely going to be a tiny bit larger than the other, meaning its nodes have very high betweenness centrality (while the nodes of the other edge get a very small betweenness centrality). Which edge wins varies across bootstraps, leading to highly unstable betweenness results.

We stumbled across this when analyzing a dataset of about 8000 participants; results are described in detail here (pp. 5 and 6). To investigate this further, we dropped 5% of the sample 1000 times in this dataset, and plotted Betweenness for all items:

As you can see, items V6, V8, V11 and V14 (which were the 4 items forming the two bridges across communities) showed pronounced Betweenness, leading to a centrality correlation coefficient of 0.

When you estimate the correlations of your items as input for the Gaussian Graphical Model^{5} via polychoric correlations, this can lead to problems in case you have (a) a small dataset and (b) very skewed items / infrequent categories. This leads to zeroes in the marginal crosstables, resulting in unstable results. For instance, here we show bootstrap results of highly unstable edges even though N is high (if you cannot access the paper due to the paywall, see here). Even if this problem does not exist in your raw data, keep in mind that it might be introduced once you start bootstrapping the data, because, as Sacha wrote:

Bootstrapping will reduce entropy, as the collection of all unique outcomes in your bootstrapped dataset is by definition equal or smaller than the collection of all unique outcomes in your raw dataset.
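
A quick base R check illustrates this point. The item below is made-up data: a skewed ordinal item where the highest category is endorsed by only 2 of 200 people. The sketch counts how many unique outcomes remain in each bootstrap sample.

```r
set.seed(4)  # arbitrary seed for reproducibility

# Skewed ordinal item (assumed data): category 3 is endorsed by 2 of 200 people
item <- rep(c(0, 1, 2, 3), times = c(150, 40, 8, 2))

# Count the unique outcomes in each bootstrap sample
n_boots <- 1000
n_unique <- replicate(n_boots, {
  length(unique(sample(item, replace = TRUE)))
})

# No bootstrap sample can contain more unique outcomes than the raw data
max(n_unique) <= length(unique(item))

# Share of bootstrap samples that lost at least one category entirely
mean(n_unique < length(unique(item)))
```

Whenever a rare category disappears from a bootstrap sample, the corresponding cells in the crosstables are empty, which is exactly the situation in which polychoric correlations become unstable.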

This is related to skip questions. Especially in large epidemiological datasets, it is quite common to skip certain symptoms depending on others. For instance, if a person does not meet at least one of the two core symptoms for Major Depression, the other 7 secondary symptoms are usually not queried. These missing values are commonly replaced by zeros (zero-imputation), which can lead to considerable problems described in more detail here. These problems are also visible in stability analysis. When we bootstrapped a very large dataset (that we expected to be very stable), we found that the core items that determine the skip were extremely unstable (the grey areas in the plot below indicate the 95% CI of the edge weight parameter estimates; x-axis is parameter strength, y-axis the node in question).

As Denny Borsboom stated:

The most important dark horse is network structure. Some network structures are very hard to estimate even with massive datasets and others work very well at small sample sizes. E.g. Isingfit has trouble recovering scale free networks even at very high sample sizes, see https://www.nature.com/articles/srep05918.

A scale-free network is a specific type of network structure where the degree distribution follows a power law. The paper referenced above shows that IsingFit, the R package developed to estimate Ising Models in psychological data, performs very well if data come from random or small world networks, but does not perform well when data were generated from scale-free networks:

The above pattern of results, involving adequate network recovery with high specificity and moderately high sensitivity, is representative for almost all simulated conditions. The only exception to this rule results when the largest random and scale-free networks (100 nodes) are coupled with the highest level of connectivity. In these cases, the estimated coefficients show poor correlations with the coefficients of the generating networks, even for conditions involving 2000 observations […].

Without stability analysis, inference cannot follow, which is why we previously suggested adding stability as a third step to the network psychometrics routine: network estimation, network inference, network stability. But as pointed out in the recent Winter School here in Amsterdam, it makes more sense to change the order to network estimation, network stability, network inference, because inference should only follow once stability has been established.

There is another benefit to stability analysis: You might not notice that there were problems in the network estimation itself, such as zeroes in the marginal cross-tables, but these often come to light in the stability analysis. As such, you can use the bootnet routines as a way to identify potential issues in your data as well.


]]>The post Collection of PTSD network papers & recent conference talks appeared first on Psych Networks.

]]>Last year, 15 articles using network analytic methods in PTSD/psychotraumatology research were published, marking a 150% increase in publications compared to the year 2016. Furthermore, the European Journal of Psychotraumatology published a special issue on “Symptomics”, with several articles and an editorial on network analysis in psychotraumatology. These publications reflect an increasing interest in network analysis in psychotraumatology research. The number of panels, presentations and posters at the 33rd annual meeting of the International Society for Traumatic Stress Studies (ISTSS) in Chicago in November 2017 also demonstrated that network analysis is currently one of the “hot topics” in this field^{1}.

For those who could not attend the 33rd ISTSS meeting or want to go through the presentations again, I have started to collect presentations and posters from the speakers. Many of the speakers agreed to share their slides or poster. These resources can be found on the OSF. The names of the speakers and their talks are also provided here:

- Eddinger, Jasmine: Comparison of PTSD Symptom Centrality in Two College Student Samples
- Papini, Santiago: Uncovering PTSD Symptom Network Dynamics during Treatment
- Greene, Talya: How Does PTSD Unfold in Real Time? Contemporaneous and Temporal Networks of PTSD
- Moshier, Samantha: Applying Network Theory to DSM-5 PTSD: A Comparison of Clinician- and Patient-rated Instruments
- Lueger-Schuster, Brigitte: A Network Analysis Approach to Anger Symptoms in DSM-5 PTSD and Proposed ICD-11 PTSD and Complex PTSD
- Birkeland, Marianne: Making Connections: Exploring the Centrality of Posttraumatic Stress Symptoms and Covariates after a Terrorist Attack
- Spiller, Tobias: Symptoms of Posttraumatic Stress Disorder in a Clinical Sample of Refugees

Apart from the collection of talks and posters, I have put together a list of all papers using network analytic methods in psychotraumatology research. I have now updated it for the first time, adding more publications (27 in total) and some basic information on each: the names of the authors, the journal, the sample size, the population's trauma type, and links to the paper and the supplementary materials. Thus, the reading list can be used as a starting point for a literature review or to find specific publications or supplementary materials. The current version of the reading list, which will be updated on a regular basis, can be found on the OSF or as an interactive list on ResearchGate. Updates will be announced via Twitter.

Please let me know if you want to share your slides, notice missing papers or broken links, or have general feedback.

**Footnotes:**


]]>The post 7 new papers on network replicability appeared first on Psych Networks.

]]>Our paper entitled “Replicability and generalizability of PTSD networks: A cross-cultural multisite study of PTSD symptoms in four trauma patient samples” was published a few days ago in Clinical Psychological Science (PDF). I described the results of the paper in more detail in a previous blog post. In summary, the paper, for the first time in the literature, compared estimated network structures across four different datasets. Specifically, we compared networks of PTSD symptoms across 4 moderate to large clinical datasets of patients receiving treatment for PTSD, and found considerable similarities (and some differences) across network structures, item endorsement levels, and centrality indices. See the paper & blog post for details.

» Fried, E. I., Eidhof, M. B., Palic, S., Costantini, G., Huisman-van Dijk, H. M., Bockting, C. L. H., … Karstoft, K. I. (2017). Replicability and generalizability of PTSD networks: A cross-cultural multisite study of PTSD symptoms in four trauma patient samples. Clinical Psychological Science. PDF.

It is well known that depressed patients suffer from numerous symptoms that go beyond the DSM criteria for Major Depression, such as anger, irritability, or anxiety. In 2016, we investigated^{1} whether DSM symptoms are more *central* than non-DSM symptoms in a large clinical population, and found that this was not the case.

Kendler et al. 2017 published a paper last week in the Journal of Affective Disorders that is a conceptual replication of this previous paper, in a different very large clinical sample of highly depressed Han Chinese women (the CONVERGE data); *conceptual* replication because the population in CONVERGE differs considerably from the STAR*D data from the first paper, and because item content also differed.

The results are the same: DSM symptoms were not more central than non-DSM symptoms.

This means that there is nothing special in terms of network psychometrics about DSM symptoms for depression compared to non-DSM symptoms. Which makes sense, given that the DSM symptoms were chosen largely for historic and not empirical, scientific, or psychometric reasons^{2}.

» The Centrality of DSM and non-DSM Depressive Symptoms in Han Chinese Women with Major Depression (2017). Kendler, K. S., Aggen, S. H., Flint, J., Borsboom, D., & Fried, E.I. *Journal of Affective Disorders*. PDF.

In another paper published in the same journal, van Loo et al. divided the CONVERGE sample mentioned above into 8 subgroups based on 4 variables of genetic and environmental risk: family history (present vs absent), polygenic risk score (low vs high), age at onset (early vs late), and severe adversity^{3} (present vs absent).

The network structures did not significantly differ across these 4 variables^{4}.

I was surprised by these remarkable similarities across different subgroups, which (contrasting my own work) could be interpreted in the sense of one common pathway to depression. Then again, CONVERGE is a very specific sample, with recurrent severe symptomatology, and I’m looking forward to seeing replication attempts of these results in less severely depressed samples.

» van Loo, H.M., van Borkulo, C. D., Peterson, R.E., Fried, E.I., Aggen, S.H., Borsboom, D., Kendler, K.S. (2017). Robust symptom networks in recurrent major depression across different levels of genetic and environmental risk. *Journal of Affective Disorders*. PDF.

I have covered the paper by Forbes et al. and the rebuttal by Borsboom et al. published in the Journal of Abnormal Psychology in detail already in a recent blog, and will refrain from reiterating the points here. In sum, both papers investigated the degree to which several network models replicate across 2 very large community datasets, with the following results:

And here the edge weights as a heat map to stress how strong the replication is^{5}:

» Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017). Evidence that Psychopathology Symptom Networks have Limited Replicability. *Journal of Abnormal Psychology*, 126(7). PDF.

» Borsboom, D., Fried, E. I., Epskamp, S., Waldorp, L. J., van Borkulo, C. D., van der Maas, H. L. J., & Cramer, A. O. J. (2017). False alarm? A comprehensive reanalysis of “evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger (2017). *Journal of Abnormal Psychology*, 126(7). PDF.

There was considerable disagreement among the 2 teams of authors whether these network structures replicate across the datasets. Both teams agreed, however, that readers should decide for themselves by reading both papers.

A letter published in JAMA Psychiatry last week is a non-replication of a prior finding by van Borkulo et al. 2015^{6}. I described the paper in more detail in a previous blog post; the relevant point here is that Schweren et al. 2017 split a sample into two subgroups at time 2 (treatment responders and non-responders), and then compared the time 1 networks of these two groups for connectivity (i.e. the sum of all absolute connections in the network structures). Contrasting van Borkulo et al. 2015, Schweren et al. found no significant differences across the groups. Put differently: networks replicated across the subgroups, not dissimilar to the paper by van Loo et al. 2017 above. Again, we need follow-up work on this, since the effect was in the direction predicted by van Borkulo et al. 2015, and since the statistical test used requires a lot of power to detect differences.

» Schweren, L., van Borkulo, C. D., Fried, E. I., & Goodyer, I. M. (2017). Assessment of Symptom Network Density as a Prognostic Marker of Treatment Response in Adolescent Depression. JAMA Psychiatry, 1–3. PDF.

In early 2017, *Psychological Medicine* published a network analysis of OCD and depression comorbidity authored by McNally et al., who used network models to estimate undirected and directed network structures in a cross-sectional sample of 408 adults with primary OCD.

Last week, Jones et al. 2017 published a paper that looked into the network structure of the same OCD and depression items in 87 adolescents. The publication has found a home in the Journal of Anxiety Disorders, and is entitled “A Network Perspective on Comorbid Depression in Adolescents with Obsessive-compulsive Disorder” (PDF).

Interestingly, the authors use the same items in both papers, so Payton Jones wrote a blog post in which he compares the results of both papers. Payton writes that this is not a direct replication — samples differ considerably from each other — but also notes that “the similarities (e.g., the parts that ‘replicate’) are likely to say something universal about how OCD and depression work, and the differences (e.g., the parts that ‘don’t replicate’) might tell us about what makes adults and adolescents unique (or they might be spurious – we’ll have to be careful)”.

I admit that I am surprised the authors obtained a network structure at all in such a small sample — with n=87 for 300 parameters, I would have expected the lasso to put all edges to zero — and I am even more surprised that network structures seem to resemble each other fairly well (the correlation between adjacency matrices is 0.67).

» Jones, P. J., Mair, P., Riemann, B. C., Mungno, B. L., & McNally, R. J. (2017). A Network Perspective on Comorbid Depression in Adolescents with Obsessive-compulsive Disorder. Journal of Anxiety Disorders. PDF.

A core component of establishing replicability of findings is to formally compare network structures. One way to find out whether network structures are different from each other is to use the Network Comparison Test (NCT) developed by Claudia van Borkulo that I described in a bit more detail in the last tutorial blog post. However, the NCT requires a lot of power to detect differences, so a negative result (p > 0.05) can mean (a) that there is no difference between networks, or (b) that you do not have sufficient power to detect differences. Note also that the NCT uses Pearson correlations by default, and there are many situations in which Pearson is not appropriate for your data.
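
To show the general logic of a permutation test on network connectivity, here is a minimal base R sketch. It is explicitly not the NCT implementation: it uses plain Pearson correlations as a stand-in for estimated edges, simulated data for two groups of unequal size, and the sum of absolute edge weights as the test statistic. The group sizes, number of permutations, and seed are arbitrary.

```r
set.seed(5)  # arbitrary seed for reproducibility

# Two simulated groups with 5 variables each (assumed data, same true model)
p <- 5
group1 <- matrix(rnorm(200 * p), 200, p)
group2 <- matrix(rnorm(150 * p), 150, p)

# Global strength: sum of absolute edge weights; plain Pearson correlations
# are used as a stand-in for estimated network edges
global_strength <- function(d) {
  r <- cor(d)
  sum(abs(r[upper.tri(r)]))
}

observed_diff <- abs(global_strength(group1) - global_strength(group2))

# Permutation distribution: shuffle group membership, keep group sizes fixed
pooled <- rbind(group1, group2)
n1 <- nrow(group1)
n_perms <- 500
perm_diff <- replicate(n_perms, {
  idx <- sample(nrow(pooled))
  abs(global_strength(pooled[idx[1:n1], ]) -
        global_strength(pooled[idx[-(1:n1)], ]))
})

# Share of permutations with a difference at least as large as observed
p_value <- mean(perm_diff >= observed_diff)
p_value
```

Because group sizes are kept fixed in every permutation, the reference distribution reflects the same sample size imbalance as the observed comparison.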

Therefore, we might want to complement the investigation of differences (e.g. via the NCT) with an estimate of the *similarity* of networks. One way to do that is to look at all individual edges via the NCT, and report how many of these are not different from each other. We did that in the above-referenced Clinical Psychological Science paper, where we compared each pair of the 4 networks with each other:

“Of all 120 edges for each comparison of networks, only 2 edges (1.7%; comparison networks 1 vs. 2 and 1 vs. 4) to 8 edges (6.7%; comparison networks 3 vs. 4) differed significantly across the networks, with a mean of significantly different edges across the 6 comparisons of 3.1 edges.”

That means that while, for instance, networks 1 and 2 differed significantly from each other in the omnibus NCT (not shown; i.e. structure is not exactly the same), only 1.7% of all 120 edges of networks 1 and 2 differed significantly from each other in the posthoc tests — giving us a measure of similarity^{7}.

But the most common metric for similarity of two networks in the literature is a correlation coefficient between two adjacency matrices. Since we usually compare regularized network structures, with sparse adjacency matrices where many elements are exactly zero, correlations might not be the best idea here, and it would be nice to run a simulation study to see what happens exactly, and under which conditions. I’ve used correlation coefficients as well, in several papers, because I think they offer some very rough insight, but I wanted to highlight here that this is probably not the best way to move forward.
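
For reference, this is how such a correlation between two adjacency matrices can be computed in base R. The two weight matrices are hypothetical and only serve as an illustration; note that edges which are zero in both networks enter the correlation as well.

```r
# Two hypothetical weighted adjacency matrices for the same 4 nodes
net1 <- matrix(c(0.0, 0.3, 0.0, 0.2,
                 0.3, 0.0, 0.4, 0.0,
                 0.0, 0.4, 0.0, 0.1,
                 0.2, 0.0, 0.1, 0.0), 4, 4, byrow = TRUE)
net2 <- matrix(c(0.0, 0.2, 0.0, 0.0,
                 0.2, 0.0, 0.5, 0.0,
                 0.0, 0.5, 0.0, 0.2,
                 0.0, 0.0, 0.2, 0.0), 4, 4, byrow = TRUE)

# Undirected networks are symmetric, so use each edge once (upper triangle)
edges1 <- net1[upper.tri(net1)]
edges2 <- net2[upper.tri(net2)]

# Pearson correlation between the two sets of edge weights
similarity <- cor(edges1, edges2)
similarity
```

Using only the upper triangle avoids counting every edge twice and keeps the diagonal out of the comparison.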

Another topic that deserves more attention is replicability of findings in time-series studies. For instance, Madeline Pe^{8} wrote a great paper showing that the connectivity of the temporal network structure of a depressed sample is higher than that of a healthy sample, and I’m not aware of replications of this finding. There is one group-level^{9} and one idiographic paper^{10} on critical slowing down, and I’m very curious to what degree these phenomena will replicate in other data. And then of course there are many papers fitting exploratory models to time-series data, and I am looking forward to seeing to what degree these models replicate in similar datasets. Such data-gathering and modeling efforts will also allow us to address the question of to what degree intra-individual network structures of individuals (e.g. depressed patients) are similar to each other. Preliminary evidence^{11} suggests that there are marked differences.

**Footnotes**


]]>