Psych Networks

R tutorial: clique percolation to detect communities in networks

Eiko Fried — Mon, 04 Nov 2019 16:05:23 +0000

In two previous blog posts, we identified a fundamental challenge to community detection in psychometric network analysis: Commonly used algorithms assign each node to one particular community. One of these blog posts was an R tutorial I wrote, the other a guest blog by Tessa Blanken and Marie Deserno on a new method to identify communities.

What is the problem? Let’s look at this empirical network of 20 PTSD symptoms:

While the walktrap algorithm, a commonly used method that I described previously, detects 3 communities, we can see that node D2 at the bottom arguably belongs to several communities: while it is assigned to the large red community, it also has a strong connection to D4, which in turn connects to D3. So arguably, D2 is, at least to some degree, also part of that other community.

Many of you will know by now that network models are equivalent to latent variable models under a set of conditions. If you generate data under a 1-factor model, you will find a fully connected network with one giant community. And if you simulate data from a network model that has 3 strong communities that are unconnected, a 3-factor model will describe the data very well. Denny Borsboom has described this from a more theoretical perspective in this guest blog, and Joost Kruis described his statistical paper on the topic in this guest blog.
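To make the first half of this claim concrete, here is a minimal sketch in base R. It does not simulate data; it works directly with the population covariance matrix implied by a 1-factor model. The loadings are hypothetical values chosen for illustration, and partial correlations are obtained from the inverse covariance matrix.

```r
# Hypothetical standardized loadings of 6 items on a single factor
lambda <- c(0.8, 0.7, 0.6, 0.7, 0.5, 0.6)

# Model-implied covariance matrix: Sigma = lambda lambda' + Psi,
# with residual variances chosen so that every item has variance 1
Sigma <- lambda %*% t(lambda) + diag(1 - lambda^2)

# Partial correlations from the precision matrix K = Sigma^-1:
# pcor_ij = -K_ij / sqrt(K_ii * K_jj)
K <- solve(Sigma)
pcor <- -cov2cor(K)
diag(pcor) <- 1

round(pcor, 2)

# TRUE if every pair of items is connected by a positive edge
all(pcor[upper.tri(pcor)] > 0)
```

In the population, every pair of items shares a non-zero partial correlation, which is the fully connected network referred to above. In finite samples, and especially with regularized estimation, some of the weaker edges may be estimated as zero.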

The advantage of factor models is that they can deal with cross-loadings, i.e. when an item loads on 2 or more factors¹:

Current community detection methods that do not allow such “cross-loadings” are therefore akin to simple structure in confirmatory factor models where an item can only load on one factor (i.e. belong to one community). And if there is any agreement among psychometricians, then that simple structure is rarely found in psychological data.

Clique Percolation

Enter clique percolation, a well-established method that is described, for example, in chapter 9 of the free network science book by Barabasi². Usually, the program cfinder is used for this, and it was also used in the network psychometrics paper by Tessa and Marie. Clique percolation allows us to identify nodes that belong to multiple communities.

A few days ago, Jens Lange published the R package CliquePercolation on CRAN, and finally we can use clique percolation in R! Jens provided a beautiful and detailed explanation of the package’s functionalities, and I will use some of his code and explanations here in this short tutorial, and apply it to an open clinical dataset.

Clique percolation tutorial

So let’s use empirical data and see how the method performs. Code and data for this tutorial are available here. We use a dataset of 221 military veterans for whom we have 20 PTSD symptoms based on the DSM-5; our empirical paper on this dataset is available here where you can also find more information on the sample composition, measurement of symptoms, etc.

1. PTSD network based on the DSM-5

In a first step, we simply estimate the network structure, using a regularized Gaussian graphical model, and use 4 communities based on theory: the 4 communities of symptoms described in the DSM-5.

This can be easily done in R:

 
library(bootnet)  # estimateNetwork()
library(qgraph)   # network plotting

load('data.Rdata')

### estimate network
n1 <- estimateNetwork(data, default="EBICglasso")

### plot network
names<-c('B1','B2', 'B3', 'B4', 'B5', 'C1', 'C2', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'E1', 'E2', 'E3', 'E4', 'E5', 'E6')
longnames <- c('Intrusive thoughts', 'Nightmares', 'Flashbacks', 'Emotional cue reactivity', 'Physiological cue reactivity', 'Avoidance of thoughts', 'Avoidance of reminders', 'Trauma-related amnesia', 'Negative beliefs', 'Blame of self or others', 'Negative trauma-related emotions', 'Loss of interest', 'Detachment', 'Restricted affect', 'Irritability/anger', 'Self-destructive/reckless behavior', 'Hypervigilance', 'Exaggerated startle response', 'Difficulty concentrating', 'Sleep disturbance')
gr1 <- list('Intrusions'=c(1:5), 'Avoidance'=c(6:7), 'Cognition & mood alterations'=c(8:14),'Arousal & reactivity alterations'=c(15:20)) 

pdf("Network1.pdf", width=8.5, height=5)
g1<-plot(n1, labels=names, layout="spring", vsize=6, cut=0, border.width=1.5, border.color='black', title="DSM-5 communities",
         groups=gr1, color=c('#a8e6cf', '#dcedc1', '#ffd3b6', '#ff8b94'), nodeNames = longnames,legend.cex=.35)
dev.off()

As you can see, the theoretical relations do not seem to map well onto the empirical relations. Especially E2 and E3 do not really closely inter-relate with other nodes from their DSM-5 community.

2. PTSD network based on the walktrap algorithm

So let's detect communities in this network using the walktrap algorithm, which is implemented in the R package EGAnet:

 
library(EGAnet)  # provides EGA()

comm1 <- EGA(data, plot.EGA = TRUE)
comm1
gr2 <- list('Intrusions & Avoidance'=c(1:7), 'Cogn. & mood alterations, arousal'=c(8,9,12:20), 'Blame/emotions'=c(10:11)) 

pdf("Network2.pdf", width=8.5, height=5)
g2<-plot(n1, labels=names, vsize=6, cut=0, border.width=1.5, border.color='black', title="Walktrap communities",
          groups=gr2, color=c('#a8e6cf', '#ff8b94', '#ffd3b6'), nodeNames = longnames,legend.cex=.35)
dev.off()

Walktrap identifies 3 communities:

Note that there are several other ways to identify communities, such as spinglass or simply eigenvalue decomposition, which I described in my previous tutorial. I use walktrap here because it's implemented in EGAnet, and more convenient to estimate.

3. Clique percolation by optimizing I and k

Third, we compare this to the results of clique percolation, for which we use Jens' CliquePercolation package. You can find more details on the methodology here, which I only summarize briefly. There are several ways of running the algorithm, and I showcase 2 here.

To run the algorithm for weighted networks, one option is to optimize k and I, where I determines how strong the average relations among a community need to be to be detected as a community, and k determines the minimum clique size. CliquePercolation requires a minimum k of 3, so we cannot use k=2, which would have made sense based on the results of the walktrap algorithm that identified a community of 2 nodes (D3 & D4). In the example below, I asked the program to search through ranges of I from 0.01 to 0.20, given that an average partial correlation of 0.20 would appear to be very large. This is usually done for larger networks than we have here, so might not be entirely appropriate.

We then identify the optimal value of I based on the rule that with increasing I, we should extract the solution for which the ratio threshold crosses to values above 2, in the best case accompanied by a large χ value. In our case, this is I=0.12, as you can see in the output below. Missing values ("NA") are based on the fact that certain metrics can only be computed for networks with at least 2 or 3 communities.

Here is the accompanying R code:

 
### use Clique Percolation
library(CliquePercolation)

W <- qgraph(n1$graph)  # qgraph object built from the estimated weights matrix

thresholds <- cpThreshold(W, method = "weighted", k.range = 3,       
                          I.range = c(seq(0.20, 0.01, by = -0.01)), 
                          threshold = c("largest.components.ratio","chi")); thresholds

results <- cpAlgorithm(W, k = 3, method = "weighted", I = 0.12)
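The selection rule described above (walk through increasing values of I and take the first solution whose ratio threshold exceeds 2) can be sketched in base R. This is only an illustration: the data frame below is a toy stand-in with invented values, and the column names `Intensity` and `Ratio.Threshold` are assumed to match what cpThreshold returns, so check them against your own output.

```r
# Toy stand-in for the cpThreshold() output. The values are invented for
# illustration; the column names are an assumption about the real output.
toy <- data.frame(
  Intensity       = c(0.10, 0.09, 0.08, 0.07, 0.06, 0.05),
  Ratio.Threshold = c(NA,   3.1,  2.4,  1.6,  1.2,  1.0)
)

# Walk through the candidate values from low to high I
toy <- toy[order(toy$Intensity), ]

# First I at which the ratio threshold exceeds 2 (which() skips NA rows)
crossed <- which(toy$Ratio.Threshold > 2)
I_opt <- toy$Intensity[crossed[1]]
I_opt
```

With real output, `toy` would be replaced by the `thresholds` object, and the χ value at the selected I is still worth inspecting by eye, since the rule above treats a large χ only as supporting evidence.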

Now we plot the network:

 
L <- g1$layout  # L is not defined in this snippet; assumed here to be the layout of the first plot
pdf("Network3.pdf", width=8.5, height=5)
g3 <- cpColoredGraph(W, list.of.communities = results$list.of.communities.numbers, layout=L, theme='colorblind',
                     color=c('#a8e6cf', '#ff8b94', '#ffd3b6', '#444444'), labels=names, vsize=6, cut=0, border.width=1.5, 
                     border.color='black', nodeNames = longnames,legend.cex=.35,
                     edge.width = 1, title ="PTSD communities based on Clique Percolation")
dev.off()

We see that clique percolation also groups a lot of the initial nodes into one group, like the DSM-5 and walktrap; nodes E2 and D1 are not assigned to any community; and 4 nodes are assigned to 2 communities. Overall, results seem sensible, including the partial assignment of D6 to the red community, given its relations to other red and half-red nodes. For node D1, which has the lowest centrality (i.e. interconnectedness), no assignment makes sense. The missing assignment of E2 is more surprising, because E2 is at least moderately connected to both the blue and the red community. EDIT: Turns out we can visualize this via Fruchterman-Reingold easily, so I updated the section above a bit (thanks Jens for your email).

4. Clique percolation by optimizing entropy

A second way, which may be better suited for smaller networks, is using entropy, based on Shannon information. Jens has a detailed tutorial on that near the bottom of the package vignette for CliquePercolation, so I will not describe this in detail here. For the PTSD dataset, this method leads us to choose an intensity value of I=0.14 rather than the 0.12 used above, and the final network solution looks like this:

The solution seems overall a bit more conservative, with 5 unassigned nodes. Extracted solutions appear to make sense, although not assigning D3 to a community does not align with prior results, and seems inconsistent given the strong edges the node shows.
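For readers unfamiliar with Shannon entropy, the following base R sketch shows the general idea for a partition of nodes into communities. It is an assumption-laden illustration rather than the package's implementation: entropy is computed here from the proportion of nodes in each community, and the exact treatment of isolated or shared nodes in CliquePercolation may differ (see the package vignette).

```r
# Shannon entropy (in bits) of a vector of community sizes
shannon_entropy <- function(sizes) {
  p <- sizes / sum(sizes)
  p <- p[p > 0]          # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Hypothetical partitions of 20 nodes
shannon_entropy(c(20))        # one giant community
shannon_entropy(c(18, 2))     # one dominant community
shannon_entropy(c(10, 10))    # two equally sized communities
shannon_entropy(c(7, 7, 6))   # three balanced communities
```

Under this definition, a single giant community has zero entropy, and entropy grows as nodes are spread more evenly over more communities, which is why it can help to avoid solutions dominated by one large community.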

Conclusions and ways forward

Data and code for the tutorial are available here. It's awesome to see commonly used community detection algorithms from other fields being implemented into the free open source software environment, and I see several avenues for future work.

Model validation: how well does clique percolation recover the true network structure of weighted graphs, and under which set of conditions?
What specific guidelines should we follow to decide what solution to extract? The current guidelines, e.g. for optimizing I and k simultaneously, leave considerable researcher degrees of freedom, which could be exploited if researchers pursue data analysis with certain goals in mind. These loose rules would also make a preregistration of Clique Percolation challenging.
Power analysis: how many datapoints are required for the method to reliably recover communities, given what specific method?
Extensions to cases where nodes can belong to more than two communities (akin to cross loadings in the CFA context).

Thanks again Jens for translating cfinder into R³! And if someone finds a way to make the code colorblind friendly (i.e. by assigning different shades/backgrounds to parts of nodes), please do send it over and I am happy to upload it here (with full attribution of course).

The post R tutorial: clique percolation to detect communities in networks appeared first on Psych Networks.

How to study early-warning signals for clinical change

Merlijn Olthof — Mon, 28 Oct 2019 14:56:49 +0000

Merlijn Olthof is a PhD student at the Behavioural Science Institute of Radboud University, Nijmegen, The Netherlands. His main research interest is the study of complexity in mental health with a particular focus on phase transitions and early-warning signals. Together with a project group – including his supervisors Anna Lichtwarck-Aschoff and Fred Hasselman – he recently published two preregistered studies on EWS and phase transitions in clinical change. In this blogpost, he addresses how EWS for clinical change can(not) be studied.

The concept of early-warning signals (EWS) for sudden shifts in the global behavior of complex systems received much attention after two review papers by Marten Scheffer and colleagues (Scheffer et al., 2009, 2012). These reviews illustrate how sudden qualitative changes (tipping points) in various complex systems are preceded by generic EWS¹. For example, EWS have been shown to precede tipping points in systems as diverse as the laser, the climate (e.g., the greenhouse-icehouse transition), and the brain (e.g., epileptic seizures). Clinical researchers have hypothesized that EWS and tipping points may also exist in mental health. Initial evidence for this idea was presented in a single-case study where EWS preceded relapse into depression. This suggests that EWS are promising for personalized prediction in clinical practice.

The study of EWS for clinical change is, however, not straightforward. Many basic issues in this new field of research, such as how to define EWS in the context of clinical change, have not yet been addressed. In this blogpost, I address two very basic questions for the study of EWS for clinical change that kept me and my co-authors busy in preparation of our research. First, what are EWS? Second, how can we study EWS for clinical change? I end with some recommendations for EWS research and a brief summary of our recent study where we found that EWS have a real-time predictive value for sudden shifts in symptom severity.

What is an early-warning signal?

An early-warning signal is a within-system change in dynamics (indicative of critical fluctuations and/or critical slowing down) that is predictive of a phase transition.

This may not sound very sensible now, so let’s elaborate. Based on complex systems theories such as synergetics and catastrophe theory, two EWS can be distinguished: critical fluctuations and critical slowing down. Both indicate the rising instability in a system that precedes a phase transition (tipping point). This sounds abstract, but can be visualized intuitively (I hope you agree) by a ball rolling in a landscape with valleys and hills.

Consider the figure below which illustrates a simplified example of a phase transition that may occur in clinical change. At first, the system has only one stable state (an attractor) to which it will always return after a perturbation. This attractor state is visualized by a deep valley with steep hillsides. The behavior of the system is represented by the path of the ball. No matter where the ball is dropped in the leftmost landscape, it will very quickly settle in this attractor state. The system has a low return time to this attractor and can be said to be constrained: it has just a few degrees of freedom available to generate its behaviour. When this attractor landscape changes (middle part of the figure), the existing attractor is destabilized, which means that the system will now have more degrees of freedom available to generate its behaviour. This leads to increased variability of behaviour (i.e., critical fluctuations) and an increased time for the ball to return to the (now very shallow) valley, after it is dropped somewhere in the landscape. This increase in return time is called critical slowing down. After such a period of instability, the system re-stabilizes into a new attractor state, a process called self-organization (right part of the figure).

The example above highlights that EWS are not simply predictors of future events. Instead, they indicate the present instability in a system. Because this instability is often (but not always) followed by a phase transition, EWS are predictive.

Note that EWS are unspecific to the upcoming transition. Instability increases the probability of change (by increasing the degrees of freedom), but does not inform about the kind or direction of change when multiple new stable states are possible. This is why they are called generic EWS. Accordingly, our study found, in line with our hypothesis, that EWS predicted both sudden gains (transitions towards lower symptom severity) and sudden losses (transitions towards higher symptom severity).

How can we study EWS for clinical change?

EWS are studied using time series analysis. A time series is a piece of intensive longitudinal data. For example, see the picture below. On the y-axis we see the scores on the item ‘I am feeling anxious’ for a hypothetical individual. On the x-axis we see the time. Different time series measures may be used as EWS. Variance, dynamic complexity and entropy have been used for critical fluctuations. Autocorrelation and variance have been used for critical slowing down. However, calculating such an EWS measure of a time series does not automatically mean that one is studying EWS. Three requirements can be formulated for the study of EWS.

EWS can only be studied within-person

First, EWS describe a within-person process (the rise of instability over time) and can only be studied as such. Consider two persons who are entering psychotherapy for depression (see figure below). During a baseline measurement period preceding treatment, individual 1 has low variance in time series of daily self-ratings. Individual 2, in contrast, has high variance. Is the higher variance in individual 2 an EWS?

No. Between-person differences on measures used for EWS (e.g., variance) are in themselves unrelated to EWS, as is visualized in the figure below. The true EWS is the rise in instability over time within a system. So, it is not about the EWS measure (e.g. variance), but about the change in the measure over time. Only if we assume that individuals have exactly the same attractor landscape, which changes in exactly the same way, leading to exactly the same behavior (i.e., ergodicity), can between-person differences in dynamics be used as EWS. But these conditions are never met, as complex systems are by definition non-ergodic systems.

EWS should precede phase transitions

Second, EWS should warn of something. More specifically, they should warn of a phase transition. Again, this means that EWS can only be studied on the within-person level where phase transitions can be defined and predicted. When EWS are related to outcome while a phase transition is not defined, many alternative explanations exist for either the presence or the absence of a relation. For example, symptom decrease might actually have preceded the EWS instead of the other way around. Or, patients might have improved within the same attractor state, in the absence of a phase transition. Or, EWS might have predicted symptom decrease in part of the sample, but increase in another part, leading to a null result. The relation between instability and treatment outcome is interesting to study, as findings suggest that successful therapies feature periods of instability. But we should avoid the term EWS when the specific transitions are unknown. The term ‘destabilization’, which is often used in psychotherapy research, seems more appropriate.

There should be periods of relative stability

Third, EWS should increase prior to the transition and decrease after it. In other words, there should be periods of relative stability before and after the phase transitions².

EWS should be present shortly, as they indicate an instability in the system that cannot be maintained. But how short is short? That depends. A central issue in complex systems is that similar processes take place on multiple timescales. For example, in mental health we expect phase transitions in developmental time (e.g., onset of psychopathology), real-time (e.g., in emotional states, or even neuronal firing) and various in-between timescales (e.g., sudden gains in psychotherapy). In this regard, we can learn a lot from the dynamic systems approach to developmental psychology, which has a long tradition of dealing with different timescales (e.g., see this dynamic systems model for the development of anti-social behavior on multiple timescales). On a more practical note: we found that critical fluctuations in daily self-ratings were predictive a few days before a sudden gain/loss, but the optimal prediction window is an avenue for future research.

Eight recommendations for EWS research

With the requirements given above in mind, I present eight practical recommendations for the study of EWS for clinical change.

Define the timescale of interest

Different types of phase transitions will occur at different timescales. It is important to have insight into this timescale for choosing an appropriate sampling rate (how frequently participants will have to complete self-ratings). Some transitions (e.g., recovery from depression) might be quite slow and daily or bi-daily sampling may be enough. Other transitions (e.g., aggression incidents) might be extremely fast, requiring more intensive sampling. For these ‘fast transitions’, physiological data may provide more suitable time series.

Collect data during a change process

Researchers interested in EWS should collect data during a change process where phase transitions take place (for example, during psychotherapy). This makes EWS research fundamentally different from most contemporary idiographic research where it is recommended to collect data when change is absent (see this blogpost by Marilyn Piccirillo).

Pinpoint the transition

The transition should be pinpointed in order to test the predictive value of EWS. Some transitions might be easier to pinpoint than others. Prime candidates are dramatic change events such as relapse in addiction, suicide attempts, and the onset of psychotic episodes. In repeated symptom severity measures, sudden gains and losses are also well-definable transitions. In experimental studies, sudden insight is a good example of a phase transition that can be predicted with EWS (e.g., see this study).

Choose an appropriate measure of EWS

Autocorrelation and variance are very popular EWS measures, but they have limitations. Autocorrelation is an indirect measure of critical slowing down, as the real return times after perturbations are often unknown. This limits the interpretability, especially in clinical psychology where much is still unknown about the nature of attractors and perturbations. Measuring return time poses challenges, but it is something to strive for as it is the only way to obtain strong evidence that critical slowing down can be observed in clinical psychology. Variance also has limitations. It is sensitive to the mean, which is irrelevant for EWS, and it does not take the time-ordering of observations into account. Entropy measures (e.g. permutation entropy), dynamic complexity, or recurrence quantification measures may be more suitable measures for critical fluctuations (many of these analyses are available in R-packages such as casnet).
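The point about time-ordering can be illustrated with a minimal base R sketch of permutation entropy. This is a simplified stand-in written for this example, not the implementation found in casnet or any other package; the function name and the embedding dimension m = 3 are choices made here for illustration.

```r
# Normalized permutation entropy: 0 for a perfectly regular ordering,
# values near 1 when all ordinal patterns are about equally frequent
permutation_entropy <- function(x, m = 3) {
  n <- length(x) - m + 1
  # Ordinal pattern of each window of m consecutive values
  # (ties are broken by position, as order() does)
  patterns <- vapply(seq_len(n), function(i) {
    paste(order(x[i:(i + m - 1)]), collapse = "")
  }, character(1))
  p <- as.numeric(table(patterns)) / n
  # Divide by the maximum entropy log(m!) to scale the result to [0, 1]
  -sum(p * log(p)) / log(factorial(m))
}

set.seed(1)
x_trend    <- 1:50             # hypothetical, perfectly ordered series
x_shuffled <- sample(x_trend)  # same values in random order

# The variance is identical because the values are the same
c(var(x_trend), var(x_shuffled))

# Permutation entropy differs because the ordering differs
c(permutation_entropy(x_trend), permutation_entropy(x_shuffled))
```

Both series have exactly the same variance, but only the shuffled one shows irregular ordinal patterns, which is the kind of information that variance cannot capture.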

Perform a windowed analysis of EWS

When the transition is defined, EWS can be calculated. This is done through a windowed analysis. Make sure to use a backwards window, i.e. a window that only contains observations up to and including the current time point. Only with a backwards window can the predictive value of EWS be assessed. There is no informed standard for the window size, but a seven-day window may be used to control for weekend effects.
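A backwards window can be sketched in a few lines of base R. The example below is a minimal illustration on simulated data, not the analysis pipeline of our study: the daily ratings are hypothetical random numbers, `backward_window` and `lag1_autocor` are helper names introduced here, and variance and lag-1 autocorrelation are used only because they are easy to compute.

```r
set.seed(1)
# Hypothetical daily self-ratings of one person over 60 days
y <- rnorm(60)

# Apply `fun` to the `width` most recent observations at each time point.
# The value at day t uses days (t - width + 1) to t only, never the future.
backward_window <- function(x, width, fun) {
  out <- rep(NA_real_, length(x))
  for (t in seq_along(x)) {
    if (t >= width) {
      out[t] <- fun(x[(t - width + 1):t])
    }
  }
  out
}

# Lag-1 autocorrelation within a window
lag1_autocor <- function(w) cor(w[-1], w[-length(w)])

ews_var <- backward_window(y, width = 7, fun = var)
ews_ac1 <- backward_window(y, width = 7, fun = lag1_autocor)

head(cbind(day = seq_along(y), ews_var, ews_ac1), 10)
```

The first six days are NA because a full seven-day window is not yet available. A centered window would instead mix in observations from after day t, which would make any "prediction" of a later transition circular.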

First analyze, then aggregate

This advice given by Peter Molenaar is quoted in the amazing book The End of Average by Todd Rose. In line with the principles of idiographic science, transitions and EWS should be defined and calculated on an individual basis. Only after all individuals in a large dataset have been analyzed can one aggregate to test the predictive value of EWS by specifying a survival model. Multi-level modelling is essential in clinical change research as it allows individual differences to be modelled. The multi-level survival model can handle different time series lengths, different timing of transitions and can include individuals with one transition, multiple transitions, or no transitions at all. Make sure to time-lag your predictors. This ensures that you really predict a transition in the future.
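Time-lagging has to be done within each person, so that the last value of one person does not leak into the first day of the next. A minimal base R sketch with hypothetical long-format data (the column names `id`, `day`, `ews` and `ews_lag1` are made up for this example, and rows are assumed to be sorted by day within person):

```r
# Hypothetical long-format data: two persons, five days each
d <- data.frame(
  id  = rep(c("a", "b"), each = 5),
  day = rep(1:5, times = 2),
  ews = c(0.2, 0.3, 0.9, 0.4, 0.2, 0.5, 0.5, 0.6, 1.1, 0.3)
)

# Lag the predictor by one day within each person: the value on day t is
# the EWS observed on day t - 1, and each person's first day becomes NA
d$ews_lag1 <- ave(d$ews, d$id, FUN = function(x) c(NA, x[-length(x)]))
d
```

The lagged column, rather than the same-day value, would then enter the survival model as the predictor of a transition on day t.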

Work transdisciplinary

This is a broader recommendation, and quite obvious for everyone who is familiar with the work by Marten Scheffer and colleagues. But there is even more! For me it was very helpful to examine how EWS and phase transitions have been studied in other areas of science. Thanks to my supervisors and colleagues Fred Hasselman, Anna Lichtwarck-Aschoff and Maarten Wijnants, I got to read a lot of valuable ‘classics’ in movement science, cognition, development and psychotherapy research (for a list of my favourites see ³). Of course, especially in the natural sciences, there is a whole world of phase transition research to discover (for those not afraid of mathematics). I think the ‘complexity in mental health’ field can benefit from working across disciplines. This requires time and effort, but in the end it will be efficient as we don’t waste our time re-inventing the wheel.

Preregister your confirmatory study

Both confirmatory and exploratory research on EWS for clinical change are valuable and needed. But they should be clearly distinguishable. It is advisable to preregister confirmatory hypothesis tests for EWS given the large researcher degrees of freedom in defining transitions, choosing EWS and determining window sizes (for an example, you can find my preregistration on the osf).

Our study and outlook

Our recent study tested whether EWS were predictive of sudden gains and losses in psychotherapy for mood disorders in a large sample of patients. The study was set up in line with the recommendations above. Patients completed daily self-ratings about their therapeutic process. For every patient, we pinpointed the phase transitions (sudden gains and losses) and calculated the EWS measures (dynamic complexity in our case) using a moving window analysis. Then, we tested the predictive power of EWS with a multi-level survival model. The findings are the first to show that EWS have a real-time predictive value for phase transitions in clinical change. Specifically, we found that an increase of 1 standard deviation in dynamic complexity values (a measure for critical fluctuations) was related to a 1.5-fold increase in the chance of a sudden gain or loss in the upcoming 4 days.

Our results show that EWS in daily self-ratings can potentially be used for real-time prediction of sudden gains and losses. In the future, it would be valuable to test whether we can also predict different clinical transitions, such as suicide attempts or relapse in addiction. Especially for these transitions, an early-warning system could be valuable for prevention. Another avenue for future research lies in the timing of interventions. Recall that EWS indicate an increase in the degrees of freedom available to a system, meaning that systems are more open to change when EWS are present. Psychological interventions might thus be more effective when timed during sensitive periods in which EWS are present.

Conclusion

The main message of this post is that an early-warning signal for clinical change is a within-person change in dynamics (indicative of critical fluctuations or critical slowing down) that is predictive of a phase transition. I hope that researchers interested in EWS find the post helpful and I look forward to more EWS research!

Merlijn Olthof

I wish to thank Fred Hasselman and Freek Oude Maatman for helpful suggestions and corrections for this blogpost. Also, I thank my co-authors – Fred Hasselman, Guido Strunk, Marieke van Rooij, Benjamin Aas, Marieke Helmich, Günter Schiepek and Anna Lichtwarck-Aschoff – for our cooperation on the EWS study.


Experience sampling software ‘mobileQ’: new, free, open source

Peter Kuppens — Fri, 12 Jul 2019 08:14:20 +0000

Peter Kuppens is Professor of Psychology at KU Leuven-University of Leuven in Belgium. He studies the nature and dynamics of emotions in their natural habitat, daily life, and how they play a role in well-being and mood disorder. His research group has developed an Android smartphone platform, called mobileQ, to perform experience sampling research, which he talks about in this blog.

The Experience Sampling Method (ESM), also known as Ecological Momentary Assessment (EMA), is becoming ubiquitous in research investigating the nature and dynamics of everyday psychological phenomena.

A brief history

When our research group started doing ESM research back in 2004, we used special wristwatches that could be programmed to beep at fixed timepoints. Participants wore these watches on the wrist opposite their usual watch, and when the watches beeped, they took out a booklet to complete a pencil-and-paper survey. To my surprise, dual watches did not become a fashion thing. This method helped us collect the data we were after, but it was laborious and time-consuming to enter the data and not great for participant compliance. I think it was around 2007 that we moved to palmtop computers. Yes, it fit in the palm of your hand, but “computer” was a bit of an overstatement! The device could be used for notes and calendar keeping, but could not call, text, or browse the internet, let alone tweet or find you a date (and it cost more than a modern smartphone!). But it worked well for beeping participants and digitally recording their responses to the questions we formulated. We used iESP to program our palmtop studies, and I am still indebted to Barrett and Barrett (2001)¹ for making these devices amenable for ESM research and for instilling in me the jealousy to also one day publish a paper that has my name on it twice.

The age of the smartphone

Flash forward to 2011, when we figured we should join the smartphone age. It was easy to purchase smartphones, but less easy to do ESM research on them. There were a few homegrown software platforms available, but they suffered from various ailments. There were a few paid platforms available too, but they quickly became expensive when you had plans to conquer the world, cure cancer, and halt all psychopathology with ESM research. So we thought it would be easiest to just make our own software to program smartphones. That was an underthinkment! It took us several years to make something that is easy to use, does not require any programming or coding experience on the part of the user, is reliable in terms of data collection and storage, and has enough flexibility to accommodate most of the varieties of research protocols typically encountered in ESM research. We called it mobileQ. While many people were involved in its development along the way, it was Kristof Meers who was the driving force and coding genius that made it all happen and work.

MobileQ

We and our collaborators have been using mobileQ for several years now. In total, it has collected over 7 million responses from 200,000+ surveys. From the start, it was our plan to also make our ESM smartphone brainchild freely available to other researchers. Yet, this also took longer than expected, as we had to properly document its features for novice researchers and to figure out the Terms and Conditions and Privacy statements researchers should agree to when wanting to use mobileQ. This turned out to be our biggest obstacle in today’s GDPR and lawsuit-obsessed age (a big thanks here to the KU Leuven R&D department).

So here it is.

We are very happy to make available to you, the research community, a free and flexible platform to perform ESM research. Let me explain a number of its features, advantages and possible disadvantages. Aside from being free, a big advantage is that we hope it is easy to use for novice researchers and is able to accommodate most (standard) ESM protocols, including different question types, randomization options, flexible time-points, the possibility of both time and event-sampling, and so on. As a simple research group, we cannot afford to set up a call-center or helpdesk to answer all questions and queries. Yet, we have documented all features as well as possible with an accompanying paper (Meers et al., 2019)² and instructional videos featuring the irresistible Aussie accent of Elise Kalokerinos and the video editing skills of Egon Dejonckheere. There is also a forum for researchers to post questions that we will also monitor ourselves.

MobileQ’s focus on Android

One potential drawback perhaps is that mobileQ only works on Android phones, and we advise working with specific types of research-dedicated phones. That is, we suggest that researchers buy their own Android phones for their labs and install mobileQ on these phones (which are cheap these days: recommended devices do not cost a lot more than 100 euros or dollars). Why? One first reason is perhaps that we came from palmtops and that research-dedicated devices did not seem unreasonable at the time. A second reason is that, unlike today, when we started it was difficult to develop something that worked on both Android and iOS. But I also think there is a case to be made for using research-dedicated phones. First, they may aid compliance. Participants see it as a research instrument that they may ignore less easily than their own smartphone, they cannot tinker with the notification settings (or avoid notifications!), and their phone cannot interfere with notification systems (something we have found problematic in some versions of Android with a particularly persistent battery saver mode). Second, you have more control over what is displayed, and it is displayed in a way that is uniform across participants. Third, you can include all participants, even those without a smartphone (important for some populations e.g. older adults).

Protected servers & GDPR

Currently, mobileQ is available for free use from our own KU Leuven protected servers. The platform is GDPR compliant and takes into account today’s security, legal, and privacy requirements (of course, you are still responsible for the nature of the data you collect). In addition, we will make the code open-source over the course of 2019. This option will allow you to store your data on your own servers, perhaps a requirement at some institutions, especially when collecting sensitive data. Perhaps more importantly, it will also allow you, the research community, to further build on our work and adapt and enhance mobileQ for whatever purpose and functionality you need on top of what is already there.

Feel free to try it out.

From our point of view, we are happy we are able to give something to the research community that may hopefully help along your research, and in doing so, help us understand the complexity of human behavior, thoughts, and emotion (and who knows, cure cancer).

Peter Kuppens
with thanks to all involved in the development of mobileQ including Marlies Houben, Pete Koval, Madeline Pe, Koen Rummens, …

The post Experience sampling software ‘mobileQ’: new, free, open source appeared first on Psych Networks.

Idiography: Where have we come from, where should we go to?

Marilyn Piccirillo — Wed, 15 May 2019 11:58:37 +0000

Marilyn Piccirillo (email) is a graduate student in the Washington University Clinical Science Ph.D. program. She is primarily interested in idiographic methodology and how psychologists can implement these methods into clinical and applied settings. Recently, she authored two papers on idiography in collaboration with her mentor – Tom Rodebaugh – and fellow grad student – Emorie Beck. A brief summary of these papers is presented here. The first paper reviews the history of idiographic methods in psychology and the second paper details ways that clinical scientists can implement these methods into their work.

Idiography – or the study of individuals – is achieving new prominence within psychology. In a field that seems inextricably linked to the study of an individual’s experience, personality, relationships, and symptomatology, it seems almost strange that idiographic methodology hasn’t always been at the forefront of psychological research. To be fair, numerous psychologists and therapists in the 20th century designed person-centered studies or integrated idiographic methodology into their clinical work. However, these earlier studies were limited by the use of (understandably) rudimentary methodology. Over the past decade or so, there has been substantial improvement in the data collection and statistical methodology available for N = 1 studies and a notable increase in the accessibility of these methods for psychologists. This has allowed those interested in idiography to model more complex psychological dynamics of an individual’s experience.

History of Idiography in Psychology

There has been a long history of idiographic work during the 20th century. Raymond Cattell introduced the use of the data-box as a method for orienting psychologists to person-centered research. A figure of the data-box is shown below. As opposed to studying several variables in many people at one point in time (see the front panel of the databox), a researcher could instead measure a set of variables in one individual over several points in time (see the shaded panel of the databox).

Clinicians and clinical scientists continued in this direction using analytic methods like P-technique or dynamic factor analysis to analyze a single patient’s psychotherapy. Similarly, psychologists began using these methods to examine the structure of personality. Numerous other researchers published ground-breaking idiographic studies, and reading their work can quickly turn into a delightful dive into the history of psychology and psychological methodology (personal experience of Marilyn L. Piccirillo, 2017 comprehensive exam studying)! A selected reading list of these historical studies can be found here.

During this early era of idiography, data collection and statistical methodology were understandably more basic and less comprehensive. With advancements in data collection techniques, we’re now able to use experience sampling methodology to collect in-vivo assessments of the individual’s mood or experience rather than relying on retrospective self-report. Likewise, our statistical methods are increasingly able to capture the complexity of psychological time-series data.

Contemporary Idiographic Work in Psychology

Several researchers have promoted idiography through the use of these improved methods. In a review of idiographic studies within psychology that I authored with Tom Rodebaugh, we highlighted work from Aaron Fisher, Aidan Wright, and colleagues who have published results demonstrating the idiographic nature of symptomatology in generalized anxiety and borderline personality disorders, respectively. Additionally, using a factor-based time-series approach, Peter Molenaar, Emilio Ferrer, John Nesselroade, and colleagues have examined the affective dynamics of interpersonal interactions between various close relatives. Notably, the work of Laura Bringmann, Ellen Hamaker, and colleagues in developing and testing time-varying analytic methods marks an important improvement in idiography. These analytic approaches can assist with modeling more complex psychological processes and can account for some violations of stationarity.

Other researchers have worked with group-level approaches that are also able to model individual-level processes. Methods such as group iterative multiple model estimation (GIMME), used by researchers including Adriene Beltz, Kathleen Gates, Stephanie Lane, and Aidan Wright, as well as multilevel dynamic structural equation modeling, used by Ellen Hamaker and colleagues, improve upon our ability to study individual-level processes within the context of the group. Most notably, in the case of the GIMME method, group-level models are constructed from individual-level models rather than relying on averages calculated across individuals. A selected reading list of more contemporary idiographic studies is included here.

It was truly exciting to review the work from our colleagues who are working to move the field of idiography forward. Their work demonstrated substantial progress towards using newer and more advanced time series methods. Yet, there is also considerable work ahead if we are to continue working towards integrating idiographic methods into applied areas, especially clinical work. Lian van der Krieke and colleagues have conducted some of the first dissemination and implementation work by developing an automated platform to collect and analyze idiographic data using vector autoregression (autoVAR). They have published studies measuring reactions and attitudes towards this automated platform, and the use of an automated platform may help to improve the accessibility of idiographic methods.

However, studies integrating idiographic methods into applied settings are still limited, which may be due to the lack of accessible information about how to best design an idiographic study. In a tutorial for clinical scientists and clinicians, Emorie Beck, Tom Rodebaugh, and I put forth our suggestions for designing an idiographic study based on the papers reviewed above and information gathered via personal communications at conferences, email, and academic Twitter.

Designing An Idiographic Study

Our main takeaways for designing an idiographic study are included below:

Select items that can be measured continuously and use a continuous scale (preferably a 0–100 scale). Without a continuous scale, there may not be enough variance around each item to analyze.
When possible, use more than one item to assess a given construct. Although participant burden is a concern, there are plenty of issues that can arise from relying on single item measurement. In terms of data analysis, you will also want to consider compositing items that are highly correlated or modeling latent variables.
Be cautious about stationarity! Many of the easily accessible time-series methods assume stationarity around a given process, yet it is an open question as to whether psychological processes can ever be truly stationary. Consider collecting data during times of likely stationarity – such as when symptom change has stabilized. When it comes time to analyze your data, examine the data for trends that can be accounted for through detrending procedures or use a time-varying approach that can appropriately model nonstationarity.
Consider the timing of your assessments. This may be one of the trickiest decision points, because we don’t have much, if any, empirical evidence as to the within- and between-person time trends of an emotional experience or symptom. Our best advice is to strike a compromise between numerous assessment points and participant burden. Our previous studies have administered surveys at 5–7 time points throughout a 12-hour period. Regardless, be prepared to analyze data at different lags (e.g., Lag 1, Lag 2, Lag 3), as this will allow you to model multiple timescales of item and inter-item relationships.
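To make the last point concrete, here is a minimal sketch in base R of how lagged versions of an item could be built for experience sampling data. It assumes one row per beep, sorted by time within each day, and it does not let a lag cross from one day into the next. The helper `make_lag()`, the data frame `esm`, and all values are hypothetical and only serve as illustration.

```r
# Hypothetical toy data: 2 days, 5 beeps per day, one mood item (0-100 scale)
esm <- data.frame(
  day  = rep(1:2, each = 5),
  beep = rep(1:5, times = 2),
  mood = c(50, 55, 60, 58, 52, 40, 45, 47, 44, 49)
)

# Shift x by k beeps within each level of `group`, padding with NA,
# so that the first k beeps of every day have no lagged value
make_lag <- function(x, k, group) {
  ave(x, group, FUN = function(v) {
    if (k >= length(v)) return(rep(NA_real_, length(v)))
    c(rep(NA_real_, k), head(v, -k))
  })
}

esm$mood_lag1 <- make_lag(esm$mood, 1, esm$day)
esm$mood_lag2 <- make_lag(esm$mood, 2, esm$day)

# Lag-1 and lag-2 autocorrelations, using only beeps that have a lagged value
ac_lag1 <- cor(esm$mood, esm$mood_lag1, use = "complete.obs")
ac_lag2 <- cor(esm$mood, esm$mood_lag2, use = "complete.obs")
```

Lagging within days is a design choice: the overnight gap is much longer than the gap between two beeps on the same day, so treating it as a regular lag would mix timescales.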

The final two points above represent big hurdles for the field of idiography, and I’m looking forward to advancements in our understanding of how psychological systems change over time, both within and between individuals. In a field that is inherently focused on the individual, it is exciting to witness the rapid evolution of idiographic methodology as it can be used to improve our theoretical and clinical understanding of psychological systems!

The post Idiography: Where have we come from, where should we go to? appeared first on Psych Networks.

ICPS 2019: Collection of presentations related to dynamical systems

Eiko Fried — Fri, 15 Mar 2019 14:23:58 +0000

ICPS 2019 in Paris was last week, the European version of the US APS conference. Many researchers working on networks, time-series data, and dynamical systems were present, and I wanted to share all slides here. I have been asked to state that the slides were never intended to be used outside of the respective talks, because some researchers feared sharing slides that weren’t always entirely clear when reading them would reflect badly upon them. But I don’t think that’s the case: I’d rather get a rough idea of a talk than no idea at all.

If you know about other talks I missed, please let me know and I’m happy to add them here anytime. Special thanks to Sacha and Payton for helping me collect these talks!

Presentations

Emily Bernstein: Unique And Predictive Relationships Between Components Of Cognitive Vulnerability And Symptoms Of Depression (slides)
Giovanni Briganti: Network analysis of empathy items from the Interpersonal Reactivity Index in 1973 young adults (slides)
Julian Burger: TIPS: Therapy Implications from Psychopathological Dynamical Systems (slides)
Giulio Costantini: Towards disentangling correspondence and emergence: The case of conscientiousness (slides)
Jonas Dalege: The Attitudinal Entropy (AE) Framework as a General Theory of Attitude (slides)
Sacha Epskamp: Intermediate Stable States in Substance Use — Can allowing use prevent abuse? (slides)
Sacha Epskamp: Network psychometrics — phase 2 (slides)
Talya Greene: Dynamic network analysis of depression symptoms (slides)
Alexandre Heeren: Deconstructing trait anxiety — A network perspective (slides)
Adela Isvoranu: State of the Art and Clinical Applications of Network Psychometrics (slides)
Payton Jones: Beyond symptoms — why diagnostic criteria are not enough for network analysis (slides)
Payton Jones: Depression comorbidity – applying bridge centrality in networks to understand overlap with other mental disorders (slides)
Payton Jones: Breaking the Assumption of Group Homogeneity in Networks — Partitioning Networks with Machine Learning (slides)
Lachlan McWilliams: Reconceptualizing adult attachment relationships — a network perspective (slides)
Lachlan McWilliams: A Network Perspective on the Relationship between Life Satisfaction and Depression (slides)
Daniel Moriarity: Comparison of the networks of depression and anxiety symptoms in adolescents as a function of inflammation (slides)
Maien Sachisthal: Uncovering Countries’ Science Interest Structure Using a Psychometric Network Approach (slides)
Matthew Southward: Which deficits are most central to Borderline Personality Disorder? A network analysis of 4,000 participants (slides)

I also zipped all slides, which you can find here.

The post ICPS 2019: Collection of presentations related to dynamical systems appeared first on Psych Networks.

Network models of factor scores: mixing apples with oranges

Eiko Fried — Fri, 08 Feb 2019 14:31:59 +0000

We recently had a new paper accepted, for which we estimated networks of the 7 subscales of the Contingencies of Self-Worth Scale. The paper was written by Giovanni Briganti and uses open data of 680 participants.

I conducted some sensitivity and robustness analyses for the paper, and became interested in all the different ways in which one can summarize the subscale scores (e.g., sum scores vs factor scores), and how this affects the resulting correlation matrices and regularized partial correlation networks. Since a lot of this was beyond the scope of the paper, I wanted to write this up here, in a tutorial with open code and open data. This seems especially relevant because more and more researchers are interested in modeling networks of meaningful constructs, and I firmly believe that subscales can have a lot of advantages over individual items in case they represent such meaningful constructs; see Giovanni’s paper, and also our recent paper on schizotypal personality, for examples. Folks with more time & talent could easily turn this tutorial into a simulation study. Hint hint.

In any case, you can find data and code for this tutorial here. I will not show all code below, only the most relevant parts. Now, let’s start to mix apples and oranges! Overall, the goal of the tutorial here is to estimate correlation networks and regularized partial correlation networks based on either sum-scores or factor scores of the items belonging to each of the 7 subscales. Finally, we will also explore what happens if we estimate the factor scores not based on simple structure, but allow for cross-loadings.

Introduction: generalized network psychometrics

Before we start, please make sure to add Sacha Epskamp’s paper on generalized network psychometrics to your reading list, which is the first comprehensive publication on combining latent variable models and network models. Sacha wrote the R-package lvnet, which allows you to tackle the challenge we faced in Giovanni’s paper in one single step. lvnet first takes out the shared variances of items and models them as latent variables, and then models the relations among the latent variables as a network. This is called a latent network model; here is an example from Sacha’s paper:

Unfortunately, we were unable to use lvnet for the paper because we had too many items; lvnet does not scale well beyond 20 or so items. If you have fewer items, this is likely what you want to do.

Comparison of zero order correlations

In a first step, we estimate correlations of the 7 domains estimated via sum-scores, and compare that to correlations among the 7 domains estimated via factor-scores (using a confirmatory factor analysis).

We estimate sum-scores by adding up items, and estimate factor scores in a 7-factor model in lavaan:

 
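library(lavaan)  # provides cfa()

# 7-factor simple structure model: each item loads on exactly one subscale
cmodel <-  ' FS =~ c7 + c10 + c16 + c24 + c29
             C  =~ c3 + c12 + c20 + c25 + c32
             A  =~ c1 + c4 + c17 + c21 + c30
             GL =~ c2 + c8 + c18 + c26 + c31
             AC =~ c13 + c19 + c22 + c27 + c33
             V  =~ c5 + c11 + c14 + c28 + c34
             OA =~ c6 + c9 + c15 + c23 + c35 '

fit <- cfa(cmodel, data=data)
# factor scores can then be extracted from the fitted model, e.g. with lavPredict(fit)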

Fit is surprisingly good for a 7-factor simple structure model¹, with a significant chi-square of ~1593 and 539 df in n=680; CFI=0.91, TLI=0.9, and RMSEA=0.05.

In the next step, we now estimate and visualize correlations among either sum scores or factor scores:

 
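library(readr)   # read_delim()
library(qgraph)  # cor_auto(), qgraph()

# note: the objects `sum` and `factor` share their names with base R functions;
# calls such as sum(...) below still resolve to the functions
sum <- read_delim("sumscoredata.csv", ";", escape_double = FALSE, trim_ws = TRUE)
factor <- read_delim("factorscoredata.csv", ";", escape_double = FALSE, trim_ws = TRUE)

cor_sum <- cor_auto(sum)
cor_fac <- cor_auto(factor)

# plot both correlation networks side by side
layout(t(1:2))
nw_sum <- qgraph(cor_sum, details=TRUE, maximum=.72, title="correlation among sum scores")
nw_fac <- qgraph(cor_fac, details=TRUE, title="correlation among factor scores")
dev.off()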

As you can see, the correlations among factor scores are considerably larger than those among sum-scores, because factor scores correct (disattenuate) the correlation coefficients for unreliability. That is, if we assume that variation in the latent variables causes variation in the observed items, i.e., if we believe that the observed items are passive indicators that are caused by the latent variable (an assumption I find highly plausible given the nature of the scale), then the results can be interpreted in the sense that the factor scores remove measurement error. This increases relationships between subscales because we have more reliable subscale scores now².
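To illustrate the attenuation logic, here is a small simulation sketch in base R. It is not the analysis from the paper: all parameter values are made up, and the correction uses Spearman's classic formula with Cronbach's alpha as the reliability estimate, not factor scores. Two latent variables correlate at 0.6, each is measured by 5 noisy items, and the correlation among the sum scores comes out clearly below the latent correlation until it is corrected for unreliability.

```r
set.seed(2019)
n        <- 10000  # simulated participants (assumption)
rho      <- 0.6    # true correlation between the two latent variables (assumption)
k        <- 5      # items per subscale
error_sd <- 1.5    # standard deviation of the measurement error per item (assumption)

# Two standard normal latent variables with correlation rho
eta1 <- rnorm(n)
eta2 <- rho * eta1 + sqrt(1 - rho^2) * rnorm(n)

# Each item = latent variable + independent measurement error
make_items <- function(eta) {
  sapply(seq_len(k), function(i) eta + rnorm(length(eta), sd = error_sd))
}
items1 <- make_items(eta1)
items2 <- make_items(eta2)

# Cronbach's alpha as a simple reliability estimate for a sum score
cronbach_alpha <- function(items) {
  p <- ncol(items)
  p / (p - 1) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

r_latent    <- cor(eta1, eta2)                      # close to rho
r_observed  <- cor(rowSums(items1), rowSums(items2))  # attenuated
r_corrected <- r_observed /
  sqrt(cronbach_alpha(items1) * cronbach_alpha(items2))  # Spearman's correction

round(c(latent = r_latent, observed = r_observed, corrected = r_corrected), 2)
```

The correction only makes sense under the assumption stated above, namely that the items are indicators of a common latent variable and that their unique variance is measurement error.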

The next code snippet shows the sum of all correlations for each case, and you can see that the sum of correlations in case of factor scores is larger compared to the sum scores:

 
sum(abs(cor_sum[upper.tri(cor_sum)])) # 4.91
sum(abs(cor_fac[upper.tri(cor_fac)])) # 6.95

The two correlation matrices are nearly perfectly linearly related, with a correlation of 0.99:
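One way to arrive at such a similarity coefficient is to correlate the unique off-diagonal elements of the two matrices. The sketch below uses two hypothetical toy matrices, not the data from this tutorial; the second matrix is simply a scaled-up copy of the first, so the two are perfectly linearly related even though one has stronger correlations.

```r
# Toy correlation matrices (hypothetical values, not the blog's data)
cor_a <- matrix(c(1,   .30, .20, .10,
                  .30, 1,   .40, .25,
                  .20, .40, 1,   .35,
                  .10, .25, .35, 1), nrow = 4, byrow = TRUE)
cor_b <- cor_a * 1.4   # uniformly stronger relations
diag(cor_b) <- 1       # restore the unit diagonal

# Use each pair of variables only once (upper triangle, without the diagonal)
upper_a <- cor_a[upper.tri(cor_a)]
upper_b <- cor_b[upper.tri(cor_b)]

total_a    <- sum(abs(upper_a))     # overall strength of matrix a
total_b    <- sum(abs(upper_b))     # overall strength of matrix b
similarity <- cor(upper_a, upper_b) # linear similarity of the two structures
```

A high similarity therefore says nothing about whether the coefficients are equally large; that is what the sums of absolute correlations are for.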

Comparison of regularized partial correlation networks

In a second step, we estimate regularized partial correlation networks (Gaussian Graphical Models, GGMs) on the data. They look (somewhat?) different³, and once again, the coefficients in the GGM are stronger:

 
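library(bootnet)  # provides estimateNetwork()

# regularized partial correlation networks (EBICglasso) for both types of scores
n1 <- estimateNetwork(sum, default="EBICglasso", threshold=TRUE, lambda.min.ratio=0.001)
n2 <- estimateNetwork(factor, default="EBICglasso", threshold=TRUE, lambda.min.ratio=0.001)

# plot side by side; the second plot reuses the layout of the first for comparability
layout(t(1:2))
plot1 <- plot(n1, layout='spring', title="sum score GGM", details=TRUE, maximum=0.55)
plot2 <- plot(n2, layout=plot1$layout, title="factor score GGM", details=TRUE)
dev.off()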

The correlation between the two adjacency matrices is 0.92, and again, the coefficients seem to be linearly related and stronger in the case of the factor scores.

The plot below summarizes both correlations and regularized partial correlation structures for sum scores vs factor scores:

Conclusion

The specific results in this paper are entirely consistent with the disattenuation hypothesis: correlations and regularized partial correlations among subscales estimated based on factor scores of the 7 subscales are stronger than those estimated based on sum scores. And while the two GGMs differ in that the factor score GGM has 2 negative edges, the analysis here reveals that this is likely due to the fact that the lasso put these specific negative relations to zero in the sum score GGM. This is the case because the reliability of the sum score subscales was lower, which translates into lower power for the lasso to detect relations above zero. We can sort of test this by estimating a partial correlation network without regularization, in which case we'd expect that the coefficient for the negative edge that is featured in the regularized factor score GGM, but not in the regularized sum score GGM, would be slightly negative. This is indeed the case:

 
# unregularized partial correlation networks
n1pcor <- estimateNetwork(sum, default="pcor")
n2pcor <- estimateNetwork(factor, default="pcor")

# reuse the layout stored in plot1 from the regularized networks above
layout(t(1:2))
plot1 <- plot(n1pcor, layout=plot1$layout, title="sum score GGM, no regularization", details=TRUE, maximum=0.58)
plot2 <- plot(n2pcor, layout=plot1$layout, title="factor score GGM, no regularization", details=TRUE)
dev.off()
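For readers who wonder what an unregularized partial correlation network amounts to, here is a minimal base R sketch. It relies on the standard result that partial correlations are the standardized, sign-flipped off-diagonal elements of the inverse correlation matrix. The helper `partial_cor()` and the toy matrix are hypothetical, and I am not claiming this is literally what happens inside bootnet.

```r
# Partial correlations from a correlation (or covariance) matrix
partial_cor <- function(R) {
  P  <- solve(R)      # precision matrix (inverse of R)
  pc <- -cov2cor(P)   # standardize and flip the sign
  diag(pc) <- 0       # no self-loops in the network
  pc
}

# Hypothetical toy correlation matrix of three variables
R <- matrix(c(1,  .5, .3,
              .5, 1,  .4,
              .3, .4, 1), nrow = 3, byrow = TRUE)

pc <- partial_cor(R)

# Check against the textbook formula for the partial correlation
# between variables 1 and 2, controlling for variable 3
r12_3 <- (R[1, 2] - R[1, 3] * R[2, 3]) /
  sqrt((1 - R[1, 3]^2) * (1 - R[2, 3]^2))
all.equal(pc[1, 2], r12_3)
```

Regularization (such as the lasso in EBICglasso) shrinks these coefficients and sets small ones to exactly zero, which is why an edge can be slightly negative here and absent in the regularized network.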

Overall, I'd like to see more simulation work on this, but I see no principled obstacle to using factor scores for subscales, and I see advantages such as removing measurement error (if that indeed makes sense for the items & construct under scrutiny).

Updates

This blog led to some interesting discussions, and I would like to highlight two points specifically.

First, as pointed out by Carlo Chiorri, correlations among factor scores tend to be inflated in case of cross-loadings when simple structure is enforced. Assume we have a data-generating model with some cross-loadings (i.e. some items load on more than 1 factor). Now we fit two models: model 1, a simple structure CFA model as we do above, and model 2, a factor model where we model all cross-loadings via ESEM (Exploratory Structural Equation Modeling). The inter-correlations among factor scores in model 1 will be larger than in model 2. This has been shown in the 2013 paper by Marsh and colleagues.

If we estimate this ESEM model and compare the resulting networks to the two models we estimated above (sum-score and factor-score networks), we get a network model that is somewhere between the sum-score network (smaller relations) and the factor score network (larger relations). This makes sense: there is some disattenuation (inflation?), but it is smaller than in the simple structure model.

It's been a while since I fit ESEM models in lavaan; if somebody knows a quick way to do that, please let me know and I'll add the code here.

Second, Erikson Kaszubowski summarized some general issues of using factor scores as observed variables, which are worth quoting in full:

The problem of using factor score in any subsequent analysis as an observed variable is a very old problem in factor analysis literature, as you probably know. It's mainly a problem because there is an infinite number of solutions that satisfy the equations used to estimate factor scores. Most software packages (like lavaan) simply spit the least square estimates for the factor scores, which have some interesting properties (well, they ARE least square estimates and they also maximize factor score and latent factor correlation). But they don't preserve latent variable correlation: even if the model has orthogonal latent variables, the factor scores computed from the solution will correlate.

Correlation between factor scores computed using least square estimates are usually an overestimate of latent factor correlation (compare cor(factor_scores) with inspect(lav_obj, 'std.all')$psi). And the problem goes deeper: we can build factor scores that are orthogonal or better reflect the correlation between latent variables (the 'Anderson' and 'ten Berge' methods in 'psych' package), but, given the infinite number of possible solutions, there is some arbitrarity in factor score solutions and their correlation matrix.

Given factor scores indeterminacy, I would suggest three alternatives to using (least squares) factor scores:

(1) Apply the latent variable network model from lvnet to single indicator latent variable using some reliability estimate to fix parameters.
(2) Apply the network model directly to the estimated latent variable correlation (or covariance) matrix. Not the best alternative, but still possible.
(3) Evaluate factor score indeterminacy to guarantee the indeterminacy is small enough to be ignored and proceed with the analysis with factor scores anyway.
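The comparison suggested in the quote above can be tried out with a few lines of lavaan. The sketch below does not use the data from this tutorial; it assumes the HolzingerSwineford1939 example data that ship with lavaan and the classic 3-factor model from the lavaan documentation. It puts the model-implied correlations among the latent variables next to the correlations among the factor scores that lavaan computes by default.

```r
library(lavaan)

# Classic 3-factor example model on lavaan's built-in example data
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '
fit <- cfa(model, data = HolzingerSwineford1939)

# Model-implied correlations among the latent variables
latent_cor <- lavInspect(fit, "cor.lv")

# Correlations among the estimated factor scores (lavaan's default method)
score_cor <- cor(lavPredict(fit))

round(latent_cor, 2)
round(score_cor, 2)
```

If the two matrices differ noticeably, that is a sign that networks estimated on factor scores and networks estimated on the latent correlation matrix (alternative 2 in the quote) may differ as well.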

You can find code & data here; source for the image "apples & oranges": Michael Johnson, flickr.

The post Network models of factor scores: mixing apples with oranges appeared first on Psych Networks.

Fixed-margin sampling & networks: New commentary on network replicability

Sacha Epskamp — Thu, 22 Nov 2018 15:30:15 +0000

This guest blog post was written by Sacha Epskamp, Assistant Professor at the University of Amsterdam. It does not necessarily reflect the opinions of other authors on the paper introduced below.

The methodological journal Multivariate Behavioral Research just published our latest contribution to the debate surrounding the replicability of psychological networks (the pre-print and code were already available on OSF). To recap, last year, the Journal of Abnormal Psychology published a series of four papers:

A paper claiming networks have limited replicability (Forbes, Wright, Markon, & Krueger, 2017a)
Our commentary on this paper showing the networks to replicate well in a comprehensive re-analysis¹ (Borsboom et al., 2017)
A commentary by Steinley, Hoffman, Brusco, & Sher (2017) introducing a new method and claiming networks do not differ from what is expected by chance, supporting arguments of Forbes et al.
And a rebuttal of the original authors (Forbes, Wright, Markon, & Krueger, 2017b), relying on the work by Steinley et al. (2017) as well as a literature review in PTSD networks to present further evidence that networks have limited replicability.

Papers 1, 2, and to some extent 4 have already been discussed extensively online, and I will not discuss them in detail again here. The Psychosystems group posted a short statement on its blog, Eiko posted a longer blog on the whole process, and I posted a public post-publication peer review on pubpeer (the original authors responded to these, so make sure to read their comments as well as ours to get a fair and balanced overview). We mentioned working on a (critical) commentary on paper 3 in these discussions as well.

In an unprecedented display of scientific integrity, Douglas Steinley himself invited us in response to submit this commentary to the prestigious methodological journal Multivariate Behavioral Research instead of the Journal of Abnormal Psychology, which we happily accepted. This brings me to the topic at hand. In this blog post, I will summarize the two main points of our commentary, showing that the conclusions made in paper 3 are unwarranted. Next, I will showcase an example not discussed in our commentary, in which the proposed methodology has strong utility, by re-analyzing a 10-year-old network of the DSM-IV-TR.

Fixed-margin sampling in network psychometrics

The commentary by Steinley et al. (2017) (paper 3) introduces a new method for creating “confidence intervals” from network parameters and descriptives. We term this method “fixed-margin sampling” as it entails generating new random binary datasets while keeping the margins (row and column totals) intact. These sampled datasets can subsequently be used to create intervals for any statistic. Using this method, the authors conclude that “many of the results are indistinguishable from what would be expected by chance”, labeling such findings “uninteresting”, and suggesting that “previously published findings using [eLasso] should be reevaluated using the above testing procedure.” Forbes et al. (2017b) re-iterate the last statement in paper 4: “this finding highlights the central role that Steinley et al.’s (2017) proposed method should have in psychopathology network research going forward.”
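To give an idea of what keeping the margins intact means in practice, here is a minimal base R sketch of one simple way to do it, the so-called checkerboard swap. It is not necessarily the algorithm used by Steinley et al. (2017); the function name `fixed_margin_sample()`, the number of swaps, and the toy data are all assumptions for illustration. The idea: pick two rows and two columns at random, and if the resulting 2 by 2 block has ones on one diagonal and zeros on the other, flip it. Each flip leaves every row total and every column total unchanged.

```r
# Hypothetical helper: shuffle a binary matrix while keeping all row and
# column totals fixed, using repeated checkerboard swaps
fixed_margin_sample <- function(x, n_swaps = 1000) {
  stopifnot(is.matrix(x), all(x %in% c(0, 1)))
  for (s in seq_len(n_swaps)) {
    rows <- sample(nrow(x), 2)
    cols <- sample(ncol(x), 2)
    block <- x[rows, cols]
    # A checkerboard block looks like (1 0 / 0 1) or (0 1 / 1 0)
    is_checkerboard <- block[1, 1] == block[2, 2] &&
      block[1, 2] == block[2, 1] &&
      block[1, 1] != block[1, 2]
    if (is_checkerboard) {
      x[rows, cols] <- 1 - block  # flip the block; margins stay the same
    }
  }
  x
}

# Toy person-by-symptom data: 20 persons, 6 binary symptoms (made up)
set.seed(1)
x <- matrix(rbinom(20 * 6, 1, 0.4), nrow = 20, ncol = 6)
x_new <- fixed_margin_sample(x)

# Row totals (symptoms per person) and column totals (symptom prevalence)
# are identical in the original and the resampled data
all(rowSums(x) == rowSums(x_new))
all(colSums(x) == colSums(x_new))
```

Repeating this many times and computing a statistic on each resampled dataset yields the kind of reference interval described above.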

In our new commentary in Multivariate Behavioral Research, we show that the work of Steinley et al. (2017) relies on a misinterpretation of psychological networks. The crux of the matter lies in this paragraph:

“Clearly, psychopathology networks fall into the class of affiliation matrices where the connections are measured between observation and diagnostic criteria. The relationships between the criteria are then derived by transforming the two-mode affiliation matrix to a one-mode so-called “overlap/similarity” matrix between the criteria, where traditional network methods are applied to this overlap/similarity matrix.”

Steinley et al. (2017) interpret Ising models used in psychology as one-mode projections of so-called two-mode or bipartite graphs. That means that they interpret a standard person by symptom data matrix:

	Depressed mood	Fatigue
Bob	1	1
Alice	0	1

To actually encode a network:

Depressed mood — Bob — Fatigue — Alice

Of which the symptom by symptom network is a so-called projection:

Depressed mood – Fatigue

That is, depressed mood and fatigue interact with one another because they share one person: Bob. Similarly, Bob and Alice interact with one another because they share one symptom: fatigue. But this is not the intention of the Ising model, which is a model for conditional independencies. In fact, one core assumption in many multivariate statistical models is that the cases (Bob and Alice) are independent, which means they do not interact with one another because they share a symptom. The symptom fatigue is also a different property of Alice and Bob, and not an entity in the world they both interact with.

While keeping the column totals intact has little to no effect in generating such data, keeping the row totals (in this case: number of symptoms per person) intact has a striking effect; it leads to highly one-dimensional models used as null-distribution:

This means that due to latent variable – network equivalences, fixed-margin sampling takes a fully connected network model as null-distribution to test estimated network models. Such a procedure will lead to false conclusions on the importance of estimated network parameters. We show in our commentary that the method performs poorly in classifying true effects as interesting and false effects as uninteresting.

Fixed-margin sampling to assess unidimensionality

Fixed-margin sampling generates data under a particular kind of unidimensionality: a model in which each item is interchangeable (Verhelst, 2008). Such a model is also known as the Rasch model. As the DSM classification of disorders typically treats symptoms as interchangeable, it is interesting to see how well combining fixed-margin sampling with the eLasso Ising estimation method performs as a non-parametric test for the Rasch model. This may be worthwhile, as it would give us insight into where the data diverge from the Rasch model and thus where alternative explanations are warranted (although not required). We investigated this in two simulation studies. In one simulation study, we simulated data under the following model:

By varying the C parameter (correlation between factors), we can change the model from two independent variables (C = 0) to one latent variable (C = 1), and by increasing the R parameter (residual effect), we can add two violations of the one- or two-factor model. The results are as follows:

The colored areas in the background show the probability of flagging the edges related to parameter R as not being in line with the Rasch model. They show that the method works very well in detecting these local violations of the Rasch model. The boxplots show global departures and should be high if all edges are flagged as departures from the Rasch model. This should be the case in the C = 0 condition but doesn’t happen often. This shows that while this method is powerful in detecting local departures from the Rasch model, it is far less powerful in detecting global departures from the Rasch model. As such, I would recommend using this method to gain insight into where unidimensionality does not hold, but not to use it as a test for the Rasch model itself by counting the number of flagged edges.

Fixed-margin sampling in network science

While fixed-margin sampling should not be used to assess psychometric networks that are based on estimating statistical models from large samples of independent cases (e.g., the Ising model), the method has strong utility in the analysis of one-mode network structures that are derived from bipartite graphs. One such network is actually the first network I ever constructed and analyzed: the DSM-IV-TR network (Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011):

I worked on this network about 10 years ago as an undergraduate student, long before we even entertained the notion of estimating network models from data. All the code and data used for the network visualizations are still online. To create this network, we created an affiliation matrix of 439 symptoms by 148 disorders, encoding whether a symptom was listed as a symptom of a disorder in the DSM-IV-TR. The data simply form a matrix with 439 rows and 148 columns, with 0 indicating a symptom is not listed in a disorder and 1 indicating a symptom is listed in a disorder. This dataset can subsequently be transformed to a 439 by 439 adjacency matrix encoding whether two symptoms are both listed in at least one shared disorder, by multiplying the data with its transpose and making every non-zero element one².
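The transformation from affiliation matrix to adjacency matrix can be sketched in a few lines of base R. The toy matrix below is entirely made up (5 symptoms by 3 placeholder disorders) and is not the DSM-IV-TR data; setting the diagonal to zero is my own choice to avoid self-loops in a plot.

```r
# Hypothetical toy affiliation matrix: rows are symptoms, columns are disorders,
# 1 = symptom is listed in the disorder, 0 = not listed
affiliation <- rbind(
  insomnia       = c(1, 1, 0),
  fatigue        = c(1, 0, 0),
  worry          = c(0, 1, 0),
  hallucinations = c(0, 0, 1),
  delusions      = c(0, 0, 1)
)
colnames(affiliation) <- c("disorder_A", "disorder_B", "disorder_C")

# Multiply the data with its transpose: entry [i, j] counts the number of
# disorders in which symptoms i and j are both listed
shared <- affiliation %*% t(affiliation)

# Make every non-zero element one, and drop the self-connections
adjacency <- (shared > 0) * 1
diag(adjacency) <- 0

adjacency
```

In this toy example insomnia is connected to both fatigue and worry, while fatigue and worry are not connected to each other because they never appear in the same disorder.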

While the dataset used for this network looks similar to a dataset you may use when estimating an Ising model (zeroes and ones), it is actually a very different kind of data. In an Ising model, the more cases we add the more precise our estimates of the network model: if we double the sample size from 10,000 to 20,000 we would not expect a completely different model, merely to be able to estimate the parameters even more precisely. In the DSM-IV-TR affiliation matrix, however, this is not the case: doubling the number of symptoms listed will fundamentally change the interpretation of the model (doubling the number of nodes), and doubling the number of disorders listed will fundamentally change the structure of the network. We also cannot do this, as we already listed all symptoms and disorders from the DSM-IV-TR. Rather than columns representing random stochastic variables and rows representing independent realizations, the columns and rows both represent simply static entities: words in a book. The network structure is simply a description of this book, and the equivalent of adding more cases would be to test more books (e.g., Tio, Epskamp, Noordhof, & Borsboom, 2016).

This means we also cannot bootstrap the dataset, as resampling symptoms with replacement or dropping symptoms hardly makes sense. So what can we do? The fixed-margin sampling method described by Steinley et al. (2017) actually gives a very nice new tool for investigating such structures. Given that some symptoms are listed in many disorders (e.g., insomnia is listed in 17 disorders), and some disorders feature many symptoms (e.g., Schizoaffective Disorder lists 33 symptoms), we would expect certain levels of connectivity by chance alone. If that is the case, the network structure itself is not very interesting, and investigating the symptom and disorder sum totals would be sufficient by itself.
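
To give a feeling for what "fixed margins" means, here is a minimal base R sketch. It uses a simple checkerboard-swap scheme, which is not the routine used by Steinley et al. (2017) nor the algorithm of Verhelst (2008); the function names `swap_fixed_margins` and `count_edges`, the number of swap attempts, and the simulated 30 by 8 matrix are all assumptions made for illustration.

```r
# Swap "checkerboard" 2x2 submatrices at random. Each swap leaves all
# row sums and all column sums of the binary matrix unchanged.
swap_fixed_margins <- function(m, n_tries = 5000) {
  for (k in seq_len(n_tries)) {
    r  <- sample(nrow(m), 2)
    cc <- sample(ncol(m), 2)
    sub <- m[r, cc]
    # A checkerboard is either (1 0 / 0 1) or (0 1 / 1 0).
    if (sub[1, 1] == sub[2, 2] && sub[1, 2] == sub[2, 1] &&
        sub[1, 1] != sub[1, 2]) {
      m[r, cc] <- 1 - sub
    }
  }
  m
}

set.seed(1)
# Hypothetical affiliation matrix: 30 symptoms by 8 disorders.
aff <- matrix(rbinom(30 * 8, 1, 0.2), nrow = 30, ncol = 8)
random_aff <- swap_fixed_margins(aff)

# The margins of the randomized matrix equal those of the original.
all(rowSums(random_aff) == rowSums(aff))
all(colSums(random_aff) == colSums(aff))

# Number of edges in the one-mode (symptom by symptom) projection.
count_edges <- function(a) {
  adj <- (a %*% t(a)) > 0
  sum(adj[upper.tri(adj)])
}
count_edges(aff)
count_edges(random_aff)
```

Repeating the randomization many times gives a reference distribution for any network statistic (such as the number of edges) that would be expected given the symptom and disorder totals alone.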

I re-investigated the dataset using fixed-margin sampling and constructed 1,000 networks (codes available here). These are three random samples of the generated networks:

In this case, there is no need for any quantitative analysis and the plots themselves already reveal a remarkable difference between the networks expected by chance alone and the network observed in the DSM-IV-TR: the fixed-margin sampling networks are far denser (more edges) and interconnected. This means that we can conclude that there is structure in the DSM-IV-TR, and symptoms are not randomly assigned to disorders. Of course, there is a structure in the DSM-IV-TR imposed by the chapters alone (e.g., mood disorders, personality disorders, etcetera). A follow-up analysis could be to split up the data per chapter, apply fixed-margin sampling to each block, and subsequently combine the data again. Three snapshots of these networks are as follows:

These look much more similar to the observed DSM-IV-TR network, which means that the clustering per chapters already explains a lot of the structure. However, these networks are still denser (number of edges ranging from 3,513 to 3,674, compared to 2,626 in the observed network), meaning that investigating the graph structure is still interesting. When looking at strength centrality, we can see that in the high ranges of strength centrality the observed node strengths are less than could be expected by chance:

Here, red dots indicate nodes with a strength that was not in the expected interval by fixed-margin sampling.

Conclusion

To conclude, our new manuscript shows that the fixed-margin sampling routine proposed by Steinley et al. (2017) should not be used to evaluate psychometric network models, but shows promise in detecting local departures from Rasch models. Furthermore, the method of fixed-margin sampling is highly valuable in analyzing typical network structures that are constructed rather than estimated. I think that the combination of our commentary in the Journal of Abnormal Psychology last year (Borsboom et al., 2017) and the new commentary discussed in this blog post safely puts most criticism raised in last year's series of papers to rest, and I look forward to moving this discussion further by discussing crucial challenges network analysis faces in the coming years, of which there are many (see, e.g., comment # 5 on the pubpeer discussion, several publications on challenges to network analysis, and continued debate on the interpretation of networks).

If you would like to study fixed-margin sampling yourself, all codes for our simulations are available on the Open Science Framework. These rely on both R and Matlab, however, to fully replicate the analysis as proposed by Steinley et al. (2017). For R-based alternatives, the R packages RaschSampler and vegan should have similar performance.

References

Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J. (2011). The Small World of Psychopathology. PLoS ONE, 6(11), e27407.

Borsboom, D., Fried, E., Epskamp, S., Waldorp, L., Van Borkulo, C., Van Der Maas, H., & Cramer, A. (2017). False alarm? A comprehensive reanalysis of “Evidence that psychopathology symptom networks have limited replicability” by Forbes, Wright, Markon, and Krueger. Journal of Abnormal Psychology, 126(7), 989–999. http://doi.org/10.17605/OSF.IO/TGEZ8

Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017a). Evidence that Psychopathology Symptom Networks have Limited Replicability. Journal of Abnormal Psychology, 126(7), 969–988. http://doi.org/10.1037/abn0000276

Forbes, M. K., Wright, A. G. C., Markon, K. E., & Krueger, R. F. (2017b). Further evidence that psychopathology networks have limited replicability and utility: Response to Borsboom et al. and Steinley et al. Journal of Abnormal Psychology, 126(7), 1011–1016.

Steinley, D., Hoffman, M., Brusco, M. J., & Sher, K. J. (2017). A Method for Making Inferences in Network Analysis: Comment on Forbes, Wright, Markon, and Krueger (2017). Journal of Abnormal Psychology, 126(7), 1000–1010.

Tio, P., Epskamp, S., Noordhof, A., & Borsboom, D. (2016). Mapping the manuals of madness: Comparing the ICD-10 and DSM-IV-TR using a network approach. International Journal of Methods in Psychiatric Research, 25(4), 267–276. http://doi.org/10.1002/mpr.1503

Verhelst, N. D. (2008). An Efficient MCMC Algorithm to Sample Binary Matrices with Fixed Marginals. Psychometrika, 73(4), 705–728. http://doi.org/10.1007/s11336-008-9062-3

Footnotes

The post Fixed-margin sampling & networks: New commentary on network replicability appeared first on Psych Networks.

(Mis)interpreting Networks: An Abbreviated Tutorial on Visualizations

Payton Jones — Wed, 26 Sep 2018 14:38:26 +0000

This guest post was written by Payton Jones (payton_jones@g.harvard.edu), and is the abbreviated version of a tutorial published in Frontiers recently (“Visualizing Psychological Networks: A Tutorial in R”). Payton is a graduate student at Harvard University in the Richard J. McNally lab. His research focuses on the etiology of mental disorders and statistical methods.

Network analysis is an exploding field! I absolutely love seeing the constant flow of new papers and new researchers using network methods.

With such a quickly growing science, it’s difficult to keep up! Although I have personally found the network community to be very welcoming, friendly, open, and accessible, that doesn’t negate the fact that there is just a lot of information to keep up with.

As I work to keep up and learn new information, I’ve become aware of some mistakes I made early on. This tutorial is intended to keep you from making the same mistakes that I did.

The Big Four: How NOT to Interpret Networks

At this point, I’ve seen at least a few dozen symposium presentations on network analysis, many of them from researchers just starting out with network analysis. Here are some of the most frequent errors:

Error #1: Nodes in the center are central

“The somatic symptoms of depression were out on the periphery, barely part of the network”
“Extraversion is right in the middle of the personality network”

This misinterpretation pops up all the time. I blame linguistics.

In reality, there are several different types of node centrality, and none of them necessarily correspond to network plots. You have your centrality values and centrality plots—use those instead of looking at the network plot. Eiko wrote about this and similar centrality interpretation problems in a recent blog post.

Error #2: Nodes close together are similar, nodes far apart are not

“As you can see, sad mood and agitation were on opposite ends of the network”
“Surprisingly, weight gain and weight loss were right next to each other”

Again, not so. A good way to reality check is to look at the edges: if node distance corresponds perfectly to node similarity, all edges of a certain thickness should have exactly the same length, and all edges of the same length should have the same thickness (hint: that’s rarely if ever true).

Error #3: Left, right, up, and down mean something

“Intrusive thoughts were far to the right, close to the depression cluster”

This one is rarer but pops up occasionally. I see people make this error especially when there are meaningful clusters. Resist temptation; if you want to know if a node is “close to the depression cluster”, use bridge centrality instead.

Error #4: A different looking network means a different network structure

“As you can see from the plots, the networks did not replicate well, indicating that edges in network analyses are mostly comprised of measurement error”

*Cough*. Relating to Error #3, in most network plots rotation is totally arbitrary (the enemy’s gate is down!). In addition, certain types of network plots (e.g., force-directed) are very unstable even with similar networks. This can wreak some serious havoc when trying to interpret multiple networks.

In my experience it’s much more informative to use a correlational approach (e.g., do the edges correlate? does centrality correlate?) to judge replicability (Eiko discussed these and similar metrics in the section “A word of caution” in this blog post). For plotting, it’s best to either use a consistent averaged layout for both plots or the Procrustes method (see Figure 6 in the full tutorial).
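
A minimal sketch of such a correlational check in base R is shown below. The two weight matrices are simulated stand-ins for two estimated networks (everything about them, including the noise level, is an assumption); with real data you would plug in your own estimated edge weight matrices instead.

```r
set.seed(123)
p <- 8  # number of nodes

# Helper (hypothetical): fill a symmetric p x p matrix from edge weights.
make_symmetric <- function(w) {
  m <- matrix(0, p, p)
  m[upper.tri(m)] <- w
  m + t(m)
}

# Hypothetical "true" edge weights.
true_w <- runif(p * (p - 1) / 2, 0, 0.4)

# Two noisy versions stand in for networks estimated in two samples.
net1 <- make_symmetric(true_w + rnorm(length(true_w), sd = 0.05))
net2 <- make_symmetric(true_w + rnorm(length(true_w), sd = 0.05))

# Each undirected edge appears twice, so use the upper triangle only.
edges1 <- net1[upper.tri(net1)]
edges2 <- net2[upper.tri(net2)]
cor(edges1, edges2)                       # similarity of edge weights
cor(edges1, edges2, method = "spearman")  # similarity of their rank order

# Strength centrality: sum of absolute edge weights per node.
strength1 <- rowSums(abs(net1))
strength2 <- rowSums(abs(net2))
cor(strength1, strength2)                 # similarity of centrality
```

These numbers do not depend on how the networks happen to be drawn, which is exactly why they are a safer basis for judging replicability than comparing two plots by eye.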

How Do We Fix It?

One way to fix the interpretation problem is to stop making any visual interpretation! Certainly, we shouldn’t pretend we understand 20-dimensional causal information just because we made a 2-dimensional plot of partial correlations (!).

But the whole point of a visualization is to help us understand our data better. And although we should stick to the numbers for our research conclusions, there is something to be said for exploratory hypothesis generation that comes from good visualizations (as long as you don’t pretend that these hypotheses were confirmatory all along).

So our second option is to try and do the best we can to make accurate visualizations, while simultaneously reining ourselves in with visual interpretations. Here is a super quick overview of some of the options.

Visualizing Your Network

This is a short version of the open-access tutorial. You’ll need the qgraph and networktools packages for the code to work, and we’ll get some data from package MPsychoR. First some code for getting a network:

library(qgraph)
library(networktools)
library(MPsychoR)
data(Rogers)
mynetwork <- EBICglasso(cor_auto(Rogers), nrow(Rogers))

1. Force-directed algorithms (AKA the way you've been doing it already)

myqgraph <- qgraph(mynetwork, layout="spring")

Most networks you see "in the wild" are plotted with the Fruchterman-Reingold algorithm. This algorithm works by treating each network edge like a spring: it pulls when connected nodes get too far away and pushes when they get too close.

This creates really nice-looking networks in which nodes never overlap, and edges are mostly about the same length (the "resting state" for the spring forces). In very sparse networks, it can be a good way to visualize clusters. But all of the Big Four are dangerous here.

2. Multidimensional scaling (MDS)

MDSnet(myqgraph, MDSadj=mynetwork)

Multidimensional scaling solves our Error #2—distances between nodes actually become interpretable in an MDS plot. In other words, the algorithm works so that nodes placed close together usually share a strong relationship, and nodes far apart do not. This is, of course, accounting for the fact that we've squashed everything down into just two dimensions—so stay careful with interpretations!
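
To make the idea concrete, here is a small base R sketch using classical MDS via `cmdscale()`. This is a swapped-in simplification and not necessarily what `MDSnet` does internally; the simulated weight matrix and the conversion of edge weights into dissimilarities (one minus the absolute weight) are assumptions made for illustration.

```r
set.seed(42)
p <- 6

# Hypothetical symmetric matrix of edge weights (e.g., partial correlations).
w <- matrix(0, p, p)
w[upper.tri(w)] <- runif(p * (p - 1) / 2, 0, 0.5)
w <- w + t(w)
dimnames(w) <- list(paste0("V", 1:p), paste0("V", 1:p))

# Turn association into dissimilarity: strong edges become short distances.
d <- 1 - abs(w)
diag(d) <- 0

# Classical MDS looks for 2D coordinates whose distances approximate d.
coords <- cmdscale(as.dist(d), k = 2)

# How well do the 2D distances reproduce the original dissimilarities?
fit <- cor(as.vector(dist(coords)), as.vector(as.dist(d)))
fit

if (interactive()) {
  plot(coords, type = "n", xlab = "Dimension 1", ylab = "Dimension 2")
  text(coords, labels = rownames(w))
}
```

The `fit` value is a rough reminder of the caveat above: the lower it is, the more information was lost by squashing the network into two dimensions.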

3. Principal components analysis (PCA)

PCAnet(myqgraph, cormat=cor_auto(Rogers))

You've probably heard of PCA—but for plotting a network? PCA is a simplification method—it tries to squash all of your complex data down into just a few variables. This is perfect for us, because our plots have only two (count 'em) dimensions! The idea here is that we give each node a score on Component #1 and on Component #2, and then use these scores to plot on an X/Y axis (this solves Error #3). We preserve complexity in the form of network edges but make the plot as simple as two principal components. If you're feeling adventurous, you could even come up with labels for what the dimensions might mean.
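
The coordinates for such a plot can be sketched in base R as follows. This is an illustration under stated assumptions rather than the internals of `PCAnet`: the data are simulated (two groups of three items), and each node is placed at its unrotated loadings on the first two principal components of the correlation matrix.

```r
set.seed(7)
n <- 500

# Hypothetical data: two groups of three items, each driven by one factor.
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  a1 = f1 + rnorm(n), a2 = f1 + rnorm(n), a3 = f1 + rnorm(n),
  b1 = f2 + rnorm(n), b2 = f2 + rnorm(n), b3 = f2 + rnorm(n)
)

# PCA on the correlation matrix via its eigendecomposition.
e <- eigen(cor(dat), symmetric = TRUE)

# Loadings: eigenvectors scaled by the square root of their eigenvalues.
loadings <- e$vectors[, 1:2] %*% diag(sqrt(e$values[1:2]))
dimnames(loadings) <- list(names(dat), c("PC1", "PC2"))
round(loadings, 2)

# Share of total variance captured by the two plotted components.
var_share <- sum(e$values[1:2]) / sum(e$values)
var_share
```

Using the two columns of `loadings` as X and Y positions gives each axis a meaning, and `var_share` indicates how much of the data the two axes actually summarize.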

4. Eigenmodels

EIGENnet(myqgraph)

If you liked PCA, you're in for a treat with eigenmodels. PCA is great, but it requires that you either have the original data or a correlation matrix from that original data. In other words, PCA isn't really based on your network per se, it's just based on the same data that generated the network. Thankfully, someone (https://www.stat.washington.edu/~pdhoff/code.php) came up with a way to extract latent variables from symmetric relational data (AKA undirected network data). The interpretation is similar to PCA plotting, but everything comes straight from the network itself.

And that's it!

If you liked the abbreviated version, you can check out the full tutorial for a deeper look at the same concepts and some more sophisticated code. Happy visualizations!

Citation:
Jones, P. J., Mair, P., & McNally, R. (2018). Visualizing Psychological Networks: A Tutorial in R. Frontiers in Psychology, 9, 1742. https://doi.org/10.3389/fpsyg.2018.01742

The post (Mis)interpreting Networks: An Abbreviated Tutorial on Visualizations appeared first on Psych Networks.

How to interpret centrality values in network structures (not)

Eiko Fried — Mon, 03 Sep 2018 14:54:33 +0000

For posts on psych-networks, I usually want to write about a topic for a few months, don’t find the time, and then a new paper comes along that prompts me to write up what I had in mind. For instance:

A paper by Madhoo & Levin 2016 prompted me to write a tutorial on community detection
A paper by Terluin et al. 2016 led to a blog post on differential variability
A paper by Afzali et al. 2017 prompted me to write a tutorial on network stability
A paper by Guloksuz et al. 2017 led me to write a piece on challenges of the network approach
And a paper by Forbes et al. 2017 led to another blog post on stability

I wanted to write about centrality inference for a while, and a new paper published in Molecular Psychiatry, one of the leading journals in psychiatry in terms of impact factor and visibility, convinced me I should write this up. The paper is entitled “The symptom network structure of depressive symptoms in late-life: Results from a European population study”, by Murri and colleagues. This paper is written up in a similar way to many other papers, and I really don’t mean to single out this specific paper or the specific authors here. It just comes at a time where I don’t want to prepare my course or review the paper for Abnormal … so here we go.

Centrality

After estimating network structures, e.g. among symptoms, in between-subjects (cross-sectional) or within-subjects (time-series) data¹, researchers often calculate centrality estimates. This provides information about the inter-connectedness of a variable. There are different ways to do that, and many different centrality measures exist.

For instance, this R syntax creates a small network, and shows that the green node has a degree centrality of 6 because it is connected to 6 other nodes:

 
library("qgraph")
AM <- matrix(0,10,10) 
AM[1,2] <- AM[2,1] <- AM[2,3] <- AM[3,2] <- AM[2,4] <- AM[4,2] <- AM[3,4] <- AM[4,3] <- AM[3,6] <- AM[6,3] <- AM[3,8] <- AM[8,3] <- AM[3,9] <- AM[9,3] <- AM[3,10] <- AM[10,3] <- AM[4,7] <- AM[7,4] <- AM[5,7] <- AM[7,5] <- AM[5,10] <- AM[10,5] <- AM[9,10] <- AM[10,9] <- 1 
gr <- list(c(1,2,4:10), 3) 
names <- c("1","3","6","3","2","1","2","1","2","3")
N <- qgraph(AM, groups = gr, color=c('#cccccc', '#3CB371'), labels=names,
              border.width=3,edge.width=2, vsize=9, 
              border.color='#555555', edge.color="#555555", label.color="#555555")
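
The node labels in this plot were typed in by hand. As a small check, the same numbers can be computed directly from the adjacency matrix in base R; the edge list below simply restates the edges defined above.

```r
# Rebuild the adjacency matrix from the example above as an edge list.
edges <- rbind(c(1, 2), c(2, 3), c(2, 4), c(3, 4), c(3, 6), c(3, 8),
               c(3, 9), c(3, 10), c(4, 7), c(5, 7), c(5, 10), c(9, 10))
AM <- matrix(0, 10, 10)
AM[edges] <- 1
AM[edges[, 2:1]] <- 1  # mirror each edge to keep the matrix symmetric

# Degree centrality: the number of edges attached to each node.
degree <- rowSums(AM)
degree
# 1 3 6 3 2 1 2 1 2 3

which.max(degree)
# 3
```

Node 3 (the green node) has the highest degree, matching the labels used in the plot.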

In the abstract of their paper, Murri et al. conclude, after estimating a network structure in cross-sectional data:

Death wishes, depressed mood, loss of interest, and pessimism had the highest values of centrality. Insomnia, fatigue and appetite changes had lower centrality values […]. In conclusion, death wishes, depressed mood, loss of interest, and pessimism constitute the “backbone” that sustains depressive symptoms in late-life. Symptoms central to the network of depressive symptoms may be used as targets for novel, focused interventions and in studies investigating neurobiological processes central to late-life depression.

I am not sure this necessarily follows, and I will explain below why.

Problematic inferences

Researchers often estimate centrality values after the network structures are estimated, and then use these to draw substantive inferences. One common inference in cross-sectional data is that central symptoms are the most important symptoms, another that we should intervene on central symptoms. Murri et al. above describe that central symptoms "sustain" depression, which has a clear temporal component to it.

There are a number of statistical and substantive concerns you should keep in mind here. And just to clarify this again, this is not an exercise in finger-pointing. While I have always tried to be careful, and while my colleagues will tell you how careful I try to be when it comes to causal language (thanks in large part to my education as a postdoc in the lab of Francis Tuerlinckx), I am sure a few sentences have slipped through my fingers in papers I am co-author on. In my own work, the strongest statement I could find is in my first network paper, where we found loneliness to play a crucial role in bereavement, in longitudinal data. In the abstract, we concluded that "future studies should examine interventions that directly target such symptoms". In the discussion, we wrote "that intervention programs should directly target loneliness". This is supported by and embedded in the clinical literature on the relation between loneliness and bereavement, but writing this paper today I would clarify that this conclusion does not follow from the network model alone.

So what are the main concerns and pitfalls when interpreting centrality values?

Statistical concerns

For my very first paper published in 2014, I analyzed the relations between 14 specific depression symptoms and impairment. It turned out that some symptoms explained a lot more variance than other symptoms. One of the reviewers raised the concern of differential variability, which I have not forgotten since. Differential variability means that items differ in their variability (standard deviation and variance), and items with little to no variability cannot relate to other variables. Since centrality is a function of relations among items, such floor or ceiling effects that can stem from differential variability will affect centrality estimates. Terluin et al. 2016 wrote a paper specifically about this for network models, which I discuss in more detail elsewhere. This means that when you estimate centrality, you should consider checking means and variances of items, and try to understand how these values determine network structures and centrality estimates. In other words, what does it mean if you correlate the centrality values with the standard deviations of your items … and find a correlation of 0.5? It's worth thinking about this.
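
Such a check takes only a few lines. The sketch below is one possible way to do it in base R, under several assumptions: the data are simulated binary items with increasingly severe floor effects, the network consists of unregularized partial correlations, and centrality is strength (the sum of absolute edge weights).

```r
set.seed(2014)
n <- 1000
p <- 8

# Hypothetical data: one common factor, items with different thresholds so
# that some items are rarely endorsed (floor effects, low variability).
f <- rnorm(n)
thresholds <- seq(-0.5, 2, length.out = p)
items <- sapply(thresholds, function(th) as.numeric(f + rnorm(n) > th))
colnames(items) <- paste0("item", 1:p)

# Unregularized partial correlations from the inverse correlation matrix.
pcor <- -cov2cor(solve(cor(items)))
diag(pcor) <- 0

strength <- colSums(abs(pcor))   # strength centrality per item
item_sd  <- apply(items, 2, sd)  # variability per item

# A sizeable correlation suggests centrality partly reflects variability.
cor(strength, item_sd)
```

With your own data, replace `items` with your item responses and `pcor` with your estimated network before computing the correlation.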

Another issue is reliable estimation: Are you sure the most central symptom in your network is actually meaningfully more central than the other symptoms? This is similar to other statistics, where the mean height of 177cm in a group of men vs 171 cm in a group of women does not tell you if there is a meaningful (or statistically significant) difference between the height of men and women — unless you know the sample size and distributions. You can test this statistically, and probably want to do that before drawing inferences. We describe here how to do that in detail, via the centrality difference test.

What if 3 nodes in your network actually measure the same latent variable, such as the CES-D scale that captures sad mood, feeling blue, and feeling depressed? Your network will feature strong edges between these nodes, and their centrality will be very high, but intervening on any one of them to decrease the others would not be a real "network intervention" because all you do is reduce sadness by intervening on sadness. That may be interesting all by itself, but the edges between these 3 items are not legitimate putative causal relations: They are simply shared variances due to measuring the same thing multiple times, as we highlight (along with a potential solution to this) in our challenges paper published in Perspectives on Psychological Science.

There is also the danger of conditioning on colliders or other estimation problems. Conditioning on colliders, for instance, will induce artificial edges in your network that are not part of the true model. In other words, be careful not to confuse an estimated parameter (like an edge weight) for the truth … obviously, this applies to all models, and not only network models.
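
The collider problem is easy to demonstrate with a small simulation. In this base R sketch (all variable names and the data-generating model are made up for illustration), `a` and `b` are independent, and `c_node` is a common effect of both.

```r
set.seed(99)
n <- 10000

a <- rnorm(n)
b <- rnorm(n)               # a and b are independent in the true model
c_node <- a + b + rnorm(n)  # collider: a common effect of a and b

dat <- cbind(a, b, c_node)

# Marginal correlation between a and b: close to zero, as it should be.
marginal <- cor(a, b)

# Partial correlation between a and b, conditioning on the collider.
pcor <- -cov2cor(solve(cor(dat)))
partial <- pcor[1, 2]

marginal
partial
```

Under this model, the partial correlation between `a` and `b` is about -0.5 in the population, even though the two variables are unrelated: conditioning on their common effect induces an edge that is not part of the true model.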

Finally, there is the issue of mixing levels. Network models in cross-sectional data are estimated on between-subjects data, and as has been highlighted in recent work, it does not automatically follow that such results lead to proper conclusions regarding the within-subjects level. I am not saying it never follows, and I think these levels might align quite often, but it is an empirical question we have not yet answered. Here is a fairly strongly worded recent investigation by Fisher et al., published in PNAS, showing that there are important differences between these levels; Simpson's paradox is also highly relevant in this context.

Substantive concerns

Now let's assume our network is estimated without any problems or bias, and concentration problems is the most central depression symptom in the network structure of symptoms based on cross-sectional data. Can we conclude that it is the "most important" problem, and that we should focus our interventions on concentration problems?

As we wrote up in the discussion section of a recent paper we published in Clinical Psychological Science, this conclusion does not necessarily follow, for various reasons.

It is important to highlight that centrality does not automatically translate to clinical relevance and that highly central symptoms are not automatically viable intervention targets. Suppose a symptom is central because it is the causal endpoint for many pathways in the data: Intervening on such a product of a causal chain would not lead to any changes in the system.

In other words, the endpoint of a causal chain would end up being a highly central symptom in your network structure if there are many problems that lead to this specific symptom. Given the cross-sectional nature of your data, you cannot find evidence for this temporal relationship, and will, in this case, draw wrong causal inferences that do not follow from the results. So I urge caution with these and similar interpretations.

Another possibility is that undirected edges imply feedback loops (i.e., A—B comes from A⇄B), in which case a highly central symptom such as insomnia would feature many of these loops. This would make it an intervention target that would have a strong effect on the network if it succeeded, but an intervention with a low success probability, because feedback loops that lead back into insomnia would turn the symptom “on” again after we switch it “off” in therapy.

Put differently, it is an important and non-trivial question if it might not be worthwhile to intervene on peripheral (non-central) symptoms, because the probability of switching them permanently is higher: Few other symptoms will keep them in their original state. If sleep problems lead to 5 other problems, but are at the same time the consequence of 5 problems, it will be nearly impossible to simply target insomnia via interventions because you don't target the causes of insomnia.

A third example is that a symptom with the lowest centrality, unconnected to most other symptoms, might still be one of the most important clinical features. No clinician would disregard suicidal ideation or paranoid delusions as unimportant just because they have low centrality values in a network. Another possibility is that a symptom is indeed highly central and causally affects many other nodes in the network but might be very difficult to target in interventions. As discussed by Robinaugh, Millner, and McNally (2016), “Nodes may vary in the extent to which they are amenable to change” (p. 755).

I believe these are significant challenges to common centrality interpretations. We conclude in the paper by stating:

In sum, centrality is a metric that needs to be interpreted with great care and in the context of what we know about the sample, the network characteristics, and its elements. If we had to put our money on selecting a clinical feature as an intervention target in the absence of all other clinical information, however, choosing the most central node might be a viable heuristic.

There is other critical work on centrality on the way. One paper that is accepted in the Journal of Consulting and Clinical Psychology, by Rodebaugh and colleagues, features a detailed empirical investigation of centrality in both cross-sectional and time-series data. You can find the preprint here. The most relevant parts of the abstract read:

We first estimated a state-of-the-art regularized partial correlation network based on participants with social anxiety disorder (N = 910) to determine which symptoms were more central. Next, we tested whether change in these central symptoms were indeed more related to overall symptom change in a separate dataset of participants with social anxiety disorder who underwent a variety of treatments (N = 244). […] Centrality indices successfully predicted how strongly changes in items correlated with change in the remainder of the items. Findings were limited to the measure used in the network and did not generalize to three other measures related to social anxiety severity. In contrast, infrequency of endorsement² showed associations across all measures. […] The transfer of recently published results from cross-sectional network analyses to treatment data is unlikely to be straightforward.

Conclusions

The whole idea of network theory is that things are complicated. We should draw inferences proportional to this level of complexity, and be careful of over-interpreting our data. Obviously, this is just as important for time-series analyses, where we have time as an additional (and very important) dimension, but that only buys us Granger-causality, and only helps with a few of the issues described above.

A crucial step forward is to actually test interventions in patients based on centrality (and other) estimations, and I am excited to see such projects putting network theory to the test. They provide a fantastic opportunity for falsification of network theory that we should all embrace.

EDIT 11-29-2018:
There are two new preprints discussing centrality inferences critically, which you can find here (Dablander & Hinne, 2018) and here (Bringmann et al., 2018).

The post How to interpret centrality values in network structures (not) appeared first on Psych Networks.

Bootstrapping edges after regularization: clarifications & tutorial

Eiko Fried — Thu, 19 Jul 2018 14:17:12 +0000

I have read the following statement numerous times in the last months: “If the 95% bootstrapped CI of an edge weight in a network contains zero, the edge cannot be differentiated from zero”. This is incorrect in case of regularized partial correlation networks, and I thought I would write this up as a short public answer I can refer to more easily in the future.

One of the core features of the R package bootnet is bootstrapping of network edge weights. Bootstrapping is a procedure where you estimate your network structure and parameters of interest many times (e.g. 1000), each time with a slightly different sample. You obtain these different samples by drawing people from your data randomly with replacement. This means that in your first bootstrap, Bob might be in there 3 times (but not Alice), whereas in the second bootstrap, Alice is in there twice (but Bob is absent). The larger the sample, and the more similar the people to each other, the more stable your parameters will be.
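
The resampling logic itself can be sketched in a few lines of base R, without bootnet. This is only an illustration of the principle: the data are simulated, the helper `edge_AB` is hypothetical, and the statistic is a single unregularized partial correlation rather than a full regularized network.

```r
set.seed(2018)
n <- 300

# Hypothetical data set with three variables; A and B share variance.
x <- rnorm(n)
dat <- cbind(A = x + rnorm(n), B = x + rnorm(n), C = rnorm(n))

# Statistic of interest: partial correlation between A and B given C.
edge_AB <- function(d) {
  pc <- -cov2cor(solve(cor(d)))
  pc[1, 2]
}

n_boots <- 1000
boot_est <- replicate(n_boots, {
  rows <- sample(n, n, replace = TRUE)  # draw people with replacement
  edge_AB(dat[rows, ])
})

estimate <- edge_AB(dat)                           # full-sample estimate
interval <- quantile(boot_est, c(0.025, 0.975))    # bootstrapped 95% interval
estimate
interval
```

Some people appear several times in a given bootstrap sample and others not at all, which is what produces the spread in `boot_est`.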

We put together bootnet to give you an idea about the stability of the edge weights and other parameters. If the edge between two nodes, A and B, is widely different every time you resample, it means your bootstrapped 95% CI will be all over the place. I have described bootnet and its functionalities in a previous tutorial blog post, and we have a tutorial paper on bootnet that was published in 2017.

Here, for instance, is the output of the bootnet() function in bootnet from our recent CPS paper for dataset 1, available in the supplementary materials:

On the y-axis are all 120 edges in the network (labels omitted to keep it legible), the x-axis shows the strength of the edge weights (you can see that nearly all edge weights are positive). The red dots are the point estimates of all edges, the grey area the “95% bootstrapped CI”, as I used to call it. The important point is that many of the CIs will just overlap with zero. How do we interpret this? Usually, if a point estimate of a parameter is 0.1 (e.g. a correlation), we do not know if that parameter is different from zero. This is a normal situation in statistics, and the reason why we usually look at the CI coverage: if the CI includes 0, the parameter cannot be differentiated from 0.

In the case of regularized partial correlation networks, the story is different. If an edge is 0.1 after regularization, that means we have two types of information about the parameter: 1) our best guess is that the parameter is 0.1; 2) our best guess is that the parameter is different from 0.

Why? Because we use regularization, a well-validated, sophisticated statistical technique to only keep coefficients in the network that are not zero¹. Obviously, regularization can still lead to errors, there are situations in which regularization does not do well, and there are numerous other methods that should be considered when estimating networks (for a summary on these points, see Sacha Epskamp’s recent blog post). But the main point is that we have to interpret the 95% CI of regularized edge weights differently than we usually do.

For the supplementary materials of a network paper on depression symptoms & inflammation that we are about to submit, Jonas Haslbeck helped us look at this topic from a somewhat different angle, and also provided some insights on the topic. It would take me more sentences to reiterate what Jonas said very concisely, so I will simply paste the relevant part of the supplementary materials here:

“In order to quantify the uncertainty associated with all edge-estimates, we computed a bootstrapped sampling distribution based on 100 bootstrap samples, for each of the edge-estimates. For each of the six networks estimated in the main article we present summaries of the p(p-1)/2 bootstrapped sampling distributions, one for each edge parameter. Specifically, we display the 5% and 95% quantiles of the bootstrapped sampling distribution and show the proportion of nonzero estimates on point that indicates the mean of the sampling distribution.
Because we use regularization to estimate the network models, all edge-estimates are biased towards zero, which implies that all sampling distributions are biased towards zero. Thus, these sampling distributions are not Confidence Intervals (CIs) centered on the true (unbiased) parameter value. This means that if the quantiles of the bootstrapped sampling distribution overlap with zero it could be that the corresponding CI does not overlap with zero. However, if the quantiles of the bootstrapped sampling distribution do not overlap with zero, we know that also the corresponding CI does not overlap with zero (explained in detail in Epskamp, Borsboom & Fried, 2017). Further details of bootstrap analyses are available in the supplemented R code.”
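The summaries described in the quote can be sketched in a few lines of base R. This is a toy example under stated assumptions, not the code from the supplementary materials: it uses a single correlation between two simulated variables instead of a full network, and a hypothetical `soft_threshold` helper with an arbitrary `lambda` stands in for the regularized estimator.

```r
set.seed(1)

# Hypothetical shrinkage rule standing in for a regularized estimator
soft_threshold <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)

# Simulated data with a modest true association (made-up values)
n <- 200
x <- rnorm(n)
y <- 0.15 * x + rnorm(n)

# Nonparametric bootstrap: resample rows, re-estimate, then shrink
n_boots <- 100
boot_est <- replicate(n_boots, {
  idx <- sample.int(n, replace = TRUE)
  soft_threshold(cor(x[idx], y[idx]), lambda = 0.1)
})

# 5% and 95% quantiles of the bootstrapped sampling distribution
boot_quantiles <- quantile(boot_est, probs = c(0.05, 0.95))

# Proportion of bootstrap samples in which the estimate was nonzero
prop_nonzero <- mean(boot_est != 0)

print(boot_quantiles)
print(prop_nonzero)
```

Because the shrinkage pulls every bootstrap estimate towards zero, the lower quantile can sit at zero even when the edge is estimated as nonzero in a sizeable share of the bootstrap samples, which is why the proportion of nonzero estimates is reported alongside the quantiles.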

Jonas also produced the following plot²:

The numbers show how often an edge was estimated non-zero in the 100 bootstraps. As you can see, the edge C2—C7 was included in all networks, and while the 95% bootstrapped CI of D1—C1 does include zero, it was estimated to be non-zero in 78% of the 100 estimated networks. The code for these plots can be found in the supplementary materials of our paper.

And as announced a while ago on Twitter, Sacha has recently implemented a function somewhat similar to what Jonas had put together in bootnet 1.1. This version of bootnet is currently available on github, and should be on CRAN soon. As Sacha explained in the blog, you can now plot the quantile intervals only for the times the parameter was not set to zero, in addition to a box indicating how often the parameter was set to zero.

 
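# Install the development version of bootnet from GitHub
install.packages("devtools")
library("devtools")
install_github("sachaepskamp/bootnet")
library("bootnet")
library("psych")

# Big Five example data; if data(bfi) is not found, note that newer
# versions of psych may ship this data set in the psychTools package
data(bfi)

# Estimate a regularized network on the first five items
network1 <- estimateNetwork(bfi[, 1:5], default = "glasso")

# Nonparametric bootstrap; set nCores to what your machine offers
boot1 <- bootnet(network1, nBoots = 500, nCores = 8)

# Quantile intervals for nonzero estimates only, plus how often each edge was zero
plot(boot1, plot = "interval", split0 = TRUE, order = "sample", labels = FALSE)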

The above code will lead to a few warnings (ignore for the purpose of this tutorial³), and leads to the following figure:

The saturation is proportional to how often an edge was included in the network. The figure doesn't scale too well at present (i.e. to more than 5 or 10 nodes), but it's something you'd likely report in the supplementary materials anyway, and not in the main part of your paper.
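The logic behind such a plot can be sketched by hand for a single edge. The snippet below does not use bootnet's internals: the bootstrap estimates are made-up numbers, and the 2.5% and 97.5% quantile levels are an assumption chosen for illustration.

```r
# Hypothetical bootstrap estimates of one edge (made-up values)
boot_est <- c(0, 0, 0.04, 0.06, 0.07, 0, 0.09, 0.11, 0.05, 0.08)

# How often was the edge set to exactly zero?
is_zero <- boot_est == 0
prop_zero <- mean(is_zero)

# Quantile interval computed only over the nonzero estimates
# (quantile levels are an assumption for this sketch)
nonzero_interval <- quantile(boot_est[!is_zero], probs = c(0.025, 0.975))

print(prop_zero)
print(nonzero_interval)
```

Separating the two pieces of information avoids an interval that is dragged to zero purely because the edge was excluded in some bootstrap samples: the interval describes the size of the edge when it is present, and the proportion describes how often it is present at all.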

Thanks to Sacha and Jonas for the work they've put into this. Oh, and you know what's also new? Bootnet estimates and tells you how long your coffee break should be ...

The post Bootstrapping edges after regularization: clarifications & tutorial appeared first on Psych Networks.