Outline of the method
Backbone chemical shift assignments (i.e., HN, 15N, 13Cα, 13Cβ, Hα, and C′) can usually be obtained rapidly, semi-automatically, and reliably from a set of triple resonance spectra obtained from 15N, 13C double labeled protein. In order to determine a protein NMR structure, shift assignments are the necessary first stage20, meaning that any protein that has an NMR structure must have backbone shift assignments (which are now required to be submitted with the structures). Crucially, shift assignments are subject to minimal manipulation. This is very different from distance restraints obtained from NOE spectra. For distance restraints there are inevitably many stages of data sorting and rejection, no matter whether the restraints are inputted manually or automatically. Some person or computer must decide which signals to include, how to assign them, when to reject or modify the restraints, and how to set the calibration between peak intensity and distance restraint. All of these reduce the value of distance restraints as independent quality measures. For all these reasons, backbone assignments are better validation input than distance restraints.
In our method, backbone chemical shift assignments are compared to a structure. Although a number of programs can calculate shifts from structures, they are not sufficiently accurate to perform a useful comparison except in rather general terms14,21. Hence, the heart of our method is that the backbone shifts are used to calculate the local rigidity of the backbone, based on an established measure, the random coil index (RCI), which calculates how similar each of the six backbone shifts is to a tabulated “random coil shift” value22. It has been shown to provide a remarkably reliable guide to local rigidity, whether measured by NMR relaxation or by crystallographic B factor22,23.
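The idea of scoring how close a residue's backbone shifts are to random coil values can be illustrated with a minimal sketch. This is not the published RCI formula: the function name `flexibility_index`, the per-nucleus weights, the floor value and the example shift values are all assumptions made for illustration, and the real method uses tabulated random coil shifts together with further corrections and smoothing.

```python
from typing import Dict


def flexibility_index(observed: Dict[str, float],
                      random_coil: Dict[str, float],
                      weights: Dict[str, float],
                      floor: float = 0.01) -> float:
    """Inverse of the weighted mean absolute secondary shift for one residue.

    Larger values mean the shifts are closer to random coil (more flexible).
    The weights and the floor are hypothetical, not published parameters.
    """
    total = 0.0
    weight_sum = 0.0
    for nucleus, shift in observed.items():
        secondary_shift = abs(shift - random_coil[nucleus])
        total += weights[nucleus] * secondary_shift
        weight_sum += weights[nucleus]
    mean_deviation = total / weight_sum
    # The floor avoids division by zero when shifts match random coil exactly.
    return 1.0 / max(mean_deviation, floor)


# Illustrative numbers only; these are not tabulated random coil values.
example_coil = {"CA": 56.0, "HA": 4.3}
example_weights = {"CA": 1.0, "HA": 1.0}
coil_like = flexibility_index({"CA": 56.0, "HA": 4.3}, example_coil, example_weights)
structured = flexibility_index({"CA": 60.0, "HA": 4.3}, example_coil, example_weights)
```

In this sketch a residue whose shifts sit on the random coil values receives the maximum index, while a residue with large secondary shifts receives a small one.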
We compare local rigidity as predicted by RCI to that computed from a structure using techniques from mathematical rigidity theory. Several software packages and methodologies relying on rigidity theory such as the program Floppy Inclusions and Rigid Substructure Topography (FIRST)24,25 and its various implementations and extensions have been developed for fast computational predictions of rigidity and flexibility of protein structures. Starting with a protein structure, FIRST creates a topological graph (a constrained network consisting of nodes and edges), where atoms are represented by vertices (nodes), and edges represent the constraints corresponding to the intramolecular interactions of a protein e.g., covalent bonds, hydrogen bonds and hydrophobic interactions. Applying the mathematically well-established pebble game algorithm and molecular theorem26, FIRST then determines locally rigid subgraphs (rigid regions in the network), a process referred to as rigid cluster decomposition. The degree of flexibility can be quantified as a function of hydrogen bond energy by repeating rigid cluster decomposition as edges corresponding to hydrogen bonds are removed incrementally from the graph, and noting the energy at which the Cα atom of a residue no longer belongs to a rigid subgraph, i.e., becomes flexible. We convert this energy to a Boltzmann population ratio, effectively giving the probability that a residue is flexible.
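The final conversion step can be sketched as follows. The text states only that the energy is converted to a Boltzmann population ratio, so the functional form exp(E/RT), the temperature of 298.15 K and the helper names are assumptions for illustration; the rigid cluster decomposition itself is taken as given input.

```python
import math
from typing import List, Optional, Tuple

GAS_CONSTANT_KCAL = 0.0019872  # kcal mol^-1 K^-1


def cutoff_where_flexible(dilution: List[Tuple[float, bool]]) -> Optional[float]:
    """Return the first energy cutoff at which the residue's C-alpha is no longer rigid.

    `dilution` holds (energy_cutoff, in_rigid_cluster) pairs, ordered from the
    weakest hydrogen bonds removed first to the strongest. Hypothetical input
    format; the rigidity analysis that produces it is not shown here.
    """
    for cutoff, in_rigid_cluster in dilution:
        if not in_rigid_cluster:
            return cutoff
    return None  # residue stayed rigid over the whole dilution


def flexibility_probability(cutoff_energy_kcal: float,
                            temperature_k: float = 298.15) -> float:
    """Boltzmann ratio exp(E / RT) for a cutoff energy E <= 0 (assumed form)."""
    if cutoff_energy_kcal > 0:
        raise ValueError("hydrogen bond cutoff energies are expected to be <= 0")
    return math.exp(cutoff_energy_kcal / (GAS_CONSTANT_KCAL * temperature_k))
```

Under these assumptions, a residue that becomes flexible once only weak hydrogen bonds are removed has a ratio near 1, whereas a residue that stays rigid until strong bonds are removed has a ratio near 0.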
The two measures of local rigidity (RCI and FIRST) are then compared and a numerical comparison gives a score: a measure of how well the local rigidities match, and thus whether the structures produce a local rigidity that matches the one described by the RCI. Following extensive trials, we use two different measures of similarity: (a) The correlation between the two. This tests whether the peaks and troughs are in the same places. Peaks are locally mobile regions while troughs are locally rigid regions, generally regular secondary structure. This comparison therefore mainly shows whether the secondary structure is correct. (b) The root-mean square deviation (RMSD) between the two. This tests whether overall the structure is too rigid or too floppy. It is strongly influenced by the geometry of hydrogen bonds and other non-covalent interactions in the structure. As discussed below, the overall rigidity of a structure is determined by not just backbone but also sidechain interactions. Protein structures are often compared by superimposing backbones (often cartoons). Two structures can look very similar in a comparison like this, but one can be much worse than the other in terms of the accuracy of the hydrogen bond network or side chain orientations. In order to assess the relationship between structure and function, it is important that sidechain positions should be correct. The RMSD measure between RCI and FIRST is therefore important because it measures the kind of accuracy needed to interpret function.
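The complementary roles of the two measures can be shown with a small sketch. The text does not specify the type of correlation coefficient, so a Pearson coefficient is assumed here, and the profile values are invented for illustration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two per-residue flexibility profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmsd(x: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square deviation between two per-residue profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Invented profiles: the structure-derived profile has the same peaks and
# troughs as the shift-derived one but is uniformly too floppy.
from_shifts = [0.1, 0.5, 0.1, 0.4]
from_structure = [v + 0.2 for v in from_shifts]
correlation = pearson(from_shifts, from_structure)  # close to 1
deviation = rmsd(from_shifts, from_structure)       # close to 0.2
```

The example gives a near-perfect correlation together with a substantial RMSD, which is the situation measure (b) is intended to detect.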
Correlation and RMSD are simple numerical values, but they do not scale linearly to intuitive measures of accuracy. In the output from ANSURR, we therefore present the numerical values, but we also calculate the percentile of each measure relative to all NMR structures in the PDB with good chemical shift completeness (see below for further discussion of completeness), which we term correlation score and RMSD score, respectively. These are relative values (and are thus likely to change slightly as more structures are added to the PDB), but are easier for the user to interpret. The crystallographic validations in the PDB adopt a similar procedure for both geometrical tests and Rfree. In what follows, we report the scores rather than the numerical values.
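A percentile score of this kind can be sketched as below. The exact percentile convention (for example, how ties are handled) is not given in the text, so the definition used here, the percentage of reference structures that are strictly worse, is an assumption, as are the function name and the reference values.

```python
from typing import Sequence


def percentile_score(value: float,
                     reference: Sequence[float],
                     higher_is_better: bool = True) -> float:
    """Percentage of reference values that are strictly worse than `value`."""
    if not reference:
        raise ValueError("reference distribution must not be empty")
    if higher_is_better:
        worse = sum(1 for r in reference if r < value)
    else:
        worse = sum(1 for r in reference if r > value)
    return 100.0 * worse / len(reference)


# Invented reference distributions, for illustration only.
reference_correlations = [0.1, 0.2, 0.3, 0.4]
reference_rmsds = [0.1, 0.2, 0.3, 0.4]
correlation_score = percentile_score(0.35, reference_correlations)
rmsd_score = percentile_score(0.15, reference_rmsds, higher_is_better=False)
```

For correlation a higher raw value is better, whereas for RMSD a lower raw value is better, so the direction is passed explicitly and both scores read as "higher is better".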
Correlation and RMSD scores highlight different aspects of accuracy, so we decided not to combine them into a single score to represent overall accuracy. Instead, we plot both on a single graph, as demonstrated in Fig. 1 for four different models of the same protein. The most accurate models (those with good scores for both correlation and RMSD) appear in the top right-hand corner of the plot.
RECOORD CNS (unrefined) vs. CNW (refined) structures
There is currently no accepted method for measuring the accuracy of an NMR structure. There are also no databases of “good” or “bad” structures. We have therefore created or adopted datasets that can reasonably be assumed to be bad or good. There are also a range of methods that have been used to measure structure quality, including the geometrical methods described above. We compare our findings to these methods in turn.
The RECOORD project27 set out to standardize and tabulate methods for NMR structure calculation. It produced a curated set of structure restraints, which were applied in a consistent manner to more than 500 proteins from the PDB, and then analysed the resultant structures. It carried out two sets of structure calculations on each protein: one using a typical simulated annealing calculation in vacuo using CNS (termed CNS) and another using CYANA (termed CYA)28,29. They then took these two sets of structures and refined them in explicit water using ARIA (termed CNW and CYW, respectively)30. There is an extensive literature indicating that refinement of NMR structures in explicit water produces better geometries and generally better quality structures31, so not surprisingly, the CNW/CYW structures are better.
We have therefore carried out a comparison of those CNS and CNW datasets for which there is sufficient (>75%) chemical shift completeness, which comprises a set of 173 ensembles each made up of 25 models (see Supplementary Table 1 for details). From here on we refer to these datasets as CNS75 and CNW75, respectively. In Fig. 2a, the differences in average correlation and RMSD score for each of the 173 ensembles are depicted in a histogram. There is no real improvement in correlation score on refinement in water, with an average improvement of only 1.0. This is expected, as the secondary structure, which ultimately determines the location of peaks and troughs and therefore correlation, changes very little during refinement. As an example, Fig. 2b shows the lack of change in fold for one model. In contrast, RMSD scores are greatly improved, with an average increase of 36.2 and with only one ensemble scoring worse after refinement. This is mostly due to the improvement in hydrogen bonding which acts to rigidify the entire protein. This can be seen in the difference in computed rigidity before and after refinement (Fig. 2c).
Decoy vs experimental structures
A straightforward way to generate a pool of structures of varying accuracy is to calculate decoys. We used the 3DRobot web server32, which begins from a crystal or NMR structure, identifies possible structure scaffolds from a library, assembles them together, and then refines them. The sets of structures generated using 3DRobot are designed to have a high density of structures close to the native state with good hydrogen bonding and compactness, and of high diversity. In other words, they should look like genuine proteins, with good packing and hydrogen bonds, and they should span a range, from structures that closely resemble the native state, to ones that are very different, although still with good packing and hydrogen bonding. These sets therefore allow us to test whether ANSURR can discriminate between structures that are all geometrically good structures, but differ in their accuracy.
For about half (79 of 173) of the ensembles in the CNW75 dataset (see Supplementary Table 2 for a list of the chosen models), we calculated a group of 300 decoys. These decoys were then compared to the experimental structure using a Global Distance Test (GDT), which measures the similarity between two structures, calculated as the largest set of Cα atoms in the model structure falling within a defined cut-off of their position in the test structure, after superimposing the structures33. A selection of results is shown in Fig. 3a (results for all 79 sets of decoys are depicted in Supplementary Fig. 1). The score for the experimental structure is indicated by a black asterisk and scores for decoys are circles, colored according to their GDT.
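The distance-cutoff idea behind GDT can be sketched as follows. This simplified version assumes that the two structures are already superimposed and uses the customary 1, 2, 4 and 8 Å cutoffs of GDT_TS; neither the cutoffs nor the search over superpositions that a full GDT calculation performs are described in the text, so both are assumptions here.

```python
import math
from typing import Sequence, Tuple

Coordinate = Tuple[float, float, float]


def gdt_ts(model_ca: Sequence[Coordinate],
           reference_ca: Sequence[Coordinate],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Mean percentage of C-alpha atoms within each cutoff of the reference.

    Assumes equal-length, residue-matched, pre-superimposed coordinates.
    """
    if len(model_ca) != len(reference_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(a, b) for a, b in zip(model_ca, reference_ca)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Invented coordinates: per-atom deviations of 0, 1.5, 5 and 10 angstroms.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 0.0), (1.0, 1.5, 0.0), (2.0, 5.0, 0.0), (3.0, 10.0, 0.0)]
score = gdt_ts(decoy, reference)
```

A decoy identical to the reference scores 100, and the score falls as more Cα atoms move beyond the cutoffs.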
From inspection of the examples shown in Fig. 3a, it can be seen that the experimental model is usually one of the best structures, as one would expect. Also apparent is that as GDT increases (i.e., as decoys become more like the experimental structure), both the validation scores tend towards those of the experimental structure, confirming that our method does specifically validate accuracy. There is a consistent difference between α-helical proteins (e.g., 1itf) and β-sheet proteins (e.g., 1gh5). Helical proteins tend to improve more in their correlation score than in their RMSD score. This seems reasonable: helices are almost always rigid26, but not necessarily in the correct location, whereas β-sheet proteins tend to improve more in their RMSD score, because β-sheets can adopt a wide range of local geometries, implying that β-sheet proteins can appear almost correct but have poor hydrogen bonds and thus be much too floppy. Scores for proteins with both α-helical and β-sheet content tend to move in a diagonal, a combination of both effects.
The protein 1bqz presents an interesting example. It is DnaJ, a largely helical protein, and unusually there are many decoys that have a better correlation score but considerably worse RMSD score than the experimental structure, despite most having GDT of around 80 and with some close to 100. However, calculated hydrogen bond correctness scores34 i.e., the percentage of hydrogen bonds in the experimental structure that also appear in the decoy, show that these high correlation score decoys (indicated in Fig. 3a with a red box) have poor hydrogen bond geometries (average hydrogen bond correctness of only 47%), and hence a poor RMSD score. By contrast, decoys for 1cfc that approach the accuracy of the experimental structure have good RMSD and correlation scores and have better hydrogen bond geometries (average hydrogen bond correctness of 69%).
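The hydrogen bond correctness score as defined above is a simple set overlap, sketched below. The representation of a hydrogen bond as a (donor, acceptor) pair of atom labels is an assumption for illustration, and the bonds listed are invented.

```python
from typing import Iterable, Tuple

HydrogenBond = Tuple[str, str]  # (donor atom label, acceptor atom label)


def hbond_correctness(experimental: Iterable[HydrogenBond],
                      decoy: Iterable[HydrogenBond]) -> float:
    """Percentage of experimental hydrogen bonds that also appear in the decoy."""
    experimental_set = set(experimental)
    if not experimental_set:
        raise ValueError("experimental structure has no hydrogen bonds")
    shared = experimental_set & set(decoy)
    return 100.0 * len(shared) / len(experimental_set)


# Invented bond lists: the decoy reproduces two of four experimental bonds
# and adds one bond that is absent from the experimental structure.
experimental_bonds = [("A5:N", "A1:O"), ("A6:N", "A2:O"),
                      ("A7:N", "A3:O"), ("A8:N", "A4:O")]
decoy_bonds = [("A5:N", "A1:O"), ("A6:N", "A2:O"), ("A9:N", "A4:O")]
correctness = hbond_correctness(experimental_bonds, decoy_bonds)
```

Extra hydrogen bonds in the decoy do not raise the score, because the measure counts only bonds present in the experimental structure.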
Another interesting example is the beta-fold protein 1gh5 (an antifungal protein from S. tendae). There are some decoys with better correlation and only marginally worse RMSD scores than the experimental structure, suggesting that they are actually more accurate. Figure 3b compares the experimental structure and best scoring decoy. Immediately obvious (and reassuring) is that at backbone level, both structures are very similar. We note that the experimental structure has a relatively poor correlation score. It is therefore possible that some of the refined decoys genuinely are more accurate: such behavior has been noted before35. Inspection of the full dataset in Supplementary Fig. 1 suggests that this is not uncommon. NMR structure refinement is a joint optimization against NMR restraints and known properties of proteins. The observation that some decoys have better scores than NMR structures implies that in some NMR structure calculations, the balance is not yet optimal, and more weight needs to be given to packing and hydrogen bonding for example. We therefore feel that this finding is not a problem with the method: on the contrary, it shows that the method is useful for identifying incompletely refined structures and improving them.
Comparison between ANSURR and conventional predictors of accuracy
Conventional predictors of accuracy include the number of restraints per residue used to generate a structure, the number of restraint violations, and the total energy of the structure. The RMSD between models in an ensemble is often used to gauge precision, and by proxy to provide a guide to accuracy. Whilst these measures are expected to be related to accuracy, they do not explicitly determine it. Here we compare these measures to the average RMSD score (Fig. 4a) and correlation score (Fig. 4b) for each ensemble in the CNW75 dataset.
Overall the correlations are much stronger for RMSD score than correlation score. This is not surprising. These predictors largely assess local accuracy, and thus relate to RMSD score better than correlation score.
There is a moderate positive correlation between the number of distance restraints per residue and RMSD score. This is reasonable: a structure with a higher density of distance restraints is expected to be more tightly defined and therefore more (correctly) rigid overall36. Categorizing distance restraints according to whether they are sequential, medium or long-range reveals a slightly better correlation for medium/long-range restraints than for sequential restraints. This is again expected, as medium/long-range restraints provide more information on protein fold, and for this reason are considered a better predictor of accuracy37.
The number of distance restraint violations per residue does not correlate with either validation score. Roughly two thirds of structures do not have any violations at all, because structures are normally refined until there are no, or no significant, violations. It is fairly common practice that restraints that are routinely violated during a structure calculation will be discarded along the way. In fact, programs which automate NMR structure calculation do exactly that. For this reason, restraint violations are clearly not a good predictor of accuracy8,13,38.
The number of dihedral restraints per residue does not correlate with either validation score, but the number of dihedral restraint violations does. This is probably because the restraints themselves are relatively weak, so that they do not particularly guide the structure to become more accurate. However, the weak negative correlation with dihedral restraint violations suggests that violations of these restraints successfully flag major issues.
There is a moderate negative correlation to the total energy of the structure. Typically, the selection of the final set of structures to represent the ensemble is based on total energy, and the correlation seen here suggests that this is a reasonable way of identifying good structures.
Both RMSD score and correlation score are negatively correlated with ensemble RMSD suggesting that more precise ensembles do also tend to be more accurate. However, if those ensembles with RMSD larger than 2.5 Å are excluded (blue fit lines) then the gradient becomes almost zero, suggesting that for better structures, ensemble RMSD is a poor guide to accuracy. Similar comments have been made previously14,15,16,17,39.
In summary, our measures of accuracy match reasonably well to expectations: the number of distance restraints per residue is a fairly good predictor of accuracy, while dihedral restraints, and distance and angle violations, are not. Precision (ensemble RMSD) is a poor predictor of accuracy, while overall energy is surprisingly good as a predictor of accuracy.
Comparison between ANSURR and geometry-based validation measures
It is unclear whether a correlation should be expected between geometrical quality and accuracy. However, given that NMR structure calculation is to a large extent an optimization of models, using both NMR-derived restraints and knowledge-derived geometrical factors simultaneously, it is reasonable to expect that an accurate structure should also have good geometrical quality. We therefore compared our validation scores with two widely used indicators of geometrical quality: Ramachandran outliers and clashscore40. The program ramalyze (part of the Molprobity suite of validation tools) was used to compute the φ/ψ angles for each residue in the CNW75 dataset and categorize them as either favorable, allowed or outlier. The program clashscore (also part of Molprobity) was used to compute the average number of clashes per 1000 atoms for each ensemble in the CNW75 dataset. In Fig. 5a, b, the results for each ensemble are plotted against RMSD score and correlation score, respectively.
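The two geometrical summaries reduce to simple ratios, as the following sketch shows. It does not call MolProbity: the per-residue categories and the clash count are assumed to have been obtained already, and the function names and input values are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def clashes_per_1000_atoms(n_clashes: int, n_atoms: int) -> float:
    """Number of clashes normalized per 1000 atoms."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return 1000.0 * n_clashes / n_atoms


def ramachandran_percentages(categories: Sequence[str]) -> Dict[str, float]:
    """Percentage of residues in each Ramachandran category."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues supplied")
    names = ("favored", "allowed", "outlier")
    return {name: 100.0 * counts.get(name, 0) / total for name in names}


# Invented inputs for one model.
per_residue = ["favored"] * 8 + ["allowed"] + ["outlier"]
distribution = ramachandran_percentages(per_residue)
score = clashes_per_1000_atoms(n_clashes=12, n_atoms=2400)
```

Normalizing by the number of atoms or residues is what allows ensembles of different sizes to be plotted on a common axis.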
The correlation between Ramachandran distribution and RMSD score is the best for any of the measures presented here. In other words, an ensemble with good Ramachandran distribution (high percentage in the favored category, low percentage in the additionally allowed category, small percentage in the outlier category) is likely to have good accuracy. It seems reasonable to find that the most accurate structures are in general those with the best backbone geometry, as was proposed many years ago41.
Geometrical measures have previously been combined together into a consensus quality indicator called Resolution-by-proxy or ResProx, which combines 25 geometrical measures, and has excellent agreement (R = 0.92) with X-ray structure resolution42. In Fig. 5c we take one PDB structure (1cfc) and generate 300 decoys (i.e., structures with good protein quality, but spanning a range of similarity to the 1cfc structure as assessed by the Global Distance Test), and show that there is a reasonable match between ResProx score and GDT. In other words, structures that are closer to the NMR structure are in general of better geometrical quality. However, we also show that the match is much better for ANSURR: in other words, ANSURR performs much better than a consensus goodness measure based simply on geometrical features. Supplementary Fig. 2 includes results for a range of other proteins, with similar results in all cases.
We have also carried out a similar comparison, but against the consensus measure PROSESS, which combines a wide range of both geometry-based and restraint-based measures, and is thus the closest available consensus test for ANSURR43. The PROSESS scores are critically dependent on NOE restraint violations, and are thus subject to the same problems as discussed in the previous section. A more detailed discussion can be found in Supplementary Information.
Comparison of NMR and X-ray crystal structures
An obvious first test for this method is to compare NMR and X-ray crystal structures. It is important to stress here that because we compare the structures to time-averaged chemical shifts obtained using solution NMR, we are explicitly testing how well the structures compare to the average state of the protein in solution. Crystal structures are almost always based on many more experimental values, and more precisely measured values, than NMR structures. One would therefore inherently expect them to be more accurate, except that crystal structures represent the structure of the protein in a crystalline environment, whereas the NMR chemical shifts measure structural rigidity in solution. We are therefore here making a somewhat unfair, but important, comparison, namely how well X-ray structures represent the structure of a protein in solution.
Here we compare X-ray structures for 68 proteins taken from the set used to train the SHIFTX2 program for predicting chemical shifts44 with corresponding NMR structures taken from the PDB (see “Methods” section for details). We validated each structure using our method and averaged the validation scores over each chain for X-ray structures, and each model for NMR ensembles. The results are shown in Fig. 6. The correlation scores for X-ray and NMR structures are very similar. In other words, the locations of rigid and flexible regions, generally representing regular secondary structure in solution, are calculated similarly well by both methods. The slightly lower correlation score for X-ray structures originates from some loops seeming to be too rigid. That is, X-ray structures are missing some peaks in flexibility that should be there according to RCI. Crystal structures are obtained from crystalline arrays, and are usually obtained at cryo-temperatures, both of which will tend to reduce the observed flexibility. There is a large body of evidence45,46,47 that crystal structures obtained at room temperature show much more local variability than do structures obtained at cryo-temperatures, and calculations on lysozyme confirm that the room temperature structures have flexibility that matches the RCI data much better than cryo-temperature structures (Supplementary Note 1 and Supplementary Figs. 3–6). By contrast, in the RMSD score comparison, on average crystal structures are significantly better. When one inspects the data for individual proteins, it is clear that NMR structures are in general much too flexible, particularly in loop regions. This is not unexpected, as NMR structures often have few restraints in loops.
Structures of the glucocorticoid-bound adhesion receptor GPR97–Go complex
No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.
HEK293 cells were obtained from the Cell Resource Center of Shanghai Institute for Biological Sciences (Chinese Academy of Sciences). Spodoptera frugiperda (Sf9) cells were purchased from Expression Systems (cat. 94-001S). Y-1 cells were originally obtained from the American Type Culture Collection (ATCC). The cells were grown in monolayer culture in RPMI 1640 with 10% FBS (Gibco) at 37 °C in a humidified atmosphere consisting of 5% CO2 and 95% air.
Constructs of GPR97 and miniGo heterotrimer
For protein production in insect cells, the human GPR97 (residues 21–549) with the autoproteolysis motif mutation (H248/A and T250/A) was sub-cloned into the pFastBac1 vector. The native signal peptide was replaced with the haemagglutinin signal peptide (HA) to enhance receptor expression, followed by a Flag tag DYKDDDK (China peptide) to facilitate complex purification. An engineered human Gαo1 with Gαo1 H domain deletion, named miniGαo1, was cloned into pFastBac1 according to published literature29. Human Gβ1 with the C-terminal hexa-histidine tag and human Gγ2 were sub-cloned into the pFastBacDual vector. scFv16 was cloned into pFastBac1 with the C-terminal hexa-histidine tag and the N-terminal GP67 signal peptide. To examine the activities of GPR97, the GPR97-FL-WT (wild-type full-length GPR97), GPR97-FL-AA (GPR97 GPS site mutation, H248/A and T250/A), GPR97β (GPR97 with the NTF removed, residues 250–549) and GPR97-β-T (GPR97β with the N-terminal tethered Stachel sequence removed, residues 265–549) were sub-cloned into the pcDNA3.1 plasmid. The GPR97 mutations E298A, R299A, F345A, F353A, H362A, L363A, Y364A, V370A, F371A, Y406A, W421A, W490A, A493G, I494A, L498A and N510A were generated using the QuikChange mutagenesis kit (Stratagene). The G protein BRET probes were constructed according to previous publications42,43. Human G protein subunits (Gαq, Gβ1 and Gγ2) were sub-cloned into the pcDNA3.1 expression vectors. The Gαq-RlucII subunit was generated by amplifying and inserting the coding sequence of RlucII into Gαq between residues L97 and K98. The Gqo probe, in which the six amino acids at the C terminus of Gαq-RlucII were substituted with those from Gαo1, was constructed by PCR amplification using synthesized oligonucleotides encoding swapped C-terminal sequences. The GFP10–Gγ2 plasmid was generated by fusing the GFP10 coding sequence in frame at the N terminus to Gγ2. All of the constructs and mutations were verified by DNA sequencing.
High titre recombinant baculoviruses were generated using the Bac-to-Bac Baculovirus Expression System. In brief, 2 μg of recombinant bacmid and 2 μl X-tremeGENE HP transfection reagent (Roche) in 100 μl Opti-MEM medium (Gibco) were mixed and incubated for 20 min at room temperature. The transfection solution was added to 2.5 ml Sf9 cells with a density of 1 × 10⁶ per ml in a 24-well plate. The infected cells were cultured in a shaker at 27 °C for 4 days. P0 virus was collected and then amplified to generate P1 virus. The viral titres were determined by flow cytometric analysis of cells stained with gp64-PE antibody (1:200 dilution; 12-6991-82, Thermo Fisher). Then, Sf9 cells were infected with viruses encoding GPR97-FL-AA, miniGαo, Gβγ, and with or without scFv16, respectively, at equal multiplicity of infection. The infected cells were cultured at 27 °C, 110 rpm for 48 h before collection. Cells were finally collected by centrifugation and the cell pellets were stored at −80 °C.
GPR97–Go complex formation and purification
Cell pellets from cells infected with viruses encoding GPR97-FL-AA, the miniGo trimer and scFv16 (scFv16 was present only in the pellets used to purify the cortisol–GPR97-FL-AA–Go–scFv16 complex) were resuspended in 20 mM HEPES, pH 7.4, 100 mM NaCl, 10% glycerol, 10 mM MgCl2 and 5 mM CaCl2 supplemented with Protease Inhibitor Cocktail (B14001, Bimake) and 100 μM TCEP (Thermo Fisher Scientific). The complex was formed for 2 h at room temperature by adding 10 μM BCM (HY-B1540, MedChemExpress) or cortisol (HY-N0583, MedChemExpress) and 25 mU/ml apyrase (Sigma), and then solubilized with 0.5% (w/v) lauryl maltose neopentylglycol (LMNG; Anatrace) and 0.1% (w/v) cholesteryl hemisuccinate TRIS salt (CHS; Anatrace) for 2 h at 4 °C. Supernatant was collected by centrifugation at 30,000 rpm for 40 min, and the solubilized complex was incubated with nickel resin for 2 h at 4 °C. The resin was collected and washed with 20 column volumes of 20 mM HEPES, pH 7.4, 100 mM NaCl, 10% glycerol, 2 mM MgCl2, 25 mM imidazole, 0.01% (w/v) LMNG, 0.01% GDN (Anatrace), 0.004% (w/v) CHS, 10 μM BCM (or cortisol) and 100 μM TCEP. The complex was eluted with 20 mM HEPES, pH 7.4, 100 mM NaCl, 10% glycerol, 2 mM MgCl2, 200 mM imidazole, 0.01% (w/v) LMNG, 0.01% GDN, 0.004% (w/v) CHS, 10 μM BCM (or cortisol) and 100 μM TCEP. The eluate from the nickel resin was applied to M1 anti-Flag resin (Sigma) for 2 h and washed with 20 mM HEPES, pH 7.4, 100 mM NaCl, 10% glycerol, 2 mM MgCl2, 5 mM CaCl2, 0.01% (w/v) LMNG, 0.01% GDN, 0.004% (w/v) CHS, 10 μM BCM (or cortisol) and 100 μM TCEP. The GPR97–Go complex was eluted in buffer containing 20 mM HEPES, pH 7.4, 100 mM NaCl, 10% glycerol, 2 mM MgCl2, 0.01% (w/v) LMNG, 0.01% GDN, 0.004% (w/v) CHS, 10 μM BCM (or cortisol), 100 μM TCEP, 5 mM EGTA and 0.2 mg/ml Flag peptide.
The complex was concentrated and then injected onto a Superdex 200 Increase 10/300 GL column equilibrated in buffer containing 20 mM HEPES, pH 7.4, 100 mM NaCl, 2 mM MgCl2, 0.00075% (w/v) LMNG, 0.00025% GDN, 0.0002% (w/v) CHS, 10 μM BCM (or cortisol) and 100 μM TCEP. The complex fractions were collected and concentrated individually for EM experiments.
Cryo-EM grid preparation and data collection
For the preparation of cryo-EM grids, 3 μl of purified BCM-bound and cortisol-bound GPR97–Go complex at approximately 20 mg/ml was applied onto a glow-discharged holey carbon grid (Quantifoil R1.2/1.3). Grids were plunge-frozen in liquid ethane cooled by liquid nitrogen using a Vitrobot Mark IV (Thermo Fisher Scientific). Cryo-EM imaging was performed on a Titan Krios at 300 kV accelerating voltage in the Center of Cryo-Electron Microscopy, Zhejiang University. Micrographs were recorded using a Gatan K2 Summit direct electron detector in counting mode with a nominal magnification of ×29,000, which corresponds to a pixel size of 1.014 Å. Movies were obtained using SerialEM at a dose rate of about 7.8 electrons per Å² per second with a defocus ranging from −0.5 to −2.5 μm. The total exposure time was 8 s and intermediate frames were recorded in 0.2-s intervals, resulting in an accumulated dose of 62 electrons per Å² and a total of 40 frames per micrograph. A total of 2,707 and 5,871 movies were collected for the BCM-bound and cortisol-bound GPR97–Go complex, respectively.
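The exposure figures quoted above follow from simple arithmetic, reproduced in the short sketch below. The function names are hypothetical; the input values are those stated in the text, and the per-frame dose is a derived quantity that the text does not report.

```python
def accumulated_dose(dose_rate: float, exposure_s: float) -> float:
    """Total dose (electrons per square angstrom) for a given exposure time."""
    return dose_rate * exposure_s


def frame_count(exposure_s: float, frame_interval_s: float) -> int:
    """Number of frames recorded over the exposure."""
    return round(exposure_s / frame_interval_s)


total_dose = accumulated_dose(7.8, 8.0)   # about 62 electrons per square angstrom
frames = frame_count(8.0, 0.2)            # 40 frames per micrograph
dose_per_frame = total_dose / frames      # derived value, not stated in the text
```

The product of dose rate and exposure time is 62.4, consistent with the quoted accumulated dose of 62 electrons per square angstrom given that the dose rate is stated as approximate.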
Cryo-EM data processing
Dose-fractionated image stacks for the BCM–GPR97–Go complex were subjected to beam-induced motion correction using MotionCor2.1 (ref. 44). Contrast transfer function (CTF) parameters for each non-dose-weighted micrograph were determined by Gctf45. Particle selection, 2D and 3D classifications of the BCM–GPR97–Go complex were performed on a binned data set with a pixel size of 2.028 Å using RELION-3.0-beta2 (ref. 46).
For the BCM–GPR97–Go complex, semi-automated particle selection yielded 2,026,926 particle projections. The projections were subjected to reference-free 2D classification to discard particles in poorly defined classes, producing 911,519 particle projections for further processing. The map of the 5-HT1BR–miniGo complex (EMDB-4358)47 low-pass filtered to 40 Å was used as a reference model for maximum-likelihood-based 3D classification, resulting in one well-defined subset with 307,700 projections. Further 3D classifications focusing the alignment on the complex produced two good subsets that accounted for 166,116 particles, which were subsequently subjected to 3D refinement, CTF refinement and Bayesian polishing. The final refinement generated a map with an indicated global resolution of 3.1 Å at a Fourier shell correlation of 0.143.
For the cortisol–GPR97–Go complex, particle selection yielded 4,323,518 particle projections for reference-free 2D classification. The well-defined classes with 2,201,933 particle projections were selected for a further two rounds of 3D classification using the map of the BCM-bound complex as reference. One good subset that accounted for 335,552 particle projections was selected for a further two rounds of 3D classification that focused the alignment on the complex, and produced one high-quality subset with 75,814 particle projections. The final particle projections were subsequently subjected to 3D refinement, CTF refinement and Bayesian polishing, which generated a map with a global resolution of 2.9 Å. Local resolution for both density maps was determined using the Bsoft package with half maps as input maps48.
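The particle counts reported for the two data sets can be summarized as the fraction of picked particles retained at each stage. The sketch below uses the counts given in the text; the function name and the stage grouping are assumptions for illustration.

```python
from typing import List, Sequence


def retention_percentages(counts: Sequence[int]) -> List[float]:
    """Percentage of the initially picked particles remaining at each stage."""
    initial = counts[0]
    return [100.0 * count / initial for count in counts]


# Stages: picked, after 2D classification, after 3D classification,
# after focused 3D classification (counts as reported in the text).
bcm_counts = [2_026_926, 911_519, 307_700, 166_116]
cortisol_counts = [4_323_518, 2_201_933, 335_552, 75_814]

bcm_retention = retention_percentages(bcm_counts)
cortisol_retention = retention_percentages(cortisol_counts)
```

On these numbers, roughly 8% of the picked particles contribute to the final BCM-bound map and roughly 2% to the final cortisol-bound map.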
Model building and refinement
For the structure of the BCM–GPR97–Go complex, the initial template of GPR97 was generated using the module ‘map to model’ in PHENIX44. The coordinates of the 5-HT1BR–Go complex (PDB ID: 6G79) were used to generate the initial models for Go (ref. 44). Models were docked into the EM density map using UCSF Chimera49, followed by iterative manual rebuilding in COOT50 according to side-chain densities. BCM and lipid coordinates and geometry restraints were generated using phenix.elbow. BCM was built into the model using the ‘LigandFit’ module in PHENIX. The placement of BCM shows a correlation coefficient of 0.81, indicating a good ligand fit to the density. The model was further subjected to real-space refinement using Rosetta51 and PHENIX44.
For the structure of the cortisol–GPR97–Go complex, the coordinates of GPR97 and Go from the BCM-bound complex and scFv16 from the human NTSR1–Gi1 complex (PDB ID: 6OS9) were used as the initial model. Models were docked into the density map and then manually rebuilt in COOT. The agonist cortisol was built into the model using the ‘LigandFit’ module as described, showing a good density fit with a correlation coefficient of 0.80. The model was further refined using Rosetta51 and PHENIX44. The final refinement statistics for both structures were validated using the module ‘comprehensive validation (cryo-EM)’ in PHENIX44. The goodness of fit of the model to the map was assessed for both structures using a global model-versus-map FSC (Extended Data Fig. 2). The refinement statistics are provided in Extended Data Table 1. Figures of the structures were generated using UCSF Chimera, UCSF ChimeraX52 and PyMOL53.
Molecular dynamics simulation of the BCM–GPR97 and cortisol–GPR97 complexes
On the basis of the favourable binding poses of BCM and cortisol with the receptor GPR97, which were calculated by the LigandFit program of PHENIX, the GPR97–agonist complexes were extracted from the two GPR97–agonist–mGo complexes for molecular dynamics simulation. The orientations of the receptors were calculated using the Orientations of Proteins in Membranes (OPM) database. Following this, the whole systems were prepared with CHARMM-GUI and embedded in a bilayer that consisted of 200 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) lipids by the replacement method. The membrane systems were then solvated in a periodic TIP3P water box supplemented with 0.15 M NaCl. The CHARMM36m force field was used to model the protein molecules, the CHARMM36 force field was used for lipids and salt, and the CHARMM General Force Field (CGenFF) was used for the agonist molecules BCM and cortisol.
Then, the system was subjected to minimization for 10,000 steps using the conjugate gradient algorithm and then heated and equilibrated at 310.13 K and 1 atm for 200 ps with 10.0 kcal mol⁻¹ Å⁻² harmonic restraints in the NAMD 2.13 software. This was followed by five cycles of equilibration for 2 ns each at 310.13 K and 1 atm, for which the harmonic restraints were 5.0, 2.5, 1.0, 0.5 and 0.1 kcal mol⁻¹ Å⁻² in sequence.
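The staged release of restraints can be written down as a simple schedule, which makes the total equilibration time easy to check. This is a minimal sketch and not the NAMD configuration used here: the `Stage` class and function names are hypothetical, and the text does not state which atoms were restrained at each stage, so that detail is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    duration_ns: float
    restraint: float  # harmonic force constant, kcal mol^-1 A^-2


def build_schedule():
    """Timed equilibration stages as described in the text.

    The 10,000-step minimization is not a timed stage and is omitted.
    """
    stages = [Stage("heating and equilibration", 0.2, 10.0)]  # 200 ps
    for i, k in enumerate([5.0, 2.5, 1.0, 0.5, 0.1], start=1):
        stages.append(Stage(f"equilibration cycle {i}", 2.0, k))
    return stages


def total_time_ns(stages):
    """Total simulated time across all stages, in nanoseconds."""
    return sum(s.duration_ns for s in stages)


if __name__ == "__main__":
    schedule = build_schedule()
    for s in schedule:
        print(f"{s.name:<28} {s.duration_ns:4.1f} ns  k = {s.restraint}")
    print(f"total: {total_time_ns(schedule):.1f} ns")
```

With the values given, the restrained equilibration adds up to 10.2 ns before production.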
Production simulations were run at 310.13 K and 1 atm in the NPT ensemble using the Langevin thermostat and Nosé–Hoover method for 200 ns. Electrostatic interactions were calculated using the particle mesh Ewald (PME) method with a cut-off of 12 Å. Throughout the final stages of equilibration and production, 5.0 kcal mol⁻¹ Å⁻² harmonic restraints were placed on the residues of GPR97 that were within 5 Å of Go in the BCM (or cortisol)–GPR97–Go complex to ensure that the receptor remained in the active state in the absence of the G protein. Trajectories were visualized and analysed using Visual Molecular Dynamics (VMD, version 1.9.3).
cAMP ELISA detection in Y-1 cells
Y-1 cells were transfected with Gpr97 siRNA (si-97, GUGCAGGGAAUGUCUUUAA) or control siRNA (si-Con) for 48 h. After starvation for 12 h in serum-free medium, the cells were further stimulated with cortisone (8 nM), forskolin (5 μM) (Sigma-Aldrich) or control vehicle for 10 min. Then, cells were washed three times with pre-cooled PBS and resuspended in pre-cooled 0.1 N HCl containing 500 μM IBMX at a 1:5 ratio (w/v). The samples were neutralized with 1 N NaOH at a 1:10 ratio (v/v) after 10 min. The supernatants were collected after centrifugation of the samples at 600g for 10 min. The supernatants were then prepared for cAMP determination using the cAMP Parameter Assay Kit (R&D Systems) according to the manufacturer’s instructions. The Gpr97 expression levels under the various conditions were further confirmed using quantitative real-time PCR.
Mouse adrenocorticotoma cell line Y-1 cells were transfected with Gpr97 siRNA (si-97) or control siRNA (si-Con) for 48 h. Then, the cells were treated with serum-free medium for 12 h. After that, cortisone (16 nM) or ACTH (0.5 μM) was added to the cells for 30 min. The supernatants of the cell culture medium were collected for measurement of corticosterone by ELISA according to the manufacturer’s instructions.
Quantitative real-time PCR
Total RNA of cells was extracted using a standard TRIzol RNA isolation method. The reverse transcription and PCR experiments were performed with the Revertra Ace qPCR RT Kit (TOYOBO FSQ-101) using 1.0 μg of each sample, according to the manufacturer’s protocols. The quantitative real-time PCR was conducted in the Light Cycler apparatus (Bio-Rad) using the FastStart Universal SYBR Green Master (Roche). The mRNA level was normalized to that of GAPDH, which served as the internal control, in the same sample and then compared with the control condition. The forward and reverse primers for GPR97 used in the experiments were CAGTTTGGGACTGAGGGACC and GCCCACACTTGGTGAAACAC. The forward and reverse primers for GAPDH were GCCTTCCGTGTTCCTACC and GCCTGCTTCACCACCTTC.
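The text states only that the mRNA level was normalized to GAPDH and then compared with the control; it does not name the formula. One common way to do this is the comparative 2^(−ΔΔCt) calculation, sketched below purely as an assumption. The function name, the Ct values in the usage line and the implied 100% amplification efficiency are all illustrative and not taken from the source.

```python
def relative_expression(ct_target, ct_ref, ct_target_ctrl, ct_ref_ctrl):
    """Relative mRNA level by the comparative Ct (2^-ddCt) approach.

    ASSUMPTION: the source does not state that this formula was used.
    ct_target / ct_ref: Ct of the gene of interest and of the reference
    gene (e.g. GAPDH) in the treated sample; *_ctrl: same for the control.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_sample = ct_target - ct_ref          # normalize to reference gene
    delta_control = ct_target_ctrl - ct_ref_ctrl
    delta_delta = delta_sample - delta_control  # compare with control sample
    return 2.0 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: target appears two cycles later after knockdown.
    print(relative_expression(26.0, 18.0, 24.0, 18.0))
```

A target that crosses threshold two cycles later than in the control, at equal reference-gene Ct, corresponds to a quarter of the control expression under this model.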
cAMP inhibition assay
To measure the inhibitory effects on forskolin-induced cAMP accumulation of different GPR97 constructs or mutants in response to different ligands or constitutive activity, the GloSensor cAMP assay (Promega) was performed according to previous publications12,13. HEK293 cells were transiently co-transfected with the GloSensor and various versions of GPR97 or vehicle (pcDNA3.1) plasmids using PEI in six-well plates. After incubation at 37 °C for 24 h, transfected cells were seeded into 96-well plates with serum-free DMEM medium (Gibco) and incubated for another 24 h at 37 °C in a 5% CO2 atmosphere. Different ligands were dissolved in DMSO (Sigma) to a stock concentration of 10 mM and then serially diluted in PBS immediately before ligand stimulation. The transfected cells were pre-incubated with 50 μl of serum-free DMEM medium containing GloSensor cAMP reagent (Promega). After incubation at 37 °C for 2 h, varying concentrations of ligands were added to each well, followed by the addition of forskolin to 1 μM. The luminescence intensity was measured on an EnVision multi-label microplate detector (Perkin Elmer).
The Gqo protein activation BRET assay
According to previous publications, the BCM dipropionate-induced GPR97 activity could be measured by chimeric Gqo protein assays25. The Gqo BRET probes were generated by replacing the six amino acids of the C terminus of Gq-RlucII with those from GoA1, creating a chimeric Gqo-RlucII subunit47. GFP10 was connected to Gγ. The Gqo protein activation BRET assay was performed as previously described54. In brief, HEK293 cells were transiently co-transfected with control D2R and various GPR97 constructs together with plasmids encoding the Gqo BRET probes, and incubated at 37 °C in a 5% CO2 atmosphere for 48 h. Cells were washed twice with PBS, collected and resuspended in buffer containing 25 mM HEPES, pH 7.4, 140 mM NaCl, 2.7 mM KCl, 1 mM CaCl2, 12 mM NaHCO3, 5.6 mM d-glucose, 0.5 mM MgCl2 and 0.37 mM NaH2PO4. Cells were dispensed into a 96-well microplate at a density of 5–8 × 10⁴ cells per well and stimulated with different concentrations of ligands. BRET2 between RLucII and GFP10 was measured after the addition of the substrate coelenterazine 400a (5 μM, Interchim) (Cayman) using a Mithras LB940 multimode reader (Berthold Technologies). The BRET2 signal was calculated as the ratio of emission of GFP10 (510 nm) to RLucII (400 nm).
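The BRET2 signal defined above is a plain ratio of acceptor to donor emission. A minimal sketch of that calculation follows; the function names and the example readings are hypothetical, and averaging over replicate wells is an assumption about how per-well values might be summarized rather than a step stated in the text.

```python
from statistics import mean


def bret2_ratio(gfp10_510nm, rlucii_400nm):
    """BRET2 signal: GFP10 emission (510 nm) divided by RLucII emission (400 nm)."""
    if rlucii_400nm <= 0:
        raise ValueError("donor (RLucII) emission must be positive")
    return gfp10_510nm / rlucii_400nm


def mean_bret2(wells):
    """Average BRET2 ratio over replicate wells given as (acceptor, donor) pairs.

    Averaging replicates is an illustrative assumption, not from the source.
    """
    return mean(bret2_ratio(acc, don) for acc, don in wells)


if __name__ == "__main__":
    # Hypothetical raw plate-reader counts for three replicate wells.
    replicate_wells = [(1200.0, 4000.0), (1260.0, 4200.0), (1140.0, 3800.0)]
    print(f"mean BRET2 ratio: {mean_bret2(replicate_wells):.3f}")
```

Because the signal is a ratio, it is insensitive to well-to-well differences in cell number or probe expression that scale both emissions equally.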
Measurement of receptor cell-surface expression by ELISA
To evaluate the expression level of wild-type GPR97 and its mutants, HEK293 cells were transiently transfected with wild-type and mutant GPR97 or vehicle (pcDNA3.1) using PEI reagent in six-well plates. After incubation at 37 °C for 18 h, transfected cells were plated into 24-well plates at a density of 10⁵ cells per well and further incubated at 37 °C in a 5% CO2 atmosphere for 18 h. Cells were then fixed in 4% (w/v) paraformaldehyde and blocked with 5% (w/v) BSA at room temperature. Each well was incubated with 200 μl of monoclonal anti-FLAG (F1804, Sigma-Aldrich) primary antibody overnight at 4 °C and then with a secondary goat anti-mouse antibody (A-21235, Thermo Fisher) conjugated to horseradish peroxidase for 1 h at room temperature. After washing, 200 μl of 3,3′,5,5′-tetramethylbenzidine (TMB) solution was added. Reactions were quenched by adding an equal volume of 0.25 M HCl solution and the optical density at 450 nm was measured using the TECAN (Infinite M200 Pro NanoQuant) luminescence counter. For determination of the constitutive activities of different GPR97 constructs or mutants, varying concentrations of desired plasmids were transiently transfected into HEK293 cells and the absorbance at 450 nm was measured.
The FlAsH-BRET assay
HEK293 cells were seeded in six-well plates after transfection with GPR97-FlAsH with Nluc inserted in a specific N-terminal site. Before the BRET assay, HEK293 cells were starved of serum for 1 h. Then cells were digested, centrifuged and resuspended in 500 μl BRET buffer (25 mM HEPES, 1 mM CaCl2, 140 mM NaCl, 2.7 mM KCl, 0.9 mM MgCl2, 0.37 mM NaH2PO4, 5.5 mM d-glucose and 12 mM NaHCO3). The FlAsH-EDT2 was added at a final concentration of 2.5 μM and incubated at 37 °C for 60 min. Subsequently, HEK293 cells were washed with BRET buffer and then distributed into black-wall clear-bottom 96-well plates, with approximately 100,000 cells per well. The cells were treated with a final concentration of BCM and cortisol at 10⁻⁵ to 10⁻¹¹ and then coelenterazine H was added at a final concentration of 5 μM, followed immediately by measurement of the luciferase (440–480 nm) and FlAsH (525–585 nm) emissions. The BRET ratio (emission enhanced yellow fluorescent protein/emission Nluc) was calculated using a Berthold Technologies Tristar 3 LB 941 spectrofluorimeter. The procedure was modified from those described previously34,55,56.
A one-way ANOVA test was performed using GraphPad Prism to evaluate the statistical significance of differences between various versions of GPR97 and their mutants in terms of expression level, potency or efficacy. For all experiments, the standard error of the mean of the values calculated from the data sets of three independent experiments is shown in the respective figure legends.
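For readers unfamiliar with the two statistics named here, the sketch below computes the standard error of the mean and the one-way ANOVA F statistic from first principles using only the Python standard library. It is not a reproduction of the GraphPad Prism analysis: the data are invented, the function names are our own, and the P value (which requires the F distribution) is intentionally not computed.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of measurements per condition.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical values from three independent experiments per construct.
    data = {
        "wild type": [1.0, 2.0, 3.0],
        "mutant A": [2.0, 3.0, 4.0],
        "mutant B": [5.0, 6.0, 7.0],
    }
    for name, values in data.items():
        print(f"{name}: mean = {mean(values):.2f}, s.e.m. = {sem(values):.3f}")
    print(one_way_anova_f(list(data.values())))
```

With three independent experiments per condition, each s.e.m. is the sample standard deviation divided by √3.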
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
A ‘Build and Retrieve’ methodology to simultaneously solve cryo-EM structures of membrane proteins
Glycoproteomics is coming of age, thanks to advances in instrumentation, experimental methodologies and computational search algorithms.
Glycosylation is one of the most common post-translational modifications, and glycoproteins play crucial roles in important biological processes like cell signaling, host–pathogen interaction, immune response and disease, including cancer and even the ongoing COVID-19 pandemic (Science 369, 330–333, 2020). Glycoproteomics aims to determine the positions and identities of the complete repertoire of glycans and glycosylated proteins in a given cell or tissue.
Glycans are everywhere. High-throughput glycoproteomics approaches offer insights. Credit: Katherine Vicari, Springer Nature
Mass spectrometry (MS)-based approaches allow large-scale global analysis; however, the structural diversity of glycans and the heterogeneous nature of glycosylation sites make comprehensive analysis particularly challenging. Glycans obstruct complete fragmentation of the protein backbone, and they were traditionally removed for simplicity at the cost of losing glycan information. The MS spectra tend to be complicated due to the presence of isomers, often requiring manual interpretation. Furthermore, database searching for spectral matches can quickly become a combinatorial problem and requires innovative bioinformatics solutions.
Recent developments in MS instrumentation, fragmentation strategies (J. Proteome Res. 19, 3286–3301, 2020) and high-throughput workflows have made analyzing intact glycoproteins a possibility. Several specific enrichment strategies have made even low-abundance glycans and glycopeptides detectable (Mol. Cell. Proteomics https://doi.org/10.1074/mcp.R120.002277, 2020). A variety of experimental workflows tailored for either N-linked glycans, which are found at consensus sites on the proteins, or O-linked glycans, which have no recognizable consensus sequence, have been developed (Nature 549, 538–542, 2017; Nat. Commun. 11, 5268, 2020; Nat. Methods 16, 902–910, 2019). New software packages based on fragment-ion indexing strategies offer substantial increases in speed for glycopeptide and site assignments (Nat. Methods 17, 1125–1132, 2020; Nat. Methods 17, 1133–1138, 2020).
With other -omics fields taking the lion’s share of attention in recent years, it is now time for glycoproteomics to shine. Comprehensive understanding of glycosylation at different levels of granularity is bound to serve both basic and translational research.
Singh, A. Glycoproteomics. Nat Methods 18, 28 (2021). https://doi.org/10.1038/s41592-020-01028-9