Structural Biology

Structural biology research focuses on the structure and dynamics of biological molecules. Proteins are the basic functional units in cells, so much of this research concerns proteins. Structural biology spans several levels, depending on the question being asked and the level of detail in the model (from electronic, to atomic, to domain, to molecular, to organelle, to cellular, to tissue, to organ, to the whole organism). We focus on atomic and domain structures.

  • Develop Computational Methods to study protein structures
Protein structures can be studied experimentally, using nuclear magnetic resonance (NMR), cryo-electron microscopy (cryo-EM), and X-ray crystallography. As of March 2014, there were 98359 structures deposited in the Protein Data Bank (PDB), of which 87040 were determined using X-rays. This ratio (~88%) has been stable for some time. One focus of our research is to investigate the properties of these high-resolution structures systematically, an approach known as structural bioinformatics.

High-resolution models are preferred when they can be determined, yet some experimental methods provide only limited structural information, from which only rough, low-resolution models can be obtained. A widely used method of this kind is solution scattering, which offers easy sample preparation, high throughput, the capability of probing dynamics, and near in vivo environments. We develop methods to extract information from such scattering data (Small Angle X-ray Scattering, or Wide Angle X-ray Scattering). Our goal is to build models from scattering data, or to refine known structures against scattering information.
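As an illustration of how a structural model relates to a solution scattering profile, here is a minimal sketch based on the classical Debye formula, I(q) = sum over i, j of f_i f_j sin(q r_ij) / (q r_ij). It is not our analysis code. The function name `debye_intensity` is hypothetical, and the sketch assumes constant (q-independent) form factors and ignores solvent and hydration-layer contributions, which any realistic SAXS/WAXS calculation must account for.

```python
import math


def debye_intensity(coords, q_values, form_factors=None):
    """Scattering profile of point scatterers via the Debye formula.

    coords       : list of (x, y, z) positions, in Angstrom
    q_values     : list of momentum transfers q, in 1/Angstrom
    form_factors : optional per-atom constant weights (assumed 1.0 if omitted)
    """
    n = len(coords)
    if form_factors is None:
        form_factors = [1.0] * n

    # Precompute each unique pair once: (f_i * f_j, distance r_ij).
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            weight = form_factors[i] * form_factors[j]
            pairs.append((weight, math.dist(coords[i], coords[j])))

    # The i == j terms contribute f_i^2, since sin(x)/x -> 1 as x -> 0.
    self_term = sum(f * f for f in form_factors)

    profile = []
    for q in q_values:
        total = self_term
        for weight, r in pairs:
            x = q * r
            sinc = 1.0 if x < 1e-12 else math.sin(x) / x
            total += 2.0 * weight * sinc  # factor 2 covers (i, j) and (j, i)
        profile.append(total)
    return profile


if __name__ == "__main__":
    # Toy model: two unit scatterers 10 Angstrom apart.
    toy = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    for q, intensity in zip([0.0, 0.1, 0.2], debye_intensity(toy, [0.0, 0.1, 0.2])):
        print(f"q = {q:.2f}  I(q) = {intensity:.4f}")
```

The double sum scales quadratically with the number of atoms, which is why practical tools usually rely on coarse-graining or spherical-harmonics expansions instead of the direct sum shown here.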

  • Design new experimental methods utilizing X-ray lasers to study structural biology
X-rays have been applied in structure determination for a century (celebrating the International Year of Crystallography). A major breakthrough in X-ray science is the commissioning of X-ray Free Electron Lasers, or XFELs (FLASH at Hamburg; LCLS at SLAC; SACLA at SPring-8). The Linac Coherent Light Source (LCLS) is the first commissioned hard X-ray FEL facility, providing ultra-bright, fully coherent X-ray pulses at up to 120 Hz. The pulse duration is on the femtosecond time scale, yet each pulse packs ~10^12 X-ray photons into a tightly focused area. The peak brilliance of the LCLS is 10 orders of magnitude higher than that of third-generation synchrotron facilities. Every pulse rapidly vaporizes the sample, but the illumination stops within femtoseconds (the pulse duration). Such intense femtosecond pulses can outrun radiation damage, enabling a revolutionary experimental approach: diffract-before-destroy. New experiments are being designed to exploit this unprecedented technology. High-resolution structure determination from tiny crystals smaller than 1 micron has emerged and is developing very rapidly; efforts have also been made to image single particles, even single molecules, using such bright X-ray lasers.

Serial Femtosecond Nano-crystallography (SFX) is one of the killer applications of X-ray Free Electron Lasers. Every XFEL pulse that intercepts a crystal can generate a diffraction pattern, and after indexing and merging a large number (often tens of thousands) of such patterns, a 3D diffraction volume can be obtained. From there, phasing algorithms developed for synchrotron crystallography can be applied. Our research focuses on data analysis procedures, from raw data to the integrated 3D diffraction volume. More specifically, we work on (1) background correction; (2) resolving the indexing ambiguity due to crystal twinning, i.e., detwinning by utilizing intensity information; (3) optimizing data merging methods; (4) new phasing algorithms for nano-crystallography.
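To make items (2) and (3) concrete, the following is a minimal sketch, not our production pipeline. It assumes a reference set of merged intensities is already available and picks, for each pattern, whichever of two candidate indexings correlates better with that reference, then averages the intensities per reflection. All function names are hypothetical, and the operator (h, k, l) -> (k, h, -l) is only an example; the true ambiguity operator depends on the crystal symmetry. Reference-free approaches exist as well and are not shown here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when undefined."""
    n = len(xs)
    if n < 2:
        return 0.0
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def swap_hk(hkl):
    """Example ambiguity operator (an assumption): (h, k, l) -> (k, h, -l)."""
    h, k, l = hkl
    return (k, h, -l)


def reindex(pattern, op):
    """Apply an indexing operator to every reflection of one pattern."""
    return {op(hkl): intensity for hkl, intensity in pattern.items()}


def correlation_to_reference(pattern, reference):
    """Correlate intensities over reflections shared with the reference."""
    common = [hkl for hkl in pattern if hkl in reference]
    return pearson([pattern[hkl] for hkl in common],
                   [reference[hkl] for hkl in common])


def resolve_ambiguity(pattern, reference, op=swap_hk):
    """Return the indexing choice that agrees better with the reference."""
    alternative = reindex(pattern, op)
    if (correlation_to_reference(alternative, reference)
            > correlation_to_reference(pattern, reference)):
        return alternative
    return pattern


def merge(patterns):
    """Average intensities per reflection (no scaling or partiality model)."""
    sums, counts = {}, {}
    for pattern in patterns:
        for hkl, intensity in pattern.items():
            sums[hkl] = sums.get(hkl, 0.0) + intensity
            counts[hkl] = counts.get(hkl, 0) + 1
    return {hkl: sums[hkl] / counts[hkl] for hkl in sums}
```

A typical use would be `merge([resolve_ambiguity(p, reference) for p in patterns])`. Real SFX data would additionally need per-pattern scaling and a treatment of partial reflections, both omitted in this sketch.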

Single Particle Imaging using X-ray lasers is the ultimate goal of FEL facilities. According to simulations, femtosecond XFEL pulses should be able to probe structural information from noncrystalline materials, such as single particles, or even single protein molecules. Since the commissioning of LCLS, huge efforts have been devoted to the development of single particle imaging using X-rays. Advanced computer algorithms must be developed to assemble the 2D scattering patterns that result from the interaction between X-rays and randomly oriented sample particles. There has been some progress from purely mathematical and computational perspectives, but none of these methods has yet been successfully applied to actual experimental data. A breakthrough is needed to realize 3D model reconstruction from femtosecond X-ray scattering data.

Biological molecules need water. This adds an extra layer of difficulty in extracting scattering signals from experimental data, because the contrast between biomolecules and water is small (0.44 vs 0.33 e-/Å^3 for protein and bulk water, respectively). Two approaches have been developed to deliver samples into the X-ray path: (1) the fixed-target approach, where the samples are loaded onto a support that can be moved so that the samples are hit by X-ray pulses in turn; (2) the injection approach, where the samples and buffer are shot or sprayed into the experimental chamber and focused onto the X-ray path to improve the chance of being hit.

Algorithm and Software

  • Develop Open Source software packages to analyze experimental data
Much scientific software is written for in-house use and developed in an ad hoc manner, and therefore lacks portability and reusability. Open source is the path the scientific community should pursue. We would like to devote ourselves to open source program development, and we hope to attract more groups to build momentum and stimulate the sharing and reuse of source code.

  • Build a computational platform to integrate structural information from various sources
To understand the many aspects of a molecule, all measurable information should be integrated to get the most out of each individual source. This is a common understanding in the structural biology community. A computational platform is highly desirable to facilitate this integration: the platform needs a core that can combine information, and interfaces to various types of input, including X-ray diffraction/scattering, spectroscopy, NMR, CD, FRET, cryo-EM, and many kinds of single-molecule experimental data. The platform should be extensible, allowing users to develop their own information types and scoring functions. The core will provide powerful methods for conformational space sampling, structure refinement, and rigorous optimization.
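One way such an extensible core might look is sketched below. This is an illustration of the plug-in idea only, not an existing package: the class `IntegrativeScorer`, the helper `distance_restraint`, and the toy one-dimensional model representation are all hypothetical. Each data source contributes a named, weighted scoring function, and the core sums them into a single objective that a sampler or optimizer could then minimize.

```python
from typing import Callable, Dict, Sequence, Tuple

Model = Sequence[float]              # toy model: 1D coordinates (assumption)
ScoreFn = Callable[[Model], float]   # lower score = better agreement with data


class IntegrativeScorer:
    """Combine user-registered scoring terms into one weighted objective."""

    def __init__(self) -> None:
        self._terms: Dict[str, Tuple[ScoreFn, float]] = {}

    def register(self, name: str, fn: ScoreFn, weight: float = 1.0) -> None:
        """Add a scoring term, e.g. one per experimental data source."""
        if name in self._terms:
            raise ValueError(f"scoring term already registered: {name}")
        self._terms[name] = (fn, weight)

    def breakdown(self, model: Model) -> Dict[str, float]:
        """Weighted contribution of each term, useful for diagnostics."""
        return {name: w * fn(model) for name, (fn, w) in self._terms.items()}

    def score(self, model: Model) -> float:
        """Total weighted score over all registered terms."""
        return sum(self.breakdown(model).values())


def distance_restraint(i: int, j: int, target: float) -> ScoreFn:
    """Hypothetical FRET-like term: squared deviation from a target distance."""
    def term(model: Model) -> float:
        return (abs(model[i] - model[j]) - target) ** 2
    return term


if __name__ == "__main__":
    scorer = IntegrativeScorer()
    scorer.register("fret_0_1", distance_restraint(0, 1, 5.0), weight=2.0)
    # Hypothetical size term: penalize deviation of the model extent from 3.0.
    scorer.register("extent", lambda m: (max(m) - min(m) - 3.0) ** 2)
    print(scorer.breakdown([0.0, 3.0]))
    print(scorer.score([0.0, 3.0]))
```

Keeping each data type behind the same callable interface means a new experiment can be added without touching the core, at the cost of leaving the choice of relative weights to the user.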