GMSO: Flexible storage of chemical topology for molecular simulation

This is the documentation for GMSO, the General Molecular Simulation Object. It is part of MoSDeF, the Molecular Simulation Design Framework.

Design Principles

Scope and Features of GMSO

GMSO is designed to enable the flexible, general representation of chemical topologies for molecular simulation. Efforts are made to enable lossless, bias-free storage of data, without assuming particular chemistries or models and without using any particular engine’s ecosystem as a starting point. The scope is generally restricted to the preparation, manipulation, and conversion of input files for molecular simulation, i.e. the steps before engines are called to execute the simulations themselves. GMSO currently does not support conversions between trajectory file formats for analysis codes. In the scope of molecular simulation, we loosely define a chemical topology as everything needed to reproducibly prepare a chemical system for simulation. This includes particle coordinates and connectivity, box information, force field data (functional forms, parameters tagged with units, partial charges, etc.), and some optional information that may not apply to all systems (e.g. the element associated with each particle).
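To make this definition of a chemical topology concrete, the sketch below bundles its ingredients (coordinates, connectivity, box, partial charges, and an optional element) into plain Python dataclasses. The Site and ChemicalTopology names are hypothetical and are not GMSO's API; the units noted in the comments are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Site:
    """An interaction site: an atom, a united-atom group, or a coarse-grained bead."""

    name: str
    position: Tuple[float, float, float]  # assumed to be in nm
    charge: float = 0.0  # partial charge, assumed to be in units of e
    element: Optional[str] = None  # optional: a bead may have no element


@dataclass
class ChemicalTopology:
    """Hypothetical container for the ingredients of a chemical topology."""

    sites: List[Site] = field(default_factory=list)
    bonds: List[Tuple[int, int]] = field(default_factory=list)  # site index pairs
    box_lengths: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # assumed nm

    def net_charge(self) -> float:
        """Sum of the partial charges of all sites."""
        return sum(site.charge for site in self.sites)


# A three-site water molecule with TIP3P-like charges and approximate geometry,
# placed in a 3 nm cubic box.
water = ChemicalTopology(
    sites=[
        Site("O", (0.0, 0.0, 0.0), -0.834, "O"),
        Site("H1", (0.09572, 0.0, 0.0), 0.417, "H"),
        Site("H2", (-0.02400, 0.09266, 0.0), 0.417, "H"),
    ],
    bonds=[(0, 1), (0, 2)],
    box_lengths=(3.0, 3.0, 3.0),
)
print(len(water.sites), len(water.bonds), round(water.net_charge(), 6))
```

Keeping element optional is what lets the same container describe an atomistic model and a coarse-grained one.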

GMSO enables the following features:

  • Supporting a variety of models in the molecular simulation/computational chemistry community: No assumptions are made about an interaction site representing an atom or bead; instead, atomistic, united-atom/coarse-grained, polarizable, and other models are all supported!
  • Greater flexibility for exotic potentials: The AtomType class (and its analogues for intramolecular interactions) uses sympy to store any potential that can be represented by a mathematical expression. If you can write it down, it can be stored!
  • Easier development of glue to new engines: Because GMSO is not designed for compatibility with any particular molecular simulation engine or ecosystem, it is more tractable for developers in the community to add glue for engines that are not currently supported (and even ones that do not exist at present)!
  • Compatibility with existing community tools: No single molecular simulation tool will be a silver bullet, so GMSO includes functions to convert between its objects and those of other libraries. These can be used in their own right to convert objects in memory, and also to support conversion to file formats not natively supported at any given time. Currently supported conversions include ParmEd, OpenMM, mBuild, and MDTraj, with others coming in the future!
  • Native support for reading and writing many common file formats (XYZ, GRO, TOP, LAMMPSDATA) and indirect support, through other libraries, for many more!
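As a rough illustration of storing a potential as an expression plus named parameters, the sketch below uses a hypothetical PotentialSketch class. GMSO itself uses sympy for this; to stay dependency-free, the sketch keeps the expression as a string and evaluates it with Python's eval in a restricted namespace, which is acceptable for trusted input but not safe for untrusted strings. The parameter values are illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict

# Math functions an expression may use; nothing else is exposed to eval.
_ALLOWED = {name: getattr(math, name) for name in ("exp", "log", "sqrt", "cos", "sin")}


@dataclass
class PotentialSketch:
    """Hypothetical expression-based potential (GMSO itself uses sympy)."""

    name: str
    expression: str  # functional form written as a Python arithmetic expression
    parameters: Dict[str, float] = field(default_factory=dict)
    independent_variable: str = "r"

    def evaluate(self, value: float) -> float:
        """Evaluate the expression at one value of the independent variable."""
        namespace = dict(_ALLOWED)
        namespace.update(self.parameters)
        namespace[self.independent_variable] = value
        # Builtins are removed; still, only evaluate expressions you trust.
        return eval(self.expression, {"__builtins__": {}}, namespace)


lj = PotentialSketch(
    name="LennardJones",
    expression="4*epsilon*((sigma/r)**12 - (sigma/r)**6)",
    parameters={"epsilon": 0.65, "sigma": 0.315},
)
buck = PotentialSketch(
    name="Buckingham",
    expression="A*exp(-B*r) - C/r**6",
    parameters={"A": 1.0, "B": 1.0, "C": 0.0},
)
print(lj.evaluate(0.315), buck.evaluate(1.0))
```

Switching from a Lennard-Jones to a Buckingham-style form needs only a new expression string and parameter dictionary, not new code, which is the point of the "if you can write it down" design.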

Structure of GMSO

There are three main modules within the Python package:

  • gmso.core stores the classes that constitute the core data structures.
  • gmso.formats stores readers and writers for (on-disk) file formats.
  • gmso.external includes functions that convert core data structures between external libraries and their internal representation.

Data Structures in GMSO

The following data structures are available within GMSO.

Core Classes

gmso.Topology
gmso.SubTopology
gmso.Atom
gmso.Bond
gmso.Angle
gmso.Dihedral
gmso.Improper
gmso.AtomType
gmso.BondType
gmso.AngleType
gmso.DihedralType
gmso.ImproperType
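Bond, Angle, and Dihedral connect two, three, and four sites, respectively. As a hedged illustration (not GMSO's implementation), the sketch below shows how angles can be derived from a bond list: every pair of bonds that share a central site defines one angle. The function name angles_from_bonds is hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Iterable, List, Tuple


def angles_from_bonds(bonds: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Return (i, j, k) triples where site j is bonded to both i and k."""
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)
    angles = []
    for center in sorted(neighbors):
        # Each unordered pair of neighbors around a center gives one angle.
        for i, k in combinations(sorted(neighbors[center]), 2):
            angles.append((i, center, k))
    return angles


# Site 1 is bonded to sites 0, 2, and 3 (indices are illustrative).
print(angles_from_bonds([(0, 1), (1, 2), (1, 3)]))
# → [(0, 1, 2), (0, 1, 3), (2, 1, 3)]
```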

Topology

SubTopology

Atom

Bond

Angle

Dihedral

Improper

Potential Classes

AtomType

BondType

AngleType

DihedralType

ImproperType

ForceField

Formats

This submodule provides readers and writers for (on-disk) file formats.

GROMACS

The following methods are available for reading and writing GROMACS files.

GSD

The following methods are available for reading and writing GSD files.

xyz

The following methods are available for reading and writing xyz files.
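The XYZ format is simple enough to sketch in a few lines: an atom count, a free-form comment line, and one line per atom with a name (usually the element) followed by x, y, z coordinates (conventionally in ångströms). The write_xyz and read_xyz functions below are hypothetical stand-ins that operate on plain tuples; they are not GMSO's own readers and writers.

```python
import io
from typing import List, TextIO, Tuple

# (name, x, y, z); the name is usually an element symbol.
XYZAtom = Tuple[str, float, float, float]


def write_xyz(atoms: List[XYZAtom], stream: TextIO, comment: str = "") -> None:
    """Write atoms in XYZ layout: count line, comment line, one line per atom."""
    stream.write(f"{len(atoms)}\n")
    stream.write(comment.replace("\n", " ") + "\n")  # comment must be one line
    for name, x, y, z in atoms:
        stream.write(f"{name} {x:.6f} {y:.6f} {z:.6f}\n")


def read_xyz(stream: TextIO) -> Tuple[str, List[XYZAtom]]:
    """Read one XYZ frame and return (comment, atoms)."""
    n_atoms = int(stream.readline())
    comment = stream.readline().rstrip("\n")
    atoms = []
    for _ in range(n_atoms):
        name, x, y, z = stream.readline().split()[:4]  # ignore any extra columns
        atoms.append((name, float(x), float(y), float(z)))
    return comment, atoms


water = [("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.24, 0.9266, 0.0)]
buffer = io.StringIO()
write_xyz(water, buffer, comment="water, coordinates in angstrom")
buffer.seek(0)
print(read_xyz(buffer))
```

Because XYZ stores only names and coordinates, a round trip through it loses connectivity, charges, and force field data; richer formats such as GRO/TOP or LAMMPS data are needed for a full chemical topology.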

LAMMPS DATA

The following methods are available for reading and writing LAMMPS data files.

External

This submodule includes functions that convert core data structures between external libraries and their internal representation.

mBuild

The following methods are available for converting mBuild objects to and from GMSO.

ParmEd

The following methods are available for converting ParmEd objects to and from GMSO.

OpenMM

The following methods are available for converting OpenMM objects to and from GMSO.

Installation

Installing dependencies with conda

Dependencies of GMSO are listed in the file requirements.txt. They can be installed in one line:

$ conda install -c omnia -c mosdef -c conda-forge --file requirements.txt

Alternatively, you can add all the required channels to your .condarc file and then install the dependencies.

$ conda config --add channels omnia
$ conda config --add channels mosdef
$ conda config --add channels conda-forge
$ conda install --file requirements.txt

Note

The conda config commands above modify your .condarc configuration file and may affect the installation of packages in other projects you are working on. However, the recommended channel priority is fairly common (in particular, conda-forge having the highest priority) and should work well for most installations.

Installing dependencies with pip

$ pip install -r requirements.txt

Note

This route is less tested than installation with conda. Some upstream dependencies may not be available on PyPI but can be installed from source or via conda.

Install an editable version from source

Once all dependencies are installed, GMSO itself can be installed. It is currently available only through its source code; it will be available through pip and conda in the future.

$ git clone https://github.com/mosdef-hub/gmso.git
$ cd gmso
$ pip install -e .

Supported Python Versions

Python 3.7 is the recommended version for users. It is the only version on which development and testing consistently take place. Older (3.6) and newer (3.8+) versions of Python 3 are likely to work, but no guarantee is made and some dependencies may not be available for those versions. No effort is made to support Python 2, which is considered obsolete as of early 2020.
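To compare the interpreter you are running against this recommendation, a small check like the one below can help. The function name check_python_version and its warning text are assumptions for illustration; GMSO does not ship this helper.

```python
import sys
import warnings

# Recommended (major, minor) version, taken from the paragraph above.
RECOMMENDED = (3, 7)


def check_python_version(version_info=sys.version_info):
    """Return True on the recommended version; warn on other Python 3 versions."""
    major, minor = version_info[0], version_info[1]
    if major < 3:
        raise RuntimeError("Python 2 is not supported by GMSO.")
    if (major, minor) != RECOMMENDED:
        warnings.warn(
            f"Python {major}.{minor} detected; GMSO is developed and tested on "
            f"Python {RECOMMENDED[0]}.{RECOMMENDED[1]}.",
            stacklevel=2,
        )
        return False
    return True
```

Calling check_python_version() with no arguments inspects the running interpreter; passing a tuple such as (3, 8, 0) is handy for testing the function itself.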

Testing your installation

GMSO uses py.test to execute its unit tests. To run them, first install some extra dependencies:

$ conda install --file requirements-test.txt

And then run the tests with the py.test executable:

$ py.test -v

Using GMSO with Docker

Because much scientific software development happens on Unix platforms, we recommend using Docker or another containerization technology to avoid quirks that depend on the system you use. This section is a how-to guide on using GMSO with Docker.

Prerequisites

A working Docker installation on your machine. See the official Docker documentation for installation instructions. If you are not familiar with Docker and want to get started, many good introductory tutorials are available online.

Quick Start

After you have a working Docker installation, use the following commands to run a Jupyter notebook server with all the dependencies for GMSO installed:

$ docker pull mosdef/gmso:latest
$ docker run -it --name gmso -p 8888:8888 mosdef/gmso:latest su anaconda -s\
  /bin/sh -l -c "jupyter-notebook --no-browser --ip=0.0.0.0 --notebook-dir\
  /home/anaconda/gmso-notebooks"

If everything works correctly, you should be able to use a Jupyter notebook server running in a Python environment with all the dependencies for GMSO installed.

Alternatively, you can start a Bourne shell to use Python from the container’s terminal:

$ docker run -it --name gmso mosdef/gmso:latest

Important

The instructions above will start a Docker container, but containers are by nature ephemeral: any filesystem changes you make (like adding a new notebook) persist only until the end of the container’s lifecycle. If the container is removed, any changes or code additions are lost.

Persisting User Volumes

If you will be using GMSO from a Docker container, we recommend mounting a user volume in the container. A user volume persists all filesystem and code additions made in a container regardless of the container lifecycle. For example, you might want to create a directory called gmso-notebooks on your local system to store all your GMSO notebooks and code. To make that directory accessible to the container (where the notebooks will be created and edited), use the following steps:

  1. Create a directory in your filesystem
$ mkdir -p /path/to/gmso-notebooks
$ cd /path/to/gmso-notebooks
  2. Define an entry-point script. Inside gmso-notebooks on your local filesystem, create a file called dir_entrypoint.sh with the following content, and make it executable (chmod +x dir_entrypoint.sh).
#!/bin/sh

chown -R anaconda:anaconda /home/anaconda/gmso-notebooks

su anaconda -s /bin/sh -l -c "jupyter-notebook --no-browser --ip=0.0.0.0 --notebook-dir /home/anaconda/gmso-notebooks"
  3. Run the Docker image for GMSO, mounting the directory from step 1
$ docker run -it --name gmso -p 8888:8888 --entrypoint /home/anaconda/gmso-notebooks/dir_entrypoint.sh -v /path/to/gmso-notebooks:/home/anaconda/gmso-notebooks mosdef/gmso:latest

Cleaning Up

You can remove the created container by using the following command:

$ docker container rm gmso

Note

Instead of latest, you can use the image mosdef/gmso:stable for the most recent stable release of GMSO and run the tutorials.

Contributing

Contributions are welcome via pull requests on GitHub. Developers and/or users will review requested changes and make comments. The rest of this section serves as a set of general guidelines for contributors.

Features

Implement functionality in a general and flexible fashion

GMSO is designed to be general and flexible, not limited to single chemistries, file formats, simulation engines, or simulation methods. Additions to core features should attempt to provide something that is applicable to a variety of use-cases and not targeted at only the focus area of your research. However, some specific features targeted toward a limited use case may be appropriate. Speak to the developers before writing your code and they will help you make design choices that allow flexibility.

Version control

We currently use the “standard” Pull Request model. Contributions should be implemented on feature branches of forks. Please try to keep the master branch of your fork up-to-date with the master branch of the main repository.

Source code

Use a consistent style

It is important to have a consistent style throughout the source code. The following criteria are desired:

  • Lines wrapped to 80 characters
  • Lines are indented with spaces
  • Lines do not end with whitespace
  • For other details, refer to PEP8

Tools such as flake8 and Black can help enforce these conventions.
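The first three criteria are mechanical enough to check with a few lines of Python. The sketch below is hypothetical and is not a substitute for flake8 or Black; it only flags lines longer than 80 characters, tab indentation, and trailing whitespace.

```python
from typing import List, Tuple

MAX_LINE_LENGTH = 80  # from the criteria above


def style_problems(
    source: str, max_length: int = MAX_LINE_LENGTH
) -> List[Tuple[int, str]]:
    """Return (line number, message) pairs for lines that break the basic rules."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            problems.append((lineno, f"longer than {max_length} characters"))
        # The indent is everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((lineno, "indented with a tab"))
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
    return problems


sample = "x = 1\n\ty = 2\nz = 3  \n" + "a" * 81
for lineno, message in style_problems(sample):
    print(f"line {lineno}: {message}")
```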

Document code with comments

All public-facing functions should have docstrings in the numpy style. A docstring includes a concise paragraph-style description of what the class or function does, relevant limitations and known issues, and descriptions of arguments. Internal functions can have simple one-liner docstrings.
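As an illustration of the numpy docstring style, the hypothetical public function below documents its purpose, a limitation, its parameters, its return value, and the exception it raises, while the internal helper gets a one-line docstring. Neither function is part of GMSO.

```python
KJ_PER_KCAL = 4.184  # thermochemical calorie


def kcal_to_kj(energy):
    """Convert an energy from kcal/mol to kJ/mol.

    The conversion uses the thermochemical calorie (1 kcal = 4.184 kJ).
    Values are treated as plain numbers; units are not checked.

    Parameters
    ----------
    energy : float
        Energy in kcal/mol.

    Returns
    -------
    float
        Energy in kJ/mol.

    Raises
    ------
    TypeError
        If ``energy`` is not a real number.
    """
    if isinstance(energy, bool) or not isinstance(energy, (int, float)):
        raise TypeError("energy must be a real number")
    return _scale(energy, KJ_PER_KCAL)


def _scale(value, factor):
    """Multiply value by factor."""
    return value * factor
```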

Tests

Write unit tests

All new functionality in GMSO should be tested with automatic unit tests that execute in a few seconds. These tests should attempt to cover all options that the user can select. All or most of the added lines of source code should be covered by unit test(s). We currently use pytest, which can be executed simply by calling pytest from the root directory of the package.
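A minimal sketch of a test module in this spirit: the hypothetical center function has one user-selectable option (weights), and there is one fast test per option. By default, pytest collects functions whose names start with test_ from files named test_*.py, so plain assertions need no pytest import.

```python
# test_geometry.py (hypothetical; the function under test is defined inline)
import math


def center(coords, weights=None):
    """Return the (optionally weighted) center of a list of xyz tuples."""
    if weights is None:
        weights = [1.0] * len(coords)
    total = sum(weights)
    return tuple(
        sum(w * c[axis] for w, c in zip(weights, coords)) / total
        for axis in range(3)
    )


def test_center_unweighted():
    # Default option: every site counts equally.
    assert center([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]) == (1.0, 0.0, 0.0)


def test_center_weighted():
    # Explicit weights: the heavier second site pulls the center toward it.
    result = center([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], weights=[1.0, 3.0])
    assert all(math.isclose(a, b) for a, b in zip(result, (1.5, 0.0, 0.0)))
```

Running pytest from the directory containing this file would collect and run both tests in well under a second.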