GMSO: Flexible storage of chemical topology for molecular simulation

This is the documentation for GMSO, the General Molecular Simulation Object. It is a part of the MoSDeF, the Molecular Simulation Design Framework.

Design Principles

Scope and Features of GMSO

GMSO is designed to enable the flexible, general representation of chemical topologies for molecular simulation. Efforts are made to enable lossless, bias-free storage of data, without assuming particular chemistries, models, or using any particular engine’s ecosystem as a starting point. The scope is generally restrained to the preparation, manipulation, and conversion of and of input files for molecular simulation, i.e. before engines are called to execute the simulations themselves. GMSO currently does not support conversions between trajectory file formats for analysis codes. In the scope of molecular simulation, we loosely define a chemical topology as everything needed to reproducibly prepare a chemical system for simulation. This includes particle coordinates and connectivity, box information, force field data (functional forms, parameters tagged with units, partial charges, etc.) and some optional information that may not apply to all systems (i.e. specification of elements with each particle).

GMSO enables the following features:

  • Supporting a variety of models in the molecular simulation/computational chemistry community: No assumptions are made about an interaction site representing an atom or bead, instead supported atomistic, united-atom/coarse-grained, polarizable, and other models!

  • Greater flexibility for exotic potentials: The AtomType (and analogue classes for intramolecular interactions) uses sympy to store any potential that can be represented by a mathematical expression. If you can write it down, it can be stored!

  • Easier development for glue to new engines: by not being designed for compatibility with any particular molecular simulation engine or ecosystem, it becomes more tractable for developers in the community to add glue for engines that are not currently supported (and even ones that do not exist at present)!

  • Compatibility with existing community tools: No single molecular simulation tool will be a silver bullet, so GMSO includes functions to convert objects. These can be used in their own right to convert between objects in-memory and also to support conversion to file formats not natively supported at any given time. Currently supported conversions include ParmEd, OpenMM, mBuild, MDTraj, with others coming in the future!

  • Native support for reading and writing many common file formats (XYZ, GRO, TOP, LAMMPSDATA) and indirect support, through other libraries, for many more!

Structure of GMSO

There are three main modules within the Python package:

  • gmso.core stores the classes that constitute the core data structures.

  • gmso.formats stores readers and writers for (on-disk) file formats.

  • gmso.external includes functions that convert core data structures between external libraries and their internal representation.

Data Structures in GMSO

Following data structures are available within GMSO.

Core Classes

Topology

SubTopology

Atom

Bond

Angle

Dihedral

Improper

Potential Classes

AtomType

BondType

AngleType

DihedralType

ImproperType

PairPotentialType

ForceField


Formats

This submodule provides readers and writers for (on-disk) file formats.

GROMACS

The following methods are available for reading and writing GROMACS files.

GSD

The following methods are available for reading and writing GSD files.

xyz

The following methods are available for reading and writing xyz files.

LAMMPS DATA

The following methods are available for reading and writing LAMMPS data.

External

This submodule includes functions that convert core data structures between external libraries and their internal representation.

mBuild

The following methods are available for converting mBuild objects to and from GMSO.

Parmed

Conversion methods for Parmed objects to and from GMSO.

OpenMM

Conversion methods for OpenMM objects to and from GMSO.

Installation

Installing with conda

Starting from GMSO version 0.3.0, you can use conda to install GMSO in your preferred environment. This will also install the dependencies of GMSO.

(your-env) $ conda install -c conda-forge gmso

Installing from source conda

Dependencies of GMSO are listed in the files environment.yml (lightweight environment specification containing minimal dependencies) and environment-dev.yml (comprehensive environment specification including optional and testing packages for developers). The gmso or gmso-dev conda environments can be created with

$ git clone https://github.com/mosdef-hub/gmso.git
$ cd gmso
# for gmso conda environment
$ conda env create -f environment.yml
$ conda activate gmso

# for gmso-dev
$ conda env create -f environment-dev.yml
$ conda activate gmso

# install a non-editable version of gmso
$ pip install .

Install an editable version from source

Once all dependencies have been installed and the conda environment has been created, the GMSO itself can be installed.

$ cd gmso
$ conda activate gmso-dev # or gmso depending on your installation
$ pip install -e .

Supported Python Versions

Python 3.8-3.11 is the recommend version for users. It is the only version on which development and testing consistently takes place. Older (3.6-3.7) and newer (3.12+) versions of Python 3 are likely to work but no guarantee is made and, in addition, some dependencies may not be available for other versions. No effort is made to support Python 2 because it is considered obsolete as of early 2020.

Testing your installation

GMSO uses py.test to execute its unit tests. To run them, first install the gmso-dev environment from above as well as gmso itself

$ conda activate gmso-dev
$ pip install -e .

And then run the tests with the py.test executable:

$ py.test -v

Install pre-commit

We use [pre-commit](https://pre-commit.com/) to automatically handle our code formatting and this package is included in the dev environment. With the gmso-dev conda environment active, pre-commit can be installed locally as a git hook by running

$ pre-commit install

And (optional) all files can be checked by running

$ pre-commit run --all-files

Building the documentation

GMSO uses sphinx to build its documentation. To build the docs locally, run the following while in the docs directory:

$ conda env create -f docs-env.yml
$ conda activate gmso-docs
$ make html

Using GMSO with Docker

As much of scientific software development happens in unix platforms, to avoid the quirks of development dependent on system you use, a recommended way is to use docker or other containerization technologies. This section is a how to guide on using GMSO with docker.

Prerequisites

A docker installation in your machine. Follow this link to get a docker installation working on your machine. If you are not familiar with docker and want to get started with docker, the Internet is full of good tutorials like the ones here and here.

Quick Start

After you have a working docker installation, please use the following command to use run a jupyter-notebook with all the dependencies for GMSO installed:

$ docker pull mosdef/gmso:latest
$ docker run -it --name gmso -p 8888:8888 mosdef/gmso:latest

If no command is provided to the container (as above), the container starts a jupyter-notebook at the (container) location /home/anaconda/data. To access the notebook, paste the notebook URL into a web browser on your computer. When you are finished, you can control-C to exit the notebook as usual. The docker container will exit upon notebook shutdown.

Alternatively, you can also start a Bourne shell to use python from the container’s terminal:

$ docker run -it --mount type=bind,source=$(pwd),target="/home/anaconda/data" mosdef/gmso:latest sh
~ $ source .profile
(gmso-dev) ~ $

Warning

Containers by nature are ephemeral, so filesystem changes (e.g., adding a new notebook) only persist until the end of the container’s lifecycle. If the container is removed, any changes or code additions will not persist. See the section below for persistent data.

Note

The -it flags connect your keyboard to the terminal running in the container. You may run the prior command without those flags, but be aware that the container will not respond to any keyboard input. In that case, you would need to use the docker ps and docker kill commands to shut down the container.

Persisting User Volumes

If you will be using GMSO from a docker container, a recommended way is to mount what are called user volumes in the container. User volumes will provide a way to persist all filesystem/code additions made to a container regardless of the container lifecycle. For example, you might want to create a directory called gmso-notebooks in your local system, which will store all your GMSO notebooks/code. In order to make that accessible to the container(where the notebooks will be created/edited), use the following steps:

$ mkdir -p /path/to/gmso-notebooks
$ cd /path/to/gmso-notebooks
$ docker run -it --name gmso --mount type=bind,source=$(pwd),target=/home/anaconda/data -p 8888:8888 mosdef/gmso:latest

You can easily mount a different directory from your local machine by changing source=$(pwd) to source=/path/to/my/favorite/directory.

Note

The --mount flag mounts a volume into the docker container. Here we use a bind mount to bind the current directory on our local filesystem to the /home/anaconda/data location in the container. The files you see in the jupyter-notebook browser window are those that exist on your local machine.

Warning

If you are using the container with jupyter notebooks you should use the /home/anaconda/data location as the mount point inside the container; this is the default notebook directory.

Running Python scripts in the container

Jupyter notebooks are a great way to explore new software and prototype code. However, when it comes time for production science, it is often better to work with python scripts. In order to execute a python script (example.py) that exists in the current working directory of your local machine, run:

$ docker run --mount type=bind,source=$(pwd),target=/home/anaconda/data mosdef/gmso:latest "python data/test.py"

Note that once again we are bind mounting the current working directory to /home/anaconda/data. The command we pass to the container is python data/test.py. Note the prefix data/ to the script; this is because we enter the container in the home folder (/home/anaconda), but our script is located under /home/anaconda/data.

Warning

Do not bind mount to target=/home/anaconda. This will cause errors.

If you don’t require a Jupyter notebook, but just want a Python interpreter, you can run:

$ docker run --mount type=bind,source=$(pwd),target=/home/anaconda/data mosdef/gmso:latest python

If you don’t need access to any local data, you can of course drop the --mount command:

$ docker run mosdef/gmso:latest python

Cleaning Up

You can remove the created container by using the following command:

$ docker container rm gmso

Note

Instead of using latest, you can use the image mosdef/gmso:stable for most recent stable release of GMSO and run the tutorials.

Contributing

Contributions are welcomed via pull requests on GitHub. Developers and/or users will review requested changes and make comments. The rest of this file will serve as a set of general guidelines for contributors.

Features

Implement functionality in a general and flexible fashion

GMSO is designed to be general and flexible, not limited to single chemistries, file formats, simulation engines, or simulation methods. Additions to core features should attempt to provide something that is applicable to a variety of use-cases and not targeted at only the focus area of your research. However, some specific features targeted toward a limited use case may be appropriate. Speak to the developers before writing your code and they will help you make design choices that allow flexibility.

Version control

We currently use the “standard” Pull Request model. Contributions should be implemented on feature branches of forks. Please try to keep the master branch of your fork up-to-date with the master branch of the main repository.

Source code

Use a consistent style

It is important to have a consistent style throughout the source code. The following criteria are desired:

  • Lines wrapped to 80 characters

  • Lines are indented with spaces

  • Lines do not end with whitespace

  • For other details, refer to PEP8

To help with the above, there are tools such as flake8 and Black.

Document code with comments

All public-facing functions should have docstrings using the numpy style. This includes concise paragraph-style description of what the class or function does, relevant limitations and known issues, and descriptions of arguments. Internal functions can have simple one-liner docstrings.

Tests

Write unit tests

All new functionality in GMSO should be tested with automatic unit tests that execute in a few seconds. These tests should attempt to cover all options that the user can select. All or most of the added lines of source code should be covered by unit test(s). We currently use pytest, which can be executed simply by calling pytest from the root directory of the package.