Linking Workflows together for automated substituent parameter generation
While the SubstituentParameterWorkchain is versatile for calculating the properties of an individual substituent, running it for a large group of substituents is non-trivial. In the author’s use case, the AIMAll software is not available on the compute clusters, so the AIMAll steps are run on a local desktop. To avoid overloading the local computer, controllers can be used to limit the number of active processes. Here, by linking several controllers together, the full SubstituentParameterWorkchain can be emulated.
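The throttling idea can be sketched in plain Python. This is a simplified model of the behaviour described above, not the aiida-submission-controller implementation; `slots_to_fill` and its arguments are hypothetical names chosen for illustration, with `max_concurrent` named after the controller argument used later on this page.

```python
def slots_to_fill(max_concurrent: int, active: int, pending: int) -> int:
    """Return how many new processes a throttled controller may submit."""
    # Free slots can never be negative, even if more jobs are active than allowed
    free_slots = max(0, max_concurrent - active)
    # Never submit more than the number of inputs still waiting
    return min(free_slots, pending)


# One local AIM job allowed at a time: nothing is submitted while one is running
print(slots_to_fill(max_concurrent=1, active=1, pending=25))    # 0
print(slots_to_fill(max_concurrent=1, active=0, pending=25))    # 1
# A generous cluster limit is bounded by the remaining inputs instead
print(slots_to_fill(max_concurrent=100, active=40, pending=25))  # 25
```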
The controllers are implementations of the FromGroupSubmissionController from aiida-submission-controller. When instantiating a controller, you provide the parent group (parent_group_label) that it scans for inputs. The controllers identify each node by the value of its “smiles” extra, which can be set as shown in the code block below. Every node in the parent group must have a unique “smiles” value. NO NODES CAN HAVE DUPLICATE SMILES OR THE CONTROLLERS WILL NOT WORK.
node.base.extras.set("smiles","some_smiles")
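Because duplicate values break the controllers, it may help to check the candidate SMILES strings before tagging any nodes. The following is a minimal sketch in plain Python; `find_duplicate_smiles` is a hypothetical helper, not part of aiida-aimall, and the example strings are arbitrary. It compares strings exactly, so two different spellings of the same molecule would not be flagged.

```python
from collections import Counter


def find_duplicate_smiles(smiles_list):
    """Return the SMILES strings that occur more than once, in first-seen order."""
    counts = Counter(smiles_list)
    # dict.fromkeys keeps the first occurrence of each string and preserves order
    return [smi for smi in dict.fromkeys(smiles_list) if counts[smi] > 1]


candidates = ["CC", "CO", "CN", "CC"]
duplicates = find_duplicate_smiles(candidates)
print(duplicates)  # ['CC']
```

If the returned list is non-empty, the duplicates should be resolved before the corresponding nodes are added to the parent group.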
With the extras set and the correct groups in place to scan from, the controllers are linked together: each one places the relevant nodes in the group scanned by the next controller in the protocol and sets their extras. The code below shows how the controllers can be linked for substituent parameter calculation. We set a low max_concurrent for the AIM controllers so as not to overload the local computer.
from aiida import orm
from aiida.orm import Dict, load_group
from aiida_aimall.controllers import (
    SmilesToGaussianController,
    AimAllSubmissionController,
    AimReorSubmissionController,
    GaussianSubmissionController,
)
from aiida.plugins import DataFactory
from aiida import load_profile
from aiida.engine.processes.control import play_processes
import datetime
import time

load_profile()

AimqbParameters = DataFactory('aimall.aimqb')
# AIMQB options shared by both AIM controllers
aim_params = {"naat": 2, "nproc": 2, "atlaprhocps": True}

# Step 1: SMILES -> Gaussian optimization on the cluster
smile_controller = SmilesToGaussianController(
    parent_group_label='smiles',
    group_label='gauss_opt',
    code_label='gaussian@cedar',
    gauss_opt_params=orm.Dict(dict={
        'link0_parameters': {
            '%chk': 'aiida.chk',
            "%mem": "2000MB",  # Currently set to use 8000 MB in .sh files
            "%nprocshared": 4,
        },
        'functional': 'wb97xd',
        'basis_set': 'aug-cc-pvtz',
        'route_parameters': {'opt': None, 'freq': None, 'Output': 'WFX'},
        "input_parameters": {"output.wfx\n": None, "output2.wfx": None},
    }),
    wfxgroup="opt_wfx",
    nprocs=4,
    mem_mb=3200,
    time_s=60*60*24*7,
    max_concurrent=100,
)

# Step 2: AIMAll on the optimized .wfx files, run locally one at a time
aimreor_controller = AimReorSubmissionController(
    parent_group_label='opt_wfx',
    group_label='opt_aim',
    max_concurrent=1,
    code_label='aimall@localhost',
    reor_group='reor_structs',
    aimparameters=aim_params,
)

# Step 3: Gaussian single point on the reoriented structures
gaussian_controller = GaussianSubmissionController(
    parent_group_label='reor_structs',
    group_label='gaussian_sp',
    max_concurrent=100,
    code_label='gaussian@cedar',
    gauss_sp_params=Dict(dict={
        'link0_parameters': {
            '%chk': 'aiida.chk',
            "%mem": "2000MB",
            "%nprocshared": 4,
        },
        'functional': 'wb97xd',
        'basis_set': 'aug-cc-pvtz',
        'charge': 0,
        'multiplicity': 1,
        'route_parameters': {'nosymmetry': None, 'Output': 'WFX'},
        "input_parameters": {"output.wfx": None},
    }),
    wfxname='output.wfx',
)

# Step 4: AIMAll on the single-point .wfx files, run locally one at a time
aimall_controller = AimAllSubmissionController(
    code_label='aimall@localhost',
    parent_group_label='reor_wfx',
    group_label='sp_aim',
    max_concurrent=1,
    aimparameters=aim_params,
    aim_parser='aimall.group',
)
/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida_submission_controller/base.py:28: AiidaDeprecationWarning: `objects` property is deprecated, use `collection` instead. (this will be removed in v3)
self._group, _ = orm.Group.objects.get_or_create(self.group_label)
/Users/chemlab/anaconda3/envs/aiida/lib/python3.12/site-packages/aiida_submission_controller/from_group.py:26: AiidaDeprecationWarning: `objects` property is deprecated, use `collection` instead. (this will be removed in v3)
self._parent_group = orm.Group.objects.get(
Note that the links between the controllers are implied in their definitions. For example, AimReorSubmissionController has an input reor_group, which is the label of the group in which the workchain stores the structures it generates. GaussianSubmissionController, in turn, has the input parent_group_label="reor_structs". So, in this case, the AimReorSubmissionController generates structures into a group, which GaussianSubmissionController then checks for its inputs.
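The linkage can be made explicit with a small plain-Python model. This is only an illustrative sketch: the group labels are copied from the controller definitions on this page, but the `scans`/`fills` bookkeeping and the `unfed_groups` helper are hypothetical, and it assumes that group_label, wfxgroup and reor_group all name groups that the controller's work populates.

```python
# Group labels copied from the controller definitions on this page
controllers = {
    "SmilesToGaussianController": {"scans": "smiles", "fills": ["gauss_opt", "opt_wfx"]},
    "AimReorSubmissionController": {"scans": "opt_wfx", "fills": ["opt_aim", "reor_structs"]},
    "GaussianSubmissionController": {"scans": "reor_structs", "fills": ["gaussian_sp"]},
    "AimAllSubmissionController": {"scans": "reor_wfx", "fills": ["sp_aim"]},
}


def unfed_groups(controllers, seeded):
    """Return scanned groups that no controller fills and that are not seeded by hand."""
    filled = {label for spec in controllers.values() for label in spec["fills"]}
    return sorted(
        spec["scans"]
        for spec in controllers.values()
        if spec["scans"] not in filled and spec["scans"] not in seeded
    )


# The 'smiles' group is populated by the user before the controllers start
print(unfed_groups(controllers, seeded={"smiles"}))  # ['reor_wfx']
```

In this model, reor_wfx is the only scanned group that no controller fills. That is consistent with the run loop that follows, which adds the .wfx outputs of the gaussian_sp calculations to reor_wfx explicitly.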
These controllers are then run in a while loop, either in a Jupyter notebook or in a script:
# Labels of all core groups
group_list = orm.QueryBuilder().append(orm.Group, filters={'type_string': 'core'}, project='label').all(flat=True)
# Query for the pks of all SinglefileData nodes that already belong to one of those groups
qb_sfd_in_group = orm.QueryBuilder()
qb_sfd_in_group.append(orm.Group, filters={'label': {'in': group_list}}, tag='all_group')
qb_sfd_in_group.append(orm.SinglefileData, with_group='all_group', tag='sp_calc', project='pk')
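def prune_group(group_label):
    """Remove earlier duplicates from a group so that each 'smiles' extra is unique."""
    group = load_group(group_label)
    smiles = {}
    # Iterate over a copy, since nodes are removed from the group inside the loop
    for node in list(group.nodes):
        smi = node.base.extras.get('smiles')
        if smi in smiles:
            # Keep the node with the larger pk (the later one) and drop the other
            older, newer = sorted((smiles[smi], node), key=lambda n: n.pk)
            group.remove_nodes(older)
            smiles[smi] = newer
        else:
            smiles[smi] = node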
total_jobs = smile_controller.num_to_run + smile_controller.num_already_run
while aimall_controller.num_already_run < total_jobs:
    play_processes(all_entries=True)
    # prune_group('smiles')
    smile_controller.submit_new_batch()
    # Find all SinglefileData nodes that are already in groups
    sfd_in_group = qb_sfd_in_group.all(flat=True)
    reor_wfx_group = orm.load_group('reor_wfx')
    # This query finds the GaussianCalculations in the gaussian_sp group and their output .wfx files,
    # and adds those .wfx files to a group for the AIM controller
    if sfd_in_group:
        qb_orphan = orm.QueryBuilder()
        qb_orphan.append(orm.Group, filters={'label': 'gaussian_sp'}, tag='sp_group')
        qb_orphan.append(orm.CalcJobNode, tag='sp_calc', with_group='sp_group')
        qb_orphan.append(orm.SinglefileData, filters={'pk': {'!in': sfd_in_group}}, with_incoming='sp_calc')
        orphan_single_files = qb_orphan.all(flat=True)
        for o_file in orphan_single_files:
            reor_wfx_group.add_nodes(o_file)
    # The inner loop runs 10 times, checking every 3 minutes for finished AIM calculations to submit.
    # After the set of 10 loops, the outer loop runs again and checks the Gaussian optimization calculations.
    # Because Gaussian jobs run longer, we elected to check them every 30 minutes.
    # You can adjust the loops to suit your needs.
    for _ in range(10):
        prune_group('opt_wfx')
        prune_group('reor_structs')
        prune_group('reor_wfx')
        try:
            gaussian_controller.submit_new_batch()
        except Exception:
            print(f"Skipping Gaussian this loop at {datetime.datetime.now()} ")
        try:
            aimall_controller.submit_new_batch()
        except Exception:
            print(f"Skipping AIMAll this loop at {datetime.datetime.now()} ")
        try:
            aimreor_controller.submit_new_batch()
        except Exception:
            print(f"Skipping AIMReor this loop at {datetime.datetime.now()} ")
        time.sleep(180)
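The intent of prune_group, keeping only the most recent node for each SMILES, can be checked without an AiiDA profile by modelling nodes as (pk, smiles) pairs. The sketch below is a stand-in for illustration only: `keep_newest` is a hypothetical function, and it assumes that a larger pk means a more recently created node.

```python
def keep_newest(entries):
    """Given (pk, smiles) pairs, return the pks to keep and the pks to remove."""
    newest = {}   # smiles -> pk of the most recent node seen so far
    removed = []  # pks that would be removed from the group
    for pk, smi in entries:
        if smi in newest:
            # Of the two candidates, drop the smaller pk and remember the larger
            older, newer = sorted((newest[smi], pk))
            removed.append(older)
            newest[smi] = newer
        else:
            newest[smi] = pk
    return sorted(newest.values()), sorted(removed)


kept, removed = keep_newest([(10, "CC"), (11, "CO"), (15, "CC"), (20, "CC")])
print(kept)     # [11, 20]
print(removed)  # [10, 15]
```

Updating the remembered pk after each removal matters when a SMILES appears three or more times; otherwise the third occurrence would be compared against a node that has already been removed.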