INTRODUCTION:

Welcome. This repository contains the data and code used in our manuscript "Emitter Model Inference from Electrospray Array Thruster Tests". It should contain everything you need to regenerate our results, including the figures that appear in the paper. Indeed, the included materials render more trades and data than could be included in the paper, and we hope you find these additions edifying. We have curated this repository in the interest of making our research as transparent and reproducible as possible, and I hope that in doing so we motivate others in the community to do the same. Understanding sources of uncertainty in electrospray array thrusters---particularly emitter-to-emitter variability---is of fundamental concern in achieving robust and performant systems, and the study undertaken here is a key step in that direction.

Sincerely,
Collin B. Whittaker
cbwhitt@umich.edu

Last updated: 24 Feb 2025

RESEARCH OVERVIEW:

The object of our study was to train a reduced-fidelity model for individual emitter behavior within a porous conical-type electrospray array thruster on data taken over the entire array, which is the sum over all the emitters. By leveraging surface profilometry to measure the variance in geometry across the array, we then gain insight into the individual emitter dynamics. By rigorously predicating uncertainty in the predictions made by the model on uncertainty over its inputs, we can then understand the major sources of uncertainty in the system.

The raw experimental data which inform this inference and prediction study were acquired in April of 2024 at the Jet Propulsion Laboratory's MicroPropulsion Laboratory, with special thanks to Colleen Marrese-Reading and Steven Arestie. These and other results are reported in a separate manuscript: C. B. Whittaker, B. A. Jorns, S. M. Arestie, and C. M. Marrese-Reading, in 38th International Electric Propulsion Conference (Electric Rocket Propulsion Society, 2024) p. 730. The thruster used in these experiments was fabricated at the University of Michigan in March of 2024. The analysis underlying this work was conducted from September of 2024 to January of 2025.

This work was supported by a NASA Space Technology Graduate Research Opportunity (80NSSC21K1247). This research was also supported in part through computational resources and services provided by Advanced Research Computing, a division of Information and Technology Services at the University of Michigan, Ann Arbor. Finally, this work was performed in part at the University of Michigan Lurie Nanofabrication Facility.

METHODS:

The analysis has three major components: 1. the model; 2. the data; and 3. the inference. To the first, we leverage a reduced-fidelity single-emitter model for porous conical-type electrospray emitters that we have published previously: C. B. Whittaker and B. A. Jorns, Journal of Applied Physics 134, 083301 (2023). This model abstracts the formation of multiple emission sites on a porous source as integrals over a distribution of menisci, where the behavior of each meniscus is predicted using scaling laws dictating onset into a Taylor cone and the subsequent emission of ions from that cone (a schematic of this abstraction appears below). To the second, we leverage the experimental data we reported at the 38th IEPC: C. B. Whittaker, B. A. Jorns, S. M. Arestie, and C. M. Marrese-Reading, in 38th International Electric Propulsion Conference (Electric Rocket Propulsion Society, 2024) p. 730.
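To make that abstraction concrete, here is a minimal schematic of what "integrals over a distribution of menisci" means numerically; all names here are hypothetical stand-ins, and the real model is 'Code/EmissionModel/current_model.py' as detailed in the JAP paper:

    import numpy as np

    def emitter_current(r, p_r, I_site, n_sites):
        # schematic quadrature only: average the per-site current I_site(r)
        # over the meniscus size distribution p(r) on a uniform grid r,
        # then scale by the number of emission sites
        return n_sites * float(np.sum(I_site(r) * p_r) * (r[1] - r[0]))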
To the third: since we leverage these tools from prior work, the bulk of the analysis in the paper is devoted to specifying an inference over the parameters of the model given the observation of the data. We adopt a Bayesian parameter estimation formulation wherein we treat the parameters of the model prediction probabilistically, such that our state of knowledge in the parameters (i.e., what values best fit the data) is described as a probability distribution. To do so, we fully parameterize the model and describe our prior state of knowledge in each parameter (e.g., what our input uncertainty in the propellant temperature is, what the charge-to-mass ratio of the ion beam is, and so on). To specify a prior over the emitter geometry, we make measurements of 143 of the emitters in the array and infer the corresponding distribution over emitter geometry (i.e., manufacturing tolerances) hierarchically to feed into our main inference over the model parameters. We then prescribe a noise model for how the data were generated (i.e., distributed about the model prediction with some error), such that the highest-probability parameter values will be those that most plausibly explain the data (i.e., a likelihood). The product of our prior state of knowledge and the knowledge conferred by the data is our posterior probability in the parameters. We sample from this distribution using an MCMC routine, completing the inference. This calibrated model is then used to make probabilistic predictions for the performance of the device. Inspecting these predictions allows us to interpret what is going on in the array at the level of individual emitters.
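For orientation, a minimal sketch of the kind of random-walk Metropolis step underlying such a routine appears below. This is a generic illustration with a toy Gaussian posterior, not the repository's actual sampler (which lives in 'Code/Sampling/MHDRAM.py'):

    import numpy as np

    def metropolis(log_post, theta0, n_steps, step):
        # random-walk Metropolis: propose a jump, accept with probability
        # min(1, posterior ratio); working in log space avoids underflow
        rng = np.random.default_rng()
        theta = np.array(theta0, dtype=float)
        lp = log_post(theta)
        chain = np.empty((n_steps, theta.size))
        for i in range(n_steps):
            prop = theta + step * rng.standard_normal(theta.size)
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                theta, lp = prop, lp_prop
            chain[i] = theta
        return chain

    # log posterior = log prior + log likelihood (up to a constant);
    # here a toy standard Gaussian stands in for both
    log_post = lambda theta: -0.5 * np.sum(theta**2)
    samples = metropolis(log_post, theta0=np.zeros(2), n_steps=5000, step=0.5)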
ANALYSIS FLOW:

Below is an overview of the analysis procedure, including the order in which the various scripts should be run to reproduce the results. I have included only the data necessary to start this chain of analysis, and not the intermediate results (they constitute too much data to host on the platform). If you would like to examine only one stage of the data analysis or some such, please reach out and I can arrange to transfer the relevant intermediate product. As a note, much of the flow contains information for another chip I manufactured, inspected, and tested. Unfortunately, that thruster failed due to a liquid short, and so I was not able to acquire validation data with it. But these elements in the analysis stand testament to what might have been.

-----
Data/geoms.data
- this is the starting place for analysis
- this file contains information about the chips (number of emitters, sequence in the emitter machining operation, position on the chip, etc.), which is exported from a separate code base that I use to procedurally generate cutting operations for different chips
- I have declined to include this broader code base; if you wish to have access to it, reach out and we will discuss
- it also includes some information about how the chip area was divided into different regions for imaging and sampling on the profilometer (again, found in my manufacturing code base)
VVVVV
Code/geom_extract.py
- called as is
- this code takes the raw data files containing the best-fit geometry to each site (the *.datf files) and consolidates that information into 'Data/geoms.data'
- that is, we had information about the location of each emitter from the design of the site, but it was necessary to take measurements on the profilometer to determine the actual emitter geometries for some of the sites in the array
VVVVV
Code/geom_learning.py
- called as is
- this code encodes the division of the chips into different populations by tool (see the manuscript) and learns (by an MCMC routine over the nested inference problem) the distribution of emitter geometry for each population
- the resulting data are further rolled into 'geoms.data'
VVVVV
Code/geom_sampling.py
- called as is
- samples from the learned distributions over emitter geometry to populate 'geoms.data' with realizations of possible array geometries based on our uncertainty over the emitter population
- also samples from the assumed distribution over error in extractor offset measurements
VVVVV
Code/extractorGeomCompute.m
- called as is
- takes the samples over extractor geometry error and translates them into the equivalent extractor offset of each site in the array, assuming an interpolation between sites that were measured
VVVVV
Code/fieldCompute.m
- called over a SLURM cluster as a job array (see also 'Code/HPC_fieldCompute.sh')
- for each realization of emitter-extractor geometry that we compute, this script runs an electrostatic simulation over a FEM in MATLAB
- requires 'Code/emitterSE.m' to be on the MATLAB path in the installation (i.e., in the local directory from which you call 'fieldCompute.m' on the cluster)
- generates the field profile for each geometry and compresses it into a Bezier parameterization (see the sketch after this list)
- each job in the job array will produce a separate output 'fields_X.data' file, where X is specified when calling the script
VVVVV
Code/field_consol.py
- called with a target filename/directory as an argument
- consolidates the 'fields_X.data' files resulting from each job at that location into a single 'fields.data' containing all the data and metadata
VVVVV
Code/onset_param_learning.py
- called as is
- learns a prior distribution over onset model parameters (nested inference problem) from the simulation data of Coffman et al.
- generates a file 'onset_params.data' containing samples thereof
VVVVV
Code/surfssubspropsbeams_sampling.py
- called as is
- samples from the prior over the surface parameters, fluidic properties, and other marginalized quantities and outputs them into respective HDF5 files
VVVVV
Code/param_learning_em202.py
- called on an HPC cluster via 'HPC_param_learning_em202.sh'
- requires all other files (geoms, fields, surfs, etc.) and actually samples from the posterior trained on data from em202
- also requires that data, 'Data/em202_exp.dat'
VVVVV
Code/predicts_em202.py
- called on an HPC cluster via 'HPC_predicts_em202.sh'
- takes the learned parameters and repredicts performance for these values over a more finely discretized voltage domain
VVVVV
Code/Plotting/
- scripts in this directory plot the results
- rejoice, analysis complete!
-----
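Regarding the Bezier compression step in 'fieldCompute.m' above: purely as an illustration, a one-dimensional Bernstein-basis least-squares fit looks like the following (the degree and all names are hypothetical; the actual parameterization is described in the manuscript):

    import numpy as np
    from scipy.special import comb

    def bernstein_basis(t, n):
        # columns are the degree-n Bernstein polynomials evaluated at t in [0, 1]
        return np.stack([comb(n, k) * t**k * (1 - t)**(n - k)
                         for k in range(n + 1)], axis=1)

    def fit_bezier(s, E, degree=8):
        # least-squares control points compressing a sampled field profile E(s)
        t = (s - s.min()) / (s.max() - s.min())
        B = bernstein_basis(t, degree)
        ctrl, *_ = np.linalg.lstsq(B, E, rcond=None)
        return ctrl  # degree + 1 numbers stand in for the full profile

    # e.g., compress 200 samples of a toy profile into 9 control points
    s = np.linspace(0.0, 1.0, 200)
    E = 1.0 / (1.0 + 5.0 * s**2)
    ctrl = fit_bezier(s, E, degree=8)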
FILE INVENTORY:

Below appears a description of all the files included here.

--------------------
Python-spec-file.txt

A spec file for conda that can be used to pull the necessary packages into a new conda environment by running:

    conda create --name myenv --file Python-spec-file.txt

This includes all dependencies. The most important consideration is that, since the code relies on MPI to parallelize jobs, you need a parallel build of HDF5. For MATLAB codes, using whatever version is called for in the respective batch script should suffice.
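If you want to verify that the environment you built indeed has a parallel HDF5 stack, a quick check with h5py (assuming h5py and mpi4py are the packages in play) is:

    import h5py
    print(h5py.get_config().mpi)   # True for a parallel (MPI-enabled) build
    # a parallel build also enables the 'mpio' file driver, e.g.:
    # from mpi4py import MPI
    # f = h5py.File('out.h5', 'w', driver='mpio', comm=MPI.COMM_WORLD)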
--------------------
README.txt

This readme file.
--------------------
Data/geoms.data

HDF5 file serving as a starting point for analysis over the emitter geometries of the chip. Additional datasets are added to this file by various other scripts.
--------------------
Data/EMIM_IM_*.csv

These files are simple tables of temperature-dependent propellant properties for EMIM-IM.
--------------------
Data/em202_exp.dat

Experimental current vs. voltage data from the MEAT-1.2-202.
--------------------
Data/em202 Sample/

This directory contains emitter geometry data measured from chip em202.
--------------------
Data/em202 Sample/em202_ScanPlan.xlsx

A spreadsheet collecting information about the division of the emitter chip area into different regions for sampling (regions are numbered, bounds given, etc.). Purely informational, not pulled from in any code.
--------------------
Data/em202 Sample/em202_ScanPlan.dat

The same information as in "em202_ScanPlan.xlsx", but stored in a tab-delimited text file that is referenced by one plotting script ("Code/Plotting/geomScatterCornerPlots.m") to render the division of the chip into different regions for imaging on the profilometer.
--------------------
Data/em202 Sample/*.datf

These files contain the results of fitting a spherically capped cone to each emitter we measured. We exclude the raw measurement results to reduce the overall size of the dataset, but can supply them on request.
--------------------
Data/em203 Sample/

This directory contains files similar to those for the em202 case, but for chip em203.
--------------------
Code/

This directory contains all of the analysis code used in the study.
--------------------
Code/EmissionModel/current_model.py

This module implements the multi-site current model at the individual emitter level.
--------------------
Code/EmissionModel/__init__.py

This is just so that Python recognizes the directory 'Code/EmissionModel' as a package. I have a broader code base with some other utilities/models in here for other projects (e.g., fitting geometry), so I have pulled in only the relevant code.
--------------------
Code/Sampling/MHDRAM.py

A module with a slew of helpful functions for performing inference, generating samples, etc. Used frequently throughout the study.
--------------------
Code/Sampling/__init__.py

Same story as for the EmissionModel package.
--------------------
Code/geom_extract.py

A script which consolidates all of the fits to each emitter (tip radius, height, etc.) into the 'geoms.data' file.
--------------------
Code/geom_learning.py

A script implementing a Bayesian inference over the population parameters (means, standard deviations, correlations) for each emitter population on the various chips. It draws samples from the posterior distribution and adds them to 'geoms.data'.
--------------------
Code/geom_sampling.py

This script generates realizations of possible array geometries derived from the samples over the distributions of population parameters (means, standard deviations, correlations). That is, it is hierarchical (we have distributions representing our state of knowledge over the parameters of distributions representing variances/tolerances in emitter machining).
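For intuition, the two-stage draw this hierarchy implies can be sketched as follows, assuming (purely for illustration) Gaussian populations; the array names and the emitter count are hypothetical:

    import numpy as np

    rng = np.random.default_rng(0)
    # hypothetical posterior samples of the population parameters; in the
    # real analysis these come from the MCMC in geom_learning.py
    mu_samples = rng.normal(size=(1000, 3))              # e.g., geometry means
    cov_samples = np.tile(np.eye(3) * 0.01, (1000, 1, 1))
    n_emitters = 100                                     # hypothetical count

    # stage 1: pick one posterior sample of the population parameters;
    # stage 2: draw a whole array's worth of emitter geometries from it
    i = rng.integers(len(mu_samples))
    geoms = rng.multivariate_normal(mu_samples[i], cov_samples[i], size=n_emitters)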
--------------------
Code/extractorGeomCompute.m

A MATLAB script that takes the samples of error in the geometry measurement and translates them into realizations of extractor offset by interpolation. The only reason we transfer to MATLAB and back is that it was easier than programming my own extrapolation over the scattered interpolation data.
--------------------
Code/fieldCompute.m

A function implemented in MATLAB that uses an FEM electrostatic solve to simulate the field strength on the emitter. It compresses each solution by a Bezier parameterization (see the manuscript) and stores it in a data structure. It's designed to be run over a SLURM job array on an HPC cluster, such that each job in the array can be run in parallel (with results written to a different output file to avoid parallel writes).
--------------------
Code/emitterSE.m

This code functionalizes the electrostatic solve for the emitter geometry and is required to be on the MATLAB path when running 'fieldCompute.m'.
--------------------
Code/HPC_fieldCompute.sh

A SLURM batch script which wraps 'fieldCompute.m' on our SLURM cluster. Simply given as an example of how to call 'fieldCompute.m'. Your HPC solution may vary (at the very least, you won't be charging the "bjorns0" account).
--------------------
Code/field_consol.py

A command-line compatible (has a parser) module that consolidates all the compressed field profiles put out by the various 'fieldCompute.m' jobs into a single data file, which is referenced by later scripts. Depending on your naming convention when running the scripts, the arguments may have to change to accommodate a different job number, base file string, etc.
--------------------
Code/onset_param_learning.py

A script which learns a hierarchical prior distribution over the onset model parameters from simulation data (see manuscript). Populates a corresponding HDF5 file, 'onset_params.data'.
--------------------
Code/surfssubspropsbeams_sampling.py

A script which samples over our prescription for the prior over other parameters (i.e., those not hierarchically learned from other data).
--------------------
Code/param_learning_em202.py

This is the big one. This routine reads in a bunch of data from the other input files and samples over the posterior. That is, it computes the model prediction as a function of a set of input parameters we're learning (current scaling parameters, surface wetting stuff). Because it's fairly computationally intensive but embarrassingly parallel, I've implemented it over MPI so that I could stretch out on a lot of CPUs on U of M's Great Lakes cluster. It leverages the mpi4py.futures module in Python to farm out individual emitter model evaluations over the worker pool, and each worker is initialized with a reference to the data files it needs so that it can read them when necessary while avoiding memory issues. It's less efficient than it could be, but fast enough for my needs. Exports the samples to its own file, 'params.data'.
--------------------
Code/HPC_param_learning_em202.sh

Batch script to call 'param_learning_em202.py' over an MPI pool. Note that due to some wonkiness with the MPICH build on our cluster, I had to call the script under control of mpi4py.futures (to avoid having to dynamically spawn MPI processes).
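For reference, the farm-out pattern looks roughly like the sketch below (the function and input names are hypothetical). Running a script "under control of mpi4py.futures" means launching it as, e.g., 'mpiexec -n 8 python -m mpi4py.futures script.py', so that rank 0 runs the script while the remaining ranks serve the pool, with no dynamic spawning:

    from mpi4py.futures import MPIPoolExecutor

    def eval_emitter(args):
        # hypothetical stand-in for one emitter-model evaluation
        geometry, params = args
        return sum(geometry) * params   # placeholder computation

    if __name__ == '__main__':
        inputs = [((1.0, 2.0), 0.5)] * 8   # hypothetical per-emitter inputs
        with MPIPoolExecutor() as pool:
            currents = list(pool.map(eval_emitter, inputs))
        print(currents)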
--------------------
Code/predicts_em202.py

A code which generates all the predictions used in analysis/plotting. A lot of these were already computed online, but not stored, during the parameter inference. Practically, this is because I wanted to regenerate predictions at a finer voltage resolution than just where there is data for comparison. Also, we needed to predict new things, like all the different subset-varying predictions that go into the sensitivity study. There is also the unfortunate problem that, due to memory limits on the CPUs in the cluster (5 GB), it's not possible to hold the predictions for every emitter, for every realization of the marginalized parameters, for every sample from the posterior, so we have to completely redo the predictions to get both array-level and emitter-level statistical properties.
--------------------
Code/Plotting/

The scripts in this directory are those that generate the plots used in the paper.
--------------------
Code/Plotting/geomScatterCornerPlots.m

This script generates a series of plots (only a few of which appear in the paper) that show distributions over various geometric parameters as measured. It also renders one sampled realization of the full array geometry subject to uncertainty in the emitter population, generates a sequence of plots illustrating the sampling and division of the domain, and so on.
--------------------
Code/Plotting/cornerPlot.m

A helper function to render corner plots over the geometry. A bit outdated now that MATLAB has introduced tiled layouts, which have such excellent properties. I'll get around to updating it someday. Also, they now straight up implement scatter matrices, but I wanted to do some fancy things with the color coding.
--------------------
Code/Plotting/exampleFieldPlots.m

This script renders the example field plots for a single emitter that appear in the paper. We choose, specifically, the maximum likelihood realization for the geometry (most plausibly representative of the data) and the median-tip-radius emitter over this population (a sort of average emitter).
--------------------
Code/Plotting/onsetIntervalPlot.m

Generates the credible intervals over our hierarchical inference over the onset criterion parameters.
--------------------
Code/Plotting/intervalPlots.m

Generates interval plots for the posterior predictive over array current that we construct. There are many more plots here than are shown in the paper (we downselected pretty heavily). Crucially, there are some which render credible intervals for different selections of terms matching things in the sensitivity study, which may be of interest (e.g., varying only the model parameters but keeping the marginalized parameters constant at the maximum likelihood realization). It also shows variance in predictions on an emitter-by-emitter basis. One can see that this is largely homogeneous for the emitters we did not measure (illustrating the exchangeability we discussed within the modeling), but varies for the ones we did measure.
--------------------
Code/Plotting/emitterVariancePlots.m

This script renders the plots that illustrate emitter-to-emitter variability. Notably, it also renders the scatter plots as a function of voltage into an animation, so that one can inspect the voltage-dependent distribution of emission current predicted by the model.
--------------------
Code/Plotting/sensitivityPlots.m

Renders the plot for the sensitivity study. Also does the same on an individual-emitter basis (i.e., which sources of error are dominant when trying to predict the behavior of an individual emitter, for which geometric uncertainty dominates). The data are noisy in the low-voltage regime.
--------------------
Code/Plotting/robustOptPlot.m

Generates the robust optimization figures that appear in the final discussion subsection. The biggest thing worth noting here is that I did not actually save the maximum emitter current over the array during computations, so I had to back that information out using order statistics (see the sketch below). It's a toy problem nevertheless, and subsumed by later, more comprehensive work of ours.
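For the curious, the order-statistics trick is, in spirit, the following: if the per-emitter currents are (approximately) independent and identically distributed with CDF F, the maximum over N emitters has CDF F^N, so quantiles of the maximum follow directly from per-emitter samples. A minimal sketch with toy numbers (the independence assumption is only approximate here):

    import numpy as np

    def max_quantile(draws, n_emitters, q):
        # if a per-emitter current has CDF F, the array maximum over n iid
        # emitters has CDF F**n, so x_q = F^{-1}(q**(1/n))
        return np.quantile(draws, q ** (1.0 / n_emitters))

    draws = np.random.default_rng(0).lognormal(size=10000)  # toy samples
    print(max_quantile(draws, n_emitters=100, q=0.5))       # median of array max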