2. The pymetamodels description

Metamodeling, sensitivity, optimization, calibration and robustness analysis have become important tools for the virtual development of industrial products. These are based on the construction of soft ML metamodels from discrete populations of input and output data of hard physics-based models (FE, CFD, molecular, continuum, mesoscopic, …), calibrated with experimental characterisations. The design input variables are defined by their lower and upper bounds or by schemas of several possible discrete values. In particular, in parametric optimization the variables are systematically modified by mathematical algorithms to improve an existing design or to find a global optimum [GYTA13]. In sensitivity analysis, discrete populations of model input and output data are generated using the original model or the metamodel to identify the sensitivity indexes [SAC+08]. These indexes describe how the uncertainty in the output of a model can be apportioned, qualitatively or quantitatively, to different sources of variation in its inputs.
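As a concrete instance of such an index, the variance-based Sobol indexes (one of the methods listed in Section 2.2) can be written as follows, where \(Y\) is a model output, \(X_i\) an input and \(X_{\sim i}\) all inputs except \(X_i\). The first-order index \(S_i\) is the share of the output variance explained by \(X_i\) alone; the total-order index \(S_{T_i}\) also includes every interaction of \(X_i\) with the other inputs:

\begin{equation} S_i = \frac{V_{X_i}\left(E_{X_{\sim i}}(Y \mid X_i)\right)}{V(Y)}, \qquad S_{T_i} = \frac{E_{X_{\sim i}}\left(V_{X_i}(Y \mid X_{\sim i})\right)}{V(Y)} \end{equation}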

Soft metamodels can handle a large number of design variables and provide real-time responses. They can be applied to optimization problems, where the number of design variables is often vast, reducing the time required to run accurate analyses and optimizations. Flexible tools also allow these problems to be implemented on high-demand servers.

Pymetamodels combines machine learning (ML) metamodeling, optimization and analysis tools for the virtual development of products within a common abstract framework, implemented in an accessible and distributable Python package released under a permissive license. Through the configuration spreadsheet or programmatically, pymetamodels allows users to define multivariate problems and to implement multidisciplinary problems in a pythonic fashion.

The development of the pymetamodels package is oriented towards supporting ML applications in material science and materials informatics, and towards building soft metamodels of materials, components and systems informed by hard physics-based modelling (continuum, mesoscopic, … ) [EuropeanCfSCEN18] and experimental characterisations.

Fig. 2.1 gives an overview flowchart of the capabilities of the pymetamodels Python package, which cover the following main areas:

\(Configuration_{case} = Definition_{\ inputs \ and \ responses}\). Configuration of the pymetamodel object and of the analysis per case in a usable spreadsheet (see Section 2.1). The configuration spreadsheet can also be defined programmatically using the objconf class (see Section 3.3).

\(DOEX = Sampling(Configuration_{case})\). DOE sampling schemes (see Section 2.2).

\(DOEY = model_{iteration}(DOEX)\). Interaction with Python models and/or direct import of DOEY data structures.

\(DOEY = metamodel(DOEX)\). Optimal forecasting metamodels construction (see Section 2.3).

\(Sensitivity_{indexes}=Sensitivity_{analysis}(DOEY)\). Sensitivity analysis capabilities (see Section 2.2).

\(Optimization_{min \ local \ optima}=Optimization_{min}(DOEY, Constrains)\). Solution of optimization problems, including calibrations (see Section 2.4).

\(Confidence_{intervals}=Robustness_{analysis}(DOEY,COVs)\). Robustness analysis capabilities.

\(Design_{point}=Calibration_{analysis}(DOEY, Constrains, Reference)\). Calibration analysis capabilities.

Fig. 2.1 Overview flowchart of pymetamodels
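The data flow of the first steps of the flowchart can be sketched in a few lines of plain Python. This is a hypothetical toy, not the pymetamodels API: the `configuration` dict, `sampling()` and `model()` are illustrative stand-ins, sampling is plain uniform random sampling rather than one of the schemes of Section 2.2, and the "optimization" simply keeps the best sampled design.

```python
import random

# Hypothetical configuration for one case (loosely mirrors Tables 2.1.1 and 2.1.2).
configuration = {"samples": 16, "bounds": {"X1": (-3.14, 3.14), "X2": (-3.14, 3.14)}}


def sampling(conf, seed=0):
    """DOEX = Sampling(Configuration): plain uniform random sampling in this sketch."""
    rng = random.Random(seed)
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in conf["bounds"].items()}
            for _ in range(conf["samples"])]


def model(x):
    """DOEY = model(DOEX): a stand-in for a physics-based model."""
    return {"Y1": x["X1"] ** 2 + 0.5 * x["X2"]}


doex = sampling(configuration)
doey = [model(x) for x in doex]

# A crude Optimization_min(DOEY): the sampled design with the smallest Y1.
best = min(range(len(doey)), key=lambda i: doey[i]["Y1"])
print(doex[best], doey[best])
```

In the package itself, the metamodel step sits between the model evaluations and the optimization, so the optimizer queries a fast surrogate instead of the sampled points.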

2.1. Metamodel configuration (spreadsheet)

The pymetamodel abstract object is defined by a configuration spreadsheet file (in .xls format). A template of the configuration spreadsheet can be downloaded and edited (download conf spreadsheet file), or built programmatically (see Section 3.3). The configuration structure is described as follows.

The cases sheet

The cases sheet specifies the configuration of the different cases to be executed. Each case is described in a row and each column describes one var (see Table 2.1.1). The compulsory vars are the following (additional vars can be added if needed).

case: name and id of the case

vars sheet: name and id of the sheet that describes the input vars for the given case

output sheet: name and id of the sheet that describes the output vars for the given case

samples: number of samples for the sampling activities (\(2^N \ values\))

sensitivity method: name and id of the sensitivity analysis method (see Section 2.2)

comment: comment for the given case

Table 2.1.1 Main cases description for the configuration spreadsheet
case	vars sheet	output sheet	samples	sensitivity method	comment
case_1	case_1_vars_sensi	case_out	50.0	RBD-Fast
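To make the layout of Table 2.1.1 concrete, the sketch below parses a tab-separated text copy of the cases sheet and checks that the compulsory columns are present. It is an assumption-laden illustration: the real configuration file is an .xls spreadsheet read by pymetamodels itself, and `CASES_SHEET`, `REQUIRED` and `read_cases()` are hypothetical names.

```python
import csv
import io

# Hypothetical tab-separated copy of the "cases" sheet (Table 2.1.1).
# The real configuration file is an .xls spreadsheet; this only shows the layout.
CASES_SHEET = (
    "case\tvars sheet\toutput sheet\tsamples\tsensitivity method\tcomment\n"
    "case_1\tcase_1_vars_sensi\tcase_out\t50.0\tRBD-Fast\t\n"
)

REQUIRED = ["case", "vars sheet", "output sheet", "samples",
            "sensitivity method", "comment"]


def read_cases(text):
    """Return one dict per case row after checking the compulsory columns."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in REQUIRED if col not in reader.fieldnames]
    if missing:
        raise ValueError(f"missing compulsory columns: {missing}")
    cases = []
    for row in reader:
        row["samples"] = int(float(row["samples"]))  # spreadsheet float "50.0" -> 50
        cases.append(row)
    return cases


cases = read_cases(CASES_SHEET)
print(cases[0]["case"], cases[0]["vars sheet"], cases[0]["samples"])
```

The "vars sheet" and "output sheet" values of each row are the names of the sheets described next.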

The input vars case sheet

The input vars case sheet describes the different input variables, or DOEX variables, for a given case. The name of this sheet matches the “vars sheet” value in the cases sheet. Each row defines an input var and each column defines a different attribute of that input var (see Table 2.1.2). The compulsory attributes are the following (additional attributes can be added if needed).

variable: name and id of the input variable

value: nominal value of the input variable, used when the variable is not treated as a ranged variable in the DOEX

min: min value of the ranged variable in the DOEX

max: max value of the ranged variable in the DOEX

distribution: type of range distribution (unif, triang, norm, lognorm)

cov_un: covariance used for the generation of the norm distributions

is_range: TRUE or FALSE value to choose if the variable is a range or a single value in the DOEX

ud: units name for the variable (e.g. [m])

alias: alias name for the variable

comment: comment field

constrain: constraint field

Table 2.1.2 Main case input model variables description for the configuration spreadsheet
variable	value	min	max	distribution	is_range	cov_un	ud	alias	constrain	comment
X1	1.0	-3.14	3.14	unif	1	0.05	[-]	X1	None	var X1
X2	1.0	-3.14	3.14	unif	1	0.05	[-]	X2	None	var X2
X3	1.0	-3.14	3.14	unif	1	0.05	[-]	X3	None	var X3
X4	1.0	-3.14	3.14	unif	1	0.05	[-]	X4	None	var X4
X5	1.0	-3.14	3.14	unif	1	0.05	[-]	X5	None	var X5
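The sketch below shows one way a row of the input vars sheet could be turned into random draws. The column interpretation is an assumption, not documented pymetamodels behaviour: `is_range` FALSE returns the nominal `value`; `unif` draws between `min` and `max`; `triang` uses `value` as the mode; and `cov_un` is read as a coefficient of variation (standard deviation divided by the nominal value) for `norm` and `lognorm`. `sample_variable()` is a hypothetical helper.

```python
import math
import random


def sample_variable(var, rng):
    """Draw one value for an input variable described by a vars-sheet row.

    Assumed (hypothetical) interpretation of the columns:
    - is_range False -> the nominal 'value' is used as a constant;
    - 'unif'    -> uniform between 'min' and 'max';
    - 'triang'  -> triangular between 'min' and 'max' with mode 'value';
    - 'norm'    -> normal with mean 'value' and std = cov_un * |value|;
    - 'lognorm' -> lognormal with mean 'value' and coefficient of variation cov_un.
    """
    if not var["is_range"]:
        return var["value"]
    dist = var["distribution"]
    if dist == "unif":
        return rng.uniform(var["min"], var["max"])
    if dist == "triang":
        return rng.triangular(var["min"], var["max"], var["value"])
    if dist == "norm":
        return rng.gauss(var["value"], var["cov_un"] * abs(var["value"]))
    if dist == "lognorm":
        # Convert the mean/COV of the variable into parameters of the underlying normal.
        sigma = math.sqrt(math.log(1.0 + var["cov_un"] ** 2))
        mu = math.log(var["value"]) - 0.5 * sigma ** 2
        return rng.lognormvariate(mu, sigma)
    raise ValueError(f"unknown distribution: {dist}")


# Row X1 of Table 2.1.2
x1 = {"variable": "X1", "value": 1.0, "min": -3.14, "max": 3.14,
      "distribution": "unif", "is_range": True, "cov_un": 0.05}

rng = random.Random(0)
samples = [sample_variable(x1, rng) for _ in range(5)]
print(samples)
```

The lognormal branch requires a positive nominal `value`; with these parameters the draws have mean `value` and standard deviation `cov_un * value`.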

The output vars case sheet

The output vars case sheet describes the different output variables, or DOEY variables, for a given case. The name of this sheet matches the “output sheet” value in the cases sheet. Each row defines an output var and each column defines a different attribute of that output var (see Table 2.1.3). The compulsory attributes are the following (additional attributes can be added if needed).

variable: name and id of the output variable

value: nominal value of the output variable

ud: units name for the variable (e.g. [m])

comment: comment field

array: TRUE or FALSE, indicates whether the output variable is an array or a single value

op_min: TRUE if the variable is to be minimized, \(min(DOEY_{var})\)

op_min=0: TRUE if the variable is to be driven to 0, \(objective(DOEY_{var}=0)\)

ineq_(>=0): TRUE if the variable is considered as an inequality constraint, \(DOEY_{var} \geq 0\)

eq_(=0): TRUE if the variable is considered as an equality constraint, \(DOEY_{var}=0\)

Table 2.1.3 Main case output model variables description for the configuration spreadsheet
variable	value	ud	comment	array	op_min	op_min=0	ineq_(>=0)	eq_(=0)
Y1	None	m^3	function output	0	0	0	0	0
Y2	None	m^3	function output	0	1	0	0	0
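The flag columns of Table 2.1.3 assign each output variable a role in the optimization problem of Section 2.4. A minimal sketch of that bookkeeping, assuming each sheet row is available as a dict with 0/1 flags, could look like this; `optimization_roles()` and the role names are hypothetical.

```python
def optimization_roles(outputs):
    """Split output-sheet rows into objectives and constraints.

    Each row is a dict holding the 0/1 flag columns of Table 2.1.3.
    Returns the variable names grouped by role (hypothetical role names).
    """
    column_to_role = {"op_min": "minimize", "op_min=0": "target_zero",
                      "ineq_(>=0)": "ineq", "eq_(=0)": "eq"}
    roles = {role: [] for role in column_to_role.values()}
    for row in outputs:
        for column, role in column_to_role.items():
            if int(row.get(column, 0)):
                roles[role].append(row["variable"])
    return roles


# Rows of Table 2.1.3
outputs = [
    {"variable": "Y1", "op_min": 0, "op_min=0": 0, "ineq_(>=0)": 0, "eq_(=0)": 0},
    {"variable": "Y2", "op_min": 1, "op_min=0": 0, "ineq_(>=0)": 0, "eq_(=0)": 0},
]
print(optimization_roles(outputs))
# {'minimize': ['Y2'], 'target_zero': [], 'ineq': [], 'eq': []}
```

The sign convention of the `ineq_(>=0)` column (constraint function greater than or equal to zero) is the same one SciPy uses for `"ineq"` constraints in `scipy.optimize.minimize`, which makes such a mapping straightforward.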

2.2. Available DOEX sampling generators and sensitivity analysis

In the pymetamodels package the DOEX sampling schema and the sensitivity analysis are defined together, because there are dependencies between them. This pair is defined in the configuration spreadsheet through the “sensitivity method” var. The available sampling and sensitivity analysis schemas are given in Table 2.2.1. These methods are based on the SALib package [HU17].

Table 2.2.1 Sampling and sensitivity analysis available schemes
sensitivity method (spreadsheet value)	Sampling method	Sensitivity analysis	Refs.
Sobol	Saltelli sampling	Sobol sensitivity analysis	[Sob01, SAA+10, CSC11, Sal02]
Morris	Method of Morris	Morris Analysis	[RRSF12, CCS07, Mor91]
RBD-Fast	Latin hypercube sampling (LHS)	RBD-FAST - Random balance designs Fourier amplitude sensitivity test	[MBC00, IHC81, TGM06, Pli10, TP12]
Fast	FAST - Fourier amplitude sensitivity test	Fourier amplitude sensitivity test model (eFAST)	[CFS+73, STC99]
Delta-MIM	Latin hypercube sampling (LHS)	Delta moment-independent measure	[MBC00, IHC81, Bor07, PBS13]
DGSM	Sampling for derivative-based global sensitivity measure (DGSM)	Derivative-based Global Sensitivity Measure (DGSM)	[SobolK09, SK10]
Factorial	Fractional factorial sampling	Fractional factorial	[SAC+08]
PAWN	Latin hypercube sampling (LHS)	PAWN sensitivity analysis	[PW15, PW18, BF20]
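Several schemes in Table 2.2.1 (RBD-Fast, Delta-MIM, PAWN) rely on Latin hypercube sampling. The pure-Python sketch below illustrates the technique only; pymetamodels delegates the actual sampling to SALib, and `latin_hypercube()` is a hypothetical helper. Each variable range is split into `n_samples` equal strata, one point is drawn inside each stratum, and the strata are shuffled independently per variable.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Latin hypercube sample of n_samples points within per-variable bounds.

    Each variable range is split into n_samples equal strata; one point is
    drawn uniformly inside each stratum and the strata are shuffled
    independently per variable, so every stratum of every variable is
    sampled exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        column = [low + (k + rng.random()) * width for k in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one row per sample (a DOEX row), one column per variable.
    return [list(row) for row in zip(*columns)]


bounds = [(-3.14, 3.14)] * 5   # X1..X5 from Table 2.1.2
doex = latin_hypercube(50, bounds, seed=1)
print(len(doex), len(doex[0]))  # 50 5
```

Compared with plain random sampling, this stratification guarantees that each one-dimensional projection of the DOEX covers the whole range evenly, even for small sample sizes.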

2.3. Optimal forecasting metamodels construction

In pymetamodels, metamodel construction is based on the automated training and robust selection of ML models adapted to each DOEX and DOEY group population. The fast response of these metamodels allows them to be used safely in further robustness, plotting and optimization routines, within the variable range limits expressed in the configuration sheet (see Section 2.1). The robust selection of ML models is carried out through a schema selection approach that draws on different ML libraries [FabianPedregosaVG+11, MW08]. Each schema offers multiple multivariate ML models of different kinds. These models are trained using adaptive parameter estimation techniques, and the model with the best prediction is chosen according to a residual scoring evaluation. This routine is carried out with the function run_metamodel_construction(). The metamodels are saved to an external .metaita file, which can later be used in secondary applications or in the plotting, robustness, calibration and optimization routines.

The available schemas and models are shown in Table 2.3.1 and Table 2.3.2.

Table 2.3.1 Metamodeling schemas
Schema	Regressor types
general	ID1, ID2, ID3, ID4, ID5, ID6
general_fast	ID1, ID2, ID3, ID4, ID5, ID6
general_fast_nonpol	ID1, ID2, ID3, ID4, ID6
linear	ID1, ID2, ID3
gaussian	ID4
spline	ID6
polynomial	ID5
svr	ID7
neural	ID8

Table 2.3.2 ML models available
ID	Type	ML model	Refs.
ID1.1	linear regressors	LassoCV	[FabianPedregosaVG+11, Shr96]
ID1.2	linear regressors	LassoLarsCV	[FabianPedregosaVG+11, ICG13]
ID1.3	linear regressors	ElasticNetCV	[FabianPedregosaVG+11, Vov13]
ID1.4	linear regressors	LassoLarsIC	[FabianPedregosaVG+11, ICG13]
ID1.5	linear regressors	RidgeCV	[FabianPedregosaVG+11, Vov13]
ID1.6	linear regressors	OrthogonalMatchingPursuitCV	[FabianPedregosaVG+11, SBA20]
ID2.1	outlier-robust regressors	HuberRegressor	[FabianPedregosaVG+11, Owe07]
ID3.1	bayesian regressors	ARDRegression	[FabianPedregosaVG+11, Mac94]
ID3.2	bayesian regressors	BayesianRidge	[FabianPedregosaVG+11, Mac92]
ID4.1	gaussian regressors	GaussianProcessRegressor	[FabianPedregosaVG+11, EG00]
ID5.1	polynomial regressors	PolynomialLassoCV	[FabianPedregosaVG+11, LS81, MMAC16]
ID5.2	polynomial regressors	PolynomialLassoLarsIC	[FabianPedregosaVG+11, LS81, MMAC16]
ID5.3	polynomial regressors	PolynomialBayesianRidge	[FabianPedregosaVG+11, LS81, MMAC16]
ID6.1	spline regressors	SplineLassoCV	[FabianPedregosaVG+11, MMAC16]
ID6.2	spline regressors	SplineLassoLarsIC	[FabianPedregosaVG+11, MMAC16]
ID6.3	spline regressors	SplineBayesianRidge	[FabianPedregosaVG+11, MMAC16]
ID7.1	SVM regressors	LinearSVR	[FabianPedregosaVG+11]
ID7.2	SVM regressors	SVR	[FabianPedregosaVG+11]
ID8.1	neural regressors	MLPR	[FabianPedregosaVG+11]
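The train-and-select loop described at the start of this section can be illustrated without any ML library. In this hypothetical sketch, polynomial least-squares fits of increasing degree stand in for the regressors of Table 2.3.2, and hold-out RMSE stands in for the residual scoring; the real routine, run_metamodel_construction(), uses scikit-learn models and its own scoring, and `polyfit()`, `rmse()` and `select_metamodel()` are illustrative names.

```python
import math
import random


def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (small degrees only)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y x^i.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return lambda x: sum(c * x ** k for k, c in enumerate(coef))


def rmse(model, xs, ys):
    """Root-mean-square residual of a model on the given points."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def select_metamodel(xs, ys, degrees=(0, 1, 2, 3)):
    """Train one candidate per degree, score on a hold-out split, keep the best."""
    train = [i for i in range(len(xs)) if i % 4 != 0]
    test = [i for i in range(len(xs)) if i % 4 == 0]
    scores = {}
    for d in degrees:
        model = polyfit([xs[i] for i in train], [ys[i] for i in train], d)
        scores[d] = rmse(model, [xs[i] for i in test], [ys[i] for i in test])
    best = min(scores, key=scores.get)
    return best, scores


rng = random.Random(0)
xs = [rng.uniform(-3.14, 3.14) for _ in range(40)]
ys = [0.5 * x ** 2 - x + rng.gauss(0, 0.05) for x in xs]  # noisy quadratic DOEY
best, scores = select_metamodel(xs, ys)
print(best, {d: round(s, 3) for d, s in scores.items()})
```

Scoring on points that were not used for training is what keeps the selection from simply rewarding the most flexible candidate.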

2.4. Optimization problem resolution

Optimization problems always involve the following functions and variables [CK09]:

Objective function \(DOEY_{varY}\) (denoted \(f\)): A function used to classify designs; for every possible design it returns a number that indicates the goodness of the design. By convention, a small value of \(f\) is better than a large one (a minimization problem). \(f\) can be an indicator of weight, displacement, effective stress, stiffness, environmental impact, cost of production … In the case of multi-objective optimization, the objective function is composed of more than one sub-objective function.
Design variable \((DOEX)\): Variables, functions or vectors that describe the design and are changed during optimization. They may represent geometry or a choice of material. When representing geometry, a design variable can range from a sophisticated spline shape to simply the thickness of a bar.
State constraints \(DOEY_{constrains}\): A function or vector that represents the response of the structure for a given design \(x\). For a mechanical structure, it can be displacement, stress, strain or force.

A general structural optimization \((SO)\) problem takes the following form (see Eq. 2.4.1):

\begin{equation} (SO) \begin{cases} \min(DOEY_{varY}) \quad \forall \ \text{DOEX, DOEY} \\ \text{subject to} \begin{cases} \text{bounds: design constraints or variables } (DOEX_{bounds}) \\ \text{constraints: state constraints } DOEY_{constrains} \\ \text{equilibrium constraints} \end{cases} \end{cases} \end{equation}

In the case of multi-objective optimization problems several objective functions are considered,

\begin{equation} \min DOEY_{varY} = \min \left(DOEY_{varY_1}, DOEY_{varY_2}, \cdots, DOEY_{varY_l}\right) \end{equation}

where \(l\) is the number of objective functions, each of which is subject to a given set of constraints.

Pymetamodels can solve optimization problems adapted to each DOEX and DOEY group population. The response of the metamodels built with the routines of Section 2.3 allows optimization algorithms to be applied to minimize one of the DOEY variables of these metamodels. The optimization routines take into account the bound constraints on DOEX variables (defined in the input vars case sheet of the configuration spreadsheet) and the equality and/or inequality constraints on DOEY variables defined in the output vars case sheet. Optimization routines are carried out with the function run_optimization_problem(). Depending on the optimization schema, several optimization methods are tried and the best result is chosen. The optimization results are saved to an external .optita file, which can later be used in secondary applications or in the plotting, robustness, calibration and optimization routines.

The available optimization schemas and methods are shown in Table 2.4.1 and Table 2.4.2.

Table 2.4.1 Optimization schemas
Schema	Optimization types
general	“iter_grid_method”, “shgo”, “shgo_slow”, “diff_evol”, “min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP”
general_fast	“iter_grid_method”, “shgo”, “diff_evol”, “min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP”
general_with_constrains	“iter_grid_method”, “COBYLA”, “SLSQP”, “shgo”
global	“iter_grid_method”, “shgo”, “shgo_slow”, “diff_evol”
minimize	“min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP”
grid_method	“grid_method”
iter_grid_method	“iter_grid_method”

Table 2.4.2 Optimization models available
ID	Type	Optimization method	Allows constraints	Refs.
OP1.1	global	iter_grid_method	yes	NA
OP1.2	global	grid_method	yes	NA
OP1.3	global	shgo	yes	[ESF18, VGO+20]
OP1.4	global	shgo_slow	yes	[ESF18, VGO+20]
OP1.5	global	diff_evol	no	[VGO+20, SP97]
OP2.1	local	min_gen	no	[VGO+20]
OP2.2	local	Powell	no	[VGO+20, PBDS20, Pow64]
OP2.3	local	Nelder-Mead	no	[VGO+20, GH12]
OP2.4	local	TNC	no	[VGO+20]
OP2.5	local	COBYLA	yes	[VGO+20, PBDS20]
OP2.6	local	SLSQP	yes	[VGO+20, BGLS03]
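Table 2.4.2 lists an "iter_grid_method" without describing it. The sketch below is one plausible reading of an iterative grid method, not the pymetamodels implementation: the objective is evaluated on a regular grid inside the bounds, points that violate \(g(x) \geq 0\) constraints (as in the ineq_(>=0) column) are discarded, and the grid is repeatedly zoomed around the best feasible point. `iter_grid_minimize()` and its parameters are hypothetical.

```python
import itertools


def iter_grid_minimize(objective, bounds, ineq=(), points=11, iterations=6, shrink=0.5):
    """Minimize objective(x) over a box by repeatedly refining a regular grid.

    ineq: functions g(x) that must satisfy g(x) >= 0 (like the ineq_(>=0) flag).
    After each pass the search box is shrunk around the best feasible point
    and clipped to the original bounds.
    """
    box = list(bounds)
    best_x, best_f = None, float("inf")
    for _ in range(iterations):
        axes = [[lo + k * (hi - lo) / (points - 1) for k in range(points)] for lo, hi in box]
        for x in itertools.product(*axes):
            if all(g(x) >= 0 for g in ineq):
                f = objective(x)
                if f < best_f:
                    best_x, best_f = x, f
        if best_x is None:
            raise ValueError("no feasible grid point found")
        new_box = []
        for (lo, hi), (lo0, hi0), xi in zip(box, bounds, best_x):
            half = 0.5 * shrink * (hi - lo)
            new_box.append((max(lo0, xi - half), min(hi0, xi + half)))
        box = new_box
    return best_x, best_f


def obj(x):
    """Toy objective: squared distance to the point (1, -0.5)."""
    return (x[0] - 1) ** 2 + (x[1] + 0.5) ** 2


def g(x):
    """Toy inequality constraint: x1 + x2 - 1 >= 0."""
    return x[0] + x[1] - 1


x_best, f_best = iter_grid_minimize(obj, [(-3.14, 3.14)] * 2, ineq=[g])
print([round(v, 3) for v in x_best], round(f_best, 4))
```

The exact constrained optimum of this toy problem is \(x = (1.25, -0.25)\) with \(f = 0.125\); the grid result approaches it as the box shrinks. Local methods from the table, such as SLSQP or COBYLA in `scipy.optimize.minimize`, accept the same kind of `{"type": "ineq", "fun": g}` constraint and would normally refine such a grid result further.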