2. The pymetamodels description

Metamodeling, sensitivity, optimization, calibration and robustness analysis have become important tools for the virtual development of industrial products. They rely on the construction of soft ML metamodels from discrete populations of input and output data produced by hard physics-based models (FE, CFD, molecular, continuum, mesoscopic, …) and calibrated with experimental characterisations. The design input variables are defined by their lower and upper bounds or by a set of possible discrete values. In parametric optimization in particular, the variables are systematically modified by mathematical algorithms to improve an existing design or to find a global optimum [GYTA13]. In sensitivity analysis, discrete populations of model input and output data are generated using the original model or the metamodel to identify the sensitivity indexes [SAC+08]. These indexes describe how the uncertainty in the output of a model can be apportioned, qualitatively or quantitatively, to different sources of variation in the model inputs.

Soft metamodels can handle a large number of design variables and deliver real-time responses. They can be applied to optimization problems, where the number of design variables is often vast, reducing the time required to run accurate analyses and optimizations. Flexible tools make it possible to implement these problems on high-demand servers.

Pymetamodels combines machine learning (ML) metamodeling, optimization and analysis tools for the virtual development of products within a common abstract framework, implemented as an accessible and distributable Python package shared under a permissive license. Through the configuration spreadsheet or programmatically, pymetamodels allows users to define multivariate problems and implement multi-disciplinary problems in a pythonic fashion.

The development of the pymetamodels package is oriented towards supporting ML applications in the areas of material science and material informatics, and the construction of soft metamodels of materials, components and systems informed by hard physics-based modelling (continuum, mesoscopic, …) [EuropeanCfSCEN18] and experimental characterisations.

Fig. 2.1 shows an overview flowchart of the capabilities of the pymetamodels Python package, which cover the following main areas:

  • \(Configuration_{case} = Definition_{\ inputs \ and \ responses}\). Configuration of the pymetamodel object and of the analysis per case in a spreadsheet (see Section 2.1). The configuration spreadsheet can also be defined programmatically using the objconf class (see Section 3.3).

  • \(DOEX = Sampling(Configuration_{case})\). DOE sampling schemes (see Section 2.2).

  • \(DOEY = model_{iteration}(DOEX)\). Interaction with Python models and/or direct import of DOEY data structures.

  • \(DOEY = metamodel(DOEX)\). Optimal forecasting metamodels construction (see Section 2.3).

  • \(Sensitivity_{indexes}=Sensitivity_{analysis}(DOEY)\). Sensitivity analysis capabilities (see Section 2.2).

  • \(Optimization_{min \ local \ optima}=Optimization_{min}(DOEY, Constrains)\). Solution of optimization problems, including calibrations (see Section 2.4).

  • \(Confidence_{intervals}=Robustness_{analysis}(DOEY,COVs)\). Robustness analysis capabilities.

  • \(Design_{point}=Calibration_{analysis}(DOEY, Constrains, Reference)\). Calibration analysis capabilities.
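The robustness step listed above can be pictured with a small Monte Carlo sketch. It is not the pymetamodels implementation: the function `model`, the helper `robustness_interval` and the reading of COV as a coefficient of variation (standard deviation divided by nominal value) are all assumptions made for illustration.

```python
import random
import statistics

def model(x1, x2):
    # Hypothetical stand-in for a metamodel prediction DOEY = metamodel(DOEX).
    return x1 * x2

def robustness_interval(nominal, cov, n_samples=20000, level=0.95, seed=0):
    """Propagate normally distributed input scatter; return (low, mean, high)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Assumption: COV is a coefficient of variation, so sigma = COV * |nominal|.
        sample = [rng.gauss(mu, c * abs(mu)) for mu, c in zip(nominal, cov)]
        outputs.append(model(*sample))
    outputs.sort()
    tail = (1.0 - level) / 2.0
    # Empirical percentiles of the sorted output population.
    low = outputs[int(tail * n_samples)]
    high = outputs[int((1.0 - tail) * n_samples) - 1]
    return low, statistics.fmean(outputs), high

low, mean, high = robustness_interval(nominal=[2.0, 3.0], cov=[0.05, 0.05])
print(f"95% interval: [{low:.2f}, {high:.2f}], mean {mean:.2f}")
```

The interval is read directly from the sorted output population, so no distribution is assumed for the response itself.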

Fig. 2.1 Overview flowchart of pymetamodels

2.1. Metamodel configuration (spreadsheet)

The pymetamodel abstract object is defined according to a configuration spreadsheet file (in .xls format). A template of the configuration spreadsheet can be downloaded and edited (download conf spreadsheet file), or built programmatically (see Section 3.3). The configuration structure is described below.

The cases sheet

The cases sheet specifies the configuration of the different cases to be executed. Each case is described in a row and each column describes one var (see Table 2.1.1). The compulsory vars are the following (additional vars can be added if needed).

  • case: name and id of the case

  • vars sheet: name and id of the sheet that describes the input vars for the given case

  • output sheet: name and id of the sheet that describes the output vars for the given case

  • samples: number of samples for the sampling activities (\(2^N \ values\))

  • sensitivity method: name and id of the sensitivity analysis method (see Section 2.2)

  • comment: comment for the given case
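As a plain-Python picture of one row of the cases sheet, the sketch below stores the compulsory vars in a dictionary and checks that none is missing. The names `REQUIRED_CASE_VARS` and `validate_case` are hypothetical helpers written for this illustration; they are not part of the pymetamodels API.

```python
# Compulsory vars of the cases sheet, as listed above.
REQUIRED_CASE_VARS = ("case", "vars sheet", "output sheet", "samples",
                      "sensitivity method", "comment")

def validate_case(row):
    """Return the row if every compulsory var is present, else raise ValueError."""
    missing = [name for name in REQUIRED_CASE_VARS if name not in row]
    if missing:
        raise ValueError(f"missing compulsory vars: {missing}")
    if int(row["samples"]) <= 0:
        raise ValueError("samples must be a positive number")
    return row

# One case row, with example values for each compulsory var.
case_1 = validate_case({
    "case": "case_1",
    "vars sheet": "case_1_vars_sensi",
    "output sheet": "case_out",
    "samples": 50,
    "sensitivity method": "RBD-Fast",
    "comment": "",
})
print(case_1["vars sheet"])
```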

Table 2.1.1 Main cases description for the configuration spreadsheet

| case | vars sheet | output sheet | samples | sensitivity method | comment |
|---|---|---|---|---|---|
| case_1 | case_1_vars_sensi | case_out | 50.0 | RBD-Fast | |


The input vars case sheet

The input vars case sheet describes the different input variables or DOEX variables for a given case. The name of this sheet is the value given in the “vars sheet” column of the cases sheet. Each row defines an input var and each column defines a different attribute of the input var (see Table 2.1.2). The compulsory attributes are the following (additional attributes can be added if needed).

  • variable: name and id of the input variable

  • value: nominal value of the input variable, used when the variable is not treated as a ranged variable in the DOEX

  • min: min value of the ranged variable in the DOEX

  • max: max value of the ranged variable in the DOEX

  • distribution: type of range distribution (unif, triang, norm, lognorm)

  • cov_un: covariance used for the generation of the norm distributions

  • is_range: TRUE or FALSE value to choose if the variable is a range or a single value in the DOEX

  • ud: units name for the variable (e.g. [m])

  • alias: alias name for the variable

  • comment: comment field

  • constrain: constrain field
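To make the role of value, min, max, distribution and is_range concrete, here is a minimal sketch that draws a DOEX population from rows shaped like this sheet. It uses plain pseudo-random sampling from the standard library, not the sampling schemes of Section 2.2, and the function `sample_doex` is hypothetical. Placing the mode of the triangular distribution at the nominal value is also an assumption.

```python
import random

# Rows mirror a subset of the compulsory attributes; values are illustrative.
input_vars = [
    {"variable": "X1", "value": 1.0, "min": -3.14, "max": 3.14,
     "distribution": "unif", "is_range": True},
    {"variable": "X2", "value": 1.0, "min": -3.14, "max": 3.14,
     "distribution": "triang", "is_range": True},
    {"variable": "X3", "value": 1.0, "min": -3.14, "max": 3.14,
     "distribution": "unif", "is_range": False},
]

def sample_doex(variables, n_samples, seed=0):
    """Draw n_samples DOEX points; non-ranged variables keep their nominal value."""
    rng = random.Random(seed)
    doex = []
    for _ in range(n_samples):
        point = {}
        for var in variables:
            name = var["variable"]
            if not var["is_range"]:
                point[name] = var["value"]
            elif var["distribution"] == "unif":
                point[name] = rng.uniform(var["min"], var["max"])
            elif var["distribution"] == "triang":
                # Assumption: the mode sits at the nominal value.
                point[name] = rng.triangular(var["min"], var["max"], var["value"])
            else:
                raise NotImplementedError(var["distribution"])
        doex.append(point)
    return doex

doex = sample_doex(input_vars, n_samples=8)
print(doex[0])
```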

Table 2.1.2 Main case input model variables description for the configuration spreadsheet

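| variable | value | min | max | distribution | is_range | cov_un | ud | alias | comment | constrain | comment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1.0 | -3.14 | 3.14 | unif | 1 | 0.05 | [-] | X1 | None | 0 | var X1 |
| X2 | 1.0 | -3.14 | 3.14 | unif | 1 | 0.05 | [-] | X2 | None | 0 | var X2 |
| X3 | 1.0 | -3.14 | 3.14 | unif | 1 | 0.05 | [-] | X3 | None | 0 | var X3 |
| X4 | 1.0 | -3.14 | 3.14 | unif | 1 | 0.05 | [-] | X4 | None | 0 | var X4 |
| X5 | 1.0 | -3.14 | 3.14 | unif | 1 | 0.05 | [-] | X5 | None | 0 | var X5 |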


The output vars case sheet

The output vars case sheet describes the different output variables or DOEY variables for a given case. The name of this sheet is the value given in the “output sheet” column of the cases sheet. Each row defines an output var and each column defines a different attribute of the output var (see Table 2.1.3). The compulsory attributes are the following (additional attributes can be added if needed).

  • variable: name and id of the output variable

  • value: nominal value of the output variable

  • ud: units name for the variable (e.g. [m])

  • comment: comment field

  • array: TRUE or FALSE, whether the output variable is an array or a single value

  • op_min: TRUE if the variable is to be minimized, \(min(DOEY_{var})\)

  • op_min=0: TRUE if the variable is to be optimized towards 0, \(objective(DOEY_{var}=0)\)

  • ineq_(>=0): TRUE if the variable is considered as an inequality constraint, \(DOEY_{var}>=0\)

  • eq_(=0): TRUE if the variable is considered as an equality constraint, \(DOEY_{var}=0\)
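The flags above can be read as the recipe for an optimization problem: which outputs are objectives and which are constraints. The sketch below groups variable names by flag and checks the feasibility of one DOEY point. The helpers `build_problem` and `is_feasible` are hypothetical, and the flag values are illustrative rather than taken from the package template.

```python
# Illustrative flags only; they are not the values of the package's template.
output_vars = [
    {"variable": "Y1", "op_min": False, "op_min=0": False,
     "ineq_(>=0)": True, "eq_(=0)": False},
    {"variable": "Y2", "op_min": True, "op_min=0": False,
     "ineq_(>=0)": False, "eq_(=0)": False},
]

def build_problem(rows):
    """Group output variable names by the role their flags assign to them."""
    roles = {"op_min": [], "op_min=0": [], "ineq_(>=0)": [], "eq_(=0)": []}
    for row in rows:
        for flag, names in roles.items():
            if row.get(flag):
                names.append(row["variable"])
    return roles

def is_feasible(roles, doey, tol=1e-9):
    """Check the DOEY_var >= 0 and DOEY_var = 0 constraints for one DOEY point."""
    ineq_ok = all(doey[name] >= -tol for name in roles["ineq_(>=0)"])
    eq_ok = all(abs(doey[name]) <= tol for name in roles["eq_(=0)"])
    return ineq_ok and eq_ok

roles = build_problem(output_vars)
print(roles["op_min"])                              # ['Y2']
print(is_feasible(roles, {"Y1": 0.5, "Y2": 3.0}))   # True
print(is_feasible(roles, {"Y1": -0.1, "Y2": 3.0}))  # False
```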

Table 2.1.3 Main case output model variables description for the configuration spreadsheet

| variable | value | ud | comment | array | op_min | op_min=0 | ineq_(>=0) | eq_(=0) |
|---|---|---|---|---|---|---|---|---|
| Y1 | None | m^3 | function output | 0 | 0 | 0 | 0 | 0 |
| Y2 | None | m^3 | function output | 0 | 1 | 0 | 0 | 0 |


2.2. Available DOEX sampling generators and sensitivity analysis

In the pymetamodels package, the DOEX sampling schema and the sensitivity analysis are defined together because there are dependencies between them. The pair is selected in the configuration spreadsheet, in the “sensitivity method” var. The available sampling and sensitivity analysis schemas are given in Table 2.2.1. These methods are based on the SALib package [HU17].

Table 2.2.1 Sampling and sensitivity analysis available schemes

| sensitivity method | Sampling method | Sensitivity method | Refs. |
|---|---|---|---|
| Sobol | Saltelli sampling | Sobol sensitivity analysis | [Sob01, SAA+10, CSC11, Sal02] |
| Morris | Method of Morris | Morris Analysis | [RRSF12, CCS07, Mor91] |
| RBD-Fast | Latin hypercube sampling (LHS) | RBD-FAST - Random balance designs fourier amplitude sensitivity test | [MBC00, IHC81, TGM06, Pli10, TP12] |
| Fast | FAST - Fourier amplitude sensitivity test | Fourier amplitude sensitivity test model (eFAST) | [CFS+73, STC99] |
| Delta-MIM | Latin hypercube sampling (LHS) | Delta moment-independent measure | [MBC00, IHC81, Bor07, PBS13] |
| DGSM | Sampling for derivative-based global sensitivity measure (DGSM) | Derivative-based Global Sensitivity Measure (DGSM) | [SobolK09, SK10] |
| Factorial | Fractional factorial sampling | Fractional factorial | [SAC+08] |
| PAWN | Latin hypercube sampling (LHS) | PAWN sensitivity analysis | [PW15, PW18, BF20] |
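To illustrate what a first-order sensitivity index measures, the sketch below estimates \(S_i = Var(E[Y|X_i]) / Var(Y)\) by binning each input of a sampled population. This is a simple standard-library estimator chosen for readability; it is none of the methods of Table 2.2.1 and does not use SALib. The Ishigami test function is an assumed example model, which the source does not name.

```python
import math
import random
import statistics

def ishigami(x1, x2, x3, a=7.0, b=0.1):
    # Common sensitivity benchmark, used here as an assumed example model.
    return math.sin(x1) + a * math.sin(x2) ** 2 + b * x3 ** 4 * math.sin(x1)

def first_order_indices(model, bounds, n_samples=20000, n_bins=20, seed=0):
    """Estimate S_i = Var(E[Y | X_i]) / Var(Y) by binning each input."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples)]
    ys = [model(*x) for x in xs]
    mean_y = statistics.fmean(ys)
    var_y = statistics.pvariance(ys, mean_y)
    indices = []
    for i, (lo, hi) in enumerate(bounds):
        bins = [[] for _ in range(n_bins)]
        for x, y in zip(xs, ys):
            k = min(int((x[i] - lo) / (hi - lo) * n_bins), n_bins - 1)
            bins[k].append(y)
        # Variance of the bin means, weighted by the bin population.
        var_cond = sum(len(b) * (statistics.fmean(b) - mean_y) ** 2
                       for b in bins if b) / n_samples
        indices.append(var_cond / var_y)
    return indices

bounds = [(-3.14, 3.14)] * 3
s1, s2, s3 = first_order_indices(ishigami, bounds)
print(f"S1={s1:.2f} S2={s2:.2f} S3={s3:.2f}")
```

For this function the second input should carry the largest first-order index and the third an index close to zero, since the third input acts only through its interaction with the first.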


2.3. Optimal forecasting metamodels construction

In pymetamodels, metamodel construction is based on the automated training and robust selection of ML models adapted to each DOEX and DOEY group population. The fast response of these metamodels allows them to be used safely in further robustness, plotting and optimization routines, within the variable range limits expressed in the configuration sheet (see Section 2.1). The robust selection of ML models is carried out by a schema selection approach that draws on different ML libraries [FabianPedregosaVG+11, MW08]. Each schema offers multiple multivariate ML models of different kinds. These models are trained using adaptive parameter estimation techniques, and the model with the best prognosis is chosen according to a residuals scoring evaluation. This routine is carried out with the function run_metamodel_construction(). The metamodels are saved as an external .metaita file, which can later be used in secondary applications or in the plotting, robustness, calibration and optimization routines.
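The idea of training several candidate models and keeping the one with the best residual score can be sketched with the standard library alone. The three candidates below (a constant mean, a least-squares line and a nearest-neighbour lookup) and the hold-out RMSE score are assumptions chosen for brevity; they stand in for the regressors of Table 2.3.2 and do not reproduce the scoring used by run_metamodel_construction().

```python
import math
import random

def fit_mean(xs, ys):
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    # Closed-form least-squares line through the training data.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def rmse(predict, xs, ys):
    return math.sqrt(sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def select_best(candidates, train, valid):
    """Fit every candidate on train and rank by RMSE of the residuals on valid."""
    scores = {name: rmse(fit(*train), *valid) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

rng = random.Random(0)
xs = [rng.uniform(-3.14, 3.14) for _ in range(60)]
ys = [math.sin(x) for x in xs]          # assumed example response
train, valid = (xs[:40], ys[:40]), (xs[40:], ys[40:])
candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
best, scores = select_best(candidates, train, valid)
print(best, scores)
```

Scoring on points held out from training is what keeps the selection from simply rewarding the most flexible model.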

The available schemas and models are shown in Table 2.3.1 and Table 2.3.2.

Table 2.3.1 Metamodeling schemas

| Schema | Regressor types |
|---|---|
| general | ID1, ID2, ID3, ID4, ID5, ID6 |
| general_fast | ID1, ID2, ID3, ID4, ID5, ID6 |
| general_fast_nonpol | ID1, ID2, ID3, ID4, ID6 |
| linear | ID1, ID2, ID3 |
| gaussian | ID4 |
| spline | ID5 |
| polynomial | ID6 |
| svr | ID7 |
| neural | ID8 |

Table 2.3.2 ML models available

| ID | Type | ML model | Refs. |
|---|---|---|---|
| ID1.1 | linear regressors | LassoCV | [FabianPedregosaVG+11, Shr96] |
| ID1.2 | linear regressors | LassoLarsCV | [FabianPedregosaVG+11, ICG13] |
| ID1.3 | linear regressors | ElasticNetCV | [FabianPedregosaVG+11, Vov13] |
| ID1.4 | linear regressors | LassoLarsIC | [FabianPedregosaVG+11, ICG13] |
| ID1.5 | linear regressors | RidgeCV | [FabianPedregosaVG+11, Vov13] |
| ID1.6 | linear regressors | OrthogonalMatchingPursuitCV | [FabianPedregosaVG+11, SBA20] |
| ID2.1 | outlier-robust regressors | HuberRegressor | [FabianPedregosaVG+11, Owe07] |
| ID3.1 | bayesian regressors | ARDRegression | [FabianPedregosaVG+11, Mac94] |
| ID3.2 | bayesian regressors | BayesianRidge | [FabianPedregosaVG+11, Mac92] |
| ID4.1 | gaussian regressors | GaussianProcessRegressor | [FabianPedregosaVG+11, EG00] |
| ID5.1 | polynomial regressors | PolynomialLassoCV | [FabianPedregosaVG+11, LS81, MMAC16] |
| ID5.2 | polynomial regressors | PolynomialLassoLarsIC | [FabianPedregosaVG+11, LS81, MMAC16] |
| ID5.3 | polynomial regressors | PolynomialBayesianRidge | [FabianPedregosaVG+11, LS81, MMAC16] |
| ID6.1 | spline regressors | SplineLassoCV | [FabianPedregosaVG+11, MMAC16] |
| ID6.2 | spline regressors | SplineLassoLarsIC | [FabianPedregosaVG+11, MMAC16] |
| ID6.3 | spline regressors | SplineBayesianRidge | [FabianPedregosaVG+11, MMAC16] |
| ID7.1 | svm regressor | LinearSVR | [FabianPedregosaVG+11] |
| ID7.2 | svm regressor | SVR | [FabianPedregosaVG+11] |
| ID8.1 | neural regressor | MLPR | [FabianPedregosaVG+11] |


2.4. Optimization problem resolution

Modelling optimization problems always involve the following functions and variables [CK09]:

  • Objective function \(DOEY_{varY}\): A function, or series of functions, that covers all possible designs and returns a number indicating the goodness of the design. By convention, a small value of the objective function \(f\) is better than a large one (a minimization problem). \(f\) can be an indicator of weight, displacement, effective stress, stiffness, environmental impact, cost of production … In multi-objective optimization, the objective function is composed of the union of more than one sub-objective function.

  • Design variable \((DOEX)\): Variables, functions or vectors that describe the design and are changed during optimization. They may represent geometry or a choice of material. When representing geometry, a design variable can be a sophisticated spline shape or simply the thickness of a bar.

  • State constraints \(DOEY_{constrains}\): A series of functions or a vector that represents the response of the structure for a given design \(x\). For a mechanical structure, it can be displacement, stress, strain or force.

A general structural optimization \((SO)\) problem takes the following form (see Eq.2.4.1):

\begin{equation} (SO) \begin{cases} min(DOEY_{varY}) \text{ } \forall \text{DOEX,DOEY} \\ when \begin{cases} \text{bounds: design constraints or variables }(DOEX_{bounds}) \\ \text{constrains: state constraints }DOEY_{constrains} \\ \text{equilibrium constraints} \end{cases} \end{cases} \end{equation}

In the case of multi-objective optimization problems several objective functions are considered,

\begin{equation} min DOEY_{varY} = min (DOEY_{varY_1},DOEY_{varY_2}, \cdots ,DOEY_{varY_l}) \end{equation}

where \(l\) is the number of objective functions, and each objective function satisfies a given set of constraints.
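A single-objective instance of this problem form, with bounds on the design variables and one inequality state constraint, can be solved by brute force on a regular grid. The sketch below is a generic grid search written for illustration. The function `grid_minimize`, the quadratic objective and the constraint are assumptions, and equality and equilibrium constraints are left out.

```python
import itertools

def grid_minimize(objective, bounds, constraints=(), points_per_axis=41):
    """Evaluate the objective on a regular grid and keep the best feasible point."""
    axes = [[lo + (hi - lo) * k / (points_per_axis - 1)
             for k in range(points_per_axis)]
            for lo, hi in bounds]
    best_x, best_f = None, float("inf")
    for x in itertools.product(*axes):
        # Inequality constraints follow the g(x) >= 0 convention.
        if any(g(x) < 0.0 for g in constraints):
            continue
        f = objective(x)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

def objective(x):
    # Hypothetical objective: squared distance to the point (1, 1).
    return (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2

def constraint(x):
    # Feasible when x0 + x1 <= 1.
    return 1.0 - x[0] - x[1]

best_x, best_f = grid_minimize(objective, [(-2.0, 2.0), (-2.0, 2.0)], [constraint])
print(best_x, best_f)   # (0.5, 0.5) 0.5
```

The unconstrained minimum at (1, 1) is infeasible, so the search settles on the closest grid point of the constraint boundary.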

Pymetamodels can solve optimization problems adapted to each DOEX and DOEY group population. The response of the metamodels constructed with the routines of Section 2.3 allows optimization algorithms to be applied to minimize one of the DOEY variables of these metamodels. The optimization routines take into account the bound constraints on the DOEX variables (defined in the input vars case sheet of the configuration spreadsheet) and the equality and/or inequality constraints on the DOEY variables (defined in the output vars case sheet). Optimization routines are carried out with the function run_optimization_problem(). Depending on the optimization schema, several optimization methods are tried and the best result is chosen. The optimization results are saved in an external .optita file, which can later be used in secondary applications or in the plotting, robustness, calibration and optimization routines.
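Trying several methods and keeping the best result can be sketched directly with SciPy, whose `scipy.optimize.minimize` provides several of the local methods named in this section. The sketch assumes NumPy and SciPy are installed. The quadratic objective is a hypothetical stand-in for a metamodel prediction, and the loop is an illustration rather than the logic of run_optimization_problem().

```python
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # Hypothetical stand-in for a metamodel prediction of one DOEY variable.
    return (x[0] - 1.0) ** 2 + (x[1] + 0.5) ** 2

# DOEX bounds, playing the role of the min and max columns of the input vars sheet.
bounds = [(-3.14, 3.14), (-3.14, 3.14)]
x0 = np.zeros(2)
methods = ["Powell", "Nelder-Mead", "TNC", "SLSQP"]

# Run every method from the same start point and keep the lowest objective value.
results = {m: minimize(objective, x0, method=m, bounds=bounds) for m in methods}
best_method = min(results, key=lambda m: results[m].fun)
best = results[best_method]
print(best_method, best.x, best.fun)
```

Comparing methods by the final objective value alone is the simplest possible ranking; a fuller comparison would also check the success flag and any constraint violations.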

The available optimization schemas and models are shown in Table 2.4.1 and Table 2.4.2.

Table 2.4.1 Optimization schemas

| Schema | Optimization types |
|---|---|
| general | “iter_grid_method”, “shgo”, “shgo_slow”, “diff_evol”, “min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP” |
| general_fast | “iter_grid_method”, “shgo”, “diff_evol”, “min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP” |
| general_with_constrains | “iter_grid_method”, “COBYLA”, “SLSQP”, “shgo” |
| global | “iter_grid_method”, “shgo”, “shgo_slow”, “diff_evol” |
| minimize | “min_gen”, “Powell”, “Nelder-Mead”, “TNC”, “COBYLA”, “SLSQP” |
| grid_method | “grid_method” |
| iter_grid_method | “iter_grid_method” |

Table 2.4.2 Optimization models available

| ID | Type | Optimization method | Allow constrains | Refs. |
|---|---|---|---|---|
| OP1.1 | global | iter_grid_method | yes | NA |
| OP1.2 | global | grid_method | yes | NA |
| OP1.3 | global | shgo | yes | [ESF18, VGO+20] |
| OP1.4 | global | shgo_slow | yes | [ESF18, VGO+20] |
| OP1.5 | global | diff_evol | no | [VGO+20, SP97] |
| OP2.1 | local | min_gen | no | [VGO+20] |
| OP2.2 | local | Powell | no | [VGO+20, PBDS20, Pow64] |
| OP2.3 | local | Nelder-Mead | no | [VGO+20, GH12] |
| OP2.4 | local | TNC | no | [VGO+20] |
| OP2.5 | local | COBYLA | yes | [VGO+20, PBDS20] |
| OP2.6 | local | SLSQP | yes | [VGO+20, BGLS03] |