
Commit 8fc0857

Merge pull request #43 from lukapecnik/original_method_support
Original NiaAML method support [ci skip]
2 parents 2c88056 + 76428ce commit 8fc0857

8 files changed: 364 additions & 109 deletions

README.md

Lines changed: 11 additions & 4 deletions

@@ -96,7 +96,7 @@ For a full example see the [Examples section](#examples) or the list of implemen
 
 ## Optimization Process And Parameter Tuning
 
-In NiaAML there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute `_params`, which is a dictionary of parameters and their possible values.
+In the modified version of the NiaAML optimization process there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute `_params`, which is a dictionary of parameters and their possible values.
 
 ```python
 self._params = dict(
@@ -111,6 +111,8 @@ Let's say we have a classifier with 3 parameters, a feature selection algorithm
 
 In some cases we may want to tune a parameter that needs additional information for setting its range of values, so we cannot set the range in the initialization method. In that case, we should set its value in the dictionary to None and define it later in the process. The parameter will be a part of the parameter tuning process as soon as we define its possible values. For example, see [Select K Best Feature Selection](niaaml/preprocessing/feature_selection/select_k_best.py) and its parameter `k`.
 
+**The NiaAML framework also supports running optimization according to the original method proposed in [[1]](#1), where the component selection and hyperparameter optimization steps are combined into one.**
+
 ## Examples
 
 ### Example of Usage
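The deferred-definition pattern for parameters like `k` described in the hunk above can be sketched as follows; the classes below are illustrative toys, not the real `SelectKBest` component or NiaAML's `ParameterDefinition`.

```python
class ParameterDefinition:
    """Illustrative stand-in: holds the possible values of a tunable parameter."""
    def __init__(self, value):
        self.value = value

class SelectKBestSketch:
    def __init__(self):
        # k's range depends on the number of features, unknown at construction,
        # so it stays None and is excluded from tuning until defined
        self._params = dict(k=None)

    def set_feature_count(self, n_features):
        # once the dataset is known, k gets a concrete range and joins tuning
        self._params['k'] = ParameterDefinition(list(range(1, n_features + 1)))

selector = SelectKBestSketch()
assert selector._params['k'] is None  # not part of parameter tuning yet
selector.set_feature_count(5)
```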
@@ -135,13 +137,18 @@ pipeline_optimizer = PipelineOptimizer(
     feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
     feature_transform_algorithms=['Normalizer', 'StandardScaler']
 )
-pipeline = pipeline_optimizer.run('Accuracy', 20, 20, 400, 400, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+
+# run the modified version of optimization
+pipeline1 = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+
+# run the original version
+pipeline2 = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
 ```
 
 You can save a result of the optimization process as an object to a file for later use.
 
 ```python
-pipeline.export('pipeline.ppln')
+pipeline1.export('pipeline.ppln')
 ```
 
 And also load it from a file and use the pipeline.
@@ -157,7 +164,7 @@ y = loaded_pipeline.run(x)
 You can also save a user-friendly representation of a pipeline to a text file.
 
 ```python
-pipeline.export_text('pipeline.txt')
+pipeline1.export_text('pipeline.txt')
 ```
 
 This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.
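The two entry points shown in the README diff differ only in how the search is split. A dummy sketch of the calling conventions follows; it is not the real `PipelineOptimizer`, and the argument meanings (fitness name, population size(s), evaluation count(s), algorithm name(s)) are assumptions inferred from the examples above.

```python
class PipelineOptimizerSketch:
    """Toy stand-in illustrating the two optimization entry points."""

    def run(self, fitness_name, comp_pop_size, param_pop_size,
            comp_evals, param_evals, comp_algo, param_algo):
        # modified method: one search selects the components,
        # a second search tunes the selected components' hyperparameters
        return ('two-step', comp_evals + param_evals)

    def run_v1(self, fitness_name, pop_size, evals, algo):
        # original method from the paper: a single combined search over
        # component selection and hyperparameters
        return ('one-step', evals)

opt = PipelineOptimizerSketch()
modified = opt.run('Accuracy', 15, 15, 300, 300,
                   'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
original = opt.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
```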

docs/getting_started.rst

Lines changed: 7 additions & 1 deletion

@@ -31,7 +31,7 @@ Create a new file, with name, for example *my_first_pipeline.py* and paste in th
     feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
     feature_transform_algorithms=['Normalizer', 'StandardScaler']
 )
-pipeline = pipeline_optimizer.run('Accuracy', 20, 20, 400, 400, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+pipeline = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
 
 **As you can see, pipeline components, fitness function and optimization algorithms are always passed into pipeline optimization using their class names.** The example below uses the Particle Swarm Algorithm as the optimization algorithm. You can find a list of all available algorithms in `NiaPy's documentation <https://niapy.readthedocs.io/en/stable/>`_.
 Now you can run it using the command ``python my_first_pipeline.py``. The code currently does not do much, but we can save our pipeline to a file so we can use it later or save a user-friendly representation of it to a text file. You can choose one or both of the scenarios by adding the code below.
@@ -54,6 +54,12 @@ If you want to load and use the saved pipeline later, you can use the following
     x = pandas.DataFrame([[0.35, 0.46, 5.32], [0.16, 0.55, 12.5]])
     y = loaded_pipeline.run(x)
 
+**The framework also supports the original version of the optimization process, where the component selection and hyperparameter optimization steps are combined into one. You can replace the ``run`` method call with the following code.**
+
+.. code:: python
+
+    pipeline = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
+
 This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework. **NiaAML supports numerical and categorical features.**
 
 Find more examples `here <https://github.com/lukapecnik/NiaAML/tree/master/examples>`_
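Passing components by class name, as the bold note in the hunk above describes, typically implies a name-to-class registry on the framework side. A hedged sketch of that pattern follows, with illustrative names that are not NiaAML's internals.

```python
# illustrative stand-in components
class Normalizer:
    pass

class StandardScaler:
    pass

# hypothetical registry mapping user-supplied class names to classes
FEATURE_TRANSFORM_REGISTRY = {
    'Normalizer': Normalizer,
    'StandardScaler': StandardScaler,
}

def instantiate(name):
    # resolve a class name string to a fresh component instance
    try:
        return FEATURE_TRANSFORM_REGISTRY[name]()
    except KeyError:
        raise ValueError(f'unknown component: {name}')

transform = instantiate('Normalizer')
```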
Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
+import os
+from niaaml import PipelineOptimizer, Pipeline
+from niaaml.data import CSVDataReader
+
+"""
+This example presents how to use the PipelineOptimizer class to run the original optimization process according to the paper in which NiaAML was proposed.
+This example uses an instance of CSVDataReader.
+The instantiated PipelineOptimizer will try to assemble the best pipeline with the components that are specified in its constructor.
+"""
+
+# prepare data reader using csv file
+data_reader = CSVDataReader(src=os.path.dirname(os.path.abspath(__file__)) + '/example_files/dataset.csv', has_header=False, contains_classes=True)
+
+# instantiate PipelineOptimizer that chooses among specified classifiers, feature selection algorithms and feature transform algorithms
+# log is True by default, log_verbose means more information if True, log_output_file is the destination of a log file
+# if log_output_file is not provided, no file is created
+# if log is False, logging is turned off
+pipeline_optimizer = PipelineOptimizer(
+    data=data_reader,
+    classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron', 'RandomForest', 'ExtremelyRandomizedTrees', 'LinearSVC'],
+    feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
+    feature_transform_algorithms=['Normalizer', 'StandardScaler'],
+    log=True,
+    log_verbose=True,
+    log_output_file='output.log'
+)
+
+# runs the optimization process
+# one of the possible pipelines in this case is: SelectPercentile -> Normalizer -> RandomForest
+# returns the best found pipeline
+# the chosen fitness function and optimization algorithm are Accuracy and Particle Swarm Algorithm
+pipeline = pipeline_optimizer.run_v1('Accuracy', 10, 30, 'ParticleSwarmAlgorithm')
+
+# the pipeline variable contains a Pipeline object that can be used for further classification, exported as an object (that can later be loaded and used) or exported as a text file

niaaml/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -29,4 +29,4 @@
 ]
 
 __project__ = 'niaaml'
-__version__ = '1.1.0'
+__version__ = '1.1.1rc1'

niaaml/pipeline.py

Lines changed: 73 additions & 52 deletions

@@ -13,7 +13,8 @@
 import os
 
 __all__ = [
-    'Pipeline'
+    'Pipeline',
+    '_PipelineBenchmark'
 ]
 
 class Pipeline:
@@ -355,6 +356,74 @@ def __init__(self, x, y, parent, population_size, fitness_function):
         self.__evals = 0
         self.__logger = self.__parent.get_logger()
         Benchmark.__init__(self, 0.0, 1.0)
+
+    @staticmethod
+    def evaluate_pipeline(solution_vector, feature_selection_algorithm, feature_transform_algorithm, classifier, x, y, fitness_function):
+        r"""Evaluate a pipeline setup.
+
+        Arguments:
+            solution_vector (numpy.ndarray[float]): Individual of the population/possible solution to map hyperparameters from.
+            feature_selection_algorithm (Optional[FeatureSelectionAlgorithm]): Feature selection algorithm instance.
+            feature_transform_algorithm (Optional[FeatureTransformAlgorithm]): Feature transform algorithm instance.
+            classifier (Classifier): Classifier instance.
+            x (pandas.core.frame.DataFrame): n samples to classify.
+            y (pandas.core.series.Series): n classes of the samples in the x array.
+            fitness_function (FitnessFunction): Fitness function instance.
+
+        Returns:
+            Tuple[float, numpy.array[bool], OptimizationStats]:
+                1. Fitness.
+                2. Feature selection mask.
+                3. Optimization statistics.
+        """
+        feature_selection_algorithm_params = feature_selection_algorithm.get_params_dict() if feature_selection_algorithm else dict()
+        feature_transform_algorithm_params = feature_transform_algorithm.get_params_dict() if feature_transform_algorithm else dict()
+        classifier_params = classifier.get_params_dict()
+
+        params_all = [
+            (feature_selection_algorithm_params, feature_selection_algorithm),
+            (feature_transform_algorithm_params, feature_transform_algorithm),
+            (classifier_params, classifier)
+        ]
+        solution_index = 0
+
+        # map each solution-vector entry to a concrete parameter value
+        for i in params_all:
+            args = dict()
+            for key in i[0]:
+                if i[0][key] is not None:
+                    if isinstance(i[0][key].value, MinMax):
+                        val = solution_vector[solution_index] * i[0][key].value.max + i[0][key].value.min
+                        if i[0][key].param_type is np.intc or i[0][key].param_type is np.int or i[0][key].param_type is np.uintc or i[0][key].param_type is np.uint:
+                            val = i[0][key].param_type(np.floor(val))
+                            if val >= i[0][key].value.max:
+                                val = i[0][key].value.max - 1
+                        args[key] = val
+                    else:
+                        args[key] = i[0][key].value[get_bin_index(solution_vector[solution_index], len(i[0][key].value))]
+                solution_index += 1
+            if i[1] is not None:
+                i[1].set_parameters(**args)
+
+        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
+
+        if feature_selection_algorithm is None:
+            selected_features_mask = np.ones(x.shape[1], dtype=bool)
+        else:
+            selected_features_mask = feature_selection_algorithm.select_features(x_train, y_train)
+
+        x_train = x_train.loc[:, selected_features_mask]
+        x_test = x_test.loc[:, selected_features_mask]
+
+        if feature_transform_algorithm is not None:
+            feature_transform_algorithm.fit(x_train)
+            x_train = feature_transform_algorithm.transform(x_train)
+            x_test = feature_transform_algorithm.transform(x_test)
+
+        classifier.fit(x_train, y_train)
+        predictions = classifier.predict(x_test)
+        return fitness_function.get_fitness(predictions, y_test) * -1, selected_features_mask, OptimizationStats(predictions, y_test)
 
     def function(self):
         r"""Override Benchmark function.
@@ -380,64 +449,16 @@ def evaluate(D, sol):
                 feature_selection_algorithm = self.__parent.get_feature_selection_algorithm()
                 feature_transform_algorithm = self.__parent.get_feature_transform_algorithm()
                 classifier = self.__parent.get_classifier()
-                selected_features_mask = None
-
-                feature_selection_algorithm_params = feature_selection_algorithm.get_params_dict() if feature_selection_algorithm else dict()
-                feature_transform_algorithm_params = feature_transform_algorithm.get_params_dict() if feature_transform_algorithm else dict()
-                classifier_params = classifier.get_params_dict()
-
-                params_all = [
-                    (feature_selection_algorithm_params, feature_selection_algorithm),
-                    (feature_transform_algorithm_params, feature_transform_algorithm),
-                    (classifier_params, classifier)
-                ]
-                solution_index = 0
-
-                for i in params_all:
-                    args = dict()
-                    for key in i[0]:
-                        if i[0][key] is not None:
-                            if isinstance(i[0][key].value, MinMax):
-                                val = sol[solution_index] * i[0][key].value.max + i[0][key].value.min
-                                if i[0][key].param_type is np.intc or i[0][key].param_type is np.int or i[0][key].param_type is np.uintc or i[0][key].param_type is np.uint:
-                                    val = i[0][key].param_type(np.floor(val))
-                                    if val >= i[0][key].value.max:
-                                        val = i[0][key].value.max - 1
-                                args[key] = val
-                            else:
-                                args[key] = i[0][key].value[get_bin_index(sol[solution_index], len(i[0][key].value))]
-                        solution_index += 1
-                    if i[1] is not None:
-                        i[1].set_parameters(**args)
-
-                selected_features_mask = None
-
-                x_train, x_test, y_train, y_test = train_test_split(self.__x, self.__y, test_size=0.2)
-
-                if feature_selection_algorithm is None:
-                    selected_features_mask = np.ones(x.shape[1], dtype=bool)
-                else:
-                    selected_features_mask = feature_selection_algorithm.select_features(x_train, y_train)
-
-                x_train = x_train.loc[:, selected_features_mask]
-                x_test = x_test.loc[:, selected_features_mask]
-
-                if feature_transform_algorithm is not None:
-                    feature_transform_algorithm.fit(x_train)
-                    x_train = feature_transform_algorithm.transform(x_train)
-                    x_test = feature_transform_algorithm.transform(x_test)
-
-                classifier.fit(x_train, y_train)
-                predictions = classifier.predict(x_test)
-                fitness = self.__fitness_function.get_fitness(predictions, y_test) * -1
+
+                fitness, selected_features_mask, stats = _PipelineBenchmark.evaluate_pipeline(sol, feature_selection_algorithm, feature_transform_algorithm, classifier, self.__x, self.__y, self.__fitness_function)
 
                 if fitness < self.__current_best_fitness:
                     self.__current_best_fitness = fitness
                     self.__parent.set_feature_selection_algorithm(feature_selection_algorithm)
                     self.__parent.set_feature_transform_algorithm(feature_transform_algorithm)
                     self.__parent.set_classifier(classifier)
                     self.__parent.set_selected_features_mask(selected_features_mask)
-                    self.__parent.set_stats(OptimizationStats(predictions, y_test))
+                    self.__parent.set_stats(stats)
 
                 return fitness
             except:
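The parameter-mapping loop refactored into `evaluate_pipeline` above is a genotype-to-phenotype decoding: each solution-vector entry in [0, 1] becomes either a scaled numeric value or an index into a list of discrete choices, and the resulting fitness is negated so a minimizing optimizer maximizes the metric. A self-contained sketch of that decoding follows; `get_bin_index` is reimplemented here as an assumption about the helper's behavior, not the library's code.

```python
import numpy as np

def get_bin_index(value, number_of_bins):
    # assumption: map a float in [0, 1] to one of n equal-width bins
    index = int(value * number_of_bins)
    return min(index, number_of_bins - 1)

def decode_numeric(gene, min_val, max_val, as_int=False):
    # mirrors the diff: scale the gene by the range maximum, offset by the
    # minimum, and clamp integer values to stay below the maximum
    val = gene * max_val + min_val
    if as_int:
        val = int(np.floor(val))
        if val >= max_val:
            val = max_val - 1
    return val

# a gene of 0.5 over two choices ['SAMME', 'SAMME.R'] selects index 1
assert get_bin_index(0.5, 2) == 1
# an integer gene near 1.0 is clamped just below the range maximum
assert decode_numeric(0.999, 0, 100, as_int=True) == 99
# fitness is negated (metric * -1) so "lower is better" for the optimizer
assert 0.9 * -1 < 0.7 * -1
```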
