
Commit 8fc0857

Merge pull request #43 from lukapecnik/original_method_support
Original NiaAML method support [ci skip]
2 parents 2c88056 + 76428ce commit 8fc0857

8 files changed: 364 additions & 109 deletions

README.md

Lines changed: 11 additions & 4 deletions

@@ -96,7 +96,7 @@ For a full example see the [Examples section](#examples) or the list of implemen
 
 ## Optimization Process And Parameter Tuning
 
-In NiaAML there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute `_params`, which is a dictionary of parameters and their possible values.
+In the modified version of the NiaAML optimization process there are two types of optimization. The goal of the first type is to find an optimal set of components (feature selection algorithm, feature transformation algorithm and classifier). The next step is to find optimal parameters for the selected set of components, and that is the goal of the second type of optimization. Each component has an attribute `_params`, which is a dictionary of parameters and their possible values.
 
 ```python
 self._params = dict(
@@ -111,6 +111,8 @@ Let's say we have a classifier with 3 parameters, a feature selection algorithm
 
 In some cases we may want to tune a parameter that needs additional information for setting its range of values, so we cannot set the range in the initialization method. In that case, we should set its value in the dictionary to None and define it later in the process. The parameter will be a part of the parameter tuning process as soon as we define its possible values. For example, see [Select K Best Feature Selection](niaaml/preprocessing/feature_selection/select_k_best.py) and its parameter `k`.
 
+**The NiaAML framework also supports running optimization according to the original method proposed in [[1]](#1), where the component selection and hyperparameter optimization steps are combined into one.**
+
 ## Examples
 
 ### Example of Usage
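The deferred-definition pattern for parameters like `k` described in the hunk above can be sketched as follows; the classes below are illustrative toys, not the real `SelectKBest` component or NiaAML's `ParameterDefinition`.

```python
class ParameterDefinition:
    """Illustrative stand-in: holds the possible values of a tunable parameter."""
    def __init__(self, value):
        self.value = value

class SelectKBestSketch:
    def __init__(self):
        # k's range depends on the number of features, unknown at construction,
        # so it stays None and is excluded from tuning until defined
        self._params = dict(k=None)

    def set_feature_count(self, n_features):
        # once the dataset is known, k gets a concrete range and joins tuning
        self._params['k'] = ParameterDefinition(list(range(1, n_features + 1)))

selector = SelectKBestSketch()
assert selector._params['k'] is None  # not part of parameter tuning yet
selector.set_feature_count(5)
```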
@@ -135,13 +137,18 @@ pipeline_optimizer = PipelineOptimizer(
     feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
     feature_transform_algorithms=['Normalizer', 'StandardScaler']
 )
-pipeline = pipeline_optimizer.run('Accuracy', 20, 20, 400, 400, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+
+# run the modified version of optimization
+pipeline1 = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+
+# run the original version
+pipeline2 = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
 ```
 
 You can save a result of the optimization process as an object to a file for later use.
 
 ```python
-pipeline.export('pipeline.ppln')
+pipeline1.export('pipeline.ppln')
 ```
 
 And also load it from a file and use the pipeline.
@@ -157,7 +164,7 @@ y = loaded_pipeline.run(x)
 You can also save a user-friendly representation of a pipeline to a text file.
 
 ```python
-pipeline.export_text('pipeline.txt')
+pipeline1.export_text('pipeline.txt')
 ```
 
 This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework.
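The two entry points shown in the README diff differ only in how the search is split. A dummy sketch of the calling conventions follows; it is not the real `PipelineOptimizer`, and the argument meanings (fitness name, population size(s), evaluation count(s), algorithm name(s)) are assumptions inferred from the examples above.

```python
class PipelineOptimizerSketch:
    """Toy stand-in illustrating the two optimization entry points."""

    def run(self, fitness_name, comp_pop_size, param_pop_size,
            comp_evals, param_evals, comp_algo, param_algo):
        # modified method: one search selects the components,
        # a second search tunes the selected components' hyperparameters
        return ('two-step', comp_evals + param_evals)

    def run_v1(self, fitness_name, pop_size, evals, algo):
        # original method from the paper: a single combined search over
        # component selection and hyperparameters
        return ('one-step', evals)

opt = PipelineOptimizerSketch()
modified = opt.run('Accuracy', 15, 15, 300, 300,
                   'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
original = opt.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
```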

docs/getting_started.rst

Lines changed: 7 additions & 1 deletion

@@ -31,7 +31,7 @@ Create a new file, with name, for example *my_first_pipeline.py* and paste in th
     feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
     feature_transform_algorithms=['Normalizer', 'StandardScaler']
 )
-pipeline = pipeline_optimizer.run('Accuracy', 20, 20, 400, 400, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
+pipeline = pipeline_optimizer.run('Accuracy', 15, 15, 300, 300, 'ParticleSwarmAlgorithm', 'ParticleSwarmAlgorithm')
 
 **As you can see, pipeline components, fitness function and optimization algorithms are always passed into pipeline optimization using their class names.** The example below uses the Particle Swarm Algorithm as the optimization algorithm. You can find a list of all available algorithms in `NiaPy's documentation <https://niapy.readthedocs.io/en/stable/>`_.
 Now you can run it using the command ``python my_first_pipeline.py``. The code currently does not do much, but we can save our pipeline to a file so we can use it later or save a user-friendly representation of it to a text file. You can choose one or both of the scenarios by adding the code below.
@@ -54,6 +54,12 @@ If you want to load and use the saved pipeline later, you can use the following
     x = pandas.DataFrame([[0.35, 0.46, 5.32], [0.16, 0.55, 12.5]])
     y = loaded_pipeline.run(x)
 
+**The framework also supports the original version of the optimization process, where the component selection and hyperparameter optimization steps are combined into one. You can replace the ``run`` method call with the following code.**
+
+.. code:: python
+
+    pipeline = pipeline_optimizer.run_v1('Accuracy', 15, 400, 'ParticleSwarmAlgorithm')
+
 This is a very simple example with dummy data. It is only intended to give you a basic idea of how to use the framework. **NiaAML supports numerical and categorical features.**
 
 Find more examples `here <https://github.com/lukapecnik/NiaAML/tree/master/examples>`_
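Passing components by class name, as the bold note in the hunk above describes, typically implies a name-to-class registry on the framework side. A hedged sketch of that pattern follows, with illustrative names that are not NiaAML's internals.

```python
# illustrative stand-in components
class Normalizer:
    pass

class StandardScaler:
    pass

# hypothetical registry mapping user-supplied class names to classes
FEATURE_TRANSFORM_REGISTRY = {
    'Normalizer': Normalizer,
    'StandardScaler': StandardScaler,
}

def instantiate(name):
    # resolve a class name string to a fresh component instance
    try:
        return FEATURE_TRANSFORM_REGISTRY[name]()
    except KeyError:
        raise ValueError(f'unknown component: {name}')

transform = instantiate('Normalizer')
```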
Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
+import os
+from niaaml import PipelineOptimizer, Pipeline
+from niaaml.data import CSVDataReader
+
+"""
+This example presents how to use the PipelineOptimizer class to run the original optimization process according to the paper in which NiaAML was proposed.
+This example uses an instance of CSVDataReader.
+The instantiated PipelineOptimizer will try to assemble the best pipeline with the components that are specified in its constructor.
+"""
+
+# prepare data reader using csv file
+data_reader = CSVDataReader(src=os.path.dirname(os.path.abspath(__file__)) + '/example_files/dataset.csv', has_header=False, contains_classes=True)
+
+# instantiate PipelineOptimizer that chooses among specified classifiers, feature selection algorithms and feature transform algorithms
+# log is True by default, log_verbose means more information if True, log_output_file is the destination of a log file
+# if log_output_file is not provided, no file is created
+# if log is False, logging is turned off
+pipeline_optimizer = PipelineOptimizer(
+    data=data_reader,
+    classifiers=['AdaBoost', 'Bagging', 'MultiLayerPerceptron', 'RandomForest', 'ExtremelyRandomizedTrees', 'LinearSVC'],
+    feature_selection_algorithms=['SelectKBest', 'SelectPercentile', 'ParticleSwarmOptimization', 'VarianceThreshold'],
+    feature_transform_algorithms=['Normalizer', 'StandardScaler'],
+    log=True,
+    log_verbose=True,
+    log_output_file='output.log'
+)
+
+# runs the optimization process
+# one of the possible pipelines in this case is: SelectPercentile -> Normalizer -> RandomForest
+# returns the best found pipeline
+# the chosen fitness function and optimization algorithm are Accuracy and Particle Swarm Algorithm
+pipeline = pipeline_optimizer.run_v1('Accuracy', 10, 30, 'ParticleSwarmAlgorithm')
+
+# the pipeline variable contains a Pipeline object that can be used for further classification, exported as an object (that can later be loaded and used) or exported as a text file

niaaml/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -29,4 +29,4 @@
 ]
 
 __project__ = 'niaaml'
-__version__ = '1.1.0'
+__version__ = '1.1.1rc1'

niaaml/pipeline.py

Lines changed: 73 additions & 52 deletions

@@ -13,7 +13,8 @@
 import os
 
 __all__ = [
-    'Pipeline'
+    'Pipeline',
+    '_PipelineBenchmark'
 ]
 
 class Pipeline:
@@ -355,6 +356,74 @@ def __init__(self, x, y, parent, population_size, fitness_function):
         self.__evals = 0
         self.__logger = self.__parent.get_logger()
         Benchmark.__init__(self, 0.0, 1.0)
+
+    @staticmethod
+    def evaluate_pipeline(solution_vector, feature_selection_algorithm, feature_transform_algorithm, classifier, x, y, fitness_function):
+        r"""Evaluate a pipeline setup.
+
+        Arguments:
+            solution_vector (numpy.ndarray[float]): Individual of the population/possible solution to map hyperparameters from.
+            feature_selection_algorithm (Optional[FeatureSelectionAlgorithm]): Feature selection algorithm instance.
+            feature_transform_algorithm (Optional[FeatureTransformAlgorithm]): Feature transform algorithm instance.
+            classifier (Classifier): Classifier instance.
+            x (pandas.core.frame.DataFrame): n samples to classify.
+            y (pandas.core.series.Series): n classes of the samples in the x array.
+            fitness_function (FitnessFunction): Fitness function instance.
+
+        Returns:
+            Tuple[float, numpy.array[bool], OptimizationStats]:
+                1. Fitness.
+                2. Feature selection mask.
+                3. Optimization statistics.
+        """
+        feature_selection_algorithm_params = feature_selection_algorithm.get_params_dict() if feature_selection_algorithm else dict()
+        feature_transform_algorithm_params = feature_transform_algorithm.get_params_dict() if feature_transform_algorithm else dict()
+        classifier_params = classifier.get_params_dict()
+
+        params_all = [
+            (feature_selection_algorithm_params, feature_selection_algorithm),
+            (feature_transform_algorithm_params, feature_transform_algorithm),
+            (classifier_params, classifier)
+        ]
+        solution_index = 0
+
+        # map each solution-vector entry to a concrete parameter value
+        for i in params_all:
+            args = dict()
+            for key in i[0]:
+                if i[0][key] is not None:
+                    if isinstance(i[0][key].value, MinMax):
+                        val = solution_vector[solution_index] * i[0][key].value.max + i[0][key].value.min
+                        if i[0][key].param_type is np.intc or i[0][key].param_type is np.int or i[0][key].param_type is np.uintc or i[0][key].param_type is np.uint:
+                            val = i[0][key].param_type(np.floor(val))
+                            if val >= i[0][key].value.max:
+                                val = i[0][key].value.max - 1
+                        args[key] = val
+                    else:
+                        args[key] = i[0][key].value[get_bin_index(solution_vector[solution_index], len(i[0][key].value))]
+                solution_index += 1
+            if i[1] is not None:
+                i[1].set_parameters(**args)
+
+        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
+
+        if feature_selection_algorithm is None:
+            selected_features_mask = np.ones(x.shape[1], dtype=bool)
+        else:
+            selected_features_mask = feature_selection_algorithm.select_features(x_train, y_train)
+
+        x_train = x_train.loc[:, selected_features_mask]
+        x_test = x_test.loc[:, selected_features_mask]
+
+        if feature_transform_algorithm is not None:
+            feature_transform_algorithm.fit(x_train)
+            x_train = feature_transform_algorithm.transform(x_train)
+            x_test = feature_transform_algorithm.transform(x_test)
+
+        classifier.fit(x_train, y_train)
+        predictions = classifier.predict(x_test)
+        return fitness_function.get_fitness(predictions, y_test) * -1, selected_features_mask, OptimizationStats(predictions, y_test)
 
     def function(self):
         r"""Override Benchmark function.
@@ -380,64 +449,16 @@ def evaluate(D, sol):
                 feature_selection_algorithm = self.__parent.get_feature_selection_algorithm()
                 feature_transform_algorithm = self.__parent.get_feature_transform_algorithm()
                 classifier = self.__parent.get_classifier()
-                selected_features_mask = None
-
-                feature_selection_algorithm_params = feature_selection_algorithm.get_params_dict() if feature_selection_algorithm else dict()
-                feature_transform_algorithm_params = feature_transform_algorithm.get_params_dict() if feature_transform_algorithm else dict()
-                classifier_params = classifier.get_params_dict()
-
-                params_all = [
-                    (feature_selection_algorithm_params, feature_selection_algorithm),
-                    (feature_transform_algorithm_params, feature_transform_algorithm),
-                    (classifier_params, classifier)
-                ]
-                solution_index = 0
-
-                for i in params_all:
-                    args = dict()
-                    for key in i[0]:
-                        if i[0][key] is not None:
-                            if isinstance(i[0][key].value, MinMax):
-                                val = sol[solution_index] * i[0][key].value.max + i[0][key].value.min
-                                if i[0][key].param_type is np.intc or i[0][key].param_type is np.int or i[0][key].param_type is np.uintc or i[0][key].param_type is np.uint:
-                                    val = i[0][key].param_type(np.floor(val))
-                                    if val >= i[0][key].value.max:
-                                        val = i[0][key].value.max - 1
-                                args[key] = val
-                            else:
-                                args[key] = i[0][key].value[get_bin_index(sol[solution_index], len(i[0][key].value))]
-                        solution_index += 1
-                    if i[1] is not None:
-                        i[1].set_parameters(**args)
-
-                selected_features_mask = None
-
-                x_train, x_test, y_train, y_test = train_test_split(self.__x, self.__y, test_size=0.2)
-
-                if feature_selection_algorithm is None:
-                    selected_features_mask = np.ones(x.shape[1], dtype=bool)
-                else:
-                    selected_features_mask = feature_selection_algorithm.select_features(x_train, y_train)
-
-                x_train = x_train.loc[:, selected_features_mask]
-                x_test = x_test.loc[:, selected_features_mask]
-
-                if feature_transform_algorithm is not None:
-                    feature_transform_algorithm.fit(x_train)
-                    x_train = feature_transform_algorithm.transform(x_train)
-                    x_test = feature_transform_algorithm.transform(x_test)
-
-                classifier.fit(x_train, y_train)
-                predictions = classifier.predict(x_test)
-                fitness = self.__fitness_function.get_fitness(predictions, y_test) * -1
+
+                fitness, selected_features_mask, stats = _PipelineBenchmark.evaluate_pipeline(sol, feature_selection_algorithm, feature_transform_algorithm, classifier, self.__x, self.__y, self.__fitness_function)
 
                 if fitness < self.__current_best_fitness:
                     self.__current_best_fitness = fitness
                     self.__parent.set_feature_selection_algorithm(feature_selection_algorithm)
                     self.__parent.set_feature_transform_algorithm(feature_transform_algorithm)
                     self.__parent.set_classifier(classifier)
                     self.__parent.set_selected_features_mask(selected_features_mask)
-                    self.__parent.set_stats(OptimizationStats(predictions, y_test))
+                    self.__parent.set_stats(stats)
 
                 return fitness
             except:
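The parameter-mapping loop refactored into `evaluate_pipeline` above is a genotype-to-phenotype decoding: each solution-vector entry in [0, 1] becomes either a scaled numeric value or an index into a list of discrete choices, and the resulting fitness is negated so a minimizing optimizer maximizes the metric. A self-contained sketch of that decoding follows; `get_bin_index` is reimplemented here as an assumption about the helper's behavior, not the library's code.

```python
import numpy as np

def get_bin_index(value, number_of_bins):
    # assumption: map a float in [0, 1] to one of n equal-width bins
    index = int(value * number_of_bins)
    return min(index, number_of_bins - 1)

def decode_numeric(gene, min_val, max_val, as_int=False):
    # mirrors the diff: scale the gene by the range maximum, offset by the
    # minimum, and clamp integer values to stay below the maximum
    val = gene * max_val + min_val
    if as_int:
        val = int(np.floor(val))
        if val >= max_val:
            val = max_val - 1
    return val

# a gene of 0.5 over two choices ['SAMME', 'SAMME.R'] selects index 1
assert get_bin_index(0.5, 2) == 1
# an integer gene near 1.0 is clamped just below the range maximum
assert decode_numeric(0.999, 0, 100, as_int=True) == 99
# fitness is negated (metric * -1) so "lower is better" for the optimizer
assert 0.9 * -1 < 0.7 * -1
```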
