Instructions for project development

!wget --no-cache -O init.py -q https://raw.githubusercontent.com/jdariasl/OTBD/main/content/init.py
import init; init.init(force_download=False)

from local.lib.project import project_sel
import scipy as sc
import scipy.stats  # make sure the stats submodule is loaded for sc.stats.zscore


Set the Name variable in the cell below to your full name and run the cell. You will be assigned a specific dataset and a model to train.
You must choose three algorithms to solve the problem, one from each of the following items:
- Accelerated gradient/subgradient descent; proximal gradient descent
- Conjugate gradient descent; BFGS; block coordinate descent assuming at least 2 nodes
- Distributed ADMM assuming at least 2 nodes. To solve for the \(\mathbf{x}\) term (see slides 83 and 84 of Ch. 7) in every ADMM iteration, you must compare a) an exact (when possible) or gradient-based solution and b) the same algorithm chosen in the previous item. You must set the number of inner-loop iterations for the gradient-based solutions (GD, CGD, BCD).

Note that not all the algorithms are suitable for all the problems, so the selection must be done carefully.

#Set name variable with your full name.
Name = 'Julián David Arias Londoño';
X,Y = project_sel(Name)

Use the classification dataset assigned to you to train an L1-regularized logistic regression model.
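As a starting point, the objective and one of the candidate algorithms from the first item (proximal gradient descent) can be sketched as follows. This is only an illustration under stated assumptions, not the reference solution: labels are assumed to be coded as 0/1, the loss is averaged over samples, and the names `X_demo`, `y_demo`, `w_true` and the value `lam=0.05` are made up for the synthetic demo. The data returned by `project_sel` may use a different label coding.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sigmoid(z):
    """Numerically stable logistic function."""
    return np.exp(-np.logaddexp(0.0, -z))

def logistic_loss(w, X, y):
    """Mean logistic loss, assuming labels y in {0, 1}."""
    z = X @ w
    return np.mean(np.logaddexp(0.0, z) - y * z)

def logistic_grad(w, X, y):
    """Gradient of the mean logistic loss."""
    return X.T @ (sigmoid(X @ w) - y) / X.shape[0]

def proximal_gradient(X, y, lam, n_iter=1000):
    """Proximal gradient descent (ISTA) for logistic loss + lam * ||w||_1."""
    n, d = X.shape
    # Step 1/L, where L = ||X||_2^2 / (4 n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    w = np.zeros(d)
    history = []
    for _ in range(n_iter):
        # Gradient step on the smooth part, then shrinkage for the L1 term.
        w = soft_threshold(w - step * logistic_grad(w, X, y), step * lam)
        history.append(logistic_loss(w, X, y) + lam * np.sum(np.abs(w)))
    return w, np.array(history)

# Synthetic demo data (hypothetical, only to exercise the code).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 8))
w_true = np.array([2.0, -3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y_demo = (X_demo @ w_true + 0.5 * rng.standard_normal(200) > 0).astype(float)

w_hat, history = proximal_gradient(X_demo, y_demo, lam=0.05)
print("final objective:", history[-1])
print("non-zero coefficients:", np.flatnonzero(w_hat))
```

With a step size of 1/L the objective stored in `history` should not increase from one iteration to the next, which makes it a convenient sanity check when plotting convergence curves.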

The global number of iterations for each algorithm is 1000.

Use an initial value of 10 for the regularization parameter and 10 for rho, the weight of the quadratic penalty term in ADMM.
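To make the roles of the regularization parameter and rho concrete, here is a minimal sketch of the standard scaled-form global-consensus ADMM with 2 nodes. It is an assumption-laden illustration and may differ from the formulation in the course slides: each node holds a row block of the data, the local loss is the sum (not the mean) of logistic losses with 0/1 labels, the x-update is approximated with a fixed number of plain gradient steps (`inner_iter`, a hypothetical parameter name), and the demo data are synthetic.

```python
import numpy as np

def sigmoid(z):
    """Numerically stable logistic function."""
    return np.exp(-np.logaddexp(0.0, -z))

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def consensus_admm(X_parts, y_parts, lam=10.0, rho=10.0, n_iter=1000, inner_iter=10):
    """Consensus ADMM: min sum_i f_i(x_i) + lam * ||z||_1  s.t.  x_i = z."""
    n_nodes = len(X_parts)
    d = X_parts[0].shape[1]
    x = np.zeros((n_nodes, d))   # local variables, one row per node
    u = np.zeros((n_nodes, d))   # scaled dual variables
    z = np.zeros(d)              # global consensus variable
    # Step 1/L_i for the local subproblem: logistic curvature bound plus rho.
    steps = [1.0 / (np.linalg.norm(Xi, 2) ** 2 / 4.0 + rho) for Xi in X_parts]
    for k in range(n_iter):
        # x-update: approximate minimisation with a few gradient steps per node.
        for i, (Xi, yi) in enumerate(zip(X_parts, y_parts)):
            for t in range(inner_iter):
                grad = Xi.T @ (sigmoid(Xi @ x[i]) - yi) + rho * (x[i] - z + u[i])
                x[i] = x[i] - steps[i] * grad
        # z-update: shrink the average of x_i + u_i.
        z = soft_threshold(np.mean(x + u, axis=0), lam / (rho * n_nodes))
        # dual update: accumulate the consensus residual.
        u += x - z
    return z, x

# Synthetic demo data split across 2 nodes (hypothetical).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((200, 8))
w_true = np.array([2.0, -3.0, 0.0, 0.0, 1.5, 0.0, 0.0, 0.0])
y_demo = (X_demo @ w_true + 0.5 * rng.standard_normal(200) > 0).astype(float)
X_parts = [X_demo[:100], X_demo[100:]]
y_parts = [y_demo[:100], y_demo[100:]]

z_hat, x_hat = consensus_admm(X_parts, y_parts, lam=10.0, rho=10.0)
print("consensus solution:", np.round(z_hat, 3))
print("max |x_i - z|:", np.max(np.abs(x_hat - z_hat)))
```

The inner gradient loop is the piece to swap out when comparing against the algorithm chosen in the second item; the z-update and dual update stay the same.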

Note that the values given for the number of iterations, the regularization parameter and rho are only indicative; you must find a set of hyperparameters that solves your problem adequately. You can also use Bayesian optimization to find the best hyperparameters.

The student must provide this python notebook adding the following information:

- A description of the problem and the dataset used.
- A description of each algorithm chosen.
- The commented code.
- The results obtained and a discussion of them, including the advantages and disadvantages of each algorithm used. Use as many graphs and block diagrams as needed.

A fully executable Python notebook, together with its PDF version, must be uploaded to Moodle before the final exam.

The evaluation will be done according to the following criteria:

- Project accomplishment (40%).
- Correctness of the results (15%).
- Quality of the results discussions (15%).
- Correctness of the code (25%).
- Quality of the comments (5%).

n,d = X.shape

#data normalisation
X = sc.stats.zscore(X)

print(f"Number of samples {n}; number of features {d}")

Number of samples 500; number of features 8
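For reference, the normalisation step above can be checked by hand. The sketch below uses a synthetic array (`X_demo` is a made-up stand-in for the project data) to show that `scipy.stats.zscore`, with its default `axis=0` and `ddof=0`, standardises each feature column to zero mean and unit population standard deviation.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the project data (hypothetical values).
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=5.0, scale=3.0, size=(500, 8))

# zscore works column-wise by default (axis=0) and uses ddof=0.
X_std = stats.zscore(X_demo)

# Equivalent manual computation with NumPy.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print("matches manual computation:", np.allclose(X_std, manual))
print("column means close to 0:", np.allclose(X_std.mean(axis=0), 0.0))
print("column stds close to 1:", np.allclose(X_std.std(axis=0), 1.0))
```

Note that a constant feature column has zero standard deviation, so it comes out as NaN after this transformation; it is worth checking for such columns before training.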