02-05-2021



  1. Atom Git Integration
  2. Atom Git

Installing Atom on Windows. Atom is available with Windows installers that can be downloaded from the Atom website or from the Atom releases page. Use AtomSetup.exe for 32-bit systems and AtomSetup-x64.exe for 64-bit systems. This setup program will install Atom, add the atom and apm commands to your PATH, and create shortcuts on the desktop and in the start menu.

Git Plus allows you to work with Git without leaving the Atom editor. Within the editor you can commit, checkout, push/pull, diff and run other git commands. You need to set up user.name and user.email in your git config file to make all functions work.

Work with Git and GitHub directly from Atom with the GitHub package. Create new branches, stage and commit, push and pull, resolve merge conflicts, view pull requests and more, all from within your editor.


A Python package for fast exploration of machine learning pipelines


During the exploration phase of a machine learning project, a data scientist tries to find the optimal pipeline for his specific use case. This usually involves applying standard data cleaning steps, creating or selecting useful features, trying out different models, etc. Testing multiple pipelines requires many lines of code, and writing it all in the same notebook often makes it long and cluttered. On the other hand, using multiple notebooks makes it harder to compare the results and to keep an overview. On top of that, refactoring the code for every test can be quite time-consuming. How many times have you conducted the same action to pre-process a raw dataset? How many times have you copy-and-pasted code from an old repository to re-use it in a new use case?

ATOM is here to help solve these common issues. The package acts as a wrapper of the whole machine learning pipeline, helping the data scientist to rapidly find a good model for his problem. Avoid endless imports and documentation lookups. Avoid rewriting the same code over and over again. With just a few lines of code, it's now possible to perform basic data cleaning steps, select relevant features and compare the performance of multiple models on a given dataset, providing quick insights on which pipeline performs best for the task at hand.

Example steps taken by ATOM's pipeline:

  1. Data Cleaning
    • Handle missing values
    • Encode categorical features
    • Detect and remove outliers
    • Balance the training set
  2. Feature engineering
    • Create new non-linear features
    • Remove multi-collinear features
    • Remove features with too low variance
    • Select the most promising features
  3. Train and validate multiple models
    • Select hyperparameters using a Bayesian Optimization approach
    • Train and test the models on the provided data
    • Assess the robustness of the output using a bagging algorithm
  4. Analyze the results
    • Get the model scores on various metrics
    • Make plots to compare the model performances


Figure 1. Diagram of the possible steps taken by ATOM.
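
To make two of the listed steps concrete (handling missing values and removing features with too low variance), here is a minimal standard-library sketch. It does not use ATOM's API: the helper names impute_mean and drop_low_variance, the dict-of-columns layout and the variance threshold are all assumptions made for illustration.

```python
from statistics import mean, pvariance


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [value for value in column if value is not None]
    fill = mean(observed)
    return [fill if value is None else value for value in column]


def drop_low_variance(table, threshold=0.0):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {
        name: column
        for name, column in table.items()
        if pvariance(column) > threshold
    }


# Hypothetical toy dataset: one column with a missing value, one constant column.
raw = {
    "age": [20, None, 40],
    "constant": [1, 1, 1],
}

cleaned = {name: impute_mean(column) for name, column in raw.items()}
selected = drop_low_variance(cleaned)
print(selected)
```

In this toy run the missing age is filled with the mean of the other two values, and the constant column is dropped because its variance is zero.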


Version 4.4.0

  • The drop method now allows the user to drop columns as part of the pipeline.
  • It is now possible to add data transformations as functions to the pipeline through the apply method.
  • Added the status method to save an overview of atom's branches and models to the logger.
  • Improved the output messages for the Imputer class.
  • The dataset's columns can now be called directly from atom.
  • The distribution and plot_distribution methods now ignore missing values instead of raising an exception.
  • Fixed a bug where transformations failed when columns were added after initializing the pipeline.
  • Fixed a bug where the Cleaner class didn't drop columns with only missing values for minimum_cardinality=True.
  • Fixed a bug where the winning model wasn't displayed correctly.
  • Refactored the way transformers are added or removed from predicting methods.
  • Improved documentation.

Version 4.3.0

  • Possibility to add custom transformers to the pipeline.
  • The export_pipeline utility method exports atom's current pipeline to a sklearn object.
  • Use AutoML to automate the search for an optimized pipeline.
  • New magic methods make atom behave similarly to sklearn's Pipeline.
  • All training approaches can now be combined in the same atom instance.
  • New plot_scatter_matrix, plot_distribution and plot_qq for data inspection.
  • Complete rework of all the shap plots to be consistent with their new API.
  • Improvements for the Scaler and Pruner classes.
  • The acronym for custom models now defaults to the capital letters in the class' __name__.
  • Possibility to apply transformations on only a subset of the columns.
  • Plots and methods now accept winner as model name.
  • Fixed a bug where custom metrics didn't show the correct name.
  • Fixed a bug where timers were not displayed correctly.
  • Further compatibility with deep learning datasets.
  • Large refactoring for performance optimization.
  • Cleaner output of messages to the logger.
  • Plots no longer show a default title.
  • Added the AutoML example notebook.
  • Minor bug fixes.

Version 4.2.1

  • Bug fix where there was memory leakage in successive halving and train sizing pipelines.
  • The XGBoost, LightGBM and CatBoost packages can now be installed through the installer's extras_require under the name models, e.g. pip install -U atom-ml[models].
  • Improved documentation.

Version 4.2.0

  • Possibility to add custom models to the pipeline using ATOMModel.
  • Compatibility with deep learning models.
  • New branch system for different data pipelines. Read more in the user guide.
  • Use the canvas contextmanager to draw multiple plots in one figure.
  • New voting and stacking ensemble techniques.
  • New get_class_weight utility method.
  • New Sequential Feature Selection strategy for the FeatureSelector.
  • Added the sample_weight parameter to the score method.
  • New ways to initialize the data in the training instances.
  • The n_rows parameter in ATOMLoader is deprecated in favour of the new data input formats.
  • The test_size parameter now also allows integer values.
  • Renamed categories to classes to be consistent with sklearn's API.
  • The class property now returns a pd.DataFrame of the number of rows per target class in the train, test and complete dataset.
  • Possibility to add custom parameters to an estimator's fit method through est_params.
  • Successive halving and train sizing now both allow subsequent runs from atom without losing previous information.
  • Bug fix where ATOMLoader wouldn't encode the target column during transformation.
  • Added the Deep learning, Ensembles and Utilities example notebooks.
  • Compatibility with python 3.9.

Version 4.1.0

  • Added the est_params parameter to customize the parameters passed to every model's estimator.
  • Following skopt's API, the n_random_starts parameter is deprecated in favour of n_initial_points.
  • The Balancer class now allows you to use any of the strategies from imblearn.
  • New utility attributes to inspect the dataset.
  • Four new models: CatNB, CNB, ARD and RNN.
  • Added the models section to the documentation.
  • Small changes in log outputs.
  • Bug fixes and performance improvements.

Version 4.0.1

  • Bug fix where the DFS strategy in FeatureGenerator was not deterministic for a fixed random state.
  • Bug fix where subsequent runs with the same metric failed.
  • Added the license file to the package's installer.
  • Typo fixes in documentation.

Version 4.0.0

  • Bayesian optimization package changed from GpyOpt to skopt.
  • Complete revision of the model's hyperparameters.
  • Four SHAP plots can now be called directly from an ATOM pipeline.
  • Two new plots for regression tasks.
  • New plot_pipeline and pipeline attribute to access all transformers.
  • Possibility to determine transformer parameters per method.
  • New calibration method and plot.
  • Metrics can now be added as scorers or functions with signature metric(y, y_pred, **kwargs).
  • Implementation of multi-metric runs.
  • Possibility to choose which metric to plot.
  • Early stopping for models that allow in-training evaluation.
  • Added the ATOMLoader function to load saved atom instances and directly apply all data transformations.
  • The 'remove' strategy in the data cleaning parameters is deprecated in favour of 'drop'.
  • Implemented the DFS strategy in FeatureGenerator.
  • All training classes now inherit from BaseEstimator.
  • Added multiple new example notebooks.
  • Test coverage up to 100%.
  • Completely new documentation page.
  • Bug fixes and performance improvements.
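
The function signature metric(y, y_pred, **kwargs) mentioned in the 4.0.0 notes can be illustrated with a plain Python function. The sketch below only shows a function of that shape; the name mean_absolute_error and the scale keyword are hypothetical, and how such a function is handed to a training run is not shown here.

```python
def mean_absolute_error(y, y_pred, **kwargs):
    """Mean absolute error with the signature metric(y, y_pred, **kwargs).

    The optional `scale` keyword is a hypothetical extra, used only to show
    how additional arguments could travel through **kwargs.
    """
    scale = kwargs.get("scale", 1.0)
    errors = [abs(true - pred) for true, pred in zip(y, y_pred)]
    return scale * sum(errors) / len(errors)


score = mean_absolute_error([3, 5, 2], [2, 5, 4])
print(score)
```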


Atom Git Integration

Helpers for working with Git repositories built natively on top of libgit2.

Atom Git

Installing

Building

  • Clone the repository with the --recurse option to get the libgit2 submodule
  • Run npm install
  • Run grunt to compile the native and CoffeeScript code
  • Run grunt test to run the specs

Docs

git.open(path)

Open the repository at the given path. This will return null if the repository at the given path does not exist or cannot be opened.

The opened repository will have a submodules property that will be an object of paths mapped to submodule {Repository} objects. The path keys will be relative to the opened repository's working directory.

Repository.checkoutHead(path)

Restore the contents of a path in the working directory and index to the version at HEAD. Similar to running git reset HEAD -- <path> and then git checkout HEAD -- <path>.

path - The string repository-relative path to checkout.

Returns true if the checkout was successful, false otherwise.

Repository.checkoutReference(reference, [create])

Checks out a branch in your repository.

reference - The string reference to check out.

create - A Boolean value which, if true, creates the new reference if it doesn't exist.

Returns true if the checkout was successful, false otherwise.

Repository.getAheadBehindCount(branch)

Get the number of commits the branch is ahead/behind the remote branch it is tracking. Similar to the commit numbers reported by git status when a remote tracking branch exists.

branch - The branch name to lookup ahead/behind counts for. (default: HEAD)

Returns an object with ahead and behind keys pointing to integer values that will always be >= 0.
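
The idea behind ahead/behind counts can be sketched on a toy commit graph: "ahead" is the number of commits reachable only from the local branch, "behind" the number reachable only from the upstream branch. This is a plain Python illustration, not the library's implementation; the parents mapping and the function names are assumptions.

```python
def ancestors(parents, start):
    """Return the set of commits reachable from start, including start."""
    seen = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def ahead_behind(parents, local, upstream):
    """Count commits only on local (ahead) and only on upstream (behind)."""
    local_set = ancestors(parents, local)
    upstream_set = ancestors(parents, upstream)
    return {
        "ahead": len(local_set - upstream_set),
        "behind": len(upstream_set - local_set),
    }


# Toy history: a <- b <- c (local) and a <- b <- d <- e (upstream).
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}
counts = ahead_behind(history, "c", "e")
print(counts)
```

Both counts are set sizes, which is why they can never be negative.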

Repository.getCommitCount(fromCommit, toCommit)

Get the number of commits between fromCommit and toCommit.

fromCommit - The string commit SHA-1 to start the rev walk at.

toCommit - The string commit SHA-1 to end the rev walk at.

Returns the number of commits between the two, always >= 0.

Repository.getConfigValue(key)

Get the config value of the given key.

key - The string key to retrieve the value for.

Returns the configuration value, may be null.

Repository.setConfigValue(key, value)

Set the config value of the given key.

key - The string key to set in the config.

value - The string value to set in the config for the given key.

Returns true if setting the config value was successful, false otherwise.

Repository.getDiffStats(path)

Get the number of lines added and removed comparing the working directory contents of the given path to the HEAD version of the given path.

path - The string repository-relative path to diff.

Returns an object with added and deleted keys pointing to integer values that will always be >= 0.

Repository.getHeadBlob(path)

Get the blob contents of the given path at HEAD. Similar to git show HEAD:<path>.

path - The string repository-relative path.

Returns the string contents of the HEAD version of the path.

Repository.getHead()

Get the reference or SHA-1 that HEAD points to, such as refs/heads/master or a full SHA-1 if the repository is in a detached HEAD state.

Returns the string reference name or SHA-1.

Repository.getIndexBlob(path)

Get the blob contents of the given path in the index. Similar to git show :<path>.

path - The string repository-relative path.

Returns the string contents of the index version of the path.

Repository.getLineDiffs(path, text, [options])

Get the line diffs comparing the HEAD version of the given path and the given text.

path - The string repository-relative path.

text - The string text to diff the HEAD contents of the path against.

options - An optional object with the following keys:

  • ignoreEolWhitespace - true to ignore any whitespace diffs at the end of lines.
  • useIndex - true to compare against the index version instead of the HEAD version.

Returns an array of objects that have oldStart, oldLines, newStart, and newLines keys pointing to integer values. May be null if the diff fails.
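
The shape of the returned hunks can be illustrated with Python's difflib. This is only a sketch of the data structure, not the library's diff algorithm: the 1-based start numbers and the way pure insertions are numbered are assumptions and may differ from what libgit2 reports.

```python
from difflib import SequenceMatcher


def line_diffs(old_text, new_text):
    """Return hunks with oldStart, oldLines, newStart and newLines keys."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged runs produce no hunk
        hunks.append({
            "oldStart": i1 + 1,  # assumed 1-based line numbering
            "oldLines": i2 - i1,
            "newStart": j1 + 1,
            "newLines": j2 - j1,
        })
    return hunks


# Line 2 is replaced and a new line is appended at the end.
hunks = line_diffs("a\nb\nc\n", "a\nx\nc\nd\n")
for hunk in hunks:
    print(hunk)
```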

Repository.getMergeBase(commit1, commit2)

Get the merge base of two commits.

commit1 - The string SHA-1 of the first commit.

commit2 - The string SHA-1 of the second commit.

Returns the string SHA-1 of the merge base of commit1 and commit2 or null if there isn't one.

Repository.getPath()

Get the path of the repository.

Returns the string absolute path of the opened repository.

Repository.getReferences()

Gets all the local and remote references.

Returns an object with three keys: heads, remotes, and tags. Each key can be an array of strings containing the reference names.

Repository.getReferenceTarget(ref)

Get the target of the given reference.

ref - The string reference.

Returns the string target of the given reference.

Repository.getShortHead()

Get a possibly shortened version of the value returned by getHead(). This will remove leading segments of refs/heads, refs/tags, or refs/remotes and will also shorten the SHA-1 of a detached HEAD to 7 characters.

Returns a string shortened reference name or SHA-1.
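
The shortening rule described above can be sketched as a small pure function. This is an illustration written in Python, not the library's code; the function name short_head and the check for a 40-character hexadecimal SHA-1 are assumptions.

```python
import re


def short_head(head):
    """Shorten a reference name or a full SHA-1 as described above."""
    for prefix in ("refs/heads/", "refs/tags/", "refs/remotes/"):
        if head.startswith(prefix):
            return head[len(prefix):]
    if re.fullmatch(r"[0-9a-fA-F]{40}", head):
        return head[:7]  # detached HEAD: keep the first 7 characters
    return head  # anything else is returned unchanged (assumption)


print(short_head("refs/heads/master"))
print(short_head("0123456789abcdef0123456789abcdef01234567"))
```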

Repository.getStatus([path])

Get the status of a single path or all paths in the repository. This will not include ignored paths.

path - An optional repository-relative path to limit the status reporting to.

Returns an integer status number if a path is specified and returns an object with path keys and integer status values if no path is specified.

Repository.getUpstreamBranch([branch])

Get the upstream branch of the given branch.

branch - The branch to find the upstream branch of (default: HEAD)

Returns the string upstream branch reference name.

Repository.getWorkingDirectory()

Get the working directory of the repository.

Returns the string absolute path to the repository's working directory.

Repository.isIgnored(path)

Get the ignored status of a given path.

path - The string repository-relative path.

Returns true if the path is ignored, false otherwise.

Repository.isPathModified(path)

Get the modified status of a given path.

path - The string repository-relative path.

Returns true if the path is modified, false otherwise.

Repository.isPathNew(path)

Get the new status of a given path.

path - The string repository-relative path.

Returns true if the path is new, false otherwise.

Repository.isPathDeleted(path)

Get the deleted status of a given path.

path - The string repository-relative path.

Returns true if the path is deleted, false otherwise.

Repository.isStatusIgnored(status)

Check if a status value represents an ignored path.

status - The integer status value.

Returns true if the status is an ignored one, false otherwise.

Repository.isStatusModified(status)

Check if a status value represents a modified path.

status - The integer status value.

Returns true if the status is a modified one, false otherwise.

Repository.isStatusNew(status)

Check if a status value represents a new path.

status - The integer status value.

Returns true if the status is a new one, false otherwise.

Repository.isStatusDeleted(status)

Check if a status value represents a deleted path.

status - The integer status value.

Returns true if the status is a deleted one, false otherwise.
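
The four isStatus* checks operate on an integer status value. Assuming that value is a libgit2-style bit field (the constants below follow libgit2's git_status_t and are not stated in the text above), each check reduces to a bit mask test. The grouping of flags is a simplified assumption; the library may count further flags, such as type changes, as "modified".

```python
# Assumed libgit2-style status bits; only the ones used in this sketch.
INDEX_NEW = 1 << 0
INDEX_MODIFIED = 1 << 1
INDEX_DELETED = 1 << 2
WT_NEW = 1 << 7
WT_MODIFIED = 1 << 8
WT_DELETED = 1 << 9
IGNORED = 1 << 14


def is_status_new(status):
    """True if the path is new in the index or the working directory."""
    return bool(status & (INDEX_NEW | WT_NEW))


def is_status_modified(status):
    """True if the path is modified in the index or the working directory."""
    return bool(status & (INDEX_MODIFIED | WT_MODIFIED))


def is_status_deleted(status):
    """True if the path is deleted in the index or the working directory."""
    return bool(status & (INDEX_DELETED | WT_DELETED))


def is_status_ignored(status):
    """True if the ignored bit is set."""
    return bool(status & IGNORED)


# A status can carry several bits at once, e.g. staged as new, then edited.
status = INDEX_NEW | WT_MODIFIED
print(is_status_new(status), is_status_modified(status), is_status_deleted(status))
```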

Repository.isSubmodule(path)

Check if the path is a submodule in the index.

path - The string repository-relative path.

Returns true if the path is a submodule, false otherwise.

Repository.refreshIndex()

Reread the index to update any values that have changed since the last time the index was read.

Repository.relativize(path)

Relativize the given path to the repository's working directory.

path - The string path to relativize.

Returns a repository-relative path if the given path is prefixed with the repository's working directory path.
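
The prefix check behind relativize can be sketched as follows. This Python illustration uses POSIX-style paths only and is not the library's implementation; returning the input unchanged when it lies outside the working directory is an assumption, since the text above only describes the prefixed case.

```python
import posixpath


def relativize(path, working_directory):
    """Return path relative to working_directory when it lies inside it."""
    root = posixpath.normpath(working_directory)
    candidate = posixpath.normpath(path)
    if candidate == root:
        return ""
    # Compare against "root/" so that /repository is not treated as inside /repo.
    prefix = root if root.endswith("/") else root + "/"
    if candidate.startswith(prefix):
        return candidate[len(prefix):]
    return path  # outside the working directory: returned unchanged (assumption)


print(relativize("/repo/src/a.js", "/repo"))
print(relativize("/other/a.js", "/repo"))
```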

Repository.isWorkingDirectory(path)

Is the given path the repository's working directory?

It is better to call this method than to compare a path directly against the value of getWorkingDirectory(), since this method handles slash normalization on Windows, case-insensitive filesystems, and symlinked repositories.

path - The string path to check.

Returns true if the given path is the repository's working directory, false otherwise.

Repository.release()

Release the repository and close all file handles it has open. No other methods can be called on the Repository object once it has been released.

Repository.submoduleForPath(path)

Get the repository for the submodule that the path is located in.

path - The absolute or repository-relative string path.

Returns a Repository or null if the path isn't in a submodule.

Repository.add(path)

Stage the changes in path into the repository's index. Clear any conflict state associated with path.

path - A repository-relative string path.

Raises an Error if the path isn't readable or if another exception occurs.