{"id":803617,"date":"2021-12-16T14:47:39","date_gmt":"2021-12-16T22:47:39","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=803617"},"modified":"2021-12-16T14:47:39","modified_gmt":"2021-12-16T22:47:39","slug":"pymarlin-a-lightweight-library-that-improves-deep-learning-training-agility","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/pymarlin-a-lightweight-library-that-improves-deep-learning-training-agility\/","title":{"rendered":"PyMarlin: A lightweight library that improves deep learning training agility"},"content":{"rendered":"

By Amin Saied, Ananth Rao, Ashwin Srinivasan, Damien Jose, Eduardo Gonzalez, Han Yu, Jon Sleep, Krishan Subudhi, Shruti Gullapuram

PyMarlin is a lightweight PyTorch extension library for agile experimentation. It was designed to simplify the end-to-end deep learning experimentation lifecycle, agnostic of the compute environment. In July 2021, the PyMarlin team open-sourced their internal model training library to all PyTorch users. PyMarlin abstracts away the boilerplate code for scaling, logging, and argument parsing that is crucial for training deep learning-based models, and it can be thought of as a high-level abstraction over PyTorch. We have created a five-minute “Getting Started” module for anyone interested in trying out PyMarlin. Today we’ll look at how PyMarlin works, how it supports extensibility, and the next steps needed to advance its functionality further.

How the typical deep learning training lifecycle works
\"Typical

Figure 1: Typical deep learning training steps

These three steps (and their sub-steps) are the backbone of any typical deep learning model training lifecycle. But this process also involves writing and testing a lot of code. Since scientists and researchers focus mostly on the model training part, they generally write the other components without following any design pattern, which makes the training code difficult to extend.

For example, let’s say a researcher has written code for text summarization, including all the code necessary for scaling and logging. A fellow researcher wants to try out a new optimizer. Another colleague wants to experiment with new evaluation metrics and loss functions. And yet another scientist wants to use the same recipe but on different data. In this case, all the stakeholders make separate copies of the code and make their own modifications. But then, suppose the original researcher changes the encoder and decoder architecture and comes up with a better model. The other stakeholders may have to change their ML code. What a waste of everyone’s time!

Speeding up training with Distributed Data Parallel (DDP) and mixed precision can introduce bugs, too. For example, when using multiple GPUs across multiple nodes, the batch size per GPU must be reduced to maintain the same global batch size. This can involve manual, error-prone calculation of the minibatch size or the number of gradient accumulation steps. During the validation step, the outputs from multiple GPUs need to be gathered to calculate evaluation metrics accurately. Adding an optimization such as disabling all-reduce during gradient accumulation can speed up the model further. And in mixed precision training using PyTorch’s native amp module, gradients must be unscaled before they can be clipped.
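That last requirement trips people up often, so here is a minimal sketch of the unscale-before-clip pattern with PyTorch’s native amp module. This is standard PyTorch rather than PyMarlin-specific code, and model, optimizer, loss_fn, and dataloader are assumed to be defined elsewhere:

```python
import torch

scaler = torch.cuda.amp.GradScaler()

for batch, labels in dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # forward pass in mixed precision
        loss = loss_fn(model(batch), labels)
    scaler.scale(loss).backward()        # backward on the scaled loss
    scaler.unscale_(optimizer)           # unscale gradients BEFORE clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    scaler.step(optimizer)               # skips the step if gradients overflowed
    scaler.update()
```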

There are many open-source libraries that provide functionality similar to PyMarlin, and some of them have extra features that can come in quite handy. The Hugging Face Trainer, for instance, supports other logging frameworks like wandb, but it is not model agnostic. PyMarlin, however, offers unique benefits: we focused on keeping the code simple and easily readable. PyMarlin is not designed to be a black box, so power users can understand PyMarlin’s code and extend it as necessary.

PyMarlin at a glance

A brief look at the architecture[1]

PyMarlin has four core components: the DataProcessor and DataInterface, the Module Interface, the Trainer Backend, and the Trainer. First, we’ll look at the DataProcessor and DataInterface, whose role is to decouple data processing and dataset building from model training. The DataProcessor processes and optionally analyzes data; users can have multiple DataProcessors, which can be chained together. The DataInterface has abstract methods that the ModuleInterface calls to obtain train and validation datasets during training.
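As a sketch of this separation, the snippet below defines one processing step and a DataInterface that serves the finished datasets. The subclass names and constructor details are hypothetical, and depending on the PyMarlin version additional abstract methods (such as analyze()) may also need to be implemented; see the PyMarlin documentation for the exact signatures:

```python
from pymarlin.core import data_interface

class TokenizeProcessor(data_interface.DataProcessor):
    """A single preprocessing step: tokenize raw text."""
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def process(self, raw_texts):
        # Business logic for this one step lives in process().
        return [self.tokenizer(text) for text in raw_texts]

class SummarizationData(data_interface.DataInterface):
    """Serves finished datasets to the ModuleInterface during training."""
    def __init__(self, train_dataset, val_dataset):
        self.train_dataset = train_dataset
        self.val_dataset = val_dataset

    def get_train_dataset(self):
        return self.train_dataset

    def get_val_dataset(self):
        return self.val_dataset
```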

The Module Interface is where scientists and researchers write their training code. This module can be thought of as the implementation of the training recipe. The Module Interface inherits from nn.Module and hence can be treated like any PyTorch module.
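A training recipe then takes roughly the following shape. The method names below (get_optimizers_schedulers, get_train_dataloader, train_step) illustrate the pattern rather than reproduce the exact interface, so consult the PyMarlin docs for the precise methods to override:

```python
import torch
from pymarlin.core import module_interface

class SummarizationRecipe(module_interface.ModuleInterface):
    """The training recipe: model, data, loss, and optimization in one place."""
    def __init__(self, model, data):   # data: a DataInterface instance
        super().__init__()
        self.model = model
        self.data = data

    def get_optimizers_schedulers(self, estimated_global_steps_per_epoch, epochs):
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=3e-5)
        return [optimizer], []          # no learning-rate schedulers in this sketch

    def get_train_dataloader(self, sampler, batch_size):
        return torch.utils.data.DataLoader(
            self.data.get_train_dataset(), batch_size=batch_size)

    def train_step(self, global_step, batch, device):
        # Return the loss; the trainer backend handles backward passes,
        # gradient scaling, clipping, and accumulation.
        return self.model(batch.to(device))
```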

The Trainer Backend is responsible for training/validating the Module Interface for one entire epoch. PyMarlin offers various useful backend implementations, such as SingleProcess, SingleProcessAmp, and DDPTrainerBackend.

Finally, the Trainer serves as the bridge between the Trainer Backend and the Module Interface: it takes care of device management, rank fetching, checkpointing, and reloading. It also handles all the calculations for minibatch size, gradient accumulation, and the number of remaining epochs, initializes stats writers like TensorBoard, and restarts training from a previous state.
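Putting the pieces together, a training script might be wired up along these lines. This is a hypothetical sketch: the constructor arguments and the TrainerArguments name are assumptions based on the components described above, and the “Getting Started” module shows the exact invocation:

```python
from pymarlin.core import trainer, trainer_backend

# recipe is the Module Interface implementation from the previous sketch
backend = trainer_backend.SingleProcessAmp()   # or a DDP backend for multi-GPU
args = trainer.TrainerArguments(epochs=3, gpu_batch_size_limit=16)

marlin_trainer = trainer.Trainer(
    module=recipe,
    trainer_backend=backend,
    args=args,
)
marlin_trainer.train()   # the Trainer drives the backend one epoch at a time
```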

\"PyMarlin

Figure 2: Steps to follow while writing code using PyMarlin

A PyMarlin Deep Dive

Beyond its four core components, PyMarlin has additional features to assist coders. We’ll first explore the core components in greater depth, then look at the supporting features.

1. DataProcessor and DataInterface

The DataProcessor modules aim to support most large-scale preprocessing requirements. A DataProcessor can be seen as a single step of processing, such as reading files. Multiple DataProcessors can be used sequentially, each covering one preprocessing step. Once the business logic is added in the process() function, the built-in multiprocessing support can be easily leveraged. The business logic in the DataProcessor’s process() function can be invoked either on a single compute target, locally, or even as a distributed job across nodes, and it comes with built-in support for Azure Machine Learning (AML). It also allows for selective preprocessing: for example, with a large dataset you could decide how many parts it should be split into and choose which part to process at a time on a single node.
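For instance, two hypothetical processing steps might be chained as follows; everything here except the process() entry point is illustrative, and in practice PyMarlin’s built-in multiprocessing and AML support wrap that same entry point:

```python
from pymarlin.core import data_interface

class FileReader(data_interface.DataProcessor):
    """Step 1: read raw text files into a list of documents."""
    def process(self, file_paths):
        docs = []
        for path in file_paths:
            with open(path, encoding="utf-8") as f:
                docs.append(f.read())
        return docs

class SentenceSplitter(data_interface.DataProcessor):
    """Step 2: naively split each document into sentences."""
    def process(self, docs):
        return [sentence for doc in docs for sentence in doc.split(". ")]

# Chain the steps: the output of one processor feeds the next.
docs = FileReader().process(wiki_file_paths)    # wiki_file_paths: assumed input list
sentences = SentenceSplitter().process(docs)
```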

This example covers pre-processing raw Wikipedia data. In that example, which splits sentences for 27 raw Wikipedia text files, we see the following time savings: