معرفی شرکت ها

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

تبلیغات ما

مشتریان به طور فزاینده ای آنلاین هستند. تبلیغات می تواند به آنها کمک کند تا کسب و کار شما را پیدا کنند.

مشاهده بیشتر

توضیحات

A simple PyTorch wrapper making multi-node multi-GPU training much easier on Slurm

ویژگی	مقدار
سیستم عامل	OS Independent
نام فایل	PyTorch-Bolt-0.0.2
نام	PyTorch-Bolt
نسخه کتابخانه	0.0.2
نگهدارنده	[]
ایمیل نگهدارنده	[]
نویسنده	Yi Zhang
ایمیل نویسنده	yizhang.dev@gmail.com
آدرس صفحه اصلی	https://github.com/yzhang-dev/PyTorch-Bolt
آدرس اینترنتی	https://pypi.org/project/PyTorch-Bolt/
مجوز	MIT

# PyTorch Bolt PyTorch Bolt is * a simple PyTorch wrapper making multi-node multi-GPU training much easier on Slurm PyTorch Bolt supports to * use single-node single-GPU training on a specified GPU device * use multi-node (or single-node) multi-GPU [`DistributedDataParallel`](https://pytorch.org/docs/master/generated/torch.nn.parallel.DistributedDataParallel.html) (DDP) training - with [`torch.distributed.launch`](https://pytorch.org/docs/stable/distributed.html#launch-utility) module - with [Slurm](https://slurm.schedmd.com/quickstart.html) cluster workload manager ## Quickstart ### All-in-One Template Using PyTorch Bolt #### Recommended Structure ``` . ├── data │ ├── __init__.py │ └── customized_datamodule.py ├── model │ ├── __init__.py │ └── customized_model.py ├── main.py ├── main.sbatch └── requirements.txt ``` #### Demo [MNIST classification using PyTorch Bolt](https://github.com/yzhang-dev/PyTorch-with-Slurm/tree/main/Tutorials/All-in-One-Template-Using-PyTorch-Bolt) (you might need to go through the relevant [tutorials](https://github.com/yzhang-dev/PyTorch-with-Slurm) step by step). ### Dependencies and Installation #### Package Dependencies `pip install -r requirements.txt` can handle all package dependencies. #### Install Pytorch Bolt ```bash $ pip install pytorch-bolt ``` ## Documentation ### Module `DataModule` ```python class pytorch_bolt.DataModule(data_dir='data', num_splits=10, batch_size=1, num_workers=0, pin_memory=False, drop_last=False) ``` #### `use_dist_sampler()` Can be called to trigger `DistributedSampler` when using `DistributedDataParallel` (DDP). #### `train_dataloader()` Returns `Dataloader` for trainset. #### `val_dataloader()` Returns `Dataloader` for valset. #### `test_dataloader()` Returns `Dataloader` for testset. #### `add_argparse_args(parent_parser)` Returns `argparse` parser. (**Staticmethod**) **Practical template**: ```python import pytorch_bolt class MyDataModule(pytorch_bolt.DataModule): def __init__(self, args): super().__init__(args) # arguments for customized dataset # optional helper function can be used def _prepare_data(self): pass def _setup_dataset(self): # trainset and valset for fit stage # `self.num_splits` can be used for splitting trainset and valset # testset for test stage return trainset, valset, testset @staticmethod def add_argparse_args(parent_parser): parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False) parser = pytorch_bolt.DataModule.add_argparse_args(parser) # TODO return parser @classmethod def from_argparse_args(cls, args): return cls(args) ``` ### Module `Module` ```python class pytorch_bolt.Module() ``` #### `parameters_to_update()` Returns model parameters that have `requires_grad=True`. #### `configure_criterion()` Returns criterion. #### `configure_metric()` Returns metric. #### `configure_optimizer()` Returns optimizer (and learning rate scheduler). **Practical template**: ```python import pytorch_bolt class MyModel(pytorch_bolt.Module): def __init__(self, args): super().__init__() # hyperparameters for model self.model = self._setup_model() # hyperparameters for criterion, metric, optimizer and lr_scheduler def _setup_model(self): # TODO return model def forward(self, inputs): return self.model(inputs) # return parameters that have requires_grad=True # `parameters_to_update` can be useful for transfer learning def parameters_to_update(self): return # return criterion def configure_criterion(self): return # return metric def configure_metric(self): return # return optimizer (and lr_scheduler) def configure_optimizer(self): return @staticmethod def add_argparse_args(parent_parser): parser = argparse.ArgumentParser(parents=[parent_parser], add_help=False) # TODO return parser @classmethod def from_argparse_args(cls, args): return cls(args) ``` ### Module `Loggers` ```python class pytorch_bolt.Loggers(logs_dir='logs', loggerfmt='%(asctime)s | %(levelname)-5s | %(name)s - %(message)s', datefmt=None, tracker_keys=None (Required), tracker_reduction='mean') ``` #### `configure_root_logger(root)` Returns `root` logger. #### `configure_child_logger(child)` Returns `root.child` logger. #### `configure_tracker()` Returns tracker for tracking forward propagation step outputs and statistics. #### `configure_progressbar()` Returns progress bar for showing forward propagation step progress and details. #### `configure_writer()` Returns Tensorboard writer for visualizing forward propagation epoch outputs. #### `add_argparse_args(parent_parser)` Returns `argparse` parser. (**Staticmethod**) #### `from_argparse_args(args)` `Loggers` constructor. ### Module `Trainer` ```python class pytorch_bolt.Trainer(loggers=None (Required), device=None, distributed=False, use_slurm=False, dist_backend='nccl', master_addr='localhost', master_port='29500', world_size=1, rank=0, local_rank=0, datamodule=None (Required), model=None (Required), max_epochs=5, verbose=False) ``` #### `get_rank()` Gets rank of current process. (**Staticmethod**) #### `fit()` Fits the model on trainset, validating each epoch on valset. #### `validate()` Validates trained model by running one epoch on valset. #### `test()` Tests trained model by running one epoch on testset. #### `destroy()` Destroys trainer.. #### `add_argparse_args(parent_parser)` Returns `argparse` parser. (**Staticmethod**) #### `from_argparse_args(args)` `Trainer` constructor. **Practical template for customized trainer**: ```python import pytorch_bolt class MyTrainer(pytorch_bolt.Trainer): def _training_step(self, batch_idx, batch): return def _training_step_end(self, batch_idx, batch, step_outs): return # if return # return dict, containing at least 2 keys: "loss", "score" def _training_epoch_end(self): return ``` ## Related Projects * Inspired by [Pytorch Lightning](https://www.pytorchlightning.ai/) ## Appendix ### Environment Variable Mapping **WORLD_SIZE** | **SLURM_NTASKS** (and **SLURM_NPROCS** for backwards compatibility) > Same as **-n, --ntasks** **RANK** | **SLURM_PROCID** > The MPI rank (or relative process ID) of the current process **LOCAL_RANK** | **SLURM_LOCALID** > Node local task ID for the process within a job. **MASTER_ADDR** | **SLURM_SUBMIT_HOST** > The hostname of the machine from which **sbatch** was invoked. **NPROC_PER_NODE** | **SLURM_NTASKS_PER_NODE** > Number of tasks requested per node. Only set if the **--ntasks-per-node** option is specified. **NNODES** | **SLURM_JOB_NUM_NODES** (and **SLURM_NNODES** for backwards compatibility) > Total number of nodes in the job's resource allocation. **NODE_RANK** | **SLURM_NODEID** > ID of the nodes allocated. **SLURM_JOB_NODELIST** (and **SLURM_NODELIST** for backwards compatibility) > List of nodes allocated to the job. ### Reference * [`sbatch` Output Environment Variables](https://slurm.schedmd.com/sbatch.html#lbAK) * [`torch.distributed` TCP Initialization Environment Variables](https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization)

نیازمندی

مقدار	نام
==1.8.1	torch
==0.9.1	torchvision
==2.4.1	tensorboard

زبان مورد نیاز

مقدار	نام
>=3.8	Python

نحوه نصب

نصب پکیج whl PyTorch-Bolt-0.0.2:

pip install PyTorch-Bolt-0.0.2.whl

نصب پکیج tar.gz PyTorch-Bolt-0.0.2:

pip install PyTorch-Bolt-0.0.2.tar.gz