Wonderful video IMHO
I still have to figure out how to use Accelerate to run multi-node / multi-GPU training and inference, though.
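The part I think I understand is the single-script pattern below (just a rough sketch with a toy model and a hypothetical `train_loop.py` filename, so take it with a grain of salt); the multi-GPU / multi-node side then seems to come down to the `accelerate launch` flags (or to whatever you set once with `accelerate config`):
```
# train_loop.py -- hypothetical minimal example of the usual Accelerate pattern.
# The same file runs on CPU, 1 GPU, several GPUs, or several machines depending
# on how it is launched, e.g.:
#   accelerate launch --num_processes 4 train_loop.py
#   accelerate launch --num_machines 2 --machine_rank 0 \
#       --main_process_ip <master-ip> --main_process_port 29500 train_loop.py
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Toy data and model, just to keep the sketch self-contained
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
dataloader = DataLoader(dataset, batch_size=16, shuffle=True)
model = torch.nn.Linear(32, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

# prepare() moves everything to the right device, wraps the model for DDP,
# and shards the dataloader across processes
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for epoch in range(3):
    for inputs, targets in dataloader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)  # instead of loss.backward()
        optimizer.step()
    accelerator.print(f"epoch {epoch} done")  # prints only on the main process
```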
Video is great; the best part is around 8:27 to 22:30.
Tks
I marked this video to be watched later as I don't have enough time now. The thing is, I rarely have time later, as there's an endless stream of worthy ML videos each week. Your target audience is most likely people like me. This is the moment to learn about HF Accelerate, but it will pass because of the video's length. One solution is to post links to few-minute video explainers. For example, Fireship's 100-second videos are hugely popular.
yes, that might be good
You have time, sit and learn, it's just one hour. Remember Elon's rocket catch? He also makes cars and dances. (I've tried the short ones, but meh, I need explanations ❤)
Thanks for the great video. Does Accelerate work on Windows? I can't find any information about that, and it doesn't work on my Windows PC.
I think it's important to clarify **explicitly** how the code changes if you use an HF Trainer/SFTTrainer. This is my best guess, assuming Trainer is its own special wrapper for training your model:
```
from transformers import GPT2LMHeadModel, GPT2TokenizerFast, TrainingArguments, Trainer, DataCollatorForLanguageModeling
from accelerate import Accelerator
from datasets import load_dataset
# Initialize accelerator
accelerator = Accelerator()
# Load a dataset
dataset = load_dataset('text', data_files={'train': 'train.txt', 'test': 'test.txt'})
# Tokenization
tokenizer = GPT2TokenizerFast.from_pretrained('gpt2')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token, so padding="max_length" fails without this
def tokenize_function(examples):
    # We are doing causal (unidirectional) masking
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=512)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
# Set the columns to be used in training
tokenized_datasets.set_format("torch", columns=["input_ids", "attention_mask"])
# Split the dataset into train and test
train_dataset = tokenized_datasets["train"]
test_dataset = tokenized_datasets["test"]
# Initialize model
model = GPT2LMHeadModel.from_pretrained("gpt2")
# Prepare everything with our `accelerator`.
model, train_dataset, test_dataset = accelerator.prepare(model, train_dataset, test_dataset)
# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    prediction_loss_only=True,  # In language modelling, we only care about the loss
)
# Collator that builds `labels` from `input_ids` (mlm=False => causal LM),
# so the Trainer can compute a loss
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
# Create the trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
)
# Train the model
trainer.train()
```
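And as a follow-up sketch (again just my guess, assuming the `trainer` and `test_dataset` above exist and training finished), evaluation and perplexity would look roughly like this:
```
# Hypothetical continuation of the snippet above (uses the `trainer` defined there).
import math

eval_metrics = trainer.evaluate()                   # runs the eval loop on test_dataset
perplexity = math.exp(eval_metrics["eval_loss"])    # causal-LM perplexity = exp(cross-entropy loss)
print(f"eval loss: {eval_metrics['eval_loss']:.4f}, perplexity: {perplexity:.2f}")

trainer.save_model("./results/final")               # saves the fine-tuned weights for later use
```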