AI Training Conversation - Learning Rate and SQL Stored Procedures
Initial Training Parameters Explained
The following training parameters were discussed:
training_args = TrainingArguments(
    output_dir=r"C:\Users\luca\Desktop\Deepseek\FineTune\Deepseek-R1-FineTuned",
    overwrite_output_dir=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    save_steps=500,
    save_total_limit=2,
    logging_dir=r"./logs",
    fp16=True,
    evaluation_strategy="no",
    logging_steps=100,
    dataloader_num_workers=4,  # If issues persist, try setting this to 0.
    report_to="none"
)
Explanation of Parameters:
- output_dir: The directory where the trained model and checkpoints are saved.
- overwrite_output_dir: If True, existing files in the output directory are overwritten.
- per_device_train_batch_size=1: The number of examples processed at once per GPU/CPU.
- gradient_accumulation_steps=16: Accumulates gradients over multiple batches before each update, giving an effective batch size of 1 × 16 = 16 (see the sketch after this list).
- learning_rate=5e-5: The rate at which the model learns.
- num_train_epochs=3: Number of passes through the full dataset.
- save_steps=500: How often a checkpoint is saved.
- save_total_limit=2: Maximum number of checkpoints kept on disk.
- logging_dir: Directory for training metrics.
- fp16=True: Uses 16-bit floating point precision.
- evaluation_strategy="no": No evaluation during training.
- logging_steps=100: Logging frequency.
- dataloader_num_workers=4: Number of parallel data-loading processes.
- report_to="none": No external logging.
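To make gradient accumulation concrete, here is a minimal sketch of what the Trainer effectively does internally; model, optimizer, and dataloader are assumed placeholders, and the model is assumed to return a Hugging Face-style output with a .loss attribute:

ACCUM_STEPS = 16  # matches gradient_accumulation_steps above

for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    (loss / ACCUM_STEPS).backward()   # scale so accumulated gradients average out
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()              # one weight update per 16 micro-batches
        optimizer.zero_grad()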
Detailed Explanation of the Learning Rate
The learning rate is a crucial hyperparameter in AI training:
Analogy
- Too high a learning rate (0.1): like trying to learn to ride a bike too fast
- Too low a learning rate (1e-6): like learning overly cautiously
- 5e-5 (0.00005): the "sweet spot" for fine-tuning
Technical Process
- The model makes a prediction
- The error (loss) is computed
- Parameters are adjusted in proportion to the learning rate
- Simplified example: gradient = 10, learning rate = 5e-5
- Parameter update = 10 * 5e-5 = 0.0005 (see the sketch below)
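A minimal sketch of that update rule in plain PyTorch (plain gradient descent, ignoring optimizers like Adam that rescale the step):

import torch

lr = 5e-5
param = torch.tensor([1.0], requires_grad=True)

loss = (param * 10).sum()     # toy loss whose gradient w.r.t. param is 10
loss.backward()

with torch.no_grad():
    param -= lr * param.grad  # update = 10 * 5e-5 = 0.0005
print(param)                  # tensor([0.9995], requires_grad=True)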
Why 5e-5 for Fine-tuning
- The base model is already trained
- Prevents useful knowledge from being overwritten
- Empirically proven effective
Problems with Wrong Learning Rates
- Too high: no convergence, wild loss oscillations
- Too low: slow training, getting stuck in local minima
SQL Stored Procedure Optimization
Learning Rate Range Test
from torch.optim import Adam
from torch.optim.lr_scheduler import ExponentialLR

def find_optimal_lr(model, train_dataloader, total_iters=100):
    # Sweep the learning rate from min_lr to max_lr while recording the loss;
    # the region of steepest loss descent suggests a good learning rate.
    min_lr = 1e-7
    max_lr = 1
    optimizer = Adam(model.parameters(), lr=min_lr)
    # Note: LinearLR rejects end_factor > 1, so an exponential ramp is used
    # to cover the seven orders of magnitude between min_lr and max_lr.
    scheduler = ExponentialLR(optimizer, gamma=(max_lr / min_lr) ** (1 / total_iters))
    lrs = []
    losses = []
    for i, batch in enumerate(train_dataloader):
        if i >= total_iters:
            break
        optimizer.zero_grad()
        loss = model(**batch).loss  # assumes a Hugging Face-style model output
        loss.backward()
        lrs.append(optimizer.param_groups[0]['lr'])
        losses.append(loss.item())
        optimizer.step()
        scheduler.step()
    return lrs, losses
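A possible way to use the sweep, assuming matplotlib is available and model and train_dataloader come from the surrounding setup:

import matplotlib.pyplot as plt

lrs, losses = find_optimal_lr(model, train_dataloader)
plt.plot(lrs, losses)
plt.xscale('log')            # the LRs span many orders of magnitude
plt.xlabel('learning rate')
plt.ylabel('loss')
plt.show()                   # pick an LR just before the loss curve blows up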
SQL-Specific Training Configuration
training_args = TrainingArguments(
    output_dir="sql_model_output",
    learning_rate=1e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=100
)
from transformers import TrainerCallback

def monitor_training(trainer):
    # Collect loss and learning rate from the trainer's log events.
    losses = []
    lrs = []

    class LoggingCallback(TrainerCallback):
        def on_log(self, args, state, control, logs=None, **kwargs):
            if logs and 'loss' in logs:
                losses.append(logs['loss'])
            if logs and 'learning_rate' in logs:
                lrs.append(logs['learning_rate'])

    # Trainer.add_callback expects a TrainerCallback, not a bare function.
    trainer.add_callback(LoggingCallback())
    return losses, lrs
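Usage sketch, assuming trainer is an already-constructed transformers.Trainer:

losses, lrs = monitor_training(trainer)
trainer.train()
# The lists fill up during training via the callback; inspect them afterwards.
print(f"final loss: {losses[-1]:.4f}, final lr: {lrs[-1]:.2e}")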
SQL Performance Evaluation
def evaluate_sql_performance(model, test_cases):
    # is_valid_sql and matches_expected_procedure are assumed helpers:
    # a syntax check and a semantic comparison against the reference output.
    success_count = 0
    syntax_errors = 0
    for case in test_cases:
        prediction = model.generate(case['input'])
        if is_valid_sql(prediction):
            if matches_expected_procedure(prediction, case['expected']):
                success_count += 1
        else:
            syntax_errors += 1
    return {
        'success_rate': success_count / len(test_cases),
        'syntax_errors': syntax_errors / len(test_cases)
    }
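One possible implementation of the assumed is_valid_sql helper, sketched with the sqlglot parser; the "tsql" dialect name is an assumption for SQL Server syntax:

import sqlglot
from sqlglot.errors import ParseError

def is_valid_sql(query):
    # Parse the statement with sqlglot's T-SQL dialect; a ParseError
    # indicates the generated text is not syntactically valid SQL.
    try:
        sqlglot.parse_one(query, read="tsql")
        return True
    except ParseError:
        return False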
Specific Stored Procedure Training
Example Procedure
EXEC sp_api_modal_table
    @tmptable=N'#test',
    @print=1,
    @excel=1,
    @Orderby=N'Order by 1,2'
Training Data Examples
training_examples = [
    {
        "input": "Show sales data in a modal window, sorted by customer name and date",
        "output": "EXEC sp_api_modal_table @tmptable='#verkoop_tmp', @print=1, @excel=1, @Orderby='Order by 1,2'"
    },
    {
        "input": "Export the staff list to Excel, sorted by department",
        "output": "EXEC sp_api_modal_table @tmptable='#personeel_tmp', @print=0, @excel=1, @Orderby='Order by 1'"
    }
]
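A minimal sketch of turning these pairs into a tokenized dataset for causal-LM fine-tuning; the prompt template, the datasets/transformers usage, and the checkpoint name are assumptions, not part of the original setup:

from datasets import Dataset
from transformers import AutoTokenizer

# Assumed checkpoint; substitute the actual base model being fine-tuned.
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-llm-7b-base")

def to_text(example):
    # Concatenate instruction and SQL answer into one training string.
    return {"text": f"### Instruction:\n{example['input']}\n### Response:\n{example['output']}"}

dataset = Dataset.from_list(training_examples).map(to_text)
tokenized = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                        remove_columns=dataset.column_names)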
Procedure-Specific Evaluation
def evaluate_modal_table_usage(model, test_cases):
    def check_parameters(prediction):
        # prediction is expected as a dict of parameter name -> value,
        # as produced by the (assumed) parse_sql_params helper.
        required_params = {
            '@tmptable': str,
            '@print': [0, 1],
            '@excel': [0, 1],
            '@Orderby': str
        }
        for param, valid_values in required_params.items():
            if param not in prediction:
                return False
            if param in ['@print', '@excel']:
                if prediction[param] not in valid_values:
                    return False
        return True

    correct = 0
    total = len(test_cases)
    for case in test_cases:
        prediction = model.generate(case['input'])
        if (
            prediction.startswith("EXEC sp_api_modal_table") and
            check_parameters(parse_sql_params(prediction))
        ):
            correct += 1
    return correct / total
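A possible regex-based sketch of the assumed parse_sql_params helper; it only handles the simple name=value form used in the examples above:

import re

def parse_sql_params(sql):
    # Extract @name=value pairs; integer-looking values become ints so
    # @print/@excel can be compared against [0, 1] in check_parameters.
    params = {}
    for name, value in re.findall(r"(@\w+)\s*=\s*(N?'[^']*'|\d+)", sql):
        params[name] = int(value) if value.isdigit() else value.strip("N'")
    return params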
Adjusted Training Arguments
training_args = TrainingArguments(
    output_dir="modal_table_model",
    learning_rate=2e-5,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="parameter_accuracy"
)
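metric_for_best_model="parameter_accuracy" only works if the Trainer's compute_metrics returns a key with that name. A hedged sketch; evaluate_parameter_accuracy is a hypothetical helper, and model/train_dataset/eval_dataset are assumed from the surrounding setup:

from transformers import Trainer

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # evaluate_parameter_accuracy is a hypothetical helper that scores the
    # generated sp_api_modal_table parameters against the reference labels.
    return {"parameter_accuracy": evaluate_parameter_accuracy(predictions, labels)}

trainer = Trainer(model=model, args=training_args,
                  train_dataset=train_dataset, eval_dataset=eval_dataset,
                  compute_metrics=compute_metrics)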
Parameter Monitoring
def monitor_parameter_learning(model, eval_dataset):
    # Track per-parameter accuracy across epochs; evaluate_parameter and
    # adjust_parameter_specific_learning_rate are assumed helpers.
    parameter_accuracy = {
        'tmptable': [],
        'print': [],
        'excel': [],
        'orderby': []
    }
    for epoch in range(training_args.num_train_epochs):
        predictions = model.predict(eval_dataset)
        for param in parameter_accuracy:
            accuracy = evaluate_parameter(predictions, param)
            parameter_accuracy[param].append(accuracy)
        # If the Orderby clause lags behind, bump its learning rate.
        if min(parameter_accuracy['orderby']) < 0.8:
            adjust_parameter_specific_learning_rate('orderby', 3e-5)
    return parameter_accuracy
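One way the assumed evaluate_parameter helper could look, reusing the parse_sql_params sketch from above; the shape of predictions as (generated, expected) pairs is an assumption:

def evaluate_parameter(predictions, param):
    # predictions: list of (generated_sql, expected_sql) pairs (assumed shape);
    # returns the fraction where the named parameter's value matches,
    # compared case-insensitively ('orderby' vs '@Orderby').
    hits = 0
    for generated, expected in predictions:
        gen = {k.lower(): v for k, v in parse_sql_params(generated).items()}
        exp = {k.lower(): v for k, v in parse_sql_params(expected).items()}
        key = f"@{param}"
        if gen.get(key) == exp.get(key):
            hits += 1
    return hits / len(predictions)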
Validation
import re

def validate_modal_table_output(prediction):
    # re.search rather than re.match: the parameters follow "EXEC ...",
    # so they never sit at the start of the string.
    if not re.search(r"@tmptable=N'#\w+'", prediction):
        return False
    orderby_pattern = re.compile(r"@Orderby=N'Order by [\d,\s]+'")
    if not orderby_pattern.search(prediction):
        return False
    if not all(param in prediction for param in ['@print=1', '@excel=1']):
        return False
    return True
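Quick check against the example call from above:

sample = "EXEC sp_api_modal_table @tmptable=N'#test', @print=1, @excel=1, @Orderby=N'Order by 1,2'"
print(validate_modal_table_output(sample))                               # True
print(validate_modal_table_output(sample.replace("@excel=1", "@excel=0")))  # False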
Recommended Learning Rate Strategy
- Start with a learning rate of 2e-5
- Increase to 3e-5 once the Orderby syntax comes out correct
- Decrease to 1e-5 when parameter-order errors occur (the full staged schedule is sketched below)
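A minimal sketch of applying this strategy by mutating the optimizer's parameter groups between evaluations; the orderby_ok and order_errors signals are hypothetical evaluation results, not part of the original setup:

def apply_lr_strategy(optimizer, orderby_ok, order_errors):
    # Staged schedule from the list above: 2e-5 baseline, 3e-5 once the
    # Orderby syntax is reliably correct, 1e-5 on parameter-order errors.
    if order_errors:
        new_lr = 1e-5
    elif orderby_ok:
        new_lr = 3e-5
    else:
        new_lr = 2e-5
    for group in optimizer.param_groups:
        group['lr'] = new_lr
    return new_lr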