AI Training Conversation - Learning Rate and SQL Stored Procedures

Initial Training Parameters Explained

The following training parameters were discussed:

training_args = TrainingArguments(
    output_dir=r"C:\Users\luca\Desktop\Deepseek\FineTune\Deepseek-R1-FineTuned",
    overwrite_output_dir=True,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=5e-5,
    num_train_epochs=3,
    save_steps=500,
    save_total_limit=2,
    logging_dir=r"./logs",
    fp16=True,
    evaluation_strategy="no",
    logging_steps=100,
    dataloader_num_workers=4,  # If issues persist, try setting this to 0.
    report_to="none"
)

Parameter Explanations:

  1. output_dir: The directory where the trained model and checkpoints are saved.
  2. overwrite_output_dir: If True, existing files in the output directory are overwritten.
  3. per_device_train_batch_size=1: The number of examples processed at once per GPU/CPU.
  4. gradient_accumulation_steps=16: Accumulates gradients over multiple batches before each optimizer step (see the sketch after this list).
  5. learning_rate=5e-5: The rate at which the model learns.
  6. num_train_epochs=3: The number of passes through the full dataset.
  7. save_steps=500: How often a checkpoint is saved.
  8. save_total_limit=2: The maximum number of checkpoints kept on disk.
  9. logging_dir: Directory for training metrics.
  10. fp16=True: Uses 16-bit floating-point precision.
  11. evaluation_strategy="no": No evaluation during training.
  12. logging_steps=100: Logging frequency.
  13. dataloader_num_workers=4: Number of parallel data-loading processes.
  14. report_to="none": No external logging integrations.
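
Gradient accumulation trades memory for optimizer steps: the weights are only updated after the accumulated batches, so the effective batch size is the product below. A minimal sketch of the arithmetic (the variable names mirror the arguments above; num_devices is an assumption for a single-GPU setup):

# Effective batch size under gradient accumulation (illustrative).
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_devices = 1  # assumption: one GPU

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
print(effective_batch_size)  # 16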

Detailed Explanation of the Learning Rate

The learning rate is a crucial hyperparameter in AI training:

Analogy

  • Too high a learning rate (0.1): like trying to learn to ride a bike at full speed right away
  • Too low a learning rate (1e-6): like learning so cautiously you barely make progress
  • 5e-5 (0.00005): the "sweet spot" for fine-tuning

Technical Process

  1. The model makes a prediction
  2. The error (loss) is computed
  3. Parameters are adjusted, scaled by the learning rate
    • Simplified example: gradient = 10, learning rate = 5e-5 (note the update uses the gradient of the loss, not the loss value itself)
    • Parameter update = 10 * 5e-5 = 0.0005 (see the sketch below)
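
What step 3 does, as a minimal gradient-descent sketch (the values are the same illustrative ones as above):

# One SGD-style parameter update (illustrative scalar values).
lr = 5e-5
param = 0.7    # hypothetical current weight
grad = 10.0    # gradient of the loss w.r.t. this weight
param = param - lr * grad   # 0.7 - 0.0005 = 0.6995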

Why 5e-5 for Fine-tuning

  • The base model is already trained
  • Prevents useful knowledge from being overwritten
  • Empirically proven effective

Problems with the Wrong Learning Rate

  • Too high: no convergence, wild swings in the loss
  • Too low: slow training, risk of getting stuck in local minima

SQL Stored Procedure Optimization

Learning Rate Range Test

from torch.optim import Adam
from torch.optim.lr_scheduler import ExponentialLR

def find_optimal_lr(model, train_dataloader, num_steps=100):
    min_lr = 1e-7
    max_lr = 1.0

    optimizer = Adam(model.parameters(), lr=min_lr)
    # Grow the LR exponentially from min_lr to max_lr over num_steps batches.
    # (LinearLR is unsuitable here: PyTorch requires its end_factor <= 1.0.)
    gamma = (max_lr / min_lr) ** (1.0 / num_steps)
    scheduler = ExponentialLR(optimizer, gamma=gamma)

    lrs = []
    losses = []

    for step, batch in enumerate(train_dataloader):
        if step >= num_steps:
            break
        optimizer.zero_grad()
        loss = model(**batch).loss  # assumes a Hugging Face-style batch dict
        loss.backward()

        lrs.append(optimizer.param_groups[0]['lr'])
        losses.append(loss.item())

        optimizer.step()
        scheduler.step()

    return lrs, losses
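
One possible way to inspect the result is to plot loss against learning rate on a log scale and pick a value somewhat below the point where the loss starts to blow up (matplotlib is an assumption here):

import matplotlib.pyplot as plt

lrs, losses = find_optimal_lr(model, train_dataloader)
plt.plot(lrs, losses)
plt.xscale('log')             # LRs span many orders of magnitude
plt.xlabel('learning rate')
plt.ylabel('loss')
plt.show()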

SQL-Specific Training Configuration

training_args = TrainingArguments(
    output_dir="sql_model_output",
    learning_rate=1e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    evaluation_strategy="steps",
    eval_steps=100
)

from transformers import TrainerCallback

class LossLrCallback(TrainerCallback):
    # Collects the loss and learning rate each time the Trainer logs.
    def __init__(self):
        self.losses = []
        self.lrs = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and 'loss' in logs:
            self.losses.append(logs['loss'])
            self.lrs.append(logs.get('learning_rate'))

def monitor_training(trainer):
    # Trainer.add_callback expects a TrainerCallback instance, not a bare function.
    callback = LossLrCallback()
    trainer.add_callback(callback)
    return callback
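
A possible usage, assuming a fully configured Trainer instance named trainer:

callback = monitor_training(trainer)
trainer.train()
# Inspect the most recently logged values after training.
print(callback.losses[-3:])
print(callback.lrs[-3:])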

SQL Performance Evaluation

def evaluate_sql_performance(model, test_cases):
    successes = 0
    syntax_errors = 0

    for case in test_cases:
        prediction = model.generate(case['input'])

        # is_valid_sql and matches_expected_procedure are helpers that must
        # be supplied; a sketch of is_valid_sql follows below.
        if is_valid_sql(prediction):
            if matches_expected_procedure(prediction, case['expected']):
                successes += 1
        else:
            syntax_errors += 1

    return {
        'success_rate': successes / len(test_cases),
        'syntax_error_rate': syntax_errors / len(test_cases)
    }
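
is_valid_sql is left undefined above. A naive placeholder sketch follows; a real implementation would hand the statement to SQL Server (e.g. via SET PARSEONLY ON) or to a proper T-SQL parser:

import re

def is_valid_sql(prediction):
    # Naive placeholder: only checks that the output looks like an EXEC call.
    return bool(re.match(r"^\s*EXEC\s+\w+", prediction, re.IGNORECASE))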

Specific Stored Procedure Training

Example Procedure

sp_api_modal_table 
    @tmptable=N'#test', 
    @print=1, 
    @excel=1, 
    @Orderby=N'Order by 1,2'

Training Data Examples

training_examples = [
    {
        "input": "Show sales data in a modal window, sorted by customer name and date",
        "output": "EXEC sp_api_modal_table @tmptable='#verkoop_tmp', @print=1, @excel=1, @Orderby='Order by 1,2'"
    },
    {
        "input": "Export the staff list to Excel, sorted by department",
        "output": "EXEC sp_api_modal_table @tmptable='#personeel_tmp', @print=0, @excel=1, @Orderby='Order by 1'"
    }
]
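
These examples still need to be rendered into training text before tokenization. A minimal sketch, assuming a plain instruction/response template (the template itself is an assumption, not part of the original setup):

def format_example(example):
    # Render one input/output pair into a single training string.
    return (f"### Instruction:\n{example['input']}\n"
            f"### Response:\n{example['output']}")

texts = [format_example(ex) for ex in training_examples]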

Procedure-Specific Evaluation

def evaluate_modal_table_usage(model, test_cases):
    def check_parameters(params):
        # params is expected to be a dict of parameter name -> parsed value.
        required_params = {
            '@tmptable': str,
            '@print': [0, 1],
            '@excel': [0, 1],
            '@Orderby': str
        }

        for param, valid in required_params.items():
            if param not in params:
                return False
            if param in ('@print', '@excel'):
                if params[param] not in valid:
                    return False
            elif not isinstance(params[param], valid):
                return False
        return True

    correct = 0
    total = len(test_cases)

    for case in test_cases:
        prediction = model.generate(case['input'])
        # parse_sql_params must turn the EXEC statement into a dict;
        # a sketch follows below.
        if (
            prediction.startswith("EXEC sp_api_modal_table") and
            check_parameters(parse_sql_params(prediction))
        ):
            correct += 1

    return correct / total
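
parse_sql_params is assumed above. A hypothetical sketch that extracts @name=value pairs from the EXEC statement:

import re

def parse_sql_params(prediction):
    # Hypothetical helper: maps parameter names to parsed values.
    params = {}
    for name, value in re.findall(r"(@\w+)\s*=\s*(N?'[^']*'|\d+)", prediction):
        if value.isdigit():
            params[name] = int(value)          # e.g. @print=1 -> 1
        elif value.startswith("N'"):
            params[name] = value[2:-1]         # strip N'...' quoting
        else:
            params[name] = value[1:-1]         # strip '...' quoting
    return params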

Adjusted Training Arguments

training_args = TrainingArguments(
    output_dir="modal_table_model",
    learning_rate=2e-5,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    evaluation_strategy="steps",
    eval_steps=50,
    save_steps=100,
    load_best_model_at_end=True,
    # Requires a compute_metrics function that reports this key during eval.
    metric_for_best_model="parameter_accuracy"
)
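
metric_for_best_model only works if the Trainer is given a compute_metrics function that emits that key. A minimal sketch, assuming a hypothetical decode_predictions helper and a parameter check like check_parameters above (which would need to be lifted to module level):

def compute_metrics(eval_pred):
    # decode_predictions is a hypothetical helper that turns the raw
    # predictions (logits or token ids) back into SQL text.
    predictions = decode_predictions(eval_pred)
    scores = [check_parameters(parse_sql_params(p)) for p in predictions]
    return {"parameter_accuracy": sum(scores) / len(scores)}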

Parameter Monitoring

def monitor_parameter_learning(model, eval_dataset):
    parameter_accuracy = {
        'tmptable': [],
        'print': [],
        'excel': [],
        'orderby': []
    }

    # Cast defensively: num_train_epochs may be stored as a float.
    for epoch in range(int(training_args.num_train_epochs)):
        predictions = model.predict(eval_dataset)
        # evaluate_parameter and adjust_parameter_specific_learning_rate are
        # project-specific helpers that must be supplied separately.
        for param in parameter_accuracy:
            accuracy = evaluate_parameter(predictions, param)
            parameter_accuracy[param].append(accuracy)

        if min(parameter_accuracy['orderby']) < 0.8:
            adjust_parameter_specific_learning_rate('orderby', 3e-5)

    return parameter_accuracy

Validation

import re

def validate_modal_table_output(prediction):
    # re.search instead of re.match: @tmptable does not appear at the start
    # of the statement (it follows "EXEC sp_api_modal_table").
    if not re.search(r"@tmptable=N'#\w+'", prediction):
        return False

    orderby_pattern = re.compile(r"@Orderby=N'Order by [\d,\s]+'")
    if not orderby_pattern.search(prediction):
        return False

    # Note: this deliberately requires @print=1 and @excel=1 exactly.
    if not all(param in prediction for param in ['@print=1', '@excel=1']):
        return False

    return True
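
A quick sanity check against the example call from earlier:

example = ("EXEC sp_api_modal_table @tmptable=N'#test', @print=1, "
           "@excel=1, @Orderby=N'Order by 1,2'")
print(validate_modal_table_output(example))  # True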

Recommended Learning Rate Strategy

  1. Start with a learning rate of 2e-5
  2. Increase to 3e-5 once the orderby syntax is consistently correct
  3. Decrease to 1e-5 if parameter-ordering errors appear (a sketch for adjusting the rate manually follows below)
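
Switching rates mid-training can be done by editing the optimizer's parameter groups directly; a minimal sketch (detecting when to switch is left to the training loop):

def set_learning_rate(optimizer, new_lr):
    # Overwrite the learning rate of every parameter group in place.
    for group in optimizer.param_groups:
        group['lr'] = new_lr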