Understanding the AI Code Generator “Agentless” by Running It Step by Step

Masato Naka
6 min read · Jan 29, 2025


# Overview

A while ago, it was reported that Agentless achieved high performance on SWE-bench, so I decided to take a closer look at how it works. In this post, we will explore the core features of Agentless and run it end to end, following the steps outlined in the article Agentless: A New Approach to Using LLMs.

https://www.swebench.com/

Agentless operates through three main steps:

  1. Localization: Uses LLMs and embeddings to identify relevant files and positions (File-level → Element-level → Line-level).
  2. Repair: Generates multiple patches per issue, consisting of line positions and diff-based changes.
  3. Validation and Selection: Executes regression tests (existing tests) and generates reproduction tests for the issue, selecting the final patch based on these results.

# Steps

## 1. Preparation

### Repo Setup

First, clone the repository and install the necessary dependencies.

gh repo clone OpenAutoCoder/agentless
cd agentless
pip install -r requirements.txt

### Repo Structure

Download the repo structure from https://github.com/OpenAutoCoder/Agentless/releases/tag/v1.5.0 and extract it. If the repo structure is not downloaded, the structure generation script will run, which takes time.

### Environment Variable Configuration

Set the required environment variables:

export OPENAI_API_KEY=xxxxxx
export PROJECT_FILE_LOC=~/Downloads/repo_structure/repo_structures
export PYTHONPATH=$PYTHONPATH:$(pwd)

The path ~/Downloads/repo_structure/repo_structures should point to the extracted folder from the downloaded repo structure.
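
Before launching any step, it can help to confirm that these variables are in place. The following is a minimal sketch and not part of Agentless; `check_environment` and `REQUIRED_VARS` are hypothetical names introduced here, and the check covers only the two variables the commands in this post rely on.

```python
import os
from pathlib import Path

# Variables exported in the setup step above.
REQUIRED_VARS = ("OPENAI_API_KEY", "PROJECT_FILE_LOC")


def check_environment(env=None):
    """Return a list of problems; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")
    loc = env.get("PROJECT_FILE_LOC")
    # The repo structure path must be an existing directory.
    if loc and not Path(loc).expanduser().is_dir():
        problems.append(f"PROJECT_FILE_LOC does not point to a directory: {loc}")
    return problems


for problem in check_environment():
    print("WARNING:", problem)
```

If nothing is printed, both variables are set and the repo structure directory exists.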

## 2. Localization

The localization step identifies the locations that need to change in order to resolve the target issue.

In this post, we will use a specific target ID (e.g., django__django-10914) and start by retrieving the relevant files.

#### Retrieve Relevant Files at File Level

Run localization for related files:

python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level --num_threads 10 --skip_existing --target_id=django__django-10914

#### Identify Irrelevant Files

Run localization for irrelevant files:

python agentless/fl/localize.py --file_level --output_folder results/swe-bench-lite/file_level_irrelevant --num_threads 10 --skip_existing --target_id=django__django-10914 --irrelevant

#### Retrieve Relevant Files from Embeddings

In addition to the LLM-based retrieval above, Agentless also retrieves relevant files using embeddings.

python agentless/fl/retrieve.py --index_type simple \
--filter_type given_files \
--filter_file results/swe-bench-lite/file_level_irrelevant/loc_outputs.jsonl \
--output_folder results/swe-bench-lite/retrievel_embedding \
--persist_dir embedding/swe-bench_simple \
--num_threads 10 \
--target_id=django__django-10914
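
To make the idea of embedding-based retrieval concrete, here is a toy sketch that ranks files by cosine similarity to an issue vector while skipping files already judged irrelevant. It is an illustration only, not the logic of retrieve.py: the vectors are made up, `docs/conf.py` is an illustrative path, and `cosine` and `rank_files` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_files(query_vec, file_vecs, excluded=()):
    """Rank file paths by similarity to the query, skipping excluded files."""
    candidates = {f: v for f, v in file_vecs.items() if f not in excluded}
    return sorted(candidates, key=lambda f: cosine(query_vec, candidates[f]), reverse=True)


# Made-up 3-dimensional "embeddings"; real ones have far more dimensions.
file_vecs = {
    "django/core/files/storage.py": [0.9, 0.1, 0.0],
    "django/conf/global_settings.py": [0.6, 0.4, 0.0],
    "docs/conf.py": [0.0, 0.1, 0.9],
}
issue_vec = [1.0, 0.2, 0.0]

ranked = rank_files(issue_vec, file_vecs, excluded={"docs/conf.py"})
print(ranked)
```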

#### Combine Results

Merge the top-N results from the LLM and from the embeddings:

python agentless/fl/combine.py  --retrieval_loc_file results/swe-bench-lite/retrievel_embedding/retrieve_locs.jsonl \
--model_loc_file results/swe-bench-lite/file_level/loc_outputs.jsonl \
--top_n 3 \
--output_folder results/swe-bench-lite/file_level_combined
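
One plausible way to picture this merge is an order-preserving union of the top-N entries of each list. The sketch below assumes exactly that; it is not taken from combine.py, `combine_locations` is a hypothetical helper, and the retrieved list is illustrative data.

```python
def combine_locations(model_files, retrieved_files, top_n=3):
    """Union of the top-N files from each source, keeping first-seen order."""
    combined = []
    for path in model_files[:top_n] + retrieved_files[:top_n]:
        if path not in combined:
            combined.append(path)
    return combined


# LLM-based result (files that appear later in this post).
model_files = [
    "django/core/files/storage.py",
    "django/conf/global_settings.py",
    "django/core/files/uploadhandler.py",
]
# Embedding-based result (illustrative only).
retrieved_files = [
    "django/conf/global_settings.py",
    "django/core/files/uploadedfile.py",
]

combined = combine_locations(model_files, retrieved_files, top_n=3)
print(combined)
```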

#### Get Element-Level Relevance

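Using the combined locations, retrieve the relevant elements (classes, functions, and variables) within those files. The command below follows the upstream Agentless README, with the target ID added as in the other steps:

python agentless/fl/localize.py --related_level \
--output_folder results/swe-bench-lite/related_elements \
--top_n 3 \
--compress_assign \
--compress \
--start_file results/swe-bench-lite/file_level_combined/combined_locs.jsonl \
--num_threads 10 \
--skip_existing \
--target_id=django__django-10914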

The prompt used in this step is:

Please provide the complete set of locations as either a class name, a function name, or a variable name.
Note that if you include a class, you do not need to list its specific methods.
You can include either the entire class or don't include the class name and instead include specific methods in the class.
### Examples:
```
full_path1/file1.py
function: my_function_1
class: MyClass1
function: MyClass2.my_method

full_path2/file2.py
variable: my_var
function: MyClass3.my_method

full_path3/file3.py
function: my_function_2
function: my_function_3
function: MyClass4.my_method_1
class: MyClass5
```

Return just the locations wrapped with ```.

Example elements:

{"django/core/files/storage.py": ["class: FileSystemStorage"], "django/conf/global_settings.py": ["variable: FILE_UPLOAD_PERMISSIONS"], "django/core/files/uploadhandler.py": [""]}
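
The model's answer arrives as plain text in the format requested by the prompt, so it has to be parsed back into a per-file structure. The following is a minimal sketch of such a parser, assuming the answer follows the prompt's layout exactly; `parse_locations` is a hypothetical helper, not the function Agentless uses, and its output shape only approximates the example above.

```python
KINDS = ("class", "function", "variable", "line")


def parse_locations(raw):
    """Parse a ```-wrapped location answer into {file_path: [location, ...]}."""
    result = {}
    current = None
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("```"):
            continue  # skip blank lines and the wrapping fence
        if line.split(":", 1)[0] in KINDS:
            # A location entry belongs to the most recent file path.
            if current is not None:
                result[current].append(line)
        else:
            # Anything else is treated as a file path that opens a new group.
            current = line
            result.setdefault(current, [])
    return result


raw = (
    "```\n"
    "django/core/files/storage.py\n"
    "class: FileSystemStorage\n"
    "\n"
    "django/conf/global_settings.py\n"
    "variable: FILE_UPLOAD_PERMISSIONS\n"
    "```"
)
parsed = parse_locations(raw)
print(parsed)
```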

#### Generate Line-Level Change Samples

Identify the line numbers to edit within the relevant elements extracted in the previous step. The --num_samples option sets how many samples are generated; these samples are compared in the subsequent steps:

python agentless/fl/localize.py --fine_grain_line_level \
--output_folder results/swe-bench-lite/edit_location_samples \
--top_n 3 \
--compress \
--temperature 0.8 \
--num_samples 4 \
--start_file results/swe-bench-lite/related_elements/loc_outputs.jsonl \
--num_threads 10 \
--skip_existing \
--target_id=django__django-10914

Part of the prompt:

Please provide the class name, function or method name, or the exact line numbers that need to be edited.
The possible location outputs should be either "class", "function" or "line".

### Examples:
```
full_path1/file1.py
line: 10
class: MyClass1
line: 51

full_path2/file2.py
function: MyClass2.my_method
line: 12

full_path3/file3.py
function: my_function
line: 24
line: 156
```

Return just the location(s) wrapped with ```.

Example result:

["```\ndjango/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\nfull_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n```", "```\ndjango/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n```", "```\ndjango/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n```"]
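
A quick way to see how much the four samples agree is to count how many of them mention each (file, line) pair. The sketch below does this for the example result above. It is an inspection aid, not part of Agentless; `edit_lines` is a hypothetical helper, and stripping the `full_pathN/` prefix is my own assumption that the second sample echoed the placeholder from the prompt.

```python
import re
from collections import Counter

samples = [
    "```\ndjango/core/files/storage.py\nline: 260\nline: 217\n\ndjango/conf/global_settings.py\nline: 307\n```",
    "```\nfull_path1/django/core/files/storage.py\nline: 260\nline: 284\n\nfull_path2/django/conf/global_settings.py\nline: 307\n```",
    "```\ndjango/core/files/storage.py\nline: 260\n\ndjango/conf/global_settings.py\nline: 307\n```",
    "```\ndjango/conf/global_settings.py\nline: 307\n\ndjango/core/files/storage.py\nline: 260\n```",
]


def edit_lines(sample):
    """Yield (file_path, line_number) pairs from one sampled answer."""
    current = None
    for line in sample.splitlines():
        line = line.strip()
        if not line or line.startswith("```"):
            continue
        if line.startswith("line:"):
            yield current, int(line.split(":", 1)[1])
        elif ":" not in line:
            # Assumption: a leading "full_pathN/" is an echo of the prompt placeholder.
            current = re.sub(r"^full_path\d+/", "", line)


# Count each location at most once per sample.
votes = Counter(loc for sample in samples for loc in set(edit_lines(sample)))
for (path, line_no), count in votes.most_common():
    print(f"{count}/{len(samples)}  {path}:{line_no}")
```

In this example, two locations appear in all four samples, while lines 217 and 284 of storage.py each appear in only one.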

loc_outputs.jsonl format:

  1. instance_id: task ID of the issue
  2. found_files: list of files localized by the model
  3. additional_artifact_loc_file: raw output of the model during file-level localization
  4. file_traj: trajectory of the model during file-level localization (e.g., # of tokens)
  5. found_related_locs: dict of relevant code elements localized by the model
  6. additional_artifact_loc_related: raw output of the model during relevant-code-level localization
  7. related_loc_traj: trajectory of the model during relevant-code-level localization
  8. found_edit_locs: dict of edit locations localized by the model
  9. additional_artifact_loc_edit_location: raw output of the model during edit-location-level localization
  10. edit_loc_traj: trajectory of the model during edit-location-level localization
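
To inspect these fields after a run, the file can be read one JSON object per line. This is a minimal sketch that assumes only the field names listed above; `load_jsonl` and `summarize` are hypothetical helpers, and the path is the output folder used in the previous command.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(record):
    """Keep only the localization results, dropping raw outputs and trajectories."""
    return {
        "instance_id": record.get("instance_id"),
        "found_files": record.get("found_files", []),
        "found_related_locs": record.get("found_related_locs", {}),
        "found_edit_locs": record.get("found_edit_locs", {}),
    }


path = Path("results/swe-bench-lite/edit_location_samples/loc_outputs.jsonl")
if path.exists():
    for record in load_jsonl(path):
        print(json.dumps(summarize(record), indent=2))
```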

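#### Split Edit Locations by Sample

Separate the sampled edit locations into one file per sample, so that each sample can be repaired independently in the next step: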

python agentless/fl/localize.py --merge \
--output_folder results/swe-bench-lite/edit_location_individual \
--top_n 3 \
--num_samples 4 \
--start_file results/swe-bench-lite/edit_location_samples/loc_outputs.jsonl \
--target_id=django__django-10914

Resulting directory structure:

tree results/swe-bench-lite/edit_location_individual
results/swe-bench-lite/edit_location_individual
├── args.json
├── loc_merged_0-0_outputs.jsonl
├── loc_merged_1-1_outputs.jsonl
├── loc_merged_2-2_outputs.jsonl
├── loc_merged_3-3_outputs.jsonl
└── localization_logs

2 directories, 5 files

## 3. Repair

#### Generate Patches

Generate multiple candidate patches per issue, from which the final patch is later chosen by voting. The first run writes its output to results/swe-bench-lite/repair_sample_1:

python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_0-0_outputs.jsonl \
--output_folder results/swe-bench-lite/repair_sample_1 \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2 \
--target_id=django__django-10914

Run for the remaining three samples:

for i in {1..3}; do
python agentless/repair/repair.py --loc_file results/swe-bench-lite/edit_location_individual/loc_merged_${i}-${i}_outputs.jsonl \
--output_folder results/swe-bench-lite/repair_sample_$((i+1)) \
--loc_interval \
--top_n=3 \
--context_window=10 \
--max_samples 10 \
--cot \
--diff_format \
--gen_and_process \
--num_threads 2 \
--target_id=django__django-10914
done
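
The mapping between the location files and the output folders is easy to get wrong: loc_merged_0-0 feeds repair_sample_1, loc_merged_1-1 feeds repair_sample_2, and so on. The sketch below builds all four commands from one function so the mapping is explicit. It only assembles and prints the argument lists, using the flags shown above; `repair_command` is a hypothetical helper.

```python
import shlex

TARGET_ID = "django__django-10914"
BASE = "results/swe-bench-lite"


def repair_command(sample_index):
    """Build the repair command for one edit-location sample (0-based index)."""
    return [
        "python", "agentless/repair/repair.py",
        "--loc_file",
        f"{BASE}/edit_location_individual/loc_merged_{sample_index}-{sample_index}_outputs.jsonl",
        "--output_folder", f"{BASE}/repair_sample_{sample_index + 1}",
        "--loc_interval",
        "--top_n=3",
        "--context_window=10",
        "--max_samples", "10",
        "--cot",
        "--diff_format",
        "--gen_and_process",
        "--num_threads", "2",
        f"--target_id={TARGET_ID}",
    ]


commands = [repair_command(i) for i in range(4)]
for cmd in commands:
    print(shlex.join(cmd))
```

To actually run them, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.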

## 4. Patch Validation and Selection

The last step is to evaluate the patches and select the final one. The evaluation consists of regression tests (running the existing tests) and reproduction tests (new tests written for the target issue).

### Regression Test Selection

First, run the existing tests for the issue to find the ones that currently pass:

python agentless/test/run_regression_tests.py --run_id generate_regression_tests \
--output_file results/swe-bench-lite/passing_tests.jsonl \
--instance_ids=django__django-10914

Next, select which of the passing tests to use as regression tests:

python agentless/test/select_regression_tests.py --passing_tests results/swe-bench-lite/passing_tests.jsonl \
--output_folder results/swe-bench-lite/select_regression

Then run the selected regression tests against each of the ten patches in a repair sample (shown here for repair_sample_1):

folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename "$folder");
python agentless/test/run_regression_tests.py --regression_tests results/swe-bench-lite/select_regression/output.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_regression_${num}" --num_workers 10;
done

### Generate Reproduction Tests

Generate reproduction tests for the target issue:

python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/swe-bench-lite/reproduction_test_samples \
--num_threads 10

Run the tests against samples:

for st in {0..36..4}; do   en=$((st + 3));   
echo "Processing ${st} to ${en}";
for num in $(seq $st $en); do
echo "Processing ${num}";
python agentless/test/run_reproduction_tests.py --run_id="reproduction_test_generation_filter_sample_${num}" \
--test_jsonl="results/swe-bench-lite/reproduction_test_samples/output_${num}_processed_reproduction_test.jsonl" \
--num_workers 6 \
--testing;
done & done
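
The brace expansion `{0..36..4}` together with `en=$((st + 3))` splits the 40 generated samples into ten batches of four, and the `&` after the inner loop sends each batch to the background so the batches run in parallel. The following sketch reproduces only the index arithmetic, to confirm that every sample from 0 to 39 is covered exactly once.

```python
# Start indices 0, 4, ..., 36 correspond to the shell expansion {0..36..4}.
batches = [list(range(st, st + 4)) for st in range(0, 37, 4)]

for batch in batches:
    print(f"Processing {batch[0]} to {batch[-1]}")

# Flatten the batches to check coverage of all 40 samples.
covered = [num for batch in batches for num in batch]
```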
Select the final reproduction test from the generated samples. Per the upstream Agentless README, this is done by majority voting and writes reproduction_tests.jsonl:

python agentless/test/generate_reproduction_tests.py --max_samples 40 \
--output_folder results/swe-bench-lite/reproduction_test_samples \
--output_file reproduction_tests.jsonl \
--select

Then run the selected reproduction test against each patch in a repair sample:

folder=results/swe-bench-lite/repair_sample_1
for num in {0..9..1}; do
run_id_prefix=$(basename $folder);
python agentless/test/run_reproduction_tests.py --test_jsonl results/swe-bench-lite/reproduction_test_samples/reproduction_tests.jsonl \
--predictions_path="${folder}/output_${num}_processed.jsonl" \
--run_id="${run_id_prefix}_reproduction_${num}" --num_workers 10;
done

### Reranking and Patch Selection

Lastly, rerank the patches to make the final selection:

python agentless/repair/rerank.py --patch_folder results/swe-bench-lite/repair_sample_1/,results/swe-bench-lite/repair_sample_2/,results/swe-bench-lite/repair_sample_3/,results/swe-bench-lite/repair_sample_4/ \
--num_samples 40 \
--deduplicate \
--regression \
--reproduction
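
To give an intuition for how test results and voting can interact, here is a simplified sketch: keep the candidates with the fewest regression failures, prefer those that pass the reproduction test, deduplicate by a normalized form of the patch, and pick the most frequent one. This is an assumption-laden illustration rather than the actual logic of rerank.py; `normalize`, `select_patch`, and the dictionary keys are all hypothetical.

```python
from collections import Counter


def normalize(patch):
    """Crude normalization: drop blank lines and trailing whitespace."""
    return "\n".join(line.rstrip() for line in patch.splitlines() if line.strip())


def select_patch(candidates):
    """Pick a patch from dicts with 'patch', 'regression_failures', 'reproduction_passed'."""
    usable = [c for c in candidates if c["patch"].strip()]
    if not usable:
        return None
    # Keep only candidates that break the fewest existing tests.
    fewest = min(c["regression_failures"] for c in usable)
    pool = [c for c in usable if c["regression_failures"] == fewest]
    # Prefer candidates that also pass the reproduction test, if any do.
    reproduced = [c for c in pool if c["reproduction_passed"]]
    pool = reproduced or pool
    # Majority vote over normalized patches, so equivalent patches count together.
    votes = Counter(normalize(c["patch"]) for c in pool)
    winner, _ = votes.most_common(1)[0]
    return winner


# Toy candidates; the patch bodies are placeholders, not real diffs.
candidates = [
    {"patch": "-a = 1\n+a = 2\n", "regression_failures": 0, "reproduction_passed": True},
    {"patch": "-a = 1  \n+a = 2\n\n", "regression_failures": 0, "reproduction_passed": True},
    {"patch": "-a = 1\n+a = 3\n", "regression_failures": 0, "reproduction_passed": True},
    {"patch": "-a = 1\n+a = 4\n", "regression_failures": 2, "reproduction_passed": True},
    {"patch": "", "regression_failures": 0, "reproduction_passed": False},
]
selected = select_patch(candidates)
print(selected)
```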

# Summary

By running Agentless step by step, we gained a clear picture of how AI-driven code generation flows from one stage to the next. Localization narrows the issue down to files, elements, and lines; Repair generates candidate patches; and Validation and Selection uses tests to choose among them.

As a next step, I plan to explore how to apply this process to repositories outside SWE-bench. I will also revisit the regression test setup to prepare for future runs. I encourage everyone to try out Agentless!
