Code block evals (#29619)
Add a targeted eval for code block formatting, and revise the system prompt accordingly.

### Eval before, n=8

<img width="728" alt="eval before" src="https://github.com/user-attachments/assets/552b6146-3d26-4eaa-86f9-9fc36c0cadf2" />

### Eval after prompt change, n=8 (excluding the new evals, so just testing the prompt change)

<img width="717" alt="eval after" src="https://github.com/user-attachments/assets/c78c7a54-4c65-470c-b135-8691584cd73e" />

Release Notes:

- N/A
parent 2508e491d5
commit d7004030b3

Cargo.lock (generated, 1 line added)
@@ -4993,6 +4993,7 @@ dependencies = [
"language_model",
"language_models",
"languages",
"markdown",
"node_runtime",
"pathdiff",
"paths",

@@ -39,18 +39,78 @@ If appropriate, use tool calls to explore the current project, which contains th

## Code Block Formatting

Whenever you mention a code block, you MUST use ONLY use the following format when the code in the block comes from a file
in the project:

Whenever you mention a code block, you MUST use ONLY use the following format:
```path/to/Something.blah#L123-456
(code goes here)
```

The `#L123-456` means the line number range 123 through 456, and the path/to/Something.blah
is a path in the project. (If this code block does not come from a file in the project, then you may instead use
the normal markdown style of three backticks followed by language name. However, you MUST use this format if
the code in the block comes from a file in the project.)

is a path in the project. (If there is no valid path in the project, then you can use
/dev/null/path.extension for its path.) This is the ONLY valid way to format code blocks, because the Markdown parser
does not understand the more common ```language syntax, or bare ``` blocks. It only
understands this path-based syntax, and if the path is missing, then it will error and you will have to do it over again.
Just to be really clear about this, if you ever find yourself writing three backticks followed by a language name, STOP!
You have made a mistake. You can only ever put paths after triple backticks!
<example>
Based on all the information I've gathered, here's a summary of how this system works:
1. The README file is loaded into the system.
2. The system finds the first two headers, including everything in between. In this case, that would be:
```path/to/README.md#L8-12
# First Header
This is the info under the first header.
## Sub-header
```
3. Then the system finds the last header in the README:
```path/to/README.md#L27-29
## Last Header
This is the last header in the README.
```
4. Finally, it passes this information on to the next process.
</example>
<example>
In Markdown, hash marks signify headings. For example:
```/dev/null/example.md#L1-3
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</example>
Here are examples of ways you must never render code blocks:
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because it does not include the path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```markdown
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because it has the language instead of the path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
# Level 1 heading
## Level 2 heading
### Level 3 heading
</bad_example_do_not_do_this>
This example is unacceptable because it uses indentation to mark the code block
instead of backticks with a path.
<bad_example_do_not_do_this>
In Markdown, hash marks signify headings. For example:
```markdown
/dev/null/example.md#L1-3
# Level 1 heading
## Level 2 heading
### Level 3 heading
```
</bad_example_do_not_do_this>
This example is unacceptable because the path is in the wrong place. The path must be directly after the opening backticks.
## Fixing Diagnostics

1. Make 1-2 attempts at fixing diagnostics, then defer to the user.

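The revised prompt requires the path, optionally followed by a `#L<start>-<end>` range, directly after the opening backticks. The diff does not show how `PathWithRange::new` parses that text, so the following is only a minimal sketch under that assumption: `Citation` and `parse_citation` are hypothetical names, not Zed's API, and column positions are deliberately not modeled.

```rust
/// Hypothetical parsed form of a citation such as `path/to/Something.blah#L123-456`.
/// Not Zed's `PathWithRange`; columns are ignored.
#[derive(Debug, PartialEq)]
struct Citation {
    path: String,
    /// 1-based, inclusive line range, when one was given.
    lines: Option<(u32, u32)>,
}

/// Parses the text that follows the opening backticks of a fenced block.
/// Accepts `path`, `path#L12`, `path#L12-34`, and `path#L12-L34`.
fn parse_citation(info: &str) -> Citation {
    let info = info.trim();
    let parsed = info.rsplit_once("#L").and_then(|(path, range)| {
        let (start, end) = match range.split_once('-') {
            Some((s, e)) => (s, e.trim_start_matches('L')),
            // A single line: `#L12` means lines 12 through 12.
            None => (range, range),
        };
        let start = start.parse::<u32>().ok()?;
        let end = end.parse::<u32>().ok()?;
        Some(Citation { path: path.to_string(), lines: Some((start, end)) })
    });
    // No range, or a malformed one: treat the whole string as the path.
    parsed.unwrap_or_else(|| Citation { path: info.to_string(), lines: None })
}
```

Splitting on the last `#L` (via `rsplit_once`) keeps any earlier `#L` inside the path itself, and a range that fails to parse degrades to a plain path rather than an error.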
@@ -23,7 +23,7 @@ use gpui::{
Task, TextStyle, TextStyleRefinement, Transformation, UnderlineStyle, WeakEntity, WindowHandle,
linear_color_stop, linear_gradient, list, percentage, pulsating_between,
};
use language::{Buffer, LanguageRegistry};
use language::{Buffer, Language, LanguageRegistry};
use language_model::{
LanguageModelRequestMessage, LanguageModelToolUseId, MessageContent, RequestUsage, Role,
StopReason,
@@ -33,6 +33,7 @@ use markdown::{HeadingLevelStyles, Markdown, MarkdownElement, MarkdownStyle, Par
use project::{ProjectEntryId, ProjectItem as _};
use rope::Point;
use settings::{Settings as _, update_settings_file};
use std::ffi::OsStr;
use std::path::Path;
use std::rc::Rc;
use std::sync::Arc;
@@ -346,130 +347,130 @@ fn render_markdown_code_block(
.child(Label::new("untitled").size(LabelSize::Small))
.into_any_element(),
),
CodeBlockKind::FencedLang(raw_language_name) => Some(
h_flex()
.gap_1()
.children(
parsed_markdown
.languages_by_name
.get(raw_language_name)
.and_then(|language| {
language
.config()
.matcher
.path_suffixes
.iter()
.find_map(|extension| {
file_icons::FileIcons::get_icon(Path::new(extension), cx)
})
.map(Icon::from_path)
.map(|icon| icon.color(Color::Muted).size(IconSize::Small))
}),
)
.child(
Label::new(
parsed_markdown
.languages_by_name
.get(raw_language_name)
.map(|language| language.name().into())
.clone()
.unwrap_or_else(|| raw_language_name.clone()),
)
.size(LabelSize::Small),
)
.into_any_element(),
),
CodeBlockKind::FencedLang(raw_language_name) => Some(render_code_language(
parsed_markdown.languages_by_name.get(raw_language_name),
raw_language_name.clone(),
cx,
)),
CodeBlockKind::FencedSrc(path_range) => path_range.path.file_name().map(|file_name| {
let content = if let Some(parent) = path_range.path.parent() {
h_flex()
.ml_1()
.gap_1()
.child(
Label::new(file_name.to_string_lossy().to_string()).size(LabelSize::Small),
)
.child(
Label::new(parent.to_string_lossy().to_string())
.color(Color::Muted)
.size(LabelSize::Small),
)
.into_any_element()
} else {
Label::new(path_range.path.to_string_lossy().to_string())
.size(LabelSize::Small)
.ml_1()
.into_any_element()
};
// We tell the model to use /dev/null for the path instead of using ```language
// because otherwise it consistently fails to use code citations.
if path_range.path.starts_with("/dev/null") {
let ext = path_range
.path
.extension()
.and_then(OsStr::to_str)
.map(|str| SharedString::new(str.to_string()))
.unwrap_or_default();

h_flex()
.id(("code-block-header-label", ix))
.w_full()
.max_w_full()
.px_1()
.gap_0p5()
.cursor_pointer()
.rounded_sm()
.hover(|item| item.bg(cx.theme().colors().element_hover.opacity(0.5)))
.tooltip(Tooltip::text("Jump to File"))
.child(
h_flex()
.gap_0p5()
.children(
file_icons::FileIcons::get_icon(&path_range.path, cx)
.map(Icon::from_path)
.map(|icon| icon.color(Color::Muted).size(IconSize::XSmall)),
)
.child(content)
.child(
Icon::new(IconName::ArrowUpRight)
.size(IconSize::XSmall)
.color(Color::Ignored),
),
render_code_language(
parsed_markdown
.languages_by_path
.get(&path_range.path)
.or_else(|| parsed_markdown.languages_by_name.get(&ext)),
ext,
cx,
)
.on_click({
let path_range = path_range.clone();
move |_, window, cx| {
workspace
.update(cx, {
|workspace, cx| {
let Some(project_path) = workspace
.project()
.read(cx)
.find_project_path(&path_range.path, cx)
else {
return;
};
let Some(target) = path_range.range.as_ref().map(|range| {
Point::new(
// Line number is 1-based
range.start.line.saturating_sub(1),
range.start.col.unwrap_or(0),
)
}) else {
return;
};
let open_task =
workspace.open_path(project_path, None, true, window, cx);
window
.spawn(cx, async move |cx| {
let item = open_task.await?;
if let Some(active_editor) = item.downcast::<Editor>() {
active_editor
.update_in(cx, |editor, window, cx| {
editor.go_to_singleton_buffer_point(
target, window, cx,
);
})
.ok();
}
anyhow::Ok(())
})
.detach_and_log_err(cx);
}
})
.ok();
}
})
.into_any_element()
} else {
let content = if let Some(parent) = path_range.path.parent() {
h_flex()
.ml_1()
.gap_1()
.child(
Label::new(file_name.to_string_lossy().to_string())
.size(LabelSize::Small),
)
.child(
Label::new(parent.to_string_lossy().to_string())
.color(Color::Muted)
.size(LabelSize::Small),
)
.into_any_element()
} else {
Label::new(path_range.path.to_string_lossy().to_string())
.size(LabelSize::Small)
.ml_1()
.into_any_element()
};

h_flex()
.id(("code-block-header-label", ix))
.w_full()
.max_w_full()
.px_1()
.gap_0p5()
.cursor_pointer()
.rounded_sm()
.hover(|item| item.bg(cx.theme().colors().element_hover.opacity(0.5)))
.tooltip(Tooltip::text("Jump to File"))
.child(
h_flex()
.gap_0p5()
.children(
file_icons::FileIcons::get_icon(&path_range.path, cx)
.map(Icon::from_path)
.map(|icon| icon.color(Color::Muted).size(IconSize::XSmall)),
)
.child(content)
.child(
Icon::new(IconName::ArrowUpRight)
.size(IconSize::XSmall)
.color(Color::Ignored),
),
)
.on_click({
let path_range = path_range.clone();
move |_, window, cx| {
workspace
.update(cx, {
|workspace, cx| {
let Some(project_path) = workspace
.project()
.read(cx)
.find_project_path(&path_range.path, cx)
else {
return;
};
let Some(target) = path_range.range.as_ref().map(|range| {
Point::new(
// Line number is 1-based
range.start.line.saturating_sub(1),
range.start.col.unwrap_or(0),
)
}) else {
return;
};
let open_task = workspace.open_path(
project_path,
None,
true,
window,
cx,
);
window
.spawn(cx, async move |cx| {
let item = open_task.await?;
if let Some(active_editor) =
item.downcast::<Editor>()
{
active_editor
.update_in(cx, |editor, window, cx| {
editor.go_to_singleton_buffer_point(
target, window, cx,
);
})
.ok();
}
anyhow::Ok(())
})
.detach_and_log_err(cx);
}
})
.ok();
}
})
.into_any_element()
}
}),
};

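The new `/dev/null` branch in `render_markdown_code_block` treats any path under `/dev/null` as a placeholder and uses its extension only as a language hint. A minimal standalone sketch of that check follows; `placeholder_language_hint` is a hypothetical helper, and it returns a `&str` rather than the `SharedString` the diff builds.

```rust
use std::path::Path;

/// Hypothetical helper: for a placeholder path under `/dev/null`, returns the
/// extension to use as a language hint; for a real path, returns `None`.
fn placeholder_language_hint(path: &Path) -> Option<&str> {
    // `Path::starts_with` compares whole components, so "/dev/nullish"
    // is not treated as being under "/dev/null".
    if path.starts_with("/dev/null") {
        // A missing or non-UTF-8 extension becomes an empty hint,
        // mirroring the `unwrap_or_default()` in the diff.
        Some(path.extension().and_then(|ext| ext.to_str()).unwrap_or(""))
    } else {
        None
    }
}
```

For example, `/dev/null/example.md` yields `Some("md")`, while `crates/agent/src/thread.rs` yields `None` and would keep the clickable "Jump to File" header.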
@@ -604,6 +605,32 @@ fn render_markdown_code_block(
)
}

fn render_code_language(
language: Option<&Arc<Language>>,
name_fallback: SharedString,
cx: &App,
) -> AnyElement {
let icon_path = language.and_then(|language| {
language
.config()
.matcher
.path_suffixes
.iter()
.find_map(|extension| file_icons::FileIcons::get_icon(Path::new(extension), cx))
.map(Icon::from_path)
});

let language_label = language
.map(|language| language.name().into())
.unwrap_or(name_fallback);

h_flex()
.gap_1()
.children(icon_path.map(|icon| icon.color(Color::Muted).size(IconSize::Small)))
.child(Label::new(language_label).size(LabelSize::Small))
.into_any_element()
}

fn open_markdown_link(
text: SharedString,
workspace: WeakEntity<Workspace>,

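`render_code_language` labels a block with the resolved language's name and otherwise falls back to the raw text it was given; for `/dev/null` citations the caller first looks the language up by path and then by extension. The sketch below reproduces only that lookup order with plain `HashMap`s. `Lang` and `code_block_label` are hypothetical, and keying the name map by extension merely mirrors the diff's `languages_by_name.get(&ext)` call rather than anything confirmed about Zed's data.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a loaded language; only its display name is used.
struct Lang {
    name: &'static str,
}

/// Chooses a code block's header label: a language matched by full path,
/// else one matched by `fallback` (an extension or raw fence name),
/// else the `fallback` text itself.
fn code_block_label(
    by_path: &HashMap<PathBuf, Lang>,
    by_name: &HashMap<String, Lang>,
    path: &Path,
    fallback: &str,
) -> String {
    by_path
        .get(path)
        .or_else(|| by_name.get(fallback))
        .map(|lang| lang.name.to_string())
        .unwrap_or_else(|| fallback.to_string())
}
```

Falling back to the raw text means an unrecognized extension such as `zz` still produces a visible label instead of an empty header.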
@@ -174,6 +174,7 @@ impl Tool for EditFileTool {
"The `old_string` and `new_string` are identical, so no changes would be made."
));
}
let old_string = input.old_string.clone();

let result = cx
.background_spawn(async move {
@@ -213,6 +214,21 @@ impl Tool for EditFileTool {
input.path.display()
)
} else {
let old_string_with_buffer = format!(
"old_string:\n\n{}\n\n-------file-------\n\n{}",
&old_string,
buffer.text()
);
let path = {
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

let mut hasher = DefaultHasher::new();
old_string_with_buffer.hash(&mut hasher);

PathBuf::from(format!("failed_tool_{}.txt", hasher.finish()))
};
std::fs::write(path, old_string_with_buffer).unwrap();
anyhow!("Failed to match the provided `old_string`")
}
})?;

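The `EditFileTool` hunk names its dump file after a hash of the dumped text. Isolated as a pure function (the name `failed_edit_dump` is hypothetical, and it returns the path and contents instead of writing them), the naming scheme looks roughly like this. One caveat worth knowing: the standard library documents `DefaultHasher`'s algorithm as unspecified and subject to change between releases, so these names are only stable within one toolchain.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Hypothetical helper: returns the dump file name and the text to write,
/// instead of writing it, so the naming scheme can be checked in isolation.
fn failed_edit_dump(old_string: &str, file_text: &str) -> (PathBuf, String) {
    // Same layout as the diff: the old_string, a separator, then the file.
    let contents = format!("old_string:\n\n{old_string}\n\n-------file-------\n\n{file_text}");
    // Identical contents hash to the same name, so repeated identical
    // failures map to one file rather than creating new ones.
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    let path = PathBuf::from(format!("failed_tool_{}.txt", hasher.finish()));
    (path, contents)
}
```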
@@ -44,6 +44,7 @@ language_extension.workspace = true
language_model.workspace = true
language_models.workspace = true
languages = { workspace = true, features = ["load-grammars"] }
markdown.workspace = true
node_runtime.workspace = true
pathdiff.workspace = true
paths.workspace = true

@@ -10,13 +10,13 @@ use crate::{
ToolMetrics,
assertions::{AssertionsReport, RanAssertion, RanAssertionResult},
};
use agent::{ContextLoadResult, ThreadEvent};
use agent::{ContextLoadResult, Thread, ThreadEvent};
use anyhow::{Result, anyhow};
use async_trait::async_trait;
use buffer_diff::DiffHunkStatus;
use collections::HashMap;
use futures::{FutureExt as _, StreamExt, channel::mpsc, select_biased};
use gpui::{AppContext, AsyncApp, Entity};
use gpui::{App, AppContext, AsyncApp, Entity};
use language_model::{LanguageModel, Role, StopReason};

pub const THREAD_EVENT_TIMEOUT: Duration = Duration::from_secs(60 * 2);
@@ -314,7 +314,7 @@ impl ExampleContext {
for message in thread.messages().skip(message_count_before) {
messages.push(Message {
_role: message.role,
_text: message.to_string(),
text: message.to_string(),
tool_use: thread
.tool_uses_for_message(message.id, cx)
.into_iter()
@@ -362,6 +362,90 @@ impl ExampleContext {
})
.unwrap()
}

pub fn agent_thread(&self) -> Entity<Thread> {
self.agent_thread.clone()
}
}

impl AppContext for ExampleContext {
type Result<T> = anyhow::Result<T>;

fn new<T: 'static>(
&mut self,
build_entity: impl FnOnce(&mut gpui::Context<T>) -> T,
) -> Self::Result<Entity<T>> {
self.app.new(build_entity)
}

fn reserve_entity<T: 'static>(&mut self) -> Self::Result<gpui::Reservation<T>> {
self.app.reserve_entity()
}

fn insert_entity<T: 'static>(
&mut self,
reservation: gpui::Reservation<T>,
build_entity: impl FnOnce(&mut gpui::Context<T>) -> T,
) -> Self::Result<Entity<T>> {
self.app.insert_entity(reservation, build_entity)
}

fn update_entity<T, R>(
&mut self,
handle: &Entity<T>,
update: impl FnOnce(&mut T, &mut gpui::Context<T>) -> R,
) -> Self::Result<R>
where
T: 'static,
{
self.app.update_entity(handle, update)
}

fn read_entity<T, R>(
&self,
handle: &Entity<T>,
read: impl FnOnce(&T, &App) -> R,
) -> Self::Result<R>
where
T: 'static,
{
self.app.read_entity(handle, read)
}

fn update_window<T, F>(&mut self, window: gpui::AnyWindowHandle, f: F) -> Result<T>
where
F: FnOnce(gpui::AnyView, &mut gpui::Window, &mut App) -> T,
{
self.app.update_window(window, f)
}

fn read_window<T, R>(
&self,
window: &gpui::WindowHandle<T>,
read: impl FnOnce(Entity<T>, &App) -> R,
) -> Result<R>
where
T: 'static,
{
self.app.read_window(window, read)
}

fn background_spawn<R>(
&self,
future: impl std::future::Future<Output = R> + Send + 'static,
) -> gpui::Task<R>
where
R: Send + 'static,
{
self.app.background_spawn(future)
}

fn read_global<G, R>(&self, callback: impl FnOnce(&G, &App) -> R) -> Self::Result<R>
where
G: gpui::Global,
{
self.app.read_global(callback)
}
}

#[derive(Debug)]
@@ -391,12 +475,16 @@ impl Response {
pub fn tool_uses(&self) -> impl Iterator<Item = &ToolUse> {
self.messages.iter().flat_map(|msg| &msg.tool_use)
}

pub fn texts(&self) -> impl Iterator<Item = String> {
self.messages.iter().map(|message| message.text.clone())
}
}

#[derive(Debug)]
pub struct Message {
_role: Role,
_text: String,
text: String,
tool_use: Vec<ToolUse>,
}

crates/eval/src/examples/code_block_citations.rs (Normal file, 191 lines added)
@@ -0,0 +1,191 @@
use anyhow::Result;
use async_trait::async_trait;
use markdown::PathWithRange;

use crate::example::{Example, ExampleContext, ExampleMetadata, JudgeAssertion, LanguageServer};

pub struct CodeBlockCitations;

const FENCE: &str = "```";

#[async_trait(?Send)]
impl Example for CodeBlockCitations {
fn meta(&self) -> ExampleMetadata {
ExampleMetadata {
name: "code_block_citations".to_string(),
url: "https://github.com/zed-industries/zed.git".to_string(),
revision: "f69aeb6311dde3c0b8979c293d019d66498d54f2".to_string(),
language_server: Some(LanguageServer {
file_extension: "rs".to_string(),
allow_preexisting_diagnostics: false,
}),
max_assertions: None,
}
}

async fn conversation(&self, cx: &mut ExampleContext) -> Result<()> {
const FILENAME: &str = "assistant_tool.rs";
cx.push_user_message(format!(
r#"
Show me the method bodies of all the methods of the `Tool` trait in {FILENAME}.

Please show each method in a separate code snippet.
"#
));

// Verify that the messages all have the correct formatting.
let texts: Vec<String> = cx.run_to_end().await?.texts().collect();
let closing_fence = format!("\n{FENCE}");

for text in texts.iter() {
let mut text = text.as_str();

while let Some(index) = text.find(FENCE) {
// Advance text past the opening backticks.
text = &text[index + FENCE.len()..];

// Find the closing backticks.
let content_len = text.find(&closing_fence);

// Verify the citation format - e.g. ```path/to/foo.txt#L123-456
if let Some(citation_len) = text.find('\n') {
let citation = &text[..citation_len];

if let Ok(()) =
cx.assert(citation.contains("/"), format!("Slash in {citation:?}",))
{
let path_range = PathWithRange::new(citation);
let path = cx
.agent_thread()
.update(cx, |thread, cx| {
thread
.project()
.read(cx)
.find_project_path(path_range.path, cx)
})
.ok()
.flatten();

if let Ok(path) = cx.assert_some(path, format!("Valid path: {citation:?}"))
{
let buffer_text = {
let buffer = match cx.agent_thread().update(cx, |thread, cx| {
thread
.project()
.update(cx, |project, cx| project.open_buffer(path, cx))
}) {
Ok(buffer_task) => buffer_task.await.ok(),
Err(err) => {
cx.assert(
false,
format!("Expected Ok(buffer), not {err:?}"),
)
.ok();
break;
}
};

let Ok(buffer_text) = cx.assert_some(
buffer.and_then(|buffer| {
buffer.read_with(cx, |buffer, _| buffer.text()).ok()
}),
"Reading buffer text succeeded",
) else {
continue;
};
buffer_text
};

if let Some(content_len) = content_len {
// + 1 because there's a newline character after the citation.
let content =
&text[(citation.len() + 1)..content_len - (citation.len() + 1)];

cx.assert(
buffer_text.contains(&content),
"Code block content was found in file",
)
.ok();

if let Some(range) = path_range.range {
let start_line_index = range.start.line.saturating_sub(1);
let line_count =
range.end.line.saturating_sub(start_line_index);
let mut snippet = buffer_text
.lines()
.skip(start_line_index as usize)
.take(line_count as usize)
.collect::<Vec<&str>>()
.join("\n");

if let Some(start_col) = range.start.col {
snippet = snippet[start_col as usize..].to_string();
}

if let Some(end_col) = range.end.col {
let last_line = snippet.lines().last().unwrap();
snippet = snippet
[..snippet.len() - last_line.len() + end_col as usize]
.to_string();
}

cx.assert_eq(
snippet.as_str(),
content,
"Code block snippet was at specified line/col",
)
.ok();
}
}
}
}
} else {
cx.assert(
false,
format!("Opening {FENCE} did not have a newline anywhere after it."),
)
.ok();
}

if let Some(content_len) = content_len {
// Advance past the closing backticks
text = &text[content_len + FENCE.len()..];
} else {
// There were no closing backticks associated with these opening backticks.
cx.assert(
false,
"Code block opening had matching closing backticks.".to_string(),
)
.ok();

// There are no more code blocks to parse, so we're done.
break;
}
}
}

Ok(())
}

fn thread_assertions(&self) -> Vec<JudgeAssertion> {
vec![
JudgeAssertion {
id: "trait method bodies are shown".to_string(),
description:
"All method bodies of the Tool trait are shown."
.to_string(),
},
JudgeAssertion {
id: "code blocks used".to_string(),
description:
"All code snippets are rendered inside markdown code blocks (as opposed to any other formatting besides code blocks)."
.to_string(),
},
JudgeAssertion {
id: "code blocks use backticks".to_string(),
description:
format!("All markdown code blocks use backtick fences ({FENCE}) rather than indentation.")
}
]
}
}

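The eval's `conversation` method walks each response: it finds an opening fence, reads the citation from the rest of that line, then looks for a newline followed by backticks as the closing fence. Here is the same scan as a standalone sketch that returns the blocks instead of asserting on them (`fenced_blocks` and `FencedBlock` are hypothetical names). It measures the closing fence from the start of the remaining text, so the body is `text[citation_len + 1..close]`, and it advances past the whole closing sequence, newline included. Like the eval, it assumes exactly three backticks and does not handle nested or longer fences.

```rust
/// The fence that opens and closes a code block.
const FENCE: &str = "```";

/// A fenced block found in a model response (hypothetical type).
#[derive(Debug, PartialEq)]
struct FencedBlock<'a> {
    /// Text after the opening backticks, e.g. `path/to/foo.rs#L1-3`.
    citation: &'a str,
    /// Everything between the opening line and the closing fence.
    content: &'a str,
}

/// Returns the complete fenced blocks in `text`, plus `true` if an opening
/// fence had no newline or no closing fence after it.
fn fenced_blocks(mut text: &str) -> (Vec<FencedBlock<'_>>, bool) {
    let closing = format!("\n{FENCE}");
    let mut blocks = Vec::new();
    while let Some(open) = text.find(FENCE) {
        // Advance past the opening backticks.
        text = &text[open + FENCE.len()..];
        // The citation is the rest of the opening line.
        let Some(citation_len) = text.find('\n') else {
            return (blocks, true);
        };
        // Search from the citation's own newline so that an empty body,
        // where the closing fence directly follows the citation, still matches.
        let Some(offset) = text[citation_len..].find(closing.as_str()) else {
            return (blocks, true);
        };
        let close = citation_len + offset;
        // The body starts after the citation's newline (or is empty).
        let start = (citation_len + 1).min(close);
        blocks.push(FencedBlock {
            citation: &text[..citation_len],
            content: &text[start..close],
        });
        // Advance past the whole closing sequence: newline plus backticks.
        text = &text[close + closing.len()..];
    }
    (blocks, false)
}
```

Returning the unclosed flag rather than failing immediately lets a caller report every well-formed block it did find before flagging the malformed one.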
@@ -12,12 +12,14 @@ use util::serde::default_true;
use crate::example::{Example, ExampleContext, ExampleMetadata, JudgeAssertion};

mod add_arg_to_trait_method;
mod code_block_citations;
mod file_search;

pub fn all(examples_dir: &Path) -> Vec<Rc<dyn Example>> {
let mut threads: Vec<Rc<dyn Example>> = vec![
Rc::new(file_search::FileSearchExample),
Rc::new(add_arg_to_trait_method::AddArgToTraitMethod),
Rc::new(code_block_citations::CodeBlockCitations),
];

for example_path in list_declarative_examples(examples_dir).unwrap() {

@@ -1,6 +1,8 @@
pub mod parser;
mod path_range;

pub use path_range::{LineCol, PathWithRange};

use std::borrow::Cow;
use std::collections::HashSet;
use std::iter;

@@ -32,6 +32,20 @@ impl LineCol {
}

impl PathWithRange {
// Note: We could try out this as an alternative, and see how it does on evals.
//
// The closest to a standard way of including a filename is this:
// ```rust filename="path/to/file.rs#42:43"
// ```
//
// or, alternatively,
// ```rust filename="path/to/file.rs" lines="42:43"
// ```
//
// Examples where it's used this way:
// - https://mdxjs.com/guides/syntax-highlighting/#syntax-highlighting-with-the-meta-field
// - https://docusaurus.io/docs/markdown-features/code-blocks
// - https://spec.commonmark.org/0.31.2/#example-143
pub fn new(str: impl AsRef<str>) -> Self {
let str = str.as_ref();
// Sometimes the model will include a language at the start,
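`PathWithRange` carries 1-based line numbers: the click handler in `render_markdown_code_block` subtracts one before building a `Point`, and the eval does the same with `saturating_sub(1)` before slicing the buffer. Extracting the cited lines from a file's text therefore looks roughly like this hypothetical helper, which ignores the optional column positions.

```rust
/// Hypothetical helper: returns lines `start..=end` of `text` (1-based,
/// inclusive) joined with '\n', or `None` when the range is empty,
/// starts at 0, or lies entirely past the end of the text.
fn lines_in_range(text: &str, start: usize, end: usize) -> Option<String> {
    if start == 0 || end < start {
        return None;
    }
    let lines: Vec<&str> = text
        .lines()
        // Convert the 1-based start line to a 0-based skip count.
        .skip(start - 1)
        // An inclusive range of lines has end - start + 1 entries.
        .take(end - start + 1)
        .collect();
    if lines.is_empty() {
        None
    } else {
        Some(lines.join("\n"))
    }
}
```

A range that runs past the end of the file is truncated rather than rejected here; whether that should count as a citation failure is a separate policy choice.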
|