Changing v11 instruction to test LLMAJ grounded ensemble #1218

Open
wants to merge 1 commit into
base: main
@@ -3,7 +3,7 @@

add_to_catalog(
InputOutputTemplate(
instruction="""Read the following three parts: (A) Document, (B) Conversation between the user and the agent occurring in multiple turns. The user and agent alternate the conversation where the user asks a question, the agent gives a response to that question, and the user poses an inquiry at the end, (C) the Response (of the agent) to the last turn user query that continues the conversation from part 2. Your task is to evaluate if the Response (C) is completely grounded in the Document with one of three answers [yes, no, unsure], followed by an explanation. To make this determination, you can consider the response as consisting of a set of claims.\nIf all the claims are explicitly mentioned and appear word for word in the Document (A), you must answer yes and identify the part of document in the explanation output.\nIf at least one of the claims does not appear in the document content or you think it can be indirectly inferred, you must answer no.\nIf for at least one of the claims you cannot determine yes or no, and none of the other claims are ungrounded, you must answer unsure.\n\nFollow your answer with an explanation. Try to be concise. Limit your answer and explanation to at most 200 words.\n\n\nConversation:\nuser: Can you tell me a bit about the history of Python and its versions?\nagent: Python was created by Guido van Rossum in the late 1980s at CWI in the Netherlands. Python 2.0 was released in 2000 and introduced new features like list comprehensions and a garbage collection system using reference counting. 
Python 3.0, a major revision, was released in 2008 and is not completely backward-compatible with previous versions.\nuser: What are some of the programming paradigms that Python supports?\nagent: Python supports multiple programming paradigms, including structured, object-oriented, and functional programming.\nuser: What should I do if I encounter issues with backward compatibility between Python 2 and Python 3?\n\n\nDocument:\nPython (programming language)\n\nPython is a high-level, interpreted, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented, and functional programming.\n\nPython was created in the late 1980s by Guido van Rossum at Centrum Wiskunde & Informatica (CWI) in the Netherlands as a successor to the ABC programming language. Python 2.0, released in 2000, introduced new features, such as list comprehensions and a garbage collection system using reference counting. Python 3.0, released in 2008, was a major revision of the language that is not completely backward-compatible with earlier versions. Python consistently ranks as one of the most popular programming languages.\n\n\nResponse:\nagent: If you encounter issues with backward compatibility between Python 2 and Python 3, you should consider using the 2to3 tool, which is designed to help convert Python 2 code to Python 3 code. Additionally, you can refer to the Python documentation for guidance on how to address specific compatibility issues.\n\n\nOutput:\nno\n\n\n\nConversation:\nuser: Can you tell me about the history and construction of the Great Wall of China?\nagent: The Great Wall of China is a series of fortifications made of various materials, built along the northern borders of China to protect against raids and invasions. 
It was constructed as early as the 7th century BC, with the most well-known sections built by the Ming dynasty (1368-1644).\nuser: How long is the Great Wall and what are its dimensions?\nagent: The Great Wall is approximately 21,196 kilometers long. Its height ranges from 5 to 8 meters, and it is 6 to 7 meters wide at the base.\nuser: What purposes did the Great Wall serve besides defense?\n\n\nDocument:\nThe Great Wall of China\n\nThe Great Wall of China is a series of fortifications made of various materials, such as stone, brick, tamped earth, wood, and other materials. It was built along the northern borders of China to protect the Chinese states and empires against the raids and invasions of the various nomadic groups of the Eurasian Steppe. Several walls were being built as early as the 7th century BC; these, later joined together and made bigger and stronger, are now collectively referred to as the Great Wall.\n\nThe most well-known sections of the wall were built by the Ming dynasty (1368-1644). The wall is approximately 21,196 kilometers long, and its height ranges from 5 to 8 meters, depending on the section. The wall is also 6 to 7 meters wide at the base. The Great Wall has numerous watchtowers and barracks and served not only as a defense system but also for border control, allowing the imposition of duties on goods transported along the Silk Road, regulation of trade, and control of immigration and emigration.\n\n\nResponse:\nagent: Besides serving as a defense system, the Great Wall also served purposes such as border control, allowing the imposition of duties on goods transported along the Silk Road, regulation of trade, and control of immigration and emigration.\n\n\nOutput:\nyes\n\n\n\nConversation:\nuser: Can you tell me about the construction of the Eiffel Tower?\nagent: The Eiffel Tower was constructed from 1887 to 1889 as the entrance to the 1889 World's Fair. 
It was designed by Gustave Eiffel's company.\nuser: How tall is the Eiffel Tower and what can visitors do there?\nagent: The Eiffel Tower is 330 meters tall. Visitors can explore three levels, with restaurants on the first and second levels and an observatory on the third level.\nuser: What kind of events are hosted at the Eiffel Tower today?\n\n\nDocument:\nThe Eiffel Tower\n\nThe Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower. Constructed from 1887 to 1889 as the entrance to the 1889 World's Fair, it was initially criticized by some of France's leading artists and intellectuals for its design but has become a global cultural icon of France and one of the most recognizable structures in the world.\n\nThe tower is 330 meters tall, making it the tallest structure in Paris. Its base is square, measuring 125 meters on each side. The Eiffel Tower has three levels for visitors, with restaurants on the first and second levels. The third level observatory's upper platform is 276 meters above the ground, the highest accessible to the public in the European Union.\n\n\nResponse:\nagent: Today, the Eiffel Tower hosts a variety of events, including concerts, art exhibitions, and fireworks displays. It is also a popular spot for New Year's Eve celebrations and other major public events.\n\n\nOutput:\nno""",
instruction="""Read the following three parts: (A) Document, (B) Conversation between the user and the agent occurring in multiple turns. The user and agent alternate the conversation where the user asks a question, the agent gives a response to that question, and the user poses an inquiry at the end, (C) the Response (of the agent) to the last turn user query that continues the conversation from part 2. Your task is to evaluate if the Response (C) is completely grounded in the Document with one of three answers [yes, no, unsure], followed by an explanation. To make this determination, you can consider the response as consisting of a set of claims.\nIf all the claims are explicitly mentioned and appear word for word in the Document (A), you must answer yes and identify the part of document in the explanation output.\nIf at least one of the claims does not appear in the document content or you think it can be indirectly inferred, you must answer no.\nIf for at least one of the claims you cannot determine yes or no, and none of the other claims are ungrounded, you must answer unsure.\n\nFollow your answer with an explanation. Try to be concise. Limit your answer and explanation to at most 200 words.""",
input_format="\n\nConversation:\n{question}\n\nResponse:\n{answer}\n\n\nOutput:",
output_format="[[{rating}]]",
postprocessors=[
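To see what the judge model receives and how its reply could be read back, here is a minimal sketch. It assumes the judge starts its reply with one of yes, no, or unsure (optionally wrapped in `[[ ]]`), as the instruction requests. `render_input` and `extract_verdict` are hypothetical helpers for illustration only; they are not the template's actual postprocessors, which are elided in this hunk.

```python
import re

# Copied from the input_format field in the diff above.
INPUT_FORMAT = "\n\nConversation:\n{question}\n\nResponse:\n{answer}\n\n\nOutput:"


def render_input(question: str, answer: str) -> str:
    """Fill the template's input_format with a conversation and a response (hypothetical helper)."""
    return INPUT_FORMAT.format(question=question, answer=answer)


def extract_verdict(reply: str):
    """Return 'yes', 'no', or 'unsure' if the reply starts with one of them, else None.

    Assumption: the verdict is the first token of the reply, optionally
    wrapped in [[ ]]; the explanation that follows is ignored.
    """
    match = re.match(r"\s*(?:\[\[)?\s*(yes|no|unsure)\b", reply, flags=re.IGNORECASE)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    prompt_tail = render_input(
        "user: What purposes did the Great Wall serve besides defense?",
        "agent: It also served for border control.",
    )
    print(prompt_tail)
    print(extract_verdict("no\nThe 2to3 tool is not mentioned in the document."))
```

Anchoring the match at the start of the reply keeps words inside the explanation (for example "no" in a later sentence) from being mistaken for the verdict.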