Commit 25a48ee

Merge pull request #160 from coding-cat-official/haystack-questions
Haystack questions
2 parents 37e6cd0 + f30687b commit 25a48ee

12 files changed, with 130 additions and 0 deletions
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
You’re part of a backend team maintaining a data pipeline that aggregates event data from multiple services in real time. Two upstream systems independently log event timestamps and forward them to your processing service. Each system guarantees that its own data is ordered chronologically, but the data arrives in two separate batches.

Before downstream services can process the data, the event stream must reflect a consistent view of time, meaning all incoming timestamps must be arranged in proper chronological order, as if they came from a single source.

Your task is to write a function that takes in these two batches of event timestamps and prepares them for the next stage of processing. Each batch is already internally ordered, but the full sequence must be processed in time order.

The system expects this task to run efficiently under load, so avoid unnecessary operations that assume disorder in the inputs.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
[
    {
        "input": [[1, 3, 5], [2, 4, 6]],
        "output": [1, 2, 3, 4, 5, 6]
    },
    {
        "input": [[1, 2, 2, 3], [2, 2, 4]],
        "output": [1, 2, 2, 2, 2, 3, 4]
    },
    {
        "input": [[], [1, 2, 3]],
        "output": [1, 2, 3]
    },
    {
        "input": [[4, 8, 12], []],
        "output": [4, 8, 12]
    },
    {
        "input": [[-5, 0, 3], [-10, -3, 2]],
        "output": [-10, -5, -3, 0, 2, 3]
    },
    {
        "input": [[1], [0]],
        "output": [0, 1]
    }
]
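The case files in this commit appear to share one shape: each entry's "input" is a list of positional arguments and "output" is the expected return value. The sketch below shows how such a file could be checked against a candidate function under that assumption. The helper name `run_cases` and the inline `sample` string are hypothetical, and `heapq.merge` is used only as a stand-in implementation.

```python
import heapq
import json


def run_cases(func, cases_json: str) -> list:
    """Return (input, expected, actual) for every case that func gets wrong.

    Assumes each case's "input" is a list of positional arguments.
    """
    failures = []
    for case in json.loads(cases_json):
        actual = func(*case["input"])
        if actual != case["output"]:
            failures.append((case["input"], case["output"], actual))
    return failures


# Two cases copied from the file above, inlined so the sketch is self-contained.
sample = """[
    {"input": [[1, 3, 5], [2, 4, 6]], "output": [1, 2, 3, 4, 5, 6]},
    {"input": [[], [1, 2, 3]], "output": [1, 2, 3]}
]"""

# heapq.merge lazily merges already-sorted iterables; list() materializes it.
print(run_cases(lambda a, b: list(heapq.merge(a, b)), sample))  # prints []
```

An empty list means every case passed; a non-empty one lists the mismatches.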
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
    "title": "Merge Sorted Lists",
    "name": "merge_sorted_lists",
    "difficulty": "medium",
    "author": "ChatGPT",
    "category": "Haystack",
    "question_type": [
        "haystack"
    ]
}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
def merge_sorted_lists(list1: list, list2: list) -> list:
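The commit adds only the signature. One possible way to fill in the stub, not necessarily the repository's reference solution, is a two-pointer merge that reads each input once and never re-sorts. It assumes both lists are already in ascending order, as the description states.

```python
def merge_sorted_lists(list1: list, list2: list) -> list:
    """Merge two ascending lists into one ascending list in linear time."""
    merged = []
    i = j = 0
    # Walk both lists once, always taking the smaller head element.
    while i < len(list1) and j < len(list2):
        if list1[i] <= list2[j]:  # <= keeps list1's items first on ties
            merged.append(list1[i])
            i += 1
        else:
            merged.append(list2[j])
            j += 1
    # At most one of these tails is non-empty.
    merged.extend(list1[i:])
    merged.extend(list2[j:])
    return merged


print(merge_sorted_lists([-5, 0, 3], [-10, -3, 2]))  # [-10, -5, -3, 0, 2, 3]
```

Concatenating and calling `sorted` would also pass the listed cases, but it ignores the ordering guarantee that the description asks solvers to exploit.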
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
You work as a data analyst for a large publishing company, and your job is to help improve their search engine algorithms. One day, you receive a dataset containing hundreds of thousands of sentences from various books, articles, and social media posts. Your task is to analyze the text to gain insights into the usage of vowels in the written content.

Your manager asks you to focus on identifying which vowel appears most frequently in a given sentence. The sentences you receive may contain a mix of uppercase and lowercase letters, punctuation marks, and spaces. For the purposes of this analysis, you’ll need to ignore everything except the vowels: 'a', 'e', 'i', 'o', 'u'.

After counting the occurrences of each vowel, you must return the vowel with the highest frequency. If there is a tie, return the vowel that appears first in the sentence. This analysis is important because vowels can provide insights into the linguistic structure of the text, which could help improve the accuracy of the company's text classification models.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
[
    {
        "input": ["hello world"],
        "output": "o"
    },
    {
        "input": ["The quick brown fox jumped over the lazy dog"],
        "output": "e"
    },
    {
        "input": ["A quick brown fox!"],
        "output": "o"
    },
    {
        "input": ["apple pie and coffee"],
        "output": "e"
    },
    {
        "input": ["A simple test sentence, with punctuation!"],
        "output": "e"
    },
    {
        "input": ["I am the walrus, goo goo g'joob!"],
        "output": "o"
    }
]
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
{
    "title": "Finding the Most Frequent Vowel",
    "name": "most_frequent_vowel",
    "difficulty": "medium",
    "author": "ChatGPT",
    "category": "String",
    "question_type": [
        "haystack"
    ]
}
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
def most_frequent_vowel(sentence: str) -> str:
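A minimal sketch of how the stub might be completed, following the rules in the description: count only a, e, i, o, u, ignore case, and break ties by first appearance. Returning an empty string for a sentence with no vowels is an assumption, since the description does not cover that case.

```python
def most_frequent_vowel(sentence: str) -> str:
    """Return the most frequent vowel, preferring the earliest one on ties."""
    counts = {}
    for ch in sentence.lower():
        if ch in "aeiou":
            # A dict keeps keys in insertion order, so keys end up ordered
            # by each vowel's first appearance in the sentence.
            counts[ch] = counts.get(ch, 0) + 1
    if not counts:
        return ""  # assumption: behavior without vowels is unspecified
    # max() returns the first maximal key it meets, which gives the tie-break.
    return max(counts, key=counts.get)


print(most_frequent_vowel("apple pie and coffee"))  # e
print(most_frequent_vowel("I am the walrus, goo goo g'joob!"))  # o
```

The tie-break relies on dictionaries preserving insertion order, which Python guarantees from version 3.7 onward.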
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
You're part of a content analysis team at a company that curates large volumes of user-submitted writing: blog posts, short stories, and personal journals. The company is experimenting with a new feature that automatically tags content based on stylistic patterns and linguistic quirks.

One request from the editorial team is to help identify unusually symmetrical writing, especially words or phrases that are the same forwards and backwards. They’ve given you a batch of strings and want a quick analysis to see how common these symmetrical patterns are across submissions.

Before the system can move to full-scale tagging, your job is to write a utility that scans through a list of textual entries and counts how many of them follow this mirrored pattern. Since formatting varies across submissions, you should disregard things like spacing and punctuation, and avoid being thrown off by inconsistent capitalization.

Your function should take in a list of strings and return the number of entries that fit the requested pattern. This count will be used to help estimate how often this stylistic device appears, which in turn will shape the tagging rules for future content.
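The task described above amounts to counting palindromes after normalization. A rough sketch follows; the function name `count_mirrored_entries` is hypothetical because this question's stub is not shown here, and treating "spacing and punctuation" as "anything that is not a letter or digit" is an assumption.

```python
def count_mirrored_entries(entries: list) -> int:
    """Count entries that read the same forwards and backwards."""

    def is_mirrored(text: str) -> bool:
        # Keep letters and digits only, lowercased, so spacing, punctuation,
        # and capitalization do not affect the comparison.
        cleaned = [ch.lower() for ch in text if ch.isalnum()]
        return cleaned == cleaned[::-1]

    return sum(1 for entry in entries if is_mirrored(entry))


print(count_mirrored_entries(["level", "hello", "Madam", "No lemon, no melon"]))  # 3
```

Under this reading, an entry that is empty after cleaning (such as "" or "!!!") counts as mirrored, because an empty sequence equals its own reverse.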

haystack/pattern_analysis/io.json

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
[
    {
        "input": [["level", "hello", "Madam", "No lemon, no melon"]],
        "output": 3
    },
    {
        "input": [["Was it a car or a cat I saw", "banana", "Step on no pets"]],
        "output": 2
    },
    {
        "input": [["Racecar", "civic", "deed", "not a palindrome"]],
        "output": 3
    },
    {
        "input": [["Hello", "world", "python"]],
        "output": 0
    },
    {
        "input": [["Red rum, sir, is murder", "Eva, can I see bees in a cave?", "Go hang a salami I'm a lasagna hog"]],
        "output": 3
    },
    {
        "input": [["", "A", "!!!"]],
        "output": 3
    }
]
