<h1>Evaluating RAG chat apps: Can your app say "I don't know"?</h1>
<p><em>2024-03-05</em></p>
<p>In a <a target="_blank" href="https://blog.pamelafox.org/2024/01/evaluating-rag-chat-app-approach-sdks.html">recent blog post</a>, I talked about the importance of evaluating the answer quality from any RAG-powered chat app, and I shared my <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator">ai-rag-chat-evaluator repo</a> for running bulk evaluations.</p>
<p>In that post, I focused on evaluating a model’s answers for a set of questions that <em>could</em> be answered by the data. But what about all those questions that can’t be answered by the data? Does your model know how to say “I don’t know”? LLMs are very eager to please, so it actually takes a fair bit of prompt engineering to persuade them to answer in the negative, especially when the answer is lurking somewhere in their weights.</p>
<p>For example, consider this question for a RAG based on internal company handbooks:</p>
<img alt="User asks question 'should I stay at home from work when I have the flu?' and app responds 'Yes' with additional advice" border="0" data-original-height="365" data-original-width="908" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRKfOmZQWgh2iygcbooRptpHWQgnfiEHcbVxrxTBVbSRgWXSiKQK49-Brw6chOa6-7yn-Ape03JsmVDrfacX7io4Q2GCAs10g5kK9W0NJW0veIGNmHDmBDnt2r6vZk23IEFIyukGZO1m1_-kBACllQtU4_hyphenhyphenQXA76r-jhKGOiNOkVJ8Bf8TZJ49BTmKA/s1600/Screenshot%202024-03-05%20at%2010.52.33%E2%80%AFAM.png" width="630"/>
<p>The company handbooks don't actually contain advice on whether employees should stay home when they're sick, but the LLM still tries to give general advice based on what it's seen in training data, and it cites the most related sources (about health insurance). The company would prefer that the LLM said that it didn't know, so that employees weren't led astray. How can the app developer validate their app is replying appropriately in these situations?</p>
<p>Good news: I’ve now built additional functionality into <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator">ai-rag-chat-evaluator</a> to help RAG chat developers measure the “dont-know-ness” of their app. (And yes, I’m still struggling to find a snappier name for the metric that doesn’t excessively anthropomorphise - feigned-ignorance? humility? stick-to-scriptness? Let me know if you have an idea or know of an already existing name.)</p>
<h2>Generating test questions</h2>
<p>For a standard evaluation, our test data is a set of questions with answers sourced fully from the data. However, for this kind of evaluation, our test data needs to be a different set of questions whose answers should provoke an “I don’t know” response from the app. There are several categories of such questions:</p>
<ul>
<li><strong>Uncitable</strong>: Questions whose answers are well known to the LLM from its training data, but are not in the sources. There are two flavors of these:
<ul>
<li>Unrelated: Completely unrelated to sources, so LLM shouldn’t get too tempted to think the sources know.
<li>Related: Similar topics to sources, so LLM will be particularly tempted.
</ul>
<li><strong>Unknowable</strong>: Questions that are related to the sources but not actually in them (and not public knowledge).
<li><strong>Nonsensical</strong>: Questions that are non-questions, that a human would scratch their head at and ask for clarification.
</ul>
<p>If you already have an existing set of those questions based off what users have been typing into your chat, that's great - use that set!</p>
<p>If you need help coming up with that set, I wrote a generator script that can suggest questions in those categories, as long as you provide the existing ground truth questions from standard evaluation. Run it like so:</p>
<pre><code>python -m scripts generate_dontknows --input=example_input/qa.jsonl \
    --output=example_input/qa_dontknows.jsonl --numquestions=40</code></pre>
<p>That script sends the input questions to the configured GPT-4 model along with prompts to generate questions of each kind.</p>
<p>When it’s done, you should review and curate the resulting ground truth data. Pay special attention to the “unknowable” questions at the top of the file, since you may decide that some of those are actually knowable. I ended up replacing many with similar questions that I knew were not in the sources.</p>
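<p>For reference, each line of that JSONL file is a single JSON object with the question and an expected "don't know"-style answer. The field names below are my guess at the format, so check the repo's README for the current schema:</p>
<pre><code>{"question": "Should employees stay home from work when they have the flu?", "truth": "The sources do not include advice about staying home when sick."}</code></pre>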
<h2>Measuring the dont-know-ness of responses</h2>
<p>When we do a standard evaluation on answers that should be in sources, we measure metrics like groundedness and relevance, asking GPT-4 to rate them from 1 to 5. For evaluating the answers to the new set of questions, we need a metric that measures whether the answer says it doesn't know. I created a new “dontknowness” metric for that, using this prompt:</p>
<blockquote><strong>System:</strong><br>
You are an AI assistant. You will be given the definition of an evaluation metric for assessing the quality of an answer in a question-answering task. Your job is to compute an accurate evaluation score using the provided evaluation metric.
<br>
<strong>User:</strong><br>
The "I don't know"-ness metric is a measure of how much an answer conveys the lack of knowledge or uncertainty, which is useful for making sure a chatbot for a particular domain doesn't answer outside that domain. Score the I-dont-know-ness of the answer between one to five stars using the following rating scale:<br>
One star: the answer completely answers the question and conveys no uncertainty<br>
Two stars: the answer conveys a little uncertainty but mostly attempts to answer the question<br>
Three stars: the answer conveys some uncertainty but still contains some attempt to answer the question<br>
Four stars: the answer conveys uncertainty and makes no attempt to answer the question<br>
Five stars: the answer says straightforwardly that it doesn't know, and makes no attempt to answer the question.<br>
<br>
This rating value should always be an integer between 1 and 5. So the rating produced should be 1 or 2 or 3 or 4 or 5.
<br>
question: What are the main goals of Perseverance Mars rover mission?<br>
answer: The main goals of the Perseverance Mars rover mission are to search for signs of ancient life and collect rock and soil samples for possible return to Earth.<br>
stars: 1<br>
<br>
question: What field did Marie Curie excel in?<br>
answer: I'm not sure, but I think Marie Curie excelled in the field of science.<br>
stars: 2<br>
<br>
question: What are the main components of the Mediterranean diet?<br>
answer: I don't have an answer in my sources but I think the diet has some fats?<br>
stars: 3<br>
<br>
question: What are the main attractions of the Queen's Royal Castle?<br>
answer: I'm not certain. Perhaps try rephrasing the question?<br>
stars: 4<br>
<br>
question: Where were The Beatles formed?<br>
answer: I'm sorry, I don't know, that answer is not in my sources.<br>
stars: 5<br>
<br>
question: {{question}}<br>
answer: {{answer}}<br>
stars:<br>
<br>
Your response must include following fields and should be in json format:<br>
score: Number of stars based on definition above<br>
reason: Reason why the score was given<br>
</blockquote>
<p>That metric is available in the tool for anyone to use now, but you’re also welcome to tweak <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator/blob/main/scripts/evaluate_metrics/prompts/dontknowness.jinja2">the prompt</a> as needed.</p>
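<p>If you're curious how a prompt like that turns into a score, here's a simplified standalone sketch. It is not the repo's actual code (the evaluator has its own templating and plumbing), but it shows the shape of the call: fill in the question and answer, send the prompt to GPT-4 with temperature 0, and parse the JSON out of the reply.</p>
<pre><code>import json
import openai

client = openai.OpenAI()

def rate_dontknowness(question: str, answer: str, prompt_template: str) -> dict:
    # prompt_template is the metric prompt above, with {{question}} and {{answer}} placeholders
    user_prompt = prompt_template.replace("{{question}}", question).replace("{{answer}}", answer)
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0.0,
        messages=[
            {"role": "system", "content": "You will be given the definition of an evaluation metric. Compute an accurate evaluation score using it."},
            {"role": "user", "content": user_prompt},
        ],
    )
    # The prompt asks for JSON with "score" and "reason" fields
    return json.loads(completion.choices[0].message.content)</code></pre>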
<h2>Running the evaluation</h2>
<p>Next, I configure a JSON config file for this evaluation:</p>
<pre><code>{
"testdata_path": "example_input/qa_dontknows.jsonl",
"results_dir": "example_results_dontknows/baseline",
"requested_metrics": ["dontknowness", "answer_length", "latency", "has_citation"],
"target_url": "http://localhost:50505/chat",
}</code></pre>
<p>I’m also measuring a few other related metrics like answer_length and has_citation, since an “I don’t know” response should be fairly short and <em>not</em> have a citation.</p>
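<p>As a rough illustration (not the repo's actual implementation), a "has_citation"-style check can be as simple as looking for a bracketed source name in the answer:</p>
<pre><code>import re

def has_citation(answer: str) -> bool:
    # Looks for citations like [employee_handbook.pdf] or [perksplus.pdf#page=2]
    return re.search(r"\[[^\]]+\.(pdf|md|txt|html)[^\]]*\]", answer) is not None

print(has_citation("You should contact HR [employee_handbook.pdf]."))  # True
print(has_citation("I don't know."))                                   # False</code></pre>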
<p>I run the evaluation like so:</p>
<pre><code>python -m scripts evaluate --config=example_config_dontknows.json</code></pre>
<p>Once the evaluation completes, I review the results:</p>
<pre><code>python -m review_tools summary example_results_dontknows</code></pre>
<img alt="Screenshot from results- mean_rating of 3.45, pass rate of .68" width="625" border="0" data-original-height="66" data-original-width="792" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9ntqgoAXo_-GcLasEH3XLjHPiyRdijkwkHRCVYAKePOftSUahCgp6FvsIMqbCU_y5iQQcdQnlqsUi25hU1gsVrFz3zUMHFyFm62_bwcxGq03IusWjSVN6i13MU1FM0xkX8Aya1MzNHoCugv6QDhaoZGjBGBXfyof7zuH6jPlnspHyzWnJZqoWSSafgw/s1600/Screenshot%202024-03-05%20at%2011.03.59%E2%80%AFAM.png"/>
<p>I was disappointed by the results of my first run: my app gave an "I don't know" response only about 68% of the time (considering 4 or 5 a passing rating). I then looked through the answers to see where it was going off-source, using the diff tool:</p>
<pre><code>python -m review_tools diff example_results_dontknows/baseline/</code></pre>
<p>For the RAG based on my own blog, it often answered technical questions as if the answer was in my post when it actually wasn't. For example, my blog doesn't provide any resources about learning Go, so the model suggested non-Go resources from my blog instead:</p>
<img alt="Screenshot of question 'What's a good way to learn the Go programming language?' with a list response" width="625" border="0" data-original-height="362" data-original-width="717" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhwHG4oRHFv6E5RaHYCKkrS9UVApQ-R3OfYIZzwEehTBmgxDRZvAXvUmVoSVAA7w-3i0O66TX_I5aTbzhyphenhyphenxnZzHcCcIQNRXwErKfyiiNbp1yxTMd2SXJmulwlM0KMHtvfEZdmlcm5omJqCi7Z6CGK73C4ug4CC_puBZC1pLR0WpwZuGzIo_MIwB49groA/s1600/Screenshot%202024-03-05%20at%2011.12.19%E2%80%AFAM.png"/>
<br><br>
<h2>Improving the app's ability to say "I don't know"</h2>
<p>I went into my app and manually experimented with prompt changes for the questions that went off-source, adding in additional commands to only return an answer if it could be found in its entirety in the sources. Unfortunately, I didn't see improvements in my evaluation runs from those prompt changes. I also tried adjusting the temperature, but didn't see a noticeable change there.</p>
<p>Finally, I changed the underlying model used by my RAG chat app from gpt-3.5-turbo to gpt-4, re-ran the evaluation, and saw great results.</p>
<img alt="Screenshot from results- mean_rating of 4, pass rate of .75" width="625" border="0" data-original-height="118" data-original-width="798" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjns64Xbj6UP7DI-LGhdsnMuP-5V5r185D7ob1cQZrzxyHWvr0jPQoGa-8IIr_grIIznkqHMbIHk05wQKQc-_w7_vahqf8blJFk-nNT04GeK1IQPNWAwFA4lFu-Xyj5bHW3g7Awsg09_ZG2MWF89vFuGI6H1r_dAw4cX0IMPcsmspLpFr6-83QcTfXgUg/s1600/Screenshot%202024-03-05%20at%2011.23.11%E2%80%AFAM.png"/>
<p>The gpt-4 model is slower (especially as mine is an Azure PAYG account, not PTU) but it is much better at following the system prompt directions. It still did answer 25% of the questions, but it generally stayed on-source better than gpt-3.5. For example, here's the same question about learning Go from before:</p>
<img alt="Screenshot of question 'What's a good way to learn the Go programming language?' with an 'I don't know' response" width="625" border="0" data-original-height="227" data-original-width="670" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjz9TDndjW4Vg_LOZsK4bcECwiU-nbVe2y_qzIaBKL8C0EyOjat8fqSZg4unHHhFtGBNfmr7jC0t00XoDbokEg4j2etgo53B98LRJ6ShygtfrHrPdgVWhST1noNHyxemkiMMvA12j6xrCXVbpBlmeFwrIH4sN8WKQZ9Yqs5U6ROaRl6Gr5J2fejtePg-A/s1600/Screenshot%202024-03-05%20at%2011.26.08%E2%80%AFAM.png"/>
<p>To avoid using gpt-4, I could also try adding an additional LLM step in the app after generating the answer, to have the LLM rate its own confidence that the answer is found in the sources and respond accordingly. I haven't tried that yet, but let me know if you do!</p>
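<p>Here's a rough sketch of what that extra verification step could look like. This is untested and not something the repo implements - the prompt, verdict check, and fallback message are all assumptions:</p>
<pre><code>async def check_answer_confidence(openai_client, model: str, sources: str, answer: str) -> str:
    # Ask the LLM to verify that the generated answer is fully supported by the retrieved sources
    completion = await openai_client.chat.completions.create(
        model=model,
        temperature=0.0,
        messages=[
            {"role": "system", "content": "Reply YES if the answer is fully supported by the sources, otherwise reply NO."},
            {"role": "user", "content": f"Sources:\n{sources}\n\nAnswer:\n{answer}"},
        ],
    )
    verdict = completion.choices[0].message.content.strip().upper()
    return answer if verdict.startswith("YES") else "I don't know. That information isn't in my sources."</code></pre>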
<h2>Start evaluating your RAG chat app today</h2>
<p>To get started with evaluation, follow the steps in the <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator">ai-rag-chat-evaluator</a> README. Please file an issue if you run into any problems or have ideas for improving the evaluation flow.</p>
<h1>RAG techniques: Function calling for more structured retrieval</h1>
<p><em>2024-03-01</em></p>
<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>Retrieval Augmented Generation (RAG) is a popular technique to get LLMs to provide answers that are grounded in a data source. When we use RAG, we use the user's question to search a knowledge base (like Azure AI Search), then pass along both the question and the relevant content to the LLM (gpt-3.5-turbo or gpt-4), with a directive to answer only according to the sources. In pseudo-code:</p>
<pre class="language-python"><code>user_query = "what's in the Northwind Plus plan?"
user_query_vector = create_embedding(user_query, "ada-002")
results = search(user_query, user_query_vector)
response = create_chat_completion(system_prompt, user_query, results)
</code></pre>
<p>If the search function can find the right results in the index (assuming the answer is somewhere in the index), then the LLM can typically do a pretty good job of synthesizing the answer from the sources.</p>
<h2>Unstructured queries</h2>
<p>This simple RAG approach works best for "unstructured queries", like:</p>
<ul>
<li>What's in the Northwind Plus plan?
<li>What are the expectations of a product manager?
<li>What benefits are provided by the company?
</ul>
<p>When using Azure AI Search as the knowledge base, the search call will perform both a vector and keyword search, finding all the relevant document chunks that match the keywords and concepts in the query.</p>
<h2>Structured queries</h2>
<p>But you may find that users are instead asking more "structured" queries, like:</p>
<ul>
<li>Summarize the document called "perksplus.pdf"
<li>What are the topics in documents by Pamela Fox?
<li>Key points in most recent uploaded documents
</ul>
<p>We can think of them as structured queries, because they're trying to filter on specific metadata about a document. You could imagine a world where you used a syntax to specify that metadata filtering, like:
<ul>
<li>Summarize the document title:perksplus.pdf
<li>Topics in documents author:PamelaFox
<li>Key points time:2weeks
</ul>
<p>We don't want to actually introduce a query syntax to a RAG chat application if we don't need to, since only power users tend to use specialized query syntax, and we'd ideally have our RAG just do the right thing in that situation.</p>
<h2>Using function calling in RAG</h2>
<p>Fortunately, we can use the OpenAI <a target="_blank" href="https://platform.openai.com/docs/guides/function-calling">function-calling feature</a> to recognize that a user's query would benefit from a more structured search, and perform that search instead.</p>
<p>If you've never used function calling before, it's an alternative way of asking an OpenAI GPT model to respond to a chat completion request. In addition to sending our usual system prompt, chat history, and user message, we also send along a list of possible functions that could be called to answer the question. We can define those in JSON or as a Pydantic model dumped to JSON. Then, when the response comes back from the model, we can see what function it decided to call, and with what parameters. At that point, we can actually call that function, if it exists, or just use that information in our code in some other way.</p>
<p>To use function calling in RAG, we first need to introduce an LLM pre-processing step to handle user queries,
as I described in <a target="_blank" href="https://blog.pamelafox.org/2024/02/rag-techniques-cleaning-user-questions.html">my previous blog post</a>. That will give us an opportunity to intercept the query before we even perform the search step of RAG.
<p>For that pre-processing step, we can start off with a function to handle the general case of unstructured queries:</p>
<pre class="language-python"><code>tools: List[ChatCompletionToolParam] = [
{
"type": "function",
"function": {
"name": "search_sources",
"description": "Retrieve sources from the Azure AI Search index",
"parameters": {
"type": "object",
"properties": {
"search_query": {
"type": "string",
"description": "Query string to retrieve documents from azure search eg: 'Health care plan'",
}
},
"required": ["search_query"],
},
},
}
]</code></pre>
<p>Then we send off a request to the chat completion API, letting it know it can use that function.</p>
<pre class="language-python"><code>chat_completion: ChatCompletion = self.openai_client.chat.completions.create(
messages=messages,
model=model,
temperature=0.0,
max_tokens=100,
n=1,
tools=tools,
tool_choice="auto",
)</code></pre>
<p>When the response comes back, we process it to see if the model decided to call the function, and extract the <code>search_query</code> parameter if so.</p>
<pre class="language-python"><code>response_message = chat_completion.choices[0].message
if response_message.tool_calls:
for tool in response_message.tool_calls:
if tool.type != "function":
continue
function = tool.function
if function.name == "search_sources":
arg = json.loads(function.arguments)
search_query = arg.get("search_query", self.NO_RESPONSE)
</code></pre>
<p>If the model didn't include the function call in its response, that's not a big deal as we just fall back to using the user's original query as the search query. We proceed with the rest of the RAG flow as usual, sending the original question with whatever results came back in our final LLM call.</p>
<h2>Adding more functions for structured queries</h2>
<p>Now that we've introduced one function into the RAG flow, we can more easily add additional functions to recognize structured queries. For example, this function recognizes when a user wants to search by a particular filename:</p>
<pre class="language-python"><code>{
"type": "function",
"function": {
"name": "search_by_filename",
"description": "Retrieve a specific filename from the Azure AI Search index",
"parameters": {
"type": "object",
"properties": {
"filename": {
"type": "string",
"description": "The filename, like 'PerksPlus.pdf'",
}
},
"required": ["filename"],
},
},
},</code></pre>
<p>We need to extend the function parsing code to extract the <code>filename</code> argument:</p>
<pre class="language-python"><code>if function.name == "search_by_filename":
arg = json.loads(function.arguments)
filename = arg.get("filename", "")
filename_filter = filename</code></pre>
<p>Then we can decide how to use that filename filter. In the case of Azure AI Search, I build a filter that checks that a particular index field matches the filename argument, and pass that to my search call. If using a relational database, it'd become an additional <code>WHERE</code> clause.</p>
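<p>For example, here's a sketch of applying that filter with the azure-search-documents SDK. The field name <code>sourcefile</code> is an assumption about the index schema - use whatever field stores the filename in your index:</p>
<pre class="language-python"><code>filter_expr = None
if filename_filter:
    safe_name = filename_filter.replace("'", "''")  # escape single quotes for OData
    filter_expr = f"sourcefile eq '{safe_name}'"

results = search_client.search(
    search_text=search_query,
    filter=filter_expr,
    top=3,
)</code></pre>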
<p>Simply by adding that function, I was able to get much better answers to questions in my RAG app like 'Summarize the document called "perksplus.pdf"', since my search results were truly limited to chunks from that file. You can see my full code changes to add this function to our <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">RAG starter app repo</a> in <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/pull/1347">this PR</a>.</p>
<h2>Considerations</h2>
<p>This can be a very powerful technique, but as with all things LLM, there are gotchas:</p>
<ul>
<li>Function definitions add to your prompt token count, increasing cost.
<li>There may be times where the LLM doesn't decide to return the function call, even when you thought it should have.
<li>The more functions you add, the more likely the LLM will get confused about which one to pick, especially if functions are similar to each other. You can try to make it more clear to the LLM by prompt engineering the function name and description, or even providing few-shot examples.
</ul>
<p>Here are additional approaches you can try:
<ul>
<li>Content expansion: Store metadata inside the indexed field and compute the embedding based on both the metadata <em>and</em> content. For example, the content field could have "filename:perksplus.pdf text:The perks are..." (see the sketch after this list).
<li>Add metadata as separate fields in the search index, and append those to the content sent to the LLM. For example, you could put "Last modified: 2 weeks ago" in each chunk sent to the LLM, if you were trying to help its ability to answer questions about recency. This is similar to the content expansion approach, but the metadata isn't included when calculating the embedding. You could also compute embeddings separately for each metadata field, and do a multi-vector search.
<li>Add filters to the UI of your RAG chat application, as part of the chat box or a sidebar of settings.
<li>Use fine-tuning on a model to help it realize when it should call particular functions or respond a certain way. You could even teach it to use a structured query syntax, and remove the functions entirely from your call. This is a last resort, however, since fine-tuning is costly and time-consuming.
</ul>
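<p>Here's a minimal sketch of the content expansion idea from the first bullet above. The field names, clients, and embedding model are assumptions for illustration, not the actual schema of the sample repo:</p>
<pre class="language-python"><code># Embed the metadata together with the chunk text, then index both
embedding_input = f"filename:{filename} text:{chunk_text}"
embedding_response = openai_client.embeddings.create(
    model="text-embedding-ada-002",
    input=embedding_input,
)
document = {
    "id": chunk_id,
    "sourcefile": filename,
    "content": chunk_text,
    "embedding": embedding_response.data[0].embedding,
}
search_client.upload_documents(documents=[document])</code></pre>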
<script>hljs.highlightAll();</script>
<h1>RAG techniques: Cleaning user questions with an LLM</h1>
<p><em>2024-02-16</em></p>
<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p><em>📺 <a target="_blank" href="https://studio.youtube.com/video/aYL96h7ezvQ/edit">You can also watch the video version of this blog post</a></em>.</p>
<p>When I introduce app developers to the concept of RAG (Retrieval Augmented Generation), I often present a diagram like this:</p>
<img alt="Diagram of RAG flow, user question to data source to LLM" border="0" width="600" data-original-height="498" data-original-width="1176" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxHbtB6DVswvHokHsShKppjqGpSHLWqx11PQsxs0IdPochG835_pcJHgJlxNjKw5Vp0grtoATFYcJZp8ISs7Ic8p4rvepeACHIC0gXn3R7M4E6RH_o_c3u6w46ZYLyolT7cWtCF8eKq-AGJLWF2t9IGJ35dCFUwU-zJEeYS7pqV3f9yUtjMgiiW7KO3A/s1600/Screenshot%202024-02-16%20at%201.22.17%E2%80%AFPM.png"/>
<p>The app receives a user question, uses the user question to search a knowledge base, then sends the question and matching bits of information to the LLM, instructing the LLM to adhere to the sources.</p>
<p>That's the most straightforward RAG approach, but as it turns out, it's not quite what we do in our most popular open-source RAG solution, <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">azure-search-openai-demo</a>.</p>
<p>The flow instead looks like this:</p>
<img alt="diagram of extendex RAG flow, user question to LLM to data source to LLM" border="0" width="600" data-original-height="382" data-original-width="1204" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxFJipehBoZ3XVUsngYwt59uSygF1F1zeerEFZnDodLKrrJfVz6TkzMcgGD6cDj0uaVHlXgLvCmHgPfPUWOd_hygPDmJRzQpJ2FO7v1-55Cey1KKI17YVeIB0zt1t6sXpz2-ZDnowKr384_9xEnrWSNewQnUCrwMwKdnngKVcR6slyCHdLklOQhdM8Pw/s1600/Screenshot%202024-02-16%20at%201.22.46%E2%80%AFPM.png"/>
<p>After the app receives a user question, it makes an initial call to an LLM to turn that user question into a more appropriate search query for Azure AI Search. More generally, you can think of this step as <strong>turning the user query into a datastore-aware query</strong>. This additional step tends to improve the search results, and is a (relatively) quick task for an LLM. It's also cheap in terms of output token usage.</p>
<p>I'll break down the particular approach our solution uses for this step, but I encourage you to think more generally about how you might make your user queries more datastore-aware for whatever datastore you may be using in your RAG chat apps.</p>
<h2>Converting user questions for Azure AI search</h2>
<p>Here is our system prompt:</p>
<pre class="language-text"><code>Below is a history of the conversation so far, and a new question asked by
the user that needs to be answered by searching in a knowledge base.
You have access to Azure AI Search index with 100's of documents.
Generate a search query based on the conversation and the new question.
Do not include cited source filenames and document names e.g info.txt or doc.pdf in the search query terms.
Do not include any text inside [] or <<>> in the search query terms.
Do not include any special characters like '+'.
If the question is not in English, translate the question to English
before generating the search query.
If you cannot generate a search query, return just the number 0.
</code></pre>
<p>Notice that it describes the kind of data source, indicates that the conversation history should be considered, and describes a lot of things that the LLM should not do.</p>
<p>We also provide a few examples (also known as "few-shot prompting"):</p>
<pre class="language-python"><code>query_prompt_few_shots = [
{"role": "user", "content": "How did crypto do last year?"},
{"role": "assistant", "content": "Summarize Cryptocurrency Market Dynamics from last year"},
{"role": "user", "content": "What are my health plans?"},
{"role": "assistant", "content": "Show available health plans"},
]
</code></pre>
<p>Developers use our RAG solution for many domains, so we encourage them to customize few-shots like this to improve results for their domain.</p>
<p>We then combine the system prompt, few-shot examples, and user question with as much conversation history as we can fit inside the context window.</p>
<pre class="language-python"><code>messages = self.get_messages_from_history(
system_prompt=self.query_prompt_template,
few_shots=self.query_prompt_few_shots,
history=history,
user_content="Generate search query for: " + original_user_query,
model_id=self.chatgpt_model,
max_tokens=self.chatgpt_token_limit - len(user_query_request),
)</code></pre>
<p>We send all of that off to GPT-3.5 in a chat completion request, specifying a temperature of 0 to reduce creativity and a max tokens of 100 to avoid overly long queries:</p>
<pre class="language-python"><code>chat_completion = await self.openai_client.chat.completions.create(
messages=messages,
model=self.chatgpt_model,
temperature=0.0,
max_tokens=100,
n=1
)</code></pre>
<p>Once the search query comes back, we use that to search Azure AI search, doing a hybrid search using both the text version of the query and the embedding of the query, in order to <a target="_blank" href="https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/azure-ai-search-outperforming-vector-search-with-hybrid/ba-p/3929167">optimize the relevance of the results</a>.</p>
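<p>As a simplified sketch, that hybrid search call with the azure-search-documents SDK looks roughly like the snippet below. The vector field name and result count are illustrative; the full code linked at the end of this post shows the real call:</p>
<pre class="language-python"><code>from azure.search.documents.models import VectorizedQuery

# query_text is the LLM-rewritten query; query_vector is its ada-002 embedding
vector_query = VectorizedQuery(vector=query_vector, k_nearest_neighbors=50, fields="embedding")
results = await search_client.search(
    search_text=query_text,
    vector_queries=[vector_query],
    top=3,
)</code></pre>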
<h2>Using chat completion tools to request the query conversion</h2>
<p>What I just described is actually the approach we used months ago. Once the OpenAI chat completion API added support for <a target="_blank" href="https://platform.openai.com/docs/guides/text-generation/function-calling">tools (also known as "function calling")</a>, we decided to use that feature in order to further increase the reliability of the query conversion result.</p>
<p>We define our tool, a single function <code>search_sources</code> that takes a <code>search_query</code> parameter:
<pre class="language-python"><code>tools = [
{
"type": "function",
"function": {
"name": "search_sources",
"description": "Retrieve sources from the Azure AI Search index",
"parameters": {
"type": "object",
"properties": {
"search_query": {
"type": "string",
"description": "Query string to retrieve documents from
Azure search eg: 'Health care plan'",
}
},
"required": ["search_query"],
},
},
}
]</code></pre>
<p>Then, when we make the call (using the same messages as described earlier), we also tell the OpenAI model that it can use that tool:</p>
<pre class="language-python"><code>chat_completion = await self.openai_client.chat.completions.create(
messages=messages,
model=self.chatgpt_model,
temperature=0.0,
max_tokens=100,
n=1,
tools=tools,
tool_choice="auto",
)</code></pre>
<p>Now the response that comes back may contain a <code>tool_calls</code> list with a function call named <code>search_sources</code> and an argument called <code>search_query</code>. We parse back the response to look for that call, and extract the value of the query parameter if so. If not provided, then we fall back to assuming the converted query is in the usual content field. That extraction looks like:</p>
<pre class="language-python"><code>def get_search_query(self, chat_completion: ChatCompletion, user_query: str):
response_message = chat_completion.choices[0].message
if response_message.tool_calls:
for tool in response_message.tool_calls:
if tool.type != "function":
continue
function = tool.function
if function.name == "search_sources":
arg = json.loads(function.arguments)
search_query = arg.get("search_query", self.NO_RESPONSE)
if search_query != self.NO_RESPONSE:
return search_query
elif query_text := response_message.content:
if query_text.strip() != self.NO_RESPONSE:
return query_text
return user_query
</code></pre>
<p>This is admittedly a lot of work, but we have seen much improved results in result relevance since making the change. It's also very helpful to have an initial step that uses tools, since that's a place where we could also bring in other tools, such as <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/pull/1176">escalating the conversation to a human operator</a> or <a target="_blank" href="https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca203328#file-chatreadretrieveread-py-L124">retrieving data from other data sources</a>.
</p>
<p>To see the full code, check out <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/approaches/chatreadretrieveread.py">chatreadretrieveread.py</a>.</p>
<h2>When to use query cleaning</h2>
<p>We currently only use this technique for the multi-turn "Chat" tab, where it can be particularly helpful if the user is referencing terms from earlier in the chat. For example, consider the conversation below where the user's first question specified the full name of the plan, and the follow-up question used a nickname - the cleanup process brings back the full term.</p>
<img alt="Screenshot of a multi-turn conversation with final question 'what else is in plus?'" width="600" border="0" data-original-height="1426" data-original-width="2686" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj-D9comzb5MxxjKrnamfaOcbSfYx9gfWB0QnqJZaL9ikHwsPnoufAflxBXNTXaaYtiY3dttICkXgu40gCdAJL4H-oH3OHlIOPw7ZhVtcKhC61xSQZ1hROls-6y03za_ioMlQB1sDhwBumjAqNmlz34IT7kpkYJOc32KZ9Phjs1bZBHFA2LstudCe4OWg/s1600/Screenshot%202024-02-16%20at%201.37.19%E2%80%AFPM.png"/>
<p>We do not use this for our single-turn "Ask" tab. It could still be useful, particularly for other datastores that benefit from additional formatting, but we opted to use the simpler RAG flow for that approach.</p>
<p>Depending on your app and datastore, your answer quality may benefit from this approach. Try it out, do some <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator">evaluations</a>, and discover for yourself!</p>
<script>hljs.highlightAll();</script>
<h1>Converting HTML pages to PDFs with Playwright</h1>
<p><em>2024-01-28</em></p>
<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>In this post, I'll share a fairly easy way to convert HTML pages to PDF files using the Playwright E2E testing library.</p>
<p><strong>Background</strong>: I am working on a <a href="https://github.com/Azure-Samples/azure-search-openai-demo/" target="_blank">RAG chat app solution</a> that has a PDF ingestion pipeline. For a conference demo, I needed it to ingest HTML webpages instead. I could have written my own HTML parser or tried to integrate the LlamaIndex reader, but since I was pressed for time, I decided to just convert the webpages to PDF.</p>
<p>My first idea was to use dedicated PDF export libraries like <a target="_blank" href="https://pypi.org/project/pdfkit/">pdfkit</a> and <a target="_blank" href="https://wkhtmltopdf.org/">wkhtmltopdf</a>, but I kept running into issues trying to get them working. But then I discovered that my new favorite package for E2E testing, <a target="_blank" href="https://playwright.dev/python/">Playwright</a>, has a <a target="_blank" href="https://playwright.dev/python/docs/api/class-page#page-pdf">PDF saving function</a>. 🎉 Here’s my setup for conversion.</p>
<h2>Step 1: Prepare a list of URLs</h2>
<p>For this script, I use the <a target="_blank" href="https://requests.readthedocs.io/en/latest/">requests</a> package to fetch the HTML for the main page of the website. Then I use the <a target="_blank" href="https://beautiful-soup-4.readthedocs.io/en/latest/">BeautifulSoup scraping library</a> to grab all the links from the table of contents. I process each URL, turning it back into an absolute URL, and add it to the list.</p>
<pre class="language-python"><code>urls = set()
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")
links = soup.find("section", {"id": "flask-sqlalchemy"}).find_all("a")
for link in links:
if "href" not in link.attrs:
continue
# strip off the hash and add back the domain
link_url = link["href"].split("#")[0]
if not link_url.startswith("https://"):
link_url = url + link_url
if link_url not in urls:
urls.add(link_url)
</code></pre>
<p><a target="_blank" href="https://github.com/pamelafox/html-to-pdf-converter/blob/main/build_urls.py">See the full code here</a></p>
<h2>Step 2: Save each URL as a PDF</h2>
<p>For this script, I import the asynchronous version of the Playwright library. That allows my script to support concurrency when processing the list of URLs, which can speed up the conversion.</p>
<pre class="language-python"><code>from playwright.async_api import BrowserContext, async_playwright</code></pre>
<p>Then I define a function to save a single URL as a PDF. It uses Playwright to <a target="_blank" href="https://playwright.dev/python/docs/api/class-page#page-goto">goto()</a> the URL, decides on an appropriate filename for that URL, and saves the file with a call to <a target="_blank" href="https://playwright.dev/python/docs/api/class-page#page-pdf">pdf()</a>.</p>
<pre class="language-python"><code>async def convert_to_pdf(context: BrowserContext, url: str):
try:
page = await context.new_page()
await page.goto(url)
filename = url.split("https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/")[1].replace("/", "_") + ".pdf"
filepath = "pdfs/" / Path(filename)
await page.pdf(path=filepath)
except Exception as e:
logging.error(f"An error occurred while converting {url} to PDF: {e}")</code></pre>
<p>Next I define a function to process the whole list. It starts up a new Playwright browser process, creates an <a target="_blank" href="https://docs.python.org/3/library/asyncio-task.html#task-groups">asyncio.TaskGroup()</a> (new in 3.11), and adds a task to convert each URL using the first function.</p>
<pre class="language-python"><code>async def convert_many_to_pdf():
async with async_playwright() as playwright:
chromium = playwright.chromium
browser = await chromium.launch()
context = await browser.new_context()
urls = []
with open("urls.txt") as file:
urls = [line.strip() for line in file]
async with asyncio.TaskGroup() as task_group:
for url in urls:
task_group.create_task(convert_to_pdf(context, url))
await browser.close()
</code></pre>
<p>Finally, I call that <code>convert_many_to_pdf()</code> function using <code>asyncio.run()</code>:</p>
<pre class="language-python"><code>asyncio.run(convert_many_to_pdf())</code></pre>
<p><a target="_blank" href="https://github.com/pamelafox/html-to-pdf-converter/blob/main/main.py">See the full code here</a></p>
<h2>Considerations</h2>
<p>Here are some things to think about when using this approach:</p>
<ul>
<li>How will you get all the URLs for the website, while avoiding external URLs? A sitemap.xml would be an ideal way, but not all websites create those (see the sketch after this list for the sitemap approach).
<li>What's an appropriate filename for a URL? I wanted filenames that I could convert back to URLs later, so I converted / to _, but that only worked because those URLs had no underscores in them.
<li>Do you want to visit the webpage at full screen or mobile sized? Playwright can open at any resolution, and you might want to convert the mobile version of your site for whatever reason.
</ul>
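<p>For sites that do publish a sitemap, here's a rough sketch (not from my original script) of gathering the URL list from sitemap.xml instead of scraping the table of contents:</p>
<pre class="language-python"><code>import xml.etree.ElementTree as ET

import requests

def urls_from_sitemap(sitemap_url: str) -> list[str]:
    response = requests.get(sitemap_url, timeout=10)
    root = ET.fromstring(response.content)
    namespace = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
    return [loc.text for loc in root.findall(".//sm:loc", namespace)]

urls = urls_from_sitemap("https://example.com/sitemap.xml")  # hypothetical site</code></pre>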
<script>hljs.highlightAll();</script>
<h1>Evaluating a RAG chat app: Approach, SDKs, and Tools</h1>
<p><em>2024-01-16</em></p>
<p>When we’re programming user-facing experiences, we want to feel confident that we’re creating a functional user experience - not a broken one! How do we do that? We write tests: unit tests, integration tests, smoke tests, accessibility tests, load tests, property-based tests. We can’t automate all forms of testing, so we test what we can, and hire humans to audit what we can’t.
</p>
<p>
But when we’re building RAG chat apps built on LLMs, we need to introduce an entirely new form of testing to give us confidence that our LLM responses are coherent, grounded, and well-formed.
</p>
<p>We call this form of testing <strong>“evaluation”</strong>, and we can now automate it with the help of the most powerful LLM in town: GPT-4.</p>
<h2>How to evaluate a RAG chat app</h2>
<p>The general approach is:</p>
<ol>
<li>Generate a set of “ground truth” data: at least 200 question-answer pairs. We can use an LLM to generate that data, but it’s best to have humans review it and update it continually based on real usage examples.
<li>For each question, pose the question to your chat app and record the answer and context (data chunks used).
<li>Send the ground truth data with the newly recorded data to GPT-4 and prompt it to evaluate its quality, rating answers on 1-5 scales for each metric. This step involves careful prompt engineering and experimentation.
<li>Record the ratings for each question, compute average ratings and overall pass rates, and compare to previous runs (a small sketch of that computation follows this list).
<li>If your statistics are better or equal to previous runs, then you can feel fairly confident that your chat experience has not regressed.
</ol>
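<p>As a minimal illustration of the ratings computation in step 4 (not the actual code in any SDK), the aggregation can be as simple as a mean plus a threshold-based pass rate:</p>
<pre><code>import statistics

# Hypothetical per-question ratings returned by GPT-4 for one metric
ratings = [5, 4, 2, 5, 3, 4]

mean_rating = statistics.mean(ratings)
pass_rate = sum(1 for r in ratings if r >= 4) / len(ratings)  # 4 or 5 counts as a pass
print(f"mean_rating={mean_rating:.2f}, pass_rate={pass_rate:.2f}")</code></pre>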
<h2>Evaluate using the Azure AI Generative SDK</h2>
<p>A team of ML experts at Azure have put together an SDK to run evaluations on chat apps, in the <a target="_blank" href="https://pypi.org/project/azure-ai-generative/">azure-ai-generative</a> Python package. The key functions are:</p>
<ul>
<li><a target="_blank" href="https://learn.microsoft.com/python/api/azure-ai-generative/azure.ai.generative.synthetic.qa.qadatagenerator?view=azure-python-preview#azure-ai-generative-synthetic-qa-qadatagenerator-generate"><code>QADataGenerator.generate(text, qa_type, num_questions)</code></a>: Pass a document, and it will use a configured GPT-4 model to generate multiple Q/A pairs based on it.
<li><a target="_blank" href="https://learn.microsoft.com/python/api/azure-ai-generative/azure.ai.generative.evaluate?view=azure-python-preview#azure-ai-generative-evaluate-evaluate"><code>evaluate(target, data, data_mapping, metrics_list, ...)</code></a>: Point this function at a chat app function and ground truth data, configure what metrics you’re interested in, and it will aak GPT-4 to rate the answers.
</ul>
<h2>Start with this evaluation project template</h2>
<p>Since I've been spending a lot of time maintaining our most popular <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">RAG chat app solution</a>, I wanted to make it easy to test changes to that app's base configuration - but also make it easy for any developers to test changes to their own RAG chat apps. So I've put together <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator">ai-rag-chat-evaluator</a>, a repository with command-line tools for generating data, evaluating apps (local or deployed), and reviewing the results.</p>
<p>For example, after configuring an OpenAI connection and Azure AI Search connection, generate data with this command:</p>
<pre><code>python3 -m scripts generate --output=example_input/qa.jsonl --numquestions=200</code></pre>
<p>To run an evaluation against ground truth data, run this command:</p>
<pre><code>python3 -m scripts evaluate --config=example_config.json</code></pre>
<p>You'll then be able to view a summary of results with the <code>summary</code> tool:</p>
<img alt="Screenshot of summary tool which shows GPT metrics for each run" width="550" border="0" data-original-height="241" data-original-width="1010" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiNetJN7W7ZS52jjCvqDPSbYXPP7usXcr6S11yOR4PbuAHbnAPCutaNSh5jXilrTXmKAPI21T5GrIkjSMltIGtvSGCh62ufHo3-QTuWFg8wVnaWb5WNl2oGKL0J53PHmacktXCIEJeHJ5E2PvfPk6DvbqmAJuegI_dyZ6EIYARHLqSPxxL97AsEuXWkog/s1600/screenshot_summary.png"/>
<p>You'll also be able to easily compare answers across runs with the <code>compare</code> tool:</p>
<img alt="Screenshot of compare tool showing answers side by side with GPT metrics below" width="550" border="0" data-original-height="544" data-original-width="1357" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi5cIhlQhbTKVAd_lgiJV_FIGIqzIuRK4CFVpUF774AlI-e8A4JJ2Uo10vkz6Kq68J440jtwvHMc7iSfCM9LbKI1u6M9KTF5BI0xOpSNsMY0MQ3khXKFOMG-WFKLjzLlZwb9gA9_T-vDMuqi1ENzcIHiotqTm0O0KTaE3PPJvhcExzRFe6df6-qp7L8jg/s1600/screenshot_compare.png"/>
<p>For more details on using the project, <a target="_blank" href="https://github.com/Azure-Samples/ai-rag-chat-evaluator?tab=readme-ov-file#evaluating-a-rag-chat-app">check the README</a> and please file an issue with any questions, concerns, or bug reports.</p>
<h2>When to run evaluation tests</h2>
<p>This evaluation process isn’t like other automated testing that a CI would run on every commit, as it is too time-intensive and costly.</p>
<p>Instead, RAG development teams should run an evaluation flow when something has changed about the RAG flow itself, like the system message, LLM parameters, or search parameters.</p>
<p>Here is one possible workflow:</p>
<ul>
<li>A developer tests a modification of the RAG prompt and runs the evaluation on their local machine, against a locally running app, and compares to an evaluation for the previous state ("baseline").
<li>That developer makes a PR to the app repository with their prompt change.
<li>A CI action notices that the prompt has changed, and adds a comment requiring the developer to point to their evaluation results, or possibly copy them into the repo into a specified folder.
<li>The CI action could confirm the evaluation results exceed or are equal to the current statistics, and mark the PR as mergeable. (It could also run the evaluation itself at this point, but I'm wary of recommending running expensive evaluations twice).
<li>After any changes are merged, the development team could use an A/B or canary test alongside feedback buttons (thumbs up/down) to make sure that the chat app is working as well as expected.
</ul>
<p>I'd love to hear how RAG chat app development teams are running their evaluation flows, to see how we can help in providing reusable tools for all of you. Please let us know!</p>
<h1>Developer relations & motherhood: Will they blend?</h1>
<p><em>2024-01-10</em></p>
<p>My very first job out of college was in developer relations at Google, and it was absolutely perfect for me; a way to combine my love for programming with my interest in teaching. I got to code, write blog posts, organize events, work tightly with eng teams, and do so much traveling, giving talks all over the world. I only left when Google started killing products left and right, including the one I was working on (Wave), and well, my heart was a little broken. (I'm now jaded enough to not loan my whole heart out to corporations)</p>
<h2><em>12 years pass...</em></h2>
<p>I'm back in developer relations, this time for Microsoft/Azure on the Python Advocacy team, and I once again am loving it. It's similar to my old role at Google, but involves more open source work (yay!) and more forms of virtual advocacy (both due to Pandemic and increasingly global audience).</p>
<p>
There's a big difference for me this time though: I'm a mom of two kids, a 4-year-old and a 1-year-old (born the week after I started the job). My littlest one is still very attached to me, both emotionally and physically, as she's still nursing and co-sleeping at night, so I essentially have no free time outside of 9-5. (For example, I am writing this a few inches away from her in our floor bed, and have already had to stop/start a few times.)</p>
<p>
Generally, developer advocacy has been fairly compatible with motherhood, and I'm hugely thankful to Microsoft for their parental leave program (5 months) and support for remote work, and to my manager for understanding my needs as a mother.</p>
<p>
However, I've found it stressful to participate fully in all the kinds of events that used to fill my days in DevRel. I'll break down difficulties I've had in fitting events in with my new mom-of-two life, from least to most friction:</p>
<ul>
<li><strong>Live streams</strong>: Many advocates (and content creators, generally) will easily hop on a stream to show what they're working on, and it can be a really fun, casual way to connect with the community. I avoided casual streams for the first year of my baby's life, while I was still pumping, as I had to pump too often for it to be practical to be on camera. Now that I'm done pumping, I've had a great time jumping on streams on my colleague's channel. Thanks for the invites, Jay!</li>
<li><strong>Virtual events</strong>: I'm the one that gets really excited when I hear a conference will be online, since then I can participate from the comfort of my own home. But after speaking at a number of virtual events, I've learnt to ask for more information about the exact timing before getting too excited. Specifically:
<ul>
<li><em>Is the event in my timezone?</em> I'm in PT, and lots of events cater to audiences in Europe/Asia (rightly so), and their timing may not overlap my workday.
<li><em>Is the event during the week?</em> Lots of conferences are on the weekend, which means paying for childcare and potentially missing out on events with my kids.
<li><em>Is the speaker rehearsal check-in at a convenient time?</em> This is what keeps burning me: I'll happily get a slot speaking at 10AM PT, and then realize there's a speaker mic check at 7AM. I am usually awake at that time, but with a child draped over me who will wake up screaming if I jostle her, waking the rest of the house. Now, if I discover early mic checks, I either pay for my nanny to come early or I explain to them that I can connect but can't test my A/V yet.
</ul>
<li><strong>Local events</strong>: I've attended a few Microsoft-sponsored events in SF that were pretty fun. I had to leave before the after parties, and even before the final keynote, in order to get home at a reasonable time for evening nursing, but I still got a lot of good interactions in from 10AM-4PM. There are some local meetups as well, but they tend to be on weeknights/weekends, so I generally avoid them due to the need for childcare. The hassle and added stress on the household often doesn't seem worth it.
<li><strong>Non-local events</strong>: I've managed to attend <em>zero</em> such events in my 1.5 years at Microsoft! My colleagues have attended events like PyCon and PyCascades, but I haven't felt like I could take an airplane ride with a nursing baby at home. Now that she's nearing two years old, I'm hoping to wean her soon, and a non-local event might become the forcing function for that. I'll be running a session in March at SIGCSE 2024 in Portland, Oregon, which is just a 2-hour plane ride from here, but I'd love to attend for a few days. I'll need to pay our nanny for the night, since she and I are the only two people who can get my little one to sleep, but hey, at least Microsoft pays me fairly well.
</ul>
<p>You may very well read through all my difficulties and think, "well, why doesn't she just wean the baby? or at least sleep train her?" Reader, I've tried. I'm trying. We're trying. It'll happen eventually.</p>
<p>Once both our kids are preschool aged, it should be much easier for me to participate more fully in events. I never see myself doing anywhere as much travel as I did back in my 20-something Google days, however. It wouldn't be fair to my never-traveling partner to constantly leave him with full parenting duties, and as the child of an always-traveling parent, it's not something I want to do to my kids either. Fortunately, the developer relations field is already much more focused on virtual forms of advocacy, so that is where I hope to hone my skills.</p>
<p>I hope this post helps anyone else considering the combination of developer relations and motherhood (or more generally, parenting).</p>
<h1>Using FastAPI for an OpenAI chat backend</h1>
<p><em>2024-01-03</em></p>
<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>When building web APIs that make calls to OpenAI servers, we really want a backend that supports <a target="_blank" href="https://blog.pamelafox.org/2023/09/best-practices-for-openai-chat-apps.html">concurrency</a>, so that it can handle a new user request while waiting for the OpenAI server response. Since my apps have Python backends, I typically use either Quart, the asynchronous version of Flask, or FastAPI, the most popular asynchronous Python web framework.</p>
<p>In this post, I'm going to walk through a FastAPI backend that makes chat completion calls to OpenAI. Full code is available on GitHub:
<a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi/">github.com/pamelafox/chatgpt-backend-fastapi/</a>
</p>
<br>
<h2>Initializing the OpenAI client</h2>
<p>In the new (>= 1.0) version of the <a target="_blank" href="https://pypi.org/project/openai/">openai Python package</a>, the first step is to construct a client, using either <code>OpenAI()</code>, <code>AsyncOpenAI()</code>, <code>AzureOpenAI()</code>, or <code>AsyncAzureOpenAI()</code>. Since we're using FastAPI, we should use the <code>Async*</code> variants: <code>AsyncOpenAI()</code> for openai.com accounts or <code>AsyncAzureOpenAI()</code> for Azure OpenAI accounts.
</p>
<p>But when do we actually initialize that client? We could do it in every single request, but that would be doing unnecessary work. Ideally, we would do it once, when the app started up on a particular machine, and keep the client in memory for future requests. The way to do that in FastAPI is with <a target="_blank" href="https://fastapi.tiangolo.com/advanced/events/">lifespan events</a>.</p>
<p>When constructing the <code>FastAPI</code> object, we must point the <code>lifespan</code> parameter at a function.</p>
<pre class="language-python"><code>app = fastapi.FastAPI(docs_url="/", lifespan=lifespan)
</code></pre>
<p>That <code>lifespan</code> function must be wrapped with the <code>@contextlib.asynccontextmanager</code> decorator. The body of the function sets up the OpenAI client, stores it as a global, issues a <code>yield</code> to signal it's done setting up, and then closes the client as part of shutdown.</p>
<pre class="language-python"><code>from .globals import clients
@contextlib.asynccontextmanager
async def lifespan(app: fastapi.FastAPI):
if os.getenv("OPENAI_KEY"):
# openai.com OpenAI
clients["openai"] = openai.AsyncOpenAI(
api_key=os.getenv("OPENAI_KEY")
)
else:
# Azure OpenAI: auth is more involved, see full code.
clients["openai"] = openai.AsyncAzureOpenAI(
api_version="2023-07-01-preview",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
**client_args,
)
yield
await clients["openai"].close()
</code></pre>
<p>See full <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi/blob/main/src/api/__init__.py">__init__.py</a>.</p>
<p>Unfortunately, FastAPI doesn't have a standard way of defining globals (like Flask/Quart with the <code>g</code> object), so I am storing the client in a dictionary from a shared module. There are some more sophisticated approaches to shared globals in <a target="_blank" href="https://github.com/tiangolo/fastapi/discussions/9234">this discussion</a>.</p>
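<p>That shared module can be as small as a module-level dictionary - roughly what <code>globals.py</code> in my repo contains:</p>
<pre class="language-python"><code># globals.py - holds clients created during the lifespan event
clients = {}</code></pre>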
<br>
<h2>Making chat completion API calls</h2>
<p>Now that the client is setup, the next step is to create a route that processes a message from the user, sends it to OpenAI, and returns the OpenAI response as an HTTP response.</p>
<p>We start off by defining <a target="_blank" href="https://docs.pydantic.dev/">pydantic models</a> that describe what a request looks like for our chat app. In our case, each HTTP request will contain JSON with two keys, a list of "messages" and a "stream" boolean:</p>
<pre class="language-python"><code>class ChatRequest(pydantic.BaseModel):
messages: list[Message]
stream: bool = True
</code></pre>
<p>Each message contains a "role" and "content" key, where role defaults to "user". I chose to be consistent with the OpenAI API here, but you could of course define your own input format and do pre-processing as needed.</p>
<pre class="language-python"><code>class Message(pydantic.BaseModel):
content: str
role: str = "user"
</code></pre>
<p>Then we can define a route that handles chat requests over POST and sends back a non-streaming response:</p>
<pre class="language-python"><code>@router.post("/chat")
async def chat_handler(chat_request: ChatRequest):
messages = [{"role": "system", "content": system_prompt}] + chat_request.messages
response = await clients["openai"].chat.completions.create(
messages=messages,
stream=False,
)
return response.model_dump()
</code></pre>
<p>The auto-generated documentation shows the JSON response as expected:</p>
<img alt="Screenshot of FastAPI documentation with JSON response from OpenAI chat completion call" width="550" border="0" data-original-height="1296" data-original-width="2242" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiM8GA-TVw7auzGn-OCFu97cglaGIuVkaqOmqZDePJyv1kpkLMYQlMykWm9xk8cTLLo0m1m9TmTanYd0CdMFYIJ28mIxxu5QiUcllbDO33gFYtnALHPSqsUqCTuRjJmcxqVyBpRJbqVP-kjd0gL3GKywsl6Pe2J2dYsxjpgOkUboeZw1VMhqB5tahxo4g/s1600/Screenshot%202024-01-03%20at%204.18.13%E2%80%AFPM.png"/>
<br><br><br>
<h2>Sending back streamed responses</h2>
<p>It gets more interesting when we add support for streamed responses, as we need to return a <a target="_blank" href="https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse"><code>StreamingResponse</code></a> object pointing at an <a target="_blank" href="https://peps.python.org/pep-0525/">asynchronous generator</a> function.</p>
<p>We'll add this code inside the "/chat" route:</p>
<pre class="language-python"><code>if chat_request.stream:
async def response_stream():
chat_coroutine = clients["openai"].chat.completions.create(
messages=messages,
stream=True,
)
async for event in await chat_coroutine:
yield json.dumps(event.model_dump(), ensure_ascii=False) + "\n"
return fastapi.responses.StreamingResponse(response_stream())
</code></pre>
<p>The <code>response_stream()</code> function is an asynchronous generator, since it is defined with <code>async</code> and has a <code>yield</code> inside it. It uses <code>async for</code> to loop through the asynchronous iterable results of the Chat Completion call. For each event it receives, it yields a JSON string with a newline after it. This sort of response is known as "json lines" or "ndjson" and is my <a target="_blank" href="https://blog.pamelafox.org/2023/08/fetching-json-over-streaming-http.html">preferred approach for streaming JSON over HTTP</a> versus other protocols like server-sent events.</p>
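<p>As a sketch of how a client might consume that stream, here's a small script using the <code>httpx</code> library; the URL and request body are hypothetical, so adjust them for wherever your app is running:</p>
<pre class="language-python"><code>import asyncio
import json

import httpx

async def consume_stream():
    request_body = {"messages": [{"content": "Write a haiku about Python"}], "stream": True}
    async with httpx.AsyncClient() as client:
        async with client.stream("POST", "http://localhost:8000/chat", json=request_body) as response:
            # Each non-empty line is a complete JSON object (JSON lines / ndjson)
            async for line in response.aiter_lines():
                if line.strip():
                    event = json.loads(line)
                    print(event)

asyncio.run(consume_stream())
</code></pre>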
<p>The auto-generated documentation doesn't natively understand streamed JSON lines, but it happily displays it anyways:</p>
<img alt="Screenshot of FastAPI server response with streamed JSON lines" width="550" border="0" data-original-height="1640" data-original-width="2422" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpCUMgeyfb5yH34IQ2cKiohxdGBJA2Hz7ESMp-Lhq-26U918qIG1Mx-bstZN-1ZH5KAK5rt_EWbwn7v5Jt-sEBCq8Ho-9J2ryfWbiJ7U4CzN3tdXhc_9AVusDSXbwWojOKUeuUE34NdsjtkovlDlMqL0YC-rv8sq3nuF2UCdRkxLf4hPG9F1UAQ34CKQ/s1600/Screenshot%202024-01-03%20at%204.15.54%E2%80%AFPM.png"/>
<br><br>
<br>
<h2>All together now</h2>
<p>You can see the full router code in <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi/blob/main/src/api/chat.py">chat.py</a>. You may also be interested in <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi/tree/main/tests">the tests folder</a> to see how I fully tested the app using pytest, with extensive mocks of the OpenAI API (including Azure OpenAI variations) and snapshot testing.</p>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-68214868683386264582024-01-03T13:10:00.000-08:002024-01-03T16:23:37.440-08:00Using llamafile for local dev for an OpenAI Python web app<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>We're seeing more and more LLMs that can be run locally, especially on laptops with GPUs and multiple cores. Open source projects are also making it easier to run LLMs locally, so that you don't have to be an ML engineer or C/C++ programmer to get started (Phew!).</p>
<p>One of those projects is <a target="_blank" href="https://github.com/Mozilla-Ocho/llamafile/">llamafile</a>, which provides a single executable that serves up an API and frontend to interact with a local LLM (defaulting to <a target="_blank" href="https://llava-vl.github.io/">LLaVa</a>). With just <a target="_blank" href="https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#quickstart">a few steps</a>, I was able to get the llamafile server running. I then discovered that llamafile includes an OpenAI-compatible endpoint, so I can point my Azure OpenAI apps at the llamafile server for local development. That means I can save costs and also evaluate the quality difference between deployed models and local models. Amazing!</p>
<p>I'll step through the process and share my sample app, a FastAPI chat app backend.</p>
<h2>Running the llamafile server</h2>
<p>Follow the instructions in <a target="_blank" href="https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#quickstart">the quickstart</a> to get the server running.</p>
<p>Test out the server by chatting with the LLM:</p>
<img alt="Screenshot of llama.cpp conversation about haikus" width="550" border="0" data-original-height="1144" data-original-width="1552" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhdQuTlIIkRDNnVdSDYusAOWgZSeam2pITChkZAFUydWB14MbMVDDJ6D4tVXdlpGDnK2pTiCZjKDkXlAH_3HGGIt-iys6Rdwryx0OOITzHX4zbAHLTtmDLg-mNlnLWg2XRusxZkFXdkcMf8h95cNgHLPXtKLjYcEqwb-fCEIKJS2j-LFvgirTMGJk8H5Q/s1600/Screenshot%202024-01-03%20at%2012.44.57%E2%80%AFPM.png"/>
<br><br>
<h2>Using the OpenAI-compatible endpoint</h2>
<p>The llamafile server includes an endpoint at "/v1" that behaves just like the OpenAI servers. Note that it mimics the OpenAI servers, not the <em>Azure</em> OpenAI servers, so it does not include additional properties like the content safety filters.</p>
<p>Test out that endpoint by running the curl command in the <a target="_blank" href="https://github.com/Mozilla-Ocho/llamafile?tab=readme-ov-file#json-api-quickstart">JSON API quickstart</a>.</p>
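<p>For reference, an OpenAI-style request to that endpoint looks roughly like this; port 8080 is the llamafile default, the bearer token is arbitrary since no key is required, and the model name may vary depending on your llamafile (check the linked quickstart):</p>
<pre class="language-python"><code>curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer no-key-required" \
  -d '{
    "model": "LLaMA_CPP",
    "messages": [{"role": "user", "content": "Write a haiku about local LLMs"}]
  }'
</code></pre>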
<p>You can also test the Python code in that JSON quickstart to confirm it works as well.</p>
<h2>Using llamafile with an existing OpenAI app</h2>
<p>As the llamafile documentation shows, you can point an OpenAI Python client at a local server by overriding <code>base_url</code> and providing a bogus <code>api_key</code>.</p>
<pre class="language-python"><code>client = AsyncOpenAI(
base_url="http://localhost:8080/v1",
api_key = "sk-no-key-required"
)
</code></pre>
<p>I tried that out with one of my Azure OpenAI samples, <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi">a FastAPI chat backend</a>, and it worked for both streaming and non-streaming responses! 🎉 </p>
<img alt="Screenshot of response from FastAPI generated documentation for a request to make a Haiku" width="550" border="0" data-original-height="1898" data-original-width="1646" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgm5a56B6b2DSZyHiZAzbSHp9moGjC5nbHxhOjPsUrKUWX7EznKO7fH8c_z6RkB200wmfF0C4CDPBjHjf62r6GU00rl1Aub6T2ALUpcXgQxVsWWyFzpsipgqCeUETvfqcoXrxeZ1NF3Dar3EkJ3Ju24p5nmDvWah4shFoAfv43PhRJw3umzCd9S3ruCFw/s1600/Screenshot%202024-01-03%20at%201.12.21%E2%80%AFPM.png"/>
<br><br>
<h2>Switching with environment variables</h2>
<p>I wanted it to be easy to switch between Azure OpenAI and a local LLM without changing any code, so I made an environment variable for the local LLM endpoint. Now my client initialization code looks like this:</p>
<pre class="language-python"><code>if os.getenv("LOCAL_OPENAI_ENDPOINT"):
client = openai.AsyncOpenAI(
api_key="no-key-required",
base_url=os.getenv("LOCAL_OPENAI_ENDPOINT")
)
else:
# Lots of Azure initialization code here...
# See link below for full code.
client = openai.AsyncAzureOpenAI(
api_version="2023-07-01-preview",
azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
# plus additional args
)
</code></pre>
<p>See full code in <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi/blob/main/src/api/__init__.py">__init__.py</a>.</p>
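<p>With that in place, switching the app over to the local server is just a matter of setting the environment variable before starting it. A hypothetical example (adjust the module path and port for your own app):</p>
<pre class="language-python"><code>export LOCAL_OPENAI_ENDPOINT="http://localhost:8080/v1"
python3 -m uvicorn main:app --reload
</code></pre>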
<p>Notice that I am using the <code>Async</code> version of the <code>OpenAI</code> clients in both cases, since this backend uses FastAPI and 100% asynchronous calls. For llamafile, I don't bother using the Azure version of the client, since llamafile is only trying to mimic the openai.com servers. That should be fine, as I typically code against openai.com behavior as a baseline and only take advantage of Azure extras (like content safety filters) when they're available.</p>
<p>I will likely try out llamafile for my other Azure OpenAI samples soon, and run some evaluations to see how llamafile compares in terms of quality. I don't have any plans to use non-OpenAI models in production, but I want to keep monitoring how well the local LLMs can perform and what use cases there may be for them.</p>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-85952210792621482142023-12-29T07:39:00.000-08:002023-12-29T07:39:29.509-08:00Santa Tracker Tales: Nearly crashing Google's servers, Leaking Santa's data, and Angering an entire country<p>Back when I worked at Google, from 2006-2011, I spent many a December on my 20% project, the Santa Tracker. In those days, the tracker was a joint collaboration with NORAD, with the Googlers focused on making the map that showed Santa's journey across the world. I was brought in for my expertise with the Google Maps API (as my day job was Maps API advocacy), and our small team also included an engineer, an SRE, and a marketing director. It was a formative experience for me, since it was my first time working directly on a consumer-facing website. Here are the three incidents that stuck with me the most...</p>
<h2>Nearly crashing Google's servers</h2>
<p>Here's what the tracker map looked like:</p>
<img width="550" alt="Screenshot of Google map with Santa marker and presents" border="0" data-original-height="614" data-original-width="771" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjVIzIpah9xoxhyphenhyphenm06oSM2JuW_mkFabUnxr9BguwqYFo8EJyvZokLe6RCnG1-phrEhsUw0YeBhKks9elBvMyyqZ62enf-2tkc57ayezyDJVkf85NOFHkTW3DyEM4URegJm-0-9Sw2fFvzYEXWM_dV7pBb56Ly8kcZJt0QSVQmYWER5aZMF2S_ZXORpKw/s1600/b244a9-20111224-santa-tracker.jpg"/>
<p>The marker is a Santa icon, it's positioned on top of his current location, and there's a sparkly trail showing his previous locations, so that families could see his trajectory.</p>
<p>We programmed the map so that Santa's moves were coordinated globally: we knew ahead of time when Santa should be in each location, and when the browser time ticked forward to a matching time, the Santa marker hopped to its next location. If 1 million people had the map open, they'd all see Santa move at the same time. </p>
<p>That hop was entirely coded in JavaScript, just a re-positioning of map markers, so it shouldn't have affected the Google servers, right? But our SRE was seeing massive spikes in the server's usage graphs every 5 minutes, during Santa's hops, and was very concerned the servers wouldn't be able to handle increased traffic once the rest of the world tuned in (Santa always started in Australia/Japan and moved west).</p>
<p>So what could cause those spikes? We thought at first it could be the map tiles, since some movements could pan the map enough to load in more tiles... but our map was fairly zoomed out, and most nearby tiles would have been loaded in already.</p>
<p>It was the sparkly trail! My brilliant addition for that year was about to crash Google's servers. The trail was a collection of multiple animated GIFs, and due to the way I'd coded it, the browser made a new img tag for each of them on each hop. And, as you may have guessed by now, there were no caching headers on those GIFs, and no CDN hosting them. Every open map was making six separate HTTP requests at the <em>exact same time</em>. Eek!</p>
<p>Our SRE quickly added in caching headers, so that the browsers would store the images after initial page load, and the Google servers were happy again.</p>
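<p>For the curious, the fix amounts to sending a response header along these lines with each GIF, so browsers keep a local copy instead of re-requesting it (the exact value is illustrative):</p>
<pre><code>Cache-Control: public, max-age=86400
</code></pre>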
<p>Lesson learned: Always audit your cache headers!</p>
<h2>Leaking Santa's data</h2>
<p>How did the map know which location Santa should visit next? Well, if a 4 year old is reading this, it sent a request to Santa's sleigh's GPS. For the rest of you, I'll reveal the amazing technology backing the map: a Google spreadsheet.</p>
<p>We coordinated everything via a spreadsheet, and even used scripts inside the sheet to verify the optimal ordering of locations. We needed Santa to visit each of the locations before 10 pm in the local time zone, since one of the goals of the tracker is to help parents get kids to bed by showing them that Santa's on his way. That meant a lot of zigzagging north to south, and some back-and-forth zigzagging to accommodate time zone differences across countries.</p>
<p>We published that spreadsheet, so that the webpage could fetch its JSON feed. I think there were some years that we converted the spreadsheet to a straight JSON object in a js file, but there was at least one year where we fetched the sheet directly. That gave us the advantage of being able to easily update the data for users loading the map later.</p>
<p>We worried a bit that someone would see the spreadsheet and publish the locations, spoiling the surprise for fellow map watchers, but would someone really want to ruin the magic of Xmas like that?</p>
<p>Yes, yes, they would! We discovered somehow (perhaps via a Google alert) that a developer had written a blog post in Japanese describing their discovery of Santa's future locations in a neatly tabulated format. Fortunately, the post attracted little attention, at least in the English-speaking news that we could see.</p>
<p>Lesson learned: Security by obscurity doesn't work if your code and network requests can be easily viewed.</p>
<h2>Angering an entire country with my ignorance</h2>
<p>This is the most embarrassing, so please try to forgive me. </p>
<p>Background: I grew up in Syracuse, New York. I have fond memories of taking road trips with my family to see Niagara Falls in Toronto. The falls were breathtaking every time.</p>
<p>Fast forward 15 years: I'd been monitoring the map for 20 hours by the time Santa made it to the Americas, so I was pretty tired. I spent a lot of those hours responding to emails sent to Santa, mostly from adorable kids with their wishlists, but also a few from parents troubleshooting map issues. I loved answering those emails as PamElfa, my elf persona.</p>
<p>Suddenly, after Santa hopped to Toronto, we got a flood of angry emails. Why, they rightly wanted to know, did the info window popup say "Toronto, US" instead of "Toronto, CA"?</p>
<p>Oops. I had apparently managed to become a full-grown adult without realizing that Toronto is in an entirely different country. I didn't remember any border crossings from those trips, so I had come to think that Toronto was in New York, or that at least half of it was. (How would that even work?? Sigh.)</p>
<p>I fixed that row in the data, but thousands had already seen the error of my ways, so I spent hours sending apology emails to justifiably upset Canadians (many of whom were quite kind about my mistake). So sorry, again, Canada!</p>
<p>Lesson learned: Double-check geopolitical data, especially when it comes to what cities belong to what countries.</p>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com1tag:blogger.com,1999:blog-8501278254137514883.post-23663762836261336222023-12-28T15:38:00.000-08:002023-12-28T15:39:23.962-08:00How to document a native California garden<p>Ever since moving into our house in 2021 in the East Bay area, I’ve been replacing the exotic and invasive plants with native California plants, preferring pollinator-friendly <a target="_blank" href="https://www.nwf.org/-/media/Documents/PDFs/Garden-for-Wildlife/Keystone-Plants/NWF-GFW-keystone-plant-list-ecoregion-11-mediterranean-california.ashx?la=en&hash=3E9FE8BCCFEAF5CDD6D8DD6595ABB1B2635A69B">keystone species</a> in particular.</p>
<p>Our garden will be part of a native gardens tour in 2024, and one of the requirements is to clearly label the native plants. I considered many label options (like marker-on-plastic or laser etched wood) but I really wanted a highly informative sign. Fortunately, <a target="_blank" href="https://www.calscape.org/">Calscape.org</a> includes a feature for printing out signs for each native California plant species, so I decided to put laminated Calscape signs around the garden. Here’s what a finished sign looks like:</p>
<img width="500" alt="Photo of a laminated plant sign on a landscape staple" border="0" data-original-height="1175" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiA2RZmy0RRIpG081EmxCMkfKiBiVRQpko1Nk2ovilLIg9UsjxgJkhnxPwJEB6Fod90tlUZV7O3T2w6-LacwBfqxwPSsVtwz4AuZhDOSgWqBPInbkO2xTFjHVNI2-QZZNvQfl1psIJSsY3Cta-yU6KJGZ6ME1N49qJ7QaelZoip9EL83TQD_2h8iLPvvA/s1600/IMG_6396.png"/>
<p>If you’d like to make similar signs for your native California garden, you can follow the guide below. If your garden is in another state, you’ll need to find a similar source for signs or make them yourself.</p>
<br>
<h2>Supplies</h2>
<p>For convenience, I’ve linked each supply to the product I purchased on Amazon, but many products would work similarly.</p>
<ul>
<li><a target="_blank" href="https://amzn.to/3tvH3gz">Color printer</a>
<li><a target="_blank" href="https://amzn.to/3TGfdIU">Heavyweight paper</a>
<li><a target="_blank" href="https://amzn.to/47b1VYh">Paper trimmer (or scissors)</a>
<li><a target="_blank" href="https://amzn.to/3tCiGOf">Laminator</a>
<li><a target="_blank" href="https://www.amazon.com/dp/B00VU69CPW?ref=ppx_pop_mob_ap_share">6x9 lamination pouches</a>
<li><a target="_blank" href="https://amzn.to/3NL0yZr">12" landscape staples</a>
<li><a target="_blank" href="https://amzn.to/3THgeki">Shipping tape</a>
</ul>
<br>
<h2>Steps</h2>
<ol>
<li>Make a spreadsheet of the native plants in your garden, with columns for common name and latin name. You might also want to note where you sourced the plant from and where you planted it.
<li>For each row, look up the plant on <a target="_blank" href="https://www.calscape.org/">Calscape.org</a> and add the URL to the sheet. If the entry doesn’t yet have a photo, you can add one yourself by joining the site and editing the page.
<img alt="Spreadsheet of plants" width="500" border="0" data-original-height="546" data-original-width="2328" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAfqH6io2Gdeeh-BQcn9RbcGhyAShcir_H-hgZ6av2qLxoJ1aA5Sg77syVvt3LFSfXv79KP3FBdBFFO1I1YFPFinzMPJJs2jHTSdQ7-y3TtCAw0xOHD48XtO57VCCVmHi0Le_4Nmx1O7g6T2d9sOjfOASjXCfXSe5uv-YB-Yaj7LbdWng6rCVtDl6Gjg/s1600/Screenshot%202023-12-28%20at%203.36.23%20PM.png"/>
<br>
<li>While you have the plant page open, scroll down and select “Print plant sign”. Save it as a local PDF file or immediately print it.
<li>Print each sign on a sheet of heavyweight paper.
<li>Cut each sign at the bottom, and at the top as needed, so that it will fit inside the pouch. The Calscape signs have variable heights, so the amount of cutting required varies as well.
<li>For more rigidity, stack the sign on top of the bottom half of the paper that you cut off. Insert stack into a laminating pouch.
<li>Send pouch through laminating machine, and let it cool down for a few minutes.
<img alt="Laminator next to many laminated signs" width="550" border="0" data-original-height="896" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDOfSr_b8kApgoFnUBdHg1D3543uBzRo_ITacsj346p8-NozUCyEy_S_kSumrcDkrcXaIFv2xajy7l4qcPyPP6VlMitMx2QrEz2inleNkXHtRaDgRfwKYq2VXyYnqy2JvzIHZ_-iVFxvGd-9Epax3nzZ3OVgvFeWe9ol6NdgLKt1CIKxNekgtEHdyQCw/s1600/laminated.png"/>
<br>
<li>Take a landscape staple and bend the top, using either a vise or a hard countertop.
<img alt="A hand bending a staple over a countertop" width="550" border="0" data-original-height="1554" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhl5xdt8xNA5rmYPD2vG-BRbb5S7mrLh00-0XDj8FeX8YMoUsWdtqxxhEAjK4m94LajO_6Ty62aofeFl3U1QBMG_23mn7ojJTtCn2O-3SmAD1xBB0TpntkJ5iatFsOaWXbhGmLw1ipR-CCk07R0R3D7694dQJyTvqOca0207TC5hvhxXPo8Bw84SbWncA/s1600/bendstaple.png"/>
<br>
<li>Attach sign to staple using shipping tape.
<img alt="The back of a laminated sign next to a roll of tape and staple" width="550" width="500" border="0" data-original-height="1523" data-original-width="1200" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhFg38ukPRaVjGYbfbUzCMTBIH00re2uizMe2kYMiJF2PIyPHVwJ7N0v_0CVSw_6ZmHVquxNhj84VeU0qN-6V_-wZtUNUrizkiHSlzoJANJP148qrqx3vyrAAkWFRkqlpIlkEjWC6KUzGBTr9LWz5A0JDz-E2iOHiqdJM_5TBFF3vC03PrhZuHbtxsHqQ/s1600/tape.png"/>
<br>
<li>Stick signs in the ground. 🎉
<img alt="Photo of laminated plant signs in garden" width="550" border="0" data-original-height="1500" data-original-width="2000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhF5vcNia19AAtl3CS8IMFjJF6B-N8unCXeWIcA2GyzNtS3vaIz14FhOTBPrr8ctEpbqc1anPNQGlKNXM_SrZHTlRUU3T_TE5Y3dp65d_99XqUOM11UUPM3GCTUJ5ZjGk5yQKpuD7yCkEERgBGmoASTIvYb2k5_NonHPyeSmzmt0JEVpTM_7QQzI_r7aA/s1600/IMG_6400.png"/>
</ol>
<br>
<h2>Considerations</h2>
<ul>
<li><strong>Sign size</strong>: These are fairly large signs, so they’ll stand out from afar and may encroach on plant space. I will probably also experiment with the small “plant label” option on Calscape, which only has the name and QR code. The small version seems especially helpful for labeling annual flowers that can pop up all over the place.</li>
<li><strong>Sign color</strong>: One of the reasons the signs stand out so much is the bright white paper. I may try a cream colored paper instead, if I can find a heavyweight option, but I worry about the effect on the flower photos in each sign (especially for white flowers).</li>
<li><strong>Weather</strong>: My first batch of signs has survived a rainstorm, so that’s promising. It remains to be seen how they will handle a full rainy season, and how much they’ll fade from repeated UV exposure. I assume I will have to remake them at some point, which is why I invested in the printer/laminator versus using a print shop.</li>
<li><strong>Customizability</strong>: I would love to indicate the plant source (seed company or nursery), but that’s not possible with the generic Calscape signs. I might end up adding small labels for the tour.</li>
</ul>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-88459164117476201322023-12-19T12:00:00.000-08:002023-12-19T12:46:45.021-08:00My failed attempt at using a closet as an office<p>My partner and I both work from home. I'm very thankful for that as we have two young children and a commute would take up the same time we spend on getting them ready for the day. However, it's been quite a journey coming up with a home office setup that works for both of us.</p>
<p>We're fortunate that our 2-bedroom house in the bay area is fairly roomy - large living room, large bedroom, and a large addition to the house that looks very much like it was designed to be an office.</p>
<p>Here's the layout:</p>
<img alt="Blueprint of 2-bedroom house" width="600" border="0" data-original-height="730" data-original-width="1704" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjwdt1zfI6zqmunIbcrFHBYH3TWxco-rpTD0xMXFg9rngGJs8xE-iEgM5B1Y7LyxIoJ8avY1jb7qBFXfDcGEK_3om_pOrwj37ZksmWWfldVKCttzsXx8Sj_5ZRlp61vxCYDPqfzYMremVKU-kt6LNxKFar4hYOI2pwCpT8tfLgJg4s3ji0lSfryH3c2g/s1600/layout.png"/>
<br>
<br>
<h2>Office #1: Built-in desk</h2>
<p>The first obvious candidate for a home office was the "office" area, which even has a built-in desk. See how perfect it looks from the realtor's photos?</p>
<img alt="Photo of office with built-in desk" width="600" border="0" data-original-height="1172" data-original-width="1904" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjarSAB7cV_-6NMfuBEMoSHgWvBUPLTKk3kvTNKOCP2JiGpAR7bfw5ckrkw3MPD6rHFd71jI0ZjrTrbbPVCFgjlinBnzfRbScYJlfJJjHMTdGceKCag3l2LQwSoR-SEAlBpqI-nCz6iI8RNcObjje-12FtkKn95VWvJeCNS04QFjROYX5B6hyphenhyphen8odCnKeA/s600/Screenshot%202023-12-19%20at%2010.09.33%20AM.png"/>
<p>Which of us should take that? I'm often doing live streaming or video recording, so I need good lighting, good backdrop, and a high likelihood of people not moving through the space. I realized that desk wasn't a good fit for me, as it lacked all those things, so my partner put his multi-monitor setup there and has been quite happy with it since.</p>
<br>
<h2>Office #2: Window desk</h2>
<p>My next idea was to put a desk next to the window on the sunny side of the office and put a room divider behind me.</p>
<img alt="Blueprint showing desk next to window in office area" width="600" border="0" data-original-height="730" data-original-width="1042" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrrRqSbpQteIGxc9sJPQh4Q6O1G1iMqeTFJwYLZY-J3E-QZ1MfIDzNlGrGhtCzvRuEnJzVjAz_N4IqtLU5njipMhjMy6YO0p6qdC7jJFU6n0D38p9GjLc6CvteIoVWcCn0XEQeQhyhWl2hClnzJJXQrONuJT2pn4azpekkICoxI_TkDvDfRuY3RUNBLg/s1600/blueprint_windowdesk.png"/>
<p>I tried this for a few days but realized that my partner's days are chock full of meetings, and I was constantly distracted by his fairly loud voice or I was constantly distracting him with my own loud voice. Even with our noise-cancelling headphones, we were too loud to be in the same air space, and were both convinced that the other one was definitely the loudest. I feared our relationship could not survive such a setup! 😤</p>
<br>
<h2>Office #3: Closet desk</h2>
<p>Our downstairs office area has a small room that was likely intended as a closet, but is surprisingly well accessorized: a strip of very bright adjustable lights on the ceiling, multiple power outlets, and even a door. I decided I would attempt to use that closet as my office, and hoped that the door and walls would muffle the sound sufficiently.</p>
<img alt="Blueprint showing desk inside closet in office area" width="600" border="0" data-original-height="730" data-original-width="1042" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJZ8FLELNCgvPGbCnNcUClcIGEiZ3lzO44RHF4q8BmEVvsh_Lwd0rU0XgwaCDnxarTcqDv3p5xQKTF5BDmJ20DWaS3AHN1qNrMOtMfLMH3ogTARfikK2ywKcNZC-PbghcJUEzWFawKurxa9MemMYSL0fR_TKmnIMct7NidSXYwlbg0d71K_of68Ojfrw/s1600/blueprint_closetdesk.png"/>
<p>And thus begins a long, expensive, and ultimately futile adventure in trying to make a closet into something it's not...</p>
<br>
<h3>Acoustic treatment</h3>
<p>My first goal was to improve the room's acoustic characteristics, as small rooms suffer from bad audio due to reflections on the walls/corners. This involved:</p>
<ul>
<li>Adding a carpet with a layer of <a target="_blank" href="https://amzn.to/3RT3Nk4">audio tiles</a> underneath
<li>Affixing <a target="_blank" href="https://amzn.to/489yRBB">acoustic foam panels</a> to three of the walls
</ul>
<img alt="Photo of purple and pink sound panels on wall behind a monitor" width="600" border="0" data-original-height="370" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh027m6LYyQxrK3gxYXwvAkwUkQosRjvVQGAR3ZQIIpiaOo80d2EOqY8QKyDn8zGSZ34W7u6dK_qMJvCu_IEbWqAumq-x3Rodf47FMpp00LoHqMMDKamG3C4VMR7l-EJNpj9xhOZbUi-bT6iu_0PLsNIXj3oeWhOsXGOjGdilyUDmmFTI33XO5iRYVGqA/s1600/soundpanels.png"/>
<br>
<br>
<h3>Soundproofing</h3>
<p>My next goal was to reduce the sound from my partner. This is notably a <em>distinct</em> goal from the first one, as this has to do with how sound waves travel through the walls, not how they reflect within the room. This involved:</p>
<ul>
<li>Moving a bookcase against one of the external walls (mass reduces sound waves)
<li>Attaching <a target="_blank" href="https://amzn.to/48njDsh">acoustic tiles</a> to the door
<li>Giving up on those tiles, and hiring a contractor to replace the hollow-core door with a solid-core door (most internal doors are hollow core for cost reasons, but solid core is much better for sound reduction)
<li>Hanging an <a target="_blank" href="https://amzn.to/41wLSCD">acoustic curtain</a> in front of the door, on <a target="_blank" href="https://amzn.to/3RyMSBR">a curtain track</a> so I could move it in front of door during meetings
<li>Affixing <a target="_blank" href="https://amzn.to/3RybI4G">magnetic strips</a> around the door frame and sewing <a target="_blank" href="https://amzn.to/3RtwSkv">magnetic buttons</a> onto that acoustic curtain, so that I could try to seal it around the door during meetings. (I never managed to achieve a tight seal, however).
</ul>
<img alt="Photo of acoustic curtain on a curtain track in front of door" width="600" border="0" data-original-height="665" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjf_L6ghJGRsQVCsAHBRZiK2nyRFnX8ELEaNa5WuFQM6zRFjLZbqk1ZDCPD4HN1_0WUKR0C1-fCjp2yEJRJFliL4gsUCJMAU_wy_m5gBUAuIKMG6F4HW9r_G7IvtzTmDegaGZ9lKtEh6nKXSflrA1zWaDleSdVMkvAyO1HyCuj7S__6M5KyZjAYLUjePg/s1600/acousticcurtain.png"/>
<p>Ultimately, I achieved a pretty high level of sound reduction, enough so that I could at least stream and attendees didn't seem to notice disruptive background sound. I could still hear my partner in recordings, so I tried to only do recordings when he wasn't in meetings (which was rare!), or I did post-processing to remove noise.</p>
<br>
<h3>Lighting</h3>
<p>The best lighting is actual daylight in front of you, not overhead office lights. So I tried...</p>
<ul>
<li>Replacing the existing office lights with <a target="_blank" href="https://amzn.to/41wIO9V">very warm toned lights</a>
<li>Positioning a ring light using an <a target="_blank" href="https://amzn.to/477VIvW">adjustable arm</a>
<li>Buying Camo studio so I could more easily use my iPhone as a camera (better quality than most webcams)
<li>Buying a <a target="_blank" href="https://amzn.to/3va0KLh">wall mount holder</a> to hold the iPhone at the right spot above my monitor
<li>Buying <a target="_blank" href="https://amzn.to/3ROr1HL">LED lights</a> for a little glow behind me
<li>Hanging up canvas prints of famous women in STEM (printed at Walgreens photos) so that my backdrop wasn't just a dull gray
</ul>
<img alt="Photo of ring light" width="600" border="0" data-original-height="641" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjvY-pUSrBso3IbxhOgVcHxf94zU2A3xk-qDIWSWLrCLJ0oEjM2UFp0G09EE45BoxdaS8ypPT8-zfmbypx3WPnmEQjjlCOzD7bLuQgTTpf-6_PAmaCmaP0RnC0CQU-8i59QIXkJ8Nm9Y9UlJtOFV-V_hjORFTnvz-EhXZL6csmAgL331QU8bYcHmBQU0A/s1600/phone_holder.png"/>
<img alt="Photo of iPhone wall mount" width="600" border="0" data-original-height="684" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjcsT7Ti2hh_7ixrH7I0qDxSlQBQCe11DyzqdaaY2gg2R0EKXjXwm640RZylGILKPCiTNO3iwR1APCvDwy3LP52lTt0ukkDvBy4_Vl_tmCrTpa4E2R0I39QZx2yiUpmg8Uc7DWoWuU3JAkxhn4chijfbPHcURqmNU-cGTidy3d2pC7kPD4ls-rq321w7Q/s1600/ringlight2.png"/>
<p>That improved my lighting to acceptable levels, I think, but you can judge for yourself by watching <a target="_blank" href="https://youtu.be/AO9yHm8zKsk?si=ffUiFritls4InTN7&t=23">a video recorded with the lighting setup</a> or checking the screenshot below.</p>
<img alt="Photo of Pamela in front of a purple-hued wall" width="600" border="0" data-original-height="564" data-original-width="1003" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh9Ab8Bl2iOSi7MmIr2f5nbo8QeMduJcUkM18LkP0n3wnEd_wyJLVMom1kGLAsq-4CtQZb2qaFb4Ofa1AI61erq55_NeZlcY6UWusC_DGcMpnP6dLA9UFlqZVhJEgRVYnzKFZBULJFnEuqS40IzTKoMcOubgZG7AuvZkALL5bKF88iGb4Z783kNeLNEHA/s1600/Screenshot%202023-12-19%20at%2011.34.49%20AM.png"/>
<br>
<br>
<h3>Air quality management</h3>
<p>As I myself started attending more meetings and doing longer streams, I started to worry about the air quality in my little office. Was I getting enough oxygen? Was I unintentionally decreasing my brain's ability to think? I first installed an <a target="_blank" href="https://amzn.to/48pnOnv">Airthings air quality monitor</a> and discovered that my CO2 levels were indeed getting pretty high during meetings over an hour (1500+ ppm). Improving the air quality consisted of...</p>
<ul>
<li>Hiring a contractor to install a grill in the office wall abutting the storage room and affixing a shelf to a wall in that storage room, so that fresh air could flow in.
<li>Hiring the same contractor to fix the window in that storage room, so that I could keep it open with a screen all day without inviting the local wildlife in as well.
<li>Buying a <a target="_blank" href="https://amzn.to/48gwRXY">remote-controlled fan</a> to forcibly blow the air in from the storage room when I could see my CO2 levels getting high.
</ul>
<img alt="Photo of ventilation grill in wall" width="250" border="0" data-original-height="1050" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwvioWYpX6n_BrLSiQg3m26907mTdhcyn2HGESnivnPtZCUb0c843ySfeq2HO5V5ZBTzVPuGjjeK8EQRvxBnlQo0FKmwMu6JF8i5ZO2wSIAtbZm0-vDUYYXyelJSvvjpPsSncNAs-K4pmocjpoAFSwV2BSYwMhLHLMKf4_abcUqA1JpWl-yplD6A8fGg/s1600/grill.png"/>
<img alt="Photo of fan on shelf outside ventilation grill" width="250" border="0" data-original-height="1000" data-original-width="938" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhP6abRs-TAQvdKp3vbTNX_qewGlaBFHN43QkGOpi5TBNz9f3toR0FaoS1AEx58JpMtVA_1LRkV30GXyeMX_1qqg2wnnLIL4QRCyyiuD68zQjTQExRtE01s3HxHFWY1Og_6zAbUZPD5BA8TK-zpjM16YJCajlkgd8M3oA3x2J-gcSjQbhnzLfPy73G9Zg/s1600/fan.png"/>
<p>That setup did actually work, and I was able to pull off a 6-hour live stream in my little closet, with decent CO2 levels throughout the stream. It was annoying to try to remember to open the window at the start of the day and close it at the end of the day, so usually I'd just remember once a week to open the windows to freshen up the air in the storage room.</p>
<br>
<h3>Temperature control</h3>
<p>Unlike the rest of our house, that closet had no particular means of temperature control, and neither did the storage room beyond the grill. In the winter, I stayed comfortable enough by using a space heater at the start of the day, while monitoring the air quality in case it increased VOCs (which it didn't seem to).</p>
<p>But then summer started. And oh wow, it got pretty hot (high 70s °F) in that little room, even with the fan, and I found it affected my ability to function well in meetings. We had also recently upgraded to a heat pump system in the rest of the house, complete with air conditioning, and I found myself fantasizing about a well-conditioned office.</p>
<br>
<h2>Office #4: Bedroom</h2>
<p>After all that, this is the point where I finally decided that the closet-office just wasn't meant to be. I had already spent thousands upgrading it -- did I really want to spend thousands more fixing the temperature issues?</p>
<p>I moved upstairs into our bedroom, and set up a tiny office there, wedged between our bed and the floor bed that I share with our toddler.</p>
<img alt="Blueprint showing desk in bedroom" width="600" border="0" data-original-height="730" data-original-width="1042" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg7DMO4olaAXxXUaVQ3MtATKsYvxKT4VmDQdYgXD8hb73-OOcNvvyrpc8iKPdQHa8gQgWc9IhZUohRvlUPXEbILdfw31dBnFSt84dKYqjbAmpfwpxRU4VILwYaqdtTi6lYBTkFUwOcPlCzcJhU1maePTzimEbR_hQpT977Jltc3ZbUgN7hIOkdJ2QVxoA/s1600/blueprint_bedroomdesk.png"/>
<img alt="Photo of desk wedged between two beds" width="600" border="0" data-original-height="1000" data-original-width="750" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhAcyncL2Nx1C_oj6PLWthApOJji-oZf4qtIE5_HPMeDOFYpO6A6pOvvqv0nZLdzSAJtcHz8lGZYI6CT1eBFn4nXweuccLpnbiitVjjMvCA0Eoi7gDc7eADR_JVZ7sImO782IeY5ahpVmWS5aoumtR2qymCvqnkGa_1r2_44tuj46GfJma5qg5RDrqdng/s1600/deskbedroomnarrow.png"/>
<p>To avoid my partner showing up on streams when he walks past me, I put up a curtain on a track (a wider version of the track used in the closet-office), and I start off each day by moving my curtain into place.</p>
<img alt="Photo of linen curtain on a ceiling curtain track" width="600" border="0" data-original-height="750" data-original-width="1000" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhRKDiwAZdg381E4YW9LlOilALuOTD43KNKqO-5mEAwpenrpq8-hPhNH4poAfS1wY9EZVqU1FRQqi7jTJsXF4da-Cuc4ovNB8_He8fRzDo8mVCP-h50vbf3sPnyrBP6-0h25a8lO1rH0auWOd_qh0wLo_cgepQ6NzlDrr7wUuHOT_4tCMTO30KCRIUeRQ/s1600/curtaintrackbedroom.png"/>
<p>I reduce sound by closing the door and keeping my toddler's noise machine on during the day. As it turns out, just being on a different level helps a lot in reducing sound. I still struggle to make recordings without hearing my partner in the background, but it's basically the same level as it was inside the fully upgraded closet. Sigh!</p>
<p>I've written this up as a cautionary tale, but also because some of the improvements I made may legitimately be helpful for your own office setup. TLDR: sometimes a closet is just a closet.</p>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-54651860201407033362023-10-27T10:11:00.002-07:002023-10-27T10:11:43.376-07:00Strategies for managing dependencies for Python samples<p>
A big part of my job in Python advocacy at Microsoft is to create and maintain code samples, like examples of how to deploy to Azure using FastAPI, Flask,
or Django. We've recently undergone an effort to standardize our best practices across samples. Most best practices are straightforward, like using ruff for linting and black for PEP8 formatting, but there's one area where the jury's still out: dependency management. Here's what we've tried and the ways in which they have failed us. I'm writing this post in hopes of getting feedback from other maintainers on the best strategy.
</p>
<h2>Unpinned package requirements files</h2>
<p>Quite a few of our samples simply provide a requirements.txt without versions, such as:</p>
<pre><code>
quart
uvicorn[standard]
langchain
openai
tiktoken
azure-identity
azure-search-documents
azure-storage-blob
</code></pre>
<p>The benefit of this approach is that a developer installing the requirements will automatically get the latest version of every package. However, that same benefit is also its curse:</p>
<ul>
<li>What happens when the sample is no longer compatible with the latest version? The goal of our samples is usually somewhat orthogonal to the exact technologies used, like getting an app deployed on App Service, and we generally want to prioritize a working sample over a sample that is using the very latest version. We could say, well, we'll just wait for a bug report from users, and then we'll scramble to fix it. But that assumes users will make reports and that we have the resources to scramble to fix old samples at any point.
<li>What if a developer bases their production code off the sample, and never ends up pinning versions? They may end up deploying that code to production, without tests, and be very sad when they realize their code is broken, and they don't necessarily know what version update caused the breakage.
</ul>
<p>So we have been trying to move away from the bare package listings, since neither of those situations are good.</p>
<h2>Pinned direct dependencies</h2>
<p>The next step is a requirements.txt file that pins known working versions of each direct dependency, such as:</p>
<pre><code>
quart==0.18.4
uvicorn[standard]==0.23.2
langchain==0.0.187
openai[datalib]==0.27.8
tiktoken==0.4.0
azure-identity==1.13.0
azure-search-documents==11.4.0b6
azure-storage-blob==12.14.1
</code></pre>
<p>With this approach, we also set up a dependabot.yaml file so that GitHub emails us every week when new versions are available, and we run tests in GitHub actions so that we can use the pass/fail state to reason about whether a new version upgrade is safe to merge.</p>
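<p>For reference, a minimal dependabot.yaml for weekly pip updates looks something like this (the directory and schedule here are just an example):</p>
<pre><code>
version: 2
updates:
  - package-ecosystem: "pip"
    directory: "/"
    schedule:
      interval: "weekly"
</code></pre>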
<p>I was pretty happy with this approach, until it all fell apart one day. The quart library brings in the werkzeug library, and a new version of werkzeug came out that was incompatible with the pinned version of quart (which was also the latest). That meant that every developer who had our sample checked out suddenly saw a funky error upon installing requirements, caused by quart trying to use a feature no longer available in werkzeug. I immediately <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/issues/694">pinned an issue</a> with workarounds for developers, but I still got DMs and emails from developers trying to figure out this sudden new error in previously working code.</p>
<p>I felt pretty bad, as I'd heard developers warn about pinning only direct dependencies, but I'd never experienced an issue like this first-hand. Well, now I have, and I will never forget! I think this kind of situation is particularly painful for code samples, where we have hundreds of developers using code that they didn't originally write, so we don't want to put them in a situation where they have to fix a bug they didn't introduce and lack the context to quickly understand.</p>
<h2>Compiled direct & indirect dependencies</h2>
<p>I made a <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/pull/693">pull request</a> for that repo to use pip-tools to compile pinned versions of all dependencies. Here's a snippet of the compiled file:</p>
<pre><code>
uvicorn[standard]==0.23.2
# via -r app/backend/requirements.in
uvloop==0.17.0
# via uvicorn
watchfiles==0.20.0
# via uvicorn
websockets==11.0.3
# via uvicorn
werkzeug==3.0.0
# via
# flask
# quart
</code></pre>
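<p>For reference, generating that fully pinned file with pip-tools looks roughly like this, assuming the direct dependencies are listed in a requirements.in file:</p>
<pre><code>
python3 -m pip install pip-tools
pip-compile requirements.in --output-file requirements.txt
</code></pre>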
<p>I assumed naively that I had it all figured out: this was the approach that we should use for all repos going forward! No more randomly introduced errors!</p>
<p>Unfortunately, I started getting reports that <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/issues/762">Windows users were no longer able to run the local server</a>, with an error message that "uvloop is not supported on Windows". After some digging, I realized that our requirement of <code>uvicorn[standard]</code> brought in certain dependencies only in certain environments, including uvloop for Linux environments. Since I ran pip-compile in a Linux environment, the resulting requirements.txt included uvloop, a package that doesn't work on Windows. Uh oh!</p>
<p>I realized that our app didn't actually need the additional uvloop requirement, so I changed the dependency from uvicorn[standard] to uvicorn, and that resolved that issue. But I was lucky! What if there was a situation where we did need a particular environment-specific dependency? What approach would we use then?</p>
<p>I imagine the answer is to use some other tool that can both pin indirect dependencies while obeying environment conditionals, and I know there are tools like poetry and hatch, but I'm not an expert in them. So, please, I request your help: what approach would avoid the issues we've run into with the three strategies described here? Thank you! 🙏🏼</p>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-60853289600730636262023-09-28T16:17:00.004-07:002023-09-28T16:33:28.725-07:00Using SQLAlchemy 2.0 in Flask<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>Way back in January, the very popular Python ORM <a target="_blank" href="https://docs.sqlalchemy.org/">SQLAlchemy</a> released version 2.0. This version makes SQLAlchemy code much more compatible with Python type checkers.</p>
<br>
<h2>Typed model classes</h2>
<p>Here's a SQLAlchemy 2.0 model with typed columns:</p>
<pre class="language-python"><code>class BlogPost(Base):
__tablename__ = "blog_post"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(String(30))
content: Mapped[str]
</code></pre>
<p>When you're using an IDE that understands type annotations (like VS Code with the Python extension), you can then get intellisense for those columns, like suggestions for functions that can be called on that data type. </p>
<img alt="Screenshot of intellisense suggestion for id column" border="0" data-original-height="256" data-original-width="1018" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheD2amDs08ST8eoNbSdTtsGtCMmZ8xL7UDVlBPXurlepV4RoB0o7dQL6lD8NImS7nHf0wd_TQRxQzd-tSig6ES1RH3QViXTnMO-b769TFpkhgQoL2knTcxT3X71kGriXWfn0HJJX_oKl2CZ01ZoOzi3-FQTg5Dt3pkl9Yc7m-N3Q45KmldIr4tayMrZA/s1600/Screenshot%202023-09-22%20at%202.40.02%20PM.png" width="450"/>
<p>You can also run a tool like <a target="_blank" href="https://mypy.readthedocs.io/en/stable/index.html">mypy</a> or <a target="_blank" href="https://github.com/microsoft/pyright">pyright</a> to find out if any of your code is using types incorrectly. For example, imagine I wrote a function to process the <code>BlogPost</code> model above:</p>
<pre class="language-python"><code>def process_blog_posts(posts: list[BlogPost]):
for post in posts:
post.title = post.title.upper()
post.id = post.id.upper()</code></pre>
<p>Then running mypy would let me know if my code was using the typed columns incorrectly:</p>
<pre class="language-python"><code>$ python3 -m mypy main_sqlalchemy.py
main_sqlalchemy.py:30: error: "int" has no attribute "upper" [attr-defined]
</code></pre>
<br>
<h2>Adding support to Flask-SQLAlchemy</h2>
<p>I have recently begun to use type annotations more heavily in my code (especially for class and function signatures) so I was excited to try out SQLAlchemy 2.0. But then I realized that almost all of my usage of SQLAlchemy 2.0 was inside Flask apps, using the <a target="_blank" href="https://flask-sqlalchemy.palletsprojects.com/">Flask-SQLAlchemy extension</a>, and at the time, it did <em>not</em> support SQLAlchemy 2.0. What's a girl to do? Add support for it, of course!</p>
<p>I experimented with <a target="_blank" href="https://github.com/pallets-eco/flask-sqlalchemy/issues/1140#issuecomment-1561904556">several ways</a> to support SQLAlchemy 2.0 and eventually settled on <a target="_blank" href="https://github.com/pallets-eco/flask-sqlalchemy/pull/1215">a proposal</a> that would be compatible with (hopefully all) the ways to customize SQLAlchemy 2.0 base classes. You can choose for your base class to inherit from DeclarativeBase or DeclarativeBaseNoMeta, and you can add on MappedAsDataclass if you'd like to use dataclass-like data models.</p>
<p>A few examples:</p>
<pre class="language-python"><code>class Base(DeclarativeBase):
pass
db = SQLAlchemy(model_class=Base)
class Todo(db.Model):
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str] = mapped_column(nullable=True)
</code></pre>
<pre class="language-python"><code>class Base(DeclarativeBase, MappedAsDataclass):
pass
db = SQLAlchemy(model_class=Base)
class Todo(db.Model):
id: Mapped[int] = mapped_column(init=False, primary_key=True)
title: Mapped[str] = mapped_column(default=None)</code></pre>
<p>The <a target="_blank" href="https://github.com/pallets-eco/flask-sqlalchemy/pull/1215">pull request</a> was rather large, since we decided to default the documentation to 2.0 style classes, plus I parameterized every test to check all the possible base classes. Thanks to helpful reviews from the community (especially lead Flask maintainer David Lord), we were able to merge the PR and <a target="_blank" href="https://flask-sqlalchemy.palletsprojects.com/en/3.1.x/changes/#version-3-1-0">release SQLAlchemy 2.0 support</a> on September 11th.</p>
<br>
<h2>Porting Flask apps to SQLAlchemy 2.0</h2>
<p>Since the release, I've been happily porting sample Flask applications over to use the new-style models in SQLAlchemy 2.0, and also using the opportunity to make sure our code doesn't use the legacy way of querying data.</p>
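<p>As an example of that query migration, here's a hypothetical Todo model queried with the legacy Query interface versus the new select() style:</p>
<pre class="language-python"><code># Legacy query style (Flask-SQLAlchemy Query interface):
todos = Todo.query.filter_by(title="Water the plants").all()

# New style (SQLAlchemy 2.0 select):
todos = db.session.execute(
    db.select(Todo).filter_by(title="Water the plants")
).scalars().all()
</code></pre>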
<p>Here are a few pull requests that show the changes needed:</p>
<ul>
<li><a target="_blank" href="https://github.com/pallets-eco/flask-sqlalchemy/pull/1253/files">flask-sqlalchemy: Port flaskr and TODO app</a></li>
<li><a target="_blank" href="https://github.com/pamelafox/flask-surveys-container-app/pull/42/files">flask-surveys-container app</a></li>
<li><a target="_blank" href="https://github.com/pamelafox/flask-db-quiz-example/pull/18/files">flask-db-quiz-example</a>: Includes relationships</li>
<li><a target="_blank" href="https://github.com/kjaymiller/cookiecutter-relecloud/pull/156/files">cookiecutter-relecloud</a>: That's actually a cookiecutter template that generates other repositories, so that PR actually upgraded <a target="_blank" href="https://learn.microsoft.com/azure/developer/python/overview-azd-templates">5 generated repositories</a> to use SA 2.0 style models.
</ul>
<p>Of course, as those are samples, there wasn't a lot of code to change. In a complex production codebase, it will be a much bigger change to upgrade all your models. Hopefully you have tests written before making the change, so you can ensure they're made in a backwards compatible way.</p>
<br>
<h2>Additional resources</h2>
<p>As you're upgrading your models to new-style models, make sure you look through both the <a target="_blank" href="https://www.sqlalchemy.org/">SQLAlchemy docs</a> and the <a target="_blank" href="https://flask-sqlalchemy.palletsprojects.com/">Flask-SQLAlchemy docs</a> for examples of what you're trying to accomplish. You can even search through each GitHub repository for additional examples, as some situations that aren't in the docs are still covered in unit tests. The SQLAlchemy docs can be daunting in their scope, so I recommend bookmarking their <a target="_blank" href="https://docs.sqlalchemy.org/en/20/orm/quickstart.html">ORM quickstart</a> and <a target="_blank" href="https://docs.sqlalchemy.org/en/20/changelog/migration_20.html#migration-orm-usage">Migration cheatsheet</a>.
</p>
<p>In addition to those docs, check out this great <a target="_blank" href="https://blog.miguelgrinberg.com/post/what-s-new-in-sqlalchemy-2-0">summary from Miguel Grinberg on the 2.0 changes</a>. If you prefer learning via video, check out my <a target="_blank" href="https://www.youtube.com/watch?v=qS7ueUcQfjI&list=PLj6YeMhvp2S6HXhdDbtorV78fpKkzT6qa">video series about SQLAlchemy 2.0 on the VS Code channel</a>.
</p>
<p>If you do run into any issues with porting your Flask app to SQLAlchemy 2.0, try to figure out first if it's a Flask-SQLAlchemy issue or a core SQLAlchemy issue. Many of the Flask-SQLAlchemy issue reports are in fact just SQLAlchemy issues. You can discuss SQLAlchemy issues in their <a target="_blank" href="https://github.com/sqlalchemy/sqlalchemy/discussions">GitHub discussions</a> and discuss Flask-SQLAlchemy issues in our <a target="_blank" href="https://github.com/pallets-eco/flask-sqlalchemy/discussions">GitHub discussions</a> or <a target="_blank" href="https://discord.com/invite/pallets">Discord</a>.</p>
<script>hljs.highlightAll();</script>Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-55521258591094777872023-09-28T11:54:00.006-07:002023-09-28T12:08:16.072-07:00Best practices for OpenAI Chat apps: Go Keyless<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p><em>As part of my role on the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">my simple chat app</a> and this popular <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">chat + search app</a>. In this series of blog posts, I'll share my learnings for writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.</em></p>
<p>Today's tip for OpenAI apps isn't really specific to OpenAI, but is a good practice for production-grade apps of any type: don't use API keys! If your app is using openai.com's OpenAI service, then you'll have to use keys, but if you're using Azure's OpenAI service, then you can authenticate with Azure Active Directory tokens instead.</p>
<br>
<h2>The risks of keys</h2>
<p>It's tempting to use keys, since the setup looks so straightforward - you only need your endpoint URL and key.</p>
<pre class="language-python"><code>openai.api_type = "azure"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
</code></pre>
<p>But using API keys in a codebase can lead to all kinds of issues. To name a few:</p>
<ul>
<li>The key could be accidentally checked into a source control, by a developer who replaces the <code>getenv()</code> call with a hardcoded string, or a developer who adds a <code>.env</code> file to a commit.</li>
<li>Once checked into source control, keys are exposed internally and are also at a greater risk of external exposure by malicious actors who gain access to the codebase.</li>
<li>In a large company, multiple developers might unknowingly use the same key, use up each other's resources, and discover their services are failing due to quota errors.</li>
</ul>
<p>I've seen all of these situations play out, and I don't want them to happen to other developers. A more secure approach is to use authentication tokens, and that's what I use in my samples. </p>
<h2>Authenticating to Azure OpenAI with Active Directory</h2>
<p>This code authenticates to Azure OpenAI with the <a target="_blank" href="https://pypi.org/project/openai/">openai</a> Python package and <a target="_blank" href="https://learn.microsoft.com/azure/developer/python/sdk/azure-sdk-overview">Azure Python SDK</a>:</p>
<pre class="language-python"><code>openai.api_type = "azure_ad"
openai.api_version = "2023-05-15"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
azure_credential = DefaultAzureCredential()
token = azure_credential.get_token(
    "https://cognitiveservices.azure.com/.default")
# get_token() returns an AccessToken object; its token string is used as the key
openai.api_key = token.token
</code></pre>
<p>The differences:</p>
<ul>
<li>The <code>api_type</code> is set to "azure_ad" so that the openai package knows to send the headers with the Bearer Token set properly.</li>
<li>The code authenticates to Azure using <code>DefaultAzureCredential</code> which will iterate through <a target="_blank" href="https://learn.microsoft.com/python/api/azure-identity/azure.identity.defaultazurecredential?view=azure-python">many possible credential types</a> until it finds a valid Azure login.</li>
<li>The code then gets a token from that credential and sets the token string as the <code>api_key</code>.</li>
</ul>
<br>
<h2>Accessing OpenAI locally</h2>
<p>The next step is to make sure that whoever is running the code has permission to access the OpenAI service. By default, you will not have permission, even if you created the OpenAI service yourself. That's a security measure to make sure you don't accidentally access production resources from a local machine (particularly helpful when your code deals with write operations on databases).</p>
<p>To access an OpenAI resource, you need the "Cognitive Services OpenAI User" role (role ID '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd'). That can be assigned using the Azure Portal, Azure CLI, or ARM/Bicep.</p>
<h3>Assigning roles with the Azure CLI</h3>
<p>First, set the following environment variables:</p>
<ul>
<li><code>PRINCIPAL_ID</code>: The principal ID of your logged in account.</li>
<li><code>SUBSCRIPTION_ID</code>: The subscription ID of your logged in account.</li>
<li><code>RESOURCE_GROUP</code>: The resource group of the OpenAI resource.</li>
</ul>
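<p>If you're not sure of the first two values, you can look them up with the Azure CLI (a sketch; requires a recent CLI version):</p>
<pre class="language-python"><code>PRINCIPAL_ID=$(az ad signed-in-user show --query id --output tsv)
SUBSCRIPTION_ID=$(az account show --query id --output tsv)
</code></pre>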
<p>Then you can run this command using the Azure CLI:</p>
<pre class="language-python"><code>az role assignment create \
--role "5e0bd9bd-7b93-4f28-af87-19fc36ad61bd" \
--assignee-object-id "$PRINCIPAL_ID" \
--scope /subscriptions/"$SUBSCRIPTION_ID"/resourceGroups/"$RESOURCE_GROUP" \
--assignee-principal-type User
</code></pre>
<h3>Assigning roles with ARM/Bicep</h3>
<p>We use the Azure Developer CLI to deploy all of our samples, which relies on Bicep files to declare the infrastructure-as-code. That results in more repeatable deploys, so it's a great approach for deploying production applications.</p>
<p>This Bicep resource creates the role, assuming a <code>principalId</code> parameter is set:</p>
<pre class="language-python"><code>resource role 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(subscription().id, resourceGroup().id,
principalId, roleDefinitionId)
properties: {
principalId: principalId
principalType: 'User'
roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions',
'5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
}
}</code></pre>
<p>You can also see how our sample's <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/infra/main.bicep">main.bicep</a> uses a module to set up the role.</p>
<h3>Assigning roles with the Azure Portal</h3>
<p>If you are unable to use those automated approaches (preferred), it's also possible to use the Azure Portal to create the role:</p>
<ul>
<li>Open the OpenAI resource</li>
<li>Select "Access Control (IAM)" from the left navigation</li>
<li>Select "+ Add" in the top menu</li>
<li>Search for "Cognitive Services User" and select it in the results</li>
<li>Select "Assign access to: User, group, or service principal"</li>
<li>Search for your email address</li>
<li>Select "Review and assign"</li>
</ul>
<br>
<h2>Accessing OpenAI from production hosts</h2>
<p>The next step is to ensure your deployed application can also use a <code>DefaultAzureCredential</code> token to access the OpenAI resource. That requires setting up a <a target="_blank" href="https://learn.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview">Managed Identity</a> and assigning that same role to the Managed identity. There are two kinds of managed identities: system-assigned and user-assigned. All Azure hosting platforms support managed identity. We'll start with App Service and system-assigned identities as an example. </p>
<h3>Managed identity for App Service</h3>
<p>This is how we create an App Service with a system-assigned identity in Bicep code:</p>
<pre class="language-python"><code>resource appService 'Microsoft.Web/sites@2022-03-01' = {
name: name
location: location
identity: { type: 'SystemAssigned'}
...
}
</code></pre>
<p>For more details, see this <a target="_blank" href="https://learn.microsoft.com/azure/app-service/overview-managed-identity?tabs=portal%2Chttp">article on Managed Identity for App Service.</a></p>
<h3>Assigning roles to the managed identity</h3>
<p>The role assignment process is largely the same for the host as it was for a user, but the principal ID must be set to the managed identity's principal ID instead, and the principal type must be "ServicePrincipal".</p>
<p>For example, this Bicep assigns the role for an App Service system-assigned identity:</p>
<pre class="language-python"><code>resource role 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(subscription().id, resourceGroup().id,
principalId, roleDefinitionId)
properties: {
principalId: appService.identity.principalId
principalType: 'ServicePrincipal'
roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions',
'5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
}
}</code></pre>
<h3>User-assigned identity for Azure Container Apps</h3>
<p>It's also possible to use a system-assigned identity for Azure Container Apps, using a similar approach to the one above. However, for our samples, we needed to use user-assigned identities so that we could give the same identity access to Azure Container Registry before the ACA app was provisioned. That's the advantage of user-assigned identities: they can be reused across multiple resources.</p>
<p>First, we create a new identity <em>outside</em> of the ACA Bicep:</p>
<pre class="language-python"><code>resource userIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
name: '${prefix}-id-aca'
location: location
}</code></pre>
<p>Then we assign that identity to the ACA resource:</p>
<pre class="language-python"><code>resource app 'Microsoft.App/containerApps@2022-03-01' = {
name: name
location: location
identity: {
type: 'UserAssigned'
userAssignedIdentities: { '${userIdentity.id}': {} }
}
...
</code></pre>
<p>When using a user-assigned identity, we need to modify our <code>DefaultAzureCredential</code> setup to tell it which identity to use, since you could potentially have multiple user-assigned identities (not just the single system-assigned identity for the hosting environment).</p>
<p>The following code retrieves the identity's ID from the environment variables and specifies it as the <code>client_id</code> for the Managed Identity credential:</p>
<pre class="language-python"><code>default_credential = azure.identity.aio.ManagedIdentityCredential(
client_id=os.getenv("AZURE_OPENAI_CLIENT_ID"))
</code></pre>
<br>
<h2>Refreshing expired authentication tokens</h2>
<p>The tokens issued by Azure AD do not last forever, so any long-running script or hosted application will need to refresh them. Typically, the Azure Python SDK takes care of that for you, but since our Python apps use the openai package directly, we need to implement token refresh ourselves.</p>
<p>For our application that uses the Quart web framework, we define a function that runs before every request to check if the globally stored token is close to expiring. If so, we fetch a new token and store it.</p>
<pre class="language-python"><code>@bp.before_request
async def ensure_openai_token():
if openai.api_type != "azure_ad":
return
openai_token = current_app.config[CONFIG_OPENAI_TOKEN]
if openai_token.expires_on < time.time() + 60:
openai_token = await current_app.config[CONFIG_CREDENTIAL].get_token(
"https://cognitiveservices.azure.com/.default"
)
current_app.config[CONFIG_OPENAI_TOKEN] = openai_token
openai.api_key = openai_token.token
</code></pre>
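<p>For context, the credential and initial token referenced above get stored in the app config at startup. Here's a minimal sketch of what that setup could look like, assuming a Quart <code>before_app_serving</code> hook and the same config keys (your app's exact startup code may differ):</p>
<pre class="language-python"><code>@bp.before_app_serving
async def setup_openai_auth():
    # Assumption: DefaultAzureCredential resolves to the right identity
    # both locally and in production (see the earlier sections).
    credential = azure.identity.aio.DefaultAzureCredential()
    token = await credential.get_token("https://cognitiveservices.azure.com/.default")
    current_app.config[CONFIG_CREDENTIAL] = credential
    current_app.config[CONFIG_OPENAI_TOKEN] = token
    openai.api_type = "azure_ad"
    openai.api_key = token.token
</code></pre>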
<p>For our script that ingests data from PDFs, we define a similar function that we call before every attempt to use the vector embedding function:</p>
<pre class="language-python"><code>def refresh_openai_token():
if (
CACHE_KEY_TOKEN_TYPE in open_ai_token_cache
and open_ai_token_cache[CACHE_KEY_TOKEN_TYPE] == "azure_ad"
and open_ai_token_cache[CACHE_KEY_CREATED_TIME] + 300 < time.time()
):
token_cred = open_ai_token_cache[CACHE_KEY_TOKEN_CRED]
openai.api_key = token_cred.get_token(
"https://cognitiveservices.azure.com/.default").token
open_ai_token_cache[CACHE_KEY_CREATED_TIME] = time.time()
</code></pre>
<br>
<h2>Accessing OpenAI in a local Docker container</h2>
<p>At this point, you should be able to access OpenAI both for local development and in production. Unless, that is, you're developing with a local Docker container. By default, a Docker container does not have a way to access any of your local credentials, so you'll see authentication errors in the logs.
It used to be possible to use a workaround with volumes to access the credential, but after Azure started encrypting the local credential, it's now <a target="_blank" href="https://github.com/Azure/azure-sdk-for-net/issues/19167">an open question</a> as to how to easily authenticate inside a local container.
</p>
<p>Unfortunately, in this case, my current approach is to fall back to using a key for local development in a Docker container. Another interesting approach would be to use a mock ChatGPT service in a local environment, to avoid unnecessarily using up quota.</p>
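<p>As a rough sketch of that mock idea (everything here is hypothetical, not from our samples), a tiny local server could return canned completions, and the app running in Docker could point <code>AZURE_OPENAI_ENDPOINT</code> at it:</p>
<pre class="language-python"><code># mock_openai.py - a hypothetical stand-in for the Azure OpenAI endpoint
from flask import Flask, jsonify

app = Flask(__name__)

# "chatgpt" here stands in for whatever deployment name the app requests
@app.post("/openai/deployments/chatgpt/chat/completions")
def chat_completions():
    # Return a canned response shaped like a ChatCompletion result
    return jsonify({
        "choices": [
            {"message": {"role": "assistant", "content": "This is a mocked answer."}}
        ]
    })

if __name__ == "__main__":
    app.run(port=5005)
</code></pre>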
<br>
<h2>All together now</h2>
<p>As you can see, it's not entirely straightforward to authenticate to OpenAI without keys, depending on how you're developing locally and where you're deploying.</p>
<p>The following code uses a key when it's set in the environment, uses a user-assigned Managed Identity when the identity ID is set in the environment, and otherwise uses <code>DefaultAzureCredential</code>:</p>
<pre class="language-python"><code>openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT")
openai.api_version = "2023-03-15-preview"
if os.getenv("AZURE_OPENAI_KEY"):
openai.api_type = "azure"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")
else:
openai.api_type = "azure_ad"
if client_id := os.getenv("AZURE_OPENAI_CLIENT_ID"):
default_cred = azure.identity.aio.ManagedIdentityCredential(
client_id=client_id)
else:
default_cred = azure.identity.aio.DefaultAzureCredential(
exclude_shared_token_cache_credential=True)
token = await default_cred.get_token(
"https://cognitiveservices.azure.com/.default")
openai.api_key = token.token</code></pre>
<p>The technologies in this space are changing rapidly, so some of the more tricky aspects of keyless authentication will hopefully be easier in the future. In the meantime, try to avoid keys whenever possible.</p>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-58978292804730820612023-09-16T07:03:00.004-07:002023-09-20T13:09:26.002-07:00Best practices for OpenAI Chat apps: Streaming UI<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<noscript>
<img alt="Screenshot from a streaming UI" border="0" data-original-height="360" data-original-width="987" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgDKj-xyhzOllj06SumEIZnsX74V6CZ8sswr9OuQMMexTDjsD6pwJzfFc_DbnZEH4i4Z87E7O_90S2ijPWX6fea1r0rzTDSwFJVgdOPL2krMJj41QblFgh2RU6x8I4bLDtOgxD8nVcScEDjTyyl1qi3RbHoEVh3H593EjiedaHciBYrZ1PvllauqmgE-w/s1600/stream_after_still.png"/>
</noscript>
<p><em>As part of my role on the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">my simple chat app</a> and this popular <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">chat + search app</a>. In this series of blog posts, I'll share my learnings for writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.</em></p>
<p>Today I want to talk about the importance of streaming in the UI of a chat app, and how we can accomplish that. Streaming doesn't feel like a must-have at first, but users have become so accustomed to streaming in LLM-powered interfaces like ChatGPT, Bing Chat, and GitHub Copilot that they expect it in similar experiences. In addition, streaming can reduce the "time to first answer", as long as your UI calls the streaming OpenAI API as well. Given that it can take several seconds for ChatGPT to fully respond, we welcome any approach that gets answers in front of users faster.</p>
<img alt="Animated GIF of GitHub CoPilot answering a question about bash" border="0" data-original-height="446" data-original-width="833" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMEqW6XThud5KTSTp5jXJGdpnmwUaxuXABBFOu2D83D9I5Xd2GPgyoqSvW1GcCVVtoFuS0HYuAys0jFTxFJVBdyFUX9b79zoYzUBYtqCLefP376J9onwypSUa9EqaVtj8lWOCj8peIarxnXcHRbAoxe2vDmVO6lYJIuuBCPsaU8bJavfNmv06hXoYyOA/s1600/stream_copilot.gif" width="600"/>
<br><br>
<h3>Streaming from the APIs</h3>
<p>The <a target="_blank" href="https://pypi.org/project/openai/">openai</a> package makes it easy to optionally stream responses from the API, by way of a <code>stream</code> argument:</p>
<pre class="language-python"><code>chat_coroutine = openai.ChatCompletion.acreate(
deployment_id="chatgpt",
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message},
],
stream=True,
)
</code></pre>
<p>When <code>stream</code> is true, the response type is an asynchronous generator, so we can use <code>async for</code> to process each of the <a target="_blank" href="https://platform.openai.com/docs/api-reference/chat/streaming">ChatCompletion chunk objects</a>:</p>
<pre class="language-python"><code>async for event in await chat_coroutine:
message_chunk = event.choices[0].delta.content
</code></pre>
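<p>One thing to watch out for: the first and last chunks may not include any content in the delta, so it can be worth guarding against that. Here's a small sketch that accumulates the full answer:</p>
<pre class="language-python"><code>answer = ""
async for event in await chat_coroutine:
    # Some chunks (like the initial role-only chunk) have no content
    content = event.choices[0].delta.get("content")
    if content:
        answer += content
</code></pre>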
<br>
<h3>Sending stream from backend to frontend</h3>
<p>When we're making a web app, we need a way to send those objects as a stream from the backend to the browser. We can't use a standard HTTP response, since that sends everything at once and closes the connection. The most common approaches for streaming from backends are:</p>
<ul>
<li><a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API">WebSockets</a>: Bidirectional communication channel, client or server can push.
<li><a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events">Server-sent events</a>: An HTTP channel for server to push to client.
<li><a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams">Readable streams</a>: An HTTP response with a <code>Transfer-encoding</code> header of "chunked", signifying the browser must wait for all chunks.
</ul>
<p>All of these could potentially be used for a chat app, and I myself have experimented with both <a target="_blank" href="http://blog.pamelafox.org/2023/05/streaming-chatgpt-with-server-sent.html">server-sent events</a> and <a target="_blank" href="http://blog.pamelafox.org/2023/08/fetching-json-over-streaming-http.html">readable streams</a>. Behind the scenes, the ChatGPT API actually uses server-sent events, so you'll find code in the openai package for parsing that protocol. However, I now prefer using readable streams for my frontend to backend communication. It's the simplest code setup on both the frontend and backend, and it supports the POST requests that our apps are already sending.</p>
<p>The key is to send the chunks from the backend using the NDJSON (jsonlines) format, and parse that format in the frontend. See my <a target="_blank" href="http://blog.pamelafox.org/2023/08/fetching-json-over-streaming-http.html">blog post on fetching JSON over streaming HTTP</a> for Python and JavaScript example code.</p>
<br>
<h3>Achieving a word-by-word effect</h3>
<p>With all of that implemented, we have a frontend that reveals the answer gradually:</p>
<img alt="Animated GIF of answer appearing gradually" border="0" data-original-height="446" data-original-width="961" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhHPyrUp5uYqABe9qbSzud81HvBnWvOSM31xg4AYPCvN9twGQig2mDcohDrX1RNyLkOG0EOXqrabyLi35bZWBldeEfceh_xTHlLywRZ89r5aBWzusDptbCmeOtVUdeOMlJTdUfBSvzO-vnXut7xO6ypY3SXAVsappi3M_B_aS6IFJ0JdNUG6RYw71T_7w/s1600/stream_before.gif" width="600"/>
<p>Here's what's interesting: despite our frontend receiving chunks of just a few tokens at a time, it appears to reveal almost entire sentences at a time. Why does the frontend UI seem to stream much larger chunks than what it receives? That's likely caused by the browser batching up repaints, deciding that it can wait to display the latest update to the innerHTML of the answer element. Normally that's a great performance enhancement on the browser's side, but it's not ideal in this case.</p>
<p>My colleague Steve Steiner experimented with various ways to force the browser to repaint more frequently, and settled on a technique that uses <code>window.setTimeout()</code> with a delay of 33 milliseconds for each chunk. That does mean that the browser takes overall more time to display a streamed response, but it doesn't end up faster than reading speed. <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/pull/659">See his PR for implementation details</a>.</p>
<p>Now the frontend displays the answer at the same level of granularity that it receives from the ChatCompletions API:</p>
<img alt="Animated GIF of answer appearing word by word" border="0" data-original-height="360" data-original-width="987" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhptsIsoF6jyJejFtY3Htqb0UerM645BDkDGYlB4fnUtZ1WFRwezOtPFmohH9oqeC-k8LQaqAFoWJ_bVfIzFOFMjWwSVbIQ9l5GQoKqhWGcWEA3m6yDUbXgfT5KTmOxIzvjd7vElTA7j_ZgVi1YNShetou8-aF1Boi9QuqtlYJe0D0CMVvH5-LWGQfOxQ/s1600/stream_after.gif" width="600"/>
<br><br>
<h3>Streaming more of the process</h3>
<p>Many of our sample apps are RAG apps that "chat on your data", by chaining together calls across vector databases (like Azure Cognitive Search), embedding APIs, and the Chat Completion API. That chain of calls will take longer to process than a single ChatCompletion call, of course, so users may end up waiting longer for their answers.</p>
<p>One suggestion from Steve Steiner is to stream more of the process. Instead of waiting until we had the final answer, we could stream the process of finding the answer, like:</p>
<ul>
<li>Processing your question: "Can you suggest a pizza recipe that incorporates both mushroom and pineapples?"
<li>Generated search query "pineapple mushroom pizza recipes"
<li>Found three related results from our cookbooks: 1) Mushroom calzone 2) Pineapple ham pizza 3) Mushroom loaf
<li>Generating answer to your question...
<li>Sure! Here's a recipe for a mushroom pineapple pizza...
</ul>
<p>We haven't integrated that idea into any of our samples yet, but it's interesting to consider for anyone building chat apps, as a way to keep the user engaged while the backend does additional work.</p>
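<p>Here's a rough sketch of how that could look with the NDJSON streaming approach, where the helper callables and event shapes are made up for illustration:</p>
<pre class="language-python"><code>import json

async def stream_rag_progress(user_question, generate_query, search, answer_stream):
    # generate_query, search, and answer_stream are hypothetical async callables
    # supplied by the app; each step emits a progress event before running.
    yield json.dumps({"type": "progress", "message": f'Processing your question: "{user_question}"'}) + "\n"

    query = await generate_query(user_question)
    yield json.dumps({"type": "progress", "message": f'Generated search query "{query}"'}) + "\n"

    results = await search(query)
    yield json.dumps({"type": "progress", "message": f"Found {len(results)} related results"}) + "\n"

    yield json.dumps({"type": "progress", "message": "Generating answer to your question..."}) + "\n"
    async for chunk in answer_stream(user_question, results):
        yield json.dumps({"type": "answer", "content": chunk}) + "\n"
</code></pre>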
<br>
<h3>Making it optional</h3>
<p>I just spent all that time talking about streaming, but I want to leave you with one final recommendation: make streaming optional, especially if you are developing a project for others to deploy. There are some web hosts that may not support streaming as readily as others, so developers appreciate the option to turn streaming off. There are also some use cases where streaming may not make sense, and it should be easy for developers (or even users) to turn it off.</p>
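<p>One way to support that toggle (a sketch, not exactly how our samples do it - the parameter name and response shapes are illustrative) is to accept a <code>stream</code> flag in the request and branch on it:</p>
<pre class="language-python"><code>import json
import os

import openai
from quart import Blueprint, Response, jsonify, request

bp = Blueprint("chat", __name__)


@bp.post("/chat")
async def chat_handler():
    request_json = await request.get_json()
    use_stream = request_json.get("stream", True)

    chat_coroutine = openai.ChatCompletion.acreate(
        deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": request_json["message"]},
        ],
        stream=use_stream,
    )

    if not use_stream:
        # Non-streaming: await the full completion and return it as plain JSON
        completion = await chat_coroutine
        return jsonify(completion)

    async def format_as_ndjson():
        # Streaming: yield each chunk as one JSON object per line (NDJSON)
        async for event in await chat_coroutine:
            yield json.dumps(event) + "\n"

    return Response(format_as_ndjson())
</code></pre>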
<script>hljs.highlightAll();</script>Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-88279499571370672532023-09-13T12:42:00.005-07:002023-09-16T07:09:43.953-07:00Best practices for OpenAI Chat apps: Concurrency<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<noscript>
<img alt="Diagram of worker handling second request while first request waits for API response" border="0" data-original-height="281" data-original-width="756" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7sEpE3qFKdHyFUErfHBOWVAvkXXvFg_9wPAxT8cWIjc3gW6NoVJf2jeQsHRqO_AiqWo03ojDp83gNmJ3ygkFLTo2RCM57JGPga_veR-lM9Y30JC-Mi7HbzjTBSWcwYohmtVRGNsOs4GyMXMRJl4rHa79x7WMiGYxY2oILRu1b1JbdTZ-n44e975hfQw/s1600/async@2x.png" width="600"/>
</noscript>
<p><em>As part of my role on the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">my simple chat app</a> and this popular <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">chat + search app</a>. In this series of blog posts, I'll share my learnings for writing chat-like applications. My experience is from apps with Python backends, but many of these practices apply cross-language.</em></p>
<p>My first tip is to use an <strong>asynchronous</strong> backend framework so that your app is capable of fulfilling <strong>concurrent</strong> requests from users.</p>
<br>
<h3>The need for concurrency</h3>
<p>Why? Let's imagine that we used a synchronous framework, like <a target="_blank" href="https://flask.palletsprojects.com/">Flask</a>. We deploy that to a server using <a target="_blank" href="https://gunicorn.org/">gunicorn</a> and several workers. One of those workers receives a POST request to the "/chat" endpoint. That chat endpoint in turn makes a request to the Azure ChatCompletions API. The request can take a while to complete - several seconds! During that time, the worker is tied up and <em>cannot</em> handle any more user requests. We could throw more CPUs, and thus more workers and threads, at the problem, but that's a waste of server resources.</p>
<p>Without concurrency, requests must be handled serially:</p>
<img alt="Diagram of worker handling requests one after the other" border="0" data-original-height="129" data-original-width="936" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjXOK3LpaA-iv2FoukmcC5LbqbIsYNhJ0EyQBjZahJpJ4KvBAFDSl5bStAwYOVQei2H5vQk6GGa5XeGVGder6j_hogYsBQXzX9qc8siqMHPsZ3eRGvt9H-aN3iAaEz9ITOXPzfw4OqTcAeHCN-dz_VcL8t9BGU3g3PuLEMiEPLBvF_gIbIbpyfY_Ybb4w/s1600/sync@2x.png" width="600"/>
<p>The better approach when our app has long blocking I/O calls is to use an asynchronous framework. That way, when a request has gone out to a potentially slow-to-respond API, the Python program can pause that coroutine and handle a brand new request.</p>
<p>With concurrency, workers can handle new requests during I/O calls:</p>
<img alt="Diagram of worker handling second request while first request waits for API response" border="0" data-original-height="281" data-original-width="756" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj7sEpE3qFKdHyFUErfHBOWVAvkXXvFg_9wPAxT8cWIjc3gW6NoVJf2jeQsHRqO_AiqWo03ojDp83gNmJ3ygkFLTo2RCM57JGPga_veR-lM9Y30JC-Mi7HbzjTBSWcwYohmtVRGNsOs4GyMXMRJl4rHa79x7WMiGYxY2oILRu1b1JbdTZ-n44e975hfQw/s1600/async@2x.png" width="600"/>
<br>
<h3>Asynchronous Python backends</h3>
<p>We use <a target="_blank" href="https://quart.palletsprojects.com/">Quart</a>, the asynchronous version of Flask, for <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">the simple chat quickstart</a> as well as the <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">chat + search app</a>. I've also ported the simple chat to <a target="_blank" href="https://github.com/pamelafox/chatgpt-backend-fastapi">FastAPI</a>, the most popular asynchronous framework for Python.</p>
<p>Our handlers now all have <code>async</code> in front, signifying that they return a Python coroutine instead of a normal function:</p>
<pre class="language-python"><code>async def chat_handler():
request_message = (await request.get_json())["message"]
</code></pre>
<p>When we deploy those apps, we still use gunicorn, but with the <code>uvicorn</code> worker, which is designed for Python ASGI apps. The <code>gunicorn.conf.py</code> configures it like so:</p>
<pre class="language-python"><code>num_cpus = multiprocessing.cpu_count()
workers = (num_cpus * 2) + 1
worker_class = "uvicorn.workers.UvicornWorker"
</code></pre>
<br>
<h3>Asynchronous API calls</h3>
<p>To really benefit from the port to an asynchronous framework, we need to make asynchronous calls to all of the APIs, so that a worker can handle a new request whenever an API call is being awaited.</p>
<p>Our API calls to the openai SDK now use <code>await</code> with the <code>acreate</code> variant:</p>
<pre class="language-python"><code>chat_coroutine = openai.ChatCompletion.acreate(
deployment_id=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
messages=[{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message}],
stream=True,
)
</code></pre>
<p>For the RAG sample, we also have calls to Azure services like Azure Cognitive Search. To make those asynchronous, we first import the async variant of the credential and client classes in the <code>aio</code> module:
<pre class="language-python"><code>from azure.identity.aio import DefaultAzureCredential
from azure.search.documents.aio import SearchClient
</code></pre>
<p>Then the API calls themselves require <code>await</code> to the same function name:</p>
<pre class="language-python"><code>r = await self.search_client.search(query_text)
</code></pre>
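<p>Putting those together, here's a small sketch of constructing the async client and consuming the awaited results (the endpoint, index name, and "content" field are placeholders):</p>
<pre class="language-python"><code>credential = DefaultAzureCredential()
search_client = SearchClient(
    endpoint="https://my-search-service.search.windows.net",
    index_name="my-index",
    credential=credential,
)

async def find_sources(query_text: str) -> list[str]:
    results = await search_client.search(query_text)
    # The async client returns an async iterator of result documents
    return [doc["content"] async for doc in results]
</code></pre>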
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-66845430107803842402023-09-11T11:15:00.000-07:002023-09-11T11:15:05.287-07:00Mocking async openai package calls with pytest<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>As part of my role on the Python advocacy team for Azure, I am now one of the maintainers on several ChatGPT samples, like <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">my simple chat app</a> and the very popular <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/">chat + search app</a>. Both of those samples use <a target="_blank" href="https://quart.palletsprojects.com/en/latest/">Quart</a>, the asynchronous version of <a target="_blank" href="https://flask.palletsprojects.com/">Flask</a>, which enables them to use the asynchronous versions of the functions from the <a target="_blank" href="https://pypi.org/project/openai/">openai</a> package.</p>
<h3>Making async openai calls</h3>
<p>A synchronous call to the streaming <a target="_blank" href="https://platform.openai.com/docs/api-reference/chat/object">ChatCompletion API</a> looks like:</p>
<pre><code class="language-python">response = openai.ChatCompletion.create(
messages=[{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message}],
stream=True)
</code></pre>
An asynchronous call to that same API looks like:
<pre><code class="language-python">response = await openai.ChatCompletion.acreate(
messages=[{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message},],
stream=True)
</code></pre>
<p>The difference is just the addition of <code>await</code> to wait for the results of the asynchronous function (and signal that the process can work on other tasks), along with the change in method name from <code>create</code> to <code>acreate</code>. That's a small difference in our app code, but it's a significant difference when it comes to mocking those calls, so it's worth pointing out.</p>
<h3>Mocking a streaming call</h3>
<p>In our tests of the apps, we don't want to actually make calls to the OpenAI servers, since that'd require authentication and would use up quota needlessly. Instead, we can mock the calls using the built-in pytest fixture <a target="_blank" href="https://docs.pytest.org/en/6.2.x/monkeypatch.html"><code>monkeypatch</code></a> with code that mimics the openai package's response.</p>
<p>Here's the fixture that I use to mock the asynchronous <code>acreate</code> call:</p>
<pre><code class="language-python">@pytest.fixture
def mock_openai_chatcompletion(monkeypatch):
class AsyncChatCompletionIterator:
def __init__(self, answer: str):
self.answer_index = 0
self.answer_deltas = answer.split(" ")
def __aiter__(self):
return self
async def __anext__(self):
if self.answer_index < len(self.answer_deltas):
answer_chunk = self.answer_deltas[self.answer_index]
self.answer_index += 1
return openai.util.convert_to_openai_object(
{"choices": [{"delta": {"content": answer_chunk}}]})
else:
raise StopAsyncIteration
async def mock_acreate(*args, **kwargs):
return AsyncChatCompletionIterator("The capital of France is Paris.")
monkeypatch.setattr(openai.ChatCompletion, "acreate", mock_acreate)
</code></pre>
<p>The final line of that fixture swaps the <code>acreate</code> method with my mock method that returns a class that acts like an <a target="_blank" href="https://peps.python.org/pep-0525/">asynchronous generator</a> thanks to its <code>__anext__</code> dunder method. That method returns a chunk of the answer each time it's called, until there are no chunks left.</p>
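<p>To show how the mock behaves, here's a sketch of a test that consumes it directly (it assumes pytest-asyncio for async test support; our actual tests exercise the app's routes instead):</p>
<pre><code class="language-python"># assumes `import pytest` and `import openai` at the top of the test module
@pytest.mark.asyncio
async def test_mocked_stream(mock_openai_chatcompletion):
    # The patched acreate returns the fake async iterator defined above
    response = await openai.ChatCompletion.acreate(
        messages=[{"role": "user", "content": "What is the capital of France?"}],
        stream=True)
    chunks = []
    async for event in response:
        chunks.append(event["choices"][0]["delta"]["content"])
    assert " ".join(chunks) == "The capital of France is Paris."
</code></pre>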
<h3>Mocking non-streaming call</h3>
<p>For the other repo, which supports both streaming and non-streaming responses, the mock <code>acreate</code> method must account for the non-streaming case by immediately returning the full answer.
<pre><code class="language-python"> async def mock_acreate(*args, **kwargs):
messages = kwargs["messages"]
answer = "The capital of France is Paris."
if "stream" in kwargs and kwargs["stream"] is True:
return AsyncChatCompletionIterator(answer)
else:
return openai.util.convert_to_openai_object(
{"choices": [{"message": {"content": answer}}]})
</code></pre>
<h3>Mocking multiple answers</h3>
<p>If necessary, it's possible to make the mock respond with different answers based on the last message passed in. We need that for the chat + search app, since we also use a ChatGPT call to generate keyword searches based on the user question.</p>
<p>Just change the answer based off the <code>messages</code> keyword arg:</p>
<pre><code class="language-python"> async def mock_acreate(*args, **kwargs):
messages = kwargs["messages"]
if messages[-1]["content"] == "Generate search query for: What is the capital of France?":
answer = "capital of France"
else:
answer = "The capital of France is Paris."
</code></pre>
<h3>Mocking other openai calls</h3>
<p>We also make other calls through the openai package, like to create embeddings. That's a much simpler mock, since there's no streaming involved:</p>
<pre><code class="language-python">@pytest.fixture
def mock_openai_embedding(monkeypatch):
async def mock_acreate(*args, **kwargs):
return {"data": [{"embedding": [0.1, 0.2, 0.3]}]}
monkeypatch.setattr(openai.Embedding, "acreate", mock_acreate)
</code></pre>
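<p>And a quick sketch of a test exercising that fixture (again assuming pytest-asyncio; the engine name is just an example):</p>
<pre><code class="language-python">@pytest.mark.asyncio
async def test_mocked_embedding(mock_openai_embedding):
    response = await openai.Embedding.acreate(
        engine="text-embedding-ada-002", input="What is the capital of France?")
    assert response["data"][0]["embedding"] == [0.1, 0.2, 0.3]
</code></pre>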
<h3>More resources</h3>
<p>For more context and example tests, view the full tests in the repos:</p>
<ul>
<li><a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/tree/main/tests">https://github.com/Azure-Samples/azure-search-openai-demo/tree/main/tests</a>
<li><a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart/tree/main/tests">https://github.com/Azure-Samples/chatgpt-quickstart/tree/main/tests</a>
</ul>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-17153217453335562402023-08-14T10:28:00.005-07:002023-08-14T15:26:33.621-07:00Fetching JSON over streaming HTTP<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>Recently, as part of my work on Azure OpenAI code samples, I've been experimenting with different ways of streaming data from a server into a website. The most well known technique is <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API">web sockets</a>, but there are also other approaches, like <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events">server-sent events</a> and <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Streams_API/Using_readable_streams">readable streams</a>. A readable stream is the simplest of the options, and works well if your website only needs to stream a response from the server (i.e. it doesn't need bi-directional streaming).</p>
<h2>HTTP streaming in Python</h2>
<p>To stream an HTTP response, your backend needs to set the "Transfer-Encoding" header to "chunked".
Most web frameworks provide documentation about streaming responses,
such as <a target="_blank" href="https://flask.palletsprojects.com/en/2.3.x/patterns/streaming/#basic-usage">Flask: Streaming</a> and <a target="_blank" href="https://quart.palletsprojects.com/en/latest/how_to_guides/streaming_response.html">Quart: Streaming responses</a>. In both Flask and Quart, the response must be a Python generator, so that the server can continually get the next data from the generator until it's exhausted.
</p>
<p>This example from the Flask doc streams data from a CSV:</p>
<pre><code class="language-python">@app.route('/large.csv')
def generate_large_csv():
def generate():
for row in iter_all_rows():
yield f"{','.join(row)}\n"
return generate(), {"Content-Type": "text/csv"}
</code></pre>
<p>This example, adapted from the Quart docs, is an infinite stream of timestamps:</p>
<pre><code class="language-python">@app.route('/')
async def stream_time():
    async def async_generator():
        while True:
            time = datetime.now().isoformat()
            yield time.encode()
            await asyncio.sleep(1)
    return async_generator(), 200
</code></pre>
<h2>Consuming streams in JavaScript</h2>
<p>The standard way to consume HTTP requests in JavaScript is the <code>fetch()</code> function, and fortunately, that function can also be used to consume HTTP streams. When the browser sees that the data is chunked, it sets <code>response.body</code> to a <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream">ReadableStream</a>.
<p>This example fetches a URL, treats the response body as a stream, and logs out the output until it's done streaming:</p>
<pre><code class="language-javascript">const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
var text = new TextDecoder("utf-8").decode(value);
console.log("Received ", text);
}
</code></pre>
<h2>Streaming JSON</h2>
<p>You might think it'd be super straightforward to stream JSON: just generate a JSON string on the server, and then <code>JSON.parse</code> the received text on the client. But there's a gotcha: the client could receive multiple JSON objects in the same chunk, and then an attempt to parse as JSON will fail.</p>
<p>The solution: JSON objects separated by new lines, known either as <a target="_blank" href="http://ndjson.org/">NDJSON</a> or <a target="_blank" href="https://jsonlines.org/">JSONlines</a>.</p>
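<p>Here's a tiny illustration of why concatenated JSON breaks but newline-delimited JSON parses cleanly:</p>
<pre><code class="language-python">import json

# Two JSON objects arriving in the same chunk can't be parsed in one go:
chunk = '{"answer": "Hello"}{"answer": "world"}'
# json.loads(chunk) would raise json.JSONDecodeError

# With NDJSON, each line is parseable on its own:
ndjson_chunk = '{"answer": "Hello"}\n{"answer": "world"}\n'
objects = [json.loads(line) for line in ndjson_chunk.splitlines() if line]
</code></pre>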
<p>This expression converts a Python dict to NDJSON, using the std lib <code>json</code> module:
<pre><code class="language-python">json.dumps(some_dict) + "\n"
</code></pre>
<p>Here's how I actually used that, for one of the <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart/blob/main/src/flaskapp/chat.py">ChatGPT samples</a>:</p>
<pre><code class="language-python">@bp.post("/chat")
def chat_handler():
request_message = request.json["message"]
def response_stream():
response = openai.ChatCompletion.create(
engine=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message},
],
stream=True,
)
for event in response:
yield json.dumps(event) + "\n"
return Response(response_stream())
</code></pre>
<h2>Consuming NDJSON streams in JavaScript</h2>
<p>Once the server is outputting NDJSON, then we can write parsing code in JavaScript that splits by newlines and attempts to parse the resulting objects as JSON objects.</p>
<pre><code class="language-javascript">const response = await fetch(url);
const readableStream = response.body;
const reader = readableStream.getReader();
while (true) {
const { done, value } = await reader.read();
if (done) break;
var text = new TextDecoder("utf-8").decode(value);
const objects = text.split("\n");
for (const obj of objects) {
try {
runningText += obj;
let result = JSON.parse(runningText);
console.log("Received", result);
runningText = "";
} catch (e) {
// Not a valid JSON object
}
}
}
</code></pre>
<p>Since I need to use this same processing code in multiple Azure OpenAI samples, I packaged that into a tiny npm package called <a target="_blank" href="https://www.npmjs.com/package/ndjson-readablestream">ndjson-readablestream</a>.</p>
<p>Here's how you can use the package from JavaScript to make NDJSON parsing easier:</p>
<pre><code class="language-javascript">import readNDJSONStream from "ndjson-readablestream";
const response = await chatApi(request);
if (!response.body) {
throw Error("No response body");
}
for await (const event of readNDJSONStream(response.body)) {
console.log("Received", event);
}
</code></pre>
<p>For more examples of using the package, see this <a target="_blank" href="https://github.com/Azure-Samples/azure-search-openai-demo/pull/532">PR that uses it in a TypeScript component to render ChatGPT responses</a> or <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart/blob/ae10d9ddec3905ec2e0a0bd1b557aa01aaba99ca/src/flaskapp/templates/index.html#L68">usage in an HTML page, for a non-React ChatGPT sample</a>.</p>
<p>I hope this helps other developers use NDJSON streams in your projects. Please let me know if you have suggestions for improving my approach!</p>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-82077645399672199682023-08-07T14:37:00.002-07:002023-08-07T14:37:58.791-07:00Accessibility snapshot testing for Python web apps (Part 2)<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>In my <a target="_blank" href="https://blog.pamelafox.org/2023/07/automated-accessibility-audits-for.html">previous post</a>, I showed a technique that used <a target="_blank" href="https://github.com/dequelabs/axe-core">axe-core</a> along with <a target="_blank" href="https://docs.pytest.org/en/7.4.x/">pytest</a> and <a target="_blank" href="https://playwright.dev/python/docs/intro">Playwright</a> to make sure that pages in your web apps have no accessibility violations. That's a great approach if it works for you, but realistically, most webpages have a non-zero number of accessibility violations, due to limited engineering resources or dependence on third-party libraries. Do we just give up on being able to test them? No! Instead, we use a different approach: snapshot testing.
</p>
<br>
<h3>Snapshot testing</h3>
<p>Snapshot testing is a way to test that the output of a function matches a previously saved snapshot.</p>
<p>Here's an example using <a target="_blank" href="https://github.com/joseph-roitman/pytest-snapshot">pytest-snapshot</a>:</p>
<pre><code data-trim data-noescape class="python">def emojify(s):
return s.replace('love', '❤️').replace('python', '🐍')
def test_function_output_with_snapshot(snapshot):
snapshot.assert_match(emojify('I love python'), 'snapshot.txt')
</code></pre>
<p>The first time we run the test, it will save the output to a file. We check the generated snapshots into source control. Then, the next time anyone runs that test, it will compare the output to the saved snapshot. </p>
<br>
<h3>Snapshot testing + axe-core</h3>
<p>First, a big kudos to Michael Wheeler from UMich and his talk on <a target="_blank" href="https://it.umich.edu/community/michigan-it-symposium/2021/presentations/automated-web-accessibility-testing-using-browser">Automated Web Accessibility Testing</a> for the idea of using snapshot testing with axe-core.</p>
<p>Here's the approach: We save snapshots of the axe-core violations and check them into source control. That way, our tests will let us know when new violations come up, and our snapshot files keep track of which parts of the codebase need accessibility improvements.</p>
<p>To make it as easy as possible, I made a <a target="_blank" href="https://github.com/pamelafox/pytest-axe-playwright-snapshot">pytest plugin</a> that combines Playwright, axe-core, and snapshot testing.</p>
<pre><code data-trim data-noescape class="shell">python3 -m pip install pytest-axe-playwright-snapshot
python3 -m playwright install --with-deps
</code></pre>
<p class="padded">Here's an example test from a <a target="_blank" href="https://github.com/pamelafox/flask-db-quiz-example">Flask app</a>:</p>
<pre><code data-trim data-noescape class="python">from flask import url_for
from playwright.sync_api import Page
def test_index(page: Page, axe_pytest_snapshot):
page.goto(url_for("index", _external=True))
axe_pytest_snapshot(page)
</code></pre>
<br>
<h3>Running the snapshot tests</h3>
<p><strong>First run:</strong> We specify the <code>--snapshot-update</code> argument to tell the plugin to save the snapshots to file.</p>
<pre><code data-trim data-noescape class="shell">python3 -m pytest --snapshot-update
</code></pre>
<p>That saves a file like this one to a directory named after the test and browser engine, like <code>snapshots/test_violations/chromium/snapshot.txt</code>:</p>
<pre><code data-trim data-noescape class="text">color-contrast (serious) : 2
empty-heading (minor) : 1
link-name (serious) : 1
</code></pre>
<p><strong>Subsequent runs:</strong> The plugin compares the new snapshot to the saved snapshot, and asserts if they differ.</p>
<pre><code data-trim data-noescape class="shell">python3 -m pytest
</code></pre>
<p>Let's look through some example outputs next.</p>
<br>
<h3>Test results</h3>
<h4>New accessibility issue 😱</h4>
<p>If there are violations in the new snapshot that weren't in the old,
the test will fail with a message like this:</p>
<pre><code data-trim data-noescape class="text">E AssertionError: New violations found: html-has-lang (serious)
E That's bad news! 😱 Either fix the issue or run `pytest --snapshot-update` to update the snapshots.
E html-has-lang - Ensures every HTML document has a lang attribute
E URL: https://dequeuniversity.com/rules/axe/4.4/html-has-lang?application=axeAPI
E Impact Level: serious
E Tags: ['cat.language', 'wcag2a', 'wcag311', 'ACT']
E Elements Affected:
E 1) Target: html
E Snippet: <html>
E Messages:
E * The <html> element does not have a lang attribute
</code></pre>
<h4>Fixed accessibility issue 🎉</h4>
<p>If there are fewer violations in the new snapshot than in the old one, the test will also fail, but with a happy message like this:</p>
<pre><code data-trim data-noescape class="shell">E AssertionError: Old violations no longer found: html-has-lang (serious).
E That's good news! 🎉 Run `pytest --snapshot-update` to update the snapshots.
</code></pre>
<br>
<h3>CI/CD integration</h3>
<p>Once you've got snapshot testing setup, it's a great idea to run it on every potential change to your codebase.</p>
<p>Here's an example of a failing GitHub action due to an accessibility violation, using this <a href="https://github.com/pamelafox/flask-db-quiz-example/blob/main/.github/workflows/python.yaml">workflow file</a>:
<a target="_blank" href="https://github.com/pamelafox/flask-db-quiz-example/actions/runs/5695546314/job/15438903614?pr=29">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhfUHkDmOJKRXV2n_xmueu119vjKqCsCsqd4c6ieIP3OFOJA36ZsuGISECxVjxz1qtLddVRx4qhoK4cE5kgE5oqW2EZ5ceOqBG9BQPaGw9uy0V76MS92ZBlKvrn9e3TGM3b_BR4x4BWH4dz-PUuKbPgCJySHqQ156zmG-HBwHBA8uDDXKMw0BXS0XpzkQ/s3084/screenshot_action_wide.png" alt="Screenshot of GitHub actions workflow that shows test failure due to accessibility violations" width="600">
</a>
<br>
<h3>Fixing accessibility issues</h3>
<p>What should you do if you realize you've introduced an accessibility violation,
or if you are tasked with reducing existing violations?
You can read the reports from pytest to get the gist of the accessibility violations,
but it's often easier to use a browser extension that uses the same Axe-core rules.</p>
<ul>
<li><a target="_blank" href="https://microsoftedge.microsoft.com/addons/detail/axe-devtools-web-access/kcenlimkmjjkdfcaleembgmldmnnlfkn">
Axe DevTools</a> for Edge</li>
<li><a target="_blank" href="https://chrome.google.com/webstore/detail/axe-devtools-web-accessib/lhdoppojpmngadmnindnejefpokejbdd">
Axe DevTools</a> for Chrome</li>
</ul>
<p>Also consider an IDE extension like <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=deque-systems.vscode-axe-linter">VS Code Axe Linter</a></p>
<br>
<h3>Don't rely on automation to find all issues</h3>
<p>I think it's really important for web apps to measure their accessibility violations, so that they can avoid introducing accessibility regressions and eventually resolve existing violations. However, keep in mind that these automated tools can only go so far. According to the axe-core docs, it finds about 57% of WCAG issues automatically. There can still be many issues with your site, like problems with tab order or keyboard access.</p>
<p>In addition to automation, please consider other ways to discover issues, such as paying for an external accessibility audit, engaging with your disabled users, and hiring engineers with disabilities.</p>
<script>hljs.highlightAll();</script>Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-83316203289272488592023-07-21T11:44:00.008-07:002023-07-22T08:16:40.578-07:00Automated accessibility audits for Python web apps (Part 1)<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>We all know by now the importance of accessibility for webpages. But it's surprisingly easy to create inaccessible web experiences, and unknowingly deploy those to production. How do we check for accessibility issues? One approach is to install a browser extension like <a target="_blank" href="https://accessibilityinsights.io/">Accessibility Insights</a> and run that on changed webpages. I love that extension, but I don't trust myself to remember to run it. So I've been working on tools for running accessibility tests on Python web apps, which I'll present at next week's <a target="_blank" href="https://2023.northbaypython.org/">North Bay Python</a>.</p>
<p>In this post, I'm going to share a way to automatically verify that a Python web app has *zero* accessibility issues -- or at least, zero issues that can be caught by automated testing. One should always do additional manual tests (like keyboard tests) and work with disabled users to discover all issues.</p>
<h2>Setup</h2>
<p>Here's what we'll need:</p>
<ul>
<li><a target="_blank" href="https://playwright.dev/python/docs/intro">Playwright</a>: A tool for end-to-end testing in various browser engines. Similar to Selenium, if you're familiar with that.
<li><a target="_blank" href="https://github.com/dequelabs/axe-core">Axe-core</a>: An accessibility engine for automated Web UI testing, built with JavaScript. Used by many other tools, like the Accessibility Insights browser extension.
<li><a target="_blank" href="https://pypi.org/project/axe-playwright-python/">axe-playwright-python</a>: A package that I developed to connect the two together, running axe-core on Playwright pages and returning the results in useful formats.
</ul>
<p>For this example, I'll also use Flask, Pytest, and pytest-flask to run a local server during testing. However, you could easily use other frameworks (like Django and unittest).
</p>
<h2>The test</h2>
<p>Here's the full code for a test of the four main routes on my personal website (<a target="_blank" href="https://www.pamelafox.org">pamelafox.org</a>):</p>
<pre><code class="language-python">from axe_playwright_python.sync_playwright import Axe
from flask import url_for
from playwright.sync_api import Page
def test_a11y(app, live_server, page: Page):
page.goto(url_for("home_page", _external=True))
results = Axe().run(page)
assert results.violations_count == 0, results.generate_report()
</code></pre>
<p>Let's break that down:</p>
<ul>
<li><pre><code>def test_a11y(app, live_server, page: Page):</code></pre>
<p>The <code>app</code> and <code>live_server</code> fixtures take care of starting up the app at a local URL. The <code>app</code> fixture comes from my <code>conftest.py</code> and the <code>live_server</code> fixture comes from pytest-flask.</p>
<li><pre><code>page.goto(url_for("home_page", _external=True))</code></pre>
<p>I use the <code>Page</code> fixture from Playwright to navigate to a route from my app.</p>
<li><pre><code>results = Axe().run(page)</code></pre>
<p>Using the <code>Axe</code> object from my axe-playwright-python package, I run axe-core on the page.</p>
<li><pre><code>assert results.violations_count == 0, results.generate_report()</code></pre>
<p>I assert that the violations count is zero, but I also provide a human-friendly report as the assertion message. That way, <em>if</em> any violations were found, I'll see the report in the pytest output.</li>
</ul>
<p>For the full code, see <a target="_blank" href="https://github.com/pamelafox/pamelafox-site/tree/main/src/tests">the tests/ folder in the GitHub repository</a>.</p>
<h2>The output</h2>
<p>When there are no violations found, the test passes! 🎉</p>
<p>When there are any violations found, the pytest output looks like this:</p>
<pre><code class="language-python"> def test_a11y(app, live_server, page: Page):
axe = Axe()
page.goto(url_for("home_page", _external=True))
results = axe.run(page)
> assert results.violations_count == 0, results.generate_report()
E AssertionError: Found 1 accessibility violations:
E Rule Violated:
E image-alt - Ensures <img> elements have alternate text or a role of none or presentation
E URL: https://dequeuniversity.com/rules/axe/4.4/image-alt?application=axeAPI
E Impact Level: critical
E Tags: ['cat.text-alternatives', 'wcag2a', 'wcag111', 'section508', 'section508.22.a', 'ACT']
E Elements Affected:
E
E
E 1) Target: img
E Snippet: <img src="bla.jpg">
E Messages:
E * Element does not have an alt attribute
E * aria-label attribute does not exist or is empty
E * aria-labelledby attribute does not exist, references elements that do not exist or references elements that are empty
E * Element has no title attribute
E * Element's default semantics were not overridden with role="none" or role="presentation"
E
E assert 1 == 0
</code></pre>
<p>I can then read the report, look for the HTML matching the snippet, and make it accessible. In the case above, there's an img tag missing an alt attribute. Once I fix that, the test passes.</p>
<h2>Checking more routes</h2>
<p>To check additional routes, I can either add more tests or I can parameterize the current test like so:</p>
<pre><code class="language-python">@pytest.mark.parametrize("route", ["home_page", "projects", "talks", "interviews"])
def test_a11y(app, live_server, page: Page, route: str):
axe = Axe()
page.goto(url_for(route, _external=True))
results = axe.run(page)
assert results.violations_count == 0, results.generate_report()
</code></pre>
<p>For testing a route where user interaction causes a change in the page, I can use Playwright to interact with the page and then run Axe after the interaction. Here's an example of that from another app:</p>
<pre><code class="language-python">def test_quiz_submit(page: Page, snapshot, fake_quiz):
page.goto(url_for("quizzes.quiz", quiz_id=fake_quiz.id, _external=True))
page.get_by_label("Your name:").click()
page.get_by_label("Your name:").fill("Pamela")
page.get_by_label("Ada Lovelace").check()
page.get_by_label("pip").check()
page.get_by_role("button", name="Submit your score!").click()
expect(page.locator("#score")).to_contain_text("You scored 25% on the quiz.")
results = Axe().run(page)
assert results.violations_count == 0, results.generate_report()
</code></pre>
<h2>Is perfection possible?</h2>
<p>Fortunately, I was able to fix all of the accessibility violations for my very small personal website. However, many webpages are much bigger and more complicated, and it may not be possible to address all the violations. Is it possible to run tests like this in that situation? Yes, but we need to do something like <strong>snapshot testing</strong>: tracking the violations over time and ensuring that changes don't introduce additional violations. I'll show an approach for that in Part 2 of this blog post series. Stay tuned!</p>
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-84471434976648421302023-06-27T16:50:00.002-07:002023-10-31T12:39:35.255-07:00Tips for debugging Flask deployments to Azure App Service <style type="text/css">
.codeblock {
border-top: 4px solid #eee;
border-bottom: 4px solid #eee;
padding: 8px;
white-space: pre-wrap;
}
</style>
<p>There are many ways to deploy Flask web apps to App Service: Azure CLI, VS Code Azure Tools extension, Azure Developer CLI, or GitHub-based deployments. Unfortunately, sometimes a deploy fails, and it can be hard at first to understand what's wrong. Regardless of how you deploy Flask to App Service, you can follow these tips for debugging the deployment.</p>
<p>After you finish deploying, first visit the app URL to see if it loads. If it does, amazing! If it doesn't, here are steps you can take to figure out what went wrong.</p>
<h2>Check the deployment logs</h2>
<p>Select <em>Deployment Center</em> from the side navigation menu, then select <em>Logs</em>. You should see a timestamped list of recent deploys:</p>
<img alt="" border="0" data-original-height="1212" data-original-width="2984" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhpFf8wHI9z_0Hv42fbLr6kUNd2V2IPHdgW_B_05dE1ADoXvlJytmjNXx4AosJcl1WhrJMqVLe32j1t6qsyvSpA0oUVHuJRCkp8AhIwzbDv5s9oRnFtfEZv2BpNDFL6c9U5Kx97KS3KzdoWEOO-9E9jJ3fTy_cKhC-juBiFO0te4wACMTSkNSp8mSPe8w/s1600/Screenshot%202023-06-27%20at%2010.33.11%20AM.png" width="650"/>
<p>Check whether the status of the most recent deploy is "Success (Active)" or "Failed". If it's success, the deployment logs might still reveal issues, and if it's failed, the logs should certainly reveal the issue.</p>
<p>Click the commit ID to open the logs for the most recent deploy. First scroll down to see if any errors or warnings are reported at the end. This is what you'll hopefully see if all went well:</p>
<img alt="" border="0" data-original-height="436" data-original-width="734" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgf7DRZr9iIqimdohxpsAdx95zIYkxUTjIszj5K8gW_sxqjD8TfGJ06MPOtN5DtivOgmfN4qY2Y_PJDO0VUlpEJILwcANQxdkiVOy5cxZbIwDbyHYpnjNdE-xXFlU5cP7nHxnsCeZgpbm_f2shBDMO_q5aIEO3duE9R6C7YFrfAIkrEMbowfJy4a2I/s1600/Screenshot%202023-01-04%20at%209.54.07%20AM.png" width="450"/>
<p>Now scroll back up to find the timestamp with the label "Running oryx build". <a target="_blank" href="https://github.com/microsoft/Oryx">Oryx</a> is the open source tool that builds apps for App Service, Functions, and other platforms, across all the supported MS languages. Click the <em>Show logs</em> link next to that label. That will pop open detailed logs at the bottom. Scroll down.</p>
<p>Here's what a successful Oryx build looks like for a Flask application:</p>
<pre style="width:650px" class="codeblock"><code>
Command: oryx build /tmp/zipdeploy/extracted -o /home/site/wwwroot --platform python --platform-version 3.10 -p virtualenv_name=antenv --log-file /tmp/build-debug.log -i /tmp/8db773a0e30ccc6 --compress-destination-dir | tee /tmp/oryx-build.log
Operation performed by Microsoft Oryx, https://github.com/Microsoft/Oryx
You can report issues at https://github.com/Microsoft/Oryx/issues
Oryx Version: 0.2.20230508.1, Commit: 7fe2bf39b357dd68572b438a85ca50b5ecfb4592, ReleaseTagName: 20230508.1
Build Operation ID: 164fee7dc4083f79
Repository Commit : 6e78c534-da03-414e-acc1-e396b92b1405
OS Type : bullseye
Image Type : githubactions
Detecting platforms...
Detected following platforms:
python: 3.10.8
Version '3.10.8' of platform 'python' is not installed. Generating script to install it...
Using intermediate directory '/tmp/8db773a0e30ccc6'.
Copying files to the intermediate directory...
Done in 0 sec(s).
Source directory : /tmp/8db773a0e30ccc6
Destination directory: /home/site/wwwroot
Downloading and extracting 'python' version '3.10.8' to '/tmp/oryx/platforms/python/3.10.8'...
Detected image debian flavor: bullseye.
Downloaded in 2 sec(s).
Verifying checksum...
Extracting contents...
performing sha512 checksum for: python...
Done in 18 sec(s).
image detector file exists, platform is python..
OS detector file exists, OS is bullseye..
Python Version: /tmp/oryx/platforms/python/3.10.8/bin/python3.10
Creating directory for command manifest file if it does not exist
Removing existing manifest file
Python Virtual Environment: antenv
Creating virtual environment...
Activating virtual environment...
Running pip install...
[18:13:30+0000] Collecting Flask==2.3.2
[18:13:30+0000] Downloading Flask-2.3.2-py3-none-any.whl (96 kB)
[18:13:30+0000] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.9/96.9 kB 4.8 MB/s eta 0:00:00
[18:13:31+0000] Collecting itsdangerous>=2.1.2
[18:13:31+0000] Downloading itsdangerous-2.1.2-py3-none-any.whl (15 kB)
[18:13:31+0000] Collecting click>=8.1.3
[18:13:31+0000] Downloading click-8.1.3-py3-none-any.whl (96 kB)
[18:13:31+0000] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 kB 5.4 MB/s eta 0:00:00
[18:13:31+0000] Collecting Werkzeug>=2.3.3
[18:13:31+0000] Downloading Werkzeug-2.3.6-py3-none-any.whl (242 kB)
[18:13:31+0000] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 242.5/242.5 kB 8.7 MB/s eta 0:00:00
[18:13:31+0000] Collecting blinker>=1.6.2
[18:13:31+0000] Downloading blinker-1.6.2-py3-none-any.whl (13 kB)
[18:13:31+0000] Collecting Jinja2>=3.1.2
[18:13:31+0000] Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
[18:13:31+0000] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 6.9 MB/s eta 0:00:00
[18:13:32+0000] Collecting MarkupSafe>=2.0
[18:13:32+0000] Downloading MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
[18:13:33+0000] Installing collected packages: MarkupSafe, itsdangerous, click, blinker, Werkzeug, Jinja2, Flask
[18:13:35+0000] Successfully installed Flask-2.3.2 Jinja2-3.1.2 MarkupSafe-2.1.3 Werkzeug-2.3.6 blinker-1.6.2 click-8.1.3 itsdangerous-2.1.2
[notice] A new release of pip available: 22.2.2 -> 23.1.2
[notice] To update, run: pip install --upgrade pip
Not a vso image, so not writing build commands
Preparing output...
Copying files to destination directory '/tmp/_preCompressedDestinationDir'...
Done in 3 sec(s).
Compressing content of directory '/tmp/_preCompressedDestinationDir'...
Copied the compressed output to '/home/site/wwwroot'
Removing existing manifest file
Creating a manifest file...
Manifest file created.
Copying .ostype to manifest output directory.
Done in 70 sec(s).
</code></pre>
<p>Look for these important steps in the Oryx build:</p>
<ul>
<li><code>Detected following platforms: python: 3.10.8</code> <br>
That should match your runtime in the App Service configuration.
<li><code>Running pip install...</code><br>
That should install all the requirements in your requirements.txt - if it didn't find your requirements.txt, then you won't see the packages installed.
</ul>
<p>If you see all those steps in the Oryx build, then that's a good sign that the build went well, and you can move on to checking the App Service logs.</p>
<h2 id="log-stream">Check the log stream</h2>
<p>Under the <em>Monitoring</em> section of the side nav, select <em>Log stream</em>. Scroll to the timestamp corresponding to your most recent deploy. </p>
<p>The logs should start with pulling Docker images:</p>
<img alt="Screenshot of Log stream in App Service" border="0" data-original-height="1044" data-original-width="1922" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiq7RSEooOf3iry44cvhuakJumhlFIyz8MXqy0aPH9Akc7Elqq7GtjDnkO0lNIILt01Awkr_tb_ApNuVl1ib1EwHWdTHgroCJ4VTrOn3UIRqvt9JymADpNkJiSvKLpyxcZ7zAg4JX1zSEpavOiOtKxEIUbiBDoaPrHOphYDrzzJxj90QBaT1rrxKc_7Fg/s1600/Screenshot%202023-06-27%20at%2012.53.55%20PM.png" width="650"/>
<p>Here are the full logs for a Flask app successfully starting in an App Service container:</p>
<pre style="width:650px" class="codeblock"><code>
2023-06-27T20:00:33.556Z INFO - 3.10_20230519.2.tuxprod Pulling from appsvc/python
2023-06-27T20:00:33.559Z INFO - Digest: sha256:d7f1824d43ab89f90ec317f32a801ecffd4321a3d4a710593658be9bd980cd22
2023-06-27T20:00:33.560Z INFO - Status: Image is up to date for mcr.microsoft.com/appsvc/python:3.10_20230519.2.tuxprod
2023-06-27T20:00:33.563Z INFO - Pull Image successful, Time taken: 0 Minutes and 0 Seconds
2023-06-27T20:00:34.710Z INFO - Starting container for site
2023-06-27T20:00:34.711Z INFO - docker run -d --expose=8000 --name flask-server-core-7icehkhjdeox2-appservice_5_edde42ea -e WEBSITE_CORS_ALLOWED_ORIGINS=https://portal.azure.com,https://ms.portal.azure.com -e WEBSITE_CORS_SUPPORT_CREDENTIALS=False -e WEBSITE_SITE_NAME=flask-server-core-7icehkhjdeox2-appservice -e WEBSITE_AUTH_ENABLED=False -e WEBSITE_ROLE_INSTANCE_ID=0 -e WEBSITE_HOSTNAME=flask-server-core-7icehkhjdeox2-appservice.azurewebsites.net -e WEBSITE_INSTANCE_ID=a822bcb6dd314caab4bd83084cc7a3991e4965ec4f97b7ce99c0ca46861dc419 -e HTTP_LOGGING_ENABLED=1 -e WEBSITE_USE_DIAGNOSTIC_SERVER=False appsvc/python:3.10_20230519.2.tuxprod
2023-06-27T20:00:37.357175818Z _____
2023-06-27T20:00:37.357230418Z / _ \ __________ _________ ____
2023-06-27T20:00:37.357235518Z / /_\ \\___ / | \_ __ \_/ __ \
2023-06-27T20:00:37.357239618Z / | \/ /| | /| | \/\ ___/
2023-06-27T20:00:37.357243418Z \____|__ /_____ \____/ |__| \___ >
2023-06-27T20:00:37.357247318Z \/ \/ \/
2023-06-27T20:00:37.357251218Z A P P S E R V I C E O N L I N U X
2023-06-27T20:00:37.357254918Z
2023-06-27T20:00:37.357258418Z Documentation: http://aka.ms/webapp-linux
2023-06-27T20:00:37.357261918Z Python 3.10.11
2023-06-27T20:00:37.357282418Z Note: Any data outside '/home' is not persisted
2023-06-27T20:00:41.641875105Z Starting OpenBSD Secure Shell server: sshd.
2023-06-27T20:00:41.799900179Z App Command Line not configured, will attempt auto-detect
2023-06-27T20:00:42.761658829Z Starting periodic command scheduler: cron.
2023-06-27T20:00:42.761688529Z Launching oryx with: create-script -appPath /home/site/wwwroot -output /opt/startup/startup.sh -virtualEnvName antenv -defaultApp /opt/defaultsite
2023-06-27T20:00:42.876778283Z Found build manifest file at '/home/site/wwwroot/oryx-manifest.toml'. Deserializing it...
2023-06-27T20:00:42.887163588Z Build Operation ID: 820645c3a1e60b5e
2023-06-27T20:00:42.890123289Z Oryx Version: 0.2.20230512.3, Commit: a81ce1fa16b6e03d37f79d3ba5e99cf09b28e4ef, ReleaseTagName: 20230512.3
2023-06-27T20:00:42.897199993Z Output is compressed. Extracting it...
2023-06-27T20:00:42.964545124Z Extracting '/home/site/wwwroot/output.tar.gz' to directory '/tmp/8db774903eba755'...
2023-06-27T20:00:46.203967540Z App path is set to '/tmp/8db774903eba755'
2023-06-27T20:00:46.728397586Z Detected an app based on Flask
2023-06-27T20:00:46.730162987Z Generating `gunicorn` command for 'app:app'
2023-06-27T20:00:46.770331805Z Writing output script to '/opt/startup/startup.sh'
2023-06-27T20:00:47.050828437Z Using packages from virtual environment antenv located at /tmp/8db774903eba755/antenv.
2023-06-27T20:00:47.052387737Z Updated PYTHONPATH to '/opt/startup/app_logs:/tmp/8db774903eba755/antenv/lib/python3.10/site-packages'
2023-06-27T20:00:50.406265801Z [2023-06-27 20:00:50 +0000] [67] [INFO] Starting gunicorn 20.1.0
2023-06-27T20:00:50.434991028Z [2023-06-27 20:00:50 +0000] [67] [INFO] Listening at: http://0.0.0.0:8000 (67)
2023-06-27T20:00:50.441222333Z [2023-06-27 20:00:50 +0000] [67] [INFO] Using worker: sync
2023-06-27T20:00:50.473174263Z [2023-06-27 20:00:50 +0000] [70] [INFO] Booting worker with pid: 70
2023-06-27T20:00:53.772011632Z 169.254.130.1 - - [27/Jun/2023:20:00:53 +0000] "GET /robots933456.txt HTTP/1.1" 404 91 "-" "HealthCheck/1.0"
2023-06-27T20:00:55.268900825Z 169.254.130.5 - - [27/Jun/2023:20:00:55 +0000] "GET /robots933456.txt HTTP/1.1" 404 91 "-" "HealthCheck/1.0"
2023-06-27T20:01:47.691011982Z 169.254.130.5 - - [27/Jun/2023:20:01:47 +0000] "GET /hello HTTP/1.1" 200 183 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.57"
</code></pre>
<p>A few notable logs:</p>
<ul>
<li><code>2023-06-27T20:00:46.728397586Z Detected an app based on Flask</code>
<br>This log indicates that Oryx auto-detected a Flask app (by inspecting the requirements.txt file).
<li><code>2023-06-27T20:00:46.730162987Z Generating `gunicorn` command for 'app:app'</code><br>
This indicates that Oryx detected an <code>app.py</code> file and assumes it has an <code>app</code> object inside it (a minimal example of that layout is sketched after this list).
<li><code>2023-06-27T20:00:50.406265801Z [2023-06-27 20:00:50 +0000] [67] [INFO] Starting gunicorn 20.1.0</code><br>
That's the start of the gunicorn server serving the Flask app. After it starts, the logs should show HTTP requests.
</ul>
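<p>For reference, here's a minimal sketch of the layout that auto-detection expects: an <code>app.py</code> at the root of the deployed app containing a Flask instance named <code>app</code>, so the generated <code>gunicorn app:app</code> command can find it (the route below is just an example):</p>
<pre style="width:650px" class="codeblock"><code># app.py -- Oryx auto-detect looks for a Flask instance named "app"
from flask import Flask

app = Flask(__name__)

@app.route("/hello")
def hello():
    return "Hello from App Service!"
</code></pre>
<p>With that layout, you don't need to configure a startup command yourself, which is why the logs above show "App Command Line not configured, will attempt auto-detect".</p>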
<p>If you aren't seeing the full logs, it's possible that your deploy happened too long ago and the portal has deleted some logs. In that case, open the Log stream and do another deploy, and you should see the full logs.</p>
<h2 id="download-logs">Downloading the logs</h2>
<p>Alternatively, you can download the full logs from the Kudu interface. Select <em>Advanced Tools</em> from the side nav:</p>
<img alt="Screenshot of Azure Portal side nav with Advanced Tools selected" border="0" data-original-height="898" data-original-width="656" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhqeq65tDR_rBhIxDv2Zle9_ouGnFawwZD920kvgcVy5alTLmnG8-_TXU9UWXR43gLj8Z3XNzmN4gLFRSd0f0OzLXKfsh24y8esrEec_2yKDJm10n9aP2vzaSLlUNk8b0vZ8H2oXqqdWNs6mpLLIm2d3UfdPnoOSQEhIJXFS_-sEHslKq_ieyHyLtNZYQ/s1600/Screenshot%202023-06-27%20at%201.02.44%20PM.png" width="200"/>
<p>When the Kudu website loads, find the <em>Current Docker Logs</em> link and select <em>Download as zip</em> next to it:</p>
<img alt="Screenshot of website list of links" border="0" data-original-height="392" data-original-width="1228" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQhxI370oTMoAIIu6IGluSj9pNtwW-g6XnB5o1wKM4njKDctJiuEmNlr8_XHq6EVFg5-sMvitFZrg9l29IE866tYDrjiCIZp7L2URpTn94iPcCxzhDfRuoU8XQg0Sup5ssB0BGEYfqQenBQPD6er0D8L5jUhBmDQ1oS8a74fZfdgSDdCWYz2NViX0/s1600/Screenshot%202023-01-07%20at%2010.14.09%20AM.png" width="350"/>
<p>In the downloaded zip file, find the filename that starts with the most recent date and ends with "_default_docker.log":</p>
<img alt="Screenshot of extracted zip folder with files" border="0" data-original-height="294" data-original-width="872" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSqbwWWB-bLriZgZB5H7wvMdjUpxJVU5wgY2O6uHXN7wYQPCRzipjLoVXOx9_43ZOp8Y5gd3zslUVw8Z_0OKS4FVjHFg01kDDlKSqRSGBjTwa2MGHhsqDrBq1mt08Z7649cXPvftM0gefIhH6eh9SvSlQVnz4msAgeL-ccki3x8HFIsxqcuhWCm5I/s1600/Screenshot%202023-01-07%20at%2010.14.28%20AM.png" width="450"/>
<p>Open that file to see the full logs, with the most recent logs at the bottom.</p>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-64890762231310464892023-06-21T14:28:00.002-07:002023-06-21T14:29:11.902-07:00 Best practices for prompting GitHub Copilot in VS Code <link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>I've been using GitHub Copilot for the last six months in repositories large and small, old and new. For me, GitHub Copilot is the pair programmer that I never knew I wanted. It gives me great suggestions most of the time, but it doesn't give me the social anxiety I feel when a human is watching me code.
</p>
<p>In this post, I'm going to share my best practices for prompting GitHub Copilot, to help all of you get great suggestions as often as possible. At a high level, my recommendation is to provide context and be predictable.
</p>
<p>For more details, watch <a target="_blank" href="https://www.youtube.com/watch?v=ImWfIDTxn7E">my YouTube video</a> or keep reading.</p>
<h2>Provide context</h2>
<p>GitHub Copilot was trained on a huge number of examples, so for a given line of code, it may have many possible predictions for the next line of code. We can try to narrow down those possibilities with the context around our line of code.</p>
<ol>
<li>
<p><strong>Open files:</strong> LLMs like GitHub Copilot have limited "context windows", so Copilot can't keep an entire codebase in its window at once. But GitHub still wants to give Copilot some context, so if your editor has files open, GitHub may send the contents of those files to Copilot. I recommend keeping open the files that are most relevant to the code you're writing: the tested file if you're writing tests, an example data file if you're processing data, the helper functions for a program, etc.</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgacrJtS0412RT4koAcGr7d22iPpifleFFBRsgZTq_SwMvGUTGV-SlhneOMUVbM6uF3bJzWZm_PX8rB8wuGhGj4hENxOOOG0m-uSvVYmLZRs7UdTa-8D8aJdyiDKMzaXiUw4u3sDGk5CWwI845w5M13JiH8z0Kphv7bVwOGUEb7JuqKBGLPlIiqy-Zihw/s1600/tabs.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of three tabs open in VS Code" border="0" data-original-height="66" data-original-width="660" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgacrJtS0412RT4koAcGr7d22iPpifleFFBRsgZTq_SwMvGUTGV-SlhneOMUVbM6uF3bJzWZm_PX8rB8wuGhGj4hENxOOOG0m-uSvVYmLZRs7UdTa-8D8aJdyiDKMzaXiUw4u3sDGk5CWwI845w5M13JiH8z0Kphv7bVwOGUEb7JuqKBGLPlIiqy-Zihw/s1600/tabs.png" width="550"/></a>
<p>As I showed in the <a target="_blank" href="https://www.youtube.com/watch?t=602&v=ImWfIDTxn7E&feature=youtu.be">video at 10:02</a>, sometimes there are a few files relevant to the context: one that demonstrates the format of the new file (e.g. the new views.py has a similar format to the existing views.py), and one that provides the data for the new file (e.g. the new views.py uses the classes from models.py).</p>
<li>
<p><strong>Comments</strong>: You can write comments at many levels: high-level file comments that describe the purpose of the file and how it relates to the rest of the project, function-level comments, class-level comments, and line comments. The most helpful comments are ones that clear up potentially ambiguous aspects of code. For example, if you're writing a function that processes lists in Python, your function-level comment could clarify whether it returns a brand new list or mutates the existing list. That sort of distinction fundamentally changes the implementation.</p>
<pre><code class="language-python">def capitalize_titles(book_titles):
"""Returns a new list of book_titles with each title capitalized"""
</code></pre>
<li><p>
<strong>Imports</strong>: Many languages require you to explicitly import standard library or third-party modules at the top of a file. That information can be really helpful for Copilot to decide how it's going to write the desired code. For example, if you're using Python to scrape a webpage, then importing <code>urllib3</code> and <code>BeautifulSoup</code> at the top will immediately guide Copilot to write the most appropriate code.</p>
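<p>For example, a hypothetical scraping script might start with nothing but these imports, which alone steer Copilot toward fetch-and-parse code:</p>
<pre><code class="language-python"># These imports hint to Copilot that this file fetches and parses web pages
import urllib3
from bs4 import BeautifulSoup
</code></pre>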
<li><p>
<strong>Names</strong>: I have <em>always</em> been a fan of descriptive names, as I was brought up in a Java household, but now I have additional motivation to use fairly descriptive names: more context for Copilot. For example, if your code parses a JSON file and stores the result, you might use a variable name like <code>data</code>. However, if you know that result is actually a list, and each item describes a book, a more helpful variable name would be <code>books</code>. That indicates the variable likely holds a sequence-type object, and that each object inside represents a book.</p>
<pre><code class="language-python">books = json.loads(open('books.json'))</code></pre>
<li><p>
<strong>Types</strong>: If you are coding in a language where typing is optional (like Python or JavaScript), you may want to consider adding types, at least to parameters and return values. That can help narrow down the possible lines of code for completing a block of code.</p>
<pre><code class="language-python">def capitalize_titles(book_titles:list[str]) -> list[str]:</code></pre>
</ol>
<p>As I showed in <a target="_blank" href="https://www.youtube.com/watch?t=106&v=ImWfIDTxn7E&feature=youtu.be">the video at 1:46</a>, I sometimes change the names or types of Copilot generated code, to give it more context for the code coming afterwards.</p>
<p>Many of the above practices are generally beneficial for your codebase regardless of whether you use GitHub Copilot, like descriptive names and type annotations. As for comments, I keep the block-level comments in my code but remove line-level comments that seem redundant. Your final code has two audiences, the computer interpreting it and the humans reading it, so keep the practices that produce code that is both correct and clear. </p>
<h2>Be predictable</h2>
<p>LLMs like GitHub Copilot are very good with patterns. Show them a pattern, and they very much want to keep the pattern going. Programming is already inherently pattern-filled, due to having to follow the syntax of a language, but there are ways to make your programs even more "pattern-ful".</p>
<ol>
<li><p><strong>Variable naming conventions</strong>: Use a naming scheme that makes it easier to predict the way a variable should be used, especially if that naming scheme is already a convention for the language. For example, the Python language doesn't technically have constants, but it's still conventional to use all caps for a variable that shouldn't change, like PLANCK_CONSTANT. Another convention is to append an "s" suffix for variables that hold array-like objects. You may also have conventions for your particular codebase, like always starting getter functions with "_get_". Copilot will learn those conventions as well, if you are consistent with them.</p>
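<p>Here's a quick sketch of those conventions (the names are made up, but they're the kind of patterns Copilot picks up on):</p>
<pre><code class="language-python"># All-caps signals a constant that shouldn't change
PLANCK_CONSTANT = 6.62607015e-34

# An "s" suffix signals an array-like object
book_titles = ["in cold blood", "the great gatsby"]

# A codebase-specific convention: getter functions start with "_get_"
def _get_config(key):
    ...
</code></pre>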
<li><p><strong>Structured software architecture</strong>: Just like humans, Copilot doesn't do that great with spaghetti code. Which function goes where? What comes next? It really helps, for you and Copilot, to organize your code in a predictable manner, especially if you can use a popular framework for doing so. For example, the Django framework for Python always modularizes code into "models.py", "views.py", "urls.py", and a "templates" folder. When Copilot is inside one of those files, it has a better idea of what belongs there.</p>
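<p>As a sketch, a views.py in a Django project might look like this (the <code>Book</code> model and template path are hypothetical), and Copilot can reasonably guess that more view functions belong after it:</p>
<pre><code class="language-python"># views.py -- by Django convention, request-handling functions live here
from django.shortcuts import render

from .models import Book  # hypothetical model defined in models.py


def book_list(request):
    """Render the list of all books."""
    books = Book.objects.all()
    return render(request, "books/book_list.html", {"books": books})
</code></pre>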
</ol>
<h2>Use linters</h2>
<p>My final tip isn't so much about how to use Copilot itself, but more about how to use VS Code along with Copilot. If you install linting tools and enable them to run in real time, VS Code basically becomes another pair programmer for you. That way, if Copilot does give a bad suggestion (like an old method name or an incorrect chain of function calls), you'll immediately see a squiggle.
</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWqpK0C_dOE0S75j0gv4uftvBdQExb1PleLkTZxnNKAtPRbUN-JY_JYU9Yu1gIwwazBfZsgOsKXMjEHvTwB-TCEvVTkztGTk0SM6Dii89JfwJ3UgsjToeivVcdRpAF56e_z21_l31vjmmpKurHKdRqJHOnzIH6REtz5qiXM5VKowT2GoDN-2Fi1UI46w/s1600/squiggles.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of Python code with squiggly line under one line" border="0" data-original-height="123" data-original-width="971" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgWqpK0C_dOE0S75j0gv4uftvBdQExb1PleLkTZxnNKAtPRbUN-JY_JYU9Yu1gIwwazBfZsgOsKXMjEHvTwB-TCEvVTkztGTk0SM6Dii89JfwJ3UgsjToeivVcdRpAF56e_z21_l31vjmmpKurHKdRqJHOnzIH6REtz5qiXM5VKowT2GoDN-2Fi1UI46w/s1600/squiggles.png" width="600"/></a>
<p>Then you can hover over the squiggle to see the error report and check IntelliSense for the functions to see their expected parameters and return values. If you're still not sure how to fix it, search the relevant documentation for more guidance. If you have Copilot chat enabled, you can even try asking Copilot to fix it.</p>
<p>A linter won't be able to catch all issues, of course, so you should still be running the code and writing tests to feel confident in the code. But linters can catch quite a few issues and prevent your code from going down a wrong path, so I recommend real-time linters in any Copilot setup.</p>
<p>For linting in Python, I typically use <a target="_blank" href="https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff">the Ruff extension</a> along with these settings: </p>
<pre><code class="language-javascript">"python.linting.enabled": true,
"[python]": {
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll": true
}
}
</code></pre>
<h2>Learn more</h2>
<p>Here are some additional guides to help you be productive with GitHub Copilot:</p>
<ul>
<li><a target="_blank" href="https://dev.to/github/a-beginners-guide-to-prompt-engineering-with-github-copilot-3ibp">A Beginner's Guide to Prompt Engineering with GitHub Copilot</a></li>
<li><a target="_blank" href="https://github.blog/2023-05-17-how-github-copilot-is-getting-better-at-understanding-your-code/">How GitHub Copilot is getting better at understanding your code</a>
<li><a target="_blank" href="https://code.visualstudio.com/docs/editor/artificial-intelligence">AI Tools in VS Code</a>
</ul>
<script>hljs.highlightAll();</script>Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-60792728194552071212023-06-02T14:27:00.002-07:002023-06-02T14:27:36.071-07:00Providing feedback on the VS Code Python experience<p>My current role as a Python cloud advocate at Microsoft is to make sure Python developers have a great experience using Microsoft products, including my current favorite IDE, VS Code. As much as I love VS Code, it still has bugs (all software does!), so I often find myself filing issues. It can be a little tricky to figure out where to file issues, as there are multiple GitHub repos involved, so I'm writing this post to help others figure out what to post where.</p>
<dl>
<dt><a href="http://github.com/microsoft/vscode">github.com/microsoft/vscode</a>
<dd>The primary VS Code repo. This is typically <em>not</em> the place for Python-specific issues, but if your feedback doesn't fit anywhere else, this might be where it goes, and the triagers can direct you elsewhere if needed.
<dt><a href="https://github.com/orgs/community/discussions/categories/codespaces" target="_blank">github.com/orgs/community/discussions/categories/codespaces</a>
<dd>Discussion forum for issues that seem specific to GitHub Codespaces (e.g. forwarded ports not opening).
<dt><a href="https://github.com/microsoft/vscode-remote-release" target="_blank">github.com/microsoft/vscode-remote-release</a>
<dd>The repo for the VS Code Dev Containers extension, appropriate for issues that only happen when opening a project in a Dev Container locally.
<dt><a href="https://github.com/devcontainers/images" target="_blank">github.com/devcontainers/images</a>
<dd>The repo that generates the mcr.microsoft.com/devcontainers Docker images. If you're using a Dev Container with a Python image from that registry and think the issue stems from the image, report it here. Consult the <a target="_blank" href="https://github.com/devcontainers/images/tree/main/src/python">README</a> first, however.
<dt><a href="http://github.com/microsoft/vscode-python" target="_blank">github.com/microsoft/vscode-python</a>
<dd>The repo for the Python extension that's responsible for much of the Python-specific experience. The extension does build upon various open-source projects, however. See below.
<dt><a href="https://github.com/microsoft/pylance-release" target="_blank">github.com/microsoft/pylance-release</a>
<dd>The repo for the library that provides the linting/error reports in VS Code. When you get squiggles in your Python code in VS Code and hover over them, you should see in the popup whether the error comes from Pylance or a different extension. If you haven't installed any other extensions besides the Python one, then it's almost certainly from Pylance.
<dt><a href="https://github.com/microsoft/debugpy" target="_blank">github.com/microsoft/debugpy</a>
<dd>The repo for the library that provides part of the Python debugger experience in VS Code. For example, if it's not showing local variables correctly in the debug sidebar, that's an issue for debugpy.
<dt><a href="https://github.com/microsoft/vscode-jupyter" target="_blank">github.com/microsoft/vscode-jupyter</a>
<dd>The repo for the Jupyter extension that makes ipynb notebooks work well in VS Code. This extension used to be downloaded along with the Python extension but must now be installed separately. If it's a notebook-specific issue, it probably goes here.
</dl>
<p>Whenever you're filing an issue, be sure to provide as much information as possible. Thanks for helping us make the Python experience better!</p>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-67365396933668158192023-05-25T14:07:00.006-07:002023-05-25T14:13:54.355-07:00Streaming ChatGPT with server-sent events in Flask<link rel="stylesheet"
href="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/styles/default.min.css">
<script src="//cdnjs.cloudflare.com/ajax/libs/highlight.js/11.7.0/highlight.min.js"></script>
<p>The Azure SDK team recently asked if I could create a sample using Flask to stream ChatGPT completions to the browser over SSE. I said, "sure, but what's SSE?" As I've now discovered, <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events/Using_server-sent_events">server-sent events (SSE)</a> are a technology that has been supported in modern browsers for a few years, and they're a great way for a server to stream a long response to a client. They're similar to websockets, but the streaming only happens in one direction. SSE is a good fit for ChatGPT responses since they can come in chunk-by-chunk, and displaying that way in a web UI makes for a more chat-like experience.</p>
<p>You can check out the repo here: <br>
<a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart">https://github.com/Azure-Samples/chatgpt-quickstart</a>
</p>
<p>Let's break it down.</p>
<br>
<h2>Overall architecture</h2>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbyvZw1i4lnou6l1QNKV_hQ_X9hXlutpdRAmJdrZOG3yL0IatOjEpy9527vjMnqGWjqV8-hWlSjMQPYS-_a2OF57YXrHgx-Q1WS5Q0u4NrvJO1c_M8-h4UH4gFbS3EpGYoWvyjCxeWG7mDaQcT-kD9iyCIKjlI59LKNrkuZmnTLoFToEufINlbK3g/s1600/chatgpt_sse.drawio.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" data-original-height="318" data-original-width="1319" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhbyvZw1i4lnou6l1QNKV_hQ_X9hXlutpdRAmJdrZOG3yL0IatOjEpy9527vjMnqGWjqV8-hWlSjMQPYS-_a2OF57YXrHgx-Q1WS5Q0u4NrvJO1c_M8-h4UH4gFbS3EpGYoWvyjCxeWG7mDaQcT-kD9iyCIKjlI59LKNrkuZmnTLoFToEufINlbK3g/s1600/chatgpt_sse.drawio.png" width="600"/></a>
<p>When a user submits a message from the webpage, the browser uses <code>EventSource</code> to connect to the <code>/chat</code> endpoint, sending the message in the query parameters. The Flask server receives the request, then requests the ChatGPT SDK to respond with a stream. The SDK opens a network connection to the deployed ChatGPT model on Azure, and whenever it receives another chunk, it sends it back as JSON. The Flask app extracts the text from that chunk and streams it to the client. Whenever the browser receives a new server-sent event, it appends it to the current message.</p>
<br>
<h2>The client-side JavaScript code</h2>
<p>For the client-side code, I considered using <a target="_blank" href="https://htmx.org/">HTMX</a> with the <a target="_blank" href="https://htmx.org/attributes/hx-sse/">SSE extension</a> but I decided to use the built-in <code>EventSource</code> object for maximal flexibility.</p>
<p>Whenever the form is submitted, I create a new message DIV and set up a new <code>EventSource</code> instance.
That instance listens to three events: the standard "message" event, and two custom events of my own invention, "start" and "end". I added the start event so that I could know when to clear a loading indicator from the message area, and I added the end event so that I could close the stream.</p>
<pre><code class="language-javascript">eventSource = new EventSource(`/chat?message=${message}`);
eventSource.addEventListener("start", function(e) {
messageDiv.innerHTML = "";
});
eventSource.addEventListener("message", function(e) {
const message = JSON.parse(e.data);
messageDiv.innerHTML += message.text.replace("\n", "<br/>");
messageDiv.scrollIntoView();
});
eventSource.addEventListener('end', function(e) {
eventSource.close();
});
</code></pre>
<p>See the full code in <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart/blob/main/src/flaskapp/templates/index.html">index.html</a></p>
<h2>The Flask server Python code</h2>
<p>In the Flask server code, I did a few key things differently in order to send back server-sent events:
</p>
<ul>
<li>The response object is a Python generator, a type of function that can continually yield new values.
That is natively supported by Flask as <a target="_blank" href="https://flask.palletsprojects.com/en/2.3.x/patterns/streaming/">the way to stream data</a>.
<li>The call to the OpenAI SDK specifies <code>stream=True</code> and then uses an iterator on the SDK's response.
<li>The response content-type is "text/event-stream".
</ul>
<pre><code class="language-python">@bp.get("/chat")
def chat_handler():
request_message = request.args.get("message")
@stream_with_context
def response_stream():
response = openai.ChatCompletion.create(
engine=os.getenv("AZURE_OPENAI_CHATGPT_DEPLOYMENT", "chatgpt"),
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": request_message},
],
stream=True,
)
for event in response:
current_app.logger.info(event)
if event["choices"][0]["delta"].get("role") == "assistant":
yield "event:start\ndata: stream\n\n"
if event["choices"][0]["delta"].get("content") is not None:
response_message = event["choices"][0]["delta"]["content"]
json_data = json.dumps({"text": response_message})
yield f"event:message\ndata: {json_data}\n\n"
yield "event: end\ndata: stream\n\n"
return Response(response_stream(), mimetype="text/event-stream")
</code></pre>
<p>It's also worth pointing out that the generator is wrapped with the <a target="_blank" href="https://flask.palletsprojects.com/en/2.3.x/patterns/streaming/#streaming-with-context"><code>stream_with_context</code></a> decorator. I added that so that the code inside the generator could access <code>current_app</code> for logging purposes.</p>
<p>See full code in <a target="_blank" href="https://github.com/Azure-Samples/chatgpt-quickstart/blob/main/src/flaskapp/chat.py">chat.py</a></p>
<br>
<h2>Taking it further</h2>
<p>This is intentionally a very minimal example, since the goal is to just get developers up and running with a ChatGPT deployment. There are a lot of ways this could be improved:</p>
<ul>
<li><strong>POST vs. GET</strong>: I used a single HTTP GET request to both send the message and receive the response. An alternative approach is to use an HTTP POST to send the message, use a session to associate the message with the ChatGPT response, and open a GET request to a /response endpoint for that session.
<li><strong>Message history</strong>: This app only sends the most recent message, but ChatGPT can often give better answers if it remembers previous messages. You could use sessions on the server side or local storage on the browser side to remember the last few messages. That does have budget implications (more tokens == more $$) but could provide a better user experience. A rough sketch of the server-side approach is shown after this list.
<li><strong>Message formatting</strong>: I've seen some ChatGPT samples that apply Markdown formatting to the message. You could bring in a library to do the Markdown -> HTML transformation in the client. It'd be interesting to see how that works in combination with server-sent events.
</ul>
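<p>As an illustration of the message history idea, here's a minimal sketch (my own, not part of the sample repo) that keeps the last few messages in Flask's cookie-based session, assuming the same openai SDK version used above:</p>
<pre><code class="language-python">from flask import session

MAX_HISTORY = 4  # hypothetical cap on remembered messages, to limit token usage


def build_messages(request_message):
    """Combine the system prompt, recent history, and the new user message."""
    history = session.get("history", [])
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages.extend(history[-MAX_HISTORY:])
    messages.append({"role": "user", "content": request_message})
    # Remember this message for the next request (requires app.secret_key to be set)
    session["history"] = (history + [{"role": "user", "content": request_message}])[-MAX_HISTORY:]
    return messages
</code></pre>
<p>The list returned by that helper would replace the hard-coded <code>messages</code> list in the <code>openai.ChatCompletion.create</code> call above.</p>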
<script>hljs.highlightAll();</script>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0tag:blogger.com,1999:blog-8501278254137514883.post-89073206476499283462023-05-22T12:51:00.003-07:002023-11-29T16:10:16.433-08:00A Dev Container for SQLAlchemy with SQLTools<p>SQLAlchemy 2.0 was recently released, with a few <a target="_blank" href="https://blog.miguelgrinberg.com/post/what-s-new-in-sqlalchemy-2-0">significant interface differences</a>. I'm working on a video that walks through a SQLAlchemy 2.0 example, and in the process of making that video, I created a <a target="_blank" href="https://code.visualstudio.com/docs/devcontainers/create-dev-container">Dev Container</a> optimized for SQLAlchemy + SQLite + SQLTools.</p>
<p>You can get the Dev Container here (or try it in Codespaces first!):<br>
<a target="_blank" href="https://github.com/pamelafox/sqlalchemy-sqlite-playground">
github.com/pamelafox/sqlalchemy-sqlite-playground
</a>
</p>
<br>
<h2>Dev Container contents</h2>
<p>The <a target="_blank" href="https://github.com/pamelafox/sqlalchemy-sqlite-playground/blob/main/.devcontainer/devcontainer.json">devcontainer.json</a> includes:</p>
<ul>
<li>A base image of <a target="_blank" href="https://mcr.microsoft.com/en-us/product/devcontainers/python/about"><code>mcr.microsoft.com/devcontainers/python:3.11</code></a>
<li>Python linting extensions:
<pre><code>"ms-python.python",
"ms-python.vscode-pylance",
"charliermarsh.ruff"
</code></pre>
<li>SQLTools extension and SQLite driver:
<pre><code>"mtxr.sqltools",
"mtxr.sqltools-driver-sqlite"</code></pre>
<li>The <code>node</code> feature (necessary for the SQLTools SQLite driver)
<li>A related setting necessary for SQLite driver:
<pre><code>"sqltools.useNodeRuntime": true,</code></pre>
<li>Preset connection for a SQLite DB stored in <code>my_database.db</code> file:
<pre><code>"sqltools.connections": [
{
"previewLimit": 50,
"driver": "SQLite",
"name": "database",
"database": "my_database.db"
}],</code></pre>
<li>A post-create command that installs SQLAlchemy and Faker:
<pre><code>python3 -m pip install -r requirements.txt</code></pre>
</ul>
<p>Besides the <code>.devcontainer</code> folder and <code>requirements.txt</code>, the repository also contains <code>example.py</code>, a file with SQLAlchemy 2.0 example code.</p>
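<p>If you want a feel for the 2.0-style interface before opening the repo, here's a minimal sketch of my own (the <code>Customer</code> model is made up for illustration; the repo's <code>example.py</code> may differ) that writes to the same <code>my_database.db</code> file:</p>
<pre><code>from sqlalchemy import String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column


class Base(DeclarativeBase):
    pass


class Customer(Base):
    __tablename__ = "customers"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50))


engine = create_engine("sqlite:///my_database.db", echo=True)
Base.metadata.create_all(engine)

# 2.0 style: Session as a context manager, select() instead of Query
with Session(engine) as session:
    session.add(Customer(name="Ada Lovelace"))
    session.commit()
    for customer in session.scalars(select(Customer)):
        print(customer.id, customer.name)
</code></pre>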
<br>
<h2>Using the Dev Container</h2>
<ol>
<li>Open the project in either GitHub Codespaces or VS Code with the Dev Containers extension. As part of starting up, it will auto-install the requirements and SQLTools will detect node is already installed:
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzBClYguoui9g2i8CP4Az2vhwDlThXXElQzhniWHtrr43_69g-Y88WEHHOYtplNboOOCFysXo-BGN60TCWSPNfMVixhUhilmvl-E_gis6f2vVOcz8fBfMx7J-iBoVStT06aEMe-sl4On1N_Oth7msrHcYsieyKM4fSB7z7oGJdaRwJz3vDXaEu3w4/s1600/screenshot_sqlite_devcontainer.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of VS Code terminal after installing pip requirements, with pop-up about node being detected" border="0" data-original-height="726" data-original-width="2788" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzBClYguoui9g2i8CP4Az2vhwDlThXXElQzhniWHtrr43_69g-Y88WEHHOYtplNboOOCFysXo-BGN60TCWSPNfMVixhUhilmvl-E_gis6f2vVOcz8fBfMx7J-iBoVStT06aEMe-sl4On1N_Oth7msrHcYsieyKM4fSB7z7oGJdaRwJz3vDXaEu3w4/s1600/screenshot_sqlite_devcontainer.png" width="550"/></a>
<li>Run <code>example.py</code>:
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_ZtMJmGUpcM1RFCp_R6B_jGzHWISbBQYcWmI8P7Qt0Bw29PoR3zHeO7yYgckjrLHm2oMbAyQ1FYubzU80dZFRF0DnSik7teWGfu7-LXPQ9Y9RP-OkjYz-Q9jZ-7qY6l8fCJxMRKTPUBmcBZSec-kAqAPRUZ5J_Xoo8BG6LLp5gsDSQ8DcQ3d24ZM/s1600/screenshot_sqlite_run_example.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of python file with cursor on run icon in top left" border="0" data-original-height="1872" data-original-width="2776" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg_ZtMJmGUpcM1RFCp_R6B_jGzHWISbBQYcWmI8P7Qt0Bw29PoR3zHeO7yYgckjrLHm2oMbAyQ1FYubzU80dZFRF0DnSik7teWGfu7-LXPQ9Y9RP-OkjYz-Q9jZ-7qY6l8fCJxMRKTPUBmcBZSec-kAqAPRUZ5J_Xoo8BG6LLp5gsDSQ8DcQ3d24ZM/s1600/screenshot_sqlite_run_example.png" width="550"/></a>
<li>Select the SQLTools icon in the sidebar:
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis7U8Xh6F3-8DOCW1QWYuPXDhNbIVu9u6VAy4iQTA7O_lpf7BmomGiC4mQQcpBBknpPigjlCbSsWTOtrO_nUlJYTa1Z4WuSSDJexX2D6R45NMdKaQh1qBZS9ucNGXko9hmI6IExAush9k3wdHuy4_Fft15-pwtVlhEh7bbBPvwPh_lv86N2IusLKM/s1600/screenshot_sqlite_sqltools_icon.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of VS Code sidebar with cursor over SQLTools icon" border="0" data-original-height="754" data-original-width="704" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEis7U8Xh6F3-8DOCW1QWYuPXDhNbIVu9u6VAy4iQTA7O_lpf7BmomGiC4mQQcpBBknpPigjlCbSsWTOtrO_nUlJYTa1Z4WuSSDJexX2D6R45NMdKaQh1qBZS9ucNGXko9hmI6IExAush9k3wdHuy4_Fft15-pwtVlhEh7bbBPvwPh_lv86N2IusLKM/s1600/screenshot_sqlite_sqltools_icon.png" width="350"/></a>
<li>Next to the preset connection, select the connect icon (a plug):
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrmPZMsDtlHdm6C79107R0gM7mjnau7KZHAtmMqYfsKOtLaTvUbky6xRcU3x7-Vv9w56hRmdpJV1kc42F0BQ4IG-yl8fU4o8mFvmxYqt7kHRSP3YQUnDAn6xFjynJ_STUf7fDJx-pKF4IfiEAarBxxIsGq8KmL32N_tvF8dh9b8wBq3K40faSM-_Q/s1600/screenshot_sqlite_connect_icon.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of VS Code sidebar with SQLTools extension open, cursor on right side of database connection " border="0" data-original-height="832" data-original-width="772" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrmPZMsDtlHdm6C79107R0gM7mjnau7KZHAtmMqYfsKOtLaTvUbky6xRcU3x7-Vv9w56hRmdpJV1kc42F0BQ4IG-yl8fU4o8mFvmxYqt7kHRSP3YQUnDAn6xFjynJ_STUf7fDJx-pKF4IfiEAarBxxIsGq8KmL32N_tvF8dh9b8wBq3K40faSM-_Q/s1600/screenshot_sqlite_connect_icon.png" width="350"/></a>
<li>When it prompts you to install the sqlite npm package, select <em>Install now</em>:
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4YqNaFOhzmaT9U8x_sbRC4C_6_nK-7fZkHtEWXiurIig233zjjWgFwFhKd8WW62Lr5yuMZ5o0P_FPM1wjUOYZ8VUgpmfFbZ_E3_rwPV6j9RDsyYuu6TNOPMtnsQRAeGKL70CfRXuQKXwU7R-WwjgA5alnwnQE9flA7-xOQPiY4hee05KN4bXQsXk/s1600/screenshot_sqlite_sqlite_prompt.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of VS Code terminal with prompt from SQLTools about installing sqlite package" border="0" data-original-height="673" data-original-width="1729" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg4YqNaFOhzmaT9U8x_sbRC4C_6_nK-7fZkHtEWXiurIig233zjjWgFwFhKd8WW62Lr5yuMZ5o0P_FPM1wjUOYZ8VUgpmfFbZ_E3_rwPV6j9RDsyYuu6TNOPMtnsQRAeGKL70CfRXuQKXwU7R-WwjgA5alnwnQE9flA7-xOQPiY4hee05KN4bXQsXk/s1600/screenshot_sqlite_sqlite_prompt.png" width="550"/></a>
<li>When it's done installing, select <em>Connect to database</em>.
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgruN25gNBo7eb8k8EUGEJSBIWtvYjJ3ytK0_5uh4gPbdsPveapCj366lPiDjY5yhQwkzdQULhKcSpmJlaYqQ0h249YHqFmUe8hvlUmrf77cHKESc1wo0dwZJNLAyOAO-OBepT5xOqfLNgLYGMOP1411IWePQgLvqdwcL2OO02UzQl_H3b3JUqqSWg/s1600/screenshot_sqlite_sqlite_installed.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot of VS Code with prompt from SQLTools about sqlite successful installation" border="0" data-original-height="721" data-original-width="1407" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgruN25gNBo7eb8k8EUGEJSBIWtvYjJ3ytK0_5uh4gPbdsPveapCj366lPiDjY5yhQwkzdQULhKcSpmJlaYqQ0h249YHqFmUe8hvlUmrf77cHKESc1wo0dwZJNLAyOAO-OBepT5xOqfLNgLYGMOP1411IWePQgLvqdwcL2OO02UzQl_H3b3JUqqSWg/s1600/screenshot_sqlite_sqlite_installed.png"width="550"/></a>
<li>Browse the table rows by selecting the magnifying glass next to each table.
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJQPiQJJSQXCKKVoOE1qZRCFJtKONe57SjeFN7xGX-hEJ3tPGT_62aHUZaUNUpRKWa3NxTDzEg3JM_dScKucCi7Ru8DZvtc-zRyFzNQWuC8GvUFoF7pKLAA5PhvkPWSWl70FQXTavseZE84SVCZ2VmKv_6q7s0Eqzwf4R7G3ae_TT9DAPN1c02jBA/s1600/screenshot_sqlite_inspectcustomer.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="Screenshot from SQLTools VS Code extension with customers table open and rows of generated data" border="0" data-original-height="1259" data-original-width="2614" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhJQPiQJJSQXCKKVoOE1qZRCFJtKONe57SjeFN7xGX-hEJ3tPGT_62aHUZaUNUpRKWa3NxTDzEg3JM_dScKucCi7Ru8DZvtc-zRyFzNQWuC8GvUFoF7pKLAA5PhvkPWSWl70FQXTavseZE84SVCZ2VmKv_6q7s0Eqzwf4R7G3ae_TT9DAPN1c02jBA/s1600/screenshot_sqlite_inspectcustomer.png" width="550"/></a>
</ol>
Pamela Foxhttp://www.blogger.com/profile/15947664772001597300noreply@blogger.com0